From kxu at openjdk.org Wed Oct 1 01:23:30 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 1 Oct 2025 01:23:30 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v12] In-Reply-To: References: Message-ID: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: - Merge remote-tracking branch 'origin/master' into counted-loop-refactor # Conflicts: # src/hotspot/share/opto/loopnode.cpp - futher refactor counted loop conversion - WIP: remove unused #include - WIP: refactor structs to classes - WIP: removed dead code, renamed fields and signatures - Merge branch 'openjdk:master' into counted-loop-refactor - Merge remote-tracking branch 'origin/master' into counted-loop-refactor # Conflicts: # src/hotspot/share/opto/loopnode.cpp # src/hotspot/share/opto/loopnode.hpp - Merge branch 'master' into counted-loop-refactor # Conflicts: # src/hotspot/share/opto/loopnode.cpp # src/hotspot/share/opto/loopnode.hpp # src/hotspot/share/opto/loopopts.cpp - Merge remote-tracking branch 'origin/master' into counted-loop-refactor - further refactor is_counted_loop() by extracting functions - ... and 19 more: https://git.openjdk.org/jdk/compare/0366d882...b1d27675 ------------- Changes: https://git.openjdk.org/jdk/pull/24458/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=11 Stats: 1209 lines in 3 files changed: 613 ins; 291 del; 305 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From dlong at openjdk.org Wed Oct 1 02:14:31 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 1 Oct 2025 02:14:31 GMT Subject: RFR: 8338197: [ubsan] ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v6] In-Reply-To: References: Message-ID: On Tue, 30 Sep 2025 19:26:43 GMT, Boris Ulasevich wrote: >> This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal. >> >> The issue reproduces with a HeapByteBufferTest jtreg test on a UBSan-enabled build. Actually the trigger is the `XX:+OptoScheduling` option used by the test (by default OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run. >> >> This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments. >> >> The problem is that the shift count `n` may be too large here: >> >> class Pipeline_Use_Cycle_Mask { >> protected: >> uint _mask; >> .. >> Pipeline_Use_Cycle_Mask& operator<<=(int n) { >> _mask <<= n; >> return *this; >> } >> }; >> >> The recent change attempted to cap the shift amount at one call site: >> >> class Pipeline_Use_Element { >> protected: >> ..
>> // Mask of specific used cycles >> Pipeline_Use_Cycle_Mask _mask; >> .. >> void step(uint cycles) { >> _used = 0; >> uint max_shift = 8 * sizeof(_mask) - 1; >> _mask <<= (cycles < max_shift) ? cycles : max_shift; >> } >> } >> >> However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count: >> >> // The following two routines assume that the root Pipeline_Use entity >> // consists of exactly 1 element for each functional unit >> // start is relative to the current cycle; used for latency-based info >> uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const { >> for (uint i = 0; i < pred._count; i++) { >> const Pipeline_Use_Element *predUse = pred.element(i); >> if (predUse->_multiple) { >> uint min_delay = 7; >> // Multiple possible functional units, choose first unused one >> for (uint j = predUse->_lb; j <= predUse->_ub; j++) { >> const Pipeline_Use_Element *currUse = element(j); >> uint curr_delay = delay; >> if (predUse->_used & currUse->_used) { >> Pipeline_Use_Cycle_Mask x = predUse->_mask; >> Pipeline_Use_Cycle_Mask y = currUse->_mask; >> >> for ( y <<= curr_delay; x.overlaps(y); curr_delay++ ) >> y <<= 1; >> } >> if (min_delay > curr_delay) >> min_delay = curr_delay; >> } >> if (delay < min_delay) >> delay = min_delay; >> } >> else { >> for (uint j = predUse->_lb; j <= pre... > > Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. The pull request now contains three commits: > > - use uint32_t for _mask > - remove redundant code > - 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' My understanding is that pipeline latencies and instruction cost "latencies" are two different things, and only the latter affect matching/selection. If OptoScheduling is turned off, pipeline scheduling should be turned off, but we will still get correct instruction selection based on "ins_cost", not "fixed_latency" or "pipe_class". ------------- PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3354435818 From kxu at openjdk.org Wed Oct 1 04:48:35 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 1 Oct 2025 04:48:35 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v19] In-Reply-To: References: Message-ID: > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. > > When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added. Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. 
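For reference, the int-versus-long shift behaviour described in the quoted text above is plain Java semantics and can be checked with a small standalone snippet. The class and method names below are invented for illustration only; this is not part of the patch or its tests:

public class ShiftBy32Demo {
    public static void main(String[] args) {
        int count = 32;
        // Java int shifts use only the low 5 bits of the shift count,
        // so 1 << 32 behaves like 1 << 0 and yields 1.
        System.out.println(1 << count);          // prints 1
        // A long shift by 32 really produces 2^32; narrowing that back
        // to int drops the high bits and yields 0.
        System.out.println((int) (1L << count)); // prints 0
    }
}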
The pull request now contains 69 commits: - Merge remote-tracking branch 'origin/master' into arithmetic-canonicalization - update naming and comments - Merge branch 'openjdk:master' into arithmetic-canonicalization - Merge remote-tracking branch 'origin/master' into arithmetic-canonicalization - Allow swapping LHS/RHS in case not matched - Merge branch 'refs/heads/master' into arithmetic-canonicalization - improve comment readability and struct helper functions - remove asserts, add more documentation - fix typo: lhs->rhs - update comments - ... and 59 more: https://git.openjdk.org/jdk/compare/0366d882...ce23d393 ------------- Changes: https://git.openjdk.org/jdk/pull/23506/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=18 Stats: 852 lines in 6 files changed: 851 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From kxu at openjdk.org Wed Oct 1 05:00:50 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 1 Oct 2025 05:00:50 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v3] In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 16:30:51 GMT, Christian Hagedorn wrote: >> Resolved conflict with [JDK-8357951](https://bugs.openjdk.org/browse/JDK-8357951). @chhagedorn I'd appreciate a re-review. Thank you so much! > > Thanks @tabjy for coming back with an update and pinging me again! Sorry, I completely missed it the first time. I will be on vacation starting tomorrow for two weeks but I'm happy to take another look when I'm back :-) @chhagedorn Thanks for reviewing. Sorry this took longer than I'd like. I've made the following changes: - fixed typos - better naming and pattern (e.g., `_` prefixes, unneeded code/brackets/spaces) - changed `strcut` to `class` - hide fields behind accessor methods - extracted relevant code into classes methods where makes sense - resolved conflict Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3354745863 From kxu at openjdk.org Wed Oct 1 05:07:54 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 1 Oct 2025 05:07:54 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 11:50:38 GMT, Emanuel Peter wrote: >> Ping @eme64 again for awareness. :) > > @tabjy > >> I could, at very least, try to swap LHS and RHS if no match is found > > I think that would be a good idea, and not very hard. You can just have a function `add_pattern(lhs, rhs)`, and then run it also with `add_pattern(rhs, lhs)` for **swapping**. > > Personally, I would have preferred a recursive algorithm, but that could have some compile time overhead. @chhagedorn Was a little more skeptical about the recursive algorithm. > > It seems the motivation for this change is the benchmark from here: > ArithmeticCanonicalizationBenchmark > https://ionutbalosin.com/2024/02/jvm-performance-comparison-for-jdk-21/#jit-compiler > > This benchmark is of course somewhat arbitrary, and so are now all of your added patterns. Having a most general solution would be nice, but maybe the recursive algorithm is too much, I'm not 100% sure. Of course we now still have cases that do not optimize/canonicalize, and so someone could write a benchmark for those cases still.. oh well. > > What I would like to see for **testing**: add some more patterns with IR rules. 
More that now optimize, and also a few that do not optimize, just so we have a bit of a sense what we are still missing. > > @rwestrel Filed this issue. I wonder: what do you think we should do here? How general should the optimization/canonicalization be? @eme64 Thank you for reviewing! Those are very valid suggestions, especially on naming as this PR evolves. I've done the following: - updated naming (mostly from "serial addition" to "collapsable addition (into multiplication)") - updated comments - moved test file - merged in master Please enjoy your time off! Once GHA passes, @rwestrel could you please give this a quick review if you have some time? Thank you very much! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-3354754246 From kxu at openjdk.org Wed Oct 1 05:08:05 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 1 Oct 2025 05:08:05 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v18] In-Reply-To: <53Ado9oN1yU5hgOPU2feecxsArD5yoycn09ZWPNK4AQ=.69035bde-9bec-442e-8dc2-ddd268df9d07@github.com> Message-ID: On Wed, 17 Sep 2025 15:02:38 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 67 commits: >> >> - Merge branch 'openjdk:master' into arithmetic-canonicalization >> - Merge remote-tracking branch 'origin/master' into arithmetic-canonicalization >> - Allow swapping LHS/RHS in case not matched >> - Merge branch 'refs/heads/master' into arithmetic-canonicalization >> - improve comment readability and struct helper functions >> - remove asserts, add more documentation >> - fix typo: lhs->rhs >> - update comments >> - use java_add to avoid cpp overflow UB >> - add assertion for MulLNode too >> - ... and 57 more: https://git.openjdk.org/jdk/compare/173dedfb...7bb7e645 > > src/hotspot/share/opto/addnode.cpp line 424: > >> 422: // Note this also converts, for example, original expression `(a*3) + a` into `4*a` and `(a<<2) + a` into `5*a`. A more >> 423: // generalized pattern `(a*b) + (a*c)` into `a*(b + c)` is handled by AddNode::IdealIL(). >> 424: Node* AddNode::convert_serial_additions(PhaseGVN* phase, BasicType bt) { > > The name `convert_serial_additions` now seems a bit off. Because we really cover a lot of other cases too. > Really you cover `a + pattern` and `pattern + a`, where `pattern` is one of the cases from `find_serial_addition_patterns`. > > Maybe it could be called `AddNode::Ideal_collapse_variable_times_con`. Because in the end you want to find cases that are equivalent to `a * some_con`. > > Lead the documentation with that as well, rather than the series of additions. Because the series of additions is not the pattern you actually match here. The series of additions is only one of the use-cases, and there are others. Thank you. I also like the wording of *collapsing* additions better. I've updated names and comments accordingly. > test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 24: > >> 22: */ >> 23: >> 24: package compiler.c2; > > I would put the test in a more specific directory. I think the `igvn` directory would be a good candidate, because `Ideal` is part of IGVN ;) Moved to the `.gvn` package.
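To make the addnode.cpp discussion above concrete, these are the kinds of source-level shapes the optimization targets, each collapsing into a single multiplication by a constant. The class and method names below are invented for illustration and this is not the actual jtreg test:

public class CollapseAddDemo {
    // Shapes like those named in the code comment above; each is
    // equivalent to multiplying the variable by a constant.
    static int sumOfThree(int a)    { return a + a + a; }      // a * 3
    static int mulPlusSelf(int a)   { return (a * 3) + a; }    // a * 4
    static int shiftPlusSelf(int a) { return (a << 2) + a; }   // a * 5

    public static void main(String[] args) {
        // Expected output: 21 28 35
        System.out.println(sumOfThree(7) + " " + mulPlusSelf(7) + " " + shiftPlusSelf(7));
    }
}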
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2393460358 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2393460843 From mchevalier at openjdk.org Wed Oct 1 06:30:15 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 1 Oct 2025 06:30:15 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed Message-ID: Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. When we have something like // any loop while (...) { /* something involving limit */ } // counted loop with zero trip guard if (i < limit) { for (int i = init; i < limit; i++) { ... } } and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. But the method `PhaseIdealLoop::do_unroll` has the assert https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. We could make the assert a lot smarter and test for equality of these nodes, up to upcoming simplifications... But that is quite complex for questionable benefit. That is IGVN's job, let's not do it twice! Phi simplification is also not the simplest. After all, nothing wrong seems to happen: each function seems to do its job right, the graph seems well-structured, with the right semantics. So maybe the assert is just too strong by assuming that the graph would always be cleaned up before reaching it. And then, the best for the moment is simply to weaken the assert: if StressLoopPeeling is on, then the assert might not hold and it's ok. 
Indeed, by skipping the assert, all end up fine: `do_unroll` replaces the input of the `OpaqueZeroTripGuard` and of the loop halting condition separately: https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L2025 https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L2046 In the future, if we indeed get this situation in the wild, without stress peeling, maybe we should make `do_unroll` more robust, or make it give up if it detects that the graph doesn't fulfill the expected state yet, and give IGVN a chance first. Thanks, Marc ------------- Commit messages: - Weakening the assert Changes: https://git.openjdk.org/jdk/pull/27586/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361608 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27586/head:pull/27586 PR: https://git.openjdk.org/jdk/pull/27586 From rcastanedalo at openjdk.org Wed Oct 1 07:19:47 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 1 Oct 2025 07:19:47 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed In-Reply-To: References: Message-ID: <9awWr9h5kEOfyPA2jCyHzFxgadv_5jWXH4omqrlvf_g=.9bcd55d5-3830-450f-b05e-42fdda457b39@github.com> On Wed, 1 Oct 2025 00:23:20 GMT, Marc Chevalier wrote: > Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. > > When we have something like > > // any loop > while (...) { /* something involving limit */ } > // counted loop with zero trip guard > if (i < limit) { > for (int i = init; i < limit; i++) { ... } > } > > and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. > > But the method `PhaseIdealLoop::do_unroll` has the assert > > https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 > > requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. > > This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. > > But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? > > By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. 
We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. > > We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. > > We could make the assert a lot smarter and test for ... Relaxing the assertion sounds reasonable, given that we have not seen it fail without `StressLoopPeeling`. I think it would be good to add a comment above each modified assertion briefly justifying why the invariant is not checked when `StressLoopPeeling` is enabled. > I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. What do you mean by "hardcoded"? It sounds like you have constructed an example program where the assertion fails without `StressLoopPeeling`, but I guess that is not what you mean because otherwise the strategy of relaxing the assertion only when `StressLoopPeeling` is enabled would not be sufficient, right? In any case, consider adding a regression case to the changeset. ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27586#pullrequestreview-3287862037 From mhaessig at openjdk.org Wed Oct 1 07:31:55 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 1 Oct 2025 07:31:55 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v2] In-Reply-To: References: Message-ID: On Sun, 10 Aug 2025 12:27:55 GMT, Tobias Hotz wrote: >> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixes after review > > Thanks for the fast review! The main reason for all the if cases is that min_int / (-1) is undefined behavior in C++, as it overflows. All code has to be careful that this special case can't happen in C++ code, and that's the main motivation behind all the ifs. I've added a comment that describes that. > Otherwise, you would be right: Redudant calculations are no problem, min and max would take care of that. > > Regarding testing: I only ran tier1 tests on my machine and GHA @ichttt, are you still working on this? :slightly_smiling_face: ------------- PR Comment: https://git.openjdk.org/jdk/pull/26143#issuecomment-3355104402 From aph at openjdk.org Wed Oct 1 07:33:49 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 1 Oct 2025 07:33:49 GMT Subject: RFR: 8368962: hotspot/cpu/aarch64/bytecodes_aarch64.{hpp, cpp} is unused In-Reply-To: <-CZFa42PRn1THvcZzRbcIy_HC2GC_vsQjw09KhkXDw0=.8ff20473-ecc2-4649-b0fd-def97a451ecb@github.com> References: <-CZFa42PRn1THvcZzRbcIy_HC2GC_vsQjw09KhkXDw0=.8ff20473-ecc2-4649-b0fd-def97a451ecb@github.com> Message-ID: On Tue, 30 Sep 2025 14:07:56 GMT, Andrew Haley wrote: >> Remove deadcode. > > Super, thanks. > @theRealAph Do we want to wait for a second reviewer, or are we good with just one given this is a simple delete? I think this fine and trivially correct, but we need two reviews for HotSpot patches. I've been told off before when I didn't do this. ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27577#issuecomment-3355109610 From mhaessig at openjdk.org Wed Oct 1 07:39:53 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 1 Oct 2025 07:39:53 GMT Subject: RFR: 8368962: hotspot/cpu/aarch64/bytecodes_aarch64.{hpp, cpp} is unused In-Reply-To: References: Message-ID: On Tue, 30 Sep 2025 13:41:42 GMT, Justin King wrote: > Remove deadcode. This looks good and testing passed. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/27577#pullrequestreview-3287927539 From roland at openjdk.org Wed Oct 1 07:47:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 1 Oct 2025 07:47:56 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v12] In-Reply-To: References: Message-ID: <_5Hm3XJpWIYumEbCGu5fDiNQsaVxIj0wuszhbV8ZNkE=.d545c7be-bdaf-473b-ad95-0f47c86a1166@github.com> On Wed, 1 Oct 2025 01:23:30 GMT, Kangcheng Xu wrote: >> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. >> >> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. >> >> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). > > Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: > > - Merge remote-tracking branch 'origin/master' into counted-loop-refactor > > # Conflicts: > # src/hotspot/share/opto/loopnode.cpp > - futher refactor counted loop conversion > - WIP: remove unused #include > - WIP: refactor structs to classes > - WIP: removed dead code, renamed fields and signatures > - Merge branch 'openjdk:master' into counted-loop-refactor > - Merge remote-tracking branch 'origin/master' into counted-loop-refactor > > # Conflicts: > # src/hotspot/share/opto/loopnode.cpp > # src/hotspot/share/opto/loopnode.hpp > - Merge branch 'master' into counted-loop-refactor > > # Conflicts: > # src/hotspot/share/opto/loopnode.cpp > # src/hotspot/share/opto/loopnode.hpp > # src/hotspot/share/opto/loopopts.cpp > - Merge remote-tracking branch 'origin/master' into counted-loop-refactor > - further refactor is_counted_loop() by extracting functions > - ... and 19 more: https://git.openjdk.org/jdk/compare/0366d882...b1d27675 src/hotspot/share/opto/loopopts.cpp line 1695: > 1693: !n->is_OpaqueInitializedAssertionPredicate() && > 1694: !n->is_OpaqueTemplateAssertionPredicate() && > 1695: !n->is_Type()) { This change seems unrelated. Bad merge? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2393732195 From mhaessig at openjdk.org Wed Oct 1 07:51:08 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 1 Oct 2025 07:51:08 GMT Subject: RFR: 8368866: compiler/codecache/stress/UnexpectedDeoptimizationTest.java intermittent timed out In-Reply-To: References: Message-ID: On Mon, 29 Sep 2025 14:30:32 GMT, SendaoYan wrote: > Hi all, > > When I run test compiler/codecache/stress/UnexpectedDeoptimizationTest.java standalone, test occupy about 6 CPU threads. 
> > The default timeout factor was change from 4 to 1 by [JDK-8260555](https://bugs.openjdk.org/browse/JDK-8260555). This make test compiler/codecache/stress/UnexpectedDeoptimizationTest.java intermittent timed out when this test run with other tests simultancely, because this is stress test which may affected by other tests. > > So I want to change the default timeout value from 120 to 240, this will make this test run success steady. Testing passed all green. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/27550#pullrequestreview-3287960354 From duke at openjdk.org Wed Oct 1 08:34:53 2025 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane Ampudia) Date: Wed, 1 Oct 2025 08:34:53 GMT Subject: RFR: 8368780: IGV: Upgrade to Netbeans Platform 27 Message-ID: This PR upgrades IGV and its dependencies to the newest Netbeans Platform 27, released on August 21, 2025. It also supports running the latest (LTS) JDK 25. It has been tested that IGV still behaves as expected after the upgrade. ------------- Commit messages: - Update copyright years - Update README - Move Nashorn version definition and update it - Fix for new annotation policy from JDK 23 - Update IGV Java version requirement Changes: https://git.openjdk.org/jdk/pull/27579/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27579&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8368780 Stats: 11 lines in 3 files changed: 4 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/27579.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27579/head:pull/27579 PR: https://git.openjdk.org/jdk/pull/27579 From rcastanedalo at openjdk.org Wed Oct 1 08:50:24 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 1 Oct 2025 08:50:24 GMT Subject: RFR: 8368780: IGV: Upgrade to Netbeans Platform 27 In-Reply-To: References: Message-ID: On Tue, 30 Sep 2025 13:57:23 GMT, Ant?n Seoane Ampudia wrote: > This PR upgrades IGV and its dependencies to the newest Netbeans Platform 27, released on August 21, 2025. It also supports running the latest (LTS) JDK 25. > > It has been tested that IGV still behaves as expected after the upgrade. Thank you, Ant?n! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27579#pullrequestreview-3288145399 From bmaillard at openjdk.org Wed Oct 1 09:02:56 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 1 Oct 2025 09:02:56 GMT Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop [v12] In-Reply-To: References: Message-ID: > This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`. > > ### Context > > The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn I was able to reduce the reproducer to something very simple. After being compiled by C2, the execution of the following method led to the last statement (`x = 0`) to be ignored: > > > static public void test() { > x = 0; > for (int i = 0; i < 20000; i++) { > x += i; > } > x = 0; > } > > > After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. 
`Store` node are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially adressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here as macro expansion happens too late. > > This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but on the longer term further investigations into the missing `Phi` node issue are necessary, as they are likely to cause other issues (cf. related JBS issues). > > ### Detailed Analysis > > In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`. When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later. > > This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect. > > This is what the IR looks like after the creation of the post lo... Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: Use PhaseIdealLoop::is_member ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27225/files - new: https://git.openjdk.org/jdk/pull/27225/files/71ff706f..73ee9546 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27225&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27225&range=10-11 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27225.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27225/head:pull/27225 PR: https://git.openjdk.org/jdk/pull/27225 From roland at openjdk.org Wed Oct 1 09:07:37 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 1 Oct 2025 09:07:37 GMT Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop [v12] In-Reply-To: References: Message-ID: On Wed, 1 Oct 2025 09:02:56 GMT, Beno?t Maillard wrote: >> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`. >> >> ### Context >> >> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn I was able to reduce the reproducer to something very simple. After being compiled by C2, the execution of the following method led to the last statement (`x = 0`) to be ignored: >> >> >> static public void test() { >> x = 0; >> for (int i = 0; i < 20000; i++) { >> x += i; >> } >> x = 0; >> } >> >> >> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. 
`Store` node are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially adressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here as macro expansion happens too late. >> >> This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but on the longer term further investigations into the missing `Phi` node issue are necessary, as they are likely to cause other issues (cf. related JBS issues). >> >> ### Detailed Analysis >> >> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`. When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later. >> >> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect. >> >> This is wh... > > Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Use PhaseIdealLoop::is_member Thanks for making the change. Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27225#pullrequestreview-3288206610 From bmaillard at openjdk.org Wed Oct 1 09:07:40 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 1 Oct 2025 09:07:40 GMT Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop [v11] In-Reply-To: <5dZBZ1jSHKg6iAvFbEL8r5GChlIJQz2Xac3RKzp9lMA=.a8b6a617-d703-432c-b1c7-5c61c07f3b21@github.com> References: <5dZBZ1jSHKg6iAvFbEL8r5GChlIJQz2Xac3RKzp9lMA=.a8b6a617-d703-432c-b1c7-5c61c07f3b21@github.com> Message-ID: <5GdZKPzSy7C4GUc03q5slSyW0OQ5hAVS3-p88yN020E=.a39814f4-4761-4bbc-b853-28a188f4f030@github.com> On Tue, 30 Sep 2025 14:13:45 GMT, Roland Westrelin wrote: >> Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bad github commit suggestion > > src/hotspot/share/opto/loopTransform.cpp line 1800: > >> 1798: // body was cloned as a unit >> 1799: IdealLoopTree* input_loop = get_loop(get_ctrl(store->in(MemNode::Memory))); >> 1800: if (!outer_loop->is_member(input_loop)) { > > Same here. Actually, I wonder if a new method that also does the `get_ctrl()` (or `ctrl_or_self()`), wouldn't be useful given that pattern must be quite common. Makes sense. It seems there are a lot of occurrences indeed, maybe I should address this in a separate RFE. Btw, it seems we could also change the return type of `PhaseIdealLoop::is_member` from `int` to `bool`, to stay consistent with `IdealLoopTree::is_member`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2393914818 From roland at openjdk.org Wed Oct 1 09:07:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 1 Oct 2025 09:07:41 GMT Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop [v11] In-Reply-To: <5GdZKPzSy7C4GUc03q5slSyW0OQ5hAVS3-p88yN020E=.a39814f4-4761-4bbc-b853-28a188f4f030@github.com> References: <5dZBZ1jSHKg6iAvFbEL8r5GChlIJQz2Xac3RKzp9lMA=.a8b6a617-d703-432c-b1c7-5c61c07f3b21@github.com> <5GdZKPzSy7C4GUc03q5slSyW0OQ5hAVS3-p88yN020E=.a39814f4-4761-4bbc-b853-28a188f4f030@github.com> Message-ID: On Wed, 1 Oct 2025 09:02:40 GMT, Beno?t Maillard wrote: >> src/hotspot/share/opto/loopTransform.cpp line 1800: >> >>> 1798: // body was cloned as a unit >>> 1799: IdealLoopTree* input_loop = get_loop(get_ctrl(store->in(MemNode::Memory))); >>> 1800: if (!outer_loop->is_member(input_loop)) { >> >> Same here. Actually, I wonder if a new method that also does the `get_ctrl()` (or `ctrl_or_self()`), wouldn't be useful given that pattern must be quite common. > > Makes sense. It seems there are a lot of occurrences indeed, maybe I should address this in a separate RFE. Btw, it seems we could also change the return type of `PhaseIdealLoop::is_member` from `int` to `bool`, to stay consistent with `IdealLoopTree::is_member`. Indeed, no reason for a return type of `int`. Sure, a separate RFE works. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2393920482 From bmaillard at openjdk.org Wed Oct 1 09:12:34 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 1 Oct 2025 09:12:34 GMT Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop In-Reply-To: <0sO2cPw0cvqc012qfyLQLLTukDO2q85ry3tGavZ3ZPM=.6d8c9b58-2eb5-46c0-ac53-d5041588d8ea@github.com> References: <0sO2cPw0cvqc012qfyLQLLTukDO2q85ry3tGavZ3ZPM=.6d8c9b58-2eb5-46c0-ac53-d5041588d8ea@github.com> Message-ID: <5qqJor0aW62I9Tqv2lQ1HwOhS_zvvnwufr5LVk3Wsl4=.81200860-1a00-4823-8de9-46eaa2de008a@github.com> On Fri, 12 Sep 2025 09:22:04 GMT, Roland Westrelin wrote: >> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`. >> >> ### Context >> >> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn I was able to reduce the reproducer to something very simple. After being compiled by C2, the execution of the following method led to the last statement (`x = 0`) to be ignored: >> >> >> static public void test() { >> x = 0; >> for (int i = 0; i < 20000; i++) { >> x += i; >> } >> x = 0; >> } >> >> >> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` node are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially adressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here as macro expansion happens too late. 
>> >> This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but on the longer term further investigations into the missing `Phi` node issue are necessary, as they are likely to cause other issues (cf. related JBS issues). >> >> ### Detailed Analysis >> >> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`. When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later. >> >> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect. >> >> This is wh... > > Not a review but a comment on the missing Phis. Your description makes it sound like if the `OuterStripMinedLoop` was created with `Phis` from the start, there would be no issue. That's no true AFAICT. The current logic for pre/main/post loops creation would simply not work because it doesn't expect the `Phis` and it would need to be extended so things are rewired correctly with the outer loop `Phis`. The inner loop would still have no `Phi` for the sunk store. So the existing logic, once fixed, would not find it either and you would need some new logic to find it maybe using the outer loop `Phis`. The current shape of the outer loop (without the Phis) is very simple and there's only one location where the Store can be (on the exit projection of the inner loop right above the safepoint which is right below the exit of the inner loop and can't be anywhere else). So you added logic to find the Store relying on the current shape of the outer loop. If the outer loop had `Phis`, some a lternate version of that logic could be used. They seem like 2 ways of doing the same thing to me and nothing tells us one is better than the other. In short, I don't find this bug a good example of something that would work better if we had `Phi`s on the outer loop. I wouldn't say the root cause is that we don't have `Phi`s on the outer loop either. Thank you for your review @rwestrel! And apologies for not replying to your comment earlier, I saw it right before leaving on vacation and then forgot. I agree with what you said, and I may have overlooked that aspect while writing my explanation. Thanks for clearing that out. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27225#issuecomment-3355423526 From mchevalier at openjdk.org Wed Oct 1 09:18:26 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 1 Oct 2025 09:18:26 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed In-Reply-To: References: Message-ID: On Wed, 1 Oct 2025 00:23:20 GMT, Marc Chevalier wrote: > Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. 
Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. > > When we have something like > > // any loop > while (...) { /* something involving limit */ } > // counted loop with zero trip guard > if (i < limit) { > for (int i = init; i < limit; i++) { ... } > } > > and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. > > But the method `PhaseIdealLoop::do_unroll` has the assert > > https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 > > requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. > > This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. > > But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? > > By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. > > We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. > > We could make the assert a lot smarter and test for ... I mean that I went in the peeling policy to write conditions such as "first time, return 'no peeling' only the second time, run normally", to try to reproduce the right sequence on one particular input. So it's not very far of stressing: instead of deciding randomly, I hardcode in the C++ whether it should run normally or exit immediately for each call to the peeling policy. It's mostly useful for the case where the natural run would peel, but I don't want it to happen too early. Unlike stressing that would not return "should peel" when the normal heuristic would not peeling. But that's the only difference. I was trying to get closer from a natural (no stress) example, but at the end, it ended up being too similar to stressing to be really conclusive about real world cases. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3355446368 From rrich at openjdk.org Wed Oct 1 09:29:30 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 1 Oct 2025 09:29:30 GMT Subject: RFR: 8368861: [TEST] compiler/floatingpoint/ScalarFPtoIntCastTest.java expects x86 IR on non-x86 platforms [v2] In-Reply-To: References: Message-ID: On Tue, 30 Sep 2025 16:16:08 GMT, Richard Reingruber wrote: >> This removes X86 specific IR checks that are applied if the X86 specific feature *avx10_2* is not present. >> >> We could make the checks dependent on the platform being X86 if we wanted to keep them but I don't see a value in doing so. >> >> Tested on X86 and PPC64 > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Keep checks and limit them to x64 Tests with the current version also pass on ppc64. Thanks for testing and reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27546#issuecomment-3355479613 From rrich at openjdk.org Wed Oct 1 09:29:31 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 1 Oct 2025 09:29:31 GMT Subject: Integrated: 8368861: [TEST] compiler/floatingpoint/ScalarFPtoIntCastTest.java expects x86 IR on non-x86 platforms In-Reply-To: References: Message-ID: On Mon, 29 Sep 2025 12:31:22 GMT, Richard Reingruber wrote: > This removes X86 specific IR checks that are applied if the X86 specific feature *avx10_2* is not present. > > We could make the checks dependent on the platform being X86 if we wanted to keep them but I don't see a value in doing so. > > Tested on X86 and PPC64 This pull request has now been integrated. Changeset: 5a2700f2 Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/5a2700f231d72e2241703c1d17b308f031e8566c Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod 8368861: [TEST] compiler/floatingpoint/ScalarFPtoIntCastTest.java expects x86 IR on non-x86 platforms Reviewed-by: sviswanathan, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/27546 From rcastanedalo at openjdk.org Wed Oct 1 09:35:20 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 1 Oct 2025 09:35:20 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed In-Reply-To: References: Message-ID: On Wed, 1 Oct 2025 09:16:00 GMT, Marc Chevalier wrote: > I mean that I went in the peeling policy to write conditions such as "first time, return 'no peeling' only the second time, run normally", to try to reproduce the right sequence on one particular input. So it's not very far of stressing: instead of deciding randomly, I hardcode in the C++ whether it should run normally or exit immediately for each call to the peeling policy. It's mostly useful for the case where the natural run would peel, but I don't want it to happen too early. > > Unlike stressing that would not return "should peel" when the normal heuristic would not peeling. But that's the only difference. I was trying to get closer from a natural (no stress) example, but at the end, it ended up being too similar to stressing to be really conclusive about real world cases. I see, thanks for the clarification. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3355500628 From adinn at openjdk.org Wed Oct 1 09:38:23 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 1 Oct 2025 09:38:23 GMT Subject: RFR: 8338197: [ubsan] ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v6] In-Reply-To: References: Message-ID: On Wed, 1 Oct 2025 02:11:31 GMT, Dean Long wrote: >> Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. The pull request now contains three commits: >> >> - use uint32_t for _mask >> - remove redundant code >> - 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' > > My understanding is that pipeline latencies and instruction cost "latencies" are two different things, and only the latter affect matching/selection. If OptoScheduling is turned off, pipeline scheduling should be turned off, but we will still get correct instruction selection based on "ins_cost", not "fixed_latency" or "pipe_class". @dean-long > My understanding is that pipeline latencies and instruction cost "latencies" are two different things, and only the latter affect matching/selection. If OptoScheduling is turned off, pipeline scheduling should be turned off, but we will still get correct instruction selection based on "ins_cost", not "fixed_latency" or "pipe_class". Doh! Of course, you are right. I was mistaken in believing that 'ins_cost' was used to derive 'fixed_latency' for the mach node. It seems that it is inherited from the pipeline class declared in the instruction. So, capping and/or tweaking the latencies will be neutral as far as instruction selection is concerned and, given the way the masking works, will be unlikely even to change scheduling. We should still look at whether the pipeline model is actually benefiting us when it comes to scheduling and either modify it or, as a minimum, adjust the inappropriately high latencies encoded by some arches while ensuring this has no detrimental effect on code scheduling. I will update the description in https://bugs.openjdk.org/browse/JDK-8368971 to reflect your observation above. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3355512594 From mhaessig at openjdk.org Wed Oct 1 10:51:49 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 1 Oct 2025 10:51:49 GMT Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop [v12] In-Reply-To: References: Message-ID: On Wed, 1 Oct 2025 09:02:56 GMT, Beno?t Maillard wrote: >> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`. >> >> ### Context >> >> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn I was able to reduce the reproducer to something very simple. After being compiled by C2, the execution of the following method led to the last statement (`x = 0`) to be ignored: >> >> >> static public void test() { >> x = 0; >> for (int i = 0; i < 20000; i++) { >> x += i; >> } >> x = 0; >> } >> >> >> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. 
`Store` node are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially adressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here as macro expansion happens too late. >> >> This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but on the longer term further investigations into the missing `Phi` node issue are necessary, as they are likely to cause other issues (cf. related JBS issues). >> >> ### Detailed Analysis >> >> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`. When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later. >> >> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect. >> >> This is wh... > > Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Use PhaseIdealLoop::is_member Marked as reviewed by mhaessig (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27225#pullrequestreview-3288529694 From roland at openjdk.org Wed Oct 1 12:23:30 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 1 Oct 2025 12:23:30 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes are widen > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if of the type both inputs > of the `Add`s are non constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. 
We already have variants of the `CastII`: > > - if the Cast can float and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints one of them operates under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widened in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - merge - Merge branch 'master' into JDK-8354282 - fix & test ------------- Changes: https://git.openjdk.org/jdk/pull/24575/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=01 Stats: 267 lines in 4 files changed: 201 ins; 14 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From snatarajan at openjdk.org Wed Oct 1 12:28:38 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 1 Oct 2025 12:28:38 GMT Subject: RFR: 8349835: C2: simplify IGV property printing [v2] In-Reply-To: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: > The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). > > ### Fix > Implemented the suggested refactoring.
> > ### Testing > Github Actions, Tier 1-3 Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: - fixing test failure - addressing review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26902/files - new: https://git.openjdk.org/jdk/pull/26902/files/79fc6a6a..30ef8eed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=00-01 Stats: 170 lines in 2 files changed: 88 ins; 75 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/26902.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26902/head:pull/26902 PR: https://git.openjdk.org/jdk/pull/26902 From roland at openjdk.org Wed Oct 1 12:33:55 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 1 Oct 2025 12:33:55 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: <2uqd_nRO0UZWonQnFDqkWYvrYwTGQbDEDnWx3C4eoAo=.65472aeb-e9c2-4f99-8728-d4c7e1afaf57@github.com> Message-ID: On Thu, 24 Apr 2025 09:12:39 GMT, Roland Westrelin wrote: > If a `CastII` that does not narrow its input has its type being a constant, do you think GVN should transform it into a constant, or such nodes should return the bottom type so that it is not folded into a floating `ConNode`? The current patch allows constant folding. I could (since I last commented on this PR) write test cases for which this causes issues. Disallowing constant folding seems too strict to me. I propose the issues related to constant folding be handled in a separate PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3356091219 From roland at openjdk.org Wed Oct 1 12:52:42 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 1 Oct 2025 12:52:42 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: <7Y1VflHDgnEChBAv9bwWH5ayU-K9ngRa3BfPjgzzHP0=.61111d18-a22e-4cb5-9492-e50f5524ac08@github.com> References: <7Y1VflHDgnEChBAv9bwWH5ayU-K9ngRa3BfPjgzzHP0=.61111d18-a22e-4cb5-9492-e50f5524ac08@github.com> Message-ID: On Wed, 23 Apr 2025 10:49:41 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - merge >> - Merge branch 'master' into JDK-8354282 >> - fix & test > > src/hotspot/share/opto/castnode.cpp line 39: > >> 37: const ConstraintCastNode::DependencyType ConstraintCastNode::WidenTypeDependency(true, false, "widen type dependency"); // not pinned, doesn't narrow type >> 38: const ConstraintCastNode::DependencyType ConstraintCastNode::StrongDependency(false, true, "strong dependency"); // pinned, narrows type >> 39: const ConstraintCastNode::DependencyType ConstraintCastNode::UnconditionalDependency(false, false, "unconditional dependency"); // pinned, doesn't narrow type > > Is there really a good reason to have the names `Regular`, `WidenType`, `Strong` and `Unconditional`? Did we just get used to these names over time, or do they really have a good reason for existance. They just don't really mean that much to me. Calling them (non)pinned and (non)narrowing would make more sense to me. So `NonPinnedNarrowingDependency`, `NonPinnedNonNarrowingDependeny`, `PinnedNarrowingDependency` and `NonPinnedNonNarrowingDependency`? 
Or to avoid using a negation for the one that's the weakest dependency: `FloatingNarrowingDependency`, `FloatingNonNarrowingDependency`, `NonFloatingNarrowingDependency` and `NonFloatingNonNarrowingDependency `? What do you think @eme64 ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2394460598 From jcking at openjdk.org Wed Oct 1 13:26:39 2025 From: jcking at openjdk.org (Justin King) Date: Wed, 1 Oct 2025 13:26:39 GMT Subject: Integrated: 8368962: hotspot/cpu/aarch64/bytecodes_aarch64.{hpp, cpp} is unused In-Reply-To: References: Message-ID: On Tue, 30 Sep 2025 13:41:42 GMT, Justin King wrote: > Remove deadcode. This pull request has now been integrated. Changeset: c69456e8 Author: Justin King URL: https://git.openjdk.org/jdk/commit/c69456e87aeb8653ce23bc7f579c254511bbf2d1 Stats: 59 lines in 2 files changed: 0 ins; 59 del; 0 mod 8368962: hotspot/cpu/aarch64/bytecodes_aarch64.{hpp,cpp} is unused Reviewed-by: aph, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/27577 From tholenstein at openjdk.org Wed Oct 1 13:45:05 2025 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 1 Oct 2025 13:45:05 GMT Subject: RFR: 8368675: IGV: nodes are wrongly marked as changed in the difference view In-Reply-To: <16kJmV4PRouP_riAC1VJkGcLNYkUKeD8PxnXZK7U-qs=.33d0f082-84eb-43c1-b8be-8112b41a29f8@github.com> References: <16kJmV4PRouP_riAC1VJkGcLNYkUKeD8PxnXZK7U-qs=.33d0f082-84eb-43c1-b8be-8112b41a29f8@github.com> Message-ID: On Fri, 26 Sep 2025 08:34:34 GMT, Roberto Casta?eda Lozano wrote: > This changeset refines IGV's node difference analysis to ignore changes in node properties that are derived by IGV, as opposed to generated by HotSpot. Derived properties include the label and color of each node. Ignoring changes in these properties prevents IGV from wrongly marking equal nodes as "changed" (colored in yellow) when showing the difference between two graphs: > > before-after > > **Testing:** tier1 and manual testing on a few graphs. Marked as reviewed by tholenstein (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27515#pullrequestreview-3289221743 From rcastanedalo at openjdk.org Wed Oct 1 13:58:32 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 1 Oct 2025 13:58:32 GMT Subject: RFR: 8368675: IGV: nodes are wrongly marked as changed in the difference view In-Reply-To: <16kJmV4PRouP_riAC1VJkGcLNYkUKeD8PxnXZK7U-qs=.33d0f082-84eb-43c1-b8be-8112b41a29f8@github.com> References: <16kJmV4PRouP_riAC1VJkGcLNYkUKeD8PxnXZK7U-qs=.33d0f082-84eb-43c1-b8be-8112b41a29f8@github.com> Message-ID: On Fri, 26 Sep 2025 08:34:34 GMT, Roberto Casta?eda Lozano wrote: > This changeset refines IGV's node difference analysis to ignore changes in node properties that are derived by IGV, as opposed to generated by HotSpot. Derived properties include the label and color of each node. Ignoring changes in these properties prevents IGV from wrongly marking equal nodes as "changed" (colored in yellow) when showing the difference between two graphs: > > before-after > > **Testing:** tier1 and manual testing on a few graphs. Thanks for reviewing, Toby! 
:slightly_smiling_face: ------------- PR Comment: https://git.openjdk.org/jdk/pull/27515#issuecomment-3356469411 From rcastanedalo at openjdk.org Wed Oct 1 13:58:34 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 1 Oct 2025 13:58:34 GMT Subject: Integrated: 8368675: IGV: nodes are wrongly marked as changed in the difference view In-Reply-To: <16kJmV4PRouP_riAC1VJkGcLNYkUKeD8PxnXZK7U-qs=.33d0f082-84eb-43c1-b8be-8112b41a29f8@github.com> References: <16kJmV4PRouP_riAC1VJkGcLNYkUKeD8PxnXZK7U-qs=.33d0f082-84eb-43c1-b8be-8112b41a29f8@github.com> Message-ID: On Fri, 26 Sep 2025 08:34:34 GMT, Roberto Casta?eda Lozano wrote: > This changeset refines IGV's node difference analysis to ignore changes in node properties that are derived by IGV, as opposed to generated by HotSpot. Derived properties include the label and color of each node. Ignoring changes in these properties prevents IGV from wrongly marking equal nodes as "changed" (colored in yellow) when showing the difference between two graphs: > > before-after > > **Testing:** tier1 and manual testing on a few graphs. This pull request has now been integrated. Changeset: 182fbc2b Author: Roberto Casta?eda Lozano URL: https://git.openjdk.org/jdk/commit/182fbc2b836d27410ccd0da512acb17bac9363c1 Stats: 27 lines in 5 files changed: 14 ins; 2 del; 11 mod 8368675: IGV: nodes are wrongly marked as changed in the difference view Reviewed-by: mchevalier, mhaessig, dfenacci, tholenstein ------------- PR: https://git.openjdk.org/jdk/pull/27515 From bmaillard at openjdk.org Wed Oct 1 14:09:09 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 1 Oct 2025 14:09:09 GMT Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop [v11] In-Reply-To: References: <5dZBZ1jSHKg6iAvFbEL8r5GChlIJQz2Xac3RKzp9lMA=.a8b6a617-d703-432c-b1c7-5c61c07f3b21@github.com> <5GdZKPzSy7C4GUc03q5slSyW0OQ5hAVS3-p88yN020E=.a39814f4-4761-4bbc-b853-28a188f4f030@github.com> Message-ID: <-lH90b5RFJ--9KtCYCcB0XqkTfQGChx-wHlnIKNqfqM=.3b4c8979-d805-4f4f-a521-361836db37a8@github.com> On Wed, 1 Oct 2025 09:05:01 GMT, Roland Westrelin wrote: >> Makes sense. It seems there are a lot of occurrences indeed, maybe I should address this in a separate RFE. Btw, it seems we could also change the return type of `PhaseIdealLoop::is_member` from `int` to `bool`, to stay consistent with `IdealLoopTree::is_member`. > > Indeed, no reason for a return type of `int`. Sure, a separate RFE works. I have filed [JDK-8369002](https://bugs.openjdk.org/browse/JDK-8369002). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2394727832 From mchevalier at openjdk.org Wed Oct 1 14:44:31 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 1 Oct 2025 14:44:31 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v2] In-Reply-To: References: Message-ID: > Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. > > When we have something like > > // any loop > while (...) 
{ /* something involving limit */ } > // counted loop with zero trip guard > if (i < limit) { > for (int i = init; i < limit; i++) { ... } > } > > and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. > > But the method `PhaseIdealLoop::do_unroll` has the assert > > https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 > > requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. > > This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. > > But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? > > By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. > > We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. > > We could make the assert a lot smarter and test for ... Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Add test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27586/files - new: https://git.openjdk.org/jdk/pull/27586/files/f694490c..d0458b2e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=00-01 Stats: 71 lines in 1 file changed: 71 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27586/head:pull/27586 PR: https://git.openjdk.org/jdk/pull/27586 From kxu at openjdk.org Wed Oct 1 15:23:37 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 1 Oct 2025 15:23:37 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v12] In-Reply-To: <_5Hm3XJpWIYumEbCGu5fDiNQsaVxIj0wuszhbV8ZNkE=.d545c7be-bdaf-473b-ad95-0f47c86a1166@github.com> References: <_5Hm3XJpWIYumEbCGu5fDiNQsaVxIj0wuszhbV8ZNkE=.d545c7be-bdaf-473b-ad95-0f47c86a1166@github.com> Message-ID: On Wed, 1 Oct 2025 07:45:26 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 29 commits: >> >> - Merge remote-tracking branch 'origin/master' into counted-loop-refactor >> >> # Conflicts: >> # src/hotspot/share/opto/loopnode.cpp >> - futher refactor counted loop conversion >> - WIP: remove unused #include >> - WIP: refactor structs to classes >> - WIP: removed dead code, renamed fields and signatures >> - Merge branch 'openjdk:master' into counted-loop-refactor >> - Merge remote-tracking branch 'origin/master' into counted-loop-refactor >> >> # Conflicts: >> # src/hotspot/share/opto/loopnode.cpp >> # src/hotspot/share/opto/loopnode.hpp >> - Merge branch 'master' into counted-loop-refactor >> >> # Conflicts: >> # src/hotspot/share/opto/loopnode.cpp >> # src/hotspot/share/opto/loopnode.hpp >> # src/hotspot/share/opto/loopopts.cpp >> - Merge remote-tracking branch 'origin/master' into counted-loop-refactor >> - further refactor is_counted_loop() by extracting functions >> - ... and 19 more: https://git.openjdk.org/jdk/compare/0366d882...b1d27675 > > src/hotspot/share/opto/loopopts.cpp line 1695: > >> 1693: !n->is_OpaqueInitializedAssertionPredicate() && >> 1694: !n->is_OpaqueTemplateAssertionPredicate() && >> 1695: !n->is_Type()) { > > This change seems unrelated. Bad merge? Nice catch! Yes it looks like a bad merge. Sorry! Changes from 8354383(https://github.com/openjdk/jdk/commit/a2f99fd88bd03337e1ba73b413ffe4e39f3584cf) is re-added. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2394994618 From kxu at openjdk.org Wed Oct 1 15:23:33 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 1 Oct 2025 15:23:33 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v13] In-Reply-To: References: Message-ID: <6zw0uSB1sUHZTyDUXDjiXcB0Chmu0XH1cEngzhG-UNk=.b239a687-cfb7-49a3-993a-34327a83c4de@github.com> > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). 
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: 8354383: C2: enable sinking of Type nodes out of loop Reviewed-by: chagedorn, thartmann (cherry picked from commit a2f99fd88bd03337e1ba73b413ffe4e39f3584cf) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24458/files - new: https://git.openjdk.org/jdk/pull/24458/files/b1d27675..2cf1da18 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=11-12 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From roland at openjdk.org Wed Oct 1 15:31:42 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 1 Oct 2025 15:31:42 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v2] In-Reply-To: References: Message-ID: > The test case has an out of loop `Store` with an `AddP` address > expression that has other uses and is in the loop body. Schematically, > only showing the address subgraph and the bases for the `AddP`s: > > > Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 > -> CastPP#110 > > > Both `AddP`s have the same base, a `CastPP` that's also in the loop > body. > > That loop is a counted loop and only has 3 iterations so is fully > unrolled. First, one iteration is peeled: > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > The `AddP`s and `CastPP` are cloned (because in the loop body). As > part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is > called. It finds the test that guards `CastPP#283` in the peeled > iteration dominates and replaces the test that guards `CastPP#110` > (the test in the peeled iteration is the clone of the test in the > loop). That causes `CastPP#110`'s control to be updated to that of the > test in the peeled iteration and to be yanked from the loop. So now > `CastPP#283` and `CastPP#110` have the same inputs. > > Next unrolling happens: > > > /-> CastPP#110 > /-> AddP#400 -> AddP#401 -> CastPP#110 > Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 > \ -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > `AddP`s are cloned once more but not the `CastPP`s because they are > both in the peeled iteration now. A new `Phi` is added. > > Next igvn runs. It's going to push the `AddP`s through the `Phi`s. > > Through `Phi#477`: > > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 > \ -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > Through `Phi#360`: > > > /-> AddP#134 -> CastPP#110 > /-> Phi#509 -> AddP#401 -> CastPP#110 > Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 > -> Phi#514 -> CastPP#283 > -> CastP#110 > > > Then `Phi#514` which has 2 `CastPP`s as input with identical inputs is > transformed into anot... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - test seed - more - Merge branch 'master' into JDK-8351889 - Merge branch 'master' into JDK-8351889 - more - test - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25386/files - new: https://git.openjdk.org/jdk/pull/25386/files/e93275c6..bf984838 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=00-01 Stats: 290750 lines in 5350 files changed: 178755 ins; 74753 del; 37242 mod Patch: https://git.openjdk.org/jdk/pull/25386.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25386/head:pull/25386 PR: https://git.openjdk.org/jdk/pull/25386 From roland at openjdk.org Wed Oct 1 15:31:43 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 1 Oct 2025 15:31:43 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) In-Reply-To: References: Message-ID: On Thu, 22 May 2025 08:35:18 GMT, Roland Westrelin wrote: > The test case has an out of loop `Store` with an `AddP` address > expression that has other uses and is in the loop body. Schematically, > only showing the address subgraph and the bases for the `AddP`s: > > > Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 > -> CastPP#110 > > > Both `AddP`s have the same base, a `CastPP` that's also in the loop > body. > > That loop is a counted loop and only has 3 iterations so is fully > unrolled. First, one iteration is peeled: > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > The `AddP`s and `CastPP` are cloned (because in the loop body). As > part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is > called. It finds the test that guards `CastPP#283` in the peeled > iteration dominates and replaces the test that guards `CastPP#110` > (the test in the peeled iteration is the clone of the test in the > loop). That causes `CastPP#110`'s control to be updated to that of the > test in the peeled iteration and to be yanked from the loop. So now > `CastPP#283` and `CastPP#110` have the same inputs. > > Next unrolling happens: > > > /-> CastPP#110 > /-> AddP#400 -> AddP#401 -> CastPP#110 > Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 > \ -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > `AddP`s are cloned once more but not the `CastPP`s because they are > both in the peeled iteration now. A new `Phi` is added. > > Next igvn runs. It's going to push the `AddP`s through the `Phi`s. > > Through `Phi#477`: > > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 > \ -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > Through `Phi#360`: > > > /-> AddP#134 -> CastPP#110 > /-> Phi#509 -> AddP#401 -> CastPP#110 > Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 > -> Phi#514 -> CastPP#283 > -> CastP#110 > > > Then `Phi#514` which has 2 `CastPP`s as input with identical inputs is > transformed into anot... Anyone else has an opinion on this? Mine is that: - fix is not guaranteed to be sufficient but there's no indication it's not - fix is low risk and we can always revisit it if we hit some more related failure So I would go with this fix. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3356971546 From iveresov at openjdk.org Wed Oct 1 19:10:39 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 1 Oct 2025 19:10:39 GMT Subject: RFR: 8368698: runtime/cds/appcds/aotCache/OldClassSupport.java assert(can_add()) failed: Cannot add TrainingData objects Message-ID: That's a bit of a bug trail from [JDK-8366948](https://bugs.openjdk.org/browse/JDK-8366948). We need to check if the TD snapshot has happened before attempting to modify the data. ------------- Commit messages: - Check for snapshot Changes: https://git.openjdk.org/jdk/pull/27593/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27593&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8368698 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27593/head:pull/27593 PR: https://git.openjdk.org/jdk/pull/27593 From heidinga at openjdk.org Wed Oct 1 19:44:40 2025 From: heidinga at openjdk.org (Dan Heidinga) Date: Wed, 1 Oct 2025 19:44:40 GMT Subject: RFR: 8368698: runtime/cds/appcds/aotCache/OldClassSupport.java assert(can_add()) failed: Cannot add TrainingData objects In-Reply-To: References: Message-ID: On Wed, 1 Oct 2025 19:04:37 GMT, Igor Veresov wrote: > That's a bit of a bug trail from [JDK-8366948](https://bugs.openjdk.org/browse/JDK-8366948). We need to check if the TD snapshot has happened before attempting to modify the data. lgtm ------------- Marked as reviewed by heidinga (no project role). PR Review: https://git.openjdk.org/jdk/pull/27593#pullrequestreview-3290826341 From iklam at openjdk.org Wed Oct 1 20:09:04 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 1 Oct 2025 20:09:04 GMT Subject: RFR: 8368698: runtime/cds/appcds/aotCache/OldClassSupport.java assert(can_add()) failed: Cannot add TrainingData objects In-Reply-To: References: Message-ID: On Wed, 1 Oct 2025 19:04:37 GMT, Igor Veresov wrote: > That's a bit of a bug trail from [JDK-8366948](https://bugs.openjdk.org/browse/JDK-8366948). We need to check if the TD snapshot has happened before attempting to modify the data. LGTM ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27593#pullrequestreview-3290908319 From iveresov at openjdk.org Wed Oct 1 23:19:58 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 1 Oct 2025 23:19:58 GMT Subject: Integrated: 8368698: runtime/cds/appcds/aotCache/OldClassSupport.java assert(can_add()) failed: Cannot add TrainingData objects In-Reply-To: References: Message-ID: On Wed, 1 Oct 2025 19:04:37 GMT, Igor Veresov wrote: > That's a bit of a bug trail from [JDK-8366948](https://bugs.openjdk.org/browse/JDK-8366948). We need to check if the TD snapshot has happened before attempting to modify the data. This pull request has now been integrated. 
Changeset: 4df41d2a Author: Igor Veresov URL: https://git.openjdk.org/jdk/commit/4df41d2a751e2942c2188ed01313d78e681835bc Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8368698: runtime/cds/appcds/aotCache/OldClassSupport.java assert(can_add()) failed: Cannot add TrainingData objects Reviewed-by: heidinga, iklam ------------- PR: https://git.openjdk.org/jdk/pull/27593 From bulasevich at openjdk.org Wed Oct 1 23:52:02 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 1 Oct 2025 23:52:02 GMT Subject: RFR: 8338197: [ubsan] ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v6] In-Reply-To: References: Message-ID: On Tue, 30 Sep 2025 19:26:43 GMT, Boris Ulasevich wrote: >> This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal. >> >> The issue reproduces with a HeapByteBufferTest jtreg test on a UBSan-enabled build. Actually the trigger is `XX:+OptoScheduling` option used by test (by default OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run. >> >> This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments. >> >> The problems is that shift count `n` may be too large here: >> >> class Pipeline_Use_Cycle_Mask { >> protected: >> uint _mask; >> .. >> Pipeline_Use_Cycle_Mask& operator<<=(int n) { >> _mask <<= n; >> return *this; >> } >> }; >> >> The recent change attempted to cap the shift amount at one call site: >> >> class Pipeline_Use_Element { >> protected: >> .. >> // Mask of specific used cycles >> Pipeline_Use_Cycle_Mask _mask; >> .. >> void step(uint cycles) { >> _used = 0; >> uint max_shift = 8 * sizeof(_mask) - 1; >> _mask <<= (cycles < max_shift) ? cycles : max_shift; >> } >> } >> >> However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count: >> >> // The following two routines assume that the root Pipeline_Use entity >> // consists of exactly 1 element for each functional unit >> // start is relative to the current cycle; used for latency-based info >> uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const { >> for (uint i = 0; i < pred._count; i++) { >> const Pipeline_Use_Element *predUse = pred.element(i); >> if (predUse->_multiple) { >> uint min_delay = 7; >> // Multiple possible functional units, choose first unused one >> for (uint j = predUse->_lb; j <= predUse->_ub; j++) { >> const Pipeline_Use_Element *currUse = element(j); >> uint curr_delay = delay; >> if (predUse->_used & currUse->_used) { >> Pipeline_Use_Cycle_Mask x = predUse->_mask; >> Pipeline_Use_Cycle_Mask y = currUse->_mask; >> >> for ( y <<= curr_delay; x.overlaps(y); curr_delay++ ) >> y <<= 1; >> } >> if (min_delay > curr_delay) >> min_delay = curr_delay; >> } >> if (delay < min_delay) >> delay = min_delay; >> } >> else { >> for (uint j = predUse->_lb; j <= pre... > > Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. The pull request now contains three commits: > > - use uint32_t for _mask > - remove redundant code > - 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' Thanks everyone for the help, thoughtful discussion and reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3358540154 From bulasevich at openjdk.org Wed Oct 1 23:52:03 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 1 Oct 2025 23:52:03 GMT Subject: Integrated: 8338197: [ubsan] ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' In-Reply-To: References: Message-ID: <1BtNNvmJ8pE1gYZUzUCp-NWaIwe6ubwHaZdhG7kWuEU=.41d1f994-3632-4aba-9b18-61a327f64205@github.com> On Fri, 22 Aug 2025 00:47:48 GMT, Boris Ulasevich wrote: > This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal. > > The issue reproduces with a HeapByteBufferTest jtreg test on a UBSan-enabled build. Actually the trigger is `XX:+OptoScheduling` option used by test (by default OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run. > > This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments. > > The problems is that shift count `n` may be too large here: > > class Pipeline_Use_Cycle_Mask { > protected: > uint _mask; > .. > Pipeline_Use_Cycle_Mask& operator<<=(int n) { > _mask <<= n; > return *this; > } > }; > > The recent change attempted to cap the shift amount at one call site: > > class Pipeline_Use_Element { > protected: > .. > // Mask of specific used cycles > Pipeline_Use_Cycle_Mask _mask; > .. > void step(uint cycles) { > _used = 0; > uint max_shift = 8 * sizeof(_mask) - 1; > _mask <<= (cycles < max_shift) ? cycles : max_shift; > } > } > > However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count: > > // The following two routines assume that the root Pipeline_Use entity > // consists of exactly 1 element for each functional unit > // start is relative to the current cycle; used for latency-based info > uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const { > for (uint i = 0; i < pred._count; i++) { > const Pipeline_Use_Element *predUse = pred.element(i); > if (predUse->_multiple) { > uint min_delay = 7; > // Multiple possible functional units, choose first unused one > for (uint j = predUse->_lb; j <= predUse->_ub; j++) { > const Pipeline_Use_Element *currUse = element(j); > uint curr_delay = delay; > if (predUse->_used & currUse->_used) { > Pipeline_Use_Cycle_Mask x = predUse->_mask; > Pipeline_Use_Cycle_Mask y = currUse->_mask; > > for ( y <<= curr_delay; x.overlaps(y); curr_delay++ ) > y <<= 1; > } > if (min_delay > curr_delay) > min_delay = curr_delay; > } > if (delay < min_delay) > delay = min_delay; > } > else { > for (uint j = predUse->_lb; j <= predUse->_ub; j++) { > const Pipeline_Use_Element *currUse = element(j); > if (predUse->_used & currUse->_used) { > ... This pull request has now been integrated. 
Changeset: fa3af820 Author: Boris Ulasevich URL: https://git.openjdk.org/jdk/commit/fa3af820ad310704e8d25cf496f676e09d60797d Stats: 17 lines in 1 file changed: 0 ins; 6 del; 11 mod 8338197: [ubsan] ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/26890 From epeter at openjdk.org Thu Oct 2 00:54:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 2 Oct 2025 00:54:52 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: <7Y1VflHDgnEChBAv9bwWH5ayU-K9ngRa3BfPjgzzHP0=.61111d18-a22e-4cb5-9492-e50f5524ac08@github.com> Message-ID: On Wed, 1 Oct 2025 12:50:13 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/castnode.cpp line 39: >> >>> 37: const ConstraintCastNode::DependencyType ConstraintCastNode::WidenTypeDependency(true, false, "widen type dependency"); // not pinned, doesn't narrow type >>> 38: const ConstraintCastNode::DependencyType ConstraintCastNode::StrongDependency(false, true, "strong dependency"); // pinned, narrows type >>> 39: const ConstraintCastNode::DependencyType ConstraintCastNode::UnconditionalDependency(false, false, "unconditional dependency"); // pinned, doesn't narrow type >> >> Is there really a good reason to have the names `Regular`, `WidenType`, `Strong` and `Unconditional`? Did we just get used to these names over time, or do they really have a good reason for existance. They just don't really mean that much to me. Calling them (non)pinned and (non)narrowing would make more sense to me. > > So `NonPinnedNarrowingDependency`, `NonPinnedNonNarrowingDependeny`, `PinnedNarrowingDependency` and `NonPinnedNonNarrowingDependency`? > > Or to avoid using a negation for the one that's the weakest dependency: > > `FloatingNarrowingDependency`, `FloatingNonNarrowingDependency`, `NonFloatingNarrowingDependency` and `NonFloatingNonNarrowingDependency `? > > What do you think @eme64 ? Either of these sound great :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2396368531 From dlong at openjdk.org Thu Oct 2 04:06:56 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 2 Oct 2025 04:06:56 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v2] In-Reply-To: References: Message-ID: On Wed, 1 Oct 2025 15:31:42 GMT, Roland Westrelin wrote: >> The test case has an out of loop `Store` with an `AddP` address >> expression that has other uses and is in the loop body. Schematically, >> only showing the address subgraph and the bases for the `AddP`s: >> >> >> Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> CastPP#110 >> >> >> Both `AddP`s have the same base, a `CastPP` that's also in the loop >> body. >> >> That loop is a counted loop and only has 3 iterations so is fully >> unrolled. First, one iteration is peeled: >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> The `AddP`s and `CastPP` are cloned (because in the loop body). As >> part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is >> called. It finds the test that guards `CastPP#283` in the peeled >> iteration dominates and replaces the test that guards `CastPP#110` >> (the test in the peeled iteration is the clone of the test in the >> loop). 
That causes `CastPP#110`'s control to be updated to that of the >> test in the peeled iteration and to be yanked from the loop. So now >> `CastPP#283` and `CastPP#110` have the same inputs. >> >> Next unrolling happens: >> >> >> /-> CastPP#110 >> /-> AddP#400 -> AddP#401 -> CastPP#110 >> Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 >> \ -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> `AddP`s are cloned once more but not the `CastPP`s because they are >> both in the peeled iteration now. A new `Phi` is added. >> >> Next igvn runs. It's going to push the `AddP`s through the `Phi`s. >> >> Through `Phi#477`: >> >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 >> \ -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> Through `Phi#360`: >> >> >> /-> AddP#134 -> CastPP#110 >> /-> Phi#509 -> AddP#401 -> CastPP#110 >> Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 >> -> Phi#514 -> CastPP#283 >> ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - test seed > - more > - Merge branch 'master' into JDK-8351889 > - Merge branch 'master' into JDK-8351889 > - more > - test > - fix What if we just relax the assert? I failed to figure out what this assert is protecting us from by looking at the code. So what happens in a product build or when this assert is commented out? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3358952355 From iveresov at openjdk.org Thu Oct 2 04:21:22 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 2 Oct 2025 04:21:22 GMT Subject: RFR: 8369033: Remove dead code in training data Message-ID: Remove dead code ------------- Commit messages: - Remove dead code Changes: https://git.openjdk.org/jdk/pull/27600/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27600&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369033 Stats: 23 lines in 2 files changed: 0 ins; 14 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/27600.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27600/head:pull/27600 PR: https://git.openjdk.org/jdk/pull/27600 From rcastanedalo at openjdk.org Thu Oct 2 05:46:45 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 2 Oct 2025 05:46:45 GMT Subject: RFR: 8369033: Remove dead code in training data In-Reply-To: References: Message-ID: On Thu, 2 Oct 2025 04:13:54 GMT, Igor Veresov wrote: > Remove dead code Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27600#pullrequestreview-3292504311 From dfenacci at openjdk.org Thu Oct 2 06:22:47 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 2 Oct 2025 06:22:47 GMT Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee == m) failed: repeated inline attempt with different callee [v4] In-Reply-To: References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> Message-ID: On Wed, 3 Sep 2025 06:50:26 GMT, Damon Fenacci wrote: >> # Issue >> The CTW test `applications/ctw/modules/java_xml.java` crashes when trying to repeat late inlining of a virtual method (after IGVN passes through the method's call node again). 
The failure originates [here](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callGenerator.cpp#L473) because `_callee != m`. Apparently when running IGVN a second time after a first late inline failure and [setting the callee in the call generator](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callnode.cpp#L1240) we notice that the previous callee is not the same as the current one. >> In this specific instance it seems that the issue happens when CTW is compiling Apache Xalan. >> >> # Cause >> The root of the issue has to do with repeated late inlining, class hierarchy analysis and dynamic class loading. >> >> For this particular issue the two differing methods are `org.apache.xalan.xsltc.compiler.LocationPathPattern::translate` first and `org.apache.xalan.xsltc.compiler.AncestorPattern::translate` the second time. `LocationPathPattern` is an abstract class but has a concrete `translate` method. `AncestorPattern` is a concrete class that extends another abstract class `RelativePathPattern` that extends `LocationPathPattern`. `AncestorPattern` overrides the translate method. >> What seems to be happening is the following: we compile a virtual call `RelativePathPattern::translate` and at compile time. Only the abstract classes `RelativePathPattern` <: `LocationPathPattern` are loaded. CHA then finds out that the call must always call `LocationPathPattern::translate` because the method is not overwritten anywhere else. However, there is still no non-abstract class in the entire class hierarchy, i.e. as soon as `AncestorPattern` is loaded, this class is then the only non-abstract class in the class hierarchy and therefore the receiver type must be `AncestorPattern`. >> >> More in general, when late inlining is repeated and classes are loaded dynamically, it is possible that the resolved method between a late inlining attempt and the next one is not the same. >> >> # Fix >> >> This looks like a very edge-case. If CHA is affected by class loading the original recorded dependency becomes invalid. This can possibly happen in other situations (e.g JVMTI class redefinition). So, instead of modifying the assert (to check for invalid dependencies) we avoid re-setting the callee method ... > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8355354: add stress comment @dean-long, since you checked this PR already, could I ask you for a final review (accept/reject) as well? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26441#issuecomment-3359323975 From roland at openjdk.org Thu Oct 2 08:06:08 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 Oct 2025 08:06:08 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v19] In-Reply-To: References: Message-ID: On Wed, 1 Oct 2025 04:48:35 GMT, Kangcheng Xu wrote: >> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. >> >> When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. 
However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) >> >> The following was implemented to address this issue. >> >> if (UseNewCode2) { >> *multiplier = bt == T_INT >> ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows >> : ((jlong) 1) << con->get_int(); >> } else { >> *multiplier = ((jlong) 1 << con->get_int()); >> } >> >> >> Two new bitshift overflow tests were added. > > Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 69 commits: > > - Merge remote-tracking branch 'origin/master' into arithmetic-canonicalization > - update naming and comments > - Merge branch 'openjdk:master' into arithmetic-canonicalization > - Merge remote-tracking branch 'origin/master' into arithmetic-canonicalization > - Allow swapping LHS/RHS in case not matched > - Merge branch 'refs/heads/master' into arithmetic-canonicalization > - improve comment readability and struct helper functions > - remove asserts, add more documentation > - fix typo: lhs->rhs > - update comments > - ... and 59 more: https://git.openjdk.org/jdk/compare/0366d882...ce23d393 src/hotspot/share/opto/addnode.cpp line 555: > 553: > 554: // Pattern (1) > 555: if (lhs.valid && rhs.valid && lhs.variable == rhs.variable) { Would it make sense to add a method to `Multiplication`, let's say `add`. The 3 `if`s here would then be replaced by: Multiplication res = lhs.add(rhs); if (res.valid()) { return res; } return find_simple_addition_pattern(n, bt); and the logic from the 3 ifs would be moved to `Multiplication::add`. What do you think? src/hotspot/share/opto/addnode.hpp line 46: > 44: virtual uint hash() const; > 45: > 46: struct Multiplication { Is there a benefit to this being a `struct` instead of a `class`? src/hotspot/share/opto/addnode.hpp line 47: > 45: > 46: struct Multiplication { > 47: bool valid = false; field names usually start with a `_` so `bool _valid` src/hotspot/share/opto/addnode.hpp line 60: > 58: }; > 59: > 60: static Multiplication find_collapsible_addition_patterns(const Node* a, const Node* pattern, BasicType bt); Shouldn't these methods be member methods of `Multiplication` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2397605619 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2397579068 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2397581099 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2397584858 From duke at openjdk.org Thu Oct 2 08:26:46 2025 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane Ampudia) Date: Thu, 2 Oct 2025 08:26:46 GMT Subject: RFR: 8368780: IGV: Upgrade to Netbeans Platform 27 In-Reply-To: References: Message-ID: On Tue, 30 Sep 2025 13:57:23 GMT, Ant?n Seoane Ampudia wrote: > This PR upgrades IGV and its dependencies to the newest Netbeans Platform 27, released on August 21, 2025. It also supports running the latest (LTS) JDK 25. > > It has been tested that IGV still behaves as expected after the upgrade. Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27579#issuecomment-3359822128 From duke at openjdk.org Thu Oct 2 08:26:46 2025 From: duke at openjdk.org (duke) Date: Thu, 2 Oct 2025 08:26:46 GMT Subject: RFR: 8368780: IGV: Upgrade to Netbeans Platform 27 In-Reply-To: References: Message-ID: On Tue, 30 Sep 2025 13:57:23 GMT, Ant?n Seoane Ampudia wrote: > This PR upgrades IGV and its dependencies to the newest Netbeans Platform 27, released on August 21, 2025. It also supports running the latest (LTS) JDK 25. > > It has been tested that IGV still behaves as expected after the upgrade. @anton-seoane Your change (at version 3452bfebcb59ecdc2939170a0eb8e83dc794e032) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27579#issuecomment-3359825183 From rcastanedalo at openjdk.org Thu Oct 2 08:38:24 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 2 Oct 2025 08:38:24 GMT Subject: RFR: 8368780: IGV: Upgrade to Netbeans Platform 27 In-Reply-To: References: Message-ID: On Tue, 30 Sep 2025 13:57:23 GMT, Ant?n Seoane Ampudia wrote: > This PR upgrades IGV and its dependencies to the newest Netbeans Platform 27, released on August 21, 2025. It also supports running the latest (LTS) JDK 25. > > It has been tested that IGV still behaves as expected after the upgrade. Let's wait for a [second approval](https://openjdk.org/guide/#hotspot-development) before integrating, even though this is a pure IGV change, for good measure. ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27579#pullrequestreview-3293299114 From rcastanedalo at openjdk.org Thu Oct 2 08:48:53 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 2 Oct 2025 08:48:53 GMT Subject: RFR: 8368780: IGV: Upgrade to Netbeans Platform 27 In-Reply-To: References: Message-ID: <_VBKX3bcdaus6_IwcgUXx-2fpl1g0dxdKnDXGYM6LbA=.ff7a5c3c-da6d-4fd2-85db-5a7d0d446294@github.com> On Tue, 30 Sep 2025 13:57:23 GMT, Ant?n Seoane Ampudia wrote: > This PR upgrades IGV and its dependencies to the newest Netbeans Platform 27, released on August 21, 2025. It also supports running the latest (LTS) JDK 25. > > It has been tested that IGV still behaves as expected after the upgrade. Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27579#pullrequestreview-3293348138 From roland at openjdk.org Thu Oct 2 09:08:06 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 Oct 2025 09:08:06 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v3] In-Reply-To: References: Message-ID: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. 
> > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes are widen > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if of the type both inputs > of the `Add`s are non constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints one of them operate under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widen in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: - review - infinite loop in gvn fix - renaming ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24575/files - new: https://git.openjdk.org/jdk/pull/24575/files/c509ef56..aff5894b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=01-02 Stats: 61 lines in 10 files changed: 11 ins; 0 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From roland at openjdk.org Thu Oct 2 09:08:08 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 Oct 2025 09:08:08 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v3] In-Reply-To: <7Y1VflHDgnEChBAv9bwWH5ayU-K9ngRa3BfPjgzzHP0=.61111d18-a22e-4cb5-9492-e50f5524ac08@github.com> References: <7Y1VflHDgnEChBAv9bwWH5ayU-K9ngRa3BfPjgzzHP0=.61111d18-a22e-4cb5-9492-e50f5524ac08@github.com> Message-ID: On Wed, 23 Apr 2025 10:56:51 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: >> >> - review >> - infinite loop in gvn fix >> - renaming > > @rwestrel thanks for looking into this one! > > I have not yet deeply studied the PR, but am feeling some confusion about the naming. > > I think the `DependencyType` is really a good step into the right direction, it helps clean things up. > > I'm wondering if we should pick either `depends_only_on_test` or `pinned`, and use it everywhere consistently. Having both around as near synonymes (antonymes?) is a bit confusing for me. > > I'll look into the code more later. 
I pushed an update with the renaming suggested by @eme64 and an extra comment with example use cases ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3360011105 From bkilambi at openjdk.org Thu Oct 2 09:25:46 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 2 Oct 2025 09:25:46 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: <-Ei1bFBHQvpeD3n7j8WuhV572oNW1b9X8FI488DMigI=.d1f9c421-b0f5-49e0-9ac5-97732ca82c4f@github.com> On Mon, 29 Sep 2025 07:18:42 GMT, Xiaohong Gong wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. >> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 272: > >> 270: if (length_in_bytes > 16 || !is_feat_fp16_supported()) { >> 271: return false; >> 272: } > > Reductions with `length_in_bytes < 8` should also be skipped. Because such operations are not supported now, while the IRs with 32-bit vector size might exist, right? Hi @XiaohongGong, yes `length_in_bytes < 8` is also not supported and currently we support only for vector lengths of 8B and 16B. 
IRs with 32-bit vector size might exist but we do not have an optimized implementation for 32B vector lengths and thus I have disabled it. Instead of that, it generates the 16B scalarized Neon instruction sequence for a 32B vector length. Is this what you were asking? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2397961057 From roland at openjdk.org Thu Oct 2 09:31:48 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 Oct 2025 09:31:48 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v2] In-Reply-To: References: Message-ID: On Thu, 2 Oct 2025 04:03:58 GMT, Dean Long wrote: > What if we just relax the assert? I failed to figure out what this assert is protecting us from by looking at the code. So what happens in a product build or when this assert is commented out? For this particular test case, nothing. The assert is right before the cast nodes are removed, anyway. Once they are removed, the `AddP` in the chain all have the same base input. The risk, I think, is if some code that transforms a chain of `AddP`s (some time before the assert) wrongly assume they all have the same base. It's also easier to write such a transformation if it's an invariant that a chain of `AddP`s have the same base (it's one less thing to worry about). ------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3360125566 From bkilambi at openjdk.org Thu Oct 2 10:23:46 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 2 Oct 2025 10:23:46 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: <4VXHOCR1YSoMVbDbB8j-j18Z-_VbO0y5fJfyp3IjQ9c=.19485011-9cb3-4016-a642-61cee81adcd1@github.com> On Mon, 29 Sep 2025 08:04:06 GMT, Xiaohong Gong wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. 
>> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 1900: > >> 1898: fmulh(dst, dst, vtmp); >> 1899: ins(vtmp, H, vsrc, 0, 7); >> 1900: fmulh(dst, dst, vtmp); > > Do you know why the performance is not improved significantly for multiply reduction? It seems instructions between different `ins` instructions will have a data dependence, which is not expected? Could you please use other instructions instead or clear the register `vtmp` before `ins` and check the performance changes? > > Note that a clear of `mov` such as `MOVI Vd.2D, #0` has zero cost from V2's guide. Are you referring to the N1 numbers? The add reduction operation has gains around ~40% while the mul reduction is around ~20% on N1. On V1 and V2 they look comparable (not considering the cases where we generate `fadda` instructions). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2398197879 From roland at openjdk.org Thu Oct 2 10:45:47 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 2 Oct 2025 10:45:47 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores Message-ID: In the `test1()` method of the test case: `inlined2()` calls `clone()` for an object loaded from field `field` that has inexact type `A` at parse time. The intrinsic for `clone()` inserts `Allocate` and `ArrayCopy` nodes. When igvn runs, the load of `field` is optimized out because it reads back a newly allocated `B` written to `field` in the same method. `ArrayCopy` can now be optimized because the type of its `src` input is known. The type of its `dest` input is the `CheckCastPP` from the allocation of the cloned object created at parse time. That one has type `A`. A series of `Load`s/`Store`s are created to copy the fields of class `B` from `src` (of type `B`) to `dest` (of type `A`). Writing to `dest` with offsets for fields that don't exist in `A` causes this code in `Compile::flatten_alias_type()`: } else if (offset < 0 || offset >= ik->layout_helper_size_in_bytes()) { // Static fields are in the space above the normal instance // fields in the java.lang.Class instance. if (ik != ciEnv::current()->Class_klass()) { to = nullptr; tj = TypeOopPtr::BOTTOM; offset = tj->offset(); } to assign it some slice that doesn't match the one that's used at the same offset in `B`.
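[Editor's note: the following is a minimal, hypothetical sketch of the code shape just described; apart from `A`, `B`, `field`, `test1()` and `inlined2()`, which are named in the RFR, all members and bodies are made up and not the actual test.]

```java
// Editor's illustrative sketch, not the actual regression test.
public class CloneSketch {
    static class A implements Cloneable {
        int a1; // some field present in A (hypothetical)
        @Override
        public Object clone() throws CloneNotSupportedException {
            return super.clone(); // intrinsified by C2
        }
    }

    static class B extends A {
        int b1, b2; // fields that exist in B but not in A (hypothetical)
    }

    static A field;

    static Object inlined2() throws CloneNotSupportedException {
        // 'field' has inexact type A at parse time
        return field.clone();
    }

    static Object test1() throws CloneNotSupportedException {
        field = new B();   // igvn later folds the load of 'field' to this B
        return inlined2(); // the clone copies B's fields into a dest typed as A
    }
}
```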
That causes an assert in `ArrayCopyNode::try_clone_instance()` to fire. With a release build, execution proceeds. `test1()` also has a non escaping allocation. That one causes EA to run and `ConnectionGraph::split_unique_types()` to move the store to the non escaping allocation to a new slice. In the process, when it iterates over `MergeMem` nodes, it notices the stores added by `ArrayCopyNode::try_clone_instance()`, finds that some are not on the right slice, and tries to move them to the correct slice (expecting they belong to a non escaping allocation). That causes some of the `Store`s to be disconnected. When the resulting code runs, execution fails as some fields are not copied. The fix I propose is to skip `ArrayCopyNode::try_clone_instance()` when `src` and `dest` classes don't match, as this seems like a rare enough corner case. ------------- Commit messages: - test & fix Changes: https://git.openjdk.org/jdk/pull/27604/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27604&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339526 Stats: 99 lines in 2 files changed: 99 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27604.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27604/head:pull/27604 PR: https://git.openjdk.org/jdk/pull/27604 From syan at openjdk.org Thu Oct 2 12:40:01 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 2 Oct 2025 12:40:01 GMT Subject: RFR: 8368866: compiler/codecache/stress/UnexpectedDeoptimizationTest.java intermittent timed out In-Reply-To: <66Jgjyc-l4-WSm_20x3Lw4vnPfmxLzJfgSLeY8tOqxk=.e198cca7-3e3a-43d5-a061-69d246ccc9ef@github.com> References: <66Jgjyc-l4-WSm_20x3Lw4vnPfmxLzJfgSLeY8tOqxk=.e198cca7-3e3a-43d5-a061-69d246ccc9ef@github.com> Message-ID: On Tue, 30 Sep 2025 09:08:47 GMT, Aleksey Shipilev wrote: >> Hi all, >> >> When I run the test compiler/codecache/stress/UnexpectedDeoptimizationTest.java standalone, the test occupies about 6 CPU threads. >> >> The default timeout factor was changed from 4 to 1 by [JDK-8260555](https://bugs.openjdk.org/browse/JDK-8260555). This makes test compiler/codecache/stress/UnexpectedDeoptimizationTest.java intermittently time out when it runs with other tests simultaneously, because this is a stress test which may be affected by other tests. >> >> So I want to change the default timeout value from 120 to 240, which will make this test run successfully and steadily. > > Marked as reviewed by shade (Reviewer). Thanks for the reviews and verification @shipilev @mhaessig ------------- PR Comment: https://git.openjdk.org/jdk/pull/27550#issuecomment-3361019560 From syan at openjdk.org Thu Oct 2 12:40:02 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 2 Oct 2025 12:40:02 GMT Subject: Integrated: 8368866: compiler/codecache/stress/UnexpectedDeoptimizationTest.java intermittent timed out In-Reply-To: References: Message-ID: On Mon, 29 Sep 2025 14:30:32 GMT, SendaoYan wrote: > Hi all, > > When I run the test compiler/codecache/stress/UnexpectedDeoptimizationTest.java standalone, the test occupies about 6 CPU threads. > > The default timeout factor was changed from 4 to 1 by [JDK-8260555](https://bugs.openjdk.org/browse/JDK-8260555). This makes test compiler/codecache/stress/UnexpectedDeoptimizationTest.java intermittently time out when it runs with other tests simultaneously, because this is a stress test which may be affected by other tests. > > So I want to change the default timeout value from 120 to 240, which will make this test run successfully and steadily. This pull request has now been integrated.
Changeset: cc563c87 Author: SendaoYan URL: https://git.openjdk.org/jdk/commit/cc563c87cd277fbc96fb77af1e99f6c018ccc020 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod 8368866: compiler/codecache/stress/UnexpectedDeoptimizationTest.java intermittent timed out Reviewed-by: shade, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/27550 From bkilambi at openjdk.org Thu Oct 2 12:46:48 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 2 Oct 2025 12:46:48 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: <8jnLugyioePdrnVuu9GRZ7VBgVGw9c8Hg00YTQRQAoQ=.d8677216-3330-49b6-a72c-b8e8ae454a34@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <8jnLugyioePdrnVuu9GRZ7VBgVGw9c8Hg00YTQRQAoQ=.d8677216-3330-49b6-a72c-b8e8ae454a34@github.com> Message-ID: On Fri, 26 Sep 2025 16:21:35 GMT, Marc Chevalier wrote: > I seem to have a failure on `compiler/vectorization/TestFloat16VectorOperations.java` on aarch64 in `C2_MacroAssembler::neon_reduce_add_fp16(FloatRegister, FloatRegister, FloatRegister, unsigned int, FloatRegister)` at `src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp:1930`: > > ``` > assert(vector_length_in_bytes == 8 || vector_length_in_bytes == 16) failed: unsupported > ``` Hi, thanks for letting me know. However, I am unable to reproduce it on any of my machines. Would it be possible to share the JVM options used and also machines details. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3361062146 From mchevalier at openjdk.org Thu Oct 2 13:24:01 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 2 Oct 2025 13:24:01 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: <98jWF_NhAAB1WHHsotReB6SYIVSRIWNO0rmhxnNMJM8=.f21f3406-f3b3-4ce5-b009-6e50e2ebe1f1@github.com> On Fri, 26 Sep 2025 12:00:31 GMT, Bhavana Kilambi wrote: > This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. > > Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - > > **For AddReduction :** > On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. > > On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. > > **For MulReduction :** > Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. 
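[Editor's aside: for readers unfamiliar with why the add/mul reduction implementation quoted above must be strictly ordered, the sketch below uses plain `float` for brevity (the PR itself operates on FP16 values) to show the kind of reduction loop whose left-to-right evaluation order the auto-vectorizer has to preserve.]

```java
// Editor's illustration, not code from the PR or its benchmarks.
// Floating-point addition is not associative, so a vectorized reduction
// must produce the same result as this strictly ordered scalar loop.
static float reduceAdd(float[] a) {
    float sum = 0.0f;
    for (int i = 0; i < a.length; i++) {
        sum += a[i]; // accumulates in program order: ((sum + a[0]) + a[1]) + ...
    }
    return sum;
}
```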
> > Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - > > Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. > Ratio > 1 indicates the performance with this patch is better than the master branch. > > **N1 (UseSVE = 0, max vector length = 16B):** > > Benchmark vectorDim Mode Cnt 8B 16B > ReductionAddFP16 256 thrpt 9 1.41 1.40 > ReductionAddFP16 512 thrpt 9 1.41 1.41 > ReductionAddFP16 1024 thrpt 9 1.43 1.40 > ReductionAddFP16 2048 thrpt 9 1.43 1.40 > ReductionMulFP16 256 thrpt 9 1.22 1.22 > ReductionMulFP16 512 thrpt 9 1.21 1.23 > ReductionMulFP16 1024 thrpt 9 1.21 1.22 > ReductionMulFP16 2048 thrpt 9 1.20 1.22 > > > On N1, the scalarized sequence of `fadd/fmul` are generated for both `MaxVectorSize` of 8B and 16B for add reduction ... I see now the flags are not triviall: -XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMacroExpansion -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers -XX:VerifyConstraintCasts=1 -XX:+StressLoopPeeling a lot of stress file. It's likely that many runs might be needed to reproduce. The machine is a VM.Standard.A1.Flex shape, as described in https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm Backtrace at the failure: Current CompileTask: C2:1523 346 % b compiler.vectorization.TestFloat16VectorOperations::vectorAddReductionFloat16 @ 4 (39 bytes) Stack: [0x0000ffff84799000,0x0000ffff84997000], sp=0x0000ffff849920d0, free space=2020k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x7da724] C2_MacroAssembler::neon_reduce_add_fp16(FloatRegister, FloatRegister, FloatRegister, unsigned int, FloatRegister)+0x2b4 (c2_MacroAssembler_aarch64.cpp:1930) V [libjvm.so+0x154492c] PhaseOutput::scratch_emit_size(Node const*)+0x2ec (output.cpp:3171) V [libjvm.so+0x153d4a4] PhaseOutput::shorten_branches(unsigned int*)+0x2e4 (output.cpp:528) V [libjvm.so+0x154dcdc] PhaseOutput::Output()+0x95c (output.cpp:328) V [libjvm.so+0x9be070] Compile::Code_Gen()+0x7f0 (compile.cpp:3127) V [libjvm.so+0x9c21c0] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1774 (compile.cpp:894) V [libjvm.so+0x7eec64] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x2e0 (c2compiler.cpp:147) V [libjvm.so+0x9d0f8c] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xb08 (compileBroker.cpp:2345) V [libjvm.so+0x9d1eb8] CompileBroker::compiler_thread_loop()+0x638 (compileBroker.cpp:1989) V [libjvm.so+0xed25a8] JavaThread::thread_main_inner()+0x108 (javaThread.cpp:775) V [libjvm.so+0x18466dc] Thread::call_run()+0xac (thread.cpp:243) V [libjvm.so+0x152349c] thread_native_entry(Thread*)+0x12c (os_linux.cpp:895) C [libc.so.6+0x80b50] start_thread+0x300 I've attached the replay file in the JBS issue, if it can help. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3361203842 From bkilambi at openjdk.org Thu Oct 2 13:39:50 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 2 Oct 2025 13:39:50 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: <98jWF_NhAAB1WHHsotReB6SYIVSRIWNO0rmhxnNMJM8=.f21f3406-f3b3-4ce5-b009-6e50e2ebe1f1@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <98jWF_NhAAB1WHHsotReB6SYIVSRIWNO0rmhxnNMJM8=.f21f3406-f3b3-4ce5-b009-6e50e2ebe1f1@github.com> Message-ID: On Thu, 2 Oct 2025 13:21:32 GMT, Marc Chevalier wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. >> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... > > I see now the flags are not triviall: > > -XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMacroExpansion -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers -XX:VerifyConstraintCasts=1 -XX:+StressLoopPeeling > > a lot of stress file. It's likely that many runs might be needed to reproduce. 
> > The machine is a VM.Standard.A1.Flex shape, as described in > https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm > > Backtrace at the failure: > > Current CompileTask: > C2:1523 346 % b compiler.vectorization.TestFloat16VectorOperations::vectorAddReductionFloat16 @ 4 (39 bytes) > > Stack: [0x0000ffff84799000,0x0000ffff84997000], sp=0x0000ffff849920d0, free space=2020k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x7da724] C2_MacroAssembler::neon_reduce_add_fp16(FloatRegister, FloatRegister, FloatRegister, unsigned int, FloatRegister)+0x2b4 (c2_MacroAssembler_aarch64.cpp:1930) > V [libjvm.so+0x154492c] PhaseOutput::scratch_emit_size(Node const*)+0x2ec (output.cpp:3171) > V [libjvm.so+0x153d4a4] PhaseOutput::shorten_branches(unsigned int*)+0x2e4 (output.cpp:528) > V [libjvm.so+0x154dcdc] PhaseOutput::Output()+0x95c (output.cpp:328) > V [libjvm.so+0x9be070] Compile::Code_Gen()+0x7f0 (compile.cpp:3127) > V [libjvm.so+0x9c21c0] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1774 (compile.cpp:894) > V [libjvm.so+0x7eec64] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x2e0 (c2compiler.cpp:147) > V [libjvm.so+0x9d0f8c] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xb08 (compileBroker.cpp:2345) > V [libjvm.so+0x9d1eb8] CompileBroker::compiler_thread_loop()+0x638 (compileBroker.cpp:1989) > V [libjvm.so+0xed25a8] JavaThread::thread_main_inner()+0x108 (javaThread.cpp:775) > V [libjvm.so+0x18466dc] Thread::call_run()+0xac (thread.cpp:243) > V [libjvm.so+0x152349c] thread_native_entry(Thread*)+0x12c (os_linux.cpp:895) > C [libc.so.6+0x80b50] start_thread+0x300 > > > I've attached the replay file in the JBS issue, if it can help. @marc-chevalier Thanks! I have now been able to reproduce it using the flags you shared. Will update my patch soon with a fix for this along with addressing other review comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3361263768 From kvn at openjdk.org Thu Oct 2 15:25:08 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 2 Oct 2025 15:25:08 GMT Subject: RFR: 8369033: Remove dead code in training data In-Reply-To: References: Message-ID: On Thu, 2 Oct 2025 04:13:54 GMT, Igor Veresov wrote: > Remove dead code Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27600#pullrequestreview-3295176182 From iveresov at openjdk.org Thu Oct 2 15:42:05 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 2 Oct 2025 15:42:05 GMT Subject: RFR: 8369033: Remove dead code in training data In-Reply-To: References: Message-ID: <88jaTJQKnv5mzjD03_BG2qvFCvX_6FTzB8GjMJB6-ao=.4ae0ff33-aa46-4469-8172-258c5c607f9e@github.com> On Thu, 2 Oct 2025 04:13:54 GMT, Igor Veresov wrote: > Remove dead code Thanks guys! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27600#issuecomment-3361908025 From iveresov at openjdk.org Thu Oct 2 15:42:05 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 2 Oct 2025 15:42:05 GMT Subject: Integrated: 8369033: Remove dead code in training data In-Reply-To: References: Message-ID: On Thu, 2 Oct 2025 04:13:54 GMT, Igor Veresov wrote: > Remove dead code This pull request has now been integrated. 
Changeset: 1a03a1fb Author: Igor Veresov URL: https://git.openjdk.org/jdk/commit/1a03a1fbb1c7a83469128106341591c59428437a Stats: 23 lines in 2 files changed: 0 ins; 14 del; 9 mod 8369033: Remove dead code in training data Reviewed-by: rcastanedalo, kvn ------------- PR: https://git.openjdk.org/jdk/pull/27600 From kvn at openjdk.org Thu Oct 2 16:21:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 2 Oct 2025 16:21:54 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v47] In-Reply-To: References: Message-ID: On Tue, 9 Sep 2025 23:04:08 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix race when not installed nmethod is deoptimized Update looks good. I submitted new testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3362052593 From dlong at openjdk.org Thu Oct 2 20:36:47 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 2 Oct 2025 20:36:47 GMT Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee == m) failed: repeated inline attempt with different callee [v4] In-Reply-To: References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> Message-ID: On Wed, 3 Sep 2025 06:50:26 GMT, Damon Fenacci wrote: >> # Issue >> The CTW test `applications/ctw/modules/java_xml.java` crashes when trying to repeat late inlining of a virtual method (after IGVN passes through the method's call node again). The failure originates [here](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callGenerator.cpp#L473) because `_callee != m`. Apparently when running IGVN a second time after a first late inline failure and [setting the callee in the call generator](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callnode.cpp#L1240) we notice that the previous callee is not the same as the current one. >> In this specific instance it seems that the issue happens when CTW is compiling Apache Xalan. >> >> # Cause >> The root of the issue has to do with repeated late inlining, class hierarchy analysis and dynamic class loading. >> >> For this particular issue the two differing methods are `org.apache.xalan.xsltc.compiler.LocationPathPattern::translate` first and `org.apache.xalan.xsltc.compiler.AncestorPattern::translate` the second time. `LocationPathPattern` is an abstract class but has a concrete `translate` method. `AncestorPattern` is a concrete class that extends another abstract class `RelativePathPattern` that extends `LocationPathPattern`. `AncestorPattern` overrides the translate method. 
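[Editor's aside: a minimal sketch of the class-hierarchy shape described above; the class names come from the RFR, but the method bodies and modifiers are illustrative only, not the actual Xalan sources.]

```java
// Editor's illustrative sketch of the hierarchy involved.
abstract class LocationPathPattern {
    void translate() { /* concrete implementation: CHA's initial target */ }
}

abstract class RelativePathPattern extends LocationPathPattern {
    // no override here: a virtual call through this type initially
    // resolves to LocationPathPattern::translate via CHA
}

class AncestorPattern extends RelativePathPattern {
    @Override
    void translate() { /* once loaded, the only possible concrete receiver */ }
}
```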
>> What seems to be happening is the following: we compile a virtual call `RelativePathPattern::translate`, and at compile time only the abstract classes `RelativePathPattern` <: `LocationPathPattern` are loaded. CHA then finds out that the call must always call `LocationPathPattern::translate` because the method is not overridden anywhere else. However, there is still no non-abstract class in the entire class hierarchy, so as soon as `AncestorPattern` is loaded, this class is then the only non-abstract class in the class hierarchy and therefore the receiver type must be `AncestorPattern`. >> >> More generally, when late inlining is repeated and classes are loaded dynamically, it is possible that the resolved method between a late inlining attempt and the next one is not the same. >> >> # Fix >> >> This looks like a very rare edge case. If CHA is affected by class loading the original recorded dependency becomes invalid. This can possibly happen in other situations (e.g. JVMTI class redefinition). So, instead of modifying the assert (to check for invalid dependencies) we avoid re-setting the callee method ... > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8355354: add stress comment src/hotspot/share/opto/compile.cpp line 2119: > 2117: C->igvn_worklist()->push(cg->call_node()); > 2118: should_stress = true; > 2119: break; Don't we want to process the rest of the _late_inlines list? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26441#discussion_r2399998805 From kxu at openjdk.org Thu Oct 2 22:28:29 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 2 Oct 2025 22:28:29 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v20] In-Reply-To: References: Message-ID: > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. > > When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added.
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: refactor Multiplication into a class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23506/files - new: https://git.openjdk.org/jdk/pull/23506/files/ce23d393..57c19bc1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=18-19 Stats: 67 lines in 2 files changed: 29 ins; 3 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From kxu at openjdk.org Thu Oct 2 22:28:39 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 2 Oct 2025 22:28:39 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v19] In-Reply-To: References: Message-ID: On Thu, 2 Oct 2025 08:03:23 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 69 commits: >> >> - Merge remote-tracking branch 'origin/master' into arithmetic-canonicalization >> - update naming and comments >> - Merge branch 'openjdk:master' into arithmetic-canonicalization >> - Merge remote-tracking branch 'origin/master' into arithmetic-canonicalization >> - Allow swapping LHS/RHS in case not matched >> - Merge branch 'refs/heads/master' into arithmetic-canonicalization >> - improve comment readability and struct helper functions >> - remove asserts, add more documentation >> - fix typo: lhs->rhs >> - update comments >> - ... and 59 more: https://git.openjdk.org/jdk/compare/0366d882...ce23d393 > > src/hotspot/share/opto/addnode.cpp line 555: > >> 553: >> 554: // Pattern (1) >> 555: if (lhs.valid && rhs.valid && lhs.variable == rhs.variable) { > > Would it make sense to add a method to `Multiplication`, let's say `add`. The 3 `if`s here would then be replaced by: > > Multiplication res = lhs.add(rhs); > if (res.valid()) { > return res; > } > return find_simple_addition_pattern(n, bt); > > and the logic from the 3 ifs would be moved to `Multiplication::add`. > What do you think? I added `add(Multiplication)` and moved `Pattern (1)` to it. However, for patterns `(2)` and `(3)`, one side is not a `Multiplication` but a simple node (`(a << CON) + a` and `a + (a << CON)`). Therefore I left the two other `if`s here. > src/hotspot/share/opto/addnode.hpp line 46: > >> 44: virtual uint hash() const; >> 45: >> 46: struct Multiplication { > > Is there a benefit to this being a `struct` instead of a `class`? No, I think not. Earlier iterations were much simpler so I used `struct`.
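[Editor's aside: the arithmetic identity behind patterns (2) and (3) mentioned above, as a standalone sketch; this is illustrative only and not the C2 code.]

```java
// Editor's illustration: a left shift by a constant is a multiplication by a
// power of two, so the shift-plus-self patterns fold into one multiplication.
// This holds even under int overflow, since Java arithmetic wraps mod 2^32.
static int pattern2(int a) {
    return (a << 3) + a; // same value as a * 9, because (a << 3) == a * 8
}

static int pattern3(int a) {
    return a + (a << 3); // the mirrored form of the same identity
}
```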
I've updated the code to use `class`. > src/hotspot/share/opto/addnode.hpp line 60: > >> 58: }; >> 59: >> 60: static Multiplication find_collapsible_addition_patterns(const Node* a, const Node* pattern, BasicType bt); > > Shouldn't these methods be member methods of `Multiplication`? Moved to static members of `Multiplication`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2400258106 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2400250708 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2400253138 From kvn at openjdk.org Thu Oct 2 23:46:07 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 2 Oct 2025 23:46:07 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v47] In-Reply-To: References: Message-ID: On Tue, 9 Sep 2025 23:04:08 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix race when not installed nmethod is deoptimized My testing for v46 passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23573#pullrequestreview-3296876592 From qamai at openjdk.org Fri Oct 3 06:13:18 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 3 Oct 2025 06:13:18 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations Message-ID: Hi, This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantage of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implemented gtest unit tests to verify the correctness and monotonicity of the inference functions. Please take a look and leave your reviews, thanks a lot.
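[Editor's aside: as background for the kind of inference this PR strengthens, here is a sketch of the textbook known-bits rules for bitwise AND and OR. It is illustrative only and is not the actual C2 implementation or the PR's exact algorithm.]

```java
// Editor's illustration: classic known-bits propagation for AND/OR.
// 'zeros' has a bit set where the value is known to be 0,
// 'ones'  has a bit set where the value is known to be 1.
record KnownBitsSketch(int zeros, int ones) {

    static KnownBitsSketch and(KnownBitsSketch a, KnownBitsSketch b) {
        // a result bit is 1 only if both inputs are known 1,
        // and 0 if either input is known 0
        return new KnownBitsSketch(a.zeros() | b.zeros(), a.ones() & b.ones());
    }

    static KnownBitsSketch or(KnownBitsSketch a, KnownBitsSketch b) {
        // a result bit is 1 if either input is known 1,
        // and 0 only if both inputs are known 0
        return new KnownBitsSketch(a.zeros() & b.zeros(), a.ones() | b.ones());
    }
}
```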
------------- Commit messages: - Improve Value inferences of And, Or, Xor and implement gtest for general Value inferences Changes: https://git.openjdk.org/jdk/pull/27618/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27618&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8367341 Stats: 920 lines in 8 files changed: 591 ins; 309 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/27618.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27618/head:pull/27618 PR: https://git.openjdk.org/jdk/pull/27618 From rehn at openjdk.org Fri Oct 3 06:48:53 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 3 Oct 2025 06:48:53 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v28] In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 08:37:06 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > - minor updates requested by reviewer Not a detailed review, sorry about that and sorry for the delay. Thank you! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17413#pullrequestreview-3297641875 From thartmann at openjdk.org Fri Oct 3 08:41:54 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 3 Oct 2025 08:41:54 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v11] In-Reply-To: References: Message-ID: On Fri, 19 Sep 2025 20:44:54 GMT, Jatin Bhateja wrote: >> This patch optimizes PopCount value transforms using KnownBits information. >> Following are the results of the micro-benchmark included with the patch >> >> >> >> System: 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 215460.670 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 294014.826 ops/s >> >> Withopt: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 389978.082 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 417261.583 ops/s >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update countbitsnode.cpp I submitted testing for this and will report back once it finished. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3364792311 From duke at openjdk.org Fri Oct 3 09:35:56 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 3 Oct 2025 09:35:56 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v28] In-Reply-To: References: Message-ID: On Thu, 25 Sep 2025 11:27:09 GMT, Robbin Ehn wrote: >> kindly reminder ... > >> kindly reminder ... > > Sorry, I have been very busy and is, thanks for the reminder! Thanks a lot, @robehn! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3364964924 From duke at openjdk.org Fri Oct 3 09:39:56 2025 From: duke at openjdk.org (duke) Date: Fri, 3 Oct 2025 09:39:56 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v28] In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 08:37:06 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > - minor updates requested by reviewer @ygaevsky Your change (at version 38ae6629e00813da64e61c7625dcee99c759975e) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3364978577 From duke at openjdk.org Fri Oct 3 09:48:06 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 3 Oct 2025 09:48:06 GMT Subject: Integrated: 8322174: RISC-V: C2 VectorizedHashCode RVV Version In-Reply-To: References: Message-ID: On Sat, 13 Jan 2024 09:21:37 GMT, Yuri Gaevsky wrote: > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. This pull request has now been integrated. Changeset: 134b63f0 Author: Yuri Gaevsky Committer: Vladimir Kempik URL: https://git.openjdk.org/jdk/commit/134b63f0e8c4093f7ad0a528d6996898ab881d5c Stats: 187 lines in 6 files changed: 169 ins; 2 del; 16 mod 8322174: RISC-V: C2 VectorizedHashCode RVV Version Reviewed-by: fyang, rehn ------------- PR: https://git.openjdk.org/jdk/pull/17413 From duke at openjdk.org Fri Oct 3 10:31:50 2025 From: duke at openjdk.org (duke) Date: Fri, 3 Oct 2025 10:31:50 GMT Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop [v12] In-Reply-To: References: Message-ID: <5pWIIz-jigUQEgXosEWidGn3LT41P-0Ga-jvZkaCgPE=.327ddb41-9c9a-4b3c-aec9-eabe23308d7b@github.com> On Wed, 1 Oct 2025 09:02:56 GMT, Beno?t Maillard wrote: >> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`. >> >> ### Context >> >> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn I was able to reduce the reproducer to something very simple. After being compiled by C2, the execution of the following method led to the last statement (`x = 0`) to be ignored: >> >> >> static public void test() { >> x = 0; >> for (int i = 0; i < 20000; i++) { >> x += i; >> } >> x = 0; >> } >> >> >> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` node are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially adressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here as macro expansion happens too late. 
>> >> This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but on the longer term further investigations into the missing `Phi` node issue are necessary, as they are likely to cause other issues (cf. related JBS issues). >> >> ### Detailed Analysis >> >> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`. When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later. >> >> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect. >> >> This is wh... > > Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Use PhaseIdealLoop::is_member @benoitmaillard Your change (at version 73ee95460db7ca9c34c40bf734eb2b5ea47601fd) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27225#issuecomment-3365186372 From bmaillard at openjdk.org Fri Oct 3 10:43:59 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 3 Oct 2025 10:43:59 GMT Subject: Integrated: 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop In-Reply-To: References: Message-ID: On Thu, 11 Sep 2025 13:05:21 GMT, Beno?t Maillard wrote: > This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`. > > ### Context > > The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn I was able to reduce the reproducer to something very simple. After being compiled by C2, the execution of the following method led to the last statement (`x = 0`) to be ignored: > > > static public void test() { > x = 0; > for (int i = 0; i < 20000; i++) { > x += i; > } > x = 0; > } > > > After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` node are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially adressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here as macro expansion happens too late. > > This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but on the longer term further investigations into the missing `Phi` node issue are necessary, as they are likely to cause other issues (cf. related JBS issues). 
> > ### Detailed Analysis > > In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`. When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later. > > This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect. > > This is what the IR looks like after the creation of the post lo... This pull request has now been integrated. Changeset: 72319167 Author: Beno?t Maillard Committer: Manuel H?ssig URL: https://git.openjdk.org/jdk/commit/72319167543a28295276f11178c17bef6680c32f Stats: 183 lines in 3 files changed: 183 ins; 0 del; 0 mod 8364757: Missing Store nodes caused by bad wiring in PhaseIdealLoop::insert_post_loop Reviewed-by: mhaessig, roland ------------- PR: https://git.openjdk.org/jdk/pull/27225 From qamai at openjdk.org Fri Oct 3 12:15:48 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 3 Oct 2025 12:15:48 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v3] In-Reply-To: <7rNGFuFTMSG8xdoIFjIncfu0Ybq2nocT-mXzO6r4wyo=.7df27c0b-b831-4506-a8c7-511393a3ddf3@github.com> References: <52RpYM-r-1EZcYjbaNllAEPHQP1nYhQcs-GfydIzP08=.0bfb8185-78a7-4dfb-9700-f4a36a1d0e99@github.com> <7rNGFuFTMSG8xdoIFjIncfu0Ybq2nocT-mXzO6r4wyo=.7df27c0b-b831-4506-a8c7-511393a3ddf3@github.com> Message-ID: On Sat, 20 Sep 2025 19:47:55 GMT, Jatin Bhateja wrote: >> I can't approve this approach. I think blindly biasing the color of an operation to that of its input is too optimistic and will lead to numerous false-positive cases. It is better to have a more fine-grained selection using the script in the ad file. For example: >> >> instruct addI_rReg_ndd(rRegI dst, rRegI src1, rRegI src2, rFlagsReg cr) >> %{ >> predicate(UseAPX); >> match(Set dst (AddI src1 src2)); >> effect(KILL cr); >> bias(src1); >> flag(PD::Flag_sets_overflow_flag, PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_carry_flag, PD::Flag_sets_parity_flag); >> >> format %{ "eaddl $dst, $src1, $src2\t# int ndd" %} >> ins_encode %{ >> __ eaddl($dst$$Register, $src1$$Register, $src2$$Register, false); >> %} >> ins_pipe(ialu_reg_reg); >> %} > >> I can't approve this approach. I think blindly biasing the color of an operation to that of its input is too optimistic and will lead to numerous false-positive cases. It is better to have a more fine-grained selection using the script in the ad file. 
For example: >> >> ``` >> instruct addI_rReg_ndd(rRegI dst, rRegI src1, rRegI src2, rFlagsReg cr) >> %{ >> predicate(UseAPX); >> match(Set dst (AddI src1 src2)); >> effect(KILL cr); >> bias(src1); >> flag(PD::Flag_sets_overflow_flag, PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_carry_flag, PD::Flag_sets_parity_flag); >> >> format %{ "eaddl $dst, $src1, $src2\t# int ndd" %} >> ins_encode %{ >> __ eaddl($dst$$Register, $src1$$Register, $src2$$Register, false); >> %} >> ins_pipe(ialu_reg_reg); >> %} >> ``` > > The solution takes into consideration live range overlaps; biasing is only enforced if the source live range ends at its user instruction. While picking the color we don't follow first-color selection but give preference to the bias. Second operand bias is only enabled for commutative operations. Biasing is simply an allocation-time hint to the allocator used during color selection, and does not modify the interference graph of the LRG. Our assembler now supports EEVEX to REX/REX2 demotion if dst matches either the first or second source operand for commutative operations. So we don't intend to bias only towards the first source but towards the second source also. Also we don't bias the destination if it has a bounded live range. @jatin-bhateja The issue I see here is that you try biasing for all kinds of instructions, not just the NDD ones. As a result, numerous nodes are uselessly biased and can lead to regression. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3365455818 From qamai at openjdk.org Fri Oct 3 12:17:51 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 3 Oct 2025 12:17:51 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v11] In-Reply-To: References: Message-ID: On Fri, 19 Sep 2025 20:44:54 GMT, Jatin Bhateja wrote: >> This patch optimizes PopCount value transforms using KnownBits information. >> Following are the results of the micro-benchmark included with the patch >> >> >> >> System: 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 215460.670 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 294014.826 ops/s >> >> Withopt: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 389978.082 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 417261.583 ops/s >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update countbitsnode.cpp I submitted testing for this and will report back once it finishes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3364792311 From duke at openjdk.org Fri Oct 3 09:35:56 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 3 Oct 2025 09:35:56 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v28] In-Reply-To: References: Message-ID: On Thu, 25 Sep 2025 11:27:09 GMT, Robbin Ehn wrote: >> kindly reminder ... > >> kindly reminder ... > > Sorry, I have been very busy and still am, thanks for the reminder! Thanks a lot, @robehn!
This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. `ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`. This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge branch 'master' into misplacedcastll - fix comment - fix comment - fix - fix issues - misplaced CastLL ------------- Changes: https://git.openjdk.org/jdk/pull/25284/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25284&range=04 Stats: 54 lines in 5 files changed: 8 ins; 14 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/25284.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25284/head:pull/25284 PR: https://git.openjdk.org/jdk/pull/25284 From qamai at openjdk.org Fri Oct 3 12:27:06 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 3 Oct 2025 12:27:06 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v4] In-Reply-To: References: Message-ID: On Sun, 18 May 2025 07:06:41 GMT, Quan Anh Mai wrote: >> Hi, >> >> The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. `ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`. This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - fix comment > - fix comment Can I have this PR reviewed, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25284#issuecomment-3365484939 From dfenacci at openjdk.org Fri Oct 3 13:26:03 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 3 Oct 2025 13:26:03 GMT Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee == m) failed: repeated inline attempt with different callee [v5] In-Reply-To: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> Message-ID: <-FZV3k-r1F6LSZrTEEpgLadLjzDZ1s2niWJbrKQy20k=.f3d4af01-6b8f-47d1-8d24-0b36e0f35d58@github.com> > # Issue > The CTW test `applications/ctw/modules/java_xml.java` crashes when trying to repeat late inlining of a virtual method (after IGVN passes through the method's call node again). 
The failure originates [here](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callGenerator.cpp#L473) because `_callee != m`. Apparently when running IGVN a second time after a first late inline failure and [setting the callee in the call generator](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callnode.cpp#L1240) we notice that the previous callee is not the same as the current one. > In this specific instance it seems that the issue happens when CTW is compiling Apache Xalan. > > # Cause > The root of the issue has to do with repeated late inlining, class hierarchy analysis and dynamic class loading. > > For this particular issue the two differing methods are `org.apache.xalan.xsltc.compiler.LocationPathPattern::translate` first and `org.apache.xalan.xsltc.compiler.AncestorPattern::translate` the second time. `LocationPathPattern` is an abstract class but has a concrete `translate` method. `AncestorPattern` is a concrete class that extends another abstract class `RelativePathPattern` that extends `LocationPathPattern`. `AncestorPattern` overrides the translate method. > What seems to be happening is the following: we compile a virtual call `RelativePathPattern::translate`, and at compile time only the abstract classes `RelativePathPattern` <: `LocationPathPattern` are loaded. CHA then finds out that the call must always call `LocationPathPattern::translate` because the method is not overridden anywhere else. However, there is still no non-abstract class in the entire class hierarchy, so as soon as `AncestorPattern` is loaded, this class is then the only non-abstract class in the class hierarchy and therefore the receiver type must be `AncestorPattern`. > > More generally, when late inlining is repeated and classes are loaded dynamically, it is possible that the resolved method between a late inlining attempt and the next one is not the same. > > # Fix > > This looks like a very rare edge case. If CHA is affected by class loading the original recorded dependency becomes invalid. This can possibly happen in other situations (e.g. JVMTI class redefinition). So, instead of modifying the assert (to check for invalid dependencies) we avoid re-setting the callee method if it is already defined. > > # T...
Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8355354: continue with other inlines after stress triggers repeated late inlining ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26441/files - new: https://git.openjdk.org/jdk/pull/26441/files/bf92e244..62afa84a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26441&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26441&range=03-04 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26441.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26441/head:pull/26441 PR: https://git.openjdk.org/jdk/pull/26441 From dfenacci at openjdk.org Fri Oct 3 13:26:07 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 3 Oct 2025 13:26:07 GMT Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee == m) failed: repeated inline attempt with different callee [v4] In-Reply-To: References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> Message-ID: On Thu, 2 Oct 2025 20:34:03 GMT, Dean Long wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8355354: add stress comment > > src/hotspot/share/opto/compile.cpp line 2119: > >> 2117: C->igvn_worklist()->push(cg->call_node()); >> 2118: should_stress = true; >> 2119: break; > > Don't we want to process the rest of the _late_inlines list? Good idea! That removes the necessity for a `should_stress` flag as well. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26441#discussion_r2401895009 From qamai at openjdk.org Fri Oct 3 16:05:52 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 3 Oct 2025 16:05:52 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v6] In-Reply-To: References: Message-ID: > Hi, > > The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. `ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`. This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. > > Please take a look and leave your reviews, thanks a lot. 
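To make the equivalence mentioned in the description concrete, here is a small standalone Java check. This is not HotSpot code: `limit` stands in for `ArrayOperationPartialInlineSize`, `elemBytes` for the per-element byte size, and the method names are made up. For power-of-two element sizes and values that do not overflow, the shifted byte-size comparison accepts exactly the same lengths as the element-count comparison:

public class PartialInlineCheckEquivalence {
    static boolean byteSizeForm(int length, int elemBytes, int limit) {
        // length << log2(elemBytes) <= limit
        return (length << Integer.numberOfTrailingZeros(elemBytes)) <= limit;
    }

    static boolean elementCountForm(int length, int elemBytes, int limit) {
        // length <= limit / elemBytes
        return length <= limit / elemBytes;
    }

    public static void main(String[] args) {
        int[] elemSizes = {1, 2, 4, 8};   // stand-ins for type2aelembytes-style values
        int limit = 64;                   // stand-in for ArrayOperationPartialInlineSize
        for (int elemBytes : elemSizes) {
            for (int length = 0; length <= 128; length++) {
                if (byteSizeForm(length, elemBytes, limit) != elementCountForm(length, elemBytes, limit)) {
                    throw new AssertionError("forms disagree for length=" + length + " elemBytes=" + elemBytes);
                }
            }
        }
        System.out.println("both forms agree on the tested range");
    }
}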
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fix test options ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25284/files - new: https://git.openjdk.org/jdk/pull/25284/files/b7906ca4..dad2df7d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25284&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25284&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25284.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25284/head:pull/25284 PR: https://git.openjdk.org/jdk/pull/25284 From duke at openjdk.org Fri Oct 3 19:52:49 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 3 Oct 2025 19:52:49 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v48] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality > > Additional Testing: > - [x] Linux x64 fastdebug tier 1/2/3/4 > - [x] Linux aarch64 fastdebug tier 1/2/3/4 Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 114 commits: - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final - Fix race when not installed nmethod is deoptimized - Fix NMethodRelocationTest.java logging race - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final - Refactor JVMTI test - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final - Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final - Lock nmethod::relocate behind experimental flag - Use CompiledICLocker instead of CompiledIC_lock - ... 
and 104 more: https://git.openjdk.org/jdk/compare/012e079d...104661c6 ------------- Changes: https://git.openjdk.org/jdk/pull/23573/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=47 Stats: 1593 lines in 26 files changed: 1529 ins; 2 del; 62 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From dlong at openjdk.org Fri Oct 3 20:47:49 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 3 Oct 2025 20:47:49 GMT Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee == m) failed: repeated inline attempt with different callee [v5] In-Reply-To: <-FZV3k-r1F6LSZrTEEpgLadLjzDZ1s2niWJbrKQy20k=.f3d4af01-6b8f-47d1-8d24-0b36e0f35d58@github.com> References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> <-FZV3k-r1F6LSZrTEEpgLadLjzDZ1s2niWJbrKQy20k=.f3d4af01-6b8f-47d1-8d24-0b36e0f35d58@github.com> Message-ID: On Fri, 3 Oct 2025 13:26:03 GMT, Damon Fenacci wrote: >> # Issue >> The CTW test `applications/ctw/modules/java_xml.java` crashes when trying to repeat late inlining of a virtual method (after IGVN passes through the method's call node again). The failure originates [here](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callGenerator.cpp#L473) because `_callee != m`. Apparently when running IGVN a second time after a first late inline failure and [setting the callee in the call generator](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callnode.cpp#L1240) we notice that the previous callee is not the same as the current one. >> In this specific instance it seems that the issue happens when CTW is compiling Apache Xalan. >> >> # Cause >> The root of the issue has to do with repeated late inlining, class hierarchy analysis and dynamic class loading. >> >> For this particular issue the two differing methods are `org.apache.xalan.xsltc.compiler.LocationPathPattern::translate` first and `org.apache.xalan.xsltc.compiler.AncestorPattern::translate` the second time. `LocationPathPattern` is an abstract class but has a concrete `translate` method. `AncestorPattern` is a concrete class that extends another abstract class `RelativePathPattern` that extends `LocationPathPattern`. `AncestorPattern` overrides the translate method. >> What seems to be happening is the following: we compile a virtual call `RelativePathPattern::translate` and at compile time. Only the abstract classes `RelativePathPattern` <: `LocationPathPattern` are loaded. CHA then finds out that the call must always call `LocationPathPattern::translate` because the method is not overwritten anywhere else. However, there is still no non-abstract class in the entire class hierarchy, i.e. as soon as `AncestorPattern` is loaded, this class is then the only non-abstract class in the class hierarchy and therefore the receiver type must be `AncestorPattern`. >> >> More in general, when late inlining is repeated and classes are loaded dynamically, it is possible that the resolved method between a late inlining attempt and the next one is not the same. >> >> # Fix >> >> This looks like a very edge-case. If CHA is affected by class loading the original recorded dependency becomes invalid. This can possibly happen in other situations (e.g JVMTI class redefinition). 
So, instead of modifying the assert (to check for invalid dependencies) we avoid re-setting the callee method ... > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8355354: continue with other inlines after stress triggers repeated late inlining Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26441#pullrequestreview-3300895869 From dcubed at openjdk.org Fri Oct 3 22:20:08 2025 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 3 Oct 2025 22:20:08 GMT Subject: Integrated: 8369138: New test compiler/loopstripmining/MissingStoreAfterOuterStripMinedLoop.java fails In-Reply-To: <7CRh-ioETkcKO1GkBy8K7fJghm0f7fjlPFt__ZRoSxI=.c2f1dea3-b179-408d-8353-d506f556b0fc@github.com> References: <7CRh-ioETkcKO1GkBy8K7fJghm0f7fjlPFt__ZRoSxI=.c2f1dea3-b179-408d-8353-d506f556b0fc@github.com> Message-ID: On Fri, 3 Oct 2025 22:10:07 GMT, Daniel D. Daugherty wrote: > A trivial fix to allow new test compiler/loopstripmining/MissingStoreAfterOuterStripMinedLoop.java > to execute with release bits. I tested this fix with a local release bits run on my MBP14. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27630#issuecomment-3367377177 From kvn at openjdk.org Fri Oct 3 22:20:08 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 3 Oct 2025 22:20:08 GMT Subject: Integrated: 8369138: New test compiler/loopstripmining/MissingStoreAfterOuterStripMinedLoop.java fails In-Reply-To: <7CRh-ioETkcKO1GkBy8K7fJghm0f7fjlPFt__ZRoSxI=.c2f1dea3-b179-408d-8353-d506f556b0fc@github.com> References: <7CRh-ioETkcKO1GkBy8K7fJghm0f7fjlPFt__ZRoSxI=.c2f1dea3-b179-408d-8353-d506f556b0fc@github.com> Message-ID: <-9uQtIDwqmzu91j5YIq8VNCIPKukIvWBj_vkpaSEBSI=.9badf4fd-0b0a-4f1e-98f1-74854e5044a6@github.com> On Fri, 3 Oct 2025 22:10:07 GMT, Daniel D. Daugherty wrote: > A trivial fix to allow new test compiler/loopstripmining/MissingStoreAfterOuterStripMinedLoop.java > to execute with release bits. Good ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27630#pullrequestreview-3301152546 From dcubed at openjdk.org Fri Oct 3 22:20:10 2025 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 3 Oct 2025 22:20:10 GMT Subject: Integrated: 8369138: New test compiler/loopstripmining/MissingStoreAfterOuterStripMinedLoop.java fails In-Reply-To: <7CRh-ioETkcKO1GkBy8K7fJghm0f7fjlPFt__ZRoSxI=.c2f1dea3-b179-408d-8353-d506f556b0fc@github.com> References: <7CRh-ioETkcKO1GkBy8K7fJghm0f7fjlPFt__ZRoSxI=.c2f1dea3-b179-408d-8353-d506f556b0fc@github.com> Message-ID: <8jFcnpEQJ0A-ewTXxSDdsKHPTc_sW4rDxuhZQTCShKk=.9c7d9937-045a-47a7-adfd-bbad9b899eaa@github.com> On Fri, 3 Oct 2025 22:10:07 GMT, Daniel D. Daugherty wrote: > A trivial fix to allow new test compiler/loopstripmining/MissingStoreAfterOuterStripMinedLoop.java > to execute with release bits. This pull request has now been integrated. Changeset: e6868c62 Author: Daniel D. Daugherty URL: https://git.openjdk.org/jdk/commit/e6868c624851d5c6bd182e45ba908cb06b731e8c Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod 8369138: New test compiler/loopstripmining/MissingStoreAfterOuterStripMinedLoop.java fails Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/27630 From dcubed at openjdk.org Fri Oct 3 22:20:09 2025 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Fri, 3 Oct 2025 22:20:09 GMT Subject: Integrated: 8369138: New test compiler/loopstripmining/MissingStoreAfterOuterStripMinedLoop.java fails In-Reply-To: <-9uQtIDwqmzu91j5YIq8VNCIPKukIvWBj_vkpaSEBSI=.9badf4fd-0b0a-4f1e-98f1-74854e5044a6@github.com> References: <7CRh-ioETkcKO1GkBy8K7fJghm0f7fjlPFt__ZRoSxI=.c2f1dea3-b179-408d-8353-d506f556b0fc@github.com> <-9uQtIDwqmzu91j5YIq8VNCIPKukIvWBj_vkpaSEBSI=.9badf4fd-0b0a-4f1e-98f1-74854e5044a6@github.com> Message-ID: <3naIU4lB6Vlr8RjCZKKAmHU4QfQJTjdHbX5rtOXy3L0=.d0b7c7fa-173e-43fe-9f7e-64da86e191fa@github.com> On Fri, 3 Oct 2025 22:14:51 GMT, Vladimir Kozlov wrote: >> A trivial fix to allow new test compiler/loopstripmining/MissingStoreAfterOuterStripMinedLoop.java >> to execute with release bits. > > Good @vnkozlov - Thanks for the fast review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27630#issuecomment-3367377929 From dcubed at openjdk.org Fri Oct 3 22:20:07 2025 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 3 Oct 2025 22:20:07 GMT Subject: Integrated: 8369138: New test compiler/loopstripmining/MissingStoreAfterOuterStripMinedLoop.java fails Message-ID: <7CRh-ioETkcKO1GkBy8K7fJghm0f7fjlPFt__ZRoSxI=.c2f1dea3-b179-408d-8353-d506f556b0fc@github.com> A trivial fix to allow new test compiler/loopstripmining/MissingStoreAfterOuterStripMinedLoop.java to execute with release bits. ------------- Commit messages: - 8369138: new test compiler/loopstripmining/MissingStoreAfterOuterStripMinedLoop.java fails Changes: https://git.openjdk.org/jdk/pull/27630/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27630&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369138 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27630/head:pull/27630 PR: https://git.openjdk.org/jdk/pull/27630 From duke at openjdk.org Fri Oct 3 22:42:03 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 3 Oct 2025 22:42:03 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v47] In-Reply-To: References: Message-ID: On Thu, 2 Oct 2025 16:19:12 GMT, Vladimir Kozlov wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix race when not installed nmethod is deoptimized > > Update looks good. I submitted new testing. @vnkozlov There was a minor merge conflict due to [JDK-8366461](https://bugs.openjdk.org/browse/JDK-8366461) if you could re-review (hopefully for the last time) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3367434640 From epeter at openjdk.org Fri Oct 3 22:50:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 3 Oct 2025 22:50:38 GMT Subject: RFR: 8367389: C2 SuperWord: refactor VTransform to model the whole loop instead of just the basic block [v5] In-Reply-To: References: Message-ID: > I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: > https://github.com/openjdk/jdk/pull/20964 > [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093) > > This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > ------------------------------ > > **Goals** > - VTransform models **all nodes in the loop**, not just the basic block (enables later VTransform::optimize, like moving reductions out of the loop) > - Remove `_nodes` from the vector vtnodes. 
> > **Details** > - Remove: `AUTO_VECTORIZATION2_AFTER_REORDER`, `apply_memops_reordering_with_schedule`, `print_memops_schedule`. > - Instead of reordering the scalar memops, we create the new memory graph during `VTransform::apply`. That is why the `VTransformApplyState` now needs to track the memory states. > - Refactor `VLoopMemorySlices`: map not just memory slices with phis (have stores in loop), but also those with only loads (no phi). > - Create vtnodes for all nodes in the loop (not just the basic block), as well as inputs (already) and outputs (new). Mapping also the output nodes means during `apply`, we naturally connect the uses after the loop to their inputs from the loop (which may be new nodes after the transformation). > - `_mem_ref_for_main_loop_alignment` -> `_vpointer_for_main_loop_alignment`. Instead of tracking the memory node to later have access to its `VPointer`, we take it directly. That removes one more use of `_nodes` for vector vtnodes. > > I also made a lot of annotations in the code below, for easier review. > > **Suggested order for review** > - Removal of `VTransformGraph::apply_memops_reordering_with_schedule` -> sets up need to build memory graph on the fly. > - Old and new code for `VLoopMemorySlices` -> we now also track load-only slices. > - `build_scalar_vtnodes_for_non_packed_nodes`, `build_inputs_for_scalar_vtnodes`, `build_uses_after_loop`, `apply_vtn_inputs_to_node` (use in `apply`), `apply_backedge`, `fix_memory_state_uses_after_loop` > - `VTransformApplyState`: how it now tracks the memory state. > - `VTransformVectorNode` -> removal of `_nodes` (Big Win!) > - Then look at all the other details. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision: - Merge branch 'master' into JDK-8367389-vtn-bb-to-loop-refactor - Update src/hotspot/share/opto/vtransform.cpp Co-authored-by: Galder Zamarre?o - Update src/hotspot/share/opto/vectorization.cpp Co-authored-by: Manuel H?ssig - for Manuel - fix documentation - mem_ref -> vpointer - wip rm nodes - control dependency - phi cleanup - apply_backedge - ... and 10 more: https://git.openjdk.org/jdk/compare/98ccb9af...4fad8b7a ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27208/files - new: https://git.openjdk.org/jdk/pull/27208/files/99fd1c99..4fad8b7a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27208&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27208&range=03-04 Stats: 175446 lines in 2262 files changed: 139214 ins; 21901 del; 14331 mod Patch: https://git.openjdk.org/jdk/pull/27208.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27208/head:pull/27208 PR: https://git.openjdk.org/jdk/pull/27208 From epeter at openjdk.org Fri Oct 3 22:50:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 3 Oct 2025 22:50:39 GMT Subject: RFR: 8367389: C2 SuperWord: refactor VTransform to model the whole loop instead of just the basic block [v5] In-Reply-To: References: Message-ID: <67DqsCAIj_J_mFQ8Lhy_jkQjy4C1BPzZEK235d1GMk4=.73c31198-020e-461b-9857-b18982302634@github.com> On Tue, 30 Sep 2025 14:40:35 GMT, Roland Westrelin wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 20 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8367389-vtn-bb-to-loop-refactor >> - Update src/hotspot/share/opto/vtransform.cpp >> >> Co-authored-by: Galder Zamarre?o >> - Update src/hotspot/share/opto/vectorization.cpp >> >> Co-authored-by: Manuel H?ssig >> - for Manuel >> - fix documentation >> - mem_ref -> vpointer >> - wip rm nodes >> - control dependency >> - phi cleanup >> - apply_backedge >> - ... and 10 more: https://git.openjdk.org/jdk/compare/98ccb9af...4fad8b7a > > Looks reasonable to me. @rwestrel @mhaessig Thanks for the approvals! I just merged with master and will run testing again over the weekend before integration :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27208#issuecomment-3367448880 From kvn at openjdk.org Fri Oct 3 22:55:07 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 3 Oct 2025 22:55:07 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v48] In-Reply-To: References: Message-ID: On Fri, 3 Oct 2025 19:52:49 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 114 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix race when not installed nmethod is deoptimized > - Fix NMethodRelocationTest.java logging race > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Refactor JVMTI test > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Lock nmethod::relocate behind experimental flag > - Use CompiledICLocker instead of CompiledIC_lock > - ... and 104 more: https://git.openjdk.org/jdk/compare/012e079d...104661c6 Re-approved. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23573#pullrequestreview-3301222369 From duke at openjdk.org Fri Oct 3 23:15:05 2025 From: duke at openjdk.org (duke) Date: Fri, 3 Oct 2025 23:15:05 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v48] In-Reply-To: References: Message-ID: On Fri, 3 Oct 2025 19:52:49 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. 
The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 114 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix race when not installed nmethod is deoptimized > - Fix NMethodRelocationTest.java logging race > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Refactor JVMTI test > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Lock nmethod::relocate behind experimental flag > - Use CompiledICLocker instead of CompiledIC_lock > - ... and 104 more: https://git.openjdk.org/jdk/compare/012e079d...104661c6 @chadrako Your change (at version 104661c69ce67a2896c394cb0eb8677bb6a43f5e) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3367496276 From jbhateja at openjdk.org Sat Oct 4 04:12:50 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 4 Oct 2025 04:12:50 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v11] In-Reply-To: References: Message-ID: <3RQawTRzXt6NAE5TM7mjgw0JjdqSWLnx_NE_flDRgDU=.b0d4b3e0-1a80-4309-90c4-6ca4e4946304@github.com> On Fri, 3 Oct 2025 12:14:46 GMT, Quan Anh Mai wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Update countbitsnode.cpp > > src/hotspot/share/opto/countbitsnode.cpp line 145: > >> 143: } >> 144: KnownBits bits = t->isa_int()->_bits; >> 145: return TypeInt::make(population_count(bits._ones), population_count(~bits._zeros), Type::WidenMax); > > The `widen` of the output should be the same as the `widen` of the input, not `WidenMax` here. Thanks @merykitty, widen is mainly used for optimistic data flow analysis pass like CCP where type analysis begins with TOP and progressively grows the value range till convergence / fixed point. it's good to preserve the widen of input to delay eager convergence. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2403728728 From jbhateja at openjdk.org Sat Oct 4 06:01:07 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 4 Oct 2025 06:01:07 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v12] In-Reply-To: References: Message-ID: > This patch optimizes PopCount value transforms using KnownBits information. 
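As a side note on the known-bits exchange above: the bound used there can be checked independently of C2. With `ones` marking bits known to be 1 and `zeros` marking bits known to be 0, every compatible value has a population count between `bitCount(ones)` and `bitCount(~zeros)`. The following is a small standalone Java check, illustrative only and not the compiler code; the widen handling mentioned in the review has no analogue here:

public class PopCountKnownBitsBound {
    public static void main(String[] args) {
        int ones  = 0b1010_0001;             // bits known to be 1
        int zeros = 0xFFFF_0000;             // bits known to be 0
        int lo = Integer.bitCount(ones);     // smallest possible popcount
        int hi = Integer.bitCount(~zeros);   // largest possible popcount

        int unknown = ~(ones | zeros);       // bits with no knowledge either way
        for (int s = unknown; ; s = (s - 1) & unknown) {
            int v = ones | s;                // one concrete value matching the known bits
            int pc = Integer.bitCount(v);
            if (pc < lo || pc > hi) {
                throw new AssertionError("popcount " + pc + " escapes [" + lo + ", " + hi + "]");
            }
            if (s == 0) break;               // all subsets of the unknown bits visited
        }
        System.out.println("popcount of every compatible value stays within [" + lo + ", " + hi + "]");
    }
}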
> Following are the results of the micro-benchmark included with the patch > > > > System: 13th Gen Intel(R) Core(TM) i3-1315U > > Baseline: > Benchmark Mode Cnt Score Error Units > PopCountValueTransform.LogicFoldingKerenLong thrpt 2 215460.670 ops/s > PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 294014.826 ops/s > > Withopt: > Benchmark Mode Cnt Score Error Units > PopCountValueTransform.LogicFoldingKerenLong thrpt 2 389978.082 ops/s > PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 417261.583 ops/s > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27075/files - new: https://git.openjdk.org/jdk/pull/27075/files/e206ccc3..85b10e88 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=10-11 Stats: 6 lines in 1 file changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/27075.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27075/head:pull/27075 PR: https://git.openjdk.org/jdk/pull/27075 From vlivanov at openjdk.org Sat Oct 4 18:20:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 4 Oct 2025 18:20:49 GMT Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee == m) failed: repeated inline attempt with different callee [v5] In-Reply-To: <-FZV3k-r1F6LSZrTEEpgLadLjzDZ1s2niWJbrKQy20k=.f3d4af01-6b8f-47d1-8d24-0b36e0f35d58@github.com> References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> <-FZV3k-r1F6LSZrTEEpgLadLjzDZ1s2niWJbrKQy20k=.f3d4af01-6b8f-47d1-8d24-0b36e0f35d58@github.com> Message-ID: On Fri, 3 Oct 2025 13:26:03 GMT, Damon Fenacci wrote: >> # Issue >> The CTW test `applications/ctw/modules/java_xml.java` crashes when trying to repeat late inlining of a virtual method (after IGVN passes through the method's call node again). The failure originates [here](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callGenerator.cpp#L473) because `_callee != m`. Apparently when running IGVN a second time after a first late inline failure and [setting the callee in the call generator](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callnode.cpp#L1240) we notice that the previous callee is not the same as the current one. >> In this specific instance it seems that the issue happens when CTW is compiling Apache Xalan. >> >> # Cause >> The root of the issue has to do with repeated late inlining, class hierarchy analysis and dynamic class loading. >> >> For this particular issue the two differing methods are `org.apache.xalan.xsltc.compiler.LocationPathPattern::translate` first and `org.apache.xalan.xsltc.compiler.AncestorPattern::translate` the second time. `LocationPathPattern` is an abstract class but has a concrete `translate` method. `AncestorPattern` is a concrete class that extends another abstract class `RelativePathPattern` that extends `LocationPathPattern`. `AncestorPattern` overrides the translate method. >> What seems to be happening is the following: we compile a virtual call `RelativePathPattern::translate` and at compile time. Only the abstract classes `RelativePathPattern` <: `LocationPathPattern` are loaded. 
CHA then finds out that the call must always call `LocationPathPattern::translate` because the method is not overwritten anywhere else. However, there is still no non-abstract class in the entire class hierarchy, i.e. as soon as `AncestorPattern` is loaded, this class is then the only non-abstract class in the class hierarchy and therefore the receiver type must be `AncestorPattern`. >> >> More in general, when late inlining is repeated and classes are loaded dynamically, it is possible that the resolved method between a late inlining attempt and the next one is not the same. >> >> # Fix >> >> This looks like a very edge-case. If CHA is affected by class loading the original recorded dependency becomes invalid. This can possibly happen in other situations (e.g JVMTI class redefinition). So, instead of modifying the assert (to check for invalid dependencies) we avoid re-setting the callee method ... > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8355354: continue with other inlines after stress triggers repeated late inlining Marked as reviewed by vlivanov (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26441#pullrequestreview-3302037103 From duke at openjdk.org Sat Oct 4 21:20:12 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Sat, 4 Oct 2025 21:20:12 GMT Subject: Integrated: 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 22:05:13 GMT, Chad Rakoczy wrote: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality > > Additional Testing: > - [x] Linux x64 fastdebug tier 1/2/3/4 > - [x] Linux aarch64 fastdebug tier 1/2/3/4 This pull request has now been integrated. Changeset: f740cd2a Author: Chad Rakoczy Committer: Evgeny Astigeevich URL: https://git.openjdk.org/jdk/commit/f740cd2aad43a008da1ed1ff15ebe2c790f893a0 Stats: 1593 lines in 26 files changed: 1529 ins; 2 del; 62 mod 8316694: Implement relocation of nmethod within CodeCache Reviewed-by: kvn, eosterlund, never, eastigeevich, bulasevich ------------- PR: https://git.openjdk.org/jdk/pull/23573 From kvn at openjdk.org Sun Oct 5 02:12:20 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 5 Oct 2025 02:12:20 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v47] In-Reply-To: References: Message-ID: <_x9afeouhNZwbycpEDDqUtH5xTVyjusx0c2dUJAa2hE=.e75c5e51-be1e-4f4d-8b23-fefe1dfb53e2@github.com> On Fri, 3 Oct 2025 22:38:42 GMT, Chad Rakoczy wrote: >> Update looks good. I submitted new testing. > > @vnkozlov There was a minor merge conflict due to [JDK-8366461](https://bugs.openjdk.org/browse/JDK-8366461) if you could re-review (hopefully for the last time) @chadrako we hit failures in tier 3 testing. I will file bug. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3368675345 From kvn at openjdk.org Sun Oct 5 02:37:15 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 5 Oct 2025 02:37:15 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v48] In-Reply-To: References: Message-ID: On Fri, 3 Oct 2025 19:52:49 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 114 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix race when not installed nmethod is deoptimized > - Fix NMethodRelocationTest.java logging race > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Refactor JVMTI test > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Lock nmethod::relocate behind experimental flag > - Use CompiledICLocker instead of CompiledIC_lock > - ... and 104 more: https://git.openjdk.org/jdk/compare/012e079d...104661c6 https://bugs.openjdk.org/browse/JDK-8369147 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3368693044 From kvn at openjdk.org Sun Oct 5 02:44:23 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 5 Oct 2025 02:44:23 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v48] In-Reply-To: References: Message-ID: On Fri, 3 Oct 2025 19:52:49 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 114 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix race when not installed nmethod is deoptimized > - Fix NMethodRelocationTest.java logging race > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Refactor JVMTI test > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Lock nmethod::relocate behind experimental flag > - Use CompiledICLocker instead of CompiledIC_lock > - ... and 104 more: https://git.openjdk.org/jdk/compare/012e079d...104661c6 https://bugs.openjdk.org/browse/JDK-8369148 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3368696021 From kvn at openjdk.org Sun Oct 5 02:55:21 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 5 Oct 2025 02:55:21 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v48] In-Reply-To: References: Message-ID: <10P4yR5r5L_dk8r3cb7f2CnI4A-JrcAxVtO1SczoVoU=.404bfe17-e8b2-4a0e-a775-bb8fd695b0e8@github.com> On Fri, 3 Oct 2025 19:52:49 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 114 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix race when not installed nmethod is deoptimized > - Fix NMethodRelocationTest.java logging race > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Refactor JVMTI test > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Lock nmethod::relocate behind experimental flag > - Use CompiledICLocker instead of CompiledIC_lock > - ... 
and 104 more: https://git.openjdk.org/jdk/compare/012e079d...104661c6 https://bugs.openjdk.org/browse/JDK-8369149 https://bugs.openjdk.org/browse/JDK-8369150 Looks like when I tested changes I did not include new tests (forgot `git add` for them) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3368700166 PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3368700431 From kvn at openjdk.org Sun Oct 5 03:03:24 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 5 Oct 2025 03:03:24 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v48] In-Reply-To: References: Message-ID: On Fri, 3 Oct 2025 19:52:49 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 114 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix race when not installed nmethod is deoptimized > - Fix NMethodRelocationTest.java logging race > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Refactor JVMTI test > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Lock nmethod::relocate behind experimental flag > - Use CompiledICLocker instead of CompiledIC_lock > - ... and 104 more: https://git.openjdk.org/jdk/compare/012e079d...104661c6 https://bugs.openjdk.org/browse/JDK-8369151 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3368702861 From kvn at openjdk.org Sun Oct 5 03:36:23 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 5 Oct 2025 03:36:23 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v48] In-Reply-To: References: Message-ID: On Fri, 3 Oct 2025 19:52:49 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. 
New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 114 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix race when not installed nmethod is deoptimized > - Fix NMethodRelocationTest.java logging race > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Refactor JVMTI test > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Lock nmethod::relocate behind experimental flag > - Use CompiledICLocker instead of CompiledIC_lock > - ... and 104 more: https://git.openjdk.org/jdk/compare/012e079d...104661c6 I problem listing new tests which fail: https://github.com/openjdk/jdk/pull/27634 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3368716829 From kvn at openjdk.org Sun Oct 5 03:42:28 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 5 Oct 2025 03:42:28 GMT Subject: RFR: 8369152: Problem list new tests from JDK-8316694 Message-ID: Most new tests from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) failed in tier3. Problem list them until they are fixed. ------------- Commit messages: - 8369152: Problem list new tests from JDK-8316694 Changes: https://git.openjdk.org/jdk/pull/27634/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27634&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369152 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27634/head:pull/27634 PR: https://git.openjdk.org/jdk/pull/27634 From dholmes at openjdk.org Sun Oct 5 04:38:47 2025 From: dholmes at openjdk.org (David Holmes) Date: Sun, 5 Oct 2025 04:38:47 GMT Subject: RFR: 8369152: Problem list new tests from JDK-8316694 In-Reply-To: References: Message-ID: <1_leH8nr3ZqrrM1MXOC_9R49WmwjLpwK3Kk0-ujKR4M=.7ec157b7-5cef-45eb-975e-7e09d2a6b299@github.com> On Sun, 5 Oct 2025 03:18:37 GMT, Vladimir Kozlov wrote: > Most new tests from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) failed in tier3. Problem list them until they are fixed. Marked as reviewed by dholmes (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27634#pullrequestreview-3302169480 From serb at openjdk.org Sun Oct 5 06:10:45 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Sun, 5 Oct 2025 06:10:45 GMT Subject: RFR: 8369152: Problem list new tests from JDK-8316694 In-Reply-To: References: Message-ID: On Sun, 5 Oct 2025 03:18:37 GMT, Vladimir Kozlov wrote: > Most new tests from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) failed in tier3. Problem list them until they are fixed. Marked as reviewed by serb (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/27634#pullrequestreview-3302222468 From kvn at openjdk.org Sun Oct 5 06:33:27 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 5 Oct 2025 06:33:27 GMT Subject: RFR: 8369152: Problem list new tests from JDK-8316694 In-Reply-To: References: Message-ID: <_Khub-XFED3RVUmp97Qp1EoNHWRBQyNU7i3NAYh7Nsc=.3aa49c7e-153d-47ea-b402-5591ee3751c0@github.com> On Sun, 5 Oct 2025 03:18:37 GMT, Vladimir Kozlov wrote: > Most new tests from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) failed in tier3. Problem list them until they are fixed. Unfortunately my testing shows that I have to specify each subtest. I updated changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27634#issuecomment-3368792521 From kvn at openjdk.org Sun Oct 5 06:33:27 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 5 Oct 2025 06:33:27 GMT Subject: RFR: 8369152: Problem list new tests from JDK-8316694 [v2] In-Reply-To: References: Message-ID: > Most new tests from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) failed in tier3. Problem list them until they are fixed. Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Specify each subtest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27634/files - new: https://git.openjdk.org/jdk/pull/27634/files/b679c375..d0a19eb1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27634&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27634&range=00-01 Stats: 8 lines in 1 file changed: 6 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27634/head:pull/27634 PR: https://git.openjdk.org/jdk/pull/27634 From kvn at openjdk.org Sun Oct 5 06:57:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 5 Oct 2025 06:57:46 GMT Subject: RFR: 8369152: Problem list new tests from JDK-8316694 [v2] In-Reply-To: References: Message-ID: On Sun, 5 Oct 2025 06:33:27 GMT, Vladimir Kozlov wrote: >> Most new tests from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) failed in tier3. Problem list them until they are fixed. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Specify each subtest Please, re-approve. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27634#issuecomment-3368804009 From jpai at openjdk.org Sun Oct 5 15:57:47 2025 From: jpai at openjdk.org (Jaikiran Pai) Date: Sun, 5 Oct 2025 15:57:47 GMT Subject: RFR: 8369152: Problem list new tests from JDK-8316694 [v2] In-Reply-To: References: Message-ID: On Sun, 5 Oct 2025 06:33:27 GMT, Vladimir Kozlov wrote: >> Most new tests from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) failed in tier3. Problem list them until they are fixed. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Specify each subtest I haven't verified each of the linked bug ids, but this looks OK to me. ------------- Marked as reviewed by jpai (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27634#pullrequestreview-3302444660 From kvn at openjdk.org Sun Oct 5 16:23:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 5 Oct 2025 16:23:55 GMT Subject: RFR: 8369152: Problem list new tests from JDK-8316694 [v2] In-Reply-To: <1_leH8nr3ZqrrM1MXOC_9R49WmwjLpwK3Kk0-ujKR4M=.7ec157b7-5cef-45eb-975e-7e09d2a6b299@github.com> References: <1_leH8nr3ZqrrM1MXOC_9R49WmwjLpwK3Kk0-ujKR4M=.7ec157b7-5cef-45eb-975e-7e09d2a6b299@github.com> Message-ID: On Sun, 5 Oct 2025 04:35:38 GMT, David Holmes wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Specify each subtest > > Marked as reviewed by dholmes (Reviewer). Thank you, @dholmes-ora , @mrserb and @jaikiran for reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27634#issuecomment-3369161907 From kvn at openjdk.org Sun Oct 5 16:23:56 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 5 Oct 2025 16:23:56 GMT Subject: Integrated: 8369152: Problem list new tests from JDK-8316694 In-Reply-To: References: Message-ID: On Sun, 5 Oct 2025 03:18:37 GMT, Vladimir Kozlov wrote: > Most new tests from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) failed in tier3. Problem list them until they are fixed. This pull request has now been integrated. Changeset: 5d9f94e0 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/5d9f94e05e1527745271d0167a418741607619e2 Stats: 11 lines in 1 file changed: 11 ins; 0 del; 0 mod 8369152: Problem list new tests from JDK-8316694 Reviewed-by: jpai, dholmes, serb ------------- PR: https://git.openjdk.org/jdk/pull/27634 From dfenacci at openjdk.org Mon Oct 6 06:21:01 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 6 Oct 2025 06:21:01 GMT Subject: Integrated: 8355354: C2 crashed: assert(_callee == nullptr || _callee == m) failed: repeated inline attempt with different callee In-Reply-To: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> Message-ID: On Wed, 23 Jul 2025 11:14:27 GMT, Damon Fenacci wrote: > # Issue > The CTW test `applications/ctw/modules/java_xml.java` crashes when trying to repeat late inlining of a virtual method (after IGVN passes through the method's call node again). The failure originates [here](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callGenerator.cpp#L473) because `_callee != m`. Apparently when running IGVN a second time after a first late inline failure and [setting the callee in the call generator](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callnode.cpp#L1240) we notice that the previous callee is not the same as the current one. > In this specific instance it seems that the issue happens when CTW is compiling Apache Xalan. > > # Cause > The root of the issue has to do with repeated late inlining, class hierarchy analysis and dynamic class loading. > > For this particular issue the two differing methods are `org.apache.xalan.xsltc.compiler.LocationPathPattern::translate` first and `org.apache.xalan.xsltc.compiler.AncestorPattern::translate` the second time. `LocationPathPattern` is an abstract class but has a concrete `translate` method. 
`AncestorPattern` is a concrete class that extends another abstract class `RelativePathPattern` that extends `LocationPathPattern`. `AncestorPattern` overrides the translate method. > What seems to be happening is the following: we compile a virtual call `RelativePathPattern::translate` and at compile time. Only the abstract classes `RelativePathPattern` <: `LocationPathPattern` are loaded. CHA then finds out that the call must always call `LocationPathPattern::translate` because the method is not overwritten anywhere else. However, there is still no non-abstract class in the entire class hierarchy, i.e. as soon as `AncestorPattern` is loaded, this class is then the only non-abstract class in the class hierarchy and therefore the receiver type must be `AncestorPattern`. > > More in general, when late inlining is repeated and classes are loaded dynamically, it is possible that the resolved method between a late inlining attempt and the next one is not the same. > > # Fix > > This looks like a very edge-case. If CHA is affected by class loading the original recorded dependency becomes invalid. This can possibly happen in other situations (e.g JVMTI class redefinition). So, instead of modifying the assert (to check for invalid dependencies) we avoid re-setting the callee method if it is already defined. > > # T... This pull request has now been integrated. Changeset: 85877e20 Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/85877e2022114031ef1ba13c67bf768edb0dfaa7 Stats: 41 lines in 5 files changed: 17 ins; 1 del; 23 mod 8355354: C2 crashed: assert(_callee == nullptr || _callee == m) failed: repeated inline attempt with different callee Reviewed-by: vlivanov, dlong ------------- PR: https://git.openjdk.org/jdk/pull/26441 From dfenacci at openjdk.org Mon Oct 6 06:21:00 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 6 Oct 2025 06:21:00 GMT Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee == m) failed: repeated inline attempt with different callee [v5] In-Reply-To: References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com> <-FZV3k-r1F6LSZrTEEpgLadLjzDZ1s2niWJbrKQy20k=.f3d4af01-6b8f-47d1-8d24-0b36e0f35d58@github.com> Message-ID: On Sat, 4 Oct 2025 18:18:05 GMT, Vladimir Ivanov wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8355354: continue with other inlines after stress triggers repeated late inlining > > Marked as reviewed by vlivanov (Reviewer). Thanks for your reviews @iwanowww and @dean-long! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26441#issuecomment-3370062301 From chagedorn at openjdk.org Mon Oct 6 07:22:00 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 Oct 2025 07:22:00 GMT Subject: RFR: 8367899: compiler/c2/gvn/TestBitCompressValueTransform.java intermittent timed out [v2] In-Reply-To: References: <_zhjDERsbq1mfrqW8IBGsEB3WrMmuVVykDAijO-yOUU=.39b2c9a5-e784-419d-bc07-504750f48eaf@github.com> Message-ID: On Tue, 30 Sep 2025 06:33:28 GMT, SendaoYan wrote: >> Hi all, >> >> After JDK-8260555 change the timeout factor from 4 to 1, make test compiler/c2/gvn/TestBitCompressValueTransform.java intermittent timed out. >> >> There are 20 10k/20k loop count of loops in this test, this cause test need many CPU cycles to finish. 
If I reduce the loop count from 10k/20k to 100/200, the test failures described in [JDK-8350896](https://bugs.openjdk.org/browse/JDK-8350896) are also reproduced when the tested JDK is jdk25u. So it seems that there is no need for such high loop counts in these tests.
>>
>> Without the proposed change, the driver action finishes in about 65 seconds on linux-x64; with the proposed change, the driver action finishes in about 1.5 seconds.
>
> SendaoYan has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove the loop which do not use the loop variable i or do not use random number

Removing the loops looks good! As pointed out by @dafedafe, the IR framework will indeed take care of warming the tests up.

test/hotspot/jtreg/compiler/c2/gvn/TestBitCompressValueTransform.java line 78:

> 76:
> 77: @Run(test = "test1")
> 78: public void run1(RunInfo info) {

While at it, you can also remove all the unused `RunInfo` parameters. But up to you if you want to squeeze that in here as well - it's only a clean-up that is not that important.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27548#pullrequestreview-3303110818
PR Review Comment: https://git.openjdk.org/jdk/pull/27548#discussion_r2405139254

From chagedorn at openjdk.org Mon Oct 6 07:23:55 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 6 Oct 2025 07:23:55 GMT
Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors [v5]
In-Reply-To: 
References: 
Message-ID: 

On Tue, 30 Sep 2025 09:00:38 GMT, Daniel Skantz wrote:

>> This PR addresses a bug in the stringopts phase. During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code, as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments().
>>
>> In the attached test, the size of the successively merged StringBuilder doubles on each merge -- there's 24 of them -- as the toString result of the first component is used twice in the second component [1], etc. Not only does the compiler hang on this test case, but the string concat optimization seems to give an arbitrary amount of back-to-back stores in the generated code depending on the number of stacked concatenations.
>>
>> The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and higher limits could be insufficient as each argument corresponds to about 20 new nodes later in replace_string_concat [2].
>>
>> [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303
>>
>> [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806
>>
>> Testing: T1-4.
>>
>> Extra testing: verified that no method in T1-4 is being compiled with a merged concat candidate exceeding the suggested limit of 100 arguments, regardless of whether or not the later checks verify_control_flow() and verify_mem_flow pass.
>
> Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision:
>
>   comment bound; debug prints; bug numbers

Thanks for the update, looks good!
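To illustrate the stacking pattern the bound above guards against, here is a minimal Java shape. It is not the actual regression test; names, depths and sizes are made up. Each step reuses the previous result twice, so a fully merged concatenation would double its argument count at every level:

public class StackedConcat {
    static String build(String a, String b, int depth) {
        String s = a + b;            // 2 arguments
        for (int i = 0; i < depth; i++) {
            s = s + s;               // a merged form would have twice as many arguments
        }
        return s;
    }

    public static void main(String[] args) {
        // With a depth around 24, as in the description, the merged argument count
        // would be in the millions, which is why stringopts now refuses to merge
        // past a fixed bound instead of flattening the whole chain.
        System.out.println(build("foo", "bar", 4).length()); // prints 96
    }
}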
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26685#pullrequestreview-3303146656 From shade at openjdk.org Mon Oct 6 07:36:46 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 6 Oct 2025 07:36:46 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods In-Reply-To: References: Message-ID: On Fri, 26 Sep 2025 16:12:14 GMT, Martin Doerr wrote: > We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it. > > We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses. > > I've tested this proposal by the following code on x86_64: > > diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp > index a6b4efbe4f2..d715e69c850 100644 > --- a/src/hotspot/cpu/x86/interp_masm_x86.cpp > +++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp > @@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() { > void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) { > prepare_to_jump_from_interpreted(); > > + if (UseNewCode) { > + Label ok; > + movptr(temp, Address(method, Method::from_interpreted_offset())); > + cmpptr(temp, Address(method, Method::interpreter_entry_offset())); > + je(ok); > + movptr(rax, Address(method, Method::from_compiled_offset())); > + movptr(rbx, rax); > + addptr(rbx, 128); > + hlt(); > + bind(ok); > + } > + > if (JvmtiExport::can_post_interpreter_events()) { > Label run_compiled_code; > // JVMTI events, such as single-stepping, are implemented partly by avoiding running > > > The output is (requires hsdis library, otherwise we get a hex dump instead of disassembly): > > RAX=0x00007f3b19000100 is at entry_point+0 in (nmethod*)0x00007f3b19000008 > Compiled method (c1) 2915 1 3 java.lang.Byte::toUnsignedInt (6 bytes) > total in heap [0x00007f3b19000008,0x00007f3b190001f8] = 496 > main code [0x00007f3b19000100,0x00007f3b190001b8] = 184 > stub code [0x00007f3b190001b8,0x00007f3b190001f8] = 64 > mutable data [0x00007f3ab401e0b0,0x00007f3ab401e0e0] = 48 > relocation [0x00007f3ab401e0b0,0x00007f3ab401e0d8] = 40 > metadata [0x00007f3ab401e0d8,0x00007f3ab401e0e0] = 8 > immutable data [0x00007f3ab401dcd0,0x00007f3ab401dd30] = 96 > dependencies [0x00007f3ab401dcd0,0x00007f3ab401dcd8] = 8 > scopes pcs [0x00007f3ab401dcd8,0x00007f3ab401dd18] = 64 > ... Looks like a useful diagnostic tool. At very least we should dump the raw instruction stream around that pc, like we do for `Instructions:` block. Does Hotspot do that already? Relocation trick is cute, but it hinges on assumption that relocations are always pointing at instruction boundary. I looked around and I think while most relocs are that way, there are some relocs that do not follow this rule. 
For example: // Store Null Pointer instruct zStorePNull(memory mem, immP0 zero, rRegP tmp, rFlagsReg cr) %{ predicate(UseZGC && n->as_Store()->barrier_data() != 0); match(Set mem (StoreP mem zero)); effect(TEMP tmp, KILL cr); ins_cost(125); // XXX format %{ "movq $mem, 0\t# ptr" %} ins_encode %{ z_store_barrier(masm, this, $mem$$Address, noreg, $tmp$$Register, false /* is_atomic */); // Store a colored null - barrier code above does not need to color __ movq($mem$$Address, barrier_Relocation::unpatched); // The relocation cant be fully after the mov, as that is the beginning of a random subsequent // instruction, which violates assumptions made by unrelated code. Hence the end() - 1 __ code_section()->relocate(__ code_section()->end() - 1, barrier_Relocation::spec(), ZBarrierRelocationFormatStoreGoodAfterMov); %} ins_pipe(ialu_mem_reg); %} I am guessing it is still fine to attempt to disassemble in this case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27530#issuecomment-3370291722 From thartmann at openjdk.org Mon Oct 6 07:52:54 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 6 Oct 2025 07:52:54 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v12] In-Reply-To: References: Message-ID: <3hRTOJiGlZXrFqy7m3loXdorkRmyL3zb7hwyrwi8b6w=.0c159d31-bab3-4ed1-94a5-23b33bad457d@github.com> On Sat, 4 Oct 2025 06:01:07 GMT, Jatin Bhateja wrote: >> This patch optimizes PopCount value transforms using KnownBits information. >> Following are the results of the micro-benchmark included with the patch >> >> >> >> System: 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 215460.670 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 294014.826 ops/s >> >> Withopt: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 389978.082 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 417261.583 ops/s >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Testing all passed. I'll pass the review to someone else. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3370347366 From dfenacci at openjdk.org Mon Oct 6 08:07:53 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 6 Oct 2025 08:07:53 GMT Subject: RFR: 8367899: compiler/c2/gvn/TestBitCompressValueTransform.java intermittent timed out [v2] In-Reply-To: References: <_zhjDERsbq1mfrqW8IBGsEB3WrMmuVVykDAijO-yOUU=.39b2c9a5-e784-419d-bc07-504750f48eaf@github.com> Message-ID: On Tue, 30 Sep 2025 06:33:28 GMT, SendaoYan wrote: >> Hi all, >> >> After JDK-8260555 change the timeout factor from 4 to 1, make test compiler/c2/gvn/TestBitCompressValueTransform.java intermittent timed out. >> >> There are 20 10k/20k loop count of loops in this test, this cause test need many CPU cycles to finish. If I reduce the loop count from 10k/20k to 100/200, the test failures descripted in [JDK-8350896](https://bugs.openjdk.org/browse/JDK-8350896) also reproduced when the tested jdk is jdk25u. So it seems that there no need so many loop count for these tests. >> >> Without the proposed change, the driver action finish about 65 sencods on linux-x64, with the proposed change the driver action finish about 1.5 seconds. 
> > SendaoYan has updated the pull request incrementally with one additional commit since the last revision: > > Remove the loop which do not use the loop variable i or do not use random number Thanks for the cleanup @sendaoYan. LGTM ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/27548#pullrequestreview-3303282969 From chagedorn at openjdk.org Mon Oct 6 08:08:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 Oct 2025 08:08:53 GMT Subject: RFR: 8368753: IGV: improve CFG view of difference graphs [v2] In-Reply-To: <5-BS7aDey-hd9OBefAYw7dkxM0mgbrcRmOfPObTQ7IE=.40d0ca81-45ae-4d3f-9ebb-ef0965436fd5@github.com> References: <_gL57sBFOa4QlLN9_di6aDM-CwiFPjhR2jgVnUZsVDQ=.7b916923-ad53-49b2-99dc-cb01254e69f7@github.com> <5-BS7aDey-hd9OBefAYw7dkxM0mgbrcRmOfPObTQ7IE=.40d0ca81-45ae-4d3f-9ebb-ef0965436fd5@github.com> Message-ID: On Mon, 29 Sep 2025 08:47:25 GMT, Roberto Casta?eda Lozano wrote: >> This changeset improves the control-flow graph view of difference graphs by: >> >> 1. ensuring that nodes are scheduled locally within each block, and >> 2. hiding internal, artificial blocks containing nodes that remain in the graph even if they are dead, such as the top constant node. >> >> The following screenshot illustrates the effect of scheduling nodes locally: >> >> JDK-8368753 >> >> For example, before this changeset (left) the `Return` node in B9 is scheduled at the beginning of the block. After the changeset (right), this node is scheduled last, as expected. >> >> **Testing:** tier1 and manual testing on a few graphs. > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright header Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27520#pullrequestreview-3303286849 From eirbjo at openjdk.org Mon Oct 6 08:09:51 2025 From: eirbjo at openjdk.org (Eirik =?UTF-8?B?QmrDuHJzbsO4cw==?=) Date: Mon, 6 Oct 2025 08:09:51 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v12] In-Reply-To: References: Message-ID: <9EL8Kg6tW9JVZHrelcP7bLCRHpoEd1l6YH_0eLe8U5Y=.a9db2c93-7a94-46a5-b90e-c104eddf6bc3@github.com> On Sat, 4 Oct 2025 06:01:07 GMT, Jatin Bhateja wrote: >> This patch optimizes PopCount value transforms using KnownBits information. >> Following are the results of the micro-benchmark included with the patch >> >> >> >> System: 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 215460.670 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 294014.826 ops/s >> >> Withopt: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 389978.082 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 417261.583 ops/s >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Is the `core-libs` label appropriate for this PR? Looks hotspot specific? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3370399830 From chagedorn at openjdk.org Mon Oct 6 08:11:46 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 Oct 2025 08:11:46 GMT Subject: RFR: 8368780: IGV: Upgrade to Netbeans Platform 27 In-Reply-To: References: Message-ID: On Tue, 30 Sep 2025 13:57:23 GMT, Ant?n Seoane Ampudia wrote: > This PR upgrades IGV and its dependencies to the newest Netbeans Platform 27, released on August 21, 2025. It also supports running the latest (LTS) JDK 25. > > It has been tested that IGV still behaves as expected after the upgrade. Nice to see IGV supporting the latest LTS, thanks for updating! Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27579#pullrequestreview-3303296238 From hgreule at openjdk.org Mon Oct 6 08:14:50 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 6 Oct 2025 08:14:50 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v12] In-Reply-To: <9EL8Kg6tW9JVZHrelcP7bLCRHpoEd1l6YH_0eLe8U5Y=.a9db2c93-7a94-46a5-b90e-c104eddf6bc3@github.com> References: <9EL8Kg6tW9JVZHrelcP7bLCRHpoEd1l6YH_0eLe8U5Y=.a9db2c93-7a94-46a5-b90e-c104eddf6bc3@github.com> Message-ID: <0n629aakXFsODeKAbtRtvDTbaCHEt18Mc9LFlAb-G2o=.8be1175e-cebc-4395-b44e-973df41507cf@github.com> On Mon, 6 Oct 2025 08:07:14 GMT, Eirik Bj?rsn?s wrote: > Is the `core-libs` label appropriate for this PR? Looks hotspot specific? That label was added automatically, closely after https://mail.openjdk.org/pipermail/jdk-dev/2025-September/010486.html. Not sure why, but the change is definitely hotspot specific. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3370417585 From rcastanedalo at openjdk.org Mon Oct 6 08:16:59 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 6 Oct 2025 08:16:59 GMT Subject: RFR: 8368753: IGV: improve CFG view of difference graphs [v2] In-Reply-To: <5-BS7aDey-hd9OBefAYw7dkxM0mgbrcRmOfPObTQ7IE=.40d0ca81-45ae-4d3f-9ebb-ef0965436fd5@github.com> References: <_gL57sBFOa4QlLN9_di6aDM-CwiFPjhR2jgVnUZsVDQ=.7b916923-ad53-49b2-99dc-cb01254e69f7@github.com> <5-BS7aDey-hd9OBefAYw7dkxM0mgbrcRmOfPObTQ7IE=.40d0ca81-45ae-4d3f-9ebb-ef0965436fd5@github.com> Message-ID: On Mon, 29 Sep 2025 08:47:25 GMT, Roberto Casta?eda Lozano wrote: >> This changeset improves the control-flow graph view of difference graphs by: >> >> 1. ensuring that nodes are scheduled locally within each block, and >> 2. hiding internal, artificial blocks containing nodes that remain in the graph even if they are dead, such as the top constant node. >> >> The following screenshot illustrates the effect of scheduling nodes locally: >> >> JDK-8368753 >> >> For example, before this changeset (left) the `Return` node in B9 is scheduled at the beginning of the block. After the changeset (right), this node is scheduled last, as expected. >> >> **Testing:** tier1 and manual testing on a few graphs. > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright header Thank you Manuel, Christian and Damon for reviewing! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27520#issuecomment-3370419415 From rcastanedalo at openjdk.org Mon Oct 6 08:17:00 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 6 Oct 2025 08:17:00 GMT Subject: Integrated: 8368753: IGV: improve CFG view of difference graphs In-Reply-To: <_gL57sBFOa4QlLN9_di6aDM-CwiFPjhR2jgVnUZsVDQ=.7b916923-ad53-49b2-99dc-cb01254e69f7@github.com> References: <_gL57sBFOa4QlLN9_di6aDM-CwiFPjhR2jgVnUZsVDQ=.7b916923-ad53-49b2-99dc-cb01254e69f7@github.com> Message-ID: On Fri, 26 Sep 2025 09:48:57 GMT, Roberto Casta?eda Lozano wrote: > This changeset improves the control-flow graph view of difference graphs by: > > 1. ensuring that nodes are scheduled locally within each block, and > 2. hiding internal, artificial blocks containing nodes that remain in the graph even if they are dead, such as the top constant node. > > The following screenshot illustrates the effect of scheduling nodes locally: > > JDK-8368753 > > For example, before this changeset (left) the `Return` node in B9 is scheduled at the beginning of the block. After the changeset (right), this node is scheduled last, as expected. > > **Testing:** tier1 and manual testing on a few graphs. This pull request has now been integrated. Changeset: 59e87437 Author: Roberto Casta?eda Lozano URL: https://git.openjdk.org/jdk/commit/59e87437b4f9259121710dca5e595ca714c3e71b Stats: 65 lines in 4 files changed: 44 ins; 11 del; 10 mod 8368753: IGV: improve CFG view of difference graphs Reviewed-by: chagedorn, mhaessig, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/27520 From eirbjo at openjdk.org Mon Oct 6 08:19:50 2025 From: eirbjo at openjdk.org (Eirik =?UTF-8?B?QmrDuHJzbsO4cw==?=) Date: Mon, 6 Oct 2025 08:19:50 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v12] In-Reply-To: <0n629aakXFsODeKAbtRtvDTbaCHEt18Mc9LFlAb-G2o=.8be1175e-cebc-4395-b44e-973df41507cf@github.com> References: <9EL8Kg6tW9JVZHrelcP7bLCRHpoEd1l6YH_0eLe8U5Y=.a9db2c93-7a94-46a5-b90e-c104eddf6bc3@github.com> <0n629aakXFsODeKAbtRtvDTbaCHEt18Mc9LFlAb-G2o=.8be1175e-cebc-4395-b44e-973df41507cf@github.com> Message-ID: On Mon, 6 Oct 2025 08:12:32 GMT, Hannes Greule wrote: > That label was added automatically, closely after https://mail.openjdk.org/pipermail/jdk-dev/2025-September/010486.html. Not sure why, but the change is definitely hotspot specific. Right, makes sense. I'll go ahead and remove that label. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3370433481 From jbhateja at openjdk.org Mon Oct 6 08:25:51 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 6 Oct 2025 08:25:51 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v12] In-Reply-To: References: Message-ID: On Sat, 4 Oct 2025 06:01:07 GMT, Jatin Bhateja wrote: >> This patch optimizes PopCount value transforms using KnownBits information. 
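As an editorial aside on what known-bits information buys for popcount (expressed here in plain Java terms; the PR itself reasons about C2's TypeInt/KnownBits, not about these library calls): bits known to be zero bound the population count from above, and bits known to be one bound it from below.

```java
public class PopCountBoundsSketch {
    public static void main(String[] args) {
        int x = new java.util.Random().nextInt();

        // Only the low 8 bits can possibly be set, so the result lies in [0, 8].
        int atMostEight = Integer.bitCount(x & 0xFF);

        // Bits 0 and 4 are forced to one, so the result is at least 2.
        int atLeastTwo = Integer.bitCount(x | 0b1_0001);

        System.out.println(atMostEight + " " + atLeastTwo);
    }
}
```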
>> Following are the results of the micro-benchmark included with the patch >> >> >> >> System: 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 215460.670 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 294014.826 ops/s >> >> Withopt: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 389978.082 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 417261.583 ops/s >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Hi @merykitty , Can you kindly be the second reviewer? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3370453037 From shade at openjdk.org Mon Oct 6 08:37:59 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 6 Oct 2025 08:37:59 GMT Subject: RFR: 8338197: [ubsan] ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v6] In-Reply-To: References: Message-ID: <8t28FlJ7Q6K_yFxEMR6KjtCq8M0DULz3vh8YXneoxfM=.5c919678-dca8-4c01-8398-f47c9e8f5f59@github.com> On Tue, 30 Sep 2025 19:26:43 GMT, Boris Ulasevich wrote: >> This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal. >> >> The issue reproduces with a HeapByteBufferTest jtreg test on a UBSan-enabled build. Actually the trigger is `XX:+OptoScheduling` option used by test (by default OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run. >> >> This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments. >> >> The problems is that shift count `n` may be too large here: >> >> class Pipeline_Use_Cycle_Mask { >> protected: >> uint _mask; >> .. >> Pipeline_Use_Cycle_Mask& operator<<=(int n) { >> _mask <<= n; >> return *this; >> } >> }; >> >> The recent change attempted to cap the shift amount at one call site: >> >> class Pipeline_Use_Element { >> protected: >> .. >> // Mask of specific used cycles >> Pipeline_Use_Cycle_Mask _mask; >> .. >> void step(uint cycles) { >> _used = 0; >> uint max_shift = 8 * sizeof(_mask) - 1; >> _mask <<= (cycles < max_shift) ? 
cycles : max_shift; >> } >> } >> >> However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count: >> >> // The following two routines assume that the root Pipeline_Use entity >> // consists of exactly 1 element for each functional unit >> // start is relative to the current cycle; used for latency-based info >> uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const { >> for (uint i = 0; i < pred._count; i++) { >> const Pipeline_Use_Element *predUse = pred.element(i); >> if (predUse->_multiple) { >> uint min_delay = 7; >> // Multiple possible functional units, choose first unused one >> for (uint j = predUse->_lb; j <= predUse->_ub; j++) { >> const Pipeline_Use_Element *currUse = element(j); >> uint curr_delay = delay; >> if (predUse->_used & currUse->_used) { >> Pipeline_Use_Cycle_Mask x = predUse->_mask; >> Pipeline_Use_Cycle_Mask y = currUse->_mask; >> >> for ( y <<= curr_delay; x.overlaps(y); curr_delay++ ) >> y <<= 1; >> } >> if (min_delay > curr_delay) >> min_delay = curr_delay; >> } >> if (delay < min_delay) >> delay = min_delay; >> } >> else { >> for (uint j = predUse->_lb; j <= pre... > > Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. The pull request now contains three commits: > > - use uint32_t for _mask > - remove redundant code > - 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' I have a post-integration question: src/hotspot/share/adlc/output_h.cpp line 768: > 766: fprintf(fp_hpp, " }\n\n"); > 767: fprintf(fp_hpp, " Pipeline_Use_Cycle_Mask& operator<<=(int n) {\n"); > 768: fprintf(fp_hpp, " _mask <<= (n < 32) ? n : 31;\n"); I was staring at this line for a while. Isn't this cutting too early? I would have expected `n=32` case to zero out the mask completely. Instead, this code moves lowest bit to highest bit, as it performs 31-bit shift. Should it be something like `_mask = (n < 32) ? (_mask << n) : 0;`? ------------- PR Review: https://git.openjdk.org/jdk/pull/26890#pullrequestreview-3303382292 PR Review Comment: https://git.openjdk.org/jdk/pull/26890#discussion_r2405337941 From syan at openjdk.org Mon Oct 6 09:29:57 2025 From: syan at openjdk.org (SendaoYan) Date: Mon, 6 Oct 2025 09:29:57 GMT Subject: RFR: 8367899: compiler/c2/gvn/TestBitCompressValueTransform.java intermittent timed out [v2] In-Reply-To: References: <_zhjDERsbq1mfrqW8IBGsEB3WrMmuVVykDAijO-yOUU=.39b2c9a5-e784-419d-bc07-504750f48eaf@github.com> Message-ID: On Mon, 6 Oct 2025 08:04:42 GMT, Damon Fenacci wrote: >> SendaoYan has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove the loop which do not use the loop variable i or do not use random number > > Thanks for the cleanup @sendaoYan. 
LGTM Thanks for the reviews and suggestions at dafedafe @chhagedorn ------------- PR Comment: https://git.openjdk.org/jdk/pull/27548#issuecomment-3370685085 From syan at openjdk.org Mon Oct 6 09:29:58 2025 From: syan at openjdk.org (SendaoYan) Date: Mon, 6 Oct 2025 09:29:58 GMT Subject: Integrated: 8367899: compiler/c2/gvn/TestBitCompressValueTransform.java intermittent timed out In-Reply-To: <_zhjDERsbq1mfrqW8IBGsEB3WrMmuVVykDAijO-yOUU=.39b2c9a5-e784-419d-bc07-504750f48eaf@github.com> References: <_zhjDERsbq1mfrqW8IBGsEB3WrMmuVVykDAijO-yOUU=.39b2c9a5-e784-419d-bc07-504750f48eaf@github.com> Message-ID: On Mon, 29 Sep 2025 13:52:59 GMT, SendaoYan wrote: > Hi all, > > After JDK-8260555 change the timeout factor from 4 to 1, make test compiler/c2/gvn/TestBitCompressValueTransform.java intermittent timed out. > > There are 20 10k/20k loop count of loops in this test, this cause test need many CPU cycles to finish. If I reduce the loop count from 10k/20k to 100/200, the test failures descripted in [JDK-8350896](https://bugs.openjdk.org/browse/JDK-8350896) also reproduced when the tested jdk is jdk25u. So it seems that there no need so many loop count for these tests. > > Without the proposed change, the driver action finish about 65 sencods on linux-x64, with the proposed change the driver action finish about 1.5 seconds. This pull request has now been integrated. Changeset: 2c114d67 Author: SendaoYan URL: https://git.openjdk.org/jdk/commit/2c114d676d9904094dd6058d15f06d801ec7a3d6 Stats: 29 lines in 1 file changed: 0 ins; 9 del; 20 mod 8367899: compiler/c2/gvn/TestBitCompressValueTransform.java intermittent timed out Reviewed-by: dfenacci, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/27548 From mdoerr at openjdk.org Mon Oct 6 09:33:47 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 6 Oct 2025 09:33:47 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods In-Reply-To: References: Message-ID: On Fri, 26 Sep 2025 16:12:14 GMT, Martin Doerr wrote: > We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it. > > We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses. 
> > I've tested this proposal by the following code on x86_64: > > diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp > index a6b4efbe4f2..d715e69c850 100644 > --- a/src/hotspot/cpu/x86/interp_masm_x86.cpp > +++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp > @@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() { > void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) { > prepare_to_jump_from_interpreted(); > > + if (UseNewCode) { > + Label ok; > + movptr(temp, Address(method, Method::from_interpreted_offset())); > + cmpptr(temp, Address(method, Method::interpreter_entry_offset())); > + je(ok); > + movptr(rax, Address(method, Method::from_compiled_offset())); > + movptr(rbx, rax); > + addptr(rbx, 128); > + hlt(); > + bind(ok); > + } > + > if (JvmtiExport::can_post_interpreter_events()) { > Label run_compiled_code; > // JVMTI events, such as single-stepping, are implemented partly by avoiding running > > > The output is (requires hsdis library, otherwise we get a hex dump instead of disassembly): > > RAX=0x00007f3b19000100 is at entry_point+0 in (nmethod*)0x00007f3b19000008 > Compiled method (c1) 2915 1 3 java.lang.Byte::toUnsignedInt (6 bytes) > total in heap [0x00007f3b19000008,0x00007f3b190001f8] = 496 > main code [0x00007f3b19000100,0x00007f3b190001b8] = 184 > stub code [0x00007f3b190001b8,0x00007f3b190001f8] = 64 > mutable data [0x00007f3ab401e0b0,0x00007f3ab401e0e0] = 48 > relocation [0x00007f3ab401e0b0,0x00007f3ab401e0d8] = 40 > metadata [0x00007f3ab401e0d8,0x00007f3ab401e0e0] = 8 > immutable data [0x00007f3ab401dcd0,0x00007f3ab401dd30] = 96 > dependencies [0x00007f3ab401dcd0,0x00007f3ab401dcd8] = 8 > scopes pcs [0x00007f3ab401dcd8,0x00007f3ab401dd18] = 64 > ... Thanks for looking at this PR! Hotspot currently dumps code (hex or disassembled) when the nmethod is on stack of the crashing thread. That is completely missing when it's not on stack. Should we print both, hex dump and disassembly? Interesting. I haven't tried with ZGC. Did you find more relocations which don't point to an instruction start? We could ignore relocations with format `ZBarrierRelocationFormatStoreGoodAfterMov` on x86. Or use `patch_barrier_relocation_offset` to find the correct start in this case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27530#issuecomment-3370703736 From syan at openjdk.org Mon Oct 6 09:41:57 2025 From: syan at openjdk.org (SendaoYan) Date: Mon, 6 Oct 2025 09:41:57 GMT Subject: RFR: 8367899: compiler/c2/gvn/TestBitCompressValueTransform.java intermittent timed out [v2] In-Reply-To: References: <_zhjDERsbq1mfrqW8IBGsEB3WrMmuVVykDAijO-yOUU=.39b2c9a5-e784-419d-bc07-504750f48eaf@github.com> Message-ID: On Mon, 6 Oct 2025 07:13:18 GMT, Christian Hagedorn wrote: >> SendaoYan has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove the loop which do not use the loop variable i or do not use random number > > test/hotspot/jtreg/compiler/c2/gvn/TestBitCompressValueTransform.java line 78: > >> 76: >> 77: @Run(test = "test1") >> 78: public void run1(RunInfo info) { > > While at it, you can also remove all the unused `RunInfo` parameters. But up to you if you want to squeeze that in here as well - it's only a clean-up that is not that important. Sorry for missed this comment...... 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27548#discussion_r2405508641 From chagedorn at openjdk.org Mon Oct 6 10:29:56 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 Oct 2025 10:29:56 GMT Subject: RFR: 8367899: compiler/c2/gvn/TestBitCompressValueTransform.java intermittent timed out [v2] In-Reply-To: References: <_zhjDERsbq1mfrqW8IBGsEB3WrMmuVVykDAijO-yOUU=.39b2c9a5-e784-419d-bc07-504750f48eaf@github.com> Message-ID: On Mon, 6 Oct 2025 09:39:35 GMT, SendaoYan wrote: >> test/hotspot/jtreg/compiler/c2/gvn/TestBitCompressValueTransform.java line 78: >> >>> 76: >>> 77: @Run(test = "test1") >>> 78: public void run1(RunInfo info) { >> >> While at it, you can also remove all the unused `RunInfo` parameters. But up to you if you want to squeeze that in here as well - it's only a clean-up that is not that important. > > Sorry for missed this comment...... No worries! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27548#discussion_r2405631770 From roland at openjdk.org Mon Oct 6 11:39:58 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 6 Oct 2025 11:39:58 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v20] In-Reply-To: References: Message-ID: On Thu, 2 Oct 2025 22:28:29 GMT, Kangcheng Xu wrote: >> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. >> >> When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) >> >> The following was implemented to address this issue. >> >> if (UseNewCode2) { >> *multiplier = bt == T_INT >> ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows >> : ((jlong) 1) << con->get_int(); >> } else { >> *multiplier = ((jlong) 1 << con->get_int()); >> } >> >> >> Two new bitshift overflow tests were added. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > refactor Multiplication into a class Changes requested by roland (Reviewer). src/hotspot/share/opto/addnode.cpp line 447: > 445: } > 446: > 447: Node* con = (bt == T_INT) With an `if`, the `static_cast` are not needed, right? I would do that then. It would more readable. src/hotspot/share/opto/addnode.cpp line 494: > 492: AddNode::Multiplication AddNode::Multiplication::find_simple_addition_pattern(const Node* n, BasicType bt) { > 493: if (n->Opcode() == Op_Add(bt) && n->in(1) == n->in(2)) { > 494: return {n->in(1), 2}; You should use a constructor call here. src/hotspot/share/opto/addnode.cpp line 509: > 507: Node* con = n->in(2); > 508: if (!con->is_top()) { > 509: return {n->in(1), java_shift_left(1, con->get_int(), bt)}; You should use a constructor call here. src/hotspot/share/opto/addnode.cpp line 524: > 522: if (n->Opcode() == Op_Mul(bt) && (n->in(1)->is_Con() || n->in(2)->is_Con())) { > 523: // Pattern (1) > 524: Node* con = n->in(1); Isn't con always input 2 because `MulNode::Ideal` canonicalize it? 
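For orientation, here is a hedged Java-level sketch of the source shapes that the addnode.cpp patterns quoted in this review are matching (the method names are invented and the snippet is not taken from the patch or its tests):

```java
public class AddSeriesShapes {
    // Repeated operand: the "in(1) == in(2)" case quoted above,
    // i.e. x + x is recorded as a multiplication by 2.
    static int twoTimes(int x) {
        return x + x;
    }

    // Shift-and-add: (x << 3) contributes a multiplier of 8, and adding x
    // once more yields 9 * x (the "multiplier + 1" cases in this review).
    static int nineTimes(int x) {
        return (x << 3) + x;
    }

    // Corner case from the PR description: a shift amount of exactly 32.
    // For int, 1 << 32 wraps to 1, while (int) (1L << 32) is 0, so the
    // width in which the multiplier is computed matters.
    static int shiftByThirtyTwo(int x) {
        return (x << 32) + x; // Java masks the shift count, so this is x + x
    }
}
```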
src/hotspot/share/opto/addnode.cpp line 534: > 532: > 533: if (!con->is_top()) { > 534: return {base, con->get_integer_as_long(bt)}; You should use a constructor call here. src/hotspot/share/opto/addnode.cpp line 564: > 562: // Pattern (2) > 563: if (lhs.is_valid_with(n->in(2))) { > 564: return {lhs.variable(), java_add(lhs.multiplier(), static_cast(1))}; You should use a constructor call here. src/hotspot/share/opto/addnode.cpp line 569: > 567: // Pattern (3) > 568: if (rhs.is_valid_with(n->in(1))) { > 569: return {rhs.variable(), java_add(rhs.multiplier(), static_cast(1))}; You should use a constructor call here. src/hotspot/share/opto/addnode.hpp line 74: > 72: Multiplication add(const Multiplication rhs) const { > 73: if (is_valid_with(rhs.variable()) && rhs.is_valid_with(variable())) { > 74: return {variable(), java_add(multiplier(), rhs.multiplier())}; You should use a constructor call here ------------- PR Review: https://git.openjdk.org/jdk/pull/23506#pullrequestreview-3303979449 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2405860167 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2405815427 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2405816662 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2405847407 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2405819342 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2405784710 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2405785351 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2405781525 From adinn at openjdk.org Mon Oct 6 11:54:58 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 6 Oct 2025 11:54:58 GMT Subject: RFR: 8338197: [ubsan] ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v6] In-Reply-To: <8t28FlJ7Q6K_yFxEMR6KjtCq8M0DULz3vh8YXneoxfM=.5c919678-dca8-4c01-8398-f47c9e8f5f59@github.com> References: <8t28FlJ7Q6K_yFxEMR6KjtCq8M0DULz3vh8YXneoxfM=.5c919678-dca8-4c01-8398-f47c9e8f5f59@github.com> Message-ID: On Mon, 6 Oct 2025 08:34:42 GMT, Aleksey Shipilev wrote: >> Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. The pull request now contains three commits: >> >> - use uint32_t for _mask >> - remove redundant code >> - 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' > > src/hotspot/share/adlc/output_h.cpp line 768: > >> 766: fprintf(fp_hpp, " }\n\n"); >> 767: fprintf(fp_hpp, " Pipeline_Use_Cycle_Mask& operator<<=(int n) {\n"); >> 768: fprintf(fp_hpp, " _mask <<= (n < 32) ? n : 31;\n"); > > I was staring at this line for a while. Isn't this cutting too early? I would have expected `n=32` case to zero out the mask completely. Instead, this code moves lowest bit to highest bit, as it performs 31-bit shift. Should it be something like `_mask = (n < 32) ? (_mask << n) : 0;`? This is arguably correct. However, it doesn't really matter much whether we cap the depth at(n) at 32 or 31 because for all current pipeline models the pipeline depth is always way less than 31. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26890#discussion_r2405908503 From chagedorn at openjdk.org Mon Oct 6 13:10:47 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 Oct 2025 13:10:47 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v2] In-Reply-To: References: Message-ID: On Wed, 1 Oct 2025 14:44:31 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. >> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Add test Thanks for the summary and listing the options! I had a feeling that this fix is not enough and we can also trigger it without Loop Peeling. I had a go and indeed found a case without stress peeling: [Test.java](https://github.com/user-attachments/files/22722318/Test.java) This, unfortunately, makes this fix not sufficient and we need to find a better solution. But what is there left to do? - As you pointed out, running IGVN just for this one assert seems overkill. 
Also applying some local pre-IGVN phi clean-up hacks seems like duplicated effort and possibly error-prone. - Removing the asserts since we seem to do the right thing anyway: Doable but then we have no protection anymore when we later introduce a bug (or already have an existing lurking bug) where we mess this property up. We hit this assert in the past due to bugs ([link](https://bugs.openjdk.org/issues/?jql=text%20~%20%22%5C%22opaq-%3Eoutcnt()%20%3D%3D%201%5C%22%22)). So, I would be less inclined to remove the asserts. - You could have a go at somehow proving that the equality is just hidden by some useless phis but, as you mentioned already, it might not be so straightforward and could be difficult to get right. - Could we somehow bail out of loop unrolling or just not apply it at all when we have this inequality and wait until after IGVN? We would probably then still need some mechanism to re-check that after IGVN, we now have the same node and not just endlessly bail out again without noticing a real problem. - ... ------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3371554178 From shade at openjdk.org Mon Oct 6 13:20:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 6 Oct 2025 13:20:55 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods In-Reply-To: References: Message-ID: <8UzfbVuWg89u7-Ow-2ZJcSj4wIQGpvbqiz8MtqEPbu8=.2c0a6c7a-0286-4dae-b2e9-392720e5638c@github.com> On Mon, 6 Oct 2025 09:30:40 GMT, Martin Doerr wrote: > Hotspot currently dumps code (hex or disassembled) when the nmethod is on stack of the crashing thread. That is completely missing when it's not on stack. [...] Should we print both, hex dump and disassembly? Yes, I think if we know the location is within an nmethod, it makes sense to dump around the location. I think hex dump is most bullet-proof, as we can always disassemble it offline at different offsets. I don't think we want to specialize for reloc types, it does not gain us much? Also, relocs solve the variable-sized encoding only if you are lucky to hit the reloc right at the location you are decoding, right? Anything in between relocs is still pretty foggy. I suspect the current patch would work in 99% of the cases, as it is hard to imagine e.g. the value in the register that points into an nmethod and _does not_ have some sort of reloc. Then I also suspect that disassemblers are actually able to figure out the instruction boundaries pretty well? Because I don't quite see how our usual printout of `decode(pc - 64, pc + 64)` would otherwise work: `pc-64` starts at an arbitrary boundary. You might want to check if this whole reloc thing is even needed. What happens if we just do `Disassembler::decode(MAX2(nm->entry_point(), addr - 64), MIN2(nm->code_end(), addr + 64))`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27530#issuecomment-3371604164 From hgreule at openjdk.org Mon Oct 6 13:29:53 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 6 Oct 2025 13:29:53 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations In-Reply-To: References: Message-ID: On Fri, 3 Oct 2025 06:07:50 GMT, Quan Anh Mai wrote: > Hi, > > This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantage of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits.
I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. > > Please take a look and leave your reviews, thanks a lot. Nice change overall. I'm not sure how "easily" we can really see the benefit in the example of the interval splitting, but I leave that to others to judge. I was just wondering, do you think it makes sense to move more such code into the RangeInference classes in future (e.g., for shift ops) or how we'll tell what to place where. From what it looks like the main reason currently is to use the TypeIntMirror classes for testability, which other node types definitely could benefit from as well. ------------- PR Review: https://git.openjdk.org/jdk/pull/27618#pullrequestreview-3304640558 From kxu at openjdk.org Mon Oct 6 14:17:10 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 6 Oct 2025 14:17:10 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v21] In-Reply-To: References: Message-ID: > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. > > When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added. Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: Use constructor, improve readability ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23506/files - new: https://git.openjdk.org/jdk/pull/23506/files/57c19bc1..199f2735 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=19-20 Stats: 26 lines in 1 file changed: 4 ins; 7 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From kxu at openjdk.org Mon Oct 6 14:17:13 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 6 Oct 2025 14:17:13 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v20] In-Reply-To: References: Message-ID: On Mon, 6 Oct 2025 11:34:57 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor Multiplication into a class > > src/hotspot/share/opto/addnode.cpp line 447: > >> 445: } >> 446: >> 447: Node* con = (bt == T_INT) > > With an `if`, the `static_cast` are not needed, right? I would do that then. It would more readable. Good point. Used `if` instead. Thanks! 
> src/hotspot/share/opto/addnode.cpp line 524: > >> 522: if (n->Opcode() == Op_Mul(bt) && (n->in(1)->is_Con() || n->in(2)->is_Con())) { >> 523: // Pattern (1) >> 524: Node* con = n->in(1); > > Isn't con always input 2 because `MulNode::Ideal` canonicalize it? I think you're right. An assertion is also added to verify this. > src/hotspot/share/opto/addnode.hpp line 74: > >> 72: Multiplication add(const Multiplication rhs) const { >> 73: if (is_valid_with(rhs.variable()) && rhs.is_valid_with(variable())) { >> 74: return {variable(), java_add(multiplier(), rhs.multiplier())}; > > You should use a constructor call here clang-tidy suggested to > Avoid repeating the return type from the declaration; use a braced initializer list instead I believe they generate the same code for this case, but I updated to explicitly use constructor making it more clear. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2406573167 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2406573652 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2406574112 From roland at openjdk.org Mon Oct 6 14:27:31 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 6 Oct 2025 14:27:31 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v21] In-Reply-To: References: Message-ID: On Mon, 6 Oct 2025 14:17:10 GMT, Kangcheng Xu wrote: >> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. >> >> When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) >> >> The following was implemented to address this issue. >> >> if (UseNewCode2) { >> *multiplier = bt == T_INT >> ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows >> : ((jlong) 1) << con->get_int(); >> } else { >> *multiplier = ((jlong) 1 << con->get_int()); >> } >> >> >> Two new bitshift overflow tests were added. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > Use constructor, improve readability src/hotspot/share/opto/addnode.cpp line 451: > 449: con = phase->intcon(java_add(static_cast(mul.multiplier()), 1)); > 450: } else { > 451: con = phase->longcon(java_add(mul.multiplier(), static_cast(1))); Instead of casting to `jlong`, you can use `CONST64(1)`. src/hotspot/share/opto/addnode.cpp line 561: > 559: // Pattern (2) > 560: if (lhs.is_valid_with(n->in(2))) { > 561: return Multiplication(lhs.variable(), java_add(lhs.multiplier(), static_cast(1))); Same here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2406630926 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2406642003 From mchevalier at openjdk.org Mon Oct 6 14:35:42 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 6 Oct 2025 14:35:42 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v2] In-Reply-To: References: Message-ID: <9-1Rk4Mam4szX3LdaBZauHGfvpKGdNP6TgVdgTFkjxs=.22d95e51-d87f-4f54-abc9-eceec9d45c43@github.com> On Mon, 6 Oct 2025 13:07:49 GMT, Christian Hagedorn wrote: > We hit this assert in the past due to bugs ([link](https://bugs.openjdk.org/issues/?jql=text%20~%20%22%5C%22opaq-%3Eoutcnt()%20%3D%3D%201%5C%22%22)). I might misunderstand, but some of these issues (for instance [JDK-8298353](https://bugs.openjdk.org/browse/JDK-8298353) and backport, duplicate...) seems purely to be hitting the assert, and I don't always see another problem. I wonder if it has been years we are trying to make the assert truer while it doesn't really need to hold (and indeed, it has no reason to hold in general). Actually, the part `opaq->outcnt() == 1` is fine in this PR, but the other half `opaq->in(1) == limit` is what makes the assert fail here, and in quite some other issues. This is also the half that is the least useful. The first part of the assert is the one that I see used as an argument that what we do is reasonable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3372008684 From qamai at openjdk.org Mon Oct 6 14:42:51 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 6 Oct 2025 14:42:51 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations In-Reply-To: References: Message-ID: On Mon, 6 Oct 2025 13:27:24 GMT, Hannes Greule wrote: > I'm not sure how "easily" we can really see the benefit in the example of the interval splitting, but I leave that to others to judge. Without it, the simple inference function fails `AndLNodeIdealizationTest` because the current version also splits the analysis between the negative part and the non-negative part. > I was just wondering, do you think it makes sense to move more such code into the RangeInference classes in future (e.g., for shift ops) or how we'll tell what to place where. From what it looks like the main reason currently is to use the TypeIntMirror classes for testability, which other node types definitely could benefit from as well. Yes that is entirely my intention, that for example, we only need to implement `RangeInference::infer_left_shift` and the unittest can be a simple: class OpLeftShift; class InferLeftShift: TEST(opto, range_inference) { test_binary(); } ------------- PR Comment: https://git.openjdk.org/jdk/pull/27618#issuecomment-3372049054 From kxu at openjdk.org Mon Oct 6 15:14:31 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 6 Oct 2025 15:14:31 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v22] In-Reply-To: References: Message-ID: <4ws9_5sMsWEVwJUQnjJDUXUAZ6ek-IS0jVpmveOcM9g=.bfd3a412-563d-4e2c-a9eb-4f56271e56a4@github.com> > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. 
> > When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added. Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: Use CONST64() macro ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23506/files - new: https://git.openjdk.org/jdk/pull/23506/files/199f2735..893ffa7a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=20-21 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From liach at openjdk.org Mon Oct 6 15:23:19 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 6 Oct 2025 15:23:19 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v12] In-Reply-To: References: Message-ID: On Sat, 4 Oct 2025 06:01:07 GMT, Jatin Bhateja wrote: >> This patch optimizes PopCount value transforms using KnownBits information. >> Following are the results of the micro-benchmark included with the patch >> >> >> >> System: 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 215460.670 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 294014.826 ops/s >> >> Withopt: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 389978.082 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 417261.583 ops/s >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution FYI this patch was marked core-libs because of the benchmark addition in java/lang. I wonder if it belongs to vm/compiler instead. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3372267688 From mdoerr at openjdk.org Mon Oct 6 15:34:48 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 6 Oct 2025 15:34:48 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v48] In-Reply-To: References: Message-ID: <89qm91MCFrkq2W4lRH6ZrnPiTZbuuapiZN70M3E9Hy4=.ba976fea-f0cf-4e74-9f76-655afb6f3d5d@github.com> On Fri, 3 Oct 2025 19:52:49 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. 
The garbage collector handles final cleanup and deallocation.
>>
>> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality
>>
>> Additional Testing:
>> - [x] Linux x64 fastdebug tier 1/2/3/4
>> - [x] Linux aarch64 fastdebug tier 1/2/3/4
>
> Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 114 commits:
>
> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final
> - Fix race when not installed nmethod is deoptimized
> - Fix NMethodRelocationTest.java logging race
> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final
> - Refactor JVMTI test
> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final
> - Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer
> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final
> - Lock nmethod::relocate behind experimental flag
> - Use CompiledICLocker instead of CompiledIC_lock
> - ... and 104 more: https://git.openjdk.org/jdk/compare/012e079d...104661c6

We also see assertions on PPC64 in the new test `DeoptimizeRelocatedNMethod`:

# Internal Error (jdk/src/hotspot/cpu/ppc/nativeInst_ppc.cpp:405)
# assert(!decode(i1, i2)) failed: already patched

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x1701784] NativePostCallNop::patch(int, int)+0xf4 (nativeInst_ppc.cpp:405)
V [libjvm.so+0x1718414] nmethod::finalize_relocations()+0x6f4 (nmethod.cpp:2059)
V [libjvm.so+0x171891c] nmethod::post_init()+0x5c (nmethod.cpp:1252)
V [libjvm.so+0x171a8dc] nmethod::relocate(CodeBlobType)+0x1ec (nmethod.cpp:1515)
V [libjvm.so+0x200b598] WB_RelocateNMethodFromMethod+0x388 (whitebox.cpp:1653)
j jdk.test.whitebox.WhiteBox.relocateNMethodFromMethod0(Ljava/lang/reflect/Executable;I)V+0
j jdk.test.whitebox.WhiteBox.relocateNMethodFromMethod(Ljava/lang/reflect/Executable;I)V+8
j compiler.whitebox.DeoptimizeRelocatedNMethod.main([Ljava/lang/String;)V+50

@reinrich: I assume this assertion is no longer valid.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3372321313 From rrich at openjdk.org Mon Oct 6 15:48:03 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 6 Oct 2025 15:48:03 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v48] In-Reply-To: <89qm91MCFrkq2W4lRH6ZrnPiTZbuuapiZN70M3E9Hy4=.ba976fea-f0cf-4e74-9f76-655afb6f3d5d@github.com> References: <89qm91MCFrkq2W4lRH6ZrnPiTZbuuapiZN70M3E9Hy4=.ba976fea-f0cf-4e74-9f76-655afb6f3d5d@github.com> Message-ID: On Mon, 6 Oct 2025 15:31:02 GMT, Martin Doerr wrote: > We also see assertions on PPC64 in the new test `DeoptimizeRelocatedNMethod`: > > ``` > # Internal Error (jdk/src/hotspot/cpu/ppc/nativeInst_ppc.cpp:405) > # assert(!decode(i1, i2)) failed: already patched > > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x1701784] NativePostCallNop::patch(int, int)+0xf4 (nativeInst_ppc.cpp:405) > V [libjvm.so+0x1718414] nmethod::finalize_relocations()+0x6f4 (nmethod.cpp:2059) > V [libjvm.so+0x171891c] nmethod::post_init()+0x5c (nmethod.cpp:1252) > V [libjvm.so+0x171a8dc] nmethod::relocate(CodeBlobType)+0x1ec (nmethod.cpp:1515) > V [libjvm.so+0x200b598] WB_RelocateNMethodFromMethod+0x388 (whitebox.cpp:1653) > j jdk.test.whitebox.WhiteBox.relocateNMethodFromMethod0(Ljava/lang/reflect/Executable;I)V+0 > j jdk.test.whitebox.WhiteBox.relocateNMethodFromMethod(Ljava/lang/reflect/Executable;I)V+8 > j compiler.whitebox.DeoptimizeRelocatedNMethod.main([Ljava/lang/String;)V+50 > ``` > > @reinrich: I assume this assertion is no longer valid. Yeah, I reckon it needs to be adapted/removed. Would be nice, though, if we could keep it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3372392382 From bulasevich at openjdk.org Mon Oct 6 15:49:55 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 6 Oct 2025 15:49:55 GMT Subject: RFR: 8338197: [ubsan] ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' [v6] In-Reply-To: References: <8t28FlJ7Q6K_yFxEMR6KjtCq8M0DULz3vh8YXneoxfM=.5c919678-dca8-4c01-8398-f47c9e8f5f59@github.com> Message-ID: On Mon, 6 Oct 2025 11:51:44 GMT, Andrew Dinn wrote: >> src/hotspot/share/adlc/output_h.cpp line 768: >> >>> 766: fprintf(fp_hpp, " }\n\n"); >>> 767: fprintf(fp_hpp, " Pipeline_Use_Cycle_Mask& operator<<=(int n) {\n"); >>> 768: fprintf(fp_hpp, " _mask <<= (n < 32) ? n : 31;\n"); >> >> I was staring at this line for a while. Isn't this cutting too early? I would have expected `n=32` case to zero out the mask completely. Instead, this code moves lowest bit to highest bit, as it performs 31-bit shift. Should it be something like `_mask = (n < 32) ? (_mask << n) : 0;`? > > This is arguably correct. However, it doesn't really matter much whether we cap the depth at(n) at 32 or 31 because for all current pipeline models the pipeline depth is always way less than 31. Yes, you?re right. I mechanically limited the shift to the maximum allowed value according to the static analyzer message, which means the top bit can survive an excessive shift. Semantically, however, the shift represents elapsed pipeline cycles. If the shift is large, even beyond the mask width, it means all cycles have passed, so all bits should be shifted out rather than stopping at the last valid step. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26890#discussion_r2407121670 From qamai at openjdk.org Mon Oct 6 15:56:55 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 6 Oct 2025 15:56:55 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v12] In-Reply-To: References: Message-ID: On Sat, 4 Oct 2025 06:01:07 GMT, Jatin Bhateja wrote: >> This patch optimizes PopCount value transforms using KnownBits information. >> Following are the results of the micro-benchmark included with the patch >> >> >> >> System: 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 215460.670 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 294014.826 ops/s >> >> Withopt: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 389978.082 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 417261.583 ops/s >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution LGTM otherwise src/hotspot/share/opto/countbitsnode.cpp line 144: > 142: return Type::TOP; > 143: } > 144: const TypeInt* tint = t->isa_int(); This should be `is_int`. `isa_int` is fine, but in cases of unexpected types, it will SIGSEGV instead of throwing an assertion, which is more difficult to debug. ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/27075#pullrequestreview-3305634424 PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2407155945 From mdoerr at openjdk.org Mon Oct 6 21:03:45 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 6 Oct 2025 21:03:45 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods In-Reply-To: <8UzfbVuWg89u7-Ow-2ZJcSj4wIQGpvbqiz8MtqEPbu8=.2c0a6c7a-0286-4dae-b2e9-392720e5638c@github.com> References: <8UzfbVuWg89u7-Ow-2ZJcSj4wIQGpvbqiz8MtqEPbu8=.2c0a6c7a-0286-4dae-b2e9-392720e5638c@github.com> Message-ID: On Mon, 6 Oct 2025 13:18:01 GMT, Aleksey Shipilev wrote: > I think hex dump is most bullet-proof, as we can always disassemble offline it at different offsets. Right. The disassembler produces garbage if it starts disassembling somewhere besides the correct instruction start (on x86). If that happens, we can play with the offset in the hex dump until the sequence looks feasible. So, I think printing some hex dump around the address is always a good thing. > I don't think we want to specialize for reloc types, it does not gain us much? Also, relocs solve the variable-sized encoding only if you are lucky to hit the reloc right at the location you are decoding, right? Anything in between relocs is still pretty foggy. I suspect current patch would work in 99% of the cases, as it is hard to imagine e.g. the value in the register that points into nmethod and does not have some sort of reloc. Yeah, nmethods don't contain a lot of data which is something else than valid instructions. So, most of the time, disassembly works as proposed here, but we may still produce garbage in rare cases. That's not a big problem if we have the hex dump. > Then I also suspect that disassemblers actually able to figure the instruction boundaries pretty well? 
Because I don't quite see how our usual printout of decode(pc - 64, pc + 64) would otherwise work: pc-64 starts at arbitrary boundary. You might want to check if this whole reloc thing is even needed. What happens if we just do Disassembler::decode(MAX2(nm->entry_point(), addr - 64), MIN2(nm->code_end(), addr + 64))? `decode(pc - 64, pc + 64)` works fine on platforms like aarch64 and PPC64. I hope we don't have such code for x86. I got complete garbage when trying a wrong offset on x86. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27530#issuecomment-3374109251 From mdoerr at openjdk.org Mon Oct 6 22:08:21 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 6 Oct 2025 22:08:21 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods [v2] In-Reply-To: References: Message-ID: > We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it. > > We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses. > > I've tested this proposal by the following code on x86_64: > > diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp > index a6b4efbe4f2..d715e69c850 100644 > --- a/src/hotspot/cpu/x86/interp_masm_x86.cpp > +++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp > @@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() { > void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) { > prepare_to_jump_from_interpreted(); > > + if (UseNewCode) { > + Label ok; > + movptr(temp, Address(method, Method::from_interpreted_offset())); > + cmpptr(temp, Address(method, Method::interpreter_entry_offset())); > + je(ok); > + movptr(rax, Address(method, Method::from_compiled_offset())); > + movptr(rbx, rax); > + addptr(rbx, 128); > + hlt(); > + bind(ok); > + } > + > if (JvmtiExport::can_post_interpreter_events()) { > Label run_compiled_code; > // JVMTI events, such as single-stepping, are implemented partly by avoiding running > > > The output is (requires hsdis library, otherwise we get a hex dump instead of disassembly): > > RAX=0x00007f3b19000100 is at entry_point+0 in (nmethod*)0x00007f3b19000008 > Compiled method (c1) 2915 1 3 java.lang.Byte::toUnsignedInt (6 bytes) > total in heap [0x00007f3b19000008,0x00007f3b190001f8] = 496 > main code [0x00007f3b19000100,0x00007f3b190001b8] = 184 > stub code [0x00007f3b190001b8,0x00007f3b190001f8] = 64 > mutable data [0x00007f3ab401e0b0,0x00007f3ab401e0e0] = 48 > relocation [0x00007f3ab401e0b0,0x00007f3ab401e0d8] = 40 > metadata [0x00007f3ab401e0d8,0x00007f3ab401e0e0] = 8 > immutable data [0x00007f3ab401dcd0,0x00007f3ab401dd30] = 96 > dependencies [0x00007f3ab401dcd0,0x00007f3ab401dcd8] = 8 > scopes pcs [0x00007f3ab401dcd8,0x00007f3ab401dd18] = 64 > ... Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Always print hex dump. Plus disassembly when hsdis loaded. 
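As a rough sketch of what that commit describes (editor's illustration only; the helper name is hypothetical and the exact signatures of os::print_hex_dump and Disassembler::decode are assumed rather than taken from the patch):

```c++
// Inside HotSpot error reporting (assumes code/nmethod.hpp, runtime/os.hpp,
// compiler/disassembler.hpp). Always hex-dump the window around addr, and add
// disassembly only when a disassembler (hsdis) is actually available.
static void print_code_around(outputStream* st, nmethod* nm, address addr, bool hsdis_loaded) {
  address lo = MAX2(nm->code_begin(), addr - 64);
  address hi = MIN2(nm->code_end(),   addr + 64);
  os::print_hex_dump(st, lo, hi, /*unitsize=*/1);  // always printable, robust on any ISA
  if (hsdis_loaded) {
    Disassembler::decode(lo, hi, st);              // best effort; may mis-sync on x86
  }
}
```

The hex dump is the bullet-proof part: if the disassembly mis-syncs, the raw bytes can still be decoded offline at different offsets.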
------------- Changes: - all: https://git.openjdk.org/jdk/pull/27530/files - new: https://git.openjdk.org/jdk/pull/27530/files/660a3884..003680de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27530&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27530&range=00-01 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27530/head:pull/27530 PR: https://git.openjdk.org/jdk/pull/27530 From xgong at openjdk.org Tue Oct 7 01:35:48 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 7 Oct 2025 01:35:48 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: <-Ei1bFBHQvpeD3n7j8WuhV572oNW1b9X8FI488DMigI=.d1f9c421-b0f5-49e0-9ac5-97732ca82c4f@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <-Ei1bFBHQvpeD3n7j8WuhV572oNW1b9X8FI488DMigI=.d1f9c421-b0f5-49e0-9ac5-97732ca82c4f@github.com> Message-ID: On Thu, 2 Oct 2025 09:23:20 GMT, Bhavana Kilambi wrote: >> src/hotspot/cpu/aarch64/aarch64_vector.ad line 272: >> >>> 270: if (length_in_bytes > 16 || !is_feat_fp16_supported()) { >>> 271: return false; >>> 272: } >> >> Reductions with `length_in_bytes < 8` should also be skipped. Because such operations are not supported now, while the IRs with 32-bit vector size might exist, right? > > Hi @XiaohongGong, yes `length_in_bytes < 8` is also not supported and currently we support only for vector lengths of 8B and 16B. > IRs with 32-bit vector size might exist but we do not have an optimized implementation for 32B vector lengths and thus I have disabled it. Instead of that, it generates the 16B scalarized Neon instruction sequence for a 32B vector length. Is this what you were asking? I mean do we need to check the length_in_bytes < 8, such as: Suggestion: if (length_in_bytes < 8 || length_in_bytes > 16 || !is_feat_fp16_supported()) { return false; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2409091499 From xgong at openjdk.org Tue Oct 7 02:50:45 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 7 Oct 2025 02:50:45 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: <4VXHOCR1YSoMVbDbB8j-j18Z-_VbO0y5fJfyp3IjQ9c=.19485011-9cb3-4016-a642-61cee81adcd1@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <4VXHOCR1YSoMVbDbB8j-j18Z-_VbO0y5fJfyp3IjQ9c=.19485011-9cb3-4016-a642-61cee81adcd1@github.com> Message-ID: On Thu, 2 Oct 2025 10:21:06 GMT, Bhavana Kilambi wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 1900: >> >>> 1898: fmulh(dst, dst, vtmp); >>> 1899: ins(vtmp, H, vsrc, 0, 7); >>> 1900: fmulh(dst, dst, vtmp); >> >> Do you know why the performance is not improved significantly for multiply reduction? Seems instructions between different `ins` instructions will have a data-dependence, which is not expected? Could you please use other instructions instead or clear the register `vtmp` before `ins` and check the performance changes? >> >> Note that a clear of `mov` such as `MOVI Vd.2D, #0` has zero cost from V2's guide. > > Are you referring to the N1 numbers? The add reduction operation has gains around ~40% while the mul reduction is around ~20% on N1. On V1 and V2 they look comparable (not considering the cases where we generate `fadda` instructions for add reduction). 
> >> Seems instructions between different ins instructions will have a data-dependence, which is not expected > > Why do you think it's not expected? We have the exact same sequence for Neon add reduction as well. There's back to back dependency there as well and yet it shows better performance. The N1 optimization guide shows 2 cyc latency for `fadd` and 3 cyc latency for `fmul`. Could this be the reason? WDYT? I mean we do not expect there is data-dependence between two `ins` operations, but it has now. We do not recommend use the instructions that just write part of a register. This might involve un-expected dependence between. I suggest to use `ext` instead, and I can observe about 20% performance improvement compared with current version on V2. I did not check the correctness, but it looks right to me. Could you please help check on other machines? Thanks! The change might look like: Suggestion: fmulh(dst, fsrc, vsrc); ext(vtmp, T8B, vsrc, vsrc, 2); fmulh(dst, dst, vtmp); ext(vtmp, T8B, vsrc, vsrc, 4); fmulh(dst, dst, vtmp); ext(vtmp, T8B, vsrc, vsrc, 6); fmulh(dst, dst, vtmp); if (isQ) { ext(vtmp, T16B, vsrc, vsrc, 8); fmulh(dst, dst, vtmp); ext(vtmp, T16B, vsrc, vsrc, 10); fmulh(dst, dst, vtmp); ext(vtmp, T16B, vsrc, vsrc, 12); fmulh(dst, dst, vtmp); ext(vtmp, T16B, vsrc, vsrc, 14); fmulh(dst, dst, vtmp); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2409190652 From duke at openjdk.org Tue Oct 7 05:49:48 2025 From: duke at openjdk.org (erifan) Date: Tue, 7 Oct 2025 05:49:48 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v8] In-Reply-To: References: Message-ID: <_6err-oI7jnkN1zwTDpqBR4Gurfez_OdLJtveJYvORc=.53394d2b-2214-47e1-a87a-1590a356aaab@github.com> On Wed, 20 Aug 2025 10:11:47 GMT, Jatin Bhateja wrote: >> Patch optimizes Vector. slice operation with constant index using x86 ALIGNR instruction. >> It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. >> >> Idea here is to add infrastructure support to enable intrinsification of fast path for selected vector APIs, else enable inlining of fall-back implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander to handle slow paths, which can easily be implemented library side (Java). >> >> Vector API jtreg tests pass at AVX level 2, remaining validation in progress. 
>> >> Performance numbers: >> >> >> System : 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms >> VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms >> VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms >> VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms >> VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms >> VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms >> VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms >> VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms >> VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms >> VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms >> VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms >> VectorSliceB... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update callGenerator.hpp copyright year @jatin-bhateja I have no further comments, great work. After this PR is merged, I will complete the backend optimization of the aarch64 part based on it. Thanks! src/hotspot/cpu/x86/x86.ad line 10770: > 10768: %} > 10769: > 10770: instruct vector_slice_const_origin_LT16B_reg(vec dst, vec src1, vec src2, immI origin) Suggestion: instruct vector_slice_const_origin_EQ16B_reg(vec dst, vec src1, vec src2, immI origin) Or Suggestion: instruct vector_slice_const_origin_16B_reg(vec dst, vec src1, vec src2, immI origin) ------------- PR Review: https://git.openjdk.org/jdk/pull/24104#pullrequestreview-3308445233 PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2409418070 From duke at openjdk.org Tue Oct 7 05:49:50 2025 From: duke at openjdk.org (erifan) Date: Tue, 7 Oct 2025 05:49:50 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v8] In-Reply-To: References: Message-ID: <30EG_sC2od4Xwsibk4Uv1XW_XROt9OtbYSaEDwFmycY=.c2c59c3b-dbfe-4d52-a353-08b7f41bab1d@github.com> On Thu, 25 Sep 2025 08:52:09 GMT, erifan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Update callGenerator.hpp copyright year > > test/hotspot/jtreg/compiler/vectorapi/TestSliceOptValueTransforms.java line 45: > >> 43: public static final VectorSpecies SSP = ShortVector.SPECIES_PREFERRED; >> 44: public static final VectorSpecies ISP = IntVector.SPECIES_PREFERRED; >> 45: public static final VectorSpecies LSP = LongVector.SPECIES_PREFERRED; > > The implementation supports floating point types, but why doesn't the test include fp types? It might be better to consider **partial cases**. I looked at the aarch64 situation and found that different implementations are needed for partial and non-partial cases. The test indices in `test/jdk/jdk/incubator/vector/` are randomly generated, so it might be better to test different vector species here. 
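For illustration, additions along the lines suggested here might look like the following sketch (editor's example with hypothetical class and method names, not part of the PR; it assumes the arrays hold at least one vector length and that the test runs with --add-modules jdk.incubator.vector):

```java
import jdk.incubator.vector.*;

public class SliceSpeciesSketch {
    // A floating point species, so the fp slice lowering is exercised too
    static final VectorSpecies<Float> FSP = FloatVector.SPECIES_PREFERRED;
    // A fixed 64-bit species to cover the partial-vector case explicitly
    static final VectorSpecies<Float> FSP_64 = FloatVector.SPECIES_64;

    // Slice with a constant origin; a constant lets C2 pick the optimized lowering
    static void sliceFloats(float[] a, float[] b, float[] r) {
        FloatVector va = FloatVector.fromArray(FSP, a, 0);
        FloatVector vb = FloatVector.fromArray(FSP, b, 0);
        va.slice(1, vb).intoArray(r, 0);
    }

    // 64-bit species exercises the partial path on 128-bit and wider hardware
    static void slicePartial(float[] a, float[] b, float[] r) {
        FloatVector va = FloatVector.fromArray(FSP_64, a, 0);
        FloatVector vb = FloatVector.fromArray(FSP_64, b, 0);
        va.slice(1, vb).intoArray(r, 0);
    }
}
```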
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2409431981 From duke at openjdk.org Tue Oct 7 06:24:22 2025 From: duke at openjdk.org (erifan) Date: Tue, 7 Oct 2025 06:24:22 GMT Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of vector compress [v5] In-Reply-To: References: Message-ID: > The AArch64 SVE and SVE2 architectures lack an instruction suitable for subword-type `compress` operations. Therefore, the current implementation uses the 32-bit SVE `compact` instruction to compress subword types by first widening the high and low parts to 32 bits, compressing them, and then narrowing them back to their original type. Finally, the high and low parts are merged using the `index + tbl` instructions. > > This approach is significantly slower compared to architectures with native support. After evaluating all available AArch64 SVE instructions and experimenting with various implementations?such as looping over the active elements, extraction, and insertion?I confirmed that the existing algorithm is optimal given the instruction set. However, there is still room for optimization in the following two aspects: > 1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction. > 2. For partial subword types, operations to the highest half are unnecessary because those bits are invalid. > > This pull request introduces the following changes: > 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput. > 2. Eliminates unnecessary compress operations for partial subword type cases. > 3. For `sve_compress_byte`, one less temporary register is used to alleviate potential register pressure. > > Benchmark results demonstrate that these changes significantly improve performance. > > Benchmarks on Nvidia Grace machine with 128-bit SVE: > > Benchmark Unit Before Error After Error Uplift > Byte128Vector.compress ops/ms 4846.97 26.23 6638.56 31.60 1.36 > Byte64Vector.compress ops/ms 2447.69 12.95 7167.68 34.49 2.92 > Short128Vector.compress ops/ms 7174.88 40.94 8398.45 9.48 1.17 > Short64Vector.compress ops/ms 3618.72 3.04 8618.22 10.91 2.38 > > > This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed. erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Enable the IR test for x86 - Merge branch 'master' into JDK-8366333-compress - Improve coding style a bit - Improve some code style - Merge branch 'master' into JDK-8366333-compress - Merge branch 'master' into JDK-8366333-compress - 8366333: AArch64: Enhance SVE subword type implementation of vector compress The AArch64 SVE and SVE2 architectures lack an instruction suitable for subword-type `compress` operations. Therefore, the current implementation uses the 32-bit SVE `compact` instruction to compress subword types by first widening the high and low parts to 32 bits, compressing them, and then narrowing them back to their original type. Finally, the high and low parts are merged using the `index + tbl` instructions. This approach is significantly slower compared to architectures with native support. After evaluating all available AArch64 SVE instructions and experimenting with various implementations?such as looping over the active elements, extraction, and insertion?I confirmed that the existing algorithm is optimal given the instruction set. 
However, there is still room for optimization in the following two aspects: 1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction. 2. For partial subword types, operations to the highest half are unnecessary because those bits are invalid. This pull request introduces the following changes: 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput. 2. Eliminates unnecessary compress operations for partial subword type cases. 3. For `sve_compress_byte`, one less temporary register is used to alleviate potential register pressure. Benchmark results demonstrate that these changes significantly improve performance. Benchmarks on Nvidia Grace machine with 128-bit SVE: ``` Benchmark Unit Before Error After Error Uplift Byte128Vector.compress ops/ms 4846.97 26.23 6638.56 31.60 1.36 Byte64Vector.compress ops/ms 2447.69 12.95 7167.68 34.49 2.92 Short128Vector.compress ops/ms 7174.88 40.94 8398.45 9.48 1.17 Short64Vector.compress ops/ms 3618.72 3.04 8618.22 10.91 2.38 ``` This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed. ------------- Changes: https://git.openjdk.org/jdk/pull/27188/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27188&range=04 Stats: 434 lines in 10 files changed: 317 ins; 24 del; 93 mod Patch: https://git.openjdk.org/jdk/pull/27188.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27188/head:pull/27188 PR: https://git.openjdk.org/jdk/pull/27188 From duke at openjdk.org Tue Oct 7 06:33:47 2025 From: duke at openjdk.org (erifan) Date: Tue, 7 Oct 2025 06:33:47 GMT Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of vector compress [v3] In-Reply-To: References: Message-ID: On Mon, 29 Sep 2025 18:54:35 GMT, Vladimir Ivanov wrote: >> Done, thanks! > > The following reads slightly better, but it's up to you how to shape it. > > FloatRegister vzr = vtmp3; > sve_dup(vzr, B, 0); Done, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2409540159 From duke at openjdk.org Tue Oct 7 06:33:50 2025 From: duke at openjdk.org (erifan) Date: Tue, 7 Oct 2025 06:33:50 GMT Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of vector compress [v4] In-Reply-To: References: <_plVr3bffYuI85KtB6vwCbEKvYp4Rlq2NIXCpLhHzpc=.1b8e461e-d360-45b5-af21-ad3bfbc2fce3@github.com> Message-ID: On Tue, 30 Sep 2025 06:57:34 GMT, Jatin Bhateja wrote: >> erifan has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve coding style a bit > > test/hotspot/jtreg/compiler/vectorapi/VectorCompressTest.java line 170: > >> 168: @Test >> 169: @IR(counts = { IRNode.COMPRESS_VB, "= 1" }, >> 170: applyIfCPUFeature = { "sve", "true" }) > > Hi @erifan, > Nice work!, > Can you please also enable these tests for x86? Following are the relevant features. > > CompressVB -> avx512_vbmi2, avx512_vl > CompressVS -> avx512_vbmi2. avx512_vl > CompressVI/VF -> avx512f, avx512vl > ComprssVL/VD -> avx512f, avx512vl > > PS: avx512_vbmi2 is missing from test/IREncodingPrinter.java > > FYI , currently, we don't support sub-word compression intrinsics on AVX2/E-core targets. I created a vectorized algorithm without any x86 backend change just using vector APIs, and it showed 12x improvement. > > https://github.com/jatin-bhateja/external_staging/blob/main/VectorizedAlgos/SubwordCompress/short_vector_compress.java > > > PROMPT>java -cp . 
--add-modules=jdk.incubator.vector short_vector_compress 0 > WARNING: Using incubator modules: jdk.incubator.vector > [ baseline time] 976 ms [res] 429507073 > PROMPT>java -cp . --add-modules=jdk.incubator.vector short_vector_compress 1 > WARNING: Using incubator modules: jdk.incubator.vector > [ withopt time] 80 ms [res] 429507073 > PROMPT> Done, please help me check if it is correct, thank you! I have tested it locally. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2409522760 From duke at openjdk.org Tue Oct 7 06:41:48 2025 From: duke at openjdk.org (erifan) Date: Tue, 7 Oct 2025 06:41:48 GMT Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of vector compress [v4] In-Reply-To: References: <_plVr3bffYuI85KtB6vwCbEKvYp4Rlq2NIXCpLhHzpc=.1b8e461e-d360-45b5-af21-ad3bfbc2fce3@github.com> Message-ID: On Mon, 29 Sep 2025 18:54:45 GMT, Vladimir Ivanov wrote: >> erifan has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve coding style a bit > > Looks good. Hi @iwanowww @jatin-bhateja I have addressed your comments, thanks for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27188#issuecomment-3375426818 From duke at openjdk.org Tue Oct 7 06:41:50 2025 From: duke at openjdk.org (erifan) Date: Tue, 7 Oct 2025 06:41:50 GMT Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of vector compress [v3] In-Reply-To: References: Message-ID: On Mon, 29 Sep 2025 18:54:35 GMT, Vladimir Ivanov wrote: >> Done, thanks! > > The following reads slightly better, but it's up to you how to shape it. > > FloatRegister vzr = vtmp3; > sve_dup(vzr, B, 0); Done, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2409551881 From jbhateja at openjdk.org Tue Oct 7 07:21:53 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 7 Oct 2025 07:21:53 GMT Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of vector compress [v5] In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 06:24:22 GMT, erifan wrote: >> The AArch64 SVE and SVE2 architectures lack an instruction suitable for subword-type `compress` operations. Therefore, the current implementation uses the 32-bit SVE `compact` instruction to compress subword types by first widening the high and low parts to 32 bits, compressing them, and then narrowing them back to their original type. Finally, the high and low parts are merged using the `index + tbl` instructions. >> >> This approach is significantly slower compared to architectures with native support. After evaluating all available AArch64 SVE instructions and experimenting with various implementations?such as looping over the active elements, extraction, and insertion?I confirmed that the existing algorithm is optimal given the instruction set. However, there is still room for optimization in the following two aspects: >> 1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction. >> 2. For partial subword types, operations to the highest half are unnecessary because those bits are invalid. >> >> This pull request introduces the following changes: >> 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput. >> 2. Eliminates unnecessary compress operations for partial subword type cases. >> 3. For `sve_compress_byte`, one less temporary register is used to alleviate potential register pressure. 
>> >> Benchmark results demonstrate that these changes significantly improve performance. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE: >> >> Benchmark Unit Before Error After Error Uplift >> Byte128Vector.compress ops/ms 4846.97 26.23 6638.56 31.60 1.36 >> Byte64Vector.compress ops/ms 2447.69 12.95 7167.68 34.49 2.92 >> Short128Vector.compress ops/ms 7174.88 40.94 8398.45 9.48 1.17 >> Short64Vector.compress ops/ms 3618.72 3.04 8618.22 10.91 2.38 >> >> >> This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed. > > erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Enable the IR test for x86 > - Merge branch 'master' into JDK-8366333-compress > - Improve coding style a bit > - Improve some code style > - Merge branch 'master' into JDK-8366333-compress > - Merge branch 'master' into JDK-8366333-compress > - 8366333: AArch64: Enhance SVE subword type implementation of vector compress > > The AArch64 SVE and SVE2 architectures lack an instruction suitable for > subword-type `compress` operations. Therefore, the current implementation > uses the 32-bit SVE `compact` instruction to compress subword types by > first widening the high and low parts to 32 bits, compressing them, and > then narrowing them back to their original type. Finally, the high and > low parts are merged using the `index + tbl` instructions. > > This approach is significantly slower compared to architectures with native > support. After evaluating all available AArch64 SVE instructions and > experimenting with various implementations?such as looping over the active > elements, extraction, and insertion?I confirmed that the existing algorithm > is optimal given the instruction set. However, there is still room for > optimization in the following two aspects: > 1. Merging with `index + tbl` is suboptimal due to the high latency of > the `index` instruction. > 2. For partial subword types, operations to the highest half are unnecessary > because those bits are invalid. > > This pull request introduces the following changes: > 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which > offer lower latency and higher throughput. > 2. Eliminates unnecessary compress operations for partial subword type cases. > 3. For `sve_compress_byte`, one less temporary register is used to alleviate > potential register pressure. > > Benchmark results demonstrate that these changes significantly improve performance. > > Benchmarks on Nvidia Grace machine with 128-bit SVE: > ``` > Benchmark Unit Before Error After Error Uplift > Byte128Vector.compress ops/ms 4846.97 26.23 6638.56 31.60 1.36 > Byte64Vector.compress ops/ms 2447.69 12.95 7167.68 34.49 2.92 > Short128Vector.compress ops/ms 7174.88 40.94 8398.45 9.48 1.17 > Short64Vector.compress ops/ms 3618.72 3.04 8618.22 10.91 2.38 > ``` > > This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, > and all... Thanks @erifan , Verified IR test changes. ------------- Marked as reviewed by jbhateja (Reviewer). 
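For readers unfamiliar with the operation being tuned in this thread, a small Java sketch of what `compress` means at the Vector API level (editor's illustration with hypothetical names; the SVE sequence discussed above is one backend lowering of this for subword types):

```java
import jdk.incubator.vector.*;

public class CompressSketch {
    static final VectorSpecies<Short> SSP = ShortVector.SPECIES_128;

    // Packs the lanes selected by the mask to the front of the vector; the
    // remaining lanes are filled with zero. Arrays are assumed to hold at
    // least SSP.length() elements.
    static void compressShorts(short[] src, boolean[] sel, short[] dst) {
        ShortVector v = ShortVector.fromArray(SSP, src, 0);
        VectorMask<Short> m = VectorMask.fromArray(SSP, sel, 0);
        v.compress(m).intoArray(dst, 0);
    }
}
```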
PR Review: https://git.openjdk.org/jdk/pull/27188#pullrequestreview-3308743448 From bkilambi at openjdk.org Tue Oct 7 07:39:46 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 7 Oct 2025 07:39:46 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <-Ei1bFBHQvpeD3n7j8WuhV572oNW1b9X8FI488DMigI=.d1f9c421-b0f5-49e0-9ac5-97732ca82c4f@github.com> Message-ID: On Tue, 7 Oct 2025 01:33:05 GMT, Xiaohong Gong wrote: >> Hi @XiaohongGong, yes `length_in_bytes < 8` is also not supported and currently we support only for vector lengths of 8B and 16B. >> IRs with 32-bit vector size might exist but we do not have an optimized implementation for 32B vector lengths and thus I have disabled it. Instead of that, it generates the 16B scalarized Neon instruction sequence for a 32B vector length. Is this what you were asking? > > I mean do we need to check the length_in_bytes < 8, such as: > Suggestion: > > if (length_in_bytes < 8 || length_in_bytes > 16 || !is_feat_fp16_supported()) { > return false; > } Yes, I understood that part (and I already made that change in my patch interally) but not this - > the IRs with 32-bit vector size might exist ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2409685656 From roland at openjdk.org Tue Oct 7 07:43:25 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 7 Oct 2025 07:43:25 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types Message-ID: Currently ReassociateInvariants is only enabled for int counted loops. I noticed, enabling it for long counted loops helps RCE. It also seems like something that would help any loop. I propose enabling it for all inner loops. ------------- Commit messages: - test fixes - test and fix Changes: https://git.openjdk.org/jdk/pull/27666/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27666&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369258 Stats: 451 lines in 6 files changed: 255 ins; 190 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/27666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27666/head:pull/27666 PR: https://git.openjdk.org/jdk/pull/27666 From chagedorn at openjdk.org Tue Oct 7 07:51:30 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 7 Oct 2025 07:51:30 GMT Subject: RFR: 8369236: testlibrary_tests/ir_framework/tests/TestCompileThreshold.java timed out Message-ID: The test `testlibrary_tests/ir_framework/tests/TestCompileThreshold.java` times out intermittently after the timeout factor change (taking more than 120s). On my local machine, I measured around ~105-115s. The test uses `CompileThreshold=10` which is almost like `Xcomp` and thus quite slow. However, the purpose of this test is not to stress the compiler but actually to verify that passing `CompileThreshold` to the IR framework over jtreg options is properly ignored. Therefore, we can use higher `CompileThreshold` values and achieve the same goal. With the proposed changes, the test finishes in ~10-15s on my local machine. 
Thanks, Christian ------------- Commit messages: - 8369236: testlibrary_tests/ir_framework/tests/TestCompileThreshold.java timed out Changes: https://git.openjdk.org/jdk/pull/27667/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27667&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369236 Stats: 13 lines in 1 file changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/27667.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27667/head:pull/27667 PR: https://git.openjdk.org/jdk/pull/27667 From hgreule at openjdk.org Tue Oct 7 07:54:47 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 7 Oct 2025 07:54:47 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 07:36:41 GMT, Roland Westrelin wrote: > Currently ReassociateInvariants is only enabled for int counted > loops. I noticed, enabling it for long counted loops helps RCE. It > also seems like something that would help any loop. I propose enabling > it for all inner loops. Hi @rwestrel, it looks like this also resolves https://bugs.openjdk.org/browse/JDK-8326001, right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27666#issuecomment-3375624311 From ayang at openjdk.org Tue Oct 7 08:29:53 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 7 Oct 2025 08:29:53 GMT Subject: RFR: 8369236: testlibrary_tests/ir_framework/tests/TestCompileThreshold.java timed out In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 07:45:15 GMT, Christian Hagedorn wrote: > The test `testlibrary_tests/ir_framework/tests/TestCompileThreshold.java` times out intermittently after the timeout factor change (taking more than 120s). On my local machine, I measured around ~105-115s. > > The test uses `CompileThreshold=10` which is almost like `Xcomp` and thus quite slow. However, the purpose of this test is not to stress the compiler but actually to verify that passing `CompileThreshold` to the IR framework over jtreg options is properly ignored. Therefore, we can use higher `CompileThreshold` values and achieve the same goal. With the proposed changes, the test finishes in ~10-15s on my local machine. > > Thanks, > Christian Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27667#pullrequestreview-3309001508 From roland at openjdk.org Tue Oct 7 08:37:45 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 7 Oct 2025 08:37:45 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 07:51:42 GMT, Hannes Greule wrote: > Hi @rwestrel, it looks like this also resolves https://bugs.openjdk.org/browse/JDK-8326001, right? Right. I missed that there was already a bug for that. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27666#issuecomment-3375789319 From xgong at openjdk.org Tue Oct 7 08:56:52 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 7 Oct 2025 08:56:52 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v5] In-Reply-To: References: Message-ID: On Tue, 9 Sep 2025 07:33:49 GMT, Emanuel Peter wrote: >> It just has the `Extract` node to extract an element from vector in C2, right? Extracting the lowest part can be implemented with `VectorReinterpret` easily. But how about the higher parts? Maybe this can also be implemented with operations like `slice` ? But, seems this will also make the IR more complex? 
For `Cast`, we have `VectorCastMask` now, but it assumes the vector length should be the same for input and output. So the `VectorReinterpret` or an `VectorExtract` is sill needed. >> >> I can have a try with separating the IR. But I guess an additional new node is still necessary. >> >>> It would just allow us to have one fewer nodes. >> >> This is also what I expect really. > > It would just be nice to build on "simple" building blocks and not have too many complex nodes, that have very special semantics (widen + split into two). It just means that the IR optimizations have to take care of more special cases, rather than following simple rules/optimizations because every IR node does a relatively simple thing. > > Maybe you find out that we really need a complex node, and can provide good arguments. Looking forward to what you find :) Hi @iwanowww , regarding to the operation of extending the higher half element size for a vector mask, do you have any better idea? To split the gather operation for a subword type, we usually need to split the input mask as well. Especially for SVE, which the vector mask needs the same data type for an element. I need to extract the part of the original vector mask, and extend it to the int type. For Vector API, I think we can either use similar vector slice for a mask, or a vector extract API. WDYT? Note that on SVE, it has the native `PUNPKHI` [1] instruction supported. [1] https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/PUNPKHI--PUNPKLO--Unpack-and-widen-half-of-predicate- ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2409903068 From dskantz at openjdk.org Tue Oct 7 09:04:49 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Tue, 7 Oct 2025 09:04:49 GMT Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors [v5] In-Reply-To: References: Message-ID: On Tue, 30 Sep 2025 09:00:38 GMT, Daniel Skantz wrote: >> This PR addresses a bug in the stringopts phase. During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments(). >> >> In the attached test, the size of the successively merged StringBuilder doubles on each merge -- there's 24 of them -- as the toString result of the first component is used twice in the second component [1], etc. Not only does the compiler hang on this test case, but the string concat optimization seems to give an arbitrary amount of back-to-back stores in the generated code depending on the number of stacked concatenations. >> >> The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and higher limits could be insufficient as each argument corresponds to about 20 new nodes later in replace_string_concat [2]. >> >> [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303 >> >> [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806 >> >> Testing: T1-4. 
>> >> Extra testing: verified that no method in T1-4 is being compiled with a merged concat candidate exceeding the suggested limit of 100 aguments, regardless of whether or not the later checks verify_control_flow() and verify_mem_flow pass. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > comment bound; debug prints; bug numbers Thanks for the reviews and good suggestions ------------- PR Comment: https://git.openjdk.org/jdk/pull/26685#issuecomment-3375887713 From dskantz at openjdk.org Tue Oct 7 09:08:09 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Tue, 7 Oct 2025 09:08:09 GMT Subject: Integrated: 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors In-Reply-To: References: Message-ID: On Fri, 8 Aug 2025 06:10:56 GMT, Daniel Skantz wrote: > This PR addresses a bug in the stringopts phase. During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments(). > > In the attached test, the size of the successively merged StringBuilder doubles on each merge -- there's 24 of them -- as the toString result of the first component is used twice in the second component [1], etc. Not only does the compiler hang on this test case, but the string concat optimization seems to give an arbitrary amount of back-to-back stores in the generated code depending on the number of stacked concatenations. > > The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and higher limits could be insufficient as each argument corresponds to about 20 new nodes later in replace_string_concat [2]. > > [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303 > > [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806 > > Testing: T1-4. > > Extra testing: verified that no method in T1-4 is being compiled with a merged concat candidate exceeding the suggested limit of 100 aguments, regardless of whether or not the later checks verify_control_flow() and verify_mem_flow pass. This pull request has now been integrated. Changeset: c06d6805 Author: Daniel Skantz URL: https://git.openjdk.org/jdk/commit/c06d6805aae3af2e6175f3f43deea46c9ce08bc6 Stats: 116 lines in 2 files changed: 115 ins; 0 del; 1 mod 8362394: C2: Repeated stacked string concatenation fails with "Hit MemLimit" and other resourcing errors Reviewed-by: chagedorn, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/26685 From mdoerr at openjdk.org Tue Oct 7 11:07:36 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 7 Oct 2025 11:07:36 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods [v3] In-Reply-To: References: Message-ID: > We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. 
We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it. > > We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses. > > I've tested this proposal by the following code on x86_64: > > diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp > index a6b4efbe4f2..d715e69c850 100644 > --- a/src/hotspot/cpu/x86/interp_masm_x86.cpp > +++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp > @@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() { > void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) { > prepare_to_jump_from_interpreted(); > > + if (UseNewCode) { > + Label ok; > + movptr(temp, Address(method, Method::from_interpreted_offset())); > + cmpptr(temp, Address(method, Method::interpreter_entry_offset())); > + je(ok); > + movptr(rax, Address(method, Method::from_compiled_offset())); > + movptr(rbx, rax); > + addptr(rbx, 128); > + hlt(); > + bind(ok); > + } > + > if (JvmtiExport::can_post_interpreter_events()) { > Label run_compiled_code; > // JVMTI events, such as single-stepping, are implemented partly by avoiding running > > > The output is (requires hsdis library, otherwise we get a hex dump instead of disassembly): > > RAX=0x00007f3b19000100 is at entry_point+0 in (nmethod*)0x00007f3b19000008 > Compiled method (c1) 2915 1 3 java.lang.Byte::toUnsignedInt (6 bytes) > total in heap [0x00007f3b19000008,0x00007f3b190001f8] = 496 > main code [0x00007f3b19000100,0x00007f3b190001b8] = 184 > stub code [0x00007f3b190001b8,0x00007f3b190001f8] = 64 > mutable data [0x00007f3ab401e0b0,0x00007f3ab401e0e0] = 48 > relocation [0x00007f3ab401e0b0,0x00007f3ab401e0d8] = 40 > metadata [0x00007f3ab401e0d8,0x00007f3ab401e0e0] = 8 > immutable data [0x00007f3ab401dcd0,0x00007f3ab401dd30] = 96 > dependencies [0x00007f3ab401dcd0,0x00007f3ab401dcd8] = 8 > scopes pcs [0x00007f3ab401dcd8,0x00007f3ab401dd18] = 64 > ... Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Move printing code to nmethod.cpp. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27530/files - new: https://git.openjdk.org/jdk/pull/27530/files/003680de..4a05d40f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27530&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27530&range=01-02 Stats: 48 lines in 3 files changed: 26 ins; 21 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27530/head:pull/27530 PR: https://git.openjdk.org/jdk/pull/27530 From chagedorn at openjdk.org Tue Oct 7 11:11:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 7 Oct 2025 11:11:15 GMT Subject: RFR: 8369236: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out Message-ID: The test ` testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java` intermittently timed out in our CI. On my local machine, I measured ~80s which is quite long given that we actually only want to test that Cartesian product for scenarios work. 
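The relocation-based idea quoted above can be sketched roughly as follows (editor's illustration, not the actual change; it assumes the usual RelocIterator interface from code/relocInfo.hpp and skips the fallbacks the real code needs):

```c++
// Find a safe place to start decoding near 'target': the closest relocation
// at or below it is a known instruction start, even on variable-length ISAs.
static address find_instruction_start_near(nmethod* nm, address target) {
  address start = nm->code_begin();   // the entry point is always an instruction start
  RelocIterator iter(nm);
  while (iter.next()) {
    address a = iter.addr();
    if (a <= target && a > start) {
      start = a;                      // remember the closest reloc at or below target
    }
  }
  return start;                       // begin disassembly (or the hex dump window) here
}
```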
#### Reduce Execution Time by not Executing the Scenarios I had a closer look at the test to try to cut the execution time down. Currently, we are executing many IR framework runs with different Cartesian products for scenarios. Afterwards, we compare the output of IR matching to check if the computation of the Cartesian products were correct. However, since we are only interested in verifying that the Cartesian product computation for scenarios works (i.e. `addCrossProductScenarios()`), we could skip the actual execution of the scenarios itself - we can trust that the IR framework is already tested well enough. To achieve that, we can use reflection to get the added scenarios to the IR framework (I don't want to add a public accessor because a user should not need access to them) and then fetch the corresponding scenario flags and compare against our expectation. That's what I propose with this change. #### Changes - Verification without actually running scenarios. - Added a test passing 3 sets to `addCrossProductScenarios()` which was missing before. - Improved `addScenarios()` where we added a scenario to the list even though it already existed. That normally does not matter because we are throwing a `TestFormatException` anyway afterwards. But it messes with the test: We are adding the duplicated scenario and then read it again in the verification part of the test. - Refactored the test a little more. - Refactored some small things in `addCrossProductScenarios()` while looking at it. - Added a sentence about passing a single set to `addCrossProductScenarios()` which was not evidently clear what is happening when looking at the method comment. #### Execution Time Comparison Measured on my local machine: - Mainline: ~80s - With patch: ~2-3s Thanks, Christian ------------- Commit messages: - 8369236: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out Changes: https://git.openjdk.org/jdk/pull/27672/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27672&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369236 Stats: 234 lines in 2 files changed: 134 ins; 30 del; 70 mod Patch: https://git.openjdk.org/jdk/pull/27672.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27672/head:pull/27672 PR: https://git.openjdk.org/jdk/pull/27672 From chagedorn at openjdk.org Tue Oct 7 11:40:48 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 7 Oct 2025 11:40:48 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v2] In-Reply-To: <9-1Rk4Mam4szX3LdaBZauHGfvpKGdNP6TgVdgTFkjxs=.22d95e51-d87f-4f54-abc9-eceec9d45c43@github.com> References: <9-1Rk4Mam4szX3LdaBZauHGfvpKGdNP6TgVdgTFkjxs=.22d95e51-d87f-4f54-abc9-eceec9d45c43@github.com> Message-ID: On Mon, 6 Oct 2025 14:32:55 GMT, Marc Chevalier wrote: > > We hit this assert in the past due to bugs ([link](https://bugs.openjdk.org/issues/?jql=text%20~%20%22%5C%22opaq-%3Eoutcnt()%20%3D%3D%201%5C%22%22)). > > I might misunderstand, but some of these issues (for instance [JDK-8298353](https://bugs.openjdk.org/browse/JDK-8298353) and backport, duplicate...) seems purely to be hitting the assert, and I don't always see another problem. I think here we hit the same assert and actually fixed a bug with split If here: https://github.com/openjdk/jdk/pull/11391 > I wonder if it has been years we are trying to make the assert truer while it doesn't really need to hold (and indeed, it has no reason to hold in general). 
But I think it's a key property that the zero trip guard share the same limit as the loop exit test, otherwise, something is off. So, it should hold in general which the assert tries to verify. We now seem to have found a special case where useless phi are violating the property. If we remove the assert and introduce a bug later which violates this property, we might not notice it anymore. It now really depends on how complex/impactful a solution that keeps the assert would be. I guess it will be a trade-off. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3376497278 From roland at openjdk.org Tue Oct 7 12:25:07 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 7 Oct 2025 12:25:07 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v22] In-Reply-To: <4ws9_5sMsWEVwJUQnjJDUXUAZ6ek-IS0jVpmveOcM9g=.bfd3a412-563d-4e2c-a9eb-4f56271e56a4@github.com> References: <4ws9_5sMsWEVwJUQnjJDUXUAZ6ek-IS0jVpmveOcM9g=.bfd3a412-563d-4e2c-a9eb-4f56271e56a4@github.com> Message-ID: On Mon, 6 Oct 2025 15:14:31 GMT, Kangcheng Xu wrote: >> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. >> >> When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) >> >> The following was implemented to address this issue. >> >> if (UseNewCode2) { >> *multiplier = bt == T_INT >> ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows >> : ((jlong) 1) << con->get_int(); >> } else { >> *multiplier = ((jlong) 1 << con->get_int()); >> } >> >> >> Two new bitshift overflow tests were added. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > Use CONST64() macro Other than that, looks good to me. src/hotspot/share/opto/addnode.cpp line 570: > 568: > 569: // Pattern (4), which is equivalent to a simple addition pattern > 570: return find_simple_addition_pattern(n, bt); Isn't that one redundant with the call to `find_simple_addition_pattern()` in `AddNode::Multiplication::find_collapsible_addition_patterns()`? 
------------- PR Review: https://git.openjdk.org/jdk/pull/23506#pullrequestreview-3309847621 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2410436202 From roland at openjdk.org Tue Oct 7 12:25:09 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 7 Oct 2025 12:25:09 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v20] In-Reply-To: References: Message-ID: <8hKALE6nqzEh4ohC3DBAzqhFWjwXFwzshChKXPIEIhU=.7ee5d326-7dfb-408e-976c-bd97cffee7c1@github.com> On Mon, 6 Oct 2025 14:12:30 GMT, Kangcheng Xu wrote: >> src/hotspot/share/opto/addnode.hpp line 74: >> >>> 72: Multiplication add(const Multiplication rhs) const { >>> 73: if (is_valid_with(rhs.variable()) && rhs.is_valid_with(variable())) { >>> 74: return {variable(), java_add(multiplier(), rhs.multiplier())}; >> >> You should use a constructor call here > > clang-tidy suggested to > >> Avoid repeating the return type from the declaration; use a braced initializer list instead > > I believe they generate the same code for this case, but I updated to explicitly use constructor making it more clear. Thanks! I think you should change that one as well to keep everything consistent. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2410433511 From dfenacci at openjdk.org Tue Oct 7 12:44:41 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 7 Oct 2025 12:44:41 GMT Subject: RFR: 8369236: testlibrary_tests/ir_framework/tests/TestCompileThreshold.java timed out In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 07:45:15 GMT, Christian Hagedorn wrote: > The test `testlibrary_tests/ir_framework/tests/TestCompileThreshold.java` times out intermittently after the timeout factor change (taking more than 120s). On my local machine, I measured around ~105-115s. > > The test uses `CompileThreshold=10` which is almost like `Xcomp` and thus quite slow. However, the purpose of this test is not to stress the compiler but actually to verify that passing `CompileThreshold` to the IR framework over jtreg options is properly ignored. Therefore, we can use higher `CompileThreshold` values and achieve the same goal. With the proposed changes, the test finishes in ~10-15s on my local machine. > > Thanks, > Christian The fix looks good to me. Thanks @chhagedorn! ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/27667#pullrequestreview-3309937585 From chagedorn at openjdk.org Tue Oct 7 13:00:20 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 7 Oct 2025 13:00:20 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores In-Reply-To: References: Message-ID: On Thu, 2 Oct 2025 10:39:21 GMT, Roland Westrelin wrote: > In the `test1()` method of the test case: > > `inlined2()` calls `clone()` for an object loaded from field `field` > that has inexact type `A` at parse time. The intrinsic for `clone()` > inserts an `Allocate` and an `ArrayCopy` nodes. When igvn runs, the > load of `field` is optimized out because it reads back a newly > allocated `B` written to `field` in the same method. `ArrayCopy` can > now be optimized because the type of its `src` input is known. The > type of its `dest` input is the `CheckCastPP` from the allocation of > the cloned object created at parse time. That one has type `A`. A > series of `Load`s/`Store`s are created to copy the fields of class `B` > from `src` (of type `B`) to `dest` of (type `A`). 
> > Writting to `dest` with offsets for fields that don't exist in `A`, > causes this code in `Compile::flatten_alias_type()`: > > > } else if (offset < 0 || offset >= ik->layout_helper_size_in_bytes()) { > // Static fields are in the space above the normal instance > // fields in the java.lang.Class instance. > if (ik != ciEnv::current()->Class_klass()) { > to = nullptr; > tj = TypeOopPtr::BOTTOM; > offset = tj->offset(); > } > > > to assign it some slice that doesn't match the one that's used at the > same offset in `B`. > > That causes an assert in `ArrayCopyNode::try_clone_instance()` to > fire. With a release build, execution proceeds. `test1()` also has a > non escaping allocation. That one causes EA to run and > `ConnectionGraph::split_unique_types()` to move the store to the non > escaping allocation to a new slice. In the process, when it iterates > over `MergeMem` nodes, it notices the stores added by > `ArrayCopyNode::try_clone_instance()`, finds that some are not on the > right slice, tries to move them to the correct slice (expecting they > are from a non escaping EA). That causes some of the `Store`s to be > disconnected. When the resulting code runs, execution fails as some > fields are not copied. > > The fix I propose is to skip `ArrayCopyNode::try_clone_instance()` > when `src` and `dest` classes don't match as this seems like a rare > enough corner case. That looks reasonable to me. src/hotspot/share/opto/arraycopynode.cpp line 220: > 218: // of the newly allocated cloned object (in dest). Exact type is now known (in src), but type for the cloned object > 219: // (dest) was not updated. When copying fields below, Store nodes may write to offsets for fields that don't exist in > 220: // the inexact class. The stores would then be assigned an incorrect slice. Suggestion: // At parse time, the exact type of the object to clone was not known. That inexact type was captured by the CheckCastPP // of the newly allocated cloned object (in dest). The exact type is now known (in src), but the type for the cloned object // (dest) was not updated. When copying the fields below, Store nodes may write to offsets for fields that don't exist in // the inexact class. The stores would then be assigned an incorrect slice. test/hotspot/jtreg/compiler/arraycopy/TestCloneUnknownClassAtParseTime.java line 47: > 45: } > 46: > 47: static A field; Can you move the field up to the other field? ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27604#pullrequestreview-3309935950 PR Review Comment: https://git.openjdk.org/jdk/pull/27604#discussion_r2410520602 PR Review Comment: https://git.openjdk.org/jdk/pull/27604#discussion_r2410498928 From chagedorn at openjdk.org Tue Oct 7 13:01:46 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 7 Oct 2025 13:01:46 GMT Subject: RFR: 8369236: testlibrary_tests/ir_framework/tests/TestCompileThreshold.java timed out In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 08:26:58 GMT, Albert Mingkun Yang wrote: >> The test `testlibrary_tests/ir_framework/tests/TestCompileThreshold.java` times out intermittently after the timeout factor change (taking more than 120s). On my local machine, I measured around ~105-115s. >> >> The test uses `CompileThreshold=10` which is almost like `Xcomp` and thus quite slow. However, the purpose of this test is not to stress the compiler but actually to verify that passing `CompileThreshold` to the IR framework over jtreg options is properly ignored. 
Therefore, we can use higher `CompileThreshold` values and achieve the same goal. With the proposed changes, the test finishes in ~10-15s on my local machine. >> >> Thanks, >> Christian > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk and @dafedafe for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27667#issuecomment-3376785956 From kxu at openjdk.org Tue Oct 7 15:08:49 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 7 Oct 2025 15:08:49 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v22] In-Reply-To: References: <4ws9_5sMsWEVwJUQnjJDUXUAZ6ek-IS0jVpmveOcM9g=.bfd3a412-563d-4e2c-a9eb-4f56271e56a4@github.com> Message-ID: On Tue, 7 Oct 2025 12:19:44 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> Use CONST64() macro > > src/hotspot/share/opto/addnode.cpp line 570: > >> 568: >> 569: // Pattern (4), which is equivalent to a simple addition pattern >> 570: return find_simple_addition_pattern(n, bt); > > Isn't that one redundant with the call to `find_simple_addition_pattern()` in `AddNode::Multiplication::find_collapsible_addition_patterns()`? Yes it is, but `find_power_of_two_addition_pattern()` is also called in `AddNode::Ideal_collapse_variable_times_con()` which requires the simple `n + n` pattern to be detected to avoid repeating idealizing the same nodes without progress. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2410967437 From kxu at openjdk.org Tue Oct 7 16:04:34 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 7 Oct 2025 16:04:34 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v23] In-Reply-To: References: Message-ID: > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. > > When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added. 
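As a quick aside on the overflow behavior described in the quoted PR text above, here is a minimal, self-contained Java illustration (this shows plain Java shift semantics, not code from the patch):

```
// int shift counts only use the low 5 bits, long shift counts the low 6 bits.
public class ShiftOverflowDemo {
    public static void main(String[] args) {
        System.out.println(1 << 32);          // 1: the shift amount 32 is masked to 0 for int
        System.out.println(1L << 32);         // 4294967296: long shifts keep the full 6-bit count
        System.out.println((int) (1L << 32)); // 0: narrowing the long result drops the set bit
    }
}
```

This matches the `(1 << 32) = 1` versus `(int) (1L << 32) = 0` discrepancy that the fix has to account for.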
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: Use constructor call ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23506/files - new: https://git.openjdk.org/jdk/pull/23506/files/893ffa7a..84b79d3e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=21-22 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From mdoerr at openjdk.org Tue Oct 7 17:45:38 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 7 Oct 2025 17:45:38 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods [v4] In-Reply-To: References: Message-ID: > We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it. > > We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses. > > I've tested this proposal by the following code on x86_64: > > diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp > index a6b4efbe4f2..d715e69c850 100644 > --- a/src/hotspot/cpu/x86/interp_masm_x86.cpp > +++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp > @@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() { > void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) { > prepare_to_jump_from_interpreted(); > > + if (UseNewCode) { > + Label ok; > + movptr(temp, Address(method, Method::from_interpreted_offset())); > + cmpptr(temp, Address(method, Method::interpreter_entry_offset())); > + je(ok); > + movptr(rax, Address(method, Method::from_compiled_offset())); > + movptr(rbx, rax); > + addptr(rbx, 128); > + hlt(); > + bind(ok); > + } > + > if (JvmtiExport::can_post_interpreter_events()) { > Label run_compiled_code; > // JVMTI events, such as single-stepping, are implemented partly by avoiding running > > > The output is (requires hsdis library, otherwise we get a hex dump instead of disassembly): > > RAX=0x00007f3b19000100 is at entry_point+0 in (nmethod*)0x00007f3b19000008 > Compiled method (c1) 2915 1 3 java.lang.Byte::toUnsignedInt (6 bytes) > total in heap [0x00007f3b19000008,0x00007f3b190001f8] = 496 > main code [0x00007f3b19000100,0x00007f3b190001b8] = 184 > stub code [0x00007f3b190001b8,0x00007f3b190001f8] = 64 > mutable data [0x00007f3ab401e0b0,0x00007f3ab401e0e0] = 48 > relocation [0x00007f3ab401e0b0,0x00007f3ab401e0d8] = 40 > metadata [0x00007f3ab401e0d8,0x00007f3ab401e0e0] = 8 > immutable data [0x00007f3ab401dcd0,0x00007f3ab401dd30] = 96 > dependencies [0x00007f3ab401dcd0,0x00007f3ab401dcd8] = 8 > scopes pcs [0x00007f3ab401dcd8,0x00007f3ab401dd18] = 64 > ... 
Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Use frame_complete_offset for better start address computation. Improve comments. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27530/files - new: https://git.openjdk.org/jdk/pull/27530/files/4a05d40f..81dd1c8e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27530&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27530&range=02-03 Stats: 14 lines in 1 file changed: 12 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27530/head:pull/27530 PR: https://git.openjdk.org/jdk/pull/27530 From kxu at openjdk.org Tue Oct 7 18:06:07 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 7 Oct 2025 18:06:07 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v20] In-Reply-To: References: Message-ID: On Mon, 6 Oct 2025 14:12:25 GMT, Kangcheng Xu wrote: >> src/hotspot/share/opto/addnode.cpp line 524: >> >>> 522: if (n->Opcode() == Op_Mul(bt) && (n->in(1)->is_Con() || n->in(2)->is_Con())) { >>> 523: // Pattern (1) >>> 524: Node* con = n->in(1); >> >> Isn't con always input 2 because `MulNode::Ideal` canonicalize it? > > I think you're right. An assertion is also added to verify this. Upon further investigation, I don't think this is necessarily true. With `compiler/c2/Test6636138_1` I'm seeing 2 1052 MulI === _ 55 1040 [[ 1053 ]] 1 1053 ConvI2L === _ 1052 [[ 1054 ]] #long:minint..maxint, 0u..maxulong 1 1046 ConL === 0 [[ 1047 1054 ]] #long:4 0 1054 MulL === _ 1046 1053 [[ 1055 ]] which is not idealized. I'm reverting this change. Please let me know if you think a input node is necessarily already idealized when the currently addnode is being idealized and I'm potentially missing something. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2411476370 From kxu at openjdk.org Tue Oct 7 18:14:47 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 7 Oct 2025 18:14:47 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v24] In-Reply-To: References: Message-ID: > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. > > When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added. 
Kangcheng Xu has updated the pull request incrementally with two additional commits since the last revision: - Remove assertion - Refine assertion condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23506/files - new: https://git.openjdk.org/jdk/pull/23506/files/84b79d3e..a05e7408 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=22-23 Stats: 13 lines in 1 file changed: 7 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From acobbs at openjdk.org Tue Oct 7 19:40:35 2025 From: acobbs at openjdk.org (Archie Cobbs) Date: Tue, 7 Oct 2025 19:40:35 GMT Subject: RFR: 5038439: Warning message for literal shift amounts outside the canonical domain Message-ID: When bit shifting an `int` or `long` value by an amount `X`, all but the last 5 or 6 (respectively) bits of `X` are ignored. This can create a trap for the unwary, as in this example: public long readLongBigEndian(byte[] buf, int offset) { return ((buf[offset + 0] & 0xff) << 56) // BUG HERE | ((buf[offset + 1] & 0xff) << 48) // BUG HERE | ((buf[offset + 2] & 0xff) << 40) // BUG HERE | ((buf[offset + 3] & 0xff) << 32) // BUG HERE | ((buf[offset + 4] & 0xff) << 24) | ((buf[offset + 5] & 0xff) << 16) | ((buf[offset + 6] & 0xff) << 8) | ((buf[offset + 7] & 0xff); } This PR adds a new warning when the compiler detects an out-of-range bit shift, i.e., an `int` bit shift not in the range `[0...31]` or a `long` bit shift not in the range `[0...63]`. ------------- Commit messages: - Merge branch 'master' into JDK-5038439 to fix conflict. - Add "long" as a supported message parameter type. - Use "bit(s)" instead of "bits" where value could be 1. - Merge branch 'master' into JDK-5038439 - Sprinkle more variety into the regression test. - Minor diff cleanup. - Update "lossy-conversions" description in compiler module Javadoc. - Warn for bit shifts using an out-of-range shift amount. Changes: https://git.openjdk.org/jdk/pull/27102/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27102&range=00 Issue: https://bugs.openjdk.org/browse/JDK-5038439 Stats: 194 lines in 14 files changed: 184 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/27102.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27102/head:pull/27102 PR: https://git.openjdk.org/jdk/pull/27102 From epeter at openjdk.org Wed Oct 8 03:09:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Oct 2025 03:09:22 GMT Subject: RFR: 8367389: C2 SuperWord: refactor VTransform to model the whole loop instead of just the basic block [v2] In-Reply-To: <5zLWoCC7_s5VBF435fL1hk_m9vsk5JQrdZ1tEipatFo=.bc502b75-7074-4923-8dce-d367eb1b71af@github.com> References: <5zLWoCC7_s5VBF435fL1hk_m9vsk5JQrdZ1tEipatFo=.bc502b75-7074-4923-8dce-d367eb1b71af@github.com> Message-ID: On Wed, 17 Sep 2025 14:37:00 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> for Manuel > > Thank you for addressing my comments and answering my question. Bar the new typo, this looks good to me. @mhaessig @rwestrel Thanks for the reviews! 
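Editorial note on the shift-warning RFR (5038439) quoted earlier in this digest: the `readLongBigEndian` example there is intentionally buggy. One possible corrected variant, shown here purely for comparison and not taken from the PR, widens each byte to `long` before shifting so every shift amount stays in range:

```
// Hypothetical corrected version of the big-endian read from the RFR text above.
public static long readLongBigEndian(byte[] buf, int offset) {
    return ((long) (buf[offset]     & 0xff) << 56)
         | ((long) (buf[offset + 1] & 0xff) << 48)
         | ((long) (buf[offset + 2] & 0xff) << 40)
         | ((long) (buf[offset + 3] & 0xff) << 32)
         | ((long) (buf[offset + 4] & 0xff) << 24)
         | ((long) (buf[offset + 5] & 0xff) << 16)
         | ((long) (buf[offset + 6] & 0xff) << 8)
         |  (long) (buf[offset + 7] & 0xff);
}
```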
@galderz Thanks for having a look as well :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27208#issuecomment-3379403068 From epeter at openjdk.org Wed Oct 8 03:09:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Oct 2025 03:09:23 GMT Subject: Integrated: 8367389: C2 SuperWord: refactor VTransform to model the whole loop instead of just the basic block In-Reply-To: References: Message-ID: <1J48fbEViK3jOZ5qJi5C0EOMKAbbEFh7Y3Mq-JvHyVA=.6e2b2219-ff18-421d-8c8a-801dbb970752@github.com> On Thu, 11 Sep 2025 06:52:19 GMT, Emanuel Peter wrote: > I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: > https://github.com/openjdk/jdk/pull/20964 > [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093) > > This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > ------------------------------ > > **Goals** > - VTransform models **all nodes in the loop**, not just the basic block (enables later VTransform::optimize, like moving reductions out of the loop) > - Remove `_nodes` from the vector vtnodes. > > **Details** > - Remove: `AUTO_VECTORIZATION2_AFTER_REORDER`, `apply_memops_reordering_with_schedule`, `print_memops_schedule`. > - Instead of reordering the scalar memops, we create the new memory graph during `VTransform::apply`. That is why the `VTransformApplyState` now needs to track the memory states. > - Refactor `VLoopMemorySlices`: map not just memory slices with phis (have stores in loop), but also those with only loads (no phi). > - Create vtnodes for all nodes in the loop (not just the basic block), as well as inputs (already) and outputs (new). Mapping also the output nodes means during `apply`, we naturally connect the uses after the loop to their inputs from the loop (which may be new nodes after the transformation). > - `_mem_ref_for_main_loop_alignment` -> `_vpointer_for_main_loop_alignment`. Instead of tracking the memory node to later have access to its `VPointer`, we take it directly. That removes one more use of `_nodes` for vector vtnodes. > > I also made a lot of annotations in the code below, for easier review. > > **Suggested order for review** > - Removal of `VTransformGraph::apply_memops_reordering_with_schedule` -> sets up need to build memory graph on the fly. > - Old and new code for `VLoopMemorySlices` -> we now also track load-only slices. > - `build_scalar_vtnodes_for_non_packed_nodes`, `build_inputs_for_scalar_vtnodes`, `build_uses_after_loop`, `apply_vtn_inputs_to_node` (use in `apply`), `apply_backedge`, `fix_memory_state_uses_after_loop` > - `VTransformApplyState`: how it now tracks the memory state. > - `VTransformVectorNode` -> removal of `_nodes` (Big Win!) > - Then look at all the other details. This pull request has now been integrated. Changeset: 2ac24bf1 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/2ac24bf1bac9c32704ebd72b93a75819b9404063 Stats: 690 lines in 10 files changed: 364 ins; 243 del; 83 mod 8367389: C2 SuperWord: refactor VTransform to model the whole loop instead of just the basic block Reviewed-by: roland, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/27208 From rrich at openjdk.org Wed Oct 8 06:08:51 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 8 Oct 2025 06:08:51 GMT Subject: RFR: 8369257: PPC: compiler/whitebox/RelocateNMethodMultiplePaths.java fails with assertion Message-ID: Relax assertion in NativePostCallNop::patch(). 
With NMethodRelocation (see [JDK-8369257](https://bugs.openjdk.org/browse/JDK-8369257)) it cannot be expected that the post call nop to be patched is still clean. compiler/whitebox/RelocateNMethodMultiplePaths.java succeeds in local testing on PPC. ------------- Commit messages: - Expect already patched NativePostCallNop with NMethodRelocation Changes: https://git.openjdk.org/jdk/pull/27669/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27669&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369257 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27669/head:pull/27669 PR: https://git.openjdk.org/jdk/pull/27669 From dfenacci at openjdk.org Wed Oct 8 06:53:06 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 8 Oct 2025 06:53:06 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 11:03:13 GMT, Christian Hagedorn wrote: > The test ` testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java` intermittently timed out in our CI. On my local machine, I measured ~80s which is quite long given that we actually only want to test that Cartesian product for scenarios work. > > #### Reduce Execution Time by not Executing the Scenarios > I had a closer look at the test to try to cut the execution time down. Currently, we are executing many IR framework runs with different Cartesian products for scenarios. Afterwards, we compare the output of IR matching to check if the computation of the Cartesian products were correct. However, since we are only interested in verifying that the Cartesian product computation for scenarios works (i.e. `addCrossProductScenarios()`), we could skip the actual execution of the scenarios itself - we can trust that the IR framework is already tested well enough. > > To achieve that, we can use reflection to get the added scenarios to the IR framework (I don't want to add a public accessor because a user should not need access to them) and then fetch the corresponding scenario flags and compare against our expectation. That's what I propose with this change. > > #### Changes > - Verification without actually running scenarios. > - Added a test passing 3 sets to `addCrossProductScenarios()` which was missing before. > - Improved `addScenarios()` where we added a scenario to the list even though it already existed. That normally does not matter because we are throwing a `TestFormatException` anyway afterwards. But it messes with the test: We are adding the duplicated scenario and then read it again in the verification part of the test. > - Refactored the test a little more. > - Refactored some small things in `addCrossProductScenarios()` while looking at it. > - Added a sentence about passing a single set to `addCrossProductScenarios()` which was not evidently clear what is happening when looking at the method comment. > > #### Execution Time Comparison > Measured on my local machine: > - Mainline: ~80s > - With patch: ~2-3s > > Thanks, > Christian Cool trick with reflection ? Thanks @chhagedorn! LGTM Just a quick question to double-check: did you run some testing (just because I couldn't find it mentioned anywhere and I noticed that IR test framework scenarios are used a bit all around)? ------------- Marked as reviewed by dfenacci (Committer). 
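Regarding the reflection-based verification Damon asks about in the review above: the mechanism is ordinary `java.lang.reflect` access to a private field. A generic sketch follows, with deliberately made-up class and field names (the real IR framework internals are not shown here):

```
// Illustrative helper: read a private List field without adding a public accessor.
import java.lang.reflect.Field;
import java.util.List;

final class PrivateListReader {
    @SuppressWarnings("unchecked")
    static List<Object> read(Object holder, String fieldName) throws ReflectiveOperationException {
        Field f = holder.getClass().getDeclaredField(fieldName);
        f.setAccessible(true);              // open the private field for the test only
        return (List<Object>) f.get(holder);
    }
}
```

A test can then compare the returned list (for example, scenario flag sets) against its expected Cartesian product without executing the scenarios themselves.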
PR Review: https://git.openjdk.org/jdk/pull/27672#pullrequestreview-3313239251 From chagedorn at openjdk.org Wed Oct 8 07:06:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 8 Oct 2025 07:06:06 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out In-Reply-To: References: Message-ID: <8KnBBsUhA_PTstxGtEef7Qdfz_-wZwbhY9AQlrS-UJA=.8983895c-01dd-4600-86c0-bc2ca64de8bc@github.com> On Tue, 7 Oct 2025 11:03:13 GMT, Christian Hagedorn wrote: > The test ` testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java` intermittently timed out in our CI. On my local machine, I measured ~80s which is quite long given that we actually only want to test that Cartesian product for scenarios work. > > #### Reduce Execution Time by not Executing the Scenarios > I had a closer look at the test to try to cut the execution time down. Currently, we are executing many IR framework runs with different Cartesian products for scenarios. Afterwards, we compare the output of IR matching to check if the computation of the Cartesian products were correct. However, since we are only interested in verifying that the Cartesian product computation for scenarios works (i.e. `addCrossProductScenarios()`), we could skip the actual execution of the scenarios itself - we can trust that the IR framework is already tested well enough. > > To achieve that, we can use reflection to get the added scenarios to the IR framework (I don't want to add a public accessor because a user should not need access to them) and then fetch the corresponding scenario flags and compare against our expectation. That's what I propose with this change. > > #### Changes > - Verification without actually running scenarios. > - Added a test passing 3 sets to `addCrossProductScenarios()` which was missing before. > - Improved `addScenarios()` where we added a scenario to the list even though it already existed. That normally does not matter because we are throwing a `TestFormatException` anyway afterwards. But it messes with the test: We are adding the duplicated scenario and then read it again in the verification part of the test. > - Refactored the test a little more. > - Refactored some small things in `addCrossProductScenarios()` while looking at it. > - Added a sentence about passing a single set to `addCrossProductScenarios()` which was not evidently clear what is happening when looking at the method comment. > > #### Execution Time Comparison > Measured on my local machine: > - Mainline: ~80s > - With patch: ~2-3s > > Thanks, > Christian Thanks Damon for your review! Yes, I forgot to mention that: I ran the test through t1-5 + precheckin-comp + stress :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27672#issuecomment-3380043323 From dfenacci at openjdk.org Wed Oct 8 07:18:07 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 8 Oct 2025 07:18:07 GMT Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove [v4] In-Reply-To: References: Message-ID: On Tue, 9 Sep 2025 22:47:32 GMT, Dean Long wrote: > LGTM, but let's wait for @vnkozlov to approve it. @vnkozlov, would you mind having a look when you get a chance? Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26556#issuecomment-3380076340 From mdoerr at openjdk.org Wed Oct 8 08:32:24 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 8 Oct 2025 08:32:24 GMT Subject: RFR: 8369257: PPC: compiler/whitebox/RelocateNMethodMultiplePaths.java fails with assertion In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 09:21:16 GMT, Richard Reingruber wrote: > Relax assertion in NativePostCallNop::patch(). With NMethodRelocation (see [JDK-8369257](https://bugs.openjdk.org/browse/JDK-8369257)) it cannot be expected that the post call nop to be patched is still clean. > > compiler/whitebox/RelocateNMethodMultiplePaths.java succeeds in local testing on PPC. Looks good and trivial. Thanks for fixing it! ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27669#pullrequestreview-3313633114 From dlong at openjdk.org Wed Oct 8 08:46:05 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 8 Oct 2025 08:46:05 GMT Subject: RFR: 8286865: vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH/Test.java fails with Out of space in CodeCache [v2] In-Reply-To: References: Message-ID: On Tue, 19 Aug 2025 15:02:55 GMT, Ramkumar Sunderbabu wrote: >> MethodHandle invocations with Xcomp are filling up CodeCache quickly in the test, especially in machines with high number of processors. >> It is possible to measure code cache consumption per invocation, estimate overall consumption and bail out before CodeCache runs out of memory. >> But it is much simpler to exclude the test for Xcomp flag. >> >> Additional Change: MethodHandles.lookup was done unnecessarily invoked for all iterations. Replaced it with single invocation. >> >> PS: This issue is not seen in JDK 20 and above, possibly due to JDK-8290025, but the exclusion guards against vagaries of CodeCache management. > > Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comment I think I may have a fix for this issue. What do you think about adding this test to ProblemList-Xcomp.txt instead of changing the test? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26840#issuecomment-3380428782 From roland at openjdk.org Wed Oct 8 09:00:34 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 8 Oct 2025 09:00:34 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v2] In-Reply-To: References: Message-ID: <2cQ6gXCmY3uz_H3H_Sks0KZbQP4G5V3PRP8QSxiRG6g=.44a63dc4-1078-4762-bef5-ac3e4a995ca3@github.com> > In the `test1()` method of the test case: > > `inlined2()` calls `clone()` for an object loaded from field `field` > that has inexact type `A` at parse time. The intrinsic for `clone()` > inserts an `Allocate` and an `ArrayCopy` nodes. When igvn runs, the > load of `field` is optimized out because it reads back a newly > allocated `B` written to `field` in the same method. `ArrayCopy` can > now be optimized because the type of its `src` input is known. The > type of its `dest` input is the `CheckCastPP` from the allocation of > the cloned object created at parse time. That one has type `A`. A > series of `Load`s/`Store`s are created to copy the fields of class `B` > from `src` (of type `B`) to `dest` of (type `A`). 
> > Writting to `dest` with offsets for fields that don't exist in `A`, > causes this code in `Compile::flatten_alias_type()`: > > > } else if (offset < 0 || offset >= ik->layout_helper_size_in_bytes()) { > // Static fields are in the space above the normal instance > // fields in the java.lang.Class instance. > if (ik != ciEnv::current()->Class_klass()) { > to = nullptr; > tj = TypeOopPtr::BOTTOM; > offset = tj->offset(); > } > > > to assign it some slice that doesn't match the one that's used at the > same offset in `B`. > > That causes an assert in `ArrayCopyNode::try_clone_instance()` to > fire. With a release build, execution proceeds. `test1()` also has a > non escaping allocation. That one causes EA to run and > `ConnectionGraph::split_unique_types()` to move the store to the non > escaping allocation to a new slice. In the process, when it iterates > over `MergeMem` nodes, it notices the stores added by > `ArrayCopyNode::try_clone_instance()`, finds that some are not on the > right slice, tries to move them to the correct slice (expecting they > are from a non escaping EA). That causes some of the `Store`s to be > disconnected. When the resulting code runs, execution fails as some > fields are not copied. > > The fix I propose is to skip `ArrayCopyNode::try_clone_instance()` > when `src` and `dest` classes don't match as this seems like a rare > enough corner case. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - review - Merge branch 'master' into JDK-8339526 - Update src/hotspot/share/opto/arraycopynode.cpp Co-authored-by: Christian Hagedorn - test & fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27604/files - new: https://git.openjdk.org/jdk/pull/27604/files/cc69d43c..b6652e04 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27604&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27604&range=00-01 Stats: 184502 lines in 2410 files changed: 144763 ins; 24626 del; 15113 mod Patch: https://git.openjdk.org/jdk/pull/27604.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27604/head:pull/27604 PR: https://git.openjdk.org/jdk/pull/27604 From duke at openjdk.org Wed Oct 8 09:02:54 2025 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane Ampudia) Date: Wed, 8 Oct 2025 09:02:54 GMT Subject: Integrated: 8368780: IGV: Upgrade to Netbeans Platform 27 In-Reply-To: References: Message-ID: <9ffDlAkD9qPKsxkimPf_PXBHQ_RnA6ry6JPLhsYVob8=.c85e6c98-918c-496a-ae56-c27b3f68ec0e@github.com> On Tue, 30 Sep 2025 13:57:23 GMT, Ant?n Seoane Ampudia wrote: > This PR upgrades IGV and its dependencies to the newest Netbeans Platform 27, released on August 21, 2025. It also supports running the latest (LTS) JDK 25. > > It has been tested that IGV still behaves as expected after the upgrade. This pull request has now been integrated. 
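For readers skimming the clone()/ArrayCopy discussion above (8339526), here is a schematic Java shape of the scenario being described, with made-up class and field names rather than the actual regression test:

```
// Schematic only: 'field' is statically typed with the inexact superclass A,
// but at run time holds a B whose extra fields must also be copied by clone().
class A implements Cloneable {
    int a;
    @Override
    public Object clone() throws CloneNotSupportedException {
        return super.clone();
    }
}

class B extends A {
    int b1;
    int b2; // fields that do not exist in A
}

class CloneShape {
    static A field;

    static Object test() throws CloneNotSupportedException {
        field = new B();      // the later load of 'field' folds to this allocation
        return field.clone(); // at parse time only the inexact type A is known here
    }
}
```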
Changeset: f58e17fd Author: Ant?n Seoane Ampudia Committer: Roberto Casta?eda Lozano URL: https://git.openjdk.org/jdk/commit/f58e17fd27e868e4a8816befc4c4bb8946c1f7fd Stats: 11 lines in 3 files changed: 4 ins; 0 del; 7 mod 8368780: IGV: Upgrade to Netbeans Platform 27 Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/27579 From mdoerr at openjdk.org Wed Oct 8 10:44:10 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 8 Oct 2025 10:44:10 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods In-Reply-To: <8UzfbVuWg89u7-Ow-2ZJcSj4wIQGpvbqiz8MtqEPbu8=.2c0a6c7a-0286-4dae-b2e9-392720e5638c@github.com> References: <8UzfbVuWg89u7-Ow-2ZJcSj4wIQGpvbqiz8MtqEPbu8=.2c0a6c7a-0286-4dae-b2e9-392720e5638c@github.com> Message-ID: On Mon, 6 Oct 2025 13:18:01 GMT, Aleksey Shipilev wrote: >> Thanks for looking at this PR! >> Hotspot currently dumps code (hex or disassembled) when the nmethod is on stack of the crashing thread. That is completely missing when it's not on stack. >> >> Should we print both, hex dump and disassembly? >> >> Interesting. I haven't tried with ZGC. Did you find more relocations which don't point to an instruction start? >> We could ignore relocations with format `ZBarrierRelocationFormatStoreGoodAfterMov` on x86. (Or find the correct start in this case.) E.g. we could fix it like this: >> >> diff --git a/src/hotspot/share/code/codeBlob.cpp b/src/hotspot/share/code/codeBlob.cpp >> index 6511b4689ed..2e4d49a81c1 100644 >> --- a/src/hotspot/share/code/codeBlob.cpp >> +++ b/src/hotspot/share/code/codeBlob.cpp >> @@ -52,6 +52,9 @@ >> #ifdef COMPILER1 >> #include "c1/c1_Runtime1.hpp" >> #endif >> +#if defined(AMD64) && INCLUDE_ZGC >> +#include "gc/z/zBarrierSetAssembler.hpp" >> +#endif >> >> #include >> >> @@ -919,6 +922,10 @@ void CodeBlob::dump_for_addr(address addr, outputStream* st, bool verbose) const >> // disassemble correctly at instruction start addresses.) >> RelocIterator iter(nm, start); >> while (iter.next() && iter.addr() < addr) { // find relocation before addr >> +#if defined(AMD64) && INCLUDE_ZGC >> + // There's a relocation which doesn't point to an instruction start: >> + if ((iter.type() != relocInfo::barrier_type) || (iter.format() != ZBarrierRelocationFormatStoreGoodAfterMov)) >> +#endif >> start = iter.addr(); >> } >> if (iter.has_current()) { > >> Hotspot currently dumps code (hex or disassembled) when the nmethod is on stack of the crashing thread. That is completely missing when it's not on stack. [...] Should we print both, hex dump and disassembly? > > Yes, I think if we know the location is within nmethod, it makes sense to dump around the location. > > I think hex dump is most bullet-proof, as we can always disassemble offline it at different offsets. I don't think we want to specialize for reloc types, it does not gain us much? Also, relocs solve the variable-sized encoding only if you are lucky to hit the reloc right at the location you are decoding, right? Anything in between relocs is still pretty foggy. I suspect current patch would work in 99% of the cases, as it is hard to imagine e.g. the value in the register that points into nmethod and _does not_ have some sort of reloc. > > Then I also suspect that disassemblers actually able to figure the instruction boundaries pretty well? Because I don't quite see how our usual printout of `decode(pc - 64, pc + 64)` would otherwise work: `pc-64` starts at arbitrary boundary. 
You might want to check if this whole reloc thing is even needed. What happens if we just do `Disassembler::decode(MAX2(nm->entry_point(), addr - 64), MIN2(nm->code_end(), addr + 64))`? @shipilev: I have made some improvements after your feedback. Please take another look! Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27530#issuecomment-3380913610 From Fei.Gao2 at arm.com Wed Oct 8 10:44:06 2025 From: Fei.Gao2 at arm.com (Fei Gao) Date: Wed, 8 Oct 2025 10:44:06 +0000 Subject: Leverage profiled compiled size to avoid aggressive inlining and code growth In-Reply-To: <8e23a680-cc52-4b43-8620-f74b982f58a1@oracle.com> References: <8e23a680-cc52-4b43-8620-f74b982f58a1@oracle.com> Message-ID: Hi Vladimir, Thank you for your valuable feedback. It?s very helpful. I?d like to clarify the intended purpose of the AOT cache. Is it primarily designed to preserve consistent compilation behavior between the training and production runs to avoid redundant work? Or is there also a plan to further optimize the production run using the profiling information stored in the AOT cache? I?d appreciate your insights. Thanks! Best regards, Fei From: Vladimir Kozlov Date: Friday, 26 September 2025 at 16:40 To: Fei Gao , leyden-dev , hotspot-compiler-dev at openjdk.org Subject: Re: Leverage profiled compiled size to avoid aggressive inlining and code growth Hi Fei, I think you stumble on `InlineSmallCode` (1000 or 1500) issues. The flag is used exactly for filter out inlining of previously big compiled code. But it not always helps - for example, if the method is inlined some paths could be removed due to constants (exact klass) propagation or EA can eliminate some allocations. We know about such limitation and have numerous RFEs to improve it. On other hand, FreqInlineSize and MaxInlineSize flags are based on bytecode size of method. This are more stable. Note, AOT profiling caching also preserves inlining decisions for C2 which is used during JIT compilation in production run to reproduce compilation decisions in training run. We don't advise to use JSON. Please, store information in AOT cache instead. Regards, Vladimir K On 9/26/25 8:15 AM, Fei Gao wrote: > Post to hotspot-compiler-dev at openjdk.org dev at openjdk.org> instead of hotspot-compiler-dev at openjdk.java.net > . > > Sorry for the repetition. > > *From: *Fei Gao > *Date: *Friday, 26 September 2025 at 14:52 > *To: *leyden-dev , hotspot compiler compiler-dev at openjdk.java.net> > *Subject: *Leverage profiled compiled size to avoid aggressive inlining > and code growth > > Hi @leyden-dev and @hotspot compiler > , > > *TL;DR* > > ** > > *https://github.com/openjdk/jdk/pull/27527* jdk/pull/27527> > > I proposed a PoC that explores leveraging profiled compiled sizes to > improve C2 inlining decisions and mitigate code bloat. The approach > records method sizes during a pre-run and feeds them back via compiler > directives, helping to reduce aggressive inlining of large methods. > > Testing on Renaissance and SPECjbb2015 showed clear code size > differences but no significant performance impact on either AArch64 or > x86. An alternative AOT-cache-based approach was also evaluated but did > not produce meaningful code size changes. > > Open questions remain about the long-term value of profiling given > Project Leyden's direction of caching compiled code in AOT, and whether > global profiling information could help C2 make better inlining decisions. > > *1. 
Motivation* > > In the current C2 behavior, the inliner only considers the estimated > inlined size [1] [2] of a callee if the method has already been compiled > by C2. In particular, C2 will explicitly reject inlining in the > following cases: > > Hot methods with bytecode size > FreqInlineSize (325) [3] > > Cold methods with bytecode size > MaxInlineSize (35) > > However, a common situation arises where a method's bytecode size is > below 325, yet once compiled by C2 it produces a very large machine code > body. If this method has not been compiled at the time its caller is > being compiled, the inliner may aggressively inline it, potentially > bloating the caller, even though an independent compiled copy might > eventually exist. > > To mitigate such cases, we can make previously profiled compiled sizes > available early, allowing the inliner to make more informed decisions > and reduce excessive code growth. > > [1] https://github.com/openjdk/jdk/ > blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/ > bytecodeInfo.cpp#L180 blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/ > bytecodeInfo.cpp#L180> > > [2] https://github.com/openjdk/jdk/ > blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/ > bytecodeInfo.cpp#L274 blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/ > bytecodeInfo.cpp#L274> > > [3] https://github.com/openjdk/jdk/ > blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/ > bytecodeInfo.cpp#L184 blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/ > bytecodeInfo.cpp#L184> > > *2. Proof of Concept* > > To validate this idea, I created a proof-of-concept: *https:// > github.com/openjdk/jdk/pull/27527* pull/27527> > > In this PoC: > > 1) A dumping interface was added to record C2-compiled method sizes, > enabled via the `-XX:+PrintOptoMethodSize` flag. > > 2) A new attribute was introduced in InlineMatcher: > _inline_instructions_size. This attribute stores the estimated inlined > size of a method, sourced from a compiler directive JSON file generated > during a prior profiling run. > > 3) The inliner was updated to use these previously profiled method sizes > to prevent aggressive inlining of large methods. > > *3. How to Use* > > To apply this approach to any workload, the workload must be run twice: > > 1) Pre-run: collect inlined sizes for all C2-compiled methods. > > 2) Product run: use the profiled method sizes to improve C2 inlining. > > Step 1 Profile method size (pre-run) > > Log the compiled method size: > > `-XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput -XX: > +PrintOptoMethodSize -XX:LogFile=logmethodsize.out` This will generate a > log containing method size information from C2. > > Step 2 Generate the compiler directive file > > Use the provided Python script to extract method size info and generate > a JSON file: > > `python3 extract_size_to_directives.py logmethodsize.out > output_directives.json` > > This file contains estimated inlined sizes to guide inlining decisions > in product run. If the same method is compiled multiple times, the > script conservatively retains the smallest observed size. > > Note: Methods that are not accepted by the CompilerDirective format need > to be excluded. 
> > Step 3 Use the compiler directive in a product run > > Pass the generated JSON to the JVM as a directive: > > `-XX:+UnlockDiagnosticVMOptions - > XX:CompilerDirectivesFile=output_directives.json` > > This enables the inliner to make decisions using previously profiled > method sizes, potentially avoiding aggressive inlining of large methods. > > Note: The patch reuses the existing `inline` directive attribute for > inlining control. If multiple inline rules match the same method, only > the first match is effective. > > *4. Testing* > > I tested the following workloads using the method above and measured the > code cache size with `-XX:+PrintCodeCache`. The results are shown below, > compared against the mainline code. All statistics (min, max, median, > mean) are based on three runs. > > (patch - mainline) / mainline > > 1) Renaissance.dotty > > Code size change: > > AArch64: > > ``` > > used min max median mean > > non-profiled -9.88% -8.13% -8.92% -8.98% > > profiled -0.73% -0.21% -0.40% -0.45% > > non-nmethods -15.20% -0.02% -14.92% -10.32% > > codecache -2.82% -2.88% -2.97% -2.89% > > max_used min max median mean > > non-profiled -9.88% -8.13% -8.92% -8.98% > > profiled 2.37% 1.41% 1.50% 1.76% > > non-nmethods -0.95% -1.73% -0.93% -1.21% > > codecache -0.35% -1.00% -0.95% -0.77% > > ``` > > X86: > > ``` > > used min max median mean > > non-profiled -9.72% -9.61% -9.36% -9.56% > > profiled -0.81% -0.90% -1.15% -0.95% > > non-nmethods -0.04% 0.04% -0.02% -0.01% > > codecache -2.94% -2.96% -3.11% -3.00% > > max_used min max median mean > > non-profiled -9.72% -9.61% -9.36% -9.56% > > profiled 2.32% 2.60% 2.51% 2.48% > > non-nmethods -0.63% -2.25% -1.28% -1.39% > > codecache -0.68% -0.59% -0.70% -0.66% > > ``` > > No significant performance changes were observed on either platform. > > 2) SPECjbb 2015 > > Code size change: > > AArch64: > > ``` > > used min max median mean > > non-profiled -1.00% -11.68% -12.73% -8.62% > > profiled 9.07% -6.93% -2.34% -0.29% > > non-nmethods 0.02% -0.02% 0.00% 0.00% > > codecache 2.98% -7.18% -5.35% -3.28% > > max_used min max median mean > > non-profiled -10.85% -11.68% -12.73% -11.76% > > profiled -2.09% -11.65% -1.26% -5.62% > > non-nmethods 0.13% -1.21% -0.16% -0.41% > > codecache -6.42% -6.33% -6.10% -6.29% > > ``` > > On the AArch64 platform, no significant performance changes were > observed for either high-bound IR or max jOPS. > > For critical jOPS: > > ``` > > Min Median Mean Max Var% > > -2.45% -1.87% -2.45% -3.00% 1.9% > > ``` > > X86: > > ``` > > used min max median mean > > non-profiled -9.02% -9.65% -7.93% -8.87% > > profiled -6.09% -3.18% -4.52% -4.61% > > non-nmethods -0.02% 0.25% 0.04% 0.09% > > codecache -5.36% -4.75% -4.58% -4.90% > > max_used min max median mean > > non-profiled -4.03% -9.65% -7.93% -7.23% > > profiled -2.86% 1.16% -1.03% -0.93% > > non-nmethods 0.02% -0.08% 0.08% 0.01% > > codecache -0.23% -4.20% -3.70% -2.73% > > ``` > > No significant performance change was observed on x86 platform. > > *5. AOT cache* > > The current procedure above requires three steps: > > a pre-run to record method sizes, > > a separate step to process the JSON file, > > and finally a product run using the profiled method sizes. > > This workflow may add extra burden to workload deployment. > > With JEP 515 [4], we can instead store the estimated inlined size in the > AOT cache when ciMethod::inline_instructions_size() is called during the > premain run, and later load this size from the AOT cache during the > product run [5]. 
> > The store-load mechanism for inlined size can help reduce the overhead > of recomputing actual sizes, but it does not provide the inliner with > much additional information about the callee, since the compilation > order in the product run generally follows that of the premain run, even > if not exactly. > > To give the inliner more profiled information about callees, I tried > another simple draft that records inlined sizes for more C2-compiled > methods: > > https://github.com/openjdk/jdk/pull/27519/commits/ > ef5e61f3d68ad565ee11e2cc6aa57b6e2697ae6d jdk/pull/27519/commits/ef5e61f3d68ad565ee11e2cc6aa57b6e2697ae6d> > > However, with this draft using the AOT cache, I did not observe any > significant code size changes for any workloads. This may require > further investigation. > > [4] https://openjdk.org/jeps/515 > > [5] https://github.com/openjdk/jdk/ > blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/ci/ > ciMethod.cpp#L1152 blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/ci/ > ciMethod.cpp#L1152> > > *6 Questions* > > 1) Relation to Project Leyden > > Project Leyden aims to enhance the AOT cache to store compiled code from > training runs [6]. This suggests that we may eventually prefer to cache > compiled code directly from the AOT cache rather than rely solely on JIT > compilation. > > Given this direction, is it still worthwhile to invest further in using > profiled method sizes as a means to improve inlining heuristics? > > Could such profiling provide complementary benefits even if compiled > code is cached? > > 2) Global profiling information for C2 > > Should we consider leveraging profiled information stored in the AOT > cache to give the C2 inliner a broader, more global view of methods, > enabling better inlining decisions? > > For example, could global visibility into method sizes and call sites > help address pathological cases of code bloat or missed optimization > opportunities? [7] > > [6] https://openjdk.org/jeps/8335368 > > [7] https://wiki.openjdk.org/display/hotspot/inlining wiki.openjdk.org/display/hotspot/inlining> > > I'd greatly appreciate any feedback. Thank you for your time and > consideration. > > Thanks, > > Fei > > IMPORTANT NOTICE: The contents of this email and any attachments are > confidential and may also be privileged. If you are not the intended > recipient, please notify the sender immediately and do not disclose the > contents to any other person, use it for any purpose, or store or copy > the information in any medium. Thank you. IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From duke at openjdk.org Wed Oct 8 10:57:34 2025 From: duke at openjdk.org (Lei Zhu) Date: Wed, 8 Oct 2025 10:57:34 GMT Subject: RFR: 8364346: Typo in IR framework README Message-ID: Fix typo errors. 
------------- Commit messages: - 8364346: Typo in IR framework README Changes: https://git.openjdk.org/jdk/pull/27688/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27688&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8364346 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27688.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27688/head:pull/27688 PR: https://git.openjdk.org/jdk/pull/27688 From thartmann at openjdk.org Wed Oct 8 11:02:02 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Oct 2025 11:02:02 GMT Subject: RFR: 8364346: Typo in IR framework README In-Reply-To: References: Message-ID: On Wed, 8 Oct 2025 10:50:43 GMT, Lei Zhu wrote: > Fix typo errors. As the bug report describes, the duplicated `-DExcludeRandom=true` in the README should be removed as well. ------------- Changes requested by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27688#pullrequestreview-3314273531 From chagedorn at openjdk.org Wed Oct 8 11:37:20 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 8 Oct 2025 11:37:20 GMT Subject: RFR: 8369423: Reduce execution time of testlibrary_tests/ir_framework/tests/TestDFlags.java Message-ID: The test `testlibrary_tests/ir_framework/tests/TestDFlags.java` runs 11 separate IR framework runs by enabling only a single property/D flag in each run. This seems like a waste of resources for just a sanity run (we don't do any additional verification). Moreover, some newer property flags are missing. I suggest to cut it down to 2 runs: - IR verification enabled (i.e. -DVerifyIR=true, which is the default) - IR verification disabled (i.e. -DVerifyIR=false) In both runs we simultaneously set all property flags to some non-default value as a sanity test. This reduces the test execution time from around 20-30s down to 3-4s. Thanks, Christian ------------- Commit messages: - 8369423: Reduce execution time of testlibrary_tests/ir_framework/tests/TestDFlags.java Changes: https://git.openjdk.org/jdk/pull/27690/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27690&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369423 Stats: 20 lines in 2 files changed: 7 ins; 3 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/27690.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27690/head:pull/27690 PR: https://git.openjdk.org/jdk/pull/27690 From chagedorn at openjdk.org Wed Oct 8 11:42:14 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 8 Oct 2025 11:42:14 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v2] In-Reply-To: <2cQ6gXCmY3uz_H3H_Sks0KZbQP4G5V3PRP8QSxiRG6g=.44a63dc4-1078-4762-bef5-ac3e4a995ca3@github.com> References: <2cQ6gXCmY3uz_H3H_Sks0KZbQP4G5V3PRP8QSxiRG6g=.44a63dc4-1078-4762-bef5-ac3e4a995ca3@github.com> Message-ID: On Wed, 8 Oct 2025 09:00:34 GMT, Roland Westrelin wrote: >> In the `test1()` method of the test case: >> >> `inlined2()` calls `clone()` for an object loaded from field `field` >> that has inexact type `A` at parse time. The intrinsic for `clone()` >> inserts an `Allocate` and an `ArrayCopy` nodes. When igvn runs, the >> load of `field` is optimized out because it reads back a newly >> allocated `B` written to `field` in the same method. `ArrayCopy` can >> now be optimized because the type of its `src` input is known. The >> type of its `dest` input is the `CheckCastPP` from the allocation of >> the cloned object created at parse time. 
That one has type `A`. A >> series of `Load`s/`Store`s are created to copy the fields of class `B` >> from `src` (of type `B`) to `dest` of (type `A`). >> >> Writting to `dest` with offsets for fields that don't exist in `A`, >> causes this code in `Compile::flatten_alias_type()`: >> >> >> } else if (offset < 0 || offset >= ik->layout_helper_size_in_bytes()) { >> // Static fields are in the space above the normal instance >> // fields in the java.lang.Class instance. >> if (ik != ciEnv::current()->Class_klass()) { >> to = nullptr; >> tj = TypeOopPtr::BOTTOM; >> offset = tj->offset(); >> } >> >> >> to assign it some slice that doesn't match the one that's used at the >> same offset in `B`. >> >> That causes an assert in `ArrayCopyNode::try_clone_instance()` to >> fire. With a release build, execution proceeds. `test1()` also has a >> non escaping allocation. That one causes EA to run and >> `ConnectionGraph::split_unique_types()` to move the store to the non >> escaping allocation to a new slice. In the process, when it iterates >> over `MergeMem` nodes, it notices the stores added by >> `ArrayCopyNode::try_clone_instance()`, finds that some are not on the >> right slice, tries to move them to the correct slice (expecting they >> are from a non escaping EA). That causes some of the `Store`s to be >> disconnected. When the resulting code runs, execution fails as some >> fields are not copied. >> >> The fix I propose is to skip `ArrayCopyNode::try_clone_instance()` >> when `src` and `dest` classes don't match as this seems like a rare >> enough corner case. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8339526 > - Update src/hotspot/share/opto/arraycopynode.cpp > > Co-authored-by: Christian Hagedorn > - test & fix Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27604#pullrequestreview-3314416156 From roland at openjdk.org Wed Oct 8 11:55:06 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 8 Oct 2025 11:55:06 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v6] In-Reply-To: References: Message-ID: On Fri, 3 Oct 2025 16:05:52 GMT, Quan Anh Mai wrote: >> Hi, >> >> The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. `ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`. This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix test options Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25284#pullrequestreview-3314458185 From duke at openjdk.org Wed Oct 8 11:58:25 2025 From: duke at openjdk.org (Lei Zhu) Date: Wed, 8 Oct 2025 11:58:25 GMT Subject: RFR: 8364346: Typo in IR framework README [v2] In-Reply-To: References: Message-ID: > Fix typo errors. Lei Zhu has updated the pull request incrementally with one additional commit since the last revision: 8364346: Typo in IR framework README ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27688/files - new: https://git.openjdk.org/jdk/pull/27688/files/89e44f97..8272a95a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27688&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27688&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27688.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27688/head:pull/27688 PR: https://git.openjdk.org/jdk/pull/27688 From duke at openjdk.org Wed Oct 8 12:02:26 2025 From: duke at openjdk.org (Lei Zhu) Date: Wed, 8 Oct 2025 12:02:26 GMT Subject: RFR: 8364346: Typo in IR framework README [v2] In-Reply-To: References: Message-ID: On Wed, 8 Oct 2025 10:59:44 GMT, Tobias Hartmann wrote: > As the bug report describes, the duplicated `-DExcludeRandom=true` in the README should be removed as well. I deleted the second duplicate `-DExcludeRandom=true`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27688#issuecomment-3381182783 From thartmann at openjdk.org Wed Oct 8 12:42:27 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Oct 2025 12:42:27 GMT Subject: RFR: 8364346: Typo in IR framework README [v2] In-Reply-To: References: Message-ID: On Wed, 8 Oct 2025 11:58:25 GMT, Lei Zhu wrote: >> Fix typo errors. > > Lei Zhu has updated the pull request incrementally with one additional commit since the last revision: > > 8364346: Typo in IR framework README Looks good to me. Thanks for fixing this. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27688#pullrequestreview-3314628225 From fandreuzzi at openjdk.org Wed Oct 8 12:45:34 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Wed, 8 Oct 2025 12:45:34 GMT Subject: RFR: 8364346: Typo in IR framework README [v2] In-Reply-To: References: Message-ID: <2slUWaBloqGQoZRuaHhdOtWBdQnRf01BXsZr9-nQ_Dc=.b58e0ca4-a3a5-4d5e-a796-905587766871@github.com> On Wed, 8 Oct 2025 11:58:25 GMT, Lei Zhu wrote: >> Fix typo errors. > > Lei Zhu has updated the pull request incrementally with one additional commit since the last revision: > > 8364346: Typo in IR framework README Marked as reviewed by fandreuzzi (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/27688#pullrequestreview-3314639116 From thartmann at openjdk.org Wed Oct 8 13:18:48 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Oct 2025 13:18:48 GMT Subject: RFR: 8369423: Reduce execution time of testlibrary_tests/ir_framework/tests/TestDFlags.java In-Reply-To: References: Message-ID: On Wed, 8 Oct 2025 11:30:43 GMT, Christian Hagedorn wrote: > The test `testlibrary_tests/ir_framework/tests/TestDFlags.java` runs 11 separate IR framework runs by enabling only a single property/D flag in each run. This seems like a waste of resources for just a sanity run (we don't do any additional verification). Moreover, some newer property flags are missing. > > I suggest to cut it down to 2 runs: > - IR verification enabled (i.e. 
-DVerifyIR=true, which is the default) > - IR verification disabled (i.e. -DVerifyIR=false) > > In both runs we simultaneously set all property flags to some non-default value as a sanity test. > > This reduces the test execution time from around 20-30s down to 3-4s. > > Thanks, > Christian Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27690#pullrequestreview-3314776568 From chagedorn at openjdk.org Wed Oct 8 13:42:09 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 8 Oct 2025 13:42:09 GMT Subject: RFR: 8369423: Reduce execution time of testlibrary_tests/ir_framework/tests/TestDFlags.java In-Reply-To: References: Message-ID: On Wed, 8 Oct 2025 11:30:43 GMT, Christian Hagedorn wrote: > The test `testlibrary_tests/ir_framework/tests/TestDFlags.java` runs 11 separate IR framework runs by enabling only a single property/D flag in each run. This seems like a waste of resources for just a sanity run (we don't do any additional verification). Moreover, some newer property flags are missing. > > I suggest to cut it down to 2 runs: > - IR verification enabled (i.e. -DVerifyIR=true, which is the default) > - IR verification disabled (i.e. -DVerifyIR=false) > > In both runs we simultaneously set all property flags to some non-default value as a sanity test. > > This reduces the test execution time from around 20-30s down to 3-4s. > > Thanks, > Christian Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27690#issuecomment-3381598628 From duke at openjdk.org Wed Oct 8 14:36:31 2025 From: duke at openjdk.org (Lei Zhu) Date: Wed, 8 Oct 2025 14:36:31 GMT Subject: RFR: 8364346: Typo in IR framework README [v3] In-Reply-To: References: Message-ID: > Fix typo errors. Lei Zhu has updated the pull request incrementally with one additional commit since the last revision: Update full name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27688/files - new: https://git.openjdk.org/jdk/pull/27688/files/8272a95a..94e6b893 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27688&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27688&range=01-02 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27688.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27688/head:pull/27688 PR: https://git.openjdk.org/jdk/pull/27688 From duke at openjdk.org Wed Oct 8 14:36:34 2025 From: duke at openjdk.org (duke) Date: Wed, 8 Oct 2025 14:36:34 GMT Subject: RFR: 8364346: Typo in IR framework README [v2] In-Reply-To: References: Message-ID: On Wed, 8 Oct 2025 11:58:25 GMT, Lei Zhu wrote: >> Fix typo errors. > > Lei Zhu has updated the pull request incrementally with one additional commit since the last revision: > > 8364346: Typo in IR framework README @Korov Your change (at version 94e6b8933671e924eaf24e8e4a1d2774c80d6284) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27688#issuecomment-3381839838 From dfenacci at openjdk.org Wed Oct 8 15:21:56 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 8 Oct 2025 15:21:56 GMT Subject: RFR: 8369423: Reduce execution time of testlibrary_tests/ir_framework/tests/TestDFlags.java In-Reply-To: References: Message-ID: On Wed, 8 Oct 2025 11:30:43 GMT, Christian Hagedorn wrote: > The test `testlibrary_tests/ir_framework/tests/TestDFlags.java` runs 11 separate IR framework runs by enabling only a single property/D flag in each run. This seems like a waste of resources for just a sanity run (we don't do any additional verification). Moreover, some newer property flags are missing. > > I suggest to cut it down to 2 runs: > - IR verification enabled (i.e. -DVerifyIR=true, which is the default) > - IR verification disabled (i.e. -DVerifyIR=false) > > In both runs we simultaneously set all property flags to some non-default value as a sanity test. > > This reduces the test execution time from around 20-30s down to 3-4s. > > Thanks, > Christian LGTM otherwise. Thanks @chhagedorn! test/hotspot/jtreg/compiler/lib/ir_framework/README.md line 178: > 176: - `-DWarmup=200`: Provide a new default value of the number of warm-up iterations (framework default is 2000). This might have an influence on the resulting IR and could lead to matching failures (the user can also set a fixed default warm-up value in a test with `testFrameworkObject.setDefaultWarmup(200)`). > 177: - `-DReportStdout=true`: Print the standard output of the test VM. > 178: - `-DVerbose=true`: Enable more fain-grained logging (slows the execution down). Suggestion: - `-DVerbose=true`: Enable more fine-grained logging (slows the execution down). it just caught my eye ? test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestDFlags.java line 23: > 21: * questions. > 22: */ > 23: I just noticed the copyright year ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/27690#pullrequestreview-3315265256 PR Review Comment: https://git.openjdk.org/jdk/pull/27690#discussion_r2414163890 PR Review Comment: https://git.openjdk.org/jdk/pull/27690#discussion_r2414195124 From kvn at openjdk.org Wed Oct 8 15:56:27 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 8 Oct 2025 15:56:27 GMT Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove [v4] In-Reply-To: References: Message-ID: On Tue, 9 Sep 2025 15:37:48 GMT, Damon Fenacci wrote: >> # Issue >> While compiling `java.util.zip.ZipFile` in C2 this assert is triggered >> https://github.com/openjdk/jdk/blob/a2e86ff3c56209a14c6e9730781eecd12c81d170/src/hotspot/share/opto/memnode.cpp#L4235 >> >> # Cause >> While compiling the constructor of java.util.zip.ZipFile$CleanableResource the following happens: >> * we insert a trailing `MemBarStoreStore` in the constructor >> before_folding >> >> * during IGVN we completely fold the memory subtree of the `MemBarStoreStore` node. The node still has a control output attached. 
>> after_folding
>>
>> * later during the same IGVN run the `MemBarStoreStore` node is handled and we try to remove it (because the `Allocate` node of the `MemBar` is not escaping the thread) https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4301-L4302
>> * the assert https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4235
>> triggers because the barrier has only 1 (control) output and is a `MemBarStoreStore` (not `Initialize`) barrier
>>
>> The issue happens only when `UseStoreStoreForCtor` is set (the default), which makes C2 use `MemBarStoreStore` instead of `MemBarRelease` at the end of constructors. `MemBarStoreStore` nodes are processed separately by EA, and this happens after the IGVN pass that folds the memory subtree. `MemBarRelease` nodes, on the other hand, are handled during the same IGVN pass, before the memory subtree gets removed, so the barrier still has 2 outputs (assert skipped).
>>
>> # Fix
>> Adapting the assert to accept that `MemBarStoreStore` can also have `!= 2` outputs (when `+UseStoreStoreForCtor` is used) seems to be an OK solution, as this seems like a perfectly plausible situation.
>>
>> # Testing
>> Unfortunately, reproducing the issue with a simple regression test has proven very hard. The test seems to rely on a very peculiar profiling and IGVN worklist sequence. JBS replay compilation passes. Running JCK's `api/java_util` 100 times triggers the assert a couple of times on average before the fix, none after.
>> Tier 1-3+ tests passed.
>
> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:
>
>   JDK-8360031: add assert condition and make consume method argument escape

Yes, looks good. Thank you @dean-long for the suggestions and @dafedafe for the implementation. I agree with this.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26556#pullrequestreview-3315520605

From rrich at openjdk.org  Wed Oct  8 16:00:29 2025
From: rrich at openjdk.org (Richard Reingruber)
Date: Wed, 8 Oct 2025 16:00:29 GMT
Subject: RFR: 8369257: PPC: compiler/whitebox/RelocateNMethodMultiplePaths.java fails with assertion
In-Reply-To:
References:
Message-ID:

On Tue, 7 Oct 2025 09:21:16 GMT, Richard Reingruber wrote:

> Relax assertion in NativePostCallNop::patch(). With NMethodRelocation (see [JDK-8369257](https://bugs.openjdk.org/browse/JDK-8369257)) it cannot be expected that the post call nop to be patched is still clean.
>
> compiler/whitebox/RelocateNMethodMultiplePaths.java succeeds in local testing on PPC.

Thanks for the review, Martin.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27669#issuecomment-3382224312

From rrich at openjdk.org  Wed Oct  8 16:00:30 2025
From: rrich at openjdk.org (Richard Reingruber)
Date: Wed, 8 Oct 2025 16:00:30 GMT
Subject: Integrated: 8369257: PPC: compiler/whitebox/RelocateNMethodMultiplePaths.java fails with assertion
In-Reply-To:
References:
Message-ID:

On Tue, 7 Oct 2025 09:21:16 GMT, Richard Reingruber wrote:

> Relax assertion in NativePostCallNop::patch(). With NMethodRelocation (see [JDK-8369257](https://bugs.openjdk.org/browse/JDK-8369257)) it cannot be expected that the post call nop to be patched is still clean.
>
> compiler/whitebox/RelocateNMethodMultiplePaths.java succeeds in local testing on PPC.

This pull request has now been integrated.
Changeset: 79bcc7b8 Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/79bcc7b8ec577dad592dc3f575c15d1bdeb65b19 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8369257: PPC: compiler/whitebox/RelocateNMethodMultiplePaths.java fails with assertion Reviewed-by: mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/27669 From vladimir.kozlov at oracle.com Wed Oct 8 16:13:51 2025 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 8 Oct 2025 09:13:51 -0700 Subject: [External] : Re: Leverage profiled compiled size to avoid aggressive inlining and code growth In-Reply-To: References: <8e23a680-cc52-4b43-8620-f74b982f58a1@oracle.com> Message-ID: <7942a028-c0c1-4dfb-918e-740a3c672ac7@oracle.com> Hi Fei, There are several purposes AOT "profiling" cache has. 1. Skip profiling in Interpreter and tier3 c1 to improve startup. The profiling drives compilation decisions during production run. 2. Record used classes by compiled code during training run (dependencies) to trigger compilation of the method in production run when all recorded classes are initialized. It is also used to load AOT compiled code if it is available (currently only in Leyden repo) 3. Record inlining decision to reproduce quality of compiled code during production run. And may be something else I am missing. Vladimir K On 10/8/25 3:44 AM, Fei Gao wrote: > Hi Vladimir, > > Thank you for your valuable feedback. It?s very helpful. > > I?d like to clarify the intended purpose of the AOT cache. Is it > primarily designed to preserve consistent compilation behavior between > the training and production runs to avoid redundant work? Or is there > also a plan to further optimize the production run using the profiling > information stored in the AOT cache? > > I?d appreciate your insights. Thanks! > > Best regards, > > Fei > > *From: *Vladimir Kozlov > *Date: *Friday, 26 September 2025 at 16:40 > *To: *Fei Gao , leyden-dev , > hotspot-compiler-dev at openjdk.org > *Subject: *Re: Leverage profiled compiled size to avoid aggressive > inlining and code growth > > Hi Fei, > > I think you stumble on `InlineSmallCode` (1000 or 1500) issues. The flag > is used exactly for filter out inlining of previously big compiled code. > But it not always helps - for example, if the method is inlined some > paths could be removed due to constants (exact klass) propagation or EA > can eliminate some allocations. We know about such limitation and have > numerous RFEs to improve it. > > On other hand, FreqInlineSize and MaxInlineSize flags are based on > bytecode size of method. This are more stable. > > Note, AOT profiling caching also preserves inlining decisions for C2 > which is used during JIT compilation in production run to reproduce > compilation decisions in training run. > > We don't advise to use JSON. Please, store information in AOT cache instead. > > Regards, > Vladimir K > > On 9/26/25 8:15 AM, Fei Gao wrote: >> Post to hotspot-compiler-dev at openjdk.org > dev at openjdk.org> instead of hotspot-compiler-dev at openjdk.java.net >> dev at openjdk.java.net>>. >> >> Sorry for the repetition. 
>> >> *From: *Fei Gao >> *Date: *Friday, 26 September 2025 at 14:52 >> *To: *leyden-dev , hotspot compiler > compiler-dev at openjdk.java.net> >> *Subject: *Leverage profiled compiled size to avoid aggressive inlining >> and code growth >> >> Hi @leyden-dev >and > @hotspot compiler >> dev at openjdk.java.net>>, >> >> *TL;DR* >> >> ** >> >> *https://github.com/openjdk/jdk/pull/27527* __https://github.com/openjdk/jdk/pull/27527*__;Kg!!ACWV5N9M2RV99hQ! > M62nlXrSUxsVkJzzR2aM2_wFagwlqXpa9gcFibsMd9wEAXfsWQrVvTfl6bA8NSiIzH2U8SF6WdVmDn0E8aah$> >> jdk/pull/27527> >> >> I proposed a PoC that explores leveraging profiled compiled sizes to >> improve C2 inlining decisions and mitigate code bloat. The approach >> records method sizes during a pre-run and feeds them back via compiler >> directives, helping to reduce aggressive inlining of large methods. >> >> Testing on Renaissance and SPECjbb2015 showed clear code size >> differences but no significant performance impact on either AArch64 or >> x86. An alternative AOT-cache-based approach was also evaluated but did >> not produce meaningful code size changes. >> >> Open questions remain about the long-term value of profiling given >> Project Leyden's direction of caching compiled code in AOT, and whether >> global profiling information could help C2 make better inlining decisions. >> >> *1. Motivation* >> >> In the current C2 behavior, the inliner only considers the estimated >> inlined size [1] [2] of a callee if the method has already been compiled >> by C2. In particular, C2 will explicitly reject inlining in the >> following cases: >> >>? ??? Hot methods with bytecode size > FreqInlineSize (325) [3] >> >>? ??? Cold methods with bytecode size > MaxInlineSize (35) >> >> However, a common situation arises where a method's bytecode size is >> below 325, yet once compiled by C2 it produces a very large machine code >> body. If this method has not been compiled at the time its caller is >> being compiled, the inliner may aggressively inline it, potentially >> bloating the caller, even though an independent compiled copy might >> eventually exist. >> >> To mitigate such cases, we can make previously profiled compiled sizes >> available early, allowing the inliner to make more informed decisions >> and reduce excessive code growth. >> >> [1] https://github.com/openjdk/jdk/ github.com/openjdk/jdk/__;!!ACWV5N9M2RV99hQ! > M62nlXrSUxsVkJzzR2aM2_wFagwlqXpa9gcFibsMd9wEAXfsWQrVvTfl6bA8NSiIzH2U8SF6WdVmDlwlYXdp$> >> blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/ >> bytecodeInfo.cpp#L180 github.com/openjdk/jdk/__;!!ACWV5N9M2RV99hQ! > M62nlXrSUxsVkJzzR2aM2_wFagwlqXpa9gcFibsMd9wEAXfsWQrVvTfl6bA8NSiIzH2U8SF6WdVmDlwlYXdp$> >> blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/ >> bytecodeInfo.cpp#L180> >> >> [2] https://github.com/openjdk/jdk/ github.com/openjdk/jdk/__;!!ACWV5N9M2RV99hQ! > M62nlXrSUxsVkJzzR2aM2_wFagwlqXpa9gcFibsMd9wEAXfsWQrVvTfl6bA8NSiIzH2U8SF6WdVmDlwlYXdp$> >> blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/ >> bytecodeInfo.cpp#L274 github.com/openjdk/jdk/__;!!ACWV5N9M2RV99hQ! > M62nlXrSUxsVkJzzR2aM2_wFagwlqXpa9gcFibsMd9wEAXfsWQrVvTfl6bA8NSiIzH2U8SF6WdVmDlwlYXdp$> >> blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/ >> bytecodeInfo.cpp#L274> >> >> [3] https://github.com/openjdk/jdk/ github.com/openjdk/jdk/__;!!ACWV5N9M2RV99hQ! 
> M62nlXrSUxsVkJzzR2aM2_wFagwlqXpa9gcFibsMd9wEAXfsWQrVvTfl6bA8NSiIzH2U8SF6WdVmDlwlYXdp$> >> blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/ >> bytecodeInfo.cpp#L184 github.com/openjdk/jdk/__;!!ACWV5N9M2RV99hQ! > M62nlXrSUxsVkJzzR2aM2_wFagwlqXpa9gcFibsMd9wEAXfsWQrVvTfl6bA8NSiIzH2U8SF6WdVmDlwlYXdp$> >> blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/ >> bytecodeInfo.cpp#L184> >> >> *2. Proof of Concept* >> >> To validate this idea, I created a proof-of-concept: *https:// >> github.com/openjdk/jdk/pull/27527* github.com/openjdk/jdk/__;!!ACWV5N9M2RV99hQ! > M62nlXrSUxsVkJzzR2aM2_wFagwlqXpa9gcFibsMd9wEAXfsWQrVvTfl6bA8NSiIzH2U8SF6WdVmDlwlYXdp$> >> pull/27527> >> >> In this PoC: >> >> 1) A dumping interface was added to record C2-compiled method sizes, >> enabled via the `-XX:+PrintOptoMethodSize` flag. >> >> 2) A new attribute was introduced in InlineMatcher: >> _inline_instructions_size. This attribute stores the estimated inlined >> size of a method, sourced from a compiler directive JSON file generated >> during a prior profiling run. >> >> 3) The inliner was updated to use these previously profiled method sizes >> to prevent aggressive inlining of large methods. >> >> *3. How to Use* >> >> To apply this approach to any workload, the workload must be run twice: >> >> 1) Pre-run: collect inlined sizes for all C2-compiled methods. >> >> 2) Product run: use the profiled method sizes to improve C2 inlining. >> >> Step 1 Profile method size (pre-run) >> >> Log the compiled method size: >> >> `-XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput -XX: >> +PrintOptoMethodSize -XX:LogFile=logmethodsize.out` This will generate a >> log containing method size information from C2. >> >> Step 2 Generate the compiler directive file >> >> Use the provided Python script to extract method size info and generate >> a JSON file: >> >> `python3 extract_size_to_directives.py logmethodsize.out >> output_directives.json` >> >> This file contains estimated inlined sizes to guide inlining decisions >> in product run. If the same method is compiled multiple times, the >> script conservatively retains the smallest observed size. >> >> Note: Methods that are not accepted by the CompilerDirective format need >> to be excluded. >> >> Step 3 Use the compiler directive in a product run >> >> Pass the generated JSON to the JVM as a directive: >> >> `-XX:+UnlockDiagnosticVMOptions - >> XX:CompilerDirectivesFile=output_directives.json` >> >> This enables the inliner to make decisions using previously profiled >> method sizes, potentially avoiding aggressive inlining of large methods. >> >> Note: The patch reuses the existing `inline` directive attribute for >> inlining control. If multiple inline rules match the same method, only >> the first match is effective. >> >> *4. Testing* >> >> I tested the following workloads using the method above and measured the >> code cache size with `-XX:+PrintCodeCache`. The results are shown below, >> compared against the mainline code. All statistics (min, max, median, >> mean) are based on three runs. >> >> (patch - mainline) / mainline >> >> 1) Renaissance.dotty >> >> Code size change: >> >> AArch64: >> >> ``` >> >> used?????????? min????? max????? median?? mean >> >> non-profiled?? -9.88%?? -8.13%?? -8.92%?? -8.98% >> >> profiled?????? -0.73%?? -0.21%?? -0.40%?? -0.45% >> >> non-nmethods?? -15.20%? -0.02%?? -14.92%?? -10.32% >> >> codecache????? -2.82%?? -2.88%?? -2.97%?? -2.89% >> >> max_used?????? min????? max????? median?? mean >> >> non-profiled?? 
-9.88%?? -8.13%?? -8.92%?? -8.98% >> >> profiled?????? 2.37%??? 1.41%??? 1.50%??? 1.76% >> >> non-nmethods?? -0.95%?? -1.73%?? -0.93%?? -1.21% >> >> codecache????? -0.35%?? -1.00%?? -0.95%?? -0.77% >> >> ``` >> >> X86: >> >> ``` >> >> used??????????? min????? max????? median?? mean >> >> non-profiled??? -9.72%?? -9.61%?? -9.36%?? -9.56% >> >> profiled??????? -0.81%?? -0.90%?? -1.15%?? -0.95% >> >> non-nmethods??? -0.04%?? 0.04%??? -0.02%?? -0.01% >> >> codecache?????? -2.94%?? -2.96%?? -3.11%?? -3.00% >> >> max_used??????? min????? max????? median?? mean >> >> non-profiled??? -9.72%?? -9.61%?? -9.36%?? -9.56% >> >> profiled??????? 2.32%??? 2.60%??? 2.51%??? 2.48% >> >> non-nmethods??? -0.63%?? -2.25%?? -1.28%?? -1.39% >> >> codecache?????? -0.68%?? -0.59%?? -0.70%?? -0.66% >> >> ``` >> >> No significant performance changes were observed on either platform. >> >> 2) SPECjbb 2015 >> >> Code size change: >> >> AArch64: >> >> ``` >> >> used?????????? min????? max?????? median??? mean >> >> non-profiled?? -1.00%?? -11.68%?? -12.73%?? -8.62% >> >> profiled?????? 9.07%??? -6.93%??? -2.34%??? -0.29% >> >> non-nmethods?? 0.02%??? -0.02%??? 0.00%???? 0.00% >> >> codecache????? 2.98%??? -7.18%??? -5.35%??? -3.28% >> >> max_used?????? min????? max?????? median??? mean >> >> non-profiled?? -10.85%? -11.68%?? -12.73%?? -11.76% >> >> profiled?????? -2.09%?? -11.65%?? -1.26%?? -5.62% >> >> non-nmethods?? 0.13%??? -1.21%??? -0.16%?? -0.41% >> >> codecache????? -6.42%?? -6.33%??? -6.10%?? -6.29% >> >> ``` >> >> On the AArch64 platform, no significant performance changes were >> observed for either high-bound IR or max jOPS. >> >> For critical jOPS: >> >> ``` >> >> Min????? Median?? Mean???? Max???? Var% >> >> -2.45%?? -1.87%?? -2.45%?? -3.00%? 1.9% >> >> ``` >> >> X86: >> >> ``` >> >> used?????????? min????? max????? median?? mean >> >> non-profiled?? -9.02%?? -9.65%?? -7.93%?? -8.87% >> >> profiled?????? -6.09%?? -3.18%?? -4.52%?? -4.61% >> >> non-nmethods?? -0.02%?? 0.25%??? 0.04%??? 0.09% >> >> codecache????? -5.36%?? -4.75%?? -4.58%?? -4.90% >> >> max_used?????? min????? max????? median?? mean >> >> non-profiled?? -4.03%?? -9.65%?? -7.93%?? -7.23% >> >> profiled?????? -2.86%?? 1.16%??? -1.03%?? -0.93% >> >> non-nmethods?? 0.02%??? -0.08%?? 0.08%??? 0.01% >> >> codecache????? -0.23%?? -4.20%?? -3.70%?? -2.73% >> >> ``` >> >> No significant performance change was observed on x86 platform. >> >> *5. AOT cache* >> >> The current procedure above requires three steps: >> >> a pre-run to record method sizes, >> >> a separate step to process the JSON file, >> >> and finally a product run using the profiled method sizes. >> >> This workflow may add extra burden to workload deployment. >> >> With JEP 515 [4], we can instead store the estimated inlined size in the >> AOT cache when ciMethod::inline_instructions_size() is called during the >> premain run, and later load this size from the AOT cache during the >> product run [5]. >> >> The store-load mechanism for inlined size can help reduce the overhead >> of recomputing actual sizes, but it does not provide the inliner with >> much additional information about the callee, since the compilation >> order in the product run generally follows that of the premain run, even >> if not exactly. 
>> >> To give the inliner more profiled information about callees, I tried >> another simple draft that records inlined sizes for more C2-compiled >> methods: >> >> https://github.com/openjdk/jdk/pull/27519/commits/ urldefense.com/v3/__https://github.com/openjdk/jdk/pull/27519/commits/ > __;!!ACWV5N9M2RV99hQ! > M62nlXrSUxsVkJzzR2aM2_wFagwlqXpa9gcFibsMd9wEAXfsWQrVvTfl6bA8NSiIzH2U8SF6WdVmDhFN2bOd$> >> ef5e61f3d68ad565ee11e2cc6aa57b6e2697ae6d github.com/openjdk/__;!!ACWV5N9M2RV99hQ! > M62nlXrSUxsVkJzzR2aM2_wFagwlqXpa9gcFibsMd9wEAXfsWQrVvTfl6bA8NSiIzH2U8SF6WdVmDuvu8dru$> >> jdk/pull/27519/commits/ef5e61f3d68ad565ee11e2cc6aa57b6e2697ae6d> >> >> However, with this draft using the AOT cache, I did not observe any >> significant code size changes for any workloads. This may require >> further investigation. >> >> [4] https://openjdk.org/jeps/515 openjdk.org/jeps/515 > >> >> [5] https://github.com/openjdk/jdk/ github.com/openjdk/jdk/__;!!ACWV5N9M2RV99hQ! > M62nlXrSUxsVkJzzR2aM2_wFagwlqXpa9gcFibsMd9wEAXfsWQrVvTfl6bA8NSiIzH2U8SF6WdVmDlwlYXdp$> >> blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/ci/ >> ciMethod.cpp#L1152 github.com/openjdk/jdk/__;!!ACWV5N9M2RV99hQ! > M62nlXrSUxsVkJzzR2aM2_wFagwlqXpa9gcFibsMd9wEAXfsWQrVvTfl6bA8NSiIzH2U8SF6WdVmDlwlYXdp$> >> blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/ci/ >> ciMethod.cpp#L1152> >> >> *6 Questions* >> >> 1) Relation to Project Leyden >> >> Project Leyden aims to enhance the AOT cache to store compiled code from >> training runs [6]. This suggests that we may eventually prefer to cache >> compiled code directly from the AOT cache rather than rely solely on JIT >> compilation. >> >> Given this direction, is it still worthwhile to invest further in using >> profiled method sizes as a means to improve inlining heuristics? >> >> Could such profiling provide complementary benefits even if compiled >> code is cached? >> >> 2) Global profiling information for C2 >> >> Should we consider leveraging profiled information stored in the AOT >> cache to give the C2 inliner a broader, more global view of methods, >> enabling better inlining decisions? >> >> For example, could global visibility into method sizes and call sites >> help address pathological cases of code bloat or missed optimization >> opportunities? [7] >> >> [6] https://openjdk.org/jeps/8335368 > > >> >> [7] https://wiki.openjdk.org/display/hotspot/inlining wiki.openjdk.org/display/hotspot/inlining> > wiki.openjdk.org/display/hotspot/inlining> >> >> I'd greatly appreciate any feedback. Thank you for your time and >> consideration. >> >> Thanks, >> >> Fei >> >> IMPORTANT NOTICE: The contents of this email and any attachments are >> confidential and may also be privileged. If you are not the intended >> recipient, please notify the sender immediately and do not disclose the >> contents to any other person, use it for any purpose, or store or copy >> the information in any medium. Thank you. > > IMPORTANT NOTICE: The contents of this email and any attachments are > confidential and may also be privileged. If you are not the intended > recipient, please notify the sender immediately and do not disclose the > contents to any other person, use it for any purpose, or store or copy > the information in any medium. Thank you. 
From kxu at openjdk.org  Wed Oct  8 16:31:49 2025
From: kxu at openjdk.org (Kangcheng Xu)
Date: Wed, 8 Oct 2025 16:31:49 GMT
Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v25]
In-Reply-To:
References:
Message-ID: <1nIAYsKgvsIyomk5TKNvSmnLOMDj09MgqGtjxWQKt8k=.f36dbaf7-392b-4440-9050-9714c0242ed1@github.com>

> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495), which was [first merged](https://git.openjdk.org/jdk/pull/20754) and then backed out due to a regression. This patch redoes the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR.
>
> When constantizing multiplications (possibly in the form of `lshift`s), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`)
>
> The following was implemented to address this issue.
>
>     if (UseNewCode2) {
>         *multiplier = bt == T_INT
>                 ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows
>                 : ((jlong) 1) << con->get_int();
>     } else {
>         *multiplier = ((jlong) 1 << con->get_int());
>     }
>
> Two new bitshift overflow tests were added.

Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision:

  Skip and delay processing add nodes with non-canonicalized inputs

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23506/files
  - new: https://git.openjdk.org/jdk/pull/23506/files/a05e7408..dd53a45b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=24
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=23-24

Stats: 15 lines in 1 file changed: 1 ins; 8 del; 6 mod
Patch: https://git.openjdk.org/jdk/pull/23506.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506

PR: https://git.openjdk.org/jdk/pull/23506

From dlong at openjdk.org  Wed Oct  8 22:43:03 2025
From: dlong at openjdk.org (Dean Long)
Date: Wed, 8 Oct 2025 22:43:03 GMT
Subject: RFR: 8286865: vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH/Test.java fails with Out of space in CodeCache [v2]
In-Reply-To:
References:
Message-ID: <7AouBwK7hP4Z28l3wEaLd4Qxcrdz_mcMFccDkQleMvs=.3c7554c1-7de2-48e6-adf3-be75ba0e3d87@github.com>

On Tue, 19 Aug 2025 15:02:55 GMT, Ramkumar Sunderbabu wrote:

>> MethodHandle invocations with Xcomp are filling up CodeCache quickly in the test, especially in machines with high number of processors.
>> It is possible to measure code cache consumption per invocation, estimate overall consumption and bail out before CodeCache runs out of memory.
>> But it is much simpler to exclude the test for Xcomp flag.
>>
>> Additional Change: MethodHandles.lookup was done unnecessarily invoked for all iterations. Replaced it with single invocation.
>>
>> PS: This issue is not seen in JDK 20 and above, possibly due to JDK-8290025, but the exclusion guards against vagaries of CodeCache management.
>
> Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision:
>
>   addressed review comment

Nevermind about problem-listing it, I think this solution is OK for now.

-------------

Marked as reviewed by dlong (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26840#pullrequestreview-3316716616 From epeter at openjdk.org Wed Oct 8 23:06:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Oct 2025 23:06:18 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransforrm::optimize Message-ID: I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: https://github.com/openjdk/jdk/pull/20964 [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093) This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. This should be the last one before Cost Modeling, which will enable us to vectorize more reductions ? -------------------------- **Goal:** we need to do the `move_reduction_out_of_loop` already during auto vectorization, and not after. This will allow us to cost-model the loop after the expensive reduction nodes are removed from the loop in a following RFE. **Details** Reduction nodes are expensive, and require many instructions in the backend. In some cases, this means that the vectorized reduction is more expensive than the scalar reduction. We would have to find other operations to vectorize, so that the instruction count goes down sufficiently. There are cases where the reduction is not profitable before `move_reduction_out_of_loop`, but profitable after. Since we now modify `VTransformNode`s during `VTransform::optimize` (think of it as the IGVN for `VTransform`), some nodes can become dead, and so we need to take care of that with `is_alive`. And we must only schedule alive nodes, others may not have a coherent state. **Future Work** - Cost Modeling [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093) - Other optimizations that lower the cost of the vectorized loop, and enable vectorization to be profitable. ------------- Commit messages: - documentation - better tracing - rm scalar_opcode - a few todos - impl is_alive - fix del_out and phi apply - wip 3 - wip 2 - JDK-8369448 Changes: https://git.openjdk.org/jdk/pull/27704/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27704&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369448 Stats: 712 lines in 10 files changed: 371 ins; 336 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/27704.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27704/head:pull/27704 PR: https://git.openjdk.org/jdk/pull/27704 From epeter at openjdk.org Wed Oct 8 23:06:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Oct 2025 23:06:22 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransforrm::optimize In-Reply-To: References: Message-ID: On Wed, 8 Oct 2025 19:42:38 GMT, Emanuel Peter wrote: > I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: > https://github.com/openjdk/jdk/pull/20964 > [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093) > > This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > This should be the last one before Cost Modeling, which will enable us to vectorize more reductions ? > > -------------------------- > > **Goal:** we need to do the `move_reduction_out_of_loop` already during auto vectorization, and not after. This will allow us to cost-model the loop after the expensive reduction nodes are removed from the loop in a following RFE. 
> > **Details** > Reduction nodes are expensive, and require many instructions in the backend. In some cases, this means that the vectorized reduction is more expensive than the scalar reduction. We would have to find other operations to vectorize, so that the instruction count goes down sufficiently. There are cases where the reduction is not profitable before `move_reduction_out_of_loop`, but profitable after. > > Since we now modify `VTransformNode`s during `VTransform::optimize` (think of it as the IGVN for `VTransform`), some nodes can become dead, and so we need to take care of that with `is_alive`. And we must only schedule alive nodes, others may not have a coherent state. > > **Future Work** > - Cost Modeling [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093) > - Other optimizations that lower the cost of the vectorized loop, and enable vectorization to be profitable. src/hotspot/share/opto/loopnode.cpp line 5298: > 5296: } > 5297: } > 5298: } Note: instead of performing the optimization after auto vectorization, we now perform it during auto vectorization. src/hotspot/share/opto/loopopts.cpp line 4607: > 4605: // reordering of operations (for example float addition/multiplication require > 4606: // strict order). > 4607: void PhaseIdealLoop::move_unordered_reduction_out_of_loop(IdealLoopTree* loop) { Note: moved to `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop` src/hotspot/share/opto/vectornode.cpp line 297: > 295: // Return the scalar opcode for the specified vector opcode > 296: // and basic type. > 297: int VectorNode::scalar_opcode(int sopc, BasicType bt) { Note: no longer needed. We used to have to go back from vectorized reduction to scalar op to get the corresponding element-wise accumulation instruction. Now that we move the reduction out of the loop during auto vectorization, we still have access to the scalar node. src/hotspot/share/opto/vectornode.cpp line 1615: > 1613: } > 1614: > 1615: bool ReductionNode::auto_vectorization_requires_strict_order(int vopc) { Note: we need to know which ones we can move out of the loop, and we can only do that with those that do not require strict order. src/hotspot/share/opto/vtransform.cpp line 43: > 41: ) > 42: > 43: void VTransformGraph::optimize(VTransform& vtransform) { Note: this is similar to IGVN optimization. But we are a bit lazy, and don't care about notifiation / worklist. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2415178920 PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2415178137 PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2415180713 PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2415181617 PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2415182559 From epeter at openjdk.org Wed Oct 8 23:12:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Oct 2025 23:12:18 GMT Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025 [v6] In-Reply-To: References: Message-ID: On Fri, 19 Sep 2025 04:05:15 GMT, Galder Zamarre?o wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update test/hotspot/jtreg/compiler/gallery/TestNormalMapping.java >> >> Co-authored-by: Andrey Turbanov >> - Update test/hotspot/jtreg/compiler/gallery/NormalMapping.java >> >> Co-authored-by: Christian Hagedorn > > Great demo! I run it on my M4 Pro at 220 FPS with default flags ? 
@galderz @chhagedorn Thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27282#issuecomment-3383511790

From epeter at openjdk.org  Wed Oct  8 23:12:20 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 8 Oct 2025 23:12:20 GMT
Subject: Integrated: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
In-Reply-To:
References:
Message-ID:

On Mon, 15 Sep 2025 06:01:46 GMT, Emanuel Peter wrote:

> Demo from here:
> https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/
>
> Cleaned up and enhanced with a JTREG and IR test.
> I also added some additional "generated" normal maps from height functions.
> And I display the resulting image side-by-side with the normal map.
>
> I decided to put it in a new directory `compiler.gallery`, anticipating other compiler tests that are both visually appealing (i.e. can be used for a "gallery") and that we may want to back up with other tests like IR testing.
>
> There is a **stand-alone** way to run the demo:
> `java test/hotspot/jtreg/compiler/gallery/NormalMapping.java`
> (though it may only run with JDK22+, probably due to some amber features)
>
> **Quick Performance Numbers**, running on my avx512 laptop.
> default / AVX3: 105 FPS
> AVX2: 82 FPS
> AVX1: 50 FPS
> No vectorization: 19 FPS
> GraalJIT: 13 FPS (`jdk-26-ea+5` - probably issue with vectorization / inlining?)
>
> Here are some snapshots, but **I really recommend pulling the diff and playing with it, it looks much better in motion**:
> [four snapshot images of the rendered demo side-by-side with its normal maps]

This pull request has now been integrated.

Changeset: 0e5655e6
Author: Emanuel Peter
URL: https://git.openjdk.org/jdk/commit/0e5655e6680762a99b5aecb58369b880ea913565
Stats: 670 lines in 4 files changed: 670 ins; 0 del; 0 mod

8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025

Reviewed-by: chagedorn, galder

-------------

PR: https://git.openjdk.org/jdk/pull/27282
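The demo source itself is not quoted in the thread above. As a rough, purely illustrative sketch (the array layout, names and lighting formula are assumptions, not the actual NormalMapping.java code), the kind of per-pixel kernel that C2's SuperWord pass auto-vectorizes in a demo like this looks as follows:

    // Illustrative diffuse-lighting kernel over packed float arrays of equal length.
    // A straight-line counted loop of float multiplies and adds like this is the shape
    // SuperWord can turn into SIMD instructions (AVX on x86, NEON/SVE on AArch64).
    static void shade(float[] nx, float[] ny, float[] nz,
                      float lx, float ly, float lz, float[] brightness) {
        for (int i = 0; i < brightness.length; i++) {
            float dot = nx[i] * lx + ny[i] * ly + nz[i] * lz;
            brightness[i] = Math.max(0.0f, dot);
        }
    }

The reported FPS differences between AVX levels and the non-vectorized run come from how wide the vector registers are that such a loop can be mapped onto.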
>> It is possible to measure code cache consumption per invocation, estimate overall consumption and bail out before CodeCache runs out of memory. >> But it is much simpler to exclude the test for Xcomp flag. >> >> Additional Change: MethodHandles.lookup was done unnecessarily invoked for all iterations. Replaced it with single invocation. >> >> PS: This issue is not seen in JDK 20 and above, possibly due to JDK-8290025, but the exclusion guards against vagaries of CodeCache management. > > Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comment Looks good to me, too! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26840#pullrequestreview-3317256789 From thartmann at openjdk.org Thu Oct 9 05:31:04 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 9 Oct 2025 05:31:04 GMT Subject: RFR: 8369423: Reduce execution time of testlibrary_tests/ir_framework/tests/TestDFlags.java In-Reply-To: References: Message-ID: On Wed, 8 Oct 2025 14:58:06 GMT, Damon Fenacci wrote: >> The test `testlibrary_tests/ir_framework/tests/TestDFlags.java` runs 11 separate IR framework runs by enabling only a single property/D flag in each run. This seems like a waste of resources for just a sanity run (we don't do any additional verification). Moreover, some newer property flags are missing. >> >> I suggest to cut it down to 2 runs: >> - IR verification enabled (i.e. -DVerifyIR=true, which is the default) >> - IR verification disabled (i.e. -DVerifyIR=false) >> >> In both runs we simultaneously set all property flags to some non-default value as a sanity test. >> >> This reduces the test execution time from around 20-30s down to 3-4s. >> >> Thanks, >> Christian > > test/hotspot/jtreg/compiler/lib/ir_framework/README.md line 178: > >> 176: - `-DWarmup=200`: Provide a new default value of the number of warm-up iterations (framework default is 2000). This might have an influence on the resulting IR and could lead to matching failures (the user can also set a fixed default warm-up value in a test with `testFrameworkObject.setDefaultWarmup(200)`). >> 177: - `-DReportStdout=true`: Print the standard output of the test VM. >> 178: - `-DVerbose=true`: Enable more fain-grained logging (slows the execution down). > > Suggestion: > > - `-DVerbose=true`: Enable more fine-grained logging (slows the execution down). > > > it just caught my eye ? Ouh, good catch! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27690#discussion_r2415607037 From chagedorn at openjdk.org Thu Oct 9 05:31:14 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 9 Oct 2025 05:31:14 GMT Subject: RFR: 8364346: Typo in IR framework README [v3] In-Reply-To: References: Message-ID: <4vxZt1LTLFbBKllqDCDXSjTHla6t1etVLRJWSucsKew=.6f733871-e0fa-46a0-a609-5adf768bbc1c@github.com> On Wed, 8 Oct 2025 14:36:31 GMT, Lei Zhu wrote: >> Fix typo errors. > > Lei Zhu has updated the pull request incrementally with one additional commit since the last revision: > > Update full name Looks good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27688#pullrequestreview-3317267352 From duke at openjdk.org Thu Oct 9 05:31:15 2025 From: duke at openjdk.org (Lei Zhu) Date: Thu, 9 Oct 2025 05:31:15 GMT Subject: Integrated: 8364346: Typo in IR framework README In-Reply-To: References: Message-ID: On Wed, 8 Oct 2025 10:50:43 GMT, Lei Zhu wrote: > Fix typo errors. This pull request has now been integrated. Changeset: 0b81db1d Author: Lei Zhu Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/0b81db1d38e69e6d8c73f22e4dae63ff5775852e Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod 8364346: Typo in IR framework README Reviewed-by: thartmann, fandreuzzi, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/27688 From chagedorn at openjdk.org Thu Oct 9 05:43:34 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 9 Oct 2025 05:43:34 GMT Subject: RFR: 8369423: Reduce execution time of testlibrary_tests/ir_framework/tests/TestDFlags.java [v2] In-Reply-To: References: Message-ID: <7Q4MjLC7SGgJuOrtrcOBYm9CAdyWsMKGldO1bv0-7cc=.a3fd0e0d-ebd7-4e47-902f-70138f0ad5f3@github.com> > The test `testlibrary_tests/ir_framework/tests/TestDFlags.java` runs 11 separate IR framework runs by enabling only a single property/D flag in each run. This seems like a waste of resources for just a sanity run (we don't do any additional verification). Moreover, some newer property flags are missing. > > I suggest to cut it down to 2 runs: > - IR verification enabled (i.e. -DVerifyIR=true, which is the default) > - IR verification disabled (i.e. -DVerifyIR=false) > > In both runs we simultaneously set all property flags to some non-default value as a sanity test. > > This reduces the test execution time from around 20-30s down to 3-4s. > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: update Damon ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27690/files - new: https://git.openjdk.org/jdk/pull/27690/files/976200be..a0c779f6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27690&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27690&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27690.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27690/head:pull/27690 PR: https://git.openjdk.org/jdk/pull/27690 From chagedorn at openjdk.org Thu Oct 9 05:43:35 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 9 Oct 2025 05:43:35 GMT Subject: RFR: 8369423: Reduce execution time of testlibrary_tests/ir_framework/tests/TestDFlags.java In-Reply-To: References: Message-ID: <31b0fz9aS8ncb9tNIcmg7IylvcriUTGp9B23pqFpFN4=.9e7292c9-9400-48b0-9b5c-06e4c41ed627@github.com> On Wed, 8 Oct 2025 11:30:43 GMT, Christian Hagedorn wrote: > The test `testlibrary_tests/ir_framework/tests/TestDFlags.java` runs 11 separate IR framework runs by enabling only a single property/D flag in each run. This seems like a waste of resources for just a sanity run (we don't do any additional verification). Moreover, some newer property flags are missing. > > I suggest to cut it down to 2 runs: > - IR verification enabled (i.e. -DVerifyIR=true, which is the default) > - IR verification disabled (i.e. -DVerifyIR=false) > > In both runs we simultaneously set all property flags to some non-default value as a sanity test. > > This reduces the test execution time from around 20-30s down to 3-4s. 
> > Thanks, > Christian Thanks Damon for your review! Pushed an update ------------- PR Comment: https://git.openjdk.org/jdk/pull/27690#issuecomment-3384209646 From chagedorn at openjdk.org Thu Oct 9 05:43:37 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 9 Oct 2025 05:43:37 GMT Subject: RFR: 8369423: Reduce execution time of testlibrary_tests/ir_framework/tests/TestDFlags.java [v2] In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 05:27:53 GMT, Tobias Hartmann wrote: >> test/hotspot/jtreg/compiler/lib/ir_framework/README.md line 178: >> >>> 176: - `-DWarmup=200`: Provide a new default value of the number of warm-up iterations (framework default is 2000). This might have an influence on the resulting IR and could lead to matching failures (the user can also set a fixed default warm-up value in a test with `testFrameworkObject.setDefaultWarmup(200)`). >>> 177: - `-DReportStdout=true`: Print the standard output of the test VM. >>> 178: - `-DVerbose=true`: Enable more fain-grained logging (slows the execution down). >> >> Suggestion: >> >> - `-DVerbose=true`: Enable more fine-grained logging (slows the execution down). >> >> >> it just caught my eye ? > > Ouh, good catch! I've checked them all and also when reviewing https://github.com/openjdk/jdk/pull/27688 but still missed that! Good catch indeed :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27690#discussion_r2415623174 From thartmann at openjdk.org Thu Oct 9 05:48:07 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 9 Oct 2025 05:48:07 GMT Subject: RFR: 8369423: Reduce execution time of testlibrary_tests/ir_framework/tests/TestDFlags.java [v2] In-Reply-To: <7Q4MjLC7SGgJuOrtrcOBYm9CAdyWsMKGldO1bv0-7cc=.a3fd0e0d-ebd7-4e47-902f-70138f0ad5f3@github.com> References: <7Q4MjLC7SGgJuOrtrcOBYm9CAdyWsMKGldO1bv0-7cc=.a3fd0e0d-ebd7-4e47-902f-70138f0ad5f3@github.com> Message-ID: <-rB307uqOBZCiDGxKyVSgNti8lvcApOt3HpH8oSpQ1E=.ba16ce13-df94-40c4-8933-19c6845b710c@github.com> On Thu, 9 Oct 2025 05:43:34 GMT, Christian Hagedorn wrote: >> The test `testlibrary_tests/ir_framework/tests/TestDFlags.java` runs 11 separate IR framework runs by enabling only a single property/D flag in each run. This seems like a waste of resources for just a sanity run (we don't do any additional verification). Moreover, some newer property flags are missing. >> >> I suggest to cut it down to 2 runs: >> - IR verification enabled (i.e. -DVerifyIR=true, which is the default) >> - IR verification disabled (i.e. -DVerifyIR=false) >> >> In both runs we simultaneously set all property flags to some non-default value as a sanity test. >> >> This reduces the test execution time from around 20-30s down to 3-4s. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > update Damon Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27690#pullrequestreview-3317302410 From duke at openjdk.org Thu Oct 9 06:11:13 2025 From: duke at openjdk.org (duke) Date: Thu, 9 Oct 2025 06:11:13 GMT Subject: RFR: 8286865: vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH/Test.java fails with Out of space in CodeCache [v2] In-Reply-To: References: Message-ID: On Tue, 19 Aug 2025 15:02:55 GMT, Ramkumar Sunderbabu wrote: >> MethodHandle invocations with Xcomp are filling up CodeCache quickly in the test, especially in machines with high number of processors. 
>> It is possible to measure code cache consumption per invocation, estimate overall consumption and bail out before CodeCache runs out of memory. >> But it is much simpler to exclude the test for Xcomp flag. >> >> Additional Change: MethodHandles.lookup was done unnecessarily invoked for all iterations. Replaced it with single invocation. >> >> PS: This issue is not seen in JDK 20 and above, possibly due to JDK-8290025, but the exclusion guards against vagaries of CodeCache management. > > Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comment @rsunderbabu Your change (at version 30cb217c4fe604c855e405816c8b73273af4d9a7) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26840#issuecomment-3384260499 From dfenacci at openjdk.org Thu Oct 9 06:21:06 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 9 Oct 2025 06:21:06 GMT Subject: RFR: 8369423: Reduce execution time of testlibrary_tests/ir_framework/tests/TestDFlags.java [v2] In-Reply-To: <7Q4MjLC7SGgJuOrtrcOBYm9CAdyWsMKGldO1bv0-7cc=.a3fd0e0d-ebd7-4e47-902f-70138f0ad5f3@github.com> References: <7Q4MjLC7SGgJuOrtrcOBYm9CAdyWsMKGldO1bv0-7cc=.a3fd0e0d-ebd7-4e47-902f-70138f0ad5f3@github.com> Message-ID: On Thu, 9 Oct 2025 05:43:34 GMT, Christian Hagedorn wrote: >> The test `testlibrary_tests/ir_framework/tests/TestDFlags.java` runs 11 separate IR framework runs by enabling only a single property/D flag in each run. This seems like a waste of resources for just a sanity run (we don't do any additional verification). Moreover, some newer property flags are missing. >> >> I suggest to cut it down to 2 runs: >> - IR verification enabled (i.e. -DVerifyIR=true, which is the default) >> - IR verification disabled (i.e. -DVerifyIR=false) >> >> In both runs we simultaneously set all property flags to some non-default value as a sanity test. >> >> This reduces the test execution time from around 20-30s down to 3-4s. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > update Damon Thanks for the small fixes @chhagedorn! ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/27690#pullrequestreview-3317369038 From rsunderbabu at openjdk.org Thu Oct 9 06:22:18 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Thu, 9 Oct 2025 06:22:18 GMT Subject: Integrated: 8286865: vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH/Test.java fails with Out of space in CodeCache In-Reply-To: References: Message-ID: On Tue, 19 Aug 2025 10:02:05 GMT, Ramkumar Sunderbabu wrote: > MethodHandle invocations with Xcomp are filling up CodeCache quickly in the test, especially in machines with high number of processors. > It is possible to measure code cache consumption per invocation, estimate overall consumption and bail out before CodeCache runs out of memory. > But it is much simpler to exclude the test for Xcomp flag. > > Additional Change: MethodHandles.lookup was done unnecessarily invoked for all iterations. Replaced it with single invocation. > > PS: This issue is not seen in JDK 20 and above, possibly due to JDK-8290025, but the exclusion guards against vagaries of CodeCache management. This pull request has now been integrated. 
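(As a side note on the "Additional Change" quoted above: a minimal sketch of what hoisting the lookup out of the loop looks like. The target method and iteration count are arbitrary and not taken from the test.)

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class LookupHoistingSketch {
    public static void main(String[] args) throws Throwable {
        // One Lookup and one MethodHandle, created once and reused by every
        // iteration, instead of calling MethodHandles.lookup() per iteration.
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle mh = lookup.findStatic(Math.class, "max",
                MethodType.methodType(int.class, int.class, int.class));
        int acc = 0;
        for (int i = 0; i < 1_000; i++) {
            acc = (int) mh.invokeExact(acc, i);
        }
        System.out.println(acc); // 999
    }
}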
Changeset: 1b11bea7 Author: Ramkumar Sunderbabu Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/1b11bea76ba29d1dfa414ad7e10693cf054bb96f Stats: 24 lines in 1 file changed: 15 ins; 8 del; 1 mod 8286865: vmTestbase/vm/mlvm/meth/stress/jni/nativeAndMH/Test.java fails with Out of space in CodeCache Reviewed-by: dlong, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/26840 From dfenacci at openjdk.org Thu Oct 9 06:24:23 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 9 Oct 2025 06:24:23 GMT Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove [v2] In-Reply-To: References: <5CGrcWjFZ7Zqj_Tm0LO6Tqg9cUA-xxvcaa2J-yWW8BE=.af4dea7c-e39d-491d-b924-c89fa82e757a@github.com> Message-ID: On Fri, 5 Sep 2025 09:45:38 GMT, Dean Long wrote: >> Damon Fenacci has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8360031 >> - JDK-8360031: update assert message >> - Merge branch 'master' into JDK-8360031 >> - JDK-8360031: remove unnecessary include >> - JDK-8360031: remove UseNewCode >> - JDK-8360031: compilation asserts in MemBarNode::remove > > I stepped through the crash with the replay file, and I'm not convinced that the problem is only with MemBarStoreStore and not MemBarRelease. What happens in the replay crash is the MemBarStoreStore gets onto the worklist through an indirect route in ConnectionGraph::split_unique_types() because of its memory edge. I think this explains why it is intermittent and hard to reproduce. A MemBarRelease on the other hand would get added to the worklist directly in compute_escape() if it has a Precedent edge. > The different handling of MemBarStoreStore vs MemBarRelease in this code is confusing. The MemBarRelease code came from JDK-6934604. It adds the node to the worklist, and lets MemBarNode::Ideal remove it based on does_not_escape_thread() on the alloc node. Contrast that with the MemBarStoreStore handling, which came from JDK-7121140, and instead of removing the node, it replaces it with a MemBarCPUOrder based on not_global_escape() on the alloc node. This MemBarStoreStore handling is for "MemBarStoreStore nodes added in library_call.cpp" and seems to fail to work for MemBarStoreStore nodes added in the ctor, which means MemBarStoreStore nodes added in the ctor only get on the worklist by accident, as mentioned above. > I think the conservative fix is to have compute_escape() always add the MemBarStoreStore to the worklist if it has a Precedent edge. Because of StressIGVN randomizing the worklist, I think the outcnt() can be 1 for either MemBarStoreStore or MemBarRelease, so we should relax the assert accordingly. I'm not sure how useful the assert will be after that. It might be better to remove it. > Longer-term, it might be nice to get rid of the separate handling of "MemBarStoreStore nodes added in library_call.cpp" if the MemBarCPUOrder is not really needed. Thanks a lot for your reviews @dean-long @shipilev @vnkozlov! 
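(For readers following the MemBar discussion above: a minimal Java shape of the situation, hypothetical and not taken from the replay file. A constructor that writes final fields ends with a barrier; with -XX:+UseStoreStoreForCtor that barrier is a MemBarStoreStore, and once escape analysis proves the allocation never leaves the thread it becomes a candidate for removal.)

class Point {
    final int x;
    final int y;
    Point(int x, int y) { this.x = x; this.y = y; }  // trailing ctor barrier emitted here
}

public class NonEscapingCtorSketch {
    static int sum(int a, int b) {
        Point p = new Point(a, b);  // allocation never escapes this method/thread
        return p.x + p.y;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 100_000; i++) {
            total += sum(i, i + 1);
        }
        System.out.println(total);
    }
}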
------------- PR Comment: https://git.openjdk.org/jdk/pull/26556#issuecomment-3384286660 From dfenacci at openjdk.org Thu Oct 9 06:24:24 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 9 Oct 2025 06:24:24 GMT Subject: Integrated: 8360031: C2 compilation asserts in MemBarNode::remove In-Reply-To: References: Message-ID: On Wed, 30 Jul 2025 15:08:29 GMT, Damon Fenacci wrote: > # Issue > While compiling `java.util.zip.ZipFile` in C2 this assert is triggered > https://github.com/openjdk/jdk/blob/a2e86ff3c56209a14c6e9730781eecd12c81d170/src/hotspot/share/opto/memnode.cpp#L4235 > > # Cause > While compiling the constructor of java.util.zip.ZipFile$CleanableResource the following happens: > * we insert a trailing `MemBarStoreStore` in the constructor > before_folding > > * during IGVN we completely fold the memory subtree of the `MemBarStoreStore` node. The node still has a control output attached. > after_folding > > * later during the same IGVN run the `MemBarStoreStore` node is handled and we try to remove it (because the `Allocate` node of the `MembBar` is not escaping the thread ) https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4301-L4302 > * the assert https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4235 > triggers because the barrier has only 1 (control) output and is a `MemBarStoreStore` (not `Initialize`) barrier > > The issue happens only when the `UseStoreStoreForCtor` is set (default as well), which makes C2 use `MemBarStoreStore` instead of `MemBarRelease` at the end of constructors. `MemBarStoreStore` are processed separately by EA and this happens after the IGVN pass that folds the memory subtree. `MemBarRelease` on the other hand are handled during same IGVN pass before the memory subtree gets removed and it?s still got 2 outputs (assert skipped). > > # Fix > Adapting the assert to accept that `MemBarStoreStore` can also have `!= 2` outputs (when `+UseStoreStoreForCtor` is used) seems to be an OK solution as this seems like a perfectly plausible situation. > > # Testing > Unfortunately reproducing the issue with a simple regression test has proven very hard. The test seems to rely on very peculiar profiling and IGVN worklist sequence. JBS replay compilation passes. Running JCK's `api/java_util` 100 times triggers the assert a couple of times on average before the fix, none after. > Tier 1-3+ tests passed. This pull request has now been integrated. Changeset: 991f8e6f Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/991f8e6f385ab85b33d2e4d274506995b651ce65 Stats: 9 lines in 3 files changed: 3 ins; 3 del; 3 mod 8360031: C2 compilation asserts in MemBarNode::remove Reviewed-by: dlong, kvn, shade ------------- PR: https://git.openjdk.org/jdk/pull/26556 From jbhateja at openjdk.org Thu Oct 9 06:27:52 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 9 Oct 2025 06:27:52 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v13] In-Reply-To: References: Message-ID: <7Hvs60B_m8bmMzOMyrBZ_CbNJrQmHPMFKRAEU7F-Tu4=.94863cda-8af6-4e50-8563-2144947074e4@github.com> > This patch optimizes PopCount value transforms using KnownBits information. 
> Following are the results of the micro-benchmark included with the patch > > > > System: 13th Gen Intel(R) Core(TM) i3-1315U > > Baseline: > Benchmark Mode Cnt Score Error Units > PopCountValueTransform.LogicFoldingKerenLong thrpt 2 215460.670 ops/s > PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 294014.826 ops/s > > Withopt: > Benchmark Mode Cnt Score Error Units > PopCountValueTransform.LogicFoldingKerenLong thrpt 2 389978.082 ops/s > PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 417261.583 ops/s > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27075/files - new: https://git.openjdk.org/jdk/pull/27075/files/85b10e88..49cdf296 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=11-12 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27075.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27075/head:pull/27075 PR: https://git.openjdk.org/jdk/pull/27075 From jbhateja at openjdk.org Thu Oct 9 06:27:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 9 Oct 2025 06:27:54 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v12] In-Reply-To: <3hRTOJiGlZXrFqy7m3loXdorkRmyL3zb7hwyrwi8b6w=.0c159d31-bab3-4ed1-94a5-23b33bad457d@github.com> References: <3hRTOJiGlZXrFqy7m3loXdorkRmyL3zb7hwyrwi8b6w=.0c159d31-bab3-4ed1-94a5-23b33bad457d@github.com> Message-ID: On Mon, 6 Oct 2025 07:50:41 GMT, Tobias Hartmann wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution > > Testing all passed. I'll pass the review to someone else. Hi @TobiHartmann, we have @merykitty and @SirYwell clearance, need your approval to land this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3384292699 From mhaessig at openjdk.org Thu Oct 9 06:58:17 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 9 Oct 2025 06:58:17 GMT Subject: RFR: 8368573: MultiBranchNode::required_outcnt should return an unsigned int Message-ID: This small PR turns the type `MultiBranchNode::required_outcnt` from an `int` into a `uint` because all usages of this are already unsigned integers. Thus, the patch eliminates all implicit conversion from the code path. Testing: - [ ] Github Actions - [x] tier1, tier2 on Oracle supported platforms ------------- Commit messages: - Make MultiBranchNode::required_outcnt unsigned Changes: https://git.openjdk.org/jdk/pull/27714/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27714&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8368573 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/27714.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27714/head:pull/27714 PR: https://git.openjdk.org/jdk/pull/27714 From duke at openjdk.org Thu Oct 9 07:13:26 2025 From: duke at openjdk.org (erifan) Date: Thu, 9 Oct 2025 07:13:26 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: > The VectorShape size of `I_SPECIES_FOR_CAST` declared in test **VectorMaskCompareNotTest.java** is half that of `L_SPECIES_FOR_CAST`. 
And `L_SPECIES_FOR_CAST` is created with the maximum shape. Therefore, if `MaxVectorSize` is set to 8, the shape size of `I_SPECIES_FOR_CAST` is 4, which is an illegal value because the minimum vector size requirement is 8 bytes. > > This pull request addresses the issue by ensuring that this test runs only when `MaxVectorSize` is set to 16 bytes or higher. erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Remove JBS number 8368205 from VectorMaskCompareNotTest.java - Merge branch 'master' into JDK-8368205-VectorMaskCompareNotTest-failure - 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 The VectorShape size of `I_SPECIES_FOR_CAST` declared in test **VectorMaskCompareNotTest.java** is half that of `L_SPECIES_FOR_CAST`. And `L_SPECIES_FOR_CAST` is created with the maximum shape. Therefore, if `MaxVectorSize` is set to 8, the shape size of `I_SPECIES_FOR_CAST` is 4, which is an illegal value because the minimum vector size requirement is 8 bytes. This pull request addresses the issue by ensuring that this test runs only when `MaxVectorSize` is set to 16 bytes or higher. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27418/files - new: https://git.openjdk.org/jdk/pull/27418/files/2818a686..5afc18d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27418&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27418&range=00-01 Stats: 152891 lines in 1810 files changed: 127648 ins; 16150 del; 9093 mod Patch: https://git.openjdk.org/jdk/pull/27418.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27418/head:pull/27418 PR: https://git.openjdk.org/jdk/pull/27418 From duke at openjdk.org Thu Oct 9 07:13:27 2025 From: duke at openjdk.org (erifan) Date: Thu, 9 Oct 2025 07:13:27 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: <1fyNgGSpy9QihsviY0QEQOAqrFfS_ZohCmLTvBN8-IU=.f0f7545f-b355-45c4-a868-c39acffd7c36@github.com> On Thu, 9 Oct 2025 05:07:15 GMT, Tobias Hartmann wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Remove JBS number 8368205 from VectorMaskCompareNotTest.java >> - Merge branch 'master' into JDK-8368205-VectorMaskCompareNotTest-failure >> - 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 >> >> The VectorShape size of `I_SPECIES_FOR_CAST` declared in test **VectorMaskCompareNotTest.java** >> is half that of `L_SPECIES_FOR_CAST`. And `L_SPECIES_FOR_CAST` is created with the maximum shape. >> Therefore, if `MaxVectorSize` is set to 8, the shape size of `I_SPECIES_FOR_CAST` is 4, which >> is an illegal value because the minimum vector size requirement is 8 bytes. >> >> This pull request addresses the issue by ensuring that this test runs only when `MaxVectorSize` >> is set to 16 bytes or higher. > > test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 33: > >> 31: /* >> 32: * @test >> 33: * @bug 8354242 8368205 > > Suggestion: > > * @bug 8354242 > > > This test is not a regression test for `8368205`, because `8368205` is a test bug. 
Done, thanks~ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27418#discussion_r2415775621 From chagedorn at openjdk.org Thu Oct 9 07:41:09 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 9 Oct 2025 07:41:09 GMT Subject: RFR: 8368573: MultiBranchNode::required_outcnt should return an unsigned int In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 06:51:50 GMT, Manuel H?ssig wrote: > This small PR turns the type `MultiBranchNode::required_outcnt` from an `int` into a `uint` because all usages of this are already unsigned integers. Thus, the patch eliminates all implicit conversion from the code path. > > Testing: > - [ ] Github Actions > - [x] tier1, tier2 on Oracle supported platforms Looks good and trivial! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27714#pullrequestreview-3317606637 From chagedorn at openjdk.org Thu Oct 9 07:42:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 9 Oct 2025 07:42:12 GMT Subject: RFR: 8369423: Reduce execution time of testlibrary_tests/ir_framework/tests/TestDFlags.java [v2] In-Reply-To: <-rB307uqOBZCiDGxKyVSgNti8lvcApOt3HpH8oSpQ1E=.ba16ce13-df94-40c4-8933-19c6845b710c@github.com> References: <7Q4MjLC7SGgJuOrtrcOBYm9CAdyWsMKGldO1bv0-7cc=.a3fd0e0d-ebd7-4e47-902f-70138f0ad5f3@github.com> <-rB307uqOBZCiDGxKyVSgNti8lvcApOt3HpH8oSpQ1E=.ba16ce13-df94-40c4-8933-19c6845b710c@github.com> Message-ID: On Thu, 9 Oct 2025 05:45:40 GMT, Tobias Hartmann wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> update Damon > > Marked as reviewed by thartmann (Reviewer). Thanks @TobiHartmann and @dafedafe for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27690#issuecomment-3384554601 From dfenacci at openjdk.org Thu Oct 9 08:11:16 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 9 Oct 2025 08:11:16 GMT Subject: RFR: 8368573: MultiBranchNode::required_outcnt should return an unsigned int In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 06:51:50 GMT, Manuel H?ssig wrote: > This small PR turns the type `MultiBranchNode::required_outcnt` from an `int` into a `uint` because all usages of this are already unsigned integers. Thus, the patch eliminates all implicit conversion from the code path. > > Testing: > - [ ] Github Actions > - [x] tier1, tier2 on Oracle supported platforms LGTM. Thanks @mhaessig! ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/27714#pullrequestreview-3317704222 From duke at openjdk.org Thu Oct 9 08:24:09 2025 From: duke at openjdk.org (Lei Zhu) Date: Thu, 9 Oct 2025 08:24:09 GMT Subject: RFR: 8364346: Typo in IR framework README [v3] In-Reply-To: References: Message-ID: On Wed, 8 Oct 2025 14:36:31 GMT, Lei Zhu wrote: >> Fix typo errors. > > Lei Zhu has updated the pull request incrementally with one additional commit since the last revision: > > Update full name > /sponsor Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27688#issuecomment-3384703867 From fandreuzzi at openjdk.org Thu Oct 9 09:02:38 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Thu, 9 Oct 2025 09:02:38 GMT Subject: RFR: 8368573: MultiBranchNode::required_outcnt should return an unsigned int In-Reply-To: References: Message-ID: <6FoiiLWtkX6vvf2urn3F-37PbE7TjBJaBrXqrPNY-68=.3001f86a-ef9e-44b2-ab0d-acf787f68d8f@github.com> On Thu, 9 Oct 2025 06:51:50 GMT, Manuel H?ssig wrote: > This small PR turns the type `MultiBranchNode::required_outcnt` from an `int` into a `uint` because all usages of this are already unsigned integers. Thus, the patch eliminates all implicit conversion from the code path. > > Testing: > - [x] Github Actions > - [x] tier1, tier2 on Oracle supported platforms Marked as reviewed by fandreuzzi (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/27714#pullrequestreview-3317913065 From syan at openjdk.org Thu Oct 9 09:42:04 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 9 Oct 2025 09:42:04 GMT Subject: RFR: 8367899: compiler/c2/gvn/TestBitCompressValueTransform.java intermittent timed out [v2] In-Reply-To: References: <_zhjDERsbq1mfrqW8IBGsEB3WrMmuVVykDAijO-yOUU=.39b2c9a5-e784-419d-bc07-504750f48eaf@github.com> Message-ID: On Mon, 6 Oct 2025 10:26:51 GMT, Christian Hagedorn wrote: >> Sorry for missed this comment...... > > No worries! I create a new JBS [issue](https://bugs.openjdk.org/browse/JDK-8369490) to fix this trivial issue ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27548#discussion_r2416167603 From syan at openjdk.org Thu Oct 9 09:54:12 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 9 Oct 2025 09:54:12 GMT Subject: RFR: 8369490: Remove unused Runinfo parameters in compiler/c2/gvn/TestBitCompressValueTransform.java Message-ID: Hi all, The 'Runinfo info' parameters in compiler/c2/gvn/TestBitCompressValueTransform.java is unused, maybe we can remove the unused parameters. Change has been verified locally on linux-x64, test-fix only, trivial fix, no risk. ------------- Commit messages: - 8369490: Remove unused Runinfo parameters in compiler/c2/gvn/TestBitCompressValueTransform.java Changes: https://git.openjdk.org/jdk/pull/27720/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27720&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369490 Stats: 19 lines in 1 file changed: 0 ins; 0 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/27720.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27720/head:pull/27720 PR: https://git.openjdk.org/jdk/pull/27720 From mhaessig at openjdk.org Thu Oct 9 10:27:29 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 9 Oct 2025 10:27:29 GMT Subject: RFR: 8368573: MultiBranchNode::required_outcnt should return an unsigned int In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 07:38:28 GMT, Christian Hagedorn wrote: >> This small PR turns the type `MultiBranchNode::required_outcnt` from an `int` into a `uint` because all usages of this are already unsigned integers. Thus, the patch eliminates all implicit conversion from the code path. >> >> Testing: >> - [x] Github Actions >> - [x] tier1, tier2 on Oracle supported platforms > > Looks good and trivial! Thank you for your reviews, @chhagedorn, @dafedafe, @fandreuz. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27714#issuecomment-3385188505 From mhaessig at openjdk.org Thu Oct 9 10:27:30 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 9 Oct 2025 10:27:30 GMT Subject: Integrated: 8368573: MultiBranchNode::required_outcnt should return an unsigned int In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 06:51:50 GMT, Manuel H?ssig wrote: > This small PR turns the type `MultiBranchNode::required_outcnt` from an `int` into a `uint` because all usages of this are already unsigned integers. Thus, the patch eliminates all implicit conversion from the code path. > > Testing: > - [x] Github Actions > - [x] tier1, tier2 on Oracle supported platforms This pull request has now been integrated. Changeset: 7e3e55a5 Author: Manuel H?ssig URL: https://git.openjdk.org/jdk/commit/7e3e55a576b24ae704395b01a15c363ce6e28cae Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod 8368573: MultiBranchNode::required_outcnt should return an unsigned int Reviewed-by: chagedorn, dfenacci, fandreuzzi ------------- PR: https://git.openjdk.org/jdk/pull/27714 From chagedorn at openjdk.org Thu Oct 9 10:38:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 9 Oct 2025 10:38:01 GMT Subject: RFR: 8369490: Remove unused Runinfo parameters in compiler/c2/gvn/TestBitCompressValueTransform.java In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 09:47:30 GMT, SendaoYan wrote: > Hi all, > > The 'Runinfo info' parameters in compiler/c2/gvn/TestBitCompressValueTransform.java is unused, maybe we can remove the unused parameters. > > Change has been verified locally on linux-x64, test-fix only, trivial fix, no risk. Looks good and trivial, thanks for following up on that! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27720#pullrequestreview-3318304694 From chagedorn at openjdk.org Thu Oct 9 10:39:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 9 Oct 2025 10:39:12 GMT Subject: RFR: 8367899: compiler/c2/gvn/TestBitCompressValueTransform.java intermittent timed out [v2] In-Reply-To: References: <_zhjDERsbq1mfrqW8IBGsEB3WrMmuVVykDAijO-yOUU=.39b2c9a5-e784-419d-bc07-504750f48eaf@github.com> Message-ID: On Thu, 9 Oct 2025 09:38:57 GMT, SendaoYan wrote: >> No worries! > > I create a new JBS [issue](https://bugs.openjdk.org/browse/JDK-8369490) to fix this trivial issue Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27548#discussion_r2416337011 From mhaessig at openjdk.org Thu Oct 9 10:41:01 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 9 Oct 2025 10:41:01 GMT Subject: RFR: 8369490: Remove unused Runinfo parameters in compiler/c2/gvn/TestBitCompressValueTransform.java In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 09:47:30 GMT, SendaoYan wrote: > Hi all, > > The 'Runinfo info' parameters in compiler/c2/gvn/TestBitCompressValueTransform.java is unused, maybe we can remove the unused parameters. > > Change has been verified locally on linux-x64, test-fix only, trivial fix, no risk. That looks good and trivial. Thanks for fixing this. ------------- Marked as reviewed by mhaessig (Committer). 
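(A hypothetical sketch of the kind of cleanup being reviewed here, not the actual test code: an IR framework @Run method may declare an optional RunInfo parameter, and when the body never reads it, the parameter can simply be dropped.)

import compiler.lib.ir_framework.Run;
import compiler.lib.ir_framework.RunInfo;
import compiler.lib.ir_framework.Test;
import compiler.lib.ir_framework.TestFramework;

public class RunInfoSketch {
    public static void main(String[] args) {
        TestFramework.run();
    }

    @Test
    public long testCompress(long x) {
        return Long.compress(x, 0xF0F0F0F0L);
    }

    // 'info' is declared but never used, so the parameter can be removed,
    // leaving just: public void runner() { testCompress(42L); }
    @Run(test = "testCompress")
    public void runner(RunInfo info) {
        testCompress(42L);
    }
}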
PR Review: https://git.openjdk.org/jdk/pull/27720#pullrequestreview-3318315776 From duke at openjdk.org Thu Oct 9 10:52:38 2025 From: duke at openjdk.org (erifan) Date: Thu, 9 Oct 2025 10:52:38 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms Message-ID: According the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test faiulres on sve2 plaftforms where `MaxVectorSize > 16`. This test problem was discovered by simulating a 512-bit sve2 environment using qemu. This PR fixes these test failures. ------------- Commit messages: - 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms Changes: https://git.openjdk.org/jdk/pull/27723/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27723&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369456 Stats: 62 lines in 3 files changed: 40 ins; 1 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/27723.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27723/head:pull/27723 PR: https://git.openjdk.org/jdk/pull/27723 From roland at openjdk.org Thu Oct 9 12:20:42 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 9 Oct 2025 12:20:42 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v25] In-Reply-To: <1nIAYsKgvsIyomk5TKNvSmnLOMDj09MgqGtjxWQKt8k=.f36dbaf7-392b-4440-9050-9714c0242ed1@github.com> References: <1nIAYsKgvsIyomk5TKNvSmnLOMDj09MgqGtjxWQKt8k=.f36dbaf7-392b-4440-9050-9714c0242ed1@github.com> Message-ID: On Wed, 8 Oct 2025 16:31:49 GMT, Kangcheng Xu wrote: >> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. >> >> When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) >> >> The following was implemented to address this issue. >> >> if (UseNewCode2) { >> *multiplier = bt == T_INT >> ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows >> : ((jlong) 1) << con->get_int(); >> } else { >> *multiplier = ((jlong) 1 << con->get_int()); >> } >> >> >> Two new bitshift overflow tests were added. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > Skip and delay processing add nodes with non-canonicalized inputs Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23506#pullrequestreview-3318695659 From epeter at openjdk.org Thu Oct 9 12:48:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Oct 2025 12:48:18 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Wed, 1 Oct 2025 05:03:27 GMT, Kangcheng Xu wrote: >> @tabjy >> >>> I could, at very least, try to swap LHS and RHS if no match is found >> >> I think that would be a good idea, and not very hard. You can just have a function `add_pattern(lhs, rhs)`, and then run it also with `add_pattern(rhs, lhs)` for **swapping**. >> >> Personally, I would have preferred a recursive algorithm, but that could have some compile time overhead. @chhagedorn Was a little more skeptical about the recursive algorithm. >> >> It seems the motivation for this change is the benchmark from here: >> ArithmeticCanonicalizationBenchmark >> https://ionutbalosin.com/2024/02/jvm-performance-comparison-for-jdk-21/#jit-compiler >> >> This benchmark is of course somewhat arbitrary, and so are now all of your added patterns. Having a most general solution would be nice, but maybe the recursive algorithm is too much, I'm not 100% sure. Of course we now still have cases that do not optimize/canonicalize, and so someone could write a benchmark for those cases still.. oh well. >> >> What I would like to see for **testing**: add some more patterns with IR rules. More that now optimize, and also a few that do not optimize, just so we have a bit of a sense what we are still missing. >> >> @rwestrel Filed this issue. I wonder: what do you think we should do here? How general should the optimization/canonicalization be? > > @eme64 Thank you for reviewing! Those are very valid suggestion, especially on naming as this PR evolves. I've done the following: > > - updated naming (mostly with "serial addition" to "collapsable addition (into multiplication)") > - updated comments > - moved test file > - merged in master > > Please enjoy your time off! > > Once GHA passes, @rwestrel could you please give this a quick review if you have some time? Thank you very much! @tabjy The code now looks good to me. I ran some internal testing, should take about 24h. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-3385705024 From roland at openjdk.org Thu Oct 9 13:25:27 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 9 Oct 2025 13:25:27 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal Message-ID: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> This change refactor code that's similar for LShiftINode and LShiftLNode into shared methods. I also added extra test cases to cover all transformations. 
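(A small self-contained sketch of the Java-level shift semantics that these Value/Identity/Ideal routines, and the bit shift overflow fix discussed further up, mirror. Values are arbitrary.)

public class ShiftSemanticsSketch {
    public static void main(String[] args) {
        int x = 0x12345678;
        // Java masks int shift amounts to 5 bits, so shifting by a multiple
        // of 32 changes nothing, and a shift by 35 behaves like a shift by 3:
        System.out.println((x << 32) == x);        // true
        System.out.println((x << 35) == (x << 3)); // true

        long y = 0x1122334455667788L;
        // For long the amount is masked to 6 bits instead:
        System.out.println((y << 64) == y);        // true

        // The int/long difference behind the overflow problem:
        System.out.println(1 << 32);               // 1 (amount masked to 0)
        System.out.println((int) (1L << 32));      // 0 (low 32 bits of 2^32)
    }
}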
------------- Commit messages: - more - more - more - more - more - fix Changes: https://git.openjdk.org/jdk/pull/27725/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27725&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369167 Stats: 625 lines in 6 files changed: 368 ins; 170 del; 87 mod Patch: https://git.openjdk.org/jdk/pull/27725.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27725/head:pull/27725 PR: https://git.openjdk.org/jdk/pull/27725 From chagedorn at openjdk.org Thu Oct 9 13:36:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 9 Oct 2025 13:36:13 GMT Subject: Integrated: 8369423: Reduce execution time of testlibrary_tests/ir_framework/tests/TestDFlags.java In-Reply-To: References: Message-ID: <_cOKuZOGxcLi8PvbyZuzhxa1hoQzhFbIfa7gZebtjoc=.0d18a89e-b67e-4e64-8f21-dc0f28262f6a@github.com> On Wed, 8 Oct 2025 11:30:43 GMT, Christian Hagedorn wrote: > The test `testlibrary_tests/ir_framework/tests/TestDFlags.java` runs 11 separate IR framework runs by enabling only a single property/D flag in each run. This seems like a waste of resources for just a sanity run (we don't do any additional verification). Moreover, some newer property flags are missing. > > I suggest to cut it down to 2 runs: > - IR verification enabled (i.e. -DVerifyIR=true, which is the default) > - IR verification disabled (i.e. -DVerifyIR=false) > > In both runs we simultaneously set all property flags to some non-default value as a sanity test. > > This reduces the test execution time from around 20-30s down to 3-4s. > > Thanks, > Christian This pull request has now been integrated. Changeset: dd410e0b Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/dd410e0b776a01b617a457786b11ddf87d3b4d60 Stats: 22 lines in 2 files changed: 7 ins; 3 del; 12 mod 8369423: Reduce execution time of testlibrary_tests/ir_framework/tests/TestDFlags.java Reviewed-by: thartmann, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/27690 From chagedorn at openjdk.org Thu Oct 9 13:37:16 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 9 Oct 2025 13:37:16 GMT Subject: Integrated: 8369236: testlibrary_tests/ir_framework/tests/TestCompileThreshold.java timed out In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 07:45:15 GMT, Christian Hagedorn wrote: > The test `testlibrary_tests/ir_framework/tests/TestCompileThreshold.java` times out intermittently after the timeout factor change (taking more than 120s). On my local machine, I measured around ~105-115s. > > The test uses `CompileThreshold=10` which is almost like `Xcomp` and thus quite slow. However, the purpose of this test is not to stress the compiler but actually to verify that passing `CompileThreshold` to the IR framework over jtreg options is properly ignored. Therefore, we can use higher `CompileThreshold` values and achieve the same goal. With the proposed changes, the test finishes in ~10-15s on my local machine. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 005877b0 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/005877b0635f1a9547724168ebd894b1b61fc116 Stats: 13 lines in 1 file changed: 0 ins; 0 del; 13 mod 8369236: testlibrary_tests/ir_framework/tests/TestCompileThreshold.java timed out Reviewed-by: ayang, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/27667 From kvn at openjdk.org Thu Oct 9 15:44:32 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 Oct 2025 15:44:32 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize In-Reply-To: References: Message-ID: <4zoyX_92nKlN8oDSPF3OWSPpPI-2cA-OLVu27tpk15U=.400c0e41-3485-4458-929f-80cb039a5dbd@github.com> On Wed, 8 Oct 2025 22:44:57 GMT, Emanuel Peter wrote: >> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: >> https://github.com/openjdk/jdk/pull/20964 >> [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093) >> >> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. >> >> This should be the last one before Cost Modeling, which will enable us to vectorize more reductions ? >> >> -------------------------- >> >> **Goal:** we need to do the `move_reduction_out_of_loop` already during auto vectorization, and not after. This will allow us to cost-model the loop after the expensive reduction nodes are removed from the loop in a following RFE. >> >> **Details** >> Reduction nodes are expensive, and require many instructions in the backend. In some cases, this means that the vectorized reduction is more expensive than the scalar reduction. We would have to find other operations to vectorize, so that the instruction count goes down sufficiently. There are cases where the reduction is not profitable before `move_reduction_out_of_loop`, but profitable after. >> >> Since we now modify `VTransformNode`s during `VTransform::optimize` (think of it as the IGVN for `VTransform`), some nodes can become dead, and so we need to take care of that with `is_alive`. And we must only schedule alive nodes, others may not have a coherent state. >> >> **Future Work** >> - Cost Modeling [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093) >> - Other optimizations that lower the cost of the vectorized loop, and enable vectorization to be profitable. > > src/hotspot/share/opto/vectornode.cpp line 1615: > >> 1613: } >> 1614: >> 1615: bool ReductionNode::auto_vectorization_requires_strict_order(int vopc) { > > Note: we need to know which ones we can move out of the loop, and we can only do that with those that do not require strict order. Can we use something like `bottom_type()->isa_float() != nullptr || bottom_type()->isa_double() != nullptr` here to check for strict order? 
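(To make the strict-order distinction concrete, a small sketch with arbitrary values: floating-point add is not associative, so an add reduction over floats must keep its order, whereas min/max reductions can be reassociated and therefore moved out of the loop.)

public class ReductionOrderSketch {
    public static void main(String[] args) {
        float a = 1e8f, b = -1e8f, c = 1f;
        // Reassociating a float add reduction can change the result:
        System.out.println((a + b) + c);  // 1.0
        System.out.println(a + (b + c));  // 0.0
        // min/max give the same result under reassociation:
        System.out.println(Math.max(Math.max(a, b), c) == Math.max(a, Math.max(b, c)));  // true
    }
}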
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2417172158 From kvn at openjdk.org Thu Oct 9 15:44:33 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 Oct 2025 15:44:33 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize In-Reply-To: <4zoyX_92nKlN8oDSPF3OWSPpPI-2cA-OLVu27tpk15U=.400c0e41-3485-4458-929f-80cb039a5dbd@github.com> References: <4zoyX_92nKlN8oDSPF3OWSPpPI-2cA-OLVu27tpk15U=.400c0e41-3485-4458-929f-80cb039a5dbd@github.com> Message-ID: On Thu, 9 Oct 2025 15:36:40 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/opto/vectornode.cpp line 1615: >> >>> 1613: } >>> 1614: >>> 1615: bool ReductionNode::auto_vectorization_requires_strict_order(int vopc) { >> >> Note: we need to know which ones we can move out of the loop, and we can only do that with those that do not require strict order. > > Can we use something like `bottom_type()->isa_float() != nullptr || bottom_type()->isa_double() != nullptr` here to check for strict order? May be use it for assert to make sure we did not miss listing some new vector nodes in this switch in a future. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2417185836 From epeter at openjdk.org Thu Oct 9 21:46:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Oct 2025 21:46:36 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v2] In-Reply-To: References: Message-ID: > I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: > https://github.com/openjdk/jdk/pull/20964 > [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093) > > This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > This should be the last one before Cost Modeling, which will enable us to vectorize more reductions ? > > -------------------------- > > **Goal:** we need to do the `move_reduction_out_of_loop` already during auto vectorization, and not after. This will allow us to cost-model the loop after the expensive reduction nodes are removed from the loop in a following RFE. > > **Details** > Reduction nodes are expensive, and require many instructions in the backend. In some cases, this means that the vectorized reduction is more expensive than the scalar reduction. We would have to find other operations to vectorize, so that the instruction count goes down sufficiently. There are cases where the reduction is not profitable before `move_reduction_out_of_loop`, but profitable after. > > Since we now modify `VTransformNode`s during `VTransform::optimize` (think of it as the IGVN for `VTransform`), some nodes can become dead, and so we need to take care of that with `is_alive`. And we must only schedule alive nodes, others may not have a coherent state. > > **Future Work** > - Cost Modeling [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093) > - Other optimizations that lower the cost of the vectorized loop, and enable vectorization to be profitable. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: For Vladimir K7 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27704/files - new: https://git.openjdk.org/jdk/pull/27704/files/6db3c0d9..a7cd2685 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27704&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27704&range=00-01 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27704.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27704/head:pull/27704 PR: https://git.openjdk.org/jdk/pull/27704 From epeter at openjdk.org Thu Oct 9 21:46:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Oct 2025 21:46:37 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize In-Reply-To: References: Message-ID: On Wed, 8 Oct 2025 19:42:38 GMT, Emanuel Peter wrote: > I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: > https://github.com/openjdk/jdk/pull/20964 > [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093) > > This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > This should be the last one before Cost Modeling, which will enable us to vectorize more reductions ? > > -------------------------- > > **Goal:** we need to do the `move_reduction_out_of_loop` already during auto vectorization, and not after. This will allow us to cost-model the loop after the expensive reduction nodes are removed from the loop in a following RFE. > > **Details** > Reduction nodes are expensive, and require many instructions in the backend. In some cases, this means that the vectorized reduction is more expensive than the scalar reduction. We would have to find other operations to vectorize, so that the instruction count goes down sufficiently. There are cases where the reduction is not profitable before `move_reduction_out_of_loop`, but profitable after. > > Since we now modify `VTransformNode`s during `VTransform::optimize` (think of it as the IGVN for `VTransform`), some nodes can become dead, and so we need to take care of that with `is_alive`. And we must only schedule alive nodes, others may not have a coherent state. > > **Future Work** > - Cost Modeling [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093) > - Other optimizations that lower the cost of the vectorized loop, and enable vectorization to be profitable. @vnkozlov Thanks for having a look. I updated the code :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27704#issuecomment-3387579767 From epeter at openjdk.org Thu Oct 9 21:46:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Oct 2025 21:46:38 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v2] In-Reply-To: References: <4zoyX_92nKlN8oDSPF3OWSPpPI-2cA-OLVu27tpk15U=.400c0e41-3485-4458-929f-80cb039a5dbd@github.com> Message-ID: On Thu, 9 Oct 2025 15:40:41 GMT, Vladimir Kozlov wrote: >> Can we use something like `bottom_type()->isa_float() != nullptr || bottom_type()->isa_double() != nullptr` here to check for strict order? > > May be use it for assert to make sure we did not miss listing some new vector nodes in this switch in a future. @vnkozlov I like the idea with an assert on default, so that is what I changed it to now. 
That way, we won't forget any cases in the future. `bottom_type()->isa_float() != nullptr || bottom_type()->isa_double() != nullptr`: that does not work (on its own), because float/double min/max do NOT require strict order and can be optimized/reassociated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2418022949 From kvn at openjdk.org Thu Oct 9 22:16:04 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 Oct 2025 22:16:04 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v2] In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 21:46:36 GMT, Emanuel Peter wrote: >> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: >> https://github.com/openjdk/jdk/pull/20964 >> [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093) >> >> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. >> >> This should be the last one before Cost Modeling, which will enable us to vectorize more reductions ? >> >> -------------------------- >> >> **Goal:** we need to do the `move_reduction_out_of_loop` already during auto vectorization, and not after. This will allow us to cost-model the loop after the expensive reduction nodes are removed from the loop in a following RFE. >> >> **Details** >> Reduction nodes are expensive, and require many instructions in the backend. In some cases, this means that the vectorized reduction is more expensive than the scalar reduction. We would have to find other operations to vectorize, so that the instruction count goes down sufficiently. There are cases where the reduction is not profitable before `move_reduction_out_of_loop`, but profitable after. >> >> Since we now modify `VTransformNode`s during `VTransform::optimize` (think of it as the IGVN for `VTransform`), some nodes can become dead, and so we need to take care of that with `is_alive`. And we must only schedule alive nodes, others may not have a coherent state. >> >> **Future Work** >> - Cost Modeling [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093) >> - Other optimizations that lower the cost of the vectorized loop, and enable vectorization to be profitable. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > For Vladimir K7 Yes, this works too. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27704#pullrequestreview-3320787659 From xgong at openjdk.org Fri Oct 10 03:29:02 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 10 Oct 2025 03:29:02 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE In-Reply-To: References: Message-ID: On Thu, 25 Sep 2025 03:08:47 GMT, Xiaohong Gong wrote: > The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. > > For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. 
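(Aside: for readers unfamiliar with the two Vector API methods being optimized, a minimal usage sketch; the species and bit pattern are arbitrary. Run with --add-modules=jdk.incubator.vector.)

import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

public class MaskFromToLongSketch {
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_128;  // 4 int lanes

    public static void main(String[] args) {
        // fromLong: bit i of the long decides whether lane i is set.
        VectorMask<Integer> m = VectorMask.fromLong(SPECIES, 0b0101L);
        System.out.println(m.laneIsSet(0));  // true
        System.out.println(m.laneIsSet(1));  // false
        // toLong: the round trip back to a long; only the low 4 bits matter here.
        System.out.println(Long.toBinaryString(m.toLong()));  // 101
    }
}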
> > These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. > > This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. > > It also modifies the Vector API jtreg tests for well testing. Here is the details: > > 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: > > VectorMaskToLong (VectorLongToMask l) => l > > Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. > > 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". > > Performance shows significant improvement on NVIDIA's Grace CPU. > > Here is the performance data with `-XX:UseSVE=2`: > > Benchmark bits inputs Mode Unit Before After Gain > MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 > MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 > MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 > MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 > MaskQueryOperationsBenchmark.testToLongInt 128 2 thrpt ops/ms 101031... Hi, could anyone please help take a look at this PR? Thanks a lot in advance! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3388156363 From mchevalier at openjdk.org Fri Oct 10 08:28:22 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 10 Oct 2025 08:28:22 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal In-Reply-To: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> Message-ID: <4AzzqZKwkzxGFxIszBSwfAdT6lyEEMdveyzYXhpfJLI=.224d078f-87e7-4b04-97ff-fe67ca4df4aa@github.com> On Thu, 9 Oct 2025 13:16:13 GMT, Roland Westrelin wrote: > This change refactor code that's similar for LShiftINode and > LShiftLNode into shared methods. I also added extra test cases to > cover all transformations. Looks good to me. There are a lot of `SomeType *name` that we are slowly converting into `SomeType* name` when we have an occasion. As you wish. I'm also running some tests. I'll be back soon. src/hotspot/share/opto/mulnode.cpp line 1264: > 1262: int count = 0; > 1263: if (const_shift_count(phase, this, &count) && (count & (bits_per_java_integer(bt) - 1)) == 0) { > 1264: // Shift by a multiple of 32/64 does nothing I know it was there before, but I wonder if it's useful. Shouldn't something like `x << K` be idealized into `x << (K mod 32)` (or 64) by `mask_and_replace_shift_amount`, and then, we just need to treat `x << 0` in `Identity`. Not that it hurts or it's really complex... 
------------- PR Review: https://git.openjdk.org/jdk/pull/27725#pullrequestreview-3321957081 PR Review Comment: https://git.openjdk.org/jdk/pull/27725#discussion_r2418898051 From roland at openjdk.org Fri Oct 10 08:44:24 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 10 Oct 2025 08:44:24 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal [v2] In-Reply-To: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> Message-ID: > This change refactor code that's similar for LShiftINode and > LShiftLNode into shared methods. I also added extra test cases to > cover all transformations. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: sort headers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27725/files - new: https://git.openjdk.org/jdk/pull/27725/files/ba8dc6df..05ff54dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27725&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27725&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27725.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27725/head:pull/27725 PR: https://git.openjdk.org/jdk/pull/27725 From roland at openjdk.org Fri Oct 10 08:44:25 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 10 Oct 2025 08:44:25 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal In-Reply-To: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> Message-ID: <-2FHpMSA3_maSA1AokaCBLke5OL4FPFMKNUSz2CiXNM=.dec85f0d-be9b-490b-9a3d-15154fa891a1@github.com> On Thu, 9 Oct 2025 13:16:13 GMT, Roland Westrelin wrote: > This change refactor code that's similar for LShiftINode and > LShiftLNode into shared methods. I also added extra test cases to > cover all transformations. I pushed a new commit that fixes the test failures because some headers were not properly sorted. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27725#issuecomment-3388906913 From roland at openjdk.org Fri Oct 10 08:47:17 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 10 Oct 2025 08:47:17 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal [v2] In-Reply-To: <4AzzqZKwkzxGFxIszBSwfAdT6lyEEMdveyzYXhpfJLI=.224d078f-87e7-4b04-97ff-fe67ca4df4aa@github.com> References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> <4AzzqZKwkzxGFxIszBSwfAdT6lyEEMdveyzYXhpfJLI=.224d078f-87e7-4b04-97ff-fe67ca4df4aa@github.com> Message-ID: <0ydTWVqw0FMaH9tTpRnYDZ8vW1XeFzdb25r9Sx_AFMI=.a093ab46-187c-4532-82f5-5f5927a4d07e@github.com> On Fri, 10 Oct 2025 08:23:19 GMT, Marc Chevalier wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> sort headers > > src/hotspot/share/opto/mulnode.cpp line 1264: > >> 1262: int count = 0; >> 1263: if (const_shift_count(phase, this, &count) && (count & (bits_per_java_integer(bt) - 1)) == 0) { >> 1264: // Shift by a multiple of 32/64 does nothing > > I know it was there before, but I wonder if it's useful. 
Shouldn't something like `x << K` be idealized into `x << (K mod 32)` (or 64) by `mask_and_replace_shift_amount`, and then, we just need to treat `x << 0` in `Identity`. Not that it hurts or it's really complex... Right. Maybe it's because `Identity` can be called with no previous call to `Ideal`: `PhaseIdealLoop::split_thru_phi()` for instance, even though, not sure it makes much of a difference here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27725#discussion_r2418961556 From duke at openjdk.org Fri Oct 10 09:09:40 2025 From: duke at openjdk.org (erifan) Date: Fri, 10 Oct 2025 09:09:40 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 10:45:47 GMT, erifan wrote: > According the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test faiulres on sve2 plaftforms where `MaxVectorSize > 16`. > > This test problem was discovered by simulating a 512-bit sve2 environment using qemu. > > This PR fixes these test failures. The test failure should has nothing to do with this PR. Hi @Bhavana-Kilambi , could you please help take a look at this PR, thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27723#issuecomment-3388991927 From bkilambi at openjdk.org Fri Oct 10 09:35:31 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 10 Oct 2025 09:35:31 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 10:45:47 GMT, erifan wrote: > According the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test faiulres on sve2 plaftforms where `MaxVectorSize > 16`. > > This test problem was discovered by simulating a 512-bit sve2 environment using qemu. > > This PR fixes these test failures. Hi, could you please post the exact failure message as well? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27723#issuecomment-3389075795 From duke at openjdk.org Fri Oct 10 09:42:19 2025 From: duke at openjdk.org (erifan) Date: Fri, 10 Oct 2025 09:42:19 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms In-Reply-To: References: Message-ID: On Fri, 10 Oct 2025 09:32:52 GMT, Bhavana Kilambi wrote: >> According the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test faiulres on sve2 plaftforms where `MaxVectorSize > 16`. >> >> This test problem was discovered by simulating a 512-bit sve2 environment using qemu. >> >> This PR fixes these test failures. > > Hi, could you please post the exact failure message as well? Thanks! 
Yeah, the error logs, @Bhavana-Kilambi [2025-09-03T08:05:59.812Z] STDERR: [2025-09-03T08:05:59.812Z] WARNING: Using incubator modules: jdk.incubator.vector [2025-09-03T08:05:59.812Z] [2025-09-03T08:05:59.812Z] Command Line: [2025-09-03T08:05:59.812Z] /tmp/ci-scripts/build-fastdebug/images/jdk/bin/java -DReproduce=true -cp /tmp/ci-scripts/jtwork_rerun/hotspot/classes/compiler/vectorapi/TestSelectFromTwoVectorOp.d:/tmp/ci-scripts/jdk-src/test/hotspot/jtreg/compiler/vectorapi:/tmp/ci-scripts/jtwork_rerun/hotspot/classes/test/lib:/tmp/ci-scripts/jdk-src/test/lib:/tmp/ci-scripts/jtwork_rerun/hotspot/classes:/tmp/ci-scripts/jdk-src/test/hotspot/jtreg:/usr/local/lib/jtreg/jtreg-7.5.2+1/lib/jtreg.jar -Djava.library.path=/tmp/ci-scripts/build-fastdebug/images/test/hotspot/jtreg/native -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -server -ea -esa -Xmx768m -Djdk.incubator.vector.test.loop-iterations=500 -Djdk.test.lib.artifacts.jcstress-tests-all=/usr/local/lib/jcstress-tests-all-0.17-snapshot-20240328.jar -Dir.framework.server.port=36179 --add-modules=jdk.incubator.vector -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:CompilerDirectivesFile=test-vm-compile-comm ands-pid-307630.log -XX:CompilerDirectivesLimit=221 -XX:-OmitStackTraceInFastThrow -DShouldDoIRVerification=true -XX:-BackgroundCompilation -XX:CompileCommand=quiet compiler.lib.ir_framework.test.TestVM compiler.vectorapi.TestSelectFromTwoVectorOp [2025-09-03T08:05:59.813Z] [2025-09-03T08:05:59.813Z] One or more @IR rules failed: [2025-09-03T08:05:59.813Z] [2025-09-03T08:05:59.813Z] Failed IR Rules (12) of Methods (12) [2025-09-03T08:05:59.813Z] ------------------------------------ [2025-09-03T08:05:59.813Z] 1) Method "public static void compiler.vectorapi.TestSelectFromTwoVectorOp.selectFromTwoVector_Byte128()" - [Failed IR rules: 1]: [2025-09-03T08:05:59.813Z] * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#SELECT_FROM_TWO_VECTOR_VB#_", "[_ at 16](mailto:_ at 16)", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"MaxVectorSize", ">=16"}, applyIfCPUFeature={"asimd", "true"}, applyIfAnd={}, applyIfNot={})" [2025-09-03T08:05:59.813Z] > Phase "PrintIdeal": [2025-09-03T08:05:59.813Z] - counts: Graph contains wrong number of nodes: [2025-09-03T08:05:59.813Z] * Constraint 1: "(\d+(\s){2}(SelectFromTwoVector.*)+(\s){2}===.*vector[A-Za-z])" [2025-09-03T08:05:59.813Z] - Failed comparison: [found] 0 > 0 [given] [2025-09-03T08:05:59.813Z] - No nodes matched! 
[2025-09-03T08:05:59.813Z] [2025-09-03T08:05:59.813Z] 2) Method "public static void compiler.vectorapi.TestSelectFromTwoVectorOp.selectFromTwoVector_Byte256()" - [Failed IR rules: 1]: [2025-09-03T08:05:59.813Z] * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#SELECT_FROM_TWO_VECTOR_VB#_", "[_ at 32](mailto:_ at 32)", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"MaxVectorSize", ">=32"}, applyIfCPUFeature={"sve2", "true"}, applyIfAnd={}, applyIfNot={})" [2025-09-03T08:05:59.813Z] > Phase "PrintIdeal": [2025-09-03T08:05:59.813Z] - counts: Graph contains wrong number of nodes: [2025-09-03T08:05:59.813Z] * Constraint 1: "(\d+(\s){2}(SelectFromTwoVector.*)+(\s){2}===.*vector[A-Za-z])" [2025-09-03T08:05:59.813Z] - Failed comparison: [found] 0 > 0 [given] [2025-09-03T08:05:59.813Z] - No nodes matched! [2025-09-03T08:05:59.813Z] [2025-09-03T08:05:59.813Z] 3) Method "public static void compiler.vectorapi.TestSelectFromTwoVectorOp.selectFromTwoVector_Double128()" - [Failed IR rules: 1]: [2025-09-03T08:05:59.813Z] * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sve2", "true", "avx512vl", "true"}, counts={"_#V#SELECT_FROM_TWO_VECTOR_VD#_", "[_ at 2](mailto:_ at 2)", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"MaxVectorSize", ">=16"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" [2025-09-03T08:05:59.814Z] > Phase "PrintIdeal": [2025-09-03T08:05:59.814Z] - counts: Graph contains wrong number of nodes: [2025-09-03T08:05:59.814Z] * Constraint 1: "(\d+(\s){2}(SelectFromTwoVector.*)+(\s){2}===.*vector[A-Za-z])" [2025-09-03T08:05:59.814Z] - Failed comparison: [found] 0 > 0 [given] [2025-09-03T08:05:59.814Z] - No nodes matched! [2025-09-03T08:05:59.814Z] [2025-09-03T08:05:59.814Z] 4) Method "public static void compiler.vectorapi.TestSelectFromTwoVectorOp.selectFromTwoVector_Double256()" - [Failed IR rules: 1]: [2025-09-03T08:05:59.814Z] * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sve2", "true", "avx512vl", "true"}, counts={"_#V#SELECT_FROM_TWO_VECTOR_VD#_", "[_ at 4](mailto:_ at 4)", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"MaxVectorSize", ">=32"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" [2025-09-03T08:05:59.814Z] > Phase "PrintIdeal": [2025-09-03T08:05:59.814Z] - counts: Graph contains wrong number of nodes: [2025-09-03T08:05:59.814Z] * Constraint 1: "(\d+(\s){2}(SelectFromTwoVector.*)+(\s){2}===.*vector[A-Za-z])" [2025-09-03T08:05:59.814Z] - Failed comparison: [found] 0 > 0 [given] [2025-09-03T08:05:59.814Z] - No nodes matched! 
[2025-09-03T08:05:59.814Z] [2025-09-03T08:05:59.814Z] 5) Method "public static void compiler.vectorapi.TestSelectFromTwoVectorOp.selectFromTwoVector_Float128()" - [Failed IR rules: 1]: [2025-09-03T08:05:59.814Z] * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sve2", "true", "avx512vl", "true"}, counts={"_#V#SELECT_FROM_TWO_VECTOR_VF#_", "[_ at 4](mailto:_ at 4)", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"MaxVectorSize", ">=16"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" [2025-09-03T08:05:59.814Z] > Phase "PrintIdeal": [2025-09-03T08:05:59.814Z] - counts: Graph contains wrong number of nodes: [2025-09-03T08:05:59.814Z] * Constraint 1: "(\d+(\s){2}(SelectFromTwoVector.*)+(\s){2}===.*vector[A-Za-z])" [2025-09-03T08:05:59.814Z] - Failed comparison: [found] 0 > 0 [given] [2025-09-03T08:05:59.814Z] - No nodes matched! [2025-09-03T08:05:59.814Z] [2025-09-03T08:05:59.814Z] 6) Method "public static void compiler.vectorapi.TestSelectFromTwoVectorOp.selectFromTwoVector_Float256()" - [Failed IR rules: 1]: [2025-09-03T08:05:59.814Z] * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sve2", "true", "avx512vl", "true"}, counts={"_#V#SELECT_FROM_TWO_VECTOR_VF#_", "[_ at 8](mailto:_ at 8)", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"MaxVectorSize", ">=32"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" [2025-09-03T08:05:59.814Z] > Phase "PrintIdeal": [2025-09-03T08:05:59.814Z] - counts: Graph contains wrong number of nodes: [2025-09-03T08:05:59.814Z] * Constraint 1: "(\d+(\s){2}(SelectFromTwoVector.*)+(\s){2}===.*vector[A-Za-z])" [2025-09-03T08:05:59.815Z] - Failed comparison: [found] 0 > 0 [given] [2025-09-03T08:05:59.815Z] - No nodes matched! [2025-09-03T08:05:59.815Z] [2025-09-03T08:05:59.815Z] 7) Method "public static void compiler.vectorapi.TestSelectFromTwoVectorOp.selectFromTwoVector_Int128()" - [Failed IR rules: 1]: [2025-09-03T08:05:59.815Z] * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sve2", "true", "avx512vl", "true"}, counts={"_#V#SELECT_FROM_TWO_VECTOR_VI#_", "[_ at 4](mailto:_ at 4)", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"MaxVectorSize", ">=16"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" [2025-09-03T08:05:59.815Z] > Phase "PrintIdeal": [2025-09-03T08:05:59.815Z] - counts: Graph contains wrong number of nodes: [2025-09-03T08:05:59.815Z] * Constraint 1: "(\d+(\s){2}(SelectFromTwoVector.*)+(\s){2}===.*vector[A-Za-z])" [2025-09-03T08:05:59.815Z] - Failed comparison: [found] 0 > 0 [given] [2025-09-03T08:05:59.815Z] - No nodes matched! 
[2025-09-03T08:05:59.815Z] [2025-09-03T08:05:59.815Z] 8) Method "public static void compiler.vectorapi.TestSelectFromTwoVectorOp.selectFromTwoVector_Int256()" - [Failed IR rules: 1]: [2025-09-03T08:05:59.815Z] * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sve2", "true", "avx512vl", "true"}, counts={"_#V#SELECT_FROM_TWO_VECTOR_VI#_", "[_ at 8](mailto:_ at 8)", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"MaxVectorSize", ">=32"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" [2025-09-03T08:05:59.815Z] > Phase "PrintIdeal": [2025-09-03T08:05:59.815Z] - counts: Graph contains wrong number of nodes: [2025-09-03T08:05:59.815Z] * Constraint 1: "(\d+(\s){2}(SelectFromTwoVector.*)+(\s){2}===.*vector[A-Za-z])" [2025-09-03T08:05:59.815Z] - Failed comparison: [found] 0 > 0 [given] [2025-09-03T08:05:59.815Z] - No nodes matched! [2025-09-03T08:05:59.815Z] [2025-09-03T08:05:59.815Z] 9) Method "public static void compiler.vectorapi.TestSelectFromTwoVectorOp.selectFromTwoVector_Long128()" - [Failed IR rules: 1]: [2025-09-03T08:05:59.815Z] * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sve2", "true", "avx512vl", "true"}, counts={"_#V#SELECT_FROM_TWO_VECTOR_VL#_", "[_ at 2](mailto:_ at 2)", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"MaxVectorSize", ">=16"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" [2025-09-03T08:05:59.815Z] > Phase "PrintIdeal": [2025-09-03T08:05:59.815Z] - counts: Graph contains wrong number of nodes: [2025-09-03T08:05:59.815Z] * Constraint 1: "(\d+(\s){2}(SelectFromTwoVector.*)+(\s){2}===.*vector[A-Za-z])" [2025-09-03T08:05:59.815Z] - Failed comparison: [found] 0 > 0 [given] [2025-09-03T08:05:59.815Z] - No nodes matched! [2025-09-03T08:05:59.815Z] [2025-09-03T08:05:59.815Z] 10) Method "public static void compiler.vectorapi.TestSelectFromTwoVectorOp.selectFromTwoVector_Long256()" - [Failed IR rules: 1]: [2025-09-03T08:05:59.816Z] * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sve2", "true", "avx512vl", "true"}, counts={"_#V#SELECT_FROM_TWO_VECTOR_VL#_", "[_ at 4](mailto:_ at 4)", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"MaxVectorSize", ">=32"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" [2025-09-03T08:05:59.816Z] > Phase "PrintIdeal": [2025-09-03T08:05:59.816Z] - counts: Graph contains wrong number of nodes: [2025-09-03T08:05:59.816Z] * Constraint 1: "(\d+(\s){2}(SelectFromTwoVector.*)+(\s){2}===.*vector[A-Za-z])" [2025-09-03T08:05:59.816Z] - Failed comparison: [found] 0 > 0 [given] [2025-09-03T08:05:59.816Z] - No nodes matched! 
[2025-09-03T08:05:59.816Z] [2025-09-03T08:05:59.816Z] 11) Method "public static void compiler.vectorapi.TestSelectFromTwoVectorOp.selectFromTwoVector_Short128()" - [Failed IR rules: 1]: [2025-09-03T08:05:59.816Z] * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#SELECT_FROM_TWO_VECTOR_VS#_", "[_ at 8](mailto:_ at 8)", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"MaxVectorSize", ">=16"}, applyIfCPUFeature={"sve2", "true"}, applyIfAnd={}, applyIfNot={})" [2025-09-03T08:05:59.816Z] > Phase "PrintIdeal": [2025-09-03T08:05:59.816Z] - counts: Graph contains wrong number of nodes: [2025-09-03T08:05:59.816Z] * Constraint 1: "(\d+(\s){2}(SelectFromTwoVector.*)+(\s){2}===.*vector[A-Za-z])" [2025-09-03T08:05:59.816Z] - Failed comparison: [found] 0 > 0 [given] [2025-09-03T08:05:59.816Z] - No nodes matched! [2025-09-03T08:05:59.816Z] [2025-09-03T08:05:59.816Z] 12) Method "public static void compiler.vectorapi.TestSelectFromTwoVectorOp.selectFromTwoVector_Short256()" - [Failed IR rules: 1]: [2025-09-03T08:05:59.816Z] * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#SELECT_FROM_TWO_VECTOR_VS#_", "[_ at 16](mailto:_ at 16)", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"MaxVectorSize", ">=32"}, applyIfCPUFeature={"sve2", "true"}, applyIfAnd={}, applyIfNot={})" [2025-09-03T08:05:59.816Z] > Phase "PrintIdeal": [2025-09-03T08:05:59.816Z] - counts: Graph contains wrong number of nodes: [2025-09-03T08:05:59.816Z] * Constraint 1: "(\d+(\s){2}(SelectFromTwoVector.*)+(\s){2}===.*vector[A-Za-z])" [2025-09-03T08:05:59.816Z] - Failed comparison: [found] 0 > 0 [given] [2025-09-03T08:05:59.816Z] - No nodes matched! [2025-09-03T08:05:59.816Z] [2025-09-03T08:05:59.816Z] >>> Check stdout for compilation output of the failed methods [2025-09-03T08:05:59.816Z] [2025-09-03T08:05:59.816Z] [2025-09-03T08:05:59.816Z] ############################################################# [2025-09-03T08:05:59.816Z] - To only run the failed tests use -DTest, -DExclude, [2025-09-03T08:05:59.817Z] and/or -DScenarios. [2025-09-03T08:05:59.817Z] - To also get the standard output of the test VM run with [2025-09-03T08:05:59.817Z] -DReportStdout=true or for even more fine-grained logging [2025-09-03T08:05:59.817Z] use -DVerbose=true. [2025-09-03T08:05:59.817Z] ############################################################# [2025-09-03T08:05:59.817Z] [2025-09-03T08:05:59.817Z] [2025-09-03T08:05:59.817Z] compiler.lib.ir_framework.driver.irmatching.IRViolationException: There were one or multiple IR rule failures. Please check stderr for more information. 
[2025-09-03T08:05:59.817Z] at compiler.lib.ir_framework.driver.irmatching.IRMatcher.reportFailures(IRMatcher.java:61) [2025-09-03T08:05:59.817Z] at compiler.lib.ir_framework.driver.irmatching.IRMatcher.match(IRMatcher.java:49) [2025-09-03T08:05:59.817Z] at compiler.lib.ir_framework.TestFramework.runTestVM(TestFramework.java:882) [2025-09-03T08:05:59.817Z] at compiler.lib.ir_framework.TestFramework.start(TestFramework.java:834) [2025-09-03T08:05:59.817Z] at compiler.lib.ir_framework.TestFramework.start(TestFramework.java:426) [2025-09-03T08:05:59.817Z] at compiler.lib.ir_framework.TestFramework.runWithFlags(TestFramework.java:257) [2025-09-03T08:05:59.817Z] at compiler.vectorapi.TestSelectFromTwoVectorOp.main(TestSelectFromTwoVectorOp.java:484) [2025-09-03T08:05:59.817Z] at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) [2025-09-03T08:05:59.817Z] at java.base/java.lang.reflect.Method.invoke(Method.java:565) [2025-09-03T08:05:59.817Z] at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138) [2025-09-03T08:05:59.817Z] at java.base/java.lang.Thread.run(Thread.java:1474) [2025-09-03T08:05:59.817Z] [2025-09-03T08:05:59.817Z] JavaTest Message: Test threw exception: compiler.lib.ir_framework.driver.irmatching.IRViolationException: There were one or multiple IR rule failures. Please check stderr for more information. [2025-09-03T08:05:59.817Z] JavaTest Message: shutting down test [2025-09-03T08:05:59.817Z] [2025-09-03T08:05:59.817Z] STATUS:Failed.`main' threw exception: compiler.lib.ir_framework.driver.irmatching.IRViolationException: There were one or multiple IR rule failures. Please check stderr for more information. [2025-09-03T08:05:59.817Z] rerun: [2025-09-03T08:05:59.817Z] cd /tmp/ci-scripts/jtwork_rerun/hotspot/scratch && \ [2025-09-03T08:05:59.817Z] HOME=/home/bot \ [2025-09-03T08:05:59.817Z] PATH=/bin:/usr/bin:/usr/sbin \ [2025-09-03T08:05:59.817Z] CLASSPATH=/tmp/ci-scripts/jtwork_rerun/hotspot/classes/compiler/vectorapi/TestSelectFromTwoVectorOp.d:/tmp/ci-scripts/jdk-src/test/hotspot/jtreg/compiler/vectorapi:/tmp/ci-scripts/jtwork_rerun/hotspot/classes/test/lib:/tmp/ci-scripts/jdk-src/test/lib:/tmp/ci-scripts/jtwork_rerun/hotspot/classes:/tmp/ci-scripts/jdk-src/test/hotspot/jtreg:/usr/local/lib/jtreg/jtreg-7.5.2+1/lib/jtreg.jar \ [2025-09-03T08:05:59.817Z] /tmp/ci-scripts/build-fastdebug/images/jdk/bin/java \ [2025-09-03T08:05:59.817Z] -Dtest.vm.opts='-server -ea -esa -Xmx768m -Djdk.incubator.vector.test.loop-iterations=500 -Djdk.test.lib.artifacts.jcstress-tests-all=/usr/local/lib/jcstress-tests-all-0.17-snapshot-20240328.jar' \ [2025-09-03T08:05:59.818Z] -Dtest.tool.vm.opts='-J-server -J-ea -J-esa -J-Xmx768m -J-Djdk.incubator.vector.test.loop-iterations=500 -J-Djdk.test.lib.artifacts.jcstress-tests-all=/usr/local/lib/jcstress-tests-all-0.17-snapshot-20240328.jar' \ [2025-09-03T08:05:59.818Z] -Dtest.compiler.opts= \ [2025-09-03T08:05:59.818Z] -Dtest.java.opts= \ [2025-09-03T08:05:59.818Z] -Dtest.jdk=/tmp/ci-scripts/build-fastdebug/images/jdk \ [2025-09-03T08:05:59.818Z] -Dcompile.jdk=/tmp/ci-scripts/build-fastdebug/images/jdk \ [2025-09-03T08:05:59.818Z] -Dtest.timeout.factor=16.0 \ [2025-09-03T08:05:59.818Z] -Dtest.nativepath=/tmp/ci-scripts/build-fastdebug/images/test/hotspot/jtreg/native \ [2025-09-03T08:05:59.818Z] -Dtest.root=/tmp/ci-scripts/jdk-src/test/hotspot/jtreg \ [2025-09-03T08:05:59.818Z] -Dtest.name=compiler/vectorapi/TestSelectFromTwoVectorOp.java \ [2025-09-03T08:05:59.818Z] 
-Dtest.verbose=Verbose[p=BRIEF,f=FULL,e=BRIEF,t=false,m=false] \ [2025-09-03T08:05:59.818Z] -Dtest.file=/tmp/ci-scripts/jdk-src/test/hotspot/jtreg/compiler/vectorapi/TestSelectFromTwoVectorOp.java \ [2025-09-03T08:05:59.818Z] -Dtest.main.class=compiler.vectorapi.TestSelectFromTwoVectorOp \ [2025-09-03T08:05:59.818Z] -Dtest.src=/tmp/ci-scripts/jdk-src/test/hotspot/jtreg/compiler/vectorapi \ [2025-09-03T08:05:59.818Z] -Dtest.src.path=/tmp/ci-scripts/jdk-src/test/hotspot/jtreg/compiler/vectorapi:/tmp/ci-scripts/jdk-src/test/lib:/tmp/ci-scripts/jdk-src/test/hotspot/jtreg \ [2025-09-03T08:05:59.818Z] -Dtest.classes=/tmp/ci-scripts/jtwork_rerun/hotspot/classes/compiler/vectorapi/TestSelectFromTwoVectorOp.d \ [2025-09-03T08:05:59.818Z] -Dtest.class.path=/tmp/ci-scripts/jtwork_rerun/hotspot/classes/compiler/vectorapi/TestSelectFromTwoVectorOp.d:/tmp/ci-scripts/jtwork_rerun/hotspot/classes/test/lib:/tmp/ci-scripts/jtwork_rerun/hotspot/classes \ [2025-09-03T08:05:59.818Z] -Dtest.class.path.prefix=/tmp/ci-scripts/jtwork_rerun/hotspot/classes/compiler/vectorapi/TestSelectFromTwoVectorOp.d:/tmp/ci-scripts/jdk-src/test/hotspot/jtreg/compiler/vectorapi:/tmp/ci-scripts/jtwork_rerun/hotspot/classes/test/lib:/tmp/ci-scripts/jtwork_rerun/hotspot/classes \ [2025-09-03T08:05:59.818Z] -Dtest.modules=jdk.incubator.vector \ [2025-09-03T08:05:59.818Z] --add-modules jdk.incubator.vector \ [2025-09-03T08:05:59.818Z] -server \ [2025-09-03T08:05:59.818Z] -ea \ [2025-09-03T08:05:59.818Z] -esa \ [2025-09-03T08:05:59.818Z] -Xmx768m \ [2025-09-03T08:05:59.818Z] -Djdk.incubator.vector.test.loop-iterations=500 \ [2025-09-03T08:05:59.818Z] -Djdk.test.lib.artifacts.jcstress-tests-all=/usr/local/lib/jcstress-tests-all-0.17-snapshot-20240328.jar \ [2025-09-03T08:05:59.818Z] -Djava.library.path=/tmp/ci-scripts/build-fastdebug/images/test/hotspot/jtreg/native \ [2025-09-03T08:05:59.818Z] com.sun.javatest.regtest.agent.MainWrapper /tmp/ci-scripts/jtwork_rerun/hotspot/compiler/vectorapi/TestSelectFromTwoVectorOp.d/driver.0.jta [2025-09-03T08:05:59.818Z] [2025-09-03T08:05:59.819Z] TEST RESULT: Failed. Execution failed: `main' threw exception: compiler.lib.ir_framework.driver.irmatching.IRViolationException: There were one or multiple IR rule failures. Please check stderr for more information. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27723#issuecomment-3389099390 From roland at openjdk.org Fri Oct 10 10:02:29 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 10 Oct 2025 10:02:29 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop In-Reply-To: References: Message-ID: On Fri, 12 Sep 2025 08:57:57 GMT, Roland Westrelin wrote: > In: > > > for (int i = 100; i < 1100; i++) { > v += floatArray[i - 100]; > Objects.checkIndex(i, longRange); > } > > > The int counted loop has both an int range check and a long range. The > int range check is optimized first. Assertion predicates are inserted > above the loop. One predicates checks that: > > > init - 100 > > The loop is then transformed to enable the optimization of the long > range check. The loop is short running, so there's no need to create a > loop nest. The counted loop is mostly left as is but, the loop's > bounds are changed from: > > > for (int i = 100; i < 1100; i++) { > > > to: > > > for (int i = 0; i < 1000; i++) { > > > The reason for that the long range check transformation expects the > loop to start at 0. > > Pre/main/post loops are created. Template Assertion predicates are > added above the main loop. 
The loop is unrolled. Initialized assertion > predicates are created. The one created from the condition: > > > init - 100 > > checks the value of `i` out of the pre loop which is 1. That check fails. > > The root cause of the failure is that when bounds of the counted loop > are changed, template assertion predicates need to be updated with an > adjusted init input. > > When the bounds of the loop are known, the assertion predicates can be > updated in place. Otherwise, when the loop is speculated to be short > running, the assertion predicates are updated when they are cloned. Anyone for a review of this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27250#issuecomment-3389156354 From chagedorn at openjdk.org Fri Oct 10 10:11:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 10 Oct 2025 10:11:07 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v13] In-Reply-To: <6zw0uSB1sUHZTyDUXDjiXcB0Chmu0XH1cEngzhG-UNk=.b239a687-cfb7-49a3-993a-34327a83c4de@github.com> References: <6zw0uSB1sUHZTyDUXDjiXcB0Chmu0XH1cEngzhG-UNk=.b239a687-cfb7-49a3-993a-34327a83c4de@github.com> Message-ID: On Wed, 1 Oct 2025 15:23:33 GMT, Kangcheng Xu wrote: >> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. >> >> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. >> >> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > 8354383: C2: enable sinking of Type nodes out of loop > > Reviewed-by: chagedorn, thartmann > (cherry picked from commit a2f99fd88bd03337e1ba73b413ffe4e39f3584cf) Thanks Kangcheng for coming back with an update and addressing my suggestion! It already looks much better! I did another, not yet complete, pass and left some more comments. Happy to come back again to this next week but I think I now need a break from reviewing :-) src/hotspot/share/opto/loopnode.cpp line 380: > 378: > 379: void CountedLoopConverter::insert_loop_limit_check_predicate(const ParsePredicateSuccessProj* loop_limit_check_parse_proj, > 380: Node* cmp_limit, Node* bol) { Can be made `const` and indentation is off here for the second line. src/hotspot/share/opto/loopnode.cpp line 1659: > 1657: assert(exit_test.mask() != BoolTest::ne, "unexpected condition"); > 1658: assert(iv_incr.phi_incr() == nullptr, "bad loop shape"); > 1659: assert(exit_test.cmp()->in(1) == iv_incr.incr(), "bad exit test shape"); About these assertions in this method: Aren't these already implicitly checked with the `is_valid*()` checks further up? src/hotspot/share/opto/loopnode.cpp line 1666: > 1664: #endif > 1665: > 1666: //------------------------------Counted Loop Structures----------------------------- We used to add these headers in the early days (maybe because IDEs were not that powerful?). I think you can remove these nowadays.
Suggestion: src/hotspot/share/opto/loopnode.cpp line 1667: > 1665: > 1666: //------------------------------Counted Loop Structures----------------------------- > 1667: bool PhaseIdealLoop::LoopExitTest::build() { You don't seem to use the return value since you only check with `is_valid_with_bt()` afterwards. I suggest to turn this into `void`. Same with some other `build()` returns where we only check `valid()` and don't use the actual return value of `build()`. Maybe double-check them all. src/hotspot/share/opto/loopnode.cpp line 1693: > 1691: > 1692: if (!_phase->is_member(_loop, _phase->get_ctrl(_incr))) { // Swapped trip counter and limit? > 1693: swap(_incr, _limit); // Then reverse order into the CmpI Suggestion: swap(_incr, _limit); // Then reverse order into the CmpI src/hotspot/share/opto/loopnode.cpp line 1741: > 1739: _incr = back_control; > 1740: _phi_incr = phi_incr; > 1741: Suggestion: src/hotspot/share/opto/loopnode.cpp line 1750: > 1748: > 1749: _is_valid = true; > 1750: return true; In the old code, we returned a `nullptr` from `loop_iv_incr()` and then bailed out in `is_counted_loop()`. But here we seem to set `_incr` regardless and also set `_is_valid` to true. This seems incorrect. src/hotspot/share/opto/loopnode.cpp line 1801: > 1799: } > 1800: > 1801: _exit_test = PhaseIdealLoop::LoopExitTest(_back_control, _loop, _phase); It's probably not that bad but here we basically recreate the `LoopExitTest`. Instead of having a default constructor, you could directly initialize `_exit_test` in the constructor of `CountedLoopConverter` by passing only `_loop` and `_phase` and the pass `_back_control` with `_exit_test.build()`. You could do the same for the other classes like `LoopIVIncr`. This could save some reinitializations - it's probably minor but makes `LoopStructure::build()` a little easier to read. src/hotspot/share/opto/loopnode.cpp line 1814: > 1812: _iv_incr = PhaseIdealLoop::LoopIVIncr(incr, _head, _loop); > 1813: _iv_incr.build(); > 1814: if (_iv_incr.incr() == nullptr) { Why don't you also check with a `is_valid()` method here? src/hotspot/share/opto/loopnode.cpp line 1843: > 1841: (_truncated_increment.trunc1() != nullptr && _phi->in(LoopNode::LoopBackControl) != _truncated_increment.trunc1())) { > 1842: return false; > 1843: } Suggestion: } src/hotspot/share/opto/loopnode.cpp line 1884: > 1882: } > 1883: > 1884: // Trip-counter increment must be commutative & associative. This comment did not really make sense. I checked its history and it started to be misplaced here: https://github.com/openjdk/jdk/commit/baaa8f79ed93d4dc1444fed81599ab0f7c2dd395#diff-dc3fdd0572cfc2cb65bce10f08db4054dbaf1b3b94f8ad7883f6c120b4773cfeR332-R342 I suggest to move the comment again to the right place in your patch in `LoopIVIncr::build()`. src/hotspot/share/opto/loopnode.cpp line 2028: > 2026: PhaseIterGVN* igvn = &_phase->igvn(); > 2027: > 2028: _structure = LoopStructure(_head, _loop, _phase, _iv_bt); Can you also initialize this directly in the `CountedLoopConverter` constructor? src/hotspot/share/opto/loopnode.cpp line 2039: > 2037: // Check trip counter will end up higher than the limit > 2038: const TypeInteger* limit_t = igvn->type(_structure.exit_test().limit())->is_integer(_iv_bt); > 2039: if (is_infinite_loop(_structure.truncated_increment().trunc1(), limit_t, _structure.iv_incr().incr())) { Here and in the following method calls you basically fetch almost all information from `LoopStructure`. 
Could you also move the methods to `LoopStructure` such that you can directly access the information? You then might not even need to expose the info with accessor methods. I.e.: `_structure.is_infinite_loop(...)` src/hotspot/share/opto/loopnode.cpp line 2098: > 2096: // iv_post_i < adjusted_limit > 2097: // > 2098: // If that is not the case, we need to canonicalize the loop exit check by using different values for adjusted_limit: I suggest to refer to `final_limit_correction()` comments here. Maybe something like: If that is not the case, we need to canonicalize the loop exit check by using different values for adjusted_limit (see LoopStructure::final_limit_correction()). src/hotspot/share/opto/loopnode.cpp line 2116: > 2114: // Note that: > 2115: // (AL) limit <= adjusted_limit. > 2116: // I think this should be left here because we refer to `(AL)` further down. Suggestsion: // Note that after canonicalization: // (AL) limit <= adjusted_limit. src/hotspot/share/opto/loopnode.cpp line 2460: > 2458: } > 2459: > 2460: return iff->in(0)->isa_SafePoint(); `isa_SafePoint()` will also return non-null for subclasses of `SafePoint` like `Call` nodes. But IIUC, we want to have only exact `SafePoint` nodes. Maybe @rwestrel can double-check. Same in your patch here: _sfpt = _loop->_child == nullptr ? _phase->find_safepoint(_back_control, _head, _loop) : _back_control->in(0)->in(0)->isa_SafePoint(); src/hotspot/share/opto/loopnode.cpp line 3063: > 3061: > 3062: //============================================================================= > 3063: //----------------------match_incr_with_optional_truncation-------------------- Suggestion: src/hotspot/share/opto/loopnode.cpp line 3067: > 3065: // CHAR: (i+1)&0x7fff, BYTE: ((i+1)<<8)>>8, or SHORT: ((i+1)<<16)>>16 > 3066: bool CountedLoopNode::TruncatedIncrement::build() { > 3067: _is_valid = false; I was about to suggest to remove that since you already initialize it with `false` in the constructor. However, I like the clarity and therefore would suggest to add this as a first line to all the other `build()` methods. Then it's evidently clear that the fall back is always `_is_valid = false` when calling `build()`. src/hotspot/share/opto/loopnode.cpp line 3070: > 3068: > 3069: // Quick cutouts: > 3070: if (_expr == nullptr || _expr->req() != 3) return false; I suggest to add braces. src/hotspot/share/opto/loopnode.cpp line 3073: > 3071: > 3072: Node *t1 = nullptr; > 3073: Node *t2 = nullptr; Suggestion: Node* t1 = nullptr; Node* t2 = nullptr; src/hotspot/share/opto/loopnode.cpp line 3113: > 3111: _incr = n1; > 3112: _trunc1 = t1; > 3113: _trunc2 = t2; Can we find some better names instead of `trunc1` and `trunc2`? src/hotspot/share/opto/loopnode.hpp line 1347: > 1345: bool canonicalize_mask(jlong stride_con); > 1346: > 1347: bool is_valid_with_bt(BasicType bt) { Can be made `const`. src/hotspot/share/opto/loopnode.hpp line 2073: > 2071: > 2072: bool _insert_stride_overflow_limit_check = false; > 2073: bool _insert_init_trip_limit_check = false; Can you move all fields to the top? This makes it easier to find them. 
------------- PR Review: https://git.openjdk.org/jdk/pull/24458#pullrequestreview-3321712957 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2418778220 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2418907645 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2418774895 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2418772979 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2418767290 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2418766210 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2418992345 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2418889702 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2418981869 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2418913201 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2418974397 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2418754458 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2419178179 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2418922405 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2418929161 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2419155761 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2419003003 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2419002784 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2419006501 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2419006886 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2419010452 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2418783014 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2418751926 From chagedorn at openjdk.org Fri Oct 10 10:28:42 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 10 Oct 2025 10:28:42 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop In-Reply-To: References: Message-ID: <1EgDjfhpch9SuqvjEuZUyB0Y_NzmeBEWmDWRK-C0XEY=.3ebe62c7-abfa-426e-90c8-fafc2750f6a2@github.com> On Fri, 12 Sep 2025 08:57:57 GMT, Roland Westrelin wrote: > In: > > > for (int i = 100; i < 1100; i++) { > v += floatArray[i - 100]; > Objects.checkIndex(i, longRange); > } > > > The int counted loop has both an int range check and a long range. The > int range check is optimized first. Assertion predicates are inserted > above the loop. One predicates checks that: > > > init - 100 > > The loop is then transformed to enable the optimization of the long > range check. The loop is short running, so there's no need to create a > loop nest. The counted loop is mostly left as is but, the loop's > bounds are changed from: > > > for (int i = 100; i < 1100; i++) { > > > to: > > > for (int i = 0; i < 1000; i++) { > > > The reason for that the long range check transformation expects the > loop to start at 0. > > Pre/main/post loops are created. Template Assertion predicates are > added above the main loop. The loop is unrolled. Initialized assertion > predicates are created. The one created from the condition: > > > init - 100 > > checks the value of `i` out of the pre loop which is 1. That check fails. 
> > The root cause of the failure is that when bounds of the counted loop > are changed, template assertion predicates need to be updated with and > adjusted init input. > > When the bounds of the loop are known, the assertion predicates can be > updated in place. Otherwise, when the loop is speculated to be short > running, the assertion predicates are updated when they are cloned. I'll have a look today or on Monday :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27250#issuecomment-3389251825 From mchevalier at openjdk.org Fri Oct 10 12:20:38 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 10 Oct 2025 12:20:38 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal [v2] In-Reply-To: References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> Message-ID: On Fri, 10 Oct 2025 08:44:24 GMT, Roland Westrelin wrote: >> This change refactor code that's similar for LShiftINode and >> LShiftLNode into shared methods. I also added extra test cases to >> cover all transformations. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > sort headers Testing looks good (up to the now fixed inclusion order). I'm happy: it looks good, it's significant sharing, it's no giving some unreadable code that keeps making cases between long and int, nice new tests... An idea (not a suggestion, just something that crossed my mind, take it more as a thought experiment): we could also parametrize everything not with a `BasicType` parameter but a template parameter (since `IdealIL` and co are invoked with literal values). It wouldn't change much, but for instance it would allow to replace the assert in `java_shift_left` and friends with static checks (I have a bias toward static checks). ------------- Marked as reviewed by mchevalier (Committer). PR Review: https://git.openjdk.org/jdk/pull/27725#pullrequestreview-3323043825 From bmaillard at openjdk.org Fri Oct 10 12:44:24 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 10 Oct 2025 12:44:24 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code Message-ID: This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. ### Analysis This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced and added to this PR as a regression test. The test contains a switch inside a loop, and stressing the loop peeling results in a fairly complex graph. The split-if optimization is applied agressively, and we run a verification pass at every progress made. We end up with a relatively high number of verification passes, with each pass being fairly expensive because of the size of the graph. Each verification pass requires building a new `IdealLoopTree`. This is quite slow (which is unfortunately hard to mitigate), and also causes inefficient memory usage on the `ciEnv` arena. The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. 
At every call, we have
- One allocation on the `ciEnv` arena to store the returned `ciField`
- The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which:
  - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`)
  - Pushes the new symbol to the `_symbols` array

The `ciField` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to check if the `BasicType` of a static field is a reference type.

In `ciObjectFactory`, the `_symbols` array ends up containing a large number of duplicates for certain symbols (up to several millions), which hints at the fact that `ciObjectFactory::get_symbol` should not be called repeatedly as it is done here.

The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` call is shown below:

ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412
TypeOopPtr::TypeOopPtr type.cpp:3484
TypeInstPtr::TypeInstPtr type.cpp:3953
TypeInstPtr::make type.cpp:3990
TypeInstPtr::add_offset type.cpp:4509
AddPNode::bottom_type addnode.cpp:696
MemNode::adr_type memnode.cpp:73
PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477
PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439
PhaseIdealLoop::build_loop_late_post_work loopnode.cpp:6827
PhaseIdealLoop::build_loop_late_post loopnode.cpp:6715
PhaseIdealLoop::build_loop_late loopnode.cpp:6660
PhaseIdealLoop::build_and_optimize loopnode.cpp:5093
PhaseIdealLoop::PhaseIdealLoop loopnode.hpp:1209
PhaseIdealLoop::verify loopnode.cpp:5336
...

Because the `ciEnv` arena is not freed up between verification passes, it quickly fills up and hits the memory limit after about 30s of execution in this case.

### Proposed fix

As explained in the previous section, the only point of the `ciInstanceKlass::get_field_by_offset` call is to obtain the `BasicType` of the field. By inspecting carefully what this method does, we notice that the field descriptor `fd` already contains the type information we need. We do not actually need all the information embedded in the `ciField` object.

```c++
ciField* ciInstanceKlass::get_field_by_offset(int field_offset, bool is_static) {
  if (!is_static) {
    for (int i = 0, len = nof_nonstatic_fields(); i < len; i++) {
      ciField* field = _nonstatic_fields->at(i);
      int field_off = field->offset_in_bytes();
      if (field_off == field_offset)
        return field;
    }
    return nullptr;
  }
  VM_ENTRY_MARK;
  InstanceKlass* k = get_instanceKlass();
  fieldDescriptor fd;
  if (!k->find_field_from_offset(field_offset, is_static, &fd)) {
    return nullptr;
  }
  ciField* field = new (CURRENT_THREAD_ENV->arena()) ciField(&fd);
  return field;
}
```

Hence we can simply create a more specialized method, `ciInstanceKlass::get_field_type_by_offset`, that directly returns the `BasicType` without creating the `ciField`. This happens to avoid the three memory allocations mentioned before. After this change, the memory usage of the `ciEnv` arena stays constant across verification passes.

### Testing
- [x] Added test obtained from the fuzzer (and reduced with c-reduce)
- [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8366990)
- [x] tier1-3, plus some internal testing

Thank you for reviewing!
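For illustration, a minimal sketch of what the specialized `get_field_type_by_offset` described above could look like. This is not the exact code from the patch: the `T_ILLEGAL` "not found" convention and the use of `fieldDescriptor::field_type()` for the static case are assumptions made for this sketch, and the real implementation may differ in those details.

```c++
// Sketch only: return just the BasicType of the field at field_offset,
// without materializing a ciField (and thus without touching the ciEnv arena).
BasicType ciInstanceKlass::get_field_type_by_offset(int field_offset, bool is_static) {
  if (!is_static) {
    // Non-static fields are already cached as ciField objects; reuse them.
    for (int i = 0, len = nof_nonstatic_fields(); i < len; i++) {
      ciField* field = _nonstatic_fields->at(i);
      if (field->offset_in_bytes() == field_offset) {
        return field->layout_type();
      }
    }
    return T_ILLEGAL; // assumption: signals "no field at this offset" to the caller
  }
  VM_ENTRY_MARK;
  InstanceKlass* k = get_instanceKlass();
  fieldDescriptor fd;
  if (!k->find_field_from_offset(field_offset, is_static, &fd)) {
    return T_ILLEGAL;
  }
  // The field descriptor already knows the type; no ciField allocation is needed.
  return fd.field_type();
}
```

The caller in `TypeOopPtr::TypeOopPtr` would then test the returned `BasicType` directly (e.g. with `is_reference_type()`) instead of dereferencing a freshly allocated `ciField`.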
------------- Commit messages: - Minor comments and style changes - 8366990: Add reduced test from the fuzzer - 8366990: Avoid growing ciEnv arena in TypeOopPtr::TypeOopPtr Changes: https://git.openjdk.org/jdk/pull/27731/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8366990 Stats: 168 lines in 4 files changed: 161 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/27731.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27731/head:pull/27731 PR: https://git.openjdk.org/jdk/pull/27731 From bmaillard at openjdk.org Fri Oct 10 13:25:51 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 10 Oct 2025 13:25:51 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v2] In-Reply-To: References: Message-ID: > This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. > > ### Analysis > > This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced > and added to this PR as a regression test. > > The test contains a switch inside a loop, and stressing the loop peeling results in > a fairly complex graph. The split-if optimization is applied agressively, and we > run a verification pass at every progress made. > > We end up with a relatively high number of verification passes, with each pass being > fairly expensive because of the size of the graph. > Each verification pass requires building a new `IdealLoopTree`. This is quite slow > (which is unfortunately hard to mitigate), and also causes inefficient memory usage > on the `ciEnv` arena. > > The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. > At every call, we have > - One allocation on the `ciEnv` arena to store the returned `ciField` > - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which: > - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`) > - Pushes the new symbol to the `_symbols` array > > The `ciEnv` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to > check if the `BasicType` of a static field is a reference type. > > In `ciObjectFactory`, the `_symbols` array ends up containg a large number of duplicates for certain symbols > (up to several millions), which hints at the fact that `ciObjectFactory::get_symbol` should not be called > repeatedly as it is done here. > > The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below: > > > ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412 > TypeOopPtr::TypeOopPtr type.cpp:3484 > TypeInstPtr::TypeInstPtr type.cpp:3953 > TypeInstPtr::make type.cpp:3990 > TypeInstPtr::add_offset type.cpp:4509 > AddPNode::bottom_type addnode.cpp:696 > MemNode::adr_type memnode.cpp:73 > PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477 > PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439 > PhaseIdealLoop::build_loop_late_post_work loopnode.cpp:6827 > PhaseIdealLoop::build_loop_late_post loopnode.cpp:67... 
Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: Add -XX:+UnlockDiagnosticVMOptions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27731/files - new: https://git.openjdk.org/jdk/pull/27731/files/61d1d187..7cdcd059 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27731.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27731/head:pull/27731 PR: https://git.openjdk.org/jdk/pull/27731 From epeter at openjdk.org Fri Oct 10 13:49:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 10 Oct 2025 13:49:03 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v25] In-Reply-To: <1nIAYsKgvsIyomk5TKNvSmnLOMDj09MgqGtjxWQKt8k=.f36dbaf7-392b-4440-9050-9714c0242ed1@github.com> References: <1nIAYsKgvsIyomk5TKNvSmnLOMDj09MgqGtjxWQKt8k=.f36dbaf7-392b-4440-9050-9714c0242ed1@github.com> Message-ID: On Wed, 8 Oct 2025 16:31:49 GMT, Kangcheng Xu wrote: >> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. >> >> When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) >> >> The following was implemented to address this issue. >> >> if (UseNewCode2) { >> *multiplier = bt == T_INT >> ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows >> : ((jlong) 1) << con->get_int(); >> } else { >> *multiplier = ((jlong) 1 << con->get_int()); >> } >> >> >> Two new bitshift overflow tests were added. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > Skip and delay processing add nodes with non-canonicalized inputs Code looks good, internal tests pass -> approved! Thanks for the work @tabjy ! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23506#pullrequestreview-3323640477 From kxu at openjdk.org Fri Oct 10 14:07:44 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Fri, 10 Oct 2025 14:07:44 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 12:45:05 GMT, Emanuel Peter wrote: >> @eme64 Thank you for reviewing! Those are very valid suggestion, especially on naming as this PR evolves. I've done the following: >> >> - updated naming (mostly with "serial addition" to "collapsable addition (into multiplication)") >> - updated comments >> - moved test file >> - merged in master >> >> Please enjoy your time off! >> >> Once GHA passes, @rwestrel could you please give this a quick review if you have some time? Thank you very much! > > @tabjy The code now looks good to me. I ran some internal testing, should take about 24h. Many thanks to @eme64 and @rwestrel for reviewing. It was really educational! 
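As a side note on the shift-overflow behavior quoted earlier in this thread (`(1 << 32) = 1` while `(int) (1L << 32) = 0`): this follows directly from Java's shift semantics, where int shifts use only the low 5 bits of the shift count and long shifts use the low 6 bits. A small standalone illustration, not taken from the patch:

```java
public class ShiftMaskDemo {
    public static void main(String[] args) {
        // For int shifts only the low 5 bits of the count are used, so 32 behaves like 0.
        System.out.println(1 << 32);          // prints 1
        // For long shifts the low 6 bits are used, so the bit really moves past bit 31 ...
        System.out.println(1L << 32);         // prints 4294967296
        // ... and narrowing that back to int drops the high bits, giving 0.
        System.out.println((int) (1L << 32)); // prints 0
    }
}
```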
------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-3390319046 From kxu at openjdk.org Fri Oct 10 14:07:47 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Fri, 10 Oct 2025 14:07:47 GMT Subject: Integrated: 8347555: [REDO] C2: implement optimization for series of Add of unique value In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 23:29:51 GMT, Kangcheng Xu wrote: > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. > > When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added. This pull request has now been integrated. Changeset: f6d77cb3 Author: Kangcheng Xu Committer: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/f6d77cb33299ae0636a2b52ee752f27e9ea9191b Stats: 875 lines in 6 files changed: 874 ins; 0 del; 1 mod 8347555: [REDO] C2: implement optimization for series of Add of unique value Reviewed-by: roland, epeter ------------- PR: https://git.openjdk.org/jdk/pull/23506 From chagedorn at openjdk.org Fri Oct 10 14:49:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 10 Oct 2025 14:49:53 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop In-Reply-To: References: Message-ID: <5mWAmKdTlGoPcIMGD1RqXAEhrL9F75m4RcJdoos5_q0=.184b3894-62b3-4ab3-9641-9f72a6c383eb@github.com> On Fri, 12 Sep 2025 08:57:57 GMT, Roland Westrelin wrote: > In: > > > for (int i = 100; i < 1100; i++) { > v += floatArray[i - 100]; > Objects.checkIndex(i, longRange); > } > > > The int counted loop has both an int range check and a long range. The > int range check is optimized first. Assertion predicates are inserted > above the loop. One predicates checks that: > > > init - 100 > > The loop is then transformed to enable the optimization of the long > range check. The loop is short running, so there's no need to create a > loop nest. The counted loop is mostly left as is but, the loop's > bounds are changed from: > > > for (int i = 100; i < 1100; i++) { > > > to: > > > for (int i = 0; i < 1000; i++) { > > > The reason for that the long range check transformation expects the > loop to start at 0. > > Pre/main/post loops are created. Template Assertion predicates are > added above the main loop. The loop is unrolled. Initialized assertion > predicates are created. The one created from the condition: > > > init - 100 > > checks the value of `i` out of the pre loop which is 1. That check fails. > > The root cause of the failure is that when bounds of the counted loop > are changed, template assertion predicates need to be updated with and > adjusted init input. 
> > When the bounds of the loop are known, the assertion predicates can be > updated in place. Otherwise, when the loop is speculated to be short > running, the assertion predicates are updated when they are cloned. Sorry for letting you wait - I had this on my TODO list for quite some time. The fix looks good to me! I only have some small comments. Thanks for the attribution and nice that you were able to extract another test that shows the root problem directly! src/hotspot/share/opto/loopnode.cpp line 1165: > 1163: ClonePredicateToTargetLoop _clone_predicate_to_loop; > 1164: PhaseIdealLoop* const _phase; > 1165: Node* _new_init; Can be made `const`: Suggestion: Node* const _new_init; src/hotspot/share/opto/loopnode.cpp line 1196: > 1194: // for (int = 0; i < stop - start; i+= stride) { > 1195: // Assertion Predicate added so far were with an init value of start. They need to be updated with the new init value of > 1196: // 0. Some small suggestions here. I suggest to also add some visualization to quickly see what we do here. Suggestion: // For an int counted loop, try_make_short_running_loop() transforms the loop from: // for (int = start; i < stop; i+= stride) { ... } // to // for (int = 0; i < stop - start; i+= stride) { ... } // Template Assertion Predicates added so far were with an init value of start. They need to be updated with the new // init value of 0: // zero // init | // | ===> OpaqueLoopInit init // OpaqueLoopInit \ / // AddI // src/hotspot/share/opto/node.hpp line 2168: > 2166: > 2167: // Defines an action that should be taken when we visit a target node in the BFS traversal. > 2168: virtual void target_node_action(Node* node, uint i) = 0; Maybe you can name `node` `child` and add the following comment to better describe it: Suggestion: // Defines an action that should be taken when we visit a target node in the BFS traversal. // To give more freedom, we pass the direct child node to the target node such that // child->in(i) == target node. This allows to also directly replace the target node instead // of only updating its inputs. virtual void target_node_action(Node* child, uint i) = 0; src/hotspot/share/opto/predicates.cpp line 206: > 204: } > 205: > 206: // Clone this Template Assertion Predicate and use the expression passed as argument as init. Suggestion: // Clone this Template Assertion Predicate and replace the old OpaqueLoopInit node with 'new_init'. // Note: 'new_init' could also have the 'OpaqueLoopInit` as parent node further up. src/hotspot/share/opto/predicates.cpp line 249: > 247: } > 248: > 249: void target_node_action(Node* node, uint i) override { I also suggest to add an assert here just to be sure: assert(node->in(i)->is_OpaqueLoopStride(), "must be OpaqueLoopStride"); src/hotspot/share/opto/predicates.cpp line 279: > 277: } > 278: > 279: void target_node_action(Node* node, uint i) override { Should hold but might still not hurt to add the following assertion: assert(node->in(i)->is_OpaqueLoopInit(), "must be old OpaqueLoopInit); src/hotspot/share/opto/predicates.cpp line 1144: > 1142: > 1143: // Clones the provided Template Assertion Predicate to the head of the current predicate chain at the target loop and > 1144: // replace current OpaqueLoopInit with the expression passed as argument. Suggestion: // replaces the current OpaqueLoopInit with 'new_init'. // Note: 'new_init' could also have the 'OpaqueLoopInit` as parent node further up. 
test/hotspot/jtreg/compiler/longcountedloops/TestShortCountedLoopWithLongRCBadAssertPredicate2.java line 1: > 1: /* Could the two tests also be merged? ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27250#pullrequestreview-3323632416 PR Review Comment: https://git.openjdk.org/jdk/pull/27250#discussion_r2420490429 PR Review Comment: https://git.openjdk.org/jdk/pull/27250#discussion_r2420281093 PR Review Comment: https://git.openjdk.org/jdk/pull/27250#discussion_r2420413917 PR Review Comment: https://git.openjdk.org/jdk/pull/27250#discussion_r2420584197 PR Review Comment: https://git.openjdk.org/jdk/pull/27250#discussion_r2420591008 PR Review Comment: https://git.openjdk.org/jdk/pull/27250#discussion_r2420424787 PR Review Comment: https://git.openjdk.org/jdk/pull/27250#discussion_r2420496983 PR Review Comment: https://git.openjdk.org/jdk/pull/27250#discussion_r2420602420 From chagedorn at openjdk.org Fri Oct 10 15:09:54 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 10 Oct 2025 15:09:54 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v2] In-Reply-To: References: Message-ID: On Fri, 10 Oct 2025 13:25:51 GMT, Beno?t Maillard wrote: >> This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. >> >> ### Analysis >> >> This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced >> and added to this PR as a regression test. >> >> The test contains a switch inside a loop, and stressing the loop peeling results in >> a fairly complex graph. The split-if optimization is applied agressively, and we >> run a verification pass at every progress made. >> >> We end up with a relatively high number of verification passes, with each pass being >> fairly expensive because of the size of the graph. >> Each verification pass requires building a new `IdealLoopTree`. This is quite slow >> (which is unfortunately hard to mitigate), and also causes inefficient memory usage >> on the `ciEnv` arena. >> >> The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. >> At every call, we have >> - One allocation on the `ciEnv` arena to store the returned `ciField` >> - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which: >> - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`) >> - Pushes the new symbol to the `_symbols` array >> >> The `ciEnv` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to >> check if the `BasicType` of a static field is a reference type. >> >> In `ciObjectFactory`, the `_symbols` array ends up containg a large number of duplicates for certain symbols >> (up to several millions), which hints at the fact that `ciObjectFactory::get_symbol` should not be called >> repeatedly as it is done here. 
>> >> The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below: >> >> >> ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412 >> TypeOopPtr::TypeOopPtr type.cpp:3484 >> TypeInstPtr::TypeInstPtr type.cpp:3953 >> TypeInstPtr::make type.cpp:3990 >> TypeInstPtr::add_offset type.cpp:4509 >> AddPNode::bottom_type addnode.cpp:696 >> MemNode::adr_type memnode.cpp:73 >> PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477 >> PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439 >> PhaseIdealLoop::build_lo... > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Add -XX:+UnlockDiagnosticVMOptions Nice summary and solution! I have a few comments but otherwise, the fix looks good to me. I guess it's a discussion for another time if we also want to improve the verification time somehow. But that should not block this PR. src/hotspot/share/ci/ciInstanceKlass.cpp line 434: > 432: // > 433: // This is essentially a shortcut for: > 434: // get_field_type_by_offset(field_offset, is_static)->layout_type() `get_field_by_offset()`? Suggestion: // get_field_by_offset(field_offset, is_static)->layout_type() src/hotspot/share/ci/ciInstanceKlass.cpp line 436: > 434: // get_field_type_by_offset(field_offset, is_static)->layout_type() > 435: // except this does not require allocating memory for a new ciField > 436: BasicType ciInstanceKlass::get_field_type_by_offset(int field_offset, bool is_static) { Nit: Suggestion: BasicType ciInstanceKlass::get_field_type_by_offset(const int field_offset, const bool is_static) { src/hotspot/share/ci/ciInstanceKlass.cpp line 443: > 441: if (field_off == field_offset) > 442: return field->layout_type(); > 443: } Could this code be shared with `get_field_by_offset()`? We could put it into a method and return the field. Not sure if it is also worthwhile for the field descriptor lookup below to have a "get field descriptor" method to further share code. You would need to check. Anyway, I'm fine with both :-) test/hotspot/jtreg/compiler/loopopts/TestVerifyLoopOptimizationsHitsMemLimit.java line 1: > 1: package compiler.loopopts; For consistency, I suggest moving it after the copyright test/hotspot/jtreg/compiler/loopopts/TestVerifyLoopOptimizationsHitsMemLimit.java line 39: > 37: * -XX:-TieredCompilation -Xcomp -XX:CompileCommand=dontinline,*::* > 38: * -XX:+StressLoopPeeling -XX:PerMethodTrapLimit=0 -XX:+VerifyLoopOptimizations > 39: * -XX:StressSeed=1870557292 I suggest removing the stress seed since it might not trigger anymore in later builds. Usually, we add a run with a fixed stress seed and one without, but since this test just needs to do some verification work, I would suggest adding only one run, without a fixed seed. Another question: How close are we to hitting the default memory limit with this test? With your fix it probably does not consume much memory anymore. I therefore suggest adding `MemLimit` as an additional flag with a much smaller value to be sure that your fix works as expected (you might need to check how low we can choose the limit to not run into problems in higher tiers).
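For illustration only, a rough sketch of how such a run could end up looking (this is an assumption about the final shape of the test, not the actual patch; the MemLimit compile command, its pattern and the 100m value would all need to be checked against the C2 memory-limit support):

    /*
     * @test
     * @bug 8366990
     * @summary Stress loop peeling and verify loop opts without hitting the C2 memory limit.
     * @run main/othervm -XX:+UnlockDiagnosticVMOptions -Xcomp -XX:-TieredCompilation
     *                   -XX:CompileCommand=dontinline,*::*
     *                   -XX:+StressLoopPeeling -XX:PerMethodTrapLimit=0 -XX:+VerifyLoopOptimizations
     *                   -XX:CompileCommand=MemLimit,*::*,100m
     *                   compiler.loopopts.TestVerifyLoopOptimizationsHitsMemLimit
     */
    public class TestVerifyLoopOptimizationsHitsMemLimit {
        public static void main(String[] args) {
            // reduced fuzzer test body as in the PR (omitted here)
        }
    }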
------------- PR Review: https://git.openjdk.org/jdk/pull/27731#pullrequestreview-3324083968 PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2420663368 PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2420700310 PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2420698361 PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2420701817 PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2420721807 From mdoerr at openjdk.org Fri Oct 10 16:34:26 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 10 Oct 2025 16:34:26 GMT Subject: RFR: 8369511: PPC64: compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without VectorRegisters Message-ID: <8Jkjo4Y7ZGPV-YOJPw4jdfUTUkyQUVrZLwOverU4ElE=.8cca0973-57a6-4ba2-8e44-87fb43e00150@github.com> Disabling the test for Power8 (see JBS issue). ------------- Commit messages: - 8369511: PPC64: compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without VectorRegisters Changes: https://git.openjdk.org/jdk/pull/27749/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27749&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369511 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27749.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27749/head:pull/27749 PR: https://git.openjdk.org/jdk/pull/27749 From epeter at openjdk.org Fri Oct 10 19:08:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 10 Oct 2025 19:08:25 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Fri, 10 Oct 2025 14:01:31 GMT, Kangcheng Xu wrote: >> @tabjy The code now looks good to me. I ran some internal testing, should take about 24h. > > Many thanks to @eme64 and @rwestrel for reviewing. It was really educational! @tabjy Congrats on the integration! This is not an official rule, but we consider it good practice not to integrate of Friday, and even less on Friday afternoon. If something breaks you probably would not be around on the weekend to fix it, and so it would mean we have more failing tests in the CI, and waste resources and have a polluted CI pipeline. It could then happen that someone just backs out your change, instead of giving you a few hours to fix a P1 bug. Again, not an official rule, just something we like to do ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-3391908282 From kxu at openjdk.org Fri Oct 10 19:24:20 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Fri, 10 Oct 2025 19:24:20 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Fri, 10 Oct 2025 19:05:26 GMT, Emanuel Peter wrote: >> Many thanks to @eme64 and @rwestrel for reviewing. It was really educational! > > @tabjy Congrats on the integration! > > This is not an official rule, but we consider it good practice not to integrate of Friday, and even less on Friday afternoon. If something breaks you probably would not be around on the weekend to fix it, and so it would mean we have more failing tests in the CI, and waste resources and have a polluted CI pipeline. It could then happen that someone just backs out your change, instead of giving you a few hours to fix a P1 bug. Again, not an official rule, just something we like to do ;) @eme64 Thank you for letting me know! Sorry I wasn't aware of this practice. 
Thanks for explaining. It makes a lot of sense and will potentially save me trouble, too. I'll avoid integrating before weekends and holidays in the future. Again sorry about that. Hopefully nothing breaks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-3391982918 From syan at openjdk.org Sat Oct 11 06:01:24 2025 From: syan at openjdk.org (SendaoYan) Date: Sat, 11 Oct 2025 06:01:24 GMT Subject: Integrated: 8369490: Remove unused Runinfo parameters in compiler/c2/gvn/TestBitCompressValueTransform.java In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 09:47:30 GMT, SendaoYan wrote: > Hi all, > > The 'Runinfo info' parameters in compiler/c2/gvn/TestBitCompressValueTransform.java is unused, maybe we can remove the unused parameters. > > Change has been verified locally on linux-x64, test-fix only, trivial fix, no risk. This pull request has now been integrated. Changeset: 2dfe4586 Author: SendaoYan URL: https://git.openjdk.org/jdk/commit/2dfe4586f7a29d9e3a944e6483d5d4cbbdde3be8 Stats: 19 lines in 1 file changed: 0 ins; 0 del; 19 mod 8369490: Remove unused Runinfo parameters in compiler/c2/gvn/TestBitCompressValueTransform.java Reviewed-by: chagedorn, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/27720 From syan at openjdk.org Sat Oct 11 06:01:23 2025 From: syan at openjdk.org (SendaoYan) Date: Sat, 11 Oct 2025 06:01:23 GMT Subject: RFR: 8369490: Remove unused Runinfo parameters in compiler/c2/gvn/TestBitCompressValueTransform.java In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 10:38:06 GMT, Manuel Hässig wrote: >> Hi all, >> >> The 'Runinfo info' parameters in compiler/c2/gvn/TestBitCompressValueTransform.java is unused, maybe we can remove the unused parameters. >> >> Change has been verified locally on linux-x64, test-fix only, trivial fix, no risk. > > That looks good and trivial. Thanks for fixing this. Thanks for the reviews @mhaessig @chhagedorn ------------- PR Comment: https://git.openjdk.org/jdk/pull/27720#issuecomment-3392956637 From syan at openjdk.org Sat Oct 11 06:10:05 2025 From: syan at openjdk.org (SendaoYan) Date: Sat, 11 Oct 2025 06:10:05 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v2] In-Reply-To: References: Message-ID: On Fri, 10 Oct 2025 13:25:51 GMT, Benoît Maillard wrote: >> This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. >> >> ### Analysis >> >> This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced >> and added to this PR as a regression test. >> >> The test contains a switch inside a loop, and stressing the loop peeling results in >> a fairly complex graph. The split-if optimization is applied aggressively, and we >> run a verification pass at every progress made. >> >> We end up with a relatively high number of verification passes, with each pass being >> fairly expensive because of the size of the graph. >> Each verification pass requires building a new `IdealLoopTree`. This is quite slow >> (which is unfortunately hard to mitigate), and also causes inefficient memory usage >> on the `ciEnv` arena.
>> >> The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. >> At every call, we have >> - One allocation on the `ciEnv` arena to store the returned `ciField` >> - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which: >> - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`) >> - Pushes the new symbol to the `_symbols` array >> >> The `ciEnv` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to >> check if the `BasicType` of a static field is a reference type. >> >> In `ciObjectFactory`, the `_symbols` array ends up containing a large number of duplicates for certain symbols >> (up to several millions), which hints at the fact that `ciObjectFactory::get_symbol` should not be called >> repeatedly as it is done here. >> >> The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below: >> >> >> ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412 >> TypeOopPtr::TypeOopPtr type.cpp:3484 >> TypeInstPtr::TypeInstPtr type.cpp:3953 >> TypeInstPtr::make type.cpp:3990 >> TypeInstPtr::add_offset type.cpp:4509 >> AddPNode::bottom_type addnode.cpp:696 >> MemNode::adr_type memnode.cpp:73 >> PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477 >> PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439 >> PhaseIdealLoop::build_lo... > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Add -XX:+UnlockDiagnosticVMOptions test/hotspot/jtreg/compiler/loopopts/TestVerifyLoopOptimizationsHitsMemLimit.java line 123: > 121: public static void main(String[] t) { > 122: try { > 123: test(t); Suggestion: test(t); throw new RuntimeException("Expected NPE was not thrown"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2422524266 From jbhateja at openjdk.org Mon Oct 13 06:05:12 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 13 Oct 2025 06:05:12 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v12] In-Reply-To: <3hRTOJiGlZXrFqy7m3loXdorkRmyL3zb7hwyrwi8b6w=.0c159d31-bab3-4ed1-94a5-23b33bad457d@github.com> References: <3hRTOJiGlZXrFqy7m3loXdorkRmyL3zb7hwyrwi8b6w=.0c159d31-bab3-4ed1-94a5-23b33bad457d@github.com> Message-ID: On Mon, 6 Oct 2025 07:50:41 GMT, Tobias Hartmann wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution > > Testing all passed. I'll pass the review to someone else. Hi @TobiHartmann, Looking forward to your approval here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3395974482 From rrich at openjdk.org Mon Oct 13 06:12:04 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 13 Oct 2025 06:12:04 GMT Subject: RFR: 8369511: PPC64: compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without VectorRegisters In-Reply-To: <8Jkjo4Y7ZGPV-YOJPw4jdfUTUkyQUVrZLwOverU4ElE=.8cca0973-57a6-4ba2-8e44-87fb43e00150@github.com> References: <8Jkjo4Y7ZGPV-YOJPw4jdfUTUkyQUVrZLwOverU4ElE=.8cca0973-57a6-4ba2-8e44-87fb43e00150@github.com> Message-ID: On Fri, 10 Oct 2025 16:27:30 GMT, Martin Doerr wrote: > Disabling the test for Power8 (see JBS issue). Looks good. Thanks for fixing. Richard. ------------- Marked as reviewed by rrich (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/27749#pullrequestreview-3330086385 From dfenacci at openjdk.org Mon Oct 13 07:07:10 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 13 Oct 2025 07:07:10 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v2] In-Reply-To: References: Message-ID: On Fri, 10 Oct 2025 13:25:51 GMT, Beno?t Maillard wrote: >> This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. >> >> ### Analysis >> >> This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced >> and added to this PR as a regression test. >> >> The test contains a switch inside a loop, and stressing the loop peeling results in >> a fairly complex graph. The split-if optimization is applied agressively, and we >> run a verification pass at every progress made. >> >> We end up with a relatively high number of verification passes, with each pass being >> fairly expensive because of the size of the graph. >> Each verification pass requires building a new `IdealLoopTree`. This is quite slow >> (which is unfortunately hard to mitigate), and also causes inefficient memory usage >> on the `ciEnv` arena. >> >> The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. >> At every call, we have >> - One allocation on the `ciEnv` arena to store the returned `ciField` >> - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which: >> - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`) >> - Pushes the new symbol to the `_symbols` array >> >> The `ciEnv` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to >> check if the `BasicType` of a static field is a reference type. >> >> In `ciObjectFactory`, the `_symbols` array ends up containg a large number of duplicates for certain symbols >> (up to several millions), which hints at the fact that `ciObjectFactory::get_symbol` should not be called >> repeatedly as it is done here. >> >> The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below: >> >> >> ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412 >> TypeOopPtr::TypeOopPtr type.cpp:3484 >> TypeInstPtr::TypeInstPtr type.cpp:3953 >> TypeInstPtr::make type.cpp:3990 >> TypeInstPtr::add_offset type.cpp:4509 >> AddPNode::bottom_type addnode.cpp:696 >> MemNode::adr_type memnode.cpp:73 >> PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477 >> PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439 >> PhaseIdealLoop::build_lo... 
> > Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Add -XX:+UnlockDiagnosticVMOptions src/hotspot/share/ci/ciInstanceKlass.hpp line 207: > 205: ciField* get_field_by_offset(int field_offset, bool is_static); > 206: ciField* get_field_by_name(ciSymbol* name, ciSymbol* signature, bool is_static); > 207: BasicType get_field_type_by_offset(int field_offset, bool is_static); Following @chhagedorn's suggestion: Suggestion: BasicType get_field_type_by_offset(const int field_offset, const bool is_static); src/hotspot/share/opto/type.cpp line 3494: > 3492: } else { > 3493: // Instance fields which contains a compressed oop references. > 3494: BasicType basic_elem_type = ik->get_field_type_by_offset(_offset, false);; Suggestion: BasicType basic_elem_type = ik->get_field_type_by_offset(_offset, false); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2425357342 PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2425372940 From epeter at openjdk.org Mon Oct 13 07:21:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 07:21:16 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries Message-ID: I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. **Major issue with Template Framework: lambda vs token order** The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. var testTemplate = Template.make(() -> body( ... addDataName("name", someType, MUTABLE), let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), ... )); **Two possible solutions: all-in on lambda execution or all-in on tokens** First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. 
This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the following example: var testTemplate = Template.make(() -> body( ... template1.call(), "some code right here", template2.call(), ... )); One way would have been that calling `template1` and `template2` directly inserts code. But in `testTemplate` we would execute `template2` before adding the `some code right here`, so that does not work. So maybe calling `template2` should only return some state that captures what happened inside `template2`, and that captured state is only applied once we collect all the tokens of the `testTemplate` `body`. But what if the user somehow calls `template2` but never adds the captured state to tokens? As long as the captured state is pure, i.e. has truly no side-effect, that would be fine. But what if it needs to insert code into some outer scope? That would become very messy quickly. An alternative would have been to abandon the token list completely, and do something similar to `StringBuilder`, where the state is updated explicitly, and code would be added explicitly. But that does not look very nice either. Christian had asked me from the beginning if I should not make everything into tokens. I was hesitant because I thought we could not do something like sampling without returning immediately. But eventually I realized we can just create a token that samples and then calls a lambda with the result: var testTemplate = Template.make(() -> body( ... addDataName("name", someType, MUTABLE), dataNames(MUTABLE).exactOf(someType).sample((DataName dn) -> scope( ... code that can use the DataName dn ... ))), ... )); **Minor issue: Hook.insert did not work without nested Template, and was implicitly transparent for Names** Having to use a separate template for the code to be inserted is sometimes a bit cumbersome, and separates the code too far. And that insertion means the inserted template is implicitly transparent for names is also not great: if the template is used in insertion, its scope is transparent, but if it is used in regular template nesting it is non-transparent. That is not a great design: the template would have different semantics based on the context. Now we can directly do `hook.insert(scope(...))`. And we have to explicitly allow transparency by either doing: - `hook.insert(scope(...))`: non-transparent. - `hook.insert(transparentScope(..))`: transparent for names. When inserting templates, the scope of the template has to be specified to be transparent or non-transparent. This allows us to be very precise about when names escape into the anchor scope and when they stay local. And if a template with a transparent scope is used in regular template nesting, its scope is transparent as well. Hence, the behavior of a template is now more consistent, and does not depend on the context (insertion vs regular nesting). **Summary of Changes** - `Token`s instead of "immediate return functions": - Names: `sample`, `count`, `hasAny`, `toList`: this means we don't have these queries "float" above a `addDataName` or `addStructuralName`, which was very very confusing, and lead to misleading results and confusing bugs (e.g. no names found when sampling). - Hook: `isAnchored`. 
Prevents the `isAnchored` query from floating above the `hook.anchor`, which could lead to misleading results. - `let`: allows us to keep hashtag replacements local to nested scopes in a template. This is especially helpful when streaming over lists, where we want to have a `let` for each item. - Generalize `TemplateToken` to `ScopeToken`, and `body` to `scope` (and its friends). This allows us to use scopes systematically in templates, limiting `Name`s, hashtags and `setFuelCost`. This required quite a bit of refactoring in the `Renderer`. - Adjusted and improved the `TestTutorial.java`, as well as the `TestTemplate.java` (2.8k of the changes, i.e. the majority). **Notes for Reviewers** Make sure to look at these first: - Changes in `TestTutorial.java`. - `generateWithHashtagAndDollarReplacements3` shows generalization of scopes. - `generateWithCustomHooks` shows that we can `Hook.insert` scopes and templates. - Replacing `generateWithDataNamesAndScopes1/2` with `generateWithScopes1`: bad "old" way (floating issues) replaced with new token-based and scope-based queries. - Changes in `TestTemplate.java` (2k+ lines!) - Many tests are simply adapted (renaming only). But some are extended, modified or completely new additions. - Make sure to look first at `testLet2`, `testDataNames0a...d`, `testDataNames6`, `testStructuralNames3...6`, `testNestedScopes1/2`, `testHookAndScopes1...3`. You probably don't need to look at everything in absolute detail, just make sure you roughly know what's going on at first. - Once you understand the new semantics of the scopes and queries, look at the `template_framework` changes. - Look at the new cases in `Token.java` - Look at the changes in `CodeFrame` and `TemplateFrame`: they implement the `Name`, hashtag and `setFuelCost` (non)transparency, which is fundamental to the scopes. - Look at the `ScopeTokenImpl`, and how it is used in `Renderer.java`. I'm very sorry that this is a huge change, and that I did not get this right the first time. I'm realizing how difficult it is to develop API's ? I left some comments in the code changes, so hopefully that helps in the review. Feel free to ask me for a code-walk-through, or if you have any questions ? ------------- Commit messages: - fix test - NestingToken -> ScopeToken - flat -> transparentScope - update other test - clean up tutorial - tutorial scope and DataNames - wip tutorial - extend tutorial - more tutorial improvements - tutorial scope with insert - ... and 86 more: https://git.openjdk.org/jdk/compare/2826d170...aceced65 Changes: https://git.openjdk.org/jdk/pull/27255/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8367531 Stats: 3973 lines in 35 files changed: 2918 ins; 333 del; 722 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Mon Oct 13 07:21:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 07:21:28 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries In-Reply-To: References: Message-ID: On Fri, 12 Sep 2025 10:16:10 GMT, Emanuel Peter wrote: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. 
> > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Calling on the reviewers of the original Template Framework for review ;) (of course there is no obligation, but you are most familiar with the code) @mhaessig @robcasloz @chhagedorn @galderz As mentioned above: I'm sorry that I did not get this right the first time. It is a lot of code, so feel free to ask for anything, including code-walk-throughs. test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 126: > 124: return fs.toString(); > 125: } > 126: } Having the `Predicate` interface allows unification of queries across `DataName` and `StructuralName`. test/hotspot/jtreg/compiler/lib/template_framework/Hook.java line 70: > 68: * public static int $field = 42; > 69: * """ > 70: * )), I decided to refactor the example using a scope insertion rather than template insertion. I think it is a bit more concise this way. test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 193: > 191: return currentCodeFrame.listNames(predicate); > 192: } > 193: We used to need these to access the queries from name filter set. But now we do it all from the Renderer anyway, so these methods are obsolete. 
test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 266: > 264: case NothingToken() -> { > 265: // Nothing. > 266: } Was needed for queries that returned immediately, but we wanted that users could place them in a token list, so everything looks nice. But now, all the queries are tokens that do something, so this is obsolete. test/hotspot/jtreg/compiler/lib/template_framework/ScopeToken.java line 32: > 30: * created with {@link Template#scope}, {@link Template#transparentScope}, and other related methods. > 31: */ > 32: public sealed interface ScopeToken extends Token permits ScopeTokenImpl {} Note: `ScopeToken` is a generalization of the old `TemplateBody`. Also: we need this to be public, because it needs to go into public interfaces like `Template.make`. But we don't want to expose the internals (e.g. we used to have tokens exposed, not great). So I decided to hide away the details in a `ScopeTokenImpl` class now. test/hotspot/jtreg/compiler/lib/template_framework/SetFuelCostToken.java line 29: > 27: * Represents the setting of the fuel cost in the current scope. > 28: */ > 29: record SetFuelCostToken(float fuelCost) implements Token {} Github presents this as a renaming, but it is really a removal of `NothingToken` (obsolete), and an addition of the new `SetFuelCostToken`. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 182: > 180: * {@link Hook#insert}ed to where a {@link Hook} was {@link Hook#anchor}ed earlier (in some outer scope of the code). > 181: * For example, while generating code in a method, one can reach out to the scope of the class, and insert a > 182: * new field, or define a utility method. Note: removed, replaced by new descriptions about scopes and hook usage. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 195: > 193: * template can be changed when we {@code render()} it (e.g. {@link ZeroArgs#render(float)}) and the default > 194: * fuel cost with {@link #setFuelCost}) when defining the {@link #body(Object...)}. Recursive templates are > 195: * supposed to terminate once the {@link #fuel} is depleted (i.e. reaches zero). Note: moved down test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 225: > 223: * of the lambda. But a method like {@link #addDataName} returns a token, and does not immediately add > 224: * the {@link DataName}. This ensures that the {@link DataName} is only inserted when the tokens are > 225: * evaluated, so that it is inserted at the exact scope where we would expect it. Note: this was really the confusing part about the Template Framework, and this PR addresses the problem by making everything tokens. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 267: > 265: * // one by one. > 266: * )); > 267: * } Example is now obsolete, and not correct any more. It used to demonstrate the confusing behavior of immediate-return queries mixed with delayed `addDataName`, which could mean that `count` would not see `addDataName` that happened above it in the same template. That is now fixed with this PR. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 846: > 844: */ > 845: static Token let(String key, T value, Function function) { > 846: return new LetToken(key, value, function); Note: this special version of `let` was constrained to the top of a Template, and would only forward the `body`. 
It was a bit strange, and I think now it is nicer: you can use it at any point in the template and do a computation, and have the hashtag and lambda argument available for the inner scope. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 79: > 77: comp.addJavaSourceCode("p.xyz.InnerTest7", generateWithDataNamesSimple()); > 78: comp.addJavaSourceCode("p.xyz.InnerTest8", generateWithDataNamesForFieldsAndVariables()); > 79: comp.addJavaSourceCode("p.xyz.InnerTest9", generateWithScopes1()); These used to demonstrate the old "bad" behaviour: queries could float above `addDataName`. Now I replaced it with new tutorial "chapters". test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 423: > 421: int #name = #value; > 422: """ > 423: )); I think it is preferable to use scope insertion for the tutorial. It makes the example a little more readable. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 594: > 592: #f += 42; > 593: """ > 594: )); Note: I inlined the code directly. Having a separate template was not great. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 699: > 697: let("longs", dataNames(MUTABLE).exactOf(myLong).count()), > 698: // Note: we could also count the MUTABLE_OR_IMMUTABLE, we will > 699: // cover the concept of mutability in an example further down. Note: query used to return the integer count immediately. That meant that `count` could float above a `addDataName`. Now we do it with a token instead, and so it looks a little less convenient, since we have to capture the value into a lambda argument. See new code. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 812: > 810: // Let us have a closer look at how DataNames interact with scopes created by > 811: // Templates and Hooks. Additionally, we see how the execution order of the > 812: // lambdas and token evaluation affects the availability of DataNames. Note: `generateWithDataNamesAndScopes1/2` demonstrate the bad "old" way. Now I replaced it with `generateWithScopes1`, that explains the scopes and queries that relate to scopes. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 833: > 831: // so that the DataName can escape to outer scopes. > 832: var templateStaticField = Template.make("type", (DataName.Type type) -> transparentScope( > 833: addDataName($("field"), type, MUTABLE), // escapes template because of "transparentScope" Note: `Hook.insert` used to have an implicitly transparent scope ... but I did not like that. I think it is preferable to specify transparency explicitly, and so that is what we do now. test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 483: > 481: hook1.insert(template1.asToken()), > 482: hook1.insert(scope("Beautiful\n", template0.asToken())), > 483: "t2 isAnchored: ", hook1.isAnchored(a -> scope(a)), "\n" Note: the `isAnchored` query here would have floated above the `anchor` before this PR, and given us misleading results! 
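A small sketch of that behavior (illustration only; it assumes the usual `hook1.anchor(...)` wrapping of the anchored code, as used elsewhere in the framework tests):

    var t = Template.make(() -> body(
        // Token-based query: evaluated in order, so this still reports "false".
        "before: ", hook1.isAnchored(a -> scope(a)), "\n",
        hook1.anchor(
            // Evaluated inside the anchored scope, so this reports "true".
            "inside: ", hook1.isAnchored(a -> scope(a)), "\n"
        )
    ));

With the old immediate-return query, both calls would have been answered while the lambda was still executing, i.e. before the anchor token had taken effect.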
------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3396166822 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425289812 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425298309 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425303196 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425307159 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425313590 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425315237 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425319022 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425318168 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425320302 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425323288 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425327712 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425333598 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425341015 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425342161 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425350222 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425352394 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425345031 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2425359071 From mchevalier at openjdk.org Mon Oct 13 07:27:04 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 13 Oct 2025 07:27:04 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v2] In-Reply-To: References: Message-ID: <-gzGJUH9BrCbFpJNL-86OKEUaR34Ffmfb0FEpN_A99E=.b5b7ffdf-ef98-414f-963d-1cf8186f55bd@github.com> On Mon, 13 Oct 2025 06:52:42 GMT, Damon Fenacci wrote: >> Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Add -XX:+UnlockDiagnosticVMOptions > > src/hotspot/share/ci/ciInstanceKlass.hpp line 207: > >> 205: ciField* get_field_by_offset(int field_offset, bool is_static); >> 206: ciField* get_field_by_name(ciSymbol* name, ciSymbol* signature, bool is_static); >> 207: BasicType get_field_type_by_offset(int field_offset, bool is_static); > > Following @chhagedorn's suggestion: > Suggestion: > > BasicType get_field_type_by_offset(const int field_offset, const bool is_static); I don't think that is a good suggestion. `const` in this context is useless (and CLion will gray it and suggest to remove it). It doesn't need to appear here to make @chhagedorn's suggestion work, and it just makes the signature longer. Making the parameter const is merely an implementation detail, it doesn't need to be part of the signature or the contract since it won't have any effect on the caller, it doesn't document anything... It doesn't even prevent to implement the function without the `const`. That being said, I'm all for sprinkling `const` everywhere it makes a difference. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2425412323 From epeter at openjdk.org Mon Oct 13 07:49:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 07:49:13 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v13] In-Reply-To: <7Hvs60B_m8bmMzOMyrBZ_CbNJrQmHPMFKRAEU7F-Tu4=.94863cda-8af6-4e50-8563-2144947074e4@github.com> References: <7Hvs60B_m8bmMzOMyrBZ_CbNJrQmHPMFKRAEU7F-Tu4=.94863cda-8af6-4e50-8563-2144947074e4@github.com> Message-ID: <_cUB-8iXrBPLWKg93b7tdMAbWxbnLtLLkiL6qBJls6E=.52dde175-9c1d-4d17-85f5-39a90d165ed8@github.com> On Thu, 9 Oct 2025 06:27:52 GMT, Jatin Bhateja wrote: >> This patch optimizes PopCount value transforms using KnownBits information. >> Following are the results of the micro-benchmark included with the patch >> >> >> >> System: 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 215460.670 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 294014.826 ops/s >> >> Withopt: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 389978.082 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 417261.583 ops/s >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions I'm back from vacation. Code looks good to me now. But running some internal testing before approval ;) ------------- PR Review: https://git.openjdk.org/jdk/pull/27075#pullrequestreview-3330329228 From chagedorn at openjdk.org Mon Oct 13 08:15:23 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Oct 2025 08:15:23 GMT Subject: RFR: 8359412: Template-Framework Library: Operations and Expressions [v6] In-Reply-To: References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> Message-ID: On Thu, 18 Sep 2025 08:12:50 GMT, Emanuel Peter wrote: >> Impliementing ideas from original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)). >> >> Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`. >> >> Details, in **order you should review**: >> - `Operations.java`: maps lots of primitive operators as Expressions. >> - `Expression.java`: the fundamental engine behind Expressions. >> - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants. >> - `tests/TestExpression.java`: correctness test of Expression machinery. >> - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification. >> - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. 
I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions. >> - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example, that tests that has a weak test for randomness: we should have at least 2 different value in 1000 calls. >> >> If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples. >> >> **Future Work**: >> - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization. >> - Use `Expression`s to model more operations: >> - `Vector API`, more arithmetic operations like from `Math` classes etc. >> - Ensure that the constraints / checksum mechanic in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints. >> - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator call `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just fol... > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - more comments > - add othervm to test Nice work! I left some comments here and there when walking through the code but I did not deep dive into it much since it was already properly reviewed by 2 reviewers :-) test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java line 44: > 42: * > 43: *

> 44: * The {@link Expression}s are composable, they can be explicitly {@link nest}ed, or randomly Suggestion: * The {@link Expression}s are composable, they can be explicitly {@link #nest}ed, or randomly test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java line 349: > 347: " expected: " + argumentTypes.size() + > 348: " but got: " + arguments.size() + > 349: " for " + this.toString()); `toString()` is implicitly called: Suggestion: " for " + this); test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java line 401: > 399: List filtered = expressions.stream().filter(e -> e.returnType.isSubtypeOf(returnType)).toList(); > 400: > 401: if (filtered.size() == 0) { Suggestion: if (filtered.isEmpty()) { test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java line 426: > 424: List filtered = nestingExpressions.stream().filter(e -> e.returnType.isSubtypeOf(argumentType)).toList(); > 425: > 426: if (filtered.size() == 0) { Suggestion: if (filtered.isEmpty()) { test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java line 460: > 458: newStrings.add(this.strings.get(argumentIndex) + > 459: nestingExpression.strings.get(0)); // concat s1 and S0 > 460: newArgumentTypes.add(nestingExpression.argumentTypes.get(0)); You can use `getFirst()`: Suggestion: nestingExpression.strings.getFirst()); // concat s1 and S0 newArgumentTypes.add(nestingExpression.argumentTypes.getFirst()); test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 67: > 65: ops.add(Expression.make(BYTES, "(byte)(", LONGS, ")")); > 66: ops.add(Expression.make(BYTES, "(byte)(", FLOATS, ")")); > 67: ops.add(Expression.make(BYTES, "(byte)(", DOUBLES, ")")); There is a lot of repetition of this block for the various types. Could you share the code? Maybe something like this: ops.add(Expression.make(returnType, "(castType)(", BYTES, ")")); ops.add(Expression.make(returnType, "(castType)(", SHORTS, ")")); ops.add(Expression.make(returnType, "(castType)(", CHARS, ")")); ops.add(Expression.make(returnType, "(castType)(", INTS, ")")); ops.add(Expression.make(returnType, "(castType)(", LONGS, ")")); ops.add(Expression.make(returnType, "(castType)(", FLOATS, ")")); ops.add(Expression.make(returnType, "(castType)(", DOUBLES, ")")); test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 72: > 70: ops.add(Expression.make(BYTES, "(", BOOLEANS, "?", BYTES, ":", BYTES, ")")); > 71: > 72: // Arithmetic operations are not performned in byte, but rather promoted to int. Suggestion: // Arithmetic operations are not performed in byte, but rather promoted to int. test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 154: > 152: ops.add(Expression.make(BOOLEANS, "(", INTS, " < ", INTS, ")")); > 153: ops.add(Expression.make(BOOLEANS, "(", INTS, " >= ", INTS, ")")); > 154: ops.add(Expression.make(BOOLEANS, "(", INTS, " <= ", INTS, ")")); This also seems to be repeated for `LONGS` and could be shared. test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 263: > 261: ops.add(Expression.make(BOOLEANS, "(", FLOATS, " < ", FLOATS, ")")); > 262: ops.add(Expression.make(BOOLEANS, "(", FLOATS, " >= ", FLOATS, ")")); > 263: ops.add(Expression.make(BOOLEANS, "(", FLOATS, " <= ", FLOATS, ")")); Could also be shared with the double version. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestPrimitiveTypes.java line 180: > 178: // Fill an array with 1_000 random values. 
Every type has at least 2 values, > 179: // so the chance that all values are the same is 2^-1_000 < 10^-300. This should > 180: // never happen, even with a relatively weak PRNG. And if it does, we should consider playing the lottery :-) test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestPrimitiveTypes.java line 226: > 224: import java.util.Random; > 225: import jdk.test.lib.Utils; > 226: import compiler.lib.generators.*; Could this be an utility method like `libraryRNGImports()` or something like that? ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26885#pullrequestreview-3330339890 PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2425503658 PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2425509576 PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2425506121 PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2425506497 PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2425508014 PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2425480526 PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2425471097 PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2425485743 PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2425488296 PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2425494544 PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2425497241 From epeter at openjdk.org Mon Oct 13 08:22:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 08:22:07 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types In-Reply-To: References: Message-ID: <_yOZQmT7y439P5uySjWud3bDKFoeqBEXL0aEu96Yod8=.06e6100b-0386-4a7f-9a70-25c578526dbc@github.com> On Tue, 7 Oct 2025 07:36:41 GMT, Roland Westrelin wrote: > Currently ReassociateInvariants is only enabled for int counted > loops. I noticed, enabling it for long counted loops helps RCE. It > also seems like something that would help any loop. I propose enabling > it for all inner loops. Nice. That looks reasonable. And great that it addresses my two reports! I think you should now close my two reports as duplicates. Once this is integrated, I'll do some investigations on `TestAliasingFuzzer.java`, to see if we might be able to enable some more IR rules there. If not, I'll file some more reports ;) I'll run some internal testing now... test/hotspot/jtreg/compiler/loopopts/TestReassociateInvariants.java line 28: > 26: * @summary > 27: * @library /test/lib / > 28: * @run driver compiler.loopopts.TestReassociateInvariants I'm missing a `@bug 8369258` Suggestion: * @test * @bug 8369258 * @summary * @library /test/lib / * @run driver compiler.loopopts.TestReassociateInvariants test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment_8365982.java line 40: > 38: * can also tighten up the IR rules there. > 39: * @library /test/lib / > 40: * @run driver compiler.loopopts.superword.TestMemorySegment_8365982 You should add the new bug-id to the one above! `@bug 8324751 8369258` test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment_8365982.java line 88: > 86: // does not have any range checks any more. > 87: // Now it vectorizes. That's good, but we should be able to vectorize without multiversioning. > 88: // Can you please also remove the comments below? 
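For context on what ReassociateInvariants buys in loops like these, a tiny illustration (not code from the PR; names are made up):

    static void example(int[] a, int[] b, int inv1, int inv2) {
        for (int i = 0; i < b.length; i++) {
            // Parsed left to right as (inv1 + i) + inv2: the loop-invariant parts
            // are separated by the loop-varying i.
            b[i] = a[inv1 + i + inv2];
        }
        // Reassociation regroups this as (inv1 + inv2) + i, so the invariant sum is
        // computed once outside the loop and range check elimination sees a plain
        // "invariant + iv" index. With this change that now also happens for long
        // counted loops and other non-int-counted inner loops.
    }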
test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment_ReassociateInvariants1.java line 40: > 38: * can also tighten up the IR rules there. > 39: * @library /test/lib / > 40: * @run driver compiler.loopopts.superword.TestMemorySegment_ReassociateInvariants1 You should add the new bug-id to the one above! `@bug 8324751 8369258` ------------- PR Review: https://git.openjdk.org/jdk/pull/27666#pullrequestreview-3330405292 PR Review Comment: https://git.openjdk.org/jdk/pull/27666#discussion_r2425522357 PR Review Comment: https://git.openjdk.org/jdk/pull/27666#discussion_r2425516994 PR Review Comment: https://git.openjdk.org/jdk/pull/27666#discussion_r2425519554 PR Review Comment: https://git.openjdk.org/jdk/pull/27666#discussion_r2425515941 From epeter at openjdk.org Mon Oct 13 08:33:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 08:33:47 GMT Subject: RFR: 8359412: Template-Framework Library: Operations and Expressions [v7] In-Reply-To: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> Message-ID: > Impliementing ideas from original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)). > > Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`. > > Details, in **order you should review**: > - `Operations.java`: maps lots of primitive operators as Expressions. > - `Expression.java`: the fundamental engine behind Expressions. > - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants. > - `tests/TestExpression.java`: correctness test of Expression machinery. > - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification. > - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions. > - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example, that tests that has a weak test for randomness: we should have at least 2 different value in 1000 calls. > > If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples. > > **Future Work**: > - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization. > - Use `Expression`s to model more operations: > - `Vector API`, more arithmetic operations like from `Math` classes etc. > - Ensure that the constraints / checksum mechanic in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints. 
> - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator call `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just folds away, but under `StressIGVN` and `Stres... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26885/files - new: https://git.openjdk.org/jdk/pull/26885/files/c04c879c..199e06a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=05-06 Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/26885.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26885/head:pull/26885 PR: https://git.openjdk.org/jdk/pull/26885 From epeter at openjdk.org Mon Oct 13 08:38:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 08:38:08 GMT Subject: RFR: 8359412: Template-Framework Library: Operations and Expressions [v6] In-Reply-To: References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> Message-ID: <3z9Us9GjCwZqLDsZ9LsWrQRetfR0vm3yZj_o_lwWplc=.00d043d1-ba22-4f65-8dae-a07fe2f56ecf@github.com> On Mon, 13 Oct 2025 08:02:40 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - more comments >> - add othervm to test > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestPrimitiveTypes.java line 226: > >> 224: import java.util.Random; >> 225: import jdk.test.lib.Utils; >> 226: import compiler.lib.generators.*; > > Could this be an utility method like `libraryRNGImports()` or something like that? I suppose it could be. But I consider these imports so general that they are most likely already imported in all relevant cases. I propose that we keep it as is for now, and add the extra method if it becomes too cumbersome in the future to do it manually. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2425568551 From epeter at openjdk.org Mon Oct 13 08:43:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 08:43:08 GMT Subject: RFR: 8359412: Template-Framework Library: Operations and Expressions [v6] In-Reply-To: References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> Message-ID: On Mon, 13 Oct 2025 07:54:28 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - more comments >> - add othervm to test > > test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 67: > >> 65: ops.add(Expression.make(BYTES, "(byte)(", LONGS, ")")); >> 66: ops.add(Expression.make(BYTES, "(byte)(", FLOATS, ")")); >> 67: ops.add(Expression.make(BYTES, "(byte)(", DOUBLES, ")")); > > There is a lot of repetition of this block for the various types. Could you share the code? 
Maybe something like this: > > > ops.add(Expression.make(returnType, "(castType)(", BYTES, ")")); > ops.add(Expression.make(returnType, "(castType)(", SHORTS, ")")); > ops.add(Expression.make(returnType, "(castType)(", CHARS, ")")); > ops.add(Expression.make(returnType, "(castType)(", INTS, ")")); > ops.add(Expression.make(returnType, "(castType)(", LONGS, ")")); > ops.add(Expression.make(returnType, "(castType)(", FLOATS, ")")); > ops.add(Expression.make(returnType, "(castType)(", DOUBLES, ")")); I'll look into simplifications, good idea :) > test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 154: > >> 152: ops.add(Expression.make(BOOLEANS, "(", INTS, " < ", INTS, ")")); >> 153: ops.add(Expression.make(BOOLEANS, "(", INTS, " >= ", INTS, ")")); >> 154: ops.add(Expression.make(BOOLEANS, "(", INTS, " <= ", INTS, ")")); > > This also seems to be repeated for `LONGS` and could be shared. I'll look into simplifications, good idea :) > test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 263: > >> 261: ops.add(Expression.make(BOOLEANS, "(", FLOATS, " < ", FLOATS, ")")); >> 262: ops.add(Expression.make(BOOLEANS, "(", FLOATS, " >= ", FLOATS, ")")); >> 263: ops.add(Expression.make(BOOLEANS, "(", FLOATS, " <= ", FLOATS, ")")); > > Could also be shared with the double version. I'll look into simplifications, good idea :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2425577971 PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2425578143 PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2425578255 From epeter at openjdk.org Mon Oct 13 08:48:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 08:48:55 GMT Subject: RFR: 8359412: Template-Framework Library: Operations and Expressions [v8] In-Reply-To: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> Message-ID: > Impliementing ideas from original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)). > > Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`. > > Details, in **order you should review**: > - `Operations.java`: maps lots of primitive operators as Expressions. > - `Expression.java`: the fundamental engine behind Expressions. > - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants. > - `tests/TestExpression.java`: correctness test of Expression machinery. > - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification. > - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. 
> I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions.
> - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example with a weak test for randomness: we should have at least 2 different values in 1000 calls.
>
> If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples.
>
> **Future Work**:
> - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization.
> - Use `Expression`s to model more operations:
>   - `Vector API`, more arithmetic operations like from `Math` classes etc.
> - Ensure that the constraints / checksum mechanic in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints.
> - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator called `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just folds away, but under `StressIGVN` and `Stres...

Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 33 additional commits since the last revision:

 - Merge branch 'master' into JDK-8359412-Template-Framework-Expressions
 - Apply suggestions from code review
   Co-authored-by: Christian Hagedorn
 - more comments
 - add othervm to test
 - Merge branch 'master' into JDK-8359412-Template-Framework-Expressions
 - Apply Manuel's suggestions part 3
   Co-authored-by: Manuel Hässig
 - Apply Manuel's suggestions part 2
 - Apply Manuel's suggestions part 1
   Co-authored-by: Manuel Hässig
 - fix whitespaces
 - LibraryRNG example
 - ... and 23 more: https://git.openjdk.org/jdk/compare/1acca879...1f51d14b

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/26885/files
 - new: https://git.openjdk.org/jdk/pull/26885/files/199e06a3..1f51d14b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=06-07

Stats: 162287 lines in 1972 files changed: 134923 ins; 17448 del; 9916 mod
Patch: https://git.openjdk.org/jdk/pull/26885.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26885/head:pull/26885

PR: https://git.openjdk.org/jdk/pull/26885

From epeter at openjdk.org Mon Oct 13 09:08:12 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 13 Oct 2025 09:08:12 GMT
Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v2]
In-Reply-To: <2cQ6gXCmY3uz_H3H_Sks0KZbQP4G5V3PRP8QSxiRG6g=.44a63dc4-1078-4762-bef5-ac3e4a995ca3@github.com>
References: <2cQ6gXCmY3uz_H3H_Sks0KZbQP4G5V3PRP8QSxiRG6g=.44a63dc4-1078-4762-bef5-ac3e4a995ca3@github.com>
Message-ID:

On Wed, 8 Oct 2025 09:00:34 GMT, Roland Westrelin wrote:

>> In the `test1()` method of the test case:
>>
>> `inlined2()` calls `clone()` for an object loaded from field `field`
>> that has inexact type `A` at parse time. The intrinsic for `clone()`
>> inserts an `Allocate` and an `ArrayCopy` node. When igvn runs, the
>> load of `field` is optimized out because it reads back a newly
>> allocated `B` written to `field` in the same method. `ArrayCopy` can
>> now be optimized because the type of its `src` input is known.
>> The type of its `dest` input is the `CheckCastPP` from the allocation of
>> the cloned object created at parse time. That one has type `A`. A
>> series of `Load`s/`Store`s are created to copy the fields of class `B`
>> from `src` (of type `B`) to `dest` (of type `A`).
>>
>> Writing to `dest` with offsets for fields that don't exist in `A`
>> causes this code in `Compile::flatten_alias_type()`:
>>
>>
>> } else if (offset < 0 || offset >= ik->layout_helper_size_in_bytes()) {
>> // Static fields are in the space above the normal instance
>> // fields in the java.lang.Class instance.
>> if (ik != ciEnv::current()->Class_klass()) {
>> to = nullptr;
>> tj = TypeOopPtr::BOTTOM;
>> offset = tj->offset();
>> }
>>
>>
>> to assign it some slice that doesn't match the one that's used at the
>> same offset in `B`.
>>
>> That causes an assert in `ArrayCopyNode::try_clone_instance()` to
>> fire. With a release build, execution proceeds. `test1()` also has a
>> non escaping allocation. That one causes EA to run and
>> `ConnectionGraph::split_unique_types()` to move the store to the non
>> escaping allocation to a new slice. In the process, when it iterates
>> over `MergeMem` nodes, it notices the stores added by
>> `ArrayCopyNode::try_clone_instance()`, finds that some are not on the
>> right slice, tries to move them to the correct slice (expecting they
>> are from a non escaping EA). That causes some of the `Store`s to be
>> disconnected. When the resulting code runs, execution fails as some
>> fields are not copied.
>>
>> The fix I propose is to skip `ArrayCopyNode::try_clone_instance()`
>> when `src` and `dest` classes don't match as this seems like a rare
>> enough corner case.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
> - review
> - Merge branch 'master' into JDK-8339526
> - Update src/hotspot/share/opto/arraycopynode.cpp
>
> Co-authored-by: Christian Hagedorn
> - test & fix

Hi Roland, thanks for looking into this! Can you explain why the `clone` in `inlined2` creates an `ArrayCopy` node? I think I'm missing some context here. Because we are cloning an `A` and not an array, right?

test/hotspot/jtreg/compiler/arraycopy/TestCloneUnknownClassAtParseTime.java line 28:

> 26: * @bug 8339526
> 27: * @summary C2: store incorrectly removed for clone() transformed to series of loads/stores
> 28: * @run main/othervm -XX:-BackgroundCompilation TestCloneUnknownClassAtParseTime

Would it make sense to also have a run without the flag?

test/hotspot/jtreg/compiler/arraycopy/TestCloneUnknownClassAtParseTime.java line 63:

> 61: private static A inlined2() throws CloneNotSupportedException {
> 62: A a = field;
> 63: return (A)a.clone();

Out of curiosity: why do we even add an `ArrayCopy` here?
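To make my question concrete, this is the shape I have in mind (my own minimal sketch with a made-up holder class, not the actual regression test from the PR):

class A implements Cloneable {
    int a;
    @Override
    public Object clone() throws CloneNotSupportedException {
        return super.clone(); // Object.clone() is intrinsified by C2
    }
}

class B extends A {
    int b; // field that only exists in B, not in A
}

class Holder {
    static A field = new B();   // statically typed A, actually holds a B

    static A inlined2() throws CloneNotSupportedException {
        A a = field;            // inexact type A at parse time
        return (A) a.clone();   // cloning an object, not an array
    }
}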
------------- PR Review: https://git.openjdk.org/jdk/pull/27604#pullrequestreview-3330535225 PR Review Comment: https://git.openjdk.org/jdk/pull/27604#discussion_r2425606421 PR Review Comment: https://git.openjdk.org/jdk/pull/27604#discussion_r2425625118 From chagedorn at openjdk.org Mon Oct 13 09:13:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Oct 2025 09:13:13 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v2] In-Reply-To: References: Message-ID: <_b8BNEA0grpCu9bkLd-tZsvaHto-FmR9OiBqGfZ8bmA=.46a6edbd-ef95-4b1f-be4c-32483f007ab2@github.com> On Thu, 9 Oct 2025 21:46:36 GMT, Emanuel Peter wrote: >> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: >> https://github.com/openjdk/jdk/pull/20964 >> [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093) >> >> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. >> >> This should be the last one before Cost Modeling, which will enable us to vectorize more reductions ? >> >> -------------------------- >> >> **Goal:** we need to do the `move_reduction_out_of_loop` already during auto vectorization, and not after. This will allow us to cost-model the loop after the expensive reduction nodes are removed from the loop in a following RFE. >> >> **Details** >> Reduction nodes are expensive, and require many instructions in the backend. In some cases, this means that the vectorized reduction is more expensive than the scalar reduction. We would have to find other operations to vectorize, so that the instruction count goes down sufficiently. There are cases where the reduction is not profitable before `move_reduction_out_of_loop`, but profitable after. >> >> Since we now modify `VTransformNode`s during `VTransform::optimize` (think of it as the IGVN for `VTransform`), some nodes can become dead, and so we need to take care of that with `is_alive`. And we must only schedule alive nodes, others may not have a coherent state. >> >> **Future Work** >> - Cost Modeling [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093) >> - Other optimizations that lower the cost of the vectorized loop, and enable vectorization to be profitable. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > For Vladimir K7 Nice refactoring! Some small comments, otherwise, it looks good to me, too! src/hotspot/share/opto/vtransform.cpp line 46: > 44: TRACE_OPTIMIZE( tty->print_cr("\nVTransformGraph::optimize"); ) > 45: > 46: while (true) { Could we also just do `while (progress)`? You always seem to check `!progress` at the very end of the loop. src/hotspot/share/opto/vtransform.cpp line 97: > 95: > 96: collect_nodes_without_strong_in_edges(stack); > 97: int num_alive_nodes = count_alive_vtnodes(); Suggestion: const int num_alive_nodes = count_alive_vtnodes(); src/hotspot/share/opto/vtransform.cpp line 1070: > 1068: // outside the loop, and instead cheaper element-wise vector accumulations > 1069: // are performed inside the loop. > 1070: bool VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop(const VLoopAnalyzer& vloop_analyzer, VTransform& vtransform) { Any particular reason you chose the additional `optimize` prefix? I think the intent is already clear without it. 
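As context for readers of the thread, the kind of loop this is about is a non-strict-order (unordered) reduction, for example an int add reduction over an array (toy example of mine, not taken from the patch):

static int sum(int[] a) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
        sum += a[i]; // int addition may be reordered, so the expensive
                     // vector-to-scalar reduction can be moved after the loop
    }
    return sum;
}

Inside the loop only cheap element-wise vector accumulations remain, and a single vector-to-scalar reduction is emitted after the loop.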
src/hotspot/share/opto/vtransform.cpp line 1074: > 1072: uint vlen = vector_length(); > 1073: BasicType bt = element_basic_type(); > 1074: int ropc = vector_reduction_opcode(); Can probably be made `const` for good measure: Suggestion: const int sopc = scalar_opcode(); const uint vlen = vector_length(); const BasicType bt = element_basic_type(); const int ropc = vector_reduction_opcode(); And could they also be moved down to the definition of `vopc`? src/hotspot/share/opto/vtransform.cpp line 1113: > 1111: VTransformReductionVectorNode* last_red = phi->in_req(2)->isa_ReductionVector(); > 1112: VTransformReductionVectorNode* current_red = last_red; > 1113: while (true) { The method is already quite big. IIUC, this only does some checking and we do not need to bookkeep for further down. Therefore, I suggest to extract this to a "is_looping_back_to_phi" method or something like that. src/hotspot/share/opto/vtransform.cpp line 1175: > 1173: // Create a vector of identity values. > 1174: Node* identity = ReductionNode::make_identity_con_scalar(phase->igvn(), sopc, bt); > 1175: phase->set_ctrl(identity, phase->C->root()); Any particular reason why you are no longer using `set_root_as_ctrl()`? ------------- PR Review: https://git.openjdk.org/jdk/pull/27704#pullrequestreview-3330490894 PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425592796 PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425602123 PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425609143 PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425611531 PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425640049 PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425644744 From chagedorn at openjdk.org Mon Oct 13 09:13:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Oct 2025 09:13:15 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v2] In-Reply-To: References: Message-ID: On Wed, 8 Oct 2025 22:45:50 GMT, Emanuel Peter wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> For Vladimir K7 > > src/hotspot/share/opto/vtransform.cpp line 43: > >> 41: ) >> 42: >> 43: void VTransformGraph::optimize(VTransform& vtransform) { > > Note: this is similar to IGVN optimization. But we are a bit lazy, and don't care about notifiation / worklist. Can you add that as a method comment? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425575749 From chagedorn at openjdk.org Mon Oct 13 09:13:17 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Oct 2025 09:13:17 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v2] In-Reply-To: <_b8BNEA0grpCu9bkLd-tZsvaHto-FmR9OiBqGfZ8bmA=.46a6edbd-ef95-4b1f-be4c-32483f007ab2@github.com> References: <_b8BNEA0grpCu9bkLd-tZsvaHto-FmR9OiBqGfZ8bmA=.46a6edbd-ef95-4b1f-be4c-32483f007ab2@github.com> Message-ID: On Mon, 13 Oct 2025 08:45:30 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> For Vladimir K7 > > src/hotspot/share/opto/vtransform.cpp line 46: > >> 44: TRACE_OPTIMIZE( tty->print_cr("\nVTransformGraph::optimize"); ) >> 45: >> 46: while (true) { > > Could we also just do `while (progress)`? You always seem to check `!progress` at the very end of the loop. If there is a bug and we keep setting `progress` to true, we might loop endlessly. Is there another always true upper-bound condition? We could additionally add an assert for catching issues when we bail out. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425601345 From epeter at openjdk.org Mon Oct 13 09:39:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 09:39:05 GMT Subject: RFR: 8359412: Template-Framework Library: Operations and Expressions [v9] In-Reply-To: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> Message-ID: <1oECW4fhGLBNbrcdMBc-l7-8Yg5Fdy2xyns-pv2EfNI=.75c3b818-1154-416b-ae15-6e2053ee0f60@github.com> > Impliementing ideas from original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)). > > Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`. > > Details, in **order you should review**: > - `Operations.java`: maps lots of primitive operators as Expressions. > - `Expression.java`: the fundamental engine behind Expressions. > - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants. > - `tests/TestExpression.java`: correctness test of Expression machinery. > - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification. > - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions. 
> - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example, that tests that has a weak test for randomness: we should have at least 2 different value in 1000 calls. > > If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples. > > **Future Work**: > - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization. > - Use `Expression`s to model more operations: > - `Vector API`, more arithmetic operations like from `Math` classes etc. > - Ensure that the constraints / checksum mechanic in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints. > - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator call `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just folds away, but under `StressIGVN` and `Stres... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: refactor for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26885/files - new: https://git.openjdk.org/jdk/pull/26885/files/1f51d14b..c6787e41 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=07-08 Stats: 207 lines in 1 file changed: 47 ins; 138 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/26885.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26885/head:pull/26885 PR: https://git.openjdk.org/jdk/pull/26885 From epeter at openjdk.org Mon Oct 13 09:39:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 09:39:07 GMT Subject: RFR: 8359412: Template-Framework Library: Operations and Expressions [v6] In-Reply-To: References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> Message-ID: On Mon, 13 Oct 2025 08:12:39 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - more comments >> - add othervm to test > > Nice work! I left some comments here and there when walking through the code but I did not deep dive into it much since it was already properly reviewed by 2 reviewers :-) @chhagedorn Thanks for the review! I applied all your suggestions, and also merged with master. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26885#issuecomment-3396640181 From epeter at openjdk.org Mon Oct 13 10:08:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 10:08:44 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v3] In-Reply-To: References: Message-ID: > I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: > https://github.com/openjdk/jdk/pull/20964 > [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093) > > This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > This should be the last one before Cost Modeling, which will enable us to vectorize more reductions ? > > -------------------------- > > **Goal:** we need to do the `move_reduction_out_of_loop` already during auto vectorization, and not after. 
This will allow us to cost-model the loop after the expensive reduction nodes are removed from the loop in a following RFE. > > **Details** > Reduction nodes are expensive, and require many instructions in the backend. In some cases, this means that the vectorized reduction is more expensive than the scalar reduction. We would have to find other operations to vectorize, so that the instruction count goes down sufficiently. There are cases where the reduction is not profitable before `move_reduction_out_of_loop`, but profitable after. > > Since we now modify `VTransformNode`s during `VTransform::optimize` (think of it as the IGVN for `VTransform`), some nodes can become dead, and so we need to take care of that with `is_alive`. And we must only schedule alive nodes, others may not have a coherent state. > > **Future Work** > - Cost Modeling [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093) > - Other optimizations that lower the cost of the vectorized loop, and enable vectorization to be profitable. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: for Christian part 1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27704/files - new: https://git.openjdk.org/jdk/pull/27704/files/a7cd2685..1710b58d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27704&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27704&range=01-02 Stats: 20 lines in 1 file changed: 10 ins; 6 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/27704.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27704/head:pull/27704 PR: https://git.openjdk.org/jdk/pull/27704 From bmaillard at openjdk.org Mon Oct 13 10:27:43 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 13 Oct 2025 10:27:43 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v3] In-Reply-To: References: Message-ID: > This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. > > ### Analysis > > This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced > and added to this PR as a regression test. > > The test contains a switch inside a loop, and stressing the loop peeling results in > a fairly complex graph. The split-if optimization is applied agressively, and we > run a verification pass at every progress made. > > We end up with a relatively high number of verification passes, with each pass being > fairly expensive because of the size of the graph. > Each verification pass requires building a new `IdealLoopTree`. This is quite slow > (which is unfortunately hard to mitigate), and also causes inefficient memory usage > on the `ciEnv` arena. > > The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. 
> At every call, we have > - One allocation on the `ciEnv` arena to store the returned `ciField` > - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which: > - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`) > - Pushes the new symbol to the `_symbols` array > > The `ciEnv` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to > check if the `BasicType` of a static field is a reference type. > > In `ciObjectFactory`, the `_symbols` array ends up containg a large number of duplicates for certain symbols > (up to several millions), which hints at the fact that `ciObjectFactory::get_symbol` should not be called > repeatedly as it is done here. > > The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below: > > > ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412 > TypeOopPtr::TypeOopPtr type.cpp:3484 > TypeInstPtr::TypeInstPtr type.cpp:3953 > TypeInstPtr::make type.cpp:3990 > TypeInstPtr::add_offset type.cpp:4509 > AddPNode::bottom_type addnode.cpp:696 > MemNode::adr_type memnode.cpp:73 > PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477 > PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439 > PhaseIdealLoop::build_loop_late_post_work loopnode.cpp:6827 > PhaseIdealLoop::build_loop_late_post loopnode.cpp:67... Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/ci/ciInstanceKlass.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27731/files - new: https://git.openjdk.org/jdk/pull/27731/files/7cdcd059..56055391 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27731.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27731/head:pull/27731 PR: https://git.openjdk.org/jdk/pull/27731 From bmaillard at openjdk.org Mon Oct 13 10:27:45 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 13 Oct 2025 10:27:45 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v2] In-Reply-To: References: Message-ID: <1dNutHBmly7Dp0gSjsXiMWUK7EsECApIw3DAd00EGoA=.76dd6a31-4046-44a6-8d09-8def6def9c27@github.com> On Fri, 10 Oct 2025 14:53:42 GMT, Christian Hagedorn wrote: >> Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Add -XX:+UnlockDiagnosticVMOptions > > src/hotspot/share/ci/ciInstanceKlass.cpp line 434: > >> 432: // >> 433: // This is essentially a shortcut for: >> 434: // get_field_type_by_offset(field_offset, is_static)->layout_type() > > `get_field_by_offset()`? 
> Suggestion: > > // get_field_by_offset(field_offset, is_static)->layout_type() Oops, good catch ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2425846108 From chagedorn at openjdk.org Mon Oct 13 10:28:22 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Oct 2025 10:28:22 GMT Subject: RFR: 8359412: Template-Framework Library: Operations and Expressions [v9] In-Reply-To: <1oECW4fhGLBNbrcdMBc-l7-8Yg5Fdy2xyns-pv2EfNI=.75c3b818-1154-416b-ae15-6e2053ee0f60@github.com> References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> <1oECW4fhGLBNbrcdMBc-l7-8Yg5Fdy2xyns-pv2EfNI=.75c3b818-1154-416b-ae15-6e2053ee0f60@github.com> Message-ID: On Mon, 13 Oct 2025 09:39:05 GMT, Emanuel Peter wrote: >> Impliementing ideas from original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)). >> >> Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`. >> >> Details, in **order you should review**: >> - `Operations.java`: maps lots of primitive operators as Expressions. >> - `Expression.java`: the fundamental engine behind Expressions. >> - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants. >> - `tests/TestExpression.java`: correctness test of Expression machinery. >> - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification. >> - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions. >> - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example, that tests that has a weak test for randomness: we should have at least 2 different value in 1000 calls. >> >> If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples. >> >> **Future Work**: >> - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization. >> - Use `Expression`s to model more operations: >> - `Vector API`, more arithmetic operations like from `Math` classes etc. >> - Ensure that the constraints / checksum mechanic in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints. >> - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator call `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just fol... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > refactor for Christian Looks good, thanks for the update! 
test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 60: > 58: Expression.Info withNondeterministicResult = new Expression.Info().withNondeterministicResult(); > 59: > 60: // Cast between all primitive types. Escept for Boolean, we cannot cast from and to. Suggestion: // Cast between all primitive types. Except for Boolean, we cannot cast from and to. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26885#pullrequestreview-3330888273 PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2425856952 From bmaillard at openjdk.org Mon Oct 13 10:36:47 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 13 Oct 2025 10:36:47 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v4] In-Reply-To: References: Message-ID: > This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. > > ### Analysis > > This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced > and added to this PR as a regression test. > > The test contains a switch inside a loop, and stressing the loop peeling results in > a fairly complex graph. The split-if optimization is applied agressively, and we > run a verification pass at every progress made. > > We end up with a relatively high number of verification passes, with each pass being > fairly expensive because of the size of the graph. > Each verification pass requires building a new `IdealLoopTree`. This is quite slow > (which is unfortunately hard to mitigate), and also causes inefficient memory usage > on the `ciEnv` arena. > > The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. > At every call, we have > - One allocation on the `ciEnv` arena to store the returned `ciField` > - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which: > - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`) > - Pushes the new symbol to the `_symbols` array > > The `ciEnv` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to > check if the `BasicType` of a static field is a reference type. > > In `ciObjectFactory`, the `_symbols` array ends up containg a large number of duplicates for certain symbols > (up to several millions), which hints at the fact that `ciObjectFactory::get_symbol` should not be called > repeatedly as it is done here. > > The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below: > > > ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412 > TypeOopPtr::TypeOopPtr type.cpp:3484 > TypeInstPtr::TypeInstPtr type.cpp:3953 > TypeInstPtr::make type.cpp:3990 > TypeInstPtr::add_offset type.cpp:4509 > AddPNode::bottom_type addnode.cpp:696 > MemNode::adr_type memnode.cpp:73 > PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477 > PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439 > PhaseIdealLoop::build_loop_late_post_work loopnode.cpp:6827 > PhaseIdealLoop::build_loop_late_post loopnode.cpp:67... 
Beno?t Maillard has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/ci/ciInstanceKlass.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/type.cpp Co-authored-by: Damon Fenacci ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27731/files - new: https://git.openjdk.org/jdk/pull/27731/files/56055391..37ff941e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=02-03 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27731.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27731/head:pull/27731 PR: https://git.openjdk.org/jdk/pull/27731 From bmaillard at openjdk.org Mon Oct 13 10:42:52 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 13 Oct 2025 10:42:52 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v5] In-Reply-To: References: Message-ID: > This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. > > ### Analysis > > This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced > and added to this PR as a regression test. > > The test contains a switch inside a loop, and stressing the loop peeling results in > a fairly complex graph. The split-if optimization is applied agressively, and we > run a verification pass at every progress made. > > We end up with a relatively high number of verification passes, with each pass being > fairly expensive because of the size of the graph. > Each verification pass requires building a new `IdealLoopTree`. This is quite slow > (which is unfortunately hard to mitigate), and also causes inefficient memory usage > on the `ciEnv` arena. > > The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. > At every call, we have > - One allocation on the `ciEnv` arena to store the returned `ciField` > - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which: > - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`) > - Pushes the new symbol to the `_symbols` array > > The `ciEnv` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to > check if the `BasicType` of a static field is a reference type. > > In `ciObjectFactory`, the `_symbols` array ends up containg a large number of duplicates for certain symbols > (up to several millions), which hints at the fact that `ciObjectFactory::get_symbol` should not be called > repeatedly as it is done here. 
> > The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below: > > > ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412 > TypeOopPtr::TypeOopPtr type.cpp:3484 > TypeInstPtr::TypeInstPtr type.cpp:3953 > TypeInstPtr::make type.cpp:3990 > TypeInstPtr::add_offset type.cpp:4509 > AddPNode::bottom_type addnode.cpp:696 > MemNode::adr_type memnode.cpp:73 > PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477 > PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439 > PhaseIdealLoop::build_loop_late_post_work loopnode.cpp:6827 > PhaseIdealLoop::build_loop_late_post loopnode.cpp:67... Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: Move package after copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27731/files - new: https://git.openjdk.org/jdk/pull/27731/files/37ff941e..04582ccb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=03-04 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27731.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27731/head:pull/27731 PR: https://git.openjdk.org/jdk/pull/27731 From epeter at openjdk.org Mon Oct 13 10:43:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 10:43:28 GMT Subject: RFR: 8359412: Template-Framework Library: Operations and Expressions [v10] In-Reply-To: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> Message-ID: <2LCX4Ymc-sCbwWkl0vIpp3ue80ddsN0OdLc7bZ9KN14=.ba3b9956-52e2-453d-8495-932e1057aaed@github.com> > Impliementing ideas from original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)). > > Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`. > > Details, in **order you should review**: > - `Operations.java`: maps lots of primitive operators as Expressions. > - `Expression.java`: the fundamental engine behind Expressions. > - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants. > - `tests/TestExpression.java`: correctness test of Expression machinery. > - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification. > - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions. > - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example, that tests that has a weak test for randomness: we should have at least 2 different value in 1000 calls. 
> > If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples. > > **Future Work**: > - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization. > - Use `Expression`s to model more operations: > - `Vector API`, more arithmetic operations like from `Math` classes etc. > - Ensure that the constraints / checksum mechanic in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints. > - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator call `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just folds away, but under `StressIGVN` and `Stres... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26885/files - new: https://git.openjdk.org/jdk/pull/26885/files/c6787e41..3e4a1b76 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26885.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26885/head:pull/26885 PR: https://git.openjdk.org/jdk/pull/26885 From epeter at openjdk.org Mon Oct 13 10:58:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 10:58:39 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v4] In-Reply-To: References: Message-ID: > I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: > https://github.com/openjdk/jdk/pull/20964 > [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093) > > This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > This should be the last one before Cost Modeling, which will enable us to vectorize more reductions ? > > -------------------------- > > **Goal:** we need to do the `move_reduction_out_of_loop` already during auto vectorization, and not after. This will allow us to cost-model the loop after the expensive reduction nodes are removed from the loop in a following RFE. > > **Details** > Reduction nodes are expensive, and require many instructions in the backend. In some cases, this means that the vectorized reduction is more expensive than the scalar reduction. We would have to find other operations to vectorize, so that the instruction count goes down sufficiently. There are cases where the reduction is not profitable before `move_reduction_out_of_loop`, but profitable after. > > Since we now modify `VTransformNode`s during `VTransform::optimize` (think of it as the IGVN for `VTransform`), some nodes can become dead, and so we need to take care of that with `is_alive`. And we must only schedule alive nodes, others may not have a coherent state. > > **Future Work** > - Cost Modeling [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093) > - Other optimizations that lower the cost of the vectorized loop, and enable vectorization to be profitable. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: for Christian part 2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27704/files - new: https://git.openjdk.org/jdk/pull/27704/files/1710b58d..925255e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27704&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27704&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27704.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27704/head:pull/27704 PR: https://git.openjdk.org/jdk/pull/27704 From epeter at openjdk.org Mon Oct 13 10:58:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 10:58:42 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v4] In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 08:38:54 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/vtransform.cpp line 43: >> >>> 41: ) >>> 42: >>> 43: void VTransformGraph::optimize(VTransform& vtransform) { >> >> Note: this is similar to IGVN optimization. But we are a bit lazy, and don't care about notifiation / worklist. > > Can you add that as a method comment? Good idea! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425713492 From epeter at openjdk.org Mon Oct 13 10:58:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 10:58:43 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v2] In-Reply-To: References: <_b8BNEA0grpCu9bkLd-tZsvaHto-FmR9OiBqGfZ8bmA=.46a6edbd-ef95-4b1f-be4c-32483f007ab2@github.com> Message-ID: On Mon, 13 Oct 2025 08:49:08 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/vtransform.cpp line 46: >> >>> 44: TRACE_OPTIMIZE( tty->print_cr("\nVTransformGraph::optimize"); ) >>> 45: >>> 46: while (true) { >> >> Could we also just do `while (progress)`? You always seem to check `!progress` at the very end of the loop. > > If there is a bug and we keep setting `progress` to true, we might loop endlessly. Is there another always true upper-bound condition? We could additionally add an assert for catching issues when we bail out. Nice idea, I'll limit it to 10 or so. That should work at least for now. Not sure what you mean by `bail out` here. I think I'll just limit it to a debug assert. The issue is that if we do not fully optimize, it may be that we do not get a consistent graph. Well for now optimization is optional, but in the future we may have to perform some optimizations to canonicalize the graph, just like in IGVN. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425738909 From epeter at openjdk.org Mon Oct 13 10:58:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 10:58:47 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v2] In-Reply-To: <_b8BNEA0grpCu9bkLd-tZsvaHto-FmR9OiBqGfZ8bmA=.46a6edbd-ef95-4b1f-be4c-32483f007ab2@github.com> References: <_b8BNEA0grpCu9bkLd-tZsvaHto-FmR9OiBqGfZ8bmA=.46a6edbd-ef95-4b1f-be4c-32483f007ab2@github.com> Message-ID: On Mon, 13 Oct 2025 08:49:26 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> For Vladimir K7 > > src/hotspot/share/opto/vtransform.cpp line 97: > >> 95: >> 96: collect_nodes_without_strong_in_edges(stack); >> 97: int num_alive_nodes = count_alive_vtnodes(); > > Suggestion: > > const int num_alive_nodes = count_alive_vtnodes(); Applied! > src/hotspot/share/opto/vtransform.cpp line 1070: > >> 1068: // outside the loop, and instead cheaper element-wise vector accumulations >> 1069: // are performed inside the loop. >> 1070: bool VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop(const VLoopAnalyzer& vloop_analyzer, VTransform& vtransform) { > > Any particular reason you chose the additional `optimize` prefix? I think the intent is already clear without it. In my head, it is a bit like `Ideal` calling a `Ideal_....` method. I want to make clear that it is part of the `optimize`. Is that ok with you, or rather confusing? > src/hotspot/share/opto/vtransform.cpp line 1074: > >> 1072: uint vlen = vector_length(); >> 1073: BasicType bt = element_basic_type(); >> 1074: int ropc = vector_reduction_opcode(); > > Can probably be made `const` for good measure: > > Suggestion: > > const int sopc = scalar_opcode(); > const uint vlen = vector_length(); > const BasicType bt = element_basic_type(); > const int ropc = vector_reduction_opcode(); > > > And could they also be moved down to the definition of `vopc`? Nice idea, applied! > src/hotspot/share/opto/vtransform.cpp line 1113: > >> 1111: VTransformReductionVectorNode* last_red = phi->in_req(2)->isa_ReductionVector(); >> 1112: VTransformReductionVectorNode* current_red = last_red; >> 1113: while (true) { > > The method is already quite big. IIUC, this only does some checking and we do not need to bookkeep for further down. Therefore, I suggest to extract this to a "is_looping_back_to_phi" method or something like that. It seems you have 2 concerns here: - Variables that could be limited to a smaller scope - method too long If I am going to refactor the code, then I'd probably have to split it into: - preconditions - not just this loop, but all conditions above too - transform I'll struggle a bit to name the methods, the name is already insanely long: `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop` I'll get something like: `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop_preconditions` Or even longer if you want me to split the preconditions. There are some downsides to splitting the code: I'll have to either pass quite a lot of arguments around, or duplicate code that finds specific nodes. Example: `VTransformLoopPhiNode* phi = in_req(1)->isa_LoopPhi();` I'm honestly not convinced that refactoring is better here. At least splitting is hard. 
We could of course make it all a class... but that would be a little over-engineered I think. Keeping it procedural, a simple list of steps seems ok for me. Besides: this is the same code structure as before this patch, I only moved it ;) I gave it a try, and split the preconditions off. But it just leads to more code, so I'm not super satisfied. diff --git a/src/hotspot/share/opto/vtransform.cpp b/src/hotspot/share/opto/vtransform.cpp index 97d16739116..f987c2bee55 100644 --- a/src/hotspot/share/opto/vtransform.cpp +++ b/src/hotspot/share/opto/vtransform.cpp @@ -1072,7 +1072,7 @@ bool VTransformReductionVectorNode::requires_strict_order() const { // become profitable, since the expensive reduction node is moved // outside the loop, and instead cheaper element-wise vector accumulations // are performed inside the loop. -bool VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop(const VLoopAnalyzer& vloop_analyzer, VTransform& vtransform) { +bool VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop_preconditions(VTransform& vtransform) { // We have a phi with a single use. VTransformLoopPhiNode* phi = in_req(1)->isa_LoopPhi(); if (phi == nullptr) { @@ -1167,6 +1167,18 @@ bool VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_ou // We expect another non strict reduction, verify it in the next iteration. current_red = scalar_input->isa_ReductionVector(); } + return true; // success +} + +bool VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop(const VLoopAnalyzer& vloop_analyzer, VTransform& vtransform) { + if (!optimize_move_non_strict_order_reductions_out_of_loop_preconditions(vtransform)) { + return false; + } + + const int sopc = scalar_opcode(); + const uint vlen = vector_length(); + const BasicType bt = element_basic_type(); + const int vopc = VectorNode::opcode(sopc, bt); // All checks were successful. Edit the vtransform graph now. PhaseIdealLoop* phase = vloop_analyzer.vloop().phase(); @@ -1183,12 +1195,15 @@ bool VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_ou vtn_identity_vector->init_req(1, vtn_identity); // Turn the scalar phi into a vector phi. + VTransformLoopPhiNode* phi = in_req(1)->isa_LoopPhi(); VTransformNode* init = phi->in_req(1); phi->set_req(1, vtn_identity_vector); // Traverse down the chain of reductions, and replace them with vector_accumulators. 
VTransformNode* current_vector_accumulator = phi; - current_red = first_red; + VTransformReductionVectorNode* first_red = this; + VTransformReductionVectorNode* last_red = phi->in_req(2)->isa_ReductionVector(); + VTransformReductionVectorNode* current_red = first_red; while (true) { VTransformNode* vector_input = current_red->in_req(2); VTransformVectorNode* vector_accumulator = new (vtransform.arena()) VTransformElementWiseVectorNode(vtransform, 3, current_red->properties(), vopc); diff --git a/src/hotspot/share/opto/vtransform.hpp b/src/hotspot/share/opto/vtransform.hpp index 85f015db442..7ad7b432e9b 100644 --- a/src/hotspot/share/opto/vtransform.hpp +++ b/src/hotspot/share/opto/vtransform.hpp @@ -841,6 +841,7 @@ class VTransformReductionVectorNode : public VTransformVectorNode { private: int vector_reduction_opcode() const; bool requires_strict_order() const; + bool optimize_move_non_strict_order_reductions_out_of_loop_preconditions(VTransform& vtransform); bool optimize_move_non_strict_order_reductions_out_of_loop(const VLoopAnalyzer& vloop_analyzer, VTransform& vtransform); }; We can discuss it offline. > src/hotspot/share/opto/vtransform.cpp line 1175: > >> 1173: // Create a vector of identity values. >> 1174: Node* identity = ReductionNode::make_identity_con_scalar(phase->igvn(), sopc, bt); >> 1175: phase->set_ctrl(identity, phase->C->root()); > > Any particular reason why you are no longer using `set_root_as_ctrl()`? I think my proof-of-concept was still in an old state, before we had added that method. Fixed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425740108 PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425747258 PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425753160 PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425941651 PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425938864 From epeter at openjdk.org Mon Oct 13 11:03:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 11:03:07 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v13] In-Reply-To: <7Hvs60B_m8bmMzOMyrBZ_CbNJrQmHPMFKRAEU7F-Tu4=.94863cda-8af6-4e50-8563-2144947074e4@github.com> References: <7Hvs60B_m8bmMzOMyrBZ_CbNJrQmHPMFKRAEU7F-Tu4=.94863cda-8af6-4e50-8563-2144947074e4@github.com> Message-ID: On Thu, 9 Oct 2025 06:27:52 GMT, Jatin Bhateja wrote: >> This patch optimizes PopCount value transforms using KnownBits information. >> Following are the results of the micro-benchmark included with the patch >> >> >> >> System: 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 215460.670 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 294014.826 ops/s >> >> Withopt: >> Benchmark Mode Cnt Score Error Units >> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 389978.082 ops/s >> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 417261.583 ops/s >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions Tests passed -> approved ? @jatin-bhateja Thanks for the work! ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27075#pullrequestreview-3331047547 From epeter at openjdk.org Mon Oct 13 11:04:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 11:04:10 GMT Subject: RFR: 8359412: Template-Framework Library: Operations and Expressions [v9] In-Reply-To: References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> <1oECW4fhGLBNbrcdMBc-l7-8Yg5Fdy2xyns-pv2EfNI=.75c3b818-1154-416b-ae15-6e2053ee0f60@github.com> Message-ID: On Mon, 13 Oct 2025 10:25:46 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor for Christian > > Looks good, thanks for the update! @chhagedorn I'll need your re-approval after applying your suggestion :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26885#issuecomment-3397009086 From qamai at openjdk.org Mon Oct 13 11:10:05 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 13 Oct 2025 11:10:05 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations In-Reply-To: References: Message-ID: On Fri, 3 Oct 2025 06:07:50 GMT, Quan Anh Mai wrote: > Hi, > > This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. > > Please take a look and leave your reviews, thanks a lot. @eme64 I think it would be great if you take a look at this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27618#issuecomment-3397036741 From chagedorn at openjdk.org Mon Oct 13 11:13:14 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Oct 2025 11:13:14 GMT Subject: RFR: 8359412: Template-Framework Library: Operations and Expressions [v10] In-Reply-To: <2LCX4Ymc-sCbwWkl0vIpp3ue80ddsN0OdLc7bZ9KN14=.ba3b9956-52e2-453d-8495-932e1057aaed@github.com> References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> <2LCX4Ymc-sCbwWkl0vIpp3ue80ddsN0OdLc7bZ9KN14=.ba3b9956-52e2-453d-8495-932e1057aaed@github.com> Message-ID: On Mon, 13 Oct 2025 10:43:28 GMT, Emanuel Peter wrote: >> Impliementing ideas from original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)). >> >> Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`. >> >> Details, in **order you should review**: >> - `Operations.java`: maps lots of primitive operators as Expressions. >> - `Expression.java`: the fundamental engine behind Expressions. >> - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants. >> - `tests/TestExpression.java`: correctness test of Expression machinery. 
>> - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification. >> - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions. >> - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example, that tests that has a weak test for randomness: we should have at least 2 different value in 1000 calls. >> >> If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples. >> >> **Future Work**: >> - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization. >> - Use `Expression`s to model more operations: >> - `Vector API`, more arithmetic operations like from `Math` classes etc. >> - Ensure that the constraints / checksum mechanic in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints. >> - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator call `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just fol... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26885#pullrequestreview-3331109276 From bmaillard at openjdk.org Mon Oct 13 11:35:25 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 13 Oct 2025 11:35:25 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v6] In-Reply-To: References: Message-ID: > This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. > > ### Analysis > > This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced > and added to this PR as a regression test. > > The test contains a switch inside a loop, and stressing the loop peeling results in > a fairly complex graph. The split-if optimization is applied agressively, and we > run a verification pass at every progress made. > > We end up with a relatively high number of verification passes, with each pass being > fairly expensive because of the size of the graph. > Each verification pass requires building a new `IdealLoopTree`. This is quite slow > (which is unfortunately hard to mitigate), and also causes inefficient memory usage > on the `ciEnv` arena. > > The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. 
> At every call, we have
> - One allocation on the `ciEnv` arena to store the returned `ciField`
> - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which:
>   - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`)
>   - Pushes the new symbol to the `_symbols` array
>
> The `ciField` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to
> check if the `BasicType` of a static field is a reference type.
>
> In `ciObjectFactory`, the `_symbols` array ends up containing a large number of duplicates for certain symbols
> (up to several millions), which hints at the fact that `ciObjectFactory::get_symbol` should not be called
> repeatedly as it is done here.
>
> The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below:
>
>
> ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412
> TypeOopPtr::TypeOopPtr type.cpp:3484
> TypeInstPtr::TypeInstPtr type.cpp:3953
> TypeInstPtr::make type.cpp:3990
> TypeInstPtr::add_offset type.cpp:4509
> AddPNode::bottom_type addnode.cpp:696
> MemNode::adr_type memnode.cpp:73
> PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477
> PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439
> PhaseIdealLoop::build_loop_late_post_work loopnode.cpp:6827
> PhaseIdealLoop::build_loop_late_post loopnode.cpp:67...

Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision:

 - Add missing const
 - Introduce ciInstanceKlass::get_non_static_field_by_offset

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27731/files
  - new: https://git.openjdk.org/jdk/pull/27731/files/04582ccb..6c93a873

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=04-05

Stats: 28 lines in 2 files changed: 14 ins; 11 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/27731.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/27731/head:pull/27731

PR: https://git.openjdk.org/jdk/pull/27731

From bmaillard at openjdk.org Mon Oct 13 11:35:27 2025
From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard)
Date: Mon, 13 Oct 2025 11:35:27 GMT
Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 10 Oct 2025 14:59:50 GMT, Christian Hagedorn wrote:

>> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Add -XX:+UnlockDiagnosticVMOptions
>
> src/hotspot/share/ci/ciInstanceKlass.cpp line 443:
>
>> 441:     if (field_off == field_offset)
>> 442:       return field->layout_type();
>> 443:   }
>
> Could this code be shared with `get_field_by_offset()`? We could put it into a method and return the field.
>
> Not sure if it's also worth for the field descriptor below when having a "get field descriptor" method to further share code. You would need to check. Anyway, I'm fine with both :-)

Thanks for the suggestion, I just added a private `get_non_static_field_by_offset` method. I would have done the same for the field descriptor, but as it's not a pointer and there is no predefined default value it is not really practical.
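As a rough illustration of the shape such a shared lookup can take (a sketch only, not the code from this pull request; it assumes the non-static fields have already been computed into a `_nonstatic_fields` array of `ciField*`, and the names and signatures here are illustrative):

    // Sketch: linear scan over the instance fields, returning the ciField
    // whose offset matches, or null if no such field exists. No arena
    // allocation happens here, the ciField objects already exist.
    ciField* ciInstanceKlass::get_non_static_field_by_offset(int field_offset) {
      for (int i = 0; i < _nonstatic_fields->length(); i++) {
        ciField* field = _nonstatic_fields->at(i);
        if (field->offset_in_bytes() == field_offset) {
          return field;
        }
      }
      return nullptr;
    }

A caller that only needs the `BasicType` can then ask the returned field for its `layout_type()` without constructing anything new, which is what the quoted snippet above relies on.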
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2426061043 From jsikstro at openjdk.org Mon Oct 13 11:36:18 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Mon, 13 Oct 2025 11:36:18 GMT Subject: RFR: 8369658: Client emulation mode set MaxRAM too late Message-ID: Hello, While working on the proposal for the potential deprecation of MaxRAM (see [JDK-8369347](https://bugs.openjdk.org/browse/JDK-8369347)) I saw that `CompilerConfig::ergo_initialize()` sets the value for `MaxRAM` after ergonomic heap sizing is already done, which is the only place in the VM that cares about `MaxRAM`. I suggest we move setting the value of `MaxRAM` to `Arguments::set_heap_size()` to fix this. Even though the `MaxRAM` flag might be deprecated, the code should still account for the fact that client emulation mode might lower the maximum amount of physical memory that can be used for the Java heap. If the flag is removed, we'd still want to lower the maximum memory, so it makes sense to have the code in `Arguments::set_heap_size()` in both cases. Testing: * Currently running Oracle's tier1-2 * Local test with `java -XX:+NeverActAsServerClassMachine -Xlog:gc+init` to see that the lower limit is reflected in ergonomic heap sizing. ------------- Commit messages: - 8369658: Client emulation mode set MaxRAM too late Changes: https://git.openjdk.org/jdk/pull/27765/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27765&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369658 Stats: 29 lines in 3 files changed: 20 ins; 5 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/27765.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27765/head:pull/27765 PR: https://git.openjdk.org/jdk/pull/27765 From bmaillard at openjdk.org Mon Oct 13 11:41:04 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 13 Oct 2025 11:41:04 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v2] In-Reply-To: References: Message-ID: On Sat, 11 Oct 2025 06:07:05 GMT, SendaoYan wrote: >> Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Add -XX:+UnlockDiagnosticVMOptions > > test/hotspot/jtreg/compiler/loopopts/TestVerifyLoopOptimizationsHitsMemLimit.java line 123: > >> 121: public static void main(String[] t) { >> 122: try { >> 123: test(t); > > Suggestion: > > test(t); > throw new RuntimeException("The expected NPE do not seen"); Thanks for the suggestion. I would argue that this does not really add value, as this essentially boils down to checking that accessing an uninitialized reference throws a `NullPointerException`, which is not really what this test is about. I would rather keep it specific. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2426078675 From jbhateja at openjdk.org Mon Oct 13 11:44:14 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 13 Oct 2025 11:44:14 GMT Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits [v12] In-Reply-To: <0n629aakXFsODeKAbtRtvDTbaCHEt18Mc9LFlAb-G2o=.8be1175e-cebc-4395-b44e-973df41507cf@github.com> References: <9EL8Kg6tW9JVZHrelcP7bLCRHpoEd1l6YH_0eLe8U5Y=.a9db2c93-7a94-46a5-b90e-c104eddf6bc3@github.com> <0n629aakXFsODeKAbtRtvDTbaCHEt18Mc9LFlAb-G2o=.8be1175e-cebc-4395-b44e-973df41507cf@github.com> Message-ID: <0KHz-jONNjldYHjOMljSkzsug_Xpd7XPHgQKKPuYjVY=.2bb31085-e404-47c4-abfc-123260ccdbbe@github.com> On Mon, 6 Oct 2025 08:12:32 GMT, Hannes Greule wrote: >> Is the `core-libs` label appropriate for this PR? Looks hotspot specific? > >> Is the `core-libs` label appropriate for this PR? Looks hotspot specific? > > That label was added automatically, closely after https://mail.openjdk.org/pipermail/jdk-dev/2025-September/010486.html. Not sure why, but the change is definitely hotspot specific. Thanks @SirYwell , @eme64 , at merykitty ------------- PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3397164587 From dbriemann at openjdk.org Mon Oct 13 11:56:46 2025 From: dbriemann at openjdk.org (David Briemann) Date: Mon, 13 Oct 2025 11:56:46 GMT Subject: RFR: 8369444: JavaFrameAnchor on PPC64 has unnecessary barriers Message-ID: No hardware barriers are necessary. All members are volatile and the profiler is run from a signal handler and only observers the thread its running on. ------------- Commit messages: - 8369444: JavaFrameAnchor on PPC64 has unnecessary barriers Changes: https://git.openjdk.org/jdk/pull/27768/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27768&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369444 Stats: 9 lines in 1 file changed: 3 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27768.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27768/head:pull/27768 PR: https://git.openjdk.org/jdk/pull/27768 From epeter at openjdk.org Mon Oct 13 12:22:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 12:22:45 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v5] In-Reply-To: References: Message-ID: > I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: > https://github.com/openjdk/jdk/pull/20964 > [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093) > > This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > This should be the last one before Cost Modeling, which will enable us to vectorize more reductions ? > > -------------------------- > > **Goal:** we need to do the `move_reduction_out_of_loop` already during auto vectorization, and not after. This will allow us to cost-model the loop after the expensive reduction nodes are removed from the loop in a following RFE. > > **Details** > Reduction nodes are expensive, and require many instructions in the backend. In some cases, this means that the vectorized reduction is more expensive than the scalar reduction. We would have to find other operations to vectorize, so that the instruction count goes down sufficiently. There are cases where the reduction is not profitable before `move_reduction_out_of_loop`, but profitable after. 
> > Since we now modify `VTransformNode`s during `VTransform::optimize` (think of it as the IGVN for `VTransform`), some nodes can become dead, and so we need to take care of that with `is_alive`. And we must only schedule alive nodes, others may not have a coherent state. > > **Future Work** > - Cost Modeling [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093) > - Other optimizations that lower the cost of the vectorized loop, and enable vectorization to be profitable. Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - prettify code - for Christian part 3 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27704/files - new: https://git.openjdk.org/jdk/pull/27704/files/925255e6..80a2ce85 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27704&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27704&range=03-04 Stats: 21 lines in 2 files changed: 18 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27704.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27704/head:pull/27704 PR: https://git.openjdk.org/jdk/pull/27704 From epeter at openjdk.org Mon Oct 13 12:22:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 12:22:57 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v2] In-Reply-To: <_b8BNEA0grpCu9bkLd-tZsvaHto-FmR9OiBqGfZ8bmA=.46a6edbd-ef95-4b1f-be4c-32483f007ab2@github.com> References: <_b8BNEA0grpCu9bkLd-tZsvaHto-FmR9OiBqGfZ8bmA=.46a6edbd-ef95-4b1f-be4c-32483f007ab2@github.com> Message-ID: On Mon, 13 Oct 2025 09:10:04 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> For Vladimir K7 > > Nice refactoring! Some small comments, otherwise, it looks good to me, too! @chhagedorn Thank you for reviewing, and the offline discussion! I addressed all your comments now :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27704#issuecomment-3397290320 From epeter at openjdk.org Mon Oct 13 12:23:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 12:23:05 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v2] In-Reply-To: References: <_b8BNEA0grpCu9bkLd-tZsvaHto-FmR9OiBqGfZ8bmA=.46a6edbd-ef95-4b1f-be4c-32483f007ab2@github.com> Message-ID: On Mon, 13 Oct 2025 10:55:38 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vtransform.cpp line 1113: >> >>> 1111: VTransformReductionVectorNode* last_red = phi->in_req(2)->isa_ReductionVector(); >>> 1112: VTransformReductionVectorNode* current_red = last_red; >>> 1113: while (true) { >> >> The method is already quite big. IIUC, this only does some checking and we do not need to bookkeep for further down. Therefore, I suggest to extract this to a "is_looping_back_to_phi" method or something like that. 
> > It seems you have 2 concerns here: > - Variables that could be limited to a smaller scope > - method too long > > If I am going to refactor the code, then I'd probably have to split it into: > - preconditions > - not just this loop, but all conditions above too > - transform > > I'll struggle a bit to name the methods, the name is already insanely long: > `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop` > I'll get something like: > `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop_preconditions` > Or even longer if you want me to split the preconditions. > > There are some downsides to splitting the code: I'll have to either pass quite a lot of arguments around, or duplicate code that finds specific nodes. Example: > `VTransformLoopPhiNode* phi = in_req(1)->isa_LoopPhi();` > > I'm honestly not convinced that refactoring is better here. At least splitting is hard. We could of course make it all a class... but that would be a little over-engineered I think. Keeping it procedural, a simple list of steps seems ok for me. > > Besides: this is the same code structure as before this patch, I only moved it ;) > > I gave it a try, and split the preconditions off. But it just leads to more code, so I'm not super satisfied. > > > diff --git a/src/hotspot/share/opto/vtransform.cpp b/src/hotspot/share/opto/vtransform.cpp > index 97d16739116..f987c2bee55 100644 > --- a/src/hotspot/share/opto/vtransform.cpp > +++ b/src/hotspot/share/opto/vtransform.cpp > @@ -1072,7 +1072,7 @@ bool VTransformReductionVectorNode::requires_strict_order() const { > // become profitable, since the expensive reduction node is moved > // outside the loop, and instead cheaper element-wise vector accumulations > // are performed inside the loop. > -bool VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop(const VLoopAnalyzer& vloop_analyzer, VTransform& vtransform) { > +bool VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop_preconditions(VTransform& vtransform) { > // We have a phi with a single use. > VTransformLoopPhiNode* phi = in_req(1)->isa_LoopPhi(); > if (phi == nullptr) { > @@ -1167,6 +1167,18 @@ bool VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_ou > // We expect another non strict reduction, verify it in the next iteration. > current_red = scalar_input->isa_ReductionVector(); > } > + return true; // success > +} > + > +bool VTransformReductionVectorNode::o... I decided to apply the split diff from above. I hope that is a bit better for you. It is not yet a full-blown class-approach. But at least it separates the preconditions from the optimization. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2426168578 From chagedorn at openjdk.org Mon Oct 13 13:14:17 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Oct 2025 13:14:17 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v5] In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 12:22:45 GMT, Emanuel Peter wrote: >> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: >> https://github.com/openjdk/jdk/pull/20964 >> [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093) >> >> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. 
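One way to address the "pass quite a lot of arguments around" concern without going to a full class would be to let the precondition step hand back a small bundle of what it found; this is purely a sketch of that idea with hypothetical names, not code from the PR:

    // Hypothetical plan object: filled in by the precondition scan and
    // consumed by the transform step, so the nodes are looked up only once.
    struct MoveReductionsOutOfLoopPlan {
      VTransformLoopPhiNode*         phi;        // the single-use loop phi
      VTransformReductionVectorNode* first_red;  // the reduction this node represents
      VTransformReductionVectorNode* last_red;   // the reduction feeding the phi back-edge
    };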
>> >> This should be the last one before Cost Modeling, which will enable us to vectorize more reductions ? >> >> -------------------------- >> >> **Goal:** we need to do the `move_reduction_out_of_loop` already during auto vectorization, and not after. This will allow us to cost-model the loop after the expensive reduction nodes are removed from the loop in a following RFE. >> >> **Details** >> Reduction nodes are expensive, and require many instructions in the backend. In some cases, this means that the vectorized reduction is more expensive than the scalar reduction. We would have to find other operations to vectorize, so that the instruction count goes down sufficiently. There are cases where the reduction is not profitable before `move_reduction_out_of_loop`, but profitable after. >> >> Since we now modify `VTransformNode`s during `VTransform::optimize` (think of it as the IGVN for `VTransform`), some nodes can become dead, and so we need to take care of that with `is_alive`. And we must only schedule alive nodes, others may not have a coherent state. >> >> **Future Work** >> - Cost Modeling [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093) >> - Other optimizations that lower the cost of the vectorized loop, and enable vectorization to be profitable. > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - prettify code > - for Christian part 3 Update looks good, thanks for carefully addressing my suggestions and the offline discussion! And good work to finally getting to the cost model PR next :-) ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27704#pullrequestreview-3331533781 From chagedorn at openjdk.org Mon Oct 13 13:14:19 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Oct 2025 13:14:19 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v2] In-Reply-To: References: <_b8BNEA0grpCu9bkLd-tZsvaHto-FmR9OiBqGfZ8bmA=.46a6edbd-ef95-4b1f-be4c-32483f007ab2@github.com> Message-ID: On Mon, 13 Oct 2025 09:46:04 GMT, Emanuel Peter wrote: > Nice idea, I'll limit it to 10 or so. That should work at least for now. Sounds good, thanks! > Not sure what you mean by bail out here. I think I'll just limit it to a debug assert. I really meant like a safety bailout in case of looping endlessly. But I'm not sure how easy it is in that context or if we needed to bail out completely from Superword or even the compilation. Anyway, I raised it as an equivalence to IGVN where we also bail out, even from the entire compilation, if we loop for too long. We can also go with a simple assert for now and revisit it again if necessary ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2426306809 From chagedorn at openjdk.org Mon Oct 13 13:14:21 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Oct 2025 13:14:21 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v2] In-Reply-To: References: <_b8BNEA0grpCu9bkLd-tZsvaHto-FmR9OiBqGfZ8bmA=.46a6edbd-ef95-4b1f-be4c-32483f007ab2@github.com> Message-ID: On Mon, 13 Oct 2025 09:49:17 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vtransform.cpp line 1070: >> >>> 1068: // outside the loop, and instead cheaper element-wise vector accumulations >>> 1069: // are performed inside the loop. 
>>> 1070: bool VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop(const VLoopAnalyzer& vloop_analyzer, VTransform& vtransform) { >> >> Any particular reason you chose the additional `optimize` prefix? I think the intent is already clear without it. > > In my head, it is a bit like `Ideal` calling a `Ideal_....` method. I want to make clear that it is part of the `optimize`. Is that ok with you, or rather confusing? It was a bit confusing because the latter part implies that it is an optimization. But I see that you want to have it somehow marked as "hey, I'm part of `VTransformGraph::optimize()`". But on the other hand, in IGVN, we only have `Ideal()` and then the methods we call from there often do not carry on the `ideal_` prefix. I leave it up to you to make the final call :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2426296217 From chagedorn at openjdk.org Mon Oct 13 13:14:23 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Oct 2025 13:14:23 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v2] In-Reply-To: References: <_b8BNEA0grpCu9bkLd-tZsvaHto-FmR9OiBqGfZ8bmA=.46a6edbd-ef95-4b1f-be4c-32483f007ab2@github.com> Message-ID: On Mon, 13 Oct 2025 12:14:27 GMT, Emanuel Peter wrote: >> It seems you have 2 concerns here: >> - Variables that could be limited to a smaller scope >> - method too long >> >> If I am going to refactor the code, then I'd probably have to split it into: >> - preconditions >> - not just this loop, but all conditions above too >> - transform >> >> I'll struggle a bit to name the methods, the name is already insanely long: >> `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop` >> I'll get something like: >> `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop_preconditions` >> Or even longer if you want me to split the preconditions. >> >> There are some downsides to splitting the code: I'll have to either pass quite a lot of arguments around, or duplicate code that finds specific nodes. Example: >> `VTransformLoopPhiNode* phi = in_req(1)->isa_LoopPhi();` >> >> I'm honestly not convinced that refactoring is better here. At least splitting is hard. We could of course make it all a class... but that would be a little over-engineered I think. Keeping it procedural, a simple list of steps seems ok for me. >> >> Besides: this is the same code structure as before this patch, I only moved it ;) >> >> I gave it a try, and split the preconditions off. But it just leads to more code, so I'm not super satisfied. >> >> >> diff --git a/src/hotspot/share/opto/vtransform.cpp b/src/hotspot/share/opto/vtransform.cpp >> index 97d16739116..f987c2bee55 100644 >> --- a/src/hotspot/share/opto/vtransform.cpp >> +++ b/src/hotspot/share/opto/vtransform.cpp >> @@ -1072,7 +1072,7 @@ bool VTransformReductionVectorNode::requires_strict_order() const { >> // become profitable, since the expensive reduction node is moved >> // outside the loop, and instead cheaper element-wise vector accumulations >> // are performed inside the loop. 
>> -bool VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop(const VLoopAnalyzer& vloop_analyzer, VTransform& vtransform) { >> +bool VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop_preconditions(VTransform& vtransform) { >> // We have a phi with a single use. >> VTransformLoopPhiNode* phi = in_req(1)->isa_LoopPhi(); >> if (phi == nullptr) { >> @@ -1167,6 +1167,18 @@ bool VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_ou >> // We expect another non strict reduction, verify it in the next iteration. >> current_red = scalar_input->isa_ReductionVec... > > I decided to apply the split diff from above. I hope that is a bit better for you. It is not yet a full-blown class-approach. But at least it separates the preconditions from the optimization. Thanks for the summary and the interesting offline discussion! I think that's a good trade-off here as otherwise would need to refactor quite some more and probably fall back to builder classes to avoid having mutable fields. That's exciting but out of scope since you did not actually newly implement this but rather moved/updated it. So, I think it's a good choice you made here with splitting only, thanks! :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2426276872 From epeter at openjdk.org Mon Oct 13 13:18:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 13:18:45 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations In-Reply-To: References: Message-ID: <9csq9foYsLDNJH6sLU1qLEAqihigGGHaViMPnC5BuWc=.7de0ccca-ecf2-4e8d-b8ee-6716c0a9bd7e@github.com> On Fri, 3 Oct 2025 06:07:50 GMT, Quan Anh Mai wrote: > Hi, > > This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. > > Please take a look and leave your reviews, thanks a lot. @merykitty Thank you very much for working on this, very exciting. And it seems that the actual logic is now simpler than all the custom logic before! However, we need to make sure that all cases that you are not deleting are indeed covered. 1. `OrINode::add_ring` if ( r0 == TypeInt::BOOL ) { if ( r1 == TypeInt::ONE) { return TypeInt::ONE; } else if ( r1 == TypeInt::BOOL ) { return TypeInt::BOOL; } } else if ( r0 == TypeInt::ONE ) { if ( r1 == TypeInt::BOOL ) { return TypeInt::ONE; } } That seems to be covered by KnownBits. 2. `OrINode::add_ring` if (r0 == TypeInt::MINUS_1 || r1 == TypeInt::MINUS_1) { return TypeInt::MINUS_1; } Seems also ok, handled by the KnownBits. 3. `OrINode::add_ring` // If either input is not a constant, just return all integers. if( !r0->is_con() || !r1->is_con() ) return TypeInt::INT; // Any integer, but still no symbols. // Otherwise just OR them bits. return TypeInt::make( r0->get_con() | r1->get_con() ); Constants would also be handeld by KnownBits. 4. `xor_upper_bound_for_ranges` I think also this should be handled by doing KnownBits first, and then inferring the signed/unsigned bounds, right? 5. `and_value` Does not look so trivial. Maybe you can go over it step by step, and leave some GitHub code comments? 
src/hotspot/share/opto/rangeinference.hpp line 96: > 94: KnownBits _bits; > 95: > 96: private: Did you mean to drop the `private:` here? It also makes other things below public now... src/hotspot/share/opto/rangeinference.hpp line 152: > 150: > 151: template > 152: static bool int_type_is_equal(const CTP t1, const CTP t2) { Out of curiosity: why the change `CT*` -> `CTP`? src/hotspot/share/opto/rangeinference.hpp line 192: > 190: // inference from the Type infrastructure of the compiler. It also allows more flexibility with the > 191: // bit width of the integer type. As a result, it is more efficient to use for intermediate steps > 192: // of inference, as well as more flexible to perform testing on different integer types. Would have been nice if we could have used the `TypeIntMirror` inside `TypeInt`, i.e. using composition. But sadly, we are already using fields from `TypeInt` directly everywhere, so not sure if that is very nice/easy. src/hotspot/share/opto/rangeinference.hpp line 199: > 197: S _hi; > 198: U _ulo; > 199: U _uhi; Why not use `RangeInt`? src/hotspot/share/opto/rangeinference.hpp line 218: > 216: bool contains(U u) const; > 217: bool contains(const TypeIntMirror& o) const; > 218: bool operator==(const TypeIntMirror& o) const; Could we limit this to `DEBUG_ONLY`? src/hotspot/share/opto/rangeinference.hpp line 221: > 219: > 220: template > 221: TypeIntMirror cast() const; Can you explain what this casting method is for? src/hotspot/share/opto/rangeinference.hpp line 230: > 228: // TypeLong*, or they can be TypeIntMirror which behave similar to TypeInt* and TypeLong* during > 229: // testing. This allows us to verify the correctness of the implementation without coupling with > 230: // the hotspot compiler allocation infrastructure. This sounds a bit like a hack, but maybe a currently necessary one. But it sounds like we are passing something different in the production code vs in gtest testing code, and that's not ideal. I suppose an alternative would be to always do the transition from `TypeInt` -> `TypeIntMirror`, before passing it into `RangeInference`. Would that be too much overhead, or have other downsides? I suppose an issue with that is how do you get back a `TypeInt` at the end... yeah not ideal. So maybe your hack is required. It would have been nice if we could just compose `TypeIntMirror` inside `TypeInt`, but maybe even that does not solve the whole problem. What do you think? src/hotspot/share/opto/rangeinference.hpp line 301: > 299: _current_interval++; > 300: return res; > 301: } Do we really need both? src/hotspot/share/opto/rangeinference.hpp line 324: > 322: static CTP infer_binary(CTP t1, CTP t2, Inference infer) { > 323: CTP res; > 324: bool init = false; `init` confused me at first. I intuitively read it as `please_initialize_me`, or the imperative `initialize`! But of course you meant `is_initialized`, right? I would use a longer name to be explicit ;) src/hotspot/share/opto/rangeinference.hpp line 353: > 351: S hi = std::numeric_limits>::max(); > 352: U ulo = std::numeric_limits>::min(); > 353: U uhi = MIN2(st1._uhi, st2._uhi); All lines except this one are trivial. Can you please add a comment about it? In `and_values` we had comments like below, you'd have to adjust them a little: // If both ranges are positive, the result will range from 0 up to the hi value of the smaller range. The minimum // of the two constrains the upper bound because any higher value in the other range will see all zeroes, so it will be masked out. 
if (r0->_lo >= 0 && r1->_lo >= 0) { return IntegerType::make(0, MIN2(r0->_hi, r1->_hi), widen); } src/hotspot/share/opto/rangeinference.hpp line 365: > 363: S lo = std::numeric_limits>::min(); > 364: S hi = std::numeric_limits>::max(); > 365: U ulo = MAX2(st1._ulo, st2._ulo); Add a comment about correctness here too. ------------- PR Review: https://git.openjdk.org/jdk/pull/27618#pullrequestreview-3331413594 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426192063 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426193545 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426198623 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426205982 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426213040 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426221015 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426234153 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426255498 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426265259 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426289482 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426292138 From epeter at openjdk.org Mon Oct 13 13:18:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 13:18:46 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations In-Reply-To: <9csq9foYsLDNJH6sLU1qLEAqihigGGHaViMPnC5BuWc=.7de0ccca-ecf2-4e8d-b8ee-6716c0a9bd7e@github.com> References: <9csq9foYsLDNJH6sLU1qLEAqihigGGHaViMPnC5BuWc=.7de0ccca-ecf2-4e8d-b8ee-6716c0a9bd7e@github.com> Message-ID: On Mon, 13 Oct 2025 12:26:01 GMT, Emanuel Peter wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > src/hotspot/share/opto/rangeinference.hpp line 192: > >> 190: // inference from the Type infrastructure of the compiler. It also allows more flexibility with the >> 191: // bit width of the integer type. As a result, it is more efficient to use for intermediate steps >> 192: // of inference, as well as more flexible to perform testing on different integer types. > > Would have been nice if we could have used the `TypeIntMirror` inside `TypeInt`, i.e. using composition. But sadly, we are already using fields from `TypeInt` directly everywhere, so not sure if that is very nice/easy. Hmm, we also have the third class `TypeIntPrototype`. Do you think we really need all 3 classes? 
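For the two comments requested above for rangeinference.hpp lines 353 and 365, the facts behind the `MIN2`/`MAX2` choices are the usual unsigned bounds for bitwise operations; a small self-contained sketch of the reasoning (the wording is illustrative, not taken from the PR):

    #include <algorithm>
    #include <cassert>
    #include <cstdint>

    // AND can only clear bits and OR can only set bits, so for unsigned x, y:
    //   (x & y) <= min(x, y)   which justifies   uhi = MIN2(st1._uhi, st2._uhi)
    //   (x | y) >= max(x, y)   which justifies   ulo = MAX2(st1._ulo, st2._ulo)
    void check_unsigned_bounds(uint32_t x, uint32_t y) {
      assert((x & y) <= std::min(x, y));
      assert((x | y) >= std::max(x, y));
    }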
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426204520 From epeter at openjdk.org Mon Oct 13 13:18:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 13:18:47 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations In-Reply-To: References: <9csq9foYsLDNJH6sLU1qLEAqihigGGHaViMPnC5BuWc=.7de0ccca-ecf2-4e8d-b8ee-6716c0a9bd7e@github.com> Message-ID: On Mon, 13 Oct 2025 12:28:23 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/rangeinference.hpp line 192: >> >>> 190: // inference from the Type infrastructure of the compiler. It also allows more flexibility with the >>> 191: // bit width of the integer type. As a result, it is more efficient to use for intermediate steps >>> 192: // of inference, as well as more flexible to perform testing on different integer types. >> >> Would have been nice if we could have used the `TypeIntMirror` inside `TypeInt`, i.e. using composition. But sadly, we are already using fields from `TypeInt` directly everywhere, so not sure if that is very nice/easy. > > Hmm, we also have the third class `TypeIntPrototype`. Do you think we really need all 3 classes? Ah, I suppose `TypeIntMirror` is always canonicalized from `TypeIntPrototype`, in the constructor? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426208882 From epeter at openjdk.org Mon Oct 13 13:54:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 13:54:24 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations In-Reply-To: References: Message-ID: On Fri, 3 Oct 2025 06:07:50 GMT, Quan Anh Mai wrote: > Hi, > > This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. > > Please take a look and leave your reviews, thanks a lot. test/hotspot/gtest/opto/test_rangeinference.cpp line 33: > 31: #include > 32: #include > 33: #include I don't know the current state of code style guide: but are we allowed to use `std::unordered_set`? test/hotspot/gtest/opto/test_rangeinference.cpp line 250: > 248: static_assert(std::is_same_v); > 249: return *this; > 250: } We now re-implement these from `TypeIntHelper::int_type_xmeet`. I wonder if we could not at least share some code. Not sure if that is worth it. But having this kind of code duplication opens the risk of divergence and hence bugs. test/hotspot/gtest/opto/test_rangeinference.cpp line 277: > 275: return 1732; > 276: } > 277: } What do the numbers mean here? I'm lost :/ test/hotspot/gtest/opto/test_rangeinference.cpp line 313: > 311: res[idx] = t; > 312: idx++; > 313: } Not sure if this is possible with `std::array`, but you could do it with `std::vector`: `std::vector tmp(unordered.begin(), unordered.end());` Just an idea, feel free to leave it as is. test/hotspot/gtest/opto/test_rangeinference.cpp line 327: > 325: // on all elements of input1 and input2. > 326: template > 327: static void test_binary_instance_correctness_exhaustive(Operation op, Inference infer, const InputType& input1, const InputType& input2) { Very nice! 
test/hotspot/gtest/opto/test_rangeinference.cpp line 370: > 368: } > 369: } > 370: }; Using uniform distribution will make it very unlikely that you get a hit in a narrow long range, right? Maybe we just have to live with that. Doing something smarter could probably be done (generate in the signed / unsigned bounds, and masking the bits), but there is also a risk: we may generate values that are too narrow by accident / bug... What do you think? At least adding some comment here about why we do what we do would be good. test/hotspot/gtest/opto/test_rangeinference.cpp line 449: > 447: if (all_instances().size() < 100) { > 448: // This effectively covers the cases up to uintn_t<2> > 449: test_binary_instance_monotonicity_exhaustive(infer, input1, input2); Wow, that's really not much. It's really only a "sign" bit and one "mantissa" bit. Would have been nice if we could have handled at least 3 bits. Is that prohibitively slow? test/hotspot/gtest/opto/test_rangeinference.cpp line 524: > 522: samples[idx] = TypeIntMirror{canonicalized_t._data._srange._lo, canonicalized_t._data._srange._hi, > 523: canonicalized_t._data._urange._lo, canonicalized_t._data._urange._hi, > 524: canonicalized_t._data._bits}; What about using a constructor that creates `TypeIntMirror` directly from a `TypeIntPrototype`? Maybe there is a reason that does not work? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426329802 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426343721 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426350244 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426364709 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426373540 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426388502 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426401408 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426418049 From epeter at openjdk.org Mon Oct 13 13:54:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 13:54:25 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 13:25:23 GMT, Emanuel Peter wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > test/hotspot/gtest/opto/test_rangeinference.cpp line 277: > >> 275: return 1732; >> 276: } >> 277: } > > What do the numbers mean here? I'm lost :/ Ah, this is the number of instances for a type! Makes sense. How did you get those numbers, how do we know they are right? 
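For the line 370 point about a plain uniform distribution rarely hitting a narrow range, the "smarter" generator suggested there could look roughly like this; a sketch only, with illustrative names, assuming the constraints describe a non-empty type:

    #include <cstdint>
    #include <random>

    // Draw from the unsigned range, force the known bits, then reject anything
    // that violates a constraint. Every value of a non-empty type stays reachable,
    // so this terminates with probability 1, though it can be slow for tiny types.
    uint32_t random_in_type(uint32_t ulo, uint32_t uhi, int32_t lo, int32_t hi,
                            uint32_t zeros, uint32_t ones, std::mt19937& rng) {
      std::uniform_int_distribution<uint32_t> dist(ulo, uhi);
      while (true) {
        uint32_t v = (dist(rng) | ones) & ~zeros;  // force known one/zero bits
        int32_t  s = static_cast<int32_t>(v);
        if (v >= ulo && v <= uhi && s >= lo && s <= hi) {
          return v;
        }
      }
    }

Whether that extra machinery is worth the "too narrow by accident" risk mentioned in the comment is exactly the trade-off raised there.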
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2426355829 From epeter at openjdk.org Mon Oct 13 14:00:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 14:00:44 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v6] In-Reply-To: References: Message-ID: <4-rUSDZo_do_2gXtchDk6MUA9TrQ_A4ER_2MKFZHq8M=.696d7c44-e22d-4146-b6ae-df993aff4d41@github.com> > I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: > https://github.com/openjdk/jdk/pull/20964 > [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093) > > This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > This should be the last one before Cost Modeling, which will enable us to vectorize more reductions ? > > -------------------------- > > **Goal:** we need to do the `move_reduction_out_of_loop` already during auto vectorization, and not after. This will allow us to cost-model the loop after the expensive reduction nodes are removed from the loop in a following RFE. > > **Details** > Reduction nodes are expensive, and require many instructions in the backend. In some cases, this means that the vectorized reduction is more expensive than the scalar reduction. We would have to find other operations to vectorize, so that the instruction count goes down sufficiently. There are cases where the reduction is not profitable before `move_reduction_out_of_loop`, but profitable after. > > Since we now modify `VTransformNode`s during `VTransform::optimize` (think of it as the IGVN for `VTransform`), some nodes can become dead, and so we need to take care of that with `is_alive`. And we must only schedule alive nodes, others may not have a coherent state. > > **Future Work** > - Cost Modeling [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093) > - Other optimizations that lower the cost of the vectorized loop, and enable vectorization to be profitable. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: - Merge branch 'master' into JDK-8369448-VTransform-reduction-out-of-loop - prettify code - for Christian part 3 - for Christian part 2 - for Christian part 1 - For Vladimir K7 - documentation - better tracing - rm scalar_opcode - a few todos - ... 
and 5 more: https://git.openjdk.org/jdk/compare/b928c84e...5c0e11a5 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27704/files - new: https://git.openjdk.org/jdk/pull/27704/files/80a2ce85..5c0e11a5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27704&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27704&range=04-05 Stats: 6954 lines in 159 files changed: 5360 ins; 1057 del; 537 mod Patch: https://git.openjdk.org/jdk/pull/27704.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27704/head:pull/27704 PR: https://git.openjdk.org/jdk/pull/27704 From epeter at openjdk.org Mon Oct 13 14:00:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 14:00:46 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v6] In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 13:10:56 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8369448-VTransform-reduction-out-of-loop >> - prettify code >> - for Christian part 3 >> - for Christian part 2 >> - for Christian part 1 >> - For Vladimir K7 >> - documentation >> - better tracing >> - rm scalar_opcode >> - a few todos >> - ... and 5 more: https://git.openjdk.org/jdk/compare/b928c84e...5c0e11a5 > > Update looks good, thanks for carefully addressing my suggestions and the offline discussion! And good work to finally getting to the cost model PR next :-) @chhagedorn Thanks for the review and comments! I merged with master, and will do a last round of testing before integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27704#issuecomment-3397651409 From epeter at openjdk.org Mon Oct 13 14:03:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 14:03:26 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types In-Reply-To: References: Message-ID: <2VDf9Od04bxB3b5rBeV9O00SX1dNUjTZ4XOK364n1do=.34541815-662d-4e11-8b6d-705eb3633d63@github.com> On Tue, 7 Oct 2025 07:36:41 GMT, Roland Westrelin wrote: > Currently ReassociateInvariants is only enabled for int counted > loops. I noticed, enabling it for long counted loops helps RCE. It > also seems like something that would help any loop. I propose enabling > it for all inner loops. BTW: testing passed! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27666#issuecomment-3397664240 From epeter at openjdk.org Mon Oct 13 14:03:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 14:03:46 GMT Subject: RFR: 8359412: Template-Framework Library: Operations and Expressions [v10] In-Reply-To: References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> <2LCX4Ymc-sCbwWkl0vIpp3ue80ddsN0OdLc7bZ9KN14=.ba3b9956-52e2-453d-8495-932e1057aaed@github.com> Message-ID: On Mon, 13 Oct 2025 11:10:15 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java >> >> Co-authored-by: Christian Hagedorn > > Marked as reviewed by chagedorn (Reviewer). @chhagedorn @galderz @mhaessig Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26885#issuecomment-3397654829 From epeter at openjdk.org Mon Oct 13 14:03:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 14:03:49 GMT Subject: Integrated: 8359412: Template-Framework Library: Operations and Expressions In-Reply-To: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com> Message-ID: On Thu, 21 Aug 2025 15:03:57 GMT, Emanuel Peter wrote: > Impliementing ideas from original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)). > > Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`. > > Details, in **order you should review**: > - `Operations.java`: maps lots of primitive operators as Expressions. > - `Expression.java`: the fundamental engine behind Expressions. > - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants. > - `tests/TestExpression.java`: correctness test of Expression machinery. > - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification. > - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions. > - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example, that tests that has a weak test for randomness: we should have at least 2 different value in 1000 calls. > > If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples. > > **Future Work**: > - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization. > - Use `Expression`s to model more operations: > - `Vector API`, more arithmetic operations like from `Math` classes etc. > - Ensure that the constraints / checksum mechanic in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints. > - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator call `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just folds away, but under `StressIGVN` and `Stres... This pull request has now been integrated. 
Changeset: 04968061 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/0496806102bb621bdd82613d5796651d9655ea1c Stats: 1611 lines in 7 files changed: 1611 ins; 0 del; 0 mod 8359412: Template-Framework Library: Operations and Expressions Reviewed-by: chagedorn, mhaessig, galder ------------- PR: https://git.openjdk.org/jdk/pull/26885 From syan at openjdk.org Mon Oct 13 14:06:52 2025 From: syan at openjdk.org (SendaoYan) Date: Mon, 13 Oct 2025 14:06:52 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v2] In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 11:38:07 GMT, Beno?t Maillard wrote: >> test/hotspot/jtreg/compiler/loopopts/TestVerifyLoopOptimizationsHitsMemLimit.java line 123: >> >>> 121: public static void main(String[] t) { >>> 122: try { >>> 123: test(t); >> >> Suggestion: >> >> test(t); >> throw new RuntimeException("The expected NPE do not seen"); > > Thanks for the suggestion. I would argue that this does not really add value, as this essentially boils down to checking that accessing an uninitialized reference throws a `NullPointerException`, which is not really what this test is about. I would rather keep it specific. Okey ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2426456434 From roland at openjdk.org Mon Oct 13 14:20:04 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Oct 2025 14:20:04 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal [v3] In-Reply-To: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> Message-ID: > This change refactor code that's similar for LShiftINode and > LShiftLNode into shared methods. I also added extra test cases to > cover all transformations. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27725/files - new: https://git.openjdk.org/jdk/pull/27725/files/05ff54dc..e521d918 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27725&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27725&range=01-02 Stats: 16 lines in 2 files changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/27725.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27725/head:pull/27725 PR: https://git.openjdk.org/jdk/pull/27725 From epeter at openjdk.org Mon Oct 13 14:21:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 14:21:51 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 11:03:13 GMT, Christian Hagedorn wrote: > The test ` testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java` intermittently timed out in our CI. On my local machine, I measured ~80s which is quite long given that we actually only want to test that Cartesian product for scenarios work. > > #### Reduce Execution Time by not Executing the Scenarios > I had a closer look at the test to try to cut the execution time down. Currently, we are executing many IR framework runs with different Cartesian products for scenarios. Afterwards, we compare the output of IR matching to check if the computation of the Cartesian products were correct. 
However, since we are only interested in verifying that the Cartesian product computation for scenarios works (i.e. `addCrossProductScenarios()`), we could skip the actual execution of the scenarios themselves - we can trust that the IR framework is already tested well enough. > > To achieve that, we can use reflection to get the scenarios added to the IR framework (I don't want to add a public accessor because a user should not need access to them) and then fetch the corresponding scenario flags and compare against our expectation. That's what I propose with this change. > > #### Changes > - Verification without actually running scenarios. > - Added a test passing 3 sets to `addCrossProductScenarios()` which was missing before. > - Improved `addScenarios()` where we added a scenario to the list even though it already existed. That normally does not matter because we are throwing a `TestFormatException` anyway afterwards. But it messes with the test: We are adding the duplicated scenario and then reading it again in the verification part of the test. > - Refactored the test a little more. > - Refactored some small things in `addCrossProductScenarios()` while looking at it. > - Added a sentence about passing a single set to `addCrossProductScenarios()` since it was not evident from the method comment what happens in that case. > > #### Execution Time Comparison > Measured on my local machine: > - Mainline: ~80s > - With patch: ~2-3s > > Thanks, > Christian Generally looks ok. But I'm still a little sad that we don't even get to test a single case end-to-end now. Can we not do at least a 2x2 case? test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 251: > 249: continue outer; > 250: } > 251: } You could probably do this with a `stream().anyMatch()`: Suggestion: if (scenarios.stream().anyMatch(s -> s.equals(expectedScenarioFlags))) { continue; } You may even be able to further simplify the lambda in there, to get rid of the `s`. ------------- PR Review: https://git.openjdk.org/jdk/pull/27672#pullrequestreview-3331828102 PR Review Comment: https://git.openjdk.org/jdk/pull/27672#discussion_r2426490918 From roland at openjdk.org Mon Oct 13 14:38:50 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Oct 2025 14:38:50 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal [v4] In-Reply-To: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> Message-ID: > This change refactors code that's similar for LShiftINode and > LShiftLNode into shared methods. I also added extra test cases to > cover all transformations. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 10 commits: - review - Merge branch 'master' into JDK-8369167 - review - sort headers - more - more - more - more - more - fix ------------- Changes: https://git.openjdk.org/jdk/pull/27725/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27725&range=03 Stats: 617 lines in 6 files changed: 343 ins; 170 del; 104 mod Patch: https://git.openjdk.org/jdk/pull/27725.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27725/head:pull/27725 PR: https://git.openjdk.org/jdk/pull/27725 From roland at openjdk.org Mon Oct 13 14:38:51 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Oct 2025 14:38:51 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal [v2] In-Reply-To: References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> Message-ID: On Fri, 10 Oct 2025 12:17:35 GMT, Marc Chevalier wrote: > An idea (not a suggestion, just something that crossed my mind, take it more as a thought experiment): we could also parametrize everything not with a `BasicType` parameter but a template parameter (since `IdealIL` and co are invoked with literal values). It wouldn't change much, but for instance it would allow to replace the assert in `java_shift_left` and friends with static checks (I have a bias toward static checks). I wondered about that too. There are many more methods that are parameterized by a `BasicType`. They would have to all go through that transition. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27725#issuecomment-3397792680 From roland at openjdk.org Mon Oct 13 14:38:50 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Oct 2025 14:38:50 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal [v4] In-Reply-To: <4AzzqZKwkzxGFxIszBSwfAdT6lyEEMdveyzYXhpfJLI=.224d078f-87e7-4b04-97ff-fe67ca4df4aa@github.com> References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> <4AzzqZKwkzxGFxIszBSwfAdT6lyEEMdveyzYXhpfJLI=.224d078f-87e7-4b04-97ff-fe67ca4df4aa@github.com> Message-ID: On Fri, 10 Oct 2025 08:25:35 GMT, Marc Chevalier wrote: > There are a lot of `SomeType *name` that we are slowly converting into `SomeType* name` when we have an occasion. As you wish. I went over the change and fixed those that I spotted. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27725#issuecomment-3397782311 From roland at openjdk.org Mon Oct 13 14:48:40 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Oct 2025 14:48:40 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types [v2] In-Reply-To: References: Message-ID: > Currently ReassociateInvariants is only enabled for int counted > loops. I noticed, enabling it for long counted loops helps RCE. It > also seems like something that would help any loop. I propose enabling > it for all inner loops. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - review - Merge branch 'master' into JDK-8369258 - test fixes - test and fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27666/files - new: https://git.openjdk.org/jdk/pull/27666/files/d63f11d2..87d69288 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27666&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27666&range=00-01 Stats: 13293 lines in 277 files changed: 10607 ins; 1599 del; 1087 mod Patch: https://git.openjdk.org/jdk/pull/27666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27666/head:pull/27666 PR: https://git.openjdk.org/jdk/pull/27666 From roland at openjdk.org Mon Oct 13 14:48:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Oct 2025 14:48:41 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types In-Reply-To: <2VDf9Od04bxB3b5rBeV9O00SX1dNUjTZ4XOK364n1do=.34541815-662d-4e11-8b6d-705eb3633d63@github.com> References: <2VDf9Od04bxB3b5rBeV9O00SX1dNUjTZ4XOK364n1do=.34541815-662d-4e11-8b6d-705eb3633d63@github.com> Message-ID: On Mon, 13 Oct 2025 14:00:54 GMT, Emanuel Peter wrote: >> Currently ReassociateInvariants is only enabled for int counted >> loops. I noticed that enabling it for long counted loops helps RCE. It >> also seems like something that would help any loop. I propose enabling >> it for all inner loops. > > BTW: testing passed! @eme64 new commit should address your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27666#issuecomment-3397826706 From roland at openjdk.org Mon Oct 13 14:52:38 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Oct 2025 14:52:38 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop [v2] In-Reply-To: References: Message-ID: > In: > > > for (int i = 100; i < 1100; i++) { > v += floatArray[i - 100]; > Objects.checkIndex(i, longRange); > } > > > The int counted loop has both an int range check and a long range check. The > int range check is optimized first. Assertion predicates are inserted > above the loop. One predicate checks that: > > > init - 100 > > The loop is then transformed to enable the optimization of the long > range check. The loop is short running, so there's no need to create a > loop nest. The counted loop is mostly left as is, but the loop's > bounds are changed from: > > > for (int i = 100; i < 1100; i++) { > > > to: > > > for (int i = 0; i < 1000; i++) { > > > The reason is that the long range check transformation expects the > loop to start at 0. > > Pre/main/post loops are created. Template Assertion predicates are > added above the main loop. The loop is unrolled. Initialized assertion > predicates are created. The one created from the condition: > > > init - 100 > > checks the value of `i` out of the pre loop, which is 1. That check fails. > > The root cause of the failure is that when bounds of the counted loop > are changed, template assertion predicates need to be updated with an > adjusted init input. > > When the bounds of the loop are known, the assertion predicates can be > updated in place. Otherwise, when the loop is speculated to be short > running, the assertion predicates are updated when they are cloned.
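To see why a predicate derived from the original bounds goes stale, it helps to write out what the bounds change conceptually does to the loop above (a Java-level sketch, not the exact IR the transformation produces):

// Original shape: i runs from 100 to 1100.
for (int i = 100; i < 1100; i++) {
    v += floatArray[i - 100];
    Objects.checkIndex(i, longRange);
}

// Rebased so the induction variable starts at 0: every use of the old i
// must be rewritten as j + 100 for the loop to stay equivalent.
for (int j = 0; j < 1000; j++) {
    v += floatArray[j];
    Objects.checkIndex(j + 100, longRange);
}

An assertion predicate that still reasons about the old init value (100) while the loop now starts at 0 checks the wrong quantity, which is exactly the mismatch described above.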
Roland Westrelin has updated the pull request incrementally with four additional commits since the last revision: - Update src/hotspot/share/opto/predicates.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/predicates.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27250/files - new: https://git.openjdk.org/jdk/pull/27250/files/d4c9b9f7..54840f39 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27250&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27250&range=00-01 Stats: 16 lines in 2 files changed: 8 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/27250.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27250/head:pull/27250 PR: https://git.openjdk.org/jdk/pull/27250 From epeter at openjdk.org Mon Oct 13 14:53:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 14:53:01 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types [v2] In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 14:48:40 GMT, Roland Westrelin wrote: >> Currently ReassociateInvariants is only enabled for int counted >> loops. I noticed, enabling it for long counted loops helps RCE. It >> also seems like something that would help any loop. I propose enabling >> it for all inner loops. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8369258 > - test fixes > - test and fix Thanks for the updates, approved! And thanks again for the work on the MemorySegment cases, it's really nice to see that the holes are being plugged one-by-one :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27666#pullrequestreview-3331946742 From epeter at openjdk.org Mon Oct 13 14:59:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 14:59:59 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v2] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). 
> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 97 additional commits since the last revision: - Merge branch 'master' into JDK-8367531-fix-addDataName - fix test - NestingToken -> ScopeToken - flat -> transparentScope - update other test - clean up tutorial - tutorial scope and DataNames - wip tutorial - extend tutorial - more tutorial improvements - ... and 87 more: https://git.openjdk.org/jdk/compare/ca1ebe15...e2e36f74 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/aceced65..e2e36f74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=00-01 Stats: 191280 lines in 2491 files changed: 151568 ins; 24162 del; 15550 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From roland at openjdk.org Mon Oct 13 15:15:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Oct 2025 15:15:56 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop [v3] In-Reply-To: References: Message-ID: > In: > > > for (int i = 100; i < 1100; i++) { > v += floatArray[i - 100]; > Objects.checkIndex(i, longRange); > } > > > The int counted loop has both an int range check and a long range. The > int range check is optimized first. Assertion predicates are inserted > above the loop. One predicates checks that: > > > init - 100 > > The loop is then transformed to enable the optimization of the long > range check. The loop is short running, so there's no need to create a > loop nest. 
The counted loop is mostly left as is but, the loop's > bounds are changed from: > > > for (int i = 100; i < 1100; i++) { > > > to: > > > for (int i = 0; i < 1000; i++) { > > > The reason for that the long range check transformation expects the > loop to start at 0. > > Pre/main/post loops are created. Template Assertion predicates are > added above the main loop. The loop is unrolled. Initialized assertion > predicates are created. The one created from the condition: > > > init - 100 > > checks the value of `i` out of the pre loop which is 1. That check fails. > > The root cause of the failure is that when bounds of the counted loop > are changed, template assertion predicates need to be updated with and > adjusted init input. > > When the bounds of the loop are known, the assertion predicates can be > updated in place. Otherwise, when the loop is speculated to be short > running, the assertion predicates are updated when they are cloned. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - review - Merge branch 'master' into JDK-8366888 - Update src/hotspot/share/opto/predicates.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/predicates.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - whitespaces - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27250/files - new: https://git.openjdk.org/jdk/pull/27250/files/54840f39..4ed60fc0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27250&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27250&range=01-02 Stats: 198294 lines in 2590 files changed: 155681 ins; 26500 del; 16113 mod Patch: https://git.openjdk.org/jdk/pull/27250.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27250/head:pull/27250 PR: https://git.openjdk.org/jdk/pull/27250 From roland at openjdk.org Mon Oct 13 15:15:59 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Oct 2025 15:15:59 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop [v3] In-Reply-To: <5mWAmKdTlGoPcIMGD1RqXAEhrL9F75m4RcJdoos5_q0=.184b3894-62b3-4ab3-9641-9f72a6c383eb@github.com> References: <5mWAmKdTlGoPcIMGD1RqXAEhrL9F75m4RcJdoos5_q0=.184b3894-62b3-4ab3-9641-9f72a6c383eb@github.com> Message-ID: On Fri, 10 Oct 2025 14:42:00 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8366888 >> - Update src/hotspot/share/opto/predicates.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/predicates.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - whitespaces >> - fix > > test/hotspot/jtreg/compiler/longcountedloops/TestShortCountedLoopWithLongRCBadAssertPredicate2.java line 1: > >> 1: /* > > Could the two tests also be merged? 
I think I left them separated because they have different copyrights. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27250#discussion_r2426625888 From roland at openjdk.org Mon Oct 13 15:15:57 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Oct 2025 15:15:57 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop In-Reply-To: <1EgDjfhpch9SuqvjEuZUyB0Y_NzmeBEWmDWRK-C0XEY=.3ebe62c7-abfa-426e-90c8-fafc2750f6a2@github.com> References: <1EgDjfhpch9SuqvjEuZUyB0Y_NzmeBEWmDWRK-C0XEY=.3ebe62c7-abfa-426e-90c8-fafc2750f6a2@github.com> Message-ID: On Fri, 10 Oct 2025 10:25:43 GMT, Christian Hagedorn wrote: >> In: >> >> >> for (int i = 100; i < 1100; i++) { >> v += floatArray[i - 100]; >> Objects.checkIndex(i, longRange); >> } >> >> >> The int counted loop has both an int range check and a long range. The >> int range check is optimized first. Assertion predicates are inserted >> above the loop. One predicates checks that: >> >> >> init - 100 > >> >> The loop is then transformed to enable the optimization of the long >> range check. The loop is short running, so there's no need to create a >> loop nest. The counted loop is mostly left as is but, the loop's >> bounds are changed from: >> >> >> for (int i = 100; i < 1100; i++) { >> >> >> to: >> >> >> for (int i = 0; i < 1000; i++) { >> >> >> The reason for that the long range check transformation expects the >> loop to start at 0. >> >> Pre/main/post loops are created. Template Assertion predicates are >> added above the main loop. The loop is unrolled. Initialized assertion >> predicates are created. The one created from the condition: >> >> >> init - 100 > >> >> checks the value of `i` out of the pre loop which is 1. That check fails. >> >> The root cause of the failure is that when bounds of the counted loop >> are changed, template assertion predicates need to be updated with and >> adjusted init input. >> >> When the bounds of the loop are known, the assertion predicates can be >> updated in place. Otherwise, when the loop is speculated to be short >> running, the assertion predicates are updated when they are cloned. > > I'll have a look today or on Monday :-) @chhagedorn new commit should address your comments and suggestions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27250#issuecomment-3397918786 From roland at openjdk.org Mon Oct 13 15:20:43 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Oct 2025 15:20:43 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v2] In-Reply-To: References: <2cQ6gXCmY3uz_H3H_Sks0KZbQP4G5V3PRP8QSxiRG6g=.44a63dc4-1078-4762-bef5-ac3e4a995ca3@github.com> Message-ID: On Mon, 13 Oct 2025 09:05:13 GMT, Emanuel Peter wrote: > Hi Roland, thanks for looking into this! > > Can you explain why the `clone` in `inlined2` creates an `ArrayCopy` node? I think I'm missing some context here. Because we are cloning an `A` and not an array, right? Naming of the node is unfortunate as it's also used for instance clones. Historically, optimizations for arraycopy have been used for instance clones as well and that's where the misleading name comes from. 
For arraycopy and array/instance clones: large arrays/instances are bulk copied with a call to a subroutine added during macro expansion, small arrays/instances are copied with a series of loads/stores added during igvn and there's also code so that a copy to a non escaping array/instance doesn't get in the way of EA and can be eliminated. > test/hotspot/jtreg/compiler/arraycopy/TestCloneUnknownClassAtParseTime.java line 63: > >> 61: private static A inlined2() throws CloneNotSupportedException { >> 62: A a = field; >> 63: return (A)a.clone(); > > Out of curiosity: why do we even add an `ArrayCopy` here? Does my reply to your comment above answer that question? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27604#issuecomment-3397942139 PR Review Comment: https://git.openjdk.org/jdk/pull/27604#discussion_r2426643140 From epeter at openjdk.org Mon Oct 13 15:20:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 15:20:45 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v2] In-Reply-To: References: <2cQ6gXCmY3uz_H3H_Sks0KZbQP4G5V3PRP8QSxiRG6g=.44a63dc4-1078-4762-bef5-ac3e4a995ca3@github.com> Message-ID: <-B8WCJ970Pbuh7Ur4Hz51ABRdASG2SQHqqCCF2kMd6A=.63385692-29f7-4e5e-8179-d5b369878af1@github.com> On Mon, 13 Oct 2025 15:14:52 GMT, Roland Westrelin wrote: > > Hi Roland, thanks for looking into this! > > Can you explain why the `clone` in `inlined2` creates an `ArrayCopy` node? I think I'm missing some context here. Because we are cloning an `A` and not an array, right? > > Naming of the node is unfortunate as it's also used for instance clones. Oh dear, I see, that explains my confusion. That's really not very nice. I wonder if we should do a renaming then? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27604#issuecomment-3397954861 From roland at openjdk.org Mon Oct 13 15:28:01 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Oct 2025 15:28:01 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v3] In-Reply-To: References: Message-ID: > In the `test1()` method of the test case: > > `inlined2()` calls `clone()` for an object loaded from field `field` > that has inexact type `A` at parse time. The intrinsic for `clone()` > inserts an `Allocate` and an `ArrayCopy` node. When igvn runs, the > load of `field` is optimized out because it reads back a newly > allocated `B` written to `field` in the same method. `ArrayCopy` can > now be optimized because the type of its `src` input is known. The > type of its `dest` input is the `CheckCastPP` from the allocation of > the cloned object created at parse time. That one has type `A`. A > series of `Load`s/`Store`s are created to copy the fields of class `B` > from `src` (of type `B`) to `dest` (of type `A`). > > Writing to `dest` with offsets for fields that don't exist in `A` > causes this code in `Compile::flatten_alias_type()`: > > > } else if (offset < 0 || offset >= ik->layout_helper_size_in_bytes()) { > // Static fields are in the space above the normal instance > // fields in the java.lang.Class instance. > if (ik != ciEnv::current()->Class_klass()) { > to = nullptr; > tj = TypeOopPtr::BOTTOM; > offset = tj->offset(); > } > > > to assign it some slice that doesn't match the one that's used at the > same offset in `B`. > > That causes an assert in `ArrayCopyNode::try_clone_instance()` to > fire. With a release build, execution proceeds.
`test1()` also has a > non escaping allocation. That one causes EA to run and > `ConnectionGraph::split_unique_types()` to move the store to the non > escaping allocation to a new slice. In the process, when it iterates > over `MergeMem` nodes, it notices the stores added by > `ArrayCopyNode::try_clone_instance()`, finds that some are not on the > right slice, tries to move them to the correct slice (expecting they > are from a non escaping EA). That causes some of the `Store`s to be > disconnected. When the resulting code runs, execution fails as some > fields are not copied. > > The fix I propose is to skip `ArrayCopyNode::try_clone_instance()` > when `src` and `dest` classes don't match as this seems like a rare > enough corner case. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - review - Merge branch 'master' into JDK-8339526 - review - Merge branch 'master' into JDK-8339526 - Update src/hotspot/share/opto/arraycopynode.cpp Co-authored-by: Christian Hagedorn - test & fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27604/files - new: https://git.openjdk.org/jdk/pull/27604/files/b6652e04..6dedf517 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27604&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27604&range=01-02 Stats: 13287 lines in 275 files changed: 10607 ins; 1596 del; 1084 mod Patch: https://git.openjdk.org/jdk/pull/27604.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27604/head:pull/27604 PR: https://git.openjdk.org/jdk/pull/27604 From roland at openjdk.org Mon Oct 13 15:28:02 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Oct 2025 15:28:02 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v2] In-Reply-To: <-B8WCJ970Pbuh7Ur4Hz51ABRdASG2SQHqqCCF2kMd6A=.63385692-29f7-4e5e-8179-d5b369878af1@github.com> References: <2cQ6gXCmY3uz_H3H_Sks0KZbQP4G5V3PRP8QSxiRG6g=.44a63dc4-1078-4762-bef5-ac3e4a995ca3@github.com> <-B8WCJ970Pbuh7Ur4Hz51ABRdASG2SQHqqCCF2kMd6A=.63385692-29f7-4e5e-8179-d5b369878af1@github.com> Message-ID: <6baIrfwlGjkQPIVogY2aIX6VzQainACv_-4IsVXWOpg=.67d79427-9320-4a0a-93ef-d932bdf5eb58@github.com> On Mon, 13 Oct 2025 15:17:36 GMT, Emanuel Peter wrote: > I wonder if we should do a renaming then? Sure. That makes sense. But it's likely quite a bit of work as "arraycopy" is used not only for the node type but also for the supporting methods during parsing, EA, igvn and macro expansion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27604#issuecomment-3397982455 From epeter at openjdk.org Mon Oct 13 15:33:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 15:33:04 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v3] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. 
> > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix tests after integration of Expressions/Operations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/e2e36f74..fef26c96 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=01-02 Stats: 20 lines in 6 files changed: 0 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Mon Oct 13 15:54:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Oct 2025 15:54:51 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v3] In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 15:28:01 GMT, Roland Westrelin wrote: >> In the `test1()` method of the test case: >> >> `inlined2()` calls `clone()` for an object loaded from field `field` >> that has inexact type `A` at parse time. The intrinsic for `clone()` >> inserts an `Allocate` and an `ArrayCopy` nodes. When igvn runs, the >> load of `field` is optimized out because it reads back a newly >> allocated `B` written to `field` in the same method. `ArrayCopy` can >> now be optimized because the type of its `src` input is known. 
The >> type of its `dest` input is the `CheckCastPP` from the allocation of >> the cloned object created at parse time. That one has type `A`. A >> series of `Load`s/`Store`s are created to copy the fields of class `B` >> from `src` (of type `B`) to `dest` of (type `A`). >> >> Writting to `dest` with offsets for fields that don't exist in `A`, >> causes this code in `Compile::flatten_alias_type()`: >> >> >> } else if (offset < 0 || offset >= ik->layout_helper_size_in_bytes()) { >> // Static fields are in the space above the normal instance >> // fields in the java.lang.Class instance. >> if (ik != ciEnv::current()->Class_klass()) { >> to = nullptr; >> tj = TypeOopPtr::BOTTOM; >> offset = tj->offset(); >> } >> >> >> to assign it some slice that doesn't match the one that's used at the >> same offset in `B`. >> >> That causes an assert in `ArrayCopyNode::try_clone_instance()` to >> fire. With a release build, execution proceeds. `test1()` also has a >> non escaping allocation. That one causes EA to run and >> `ConnectionGraph::split_unique_types()` to move the store to the non >> escaping allocation to a new slice. In the process, when it iterates >> over `MergeMem` nodes, it notices the stores added by >> `ArrayCopyNode::try_clone_instance()`, finds that some are not on the >> right slice, tries to move them to the correct slice (expecting they >> are from a non escaping EA). That causes some of the `Store`s to be >> disconnected. When the resulting code runs, execution fails as some >> fields are not copied. >> >> The fix I propose is to skip `ArrayCopyNode::try_clone_instance()` >> when `src` and `dest` classes don't match as this seems like a rare >> enough corner case. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8339526 > - review > - Merge branch 'master' into JDK-8339526 > - Update src/hotspot/share/opto/arraycopynode.cpp > > Co-authored-by: Christian Hagedorn > - test & fix > inlined2() calls clone() for an object loaded from field field that has inexact type A at parse time. The intrinsic for clone() inserts an Allocate and an ArrayCopy nodes. When igvn runs, the load of field is optimized out because it reads back a newly allocated B written to field in the same method. ArrayCopy can now be optimized because the type of its src input is known. The type of its dest input is the CheckCastPP from the allocation of the cloned object created at parse time. That one has type A. A series of Loads/Stores are created to copy the fields of class B from src (of type B) to dest of (type A). I'm still struggling to understand. I wonder if the test can be further simplified to make the case more clear. Am I understanding right, that we essentially this: field = new B(42, 42, 42); A a = field; return (A)a.clone(); What should the result of that be? An `A` or a `B`? I think we should be getting a `B`, right? So why is the `dest` of the `ArrayCopy` an `A`? Is that even correct? > The fix I propose is to skip ArrayCopyNode::try_clone_instance() when src and dest classes don't match as this seems like a rare enough corner case. How do you know that this is a rare case? Did you do some kind of profiling / benchmarking? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27604#issuecomment-3398083365 From duke at openjdk.org Tue Oct 14 00:02:46 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 14 Oct 2025 00:02:46 GMT Subject: RFR: 8369642: [ubsan] nmethod::nmethod null pointer passed as argument 2 to memcpy Message-ID: <3kb34wTVzDNE3qDozi9eO9_kWuJXfz4GPVcKTRo4veM=.9ae36b69-d193-4d4e-b993-6d6a35af245b@github.com> [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced a counter so that the nmethod immutable data can be shared between relocated nmethods to eliminate an unnecessary copy. The counter is aligned in memory so that must be taken into account when calculating the amount of memory used by the counter ------------- Commit messages: - Respect memory alignment for ImmutableDataReferencesCounter Changes: https://git.openjdk.org/jdk/pull/27778/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27778&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369642 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27778/head:pull/27778 PR: https://git.openjdk.org/jdk/pull/27778 From dlong at openjdk.org Tue Oct 14 00:50:07 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 14 Oct 2025 00:50:07 GMT Subject: RFR: 8369444: JavaFrameAnchor on PPC64 has unnecessary barriers In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 11:48:42 GMT, David Briemann wrote: > No hardware barriers are necessary. All members are volatile and the profiler is run from a signal handler and only observers the thread its running on. If all CPU ports follow this, then it seems like we could eventually implement JavaFrameAnchor in shared code without CPU-specific parts. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27768#issuecomment-3399555486 From jbhateja at openjdk.org Tue Oct 14 03:38:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 14 Oct 2025 03:38:22 GMT Subject: Integrated: 8365205: C2: Optimize popcount value computation using knownbits In-Reply-To: References: Message-ID: On Wed, 3 Sep 2025 16:10:43 GMT, Jatin Bhateja wrote: > This patch optimizes PopCount value transforms using KnownBits information. > Following are the results of the micro-benchmark included with the patch > > > > System: 13th Gen Intel(R) Core(TM) i3-1315U > > Baseline: > Benchmark Mode Cnt Score Error Units > PopCountValueTransform.LogicFoldingKerenLong thrpt 2 215460.670 ops/s > PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 294014.826 ops/s > > Withopt: > Benchmark Mode Cnt Score Error Units > PopCountValueTransform.LogicFoldingKerenLong thrpt 2 389978.082 ops/s > PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 417261.583 ops/s > > > Kindly review and share your feedback. > > Best Regards, > Jatin This pull request has now been integrated. 
Changeset: 44964181 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/449641813ada3b0af6441dd7299e40235e7adf56 Stats: 321 lines in 5 files changed: 321 ins; 0 del; 0 mod 8365205: C2: Optimize popcount value computation using knownbits Reviewed-by: epeter, hgreule, qamai ------------- PR: https://git.openjdk.org/jdk/pull/27075 From epeter at openjdk.org Tue Oct 14 05:13:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 05:13:36 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v4] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix TestMethodArguments.java after merge with master ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/fef26c96..a855cc4e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From qxing at openjdk.org Tue Oct 14 06:08:41 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 14 Oct 2025 06:08:41 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v15] In-Reply-To: References: Message-ID: > The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. > > This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: > > > public static int numberOfNibbles(int i) { > int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); > return Math.max((mag + 3) / 4, 1); > } > > > Testing: tier1, IR test Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: - Merge branch 'master' into enhance-clz-type - Merge branch 'master' into enhance-clz-type - Fix constant fold - Remove redundant import - Add random range tests - Add more comments to IR test - Add more constant folding tests for CLZ/CTZ - Add proof of correstness comments - Remove redundant `@require` in IR test - Add microbench - ... and 11 more: https://git.openjdk.org/jdk/compare/5bf1bab5...d7ebc8f2 ------------- Changes: https://git.openjdk.org/jdk/pull/25928/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=14 Stats: 820 lines in 4 files changed: 754 ins; 54 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/25928.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928 PR: https://git.openjdk.org/jdk/pull/25928 From rcastanedalo at openjdk.org Tue Oct 14 06:12:11 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 14 Oct 2025 06:12:11 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v2] In-Reply-To: <6baIrfwlGjkQPIVogY2aIX6VzQainACv_-4IsVXWOpg=.67d79427-9320-4a0a-93ef-d932bdf5eb58@github.com> References: <2cQ6gXCmY3uz_H3H_Sks0KZbQP4G5V3PRP8QSxiRG6g=.44a63dc4-1078-4762-bef5-ac3e4a995ca3@github.com> <-B8WCJ970Pbuh7Ur4Hz51ABRdASG2SQHqqCCF2kMd6A=.63385692-29f7-4e5e-8179-d5b369878af1@github.com> <6baIrfwlGjkQPIVogY2aIX6VzQainACv_-4IsVXWOpg=.67d79427-9320-4a0a-93ef-d932bdf5eb58@github.com> Message-ID: <7Lx4UPBkbTwHAlXvmg2ekbKfZ2Z9GNmN9Kywkje5dxI=.ca6adbf4-d1b9-4efa-ab00-03d2bb84562b@github.com> On Mon, 13 Oct 2025 15:25:15 GMT, Roland Westrelin wrote: > I wonder if we should do a renaming then? I agree, have been confused by this in the past as well. Maybe `BulkCopyNode` would be a better name? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27604#issuecomment-3400255155 From qxing at openjdk.org Tue Oct 14 06:16:09 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 14 Oct 2025 06:16:09 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v14] In-Reply-To: References: Message-ID: On Fri, 12 Sep 2025 12:46:44 GMT, Emanuel Peter wrote: >> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove redundant import > > test/hotspot/jtreg/compiler/c2/gvn/TestCountBitsRange.java line 516: > >> 514: } >> 515: >> 516: int getResultChecksum(int result, int[] LIMITS) { > > I would put a `@ForceInlinie` before this. You are using it in many methods, and so it may not get inlined reliably. And if it does not get inlined, then the result verifcation would not constant-fold, and so it would be kind of useless. Because we rely on the fact that if the range is wrong, we could get bad constant folding ;) Added `@ForceInline`. > test/hotspot/jtreg/compiler/c2/gvn/TestCountBitsRange.java line 521: > >> 519: if (result < LIMITS[i]) sum += 1 << i; >> 520: if (result > LIMITS[i + 1]) sum += 1 << (i + 1); >> 521: } > > I doublt that this works, because the test would not constant fold if the range was too narrow. > I think you need to manually unroll the loop, and load the constants from `static final` values, or another method that allows it to be a compile time constant. Updated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2428020170 PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2428020382 From qxing at openjdk.org Tue Oct 14 06:22:05 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 14 Oct 2025 06:22:05 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v9] In-Reply-To: <9xCpJGY6CFKPAt4VtDY23_Tr3SE9tUebdMF3pAYWhFA=.281e0b84-bfad-466b-b290-918cf1fa83d1@github.com> References: <9xCpJGY6CFKPAt4VtDY23_Tr3SE9tUebdMF3pAYWhFA=.281e0b84-bfad-466b-b290-918cf1fa83d1@github.com> Message-ID: On Tue, 9 Sep 2025 08:40:35 GMT, Emanuel Peter wrote: >> Hi @jatin-bhateja, I've added a micro benchmark that includes the `numberOfNibbles` implementation from this PR description and your micro kernel. >> >> Here's my test results on an Intel(R) Xeon(R) Platinum: >> >> >> # Baseline: >> Benchmark Mode Cnt Score Error Units >> CountLeadingZeros.benchClzLongConstrained avgt 15 1517.888 ? 5.691 ns/op >> CountLeadingZeros.benchNumberOfNibbles avgt 15 1094.422 ? 1.753 ns/op >> >> # This patch: >> Benchmark Mode Cnt Score Error Units >> CountLeadingZeros.benchClzLongConstrained avgt 15 0.948 ? 0.002 ns/op >> CountLeadingZeros.benchNumberOfNibbles avgt 15 942.438 ? 1.742 ns/op > > @MaxXSoft Feel free to just ping me again when you want another review :) > FYI: I'll be on a longer vacation starting in about a week, so don't expect me to respond then. Hi @eme64, thanks for your review. I've updated the IR test to ensure that range checks work with the constant folding. Do you have any other suggestions for this patch? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3400276822 From pminborg at openjdk.org Tue Oct 14 06:36:04 2025 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 14 Oct 2025 06:36:04 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types [v2] In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 14:48:40 GMT, Roland Westrelin wrote: >> Currently ReassociateInvariants is only enabled for int counted >> loops. I noticed, enabling it for long counted loops helps RCE. It >> also seems like something that would help any loop. I propose enabling >> it for all inner loops. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8369258 > - test fixes > - test and fix Thanks for this one! This will improve FFM performance for many idiomatic code snippets. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27666#issuecomment-3400311873 From qxing at openjdk.org Tue Oct 14 06:42:07 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 14 Oct 2025 06:42:07 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v4] In-Reply-To: References: Message-ID: > In `PhaseIdealLoop`, `IdealLoopTree::check_safepts` method checks if any call that is guaranteed to have a safepoint dominates the tail of the loop. In the previous implementation, `check_safepts` would stop if it found a local non-call safepoint. At this time, if there was a call before the safepoint in the dom-path, this safepoint would not be eliminated. > > loop-safepoint > > This patch changes the behavior of `check_safepts` to not stop when it finds a non-local safepoint. This makes simple loops with one method call ~3.8% faster (on aarch64). > > > Benchmark Mode Cnt Score Error Units > LoopSafepoint.loopVar avgt 15 208296.259 ? 1350.409 ns/op # baseline > LoopSafepoint.loopVar avgt 15 200692.874 ? 616.770 ns/op # this patch > > > Testing: tier1-2 on x86_64 and aarch64. Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into enhance-loop-safepoint-elim - Improve documentation comments - Merge branch 'master' into enhance-loop-safepoint-elim - Add IR test and microbench. - Make `PhaseIdealLoop` eliminate more redundant safepoints in loops. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23057/files - new: https://git.openjdk.org/jdk/pull/23057/files/1a216046..ba6e7e79 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23057&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23057&range=02-03 Stats: 1031881 lines in 13321 files changed: 545175 ins; 398869 del; 87837 mod Patch: https://git.openjdk.org/jdk/pull/23057.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23057/head:pull/23057 PR: https://git.openjdk.org/jdk/pull/23057 From aboldtch at openjdk.org Tue Oct 14 06:53:03 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 14 Oct 2025 06:53:03 GMT Subject: RFR: 8369658: Client emulation mode sets MaxRAM too late In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 11:25:04 GMT, Joel Sikstr?m wrote: > Hello, > > While working on the proposal for the potential deprecation of MaxRAM (see [JDK-8369347](https://bugs.openjdk.org/browse/JDK-8369347)) I saw that `CompilerConfig::ergo_initialize()` sets the value for `MaxRAM` after ergonomic heap sizing is already done, which is the only place in the VM that cares about `MaxRAM`. I suggest we move setting the value of `MaxRAM` to `Arguments::set_heap_size()` to fix this. > > Even though the `MaxRAM` flag might be deprecated, the code should still account for the fact that client emulation mode might lower the maximum amount of physical memory that can be used for the Java heap. If the flag is removed, we'd still want to lower the maximum memory, so it makes sense to have the code in `Arguments::set_heap_size()` in both cases. > > Testing: > * Currently running Oracle's tier1-2 > * Local test with `java -XX:+NeverActAsServerClassMachine -Xlog:gc+init` to see that the lower limit is reflected in ergonomic heap sizing. Looks good, I see little risk with taking the patch as is. I had one suggestion for making `should_set_client_emulation_mode_flags` a bit more robust / less scary. Sidenote: `NeverActAsServerClassMachine` in the java.md seems to suggest that `MaxRAM` is "The maximum amount of memory that the JVM may use". While its section on `MaxRAM` explicitly says it is only for the java heap. Sidenote2: `NeverActAsServerClassMachine` and `AlwaysActAsServerClassMachine` seem very scewed. First that we allow them to be contradictory, `NeverActAsServerClassMachine` outweighs `AlwaysActAsServerClassMachine`, and we only care about `AlwaysActAsServerClassMachine` for the GC selection, but not the compiler. I wonder if the compiler also should have used `is_server_class_machine`. But regardless, the best thing is probably just to not change behaviour and hopefully be able to remove the flags in the future. At least consolidate them into one flag. Especially after JEP-523. src/hotspot/share/compiler/compilerDefinitions.cpp line 562: > 560: } > 561: } else if (!has_c2() && !is_jvmci_compiler()) { > 562: return true; Pre-existing: Just a note that this `!has_c2` seems so strange, as we set the values that are set inside `set_client_emulation_mode_flags();` to other defaults than those in `c1_globals_.hpp`. So we override those defaults here always. Some defaults seem to be the same but some are different. Which seems very strange. `CICompilerCount` is the only flag which is not from that file but its default is also correctly 1 if c2 has been built out. 
src/hotspot/share/compiler/compilerDefinitions.cpp line 576: > 574: if (should_set_client_emulation_mode_flags()) { > 575: set_client_emulation_mode_flags(); > 576: } Should there be a comment here which mentions that this decision has already been taken (and acted on) in `GCArguments::set_heap_size`? Someone that is not aware might start ergonomically changing flags between the first and second call to `should_set_client_emulation_mode_flags` which changes its decision. Alternatively `should_set_client_emulation_mode_flags` could be implemented as a set once property. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27765#pullrequestreview-3334091882 PR Review Comment: https://git.openjdk.org/jdk/pull/27765#discussion_r2428057465 PR Review Comment: https://git.openjdk.org/jdk/pull/27765#discussion_r2428065386 From stefank at openjdk.org Tue Oct 14 07:15:16 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 14 Oct 2025 07:15:16 GMT Subject: RFR: 8369658: Client emulation mode sets MaxRAM too late In-Reply-To: References: Message-ID: <9lJi9yul_WYS6_JXQyazgkc9r2JGIdcJr2Y7iOR285I=.68d93617-1917-4e80-a642-b0dd19d27348@github.com> On Mon, 13 Oct 2025 11:25:04 GMT, Joel Sikstr?m wrote: > Hello, > > While working on the proposal for the potential deprecation of MaxRAM (see [JDK-8369347](https://bugs.openjdk.org/browse/JDK-8369347)) I saw that `CompilerConfig::ergo_initialize()` sets the value for `MaxRAM` after ergonomic heap sizing is already done, which is the only place in the VM that cares about `MaxRAM`. I suggest we move setting the value of `MaxRAM` to `Arguments::set_heap_size()` to fix this. > > Even though the `MaxRAM` flag might be deprecated, the code should still account for the fact that client emulation mode might lower the maximum amount of physical memory that can be used for the Java heap. If the flag is removed, we'd still want to lower the maximum memory, so it makes sense to have the code in `Arguments::set_heap_size()` in both cases. > > Testing: > * Currently running Oracle's tier1-2 > * Local test with `java -XX:+NeverActAsServerClassMachine -Xlog:gc+init` to see that the lower limit is reflected in ergonomic heap sizing. > Especially after JEP-523. The `NeverActAsServerClassMachine` and `AlwaysActAsServerClassMachine` flags don't seem to be well-known and frequently used flags. If we are moving towards JEP-523 (Make G1 the Default Garbage Collector in All Environments), should we also take the opportunity to get rid of these flags to lower the maintenance cost / risk of having these flags? ------------- PR Review: https://git.openjdk.org/jdk/pull/27765#pullrequestreview-3334197621 From jsikstro at openjdk.org Tue Oct 14 08:28:04 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 14 Oct 2025 08:28:04 GMT Subject: RFR: 8369658: Client emulation mode sets MaxRAM too late [v2] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 06:49:46 GMT, Axel Boldt-Christmas wrote: >> Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment > > Looks good, I see little risk with taking the patch as is. I had one suggestion for making `should_set_client_emulation_mode_flags` a bit more robust / less scary. > > Sidenote: > `NeverActAsServerClassMachine` in the java.md seems to suggest that `MaxRAM` is "The maximum amount of memory that the JVM may use". While its section on `MaxRAM` explicitly says it is only for the java heap. 
> > Sidenote2: > `NeverActAsServerClassMachine` and `AlwaysActAsServerClassMachine` seem very skewed. First, we allow them to be contradictory: `NeverActAsServerClassMachine` outweighs `AlwaysActAsServerClassMachine`, and we only care about `AlwaysActAsServerClassMachine` for the GC selection, but not the compiler. I wonder if the compiler should also have used `is_server_class_machine`. But regardless, the best thing is probably just to not change behaviour and hopefully be able to remove the flags in the future. At least consolidate them into one flag. Especially after JEP-523. Thank you for looking at this @xmas92 and @stefank. I agree that it sounds like a good plan to make an effort to get rid of `Never/AlwaysActAsServerClassMachine`. With this change we are able to more smoothly move forward with deprecating `MaxRAM` and `Never/AlwaysActAsServerClassMachine` separately, which I feel is a step in the right direction. > src/hotspot/share/compiler/compilerDefinitions.cpp line 576: > >> 574: if (should_set_client_emulation_mode_flags()) { >> 575: set_client_emulation_mode_flags(); >> 576: } > > Should there be a comment here which mentions that this decision has already been taken (and acted on) in `GCArguments::set_heap_size`? > > Someone that is not aware might start ergonomically changing flags between the first and second call to `should_set_client_emulation_mode_flags` which changes its decision. > > Alternatively `should_set_client_emulation_mode_flags` could be implemented as a set once property. I've added a comment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27765#issuecomment-3400680663 PR Review Comment: https://git.openjdk.org/jdk/pull/27765#discussion_r2428330915 From jsikstro at openjdk.org Tue Oct 14 08:28:02 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Tue, 14 Oct 2025 08:28:02 GMT Subject: RFR: 8369658: Client emulation mode sets MaxRAM too late [v2] In-Reply-To: References: Message-ID: > Hello, > > While working on the proposal for the potential deprecation of MaxRAM (see [JDK-8369347](https://bugs.openjdk.org/browse/JDK-8369347)) I saw that `CompilerConfig::ergo_initialize()` sets the value for `MaxRAM` after ergonomic heap sizing is already done, which is the only place in the VM that cares about `MaxRAM`. I suggest we move setting the value of `MaxRAM` to `Arguments::set_heap_size()` to fix this. > > Even though the `MaxRAM` flag might be deprecated, the code should still account for the fact that client emulation mode might lower the maximum amount of physical memory that can be used for the Java heap. If the flag is removed, we'd still want to lower the maximum memory, so it makes sense to have the code in `Arguments::set_heap_size()` in both cases. > > Testing: > * Currently running Oracle's tier1-2 > * Local test with `java -XX:+NeverActAsServerClassMachine -Xlog:gc+init` to see that the lower limit is reflected in ergonomic heap sizing.
Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: Add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27765/files - new: https://git.openjdk.org/jdk/pull/27765/files/33d4f066..d812f850 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27765&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27765&range=00-01 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27765.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27765/head:pull/27765 PR: https://git.openjdk.org/jdk/pull/27765 From epeter at openjdk.org Tue Oct 14 08:35:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 08:35:45 GMT Subject: RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v6] In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 13:10:56 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8369448-VTransform-reduction-out-of-loop >> - prettify code >> - for Christian part 3 >> - for Christian part 2 >> - for Christian part 1 >> - For Vladimir K7 >> - documentation >> - better tracing >> - rm scalar_opcode >> - a few todos >> - ... and 5 more: https://git.openjdk.org/jdk/compare/3815c61c...5c0e11a5 > > Update looks good, thanks for carefully addressing my suggestions and the offline discussion! And good work to finally getting to the cost model PR next :-) @chhagedorn @vnkozlov Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27704#issuecomment-3400709603 From epeter at openjdk.org Tue Oct 14 08:35:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 08:35:47 GMT Subject: Integrated: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize In-Reply-To: References: Message-ID: <1aCPoE0900nImTRZAfrTBQ7x-L3JUuq3ixbacrNE2J0=.0bcc01f6-1b83-48c7-b4df-9db6a5471302@github.com> On Wed, 8 Oct 2025 19:42:38 GMT, Emanuel Peter wrote: > I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR: > https://github.com/openjdk/jdk/pull/20964 > [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093) > > This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier. > > This should be the last one before Cost Modeling, which will enable us to vectorize more reductions ? > > -------------------------- > > **Goal:** we need to do the `move_reduction_out_of_loop` already during auto vectorization, and not after. This will allow us to cost-model the loop after the expensive reduction nodes are removed from the loop in a following RFE. > > **Details** > Reduction nodes are expensive, and require many instructions in the backend. In some cases, this means that the vectorized reduction is more expensive than the scalar reduction. We would have to find other operations to vectorize, so that the instruction count goes down sufficiently. There are cases where the reduction is not profitable before `move_reduction_out_of_loop`, but profitable after. 
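(Illustration, not taken from the PR: the kind of code the reduction work above targets is a plain add-reduction loop like the sketch below; keeping the reduction out of the hot loop body is what can make the vectorized form profitable.)

```
// Hypothetical example of a loop with an (unordered) integer add-reduction.
// The element-wise multiply can be vectorized; with an unordered reduction the
// vector accumulator only needs to be collapsed to a scalar once, after the loop.
public class ReductionLoopSketch {
    static int dot(int[] a, int[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i]; // reduction over a loop-carried scalar
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[1024];
        int[] b = new int[1024];
        for (int i = 0; i < a.length; i++) { a[i] = i; b[i] = 2; }
        System.out.println(dot(a, b));
    }
}
```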
> > Since we now modify `VTransformNode`s during `VTransform::optimize` (think of it as the IGVN for `VTransform`), some nodes can become dead, and so we need to take care of that with `is_alive`. And we must only schedule alive nodes, others may not have a coherent state. > > **Future Work** > - Cost Modeling [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093) > - Other optimizations that lower the cost of the vectorized loop, and enable vectorization to be profitable. This pull request has now been integrated. Changeset: 4786f8be Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/4786f8bee5c79c1bcf652758a25360b4d308ce1c Stats: 738 lines in 10 files changed: 397 ins; 336 del; 5 mod 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/27704 From chagedorn at openjdk.org Tue Oct 14 08:46:23 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 14 Oct 2025 08:46:23 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v2] In-Reply-To: References: Message-ID: > The test ` testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java` intermittently timed out in our CI. On my local machine, I measured ~80s which is quite long given that we actually only want to test that Cartesian product for scenarios work. > > #### Reduce Execution Time by not Executing the Scenarios > I had a closer look at the test to try to cut the execution time down. Currently, we are executing many IR framework runs with different Cartesian products for scenarios. Afterwards, we compare the output of IR matching to check if the computation of the Cartesian products were correct. However, since we are only interested in verifying that the Cartesian product computation for scenarios works (i.e. `addCrossProductScenarios()`), we could skip the actual execution of the scenarios itself - we can trust that the IR framework is already tested well enough. > > To achieve that, we can use reflection to get the added scenarios to the IR framework (I don't want to add a public accessor because a user should not need access to them) and then fetch the corresponding scenario flags and compare against our expectation. That's what I propose with this change. > > #### Changes > - Verification without actually running scenarios. > - Added a test passing 3 sets to `addCrossProductScenarios()` which was missing before. > - Improved `addScenarios()` where we added a scenario to the list even though it already existed. That normally does not matter because we are throwing a `TestFormatException` anyway afterwards. But it messes with the test: We are adding the duplicated scenario and then read it again in the verification part of the test. > - Refactored the test a little more. > - Refactored some small things in `addCrossProductScenarios()` while looking at it. > - Added a sentence about passing a single set to `addCrossProductScenarios()` which was not evidently clear what is happening when looking at the method comment. > > #### Execution Time Comparison > Measured on my local machine: > - Mainline: ~80s > - With patch: ~2-3s > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8369232 - add missing test - 8369236: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27672/files - new: https://git.openjdk.org/jdk/pull/27672/files/2875fef2..0ea118dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27672&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27672&range=00-01 Stats: 18051 lines in 332 files changed: 12323 ins; 4544 del; 1184 mod Patch: https://git.openjdk.org/jdk/pull/27672.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27672/head:pull/27672 PR: https://git.openjdk.org/jdk/pull/27672 From chagedorn at openjdk.org Tue Oct 14 08:46:25 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 14 Oct 2025 08:46:25 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v2] In-Reply-To: References: Message-ID: <4R3ZHjKBFR4P_7Lh2UtCtAYZzODeWk_87l4FKVwlVss=.e8cb4ce5-4ac2-49ef-98da-d4d1008b8a8b@github.com> On Mon, 13 Oct 2025 14:19:07 GMT, Emanuel Peter wrote: > Generally looks ok. But I'm still a little sad that we don't even get to test a single case end-to-end now. Can we not do at least a 2x2 case? Thanks for your review! That's true, we don't have any end-to-end tests anymore now. I thought about keeping one but the problem is even with a 2x2, we will execute 4 scenarios, i.e. spawning 4 test VMs. But if we assume that, whenever we have the right `Scenario` objects on the `TestFramework.scenarios` list, we can trust the IR framework to do the right things, then it's not required. We currently have scenario end-to-end tests here which should us give this confidence: https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenarios.java But I see a missing gap here: `TestScenarios.java` only does end-to-end testing by using `TestFramework.addScenarios()`. But `TestScenariosCrossProduct.java` only checks the `TestFramework.scenarios` object. By only looking at this as a black box, `TestFramework.addScenarios()` could use something different from `TestFramework.scenarios`. Thus, I suggest to add another test that verifies that `TestFramework.addCrossProductScenarios()` creates the same `TestFramework.scenarios` state as if we used `TestFramework.addScenarios()`. I pushed an update with such a test. Let me know what you think. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27672#issuecomment-3400746505 From epeter at openjdk.org Tue Oct 14 08:51:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 08:51:17 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v2] In-Reply-To: <4R3ZHjKBFR4P_7Lh2UtCtAYZzODeWk_87l4FKVwlVss=.e8cb4ce5-4ac2-49ef-98da-d4d1008b8a8b@github.com> References: <4R3ZHjKBFR4P_7Lh2UtCtAYZzODeWk_87l4FKVwlVss=.e8cb4ce5-4ac2-49ef-98da-d4d1008b8a8b@github.com> Message-ID: On Tue, 14 Oct 2025 08:41:49 GMT, Christian Hagedorn wrote: >> Generally looks ok. But I'm still a little sad that we don't even get to test a single case end-to-end now. Can we not do at least a 2x2 case? > >> Generally looks ok. But I'm still a little sad that we don't even get to test a single case end-to-end now. Can we not do at least a 2x2 case? > > Thanks for your review! That's true, we don't have any end-to-end tests anymore now. 
I thought about keeping one but the problem is even with a 2x2, we will execute 4 scenarios, i.e. spawning 4 test VMs. But if we assume that, whenever we have the right `Scenario` objects on the `TestFramework.scenarios` list, we can trust the IR framework to do the right things, then it's not required. > > We currently have scenario end-to-end tests here which should us give this confidence: > https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenarios.java > > But I see a missing gap here: `TestScenarios.java` only does end-to-end testing by using `TestFramework.addScenarios()`. But `TestScenariosCrossProduct.java` only checks the `TestFramework.scenarios` object. By only looking at this as a black box, `TestFramework.addScenarios()` could use something different from `TestFramework.scenarios`. Thus, I suggest to add another test that verifies that `TestFramework.addCrossProductScenarios()` creates the same `TestFramework.scenarios` state as if we used `TestFramework.addScenarios()`. I pushed an update with such a test. Let me know what you think. @chhagedorn How much time would it really take for an end-to-end test with 2x2? The issue up to now was that we had multiple end-to-end tests, and that cumulatively was very much. But just a 2x2 might be some extra seconds, but not too many? There is just always a risk that we test the wrong "partial" things if we don't do end-to-end. That's just my intuition, I can also be ok with what you suggested. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27672#issuecomment-3400769600 From chagedorn at openjdk.org Tue Oct 14 08:58:46 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 14 Oct 2025 08:58:46 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v2] In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 14:17:20 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8369232 >> - add missing test >> - 8369236: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 251: > >> 249: continue outer; >> 250: } >> 251: } > > You could probably do this with an `stream().anyMatch()`: > Suggestion: > > if (scenarios.stream().anyMatch(s -> s.equals(expectedScenarioFlags))) { > continue; > } > > You may even be able to further simplify the lambda in there, to get rid of the `s`. Thanks for the suggestion. `s` in your code is a `Scenario` while `expectedScenarioFlags` is a `Set`. But maybe you meant, to first `map()` the scenarios to `getFlags()` and then do the comparison. That would work but `getFlags()` returns a `List`. We first need to convert it to a `Set` in order to call `equals()` which ignores the order. It could look like this: if (scenariosFromCrossProduct.stream() .map(Scenario::getFlags) .map(HashSet::new) .anyMatch(flags -> flags.equals(expectedScenarioFlags))) { continue; } ``` I'm fine with both. 
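(To make the 2x2 arithmetic from the exchange above concrete: a cross product of two sets with two flag values each expands into four scenarios, hence four test VMs. The sketch below is a standalone illustration of that expansion in plain Java with hypothetical flag values; it is not the IR framework's own code.)

```
import java.util.ArrayList;
import java.util.List;

// Computes the Cartesian product of flag sets, the way a 2x2 scenario
// cross product expands into 4 flag combinations (scenarios).
public class CrossProductSketch {
    static List<List<String>> crossProduct(List<List<String>> flagSets) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>());
        for (List<String> flagSet : flagSets) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> partial : result) {
                for (String flag : flagSet) {
                    List<String> extended = new ArrayList<>(partial);
                    extended.add(flag);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical flag values, purely for illustration.
        var combos = crossProduct(List.of(
                List.of("-XX:+UseCompactObjectHeaders", "-XX:-UseCompactObjectHeaders"),
                List.of("-XX:TieredStopAtLevel=1", "-XX:TieredStopAtLevel=4")));
        combos.forEach(System.out::println); // prints 4 combinations
    }
}
```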
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27672#discussion_r2428416151 From aboldtch at openjdk.org Tue Oct 14 09:09:56 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 14 Oct 2025 09:09:56 GMT Subject: RFR: 8369658: Client emulation mode sets MaxRAM too late [v2] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 08:28:02 GMT, Joel Sikstr?m wrote: >> Hello, >> >> While working on the proposal for the potential deprecation of MaxRAM (see [JDK-8369347](https://bugs.openjdk.org/browse/JDK-8369347)) I saw that `CompilerConfig::ergo_initialize()` sets the value for `MaxRAM` after ergonomic heap sizing is already done, which is the only place in the VM that cares about `MaxRAM`. I suggest we move setting the value of `MaxRAM` to `Arguments::set_heap_size()` to fix this. >> >> Even though the `MaxRAM` flag might be deprecated, the code should still account for the fact that client emulation mode might lower the maximum amount of physical memory that can be used for the Java heap. If the flag is removed, we'd still want to lower the maximum memory, so it makes sense to have the code in `Arguments::set_heap_size()` in both cases. >> >> Testing: >> * Currently running Oracle's tier1-2 >> * Local test with `java -XX:+NeverActAsServerClassMachine -Xlog:gc+init` to see that the lower limit is reflected in ergonomic heap sizing. > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Add comment Marked as reviewed by aboldtch (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27765#pullrequestreview-3334624012 From epeter at openjdk.org Tue Oct 14 09:14:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 09:14:21 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v2] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 08:54:55 GMT, Christian Hagedorn wrote: >> test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 251: >> >>> 249: continue outer; >>> 250: } >>> 251: } >> >> You could probably do this with an `stream().anyMatch()`: >> Suggestion: >> >> if (scenarios.stream().anyMatch(s -> s.equals(expectedScenarioFlags))) { >> continue; >> } >> >> You may even be able to further simplify the lambda in there, to get rid of the `s`. > > Thanks for the suggestion. `s` in your code is a `Scenario` while `expectedScenarioFlags` is a `Set`. But maybe you meant, to first `map()` the scenarios to `getFlags()` and then do the comparison. That would work but `getFlags()` returns a `List`. We first need to convert it to a `Set` in order to call `equals()` which ignores the order. It could look like this: > > if (scenariosFromCrossProduct.stream() > .map(Scenario::getFlags) > .map(HashSet::new) > .anyMatch(flags -> flags.equals(expectedScenarioFlags))) { > continue; > } > ``` > I'm fine with both. Hmm yes, I suppose I overlooked some details. I'm fine with both as well. Now my solution looks not as simple as I had hoped it would. Instead of making it a `HashSet`, I would use `Set::copyOf`. Actually, you should do that anyway. It gives you an unmodifyable set, and allows the JDK to pick what it prefers for that. But this is really a nit upon a nit ? 
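(A minimal, self-contained sketch of the flag comparison with `Set::copyOf` as suggested above. The `Scenario` type here is a stand-in; the only assumption carried over from the thread is that `getFlags()` returns a `List<String>`.)

```
import java.util.List;
import java.util.Set;

// Sketch of the comparison discussed above, using Set.copyOf instead of
// new HashSet<>() so the sets are unmodifiable and the comparison ignores order.
public class ScenarioFlagMatchSketch {
    // Stand-in for the IR framework's Scenario; only getFlags() matters here.
    record Scenario(List<String> flags) {
        List<String> getFlags() { return flags; }
    }

    static boolean containsFlags(List<Scenario> scenarios, Set<String> expectedFlags) {
        return scenarios.stream()
                .map(Scenario::getFlags)
                .map(Set::copyOf)
                .anyMatch(flags -> flags.equals(expectedFlags));
    }

    public static void main(String[] args) {
        List<Scenario> scenarios = List.of(
                new Scenario(List.of("-XX:+UseA", "-XX:+UseB")), // hypothetical flags
                new Scenario(List.of("-XX:+UseC")));
        System.out.println(containsFlags(scenarios, Set.of("-XX:+UseB", "-XX:+UseA"))); // true
    }
}
```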
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27672#discussion_r2428461910 From mchevalier at openjdk.org Tue Oct 14 09:22:32 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 14 Oct 2025 09:22:32 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal [v2] In-Reply-To: References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> Message-ID: On Mon, 13 Oct 2025 14:35:30 GMT, Roland Westrelin wrote: > They would have to all go through that transition. For consistency yes. But yet, I think I recall some functions that are not called with a compile-time constant, so we can't do that everywhere. Technically, calling a function that takes it as parameter from the templated version, and just passing our template argument is fine. What is not (easily) possible is normal parameter -> template. But again, that was just "for fun". ------------- PR Comment: https://git.openjdk.org/jdk/pull/27725#issuecomment-3400894799 From epeter at openjdk.org Tue Oct 14 09:43:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 09:43:41 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal [v4] In-Reply-To: References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> Message-ID: On Mon, 13 Oct 2025 14:38:50 GMT, Roland Westrelin wrote: >> This change refactor code that's similar for LShiftINode and >> LShiftLNode into shared methods. I also added extra test cases to >> cover all transformations. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: > > - review > - Merge branch 'master' into JDK-8369167 > - review > - sort headers > - more > - more > - more > - more > - more > - fix Seems good to me, thanks for this cleanup @rwestrel ! I have only a few minor suggestions. src/hotspot/share/opto/mulnode.cpp line 1082: > 1080: // Left input is an add of a constant? > 1081: const TypeInteger* t12 = phase->type(add1->in(2))->isa_integer(bt); > 1082: if (t12 && t12->is_con()) { // Left input is an add of a con? Suggestion: if (t12 != nullptr && t12->is_con()) { // Left input is an add of a con? Implicit null check not allowed by hotspot style guide, so we should fix it when we touch it ;) src/hotspot/share/opto/mulnode.cpp line 1084: > 1082: if (t12 && t12->is_con()) { // Left input is an add of a con? > 1083: // Compute X << con0 > 1084: Node *lsh = phase->transform(LShiftNode::make( add1->in(1), in(2), bt)); Suggestion: Node* lsh = phase->transform(LShiftNode::make(add1->in(1), in(2), bt)); src/hotspot/share/opto/mulnode.cpp line 1086: > 1084: Node *lsh = phase->transform(LShiftNode::make( add1->in(1), in(2), bt)); > 1085: // Compute X< 1086: return AddNode::make( lsh, phase->integercon(java_shift_left(t12->get_con_as_long(bt), con, bt), bt), bt); Suggestion: return AddNode::make(lsh, phase->integercon(java_shift_left(t12->get_con_as_long(bt), con, bt), bt), bt); Fixing spaces src/hotspot/share/opto/mulnode.cpp line 1189: > 1187: Node* LShiftINode::Ideal(PhaseGVN *phase, bool can_reshape) { > 1188: return IdealIL(phase, can_reshape, T_INT); > 1189: } I fear that putting the comments here will make them go out of date quicker than putting them at `IdealIL`. Seems the list here may not even be complete. What about this one? `((x >> C1) & Y) << C2` There could be more. 
src/hotspot/share/opto/mulnode.cpp line 1225: > 1223: > 1224: uint shift = r2->get_con(); > 1225: shift &= bits_per_java_integer(bt)-1; // semantics of Java shifts Suggestion: shift &= bits_per_java_integer(bt) - 1; // semantics of Java shifts src/hotspot/share/opto/mulnode.cpp line 1256: > 1254: > 1255: //------------------------------Value------------------------------------------ > 1256: // A LShiftINode shifts its input2 left by input1 amount. I would remove such a comment, it is rather useless here. If anything, such a comment belongs at the class definition. src/hotspot/share/opto/mulnode.cpp line 1282: > 1280: // Also collapse nested left-shifts with constant rhs: > 1281: // (X << con1) << con2 ==> X << (con1 + con2) > 1282: Node* LShiftLNode::Ideal(PhaseGVN* phase, bool can_reshape) { Same here: comment will go out of sync because it is not close to the implementation. I would move it closer to the implementation. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27725#pullrequestreview-3334118372 PR Review Comment: https://git.openjdk.org/jdk/pull/27725#discussion_r2428076140 PR Review Comment: https://git.openjdk.org/jdk/pull/27725#discussion_r2428076683 PR Review Comment: https://git.openjdk.org/jdk/pull/27725#discussion_r2428079381 PR Review Comment: https://git.openjdk.org/jdk/pull/27725#discussion_r2428512329 PR Review Comment: https://git.openjdk.org/jdk/pull/27725#discussion_r2428518490 PR Review Comment: https://git.openjdk.org/jdk/pull/27725#discussion_r2428527472 PR Review Comment: https://git.openjdk.org/jdk/pull/27725#discussion_r2428531179 From epeter at openjdk.org Tue Oct 14 09:46:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 09:46:40 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v2] In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 11:38:07 GMT, Beno?t Maillard wrote: >> test/hotspot/jtreg/compiler/loopopts/TestVerifyLoopOptimizationsHitsMemLimit.java line 123: >> >>> 121: public static void main(String[] t) { >>> 122: try { >>> 123: test(t); >> >> Suggestion: >> >> test(t); >> throw new RuntimeException("The expected NPE do not seen"); > > Thanks for the suggestion. I would argue that this does not really add value, as this essentially boils down to checking that accessing an uninitialized reference throws a `NullPointerException`, which is not really what this test is about. I would rather keep it specific. @benoitmaillard Drive-by comment: if you can add it, I would. Almost all checks add value, even if they don't add value to exactly what you are testing right now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2428548329 From epeter at openjdk.org Tue Oct 14 09:51:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 09:51:32 GMT Subject: RFR: 8369511: PPC64: compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without VectorRegisters In-Reply-To: <8Jkjo4Y7ZGPV-YOJPw4jdfUTUkyQUVrZLwOverU4ElE=.8cca0973-57a6-4ba2-8e44-87fb43e00150@github.com> References: <8Jkjo4Y7ZGPV-YOJPw4jdfUTUkyQUVrZLwOverU4ElE=.8cca0973-57a6-4ba2-8e44-87fb43e00150@github.com> Message-ID: On Fri, 10 Oct 2025 16:27:30 GMT, Martin Doerr wrote: > Disabling the test for Power8 (see JBS issue). 
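(Returning to the `TestVerifyLoopOptimizationsHitsMemLimit` review above: the "fail if the expected exception never happens" idiom being discussed usually looks roughly like the following minimal sketch. The `test()` body and the NPE-producing field access are hypothetical stand-ins, not the actual test.)

```
// Minimal sketch of a test that must end in a NullPointerException.
// Reaching the line after test() means the expected exception never happened.
public class ExpectedExceptionSketch {
    static Object field; // intentionally left null

    static void test(String[] args) {
        // Hypothetical body: touching the uninitialized field throws NPE.
        System.out.println(field.hashCode());
    }

    public static void main(String[] args) {
        try {
            test(args);
            throw new RuntimeException("Expected NullPointerException was not thrown");
        } catch (NullPointerException e) {
            // Expected.
        }
    }
}
```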
test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 40: > 38: * @requires (os.arch != "riscv64" & os.arch != "ppc64" & os.arch != "ppc64le") | > 39: * (os.arch == "riscv64" & vm.cpu.features ~= ".*rvv.*") | > 40: * ((os.arch == "ppc64" | os.arch == "ppc64le") & vm.cpu.features ~= ".*darn.*") Drive-by comment: This is getting more convoluted now. I think it would make sense to at least document why we are skipping it for all the platforms. You should also state the reason for the change in the PR description as well as on JIRA ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27749#discussion_r2428563097 From qxing at openjdk.org Tue Oct 14 09:53:22 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 14 Oct 2025 09:53:22 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v5] In-Reply-To: References: Message-ID: > In `PhaseIdealLoop`, `IdealLoopTree::check_safepts` method checks if any call that is guaranteed to have a safepoint dominates the tail of the loop. In the previous implementation, `check_safepts` would stop if it found a local non-call safepoint. At this time, if there was a call before the safepoint in the dom-path, this safepoint would not be eliminated. > > loop-safepoint > > This patch changes the behavior of `check_safepts` to not stop when it finds a non-local safepoint. This makes simple loops with one method call ~3.8% faster (on aarch64). > > > Benchmark Mode Cnt Score Error Units > LoopSafepoint.loopVar avgt 15 208296.259 ? 1350.409 ns/op # baseline > LoopSafepoint.loopVar avgt 15 200692.874 ? 616.770 ns/op # this patch > > > Testing: tier1-2 on x86_64 and aarch64. Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision: - Update comments in `loopnode.cpp` - Move IR test to `compiler.loopopts` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23057/files - new: https://git.openjdk.org/jdk/pull/23057/files/ba6e7e79..b52f7ba1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23057&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23057&range=03-04 Stats: 23 lines in 2 files changed: 3 ins; 1 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/23057.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23057/head:pull/23057 PR: https://git.openjdk.org/jdk/pull/23057 From qxing at openjdk.org Tue Oct 14 09:53:29 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 14 Oct 2025 09:53:29 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v3] In-Reply-To: References: Message-ID: On Tue, 16 Sep 2025 05:44:08 GMT, Emanuel Peter wrote: >> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve documentation comments > > src/hotspot/share/opto/loopnode.cpp line 3818: > >> 3816: // / | | >> 3817: // v +--+ >> 3818: // exit 4 > > This drawing seems a bit confusing. There seem to be 3 edges coming out of 2. > Do you think you could fix it too, just to create more clarity in the code? I've re-drawn the graph. > src/hotspot/share/opto/loopnode.cpp line 3830: > >> 3828: // >> 3829: // The insights into the problem: >> 3830: // A) Counted loops are okay > > What does it mean to be "okay"? Why are they "okay"? Added more comments. > It seems the logic was: only outer loops need to mark safepoints for protection, because only loops further in can remove safepoints. Is that still correct? 
That's correct. Updated this comment. > test/hotspot/jtreg/compiler/c2/irTests/TestLoopSafepoint.java line 24: > >> 22: */ >> 23: >> 24: package compiler.c2.irTests; > > We'd like to get away from putting all IR tests in `irTests`, and we'd rather put them into thematic directories. > Proposal: `compiler/loopopts/TestRedundantSafePointElimination.java` Moved to `compiler/loopopts/TestRedundantSafepointElimination.java`. > test/hotspot/jtreg/compiler/c2/irTests/TestLoopSafepoint.java line 33: > >> 31: * @summary Tests that redundant safepoints can be eliminated in loops. >> 32: * @library /test/lib / >> 33: * @requires vm.compiler2.enabled > > Is this `@requires` strictly required? If not, remove it so we can run these tests also with C1 and other compilers. Removed the `@requires`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2428559736 PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2428561137 PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2428563274 PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2428564441 PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2428565010 From rsunderbabu at openjdk.org Tue Oct 14 10:51:37 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Tue, 14 Oct 2025 10:51:37 GMT Subject: RFR: 8369806: Remove nsk/jvmti/AttachOnDemand/attach020 from problemlist Message-ID: vmTestbase/nsk/jvmti/AttachOnDemand/attach020/TestDescription.java no longer fails with latest JDK. Probably GC got better during this time. I tested with both -Xcomp and with -Xcomp -XX:UseZGC in latest builds. It was run for 100 iterations in all debug platforms. I was not able to reproduce the issue. So, unproblemlisting this test to see if the issue still persists. ------------- Commit messages: - 8369806: Remove nsk/jvmti/AttachOnDemand/attach020 from problemlist Changes: https://git.openjdk.org/jdk/pull/27795/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27795&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369806 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27795.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27795/head:pull/27795 PR: https://git.openjdk.org/jdk/pull/27795 From roland at openjdk.org Tue Oct 14 11:36:11 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 14 Oct 2025 11:36:11 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v2] In-Reply-To: <7Lx4UPBkbTwHAlXvmg2ekbKfZ2Z9GNmN9Kywkje5dxI=.ca6adbf4-d1b9-4efa-ab00-03d2bb84562b@github.com> References: <2cQ6gXCmY3uz_H3H_Sks0KZbQP4G5V3PRP8QSxiRG6g=.44a63dc4-1078-4762-bef5-ac3e4a995ca3@github.com> <-B8WCJ970Pbuh7Ur4Hz51ABRdASG2SQHqqCCF2kMd6A=.63385692-29f7-4e5e-8179-d5b369878af1@github.com> <6baIrfwlGjkQPIVogY2aIX6VzQainACv_-4IsVXWOpg=.67d79427-9320-4a0a-93ef-d932bdf5eb58@github.com> <7Lx4UPBkbTwHAlXvmg2ekbKfZ2Z9GNmN9Kywkje5dxI=.ca6adbf4-d1b9-4efa-ab00-03d2bb84562b@github.com> Message-ID: On Tue, 14 Oct 2025 06:09:06 GMT, Roberto Casta?eda Lozano wrote: > > I wonder if we should do a renaming then? > > I agree, have been confused by this in the past as well. Maybe `BulkCopyNode` would be a better name? 
I filed https://bugs.openjdk.org/browse/JDK-8369821 ------------- PR Comment: https://git.openjdk.org/jdk/pull/27604#issuecomment-3401353690 From roland at openjdk.org Tue Oct 14 11:41:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 14 Oct 2025 11:41:14 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v3] In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 15:51:55 GMT, Emanuel Peter wrote: > > inlined2() calls clone() for an object loaded from field field > > that has inexact type A at parse time. The intrinsic for clone() > > inserts an Allocate and an ArrayCopy nodes. When igvn runs, the > > load of field is optimized out because it reads back a newly > > allocated B written to field in the same method. ArrayCopy can > > now be optimized because the type of its src input is known. The > > type of its dest input is the CheckCastPP from the allocation of > > the cloned object created at parse time. That one has type A. A > > series of Loads/Stores are created to copy the fields of class B > > from src (of type B) to dest of (type A). > > I'm still struggling to understand. I wonder if the test can be further simplified to make the case more clear. > > Am I understanding right, that we essentially this: > > ``` > field = new B(42, 42, 42); > A a = field; > return (A)a.clone(); > ``` > > What should the result of that be? An `A` or a `B`? I think we should be getting a `B`, right? So why is the `dest` of the `ArrayCopy` an `A`? Is that even correct? The type of `a` is initially `A` and inexact (that is `A` or some subclass). The type of the result of the clone is the same. The type of `a` is then refined to `B` exact (so only class `B`). `A` inexact is correct as dest for `ArrayCopy`. `B` exact is correct too. `A` exact would be incorrect. > > > The fix I propose is to skip ArrayCopyNode::try_clone_instance() > > when src and dest classes don't match as this seems like a rare > > enough corner case. > > How do you know that this is a rare case? Did you do some kind of profiling / benchmarking? Instead of the test that I propose here, I initially added an assert. I ran quite a bit of testing with it (ctw + jtreg) and, if I remember correctly, there was a single failure with one of compiler jtreg tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27604#issuecomment-3401367577 From roland at openjdk.org Tue Oct 14 11:42:53 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 14 Oct 2025 11:42:53 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal [v5] In-Reply-To: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> Message-ID: > This change refactor code that's similar for LShiftINode and > LShiftLNode into shared methods. I also added extra test cases to > cover all transformations. 
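(As an aside on the LShiftI/LShiftL rewrites reviewed above, here is a small self-contained Java check of the underlying identities — my own sketch, not C2 code: `(x + c1) << c2 == (x << c2) + (c1 << c2)` always holds in wrapping arithmetic, while folding nested shifts `(x << c1) << c2 == x << (c1 + c2)` is only valid while `c1 + c2` stays below the bit width.)

```
// Self-contained check of the shift identities behind the LShift Ideal() rewrites.
public class LShiftIdentitiesSketch {
    public static void main(String[] args) {
        int x = 0x12345678;
        int c1 = 3;
        int c2 = 5;

        // Always true in wrapping arithmetic: the shift distributes over the add.
        System.out.println(((x + c1) << c2) == ((x << c2) + (c1 << c2))); // true

        // Nested shifts fold only while c1 + c2 stays below 32 for int.
        System.out.println(((x << c1) << c2) == (x << (c1 + c2)));        // true, since 3 + 5 < 32

        // Counterexample once the sum reaches 32: the two-step shift drops all bits,
        // but a single shift by 32 is masked to a shift by 0.
        System.out.println((1 << 16 << 16) + " vs " + (1 << (16 + 16))); // 0 vs 1
    }
}
```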
Roland Westrelin has updated the pull request incrementally with four additional commits since the last revision: - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27725/files - new: https://git.openjdk.org/jdk/pull/27725/files/040d8541..6865e1c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27725&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27725&range=03-04 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/27725.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27725/head:pull/27725 PR: https://git.openjdk.org/jdk/pull/27725 From mbaesken at openjdk.org Tue Oct 14 11:43:11 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 14 Oct 2025 11:43:11 GMT Subject: RFR: 8369642: [ubsan] nmethod::nmethod null pointer passed as argument 2 to memcpy In-Reply-To: <3kb34wTVzDNE3qDozi9eO9_kWuJXfz4GPVcKTRo4veM=.9ae36b69-d193-4d4e-b993-6d6a35af245b@github.com> References: <3kb34wTVzDNE3qDozi9eO9_kWuJXfz4GPVcKTRo4veM=.9ae36b69-d193-4d4e-b993-6d6a35af245b@github.com> Message-ID: On Mon, 13 Oct 2025 23:56:20 GMT, Chad Rakoczy wrote: > [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced a counter so that the nmethod immutable data can be shared between relocated nmethods to eliminate an unnecessary copy. The counter is aligned in memory so that must be taken into account when calculating the amount of memory used by the counter This fixes the ubsan - issue reported ! ------------- Marked as reviewed by mbaesken (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27778#pullrequestreview-3335149901 From roland at openjdk.org Tue Oct 14 11:59:27 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 14 Oct 2025 11:59:27 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal [v6] In-Reply-To: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> Message-ID: > This change refactor code that's similar for LShiftINode and > LShiftLNode into shared methods. I also added extra test cases to > cover all transformations. 
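(As an aside on the nmethod immutable-data counter review above: the alignment accounting it mentions boils down to rounding the counter's size up to its alignment, so the space occupied is the aligned-up size rather than just the raw size of the counter. A generic sketch of that arithmetic — plain illustration, not the HotSpot code:)

```
// Generic align-up helper: rounds a size up to the next multiple of a
// power-of-two alignment, which is what "the counter is aligned in memory"
// implies for the space calculation.
public class AlignUpSketch {
    static int alignUp(int size, int alignment) {
        // alignment must be a power of two
        return (size + alignment - 1) & -alignment;
    }

    public static void main(String[] args) {
        System.out.println(alignUp(4, 8));  // 8: a 4-byte counter still occupies 8 aligned bytes
        System.out.println(alignUp(13, 8)); // 16
        System.out.println(alignUp(16, 8)); // 16
    }
}
```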
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27725/files - new: https://git.openjdk.org/jdk/pull/27725/files/6865e1c4..29edbe64 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27725&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27725&range=04-05 Stats: 15 lines in 1 file changed: 1 ins; 12 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27725.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27725/head:pull/27725 PR: https://git.openjdk.org/jdk/pull/27725 From roland at openjdk.org Tue Oct 14 11:59:30 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 14 Oct 2025 11:59:30 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal [v4] In-Reply-To: References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> Message-ID: On Tue, 14 Oct 2025 09:40:26 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: >> >> - review >> - Merge branch 'master' into JDK-8369167 >> - review >> - sort headers >> - more >> - more >> - more >> - more >> - more >> - fix > > Seems good to me, thanks for this cleanup @rwestrel ! I have only a few minor suggestions. @eme64 new commit should address you comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27725#issuecomment-3401421638 From dbriemann at openjdk.org Tue Oct 14 13:58:05 2025 From: dbriemann at openjdk.org (David Briemann) Date: Tue, 14 Oct 2025 13:58:05 GMT Subject: RFR: 8369444: JavaFrameAnchor on PPC64 has unnecessary barriers In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 00:47:01 GMT, Dean Long wrote: > If all CPU ports follow this, then it seems like we could eventually implement JavaFrameAnchor in shared code without CPU-specific parts. Seems like a good idea. However there are still differences for the different CPUs. E.g. aarch64 still contains a release memory barrier. This might be a good follow-up task. I would like to deliver this cleanup for PPC first. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27768#issuecomment-3402020226 From mdoerr at openjdk.org Tue Oct 14 14:26:48 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 14 Oct 2025 14:26:48 GMT Subject: RFR: 8369444: JavaFrameAnchor on PPC64 has unnecessary barriers In-Reply-To: References: Message-ID: <8YL9De1t7kWVVeAMUMM5qFcm5UftcHklTJAmg8UVoLU=.45e089f9-d719-41ee-8e11-26849f81e956@github.com> On Mon, 13 Oct 2025 11:48:42 GMT, David Briemann wrote: > No hardware barriers are necessary. All members are volatile and the profiler is run from a signal handler and only observers the thread its running on. LGTM. src/hotspot/cpu/ppc/javaFrameAnchor_ppc.hpp line 76: > 74: intptr_t* last_Java_fp() const { return *(intptr_t**)_last_Java_sp; } > 75: > 76: void set_last_Java_sp(intptr_t* sp) { _last_Java_sp = sp; } Would be nice to remove 1 whitespace. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27768#pullrequestreview-3335984117 PR Review Comment: https://git.openjdk.org/jdk/pull/27768#discussion_r2429400259 From lmesnik at openjdk.org Tue Oct 14 14:43:18 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 14 Oct 2025 14:43:18 GMT Subject: RFR: 8369806: Remove nsk/jvmti/AttachOnDemand/attach020 from problemlist In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 10:28:46 GMT, Ramkumar Sunderbabu wrote: > vmTestbase/nsk/jvmti/AttachOnDemand/attach020/TestDescription.java no longer fails with latest JDK. Probably GC got better during this time. > I tested with both -Xcomp and with -Xcomp -XX:UseZGC in latest builds. It was run for 100 iterations in all debug platforms. I was not able to reproduce the issue. > So, unproblemlisting this test to see if the issue still persists. Marked as reviewed by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27795#pullrequestreview-3336059985 From duke at openjdk.org Tue Oct 14 14:51:47 2025 From: duke at openjdk.org (duke) Date: Tue, 14 Oct 2025 14:51:47 GMT Subject: RFR: 8369806: Remove nsk/jvmti/AttachOnDemand/attach020 from problemlist In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 10:28:46 GMT, Ramkumar Sunderbabu wrote: > vmTestbase/nsk/jvmti/AttachOnDemand/attach020/TestDescription.java no longer fails with latest JDK. Probably GC got better during this time. > I tested with both -Xcomp and with -Xcomp -XX:UseZGC in latest builds. It was run for 100 iterations in all debug platforms. I was not able to reproduce the issue. > So, unproblemlisting this test to see if the issue still persists. @rsunderbabu Your change (at version 7e36b5c504ca6883d6234dc1a51a1f13ea03b791) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27795#issuecomment-3402247193 From chagedorn at openjdk.org Tue Oct 14 15:33:55 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 14 Oct 2025 15:33:55 GMT Subject: RFR: 8369806: Remove nsk/jvmti/AttachOnDemand/attach020 from problemlist In-Reply-To: References: Message-ID: <2nxhM1PC-jCLulC49lDokJuD8LxRaK77WiWHqawgUiE=.43cfd50c-34f2-4f2c-807f-db44323157f1@github.com> On Tue, 14 Oct 2025 10:28:46 GMT, Ramkumar Sunderbabu wrote: > vmTestbase/nsk/jvmti/AttachOnDemand/attach020/TestDescription.java no longer fails with latest JDK. Probably GC got better during this time. > I tested with both -Xcomp and with -Xcomp -XX:UseZGC in latest builds. It was run for 100 iterations in all debug platforms. I was not able to reproduce the issue. > So, unproblemlisting this test to see if the issue still persists. Looks good and trivial, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27795#pullrequestreview-3336287075 From rsunderbabu at openjdk.org Tue Oct 14 15:36:58 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Tue, 14 Oct 2025 15:36:58 GMT Subject: Integrated: 8369806: Remove nsk/jvmti/AttachOnDemand/attach020 from problemlist In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 10:28:46 GMT, Ramkumar Sunderbabu wrote: > vmTestbase/nsk/jvmti/AttachOnDemand/attach020/TestDescription.java no longer fails with latest JDK. Probably GC got better during this time. > I tested with both -Xcomp and with -Xcomp -XX:UseZGC in latest builds. It was run for 100 iterations in all debug platforms. I was not able to reproduce the issue. > So, unproblemlisting this test to see if the issue still persists. 
This pull request has now been integrated. Changeset: 64ff7062 Author: Ramkumar Sunderbabu Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/64ff7062c1cef13d16acddbcaf5401d7c2ad6dc0 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8369806: Remove nsk/jvmti/AttachOnDemand/attach020 from problemlist Reviewed-by: lmesnik, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/27795 From chagedorn at openjdk.org Tue Oct 14 15:42:11 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 14 Oct 2025 15:42:11 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v3] In-Reply-To: References: Message-ID: > The test ` testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java` intermittently timed out in our CI. On my local machine, I measured ~80s which is quite long given that we actually only want to test that the Cartesian product for scenarios works. > > #### Reduce Execution Time by not Executing the Scenarios > I had a closer look at the test to try to cut the execution time down. Currently, we are executing many IR framework runs with different Cartesian products for scenarios. Afterwards, we compare the output of IR matching to check if the computation of the Cartesian products was correct. However, since we are only interested in verifying that the Cartesian product computation for scenarios works (i.e. `addCrossProductScenarios()`), we could skip the actual execution of the scenarios themselves - we can trust that the IR framework is already tested well enough. > > To achieve that, we can use reflection to get the scenarios added to the IR framework (I don't want to add a public accessor because a user should not need access to them) and then fetch the corresponding scenario flags and compare against our expectation. That's what I propose with this change. > > #### Changes > - Verification without actually running scenarios. > - Added a test passing 3 sets to `addCrossProductScenarios()` which was missing before. > - Improved `addScenarios()` where we added a scenario to the list even though it already existed. That normally does not matter because we are throwing a `TestFormatException` anyway afterwards. But it messes with the test: We are adding the duplicated scenario and then reading it again in the verification part of the test. > - Refactored the test a little more. > - Refactored some small things in `addCrossProductScenarios()` while looking at it. > - Added a sentence about passing a single set to `addCrossProductScenarios()`, since it was not evident from the method comment what happens in that case.
> > #### Execution Time Comparison > Measured on my local machine: > - Mainline: ~80s > - With patch: ~2-3s > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: add simple end to end test + review comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27672/files - new: https://git.openjdk.org/jdk/pull/27672/files/0ea118dc..9331b65a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27672&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27672&range=01-02 Stats: 72 lines in 1 file changed: 53 ins; 12 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/27672.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27672/head:pull/27672 PR: https://git.openjdk.org/jdk/pull/27672 From chagedorn at openjdk.org Tue Oct 14 15:42:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 14 Oct 2025 15:42:13 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v2] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 08:46:23 GMT, Christian Hagedorn wrote: >> The test ` testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java` intermittently timed out in our CI. On my local machine, I measured ~80s which is quite long given that we actually only want to test that Cartesian product for scenarios work. >> >> #### Reduce Execution Time by not Executing the Scenarios >> I had a closer look at the test to try to cut the execution time down. Currently, we are executing many IR framework runs with different Cartesian products for scenarios. Afterwards, we compare the output of IR matching to check if the computation of the Cartesian products were correct. However, since we are only interested in verifying that the Cartesian product computation for scenarios works (i.e. `addCrossProductScenarios()`), we could skip the actual execution of the scenarios itself - we can trust that the IR framework is already tested well enough. >> >> To achieve that, we can use reflection to get the added scenarios to the IR framework (I don't want to add a public accessor because a user should not need access to them) and then fetch the corresponding scenario flags and compare against our expectation. That's what I propose with this change. >> >> #### Changes >> - Verification without actually running scenarios. >> - Added a test passing 3 sets to `addCrossProductScenarios()` which was missing before. >> - Improved `addScenarios()` where we added a scenario to the list even though it already existed. That normally does not matter because we are throwing a `TestFormatException` anyway afterwards. But it messes with the test: We are adding the duplicated scenario and then read it again in the verification part of the test. >> - Refactored the test a little more. >> - Refactored some small things in `addCrossProductScenarios()` while looking at it. >> - Added a sentence about passing a single set to `addCrossProductScenarios()` which was not evidently clear what is happening when looking at the method comment. >> >> #### Execution Time Comparison >> Measured on my local machine: >> - Mainline: ~80s >> - With patch: ~2-3s >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8369232 > - add missing test > - 8369236: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out After discussing it further offline (thanks for the discussion!), you convinced me that we should at least add a sanity end-to-end test to verify the `TestFramework.addCrossProductScenarios()` is properly working. I added such a test but removed the reliance on IR matching with flag constraints matching. I instead parse the stderr output to search for the used scenario flags. I also pushed an update about the `Set::copyOf` comment further up. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27672#issuecomment-3402467737 From epeter at openjdk.org Tue Oct 14 15:47:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 15:47:14 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v3] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 15:42:11 GMT, Christian Hagedorn wrote: >> The test ` testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java` intermittently timed out in our CI. On my local machine, I measured ~80s which is quite long given that we actually only want to test that Cartesian product for scenarios work. >> >> #### Reduce Execution Time by not Executing the Scenarios >> I had a closer look at the test to try to cut the execution time down. Currently, we are executing many IR framework runs with different Cartesian products for scenarios. Afterwards, we compare the output of IR matching to check if the computation of the Cartesian products were correct. However, since we are only interested in verifying that the Cartesian product computation for scenarios works (i.e. `addCrossProductScenarios()`), we could skip the actual execution of the scenarios itself - we can trust that the IR framework is already tested well enough. >> >> To achieve that, we can use reflection to get the added scenarios to the IR framework (I don't want to add a public accessor because a user should not need access to them) and then fetch the corresponding scenario flags and compare against our expectation. That's what I propose with this change. >> >> #### Changes >> - Verification without actually running scenarios. >> - Added a test passing 3 sets to `addCrossProductScenarios()` which was missing before. >> - Improved `addScenarios()` where we added a scenario to the list even though it already existed. That normally does not matter because we are throwing a `TestFormatException` anyway afterwards. But it messes with the test: We are adding the duplicated scenario and then read it again in the verification part of the test. >> - Refactored the test a little more. >> - Refactored some small things in `addCrossProductScenarios()` while looking at it. >> - Added a sentence about passing a single set to `addCrossProductScenarios()` which was not evidently clear what is happening when looking at the method comment. >> >> #### Execution Time Comparison >> Measured on my local machine: >> - Mainline: ~80s >> - With patch: ~2-3s >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > add simple end to end test + review comment Nice, thanks for the updates. 
Looks good to me now :) test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 266: > 264: private static void assertSameResultWhenManuallyAdding(List scenariosFromCrossProduct, > 265: Set> expectedScenariosWithFlags) { > 266: List expectedScenarios = getScenariosWIthFlags(expectedScenariosWithFlags); Suggestion: List expectedScenarios = getScenariosWithFlags(expectedScenariosWithFlags); typo? test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 271: > 269: } > 270: > 271: private static List getScenariosWIthFlags(Set> expectedScenariosWithFlags) { Suggestion: private static List getScenariosWithFlags(Set> expectedScenariosWithFlags) { looks like a typo test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 332: > 330: // Expected. > 331: System.setErr(originalErr); > 332: Asserts.assertTrue(e.getMessage().contains("The following scenarios have failed: #0, #1, #2, #3")); Can you somehow check that there is nothing after the `#3`? Just to make sure we don't have a `#4` ;) Optional if too much work. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27672#pullrequestreview-3336331741 PR Review Comment: https://git.openjdk.org/jdk/pull/27672#discussion_r2429647560 PR Review Comment: https://git.openjdk.org/jdk/pull/27672#discussion_r2429646743 PR Review Comment: https://git.openjdk.org/jdk/pull/27672#discussion_r2429651439 From mhaessig at openjdk.org Tue Oct 14 15:54:13 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 14 Oct 2025 15:54:13 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v4] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 05:13:36 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). 
Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix TestMethodArguments.java after merge with master Thank you for your continued effort on this, @eme64! Better to do this now than later. I have so far only looked at the tutorial, but came up with some questions. I will continue looking at the rest, later. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 423: > 421: """ > 422: static int v3d_#{x} = #a + #b; > 423: """ I do not understand how `a` escapes the let without a `transparentScope()` like for `b`. Also the paragraph above does not really explain what the trick is that lets us simulate a lambda-less `let`. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 469: > 467: // the top of the class, and insert a field. > 468: // > 469: // The choice of transparency of an insertion scope is quite important. A common use case What is an "insertion scope"? test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 498: > 496: // > 497: // Which method is used is up to the user. General guidance is if the same code may be > 498: // inserted elsewhere, one should lean towards inserting templates. But in many cases Suggestion: // Which method is used is up to the user. General guidance is if the same code may also // be inserted elsewhere, one should lean towards inserting templates. But in many cases Just a small tweak to emphasize multiple usages. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 521: > 519: // Anchoring a Hook creates a scope, spanning the braces of the > 520: // "anchor" call. Any Hook.insert that happens inside this scope > 521: // goes to the top of that scope. This first sentence is a bit strange when we have to write an explicit `scope` since we have to write it. But I do not have a better wording. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 577: > 575: static { System.out.println("Defining static field $field"); } > 576: public static int $field = #value; > 577: """ Why do we not just write `public static int $field = 5;`? Is this just for demonstration purposes, or am I missing something more fundamental? test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 831: > 829: // Define a static field. > 830: // Note: it is very important that we use a "transparentScope" for the template here, > 831: // so that the DataName can escape to outer scopes. 
We could also use `hashtagScope` and achieve the same thing, could we not? ------------- PR Review: https://git.openjdk.org/jdk/pull/27255#pullrequestreview-3336158396 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2429525666 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2429538149 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2429547199 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2429575893 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2429591266 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2429605801 From epeter at openjdk.org Tue Oct 14 15:57:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 15:57:55 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal [v4] In-Reply-To: References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> Message-ID: On Tue, 14 Oct 2025 11:55:36 GMT, Roland Westrelin wrote: >> Seems good to me, thanks for this cleanup @rwestrel ! I have only a few minor suggestions. > > @eme64 new commit should address you comments. @rwestrel thanks for the updates! The code looks good to me now. I'm running internal testing again before approval :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27725#issuecomment-3402552149 From rsunderbabu at openjdk.org Tue Oct 14 16:15:10 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Tue, 14 Oct 2025 16:15:10 GMT Subject: RFR: 8369806: Remove nsk/jvmti/AttachOnDemand/attach020 from problemlist In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 14:41:00 GMT, Leonid Mesnik wrote: >> vmTestbase/nsk/jvmti/AttachOnDemand/attach020/TestDescription.java no longer fails with latest JDK. Probably GC got better during this time. >> I tested with both -Xcomp and with -Xcomp -XX:UseZGC in latest builds. It was run for 100 iterations in all debug platforms. I was not able to reproduce the issue. >> So, unproblemlisting this test to see if the issue still persists. > > Marked as reviewed by lmesnik (Reviewer). Thank you @lmesnik and @chhagedorn ------------- PR Comment: https://git.openjdk.org/jdk/pull/27795#issuecomment-3402651731 From epeter at openjdk.org Tue Oct 14 16:18:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 16:18:06 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v4] In-Reply-To: References: Message-ID: <5VZjJmyNV6bi5IFFSKV0grs2Cxl2XANDQ7pM5PvH88w=.af7395fe-4a74-413c-a04e-932ac0dcd743@github.com> On Tue, 14 Oct 2025 15:02:10 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix TestMethodArguments.java after merge with master > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 423: > >> 421: """ >> 422: static int v3d_#{x} = #a + #b; >> 423: """ > > I do not understand how `a` escapes the let without a `transparentScope()` like for `b`. Also the paragraph above does not really explain what the trick is that lets us simulate a lambda-less `let`. I added this to the comment: + 419 // + 420 // Below we see the standard use of "let", where we add a hashtag replacement for "a" + 421 // for the rest of the enclosing scope. 
We then also use a lambda version of "let" + 422 // with a transparent scope, which means that "b" escapes that scope and is also + 423 // available in the enclosing scope. In the implementation of the framework, we + 424 // actually use a "transparentScope", so the standard "let" is really just syntactic + 425 // sugar for the lambda "let" with "transparentScope". Does that help? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2429743530 From kvn at openjdk.org Tue Oct 14 16:22:47 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 14 Oct 2025 16:22:47 GMT Subject: RFR: 8369642: [ubsan] nmethod::nmethod null pointer passed as argument 2 to memcpy In-Reply-To: <3kb34wTVzDNE3qDozi9eO9_kWuJXfz4GPVcKTRo4veM=.9ae36b69-d193-4d4e-b993-6d6a35af245b@github.com> References: <3kb34wTVzDNE3qDozi9eO9_kWuJXfz4GPVcKTRo4veM=.9ae36b69-d193-4d4e-b993-6d6a35af245b@github.com> Message-ID: <5ToAoT20_ERusvlz4ZgrGs55kQbb-nCAhYzi5wgU63c=.d7418fcd-0ffb-4657-898a-bb14c018e601@github.com> On Mon, 13 Oct 2025 23:56:20 GMT, Chad Rakoczy wrote: > [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced a counter so that the nmethod immutable data can be shared between relocated nmethods to eliminate an unnecessary copy. The counter is aligned in memory so that must be taken into account when calculating the amount of memory used by the counter This is annoying. In all places `ImmutableDataReferencesCounterSize` is referenced we have `align_up(ImmutableDataReferencesCounterSize, oopSize)`. May be we should `#define ImmutableDataReferencesCounterSize oopSize` with comment that we only use 4 bytes for now. We have getter/setter methods which cast to (int*) anyway. src/hotspot/share/code/nmethod.hpp line 654: > 652: #endif > 653: > 654: address immutable_data_references_counter_begin () const { return immutable_data_end() - align_up(ImmutableDataReferencesCounterSize, oopSize) ; } I suggest to move `immutable_data_references_counter_begin()` before `#if INCLUDE_JVMCI` so you can use it instead of duplicating code in `speculations_end()` and `scopes_data_end()`. ------------- PR Review: https://git.openjdk.org/jdk/pull/27778#pullrequestreview-3336502570 PR Review Comment: https://git.openjdk.org/jdk/pull/27778#discussion_r2429757173 From epeter at openjdk.org Tue Oct 14 16:24:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 16:24:16 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v5] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). 
> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java Co-authored-by: Manuel H?ssig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/a855cc4e..816c5b04 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Tue Oct 14 16:24:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 16:24:18 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v4] In-Reply-To: References: Message-ID: <4Jw3Qa6VXH1x0wSqPSTEsDgVcdMekzuxZFTJaANK-j0=.e706d148-ed42-44ef-9695-5d7153d3abee@github.com> On Tue, 14 Oct 2025 15:06:00 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix TestMethodArguments.java after merge with master > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 469: > >> 467: // the top of the class, and insert a field. >> 468: // >> 469: // The choice of transparency of an insertion scope is quite important. A common use case > > What is an "insertion scope"? I extended the comment: `insertion scope (the scope that is inserted)`. I also add a definition of 3 relevant scopes: 472 // In this example, we look at the use of Hooks. They allow us to reach back, to outer 473 // scopes. 
For example, we can reach out from inside a method body to a hook anchored at 474 // the top of the class, and insert a field. 475 // ~ 476 // When we insert to a hook, we have 3 relevant scopes: ~ 477 // - Anchor scope: the scope defined at "hook.anchor(scope(...))" + 478 // - Insertion scope: the scope that is inserted, see "hook.insert(scope(...))" + 479 // - Caller scope: the scope we insert from. + 480 // + 481 // The choice of transparency of an insertion scope (the scope that is inserted) is quite + 482 // important. A common use case is to insert a DataName. 483 // See: generateWithDataNamesForFieldsAndVariables 484 // See: generateWithScopes1 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2429760095 From epeter at openjdk.org Tue Oct 14 16:28:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 16:28:57 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v4] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 15:18:23 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix TestMethodArguments.java after merge with master > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 521: > >> 519: // Anchoring a Hook creates a scope, spanning the braces of the >> 520: // "anchor" call. Any Hook.insert that happens inside this scope >> 521: // goes to the top of that scope. > > This first sentence is a bit strange when we have to write an explicit `scope` since we have to write it. But I do not have a better wording. Modified it slightly: 531 // We anchor a Hook outside the main method, but inside the Class. ~ 532 // Anchoring a Hook requires the definition of an inner scope, ~ 533 // aka the "anchor scope", spanning the braces of the "anchor" call. ~ 534 // Any Hook.insert that happens inside this scope goes to the top of ~ 535 // that scope. The new wording sounds less misleading. The old wording kinda suggested that the scope may be created implicitly, but we have to do it explicitly, hence `requires the definition of an inner scope`. > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 577: > >> 575: static { System.out.println("Defining static field $field"); } >> 576: public static int $field = #value; >> 577: """ > > Why do we not just write `public static int $field = 5;`? Is this just for demonstration purposes, or am I missing something more fundamental? Just for demonstration purposes, yes. I removed it, because it is not that helpful, and seemed to have confused you ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2429770502 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2429774863 From epeter at openjdk.org Tue Oct 14 16:38:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 16:38:33 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v6] In-Reply-To: References: Message-ID: <1v9rJr_jz6k9Zqa0dcfhLN1feWAvnnQdiH5n1gc4VX4=.9ec0da19-feaa-4cb1-9f5e-e819a5a4a480@github.com> > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. 
> > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... 
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8367531-fix-addDataName' of https://github.com/eme64/jdk into JDK-8367531-fix-addDataName - improve tutorial for Manuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/816c5b04..f7d64326 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=04-05 Stats: 28 lines in 1 file changed: 18 ins; 1 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Tue Oct 14 16:38:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 16:38:35 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v4] In-Reply-To: References: Message-ID: <4lZa6u2Nx_IwsZJNA1tk61-GOt96LyQifAqeGnpM91E=.d520542e-aeeb-483a-a58c-08d61bbb916d@github.com> On Tue, 14 Oct 2025 15:51:07 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix TestMethodArguments.java after merge with master > > Thank you for your continued effort on this, @eme64! Better to do this now than later. > > I have so far only looked at the tutorial, but came up with some questions. I will continue looking at the rest, later. @mhaessig Thanks for having a first look! I addressed all your comments / questions. I hope it is a bit better now. > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 831: > >> 829: // Define a static field. >> 830: // Note: it is very important that we use a "transparentScope" for the template here, >> 831: // so that the DataName can escape to outer scopes. > > We could also use `hashtagScope` and achieve the same thing, could we not? Yes, we could. But that's not great style, because hashtags would be implicitly non-transparent at template boundary. But I can add such a comment. 842 // Define a static field. 843 // Note: it is very important that we use a "transparentScope" for the template here, 844 // so that the DataName can escape to outer scopes. + 845 // (We could also use "hashtagScope", since those are also transparent for + 846 // names. But it is not great style, because template boundaries are + 847 // non-transparent for hashtags and setFuelCost anyway. So we might as + 848 // well just use "transparentScope".) 849 var templateStaticField = Template.make("type", (DataName.Type type) -> transparentScope( ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3402730165 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2429785331 From chagedorn at openjdk.org Tue Oct 14 17:09:45 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 14 Oct 2025 17:09:45 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v4] In-Reply-To: References: Message-ID: > The test ` testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java` intermittently timed out in our CI. On my local machine, I measured ~80s which is quite long given that we actually only want to test that Cartesian product for scenarios work. 
> > #### Reduce Execution Time by not Executing the Scenarios > I had a closer look at the test to try to cut the execution time down. Currently, we are executing many IR framework runs with different Cartesian products for scenarios. Afterwards, we compare the output of IR matching to check if the computation of the Cartesian products were correct. However, since we are only interested in verifying that the Cartesian product computation for scenarios works (i.e. `addCrossProductScenarios()`), we could skip the actual execution of the scenarios itself - we can trust that the IR framework is already tested well enough. > > To achieve that, we can use reflection to get the added scenarios to the IR framework (I don't want to add a public accessor because a user should not need access to them) and then fetch the corresponding scenario flags and compare against our expectation. That's what I propose with this change. > > #### Changes > - Verification without actually running scenarios. > - Added a test passing 3 sets to `addCrossProductScenarios()` which was missing before. > - Improved `addScenarios()` where we added a scenario to the list even though it already existed. That normally does not matter because we are throwing a `TestFormatException` anyway afterwards. But it messes with the test: We are adding the duplicated scenario and then read it again in the verification part of the test. > - Refactored the test a little more. > - Refactored some small things in `addCrossProductScenarios()` while looking at it. > - Added a sentence about passing a single set to `addCrossProductScenarios()` which was not evidently clear what is happening when looking at the method comment. > > #### Execution Time Comparison > Measured on my local machine: > - Mainline: ~80s > - With patch: ~2-3s > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: review Emanuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27672/files - new: https://git.openjdk.org/jdk/pull/27672/files/9331b65a..b6d18b59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27672&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27672&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27672.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27672/head:pull/27672 PR: https://git.openjdk.org/jdk/pull/27672 From epeter at openjdk.org Tue Oct 14 17:09:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Oct 2025 17:09:45 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v4] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 17:06:40 GMT, Christian Hagedorn wrote: >> The test ` testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java` intermittently timed out in our CI. On my local machine, I measured ~80s which is quite long given that we actually only want to test that Cartesian product for scenarios work. >> >> #### Reduce Execution Time by not Executing the Scenarios >> I had a closer look at the test to try to cut the execution time down. Currently, we are executing many IR framework runs with different Cartesian products for scenarios. Afterwards, we compare the output of IR matching to check if the computation of the Cartesian products were correct. However, since we are only interested in verifying that the Cartesian product computation for scenarios works (i.e. 
`addCrossProductScenarios()`), we could skip the actual execution of the scenarios itself - we can trust that the IR framework is already tested well enough. >> >> To achieve that, we can use reflection to get the added scenarios to the IR framework (I don't want to add a public accessor because a user should not need access to them) and then fetch the corresponding scenario flags and compare against our expectation. That's what I propose with this change. >> >> #### Changes >> - Verification without actually running scenarios. >> - Added a test passing 3 sets to `addCrossProductScenarios()` which was missing before. >> - Improved `addScenarios()` where we added a scenario to the list even though it already existed. That normally does not matter because we are throwing a `TestFormatException` anyway afterwards. But it messes with the test: We are adding the duplicated scenario and then read it again in the verification part of the test. >> - Refactored the test a little more. >> - Refactored some small things in `addCrossProductScenarios()` while looking at it. >> - Added a sentence about passing a single set to `addCrossProductScenarios()` which was not evidently clear what is happening when looking at the method comment. >> >> #### Execution Time Comparison >> Measured on my local machine: >> - Mainline: ~80s >> - With patch: ~2-3s >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > review Emanuel Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27672#pullrequestreview-3336676761 From chagedorn at openjdk.org Tue Oct 14 17:09:49 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 14 Oct 2025 17:09:49 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v3] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 15:43:55 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> add simple end to end test + review comment > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 332: > >> 330: // Expected. >> 331: System.setErr(originalErr); >> 332: Asserts.assertTrue(e.getMessage().contains("The following scenarios have failed: #0, #1, #2, #3")); > > Can you somehow check that there is nothing after the `#3`? Just to make sure we don't have a `#4` ;) > > Optional if too much work. Good catch! I can just match the `.` at the end :-) Pushed an update. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27672#discussion_r2429874353 From chagedorn at openjdk.org Tue Oct 14 17:49:14 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 14 Oct 2025 17:49:14 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v4] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 17:09:45 GMT, Christian Hagedorn wrote: >> The test ` testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java` intermittently timed out in our CI. On my local machine, I measured ~80s which is quite long given that we actually only want to test that Cartesian product for scenarios work. >> >> #### Reduce Execution Time by not Executing the Scenarios >> I had a closer look at the test to try to cut the execution time down. 
Currently, we are executing many IR framework runs with different Cartesian products for scenarios. Afterwards, we compare the output of IR matching to check if the computation of the Cartesian products were correct. However, since we are only interested in verifying that the Cartesian product computation for scenarios works (i.e. `addCrossProductScenarios()`), we could skip the actual execution of the scenarios itself - we can trust that the IR framework is already tested well enough. >> >> To achieve that, we can use reflection to get the added scenarios to the IR framework (I don't want to add a public accessor because a user should not need access to them) and then fetch the corresponding scenario flags and compare against our expectation. That's what I propose with this change. >> >> #### Changes >> - Verification without actually running scenarios. >> - Added a test passing 3 sets to `addCrossProductScenarios()` which was missing before. >> - Improved `addScenarios()` where we added a scenario to the list even though it already existed. That normally does not matter because we are throwing a `TestFormatException` anyway afterwards. But it messes with the test: We are adding the duplicated scenario and then read it again in the verification part of the test. >> - Refactored the test a little more. >> - Refactored some small things in `addCrossProductScenarios()` while looking at it. >> - Added a sentence about passing a single set to `addCrossProductScenarios()` which was not evidently clear what is happening when looking at the method comment. >> >> #### Execution Time Comparison >> Measured on my local machine: >> - Mainline: ~80s >> - With patch: ~2-3s >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > review Emanuel Thanks Emanuel for your review and the discussions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27672#issuecomment-3402979119 From xgong at openjdk.org Tue Oct 14 18:17:47 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 14 Oct 2025 18:17:47 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v6] In-Reply-To: References: Message-ID: On Wed, 17 Sep 2025 08:48:16 GMT, Xiaohong Gong wrote: >> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform. >> >> ### Background >> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register. >> >> ### Implementation >> >> #### Challenges >> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints. 
>> >> For a 512-bit SVE machine, loading a `byte` vector with different vector species require different approaches: >> - SPECIES_64: Single operation with mask (8 elements, 256-bit) >> - SPECIES_128: Single operation, full register (16 elements, 512-bit) >> - SPECIES_256: Two operations + merge (32 elements, 1024-bit) >> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit) >> >> Use `ByteVector.SPECIES_512` as an example: >> - It contains 64 elements. So the index vector size should be `64 * 32` bits, which is 4 times of the SVE vector register size. >> - It requires 4 times of vector gather-loads to finish the whole operation. >> >> >> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...] >> int[] idx = [0, 1, 2, 3, ..., 63, ...] >> >> 4 gather-load: >> idx_v1 = [15 14 13 ... 1 0] gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa] >> idx_v2 = [31 30 29 ... 17 16] gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb] >> idx_v3 = [47 46 45 ... 33 32] gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc] >> idx_v4 = [63 62 61 ... 49 48] gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd] >> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa] >> >> >> #### Solution >> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end. >> >> Here is the main changes: >> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher. >> - Added `VectorSliceNode` for result mer... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Add more comments for IRs and added method > - Merge branch 'jdk:master' into JDK-8351623-sve > - Merge 'jdk:master' into JDK-8351623-sve > - Address review comments > - Refine IR pattern and clean backend rules > - Fix indentation issue and move the helper matcher method to header files > - Merge branch jdk:master into JDK-8351623-sve > - 8351623: VectorAPI: Add SVE implementation of subword gather load operation Hi @iwanowww , @PaulSandoz , @eme64 , Hope you?re doing well! I?ve created a prototype that moves the implementation to the Java API level, as suggested (see: https://github.com/XiaohongGong/jdk/pull/8). This refactoring has resulted in significantly cleaner and more maintainable code. Thanks for your insightful feedback @iwanowww ! However, it also introduces new issues that we have to consider. The codegen might **not be optimal**. If we want to generate the optimal instruction sequence, we need more effort. Following is the details: 1) We need a new API to cross-lane shift the lanes for a vector mask, which is used to extract different piece of a vector mask if the whole gather operation needs to be split. Consider it has a `Vector.slice()` API which can implement such a function, I added a similar one for `VectorMask`. There are two new issues that I need to address for this API: - SVE lacks a native instruction for such a mask operation. I have to convert it to a vector, call the Vector.slice(), and then convert back to a mask. Please note that the whole progress is **not SVE friendly**. The performance of such an API will have large gap on SVE compared with other arches. 
- To generate a SVE optimal instruction, I have to do further IR transformation and optimize the pattern with match rule. I'm not sure whether the optimization will be common enough to be accepted in future. Do you have a better idea on the new added API? I'd like to avoid adding such a performance not friendly API, and the API might not be frequently used in real world. 2) To make the interface uniform across-platforms, each API is defined as the same vector type of the target result, although we need to do separation and merging. However, as the SVE gather-load instruction works with int vector type, we need special handling in compiler IR-level. I'd like to extend `LoadVectorGather{,Masked}` with `mem_bt` to handle subword loads, adjust mask with cast/resize before and append vector cast/reinterpret after. Splitting into simple IRs make it possible for further IR-level optimization. This might make the compiler IRs different across platforms like what it is in current PR. Hence, the compiler change might not be so clean. Does this make sense to you? 3) Further compiler optimization is necessary to optimize out in-efficient instructions. This needs the combination of IR transformation and match rules. I think this might be more complex, and the result is not guaranteed now. I need further implementation. As a summary, the implementation itself of this API is clean. But it introduces more overhead especially for SVE. It's not so easy for me to make a conclusion whether the Java change wins or not. Any suggestion on this? Thanks, Xiaohong ------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3400397763 From psandoz at openjdk.org Tue Oct 14 18:17:48 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 14 Oct 2025 18:17:48 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v6] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 07:04:28 GMT, Xiaohong Gong wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: >> >> - Add more comments for IRs and added method >> - Merge branch 'jdk:master' into JDK-8351623-sve >> - Merge 'jdk:master' into JDK-8351623-sve >> - Address review comments >> - Refine IR pattern and clean backend rules >> - Fix indentation issue and move the helper matcher method to header files >> - Merge branch jdk:master into JDK-8351623-sve >> - 8351623: VectorAPI: Add SVE implementation of subword gather load operation > > Hi @iwanowww , @PaulSandoz , @eme64 , > > Hope you?re doing well! > > I?ve created a prototype that moves the implementation to the Java API level, as suggested (see: https://github.com/XiaohongGong/jdk/pull/8). This refactoring has resulted in significantly cleaner and more maintainable code. Thanks for your insightful feedback @iwanowww ! > > However, it also introduces new issues that we have to consider. The codegen might **not be optimal**. If we want to generate the optimal instruction sequence, we need more effort. > > Following is the details: > > 1) We need a new API to cross-lane shift the lanes for a vector mask, which is used to extract different piece of a vector mask if the whole gather operation needs to be split. Consider it has a `Vector.slice()` API which can implement such a function, I added a similar one for `VectorMask`. > > There are two new issues that I need to address for this API: > - SVE lacks a native instruction for such a mask operation. 
I have to convert it to a vector, call the Vector.slice(), and then convert back to a mask. Please note that the whole progress is **not SVE friendly**. The performance of such an API will have large gap on SVE compared with other arches. > - To generate a SVE optimal instruction, I have to do further IR transformation and optimize the pattern with match rule. I'm not sure whether the optimization will be common enough to be accepted in future. > > Do you have a better idea on the new added API? I'd like to avoid adding such a performance not friendly API, and the API might not be frequently used in real world. > > 2) To make the interface uniform across-platforms, each API is defined as the same vector type of the target result, although we need to do separation and merging. However, as the SVE gather-load instruction works with int vector type, we need special handling in compiler IR-level. > > I'd like to extend `LoadVectorGather{,Masked}` with `mem_bt` to handle subword loads, adjust mask with cast/resize before and append vector cast/reinterpret after. Splitting into simple IRs make it possible for further IR-level optimization. This might make the compiler IRs different across platforms like what it is in current PR. Hence, the compiler change might not be so clean. Does this make sense to you? > > 3) Further compiler optimization is necessary to optimize out in-efficient instructions. This needs the combination of IR transformation and match rules. I think this might be more complex, an... @XiaohongGong would it help if `loadWithMap` accepted a `part` number, identifying what part of the mask to use and identifying the part where the returned elements will be located, such that the returned vectors can be easily composed with logical or (as if merging). ------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3403067724 From mdoerr at openjdk.org Tue Oct 14 18:25:07 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 14 Oct 2025 18:25:07 GMT Subject: RFR: 8369511: PPC64: compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without VectorRegisters In-Reply-To: References: <8Jkjo4Y7ZGPV-YOJPw4jdfUTUkyQUVrZLwOverU4ElE=.8cca0973-57a6-4ba2-8e44-87fb43e00150@github.com> Message-ID: On Tue, 14 Oct 2025 09:48:51 GMT, Emanuel Peter wrote: >> Disabling the test for Power8 (see JBS issue). > > test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 40: > >> 38: * @requires (os.arch != "riscv64" & os.arch != "ppc64" & os.arch != "ppc64le") | >> 39: * (os.arch == "riscv64" & vm.cpu.features ~= ".*rvv.*") | >> 40: * ((os.arch == "ppc64" | os.arch == "ppc64le") & vm.cpu.features ~= ".*darn.*") > > Drive-by comment: > > This is getting more convoluted now. I think it would make sense to at least document why we are skipping it for all the platforms. > > You should also state the reason for the change in the PR description as well as on JIRA ;) I have an alternative solution: https://github.com/openjdk/jdk/pull/27805 Please take a look. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27749#discussion_r2430065130 From mdoerr at openjdk.org Tue Oct 14 18:29:05 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 14 Oct 2025 18:29:05 GMT Subject: RFR: 8369511: PPC64: compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without VectorRegisters Message-ID: The test `VectorMaskCompareNotTest` requires 16 Byte vectors (or larger). 
If the machine only uses 8 Byte vectors, we get an exception in the static initializer because the code tries to use a 4 Byte vector which is unsupported (stack trace: see JBS issue). This is an alternative to https://github.com/openjdk/jdk/pull/27749. ------------- Commit messages: - 8369511: PPC64: compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without VectorRegisters Changes: https://git.openjdk.org/jdk/pull/27805/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27805&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369511 Stats: 22 lines in 2 files changed: 21 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27805.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27805/head:pull/27805 PR: https://git.openjdk.org/jdk/pull/27805 From duke at openjdk.org Tue Oct 14 19:40:37 2025 From: duke at openjdk.org (Shawn M Emery) Date: Tue, 14 Oct 2025 19:40:37 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 Message-ID: General: ----------- i) This work is to replace the existing AES cipher under the Cryptix license. ii) The lookup tables are employed for performance, but also for operating in constant time. iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. Correctness: ----------------- The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass -intrinsics mode for: ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures iv) jdk_security_infra: passed, with 48 known failures v) tier1 and tier2: all 110257 tests pass Security: ----------- In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. Performance: ------------------ All AES related benchmarks have been executed against the new and original Cryptix code: micro:org.openjdk.bench.javax.crypto.AES micro:org.openjdk.bench.javax.crypto.full.AESBench micro:org.openjdk.bench.javax.crypto.full.AESExtraBench micro:org.openjdk.bench.javax.crypto.full.AESGCMBench micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. 
micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption results: i) Default (no JVM options, non-intrinsics) mode: a) Encryption: the new code performed better for both architectures tested (x86: +7.6%, arm64: +3.5%) Analysis: the new code makes some optimizations in the last cipher round with the T2 lookup table that Cryptix may not, hence the better performance in non-intrinsics. b) Decryption: the new code performed mixed between architectures tested (x86: +8.3%, arm64: -1.1%) Analysis: the new code performs predominately better, except for decryption on arm64, which we believe is negligible and acceptable to have noticeably better performance with x86 decryption. ii) Default (no JVM options, intrinsics) mode: a) Encryption and Decryption: as expected, both the new code and Cryptix code performed similarly (within the error margins) Analysis: this is from the fact that the intrinsics related code has not changed with the new changes. iii) Interpreted-only (-Xint) mode: a) Encryption: the new code performed better than the Cryptix code for both architectures (x86: +0.6%, arm64: +6.0%). b) Decryption: the new code performed slightly worse than the Cryptix code for both architectures (x86: -3.3%, arm64: -2.4%). Analysis: the design of the new code was focused on instruction efficiency; eliminating unnecessary index variables, rolling out the rounds loop, and using no objects for round and inverse round transforms. This is especially noticeable in arm64 encryption, but we believe that decryption's slight drop in performance is negligible. iv) JIT compiler (-Xcomp) mode: a) Encryption: in this mode, performance is mixed performant between the two architectures tested (x86: +11.7%, arm64: +1.5%). b) Decryption: performance is decreases for both of the architectures tested (x86: -4.9%, arm64: -3.2%). Analysis: As with the no options results, we believe that the increase in performance for both architectures in encryption is most likely from the T2 gadgetry that we've implemented. We believe that the slight performance drop in decryption is negligible. In any case, the -Xcomp option is primarily used for debugging purposes, ergo we are not as concerned about this slight drop. Resource utilization: ---------------------------- The new AES code uses similar resources to that of the existing Cryptix code. Memory allocation has the following characteristics: i) Peak allocation for both Cryptix and the new code is only a fraction of a percentage point different for both the 1 cipher object and 10 cipher objects test. Analysis: We believe that this is negligible given the difference in the 20ms to 50ms window of peak allocation. ii) Total GC pause for Cryptix and the new code only differs by less than 5% for both the 1 object and 10 objects test. Analysis: This is acceptable behavior given that the benchmark performance for the new code is better overall. iii) Peak pre-GC allocation for Cryptix and the new code is only a fraction of a percent more for the new code in the 1 object case and is only 2% more for the 10 objects case. Analysis: These differences indicate ~500 bytes per object discrepancy between the Cryptix and new code, which is also negligible. 
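For readers who want a concrete picture of the table-driven approach described under "Security" above, here is a minimal, generic sketch of one middle round in the textbook T-table formulation. This is deliberately not the implementation in this PR and not the Cryptix code; the class, table, and parameter names are hypothetical, and the 256-entry tables are assumed to be precomputed elsewhere. The point is simply that each output word is a fixed sequence of loads and XORs with no key-dependent branches:

    // Illustrative sketch only: one encryption round in the classic T-table form.
    // T0..T3 are the usual 256-entry round-transform tables; rk holds the round-key words.
    final class TTableRoundSketch {
        static void encRound(int[] s, int[] t, int[] T0, int[] T1, int[] T2, int[] T3,
                             int[] rk, int off) {
            // Each output word mixes one byte from each state word via table lookups,
            // then XORs in the round key: a fixed load/XOR sequence, no branches.
            t[0] = T0[(s[0] >>> 24) & 0xFF] ^ T1[(s[1] >>> 16) & 0xFF]
                 ^ T2[(s[2] >>>  8) & 0xFF] ^ T3[ s[3]         & 0xFF] ^ rk[off];
            t[1] = T0[(s[1] >>> 24) & 0xFF] ^ T1[(s[2] >>> 16) & 0xFF]
                 ^ T2[(s[3] >>>  8) & 0xFF] ^ T3[ s[0]         & 0xFF] ^ rk[off + 1];
            t[2] = T0[(s[2] >>> 24) & 0xFF] ^ T1[(s[3] >>> 16) & 0xFF]
                 ^ T2[(s[0] >>>  8) & 0xFF] ^ T3[ s[1]         & 0xFF] ^ rk[off + 2];
            t[3] = T0[(s[3] >>> 24) & 0xFF] ^ T1[(s[0] >>> 16) & 0xFF]
                 ^ T2[(s[1] >>>  8) & 0xFF] ^ T3[ s[2]         & 0xFF] ^ rk[off + 3];
        }
    }

The inverse cipher follows the same branch-free pattern with the corresponding inverse tables.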
------------- Commit messages: - Implement the rest of the class name changes in intrinsics - 8326609: New AES implementation with updates specified in FIPS 197 Changes: https://git.openjdk.org/jdk/pull/27807/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8326609 Stats: 2989 lines in 5 files changed: 1506 ins; 1473 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/27807.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27807/head:pull/27807 PR: https://git.openjdk.org/jdk/pull/27807 From duke at openjdk.org Tue Oct 14 19:48:47 2025 From: duke at openjdk.org (Shawn M Emery) Date: Tue, 14 Oct 2025 19:48:47 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v2] In-Reply-To: References: Message-ID: > General: > ----------- > i) This work is to replace the existing AES cipher under the Cryptix license. > > ii) The lookup tables are employed for performance, but also for operating in constant time. > > iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. > > iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. > > Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. > > Correctness: > ----------------- > The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: > > i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass > > -intrinsics mode for: > > ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass > > iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures > > iv) jdk_security_infra: passed, with 48 known failures > > v) tier1 and tier2: all 110257 tests pass > > Security: > ----------- > In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. > > Performance: > ------------------ > All AES related benchmarks have been executed against the new and original Cryptix code: > > micro:org.openjdk.bench.javax.crypto.AES > > micro:org.openjdk.bench.javax.crypto.full.AESBench > > micro:org.openjdk.bench.javax.crypto.full.AESExtraBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream > > micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. > > micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) > > The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption results: > > i) Default (no JVM options, non-intrinsics) mode: > > a) Encryption: the new code performed better for b... 
Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: Add vmIntrinsics.hpp updates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27807/files - new: https://git.openjdk.org/jdk/pull/27807/files/1f742c1f..af0f9c4c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27807.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27807/head:pull/27807 PR: https://git.openjdk.org/jdk/pull/27807 From vlivanov at openjdk.org Tue Oct 14 20:07:13 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 14 Oct 2025 20:07:13 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v2] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 19:48:47 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. >> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... 
> > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Add vmIntrinsics.hpp updates src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 43: > 41: * https://www.internationaljournalcorner.com/index.php/ijird_ojs/article/view/134688 > 42: */ > 43: public final class AESCrypt extends SymmetricCipher { Should the class be named `AES_Crypt` instead? src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 1408: > 1406: */ > 1407: public void encryptBlock(byte[] plain, int pOff, byte[] cipher, int cOff) { > 1408: implEncryptBlock(plain, pOff, cipher, cOff); There are no bounds checks around intrinsic methods. Previous implementation has a comment stating that the checks are placed in caller code (for performance reasons) and declared the methods package-private. It makes sense either to introduce bounds checks here or keep the wrappers package-private. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2430292341 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2430306371 From dlong at openjdk.org Tue Oct 14 21:47:01 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 14 Oct 2025 21:47:01 GMT Subject: RFR: 8369444: JavaFrameAnchor on PPC64 has unnecessary barriers In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 11:48:42 GMT, David Briemann wrote: > No hardware barriers are necessary. All members are volatile and the profiler is run from a signal handler and only observers the thread its running on. Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27768#pullrequestreview-3337614439 From duke at openjdk.org Tue Oct 14 22:33:40 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 14 Oct 2025 22:33:40 GMT Subject: RFR: 8369642: [ubsan] nmethod::nmethod null pointer passed as argument 2 to memcpy [v2] In-Reply-To: <3kb34wTVzDNE3qDozi9eO9_kWuJXfz4GPVcKTRo4veM=.9ae36b69-d193-4d4e-b993-6d6a35af245b@github.com> References: <3kb34wTVzDNE3qDozi9eO9_kWuJXfz4GPVcKTRo4veM=.9ae36b69-d193-4d4e-b993-6d6a35af245b@github.com> Message-ID: > [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced a counter so that the nmethod immutable data can be shared between relocated nmethods to eliminate an unnecessary copy. 
The counter is aligned in memory so that must be taken into account when calculating the amount of memory used by the counter Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Add reference counter offset ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27778/files - new: https://git.openjdk.org/jdk/pull/27778/files/64e9be35..2827dca8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27778&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27778&range=00-01 Stats: 30 lines in 2 files changed: 14 ins; 10 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/27778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27778/head:pull/27778 PR: https://git.openjdk.org/jdk/pull/27778 From duke at openjdk.org Tue Oct 14 22:33:40 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 14 Oct 2025 22:33:40 GMT Subject: RFR: 8369642: [ubsan] nmethod::nmethod null pointer passed as argument 2 to memcpy [v2] In-Reply-To: <5ToAoT20_ERusvlz4ZgrGs55kQbb-nCAhYzi5wgU63c=.d7418fcd-0ffb-4657-898a-bb14c018e601@github.com> References: <3kb34wTVzDNE3qDozi9eO9_kWuJXfz4GPVcKTRo4veM=.9ae36b69-d193-4d4e-b993-6d6a35af245b@github.com> <5ToAoT20_ERusvlz4ZgrGs55kQbb-nCAhYzi5wgU63c=.d7418fcd-0ffb-4657-898a-bb14c018e601@github.com> Message-ID: On Tue, 14 Oct 2025 16:20:16 GMT, Vladimir Kozlov wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Add reference counter offset > > This is annoying. In all places `ImmutableDataReferencesCounterSize` is referenced we have `align_up(ImmutableDataReferencesCounterSize, oopSize)`. > > May be we should `#define ImmutableDataReferencesCounterSize oopSize` with comment that we only use 4 bytes for now. We have getter/setter methods which cast to (int*) anyway. @vnkozlov What do you think about this change? I think we should just treat the reference counter the same as we treat the other immutable data fields. I think the addition of an offset int in the nmethod is worth it to make the code readable and consistent ------------- PR Comment: https://git.openjdk.org/jdk/pull/27778#issuecomment-3403821706 From duke at openjdk.org Tue Oct 14 23:44:12 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 14 Oct 2025 23:44:12 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 Message-ID: [JDK-8369147](https://bugs.openjdk.org/browse/JDK-8369147) Fixes tests added in [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) `DeoptimizeRelocatedNMethod.java` and `RelocateNMethod.java` failed because they attempted to relocate nmethods to the `MethodProfiled` code heap which does not exist when `TieredCompilation` is false. Updated the tests to use `MethodNonProfiled` heap which exists regardless of `TieredCompilation` `StressNMethodRelocation.java` runs for 60 seconds and also compiles 1024 methods with C2. This was causing the test to timeout if the compilation took too much time. Increasing the timeout to 5 minutes should give C2 enough time to compile the functions `NMethodRelocationTest.java` runs using SerialGC which caused a multiple GC error when trying to run with another GC. 
Added a requires to force SerialGC ------------- Commit messages: - Fix tests Changes: https://git.openjdk.org/jdk/pull/27659/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27659&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369147 Stats: 37 lines in 5 files changed: 1 ins; 21 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/27659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27659/head:pull/27659 PR: https://git.openjdk.org/jdk/pull/27659 From duke at openjdk.org Tue Oct 14 23:44:13 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 14 Oct 2025 23:44:13 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 In-Reply-To: References: Message-ID: On Mon, 6 Oct 2025 20:13:46 GMT, Chad Rakoczy wrote: > [JDK-8369147](https://bugs.openjdk.org/browse/JDK-8369147) > > Fixes tests added in [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) > > `DeoptimizeRelocatedNMethod.java` and `RelocateNMethod.java` failed because they attempted to relocate nmethods to the `MethodProfiled` code heap which does not exist when `TieredCompilation` is false. Updated the tests to use `MethodNonProfiled` heap which exists regardless of `TieredCompilation` > > `StressNMethodRelocation.java` runs for 60 seconds and also compiles 1024 methods with C2. This was causing the test to timeout if the compilation took too much time. Increasing the timeout to 5 minutes should give C2 enough time to compile the functions > > `NMethodRelocationTest.java` runs using SerialGC which caused a multiple GC error when trying to run with another GC. Added a requires to force SerialGC @vnkozlov Could you provide your configure and more test output for https://bugs.openjdk.org/browse/JDK-8369150 I'm not able to reproduce ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3373861196 From duke at openjdk.org Tue Oct 14 23:47:00 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 14 Oct 2025 23:47:00 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 In-Reply-To: References: Message-ID: <-Lv9OC8lY8jwMPZ0HxA1DUWoa611YLmOKM7JQmv0mpc=.607ec490-a9af-45b2-9a23-aa64a82d40c0@github.com> On Mon, 6 Oct 2025 20:13:46 GMT, Chad Rakoczy wrote: > [JDK-8369147](https://bugs.openjdk.org/browse/JDK-8369147) > > Fixes tests added in [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) > > `DeoptimizeRelocatedNMethod.java` and `RelocateNMethod.java` failed because they attempted to relocate nmethods to the `MethodProfiled` code heap which does not exist when `TieredCompilation` is false. Updated the tests to use `MethodNonProfiled` heap which exists regardless of `TieredCompilation` > > `StressNMethodRelocation.java` runs for 60 seconds and also compiles 1024 methods with C2. This was causing the test to timeout if the compilation took too much time. Increasing the timeout to 5 minutes should give C2 enough time to compile the functions > > `NMethodRelocationTest.java` runs using SerialGC which caused a multiple GC error when trying to run with another GC. 
Added a requires to force SerialGC The following occurs when running DeoptimizeRelocatedNMethod on PPC64 # Internal Error (jdk/src/hotspot/cpu/ppc/nativeInst_ppc.cpp:405) # assert(!decode(i1, i2)) failed: already patched Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x1701784] NativePostCallNop::patch(int, int)+0xf4 (nativeInst_ppc.cpp:405) V [libjvm.so+0x1718414] nmethod::finalize_relocations()+0x6f4 (nmethod.cpp:2059) V [libjvm.so+0x171891c] nmethod::post_init()+0x5c (nmethod.cpp:1252) V [libjvm.so+0x171a8dc] nmethod::relocate(CodeBlobType)+0x1ec (nmethod.cpp:1515) V [libjvm.so+0x200b598] WB_RelocateNMethodFromMethod+0x388 (whitebox.cpp:1653) j jdk.test.whitebox.WhiteBox.relocateNMethodFromMethod0(Ljava/lang/reflect/Executable;I)V+0 j jdk.test.whitebox.WhiteBox.relocateNMethodFromMethod(Ljava/lang/reflect/Executable;I)V+8 j compiler.whitebox.DeoptimizeRelocatedNMethod.main([Ljava/lang/String;)V+50 @TheRealMDoerr @reinrich Do you have any ideas on a solution for this? I don't have any experience working with PPC so guidance would be greatly appreciated ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3403992989 From xgong at openjdk.org Wed Oct 15 01:32:21 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 15 Oct 2025 01:32:21 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v6] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 07:04:28 GMT, Xiaohong Gong wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: >> >> - Add more comments for IRs and added method >> - Merge branch 'jdk:master' into JDK-8351623-sve >> - Merge 'jdk:master' into JDK-8351623-sve >> - Address review comments >> - Refine IR pattern and clean backend rules >> - Fix indentation issue and move the helper matcher method to header files >> - Merge branch jdk:master into JDK-8351623-sve >> - 8351623: VectorAPI: Add SVE implementation of subword gather load operation > > Hi @iwanowww , @PaulSandoz , @eme64 , > > Hope you?re doing well! > > I?ve created a prototype that moves the implementation to the Java API level, as suggested (see: https://github.com/XiaohongGong/jdk/pull/8). This refactoring has resulted in significantly cleaner and more maintainable code. Thanks for your insightful feedback @iwanowww ! > > However, it also introduces new issues that we have to consider. The codegen might **not be optimal**. If we want to generate the optimal instruction sequence, we need more effort. > > Following is the details: > > 1) We need a new API to cross-lane shift the lanes for a vector mask, which is used to extract different piece of a vector mask if the whole gather operation needs to be split. Consider it has a `Vector.slice()` API which can implement such a function, I added a similar one for `VectorMask`. > > There are two new issues that I need to address for this API: > - SVE lacks a native instruction for such a mask operation. I have to convert it to a vector, call the Vector.slice(), and then convert back to a mask. Please note that the whole progress is **not SVE friendly**. The performance of such an API will have large gap on SVE compared with other arches. > - To generate a SVE optimal instruction, I have to do further IR transformation and optimize the pattern with match rule. I'm not sure whether the optimization will be common enough to be accepted in future. 
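For illustration, the Java-level fallback described above (convert the mask to a vector, slice that vector, then compare back into a mask) could look roughly like the following sketch. The helper name is made up and this is not the prototype code; it only shows the shape of the workaround and why it is unfriendly on SVE, where each step costs extra instructions.

```
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;

public class MaskSliceSketch {
    // Hypothetical emulation of a cross-lane "mask slice": VectorMask has no such
    // method today, so the mask is routed through a vector and back.
    static VectorMask<Byte> maskSlice(VectorMask<Byte> m, int origin) {
        ByteVector asVector = (ByteVector) m.toVector(); // true -> -1, false -> 0
        ByteVector shifted = asVector.slice(origin);     // shift lanes down, fill with 0
        return shifted.compare(VectorOperators.NE, 0);   // non-zero lanes become true again
    }
}
```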
> > Do you have a better idea on the new added API? I'd like to avoid adding such a performance not friendly API, and the API might not be frequently used in real world. > > 2) To make the interface uniform across-platforms, each API is defined as the same vector type of the target result, although we need to do separation and merging. However, as the SVE gather-load instruction works with int vector type, we need special handling in compiler IR-level. > > I'd like to extend `LoadVectorGather{,Masked}` with `mem_bt` to handle subword loads, adjust mask with cast/resize before and append vector cast/reinterpret after. Splitting into simple IRs make it possible for further IR-level optimization. This might make the compiler IRs different across platforms like what it is in current PR. Hence, the compiler change might not be so clean. Does this make sense to you? > > 3) Further compiler optimization is necessary to optimize out in-efficient instructions. This needs the combination of IR transformation and match rules. I think this might be more complex, an... > @XiaohongGong would it help if `loadWithMap` accepted a `part` number, identifying what part of the mask to use and identifying the part where the returned elements will be located, such that the returned vectors can be easily composed with logical or (as if merging). Thanks for you input @PaulSandoz ! Yes, I think passing a `part` to hotspot would be helpful. But that would move the cross-lane shift operation for a vector&mask to VM intrinsic part. This is more convenient for compiler optimization. But seems this will be a composition of java and VM intrinsic co-work, which makes sense to me. WDYT @iwanowww ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3404189711 From duke at openjdk.org Wed Oct 15 01:37:07 2025 From: duke at openjdk.org (erifan) Date: Wed, 15 Oct 2025 01:37:07 GMT Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of vector compress [v5] In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 06:24:22 GMT, erifan wrote: >> The AArch64 SVE and SVE2 architectures lack an instruction suitable for subword-type `compress` operations. Therefore, the current implementation uses the 32-bit SVE `compact` instruction to compress subword types by first widening the high and low parts to 32 bits, compressing them, and then narrowing them back to their original type. Finally, the high and low parts are merged using the `index + tbl` instructions. >> >> This approach is significantly slower compared to architectures with native support. After evaluating all available AArch64 SVE instructions and experimenting with various implementations?such as looping over the active elements, extraction, and insertion?I confirmed that the existing algorithm is optimal given the instruction set. However, there is still room for optimization in the following two aspects: >> 1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction. >> 2. For partial subword types, operations to the highest half are unnecessary because those bits are invalid. >> >> This pull request introduces the following changes: >> 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput. >> 2. Eliminates unnecessary compress operations for partial subword type cases. >> 3. For `sve_compress_byte`, one less temporary register is used to alleviate potential register pressure. 
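As a reference for what the operation computes, here is a scalar model of a masked vector compress (illustrative only, not the SVE instruction sequence): active elements are packed, in order, into the lowest lanes of the result.

```
public class CompressModel {
    // Scalar model of a masked "compress": elements of src whose mask bit is set
    // are copied, in source order, into the low positions of dst. Returns the
    // number of active (packed) elements; the remaining lanes are left as zero.
    static int compress(short[] src, boolean[] mask, short[] dst) {
        int n = 0;
        for (int i = 0; i < src.length; i++) {
            if (mask[i]) {
                dst[n++] = src[i];
            }
        }
        return n;
    }

    public static void main(String[] args) {
        short[] src = {10, 20, 30, 40};
        boolean[] mask = {true, false, true, true};
        short[] dst = new short[src.length];
        System.out.println(compress(src, mask, dst)); // prints 3; dst = {10, 30, 40, 0}
    }
}
```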
>> >> Benchmark results demonstrate that these changes significantly improve performance. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE: >> >> Benchmark Unit Before Error After Error Uplift >> Byte128Vector.compress ops/ms 4846.97 26.23 6638.56 31.60 1.36 >> Byte64Vector.compress ops/ms 2447.69 12.95 7167.68 34.49 2.92 >> Short128Vector.compress ops/ms 7174.88 40.94 8398.45 9.48 1.17 >> Short64Vector.compress ops/ms 3618.72 3.04 8618.22 10.91 2.38 >> >> >> This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed. > > erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Enable the IR test for x86 > - Merge branch 'master' into JDK-8366333-compress > - Improve coding style a bit > - Improve some code style > - Merge branch 'master' into JDK-8366333-compress > - Merge branch 'master' into JDK-8366333-compress > - 8366333: AArch64: Enhance SVE subword type implementation of vector compress > > The AArch64 SVE and SVE2 architectures lack an instruction suitable for > subword-type `compress` operations. Therefore, the current implementation > uses the 32-bit SVE `compact` instruction to compress subword types by > first widening the high and low parts to 32 bits, compressing them, and > then narrowing them back to their original type. Finally, the high and > low parts are merged using the `index + tbl` instructions. > > This approach is significantly slower compared to architectures with native > support. After evaluating all available AArch64 SVE instructions and > experimenting with various implementations?such as looping over the active > elements, extraction, and insertion?I confirmed that the existing algorithm > is optimal given the instruction set. However, there is still room for > optimization in the following two aspects: > 1. Merging with `index + tbl` is suboptimal due to the high latency of > the `index` instruction. > 2. For partial subword types, operations to the highest half are unnecessary > because those bits are invalid. > > This pull request introduces the following changes: > 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which > offer lower latency and higher throughput. > 2. Eliminates unnecessary compress operations for partial subword type cases. > 3. For `sve_compress_byte`, one less temporary register is used to alleviate > potential register pressure. > > Benchmark results demonstrate that these changes significantly improve performance. > > Benchmarks on Nvidia Grace machine with 128-bit SVE: > ``` > Benchmark Unit Before Error After Error Uplift > Byte128Vector.compress ops/ms 4846.97 26.23 6638.56 31.60 1.36 > Byte64Vector.compress ops/ms 2447.69 12.95 7167.68 34.49 2.92 > Short128Vector.compress ops/ms 7174.88 40.94 8398.45 9.48 1.17 > Short64Vector.compress ops/ms 3618.72 3.04 8618.22 10.91 2.38 > ``` > > This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, > and all... Hi, can I integrate this patch now? Could any Oracle friends help me with internal testing of this patch? 
Thanks~ ------------- PR Comment: https://git.openjdk.org/jdk/pull/27188#issuecomment-3404196622 From duke at openjdk.org Wed Oct 15 01:39:04 2025 From: duke at openjdk.org (erifan) Date: Wed, 15 Oct 2025 01:39:04 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: <8dJgnBxVjMaJwehfhoaZI_ioZhg_MNG1cKrVCINtyrI=.9a8a89d8-3c7d-49f4-ba94-a63b5f9951b7@github.com> On Thu, 9 Oct 2025 07:13:26 GMT, erifan wrote: >> The VectorShape size of `I_SPECIES_FOR_CAST` declared in test **VectorMaskCompareNotTest.java** is half that of `L_SPECIES_FOR_CAST`. And `L_SPECIES_FOR_CAST` is created with the maximum shape. Therefore, if `MaxVectorSize` is set to 8, the shape size of `I_SPECIES_FOR_CAST` is 4, which is an illegal value because the minimum vector size requirement is 8 bytes. >> >> This pull request addresses the issue by ensuring that this test runs only when `MaxVectorSize` is set to 16 bytes or higher. > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Remove JBS number 8368205 from VectorMaskCompareNotTest.java > - Merge branch 'master' into JDK-8368205-VectorMaskCompareNotTest-failure > - 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 > > The VectorShape size of `I_SPECIES_FOR_CAST` declared in test **VectorMaskCompareNotTest.java** > is half that of `L_SPECIES_FOR_CAST`. And `L_SPECIES_FOR_CAST` is created with the maximum shape. > Therefore, if `MaxVectorSize` is set to 8, the shape size of `I_SPECIES_FOR_CAST` is 4, which > is an illegal value because the minimum vector size requirement is 8 bytes. > > This pull request addresses the issue by ensuring that this test runs only when `MaxVectorSize` > is set to 16 bytes or higher. Hi, could anyone help take a look at this PR, it's a simple test bug fix, thanks~ ------------- PR Comment: https://git.openjdk.org/jdk/pull/27418#issuecomment-3404199640 From dlong at openjdk.org Wed Oct 15 02:04:07 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 15 Oct 2025 02:04:07 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 07:13:26 GMT, erifan wrote: >> The VectorShape size of `I_SPECIES_FOR_CAST` declared in test **VectorMaskCompareNotTest.java** is half that of `L_SPECIES_FOR_CAST`. And `L_SPECIES_FOR_CAST` is created with the maximum shape. Therefore, if `MaxVectorSize` is set to 8, the shape size of `I_SPECIES_FOR_CAST` is 4, which is an illegal value because the minimum vector size requirement is 8 bytes. >> >> This pull request addresses the issue by ensuring that this test runs only when `MaxVectorSize` is set to 16 bytes or higher. > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Remove JBS number 8368205 from VectorMaskCompareNotTest.java > - Merge branch 'master' into JDK-8368205-VectorMaskCompareNotTest-failure > - 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 > > The VectorShape size of `I_SPECIES_FOR_CAST` declared in test **VectorMaskCompareNotTest.java** > is half that of `L_SPECIES_FOR_CAST`. And `L_SPECIES_FOR_CAST` is created with the maximum shape. > Therefore, if `MaxVectorSize` is set to 8, the shape size of `I_SPECIES_FOR_CAST` is 4, which > is an illegal value because the minimum vector size requirement is 8 bytes. > > This pull request addresses the issue by ensuring that this test runs only when `MaxVectorSize` > is set to 16 bytes or higher. Isn't this a duplicate of https://github.com/openjdk/jdk/pull/27805? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27418#issuecomment-3404241119 From duke at openjdk.org Wed Oct 15 02:30:02 2025 From: duke at openjdk.org (erifan) Date: Wed, 15 Oct 2025 02:30:02 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 07:13:26 GMT, erifan wrote: >> The VectorShape size of `I_SPECIES_FOR_CAST` declared in test **VectorMaskCompareNotTest.java** is half that of `L_SPECIES_FOR_CAST`. And `L_SPECIES_FOR_CAST` is created with the maximum shape. Therefore, if `MaxVectorSize` is set to 8, the shape size of `I_SPECIES_FOR_CAST` is 4, which is an illegal value because the minimum vector size requirement is 8 bytes. >> >> This pull request addresses the issue by ensuring that this test runs only when `MaxVectorSize` is set to 16 bytes or higher. > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Remove JBS number 8368205 from VectorMaskCompareNotTest.java > - Merge branch 'master' into JDK-8368205-VectorMaskCompareNotTest-failure > - 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 > > The VectorShape size of `I_SPECIES_FOR_CAST` declared in test **VectorMaskCompareNotTest.java** > is half that of `L_SPECIES_FOR_CAST`. And `L_SPECIES_FOR_CAST` is created with the maximum shape. > Therefore, if `MaxVectorSize` is set to 8, the shape size of `I_SPECIES_FOR_CAST` is 4, which > is an illegal value because the minimum vector size requirement is 8 bytes. > > This pull request addresses the issue by ensuring that this test runs only when `MaxVectorSize` > is set to 16 bytes or higher. > Isn't this a duplicate of #27805? I have filed the PR 3 weeks ago, I think https://github.com/openjdk/jdk/pull/27805 is a duplicate of this PR. Hi @TheRealMDoerr , could you please help me test whether this PR can fix the test failure on PPC64? I don't have a PPC64 environment. Thanks! 
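For context, a minimal standalone sketch of the failing pattern (hypothetical field names, not the actual test code): with `-XX:MaxVectorSize=8` the maximum species is 64 bits wide, so asking for a species of half that size requests a 32-bit shape, which the Vector API rejects because its smallest supported shape is 64 bits.

```
import jdk.incubator.vector.LongVector;
import jdk.incubator.vector.VectorShape;
import jdk.incubator.vector.VectorSpecies;

public class HalfSpeciesSketch {
    // With -XX:MaxVectorSize=8 the maximum species is only 64 bits wide.
    static final VectorSpecies<Long> L_SPECIES = LongVector.SPECIES_MAX;

    // Halving 64 bits asks for a 32-bit shape; VectorShape.forBitSize throws
    // IllegalArgumentException because no such shape exists, so the class fails
    // in its static initializer before any test method runs.
    static final VectorSpecies<Integer> I_SPECIES =
            VectorSpecies.of(int.class,
                    VectorShape.forBitSize(L_SPECIES.vectorBitSize() / 2));

    public static void main(String[] args) {
        System.out.println(I_SPECIES);
    }
}
```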
------------- PR Comment: https://git.openjdk.org/jdk/pull/27418#issuecomment-3404281221 From qxing at openjdk.org Wed Oct 15 03:29:47 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Wed, 15 Oct 2025 03:29:47 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v6] In-Reply-To: References: Message-ID: <44vNo-Ozy0bHV5km3Pl3N1dvQjSWInOj623LxF0WRq0=.f3724cdd-7200-494a-a7c9-dc18d14b84e8@github.com> > In `PhaseIdealLoop`, `IdealLoopTree::check_safepts` method checks if any call that is guaranteed to have a safepoint dominates the tail of the loop. In the previous implementation, `check_safepts` would stop if it found a local non-call safepoint. At this time, if there was a call before the safepoint in the dom-path, this safepoint would not be eliminated. > > loop-safepoint > > This patch changes the behavior of `check_safepts` to not stop when it finds a non-local safepoint. This makes simple loops with one method call ~3.8% faster (on aarch64). > > > Benchmark Mode Cnt Score Error Units > LoopSafepoint.loopVar avgt 15 208296.259 ? 1350.409 ns/op # baseline > LoopSafepoint.loopVar avgt 15 200692.874 ? 616.770 ns/op # this patch > > > Testing: tier1-2 on x86_64 and aarch64. Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: Fix comments of `check_safepts` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23057/files - new: https://git.openjdk.org/jdk/pull/23057/files/b52f7ba1..9b2bd6e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23057&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23057&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23057.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23057/head:pull/23057 PR: https://git.openjdk.org/jdk/pull/23057 From duke at openjdk.org Wed Oct 15 05:28:31 2025 From: duke at openjdk.org (Shawn M Emery) Date: Wed, 15 Oct 2025 05:28:31 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v3] In-Reply-To: References: Message-ID: > General: > ----------- > i) This work is to replace the existing AES cipher under the Cryptix license. > > ii) The lookup tables are employed for performance, but also for operating in constant time. > > iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. > > iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. > > Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. 
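To make item ii) above concrete, here is a minimal sketch of a table-driven substitution step (illustrative only, not the proposed AES_Crypt code): every byte goes through the same single table load, so there are no branches that depend on secret data.

```
public class SubBytesSketch {
    // Illustrative table-driven SubBytes-style step: one array load per state byte,
    // no data-dependent branches. A real AES S-box is a fixed 256-entry permutation;
    // the placeholder table below is NOT the AES S-box.
    static void subBytes(byte[] state, byte[] sbox) {
        for (int i = 0; i < state.length; i++) {
            state[i] = sbox[state[i] & 0xFF]; // index by the unsigned byte value
        }
    }

    public static void main(String[] args) {
        byte[] sbox = new byte[256];
        for (int i = 0; i < 256; i++) {
            sbox[i] = (byte) (255 - i); // placeholder permutation for demonstration
        }
        byte[] state = {0, 1, 2, 3};
        subBytes(state, sbox);
        System.out.println(state[0] + " " + state[1]); // -1 -2 (255 and 254 as unsigned)
    }
}
```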
> > Correctness: > ----------------- > The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: > > i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass > > -intrinsics mode for: > > ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass > > iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures > > iv) jdk_security_infra: passed, with 48 known failures > > v) tier1 and tier2: all 110257 tests pass > > Security: > ----------- > In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. > > Performance: > ------------------ > All AES related benchmarks have been executed against the new and original Cryptix code: > > micro:org.openjdk.bench.javax.crypto.AES > > micro:org.openjdk.bench.javax.crypto.full.AESBench > > micro:org.openjdk.bench.javax.crypto.full.AESExtraBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream > > micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. > > micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) > > The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption results: > > i) Default (no JVM options, non-intrinsics) mode: > > a) Encryption: the new code performed better for b... Shawn M Emery has updated the pull request incrementally with two additional commits since the last revision: - encryptBlock/decryptBlock methods set to package-private - Revert AESCrypt to AES_Crypt ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27807/files - new: https://git.openjdk.org/jdk/pull/27807/files/af0f9c4c..07003719 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27807.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27807/head:pull/27807 PR: https://git.openjdk.org/jdk/pull/27807 From duke at openjdk.org Wed Oct 15 05:28:34 2025 From: duke at openjdk.org (Shawn M Emery) Date: Wed, 15 Oct 2025 05:28:34 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v2] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 19:59:04 GMT, Vladimir Ivanov wrote: >> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: >> >> Add vmIntrinsics.hpp updates > > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 43: > >> 41: * https://www.internationaljournalcorner.com/index.php/ijird_ojs/article/view/134688 >> 42: */ >> 43: public final class AESCrypt extends SymmetricCipher { > > Should the class be named `AES_Crypt` instead? Yes, you're right. I'm not sure how it reverted back to AESCrypt. Fixed. 
> src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 1408: > >> 1406: */ >> 1407: public void encryptBlock(byte[] plain, int pOff, byte[] cipher, int cOff) { >> 1408: implEncryptBlock(plain, pOff, cipher, cOff); > > There are no bounds checks around intrinsic methods. Previous implementation has a comment stating that the checks are placed in caller code (for performance reasons) and declared the methods package-private. It makes sense either to introduce bounds checks here or keep the wrappers package-private. Good catch, I will leave it as package-private then. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2431157744 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2431158083 From epeter at openjdk.org Wed Oct 15 05:46:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 05:46:01 GMT Subject: RFR: 8369511: PPC64: compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without VectorRegisters In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 18:21:45 GMT, Martin Doerr wrote: > The test `VectorMaskCompareNotTest` requires 16 Byte vectors (or larger). If the machine only uses 8 Byte vectors, we get an exception in the static initializer because the code tries to use a 4 Byte vector which is unsupported (stack trace: see JBS issue). This is an alternative to https://github.com/openjdk/jdk/pull/27749. @TheRealMDoerr I think I prefer this solution ? Though I wonder if we need to add some kind of label for the `jtreg` changes? I'm not familiar with the code there. But I can surely approve the test changes :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27805#issuecomment-3404605353 From duke at openjdk.org Wed Oct 15 05:51:40 2025 From: duke at openjdk.org (Shawn M Emery) Date: Wed, 15 Oct 2025 05:51:40 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v4] In-Reply-To: References: Message-ID: > General: > ----------- > i) This work is to replace the existing AES cipher under the Cryptix license. > > ii) The lookup tables are employed for performance, but also for operating in constant time. > > iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. > > iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. > > Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. > > Correctness: > ----------------- > The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: > > i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass > > -intrinsics mode for: > > ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass > > iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures > > iv) jdk_security_infra: passed, with 48 known failures > > v) tier1 and tier2: all 110257 tests pass > > Security: > ----------- > In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. 
This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. > > Performance: > ------------------ > All AES related benchmarks have been executed against the new and original Cryptix code: > > micro:org.openjdk.bench.javax.crypto.AES > > micro:org.openjdk.bench.javax.crypto.full.AESBench > > micro:org.openjdk.bench.javax.crypto.full.AESExtraBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream > > micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. > > micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) > > The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption results: > > i) Default (no JVM options, non-intrinsics) mode: > > a) Encryption: the new code performed better for b... Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: Add remaining files to be staged ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27807/files - new: https://git.openjdk.org/jdk/pull/27807/files/07003719..3fc25aef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=02-03 Stats: 30 lines in 12 files changed: 0 ins; 0 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/27807.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27807/head:pull/27807 PR: https://git.openjdk.org/jdk/pull/27807 From chagedorn at openjdk.org Wed Oct 15 05:57:08 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 15 Oct 2025 05:57:08 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop [v3] In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 15:15:56 GMT, Roland Westrelin wrote: >> In: >> >> >> for (int i = 100; i < 1100; i++) { >> v += floatArray[i - 100]; >> Objects.checkIndex(i, longRange); >> } >> >> >> The int counted loop has both an int range check and a long range. The >> int range check is optimized first. Assertion predicates are inserted >> above the loop. One predicates checks that: >> >> >> init - 100 > >> >> The loop is then transformed to enable the optimization of the long >> range check. The loop is short running, so there's no need to create a >> loop nest. The counted loop is mostly left as is but, the loop's >> bounds are changed from: >> >> >> for (int i = 100; i < 1100; i++) { >> >> >> to: >> >> >> for (int i = 0; i < 1000; i++) { >> >> >> The reason for that the long range check transformation expects the >> loop to start at 0. >> >> Pre/main/post loops are created. Template Assertion predicates are >> added above the main loop. The loop is unrolled. Initialized assertion >> predicates are created. The one created from the condition: >> >> >> init - 100 > >> >> checks the value of `i` out of the pre loop which is 1. That check fails. >> >> The root cause of the failure is that when bounds of the counted loop >> are changed, template assertion predicates need to be updated with and >> adjusted init input. >> >> When the bounds of the loop are known, the assertion predicates can be >> updated in place. 
Otherwise, when the loop is speculated to be short >> running, the assertion predicates are updated when they are cloned. > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8366888 > - Update src/hotspot/share/opto/predicates.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/predicates.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - whitespaces > - fix Update looks good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27250#pullrequestreview-3338461960 From rrich at openjdk.org Wed Oct 15 06:47:03 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 15 Oct 2025 06:47:03 GMT Subject: RFR: 8369511: PPC64: compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without VectorRegisters In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 18:21:45 GMT, Martin Doerr wrote: > The test `VectorMaskCompareNotTest` requires 16 Byte vectors (or larger). If the machine only uses 8 Byte vectors, we get an exception in the static initializer because the code tries to use a 4 Byte vector which is unsupported (stack trace: see JBS issue). This is an alternative to https://github.com/openjdk/jdk/pull/27749. I think this is a good and easy to understand solution. ------------- Marked as reviewed by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27805#pullrequestreview-3338618569 From rrich at openjdk.org Wed Oct 15 06:55:02 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 15 Oct 2025 06:55:02 GMT Subject: RFR: 8369511: PPC64: compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without VectorRegisters In-Reply-To: <8Jkjo4Y7ZGPV-YOJPw4jdfUTUkyQUVrZLwOverU4ElE=.8cca0973-57a6-4ba2-8e44-87fb43e00150@github.com> References: <8Jkjo4Y7ZGPV-YOJPw4jdfUTUkyQUVrZLwOverU4ElE=.8cca0973-57a6-4ba2-8e44-87fb43e00150@github.com> Message-ID: On Fri, 10 Oct 2025 16:27:30 GMT, Martin Doerr wrote: > Disabling the test for Power8 (see JBS issue). Funny: as I mentioned yesterday, that is exactly what I had tried. However, I had added MaxVectorSize to the list of boolean flags and overlooked in the long JTR output that this was why it did not work. Regards, Richard. @TheRealMDoerr commented on this pull request. In test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java: > + * @requires (os.arch != "riscv64" & os.arch != "ppc64" & os.arch != "ppc64le") | + * (os.arch == "riscv64" & vm.cpu.features ~= ".*rvv.*") | + * ((os.arch == "ppc64" | os.arch == "ppc64le") & vm.cpu.features ~= ".*darn.*") I have an alternative solution: #27805 Please take a look.
------------- PR Comment: https://git.openjdk.org/jdk/pull/27749#issuecomment-3404836747 From rrich at openjdk.org Wed Oct 15 07:03:37 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 15 Oct 2025 07:03:37 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 In-Reply-To: <-Lv9OC8lY8jwMPZ0HxA1DUWoa611YLmOKM7JQmv0mpc=.607ec490-a9af-45b2-9a23-aa64a82d40c0@github.com> References: <-Lv9OC8lY8jwMPZ0HxA1DUWoa611YLmOKM7JQmv0mpc=.607ec490-a9af-45b2-9a23-aa64a82d40c0@github.com> Message-ID: <_Qq4FsgAKYZ3OOy3TqcUusIS-wwp2r0bVdMRA8P_yQg=.544f2edd-5e99-489b-923e-07d69a2de407@github.com> On Tue, 14 Oct 2025 23:44:39 GMT, Chad Rakoczy wrote: > The following occurs when running DeoptimizeRelocatedNMethod on PPC64 > > ``` > # Internal Error (jdk/src/hotspot/cpu/ppc/nativeInst_ppc.cpp:405) > # assert(!decode(i1, i2)) failed: already patched > > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x1701784] NativePostCallNop::patch(int, int)+0xf4 (nativeInst_ppc.cpp:405) > V [libjvm.so+0x1718414] nmethod::finalize_relocations()+0x6f4 (nmethod.cpp:2059) > V [libjvm.so+0x171891c] nmethod::post_init()+0x5c (nmethod.cpp:1252) > V [libjvm.so+0x171a8dc] nmethod::relocate(CodeBlobType)+0x1ec (nmethod.cpp:1515) > V [libjvm.so+0x200b598] WB_RelocateNMethodFromMethod+0x388 (whitebox.cpp:1653) > j jdk.test.whitebox.WhiteBox.relocateNMethodFromMethod0(Ljava/lang/reflect/Executable;I)V+0 > j jdk.test.whitebox.WhiteBox.relocateNMethodFromMethod(Ljava/lang/reflect/Executable;I)V+8 > j compiler.whitebox.DeoptimizeRelocatedNMethod.main([Ljava/lang/String;)V+50 > ``` > > @TheRealMDoerr @reinrich Do you have any ideas on a solution for this? I don't have any experience working with PPC so guidance would be greatly appreciated. It's fixed with https://bugs.openjdk.org/browse/JDK-8369257 ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3404874700 From dbriemann at openjdk.org Wed Oct 15 07:43:21 2025 From: dbriemann at openjdk.org (David Briemann) Date: Wed, 15 Oct 2025 07:43:21 GMT Subject: RFR: 8369444: JavaFrameAnchor on PPC64 has unnecessary barriers [v2] In-Reply-To: References: Message-ID: > No hardware barriers are necessary. All members are volatile and the profiler is run from a signal handler and only observes the thread it is running on.
David Briemann has updated the pull request incrementally with one additional commit since the last revision: remove whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27768/files - new: https://git.openjdk.org/jdk/pull/27768/files/cb5bbe38..251763d2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27768&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27768&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27768.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27768/head:pull/27768 PR: https://git.openjdk.org/jdk/pull/27768 From qamai at openjdk.org Wed Oct 15 08:16:14 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 08:16:14 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations In-Reply-To: <9csq9foYsLDNJH6sLU1qLEAqihigGGHaViMPnC5BuWc=.7de0ccca-ecf2-4e8d-b8ee-6716c0a9bd7e@github.com> References: <9csq9foYsLDNJH6sLU1qLEAqihigGGHaViMPnC5BuWc=.7de0ccca-ecf2-4e8d-b8ee-6716c0a9bd7e@github.com> Message-ID: On Mon, 13 Oct 2025 12:23:13 GMT, Emanuel Peter wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > src/hotspot/share/opto/rangeinference.hpp line 96: > >> 94: KnownBits _bits; >> 95: >> 96: private: > > Did you mean to drop the `private:` here? It also makes other things below public now... Yes, we have a ton of ad-hoc `friend` declarations below, and I would need more for this patch. In addition, this prototype class is not meant to be used widely anyway, so it is better to just make it `public`. > src/hotspot/share/opto/rangeinference.hpp line 152: > >> 150: >> 151: template >> 152: static bool int_type_is_equal(const CTP t1, const CTP t2) { > > Out of curiosity: why the change `CT*` -> `CTP`? It is so that we can use this for `TypeIntMirror`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2431558521 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2431559665 From epeter at openjdk.org Wed Oct 15 08:17:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 08:17:19 GMT Subject: RFR: 8369511: PPC64: compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without VectorRegisters In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 18:21:45 GMT, Martin Doerr wrote: > The test `VectorMaskCompareNotTest` requires 16 Byte vectors (or larger). If the machine only uses 8 Byte vectors, we get an exception in the static initializer because the code tries to use a 4 Byte vector which is unsupported (stack trace: see JBS issue). This is an alternative to https://github.com/openjdk/jdk/pull/27749. 
@TheRealMDoerr I'll approve after some internal sanity testing :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27805#issuecomment-3405123517 From epeter at openjdk.org Wed Oct 15 08:17:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 08:17:53 GMT Subject: RFR: 8369804: TestGenerators.java fails with IllegalArgumentException: bound must be greater than origin Message-ID: We sample two floats, assuming we would get two different results. But ever so rarely, we get the same values, and the test fails. So now, I sample with retry. And also improve the error reporting, throwing an exception on generator construction rather than sampling from the generator. ------------- Commit messages: - JDK-8369804 Changes: https://git.openjdk.org/jdk/pull/27816/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27816&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369804 Stats: 18 lines in 3 files changed: 12 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/27816.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27816/head:pull/27816 PR: https://git.openjdk.org/jdk/pull/27816 From qamai at openjdk.org Wed Oct 15 08:22:19 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 08:22:19 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations In-Reply-To: References: <9csq9foYsLDNJH6sLU1qLEAqihigGGHaViMPnC5BuWc=.7de0ccca-ecf2-4e8d-b8ee-6716c0a9bd7e@github.com> Message-ID: On Mon, 13 Oct 2025 12:30:11 GMT, Emanuel Peter wrote: >> Hmm, we also have the third class `TypeIntPrototype`. Do you think we really need all 3 classes? > > Ah, I suppose `TypeIntMirror` is always canonicalized from `TypeIntPrototype`, in the constructor? Yah `TypeIntPrototype` is a prototype and can contain arbitrary field values while `TypeInt` and `TypeIntMirror` are canonicalized. It is sad that `TypeInt` is heavily coupled with the `Type` infrastructure, so making a mirror that does not suffer from one is necessary. I tried messing around with the `Type` allocator but it seems error-prone since we have several `TypeInt` instances coming from a different allocator (`TypeInt::ZERO` for example). As a result, having a separate type prevents these errors (`TypeIntMirror::INT` won't compile). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2431569822 From qamai at openjdk.org Wed Oct 15 08:22:22 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 08:22:22 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations In-Reply-To: <9csq9foYsLDNJH6sLU1qLEAqihigGGHaViMPnC5BuWc=.7de0ccca-ecf2-4e8d-b8ee-6716c0a9bd7e@github.com> References: <9csq9foYsLDNJH6sLU1qLEAqihigGGHaViMPnC5BuWc=.7de0ccca-ecf2-4e8d-b8ee-6716c0a9bd7e@github.com> Message-ID: On Mon, 13 Oct 2025 12:29:01 GMT, Emanuel Peter wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > src/hotspot/share/opto/rangeinference.hpp line 199: > >> 197: S _hi; >> 198: U _ulo; >> 199: U _uhi; > > Why not use `RangeInt`? 
These are the mirrors of `TypeInt` and `TypeLong` so they need to be structurally similar to `TypeInt` and `TypeLong`. > src/hotspot/share/opto/rangeinference.hpp line 218: > >> 216: bool contains(U u) const; >> 217: bool contains(const TypeIntMirror& o) const; >> 218: bool operator==(const TypeIntMirror& o) const; > > Could we limit this to `DEBUG_ONLY`? Maybe, it disables these gtest in product builds, however. What do you think? > src/hotspot/share/opto/rangeinference.hpp line 221: > >> 219: >> 220: template >> 221: TypeIntMirror cast() const; > > Can you explain what this casting method is for? This is to mimic the behaviour when we do `t1->meet(t2)->cast()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2431572634 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2431574484 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2431578081 From dlunden at openjdk.org Wed Oct 15 08:25:34 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 15 Oct 2025 08:25:34 GMT Subject: RFR: 8369573: Add missing compile commands help documentation for the signature part of method patterns Message-ID: The documentation of the signature part of method patterns in `java -XX:CompileCommand=help` is missing. We should add it. ### Changeset - Improve the documentation of signatures in `java -XX:CompileCommand=help`. ### Testing - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500483034) - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. ------------- Commit messages: - Fix issue Changes: https://git.openjdk.org/jdk/pull/27818/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27818&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369573 Stats: 38 lines in 1 file changed: 30 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/27818.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27818/head:pull/27818 PR: https://git.openjdk.org/jdk/pull/27818 From qamai at openjdk.org Wed Oct 15 08:32:53 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 08:32:53 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations In-Reply-To: <9csq9foYsLDNJH6sLU1qLEAqihigGGHaViMPnC5BuWc=.7de0ccca-ecf2-4e8d-b8ee-6716c0a9bd7e@github.com> References: <9csq9foYsLDNJH6sLU1qLEAqihigGGHaViMPnC5BuWc=.7de0ccca-ecf2-4e8d-b8ee-6716c0a9bd7e@github.com> Message-ID: On Mon, 13 Oct 2025 12:40:45 GMT, Emanuel Peter wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > src/hotspot/share/opto/rangeinference.hpp line 230: > >> 228: // TypeLong*, or they can be TypeIntMirror which behave similar to TypeInt* and TypeLong* during >> 229: // testing. This allows us to verify the correctness of the implementation without coupling with >> 230: // the hotspot compiler allocation infrastructure. > > This sounds a bit like a hack, but maybe a currently necessary one. 
But it sounds like we are passing something different in the production code vs in gtest testing code, and that's not ideal. > > I suppose an alternative would be to always do the transition from `TypeInt` -> `TypeIntMirror`, before passing it into `RangeInference`. Would that be too much overhead, or have other downsides? I suppose an issue with that is how do you get back a `TypeInt` at the end... yeah not ideal. So maybe your hack is required. > > It would have been nice if we could just compose `TypeIntMirror` inside `TypeInt`, but maybe even that does not solve the whole problem. > > What do you think? In the strict sense, what is passed in product code and what is passed in gtest will never be the same. This is because `TypeInt` is the set of 32-bit integral values, while we do testing on 3-bit integral values. However, with templates, we can be much more confident since we know that the code being executed for `intn_t<3>` and `jint` is the same one, just specialized with different template parameters. With this approach, I believe we achieve the best similarity between what is executed and what is tested. > src/hotspot/share/opto/rangeinference.hpp line 324: > >> 322: static CTP infer_binary(CTP t1, CTP t2, Inference infer) { >> 323: CTP res; >> 324: bool init = false; > > `init` confused me at first. I intuitively read it as `please_initialize_me`, or the imperative `initialize`! But of course you meant `is_initialized`, right? I would use a longer name to be explicit ;) You are right, I renamed it to `is_init`. > test/hotspot/gtest/opto/test_rangeinference.cpp line 250: > >> 248: static_assert(std::is_same_v); >> 249: return *this; >> 250: } > > We now re-implement these from `TypeIntHelper::int_type_xmeet`. I wonder if we could not at least share some code. Not sure if that is worth it. But having this kind of code duplication opens the risk of divergence and hence bugs. That's a good idea. I implemented `TypeIntHelper::int_type_union` to do the union. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2431593528 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2431600394 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2431606941 From qamai at openjdk.org Wed Oct 15 08:32:55 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 08:32:55 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 13:27:18 GMT, Emanuel Peter wrote: >> test/hotspot/gtest/opto/test_rangeinference.cpp line 277: >> >>> 275: return 1732; >>> 276: } >>> 277: } >> >> What do the numbers mean here? I'm lost :/ > > Ah, this is the number of instances for a type! Makes sense. How did you get those numbers, how do we know they are right? I just manually calculated them and recorded the value here. The correctness is validated when we try to initialize the set, which then assert that the size of the set is the same as the value presented here. 
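As a toy version of that kind of exhaustive check (a simplified sketch with made-up names, not the JDK gtest): enumerate every non-empty signed range over a tiny 3-bit domain, collect the ranges in a set, and assert the set size against a hand-computed count, just like the per-type instance counts are validated.

```
import java.util.HashSet;
import java.util.Set;

public class RangeCountSketch {
    public static void main(String[] args) {
        // 3-bit two's complement domain: values -4 .. 3 (8 values in total).
        Set<Integer> ranges = new HashSet<>();
        for (int lo = -4; lo <= 3; lo++) {
            for (int hi = lo; hi <= 3; hi++) {
                ranges.add((lo + 4) * 8 + (hi + 4)); // unique key per (lo, hi) pair
            }
        }
        // 8 + 7 + ... + 1 = 36 non-empty ranges; this check plays the role of the
        // assert that validates the hand-computed instance counts.
        if (ranges.size() != 36) {
            throw new AssertionError("expected 36 ranges, got " + ranges.size());
        }
        System.out.println("ok: " + ranges.size() + " ranges");
    }
}
```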
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2431610657 From bmaillard at openjdk.org Wed Oct 15 08:33:24 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 15 Oct 2025 08:33:24 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v7] In-Reply-To: References: Message-ID: > This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. > > ### Analysis > > This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced > and added to this PR as a regression test. > > The test contains a switch inside a loop, and stressing the loop peeling results in > a fairly complex graph. The split-if optimization is applied agressively, and we > run a verification pass at every progress made. > > We end up with a relatively high number of verification passes, with each pass being > fairly expensive because of the size of the graph. > Each verification pass requires building a new `IdealLoopTree`. This is quite slow > (which is unfortunately hard to mitigate), and also causes inefficient memory usage > on the `ciEnv` arena. > > The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. > At every call, we have > - One allocation on the `ciEnv` arena to store the returned `ciField` > - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which: > - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`) > - Pushes the new symbol to the `_symbols` array > > The `ciEnv` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to > check if the `BasicType` of a static field is a reference type. > > In `ciObjectFactory`, the `_symbols` array ends up containg a large number of duplicates for certain symbols > (up to several millions), which hints at the fact that `ciObjectFactory::get_symbol` should not be called > repeatedly as it is done here. > > The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below: > > > ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412 > TypeOopPtr::TypeOopPtr type.cpp:3484 > TypeInstPtr::TypeInstPtr type.cpp:3953 > TypeInstPtr::make type.cpp:3990 > TypeInstPtr::add_offset type.cpp:4509 > AddPNode::bottom_type addnode.cpp:696 > MemNode::adr_type memnode.cpp:73 > PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477 > PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439 > PhaseIdealLoop::build_loop_late_post_work loopnode.cpp:6827 > PhaseIdealLoop::build_loop_late_post loopnode.cpp:67... 
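The allocation pattern described in the analysis above can be contrasted with a simple caching lookup. The following Java sketch is purely illustrative, with hypothetical names; it is not the actual ciField/ciEnv code and not necessarily the change made in this PR. It only shows how repeated queries for the same offset can reuse one descriptor instead of allocating a fresh one on every call.

import java.util.HashMap;
import java.util.Map;

class FieldLookupCache {
    // Hypothetical stand-in for the descriptor that was re-allocated per lookup.
    record FieldInfo(int offset, boolean isReference) {}

    private final Map<Integer, FieldInfo> byOffset = new HashMap<>();

    // Repeated calls with the same offset return the same cached instance.
    FieldInfo getFieldByOffset(int offset) {
        return byOffset.computeIfAbsent(offset, off -> new FieldInfo(off, false));
    }
}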
Beno?t Maillard has updated the pull request incrementally with three additional commits since the last revision: - Add check for expected NPE - Replace test body with reduced, faster version - Add memlimit constraint to the test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27731/files - new: https://git.openjdk.org/jdk/pull/27731/files/6c93a873..1491922d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=05-06 Stats: 58 lines in 1 file changed: 3 ins; 35 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/27731.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27731/head:pull/27731 PR: https://git.openjdk.org/jdk/pull/27731 From bmaillard at openjdk.org Wed Oct 15 08:33:26 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 15 Oct 2025 08:33:26 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v2] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 09:43:31 GMT, Emanuel Peter wrote: >> Thanks for the suggestion. I would argue that this does not really add value, as this essentially boils down to checking that accessing an uninitialized reference throws a `NullPointerException`, which is not really what this test is about. I would rather keep it specific. > > @benoitmaillard Drive-by comment: if you can add it, I would. Almost all checks add value, even if they don't add value to exactly what you are testing right now. @eme64 alright, done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2431614267 From mdoerr at openjdk.org Wed Oct 15 08:35:02 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 15 Oct 2025 08:35:02 GMT Subject: RFR: 8369511: PPC64: compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without VectorRegisters In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 18:21:45 GMT, Martin Doerr wrote: > The test `VectorMaskCompareNotTest` requires 16 Byte vectors (or larger). If the machine only uses 8 Byte vectors, we get an exception in the static initializer because the code tries to use a 4 Byte vector which is unsupported (stack trace: see JBS issue). This is an alternative to https://github.com/openjdk/jdk/pull/27749. @DingliZhang, @RealFYang: Can one of you please check if this is also fine for RISC-V, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27805#issuecomment-3405199891 From qamai at openjdk.org Wed Oct 15 08:41:16 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 08:41:16 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 13:17:42 GMT, Emanuel Peter wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > test/hotspot/gtest/opto/test_rangeinference.cpp line 33: > >> 31: #include >> 32: #include >> 33: #include > > I don't know the current state of code style guide: but are we allowed to use `std::unordered_set`? 
I can't think of a better way, we have `HashTable` but it is terrible since the table size is fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2431640023 From mdoerr at openjdk.org Wed Oct 15 08:48:29 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 15 Oct 2025 08:48:29 GMT Subject: RFR: 8369444: JavaFrameAnchor on PPC64 has unnecessary barriers [v2] In-Reply-To: References: Message-ID: <5UqDw9Jygmw8srTmdOeW2Tf2rlGmA-Tx68WtPC1bzZ0=.d87df934-6c2c-45e8-9e95-286216e680a8@github.com> On Wed, 15 Oct 2025 07:43:21 GMT, David Briemann wrote: >> No hardware barriers are necessary. All members are volatile and the profiler is run from a signal handler and only observes the thread it's running on. > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > remove whitespace Marked as reviewed by mdoerr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27768#pullrequestreview-3339138044 From qxing at openjdk.org Wed Oct 15 08:49:29 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Wed, 15 Oct 2025 08:49:29 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v3] In-Reply-To: References: Message-ID: On Tue, 16 Sep 2025 06:09:33 GMT, Emanuel Peter wrote: >> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve documentation comments > > test/hotspot/jtreg/compiler/loopopts/TestRedundantSafepointElimination.java line 66: > >> (failed to retrieve contents of file, check the PR for context) > So these do not end up being CountedLoop? The first one is a counted loop. The second one is not, because it calls `empty` (not inlined) which may modify `loopCount`. Both loops should have no safepoints, since the `empty` call always polls the safepoint. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2431669294 From duke at openjdk.org Wed Oct 15 08:51:35 2025 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane Ampudia) Date: Wed, 15 Oct 2025 08:51:35 GMT Subject: RFR: 8369573: Add missing compile commands help documentation for the signature part of method patterns In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 08:17:46 GMT, Daniel Lundén wrote: > The documentation of the signature part of method patterns in `java -XX:CompileCommand=help` is missing. We should add it. > > ### Changeset > - Improve the documentation of signatures in `java -XX:CompileCommand=help`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500483034) > - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. Nice documentation! 
LGTM. As a side note, looking at your line splits I got curious whether we are actively enforcing an 80-char limit in some output or not (the last two "paragraphs" exceed this size, although they've been there from before) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27818#issuecomment-3405268817 From qamai at openjdk.org Wed Oct 15 08:52:37 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 08:52:37 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 13:30:39 GMT, Emanuel Peter wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > test/hotspot/gtest/opto/test_rangeinference.cpp line 313: > >> 311: res[idx] = t; >> 312: idx++; >> 313: } > > Not sure if this is possible with `std::array`, but you could do it with `std::vector`: > `std::vector tmp(unordered.begin(), unordered.end());` > > Just an idea, feel free to leave it as is. `std::array` is trivially destructible, which is strongly encouraged for static objects in Hotspot. > test/hotspot/gtest/opto/test_rangeinference.cpp line 370: > >> 368: } >> 369: } >> 370: }; > > Using uniform distribution will make it very unlikely that you get a hit in a narrow long range, right? > Maybe we just have to live with that. Doing something smarter could probably be done (generate in the signed / unsigned bounds, and masking the bits), but there is also a risk: we may generate values that are too narrow by accident / bug... > > What do you think? At least adding some comment here about why we do what we do would be good. That is right, we can partially avoid this by exercising the random tests with narrow integral types. For wide types such as `jint` or `jlong`, it is however unclear which distribution would be the best. > test/hotspot/gtest/opto/test_rangeinference.cpp line 449: > >> 447: if (all_instances().size() < 100) { >> 448: // This effectively covers the cases up to uintn_t<2> >> 449: test_binary_instance_monotonicity_exhaustive(infer, input1, input2); > > Wow, that's really not much. It's really only a "sign" bit and one "mantissa" bit. Would have been nice if we could have handled at least 3 bits. Is that prohibitively slow? Yes, testing the monotonicity for those is really slow since we traverse the set of all instances multiple times. > test/hotspot/gtest/opto/test_rangeinference.cpp line 524: > >> 522: samples[idx] = TypeIntMirror{canonicalized_t._data._srange._lo, canonicalized_t._data._srange._hi, >> 523: canonicalized_t._data._urange._lo, canonicalized_t._data._urange._hi, >> 524: canonicalized_t._data._bits}; > > What about using a constructor that creates `TypeIntMirror` directly from a `TypeIntPrototype`? Maybe there is a reason that does not work? It is only done here so it is questionable whether making another constructor is beneficial. This is also a testing backdoor since we don't want to create a `TypeIntMirror` with arbitrary field values. 
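For readers following the monotonicity discussion, a minimal Java sketch of an exhaustive check over a tiny domain is shown below. It assumes a toy Range type; the real gtest operates on TypeIntMirror instances and the actual inference functions, which is why anything beyond two bits becomes slow.

import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

class MonotonicityCheck {
    record Range(int lo, int hi) {
        boolean isSubsetOf(Range o) { return lo >= o.lo() && hi <= o.hi(); }
    }

    // All non-empty ranges over the values 0..3 (a 2-bit domain).
    static List<Range> allRanges() {
        List<Range> res = new ArrayList<>();
        for (int lo = 0; lo <= 3; lo++) {
            for (int hi = lo; hi <= 3; hi++) {
                res.add(new Range(lo, hi));
            }
        }
        return res;
    }

    // Monotonicity: narrowing an input must never widen the output.
    static void check(BinaryOperator<Range> infer) {
        List<Range> all = allRanges();
        for (Range a : all) {
            for (Range aNarrow : all) {
                if (!aNarrow.isSubsetOf(a)) continue;
                for (Range b : all) {
                    if (!infer.apply(aNarrow, b).isSubsetOf(infer.apply(a, b))) {
                        throw new AssertionError("not monotonic: " + aNarrow + " vs " + a);
                    }
                }
            }
        }
    }
}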
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2431661840 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2431671085 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2431673278 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2431679428 From mdoerr at openjdk.org Wed Oct 15 08:55:50 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 15 Oct 2025 08:55:50 GMT Subject: Withdrawn: 8369511: PPC64: compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without VectorRegisters In-Reply-To: <8Jkjo4Y7ZGPV-YOJPw4jdfUTUkyQUVrZLwOverU4ElE=.8cca0973-57a6-4ba2-8e44-87fb43e00150@github.com> References: <8Jkjo4Y7ZGPV-YOJPw4jdfUTUkyQUVrZLwOverU4ElE=.8cca0973-57a6-4ba2-8e44-87fb43e00150@github.com> Message-ID: On Fri, 10 Oct 2025 16:27:30 GMT, Martin Doerr wrote: > Disabling the test for Power8 (see JBS issue). This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/27749 From mdoerr at openjdk.org Wed Oct 15 08:55:49 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 15 Oct 2025 08:55:49 GMT Subject: RFR: 8369511: PPC64: compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without VectorRegisters In-Reply-To: <8Jkjo4Y7ZGPV-YOJPw4jdfUTUkyQUVrZLwOverU4ElE=.8cca0973-57a6-4ba2-8e44-87fb43e00150@github.com> References: <8Jkjo4Y7ZGPV-YOJPw4jdfUTUkyQUVrZLwOverU4ElE=.8cca0973-57a6-4ba2-8e44-87fb43e00150@github.com> Message-ID: <2kYEDgTrl5Ioa9dpED4OhLZfX5KF-1v-nu1aAqFbDzs=.6ecb2dc5-b79a-4103-9f1c-686b3b2ec807@github.com> On Fri, 10 Oct 2025 16:27:30 GMT, Martin Doerr wrote: > Disabling the test for Power8 (see JBS issue). Closing in favor of https://github.com/openjdk/jdk/pull/27805. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27749#issuecomment-3405288115 From bmaillard at openjdk.org Wed Oct 15 08:57:02 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 15 Oct 2025 08:57:02 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v8] In-Reply-To: References: Message-ID: > This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. > > ### Analysis > > This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced > and added to this PR as a regression test. > > The test contains a switch inside a loop, and stressing the loop peeling results in > a fairly complex graph. The split-if optimization is applied agressively, and we > run a verification pass at every progress made. > > We end up with a relatively high number of verification passes, with each pass being > fairly expensive because of the size of the graph. > Each verification pass requires building a new `IdealLoopTree`. This is quite slow > (which is unfortunately hard to mitigate), and also causes inefficient memory usage > on the `ciEnv` arena. > > The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. 
> At every call, we have > - One allocation on the `ciEnv` arena to store the returned `ciField` > - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which: > - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`) > - Pushes the new symbol to the `_symbols` array > > The `ciEnv` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to > check if the `BasicType` of a static field is a reference type. > > In `ciObjectFactory`, the `_symbols` array ends up containg a large number of duplicates for certain symbols > (up to several millions), which hints at the fact that `ciObjectFactory::get_symbol` should not be called > repeatedly as it is done here. > > The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below: > > > ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412 > TypeOopPtr::TypeOopPtr type.cpp:3484 > TypeInstPtr::TypeInstPtr type.cpp:3953 > TypeInstPtr::make type.cpp:3990 > TypeInstPtr::add_offset type.cpp:4509 > AddPNode::bottom_type addnode.cpp:696 > MemNode::adr_type memnode.cpp:73 > PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477 > PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439 > PhaseIdealLoop::build_loop_late_post_work loopnode.cpp:6827 > PhaseIdealLoop::build_loop_late_post loopnode.cpp:67... Beno?t Maillard has updated the pull request incrementally with four additional commits since the last revision: - Add run without fixed stress seed - Reorder flags - Remove unnecessary CompileCommand=dontinline - Change name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27731/files - new: https://git.openjdk.org/jdk/pull/27731/files/1491922d..1f13f874 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=06-07 Stats: 9 lines in 1 file changed: 6 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27731.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27731/head:pull/27731 PR: https://git.openjdk.org/jdk/pull/27731 From bmaillard at openjdk.org Wed Oct 15 08:57:04 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 15 Oct 2025 08:57:04 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v2] In-Reply-To: References: Message-ID: On Fri, 10 Oct 2025 15:04:36 GMT, Christian Hagedorn wrote: >> Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Add -XX:+UnlockDiagnosticVMOptions > > test/hotspot/jtreg/compiler/loopopts/TestVerifyLoopOptimizationsHitsMemLimit.java line 39: > >> 37: * -XX:-TieredCompilation -Xcomp -XX:CompileCommand=dontinline,*::* >> 38: * -XX:+StressLoopPeeling -XX:PerMethodTrapLimit=0 -XX:+VerifyLoopOptimizations >> 39: * -XX:StressSeed=1870557292 > > I suggest to remove the stress seed since it might not trigger anymore in later builds. Usually, we add a run with a fixed stress seed and one without but since this test requires to do just some verification work, I would suggest to not add two runs but only one without fixed seed. > > Another question: How close are we to hit the default the memory limit with this test? With your fix it probably consumes not much memory anymore. 
I therefore suggest to add `MemLimit` as an additional flag with a much smaller value to be sure that your fix works as expected (you might need to check how low we can choose the limit to not run into problems in higher tiers). I was able to reduce the test further using a memory limit of 100M (approximately 10 times less than the default) and a shorter timeout with `creduce`. Compilation of the new `test` method with a fast debug build now takes an average of `1.22 s` over 100 runs according to `-XX:+CITime`. With the decreased compilation time, I think it is now reasonable to have two runs (one with the stress seed, one without). Let me know if you think otherwise! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2431693355 From qxing at openjdk.org Wed Oct 15 08:57:33 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Wed, 15 Oct 2025 08:57:33 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v16] In-Reply-To: References: Message-ID: > The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. > > This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: > > > public static int numberOfNibbles(int i) { > int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); > return Math.max((mag + 3) / 4, 1); > } > > > Testing: tier1, IR test Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: Fix include order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25928/files - new: https://git.openjdk.org/jdk/pull/25928/files/d7ebc8f2..9bb3f7d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=14-15 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25928.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928 PR: https://git.openjdk.org/jdk/pull/25928 From bmaillard at openjdk.org Wed Oct 15 09:01:28 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 15 Oct 2025 09:01:28 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v8] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 08:57:02 GMT, Benoît Maillard wrote: >> This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. >> >> ### Analysis >> >> This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced >> and added to this PR as a regression test. >> >> The test contains a switch inside a loop, and stressing the loop peeling results in >> a fairly complex graph. The split-if optimization is applied aggressively, and we >> run a verification pass at every progress made. 
>> >> We end up with a relatively high number of verification passes, with each pass being >> fairly expensive because of the size of the graph. >> Each verification pass requires building a new `IdealLoopTree`. This is quite slow >> (which is unfortunately hard to mitigate), and also causes inefficient memory usage >> on the `ciEnv` arena. >> >> The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. >> At every call, we have >> - One allocation on the `ciEnv` arena to store the returned `ciField` >> - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which: >> - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`) >> - Pushes the new symbol to the `_symbols` array >> >> The `ciEnv` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to >> check if the `BasicType` of a static field is a reference type. >> >> In `ciObjectFactory`, the `_symbols` array ends up containg a large number of duplicates for certain symbols >> (up to several millions), which hints at the fact that `ciObjectFactory::get_symbol` should not be called >> repeatedly as it is done here. >> >> The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below: >> >> >> ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412 >> TypeOopPtr::TypeOopPtr type.cpp:3484 >> TypeInstPtr::TypeInstPtr type.cpp:3953 >> TypeInstPtr::make type.cpp:3990 >> TypeInstPtr::add_offset type.cpp:4509 >> AddPNode::bottom_type addnode.cpp:696 >> MemNode::adr_type memnode.cpp:73 >> PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477 >> PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439 >> PhaseIdealLoop::build_lo... > > Beno?t Maillard has updated the pull request incrementally with four additional commits since the last revision: > > - Add run without fixed stress seed > - Reorder flags > - Remove unnecessary CompileCommand=dontinline > - Change name I have made the following (significant) changes that are ready for review: - Replaced the test method with a further reduced version that now takes a little more than one second compared to ~40s previously - Added a second run without a fixed stress seed (as the compilation is now fast enough) - Added a memory limit of `100M` ------------- PR Comment: https://git.openjdk.org/jdk/pull/27731#issuecomment-3405314464 From mdoerr at openjdk.org Wed Oct 15 09:04:36 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 15 Oct 2025 09:04:36 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 07:13:26 GMT, erifan wrote: >> The VectorShape size of `I_SPECIES_FOR_CAST` declared in test **VectorMaskCompareNotTest.java** is half that of `L_SPECIES_FOR_CAST`. And `L_SPECIES_FOR_CAST` is created with the maximum shape. Therefore, if `MaxVectorSize` is set to 8, the shape size of `I_SPECIES_FOR_CAST` is 4, which is an illegal value because the minimum vector size requirement is 8 bytes. >> >> This pull request addresses the issue by ensuring that this test runs only when `MaxVectorSize` is set to 16 bytes or higher. > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Remove JBS number 8368205 from VectorMaskCompareNotTest.java > - Merge branch 'master' into JDK-8368205-VectorMaskCompareNotTest-failure > - 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 > > The VectorShape size of `I_SPECIES_FOR_CAST` declared in test **VectorMaskCompareNotTest.java** > is half that of `L_SPECIES_FOR_CAST`. And `L_SPECIES_FOR_CAST` is created with the maximum shape. > Therefore, if `MaxVectorSize` is set to 8, the shape size of `I_SPECIES_FOR_CAST` is 4, which > is an illegal value because the minimum vector size requirement is 8 bytes. > > This pull request addresses the issue by ensuring that this test runs only when `MaxVectorSize` > is set to 16 bytes or higher. > > Isn't this a duplicate of #27805? > > I have filed the PR 3 weeks ago, I think #27805 is a duplicate of this PR. > > Hi @TheRealMDoerr , could you please help me test whether this PR can fix the test failure on PPC64? I don't have a PPC64 environment. Thanks! Sorry, I had missed that there's already a JBS issue and PR. Unfortunately, your PR doesn't solve the problem. Can we use mine instead? I can change my PR to fix 8368205 and close my JBS issue as duplicate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27418#issuecomment-3405329634 From qamai at openjdk.org Wed Oct 15 09:06:09 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 09:06:09 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: References: Message-ID: > Hi, > > This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Emanuel's reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27618/files - new: https://git.openjdk.org/jdk/pull/27618/files/46ce95fd..b73850d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27618&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27618&range=00-01 Stats: 43 lines in 3 files changed: 21 ins; 12 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/27618.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27618/head:pull/27618 PR: https://git.openjdk.org/jdk/pull/27618 From epeter at openjdk.org Wed Oct 15 09:11:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 09:11:10 GMT Subject: RFR: 8369881: C2: Unexpected node in SuperWord truncation: ReverseBytesS, ReverseBytesUS Message-ID: The fuzzer found the `ReverseByteS` case, and I checked all other `*.reverseBytes`, and found a failure with `Character.reverseBytes` as well. Adding them to the list, and added tests for both. 
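A loop of the kind these new tests are assumed to cover might look roughly like the following sketch (illustrative Java, not the exact test code):

class ReverseBytesLoops {
    static short[] sArr = new short[1024];
    static char[] cArr = new char[1024];

    // Subword reverseBytes results stored back into subword arrays; SuperWord
    // may consider vectorizing these loops, which is where the truncation
    // handling for ReverseBytesS / ReverseBytesUS matters.
    static void reverseShorts() {
        for (int i = 0; i < sArr.length; i++) {
            sArr[i] = Short.reverseBytes(sArr[i]);
        }
    }

    static void reverseChars() {
        for (int i = 0; i < cArr.length; i++) {
            cArr[i] = Character.reverseBytes(cArr[i]);
        }
    }

    public static void main(String[] args) {
        reverseShorts();
        reverseChars();
    }
}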
Note, this is just another 2 boxes checked, there were many similar ones fixed, or on the way: https://github.com/openjdk/jdk/pull/26827 https://github.com/openjdk/jdk/pull/26334 https://github.com/openjdk/jdk/pull/26494 https://github.com/openjdk/jdk/pull/26423 ------------- Commit messages: - fix ReverseBytesUS as well - fix for ReverseBytesS - JDK-8369881 Changes: https://git.openjdk.org/jdk/pull/27819/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27819&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369881 Stats: 54 lines in 2 files changed: 53 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27819.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27819/head:pull/27819 PR: https://git.openjdk.org/jdk/pull/27819 From chagedorn at openjdk.org Wed Oct 15 09:18:00 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 15 Oct 2025 09:18:00 GMT Subject: RFR: 8369881: C2: Unexpected node in SuperWord truncation: ReverseBytesS, ReverseBytesUS In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 08:46:23 GMT, Emanuel Peter wrote: > The fuzzer found the `ReverseByteS` case, and I checked all other `*.reverseBytes`, and found a failure with `Character.reverseBytes` as well. > > Adding them to the list, and added tests for both. > > Note, this is just another 2 boxes checked, there were many similar ones fixed, or on the way: > https://github.com/openjdk/jdk/pull/26827 > https://github.com/openjdk/jdk/pull/26334 > https://github.com/openjdk/jdk/pull/26494 > https://github.com/openjdk/jdk/pull/26423 Looks good, thanks for fixing this! test/hotspot/jtreg/compiler/vectorization/TestSubwordTruncation.java line 355: > 353: > 354: @Test > 355: @IR(counts = { IRNode.STORE_VECTOR, "=0" }) You could use `failOn` instead. Same below. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27819#pullrequestreview-3339283808 PR Review Comment: https://git.openjdk.org/jdk/pull/27819#discussion_r2431780805 From epeter at openjdk.org Wed Oct 15 09:21:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 09:21:49 GMT Subject: RFR: 8369881: C2: Unexpected node in SuperWord truncation: ReverseBytesS, ReverseBytesUS In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 09:14:56 GMT, Christian Hagedorn wrote: >> test/hotspot/jtreg/compiler/vectorization/TestSubwordTruncation.java line 355: >> >>> 353: >>> 354: @Test >>> 355: @IR(counts = { IRNode.STORE_VECTOR, "=0" }) >> >> You could use `failOn` instead. Same below. > > Okay, I see that we use `= 0` in the other tests as well further down. Feel free to ignore. The rest of the test does the same. I think I'll just ignore ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27819#discussion_r2431799538 From thartmann at openjdk.org Wed Oct 15 09:24:55 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 15 Oct 2025 09:24:55 GMT Subject: RFR: 8369804: TestGenerators.java fails with IllegalArgumentException: bound must be greater than origin [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 09:21:34 GMT, Emanuel Peter wrote: >> We sample two floats, assuming we would get two different results. But ever so rarely, we get the same values, and the test fails. >> >> So now, I sample with retry. And also improve the error reporting, throwing an exception on generator construction rather than sampling from the generator. 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review Thanks for quickly fixing this, looks good to me! ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27816#pullrequestreview-3339308027 From chagedorn at openjdk.org Wed Oct 15 09:18:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 15 Oct 2025 09:18:01 GMT Subject: RFR: 8369881: C2: Unexpected node in SuperWord truncation: ReverseBytesS, ReverseBytesUS In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 09:14:22 GMT, Christian Hagedorn wrote: >> The fuzzer found the `ReverseByteS` case, and I checked all other `*.reverseBytes`, and found a failure with `Character.reverseBytes` as well. >> >> Adding them to the list, and added tests for both. >> >> Note, this is just another 2 boxes checked, there were many similar ones fixed, or on the way: >> https://github.com/openjdk/jdk/pull/26827 >> https://github.com/openjdk/jdk/pull/26334 >> https://github.com/openjdk/jdk/pull/26494 >> https://github.com/openjdk/jdk/pull/26423 > > test/hotspot/jtreg/compiler/vectorization/TestSubwordTruncation.java line 355: > >> 353: >> 354: @Test >> 355: @IR(counts = { IRNode.STORE_VECTOR, "=0" }) > > You could use `failOn` instead. Same below. Okay, I see that we use `= 0` in the other tests as well further down. Feel free to ignore. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27819#discussion_r2431783384 From qamai at openjdk.org Wed Oct 15 09:22:34 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 09:22:34 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: <9csq9foYsLDNJH6sLU1qLEAqihigGGHaViMPnC5BuWc=.7de0ccca-ecf2-4e8d-b8ee-6716c0a9bd7e@github.com> References: <9csq9foYsLDNJH6sLU1qLEAqihigGGHaViMPnC5BuWc=.7de0ccca-ecf2-4e8d-b8ee-6716c0a9bd7e@github.com> Message-ID: On Mon, 13 Oct 2025 13:15:38 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Emanuel's reviews > > @merykitty Thank you very much for working on this, very exciting. And it seems that the actual logic is now simpler than all the custom logic before! > > However, we need to make sure that all cases that you are not deleting are indeed covered. > 1. `OrINode::add_ring` > > if ( r0 == TypeInt::BOOL ) { > if ( r1 == TypeInt::ONE) { > return TypeInt::ONE; > } else if ( r1 == TypeInt::BOOL ) { > return TypeInt::BOOL; > } > } else if ( r0 == TypeInt::ONE ) { > if ( r1 == TypeInt::BOOL ) { > return TypeInt::ONE; > } > } > > That seems to be covered by KnownBits. > > 2. `OrINode::add_ring` > > if (r0 == TypeInt::MINUS_1 || r1 == TypeInt::MINUS_1) { > return TypeInt::MINUS_1; > } > > Seems also ok, handled by the KnownBits. > > 3. `OrINode::add_ring` > > // If either input is not a constant, just return all integers. > if( !r0->is_con() || !r1->is_con() ) > return TypeInt::INT; // Any integer, but still no symbols. > > // Otherwise just OR them bits. > return TypeInt::make( r0->get_con() | r1->get_con() ); > > Constants would also be handeld by KnownBits. > > 4. `xor_upper_bound_for_ranges` > I think also this should be handled by doing KnownBits first, and then inferring the signed/unsigned bounds, right? > > 5. `and_value` > Does not look so trivial. Maybe you can go over it step by step, and leave some GitHub code comments? 
@eme64 Thanks for your review, I believe I have addressed all of your suggestions. > However, we need to make sure that all cases that you are not deleting are indeed covered. For this, from the testing POV, all the current idealization tests pass. >From the theoretical POV, let me present it below: For `Xor`: return round_up_power_of_2(U(hi_0 | hi_1) + 1) - 1; // This should be trivially covered by `KnownBits`, since it tries to deal with the highest bits that are known to be 0 in both inputs For `Or`: // If both args are bool, can figure out better types if ( r0 == TypeInt::BOOL ) { if ( r1 == TypeInt::ONE) { return TypeInt::ONE; // Trivial, since all bits except the lowest is 0 in both inputs, and the lowest bit is 1 in the second input } else if ( r1 == TypeInt::BOOL ) { return TypeInt::BOOL; // Trivial, since all bits except the lowest is 0 in both inputs } } else if ( r0 == TypeInt::ONE ) { if ( r1 == TypeInt::BOOL ) { return TypeInt::ONE; // Same as above } } if (r0 == TypeInt::MINUS_1 || r1 == TypeInt::MINUS_1) { return TypeInt::MINUS_1; // Trivial, since all bits is 1 in 1 of the inputs } // If either input is not a constant, just return all integers. if( !r0->is_con() || !r1->is_con() ) return TypeInt::INT; // Any integer, but still no symbols. // Otherwise just OR them bits. return TypeInt::make( r0->get_con() | r1->get_con() ); // Constant folding is trivial For `And`: // If both types are constants, we can calculate a constant result. if (r0->is_con() && r1->is_con()) { return IntegerType::make(r0->get_con() & r1->get_con()); // Constant folding is trivial } // If both ranges are positive, the result will range from 0 up to the hi value of the smaller range. The minimum // of the two constrains the upper bound because any higher value in the other range will see all zeroes, so it will be masked out. if (r0->_lo >= 0 && r1->_lo >= 0) { return IntegerType::make(0, MIN2(r0->_hi, r1->_hi), widen); // In this case, both have a single simple interval, and the max of the result (which is the same as the unsigned max) is not larger than the min of either input. } // If only one range is positive, the result will range from 0 up to that range's maximum value. // For the operation 'x & C' where C is a positive constant, the result will be in the range [0..C]. With that observation, // we can say that for any integer c such that 0 <= c <= C will also be in the range [0..C]. Therefore, 'x & [c..C]' // where c >= 0 will be in the range [0..C]. if (r0->_lo >= 0) { return IntegerType::make(0, r0->_hi, widen); // r0 will have a single simple interval, and the result will be the union of 2 sets both of which have the max being not larger than r0->_hi } if (r1->_lo >= 0) { return IntegerType::make(0, r1->_hi, widen); // Same as above } // At this point, all positive ranges will have already been handled, so the only remaining cases will be negative ranges // and constants. assert(r0->_lo < 0 && r1->_lo < 0, "positive ranges should already be handled!"); // As two's complement means that both numbers will start with leading 1s, the lower bound of both ranges will contain // the common leading 1s of both minimum values. In order to count them with count_leading_zeros, the bits are inverted. 
NativeType sel_val = ~MIN2(r0->_lo, r1->_lo); NativeType min; // This takes into consideration that the result is negative iff both the inputs are negative, then uses the lower bound to infer the leading 1s in that case if (sel_val == 0) { // Since count_leading_zeros is undefined at 0, we short-circuit the condition where both ranges have a minimum of -1. min = -1; } else { // To get the number of bits to shift, we count the leading 0-bits and then subtract one, as the sign bit is already set. int shift_bits = count_leading_zeros(sel_val) - 1; min = std::numeric_limits::min() >> shift_bits; } NativeType max; if (r0->_hi < 0 && r1->_hi < 0) { // If both ranges are negative, then the same optimization as both positive ranges will apply, and the smaller hi // value will mask off any bits set by higher values. max = MIN2(r0->_hi, r1->_hi); // Both ranges are negative, then similar to when r0->_lo >= 0 && r1->_lo >= 0 } else { // In the case of ranges that cross zero, negative values can cause the higher order bits to be set, so the maximum // positive value can be as high as the larger hi value. max = MAX2(r0->_hi, r1->_hi); // Consider the union of the results when inferring from 4 combinations of simple intervals of the inputs. If both simple intervals are in the negative range, the result is negative. Otherwise, the result will be not larger than the upper bound of the simple interval in the non-negative range. } return IntegerType::make(min, max, widen); ------------- PR Comment: https://git.openjdk.org/jdk/pull/27618#issuecomment-3405411175 From epeter at openjdk.org Wed Oct 15 09:24:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 09:24:55 GMT Subject: RFR: 8369804: TestGenerators.java fails with IllegalArgumentException: bound must be greater than origin [v2] In-Reply-To: References: Message-ID: > We sample two floats, assuming we would get two different results. But ever so rarely, we get the same values, and the test fails. > > So now, I sample with retry. And also improve the error reporting, throwing an exception on generator construction rather than sampling from the generator. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27816/files - new: https://git.openjdk.org/jdk/pull/27816/files/9cc9b1ff..34e6fd55 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27816&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27816&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27816.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27816/head:pull/27816 PR: https://git.openjdk.org/jdk/pull/27816 From duke at openjdk.org Wed Oct 15 09:33:02 2025 From: duke at openjdk.org (erifan) Date: Wed, 15 Oct 2025 09:33:02 GMT Subject: Withdrawn: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 In-Reply-To: References: Message-ID: <3kxXQ7Je1WbLWtOzH0Nb7PKpwL_EuWnE0BaXZ9Qn788=.39b488d4-7ebe-401c-b0a0-5cd38d7e3ca8@github.com> On Mon, 22 Sep 2025 07:39:24 GMT, erifan wrote: > The VectorShape size of `I_SPECIES_FOR_CAST` declared in test **VectorMaskCompareNotTest.java** is half that of `L_SPECIES_FOR_CAST`. And `L_SPECIES_FOR_CAST` is created with the maximum shape. 
Therefore, if `MaxVectorSize` is set to 8, the shape size of `I_SPECIES_FOR_CAST` is 4, which is an illegal value because the minimum vector size requirement is 8 bytes. > > This pull request addresses the issue by ensuring that this test runs only when `MaxVectorSize` is set to 16 bytes or higher. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/27418 From duke at openjdk.org Wed Oct 15 09:33:01 2025 From: duke at openjdk.org (erifan) Date: Wed, 15 Oct 2025 09:33:01 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 09:01:55 GMT, Martin Doerr wrote: > > > Isn't this a duplicate of #27805? > > > > > > I have filed the PR 3 weeks ago, I think #27805 is a duplicate of this PR. > > Hi @TheRealMDoerr , could you please help me test whether this PR can fix the test failure on PPC64? I don't have a PPC64 environment. Thanks! > > Sorry, I had missed that there's already a JBS issue and PR. Unfortunately, your PR doesn't solve the problem. Can we use mine instead? I can change my PR to fix 8368205 and close my JBS issue as duplicate. Hi @TheRealMDoerr , thanks for your feedback. I realized I had overlooked the machine being 64-bit itself. Thanks for the fix, I'll drop mine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27418#issuecomment-3405459477 From epeter at openjdk.org Wed Oct 15 09:24:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 09:24:57 GMT Subject: RFR: 8369804: TestGenerators.java fails with IllegalArgumentException: bound must be greater than origin [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 09:21:34 GMT, Emanuel Peter wrote: >> We sample two floats, assuming we would get two different results. But ever so rarely, we get the same values, and the test fails. >> >> So now, I sample with retry. And also improve the error reporting, throwing an exception on generator construction rather than sampling from the generator. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java line 596: > 594: for (int j = 0; j < 500; j++) { > 595: float lo = 1, hi = 0; > 596: // Failur of a single round is very rare, repeated failure even rarer. Suggestion: // Failure of a single round is very rare, repeated failure even rarer. test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java line 613: > 611: for (int j = 0; j < 500; j++) { > 612: double lo = 1, hi = 0; > 613: // Failur of a single round is very rare, repeated failure even rarer. Suggestion: // Failure of a single round is very rare, repeated failure even rarer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27816#discussion_r2431793215 PR Review Comment: https://git.openjdk.org/jdk/pull/27816#discussion_r2431793881 From qamai at openjdk.org Wed Oct 15 09:42:31 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 09:42:31 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types [v2] In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 14:48:40 GMT, Roland Westrelin wrote: >> Currently ReassociateInvariants is only enabled for int counted >> loops. I noticed, enabling it for long counted loops helps RCE. It >> also seems like something that would help any loop. 
I propose enabling >> it for all inner loops. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8369258 > - test fixes > - test and fix Marked as reviewed by qamai (Committer). test/hotspot/jtreg/compiler/loopopts/TestReassociateInvariants.java line 63: > 61: @IR(failOn = { IRNode.COUNTED_LOOP, IRNode.LONG_COUNTED_LOOP }) > 62: @IR(counts = { IRNode.LOOP, "1" }) > 63: @Arguments(values = { Argument.NUMBER_42, Argument.NUMBER_42 }) What are we verifying here, should this fail on some kind of range check? ------------- PR Review: https://git.openjdk.org/jdk/pull/27666#pullrequestreview-3339386891 PR Review Comment: https://git.openjdk.org/jdk/pull/27666#discussion_r2431866113 From qamai at openjdk.org Wed Oct 15 09:42:33 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 09:42:33 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 09:38:30 GMT, Quan Anh Mai wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8369258 >> - test fixes >> - test and fix > > test/hotspot/jtreg/compiler/loopopts/TestReassociateInvariants.java line 63: > >> 61: @IR(failOn = { IRNode.COUNTED_LOOP, IRNode.LONG_COUNTED_LOOP }) >> 62: @IR(counts = { IRNode.LOOP, "1" }) >> 63: @Arguments(values = { Argument.NUMBER_42, Argument.NUMBER_42 }) > > What are we verifying here, should this fail on some kind of range check? Also, why are these not recognized as `CountedLoop`s, it seems we need to fix them, too? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27666#discussion_r2431870029 From duke at openjdk.org Wed Oct 15 10:04:04 2025 From: duke at openjdk.org (erifan) Date: Wed, 15 Oct 2025 10:04:04 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 In-Reply-To: References: Message-ID: <1EHAyr4afp0ihQANrbA5gvQ9v5_b7LSDPLAhOJzTQhA=.a425958a-5272-4518-9484-196bfe07564d@github.com> On Tue, 14 Oct 2025 18:21:45 GMT, Martin Doerr wrote: > The test `VectorMaskCompareNotTest` requires 16 Byte vectors (or larger). If the machine only uses 8 Byte vectors, we get an exception in the static initializer because the code tries to use a 4 Byte vector which is unsupported (stack trace: see JBS issue [JDK-8369511](https://bugs.openjdk.org/browse/JDK-8369511)). This is an alternative to https://github.com/openjdk/jdk/pull/27749. LGTM, thanks for taking care of the issue! ------------- Marked as reviewed by erifan at github.com (no known OpenJDK username). PR Review: https://git.openjdk.org/jdk/pull/27805#pullrequestreview-3339500880 From chagedorn at openjdk.org Wed Oct 15 10:33:44 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 15 Oct 2025 10:33:44 GMT Subject: RFR: 8369804: TestGenerators.java fails with IllegalArgumentException: bound must be greater than origin [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 09:24:55 GMT, Emanuel Peter wrote: >> We sample two floats, assuming we would get two different results. 
But ever so rarely, we get the same values, and the test fails. >> >> So now, I sample with retry. And also improve the error reporting, throwing an exception on generator construction rather than sampling from the generator. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review Looks good to me, too! test/hotspot/jtreg/compiler/lib/generators/UniformDoubleGenerator.java line 38: > 36: public UniformDoubleGenerator(Generators g, double lo, double hi) { > 37: super(g, lo, hi); > 38: if (Double.compare(lo, hi) >= 0) throw new EmptyGeneratorException(); I suggest to add braces Suggestion: if (Double.compare(lo, hi) >= 0) { throw new EmptyGeneratorException(); } test/hotspot/jtreg/compiler/lib/generators/UniformFloatGenerator.java line 38: > 36: public UniformFloatGenerator(Generators g, float lo, float hi) { > 37: super(g, lo, hi); > 38: if (Float.compare(lo, hi) >= 0) throw new EmptyGeneratorException(); Suggestion: if (Float.compare(lo, hi) >= 0) { throw new EmptyGeneratorException(); } ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27816#pullrequestreview-3339650938 PR Review Comment: https://git.openjdk.org/jdk/pull/27816#discussion_r2432037985 PR Review Comment: https://git.openjdk.org/jdk/pull/27816#discussion_r2432038866 From epeter at openjdk.org Wed Oct 15 12:05:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 12:05:12 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: References: <9csq9foYsLDNJH6sLU1qLEAqihigGGHaViMPnC5BuWc=.7de0ccca-ecf2-4e8d-b8ee-6716c0a9bd7e@github.com> Message-ID: On Wed, 15 Oct 2025 08:18:49 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/rangeinference.hpp line 218: >> >>> 216: bool contains(U u) const; >>> 217: bool contains(const TypeIntMirror& o) const; >>> 218: bool operator==(const TypeIntMirror& o) const; >> >> Could we limit this to `DEBUG_ONLY`? > > Maybe, it disables these gtest in product builds, however. What do you think? Right, I see. I suppose we can keep it. Can you somehow make it clear which block it is, maybe with some start/end markers? I was wondering if the method below is still part of it, but I don't think so. But it was unclear at first. >> src/hotspot/share/opto/rangeinference.hpp line 230: >> >>> 228: // TypeLong*, or they can be TypeIntMirror which behave similar to TypeInt* and TypeLong* during >>> 229: // testing. This allows us to verify the correctness of the implementation without coupling with >>> 230: // the hotspot compiler allocation infrastructure. >> >> This sounds a bit like a hack, but maybe a currently necessary one. But it sounds like we are passing something different in the production code vs in gtest testing code, and that's not ideal. >> >> I suppose an alternative would be to always do the transition from `TypeInt` -> `TypeIntMirror`, before passing it into `RangeInference`. Would that be too much overhead, or have other downsides? I suppose an issue with that is how do you get back a `TypeInt` at the end... yeah not ideal. So maybe your hack is required. >> >> It would have been nice if we could just compose `TypeIntMirror` inside `TypeInt`, but maybe even that does not solve the whole problem. >> >> What do you think? > > In the strict sense, what is passed in product code and what is passed in gtest will never be the same. 
This is because `TypeInt` is the set of 32-bit integral values, while we do testing on 3-bit integral values. However, with templates, we can be much more confident since we know that the code being executed for `intn_t<3>` and `jint` is the same one, just specialized with different template parameters. With this approach, I believe we achieve the best similarity between what is executed and what is tested. Hmm I see. Right because we cannot actually ever pass a `TypeInt` because of the allocators. That's a shame. But I think it is still worth doing the testing you are doing. So let's just go with it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2432293070 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2432297007 From epeter at openjdk.org Wed Oct 15 12:14:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 12:14:46 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 09:06:09 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Emanuel's reviews src/hotspot/share/opto/rangeinference.hpp line 358: > 356: U ulo = std::numeric_limits>::min(); > 357: // The unsigned value of the result of 'and' is always not greater than both of its inputs > 358: // since there is no position at which the bit is 1 in the result and 0 in either input That does not sound correct. We could have ranges `0..0b1000` for both. But then both values are `0b0010`, and so the result is `0b0010`, which is a 1 at a position where both `uhi` values had zeros. I think you need to talk about leading zeros somehow. src/hotspot/share/opto/rangeinference.hpp line 372: > 370: S hi = std::numeric_limits>::max(); > 371: // The unsigned value of the result of 'or' is always not less than both of its inputs since > 372: // there is no position at which the bit is 0 in the result and 1 in either input Same issue here as above ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2432314546 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2432315446 From epeter at openjdk.org Wed Oct 15 12:14:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 12:14:49 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 08:38:15 GMT, Quan Anh Mai wrote: >> test/hotspot/gtest/opto/test_rangeinference.cpp line 33: >> >>> 31: #include >>> 32: #include >>> 33: #include >> >> I don't know the current state of code style guide: but are we allowed to use `std::unordered_set`? > > I can't think of a better way, we have `HashTable` but it is terrible since the table size is fixed. Not sure. 
I'll ask some folks who might know / have an anser / opinion ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2432319221 From epeter at openjdk.org Wed Oct 15 12:14:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 12:14:50 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 12:10:12 GMT, Emanuel Peter wrote: >> I can't think of a better way, we have `HashTable` but it is terrible since the table size is fixed. > > Not sure. I'll ask some folks who might know / have an anser / opinion ;) It surely would be very easy, and not affect the product. But let's see what they say. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2432323133 From epeter at openjdk.org Wed Oct 15 12:37:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 12:37:57 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 12:11:39 GMT, Emanuel Peter wrote: >> Not sure. I'll ask some folks who might know / have an anser / opinion ;) > > It surely would be very easy, and not affect the product. But let's see what they say. They tell me it is fine, and we are already doing similar things here: `test/hotspot/gtest/jfr/test_networkUtilization.cpp` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2432383176 From epeter at openjdk.org Wed Oct 15 12:37:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 12:37:59 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: References: Message-ID: <8O3frYK44WsmFoN1Guiqp1ISN-09d7KKaqm9D4X-rhU=.726fff69-5e6e-4aec-aa94-46f58f3c60bc@github.com> On Wed, 15 Oct 2025 08:29:18 GMT, Quan Anh Mai wrote: >> Ah, this is the number of instances for a type! Makes sense. How did you get those numbers, how do we know they are right? > > I just manually calculated them and recorded the value here. The correctness is validated when we try to initialize the set, which then assert that the size of the set is the same as the value presented here. Ok. Well if you have any description of how you computed it manually, it would be nice if you could write that down here ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2432386336 From epeter at openjdk.org Wed Oct 15 12:37:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 12:37:59 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: <8O3frYK44WsmFoN1Guiqp1ISN-09d7KKaqm9D4X-rhU=.726fff69-5e6e-4aec-aa94-46f58f3c60bc@github.com> References: <8O3frYK44WsmFoN1Guiqp1ISN-09d7KKaqm9D4X-rhU=.726fff69-5e6e-4aec-aa94-46f58f3c60bc@github.com> Message-ID: On Wed, 15 Oct 2025 12:33:21 GMT, Emanuel Peter wrote: >> I just manually calculated them and recorded the value here. The correctness is validated when we try to initialize the set, which then assert that the size of the set is the same as the value presented here. > > Ok. Well if you have any description of how you computed it manually, it would be nice if you could write that down here ;) If you just computed it once, and are now fixing the value, that's ok-ish too, just write that down. 
Just in case someone else runs into this later, and needs to fix a bug they would probably like to know where the magic numbers are from ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2432389156 From epeter at openjdk.org Wed Oct 15 12:38:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 12:38:02 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 08:44:28 GMT, Quan Anh Mai wrote: >> test/hotspot/gtest/opto/test_rangeinference.cpp line 313: >> >>> 311: res[idx] = t; >>> 312: idx++; >>> 313: } >> >> Not sure if this is possible with `std::array`, but you could do it with `std::vector`: >> `std::vector tmp(unordered.begin(), unordered.end());` >> >> Just an idea, feel free to leave it as is. > > `std::array` is trivially destructible, which is strongly encouraged for static objects in Hotspot. Fine with me, was just an idea :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2432392573 From mdoerr at openjdk.org Wed Oct 15 12:38:27 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 15 Oct 2025 12:38:27 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: > The test `VectorMaskCompareNotTest` requires 16 Byte vectors (or larger). If the machine only uses 8 Byte vectors, we get an exception in the static initializer because the code tries to use a 4 Byte vector which is unsupported (stack trace: see JBS issue [JDK-8369511](https://bugs.openjdk.org/browse/JDK-8369511)). This is an alternative to https://github.com/openjdk/jdk/pull/27749. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Add check if flag is available. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27805/files - new: https://git.openjdk.org/jdk/pull/27805/files/3041e656..4af2585a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27805&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27805&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27805.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27805/head:pull/27805 PR: https://git.openjdk.org/jdk/pull/27805 From mdoerr at openjdk.org Wed Oct 15 12:38:28 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 15 Oct 2025 12:38:28 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: <1EHAyr4afp0ihQANrbA5gvQ9v5_b7LSDPLAhOJzTQhA=.a425958a-5272-4518-9484-196bfe07564d@github.com> References: <1EHAyr4afp0ihQANrbA5gvQ9v5_b7LSDPLAhOJzTQhA=.a425958a-5272-4518-9484-196bfe07564d@github.com> Message-ID: On Wed, 15 Oct 2025 10:01:49 GMT, erifan wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Add check if flag is available. > > LGTM, thanks for taking care of the issue! @erifan: Thanks for the review! I've added a check if the flag is available similar to what you had in your PR. That may be needed for VM configurations without C2 compiler. 
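For illustration, a minimal sketch of what such an availability guard can look like in a jtreg-style test. The class name and the 16-byte threshold are hypothetical and the actual check in the PR may differ; the point is only reading an optional VM flag through WhiteBox and skipping when it is absent, for example in a build without C2:

```java
import jdk.test.whitebox.WhiteBox;

public class MaxVectorSizeGuard {
    private static final WhiteBox WB = WhiteBox.getWhiteBox();

    public static void main(String[] args) {
        // Run under jtreg with -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI.
        // getIntxVMFlag returns null when the flag does not exist in this VM configuration,
        // for example in a build without the C2 compiler.
        Long maxVectorSize = WB.getIntxVMFlag("MaxVectorSize");
        if (maxVectorSize == null || maxVectorSize < 16) {
            System.out.println("MaxVectorSize unavailable or < 16, skipping vector checks");
            return;
        }
        // ... run the 16-byte (or larger) vector test here ...
    }
}
```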
------------- PR Comment: https://git.openjdk.org/jdk/pull/27805#issuecomment-3406184099 From epeter at openjdk.org Wed Oct 15 12:57:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 12:57:51 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 08:47:45 GMT, Quan Anh Mai wrote: >> test/hotspot/gtest/opto/test_rangeinference.cpp line 449: >> >>> 447: if (all_instances().size() < 100) { >>> 448: // This effectively covers the cases up to uintn_t<2> >>> 449: test_binary_instance_monotonicity_exhaustive(infer, input1, input2); >> >> Wow, that's really not much. It's really only a "sign" bit and one "mantissa" bit. Would have been nice if we could have handled at least 3 bits. Is that prohibitively slow? > > Yes, testing the monotonicity for those is really slow since we traverse the set of all instances multiple times. Do you know how slow exactly? Just a few seconds or more than that? >> test/hotspot/gtest/opto/test_rangeinference.cpp line 524: >> >>> 522: samples[idx] = TypeIntMirror{canonicalized_t._data._srange._lo, canonicalized_t._data._srange._hi, >>> 523: canonicalized_t._data._urange._lo, canonicalized_t._data._urange._hi, >>> 524: canonicalized_t._data._bits}; >> >> What about using a constructor that creates `TypeIntMirror` directly from a `TypeIntPrototype`? Maybe there is a reason that does not work? > > It is only done here so it is questionable whether making another constructor is beneficial. This is also a testing backdoor since we don't want to create a `TypeIntMirror` with arbitrary field values. Ok, sounds good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2432457600 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2432458541 From epeter at openjdk.org Wed Oct 15 13:01:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 13:01:59 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 09:06:09 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Emanuel's reviews Nice, thanks for all the updates. I responded to some of the points above. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27618#issuecomment-3406327927 From epeter at openjdk.org Wed Oct 15 13:06:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 13:06:41 GMT Subject: RFR: 8369804: TestGenerators.java fails with IllegalArgumentException: bound must be greater than origin [v3] In-Reply-To: References: Message-ID: > We sample two floats, assuming we would get two different results. But ever so rarely, we get the same values, and the test fails. > > So now, I sample with retry. And also improve the error reporting, throwing an exception on generator construction rather than sampling from the generator. 
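As a small self-contained sketch of the sample-with-retry idea described above (this is not the actual Generators library code, and the names are made up):

```java
import java.util.Random;

public class DistinctSampleExample {
    private static final Random RANDOM = new Random();
    private static final int MAX_RETRIES = 1000;

    // Sample two distinct floats from [lo, hi), retrying on the rare collision.
    static float[] sampleTwoDistinct(float lo, float hi) {
        float first = lo + RANDOM.nextFloat() * (hi - lo);
        for (int i = 0; i < MAX_RETRIES; i++) {
            float second = lo + RANDOM.nextFloat() * (hi - lo);
            if (Float.compare(first, second) != 0) {
                return new float[] { first, second };
            }
        }
        throw new IllegalStateException("could not sample two distinct values from [" + lo + ", " + hi + ")");
    }

    public static void main(String[] args) {
        float[] pair = sampleTwoDistinct(0.0f, 1.0f);
        System.out.println(pair[0] + " != " + pair[1]);
    }
}
```

The same construction-time validation idea applies to the empty-range case: rejecting lo >= hi when the generator is built gives a clearer error than failing later while sampling from it.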
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27816/files - new: https://git.openjdk.org/jdk/pull/27816/files/34e6fd55..38fa2fb2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27816&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27816&range=01-02 Stats: 6 lines in 2 files changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27816.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27816/head:pull/27816 PR: https://git.openjdk.org/jdk/pull/27816 From duke at openjdk.org Wed Oct 15 13:13:21 2025 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane Ampudia) Date: Wed, 15 Oct 2025 13:13:21 GMT Subject: RFR: 8369573: Add missing compile commands help documentation for the signature part of method patterns In-Reply-To: References: Message-ID: <8rMmWnSkryAoHuk30W6f4cPw4n-d5rKkIzUA8BelXfM=.377b6d0c-ef21-4b7f-b28b-984847ee2174@github.com> On Wed, 15 Oct 2025 08:17:46 GMT, Daniel Lund?n wrote: > The documentation of the signature part of method patterns in `java -XX:CompileCommand=help` is missing. We should add it. > > ### Changeset > - Improve the documentation of signatures in `java -XX:CompileCommand=help`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500483034) > - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. Marked as reviewed by anton-seoane at github.com (no known OpenJDK username). ------------- PR Review: https://git.openjdk.org/jdk/pull/27818#pullrequestreview-3340336705 From rcastanedalo at openjdk.org Wed Oct 15 13:13:22 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 15 Oct 2025 13:13:22 GMT Subject: RFR: 8369573: Add missing compile commands help documentation for the signature part of method patterns In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 08:17:46 GMT, Daniel Lund?n wrote: > The documentation of the signature part of method patterns in `java -XX:CompileCommand=help` is missing. We should add it. > > ### Changeset > - Improve the documentation of signatures in `java -XX:CompileCommand=help`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500483034) > - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. Thanks for doing this, Daniel! Would it be possible to use the more precise term "method descriptor" instead of "signature" in the help message? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27818#issuecomment-3406383884 From jcking at openjdk.org Wed Oct 15 13:33:43 2025 From: jcking at openjdk.org (Justin King) Date: Wed, 15 Oct 2025 13:33:43 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 Message-ID: Fix JDK-8369506 by adding `STLR` when updating the bytecode. Additionally I added a quick debug only check which verifies the field offset we get from `ResolvedFieldEntry` in `TemplateTable::fast_*` will not clobber the header or Klass pointer. 
The added `STLR`, along with the already existing `DMB ISHLD` in `InterpreterMacroAssembler::load_field_entry`, guarantees that the fully filled out `ResolvedFieldEntry` is observable if the patched bytecode is observable. We do not need to add `LDAR` for bytecode loading or `LDAR` in `TemplateTable::fast_*` for that reason. If another observer happens to observe a `0` field offset, it's guaranteed then that they will also observe the non-patched bytecode which will ultimately end up doing the resolution again, which is okay. ------------- Commit messages: - Remove trailing whitespace added by Github - Update src/hotspot/cpu/aarch64/templateTable_aarch64.cpp - JDK-8369506: Bytecode rewriting causes Java heap corruption on AArch64 Changes: https://git.openjdk.org/jdk/pull/27748/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27748&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369506 Stats: 24 lines in 3 files changed: 22 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27748.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27748/head:pull/27748 PR: https://git.openjdk.org/jdk/pull/27748 From aph at openjdk.org Wed Oct 15 13:33:44 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 15 Oct 2025 13:33:44 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 In-Reply-To: References: Message-ID: On Fri, 10 Oct 2025 16:21:17 GMT, Justin King wrote: > Fix JDK-8369506 by adding `STLR` when updating the bytecode. Additionally I added a quick debug only check which verifies the field offset we get from `ResolvedFieldEntry` in `TemplateTable::fast_*` will not clobber the header or Klass pointer. The added `STLR`, a long with the already existing `DMB ISHLD` in `InterpreterMacroAssembler::load_field_entry`, guarantees that the fully filled out `ResolvedFieldEntry` is observable if the patched bytecode is observable. We do not need to add `LDAR` for bytecode loading or `LDAR` in `TemplateTable::fast_*` for that reason. If another observer happens to observe a `0` field offset, its guaranteed then that they will also observe the non-patched bytecode which will ultimately end up doing the resolution again, which is okay. > Fix JDK-8369506 by adding `STLR` when updating the bytecode. Additionally I added a quick debug only check which verifies the field offset we get from `ResolvedFieldEntry` in `TemplateTable::fast_*` will not clobber the header or Klass pointer. The added `STLR` guarantees that the fully filled out `ResolvedFieldEntry` is observable if the patched bytecode is observable. We do not need to add `LDAR` for bytecode loading or `LDAR` in `TemplateTable::fast_*` for that reason. How does this follow? We need some sort of happens-before relationship on the reader side to make sure that the resolved field entry is observed. I guess this PR relies on a control dependency between reading the patched bytecode and executing the code that reads the resolved field entry. src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 238: > 236: // Patch the bytecode using STLR, this is required so that the last STLR used in > 237: // ResolvedFieldEntry::fill_in is obsevable before the patched bytecode. If it is not, > 238: // TemplateTable::fast_* will observe an unresolved ResolvedFieldEntry and corrupt the Java heap. Suggestion: // Patch the bytecode using STLR so that the last STLR used in // ResolvedFieldEntry::fill_in is observed before the patched bytecode.
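As a side illustration of the publication pattern being discussed, here is a rough, hypothetical Java analogue: the writer fills in an entry and then publishes it with a release store of the flag that readers check first. The analogy is loose on purpose: on AArch64 the interpreter's reader side does not use an acquire load, it relies on the existing DMB ISHLD in load_field_entry plus the control dependency discussed here, whereas the portable sketch below uses getAcquire to express the corresponding happens-before in plain Java.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class PublishSketch {
    static final class Entry {
        int offset; // filled in by the resolving thread before publication
    }

    static final Entry ENTRY = new Entry();
    static int patched; // 0 = unresolved bytecode, 1 = patched fast bytecode

    static final VarHandle PATCHED;
    static {
        try {
            PATCHED = MethodHandles.lookup().findStaticVarHandle(PublishSketch.class, "patched", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Writer: fill in the entry, then "patch the bytecode" with a release store.
    static void resolveAndPatch() {
        ENTRY.offset = 42;
        PATCHED.setRelease(1);
    }

    // Reader: only after observing the patched flag does it read the entry;
    // a reader that still sees 0 takes the slow path and resolves again.
    static int fastPathOrResolve() {
        if ((int) PATCHED.getAcquire() == 1) {
            return ENTRY.offset; // guaranteed to observe the filled-in entry
        }
        return -1; // slow path
    }
}
```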
------------- PR Comment: https://git.openjdk.org/jdk/pull/27748#issuecomment-3393652648 PR Review Comment: https://git.openjdk.org/jdk/pull/27748#discussion_r2422652088 From jcking at openjdk.org Wed Oct 15 13:33:45 2025 From: jcking at openjdk.org (Justin King) Date: Wed, 15 Oct 2025 13:33:45 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 In-Reply-To: References: Message-ID: On Sat, 11 Oct 2025 20:55:21 GMT, Andrew Haley wrote: > > Fix JDK-8369506 by adding `STLR` when updating the bytecode. Additionally I added a quick debug only check which verifies the field offset we get from `ResolvedFieldEntry` in `TemplateTable::fast_*` will not clobber the header or Klass pointer. The added `STLR` guarantees that the fully filled out `ResolvedFieldEntry` is observable if the patched bytecode is observable. We do not need to add `LDAR` for bytecode loading or `LDAR` in `TemplateTable::fast_*` for that reason. > > How does this follow? We need some sort of happens-before relationship on the reader side to make sure that the resolved field entry is observed. I guess this PR relies on a control dependency between reading the patched bytecode and executing the code that reads the resolved field entry. Yes, that is what the PR relies upon. However we are still discussing internally on whether that is enough as we would rather not have a repeat N years down the line as hardware advances. I left this in draft until we figure it out and will poke you once we are more confident. The AArch64 docs are not super clear. It does have this sentence: `A store-release guarantees that all earlier memory accesses are visible before the store-release becomes visible and that the store is visible to all parts of the system capable of storing cached data at the same time.` Which to me, a long with other terminology, seems to imply the two STLRs are enough. But I wouldn't bet money on it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27748#issuecomment-3395360285 From aph at openjdk.org Wed Oct 15 13:33:46 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 15 Oct 2025 13:33:46 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 In-Reply-To: References: Message-ID: On Sun, 12 Oct 2025 21:13:11 GMT, Justin King wrote: > But I wouldn't bet money on it. B2.3.6, _Dependency relations, Control dependency_, gives you what you need on the reader side. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27748#issuecomment-3396407498 From jcking at openjdk.org Wed Oct 15 13:33:46 2025 From: jcking at openjdk.org (Justin King) Date: Wed, 15 Oct 2025 13:33:46 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 In-Reply-To: References: Message-ID: <-67Ry1EaiKFtU-5lagozrNdWZUqI___sR_awnkXT-6M=.b90a4aaf-8574-4394-b159-983878deb176@github.com> On Fri, 10 Oct 2025 16:21:17 GMT, Justin King wrote: > Fix JDK-8369506 by adding `STLR` when updating the bytecode. Additionally I added a quick debug only check which verifies the field offset we get from `ResolvedFieldEntry` in `TemplateTable::fast_*` will not clobber the header or Klass pointer. The added `STLR`, a long with the already existing `DMB ISHLD` in `InterpreterMacroAssembler::load_field_entry`, guarantees that the fully filled out `ResolvedFieldEntry` is observable if the patched bytecode is observable. We do not need to add `LDAR` for bytecode loading or `LDAR` in `TemplateTable::fast_*` for that reason. 
If another observer happens to observe a `0` field offset, its guaranteed then that they will also observe the non-patched bytecode which will ultimately end up doing the resolution again, which is okay. Based on conversations in https://bugs.openjdk.org/browse/JDK-8369506 and herd7 models, it is believed that this patch is sufficient for AArch64. Marking as ready. @shipilev PTAL as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27748#issuecomment-3406439864 PR Comment: https://git.openjdk.org/jdk/pull/27748#issuecomment-3406441650 From shade at openjdk.org Wed Oct 15 14:19:09 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 15 Oct 2025 14:19:09 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 In-Reply-To: References: Message-ID: <6FRuW_iI48GZuCHbVE28A2syj357hWZoqX0wG24xdsc=.3d48dc7c-ca76-4f75-a858-cf75ad4367d5@github.com> On Fri, 10 Oct 2025 16:21:17 GMT, Justin King wrote: > Fix JDK-8369506 by adding `STLR` when updating the bytecode. Additionally I added a quick debug only check which verifies the field offset we get from `ResolvedFieldEntry` in `TemplateTable::fast_*` will not clobber the header or Klass pointer. The added `STLR`, a long with the already existing `DMB ISHLD` in `InterpreterMacroAssembler::load_field_entry`, guarantees that the fully filled out `ResolvedFieldEntry` is observable if the patched bytecode is observable. We do not need to add `LDAR` for bytecode loading or `LDAR` in `TemplateTable::fast_*` for that reason. If another observer happens to observe a `0` field offset, its guaranteed then that they will also observe the non-patched bytecode which will ultimately end up doing the resolution again, which is okay. Yes, I agree this is sufficient. I have only cosmetic comments: src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1716: > 1714: br(Assembler::GE, valid); > 1715: stop("bad field offset"); > 1716: bind(valid); Suggestion: // Verify the field offset is not in the header, implicitly checks for 0 Label L; subs(zr, reg, oopDesc::base_offset_in_bytes()); br(Assembler::GE, L); stop("bad field offset"); bind(L); src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 237: > 235: > 236: // Patch the bytecode using STLR so that the last STLR used in > 237: // ResolvedFieldEntry::fill_in is observed before the patched bytecode. Suggestion: // Patch bytecode with release store to coordinate with ResolvedFieldEntry loads // in fast bytecode codelets. load_field_entry has a memory barrier that gains // the needed ordering, together with control dependency on entering the fast codelet // itself. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27748#pullrequestreview-3340616460 PR Review Comment: https://git.openjdk.org/jdk/pull/27748#discussion_r2432715563 PR Review Comment: https://git.openjdk.org/jdk/pull/27748#discussion_r2432740718 From jcking at openjdk.org Wed Oct 15 14:32:05 2025 From: jcking at openjdk.org (Justin King) Date: Wed, 15 Oct 2025 14:32:05 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 [v2] In-Reply-To: References: Message-ID: <5amwIojX0oBANAL0XEbxrSI7vQ1VIzcgxi_X1OmAw3Q=.b8b2895a-8358-414a-a600-18cfdbb885ac@github.com> > Fix JDK-8369506 by adding `STLR` when updating the bytecode. Additionally I added a quick debug only check which verifies the field offset we get from `ResolvedFieldEntry` in `TemplateTable::fast_*` will not clobber the header or Klass pointer. 
The added `STLR`, a long with the already existing `DMB ISHLD` in `InterpreterMacroAssembler::load_field_entry`, guarantees that the fully filled out `ResolvedFieldEntry` is observable if the patched bytecode is observable. We do not need to add `LDAR` for bytecode loading or `LDAR` in `TemplateTable::fast_*` for that reason. If another observer happens to observe a `0` field offset, its guaranteed then that they will also observe the non-patched bytecode which will ultimately end up doing the resolution again, which is okay. Justin King has updated the pull request incrementally with one additional commit since the last revision: Suggestions from shipilev Signed-off-by: Justin King ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27748/files - new: https://git.openjdk.org/jdk/pull/27748/files/2f1b5e0a..3575e7ba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27748&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27748&range=00-01 Stats: 9 lines in 2 files changed: 1 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/27748.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27748/head:pull/27748 PR: https://git.openjdk.org/jdk/pull/27748 From jcking at openjdk.org Wed Oct 15 14:32:08 2025 From: jcking at openjdk.org (Justin King) Date: Wed, 15 Oct 2025 14:32:08 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 [v2] In-Reply-To: <6FRuW_iI48GZuCHbVE28A2syj357hWZoqX0wG24xdsc=.3d48dc7c-ca76-4f75-a858-cf75ad4367d5@github.com> References: <6FRuW_iI48GZuCHbVE28A2syj357hWZoqX0wG24xdsc=.3d48dc7c-ca76-4f75-a858-cf75ad4367d5@github.com> Message-ID: On Wed, 15 Oct 2025 14:08:17 GMT, Aleksey Shipilev wrote: >> Justin King has updated the pull request incrementally with one additional commit since the last revision: >> >> Suggestions from shipilev >> >> Signed-off-by: Justin King > > src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1716: > >> 1714: br(Assembler::GE, valid); >> 1715: stop("bad field offset"); >> 1716: bind(valid); > > Suggestion: > > // Verify the field offset is not in the header, implicitly checks for 0 > Label L; > subs(zr, reg, oopDesc::base_offset_in_bytes()); > br(Assembler::GE, L); > stop("bad field offset"); > bind(L); Done. > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 237: > >> 235: >> 236: // Patch the bytecode using STLR so that the last STLR used in >> 237: // ResolvedFieldEntry::fill_in is observed before the patched bytecode. > > Suggestion: > > // Patch bytecode with release store to coordinate with ResolvedFieldEntry loads > // in fast bytecode codelets. load_field_entry has a memory barrier that gains > // the needed ordering, together with control dependency on entering the fast codelet > // itself. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27748#discussion_r2432785427 PR Review Comment: https://git.openjdk.org/jdk/pull/27748#discussion_r2432785956 From dfenacci at openjdk.org Wed Oct 15 14:38:51 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 15 Oct 2025 14:38:51 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v4] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 17:09:45 GMT, Christian Hagedorn wrote: >> The test ` testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java` intermittently timed out in our CI. On my local machine, I measured ~80s which is quite long given that we actually only want to test that Cartesian product for scenarios work. 
>> >> #### Reduce Execution Time by not Executing the Scenarios >> I had a closer look at the test to try to cut the execution time down. Currently, we are executing many IR framework runs with different Cartesian products for scenarios. Afterwards, we compare the output of IR matching to check if the computation of the Cartesian products were correct. However, since we are only interested in verifying that the Cartesian product computation for scenarios works (i.e. `addCrossProductScenarios()`), we could skip the actual execution of the scenarios itself - we can trust that the IR framework is already tested well enough. >> >> To achieve that, we can use reflection to get the added scenarios to the IR framework (I don't want to add a public accessor because a user should not need access to them) and then fetch the corresponding scenario flags and compare against our expectation. That's what I propose with this change. >> >> #### Changes >> - Verification without actually running scenarios. >> - Added a test passing 3 sets to `addCrossProductScenarios()` which was missing before. >> - Improved `addScenarios()` where we added a scenario to the list even though it already existed. That normally does not matter because we are throwing a `TestFormatException` anyway afterwards. But it messes with the test: We are adding the duplicated scenario and then read it again in the verification part of the test. >> - Refactored the test a little more. >> - Refactored some small things in `addCrossProductScenarios()` while looking at it. >> - Added a sentence about passing a single set to `addCrossProductScenarios()` which was not evidently clear what is happening when looking at the method comment. >> >> #### Execution Time Comparison >> Measured on my local machine: >> - Mainline: ~80s >> - With patch: ~2-3s >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > review Emanuel Looks good to me otherwise. Thanks @chhagedorn. test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 337: > 335: Asserts.assertTrue(stdErr.contains("Scenario flags: [-XX:-UseNewCode, -XX:-UseNewCode2]")); > 336: Asserts.assertTrue(stdErr.contains("Scenario flags: [-XX:+UseNewCode, -XX:-UseNewCode2]")); > 337: Asserts.assertTrue(stdErr.contains("Scenario flags: [-XX:-UseNewCode, -XX:+UseNewCode2]")); This might be partially redundant with the full stop in the first assert above but maybe it would be worth checking that we don't have any additional "Scenario flags:..." string. ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/27672#pullrequestreview-3340205271 PR Review Comment: https://git.openjdk.org/jdk/pull/27672#discussion_r2432430965 From shade at openjdk.org Wed Oct 15 14:43:00 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 15 Oct 2025 14:43:00 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 [v2] In-Reply-To: <5amwIojX0oBANAL0XEbxrSI7vQ1VIzcgxi_X1OmAw3Q=.b8b2895a-8358-414a-a600-18cfdbb885ac@github.com> References: <5amwIojX0oBANAL0XEbxrSI7vQ1VIzcgxi_X1OmAw3Q=.b8b2895a-8358-414a-a600-18cfdbb885ac@github.com> Message-ID: On Wed, 15 Oct 2025 14:32:05 GMT, Justin King wrote: >> Fix JDK-8369506 by adding `STLR` when updating the bytecode. 
Additionally I added a quick debug only check which verifies the field offset we get from `ResolvedFieldEntry` in `TemplateTable::fast_*` will not clobber the header or Klass pointer. The added `STLR`, a long with the already existing `DMB ISHLD` in `InterpreterMacroAssembler::load_field_entry`, guarantees that the fully filled out `ResolvedFieldEntry` is observable if the patched bytecode is observable. We do not need to add `LDAR` for bytecode loading or `LDAR` in `TemplateTable::fast_*` for that reason. If another observer happens to observe a `0` field offset, its guaranteed then that they will also observe the non-patched bytecode which will ultimately end up doing the resolution again, which is okay. > > Justin King has updated the pull request incrementally with one additional commit since the last revision: > > Suggestions from shipilev > > Signed-off-by: Justin King Now running tests in my Graviton 3 instance to make sure the new verification code is not barfing up anywhere. PR patch applies with fuzz, BTW, consider merging from mainline: % patch -p1 < 27748.diff patching file src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp patching file src/hotspot/cpu/aarch64/interp_masm_aarch64.hpp patching file src/hotspot/cpu/aarch64/templateTable_aarch64.cpp Hunk #3 succeeded at 3098 (offset 15 lines). Hunk #4 succeeded at 3188 (offset 15 lines). Hunk #5 succeeded at 3256 (offset 15 lines). ------------- PR Comment: https://git.openjdk.org/jdk/pull/27748#issuecomment-3406799934 From epeter at openjdk.org Wed Oct 15 14:44:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 14:44:29 GMT Subject: RFR: 8369912: [TESTBUG] testlibrary_tests/template_framework/examples/TestExpressions.java fails with ArithmeticException: / by zero - forgot to respect Expression.info Message-ID: The test generates a test for each operator, like this: public static int primitiveConTest_185_compiled() { return (989451435 % 0); } However, some operators throw exceptions, just like here the `%`, when given a zero rhs argument. The expression already knows about that, we just need to generate try-catch statements in the code. Similarly, some operators do not always return deterministic results (different Nan, or precision). So we need to handle that too. Note: we already do all of that in the `test/hotspot/jtreg/compiler/igvn/ExpressionFuzzer.java`. ------------- Commit messages: - JDK-8369912 Changes: https://git.openjdk.org/jdk/pull/27824/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27824&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369912 Stats: 30 lines in 1 file changed: 23 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/27824.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27824/head:pull/27824 PR: https://git.openjdk.org/jdk/pull/27824 From roland at openjdk.org Wed Oct 15 14:47:17 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 15 Oct 2025 14:47:17 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 09:39:40 GMT, Quan Anh Mai wrote: >> test/hotspot/jtreg/compiler/loopopts/TestReassociateInvariants.java line 63: >> >>> 61: @IR(failOn = { IRNode.COUNTED_LOOP, IRNode.LONG_COUNTED_LOOP }) >>> 62: @IR(counts = { IRNode.LOOP, "1" }) >>> 63: @Arguments(values = { Argument.NUMBER_42, Argument.NUMBER_42 }) >> >> What are we verifying here, should this fail on some kind of range check? 
> > Also, why are these not recognized as `CountedLoop`s, it seems we need to fix them, too? Without reassociate invariants, RCE elimination doesn't happen. With it, it does happen. So the loop becomes empty and that's why there's no `CountedLoop`. No problem here with `CountedLoop` recognition. The IR framework is not powerful enough to test this sort of things so I have to rely on some side effect that can be observed by the IR framework. In that case, reassociate invariants -> RCE optimization -> empty loop -> no `CountedLoop`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27666#discussion_r2432855832 From jcking at openjdk.org Wed Oct 15 14:54:36 2025 From: jcking at openjdk.org (Justin King) Date: Wed, 15 Oct 2025 14:54:36 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 [v3] In-Reply-To: References: Message-ID: > Fix JDK-8369506 by adding `STLR` when updating the bytecode. Additionally I added a quick debug only check which verifies the field offset we get from `ResolvedFieldEntry` in `TemplateTable::fast_*` will not clobber the header or Klass pointer. The added `STLR`, a long with the already existing `DMB ISHLD` in `InterpreterMacroAssembler::load_field_entry`, guarantees that the fully filled out `ResolvedFieldEntry` is observable if the patched bytecode is observable. We do not need to add `LDAR` for bytecode loading or `LDAR` in `TemplateTable::fast_*` for that reason. If another observer happens to observe a `0` field offset, its guaranteed then that they will also observe the non-patched bytecode which will ultimately end up doing the resolution again, which is okay. Justin King has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'openjdk:master' into aarch64-rewrite-bytecodes - Suggestions from shipilev Signed-off-by: Justin King - Remove trailing whitespace added by Github Signed-off-by: Justin King - Update src/hotspot/cpu/aarch64/templateTable_aarch64.cpp Co-authored-by: Andrew Haley - JDK-8369506: Bytecode rewriting causes Java heap corruption on AArch64 Signed-off-by: Justin King ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27748/files - new: https://git.openjdk.org/jdk/pull/27748/files/3575e7ba..95d4831f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27748&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27748&range=01-02 Stats: 10780 lines in 273 files changed: 5974 ins; 4231 del; 575 mod Patch: https://git.openjdk.org/jdk/pull/27748.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27748/head:pull/27748 PR: https://git.openjdk.org/jdk/pull/27748 From jcking at openjdk.org Wed Oct 15 14:54:37 2025 From: jcking at openjdk.org (Justin King) Date: Wed, 15 Oct 2025 14:54:37 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 [v2] In-Reply-To: References: <5amwIojX0oBANAL0XEbxrSI7vQ1VIzcgxi_X1OmAw3Q=.b8b2895a-8358-414a-a600-18cfdbb885ac@github.com> Message-ID: On Wed, 15 Oct 2025 14:39:52 GMT, Aleksey Shipilev wrote: > PR patch applies with fuzz, BTW, consider merging from mainline: Merged. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27748#issuecomment-3406851136 From kbarrett at openjdk.org Wed Oct 15 14:57:24 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 15 Oct 2025 14:57:24 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: References: Message-ID: <6de0G8wZZ7STOtX8_IdiE6ZRKmOC1Y6-559Q6Y2Uxxw=.bef6c2de-68f8-4cec-924e-dd42d7bd13e2@github.com> On Wed, 15 Oct 2025 12:32:28 GMT, Emanuel Peter wrote: >> It surely would be very easy, and not affect the product. But let's see what they say. > > They tell me it is fine, and we are already doing similar things here: > `test/hotspot/gtest/jfr/test_networkUtilization.cpp` Not a review, just a drive-by comment, following up on @eme64 "They tell me its fine". I do not think it's okay to use most standard library headers. Doing so can run into issues with things like our forbidden function mechanism, assert macro collision, and others. My opinion is the uses in `jfr/test_networkUtilization.cpp` shouldn't be there, and aren't actually necessary. I just did a spot check, and the only "good" case I found is `test_codestrings.cpp` using ``, where there isn't any similar functionality available in hotspot. The suggestion in the discussion @eme64 for a set is `RBTree`. The O(1) lookup by a hashtable is unlikely to matter to a gtest. There is ongoing work updating our usage (see, for example, https://bugs.openjdk.org/browse/JDK-8369186) and how to do that in a safe and consistent manner. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2432895220 From dlunden at openjdk.org Wed Oct 15 15:23:19 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 15 Oct 2025 15:23:19 GMT Subject: RFR: 8369573: Add missing compile commands help documentation for the signature part of method patterns In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 08:48:36 GMT, Anton Seoane Ampudia wrote: > As a side note, looking at your line splits I got curious whether we are actively enforcing an 80-char limit in some output or not (the last two "paragraphs" exceed this size, although they've been there from before) I had closer look, and it seems we try to not exceed 80 characters, but occasionally do if it helps readability. For example, run `java` without arguments and look at the default help message. > Thanks for doing this, Daniel! Would it be possible to use the more precise term "method descriptor" instead of "signature" in the help message? Yes, I agree and did consider this as well after consulting the JVM spec. I let "signature" remain as that is what is used currently and also seems to be the terminology used in the code. I'll wait a bit for more comments before committing to changing it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27818#issuecomment-3406988516 From epeter at openjdk.org Wed Oct 15 15:23:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 15 Oct 2025 15:23:22 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 14:44:29 GMT, Roland Westrelin wrote: >> Also, why are these not recognized as `CountedLoop`s, it seems we need to fix them, too? > > Without reassociate invariants, RCE elimination doesn't happen. With it, it does happen. So the loop becomes empty and that's why there's no `CountedLoop`. No problem here with `CountedLoop` recognition. 
> The IR framework is not powerful enough to test this sort of things so I have to rely on some side effect that can be observed by the IR framework. In that case, reassociate invariants -> RCE optimization -> empty loop -> no `CountedLoop`. Ah, nice explanation! Can you add that in a comment in the test code? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27666#discussion_r2432995236 From qamai at openjdk.org Wed Oct 15 15:33:01 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 15:33:01 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 12:08:23 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Emanuel's reviews > > src/hotspot/share/opto/rangeinference.hpp line 358: > >> 356: U ulo = std::numeric_limits>::min(); >> 357: // The unsigned value of the result of 'and' is always not greater than both of its inputs >> 358: // since there is no position at which the bit is 1 in the result and 0 in either input > > That does not sound correct. > > We could have ranges `0..0b1000` for both. But then both values are `0b0010`, and so the result is `0b0010`, which is a 1 at a position where both `uhi` values had zeros. > > I think you need to talk about leading zeros somehow. No this is not about the range, but about the value in an operation. I.e. If `z = x & y` then `z u<= x && z u<= y`. This leads to the fact that the upper bound of `z` is not larger than the upper bounds of `x` and `y`. > src/hotspot/share/opto/rangeinference.hpp line 372: > >> 370: S hi = std::numeric_limits>::max(); >> 371: // The unsigned value of the result of 'or' is always not less than both of its inputs since >> 372: // there is no position at which the bit is 0 in the result and 1 in either input > > Same issue here as above Same here, if `z = x | y` then `z u>= x && z u>= y`. This means that the lower bound of `z` is not smaller than the lower bounds of `x` and `y`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2433035783 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2433040171 From jsjolen at openjdk.org Wed Oct 15 15:39:11 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 15 Oct 2025 15:39:11 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: <6de0G8wZZ7STOtX8_IdiE6ZRKmOC1Y6-559Q6Y2Uxxw=.bef6c2de-68f8-4cec-924e-dd42d7bd13e2@github.com> References: <6de0G8wZZ7STOtX8_IdiE6ZRKmOC1Y6-559Q6Y2Uxxw=.bef6c2de-68f8-4cec-924e-dd42d7bd13e2@github.com> Message-ID: On Wed, 15 Oct 2025 14:54:18 GMT, Kim Barrett wrote: >> They tell me it is fine, and we are already doing similar things here: >> `test/hotspot/gtest/jfr/test_networkUtilization.cpp` > > Not a review, just a drive-by comment, following up on @eme64 "They tell me its fine". > > I do not think it's okay to use most standard library headers. Doing so can run into issues with things > like our forbidden function mechanism, assert macro collision, and others. My opinion is the uses in > `jfr/test_networkUtilization.cpp` shouldn't be there, and aren't actually necessary. I just did a spot check, > and the only "good" case I found is `test_codestrings.cpp` using ``, where there isn't any similar > functionality available in hotspot. The suggestion in the discussion @eme64 for a set is `RBTree`. 
The O(1) > lookup by a hashtable is unlikely to matter to a gtest. > > There is ongoing work updating our usage (see, for example, https://bugs.openjdk.org/browse/JDK-8369186) > and how to do that in a safe and consistent manner. Use `RBTreeCHeap`, if going the RBTree route. It's just the easiest way of using it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2433060235 From qamai at openjdk.org Wed Oct 15 15:47:18 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 15:47:18 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 12:55:15 GMT, Emanuel Peter wrote: >> Yes, testing the monotonicity for those is really slow since we traverse the set of all instances multiple times. > > Do you know how slow exactly? Just a few seconds or more than that? The runtime jumps from 400ms to 10s, so it's quite a big jump indeed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2433092847 From roland at openjdk.org Wed Oct 15 15:52:08 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 15 Oct 2025 15:52:08 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types [v3] In-Reply-To: References: Message-ID: <99G1wPXEs1RoqnPlvNzteqd4Pf96pkqNilnaOPMiSgA=.608c0323-03ce-4e77-925b-ea3732ebbb0a@github.com> > Currently ReassociateInvariants is only enabled for int counted > loops. I noticed, enabling it for long counted loops helps RCE. It > also seems like something that would help any loop. I propose enabling > it for all inner loops. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27666/files - new: https://git.openjdk.org/jdk/pull/27666/files/87d69288..87f08209 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27666&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27666&range=01-02 Stats: 14 lines in 1 file changed: 14 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27666/head:pull/27666 PR: https://git.openjdk.org/jdk/pull/27666 From qamai at openjdk.org Wed Oct 15 15:52:10 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 15:52:10 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types [v3] In-Reply-To: <99G1wPXEs1RoqnPlvNzteqd4Pf96pkqNilnaOPMiSgA=.608c0323-03ce-4e77-925b-ea3732ebbb0a@github.com> References: <99G1wPXEs1RoqnPlvNzteqd4Pf96pkqNilnaOPMiSgA=.608c0323-03ce-4e77-925b-ea3732ebbb0a@github.com> Message-ID: On Wed, 15 Oct 2025 15:48:49 GMT, Roland Westrelin wrote: >> Currently ReassociateInvariants is only enabled for int counted >> loops. I noticed, enabling it for long counted loops helps RCE. It >> also seems like something that would help any loop. I propose enabling >> it for all inner loops. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by qamai (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/27666#pullrequestreview-3341113657 From roland at openjdk.org Wed Oct 15 15:52:12 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 15 Oct 2025 15:52:12 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 15:19:55 GMT, Emanuel Peter wrote: >> Without reassociate invariants, RCE elimination doesn't happen. With it, it does happen. So the loop becomes empty and that's why there's no `CountedLoop`. No problem here with `CountedLoop` recognition. >> The IR framework is not powerful enough to test this sort of things so I have to rely on some side effect that can be observed by the IR framework. In that case, reassociate invariants -> RCE optimization -> empty loop -> no `CountedLoop`. > > Ah, nice explanation! Can you add that in a comment in the test code? Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27666#discussion_r2433085907 From eastigeevich at openjdk.org Wed Oct 15 15:54:23 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 15 Oct 2025 15:54:23 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 In-Reply-To: <-Lv9OC8lY8jwMPZ0HxA1DUWoa611YLmOKM7JQmv0mpc=.607ec490-a9af-45b2-9a23-aa64a82d40c0@github.com> References: <-Lv9OC8lY8jwMPZ0HxA1DUWoa611YLmOKM7JQmv0mpc=.607ec490-a9af-45b2-9a23-aa64a82d40c0@github.com> Message-ID: <7HB--HqIsxx7wOEO95W-my4e462F4f99jqxZp2GxQA0=.f1ff5a91-5d69-4b40-9981-9588b264e9bf@github.com> On Tue, 14 Oct 2025 23:44:39 GMT, Chad Rakoczy wrote: >> [JDK-8369147](https://bugs.openjdk.org/browse/JDK-8369147) >> >> Fixes tests added in [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) >> >> `DeoptimizeRelocatedNMethod.java` and `RelocateNMethod.java` failed because they attempted to relocate nmethods to the `MethodProfiled` code heap which does not exist when `TieredCompilation` is false. Updated the tests to use `MethodNonProfiled` heap which exists regardless of `TieredCompilation` >> >> `StressNMethodRelocation.java` runs for 60 seconds and also compiles 1024 methods with C2. This was causing the test to timeout if the compilation took too much time. Increasing the timeout to 5 minutes should give C2 enough time to compile the functions >> >> `NMethodRelocationTest.java` runs using SerialGC which caused a multiple GC error when trying to run with another GC. Added a requires to force SerialGC > > The following occurs when running DeoptimizeRelocatedNMethod on PPC64 > > # Internal Error (jdk/src/hotspot/cpu/ppc/nativeInst_ppc.cpp:405) > # assert(!decode(i1, i2)) failed: already patched > > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x1701784] NativePostCallNop::patch(int, int)+0xf4 (nativeInst_ppc.cpp:405) > V [libjvm.so+0x1718414] nmethod::finalize_relocations()+0x6f4 (nmethod.cpp:2059) > V [libjvm.so+0x171891c] nmethod::post_init()+0x5c (nmethod.cpp:1252) > V [libjvm.so+0x171a8dc] nmethod::relocate(CodeBlobType)+0x1ec (nmethod.cpp:1515) > V [libjvm.so+0x200b598] WB_RelocateNMethodFromMethod+0x388 (whitebox.cpp:1653) > j jdk.test.whitebox.WhiteBox.relocateNMethodFromMethod0(Ljava/lang/reflect/Executable;I)V+0 > j jdk.test.whitebox.WhiteBox.relocateNMethodFromMethod(Ljava/lang/reflect/Executable;I)V+8 > j compiler.whitebox.DeoptimizeRelocatedNMethod.main([Ljava/lang/String;)V+50 > > > @TheRealMDoerr @reinrich Do you have any ideas on a solution for this? 
I don't have any experience working with PPC so guidance would be greatly appreciated @chadrako > StressNMethodRelocation.java runs for 60 seconds and also compiles 1024 methods with C2. This was causing the test to timeout if the compilation took too much time. Maybe instead of hardcoding the number of methods (1024), we can have a reasonable time slice, e.g. 10 seconds, and compile as many methods as possible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3407142928 From psandoz at openjdk.org Wed Oct 15 16:08:25 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 15 Oct 2025 16:08:25 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v6] In-Reply-To: References: Message-ID: <8cCO3_XSLXtzulIYd3AVHvzQzgkQ9CVVepy61I2QkiI=.8fdee0d5-520a-4987-9b55-cc1b559f37aa@github.com> On Wed, 17 Sep 2025 08:48:16 GMT, Xiaohong Gong wrote: >> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform. >> >> ### Background >> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register. >> >> ### Implementation >> >> #### Challenges >> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints. >> >> For a 512-bit SVE machine, loading a `byte` vector with different vector species require different approaches: >> - SPECIES_64: Single operation with mask (8 elements, 256-bit) >> - SPECIES_128: Single operation, full register (16 elements, 512-bit) >> - SPECIES_256: Two operations + merge (32 elements, 1024-bit) >> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit) >> >> Use `ByteVector.SPECIES_512` as an example: >> - It contains 64 elements. So the index vector size should be `64 * 32` bits, which is 4 times of the SVE vector register size. >> - It requires 4 times of vector gather-loads to finish the whole operation. >> >> >> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...] >> int[] idx = [0, 1, 2, 3, ..., 63, ...] >> >> 4 gather-load: >> idx_v1 = [15 14 13 ... 1 0] gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa] >> idx_v2 = [31 30 29 ... 17 16] gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb] >> idx_v3 = [47 46 45 ... 33 32] gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc] >> idx_v4 = [63 62 61 ... 49 48] gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd] >> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa] >> >> >> #### Solution >> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end. >> >> Here is the main changes: >> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher. >> - Added `VectorSliceNode` for result mer... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: > > - Add more comments for IRs and added method > - Merge branch 'jdk:master' into JDK-8351623-sve > - Merge 'jdk:master' into JDK-8351623-sve > - Address review comments > - Refine IR pattern and clean backend rules > - Fix indentation issue and move the helper matcher method to header files > - Merge branch jdk:master into JDK-8351623-sve > - 8351623: VectorAPI: Add SVE implementation of subword gather load operation I suspect it's likely more complex overall adding a slice operation to mask, that is really only needed for a specific case. (A more general operation would be compress/expand of the mask bits, but i don't believe there are hardware instructions for such operations on mask registers.) In my view adding a part parameter is a compromise and seems less complex that requiring N index vectors, and it fits with a general pattern we have around parts of the vector. It moves the specialized operation requirements on the mask into the area where it is needed rather than trying to generalize in a manner that i don't think is appropriate in the mask API. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3407226485 From qamai at openjdk.org Wed Oct 15 16:15:05 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 16:15:05 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v3] In-Reply-To: References: Message-ID: > Hi, > > This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: - remove std::hash - remove unordered_map, add some comments for all_instances_size ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27618/files - new: https://git.openjdk.org/jdk/pull/27618/files/b73850d1..513e3e9e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27618&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27618&range=01-02 Stats: 56 lines in 2 files changed: 37 ins; 11 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/27618.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27618/head:pull/27618 PR: https://git.openjdk.org/jdk/pull/27618 From qamai at openjdk.org Wed Oct 15 16:15:07 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 16:15:07 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 12:59:04 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Emanuel's reviews > > Nice, thanks for all the updates. I responded to some of the points above. @eme64 I have removed the usage of `std::unordered_map` as well as added comments explaining the values of `all_instances_size`. Do you have any other concern? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27618#issuecomment-3407258312 From qamai at openjdk.org Wed Oct 15 16:15:08 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 15 Oct 2025 16:15:08 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v3] In-Reply-To: References: <6de0G8wZZ7STOtX8_IdiE6ZRKmOC1Y6-559Q6Y2Uxxw=.bef6c2de-68f8-4cec-924e-dd42d7bd13e2@github.com> Message-ID: On Wed, 15 Oct 2025 15:36:18 GMT, Johan Sj?len wrote: >> Not a review, just a drive-by comment, following up on @eme64 "They tell me its fine". >> >> I do not think it's okay to use most standard library headers. Doing so can run into issues with things >> like our forbidden function mechanism, assert macro collision, and others. My opinion is the uses in >> `jfr/test_networkUtilization.cpp` shouldn't be there, and aren't actually necessary. I just did a spot check, >> and the only "good" case I found is `test_codestrings.cpp` using ``, where there isn't any similar >> functionality available in hotspot. The suggestion in the discussion @eme64 for a set is `RBTree`. The O(1) >> lookup by a hashtable is unlikely to matter to a gtest. >> >> There is ongoing work updating our usage (see, for example, https://bugs.openjdk.org/browse/JDK-8369186) >> and how to do that in a safe and consistent manner. > > Use `RBTreeCHeap`, if going the RBTree route. It's just the easiest way of using it. Thanks for your inputs, I have removed the usage of `std::unordered_map` and replaced it with `RBTreeCHeap`. Is using `std::array` here fine? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2433203193 From jsjolen at openjdk.org Wed Oct 15 16:29:16 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 15 Oct 2025 16:29:16 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v3] In-Reply-To: References: <6de0G8wZZ7STOtX8_IdiE6ZRKmOC1Y6-559Q6Y2Uxxw=.bef6c2de-68f8-4cec-924e-dd42d7bd13e2@github.com> Message-ID: On Wed, 15 Oct 2025 16:10:56 GMT, Quan Anh Mai wrote: >> Use `RBTreeCHeap`, if going the RBTree route. It's just the easiest way of using it. > > Thanks for your inputs, I have removed the usage of `std::unordered_map` and replaced it with `RBTreeCHeap`. Is using `std::array` here fine? Hi @merykitty, I think that we don't use the STL because we run without exceptions and because we want our production data structures to have custom allocators, and history :-). As `std::array` (AFAIU) is 'just' a typed and sized `T*`, I think it should be fine, as long as you avoid things that might throw! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2433256750 From kvn at openjdk.org Wed Oct 15 17:17:34 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 15 Oct 2025 17:17:34 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 In-Reply-To: References: Message-ID: <8UVgF0vLaZhY7zvKWPBLXoU9p71SevknlPYC5LPsfCo=.8531cacc-ff26-42be-a517-88ff87628d29@github.com> On Mon, 6 Oct 2025 20:13:46 GMT, Chad Rakoczy wrote: > [JDK-8369147](https://bugs.openjdk.org/browse/JDK-8369147) > > Fixes tests added in [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) > > `DeoptimizeRelocatedNMethod.java` and `RelocateNMethod.java` failed because they attempted to relocate nmethods to the `MethodProfiled` code heap which does not exist when `TieredCompilation` is false. 
Updated the tests to use `MethodNonProfiled` heap which exists regardless of `TieredCompilation` > > `StressNMethodRelocation.java` runs for 60 seconds and also compiles 1024 methods with C2. This was causing the test to timeout if the compilation took too much time. Increasing the timeout to 5 minutes should give C2 enough time to compile the functions > > `NMethodRelocationTest.java` runs using SerialGC which caused a multiple GC error when trying to run with another GC. Added a requires to force SerialGC Comments. test/hotspot/jtreg/compiler/whitebox/DeoptimizeRelocatedNMethod.java line 30: > 28: * @library /test/lib / > 29: * @modules java.base/jdk.internal.misc java.management > 30: * @requires vm.opt.DeoptimizeALot != true & vm.gc.Serial Suggestion: @requires vm.gc == "null" | vm.gc == "Serial" test/hotspot/jtreg/compiler/whitebox/DeoptimizeRelocatedNMethod.java line 42: > 40: * @library /test/lib / > 41: * @modules java.base/jdk.internal.misc java.management > 42: * @requires vm.opt.DeoptimizeALot != true & vm.gc.Parallel @requires vm.gc == "null" | vm.gc == "Parallel" test/hotspot/jtreg/compiler/whitebox/DeoptimizeRelocatedNMethod.java line 54: > 52: * @library /test/lib / > 53: * @modules java.base/jdk.internal.misc java.management > 54: * @requires vm.opt.DeoptimizeALot != true & vm.gc.G1 @requires vm.gc == "null" | vm.gc == "G1" test/hotspot/jtreg/compiler/whitebox/DeoptimizeRelocatedNMethod.java line 66: > 64: * @library /test/lib / > 65: * @modules java.base/jdk.internal.misc java.management > 66: * @requires vm.opt.DeoptimizeALot != true & vm.gc.Shenandoah @requires vm.gc == "null" | vm.gc == "Shenandoah" test/hotspot/jtreg/compiler/whitebox/DeoptimizeRelocatedNMethod.java line 78: > 76: * @library /test/lib / > 77: * @modules java.base/jdk.internal.misc java.management > 78: * @requires vm.opt.DeoptimizeALot != true & vm.gc.Z @requires vm.gc == "null" | vm.gc == "Z" test/hotspot/jtreg/compiler/whitebox/RelocateNMethod.java line 32: > 30: * @modules java.base/jdk.internal.misc java.management > 31: * > 32: * @requires vm.opt.DeoptimizeALot != true & vm.gc.Serial Same here as in test/hotspot/jtreg/compiler/whitebox/DeoptimizeRelocatedNMethod.java test/hotspot/jtreg/compiler/whitebox/StressNMethodRelocation.java line 34: > 32: * @build jdk.test.whitebox.WhiteBox > 33: * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox > 34: * @run main/othervm/timeout=600 -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI I prefer @eastig suggestion to limit run time instead of increase timeout. test/hotspot/jtreg/compiler/whitebox/StressNMethodRelocation.java line 35: > 33: * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox > 34: * @run main/othervm/timeout=600 -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI > 35: * -XX:+SegmentedCodeCache -XX:+TieredCompilation -XX:+UnlockExperimentalVMOptions I am not sure you need to specify `-XX:+SegmentedCodeCache -XX:+TieredCompilation ` because you require them to be enabled to run test. Please, test that. 
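As an illustration of the limit-run-time idea for StressNMethodRelocation.java, a driver bounded by a time slice rather than a fixed method count could look roughly like this. This is only a sketch: the enqueue helper below is a hypothetical placeholder, not the WhiteBox call the actual test uses.

    import java.util.concurrent.TimeUnit;

    public class TimeSlicedCompileDriver {
        // Compile as many methods as fit into a fixed time slice instead of a
        // hardcoded 1024, so a slow C2 run stops at the deadline rather than
        // pushing the whole test past the jtreg timeout.
        static int compileForSlice(long sliceSeconds) {
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(sliceSeconds);
            int compiled = 0;
            while (System.nanoTime() < deadline) {
                enqueueOneMethod(compiled++); // hypothetical stand-in for the real compile trigger
            }
            return compiled;
        }

        private static void enqueueOneMethod(int id) {
            // placeholder only; the real test drives compilation through the WhiteBox API
        }

        public static void main(String[] args) {
            System.out.println("compiled " + compileForSlice(10) + " methods in 10s");
        }
    }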
test/hotspot/jtreg/serviceability/jvmti/NMethodRelocation/NMethodRelocationTest.java line 29: > 27: * @bug 8316694 > 28: * @summary Verify that nmethod relocation posts the correct JVMTI events > 29: * @requires vm.jvmti & vm.gc.Serial * @requires vm.gc == "null" | vm.gc == "Serial" ------------- PR Review: https://git.openjdk.org/jdk/pull/27659#pullrequestreview-3341481186 PR Review Comment: https://git.openjdk.org/jdk/pull/27659#discussion_r2433367608 PR Review Comment: https://git.openjdk.org/jdk/pull/27659#discussion_r2433368572 PR Review Comment: https://git.openjdk.org/jdk/pull/27659#discussion_r2433369949 PR Review Comment: https://git.openjdk.org/jdk/pull/27659#discussion_r2433371850 PR Review Comment: https://git.openjdk.org/jdk/pull/27659#discussion_r2433373000 PR Review Comment: https://git.openjdk.org/jdk/pull/27659#discussion_r2433374245 PR Review Comment: https://git.openjdk.org/jdk/pull/27659#discussion_r2433383437 PR Review Comment: https://git.openjdk.org/jdk/pull/27659#discussion_r2433386087 PR Review Comment: https://git.openjdk.org/jdk/pull/27659#discussion_r2433376516 From valeriep at openjdk.org Wed Oct 15 17:58:38 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Wed, 15 Oct 2025 17:58:38 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v4] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 05:51:40 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. 
>> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Add remaining files to be staged src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 43: > 41: * https://www.internationaljournalcorner.com/index.php/ijird_ojs/article/view/134688 > 42: */ > 43: public final class AES_Crypt extends SymmetricCipher { This internal class does not need to be public? I'd assume it's only used within the same package? test/micro/org/openjdk/bench/javax/crypto/AESDecrypt.java line 53: > 51: public void setup() throws Exception { > 52: SecretKeySpec keySpec = new SecretKeySpec(new byte[]{-80, -103, -1, 68, -29, -94, 61, -52, 93, -59, -128, 105, 110, 88, 44, 105}, "AES"); > 53: IvParameterSpec iv = new IvParameterSpec(new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}); Is the all-0s IV intentional? test/micro/org/openjdk/bench/javax/crypto/AESDecrypt.java line 82: > 80: public byte[] testUseAesIntrinsics() throws Exception { > 81: return cipher.doFinal(ct); > 82: } These 3 methods look same to me except for the method names? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2433490853 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2433487319 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2433483607 From duke at openjdk.org Wed Oct 15 18:03:43 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 15 Oct 2025 18:03:43 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 In-Reply-To: <8UVgF0vLaZhY7zvKWPBLXoU9p71SevknlPYC5LPsfCo=.8531cacc-ff26-42be-a517-88ff87628d29@github.com> References: <8UVgF0vLaZhY7zvKWPBLXoU9p71SevknlPYC5LPsfCo=.8531cacc-ff26-42be-a517-88ff87628d29@github.com> Message-ID: On Wed, 15 Oct 2025 17:14:49 GMT, Vladimir Kozlov wrote: >> [JDK-8369147](https://bugs.openjdk.org/browse/JDK-8369147) >> >> Fixes tests added in [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) >> >> `DeoptimizeRelocatedNMethod.java` and `RelocateNMethod.java` failed because they attempted to relocate nmethods to the `MethodProfiled` code heap which does not exist when `TieredCompilation` is false. Updated the tests to use `MethodNonProfiled` heap which exists regardless of `TieredCompilation` >> >> `StressNMethodRelocation.java` runs for 60 seconds and also compiles 1024 methods with C2. This was causing the test to timeout if the compilation took too much time. 
Increasing the timeout to 5 minutes should give C2 enough time to compile the functions >> >> `NMethodRelocationTest.java` runs using SerialGC which caused a multiple GC error when trying to run with another GC. Added a requires to force SerialGC > > Comments. > @vnkozlov Could you provide your configure and more test output for https://bugs.openjdk.org/browse/JDK-8369150 > > I'm not able to reproduce I'm not sure if you saw this because of the bot comments but I'm not able to reproduce the COH failure ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3407655874 From shade at openjdk.org Wed Oct 15 18:30:48 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 15 Oct 2025 18:30:48 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 [v3] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 14:54:36 GMT, Justin King wrote: >> Fix JDK-8369506 by adding `STLR` when updating the bytecode. Additionally I added a quick debug only check which verifies the field offset we get from `ResolvedFieldEntry` in `TemplateTable::fast_*` will not clobber the header or Klass pointer. The added `STLR`, a long with the already existing `DMB ISHLD` in `InterpreterMacroAssembler::load_field_entry`, guarantees that the fully filled out `ResolvedFieldEntry` is observable if the patched bytecode is observable. We do not need to add `LDAR` for bytecode loading or `LDAR` in `TemplateTable::fast_*` for that reason. If another observer happens to observe a `0` field offset, its guaranteed then that they will also observe the non-patched bytecode which will ultimately end up doing the resolution again, which is okay. > > Justin King has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'openjdk:master' into aarch64-rewrite-bytecodes > - Suggestions from shipilev > > Signed-off-by: Justin King > - Remove trailing whitespace added by Github > > Signed-off-by: Justin King > - Update src/hotspot/cpu/aarch64/templateTable_aarch64.cpp > > Co-authored-by: Andrew Haley > - JDK-8369506: Bytecode rewriting causes Java heap corruption on AArch64 > > Signed-off-by: Justin King Marked as reviewed by shade (Reviewer). Linux AArch64 server fastdebug, `make test TEST=all` has no new problems here. ------------- PR Review: https://git.openjdk.org/jdk/pull/27748#pullrequestreview-3341755373 PR Comment: https://git.openjdk.org/jdk/pull/27748#issuecomment-3407744620 From dbriemann at openjdk.org Wed Oct 15 18:31:58 2025 From: dbriemann at openjdk.org (David Briemann) Date: Wed, 15 Oct 2025 18:31:58 GMT Subject: RFR: 8369444: JavaFrameAnchor on PPC64 has unnecessary barriers [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 07:43:21 GMT, David Briemann wrote: >> No hardware barriers are necessary. All members are volatile and the profiler is run from a signal handler and only observers the thread its running on. > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > remove whitespace Thank you both for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27768#issuecomment-3407740882 From dbriemann at openjdk.org Wed Oct 15 18:31:59 2025 From: dbriemann at openjdk.org (David Briemann) Date: Wed, 15 Oct 2025 18:31:59 GMT Subject: Integrated: 8369444: JavaFrameAnchor on PPC64 has unnecessary barriers In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 11:48:42 GMT, David Briemann wrote: > No hardware barriers are necessary. All members are volatile and the profiler is run from a signal handler and only observers the thread its running on. This pull request has now been integrated. Changeset: bfe69372 Author: David Briemann URL: https://git.openjdk.org/jdk/commit/bfe6937244ff7ec9899bb6a5eaa4222736898177 Stats: 9 lines in 1 file changed: 3 ins; 5 del; 1 mod 8369444: JavaFrameAnchor on PPC64 has unnecessary barriers Reviewed-by: mdoerr, dlong ------------- PR: https://git.openjdk.org/jdk/pull/27768 From kvn at openjdk.org Wed Oct 15 18:45:25 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 15 Oct 2025 18:45:25 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 In-Reply-To: References: <8UVgF0vLaZhY7zvKWPBLXoU9p71SevknlPYC5LPsfCo=.8531cacc-ff26-42be-a517-88ff87628d29@github.com> Message-ID: On Wed, 15 Oct 2025 18:00:33 GMT, Chad Rakoczy wrote: > I'm not sure if you saw this because of the bot comments but I'm not able to reproduce the COH failure @chadrako, I added test output to [8369150](https://bugs.openjdk.org/browse/JDK-8369150) bug report. Do you unload old method after coping and let GC do it normal way? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3407790926 From valeriep at openjdk.org Wed Oct 15 18:48:29 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Wed, 15 Oct 2025 18:48:29 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v4] In-Reply-To: References: Message-ID: <-QG-DjJMOsPrJDY4lSgr0jOrcwEH_40FmvRgt4CQW90=.df69e912-a5e1-4574-a70d-b30e855c1dcc@github.com> On Wed, 15 Oct 2025 05:51:40 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. 
>> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. >> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Add remaining files to be staged src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 55: > 53: > 54: private static final int AES_256_ROUNDS = 14; > 55: private static final int AES_256_NKEYS = 32; The `AES_XXX_NKEYS` constants (valued 16, 24, 32) are also defined in `AESConstants` class, maybe we can just refer to that class instead of duplicate the definition here? 
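As a side note, the sizes involved follow the standard FIPS 197 relationships, so the duplicated numbers could also be derived rather than redefined. A small illustrative sketch with made-up names, not the provider source:

    final class AesSizesSketch {
        // Shared definition, mirroring what an AESConstants-style class holds.
        static final int[] KEY_SIZES = {16, 24, 32};   // key sizes in bytes

        // For a key of n bytes (16, 24 or 32): Nk = n / 4 words and
        // Nr = Nk + 6 rounds, i.e. 10, 12 or 14 rounds respectively (FIPS 197).
        static int nk(byte[] key)     { return key.length / 4; }
        static int rounds(byte[] key) { return nk(key) + 6; }
    }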
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2433608974 From valeriep at openjdk.org Wed Oct 15 18:53:55 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Wed, 15 Oct 2025 18:53:55 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v4] In-Reply-To: <-QG-DjJMOsPrJDY4lSgr0jOrcwEH_40FmvRgt4CQW90=.df69e912-a5e1-4574-a70d-b30e855c1dcc@github.com> References: <-QG-DjJMOsPrJDY4lSgr0jOrcwEH_40FmvRgt4CQW90=.df69e912-a5e1-4574-a70d-b30e855c1dcc@github.com> Message-ID: <8Hb0rpkClUOY9-wId7h9oVsyZx6Q1BfCKKEshQ8u6PA=.c3a9339b-0567-47bb-a2f2-92836e1af493@github.com> On Wed, 15 Oct 2025 18:45:24 GMT, Valerie Peng wrote: >> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: >> >> Add remaining files to be staged > > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 55: > >> 53: >> 54: private static final int AES_256_ROUNDS = 14; >> 55: private static final int AES_256_NKEYS = 32; > > The `AES_XXX_NKEYS` constants (valued 16, 24, 32) are also defined in `AESConstants` class, maybe we can just refer to that class instead of duplicate the definition here? Or, merge the values defined in `AESConstants` into this class. Either way is fine with me as long as no duplicated values. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2433622686 From aph at openjdk.org Wed Oct 15 18:57:59 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 15 Oct 2025 18:57:59 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 [v3] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 14:54:36 GMT, Justin King wrote: >> Fix JDK-8369506 by adding `STLR` when updating the bytecode. Additionally I added a quick debug only check which verifies the field offset we get from `ResolvedFieldEntry` in `TemplateTable::fast_*` will not clobber the header or Klass pointer. The added `STLR`, a long with the already existing `DMB ISHLD` in `InterpreterMacroAssembler::load_field_entry`, guarantees that the fully filled out `ResolvedFieldEntry` is observable if the patched bytecode is observable. We do not need to add `LDAR` for bytecode loading or `LDAR` in `TemplateTable::fast_*` for that reason. If another observer happens to observe a `0` field offset, its guaranteed then that they will also observe the non-patched bytecode which will ultimately end up doing the resolution again, which is okay. > > Justin King has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'openjdk:master' into aarch64-rewrite-bytecodes > - Suggestions from shipilev > > Signed-off-by: Justin King > - Remove trailing whitespace added by Github > > Signed-off-by: Justin King > - Update src/hotspot/cpu/aarch64/templateTable_aarch64.cpp > > Co-authored-by: Andrew Haley > - JDK-8369506: Bytecode rewriting causes Java heap corruption on AArch64 > > Signed-off-by: Justin King OK. Please make sure that this has bug reports for PPC and RISCV, and announce them to hotspot-dev. ------------- Marked as reviewed by aph (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27748#pullrequestreview-3341840431 From duke at openjdk.org Wed Oct 15 19:20:39 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 15 Oct 2025 19:20:39 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 In-Reply-To: References: <8UVgF0vLaZhY7zvKWPBLXoU9p71SevknlPYC5LPsfCo=.8531cacc-ff26-42be-a517-88ff87628d29@github.com> Message-ID: <0ZFCDTQAFKjZ5XDtUXWsvC4v0lyot7Trdf3wlIxrb-M=.988407f3-6353-447a-98d6-f890b795fd1d@github.com> On Wed, 15 Oct 2025 18:42:20 GMT, Vladimir Kozlov wrote: > Do you unload old method after coping and let GC do it normal way? When an nmethod is relocated the old is marked not entrant. Then yes it is unloaded normally by the GC. The issue is most likely the GC deciding not to unload it for whatever reason. I'll see if there is a more deterministic way to test this ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3407924235 From kvn at openjdk.org Wed Oct 15 19:33:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 15 Oct 2025 19:33:51 GMT Subject: RFR: 8369912: [TESTBUG] testlibrary_tests/template_framework/examples/TestExpressions.java fails with ArithmeticException: / by zero - forgot to respect Expression.info In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 14:34:52 GMT, Emanuel Peter wrote: > The test generates a test for each operator, like this: > > public static int primitiveConTest_185_compiled() { > return (989451435 % 0); > } > > However, some operators throw exceptions, just like here the `%`, when given a zero rhs argument. The expression already knows about that, we just need to generate try-catch statements in the code. > > Similarly, some operators do not always return deterministic results (different Nan, or precision). So we need to handle that too. > > Note: we already do all of that in the `test/hotspot/jtreg/compiler/igvn/ExpressionFuzzer.java`. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27824#pullrequestreview-3341954279 From kvn at openjdk.org Wed Oct 15 19:35:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 15 Oct 2025 19:35:46 GMT Subject: RFR: 8369881: C2: Unexpected node in SuperWord truncation: ReverseBytesS, ReverseBytesUS In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 08:46:23 GMT, Emanuel Peter wrote: > The fuzzer found the `ReverseByteS` case, and I checked all other `*.reverseBytes`, and found a failure with `Character.reverseBytes` as well. > > Adding them to the list, and added tests for both. > > Note, this is just another 2 boxes checked, there were many similar ones fixed, or on the way: > https://github.com/openjdk/jdk/pull/26827 > https://github.com/openjdk/jdk/pull/26334 > https://github.com/openjdk/jdk/pull/26494 > https://github.com/openjdk/jdk/pull/26423 Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27819#pullrequestreview-3341958642 From jcking at openjdk.org Wed Oct 15 19:49:48 2025 From: jcking at openjdk.org (Justin King) Date: Wed, 15 Oct 2025 19:49:48 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 [v3] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 18:55:00 GMT, Andrew Haley wrote: > OK. Please make sure that this has bug reports for PPC and RISCV, and announce them to hotspot-dev. 
When you talk about announce, are you just referring to sending a somewhat short and sweet email to hotspot-dev calling out the 3 issues (RISCV and PPC bugs to be created) and that they can result in Java heap corruption? Just want to make sure I am not annoying people. I'll file the RISCV and PPC bugs shortly or tomorrow morning, integrate this after, and pursue a backport to JDK 25. There was a code change in this path before JDK 25, IIRC, so the fix would have to be validated again for JDKs before that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27748#issuecomment-3408009957 From kvn at openjdk.org Wed Oct 15 19:50:45 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 15 Oct 2025 19:50:45 GMT Subject: RFR: 8369642: [ubsan] nmethod::nmethod null pointer passed as argument 2 to memcpy [v2] In-Reply-To: References: <3kb34wTVzDNE3qDozi9eO9_kWuJXfz4GPVcKTRo4veM=.9ae36b69-d193-4d4e-b993-6d6a35af245b@github.com> Message-ID: <2yjOWIZKUvcgdK39DYuCGwqoZFdTL9PAPRn_nbnr9LU=.bd91278d-9587-4bbb-bb63-031038ce1a70@github.com> On Tue, 14 Oct 2025 22:33:40 GMT, Chad Rakoczy wrote: >> [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced a counter so that the nmethod immutable data can be shared between relocated nmethods to eliminate an unnecessary copy. The counter is aligned in memory so that must be taken into account when calculating the amount of memory used by the counter > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add reference counter offset Yes, this works too. Let me test it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27778#issuecomment-3408011734 From manc at openjdk.org Wed Oct 15 20:21:26 2025 From: manc at openjdk.org (Man Cao) Date: Wed, 15 Oct 2025 20:21:26 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 [v3] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 14:54:36 GMT, Justin King wrote: >> Fix JDK-8369506 by adding `STLR` when updating the bytecode. Additionally I added a quick debug only check which verifies the field offset we get from `ResolvedFieldEntry` in `TemplateTable::fast_*` will not clobber the header or Klass pointer. The added `STLR`, a long with the already existing `DMB ISHLD` in `InterpreterMacroAssembler::load_field_entry`, guarantees that the fully filled out `ResolvedFieldEntry` is observable if the patched bytecode is observable. We do not need to add `LDAR` for bytecode loading or `LDAR` in `TemplateTable::fast_*` for that reason. If another observer happens to observe a `0` field offset, its guaranteed then that they will also observe the non-patched bytecode which will ultimately end up doing the resolution again, which is okay. > > Justin King has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Merge branch 'openjdk:master' into aarch64-rewrite-bytecodes > - Suggestions from shipilev > > Signed-off-by: Justin King > - Remove trailing whitespace added by Github > > Signed-off-by: Justin King > - Update src/hotspot/cpu/aarch64/templateTable_aarch64.cpp > > Co-authored-by: Andrew Haley > - JDK-8369506: Bytecode rewriting causes Java heap corruption on AArch64 > > Signed-off-by: Justin King LG, but with a question about hoisting the `member(LoadLoad)`: https://bugs.openjdk.org/browse/JDK-8369506?focusedId=14825240&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14825240 ------------- Marked as reviewed by manc (Committer). PR Review: https://git.openjdk.org/jdk/pull/27748#pullrequestreview-3342096416 From valeriep at openjdk.org Wed Oct 15 20:48:45 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Wed, 15 Oct 2025 20:48:45 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v4] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 05:51:40 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. >> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. 
>> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Add remaining files to be staged src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 920: > 918: if (prevKey != null) { > 919: Arrays.fill(prevKey, (byte) 0); > 920: } Can be moved down to be right before `prevKey = key.clone()` call? This way, `sessionK` assignments are together and not separated by this call ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2433898660 From duke at openjdk.org Wed Oct 15 21:31:23 2025 From: duke at openjdk.org (Shawn M Emery) Date: Wed, 15 Oct 2025 21:31:23 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v4] In-Reply-To: References: Message-ID: <7faVZ8RHcws5sOoA1rfBrE3f7ON__bKvBCra2R3rLNU=.dfd5d4f3-7506-4d99-9ba5-ccb0b4ca0184@github.com> On Wed, 15 Oct 2025 17:54:37 GMT, Valerie Peng wrote: >> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: >> >> Add remaining files to be staged > > test/micro/org/openjdk/bench/javax/crypto/AESDecrypt.java line 53: > >> 51: public void setup() throws Exception { >> 52: SecretKeySpec keySpec = new SecretKeySpec(new byte[]{-80, -103, -1, 68, -29, -94, 61, -52, 93, -59, -128, 105, 110, 88, 44, 105}, "AES"); >> 53: IvParameterSpec iv = new IvParameterSpec(new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}); > > Is the all-0s IV intentional? Yes, it's in keeping with the other benchmarks (e.g., test/micro/org/openjdk/bench/javax/crypto/AES.java). > test/micro/org/openjdk/bench/javax/crypto/AESDecrypt.java line 82: > >> 80: public byte[] testUseAesIntrinsics() throws Exception { >> 81: return cipher.doFinal(ct); >> 82: } > > These 3 methods look same to me except for the method names? The forked arguments will test different levels of optimizations: testBaseline: no optimizations testUseAes: AES optimizations testUseAesIntrinsics: AES machine instructions ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434006387 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434001817 From duke at openjdk.org Wed Oct 15 23:04:10 2025 From: duke at openjdk.org (Shawn M Emery) Date: Wed, 15 Oct 2025 23:04:10 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v5] In-Reply-To: References: Message-ID: <-4R3yVX3ufogOLHqu2y6c2EGOPKmiy0rTuVwQoCk_SE=.942b296d-4540-4f57-ac5c-ff214d2985bc@github.com> > General: > ----------- > i) This work is to replace the existing AES cipher under the Cryptix license. > > ii) The lookup tables are employed for performance, but also for operating in constant time. > > iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. > > iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. > > Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. 
> > Correctness: > ----------------- > The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: > > i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass > > -intrinsics mode for: > > ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass > > iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures > > iv) jdk_security_infra: passed, with 48 known failures > > v) tier1 and tier2: all 110257 tests pass > > Security: > ----------- > In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. > > Performance: > ------------------ > All AES related benchmarks have been executed against the new and original Cryptix code: > > micro:org.openjdk.bench.javax.crypto.AES > > micro:org.openjdk.bench.javax.crypto.full.AESBench > > micro:org.openjdk.bench.javax.crypto.full.AESExtraBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream > > micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. > > micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) > > The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption results: > > i) Default (no JVM options, non-intrinsics) mode: > > a) Encryption: the new code performed better for b... Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: Updates for code review comments from @valeriepeng ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27807/files - new: https://git.openjdk.org/jdk/pull/27807/files/3fc25aef..f48160cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=03-04 Stats: 31 lines in 1 file changed: 6 ins; 10 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/27807.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27807/head:pull/27807 PR: https://git.openjdk.org/jdk/pull/27807 From duke at openjdk.org Wed Oct 15 23:04:11 2025 From: duke at openjdk.org (Shawn M Emery) Date: Wed, 15 Oct 2025 23:04:11 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v4] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 17:55:58 GMT, Valerie Peng wrote: >> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: >> >> Add remaining files to be staged > > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 43: > >> 41: * https://www.internationaljournalcorner.com/index.php/ijird_ojs/article/view/134688 >> 42: */ >> 43: public final class AES_Crypt extends SymmetricCipher { > > This internal class does not need to be public? I'd assume it's only used within the same package? You're right, it doesn't appear to be used externally. Fixed. 
> src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 920: > >> 918: if (prevKey != null) { >> 919: Arrays.fill(prevKey, (byte) 0); >> 920: } > > Can be moved down to be right before `prevKey = key.clone()` call? This way, `sessionK` assignments are together and not separated by this call It can be and I agree. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434164829 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434165068 From duke at openjdk.org Wed Oct 15 23:04:13 2025 From: duke at openjdk.org (Shawn M Emery) Date: Wed, 15 Oct 2025 23:04:13 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v4] In-Reply-To: <8Hb0rpkClUOY9-wId7h9oVsyZx6Q1BfCKKEshQ8u6PA=.c3a9339b-0567-47bb-a2f2-92836e1af493@github.com> References: <-QG-DjJMOsPrJDY4lSgr0jOrcwEH_40FmvRgt4CQW90=.df69e912-a5e1-4574-a70d-b30e855c1dcc@github.com> <8Hb0rpkClUOY9-wId7h9oVsyZx6Q1BfCKKEshQ8u6PA=.c3a9339b-0567-47bb-a2f2-92836e1af493@github.com> Message-ID: On Wed, 15 Oct 2025 18:51:29 GMT, Valerie Peng wrote: >> src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 55: >> >>> 53: >>> 54: private static final int AES_256_ROUNDS = 14; >>> 55: private static final int AES_256_NKEYS = 32; >> >> The `AES_XXX_NKEYS` constants (valued 16, 24, 32) are also defined in `AESConstants` class, maybe we can just refer to that class instead of duplicate the definition here? > > Or, merge the values defined in `AESConstants` into this class. Either way is fine with me as long as no duplicated values. I've made the update that references the AESConstants to avoid duplication. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434164972 From valeriep at openjdk.org Wed Oct 15 23:31:03 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Wed, 15 Oct 2025 23:31:03 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v4] In-Reply-To: <7faVZ8RHcws5sOoA1rfBrE3f7ON__bKvBCra2R3rLNU=.dfd5d4f3-7506-4d99-9ba5-ccb0b4ca0184@github.com> References: <7faVZ8RHcws5sOoA1rfBrE3f7ON__bKvBCra2R3rLNU=.dfd5d4f3-7506-4d99-9ba5-ccb0b4ca0184@github.com> Message-ID: <-4wxYGqcPRXJYZEXd91_MPCBd491IFjd7TOnMP_HdxE=.060bb090-1084-4a3b-8315-2116e73e69df@github.com> On Wed, 15 Oct 2025 21:26:33 GMT, Shawn M Emery wrote: >> test/micro/org/openjdk/bench/javax/crypto/AESDecrypt.java line 82: >> >>> 80: public byte[] testUseAesIntrinsics() throws Exception { >>> 81: return cipher.doFinal(ct); >>> 82: } >> >> These 3 methods look same to me except for the method names? > > The forked arguments will test different levels of optimizations: > testBaseline: no optimizations > testUseAes: AES optimizations > testUseAesIntrinsics: AES machine instructions Ah, I see, some has "+" vs some uses "-". My eye sights are getting worse. 
(sigh) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434206251 From valeriep at openjdk.org Thu Oct 16 00:21:12 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Thu, 16 Oct 2025 00:21:12 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v5] In-Reply-To: <-4R3yVX3ufogOLHqu2y6c2EGOPKmiy0rTuVwQoCk_SE=.942b296d-4540-4f57-ac5c-ff214d2985bc@github.com> References: <-4R3yVX3ufogOLHqu2y6c2EGOPKmiy0rTuVwQoCk_SE=.942b296d-4540-4f57-ac5c-ff214d2985bc@github.com> Message-ID: <5aAa7y5hs9zHT05sehzUaDmSXY31QDQPj_ioD9V9nK8=.78641e57-e2df-4f8b-9c0b-35a56fbea7db@github.com> On Wed, 15 Oct 2025 23:04:10 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. >> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... 
> > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Updates for code review comments from @valeriepeng src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 896: > 894: if (key.length == AESConstants.AES_KEYSIZES[0]) { > 895: rounds = AES_128_ROUNDS; > 896: nk = AESConstants.AES_KEYSIZES[0]/WB; Looks like we can get rid of `nk` as the `genRKeys(byte[])` method can calculate/derive it based on the specified `key` argument, i.e. `key.length >> 2` or `key.length / WB` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434256434 From valeriep at openjdk.org Thu Oct 16 00:42:05 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Thu, 16 Oct 2025 00:42:05 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v5] In-Reply-To: <-4R3yVX3ufogOLHqu2y6c2EGOPKmiy0rTuVwQoCk_SE=.942b296d-4540-4f57-ac5c-ff214d2985bc@github.com> References: <-4R3yVX3ufogOLHqu2y6c2EGOPKmiy0rTuVwQoCk_SE=.942b296d-4540-4f57-ac5c-ff214d2985bc@github.com> Message-ID: On Wed, 15 Oct 2025 23:04:10 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. >> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. 
>> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Updates for code review comments from @valeriepeng src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 954: > 952: } > 953: w[i] = w[i - nk] ^ tmp; > 954: } Looks like most of these local variables can be removed? Since you are not changing the value of `len`, you can just use `WB`. `rW` is only used inside the if-block from line 944-948, so it can be declared on line 945. Line 946-948 can be merged on one line, e.g. `tmp = subByte(rW, SBOX) ^ RCON[(i / nk) - 1];` and no need for `subWord` and `g`. Same goes for line 950 and 951. Also, the value of `WB * (rounds + 1)` is used twice, this can be stored in a local variable say `wLen`, so it's only calculated once. Same goes for the `i * WB` value from line 937-940 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434276514 From valeriep at openjdk.org Thu Oct 16 00:45:03 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Thu, 16 Oct 2025 00:45:03 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v5] In-Reply-To: References: <-4R3yVX3ufogOLHqu2y6c2EGOPKmiy0rTuVwQoCk_SE=.942b296d-4540-4f57-ac5c-ff214d2985bc@github.com> Message-ID: <2Qva3E5Ux0iHeg_Xx_yDV06gkjVmtIQL0J1aL2oSkmE=.28ead96c-2f65-4501-b44d-97898d032d6a@github.com> On Thu, 16 Oct 2025 00:38:09 GMT, Valerie Peng wrote: >> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates for code review comments from @valeriepeng > > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 954: > >> 952: } >> 953: w[i] = w[i - nk] ^ tmp; >> 954: } > > Looks like most of these local variables can be removed? Since you are not changing the value of `len`, you can just use `WB`. `rW` is only used inside the if-block from line 944-948, so it can be declared on line 945. Line 946-948 can be merged on one line, e.g. `tmp = subByte(rW, SBOX) ^ RCON[(i / nk) - 1];` and no need for `subWord` and `g`. Same goes for line 950 and 951. > Also, the value of `WB * (rounds + 1)` is used twice, this can be stored in a local variable say `wLen`, so it's only calculated once. > Same goes for the `i * WB` value from line 937-940 On the second thought, instead of calculating `i * WB` value, You can use another local variable to store this index and increment it by 4 for each iteration. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434280527 From valeriep at openjdk.org Thu Oct 16 00:51:02 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Thu, 16 Oct 2025 00:51:02 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v5] In-Reply-To: <-4R3yVX3ufogOLHqu2y6c2EGOPKmiy0rTuVwQoCk_SE=.942b296d-4540-4f57-ac5c-ff214d2985bc@github.com> References: <-4R3yVX3ufogOLHqu2y6c2EGOPKmiy0rTuVwQoCk_SE=.942b296d-4540-4f57-ac5c-ff214d2985bc@github.com> Message-ID: On Wed, 15 Oct 2025 23:04:10 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. 
>> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. >> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Updates for code review comments from @valeriepeng src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 932: > 930: * @return w the cipher round keys. > 931: */ > 932: private int[] genRKeys(byte[] key, int nk) { nit: The spec names this "keyExpansion" and documents it under section 5.2. Could we include this in the method javadoc? Also, how about "genRoundKeys" or "doKeyExpansion" as method name? `nk` can be derived from `key` and maybe no need for extra argument? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434286447 From valeriep at openjdk.org Thu Oct 16 00:54:10 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Thu, 16 Oct 2025 00:54:10 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v5] In-Reply-To: <-4R3yVX3ufogOLHqu2y6c2EGOPKmiy0rTuVwQoCk_SE=.942b296d-4540-4f57-ac5c-ff214d2985bc@github.com> References: <-4R3yVX3ufogOLHqu2y6c2EGOPKmiy0rTuVwQoCk_SE=.942b296d-4540-4f57-ac5c-ff214d2985bc@github.com> Message-ID: On Wed, 15 Oct 2025 23:04:10 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. >> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Updates for code review comments from @valeriepeng src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 932: > 930: * @return w the cipher round keys. > 931: */ > 932: private int[] genRKeys(byte[] key, int nk) { This method can be static? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434289681 From dlong at openjdk.org Thu Oct 16 02:39:04 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 16 Oct 2025 02:39:04 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v4] In-Reply-To: References: Message-ID: On Sat, 11 Oct 2025 18:25:48 GMT, Francesco Andreuzzi wrote: >> I propose to amend `nmethod::is_cold` to let GC collect not-entrant native `nmethod` instances. >> >> Passes tier1 and tier2 (fastdebug). > > Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: > > update foundOne src/hotspot/share/code/nmethod.cpp line 2599: > 2597: // nmethods that don't seem to be all that relevant any longer. > 2598: bool nmethod::is_cold() { > 2599: if (!MethodFlushing || (is_native_method() && is_in_use()) || is_not_installed()) { So I guess we need to decide what to do about native wrappers that are still "in use", but are "cold" because they haven't been called in a while. The above change would keep them around forever. We could instead allow them to be cleaned up like regular nmethods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27742#discussion_r2434399527 From xgong at openjdk.org Thu Oct 16 03:14:14 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 16 Oct 2025 03:14:14 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v6] In-Reply-To: <8cCO3_XSLXtzulIYd3AVHvzQzgkQ9CVVepy61I2QkiI=.8fdee0d5-520a-4987-9b55-cc1b559f37aa@github.com> References: <8cCO3_XSLXtzulIYd3AVHvzQzgkQ9CVVepy61I2QkiI=.8fdee0d5-520a-4987-9b55-cc1b559f37aa@github.com> Message-ID: On Wed, 15 Oct 2025 16:05:59 GMT, Paul Sandoz wrote: > I suspect it's likely more complex overall adding a slice operation to mask, that is really only needed for a specific case. (A more general operation would be compress/expand of the mask bits, but i don't believe there are hardware instructions for such operations on mask registers.) > Yes, I agree with you. Personally, I?d prefer not to introduce such APIs for a vector mask. > In my view adding a part parameter is a compromise and seems less complex that requiring N index vectors, and it fits with a general pattern we have around parts of the vector. It moves the specialized operation requirements on the mask into the area where it is needed rather than trying to generalize in a manner that i don't think is appropriate in the mask API. Yeah, it can sound reasonable that an API can finish a simple task and then choose to move the results to different part of a vector based on an offset. Consider `loadWithMap` is used as a VM interface, we have to add checks for the passed `origin` against the vector length. Besides, we have to support the same cross-lane shift for other vector types like int/long/double. I will prepare a prototype for this. Thanks for your inputs @PaulSandoz . ------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3408999408 From duke at openjdk.org Thu Oct 16 04:04:23 2025 From: duke at openjdk.org (Shawn M Emery) Date: Thu, 16 Oct 2025 04:04:23 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v6] In-Reply-To: References: Message-ID: > General: > ----------- > i) This work is to replace the existing AES cipher under the Cryptix license. > > ii) The lookup tables are employed for performance, but also for operating in constant time. 
> > iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. > > iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. > > Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. > > Correctness: > ----------------- > The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: > > i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass > > -intrinsics mode for: > > ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass > > iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures > > iv) jdk_security_infra: passed, with 48 known failures > > v) tier1 and tier2: all 110257 tests pass > > Security: > ----------- > In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. > > Performance: > ------------------ > All AES related benchmarks have been executed against the new and original Cryptix code: > > micro:org.openjdk.bench.javax.crypto.AES > > micro:org.openjdk.bench.javax.crypto.full.AESBench > > micro:org.openjdk.bench.javax.crypto.full.AESExtraBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream > > micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. > > micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) > > The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption results: > > i) Default (no JVM options, non-intrinsics) mode: > > a) Encryption: the new code performed better for b... 
Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: Updates for code review comments from @valeriepeng ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27807/files - new: https://git.openjdk.org/jdk/pull/27807/files/f48160cf..fbf2117f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=04-05 Stats: 28 lines in 1 file changed: 1 ins; 10 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/27807.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27807/head:pull/27807 PR: https://git.openjdk.org/jdk/pull/27807 From duke at openjdk.org Thu Oct 16 04:04:25 2025 From: duke at openjdk.org (Shawn M Emery) Date: Thu, 16 Oct 2025 04:04:25 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v5] In-Reply-To: <5aAa7y5hs9zHT05sehzUaDmSXY31QDQPj_ioD9V9nK8=.78641e57-e2df-4f8b-9c0b-35a56fbea7db@github.com> References: <-4R3yVX3ufogOLHqu2y6c2EGOPKmiy0rTuVwQoCk_SE=.942b296d-4540-4f57-ac5c-ff214d2985bc@github.com> <5aAa7y5hs9zHT05sehzUaDmSXY31QDQPj_ioD9V9nK8=.78641e57-e2df-4f8b-9c0b-35a56fbea7db@github.com> Message-ID: On Thu, 16 Oct 2025 00:18:08 GMT, Valerie Peng wrote: >> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates for code review comments from @valeriepeng > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 896: > >> 894: if (key.length == AESConstants.AES_KEYSIZES[0]) { >> 895: rounds = AES_128_ROUNDS; >> 896: nk = AESConstants.AES_KEYSIZES[0]/WB; > > Looks like we can get rid of `nk` as the `genRKeys(byte[])` method can calculate/derive it based on the specified `key` argument, i.e. `key.length >> 2` or `key.length / WB` Sounds reasonable. Fixed. > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 932: > >> 930: * @return w the cipher round keys. >> 931: */ >> 932: private int[] genRKeys(byte[] key, int nk) { > > nit: The spec names this "keyExpansion" and documents it under section 5.2. Could we include this in the method javadoc? Also, how about "genRoundKeys" or "doKeyExpansion" as method name? `nk` can be derived from `key` and maybe no need for extra argument? Actually I used Stallings' cryptography book specifically for the round key concepts, hence the variable name mismatch at least for 128 bit keys. But after your suggestions it does look more like FIPS 197-upd 1 so I will reference the section ;) Fixed. > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 932: > >> 930: * @return w the cipher round keys. >> 931: */ >> 932: private int[] genRKeys(byte[] key, int nk) { > > This method can be static if you pass the `rounds` value as a method argument. Yes and subWord() would also need to be static for this change. Fixed.
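As a concrete illustration of the "derive it from the key" suggestion above (assuming, as the surrounding code does, that the key length has already been validated against AESConstants.AES_KEYSIZES), the per-key-size branches can collapse to:

    // Illustrative only; WB == 4 bytes per word.
    int nk = key.length / 4;   // 16 -> 4, 24 -> 6, 32 -> 8 words
    int rounds = nk + 6;       // 128-bit -> 10, 192-bit -> 12, 256-bit -> 14 rounds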
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434497086 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434497821 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434498001 From duke at openjdk.org Thu Oct 16 04:04:27 2025 From: duke at openjdk.org (Shawn M Emery) Date: Thu, 16 Oct 2025 04:04:27 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v5] In-Reply-To: <2Qva3E5Ux0iHeg_Xx_yDV06gkjVmtIQL0J1aL2oSkmE=.28ead96c-2f65-4501-b44d-97898d032d6a@github.com> References: <-4R3yVX3ufogOLHqu2y6c2EGOPKmiy0rTuVwQoCk_SE=.942b296d-4540-4f57-ac5c-ff214d2985bc@github.com> <2Qva3E5Ux0iHeg_Xx_yDV06gkjVmtIQL0J1aL2oSkmE=.28ead96c-2f65-4501-b44d-97898d032d6a@github.com> Message-ID: On Thu, 16 Oct 2025 00:42:20 GMT, Valerie Peng wrote: >> src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 954: >> >>> 952: } >>> 953: w[i] = w[i - nk] ^ tmp; >>> 954: } >> >> Looks like most of these local variables can be removed? Since you are not changing the value of `len`, you can just use `WB`. `rW` is only used inside the if-block from line 944-948, so it can be declared on line 945. Line 946-948 can be merged on one line, e.g. `tmp = subByte(rW, SBOX) ^ RCON[(i / nk) - 1];` and no need for `subWord` and `g`. Same goes for line 950 and 951. >> Also, the value of `WB * (rounds + 1)` is used twice, this can be stored in a local variable say `wLen`, so it's only calculated once. >> Same goes for the `i * WB` value from line 937-940 > > On the second thought, instead of calculating `i * WB` value, You can use another local variable to store this index and increment it by 4 for each iteration. I've made these changes and used the 2nd approach for indexing key. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434497439 From dzhang at openjdk.org Thu Oct 16 04:13:01 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Thu, 16 Oct 2025 04:13:01 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: <6IdsiW4U906Jh8JvnFDuxF34FJn93aVpoRUwDn2RLoU=.ea104d38-ee56-4071-a793-3241d5624e10@github.com> On Wed, 15 Oct 2025 12:38:27 GMT, Martin Doerr wrote: >> The test `VectorMaskCompareNotTest` requires 16 Byte vectors (or larger). If the machine only uses 8 Byte vectors, we get an exception in the static initializer because the code tries to use a 4 Byte vector which is unsupported (stack trace: see JBS issue [JDK-8369511](https://bugs.openjdk.org/browse/JDK-8369511)). This is an alternative to https://github.com/openjdk/jdk/pull/27749. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add check if flag is available. Hi @TheRealMDoerr Thanks for the patch! I?ve tested on k1 and k230 (with RVV) as well as sg2042 (without RVV), and it looks good. ------------- Marked as reviewed by dzhang (Author). 
PR Review: https://git.openjdk.org/jdk/pull/27805#pullrequestreview-3342995338 From valeriep at openjdk.org Thu Oct 16 04:52:04 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Thu, 16 Oct 2025 04:52:04 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v5] In-Reply-To: <-4R3yVX3ufogOLHqu2y6c2EGOPKmiy0rTuVwQoCk_SE=.942b296d-4540-4f57-ac5c-ff214d2985bc@github.com> References: <-4R3yVX3ufogOLHqu2y6c2EGOPKmiy0rTuVwQoCk_SE=.942b296d-4540-4f57-ac5c-ff214d2985bc@github.com> Message-ID: On Wed, 15 Oct 2025 23:04:10 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. >> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Updates for code review comments from @valeriepeng src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 1032: > 1030: * @return the substituted word. > 1031: */ > 1032: private int subByte(int state, byte[][] sub) { Given the input and output are both `int` type, i.e. 
word, maybe it's better named as `subWord` ? This also matches the pseudocode routine name used in the spec. This method also can be made static. It seems that `sub` is always the static `SBOX`, maybe we don't have to use an argument to pass it? nit: the variable name `state` is a bit misleading as we are only using part of it. A state is consisting of 4 words and the input here is only 1 word. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434562796 From shade at openjdk.org Thu Oct 16 04:57:04 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 16 Oct 2025 04:57:04 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 [v3] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 19:47:18 GMT, Justin King wrote: > When you talk about announce, are you just referring to sending a somewhat short and sweet email to hotspot-dev calling out the 3 issues (RISCV and PPC bugs to be created) and that they can result in Java heap corruption? There is no need to announce on hotspot-dev. I looked through the other arches, so: - ARM32 looks affected; @bulasevich, take note: [no bug filed yet] - PPC64 looks affected; @TheRealMDoerr, take note: https://bugs.openjdk.org/browse/JDK-8369946 - RISC-V looks affected; @RealFYang, take note: https://bugs.openjdk.org/browse/JDK-8369947 - S390 does not look affected; TSO handles this for us - x86 does not look affected; TSO handles this for us ------------- PR Comment: https://git.openjdk.org/jdk/pull/27748#issuecomment-3409171412 From duke at openjdk.org Thu Oct 16 05:14:44 2025 From: duke at openjdk.org (Shawn M Emery) Date: Thu, 16 Oct 2025 05:14:44 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v7] In-Reply-To: References: Message-ID: > General: > ----------- > i) This work is to replace the existing AES cipher under the Cryptix license. > > ii) The lookup tables are employed for performance, but also for operating in constant time. > > iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. > > iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. > > Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. > > Correctness: > ----------------- > The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: > > i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass > > -intrinsics mode for: > > ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass > > iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures > > iv) jdk_security_infra: passed, with 48 known failures > > v) tier1 and tier2: all 110257 tests pass > > Security: > ----------- > In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. 
> > Performance: > ------------------ > All AES related benchmarks have been executed against the new and original Cryptix code: > > micro:org.openjdk.bench.javax.crypto.AES > > micro:org.openjdk.bench.javax.crypto.full.AESBench > > micro:org.openjdk.bench.javax.crypto.full.AESExtraBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream > > micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. > > micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) > > The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption results: > > i) Default (no JVM options, non-intrinsics) mode: > > a) Encryption: the new code performed better for b... Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: Updates for code review comments from @valeriepeng ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27807/files - new: https://git.openjdk.org/jdk/pull/27807/files/fbf2117f..9f00c355 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=05-06 Stats: 12 lines in 1 file changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/27807.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27807/head:pull/27807 PR: https://git.openjdk.org/jdk/pull/27807 From duke at openjdk.org Thu Oct 16 05:14:44 2025 From: duke at openjdk.org (Shawn M Emery) Date: Thu, 16 Oct 2025 05:14:44 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v5] In-Reply-To: References: <-4R3yVX3ufogOLHqu2y6c2EGOPKmiy0rTuVwQoCk_SE=.942b296d-4540-4f57-ac5c-ff214d2985bc@github.com> Message-ID: On Thu, 16 Oct 2025 04:49:11 GMT, Valerie Peng wrote: >> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates for code review comments from @valeriepeng > > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 1032: > >> 1030: * @return the substituted word. >> 1031: */ >> 1032: private int subByte(int state, byte[][] sub) { > > Given the input and output are both `int` type, i.e. word, maybe it's better named as `subWord` ? This also matches the pseudocode routine name used in the spec. > This method also can be made static. It seems that `sub` is always the static `SBOX`, maybe we don't have to use an argument to pass it? > nit: the variable name `state` is a bit misleading as we are only using part of it. A state is consisting of 4 words and the input here is only 1 word. Good, it was a byte operation, but evolved to a word. Last commit made it a static. Yes, before I switched over to a LUT for the inverse mix column transform of the inverse key expansion it needed both, but doesn't anymore. I'll switch from state to word then. Fixed. 
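To make the subWord terminology concrete, a generic word-wise S-box substitution looks roughly like the following. This is only a sketch, not the method being reviewed; SBOX is assumed to be the standard 256-entry AES S-box stored as a byte array.

    // Apply the AES S-box to each byte of a 32-bit word (FIPS 197 SubWord).
    private static int subWord(int word) {
        return ((SBOX[(word >>> 24) & 0xff] & 0xff) << 24)
             | ((SBOX[(word >>> 16) & 0xff] & 0xff) << 16)
             | ((SBOX[(word >>> 8) & 0xff] & 0xff) << 8)
             | (SBOX[word & 0xff] & 0xff);
    }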
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2434594049 From thartmann at openjdk.org Thu Oct 16 06:34:00 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 16 Oct 2025 06:34:00 GMT Subject: RFR: 8369573: Add missing compile commands help documentation for the signature part of method patterns In-Reply-To: References: Message-ID: <2n51aprDDBa2ELEaHNmrjKmBKuAe_XkjfVkGKXTwv_0=.d655878d-e47c-40fb-9311-abb1c332eb15@github.com> On Wed, 15 Oct 2025 08:17:46 GMT, Daniel Lund?n wrote: > The documentation of the signature part of method patterns in `java -XX:CompileCommand=help` is missing. We should add it. > > ### Changeset > - Improve the documentation of signatures in `java -XX:CompileCommand=help`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500483034) > - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. Looks good to me. I'm fine with using "signature" but I don't have a strong opinion. src/hotspot/share/compiler/compilerOracle.cpp line 639: > 637: tty->print_cr(" package/Class.method,(Lpackage/Parameter;)Lpackage/Return;"); > 638: tty->cr(); > 639: tty->print_cr("The and accept leading and trailing *'s for wildcard"); Suggestion: tty->print_cr("The and accept leading and trailing '*' wildcards"); ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27818#pullrequestreview-3343247716 PR Review Comment: https://git.openjdk.org/jdk/pull/27818#discussion_r2434724926 From epeter at openjdk.org Thu Oct 16 06:48:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 16 Oct 2025 06:48:21 GMT Subject: Integrated: 8369881: C2: Unexpected node in SuperWord truncation: ReverseBytesS, ReverseBytesUS In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 08:46:23 GMT, Emanuel Peter wrote: > The fuzzer found the `ReverseByteS` case, and I checked all other `*.reverseBytes`, and found a failure with `Character.reverseBytes` as well. > > Adding them to the list, and added tests for both. > > Note, this is just another 2 boxes checked, there were many similar ones fixed, or on the way: > https://github.com/openjdk/jdk/pull/26827 > https://github.com/openjdk/jdk/pull/26334 > https://github.com/openjdk/jdk/pull/26494 > https://github.com/openjdk/jdk/pull/26423 This pull request has now been integrated. Changeset: aa194c6a Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/aa194c6a5a21aca64d454e4c5eeed1464c8f190b Stats: 54 lines in 2 files changed: 53 ins; 0 del; 1 mod 8369881: C2: Unexpected node in SuperWord truncation: ReverseBytesS, ReverseBytesUS Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/27819 From epeter at openjdk.org Thu Oct 16 06:48:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 16 Oct 2025 06:48:20 GMT Subject: RFR: 8369881: C2: Unexpected node in SuperWord truncation: ReverseBytesS, ReverseBytesUS In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 09:15:32 GMT, Christian Hagedorn wrote: >> The fuzzer found the `ReverseByteS` case, and I checked all other `*.reverseBytes`, and found a failure with `Character.reverseBytes` as well. >> >> Adding them to the list, and added tests for both. 
>> >> Note, this is just another 2 boxes checked, there were many similar ones fixed, or on the way: >> https://github.com/openjdk/jdk/pull/26827 >> https://github.com/openjdk/jdk/pull/26334 >> https://github.com/openjdk/jdk/pull/26494 >> https://github.com/openjdk/jdk/pull/26423 > > Looks good, thanks for fixing this! @chhagedorn @vnkozlov Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27819#issuecomment-3409414465 From shade at openjdk.org Thu Oct 16 06:50:06 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 16 Oct 2025 06:50:06 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 [v3] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 20:19:04 GMT, Man Cao wrote: > LG, but with a question about hoisting the `member(LoadLoad)`: One can, but then you'll have to remember to put it in every `TemplateTable::fast_*` that might be accessing `RFE`. So putting it near `RFE` access itself looks more reliable, even though not exactly on point. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27748#issuecomment-3409422649 From epeter at openjdk.org Thu Oct 16 06:50:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 16 Oct 2025 06:50:09 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types [v3] In-Reply-To: <99G1wPXEs1RoqnPlvNzteqd4Pf96pkqNilnaOPMiSgA=.608c0323-03ce-4e77-925b-ea3732ebbb0a@github.com> References: <99G1wPXEs1RoqnPlvNzteqd4Pf96pkqNilnaOPMiSgA=.608c0323-03ce-4e77-925b-ea3732ebbb0a@github.com> Message-ID: On Wed, 15 Oct 2025 15:52:08 GMT, Roland Westrelin wrote: >> Currently ReassociateInvariants is only enabled for int counted >> loops. I noticed, enabling it for long counted loops helps RCE. It >> also seems like something that would help any loop. I propose enabling >> it for all inner loops. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for the updates, looks even better now :) test/hotspot/jtreg/compiler/loopopts/TestReassociateInvariants.java line 72: > 70: // removal of long counted loop. The long counted loop is > 71: // transformed into a loop nest with an inner int counted > 72: // loop. That one is empty and is removed. Sounds like we should file an RFE for long counted loop removal, right? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27666#pullrequestreview-3343291039 PR Review Comment: https://git.openjdk.org/jdk/pull/27666#discussion_r2434759865 From epeter at openjdk.org Thu Oct 16 06:54:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 16 Oct 2025 06:54:04 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: <3ID4WB3EuY7sqB0fkyseOAvD_HaM_j604HiOUITjUEE=.209330f9-5023-41ea-b5c5-a5ef0b00ebde@github.com> On Wed, 15 Oct 2025 12:38:27 GMT, Martin Doerr wrote: >> The test `VectorMaskCompareNotTest` requires 16 Byte vectors (or larger). If the machine only uses 8 Byte vectors, we get an exception in the static initializer because the code tries to use a 4 Byte vector which is unsupported (stack trace: see JBS issue [JDK-8369511](https://bugs.openjdk.org/browse/JDK-8369511)). This is an alternative to https://github.com/openjdk/jdk/pull/27749. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add check if flag is available. Looks reasonable. 
Sanity testing for commit 1 passed. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27805#pullrequestreview-3343302723 From mchevalier at openjdk.org Thu Oct 16 07:00:00 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 16 Oct 2025 07:00:00 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v3] In-Reply-To: References: Message-ID: > Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. > > When we have something like > > // any loop > while (...) { /* something involving limit */ } > // counted loop with zero trip guard > if (i < limit) { > for (int i = init; i < limit; i++) { ... } > } > > and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. > > But the method `PhaseIdealLoop::do_unroll` has the assert > > https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 > > requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. > > This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. > > But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? > > By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. > > We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. > > We could make the assert a lot smarter and test for ... 
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Bailout if IGVN ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27586/files - new: https://git.openjdk.org/jdk/pull/27586/files/d0458b2e..82a92172 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=01-02 Stats: 232 lines in 3 files changed: 229 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27586/head:pull/27586 PR: https://git.openjdk.org/jdk/pull/27586 From mhaessig at openjdk.org Thu Oct 16 07:01:43 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 16 Oct 2025 07:01:43 GMT Subject: RFR: 8369912: [TESTBUG] testlibrary_tests/template_framework/examples/TestExpressions.java fails with ArithmeticException: / by zero - forgot to respect Expression.info In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 14:34:52 GMT, Emanuel Peter wrote: > The test generates a test for each operator, like this: > > public static int primitiveConTest_185_compiled() { > return (989451435 % 0); > } > > However, some operators throw exceptions, just like here the `%`, when given a zero rhs argument. The expression already knows about that, we just need to generate try-catch statements in the code. > > Similarly, some operators do not always return deterministic results (different Nan, or precision). So we need to handle that too. > > Note: we already do all of that in the `test/hotspot/jtreg/compiler/igvn/ExpressionFuzzer.java`. Thank you for fixing this, @eme64. Looks good to me. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/27824#pullrequestreview-3343323595 From mchevalier at openjdk.org Thu Oct 16 07:05:02 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 16 Oct 2025 07:05:02 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v3] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 07:00:00 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. >> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. 
While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Bailout if IGVN Since we consider this invariant actually desirable, since we don't want to run IGVN every time we call `PhaseIdealLoop::fix_data_uses`, and since we shouldn't reinvent IGVN and `PhiNode`'s idealization to simplify the case, let's bailout if the invariant doesn't hold, while asserting that we are in this cleanable situation: both nodes are `PhiNodes` recorded for IGVN, and major progress has been made. I've tried checking more precisely that these nodes were created in loop cloning in the same loop opt round, but that started to be quite complex, likely more than it's worth it for just an assert. On all the reproducers, it just delayed the unroll until after IGVN and then, end up in the same state as if we ignored the assert. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3409460843 From chagedorn at openjdk.org Thu Oct 16 07:19:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 16 Oct 2025 07:19:13 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v3] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 07:00:00 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. 
>> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Bailout if IGVN That looks like a good compromise given it's quite an edge case. Looks good to me now, thanks for doing another iteration! > On all the reproducers, it just delayed the unroll until after IGVN and then, end up in the same state as if we ignored the assert. Just an idea: Could we also add an IR that just does some matching on phase `PHASE_AFTER_LOOP_UNROLLING`? If the phase was found (when wrongly bailing out each time), we should fail with a "no compilation output found" exception test/hotspot/jtreg/compiler/loopopts/TooStrictAssertForUnrollAfterStressPeeling.java line 37: > 35: * -XX:PerMethodTrapLimit=0 > 36: * compiler.loopopts.TooStrictAssertForUnrollAfterStressPeeling > 37: * @run driver compiler.loopopts.TooStrictAssertForUnrollAfterStressPeeling Should be `main` to allow passing flags to the run from the outside. Suggestion: * @run main compiler.loopopts.TooStrictAssertForUnrollAfterStressPeeling test/hotspot/jtreg/compiler/loopopts/TooStrictAssertForUnrollAfterStressPeeling2.java line 62: > 60: * @summary assert in do_unroll does not hold in some cases when peeling comes > 61: * just before unrolling. It seems to happen only with stress peeling > 62: * @run driver compiler.loopopts.TooStrictAssertForUnrollAfterStressPeeling2 Suggestion: * @run main compiler.loopopts.TooStrictAssertForUnrollAfterStressPeeling2 test/hotspot/jtreg/compiler/loopopts/TooStrictAssertForUnrollAfterStressPeeling3.java line 63: > 61: * @summary assert in do_unroll does not hold in some cases when peeling comes > 62: * just before unrolling. 
It seems to happen only with stress peeling > 63: * @run driver compiler.loopopts.TooStrictAssertForUnrollAfterStressPeeling3 Suggestion: * @run main compiler.loopopts.TooStrictAssertForUnrollAfterStressPeeling3 ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27586#pullrequestreview-3343345299 PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2434803482 PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2434804649 PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2434805100 From epeter at openjdk.org Thu Oct 16 07:21:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 16 Oct 2025 07:21:07 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal [v6] In-Reply-To: References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> Message-ID: On Tue, 14 Oct 2025 11:59:27 GMT, Roland Westrelin wrote: >> This change refactor code that's similar for LShiftINode and >> LShiftLNode into shared methods. I also added extra test cases to >> cover all transformations. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Testing passed -> approved ? Thanks for the work @rwestrel ! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27725#pullrequestreview-3343391826 From roland at openjdk.org Thu Oct 16 07:26:26 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 16 Oct 2025 07:26:26 GMT Subject: RFR: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal [v2] In-Reply-To: References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> Message-ID: <8wMmXdDYSsHL7olSC92AmkyXDEEXCMZgGAJO4tRQwPQ=.3475c9c8-f18e-4b8c-9fed-806e5b318ee5@github.com> On Tue, 14 Oct 2025 09:19:33 GMT, Marc Chevalier wrote: >>> An idea (not a suggestion, just something that crossed my mind, take it more as a thought experiment): we could also parametrize everything not with a `BasicType` parameter but a template parameter (since `IdealIL` and co are invoked with literal values). It wouldn't change much, but for instance it would allow to replace the assert in `java_shift_left` and friends with static checks (I have a bias toward static checks). >> >> I wondered about that too. There are many more methods that are parameterized by a `BasicType`. They would have to all go through that transition. > >> They would have to all go through that transition. > > For consistency yes. But yet, I think I recall some functions that are not called with a compile-time constant, so we can't do that everywhere. Technically, calling a function that takes it as parameter from the templated version, and just passing our template argument is fine. What is not (easily) possible is normal parameter -> template. But again, that was just "for fun". @marc-chevalier @eme64 thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27725#issuecomment-3409531060 From roland at openjdk.org Thu Oct 16 07:26:27 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 16 Oct 2025 07:26:27 GMT Subject: Integrated: 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal In-Reply-To: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> References: <8uDjc9ZiSz2ecT3_fLcx2R21eIGBNyUSa7BvbkU_CCY=.d7552391-c42a-4de8-9577-dec5913048ff@github.com> Message-ID: On Thu, 9 Oct 2025 13:16:13 GMT, Roland Westrelin wrote: > This change refactor code that's similar for LShiftINode and > LShiftLNode into shared methods. I also added extra test cases to > cover all transformations. This pull request has now been integrated. Changeset: 7fe06657 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/7fe066573004a525673e4ec55df6783b13bfc189 Stats: 616 lines in 6 files changed: 334 ins; 172 del; 110 mod 8369167: C2: refactor LShiftINode/LShiftLNode Value/Identity/Ideal Reviewed-by: epeter, mchevalier ------------- PR: https://git.openjdk.org/jdk/pull/27725 From rcastanedalo at openjdk.org Thu Oct 16 07:37:13 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 16 Oct 2025 07:37:13 GMT Subject: RFR: 8369573: Add missing compile commands help documentation for the signature part of method patterns In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 15:19:42 GMT, Daniel Lund?n wrote: > > Thanks for doing this, Daniel! Would it be possible to use the more precise term "method descriptor" instead of "signature" in the help message? > > Yes, I agree and did consider this as well after consulting the JVM spec. I let "signature" remain as that is what is used currently and also seems to be the terminology used in the code. I'll wait a bit for more comments before committing to changing it. Thanks! To elaborate a bit further on my proposed change, I think using "method descriptor" all trough the help message is a bit clearer because 1) it removes one term (and hence the need to explain the connection between "signature" and "method descriptor" in https://github.com/openjdk/jdk/pull/27818/files#diff-80400270ae0db6c776055d9fd5ab13b909d2db8a5dde8df46063cb54b1c3c0d3R644-R646) and 2) in the context of Java, a "signature" tends to refer to the parameter types only, excluding return type (see e.g. https://en.wikipedia.org/wiki/Type_signature#Java_2 or https://docs.oracle.com/javase/tutorial/java/javaOO/methods.html). ------------- PR Comment: https://git.openjdk.org/jdk/pull/27818#issuecomment-3409570437 From roland at openjdk.org Thu Oct 16 07:38:31 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 16 Oct 2025 07:38:31 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types In-Reply-To: <2VDf9Od04bxB3b5rBeV9O00SX1dNUjTZ4XOK364n1do=.34541815-662d-4e11-8b6d-705eb3633d63@github.com> References: <2VDf9Od04bxB3b5rBeV9O00SX1dNUjTZ4XOK364n1do=.34541815-662d-4e11-8b6d-705eb3633d63@github.com> Message-ID: On Mon, 13 Oct 2025 14:00:54 GMT, Emanuel Peter wrote: >> Currently ReassociateInvariants is only enabled for int counted >> loops. I noticed, enabling it for long counted loops helps RCE. It >> also seems like something that would help any loop. I propose enabling >> it for all inner loops. > > BTW: testing passed! @eme64 @merykitty thanks for the reviews. 
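For readers unfamiliar with ReassociateInvariants, the kind of loop it targets has roughly this shape (an illustrative example, not taken from the PR or its tests): inside the loop, the induction variable is mixed with loop-invariant terms, and reassociating (inv1 + i) + inv2 into i + (inv1 + inv2) lets the invariant part be hoisted out of the loop, which in turn simplifies range-check elimination.

    // Illustrative only: the address expression mixes i with two invariants.
    static void fill(int[] a, int inv1, int inv2, int n) {
        for (int i = 0; i < n; i++) {
            a[inv1 + i + inv2] = 42;   // reassociated to (inv1 + inv2) + i
        }
    }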
------------- PR Comment: https://git.openjdk.org/jdk/pull/27666#issuecomment-3409571639 From roland at openjdk.org Thu Oct 16 07:38:33 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 16 Oct 2025 07:38:33 GMT Subject: RFR: 8369258: C2: enable ReassociateInvariants for all loop types [v3] In-Reply-To: References: <99G1wPXEs1RoqnPlvNzteqd4Pf96pkqNilnaOPMiSgA=.608c0323-03ce-4e77-925b-ea3732ebbb0a@github.com> Message-ID: <0TgDByyfIl9lTLBpPSwIv4uT0NmirLSVTkK54rOtxak=.24b4fefa-c526-4370-bea9-939075149a2c@github.com> On Thu, 16 Oct 2025 06:46:41 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > test/hotspot/jtreg/compiler/loopopts/TestReassociateInvariants.java line 72: > >> 70: // removal of long counted loop. The long counted loop is >> 71: // transformed into a loop nest with an inner int counted >> 72: // loop. That one is empty and is removed. > > Sounds like we should file an RFE for long counted loop removal, right? https://bugs.openjdk.org/browse/JDK-8369976 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27666#discussion_r2434883237 From roland at openjdk.org Thu Oct 16 07:38:35 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 16 Oct 2025 07:38:35 GMT Subject: Integrated: 8369258: C2: enable ReassociateInvariants for all loop types In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 07:36:41 GMT, Roland Westrelin wrote: > Currently ReassociateInvariants is only enabled for int counted > loops. I noticed, enabling it for long counted loops helps RCE. It > also seems like something that would help any loop. I propose enabling > it for all inner loops. This pull request has now been integrated. Changeset: ff6a0170 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/ff6a0170f0ab5cfb4af6d6a4a779451823c486d6 Stats: 463 lines in 6 files changed: 267 ins; 190 del; 6 mod 8369258: C2: enable ReassociateInvariants for all loop types Reviewed-by: epeter, qamai ------------- PR: https://git.openjdk.org/jdk/pull/27666 From jbhateja at openjdk.org Thu Oct 16 07:47:24 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 16 Oct 2025 07:47:24 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v4] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. > > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. 
> > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 - Fix jtreg, one less spill - Updating as per reivew suggestions - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 - Some refactoring - 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions ------------- Changes: https://git.openjdk.org/jdk/pull/26283/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=03 Stats: 90 lines in 3 files changed: 70 ins; 8 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From chagedorn at openjdk.org Thu Oct 16 07:53:08 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 16 Oct 2025 07:53:08 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v4] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 12:47:36 GMT, Damon Fenacci wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> review Emanuel > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java line 337: > >> 335: Asserts.assertTrue(stdErr.contains("Scenario flags: [-XX:-UseNewCode, -XX:-UseNewCode2]")); >> 336: Asserts.assertTrue(stdErr.contains("Scenario flags: [-XX:+UseNewCode, -XX:-UseNewCode2]")); >> 337: Asserts.assertTrue(stdErr.contains("Scenario flags: [-XX:-UseNewCode, -XX:+UseNewCode2]")); > > This might be partially redundant with the full stop in the first assert above but maybe it would be worth checking that we don't have any additional "Scenario flags:..." string. Good idea, I added a scenario count. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27672#discussion_r2434925355 From chagedorn at openjdk.org Thu Oct 16 07:58:03 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 16 Oct 2025 07:58:03 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v5] In-Reply-To: References: Message-ID: > The test ` testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java` intermittently timed out in our CI. On my local machine, I measured ~80s which is quite long given that we actually only want to test that Cartesian product for scenarios work. 
> > #### Reduce Execution Time by not Executing the Scenarios > I had a closer look at the test to try to cut the execution time down. Currently, we are executing many IR framework runs with different Cartesian products for scenarios. Afterwards, we compare the output of IR matching to check if the computation of the Cartesian products were correct. However, since we are only interested in verifying that the Cartesian product computation for scenarios works (i.e. `addCrossProductScenarios()`), we could skip the actual execution of the scenarios itself - we can trust that the IR framework is already tested well enough. > > To achieve that, we can use reflection to get the added scenarios to the IR framework (I don't want to add a public accessor because a user should not need access to them) and then fetch the corresponding scenario flags and compare against our expectation. That's what I propose with this change. > > #### Changes > - Verification without actually running scenarios. > - Added a test passing 3 sets to `addCrossProductScenarios()` which was missing before. > - Improved `addScenarios()` where we added a scenario to the list even though it already existed. That normally does not matter because we are throwing a `TestFormatException` anyway afterwards. But it messes with the test: We are adding the duplicated scenario and then read it again in the verification part of the test. > - Refactored the test a little more. > - Refactored some small things in `addCrossProductScenarios()` while looking at it. > - Added a sentence about passing a single set to `addCrossProductScenarios()` which was not evidently clear what is happening when looking at the method comment. > > #### Execution Time Comparison > Measured on my local machine: > - Mainline: ~80s > - With patch: ~2-3s > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: add scenario count ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27672/files - new: https://git.openjdk.org/jdk/pull/27672/files/b6d18b59..2763bcb9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27672&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27672&range=03-04 Stats: 13 lines in 1 file changed: 13 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27672.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27672/head:pull/27672 PR: https://git.openjdk.org/jdk/pull/27672 From dlunden at openjdk.org Thu Oct 16 08:10:52 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 16 Oct 2025 08:10:52 GMT Subject: RFR: 8369573: Add missing compile commands help documentation for the signature part of method patterns [v2] In-Reply-To: References: Message-ID: > The documentation of the signature part of method patterns in `java -XX:CompileCommand=help` is missing. We should add it. > > ### Changeset > - Improve the documentation of signatures in `java -XX:CompileCommand=help`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500483034) > - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. 
Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/compiler/compilerOracle.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27818/files - new: https://git.openjdk.org/jdk/pull/27818/files/687bbfbb..b9ade2d4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27818&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27818&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27818.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27818/head:pull/27818 PR: https://git.openjdk.org/jdk/pull/27818 From dlunden at openjdk.org Thu Oct 16 08:19:41 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 16 Oct 2025 08:19:41 GMT Subject: RFR: 8369573: Add missing compile commands help documentation for the signature part of method patterns In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 07:34:46 GMT, Roberto Casta?eda Lozano wrote: > 2) in the context of Java, a "signature" tends to refer to the parameter types only, excluding return type Additionally, the method signature also includes the method name itself (unlike the method descriptor). From the Java 25 spec: > 8.4.2 Method Signature > Two methods or constructors, M and N, have the same signature if they have the > same **name**, the same type parameters (if any) (?8.4.4), and, after adapting the > formal parameter types of N to the type parameters of M, the same formal parameter > types. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27818#issuecomment-3409734943 From rcastanedalo at openjdk.org Thu Oct 16 08:26:41 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 16 Oct 2025 08:26:41 GMT Subject: RFR: 8369573: Add missing compile commands help documentation for the signature part of method patterns In-Reply-To: References: Message-ID: <5jCNRiwxsr3kB6qEHfuQ45VAwZAs8EZtt4dr0FagCn8=.87f123ca-1a1f-4ee2-9c81-74c14f97fc2a@github.com> On Thu, 16 Oct 2025 08:17:05 GMT, Daniel Lund?n wrote: > > 2. in the context of Java, a "signature" tends to refer to the parameter types only, excluding return type > > Additionally, the method signature also includes the method name itself (unlike the method descriptor). From the Java 25 spec: > > > 8.4.2 Method Signature > > Two methods or constructors, M and N, have the same signature if they have the > > same **name**, the same type parameters (if any) (?8.4.4), and, after adapting the > > formal parameter types of N to the type parameters of M, the same formal parameter > > types. This also raises the question about the usefulness of matching method return types, but that's out of the scope of this RFE. 
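For readers following the signature-versus-descriptor distinction above: the JVM-level descriptor encodes only the parameter and return types, while the Java-language signature (per the quoted 8.4.2) carries the method name and type parameters but no return type. A small self-contained Java sketch of the descriptor form discussed in this thread; the class and method names are invented for illustration:

    import java.lang.invoke.MethodType;

    public class DescriptorExample {
        // Java-language signature: demo(String, long)
        static int demo(String s, long from) { return s.length() + (int) from; }

        public static void main(String[] args) {
            // Descriptor for a (String, long) -> int method: "(Ljava/lang/String;J)I"
            MethodType mt = MethodType.methodType(int.class, String.class, long.class);
            System.out.println(mt.toMethodDescriptorString());
        }
    }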
------------- PR Comment: https://git.openjdk.org/jdk/pull/27818#issuecomment-3409756281 From dlunden at openjdk.org Thu Oct 16 08:37:12 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 16 Oct 2025 08:37:12 GMT Subject: RFR: 8369573: Add missing compile commands help documentation for the signature part of method patterns In-Reply-To: <5jCNRiwxsr3kB6qEHfuQ45VAwZAs8EZtt4dr0FagCn8=.87f123ca-1a1f-4ee2-9c81-74c14f97fc2a@github.com> References: <5jCNRiwxsr3kB6qEHfuQ45VAwZAs8EZtt4dr0FagCn8=.87f123ca-1a1f-4ee2-9c81-74c14f97fc2a@github.com> Message-ID: On Thu, 16 Oct 2025 08:23:10 GMT, Roberto Casta?eda Lozano wrote: > This also raises the question about the usefulness of matching method return types, but that's out of the scope of this RFE. Right, that's a good point: in Java you cannot have methods with the same name and parameter types, but different return value types. I don't know if that property also translates to the JVM level. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27818#issuecomment-3409797303 From mchevalier at openjdk.org Thu Oct 16 09:28:49 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 16 Oct 2025 09:28:49 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v4] In-Reply-To: References: Message-ID: > Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. > > When we have something like > > // any loop > while (...) { /* something involving limit */ } > // counted loop with zero trip guard > if (i < limit) { > for (int i = init; i < limit; i++) { ... } > } > > and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. > > But the method `PhaseIdealLoop::do_unroll` has the assert > > https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 > > requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. > > This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. > > But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? > > By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. 
Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. > > We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. > > We could make the assert a lot smarter and test for ... Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: driver -> main ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27586/files - new: https://git.openjdk.org/jdk/pull/27586/files/82a92172..a47f6f70 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=02-03 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27586/head:pull/27586 PR: https://git.openjdk.org/jdk/pull/27586 From mdoerr at openjdk.org Thu Oct 16 09:41:17 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 16 Oct 2025 09:41:17 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 12:38:27 GMT, Martin Doerr wrote: >> The test `VectorMaskCompareNotTest` requires 16 Byte vectors (or larger). If the machine only uses 8 Byte vectors, we get an exception in the static initializer because the code tries to use a 4 Byte vector which is unsupported (stack trace: see JBS issue [JDK-8369511](https://bugs.openjdk.org/browse/JDK-8369511)). This is an alternative to https://github.com/openjdk/jdk/pull/27749. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add check if flag is available. Thank you for all the reviews and for testing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27805#issuecomment-3410036844 From roland at openjdk.org Thu Oct 16 09:42:54 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 16 Oct 2025 09:42:54 GMT Subject: RFR: 8369435: C2: transform (LShiftX (SubX con0 a), con1) into (SubX con0< We already transform: (LShiftX (AddX a con0), con1) into (AddX (LShiftX a con1) con0< References: Message-ID: On Tue, 14 Oct 2025 18:21:45 GMT, Martin Doerr wrote: > The test `VectorMaskCompareNotTest` requires 16 Byte vectors (or larger). If the machine only uses 8 Byte vectors, we get an exception in the static initializer because the code tries to use a 4 Byte vector which is unsupported (stack trace: see JBS issue [JDK-8369511](https://bugs.openjdk.org/browse/JDK-8369511)). This is an alternative to https://github.com/openjdk/jdk/pull/27749. This pull request has now been integrated. 
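On the MaxVectorSize=8 crash discussed above: the failure mode is a test unconditionally building a vector species wider than the platform supports. A minimal sketch of the kind of size guard that avoids it, assuming plain Vector API usage (needs --add-modules jdk.incubator.vector); the actual fix in the PR instead adds a check that the flag is available:

    import jdk.incubator.vector.IntVector;
    import jdk.incubator.vector.VectorSpecies;

    public class SpeciesGuard {
        public static void main(String[] args) {
            VectorSpecies<Integer> preferred = IntVector.SPECIES_PREFERRED;
            System.out.println("preferred vector bits: " + preferred.vectorBitSize());
            if (preferred.vectorBitSize() < 128) {
                // On an 8-byte-vector machine, skip the part that needs SPECIES_128.
                System.out.println("skipping 128-bit section");
                return;
            }
            VectorSpecies<Integer> s128 = IntVector.SPECIES_128;
            int[] a = new int[s128.length()];
            System.out.println(IntVector.fromArray(s128, a, 0));
        }
    }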
Changeset: 6e911d81 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/6e911d819efa0f14ab1f9009b5bf325d99edb26c Stats: 22 lines in 2 files changed: 21 ins; 0 del; 1 mod 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 Reviewed-by: dzhang, epeter, rrich ------------- PR: https://git.openjdk.org/jdk/pull/27805 From duke at openjdk.org Thu Oct 16 10:29:49 2025 From: duke at openjdk.org (erifan) Date: Thu, 16 Oct 2025 10:29:49 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: <1XWk0HdpU9jS-XXku5vjDepQUcp-q_YaBpGWIrHYYkg=.6043859f-7560-40ca-bd02-77b598afc8e6@github.com> On Thu, 16 Oct 2025 09:38:49 GMT, Martin Doerr wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Add check if flag is available. > > Thank you for all the reviews and for testing! @TheRealMDoerr Please consider whether my comment is reasonable. Also, @eme64 tested your commit 1, but then you pushed commit 2. So commit 1 is no longer the latest code. Although I think commit 2 will likely pass the test as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27805#issuecomment-3410206619 From shade at openjdk.org Thu Oct 16 11:54:05 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 16 Oct 2025 11:54:05 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 08:30:17 GMT, Andrew Haley wrote: >>> > Fix JDK-8369506 by adding `STLR` when updating the bytecode. Additionally I added a quick debug only check which verifies the field offset we get from `ResolvedFieldEntry` in `TemplateTable::fast_*` will not clobber the header or Klass pointer. The added `STLR` guarantees that the fully filled out `ResolvedFieldEntry` is observable if the patched bytecode is observable. We do not need to add `LDAR` for bytecode loading or `LDAR` in `TemplateTable::fast_*` for that reason. >>> >>> How does this follow? We need some sort of happens-before relationship on the reader side to make sure that the resolved field entry is observed. I guess this PR relies on a control dependency between reading the patched bytecode and executing the code that reads the resolved field entry. >> >> Yes, that is what the PR relies upon. However we are still discussing internally on whether that is enough as we would rather not have a repeat N years down the line as hardware advances. I left this in draft until we figure it out and will poke you once we are more confident. >> >> The AArch64 docs are not super clear. It does have this sentence: `A store-release guarantees that all earlier memory accesses are visible before the store-release becomes visible and that the store is visible to all parts of the system capable of storing cached data at the same time.` Which to me, a long with other terminology, seems to imply the two STLRs are enough. But I wouldn't bet money on it. > >> But I wouldn't bet money on it. > > B2.3.6, _Dependency relations, Control dependency_, gives you what you need on the reader side. @theRealAph -- oddly, I have GH notifications about some of your recent suggestions in comments, but I do not see them in PR itself. Maybe you need to "publish" the review? 
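The STLR and control-dependency reasoning above concerns AArch64 interpreter code; Java code cannot rely on a control dependency, but the underlying publish-then-observe pattern can be sketched with VarHandles, using an acquire load as the conservative reader-side analogue. Purely illustrative, with made-up names, not the mechanism used in the PR:

    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.VarHandle;

    // Sketch: the writer fully fills in an entry, then publishes a flag with a
    // release store; a reader that observes the flag with an acquire load is
    // guaranteed to also observe the initialized entry fields.
    public class PublishSketch {
        static final class Entry { int offset; int flags; }

        private Entry entry;        // written before the publishing store
        private int state;          // 0 = not rewritten, 1 = rewritten
        private static final VarHandle STATE;
        static {
            try {
                STATE = MethodHandles.lookup()
                        .findVarHandle(PublishSketch.class, "state", int.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        void resolveAndPublish() {
            Entry e = new Entry();
            e.offset = 42;               // fill in the entry first
            e.flags = 1;
            entry = e;
            STATE.setRelease(this, 1);   // release store, like the STLR on the bytecode
        }

        int readerFastPath() {
            if ((int) STATE.getAcquire(this) == 1) {  // conservative acquire pairing
                return entry.offset;                  // entry is fully visible here
            }
            return -1;                                // fall back to slow-path resolution
        }

        public static void main(String[] args) {
            PublishSketch p = new PublishSketch();
            p.resolveAndPublish();
            System.out.println(p.readerFastPath());
        }
    }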
------------- PR Comment: https://git.openjdk.org/jdk/pull/27748#issuecomment-3410497658 From qamai at openjdk.org Thu Oct 16 12:14:44 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 16 Oct 2025 12:14:44 GMT Subject: RFR: 8369435: C2: transform (LShiftX (SubX con0 a), con1) into (SubX con0< References: <9bFkcaxVVCBhXOpokcS1lMtBDhCyf4eCeWc6wN7Jkpo=.78d2d6b6-4f7e-4361-a361-bab524c35fb2@github.com> Message-ID: On Thu, 16 Oct 2025 09:36:03 GMT, Roland Westrelin wrote: > We already transform: > > (LShiftX (AddX a con0), con1) into (AddX (LShiftX a con1) con0< > THis is a variant with SubX. I found that this helps RCE. LGTM src/hotspot/share/opto/mulnode.cpp line 1096: > 1094: // Left input is a sub from a constant? > 1095: const TypeInteger* t11 = phase->type(add1->in(1))->isa_integer(bt); > 1096: if (t11 && t11->is_con()) { `t11 != nullptr` src/hotspot/share/opto/mulnode.cpp line 1098: > 1096: if (t11 && t11->is_con()) { > 1097: // Compute X << con0 > 1098: Node *lsh = phase->transform(LShiftNode::make(add1->in(2), in(2), bt)); `Node* lsh` ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/27842#pullrequestreview-3344504227 PR Review Comment: https://git.openjdk.org/jdk/pull/27842#discussion_r2435668987 PR Review Comment: https://git.openjdk.org/jdk/pull/27842#discussion_r2435668174 From dlunden at openjdk.org Thu Oct 16 12:29:30 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 16 Oct 2025 12:29:30 GMT Subject: RFR: 8369573: Add missing compile commands help documentation for the signature part of method patterns [v3] In-Reply-To: References: Message-ID: > The documentation of the signature part of method patterns in `java -XX:CompileCommand=help` is missing. We should add it. > > ### Changeset > - Improve the documentation of signatures in `java -XX:CompileCommand=help`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500483034) > - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. 
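Returning to the JDK-8369435 change reviewed earlier in this digest: the rewrite relies on left shift distributing over subtraction in wrap-around integer arithmetic, that is (c0 - a) << s == (c0 << s) - (a << s). A quick stand-alone Java check of that identity, a sketch unrelated to the C2 sources themselves:

    import java.util.Random;

    // Samples random inputs and checks (c0 - a) << s == (c0 << s) - (a << s),
    // which is what lets C2 pull the constant out of the shift.
    public class ShiftSubIdentity {
        public static void main(String[] args) {
            Random r = new Random(42);
            for (int i = 0; i < 1_000_000; i++) {
                int c0 = r.nextInt();
                int a = r.nextInt();
                int s = r.nextInt(32);
                int lhs = (c0 - a) << s;
                int rhs = (c0 << s) - (a << s);
                if (lhs != rhs) {
                    throw new AssertionError(c0 + " " + a + " " + s);
                }
            }
            System.out.println("identity holds on sampled inputs");
        }
    }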
Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: Change from signature to descriptor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27818/files - new: https://git.openjdk.org/jdk/pull/27818/files/b9ade2d4..9a920ae3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27818&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27818&range=01-02 Stats: 6 lines in 1 file changed: 0 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/27818.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27818/head:pull/27818 PR: https://git.openjdk.org/jdk/pull/27818 From dlunden at openjdk.org Thu Oct 16 12:29:30 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 16 Oct 2025 12:29:30 GMT Subject: RFR: 8369573: Add missing compile commands help documentation for the signature part of method patterns In-Reply-To: <5jCNRiwxsr3kB6qEHfuQ45VAwZAs8EZtt4dr0FagCn8=.87f123ca-1a1f-4ee2-9c81-74c14f97fc2a@github.com> References: <5jCNRiwxsr3kB6qEHfuQ45VAwZAs8EZtt4dr0FagCn8=.87f123ca-1a1f-4ee2-9c81-74c14f97fc2a@github.com> Message-ID: <-EGkvgL92q1I7THjuR4QSCCnm1FZC5yl9OtbpIHZg50=.e4b8006c-c7b6-441b-907c-3094c9306b63@github.com> On Thu, 16 Oct 2025 08:23:10 GMT, Roberto Casta?eda Lozano wrote: >>> 2) in the context of Java, a "signature" tends to refer to the parameter types only, excluding return type >> >> Additionally, the method signature also includes the method name itself (unlike the method descriptor). From the Java 25 spec: >> >>> 8.4.2 Method Signature >>> Two methods or constructors, M and N, have the same signature if they have the >>> same **name**, the same type parameters (if any) (?8.4.4), and, after adapting the >>> formal parameter types of N to the type parameters of M, the same formal parameter >>> types. > >> > 2. in the context of Java, a "signature" tends to refer to the parameter types only, excluding return type >> >> Additionally, the method signature also includes the method name itself (unlike the method descriptor). From the Java 25 spec: >> >> > 8.4.2 Method Signature >> > Two methods or constructors, M and N, have the same signature if they have the >> > same **name**, the same type parameters (if any) (?8.4.4), and, after adapting the >> > formal parameter types of N to the type parameters of M, the same formal parameter >> > types. > > This also raises the question about the usefulness of matching method return types, but that's out of the scope of this RFE. @robcasloz: Now updated from "signature" to "descriptor" (instead of "method descriptor", for brevity). ------------- PR Comment: https://git.openjdk.org/jdk/pull/27818#issuecomment-3410660415 From rcastanedalo at openjdk.org Thu Oct 16 12:41:14 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 16 Oct 2025 12:41:14 GMT Subject: RFR: 8369573: Add missing compile commands help documentation for the signature part of method patterns [v3] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 12:29:30 GMT, Daniel Lund?n wrote: >> The documentation of the signature part of method patterns in `java -XX:CompileCommand=help` is missing. We should add it. >> >> ### Changeset >> - Improve the documentation of signatures in `java -XX:CompileCommand=help`. 
>> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500483034) >> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Change from signature to descriptor Looks good, thanks! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27818#pullrequestreview-3344637463 From mchevalier at openjdk.org Thu Oct 16 12:48:24 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 16 Oct 2025 12:48:24 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v4] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 09:28:49 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. >> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... 
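A runnable Java sketch of the loop shape described above, an arbitrary first loop that uses `limit` followed by a zero-trip-guarded counted loop on the same `limit`. It only illustrates the shape; it is not the reproducer, which additionally needs stress peeling:

    // After the first loop is peeled, the `limit` seen by the zero trip guard and
    // the `limit` seen by the counted loop exit test can end up as two different
    // Phi nodes, which is what the assert complained about.
    public class PeelShape {
        static int run(int init, int limit, int[] data) {
            int i = 0;
            while (i < data.length && data[i] < limit) {   // some loop involving limit
                i++;
            }
            int sum = 0;
            if (init < limit) {                            // zero trip guard
                for (int j = init; j < limit; j++) {       // counted loop
                    sum += j;
                }
            }
            return sum + i;
        }

        public static void main(String[] args) {
            int[] data = new int[100];
            for (int k = 0; k < data.length; k++) data[k] = k;
            int total = 0;
            for (int rep = 0; rep < 10_000; rep++) {
                total += run(3, 50, data);
            }
            System.out.println(total);
        }
    }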
> > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > driver -> main It'd be nice to have such a test, but a lot of other transformations are happening successfully before, including some unrolling. I don't think it is possible, or at least not so directly, to write such a test with the current capabilities of the IR framework. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3410733313 From roland at openjdk.org Thu Oct 16 13:18:01 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 16 Oct 2025 13:18:01 GMT Subject: RFR: 8369435: C2: transform (LShiftX (SubX con0 a), con1) into (SubX con0< References: <9bFkcaxVVCBhXOpokcS1lMtBDhCyf4eCeWc6wN7Jkpo=.78d2d6b6-4f7e-4361-a361-bab524c35fb2@github.com> Message-ID: > We already transform: > > (LShiftX (AddX a con0), con1) into (AddX (LShiftX a con1) con0< > THis is a variant with SubX. I found that this helps RCE. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27842/files - new: https://git.openjdk.org/jdk/pull/27842/files/3ab4b313..8b581911 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27842&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27842&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27842.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27842/head:pull/27842 PR: https://git.openjdk.org/jdk/pull/27842 From roland at openjdk.org Thu Oct 16 13:18:03 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 16 Oct 2025 13:18:03 GMT Subject: RFR: 8369435: C2: transform (LShiftX (SubX con0 a), con1) into (SubX con0< References: <9bFkcaxVVCBhXOpokcS1lMtBDhCyf4eCeWc6wN7Jkpo=.78d2d6b6-4f7e-4361-a361-bab524c35fb2@github.com> Message-ID: On Thu, 16 Oct 2025 12:10:59 GMT, Quan Anh Mai wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/mulnode.cpp line 1096: > >> 1094: // Left input is a sub from a constant? >> 1095: const TypeInteger* t11 = phase->type(add1->in(1))->isa_integer(bt); >> 1096: if (t11 && t11->is_con()) { > > `t11 != nullptr` Done in new commit. > src/hotspot/share/opto/mulnode.cpp line 1098: > >> 1096: if (t11 && t11->is_con()) { >> 1097: // Compute X << con0 >> 1098: Node *lsh = phase->transform(LShiftNode::make(add1->in(2), in(2), bt)); > > `Node* lsh` Done in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27842#discussion_r2435868061 PR Review Comment: https://git.openjdk.org/jdk/pull/27842#discussion_r2435867489 From chagedorn at openjdk.org Thu Oct 16 14:12:49 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 16 Oct 2025 14:12:49 GMT Subject: RFR: 8369804: TestGenerators.java fails with IllegalArgumentException: bound must be greater than origin [v3] In-Reply-To: References: Message-ID: <3SmKyPRg0Vnfa_EQCozB48hnKU0NPKY6IAV3uyhcwMg=.795576cf-5524-4fca-8386-572b6711053d@github.com> On Wed, 15 Oct 2025 13:06:41 GMT, Emanuel Peter wrote: >> We sample two floats, assuming we would get two different results. But ever so rarely, we get the same values, and the test fails. >> >> So now, I sample with retry. And also improve the error reporting, throwing an exception on generator construction rather than sampling from the generator. 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27816#pullrequestreview-3345090627 From epeter at openjdk.org Thu Oct 16 14:13:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 16 Oct 2025 14:13:03 GMT Subject: Integrated: 8369912: [TESTBUG] testlibrary_tests/template_framework/examples/TestExpressions.java fails with ArithmeticException: / by zero - forgot to respect Expression.info In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 14:34:52 GMT, Emanuel Peter wrote: > The test generates a test for each operator, like this: > > public static int primitiveConTest_185_compiled() { > return (989451435 % 0); > } > > However, some operators throw exceptions, just like here the `%`, when given a zero rhs argument. The expression already knows about that, we just need to generate try-catch statements in the code. > > Similarly, some operators do not always return deterministic results (different Nan, or precision). So we need to handle that too. > > Note: we already do all of that in the `test/hotspot/jtreg/compiler/igvn/ExpressionFuzzer.java`. This pull request has now been integrated. Changeset: 5dfe115c Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/5dfe115ce1fbcff67777518a3c23a7560ebec423 Stats: 30 lines in 1 file changed: 23 ins; 0 del; 7 mod 8369912: [TESTBUG] testlibrary_tests/template_framework/examples/TestExpressions.java fails with ArithmeticException: / by zero - forgot to respect Expression.info Reviewed-by: kvn, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/27824 From chagedorn at openjdk.org Thu Oct 16 14:15:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 16 Oct 2025 14:15:06 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v4] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 09:28:49 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. >> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. 
While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > driver -> main Marked as reviewed by chagedorn (Reviewer). That's unfortunate but I guess there is not much we can do at the moment. So, this looks good! ------------- PR Review: https://git.openjdk.org/jdk/pull/27586#pullrequestreview-3345104410 PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3411103455 From dfenacci at openjdk.org Thu Oct 16 14:16:11 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 16 Oct 2025 14:16:11 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v5] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 07:58:03 GMT, Christian Hagedorn wrote: >> The test ` testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java` intermittently timed out in our CI. On my local machine, I measured ~80s which is quite long given that we actually only want to test that Cartesian product for scenarios work. >> >> #### Reduce Execution Time by not Executing the Scenarios >> I had a closer look at the test to try to cut the execution time down. Currently, we are executing many IR framework runs with different Cartesian products for scenarios. Afterwards, we compare the output of IR matching to check if the computation of the Cartesian products were correct. However, since we are only interested in verifying that the Cartesian product computation for scenarios works (i.e. `addCrossProductScenarios()`), we could skip the actual execution of the scenarios itself - we can trust that the IR framework is already tested well enough. >> >> To achieve that, we can use reflection to get the added scenarios to the IR framework (I don't want to add a public accessor because a user should not need access to them) and then fetch the corresponding scenario flags and compare against our expectation. That's what I propose with this change. >> >> #### Changes >> - Verification without actually running scenarios. >> - Added a test passing 3 sets to `addCrossProductScenarios()` which was missing before. 
>> - Improved `addScenarios()` where we added a scenario to the list even though it already existed. That normally does not matter because we are throwing a `TestFormatException` anyway afterwards. But it messes with the test: We are adding the duplicated scenario and then read it again in the verification part of the test. >> - Refactored the test a little more. >> - Refactored some small things in `addCrossProductScenarios()` while looking at it. >> - Added a sentence about passing a single set to `addCrossProductScenarios()` which was not evidently clear what is happening when looking at the method comment. >> >> #### Execution Time Comparison >> Measured on my local machine: >> - Mainline: ~80s >> - With patch: ~2-3s >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > add scenario count LGTM. Thanks for adding the count @chhagedorn! ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/27672#pullrequestreview-3345114695 From epeter at openjdk.org Thu Oct 16 14:13:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 16 Oct 2025 14:13:02 GMT Subject: RFR: 8369912: [TESTBUG] testlibrary_tests/template_framework/examples/TestExpressions.java fails with ArithmeticException: / by zero - forgot to respect Expression.info In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 06:58:40 GMT, Manuel H?ssig wrote: >> The test generates a test for each operator, like this: >> >> public static int primitiveConTest_185_compiled() { >> return (989451435 % 0); >> } >> >> However, some operators throw exceptions, just like here the `%`, when given a zero rhs argument. The expression already knows about that, we just need to generate try-catch statements in the code. >> >> Similarly, some operators do not always return deterministic results (different Nan, or precision). So we need to handle that too. >> >> Note: we already do all of that in the `test/hotspot/jtreg/compiler/igvn/ExpressionFuzzer.java`. > > Thank you for fixing this, @eme64. Looks good to me. @mhaessig @vnkozlov Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27824#issuecomment-3411082237 From epeter at openjdk.org Thu Oct 16 14:24:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 16 Oct 2025 14:24:40 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v5] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 07:58:03 GMT, Christian Hagedorn wrote: >> The test ` testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java` intermittently timed out in our CI. On my local machine, I measured ~80s which is quite long given that we actually only want to test that Cartesian product for scenarios work. >> >> #### Reduce Execution Time by not Executing the Scenarios >> I had a closer look at the test to try to cut the execution time down. Currently, we are executing many IR framework runs with different Cartesian products for scenarios. Afterwards, we compare the output of IR matching to check if the computation of the Cartesian products were correct. However, since we are only interested in verifying that the Cartesian product computation for scenarios works (i.e. `addCrossProductScenarios()`), we could skip the actual execution of the scenarios itself - we can trust that the IR framework is already tested well enough. 
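As a point of reference for the verification described above, the expected Cartesian product can be written down independently of the IR framework. A plain-Java sketch with an invented method name; the flag strings are the ones from the scenario output quoted earlier in this digest:

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of the cross-product computation being verified: one flag is picked
    // from each set, and every combination becomes one scenario flag list.
    public class CrossProductSketch {
        static List<List<String>> crossProduct(List<List<String>> sets) {
            List<List<String>> result = new ArrayList<>();
            result.add(new ArrayList<>());
            for (List<String> set : sets) {
                List<List<String>> next = new ArrayList<>();
                for (List<String> prefix : result) {
                    for (String flag : set) {
                        List<String> combo = new ArrayList<>(prefix);
                        combo.add(flag);
                        next.add(combo);
                    }
                }
                result = next;
            }
            return result;
        }

        public static void main(String[] args) {
            List<List<String>> sets = List.of(
                    List.of("-XX:-UseNewCode", "-XX:+UseNewCode"),
                    List.of("-XX:-UseNewCode2", "-XX:+UseNewCode2"));
            // Prints the four expected scenario flag combinations.
            crossProduct(sets).forEach(System.out::println);
        }
    }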
>> >> To achieve that, we can use reflection to get the added scenarios to the IR framework (I don't want to add a public accessor because a user should not need access to them) and then fetch the corresponding scenario flags and compare against our expectation. That's what I propose with this change. >> >> #### Changes >> - Verification without actually running scenarios. >> - Added a test passing 3 sets to `addCrossProductScenarios()` which was missing before. >> - Improved `addScenarios()` where we added a scenario to the list even though it already existed. That normally does not matter because we are throwing a `TestFormatException` anyway afterwards. But it messes with the test: We are adding the duplicated scenario and then read it again in the verification part of the test. >> - Refactored the test a little more. >> - Refactored some small things in `addCrossProductScenarios()` while looking at it. >> - Added a sentence about passing a single set to `addCrossProductScenarios()` which was not evidently clear what is happening when looking at the method comment. >> >> #### Execution Time Comparison >> Measured on my local machine: >> - Mainline: ~80s >> - With patch: ~2-3s >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > add scenario count LGTM ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27672#pullrequestreview-3345170335 From epeter at openjdk.org Thu Oct 16 14:24:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 16 Oct 2025 14:24:51 GMT Subject: RFR: 8369804: TestGenerators.java fails with IllegalArgumentException: bound must be greater than origin [v3] In-Reply-To: <3SmKyPRg0Vnfa_EQCozB48hnKU0NPKY6IAV3uyhcwMg=.795576cf-5524-4fca-8386-572b6711053d@github.com> References: <3SmKyPRg0Vnfa_EQCozB48hnKU0NPKY6IAV3uyhcwMg=.795576cf-5524-4fca-8386-572b6711053d@github.com> Message-ID: On Thu, 16 Oct 2025 14:10:20 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > Marked as reviewed by chagedorn (Reviewer). @chhagedorn @TobiHartmann Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27816#issuecomment-3411152997 From epeter at openjdk.org Thu Oct 16 14:24:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 16 Oct 2025 14:24:53 GMT Subject: Integrated: 8369804: TestGenerators.java fails with IllegalArgumentException: bound must be greater than origin In-Reply-To: References: Message-ID: <0nYVh577KMOgrRI2gSu_WsWnVroFrMs_c7geSqByMTM=.f9ef2d01-d436-4eed-b3ff-f1c50a0fef2e@github.com> On Wed, 15 Oct 2025 08:07:51 GMT, Emanuel Peter wrote: > We sample two floats, assuming we would get two different results. But ever so rarely, we get the same values, and the test fails. > > So now, I sample with retry. And also improve the error reporting, throwing an exception on generator construction rather than sampling from the generator. This pull request has now been integrated. 
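The sample-with-retry fix just described can be sketched without the Generators library: resample until the second value differs from the first, with a bounded number of attempts so a degenerate generator still fails loudly. Illustrative only; the real test uses the test library's generators:

    import java.util.Random;
    import java.util.function.Supplier;

    // Sketch: sample two distinct values, retrying a bounded number of times.
    public class DistinctSampleSketch {
        static float[] sampleTwoDistinct(Supplier<Float> gen, int maxAttempts) {
            float first = gen.get();
            for (int i = 0; i < maxAttempts; i++) {
                float second = gen.get();
                if (second != first) {
                    return new float[] {first, second};
                }
            }
            throw new IllegalStateException("could not sample two distinct floats");
        }

        public static void main(String[] args) {
            Random r = new Random();
            // A deliberately narrow generator: collisions are possible but rare.
            Supplier<Float> gen = () -> (float) r.nextInt(4);
            float[] pair = sampleTwoDistinct(gen, 100);
            System.out.println(pair[0] + " != " + pair[1]);
        }
    }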
Changeset: f2a99832 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/f2a998326a6bebd4a7d2d0a39f785b2e6dac68c4 Stats: 22 lines in 3 files changed: 16 ins; 0 del; 6 mod 8369804: TestGenerators.java fails with IllegalArgumentException: bound must be greater than origin Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/27816 From fandreuzzi at openjdk.org Thu Oct 16 14:39:18 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Thu, 16 Oct 2025 14:39:18 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v4] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 02:36:11 GMT, Dean Long wrote: > We could instead allow them to be cleaned up like regular nmethods. That sounds reasonable to me, native methods seem to be tracked like all other nmethods. Removing `is_native_method()` altogether from the condition was the first implementation I had, and as far as I remember there was no failure in tier1 or tier2. Should I propose this alternative implementation as part of this PR? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27742#discussion_r2436249273 From dlunden at openjdk.org Thu Oct 16 15:05:20 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 16 Oct 2025 15:05:20 GMT Subject: RFR: 8369573: Add missing compile commands help documentation for the signature part of method patterns In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 08:48:36 GMT, Anton Seoane Ampudia wrote: >> The documentation of the signature part of method patterns in `java -XX:CompileCommand=help` is missing. We should add it. >> >> ### Changeset >> - Improve the documentation of signatures in `java -XX:CompileCommand=help`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500483034) >> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > > Nice documentation! LGTM > > As a side note, looking at your line splits I got curious whether we are actively enforcing an 80-char limit in some output or not (the last two "paragraphs" exceed this size, although they've been there from before) GHA is green after the change. Going ahead with integration. Thanks for the reviews @anton-seoane @robcasloz @TobiHartmann! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27818#issuecomment-3411343545 From dlunden at openjdk.org Thu Oct 16 15:05:22 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 16 Oct 2025 15:05:22 GMT Subject: Integrated: 8369573: Add missing compile commands help documentation for the signature part of method patterns In-Reply-To: References: Message-ID: <8EHhD4rJ9DhB-NlrEqAHetmH3PJHjEY_vPfUtprusHk=.6ab33d11-d1c7-400c-8b2f-70a4f030e0f7@github.com> On Wed, 15 Oct 2025 08:17:46 GMT, Daniel Lund?n wrote: > The documentation of the signature part of method patterns in `java -XX:CompileCommand=help` is missing. We should add it. > > ### Changeset > - Improve the documentation of signatures in `java -XX:CompileCommand=help`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500483034) > - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. This pull request has now been integrated. 
Changeset: 303eb109 Author: Daniel Lund?n URL: https://git.openjdk.org/jdk/commit/303eb1096ccaf06106aa080b9ea0553c0f6912dd Stats: 37 lines in 1 file changed: 29 ins; 3 del; 5 mod 8369573: Add missing compile commands help documentation for the signature part of method patterns Reviewed-by: rcastanedalo, aseoane, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/27818 From fjiang at openjdk.org Thu Oct 16 15:31:56 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 16 Oct 2025 15:31:56 GMT Subject: RFR: 8369947: Bytecode rewriting causes Java heap corruption on RISC-V Message-ID: As discussed in https://github.com/openjdk/jdk/pull/27748#pullrequestreview-3341840431, the same issue occurs with the RISC-V port. ------------- Commit messages: - Bytecode rewriting causes Java heap corruption on RISC-V Changes: https://git.openjdk.org/jdk/pull/27850/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27850&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369947 Stats: 22 lines in 3 files changed: 21 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27850.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27850/head:pull/27850 PR: https://git.openjdk.org/jdk/pull/27850 From chagedorn at openjdk.org Thu Oct 16 16:05:23 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 16 Oct 2025 16:05:23 GMT Subject: RFR: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out [v5] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 14:13:51 GMT, Damon Fenacci wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> add scenario count > > LGTM. Thanks for adding the count @chhagedorn! Thanks for your careful reviews @dafedafe and @eme64! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27672#issuecomment-3411582474 From chagedorn at openjdk.org Thu Oct 16 16:05:24 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 16 Oct 2025 16:05:24 GMT Subject: Integrated: 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 11:03:13 GMT, Christian Hagedorn wrote: > The test ` testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java` intermittently timed out in our CI. On my local machine, I measured ~80s which is quite long given that we actually only want to test that Cartesian product for scenarios work. > > #### Reduce Execution Time by not Executing the Scenarios > I had a closer look at the test to try to cut the execution time down. Currently, we are executing many IR framework runs with different Cartesian products for scenarios. Afterwards, we compare the output of IR matching to check if the computation of the Cartesian products were correct. However, since we are only interested in verifying that the Cartesian product computation for scenarios works (i.e. `addCrossProductScenarios()`), we could skip the actual execution of the scenarios itself - we can trust that the IR framework is already tested well enough. > > To achieve that, we can use reflection to get the added scenarios to the IR framework (I don't want to add a public accessor because a user should not need access to them) and then fetch the corresponding scenario flags and compare against our expectation. That's what I propose with this change. > > #### Changes > - Verification without actually running scenarios. 
> - Added a test passing 3 sets to `addCrossProductScenarios()` which was missing before. > - Improved `addScenarios()` where we added a scenario to the list even though it already existed. That normally does not matter because we are throwing a `TestFormatException` anyway afterwards. But it messes with the test: We are adding the duplicated scenario and then read it again in the verification part of the test. > - Refactored the test a little more. > - Refactored some small things in `addCrossProductScenarios()` while looking at it. > - Added a sentence about passing a single set to `addCrossProductScenarios()` which was not evidently clear what is happening when looking at the method comment. > > #### Execution Time Comparison > Measured on my local machine: > - Mainline: ~80s > - With patch: ~2-3s > > Thanks, > Christian This pull request has now been integrated. Changeset: e56db377 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/e56db37734aa7cbc0f20ba3fc469f51224f288fa Stats: 318 lines in 2 files changed: 219 ins; 20 del; 79 mod 8369232: testlibrary_tests/ir_framework/tests/TestScenariosCrossProduct.java timed out Reviewed-by: dfenacci, epeter ------------- PR: https://git.openjdk.org/jdk/pull/27672 From aph at openjdk.org Thu Oct 16 19:12:00 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 16 Oct 2025 19:12:00 GMT Subject: RFR: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 08:30:17 GMT, Andrew Haley wrote: >>> > Fix JDK-8369506 by adding `STLR` when updating the bytecode. Additionally I added a quick debug only check which verifies the field offset we get from `ResolvedFieldEntry` in `TemplateTable::fast_*` will not clobber the header or Klass pointer. The added `STLR` guarantees that the fully filled out `ResolvedFieldEntry` is observable if the patched bytecode is observable. We do not need to add `LDAR` for bytecode loading or `LDAR` in `TemplateTable::fast_*` for that reason. >>> >>> How does this follow? We need some sort of happens-before relationship on the reader side to make sure that the resolved field entry is observed. I guess this PR relies on a control dependency between reading the patched bytecode and executing the code that reads the resolved field entry. >> >> Yes, that is what the PR relies upon. However we are still discussing internally on whether that is enough as we would rather not have a repeat N years down the line as hardware advances. I left this in draft until we figure it out and will poke you once we are more confident. >> >> The AArch64 docs are not super clear. It does have this sentence: `A store-release guarantees that all earlier memory accesses are visible before the store-release becomes visible and that the store is visible to all parts of the system capable of storing cached data at the same time.` Which to me, a long with other terminology, seems to imply the two STLRs are enough. But I wouldn't bet money on it. > >> But I wouldn't bet money on it. > > B2.3.6, _Dependency relations, Control dependency_, gives you what you need on the reader side. > @theRealAph -- oddly, I have GH notifications about some of your recent suggestions in comments, but I do not see them in PR itself. Maybe you need to "publish" the review? No, I made some mistakes so deleted the comments. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27748#issuecomment-3412482106 From jcking at openjdk.org Thu Oct 16 20:01:50 2025 From: jcking at openjdk.org (Justin King) Date: Thu, 16 Oct 2025 20:01:50 GMT Subject: Integrated: 8369506: Bytecode rewriting causes Java heap corruption on AArch64 In-Reply-To: References: Message-ID: On Fri, 10 Oct 2025 16:21:17 GMT, Justin King wrote: > Fix JDK-8369506 by adding `STLR` when updating the bytecode. Additionally I added a quick debug only check which verifies the field offset we get from `ResolvedFieldEntry` in `TemplateTable::fast_*` will not clobber the header or Klass pointer. The added `STLR`, a long with the already existing `DMB ISHLD` in `InterpreterMacroAssembler::load_field_entry`, guarantees that the fully filled out `ResolvedFieldEntry` is observable if the patched bytecode is observable. We do not need to add `LDAR` for bytecode loading or `LDAR` in `TemplateTable::fast_*` for that reason. If another observer happens to observe a `0` field offset, its guaranteed then that they will also observe the non-patched bytecode which will ultimately end up doing the resolution again, which is okay. This pull request has now been integrated. Changeset: 18fd0477 Author: Justin King URL: https://git.openjdk.org/jdk/commit/18fd04770294e27011bd576b5ea5fe43fa03e5e3 Stats: 25 lines in 3 files changed: 22 ins; 0 del; 3 mod 8369506: Bytecode rewriting causes Java heap corruption on AArch64 Co-authored-by: Man Cao Co-authored-by: Chuck Rasbold Reviewed-by: shade, aph, manc ------------- PR: https://git.openjdk.org/jdk/pull/27748 From valeriep at openjdk.org Thu Oct 16 20:04:35 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Thu, 16 Oct 2025 20:04:35 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v7] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 05:14:44 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. 
>> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Updates for code review comments from @valeriepeng src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 1040: > 1038: * @param p [in] the plaintext to be encrypted. > 1039: * @param po [in] the plaintext offset in the array of bytes. > 1040: * @param c [out] the encrypted ciphertext output. nit: ciphertext already implied to be encrypted. Maybe no need for the "encrypted" adj. src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 1157: > 1155: ti3 = T0[a3 >>> 24] ^ T1[(a0 >> 16) & 0xFF] > 1156: ^ T2[(a1 >> 8) & 0xFF] ^ T3[a2 & 0xFF] ^ K[w + 7]; > 1157: w += 8; No need for w, since you already checked the `rounds` value, you can directly reference K inside this block, i.e. K[40] - K[47]. Same goes for the next block for AES-256, i.e. directly reference K[48]-K[55]. src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 1195: > 1193: ^ T3[(ti0 >> 16) & 0xFF] & 0xFF0000 > 1194: ^ T0[(ti1 >> 8) & 0xFF] & 0xFF00 > 1195: ^ T1[ti2 & 0xFF] & 0xFF ^ K[w+3]; Here you always use the last 4 elements of `K`, so you can just use `w = K.length - 4` and no need to keep tracking it in the earlier 2 blocks. src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 1220: > 1218: * @param c [in] the ciphertext to be decrypted. > 1219: * @param co [in] the ciphertext offset in the array of bytes. > 1220: * @param p [out] the decrypted plaintext output. nit: same comment for removing "decrypted" adj. 
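The `w = K.length - 4` suggestion above can be shown in isolation: the final AddRoundKey words are always the last four entries of the expanded key, regardless of 10, 12 or 14 rounds, so no running offset needs to be threaded through the earlier blocks. Schematic only, with invented names, not the AES_Crypt.java code:

    // Schematic of the indexing suggestion: the final-round key words are always
    // the last four entries of the expanded key.
    public class FinalRoundKeyIndex {
        static int[] finalRoundWords(int[] K) {
            int w = K.length - 4;                 // first of the last four words
            return new int[] {K[w], K[w + 1], K[w + 2], K[w + 3]};
        }

        public static void main(String[] args) {
            int[] k128 = new int[44];             // AES-128: 4 * (10 + 1) words
            int[] k256 = new int[60];             // AES-256: 4 * (14 + 1) words
            for (int i = 0; i < k128.length; i++) k128[i] = i;
            for (int i = 0; i < k256.length; i++) k256[i] = i;
            System.out.println(java.util.Arrays.toString(finalRoundWords(k128))); // [40, 41, 42, 43]
            System.out.println(java.util.Arrays.toString(finalRoundWords(k256))); // [56, 57, 58, 59]
        }
    }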
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2437316942 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2437308268 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2437313179 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2437325831 From duke at openjdk.org Thu Oct 16 20:04:37 2025 From: duke at openjdk.org (Shawn M Emery) Date: Thu, 16 Oct 2025 20:04:37 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v7] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 19:55:12 GMT, Valerie Peng wrote: >> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates for code review comments from @valeriepeng > > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 1157: > >> 1155: ti3 = T0[a3 >>> 24] ^ T1[(a0 >> 16) & 0xFF] >> 1156: ^ T2[(a1 >> 8) & 0xFF] ^ T3[a2 & 0xFF] ^ K[w + 7]; >> 1157: w += 8; > > No need for w, since you already checked the `rounds` value, you can directly reference K inside this block, i.e. K[40] - K[47]. Same goes for the next block for AES-256, i.e. directly reference K[48]-K[55]. I would still need w for lines 1180 - 1195 though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2437324027 From duke at openjdk.org Thu Oct 16 20:22:20 2025 From: duke at openjdk.org (Shawn M Emery) Date: Thu, 16 Oct 2025 20:22:20 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v8] In-Reply-To: References: Message-ID: > General: > ----------- > i) This work is to replace the existing AES cipher under the Cryptix license. > > ii) The lookup tables are employed for performance, but also for operating in constant time. > > iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. > > iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. > > Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. > > Correctness: > ----------------- > The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: > > i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass > > -intrinsics mode for: > > ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass > > iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures > > iv) jdk_security_infra: passed, with 48 known failures > > v) tier1 and tier2: all 110257 tests pass > > Security: > ----------- > In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. 
> > Performance: > ------------------ > All AES related benchmarks have been executed against the new and original Cryptix code: > > micro:org.openjdk.bench.javax.crypto.AES > > micro:org.openjdk.bench.javax.crypto.full.AESBench > > micro:org.openjdk.bench.javax.crypto.full.AESExtraBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream > > micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. > > micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) > > The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption results: > > i) Default (no JVM options, non-intrinsics) mode: > > a) Encryption: the new code performed better for b... Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: Updates for code review comments from @valeriepeng ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27807/files - new: https://git.openjdk.org/jdk/pull/27807/files/9f00c355..a5991a2f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=06-07 Stats: 42 lines in 1 file changed: 0 ins; 4 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/27807.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27807/head:pull/27807 PR: https://git.openjdk.org/jdk/pull/27807 From duke at openjdk.org Thu Oct 16 20:22:22 2025 From: duke at openjdk.org (Shawn M Emery) Date: Thu, 16 Oct 2025 20:22:22 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v7] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 19:58:40 GMT, Valerie Peng wrote: >> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates for code review comments from @valeriepeng > > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 1040: > >> 1038: * @param p [in] the plaintext to be encrypted. >> 1039: * @param po [in] the plaintext offset in the array of bytes. >> 1040: * @param c [out] the encrypted ciphertext output. > > nit: ciphertext already implied to be encrypted. Maybe no need for the "encrypted" adj. Agreed. Fixed. > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 1195: > >> 1193: ^ T3[(ti0 >> 16) & 0xFF] & 0xFF0000 >> 1194: ^ T0[(ti1 >> 8) & 0xFF] & 0xFF00 >> 1195: ^ T1[ti2 & 0xFF] & 0xFF ^ K[w+3]; > > Here you always use the last 4 elements of `K`, so you can just use `w = K.length - 4` and no need to keep tracking it in the earlier 2 blocks. Agreed. I've changed decryption as well. Fixed. > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 1220: > >> 1218: * @param c [in] the ciphertext to be decrypted. >> 1219: * @param co [in] the ciphertext offset in the array of bytes. >> 1220: * @param p [out] the decrypted plaintext output. > > nit: same comment for removing "decrypted" adj. Agreed. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2437361415 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2437361126 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2437362009 From vlivanov at openjdk.org Thu Oct 16 21:33:29 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 16 Oct 2025 21:33:29 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v16] In-Reply-To: References: Message-ID: <3K5p6eKg2XuH_g5d6iNyTjoBvn0cZ5VvH5o63H_SLqA=.58a444e0-1958-46d5-bcce-cb08718a908a@github.com> > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: - update - Merge remote-tracking branch 'origin/master' into 8290892.rf - Merge branch 'master' into 8290892.rf - scalarization support - Remove comment - Add PreserveReachabilityFencesOnConstants test - Minor fix - minor fixes - Fix guaranteed_safepoint usage - update - ... 
and 14 more: https://git.openjdk.org/jdk/compare/c9cbd31f...d80aee5d ------------- Changes: https://git.openjdk.org/jdk/pull/25315/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=15 Stats: 1504 lines in 38 files changed: 1442 ins; 20 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From duke at openjdk.org Thu Oct 16 23:20:32 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 16 Oct 2025 23:20:32 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v48] In-Reply-To: References: Message-ID: On Fri, 3 Oct 2025 19:52:49 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [x] Linux x64 fastdebug tier 1/2/3/4 >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 114 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix race when not installed nmethod is deoptimized > - Fix NMethodRelocationTest.java logging race > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Refactor JVMTI test > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Lock nmethod::relocate behind experimental flag > - Use CompiledICLocker instead of CompiledIC_lock > - ... and 104 more: https://git.openjdk.org/jdk/compare/012e079d...104661c6 nmethod relocation functionality is utilized in https://github.com/openjdk/jdk/pull/27858 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3413201911 From dlong at openjdk.org Thu Oct 16 23:43:02 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 16 Oct 2025 23:43:02 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v4] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 14:36:06 GMT, Francesco Andreuzzi wrote: >> src/hotspot/share/code/nmethod.cpp line 2599: >> >>> 2597: // nmethods that don't seem to be all that relevant any longer. >>> 2598: bool nmethod::is_cold() { >>> 2599: if (!MethodFlushing || (is_native_method() && is_in_use()) || is_not_installed()) { >> >> So I guess we need to decide what to do about native wrappers that are still "in use", but are "cold" because they haven't been called in a while. The above change would keep them around forever. We could instead allow them to be cleaned up like regular nmethods. > >> We could instead allow them to be cleaned up like regular nmethods. > > That sounds reasonable to me, native methods seem to be tracked like all other nmethods. 
> > Removing `is_native_method()` altogether from the condition was the first implementation I had, and as far as I remember there was no failure in tier1 or tier2. Should I propose this alternative implementation as part of this PR? I am tempted to say yes, for consistency, but it probably won't make much of a difference either way. But now I am wondering, if these cold native wrappers continue to be immortal, then do they really need to give them nmethod entry barriers? Removing the barrier could remove some overhead. Whatever direction we decide to go, it would be good to add a comment here explaining the decision and/or trade-offs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27742#discussion_r2437762613 From duke at openjdk.org Fri Oct 17 01:25:19 2025 From: duke at openjdk.org (erifan) Date: Fri, 17 Oct 2025 01:25:19 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: <5sceaMXEJ9MNrBNfvWhKPhlYmosdY9IZ1KdxfxnDSqc=.95f1ed2e-da92-4aca-bd47-95a7bdf2e6ff@github.com> On Wed, 15 Oct 2025 12:38:27 GMT, Martin Doerr wrote: >> The test `VectorMaskCompareNotTest` requires 16 Byte vectors (or larger). If the machine only uses 8 Byte vectors, we get an exception in the static initializer because the code tries to use a 4 Byte vector which is unsupported (stack trace: see JBS issue [JDK-8369511](https://bugs.openjdk.org/browse/JDK-8369511)). This is an alternative to https://github.com/openjdk/jdk/pull/27749. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add check if flag is available. test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 38: > 36: * @summary test combining vector not operation with compare > 37: * @modules jdk.incubator.vector > 38: * @requires vm.opt.final.MaxVectorSize == "null" | vm.opt.final.MaxVectorSize >= 16 Should be Suggestion: * @requires vm.opt.final.MaxVectorSize == "null" & vm.opt.final.MaxVectorSize >= 16 or Suggestion: * @requires vm.compiler2.enabled & vm.opt.final.MaxVectorSize >= 16 ? Assume this test is run with another compiler (like Graal) that doesn't support the option `MaxVectorSize`, then `vm.opt.final.MaxVectorSize == "null"` holds. But this can't guarantee that the max vector size >= 16. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27805#discussion_r2434435929 From duke at openjdk.org Fri Oct 17 01:25:20 2025 From: duke at openjdk.org (erifan) Date: Fri, 17 Oct 2025 01:25:20 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: <5sceaMXEJ9MNrBNfvWhKPhlYmosdY9IZ1KdxfxnDSqc=.95f1ed2e-da92-4aca-bd47-95a7bdf2e6ff@github.com> References: <5sceaMXEJ9MNrBNfvWhKPhlYmosdY9IZ1KdxfxnDSqc=.95f1ed2e-da92-4aca-bd47-95a7bdf2e6ff@github.com> Message-ID: On Thu, 16 Oct 2025 03:09:24 GMT, erifan wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Add check if flag is available. 
> > test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 38: > >> 36: * @summary test combining vector not operation with compare >> 37: * @modules jdk.incubator.vector >> 38: * @requires vm.opt.final.MaxVectorSize == "null" | vm.opt.final.MaxVectorSize >= 16 > > Should be > Suggestion: > > * @requires vm.opt.final.MaxVectorSize == "null" & vm.opt.final.MaxVectorSize >= 16 > > > or > Suggestion: > > * @requires vm.compiler2.enabled & vm.opt.final.MaxVectorSize >= 16 > > ? > > Assume this test is run with another compiler (like Graal) that doesn't support the option `MaxVectorSize`, then `vm.opt.final.MaxVectorSize == "null"` holds. But this can't guarantee that the max vector size >= 16. I forgot to submit this comment. I thought I submitted it two days ago. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27805#discussion_r2437969594 From valeriep at openjdk.org Fri Oct 17 05:44:04 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Fri, 17 Oct 2025 05:44:04 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v8] In-Reply-To: References: Message-ID: <2_eqasQo7DtbnrxwxuFYvl_yhVh7P6wzlxwPEl_DB-Q=.ff81a933-2c29-4d69-826b-d1ccf04d2e1c@github.com> On Thu, 16 Oct 2025 20:22:20 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. 
>> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Updates for code review comments from @valeriepeng src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 911: > 909: } > 910: sessionK[0] = genRoundKeys(key, rounds); > 911: sessionK[1] = invGenRoundKeys(); Given the decryption round keys are somewhat based on the encryption round keys, we could combine these two methods into one, e.g. private static int[][] genRoundKeys(byte[] key, int rounds) { int[][] ks = new int[2][]; // key schedule int wLen = (rounds + 1) * WB; int nk = key.length / WB; // generate the round keys for encryption int[] w = new int[wLen]; for (int i = 0, j = 0; i < nk; i++, j+=4) { w[i] = ((key[j] & 0xFF) << 24) | ((key[j + 1] & 0xFF) << 16) | ((key[j + 2] & 0xFF) << 8) | (key[j + 3] & 0xFF); } for (int i = nk; i < wLen; i++) { int tmp = w[i - 1]; if (i % nk == 0) { int rW = (tmp << 8) & 0xFFFFFF00 | (tmp >>> 24); tmp = subWord(rW) ^ RCON[(i / nk) - 1]; } else if ((nk > 6) && ((i % nk) == WB)) { tmp = subWord(tmp); } w[i] = w[i - nk] ^ tmp; } ks[0] = w; // generate the decryption round keys based on encryption ones int[] dw = new int[wLen]; int[] temp = new int[WB]; // Intrinsics requires the inverse key expansion to be reverse order // except for the first and last round key as the first two round keys // are without a mix column transform. for (int i = 1; i < rounds; i++) { System.arraycopy(w, i * WB, temp, 0, WB); invMixRKey(temp); System.arraycopy(temp, 0, dw, wLen - (i * WB), WB); } // dw[0...3] <- w[0...3] AND dw[4...7] <- w[(wLen - 4)...(wLen -1)] System.arraycopy(w, 0, dw, 0, WB); System.arraycopy(w, wLen - WB, dw, WB, WB); ks[1] = dw; Arrays.fill(temp, 0); return ks; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2438441223 From valeriep at openjdk.org Fri Oct 17 06:17:06 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Fri, 17 Oct 2025 06:17:06 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v8] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 20:22:20 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. 
>> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. >> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Updates for code review comments from @valeriepeng src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 958: > 956: * @return the processed round key row. > 957: */ > 958: private static int invMix(int[] state, int idx) { It seems that we can just use an `int` argument and make the callers do the array dereferencing. This way we can get rid of the temporary buffer inside `invMixRKey(int[])` as passing an integer to `invMix(int)` method will not affect the array, e.g. 
private static void invMixRKey(int[] state) { state[0] = invMix(state[0]); state[1] = invMix(state[1]); state[2] = invMix(state[2]); state[3] = invMix(state[3]); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2438501716 From valeriep at openjdk.org Fri Oct 17 06:30:02 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Fri, 17 Oct 2025 06:30:02 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v8] In-Reply-To: References: Message-ID: <56BA-l6ZrdSY0IRWxXme_AM_CWW_vtBSrzYqIA4oZaE=.b9011356-c376-455a-8964-4534f9db6035@github.com> On Thu, 16 Oct 2025 20:22:20 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. >> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Updates for code review comments from @valeriepeng src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 976: > 974: * @param state [in, out] the round key for inverse mix column processing. 
> 975: */ > 976: private static void invMixRKey(int[] state) { nit: name the method "invMixColumns(int[])". This name matches the spec psuedo code and goes better with the "state" argument name. Or use "invMixRoundKey(int[] roundKey)"? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2438537870 From duke at openjdk.org Fri Oct 17 06:56:43 2025 From: duke at openjdk.org (Shawn M Emery) Date: Fri, 17 Oct 2025 06:56:43 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v9] In-Reply-To: References: Message-ID: > General: > ----------- > i) This work is to replace the existing AES cipher under the Cryptix license. > > ii) The lookup tables are employed for performance, but also for operating in constant time. > > iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. > > iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. > > Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. > > Correctness: > ----------------- > The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: > > i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass > > -intrinsics mode for: > > ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass > > iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures > > iv) jdk_security_infra: passed, with 48 known failures > > v) tier1 and tier2: all 110257 tests pass > > Security: > ----------- > In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. > > Performance: > ------------------ > All AES related benchmarks have been executed against the new and original Cryptix code: > > micro:org.openjdk.bench.javax.crypto.AES > > micro:org.openjdk.bench.javax.crypto.full.AESBench > > micro:org.openjdk.bench.javax.crypto.full.AESExtraBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream > > micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. > > micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) > > The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption results: > > i) Default (no JVM options, non-intrinsics) mode: > > a) Encryption: the new code performed better for b... 
Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: Updates for code review comments from @valeriepeng ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27807/files - new: https://git.openjdk.org/jdk/pull/27807/files/a5991a2f..2ce35b97 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=07-08 Stats: 47 lines in 1 file changed: 7 ins; 39 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27807.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27807/head:pull/27807 PR: https://git.openjdk.org/jdk/pull/27807 From duke at openjdk.org Fri Oct 17 06:56:46 2025 From: duke at openjdk.org (Shawn M Emery) Date: Fri, 17 Oct 2025 06:56:46 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v8] In-Reply-To: <2_eqasQo7DtbnrxwxuFYvl_yhVh7P6wzlxwPEl_DB-Q=.ff81a933-2c29-4d69-826b-d1ccf04d2e1c@github.com> References: <2_eqasQo7DtbnrxwxuFYvl_yhVh7P6wzlxwPEl_DB-Q=.ff81a933-2c29-4d69-826b-d1ccf04d2e1c@github.com> Message-ID: <5SGZrrpgKtrf7IKtfEDPFb4LnggKNoeaZpFVGuHu-p4=.1ec8dd26-e125-4c8a-ba22-9006c7da4024@github.com> On Fri, 17 Oct 2025 05:41:25 GMT, Valerie Peng wrote: >> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates for code review comments from @valeriepeng > > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 911: > >> 909: } >> 910: sessionK[0] = genRoundKeys(key, rounds); >> 911: sessionK[1] = invGenRoundKeys(); > > Given the decryption round keys are somewhat based on the encryption round keys, we could combine these two methods into one, e.g. > > private static int[][] genRoundKeys(byte[] key, int rounds) { > int[][] ks = new int[2][]; // key schedule > > int wLen = (rounds + 1) * WB; > int nk = key.length / WB; > > // generate the round keys for encryption > int[] w = new int[wLen]; > for (int i = 0, j = 0; i < nk; i++, j+=4) { > w[i] = ((key[j] & 0xFF) << 24) > | ((key[j + 1] & 0xFF) << 16) > | ((key[j + 2] & 0xFF) << 8) > | (key[j + 3] & 0xFF); > } > for (int i = nk; i < wLen; i++) { > int tmp = w[i - 1]; > if (i % nk == 0) { > int rW = (tmp << 8) & 0xFFFFFF00 | (tmp >>> 24); > tmp = subWord(rW) ^ RCON[(i / nk) - 1]; > } else if ((nk > 6) && ((i % nk) == WB)) { > tmp = subWord(tmp); > } > w[i] = w[i - nk] ^ tmp; > } > ks[0] = w; > > // generate the decryption round keys based on encryption ones > int[] dw = new int[wLen]; > int[] temp = new int[WB]; > > // Intrinsics requires the inverse key expansion to be reverse order > // except for the first and last round key as the first two round keys > // are without a mix column transform. > for (int i = 1; i < rounds; i++) { > System.arraycopy(w, i * WB, temp, 0, WB); > invMixRKey(temp); > System.arraycopy(temp, 0, dw, wLen - (i * WB), WB); > } > // dw[0...3] <- w[0...3] AND dw[4...7] <- w[(wLen - 4)...(wLen -1)] > System.arraycopy(w, 0, dw, 0, WB); > System.arraycopy(w, wLen - WB, dw, WB, WB); > ks[1] = dw; > Arrays.fill(temp, 0); > > return ks; > } These two methods were only the few that I was able to make that were compact and singular in purpose (gen round key, gen inverse round key) code as the coding style guidelines espouse. The rest of the methods' construction were dictated by performance improvements, where compactness came at the cost of interpreter speed. 
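As a purely illustrative aside on the compactness-versus-interpreter-speed trade-off described above (a sketch with invented names; `invMix` below is a stand-in transform, not the PR's implementation), here are the same four assignments written rolled and unrolled:

    // Hypothetical sketch only.
    final class UnrollSketch {
        private static int invMix(int w) { return Integer.rotateLeft(w, 8); } // placeholder transform

        static void rolled(int[] rk, int[] out) {      // compact, but pays per-iteration
            for (int i = 0; i < 4; i++) {              // loop overhead in the interpreter
                out[i] = invMix(rk[i]);
            }
        }

        static void unrolled(int[] rk, int[] out) {    // longer source, no loop overhead
            out[0] = invMix(rk[0]);
            out[1] = invMix(rk[1]);
            out[2] = invMix(rk[2]);
            out[3] = invMix(rk[3]);
        }
    }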
> src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 958: > >> 956: * @return the processed round key row. >> 957: */ >> 958: private static int invMix(int[] state, int idx) { > > It seems that we can just use an `int` argument and make the callers do the array dereferencing. This way we can get rid of the temporary buffer inside `invMixRKey(int[])` as passing an integer to `invMix(int)` method will not affect the array, e.g. > > private static void invMixRKey(int[] state) { > state[0] = invMix(state[0]); > state[1] = invMix(state[1]); > state[2] = invMix(state[2]); > state[3] = invMix(state[3]); > } I've removed this method and inlined this logic in the invGenRoundKeys method. > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 976: > >> 974: * @param state [in, out] the round key for inverse mix column processing. >> 975: */ >> 976: private static void invMixRKey(int[] state) { > > nit: name the method "invMixColumns(int[])". This name matches the spec psuedo code and goes better with the "state" argument name. Or use "invMixRoundKey(int[] roundKey)"? I've removed this method and inlined this logic in the invGenRoundKeys method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2438587221 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2438587085 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2438586207 From fyang at openjdk.org Fri Oct 17 07:06:08 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 17 Oct 2025 07:06:08 GMT Subject: RFR: 8369947: Bytecode rewriting causes Java heap corruption on RISC-V In-Reply-To: References: Message-ID: <6k8bFDES2UHSq3PJv8oj0sDdTDpBKnHMcifoA2szXSw=.73e161eb-6847-4632-8d20-78296ff091ec@github.com> On Thu, 16 Oct 2025 15:23:28 GMT, Feilong Jiang wrote: > As discussed in https://github.com/openjdk/jdk/pull/27748#pullrequestreview-3341840431, the same issue occurs with the RISC-V port. > > Testing: > > - [x] tier1 - tier4 linux-riscv64 fastdebug Hi, I am having some difficulty in understanding the issue. @shipilev @theRealAph : For the aarch64 counterpart, shouldn't the `ldarb` at [1] prevent the reordering of `STR` of PBC and `STLR` of RFE? It's a load instruction with acquire semantics. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp#L200 ------------- PR Comment: https://git.openjdk.org/jdk/pull/27850#issuecomment-3414146472 From valeriep at openjdk.org Fri Oct 17 07:07:11 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Fri, 17 Oct 2025 07:07:11 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v8] In-Reply-To: <5SGZrrpgKtrf7IKtfEDPFb4LnggKNoeaZpFVGuHu-p4=.1ec8dd26-e125-4c8a-ba22-9006c7da4024@github.com> References: <2_eqasQo7DtbnrxwxuFYvl_yhVh7P6wzlxwPEl_DB-Q=.ff81a933-2c29-4d69-826b-d1ccf04d2e1c@github.com> <5SGZrrpgKtrf7IKtfEDPFb4LnggKNoeaZpFVGuHu-p4=.1ec8dd26-e125-4c8a-ba22-9006c7da4024@github.com> Message-ID: <3IhmbTDiDNPdMTe_K1OZx6sC67UGjObzOXwX8Ekp7pA=.0e742e44-4dba-4680-8f24-7321f8516071@github.com> On Fri, 17 Oct 2025 06:52:39 GMT, Shawn M Emery wrote: >> src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 958: >> >>> 956: * @return the processed round key row. >>> 957: */ >>> 958: private static int invMix(int[] state, int idx) { >> >> It seems that we can just use an `int` argument and make the callers do the array dereferencing. 
This way we can get rid of the temporary buffer inside `invMixRKey(int[])` as passing an integer to `invMix(int)` method will not affect the array, e.g. >> >> private static void invMixRKey(int[] state) { >> state[0] = invMix(state[0]); >> state[1] = invMix(state[1]); >> state[2] = invMix(state[2]); >> state[3] = invMix(state[3]); >> } > > I've removed this method and inlined this logic in the invGenRoundKeys method. Sure, this works as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2438612714 From shade at openjdk.org Fri Oct 17 07:17:04 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 17 Oct 2025 07:17:04 GMT Subject: RFR: 8369947: Bytecode rewriting causes Java heap corruption on RISC-V In-Reply-To: <6k8bFDES2UHSq3PJv8oj0sDdTDpBKnHMcifoA2szXSw=.73e161eb-6847-4632-8d20-78296ff091ec@github.com> References: <6k8bFDES2UHSq3PJv8oj0sDdTDpBKnHMcifoA2szXSw=.73e161eb-6847-4632-8d20-78296ff091ec@github.com> Message-ID: <9tOiCAzqT5S7rt0Md8PUFy8s0az97uQVK2DPTPIo7Wo=.03840263-1742-42dd-9cf1-a867f0b9cff4@github.com> On Fri, 17 Oct 2025 07:01:49 GMT, Fei Yang wrote: > @shipilev @theRealAph : For the aarch64 counterpart, shouldn't the `ldarb` at [1] prevent the reordering of `STR` of PBC and `STLR` of RFE? It's a load instruction with acquire semantics. Yes, I was confused about this myself. A key thing for this particular issue: the _reader_ we need to sync up with is not `patch_bytecode`, it is the thread that _executes_ the patched bytecode. In other words, the _writer_ is `patch_bytecode`, and _reader_ is executing thread. So acquire barrier in `patch_bytecode` does not help this case, because it is a write path, it needs release. The read path needs some other synchronization for acquire-like semantics; in aarch64 we reasoned the control dependency on bytecode itself and the barrier in RFE resolution is enough to do this. See my writeup here: https://bugs.openjdk.org/browse/JDK-8369506?focusedId=14824157#comment-14824157 -- and the comments after it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27850#issuecomment-3414168731 From mdoerr at openjdk.org Fri Oct 17 08:32:13 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 17 Oct 2025 08:32:13 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: References: <5sceaMXEJ9MNrBNfvWhKPhlYmosdY9IZ1KdxfxnDSqc=.95f1ed2e-da92-4aca-bd47-95a7bdf2e6ff@github.com> Message-ID: On Fri, 17 Oct 2025 01:21:59 GMT, erifan wrote: >> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 38: >> >>> 36: * @summary test combining vector not operation with compare >>> 37: * @modules jdk.incubator.vector >>> 38: * @requires vm.opt.final.MaxVectorSize == "null" | vm.opt.final.MaxVectorSize >= 16 >> >> Should be >> Suggestion: >> >> * @requires vm.opt.final.MaxVectorSize == "null" & vm.opt.final.MaxVectorSize >= 16 >> >> >> or >> Suggestion: >> >> * @requires vm.compiler2.enabled & vm.opt.final.MaxVectorSize >= 16 >> >> ? >> >> Assume this test is run with another compiler (like Graal) that doesn't support the option `MaxVectorSize`, then `vm.opt.final.MaxVectorSize == "null"` holds. But this can't guarantee that the max vector size >= 16. > > I forgot to submit this comment. I thought I submitted it two days ago. I thought someone may want to run the test in a configuration without C2. So, I didn't want to change the test for such cases. I leave such decisions to others. Maybe @dougxc has an opinion about Graal? 
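For illustration only, one way the guard discussed in this thread could be spelled in a jtreg header. Whether `vm.compiler2.enabled` is the right predicate for non-C2 configurations is exactly the open question above, so treat this as a sketch rather than a recommendation; the class name is invented and the property names are taken from the discussion.

    /*
     * @test
     * @summary illustrative guard only; property names follow the discussion above
     * @requires vm.compiler2.enabled & (vm.opt.final.MaxVectorSize == "null" | vm.opt.final.MaxVectorSize >= 16)
     * @modules jdk.incubator.vector
     */
    public class MaxVectorSizeGuardSketch {
        public static void main(String[] args) { }
    }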
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27805#discussion_r2438845933 From epeter at openjdk.org Fri Oct 17 08:44:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 17 Oct 2025 08:44:45 GMT Subject: RFR: 8369902: C2 SuperWord: wrong result because filterin NaN instead of zero in MemPointerParser::canonicalize_raw_summands Message-ID: <0FaIYVu0DC9tiKMLsw-TFxBlTSDzp74_0pSIAhRd72Q=.0c602e17-24f8-4aca-b8d8-7b8df0a1dcc9@github.com> **TLDR** `is_NaN` -> `is_zero`, just like the code comment says. Thanks to @mhaessig for debugging the ARM32 bug below. He found the buggy line of code. ---------------------------------------- **Details** It seems there is a little "typo" (logic error) in `MemPointerParser::canonicalize_raw_summands` that slipped through the cracks in https://github.com/openjdk/jdk/pull/24278. The JavaFuzzer now found an example, and independently the issue was also reported on ARM32 [JDK-8368578](https://bugs.openjdk.org/browse/JDK-8368578). Filtering out `NaN` instead of `zero` for the `scaleL` has two manifestations: - If `scaleL` is zero, but does not get filtered out even though it should be: we hit the assert in `MemPointerSummand` constructor, `assert(!_scale.is_zero(), "non-zero scale");`. - See [JDK-8368578](https://bugs.openjdk.org/browse/JDK-8368578), though those tests seem to only fail on ARM32, and nowhere else. - I was able to construct a `MemorySegment` regression test, see `TestMemorySegmentFilterSummands.test1`. I suspect that the ARM32 failures happened on an array, as it failed in places like `BigInteger::implMultiplyToLen`. But now I was able to reproduce it with native memory, to get a pointer expression that has the same cancellation issue. - If `scaleL` is `NaN`, and gets filtered even though it should not be: We get a non-trivial MemPointer that is missing a summand. So we will succeed in optimizing, but with wrong assumptions. We generate a runtime aliasing check that is incorrect, leading to wrong results. - This was reported by the fuzzer, see attached `TestDoNotFilterNaNSummands`. - I was also able to create a simpler example with `MemorySegments`, see attached `TestMemorySegmentFilterSummands.test2`. **Why did this slip through the cracks?** In https://github.com/openjdk/jdk/pull/24278 I added pretty extensive testing, even fuzzer style tests, see `TestAliasingFuzzer.java`. But I think all of those tests exercise `scale` that are in "nice" [int ranges](https://github.com/openjdk/jdk/pull/24278/files#diff-26de03e864a492fe8aa8178818968f2097b99cf36a763605e2fb11fbc04eedffR303-R322). Also the JavaFuzzer does not directly generate such long constants for array accesses (not possible without Unsafe I think), we were lucky that it generated the index with `%` that got optimized to some magic long constant. There is already an RFE filed for improvements to `TestAliasingFuzzer.java`: [JDK-8365985](https://bugs.openjdk.org/browse/JDK-8365985). I now added some extra comments to the test it self, in the "future work section". -------------------------------- **More Background Info** I've been asked this a few times now. "Where do the NaN summands get detected eventually?". So let me explain the pipeline: - `MemPointerParser::parse` creates a `MemPointer`. A `MemPointer` cannot be (in)valid. Rather, it can either be "trivial" `1 * pointer`, or non-trivial (some actual decomposition with multiple summands). - We create the raw summands, that may contain `NaN` or `zero`. 
- `canonicalize_raw_summands`: sorts and combines raw summands. Zero summands need to be filtered out. - `create_summands`: turn raw summands into (regular) summands. The only "surprise" that could happen here is that we have an overflow on `_con` addition, leading to a `NaN`. - `canonicalize_summands`: sort and combine summands. When adding scales, we could introduce `NaN` again. Zero summands are filtered out again. - Call `MemPointer::make` to create a summand. We could now have `_con = NaN` or a scale that is `NaN`. - `MemPointer::make`: checks `has_no_NaN_in_con_and_summands`, and if that fails, we just create a trivial `MemPointer`, which is alway correct, but ultimately not helpful for optimizations, so probably no vectorization from that happens. I hope that helps :) ------------------------------------- **More thoughts** I had always thought that large constant offsets are very rare. And I still think so. But the fuzzer case shows that long constants could arise from IGVN optimizations as well. If you think that long-constants in pointers are important: file an RFE, and show me some use-cases that are important for performance. ------------- Commit messages: - add comments to TestAliasingFuzzer.java - typo - add fuzzer test - test improvements and fix - second test - rename test - JDK-8369902 Changes: https://git.openjdk.org/jdk/pull/27848/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27848&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369902 Stats: 253 lines in 4 files changed: 251 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27848.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27848/head:pull/27848 PR: https://git.openjdk.org/jdk/pull/27848 From epeter at openjdk.org Fri Oct 17 08:46:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 17 Oct 2025 08:46:19 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: References: <5sceaMXEJ9MNrBNfvWhKPhlYmosdY9IZ1KdxfxnDSqc=.95f1ed2e-da92-4aca-bd47-95a7bdf2e6ff@github.com> Message-ID: On Fri, 17 Oct 2025 08:29:35 GMT, Martin Doerr wrote: >> I forgot to submit this comment. I thought I submitted it two days ago. > > I thought someone may want to run the test in a configuration without C2. So, I didn't want to change the test for such cases. I leave such decisions to others. Maybe @dougxc has an opinion about Graal? Maybe they just need to put their own guards there if they care? It's hard to guard this very generically. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27805#discussion_r2438883499 From duke at openjdk.org Fri Oct 17 09:17:36 2025 From: duke at openjdk.org (erifan) Date: Fri, 17 Oct 2025 09:17:36 GMT Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8 [v2] In-Reply-To: References: <5sceaMXEJ9MNrBNfvWhKPhlYmosdY9IZ1KdxfxnDSqc=.95f1ed2e-da92-4aca-bd47-95a7bdf2e6ff@github.com> Message-ID: On Fri, 17 Oct 2025 08:42:57 GMT, Emanuel Peter wrote: >> I thought someone may want to run the test in a configuration without C2. So, I didn't want to change the test for such cases. I leave such decisions to others. Maybe @dougxc has an opinion about Graal? > > Maybe they just need to put their own guards there if they care? It's hard to guard this very generically. > I thought someone may want to run the test in a configuration without C2. Yeah I know your point. 
But I think `vm.opt.final.MaxVectorSize == "null"` alone doesn't work, because it doesn't guarantee that the max vector size is >= 16. Perhaps, as @eme64 pointed out, it is hard for us to ensure that all compilers work. This does no harm to the C2 test, it's just a bit unnecessary. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27805#discussion_r2438969890 From qxing at openjdk.org Fri Oct 17 09:32:48 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Fri, 17 Oct 2025 09:32:48 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v7] In-Reply-To: References: Message-ID: > In `PhaseIdealLoop`, the `IdealLoopTree::check_safepts` method checks if any call that is guaranteed to have a safepoint dominates the tail of the loop. In the previous implementation, `check_safepts` would stop if it found a local non-call safepoint. At this time, if there was a call before the safepoint in the dom-path, this safepoint would not be eliminated. > > loop-safepoint > > This patch changes the behavior of `check_safepts` to not stop when it finds a non-local safepoint. This makes simple loops with one method call ~3.8% faster (on aarch64). > > > Benchmark Mode Cnt Score Error Units > LoopSafepoint.loopVar avgt 15 208296.259 ± 1350.409 ns/op # baseline > LoopSafepoint.loopVar avgt 15 200692.874 ± 616.770 ns/op # this patch > > > Testing: tier1-2 on x86_64 and aarch64. Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision: - Update microbench - Add IR tests for nested loops ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23057/files - new: https://git.openjdk.org/jdk/pull/23057/files/9b2bd6e6..b132bddc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23057&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23057&range=05-06 Stats: 180 lines in 2 files changed: 150 ins; 3 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/23057.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23057/head:pull/23057 PR: https://git.openjdk.org/jdk/pull/23057
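To make the loop shape in this RFR concrete (an illustrative sketch, not the actual LoopSafepoint benchmark or IR test source; names are invented): a counted loop whose body contains a call that is not inlined. Such a call is itself a guaranteed safepoint and dominates the loop tail, which is why the loop's own SafePoint can be redundant.

    // Hypothetical example of the pattern discussed above.
    public class LoopSafepointSketch {
        static int blackhole;                    // keep the call from being optimized away

        static int call(int i) {                 // assume this remains a real (non-inlined) call
            blackhole ^= i;
            return i & 0xFF;
        }

        static int loopWithCall(int n) {
            int sum = 0;
            for (int i = 0; i < n; i++) {
                sum += call(i);                  // the call already provides a safepoint,
            }                                    // so a separate loop SafePoint is redundant
            return sum;
        }
    }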
------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-3414633927 From qxing at openjdk.org Fri Oct 17 09:32:53 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Fri, 17 Oct 2025 09:32:53 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v3] In-Reply-To: References: Message-ID: On Tue, 16 Sep 2025 06:11:15 GMT, Emanuel Peter wrote: >> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve documentation comments > > test/hotspot/jtreg/compiler/loopopts/TestRedundantSafepointElimination.java line 84: > >> (failed to retrieve contents of file, check the PR for context) > All of the cases here are only single loops, right? But is the algorithm not mostly dealing with nested loops, where we have to make sure that in some cases the `SafePoint` is not eliminated? Could you add some extra cases for that? Updated IR test. > test/micro/org/openjdk/bench/vm/compiler/LoopSafepoint.java line 76: > >> 74: } >> 75: return sum; >> 76: } > > I think it would be nice if you made the examples in the JMH and the JTREG as similar as possible. Updated microbenchmark to be in sync with the IR test. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2439005605 PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2439007562 From aph at openjdk.org Fri Oct 17 09:49:09 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 17 Oct 2025 09:49:09 GMT Subject: RFR: 8369947: Bytecode rewriting causes Java heap corruption on RISC-V In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 15:23:28 GMT, Feilong Jiang wrote: > As discussed in https://github.com/openjdk/jdk/pull/27748#pullrequestreview-3341840431, the same issue occurs with the RISC-V port. > > Testing: > > - [x] tier1 - tier4 linux-riscv64 fastdebug Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27850#pullrequestreview-3349154812 From aph at openjdk.org Fri Oct 17 09:49:12 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 17 Oct 2025 09:49:12 GMT Subject: RFR: 8369947: Bytecode rewriting causes Java heap corruption on RISC-V In-Reply-To: <9tOiCAzqT5S7rt0Md8PUFy8s0az97uQVK2DPTPIo7Wo=.03840263-1742-42dd-9cf1-a867f0b9cff4@github.com> References: <6k8bFDES2UHSq3PJv8oj0sDdTDpBKnHMcifoA2szXSw=.73e161eb-6847-4632-8d20-78296ff091ec@github.com> <9tOiCAzqT5S7rt0Md8PUFy8s0az97uQVK2DPTPIo7Wo=.03840263-1742-42dd-9cf1-a867f0b9cff4@github.com> Message-ID: On Fri, 17 Oct 2025 07:10:05 GMT, Aleksey Shipilev wrote: > > @shipilev @theRealAph : For the aarch64 counterpart, shouldn't the `ldarb` at [1] prevent the reordering of `STR` of PBC and `STLR` of RFE? It's a load instruction with acquire semantics. > > Yes, I was confused about this myself. The key thing for this particular issue: the _reader_ we need to sync up with is not `patch_bytecode`, it is the thread that _executes_ the patched bytecode. In other words, the _writer_ is `patch_bytecode`, and _reader_ is executing thread. > > So acquire barrier in `patch_bytecode` does not help this case, because it is a write path, it needs release, which aarch64 fix did. The read path needs some other synchronization for acquire-like semantics; in aarch64 we reasoned the control dependency on bytecode itself and the barrier in RFE resolution is already enough to do this. 
RISCV is good on the read side, we just need this patch to fix the write: void InterpreterMacroAssembler::load_field_entry(Register cache, Register index, int bcp_offset) { ... // Get address of field entries array ld(cache, Address(xcpool, ConstantPoolCache::field_entries_offset())); addi(cache, cache, Array::base_offset_in_bytes()); add(cache, cache, index); // Prevents stale data from being read after the bytecode is patched to the fast bytecode membar(MacroAssembler::LoadLoad); } ------------- PR Comment: https://git.openjdk.org/jdk/pull/27850#issuecomment-3414678165 From rcastanedalo at openjdk.org Fri Oct 17 11:54:09 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 17 Oct 2025 11:54:09 GMT Subject: RFR: 8369569: Rename methods in regmask.hpp to conform with HotSpot coding style In-Reply-To: References: Message-ID: <9TKpNnrHh3DnhvUWvGQQFDIuIgWFIcyo7En1HcoCGfI=.febc6239-4d54-433d-be30-dc2959c0002a@github.com> On Wed, 15 Oct 2025 08:14:26 GMT, Daniel Lund?n wrote: > A number of methods in regmask.hpp do not conform with the HotSpot coding style. We should make sure they do. > > ### Changeset > - Rename methods in `regmask.hpp` to conform with HotSpot coding style. > - Similarly rename directly related methods in `chaitin.hpp`. > - Rename the constant register masks `All` and `Empty` to `ALL` and `EMPTY`. > - Fix a few additional code style issues at lines touched by the changeset. > > Note: this is a syntax-only changeset (no functional changes). > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500704336) > - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. Looks good otherwise! test/hotspot/gtest/opto/test_regmask.cpp line 586: > 584: } > 585: > 586: TEST_VM(RegMask, Set_ALL_extended) { Suggestion: TEST_VM(RegMask, set_all_extended) { test/hotspot/gtest/opto/test_regmask.cpp line 599: > 597: } > 598: > 599: TEST_VM(RegMask, Set_ALL_From_extended) { Suggestion: TEST_VM(RegMask, set_all_from_extended) { test/hotspot/gtest/opto/test_regmask.cpp line 606: > 604: } > 605: > 606: TEST_VM(RegMask, Set_ALL_From_extended_grow) { Suggestion: TEST_VM(RegMask, set_all_from_extended_grow) { test/hotspot/gtest/opto/test_regmask.cpp line 613: > 611: } > 612: > 613: TEST_VM(RegMask, Clear_extended) { Suggestion: TEST_VM(RegMask, clear_extended) { test/hotspot/gtest/opto/test_regmask.cpp line 614: > 612: > 613: TEST_VM(RegMask, Clear_extended) { > 614: // Check that Clear doesn't leave any stray bits on extended RegMasks. Suggestion: // Check that clear doesn't leave any stray bits on extended RegMasks. ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27817#pullrequestreview-3349813577 PR Review Comment: https://git.openjdk.org/jdk/pull/27817#discussion_r2439574516 PR Review Comment: https://git.openjdk.org/jdk/pull/27817#discussion_r2439576190 PR Review Comment: https://git.openjdk.org/jdk/pull/27817#discussion_r2439578391 PR Review Comment: https://git.openjdk.org/jdk/pull/27817#discussion_r2439579707 PR Review Comment: https://git.openjdk.org/jdk/pull/27817#discussion_r2439581329 From mdoerr at openjdk.org Fri Oct 17 12:21:23 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 17 Oct 2025 12:21:23 GMT Subject: RFR: 8369946: Bytecode rewriting causes Java heap corruption on PPC Message-ID: Like the aarch64 fix (https://github.com/openjdk/jdk/pull/27748). PPC64 has additional requirements: - It implements `fast_invokevfinal` which uses `ResolvedMethodEntry`. - Speculative loads need to get prevented by memory barrier instructions (even on control dependent paths). I've refactored `load_field_entry` and `load_method_entry` into a common function and added support for rewritten "fast" Bytecodes. I'm using `isync` instructions because we already have a control dependency (via Bytecode dispatch). ------------- Commit messages: - 8369946: Bytecode rewriting causes Java heap corruption on PPC Changes: https://git.openjdk.org/jdk/pull/27867/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27867&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369946 Stats: 36 lines in 3 files changed: 12 ins; 4 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/27867.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27867/head:pull/27867 PR: https://git.openjdk.org/jdk/pull/27867 From mdoerr at openjdk.org Fri Oct 17 12:27:29 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 17 Oct 2025 12:27:29 GMT Subject: RFR: 8369947: Bytecode rewriting causes Java heap corruption on RISC-V In-Reply-To: <9tOiCAzqT5S7rt0Md8PUFy8s0az97uQVK2DPTPIo7Wo=.03840263-1742-42dd-9cf1-a867f0b9cff4@github.com> References: <6k8bFDES2UHSq3PJv8oj0sDdTDpBKnHMcifoA2szXSw=.73e161eb-6847-4632-8d20-78296ff091ec@github.com> <9tOiCAzqT5S7rt0Md8PUFy8s0az97uQVK2DPTPIo7Wo=.03840263-1742-42dd-9cf1-a867f0b9cff4@github.com> Message-ID: On Fri, 17 Oct 2025 07:10:05 GMT, Aleksey Shipilev wrote: >> Hi, I am having some difficulty in understanding the issue. >> @shipilev @theRealAph : For the aarch64 counterpart, shouldn't the `ldarb` at [1] prevent the reordering of `STR` of PBC and `STLR` of RFE? It's a load instruction with acquire semantics. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp#L200 > >> @shipilev @theRealAph : For the aarch64 counterpart, shouldn't the `ldarb` at [1] prevent the reordering of `STR` of PBC and `STLR` of RFE? It's a load instruction with acquire semantics. > > Yes, I was confused about this myself. The key thing for this particular issue: the _reader_ we need to sync up with is not `patch_bytecode`, it is the thread that _executes_ the patched bytecode. In other words, the _writer_ is `patch_bytecode`, and _reader_ is executing thread. > > So acquire barrier in `patch_bytecode` does not help this case, because it is a write path, it needs release, which aarch64 fix did. The read path needs some other synchronization for acquire-like semantics; in aarch64 we reasoned the control dependency on bytecode itself and the barrier in RFE resolution is already enough to do this. 
See my writeup here: https://bugs.openjdk.org/browse/JDK-8369506?focusedId=14824157#comment-14824157 -- and the comments after it. > > > @shipilev @theRealAph : For the aarch64 counterpart, shouldn't the `ldarb` at [1] prevent the reordering of `STR` of PBC and `STLR` of RFE? It's a load instruction with acquire semantics. > > > > > > Yes, I was confused about this myself. The key thing for this particular issue: the _reader_ we need to sync up with is not `patch_bytecode`, it is the thread that _executes_ the patched bytecode. In other words, the _writer_ is `patch_bytecode`, and _reader_ is executing thread. > > So acquire barrier in `patch_bytecode` does not help this case, because it is a write path, it needs release, which aarch64 fix did. The read path needs some other synchronization for acquire-like semantics; in aarch64 we reasoned the control dependency on bytecode itself and the barrier in RFE resolution is already enough to do this. > > RISCV is good on the read side, we just need this patch to fix the write: > > ``` > void InterpreterMacroAssembler::load_field_entry(Register cache, Register index, int bcp_offset) { > ... > // Get address of field entries array > ld(cache, Address(xcpool, ConstantPoolCache::field_entries_offset())); > addi(cache, cache, Array::base_offset_in_bytes()); > add(cache, cache, index); > // Prevents stale data from being read after the bytecode is patched to the fast bytecode > membar(MacroAssembler::LoadLoad); > } > ``` I just made a similar proposal for PPC64. But, I only use a barrier for fast Bytecodes: https://github.com/TheRealMDoerr/jdk/blob/138df669209ae58676e0559cf825d0a0cc81ee1b/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp#L491 ------------- PR Comment: https://git.openjdk.org/jdk/pull/27850#issuecomment-3415352767 From mdoerr at openjdk.org Fri Oct 17 12:41:13 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 17 Oct 2025 12:41:13 GMT Subject: RFR: 8369444: JavaFrameAnchor on PPC64 has unnecessary barriers In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 13:55:49 GMT, David Briemann wrote: > > If all CPU ports follow this, then it seems like we could eventually implement JavaFrameAnchor in shared code without CPU-specific parts. > > Seems like a good idea. However there are still differences for the different CPUs. E.g. aarch64 still contains a release memory barrier. This might be a good follow-up task. > > I would like to deliver this cleanup for PPC first. In addition, some platforms have a `_last_Java_fp`, others don't. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27768#issuecomment-3415415174 From epeter at openjdk.org Fri Oct 17 12:54:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 17 Oct 2025 12:54:05 GMT Subject: RFR: 8369569: Rename methods in regmask.hpp to conform with HotSpot coding style In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 08:14:26 GMT, Daniel Lund?n wrote: > A number of methods in regmask.hpp do not conform with the HotSpot coding style. We should make sure they do. > > ### Changeset > - Rename methods in `regmask.hpp` to conform with HotSpot coding style. > - Similarly rename directly related methods in `chaitin.hpp`. > - Rename the constant register masks `All` and `Empty` to `ALL` and `EMPTY`. > - Fix a few additional code style issues at lines touched by the changeset. > > Note: this is a syntax-only changeset (no functional changes). 
> > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500704336) > - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. LGTM. Roberto had some good suggestions though, so you should look at those :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27817#pullrequestreview-3350205643 From epeter at openjdk.org Fri Oct 17 13:02:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 17 Oct 2025 13:02:04 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v9] In-Reply-To: References: Message-ID: On Fri, 17 Oct 2025 06:56:43 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. >> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... 
> > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Updates for code review comments from @valeriepeng test/micro/org/openjdk/bench/javax/crypto/AESDecrypt.java line 44: > 42: > 43: @Param("10000000") > 44: private int count; Drive-by comment / question: Did you do all benchmarking with this single (quite large) size? How are the results for much smaller sizes? It may be worth it to just get a nice plot that goes over a range of sizes, to see if it behaves as expected. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2439948821 From epeter at openjdk.org Fri Oct 17 13:08:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 17 Oct 2025 13:08:09 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: <_gdObRdkYkS7d3fQQ6bcms709TpeM1IQtuPJtI0fcyE=.073d0496-8dfc-48a9-aee8-b64b408a6e62@github.com> On Wed, 3 Sep 2025 17:08:08 GMT, Fei Gao wrote: >> I'm a little sick and don't feel very focused, so I'll have to look at the PR next week. >> >> BTW: I just integrated https://github.com/openjdk/jdk/pull/24278 which may have silent merge conflicts, so it would be good if you merged and tested again. Once you do that I could also run some internal testing, if you like :) > >> BTW: I just integrated https://github.com/openjdk/jdk/pull/24278 which may have silent merge conflicts, so it would be good if you merged and tested again. > > Hi @eme64 , I?ve rebased the patch onto the latest JDK, and all tier1 to tier3 tests have passed on my local AArch64 and x86 machines. > >> It would be good if you re-ran the benchmarks. It seems the last ones you did in December of 2024. > We should see that we have various benchmarks, both for array and MemorySegment. > You could look at the array benchmarks from here: https://github.com/openjdk/jdk/pull/22070 > > I also re-verified the benchmark from [PR #22070](https://github.com/openjdk/jdk/pull/22070) on 128-bit, 256-bit, and 512-bit vector machines. The results show no significant regressions and performance changes are consistent with the previous round described in [perf results]( https://bugs.openjdk.org/browse/JDK-8307084?focusedId=14729524&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14729524). > >> Once you do that I could also run some internal testing, if you like :) > > I?d really appreciate it if you could run some internal testing at a time you think is suitable. > Thanks :) @fg1417 Are you still working on this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3415539539 From mhaessig at openjdk.org Fri Oct 17 13:19:04 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 17 Oct 2025 13:19:04 GMT Subject: RFR: 8369902: C2 SuperWord: wrong result because filterin NaN instead of zero in MemPointerParser::canonicalize_raw_summands In-Reply-To: <0FaIYVu0DC9tiKMLsw-TFxBlTSDzp74_0pSIAhRd72Q=.0c602e17-24f8-4aca-b8d8-7b8df0a1dcc9@github.com> References: <0FaIYVu0DC9tiKMLsw-TFxBlTSDzp74_0pSIAhRd72Q=.0c602e17-24f8-4aca-b8d8-7b8df0a1dcc9@github.com> Message-ID: <_WGjwVyk2zBmssCwv220X1CWZAQcAPB0uJJCqTaN9EU=.31ca4bd1-622e-4579-9bfc-7f273aac069d@github.com> On Thu, 16 Oct 2025 14:20:29 GMT, Emanuel Peter wrote: > **TLDR** `is_NaN` -> `is_zero`, just like the code comment says. 
> > Thanks to @mhaessig for debugging the ARM32 bug below. He found the buggy line of code. > > ---------------------------------------- > > **Details** > > It seems there is a little "typo" (logic error) in `MemPointerParser::canonicalize_raw_summands` that slipped through the cracks in https://github.com/openjdk/jdk/pull/24278. The JavaFuzzer now found an example, and independently the issue was also reported on ARM32 [JDK-8368578](https://bugs.openjdk.org/browse/JDK-8368578). > > Filtering out `NaN` instead of `zero` for the `scaleL` has two manifestations: > - If `scaleL` is zero, but does not get filtered out even though it should be: we hit the assert in `MemPointerSummand` constructor, `assert(!_scale.is_zero(), "non-zero scale");`. > - See [JDK-8368578](https://bugs.openjdk.org/browse/JDK-8368578), though those tests seem to only fail on ARM32, and nowhere else. > - I was able to construct a `MemorySegment` regression test, see `TestMemorySegmentFilterSummands.test1`. I suspect that the ARM32 failures happened on an array, as it failed in places like `BigInteger::implMultiplyToLen`. But now I was able to reproduce it with native memory, to get a pointer expression that has the same cancellation issue. > - If `scaleL` is `NaN`, and gets filtered even though it should not be: We get a non-trivial MemPointer that is missing a summand. So we will succeed in optimizing, but with wrong assumptions. We generate a runtime aliasing check that is incorrect, leading to wrong results. > - This was reported by the fuzzer, see attached `TestDoNotFilterNaNSummands`. > - I was also able to create a simpler example with `MemorySegments`, see attached `TestMemorySegmentFilterSummands.test2`. > > **Why did this slip through the cracks?** > > In https://github.com/openjdk/jdk/pull/24278 I added pretty extensive testing, even fuzzer style tests, see `TestAliasingFuzzer.java`. But I think all of those tests exercise `scale` that are in "nice" [int ranges](https://github.com/openjdk/jdk/pull/24278/files#diff-26de03e864a492fe8aa8178818968f2097b99cf36a763605e2fb11fbc04eedffR303-R322). Also the JavaFuzzer does not directly generate such long constants for array accesses (not possible without Unsafe I think), we were lucky that it generated the index with `%` that got optimized to some magic long constant. > > There is already an RFE filed for improvements to `TestAliasingFuzzer.java`: [JDK-8365985](https://bugs.openjdk.org/browse/JDK-836... Thank you for fixing my bug as well and doing the work to find a 64-bit reproducer, @eme64! Also, thanks for providing an explanation for the NaNs in the MemPointer parsing. The change looks good to me. I only have a suggestion to simplify your scenarios. 
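As a side note on the `is_NaN` vs `is_zero` mix-up itself, a simplified, hypothetical model of the summand filtering may help readers follow the two failure modes described above. This is not the actual MemPointer code; the types and names below are invented for illustration.

```
#include <vector>

// Hypothetical model only: a scale is either a plain value or carries a NaN
// marker recording an overflow during parsing.
struct Scale {
  bool nan;
  long value;
  bool is_zero() const { return !nan && value == 0; }
  bool is_nan()  const { return nan; }
};

struct Summand { Scale scale; int variable; };

// Intended behavior: drop summands whose scale is zero (they contribute nothing),
// but keep NaN summands so that later handling can bail out instead of silently
// losing a term. Testing is_nan() here instead of is_zero() reproduces both
// symptoms described above: zero scales survive (assert fires), and NaN summands
// vanish (missing summand, wrong aliasing assumption).
void filter(std::vector<Summand>& summands) {
  std::vector<Summand> kept;
  for (const Summand& s : summands) {
    if (s.scale.is_zero()) { continue; }
    kept.push_back(s);
  }
  summands.swap(kept);
}
```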
test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentFilterSummands.java line 71: > 69: new Scenario(1, "-XX:+AlignVector", "-XX:-ShortRunningLongLoop"), > 70: new Scenario(2, "-XX:-AlignVector", "-XX:+ShortRunningLongLoop"), > 71: new Scenario(3, "-XX:+AlignVector", "-XX:+ShortRunningLongLoop")); That might be the perfect opportunity to break out the cross-product scenario: Suggestion: f.addCrossProductScenarios(Set.of("-XX:-AlignVector", "-XX:+AlignVector"), Set.of("-XX:-ShortRunningLongLoop", "-XX:+ShortRunningLongLoop")); test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentFilterSummands.java line 82: > 80: applyIfPlatform = {"64-bit", "true"}, > 81: applyIf = {"AlignVector", "false"}, > 82: applyIfCPUFeatureOr = {"avx", "true", "asimd", "true"}) I always forget what the best practice is regarding detecting CPU features to not break the CIs of riscv and others. But this should be fine, since you are matching for the CPU features, right? ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/27848#pullrequestreview-3350310410 PR Review Comment: https://git.openjdk.org/jdk/pull/27848#discussion_r2439983427 PR Review Comment: https://git.openjdk.org/jdk/pull/27848#discussion_r2439992482 From dlunden at openjdk.org Fri Oct 17 14:17:08 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 17 Oct 2025 14:17:08 GMT Subject: RFR: 8369569: Rename methods in regmask.hpp to conform with HotSpot coding style [v2] In-Reply-To: References: Message-ID: <8KTiD_-r-3Z3BrfjO2hghq5rdPEukPFQ1TIwDnIvM90=.a2b6d535-38ce-4184-a67a-5dcce6c4d6b7@github.com> > A number of methods in regmask.hpp do not conform with the HotSpot coding style. We should make sure they do. > > ### Changeset > - Rename methods in `regmask.hpp` to conform with HotSpot coding style. > - Similarly rename directly related methods in `chaitin.hpp`. > - Rename the constant register masks `All` and `Empty` to `ALL` and `EMPTY`. > - Fix a few additional code style issues at lines touched by the changeset. > > Note: this is a syntax-only changeset (no functional changes). > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500704336) > - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Roberto Casta?eda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27817/files - new: https://git.openjdk.org/jdk/pull/27817/files/1a9c93a9..b44a9a6f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27817&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27817&range=00-01 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/27817.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27817/head:pull/27817 PR: https://git.openjdk.org/jdk/pull/27817 From roland at openjdk.org Fri Oct 17 14:16:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 17 Oct 2025 14:16:41 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v4] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 09:28:49 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. >> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... 
> > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > driver -> main Can we, maybe, add an assert early during loop opts that the invariant holds? That would check that, once things have been cleaned up and before major transformations, there's no graph with an unexpected shape (and we don't skip unrolling when we shouldn't). ------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3415789748 From rcastanedalo at openjdk.org Fri Oct 17 14:24:19 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 17 Oct 2025 14:24:19 GMT Subject: RFR: 8369569: Rename methods in regmask.hpp to conform with HotSpot coding style [v2] In-Reply-To: <8KTiD_-r-3Z3BrfjO2hghq5rdPEukPFQ1TIwDnIvM90=.a2b6d535-38ce-4184-a67a-5dcce6c4d6b7@github.com> References: <8KTiD_-r-3Z3BrfjO2hghq5rdPEukPFQ1TIwDnIvM90=.a2b6d535-38ce-4184-a67a-5dcce6c4d6b7@github.com> Message-ID: On Fri, 17 Oct 2025 14:17:08 GMT, Daniel Lund?n wrote: >> A number of methods in regmask.hpp do not conform with the HotSpot coding style. We should make sure they do. >> >> ### Changeset >> - Rename methods in `regmask.hpp` to conform with HotSpot coding style. >> - Similarly rename directly related methods in `chaitin.hpp`. >> - Rename the constant register masks `All` and `Empty` to `ALL` and `EMPTY`. >> - Fix a few additional code style issues at lines touched by the changeset. >> >> Note: this is a syntax-only changeset (no functional changes). >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500704336) >> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Roberto Casta?eda Lozano Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27817#pullrequestreview-3350595495 From aseoane at openjdk.org Fri Oct 17 14:24:20 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Fri, 17 Oct 2025 14:24:20 GMT Subject: RFR: 8369569: Rename methods in regmask.hpp to conform with HotSpot coding style [v2] In-Reply-To: <8KTiD_-r-3Z3BrfjO2hghq5rdPEukPFQ1TIwDnIvM90=.a2b6d535-38ce-4184-a67a-5dcce6c4d6b7@github.com> References: <8KTiD_-r-3Z3BrfjO2hghq5rdPEukPFQ1TIwDnIvM90=.a2b6d535-38ce-4184-a67a-5dcce6c4d6b7@github.com> Message-ID: On Fri, 17 Oct 2025 14:17:08 GMT, Daniel Lund?n wrote: >> A number of methods in regmask.hpp do not conform with the HotSpot coding style. We should make sure they do. >> >> ### Changeset >> - Rename methods in `regmask.hpp` to conform with HotSpot coding style. >> - Similarly rename directly related methods in `chaitin.hpp`. >> - Rename the constant register masks `All` and `Empty` to `ALL` and `EMPTY`. >> - Fix a few additional code style issues at lines touched by the changeset. >> >> Note: this is a syntax-only changeset (no functional changes). >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500704336) >> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. 
> > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Roberto Casta?eda Lozano Marked as reviewed by aseoane (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/27817#pullrequestreview-3350601930 From dlunden at openjdk.org Fri Oct 17 14:24:23 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 17 Oct 2025 14:24:23 GMT Subject: RFR: 8369569: Rename methods in regmask.hpp to conform with HotSpot coding style [v2] In-Reply-To: <9TKpNnrHh3DnhvUWvGQQFDIuIgWFIcyo7En1HcoCGfI=.febc6239-4d54-433d-be30-dc2959c0002a@github.com> References: <9TKpNnrHh3DnhvUWvGQQFDIuIgWFIcyo7En1HcoCGfI=.febc6239-4d54-433d-be30-dc2959c0002a@github.com> Message-ID: On Fri, 17 Oct 2025 11:51:38 GMT, Roberto Casta?eda Lozano wrote: >> Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Roberto Casta?eda Lozano > > Looks good otherwise! Thanks for the reviews! I've fixed the gtest naming issues @robcasloz caught. Please approve the latest commit and I'll integrate on Monday! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27817#issuecomment-3415806028 From mchevalier at openjdk.org Fri Oct 17 14:41:10 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 17 Oct 2025 14:41:10 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v4] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 09:28:49 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. >> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? 
>> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > driver -> main That sounds like a good idea, but I'm not sure where that should go. I suspect in `PhaseIdealLoop::build_and_optimize` between `build_loop_tree` and `iteration_split` (since it's recursive, we would check also between steps, which we don't want). That leaves quite some room. If I understand well, `iteration_split` will call `insert_pre_post_loops` which creates these `OpaqueZeroTripGuardNode`. which means that before `iteration_split`, we will see only the ones created at the previous round of loop opts, so with IGVN's cleaning since, which is good. A technical concern: I assume we want to check the invariant on every loop with such an opaque node (and I can't know ahead of time where loop unrolling is going to happen), so I should traverse the loop tree and look at every loop in there? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3415873461 From roland at openjdk.org Fri Oct 17 14:56:54 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 17 Oct 2025 14:56:54 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v4] In-Reply-To: References: Message-ID: On Fri, 17 Oct 2025 14:38:40 GMT, Marc Chevalier wrote: > A technical concern: I assume we want to check the invariant on every loop with such an opaque node (and I can't know ahead of time where loop unrolling is going to happen), so I should traverse the loop tree and look at every loop in there? Maybe that can be done in `IdealLoopTree::counted_loop()`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3415920474 From chagedorn at openjdk.org Fri Oct 17 14:56:57 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 17 Oct 2025 14:56:57 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v4] In-Reply-To: References: Message-ID: <3jSctyKb4Zi-tG17Yn9xKACgwnJBVU079t5m7VcvoGA=.ef922908-c05a-4046-ad30-365b228ee089@github.com> On Thu, 16 Oct 2025 09:28:49 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... 
} >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. >> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > driver -> main Could `PhaseIdealLoop::eliminate_useless_zero_trip_guard()` maybe also be an option? We are looping through all loops and check the `OpaqueZeroTripGuardNodes` anyways there. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3415927766 From jcking at openjdk.org Fri Oct 17 15:01:11 2025 From: jcking at openjdk.org (Justin King) Date: Fri, 17 Oct 2025 15:01:11 GMT Subject: RFR: 8369947: Bytecode rewriting causes Java heap corruption on RISC-V In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 15:23:28 GMT, Feilong Jiang wrote: > As discussed in https://github.com/openjdk/jdk/pull/27748#pullrequestreview-3341840431, the same issue occurs with the RISC-V port. > > Testing: > > - [x] tier1 - tier4 linux-riscv64 fastdebug Marked as reviewed by jcking (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27850#pullrequestreview-3350736178 From bkilambi at openjdk.org Fri Oct 17 15:02:32 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 17 Oct 2025 15:02:32 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 10:45:47 GMT, erifan wrote: > According the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. 
But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test failures on sve2 platforms where `MaxVectorSize > 16`. > > This test problem was discovered by simulating a 512-bit sve2 environment using qemu. > > This PR fixes these test failures. Hello, apologies for this late response. The patch looks ok to me; however, I am just running the JTREG test internally on a few aarch64 machines. I'll get back soon! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27723#issuecomment-3415945059 From bmaillard at openjdk.org Fri Oct 17 15:39:03 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 17 Oct 2025 15:39:03 GMT Subject: RFR: 8369435: C2: transform (LShiftX (SubX con0 a), con1) into (SubX con0< References: <9bFkcaxVVCBhXOpokcS1lMtBDhCyf4eCeWc6wN7Jkpo=.78d2d6b6-4f7e-4361-a361-bab524c35fb2@github.com> Message-ID: On Thu, 16 Oct 2025 13:18:01 GMT, Roland Westrelin wrote: >> We already transform: >> >> (LShiftX (AddX a con0), con1) into (AddX (LShiftX a con1) con0<<con1) >> >> This is a variant with SubX. I found that this helps RCE. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for making this change @rwestrel! Looks good to me, I only have nits regarding the comments. I have submitted testing and will come back with the results. src/hotspot/share/opto/mulnode.cpp line 1091: > 1089: } > 1090: } > 1091: // Transform is legal, but check for profit. Avoid breaking 'i2s' I would add a comment that explicitly states the pattern we are looking at; this makes reading the code much faster imo: Suggestion: // Check for "(con0 - X) << con1" // Transform is legal, but check for profit. Avoid breaking 'i2s' src/hotspot/share/opto/mulnode.cpp line 1099: > 1097: // Compute X << con0 > 1098: Node* lsh = phase->transform(LShiftNode::make(add1->in(2), in(2), bt)); > 1099: // Compute X< References: <3kb34wTVzDNE3qDozi9eO9_kWuJXfz4GPVcKTRo4veM=.9ae36b69-d193-4d4e-b993-6d6a35af245b@github.com> Message-ID: <1d6ZwG3l4ccxfOgDyJd3hvFbkX2E5t9dc0VEro0MbY8=.c09440d1-f6a3-462b-b1be-6869a5772f27@github.com> On Tue, 14 Oct 2025 22:33:40 GMT, Chad Rakoczy wrote: >> [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced a counter so that the nmethod immutable data can be shared between relocated nmethods to eliminate an unnecessary copy. The counter is aligned in memory so that must be taken into account when calculating the amount of memory used by the counter > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add reference counter offset My testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27778#pullrequestreview-3351187097 From duke at openjdk.org Fri Oct 17 17:22:06 2025 From: duke at openjdk.org (Shawn M Emery) Date: Fri, 17 Oct 2025 17:22:06 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v9] In-Reply-To: References: Message-ID: On Fri, 17 Oct 2025 12:59:42 GMT, Emanuel Peter wrote: >> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates for code review comments from @valeriepeng > > test/micro/org/openjdk/bench/javax/crypto/AESDecrypt.java line 44: > >> 42: >> 43: @Param("10000000") >> 44: private int count; > > Drive-by comment / question: > Did you do all benchmarking with this single (quite large) size?
How are the results for much smaller sizes? It may be worth it to just get a nice plot that goes over a range of sizes, to see if it behaves as expected. The benchmarks listed in the PR description execute tests for data sizes ranging from 16 to 10_000_000 bytes for decryption and encryption. The difference in performance between the old and new code were within SE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2440660231 From duke at openjdk.org Fri Oct 17 17:44:21 2025 From: duke at openjdk.org (duke) Date: Fri, 17 Oct 2025 17:44:21 GMT Subject: RFR: 8369642: [ubsan] nmethod::nmethod null pointer passed as argument 2 to memcpy [v2] In-Reply-To: References: <3kb34wTVzDNE3qDozi9eO9_kWuJXfz4GPVcKTRo4veM=.9ae36b69-d193-4d4e-b993-6d6a35af245b@github.com> Message-ID: On Tue, 14 Oct 2025 22:33:40 GMT, Chad Rakoczy wrote: >> [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced a counter so that the nmethod immutable data can be shared between relocated nmethods to eliminate an unnecessary copy. The counter is aligned in memory so that must be taken into account when calculating the amount of memory used by the counter > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add reference counter offset @chadrako Your change (at version 2827dca8f2e677b96f1694bf184c09052e56db90) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27778#issuecomment-3416498964 From duke at openjdk.org Fri Oct 17 18:06:12 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 17 Oct 2025 18:06:12 GMT Subject: Integrated: 8369642: [ubsan] nmethod::nmethod null pointer passed as argument 2 to memcpy In-Reply-To: <3kb34wTVzDNE3qDozi9eO9_kWuJXfz4GPVcKTRo4veM=.9ae36b69-d193-4d4e-b993-6d6a35af245b@github.com> References: <3kb34wTVzDNE3qDozi9eO9_kWuJXfz4GPVcKTRo4veM=.9ae36b69-d193-4d4e-b993-6d6a35af245b@github.com> Message-ID: On Mon, 13 Oct 2025 23:56:20 GMT, Chad Rakoczy wrote: > [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced a counter so that the nmethod immutable data can be shared between relocated nmethods to eliminate an unnecessary copy. The counter is aligned in memory so that must be taken into account when calculating the amount of memory used by the counter This pull request has now been integrated. Changeset: 0cb8ccd8 Author: Chad Rakoczy Committer: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/0cb8ccd89a659eaf1e245cfb7f8c32fb16bff4c7 Stats: 30 lines in 2 files changed: 14 ins; 10 del; 6 mod 8369642: [ubsan] nmethod::nmethod null pointer passed as argument 2 to memcpy Reviewed-by: kvn, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/27778 From kxu at openjdk.org Fri Oct 17 18:36:26 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Fri, 17 Oct 2025 18:36:26 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v14] In-Reply-To: References: Message-ID: <8tG69oVLSZGGqeFjzmmDAPHxKYHX1vSLQeWzBxf2fPo=.fc0127e0-f3dd-4dd3-be8b-c5656ba24b0d@github.com> > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. 
> > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: follow-up review 3321712957 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24458/files - new: https://git.openjdk.org/jdk/pull/24458/files/2cf1da18..1da190a5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=12-13 Stats: 169 lines in 3 files changed: 44 ins; 49 del; 76 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From kxu at openjdk.org Fri Oct 17 18:36:29 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Fri, 17 Oct 2025 18:36:29 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v13] In-Reply-To: References: <6zw0uSB1sUHZTyDUXDjiXcB0Chmu0XH1cEngzhG-UNk=.b239a687-cfb7-49a3-993a-34327a83c4de@github.com> Message-ID: On Fri, 10 Oct 2025 08:26:28 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8354383: C2: enable sinking of Type nodes out of loop >> >> Reviewed-by: chagedorn, thartmann >> (cherry picked from commit a2f99fd88bd03337e1ba73b413ffe4e39f3584cf) > > src/hotspot/share/opto/loopnode.cpp line 1659: > >> 1657: assert(exit_test.mask() != BoolTest::ne, "unexpected condition"); >> 1658: assert(iv_incr.phi_incr() == nullptr, "bad loop shape"); >> 1659: assert(exit_test.cmp()->in(1) == iv_incr.incr(), "bad exit test shape"); > > About these assertions in this method: Aren't these already implicitly checked with the `is_valid*()` checks further up? No. This method is performing more specific checks that normally don't happen (and are therefore in `ASSERT` and `assert()`) and that are not included in `.is_valid*()`. > [...] directly initialize `_exit_test` in the constructor of `CountedLoopConverter` by [...] Do you mean the constructor of `LoopStructure`? Also `_back_control` is actually just `_phase->loop_exit_control(_head, _loop)` which is already available in `LoopStructure`. I've made those changes to save reinitializations. > src/hotspot/share/opto/loopnode.cpp line 2460: > >> 2458: } >> 2459: >> 2460: return iff->in(0)->isa_SafePoint(); > > `isa_SafePoint()` will also return non-null for subclasses of `SafePoint` like `Call` nodes. But IIUC, we want to have only exact `SafePoint` nodes. Maybe @rwestrel can double-check. > > Same in your patch here: > > _sfpt = _loop->_child == nullptr > ? _phase->find_safepoint(_back_control, _head, _loop) > : _back_control->in(0)->in(0)->isa_SafePoint(); I believe this is a valid concern, although I'm not sure if this ever happens in practice. Added `sfpt->Opcode() == Op_SafePoint` condition. > src/hotspot/share/opto/loopnode.cpp line 3113: > >> 3111: _incr = n1; >> 3112: _trunc1 = t1; >> 3113: _trunc2 = t2; > > Can we find some better names instead of `trunc1` and `trunc2`? Renamed to `_outer_trunc` and `_inner_trunc`.
// Optional truncation for: CHAR: (i+1)&0x7fff, BYTE: ((i+1)<<8)>>8, or SHORT: ((i+1)<<16)>>16 Node* outer_trunc() const { return _outer_trunc; } // the outermost truncating node (either the & or the final >>) Node* inner_trunc() const { return _inner_trunc; } // the inner truncating node, if applicable (the << in a <> pair) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2440858361 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2440858412 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2440857836 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2440858015 From kxu at openjdk.org Fri Oct 17 18:45:43 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Fri, 17 Oct 2025 18:45:43 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v15] In-Reply-To: References: Message-ID: <5v9uvdC5zStLv_30k1n8eRGvtMjDe1_OuTfw6xfer8w=.822df4fa-96e9-4cf2-ac6b-c0e66e2f817a@github.com> > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: add safepoint opcode condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24458/files - new: https://git.openjdk.org/jdk/pull/24458/files/1da190a5..80c2a62a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=13-14 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From vlivanov at openjdk.org Fri Oct 17 20:01:50 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 17 Oct 2025 20:01:50 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v17] In-Reply-To: References: Message-ID: > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. 
In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25315/files - new: https://git.openjdk.org/jdk/pull/25315/files/d80aee5d..a491594e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=15-16 Stats: 8 lines in 2 files changed: 2 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From vlivanov at openjdk.org Fri Oct 17 20:01:51 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 17 Oct 2025 20:01:51 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v11] In-Reply-To: References: <1pShdyn-7-wwwiuY1DdMt5iiZ2qc9l_x2F-3AKqkg60=.dd260953-05cc-4b84-b6d1-7f684e74084c@github.com> Message-ID: On Thu, 18 Sep 2025 11:55:46 GMT, Emanuel Peter wrote: > would your stress flag catch the conflict? Yes, w/ -XX:+StressReachabilityFences it becomes very likely that a late inline call has reachability edges. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2441090706 From vlivanov at openjdk.org Fri Oct 17 20:01:53 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 17 Oct 2025 20:01:53 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v11] In-Reply-To: References: <1pShdyn-7-wwwiuY1DdMt5iiZ2qc9l_x2F-3AKqkg60=.dd260953-05cc-4b84-b6d1-7f684e74084c@github.com> Message-ID: On Fri, 12 Sep 2025 14:06:15 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor fix > > test/hotspot/jtreg/compiler/c2/TestReachabilityFence.java line 40: > >> 38: * @run main/othervm -Xbatch compiler.c2.TestReachabilityFence >> 39: */ >> 40: public class TestReachabilityFence { > > This test seems very important to me. Can you please add some extra code comments, about what goes wrong before the fix, i.e. 
if RF are not present? > > Maybe some explanation about what it took to write this test, so that we can build on that to extend the test later? I added more comments in the test code to elaborate on the problematic scenario. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2441049200 From duke at openjdk.org Fri Oct 17 20:07:05 2025 From: duke at openjdk.org (Shawn M Emery) Date: Fri, 17 Oct 2025 20:07:05 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v10] In-Reply-To: References: Message-ID: > General: > ----------- > i) This work is to replace the existing AES cipher under the Cryptix license. > > ii) The lookup tables are employed for performance, but also for operating in constant time. > > iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. > > iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. > > Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. > > Correctness: > ----------------- > The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: > > i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass > > -intrinsics mode for: > > ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass > > iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures > > iv) jdk_security_infra: passed, with 48 known failures > > v) tier1 and tier2: all 110257 tests pass > > Security: > ----------- > In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. > > Performance: > ------------------ > All AES related benchmarks have been executed against the new and original Cryptix code: > > micro:org.openjdk.bench.javax.crypto.AES > > micro:org.openjdk.bench.javax.crypto.full.AESBench > > micro:org.openjdk.bench.javax.crypto.full.AESExtraBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream > > micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. > > micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) > > The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption results: > > i) Default (no JVM options, non-intrinsics) mode: > > a) Encryption: the new code performed better for b... 
Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: Updates for code review comments from @valeriepeng ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27807/files - new: https://git.openjdk.org/jdk/pull/27807/files/2ce35b97..1102c609 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=08-09 Stats: 17 lines in 1 file changed: 0 ins; 1 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/27807.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27807/head:pull/27807 PR: https://git.openjdk.org/jdk/pull/27807 From duke at openjdk.org Fri Oct 17 20:07:06 2025 From: duke at openjdk.org (Shawn M Emery) Date: Fri, 17 Oct 2025 20:07:06 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v8] In-Reply-To: <5SGZrrpgKtrf7IKtfEDPFb4LnggKNoeaZpFVGuHu-p4=.1ec8dd26-e125-4c8a-ba22-9006c7da4024@github.com> References: <2_eqasQo7DtbnrxwxuFYvl_yhVh7P6wzlxwPEl_DB-Q=.ff81a933-2c29-4d69-826b-d1ccf04d2e1c@github.com> <5SGZrrpgKtrf7IKtfEDPFb4LnggKNoeaZpFVGuHu-p4=.1ec8dd26-e125-4c8a-ba22-9006c7da4024@github.com> Message-ID: On Fri, 17 Oct 2025 06:52:44 GMT, Shawn M Emery wrote: >> src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 911: >> >>> 909: } >>> 910: sessionK[0] = genRoundKeys(key, rounds); >>> 911: sessionK[1] = invGenRoundKeys(); >> >> Given the decryption round keys are somewhat based on the encryption round keys, we could combine these two methods into one, e.g. >> >> private static int[][] genRoundKeys(byte[] key, int rounds) { >> int[][] ks = new int[2][]; // key schedule >> >> int wLen = (rounds + 1) * WB; >> int nk = key.length / WB; >> >> // generate the round keys for encryption >> int[] w = new int[wLen]; >> for (int i = 0, j = 0; i < nk; i++, j+=4) { >> w[i] = ((key[j] & 0xFF) << 24) >> | ((key[j + 1] & 0xFF) << 16) >> | ((key[j + 2] & 0xFF) << 8) >> | (key[j + 3] & 0xFF); >> } >> for (int i = nk; i < wLen; i++) { >> int tmp = w[i - 1]; >> if (i % nk == 0) { >> int rW = (tmp << 8) & 0xFFFFFF00 | (tmp >>> 24); >> tmp = subWord(rW) ^ RCON[(i / nk) - 1]; >> } else if ((nk > 6) && ((i % nk) == WB)) { >> tmp = subWord(tmp); >> } >> w[i] = w[i - nk] ^ tmp; >> } >> ks[0] = w; >> >> // generate the decryption round keys based on encryption ones >> int[] dw = new int[wLen]; >> int[] temp = new int[WB]; >> >> // Intrinsics requires the inverse key expansion to be reverse order >> // except for the first and last round key as the first two round keys >> // are without a mix column transform. >> for (int i = 1; i < rounds; i++) { >> System.arraycopy(w, i * WB, temp, 0, WB); >> invMixRKey(temp); >> System.arraycopy(temp, 0, dw, wLen - (i * WB), WB); >> } >> // dw[0...3] <- w[0...3] AND dw[4...7] <- w[(wLen - 4)...(wLen -1)] >> System.arraycopy(w, 0, dw, 0, WB); >> System.arraycopy(w, wLen - WB, dw, WB, WB); >> ks[1] = dw; >> Arrays.fill(temp, 0); >> >> return ks; >> } > > These two methods were only the few that I was able to make that were compact and singular in purpose (gen round key, gen inverse round key) code as the coding style guidelines espouse. The rest of the methods' construction were dictated by performance improvements, where compactness came at the cost of interpreter speed. I did make changes based on your code to eliminate len and updates to variable names. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2441096947 From vlivanov at openjdk.org Fri Oct 17 20:08:24 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 17 Oct 2025 20:08:24 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2] In-Reply-To: References: <0WKwHjzEn5dxYLkonrk4h9yfMI3r3bKDdqgG06J69N4=.e19e9441-6197-4d53-a4f4-b196a81f69d8@github.com> <1FgOFS7aAlEbvVUez6iTfzgf2l7qUbL9C4wfSGmmfo0=.406c10f1-63d5-4333-af6d-525e46203182@github.com> Message-ID: On Fri, 12 Sep 2025 14:09:52 GMT, Emanuel Peter wrote: >> @eme64 I think I addressed/answered all your suggestions/questions. Please, take another look. Thanks! > > @iwanowww Thanks for the updates! I again only looked through most comments as well. > > These are the major topics for me: > - `StressReachabilityFences` only inserts RF where they are not needed. So this allows us to test the consistency of the RF machinery, but not to test if we are missing RF where they are needed. That is much harder, and we should probably invest in writing more tests for those cases, even if it is really hard. Maybe we can even write fuzzing tests for it? > - There seems to be missing support for carrying RF edges through incremental inlining, right? File an RFE, or track it elsewhere. Could we create a reproducer for this case / can we extend the existing one? https://github.com/openjdk/jdk/pull/25315#discussion_r2330095168 > - Are we sure that we don't eliminate the RF for the wrong allocation? https://github.com/openjdk/jdk/pull/25315#discussion_r2330230044 > - Extra compile-time due to extra loop-opts round. https://github.com/openjdk/jdk/pull/25315#discussion_r2330176841 . It used to be a 20% increase, now you managed to make it only 10%. Still considerable. All of it just to call `get_ctrl(referent)` in `enumerate_interfering_sfpts`. > > I think some of these issues should also be discussed in the PR description / JIRA description. > It would be especially nice if you could summarize the scope of the problem of RF, and which parts are now fixed, and which parts you know are not yet fixed. Of course there may be even more we don't know, but best write everything down we already do know. ;) > > Other ideas: > - You should file an RFE to add your stress flags to the stress job, and also the fuzzer. > - I did not yet study the reproducer `TestReachabilityFence.java`. We should consider making a fuzzer style test out of it, maybe using the template framework. Feel free to just file an RFE for that, and assign it to me. > > @shipilev @TobiHartmann @chhagedorn > I'm soon going on vacation (in a week), and so I'd like the other reviewers to be aware of these issues. > I don't want to hold up the patch, so feel free to have someone else review. But I'm also happy to come back to this mid October. @eme64 @dean-long I addressed your feedback in the latest version. Please, take another look. Thanks! (It passed testing up to hs-tier10 w/ `-XX:+StressReachabilityFences`.) In particular, I reverted the optimization to piggy-back on last PhaseIdealLoop pass to eliminate RF nodes. Filed JDK-8370127 to address it separately. I filed the following bugs/enhancements for follow-up work: 1. JDK-8370127 "C2: Improve ReachabilityFence elimination" 2. JDK-8370129 "C2: Support ReachabilityFence with constant referent" 3. JDK-8370131 "C2: Improve ReachabilityFence test coverage" 4. JDK-8370132 "C2: Enforce post-loop opts phase" 5. JDK-8370133 "C2: Manage non-debug safepoint edges in structural manner" 6. 
JDK-8370137 "C2: Ensure dependent memory accesses can't float past ReachabilityFence" ------------- PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3416934379 From valeriep at openjdk.org Fri Oct 17 20:21:02 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Fri, 17 Oct 2025 20:21:02 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v8] In-Reply-To: References: <2_eqasQo7DtbnrxwxuFYvl_yhVh7P6wzlxwPEl_DB-Q=.ff81a933-2c29-4d69-826b-d1ccf04d2e1c@github.com> <5SGZrrpgKtrf7IKtfEDPFb4LnggKNoeaZpFVGuHu-p4=.1ec8dd26-e125-4c8a-ba22-9006c7da4024@github.com> Message-ID: On Fri, 17 Oct 2025 20:01:21 GMT, Shawn M Emery wrote: >> These two methods were only the few that I was able to make that were compact and singular in purpose (gen round key, gen inverse round key) code as the coding style guidelines espouse. The rest of the methods' construction were dictated by performance improvements, where compactness came at the cost of interpreter speed. > > I did make changes based on your code to eliminate len and updates to variable names. Yes, I take a second look and maybe a smaller adjustments would work as well. E.g, 1) nit: method name `invGenRoundKeys` -> `genInvRoundKeys` 2) make this method static by passing `sessionKey[0]` and `rounds` as arguments, 3) no need for `len` since it's always `WB` 4) for the intermediate buffer of 4 words, can we not use `w` as this name is used in both the spec and genRoundKeys method as "Word array for the key schedule". It'd help people understand the code better if we adopt the same naming convention in "Algorithm 5 Pseudocode for KEYEXPANSIONEIC()", e.g. `temp` for the intermediate buffer and `dw` for the final result. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2441139964 From valeriep at openjdk.org Fri Oct 17 21:15:04 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Fri, 17 Oct 2025 21:15:04 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v10] In-Reply-To: References: Message-ID: On Fri, 17 Oct 2025 20:07:05 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. 
>> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. >> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Updates for code review comments from @valeriepeng src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 45: > 43: final class AES_Crypt extends SymmetricCipher { > 44: > 45: // Number of words in a block nit: from the usage, e.g. `int nk = key.length / WB`;, it seems WB means "number of bytes in a word". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2441263128 From valeriep at openjdk.org Fri Oct 17 21:39:05 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Fri, 17 Oct 2025 21:39:05 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v10] In-Reply-To: References: Message-ID: On Fri, 17 Oct 2025 20:07:05 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. 
>> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. >> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Updates for code review comments from @valeriepeng src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 134: > 132: }; > 133: > 134: private static final int[] T0 = { nit: add comment for all these precomputed lookup tables and their usage. Are these tables publicly available somewhere? I checked both spec in the class header and they don't have these included. I wonder if they are made available somewhere which corresponds with the current impl code better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2441320498 From duke at openjdk.org Fri Oct 17 22:19:26 2025 From: duke at openjdk.org (Shawn M Emery) Date: Fri, 17 Oct 2025 22:19:26 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v11] In-Reply-To: References: Message-ID: > General: > ----------- > i) This work is to replace the existing AES cipher under the Cryptix license. > > ii) The lookup tables are employed for performance, but also for operating in constant time. > > iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. > > iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. > > Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. 
> > Correctness: > ----------------- > The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: > > i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass > > -intrinsics mode for: > > ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass > > iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures > > iv) jdk_security_infra: passed, with 48 known failures > > v) tier1 and tier2: all 110257 tests pass > > Security: > ----------- > In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. > > Performance: > ------------------ > All AES related benchmarks have been executed against the new and original Cryptix code: > > micro:org.openjdk.bench.javax.crypto.AES > > micro:org.openjdk.bench.javax.crypto.full.AESBench > > micro:org.openjdk.bench.javax.crypto.full.AESExtraBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream > > micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. > > micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) > > The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption results: > > i) Default (no JVM options, non-intrinsics) mode: > > a) Encryption: the new code performed better for b... Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: Updates for code review comments from @valeriepeng ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27807/files - new: https://git.openjdk.org/jdk/pull/27807/files/1102c609..e0741b17 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=09-10 Stats: 22 lines in 1 file changed: 20 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27807.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27807/head:pull/27807 PR: https://git.openjdk.org/jdk/pull/27807 From duke at openjdk.org Fri Oct 17 22:19:28 2025 From: duke at openjdk.org (Shawn M Emery) Date: Fri, 17 Oct 2025 22:19:28 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v10] In-Reply-To: References: Message-ID: On Fri, 17 Oct 2025 21:12:21 GMT, Valerie Peng wrote: >> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates for code review comments from @valeriepeng > > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 45: > >> 43: final class AES_Crypt extends SymmetricCipher { >> 44: >> 45: // Number of words in a block > > nit: from the usage, e.g. `int nk = key.length / WB`;, it seems WB means "number of bytes in a word". I agree, it should be bytes per word for number of keys (nk) calculation, so BW? 
I want to preserved words per block (WB) for maintainability (e.g., if we decide to implement Rijndael-256, where WB = 8). Fixed. > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 134: > >> 132: }; >> 133: >> 134: private static final int[] T0 = { > > nit: add comment for all these precomputed lookup tables and their usage. > > Are these tables publicly available somewhere? I checked both spec in the class header and they don't have these included. I wonder if they are made available somewhere which corresponds with the current impl code better. I generated the tables separately, but their usage is referenced in the original specification cited in section 5.2.1. If made comments indicating of such. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2441377285 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2441377328 From fyang at openjdk.org Sat Oct 18 01:05:08 2025 From: fyang at openjdk.org (Fei Yang) Date: Sat, 18 Oct 2025 01:05:08 GMT Subject: RFR: 8369947: Bytecode rewriting causes Java heap corruption on RISC-V In-Reply-To: References: Message-ID: <4pe0PTQ-Z4vRVSs0U7rJKBKK1ArzZgIhkVNy7IW_EZ8=.0e2c73e6-8d21-4003-af48-804ef1d4cd9f@github.com> On Thu, 16 Oct 2025 15:23:28 GMT, Feilong Jiang wrote: > As discussed in https://github.com/openjdk/jdk/pull/27748#pullrequestreview-3341840431, the same issue occurs with the RISC-V port. > > Testing: > > - [x] tier1 - tier4 linux-riscv64 fastdebug Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27850#pullrequestreview-3352438525 From fyang at openjdk.org Sat Oct 18 01:05:10 2025 From: fyang at openjdk.org (Fei Yang) Date: Sat, 18 Oct 2025 01:05:10 GMT Subject: RFR: 8369947: Bytecode rewriting causes Java heap corruption on RISC-V In-Reply-To: <9tOiCAzqT5S7rt0Md8PUFy8s0az97uQVK2DPTPIo7Wo=.03840263-1742-42dd-9cf1-a867f0b9cff4@github.com> References: <6k8bFDES2UHSq3PJv8oj0sDdTDpBKnHMcifoA2szXSw=.73e161eb-6847-4632-8d20-78296ff091ec@github.com> <9tOiCAzqT5S7rt0Md8PUFy8s0az97uQVK2DPTPIo7Wo=.03840263-1742-42dd-9cf1-a867f0b9cff4@github.com> Message-ID: On Fri, 17 Oct 2025 07:10:05 GMT, Aleksey Shipilev wrote: > > > @shipilev @theRealAph : For the aarch64 counterpart, shouldn't the `ldarb` at [1] prevent the reordering of `STR` of PBC and `STLR` of RFE? It's a load instruction with acquire semantics. > > > > > > Yes, I was confused about this myself. The key thing for this particular issue: the _reader_ we need to sync up with is not `patch_bytecode`, it is the thread that _executes_ the patched bytecode. In other words, the _writer_ is `patch_bytecode`, and _reader_ is executing thread. > > So acquire barrier in `patch_bytecode` does not help this case, because it is a write path, it needs release, which aarch64 fix did. The read path needs some other synchronization for acquire-like semantics; in aarch64 we reasoned the control dependency on bytecode itself and the barrier in RFE resolution is already enough to do this. Nice analysis! I read it several times as well and I think I know what's going on now. Thanks. > RISCV is good on the read side, we just need this patch to fix the write: > > ``` > void InterpreterMacroAssembler::load_field_entry(Register cache, Register index, int bcp_offset) { > ... 
> // Get address of field entries array > ld(cache, Address(xcpool, ConstantPoolCache::field_entries_offset())); > addi(cache, cache, Array::base_offset_in_bytes()); > add(cache, cache, index); > // Prevents stale data from being read after the bytecode is patched to the fast bytecode > membar(MacroAssembler::LoadLoad); > } > ``` Yes, we only need the necessary barrier on the write side. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27850#issuecomment-3417652631 From fjiang at openjdk.org Sat Oct 18 01:12:12 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Sat, 18 Oct 2025 01:12:12 GMT Subject: RFR: 8369947: Bytecode rewriting causes Java heap corruption on RISC-V In-Reply-To: References: Message-ID: <_wdM0LvZTOKsXxF3XIG6euqTi9L7eiBRptQuZGncDnQ=.92c83bf1-87ac-4503-adec-6f255b43ddc7@github.com> On Thu, 16 Oct 2025 15:23:28 GMT, Feilong Jiang wrote: > As discussed in https://github.com/openjdk/jdk/pull/27748#pullrequestreview-3341840431, the same issue occurs with the RISC-V port. > > Testing: > > - [x] tier1 - tier4 linux-riscv64 fastdebug Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27850#issuecomment-3417656687 From fjiang at openjdk.org Sat Oct 18 01:12:13 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Sat, 18 Oct 2025 01:12:13 GMT Subject: Integrated: 8369947: Bytecode rewriting causes Java heap corruption on RISC-V In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 15:23:28 GMT, Feilong Jiang wrote: > As discussed in https://github.com/openjdk/jdk/pull/27748#pullrequestreview-3341840431, the same issue occurs with the RISC-V port. > > Testing: > > - [x] tier1 - tier4 linux-riscv64 fastdebug This pull request has now been integrated. Changeset: 46251993 Author: Feilong Jiang URL: https://git.openjdk.org/jdk/commit/462519935827e25475f2fb35746ad81a14bc5da7 Stats: 22 lines in 3 files changed: 21 ins; 0 del; 1 mod 8369947: Bytecode rewriting causes Java heap corruption on RISC-V Reviewed-by: aph, jcking, fyang ------------- PR: https://git.openjdk.org/jdk/pull/27850 From fandreuzzi at openjdk.org Sat Oct 18 09:54:05 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Sat, 18 Oct 2025 09:54:05 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v4] In-Reply-To: References: Message-ID: <7dJKHoq6_LDsL_BAaorip6w78YYI_bh1qY5jWeb2ehk=.44cfd4c7-1111-4422-a3ad-aba03ca05d99@github.com> On Sat, 11 Oct 2025 18:25:48 GMT, Francesco Andreuzzi wrote: >> I propose to amend `nmethod::is_cold` to let GC collect not-entrant native `nmethod` instances. >> >> Passes tier1 and tier2 (fastdebug). > > Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: > > update foundOne Marking as draft, I'll take some measurements to check if removing nmethod entry barriers for native stubs gives a measurable benefit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27742#issuecomment-3418137388 From fandreuzzi at openjdk.org Sat Oct 18 10:16:02 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Sat, 18 Oct 2025 10:16:02 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v4] In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 23:40:08 GMT, Dean Long wrote: >>> We could instead allow them to be cleaned up like regular nmethods. >> >> That sounds reasonable to me, native methods seem to be tracked like all other nmethods. 
>> >> Removing `is_native_method()` altogether from the condition was the first implementation I had, and as far as I remember there was no failure in tier1 or tier2. Should I propose this alternative implementation as part of this PR? > > I am tempted to say yes, for consistency, but it probably won't make much of a difference either way. But now I am wondering, if these cold native wrappers continue to be immortal, then do they really need to give them nmethod entry barriers? Removing the barrier could remove some overhead. Whatever direction we decide to go, it would be good to add a comment here explaining the decision and/or trade-offs. Is it actually possible to remove entry barriers for _any_ garbage collectible nmethod? How can we know an nmethod is not used anymore, even when it is made not entrant? `is_cold()` bails out when an nmethod does not support entry barriers: // On platforms that don't support nmethod entry barriers, we can't // trust the temporal aspect of the gc epochs. So we can't detect // cold nmethods on such platforms. So, the decision of removing entry barriers for native nmethods would make the memory leak I'm trying to fix here effectively unfixable? Let me know if I'm missing something. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27742#discussion_r2442009918 From wenanjian at openjdk.org Sat Oct 18 11:31:37 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Sat, 18 Oct 2025 11:31:37 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v11] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - Merge branch 'openjdk:master' into aes_ctr - add assertion and change test - add zbb and zvbb check - Merge branch 'openjdk:master' into aes_ctr - Merge branch 'openjdk:master' into aes_ctr - fix the counter increase at limit and add test - change format - update reg use and instruction - change some name and format - delete useless Label, change L_judge_used to L_slow_loop - ... and 2 more: https://git.openjdk.org/jdk/compare/eff6439e...716825a4 ------------- Changes: https://git.openjdk.org/jdk/pull/25281/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=10 Stats: 283 lines in 3 files changed: 278 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From aph at openjdk.org Sat Oct 18 12:08:10 2025 From: aph at openjdk.org (Andrew Haley) Date: Sat, 18 Oct 2025 12:08:10 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v11] In-Reply-To: References: Message-ID: On Sat, 18 Oct 2025 11:31:37 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. > > Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 12 commits: > > - Merge branch 'openjdk:master' into aes_ctr > - add assertion and change test > - add zbb and zvbb check > - Merge branch 'openjdk:master' into aes_ctr > - Merge branch 'openjdk:master' into aes_ctr > - fix the counter increase at limit and add test > - change format > - update reg use and instruction > - change some name and format > - delete useless Label, change L_judge_used to L_slow_loop > - ... and 2 more: https://git.openjdk.org/jdk/compare/eff6439e...716825a4 src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2626: > 2624: // > 2625: address generate_counterMode_AESCrypt() { > 2626: assert(UseAESCTRIntrinsics, "need AES instructions (Zvkned extension) support"); It's hard for anyone to understand the control flow. If you look at the same routine in the AArch64 port you'll see plenty of comments to help the reader. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2442332565 From duke at openjdk.org Sun Oct 19 02:18:43 2025 From: duke at openjdk.org (Shawn M Emery) Date: Sun, 19 Oct 2025 02:18:43 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v12] In-Reply-To: References: Message-ID: <3nnB5pkNnYqYi6OAH3u83PNmiO607_xwHXCmCeIE7gA=.371aa791-952e-4143-9012-1728dbf31ae9@github.com> > General: > ----------- > i) This work is to replace the existing AES cipher under the Cryptix license. > > ii) The lookup tables are employed for performance, but also for operating in constant time. > > iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. > > iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. > > Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. > > Correctness: > ----------------- > The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: > > i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass > > -intrinsics mode for: > > ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass > > iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures > > iv) jdk_security_infra: passed, with 48 known failures > > v) tier1 and tier2: all 110257 tests pass > > Security: > ----------- > In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. > > Performance: > ------------------ > All AES related benchmarks have been executed against the new and original Cryptix code: > > micro:org.openjdk.bench.javax.crypto.AES > > micro:org.openjdk.bench.javax.crypto.full.AESBench > > micro:org.openjdk.bench.javax.crypto.full.AESExtraBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream > > micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. 
> > micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) > > The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption results: > > i) Default (no JVM options, non-intrinsics) mode: > > a) Encryption: the new code performed better for b... Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: Updates for code review comments from @valeriepeng ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27807/files - new: https://git.openjdk.org/jdk/pull/27807/files/e0741b17..5ea6933b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=10-11 Stats: 19 lines in 1 file changed: 0 ins; 0 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/27807.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27807/head:pull/27807 PR: https://git.openjdk.org/jdk/pull/27807 From duke at openjdk.org Sun Oct 19 02:18:44 2025 From: duke at openjdk.org (Shawn M Emery) Date: Sun, 19 Oct 2025 02:18:44 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v8] In-Reply-To: References: <2_eqasQo7DtbnrxwxuFYvl_yhVh7P6wzlxwPEl_DB-Q=.ff81a933-2c29-4d69-826b-d1ccf04d2e1c@github.com> <5SGZrrpgKtrf7IKtfEDPFb4LnggKNoeaZpFVGuHu-p4=.1ec8dd26-e125-4c8a-ba22-9006c7da4024@github.com> Message-ID: <7kU60fP_Aw4oiwfwMN48YfFwC70wVObcHQpSXqZvyC4=.5094fe4c-de7e-4a96-8000-121a9db66ae5@github.com> On Fri, 17 Oct 2025 20:18:24 GMT, Valerie Peng wrote: >> I did make changes based on your code to eliminate len and updates to variable names. > > Yes, I take a second look and maybe a smaller adjustments would work as well. E.g, > 1) nit: method name `invGenRoundKeys` -> `genInvRoundKeys` > 2) make this method static by passing `sessionKey[0]` and `rounds` as arguments, > 3) no need for `len` since it's always `WB` > 4) for the intermediate buffer of 4 words, can we not use `w` as this name is used in both the spec and genRoundKeys method as "Word array for the key schedule". It'd help people understand the code better if we adopt the same naming convention in "Algorithm 5 Pseudocode for KEYEXPANSIONEIC()", e.g. `temp` for the intermediate buffer and `dw` for the final result. Sorry, missed this comment in the melee. Re: 1) method name, agreed; 2) to static, agreed; 3) remove len, prior commit; 4) variable name alignment, agreed. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2442722654 From duke at openjdk.org Sun Oct 19 10:01:04 2025 From: duke at openjdk.org (Tobias Hotz) Date: Sun, 19 Oct 2025 10:01:04 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v7] In-Reply-To: References: Message-ID: <3HJu_S1E43xIC7b_KglB7EUTIqry7lLFmgvC75OlOwc=.aef64d7e-2a60-4fa7-934b-00adf7dd5e9c@github.com> > This PR improves the value of interger division nodes. > Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case > We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. 
> This also cleans up and unifies the code paths for DivINode and DivLNode. > I've added some tests to validate the optimization. Without the changes, some of these tests fail. Tobias Hotz has updated the pull request incrementally with two additional commits since the last revision: - Adjust long constant folding test as well - Adjust test, assert and comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26143/files - new: https://git.openjdk.org/jdk/pull/26143/files/bb9151e4..4bb2e0c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=05-06 Stats: 209 lines in 2 files changed: 187 ins; 3 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/26143.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26143/head:pull/26143 PR: https://git.openjdk.org/jdk/pull/26143 From duke at openjdk.org Sun Oct 19 10:24:05 2025 From: duke at openjdk.org (Tobias Hotz) Date: Sun, 19 Oct 2025 10:24:05 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v7] In-Reply-To: <3HJu_S1E43xIC7b_KglB7EUTIqry7lLFmgvC75OlOwc=.aef64d7e-2a60-4fa7-934b-00adf7dd5e9c@github.com> References: <3HJu_S1E43xIC7b_KglB7EUTIqry7lLFmgvC75OlOwc=.aef64d7e-2a60-4fa7-934b-00adf7dd5e9c@github.com> Message-ID: On Sun, 19 Oct 2025 10:01:04 GMT, Tobias Hotz wrote: >> This PR improves the value of interger division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with two additional commits since the last revision: > > - Adjust long constant folding test as well > - Adjust test, assert and comments Thanks for all your reviews! I've adjusted the tests and comments as per your suggestions. Please review again. And sorry for the long delay @mhaessig ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26143#issuecomment-3419554041 From duke at openjdk.org Sun Oct 19 10:24:10 2025 From: duke at openjdk.org (Tobias Hotz) Date: Sun, 19 Oct 2025 10:24:10 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v6] In-Reply-To: <586YaR01-YPK1-N4UvVo-WM2HfTGvZz8lz3leP42BTA=.d125635f-069b-43ba-a623-86e26e693bb2@github.com> References: <586YaR01-YPK1-N4UvVo-WM2HfTGvZz8lz3leP42BTA=.d125635f-069b-43ba-a623-86e26e693bb2@github.com> Message-ID: On Mon, 25 Aug 2025 12:26:02 GMT, Emanuel Peter wrote: >> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove too strict assert from old code path > > src/hotspot/share/opto/divnode.cpp line 508: > >> 506: >> 507: template >> 508: static const IntegerType* compute_generic_div_type(const IntegerType* i1, const IntegerType* i2, int widen) { > > Do we need the `generic` in the name? The `template` already suggests that it can be used for different types, right? 
> > Also: I'm wondering if we can somehow extend this for `UDivI` and `UDIvL`. > I suppose you would have to use the `_ulo` and `_uhi` instead of `_lo` and `_hi`. > > I'm not saying this all has to be done in this PR, but we could at least anticipate the extension to unsigned division. I've adjusted it to `compute_signed_div_type`. I would not worry about unsigned division now and would leave this for a later RFE. Would you perhaps create a issue for that? I have no permission for the JBS > src/hotspot/share/opto/divnode.cpp line 533: > >> 531: // Here i2 is entirely negative or entirely positive. >> 532: // Let d_min and d_max be the nonzero endpoints of i2. >> 533: // Then a/b is monotonic in a and in b (when b keeps the same sign). > > I think you should talk about `i1` and `i2`. You have not defined `a` and `b` up to now. Yeah, fixed > src/hotspot/share/opto/divnode.cpp line 547: > >> 545: // Special overflow case: min_val / (-1) == min_val (cf. JVMS?6.5 idiv/ldiv) >> 546: // We need to be careful that we never run min_val / (-1) in C++ code, as this overflow is UB there >> 547: // We also must include min_val in the output if i1->_lo == min_val and i2->_hi. > > `if i1->_lo == min_val and i2->_hi` I cannot parse this. > The `if` suggests that there will be a condition following. The `and` confirms that. > Then I see `i1->_lo == min_val` which is a boolean condition. But `i2->_hi` is not. > Ah, did you mean this the condition from below? > Suggestion: > > // We also must include min_val in the output if i1->_lo == min_val and i2->_hi == -1. I've stripped this line entirely. It doesn't make much sense in this version and is a leftover > src/hotspot/share/opto/divnode.cpp line 552: > >> 550: NativeType new_lo = min_val; >> 551: NativeType new_hi; >> 552: // compute new_hi for non-constant divisor and/or dividend. > > You suggest we only land here in non-constant cases. Is that true? > What if `i1=min_val` and `i2=-1`? Yeah this comment was outdated. I've reworded it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2443238083 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2443237457 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2443237301 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2443237018 From duke at openjdk.org Sun Oct 19 10:24:13 2025 From: duke at openjdk.org (Tobias Hotz) Date: Sun, 19 Oct 2025 10:24:13 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v6] In-Reply-To: References: Message-ID: On Mon, 25 Aug 2025 09:19:24 GMT, Manuel H?ssig wrote: >> src/hotspot/share/opto/divnode.cpp line 543: >> >>> 541: NativeType i2_hi = i2->_hi == 0 ? -1 : i2->_hi; >>> 542: NativeType min_val = std::numeric_limits::min(); >>> 543: assert(min_val == min_jint || min_val == min_jlong, "min has to be either min_jint or min_jlong"); >> >> I find this assert a little confusing, as its outcome is completely independent from the inputs of the function. I would remove it > > It depends on the template type. I would rather keep it to sanity check that the minimum value of `NativeType` is as we expect. If that does not hold, the optimization below is potentially wrong and has UB. As @mhaessig said, this is a simple sanity check and helps establishing a mental model of what min is (IMO). I've turned it into a static_assert. If we need more types later on, I can expand it. 
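For reference, the corner-evaluation idea behind `compute_signed_div_type()` can be sketched in plain Java, outside of C2's type system. This is an illustration only, not the divnode.cpp code: it splits the divisor range at zero, assumes that range contains at least one non-zero value, and special-cases MIN_VALUE / -1, which wraps to MIN_VALUE under Java/JVM semantics but must not be evaluated in the C++ implementation because signed overflow is UB there.

```
// Sketch: bounds of a / b for a in [alo, ahi], b in [blo, bhi] \ {0},
// using truncated (Java) division. Illustration of the four-corner idea only.
final class DivRangeSketch {
    static long[] divRange(long alo, long ahi, long blo, long bhi) {
        long lo = Long.MAX_VALUE, hi = Long.MIN_VALUE;
        // Division by zero traps, so only the strictly negative and strictly
        // positive parts of the divisor range contribute values.
        long[][] divisorParts = {
            { blo, Math.min(bhi, -1L) },   // negative part, may be empty
            { Math.max(blo, 1L), bhi }     // positive part, may be empty
        };
        for (long[] part : divisorParts) {
            long dlo = part[0], dhi = part[1];
            if (dlo > dhi) {
                continue;  // this side of zero is empty
            }
            // a / b is monotonic in each argument while b keeps its sign,
            // so the extrema are attained at the four corners.
            for (long a : new long[] { alo, ahi }) {
                for (long b : new long[] { dlo, dhi }) {
                    long q = (a == Long.MIN_VALUE && b == -1L)
                            ? Long.MIN_VALUE   // the one overflowing corner
                            : a / b;
                    lo = Math.min(lo, q);
                    hi = Math.max(hi, q);
                }
            }
        }
        return new long[] { lo, hi };  // assumes the divisor range was not just {0}
    }
}
```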
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2443238389 From duke at openjdk.org Sun Oct 19 10:24:14 2025 From: duke at openjdk.org (Tobias Hotz) Date: Sun, 19 Oct 2025 10:24:14 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v6] In-Reply-To: References: Message-ID: <1Ms-CvPEHV7FmwDQSw08ntE8w98y1hTTlW1H4EjFl44=.768c656c-d1b1-4efc-9e9a-fd30478050d5@github.com> On Mon, 25 Aug 2025 14:27:23 GMT, Johannes Graham wrote: >> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove too strict assert from old code path > > test/hotspot/jtreg/compiler/c2/irTests/IntegerDivValueTests.java line 49: > >> 47: public int testIntConstantFolding() { >> 48: // All constants available during parsing >> 49: return 50 / 25; > > This will be constant-folded by javac, so won't exercise c2 Whoops, thanks! Fixed in the latest commit ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2443236876 From hgreule at openjdk.org Sun Oct 19 13:06:08 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Sun, 19 Oct 2025 13:06:08 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v7] In-Reply-To: <3HJu_S1E43xIC7b_KglB7EUTIqry7lLFmgvC75OlOwc=.aef64d7e-2a60-4fa7-934b-00adf7dd5e9c@github.com> References: <3HJu_S1E43xIC7b_KglB7EUTIqry7lLFmgvC75OlOwc=.aef64d7e-2a60-4fa7-934b-00adf7dd5e9c@github.com> Message-ID: On Sun, 19 Oct 2025 10:01:04 GMT, Tobias Hotz wrote: >> This PR improves the value of interger division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with two additional commits since the last revision: > > - Adjust long constant folding test as well > - Adjust test, assert and comments I've got a few comments src/hotspot/share/opto/divnode.cpp line 651: > 649: if( (t1 == bot) || (t2 == bot) || > 650: (t1 == Type::BOTTOM) || (t2 == Type::BOTTOM) ) > 651: return bot; I think this can be removed - and in cases where one side is the local bottom (i.e., `TypeInt::INT`) and the other is more restricted, the result should even more precise after removing. Could you also add tests for such cases? For example dividing `TypeInt::INT` by some interval with a lower bound of 2, the resulting range can be narrowed. Similarly, dividing some small interval `[lo, hi]` by `TypeInt::INT` should result in a similar interval with bounds adjusted to deal with sign changes. If I didn't miss something, your code should already be able to deal with this, it's just this early return here preventing it. src/hotspot/share/opto/divnode.cpp line 657: > 655: const TypeInt *i1 = t1->is_int(); > 656: const TypeInt *i2 = t2->is_int(); > 657: int widen = MAX2(i1->_widen, i2->_widen); This line can be moved into `compute_signed_div_type` as well. 
I guess it would also be okay to adjust the formatting in the rest of the method, i.e., `T *v` -> `T* v` and `if( a )` -> `if (a)` etc. src/hotspot/share/opto/divnode.cpp line 662: > 660: if (d == 0) { > 661: // this division will always throw an exception > 662: return Type::TOP; I'd personally prefer to have that code directly below the other cases returning TOP. You can also probably simplify it by testing for `i2 == TypeInt::ZERO` instead. test/hotspot/jtreg/compiler/c2/irTests/IntegerDivValueTests.java line 55: > 53: // All constants available during parsing > 54: return getIntConstant(50) / getIntConstant(25); > 55: } Could you also add a test with randomly generated constants? Then you also don't need to use `getIntConstant` here. It would also make sense to write a short comment for `getIntConstant` to clarify why it's needed. When dividing by a constant you might also want to make sure that other nodes generated in `transform_int/long_divide` also aren't present here anymore. I'd also assume that e.g., `v / C > Integer.MAX_VALUE / C` for a positive constant C wouldn't be fully optimized away due to https://bugs.openjdk.org/browse/JDK-8366815, can you confirm that? ------------- PR Review: https://git.openjdk.org/jdk/pull/26143#pullrequestreview-3354507481 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2443289805 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2443296455 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2443295564 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2443302805 From hgreule at openjdk.org Sun Oct 19 17:36:37 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Sun, 19 Oct 2025 17:36:37 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation Message-ID: The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. Please let me know what you think. ------------- Commit messages: - delay integral Div/Mod Ideal() until IGVN - test Changes: https://git.openjdk.org/jdk/pull/27886/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27886&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8366815 Stats: 108 lines in 3 files changed: 106 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27886/head:pull/27886 PR: https://git.openjdk.org/jdk/pull/27886 From duke at openjdk.org Sun Oct 19 19:20:58 2025 From: duke at openjdk.org (Tobias Hotz) Date: Sun, 19 Oct 2025 19:20:58 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v8] In-Reply-To: References: Message-ID: > This PR improves the value of interger division nodes. > Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case > We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. > This also cleans up and unifies the code paths for DivINode and DivLNode. > I've added some tests to validate the optimization. Without the changes, some of these tests fail. 
Tobias Hotz has updated the pull request incrementally with two additional commits since the last revision: - Add additional nodes to fail conditions to detect idealized/transformed DivI Nodes that did not constant fold - Remove checks for bottom and reorganize DivI/DivL Value functions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26143/files - new: https://git.openjdk.org/jdk/pull/26143/files/4bb2e0c4..e2a2bcdf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=06-07 Stats: 104 lines in 2 files changed: 31 ins; 32 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/26143.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26143/head:pull/26143 PR: https://git.openjdk.org/jdk/pull/26143 From duke at openjdk.org Sun Oct 19 19:21:01 2025 From: duke at openjdk.org (Tobias Hotz) Date: Sun, 19 Oct 2025 19:21:01 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v7] In-Reply-To: References: <3HJu_S1E43xIC7b_KglB7EUTIqry7lLFmgvC75OlOwc=.aef64d7e-2a60-4fa7-934b-00adf7dd5e9c@github.com> Message-ID: On Sun, 19 Oct 2025 12:31:06 GMT, Hannes Greule wrote: >> Tobias Hotz has updated the pull request incrementally with two additional commits since the last revision: >> >> - Adjust long constant folding test as well >> - Adjust test, assert and comments > > src/hotspot/share/opto/divnode.cpp line 651: > >> 649: if( (t1 == bot) || (t2 == bot) || >> 650: (t1 == Type::BOTTOM) || (t2 == Type::BOTTOM) ) >> 651: return bot; > > I think this can be removed - and in cases where one side is the local bottom (i.e., `TypeInt::INT`) and the other is more restricted, the result should even more precise after removing. Could you also add tests for such cases? For example dividing `TypeInt::INT` by some interval with a lower bound of 2, the resulting range can be narrowed. Similarly, dividing some small interval `[lo, hi]` by `TypeInt::INT` should result in a similar interval with bounds adjusted to deal with sign changes. If I didn't miss something, your code should already be able to deal with this, it's just this early return here preventing it. I think you are correct. The only part where I am not sure is if every instance where i1/i2 can be Type::BOTTOM, i1/i2 can be cast to TypeInt. Can someone please confirm the removal is safe? > test/hotspot/jtreg/compiler/c2/irTests/IntegerDivValueTests.java line 55: > >> 53: // All constants available during parsing >> 54: return getIntConstant(50) / getIntConstant(25); >> 55: } > > Could you also add a test with randomly generated constants? Then you also don't need to use `getIntConstant` here. It would also make sense to write a short comment for `getIntConstant` to clarify why it's needed. > > When dividing by a constant you might also want to make sure that other nodes generated in `transform_int/long_divide` also aren't present here anymore. I'd also assume that e.g., `v / C > Integer.MAX_VALUE / C` for a positive constant C wouldn't be fully optimized away due to https://bugs.openjdk.org/browse/JDK-8366815, can you confirm that? I've changed this test to use random constants and added random constants. Together with stricter IR verification, I also found two cases where JDK-8366815 actually causes two long division to not constant fold due to the transformation being too eager. I've linked the issue in a comment on the two cases and disabled IR verification. 
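To make the javac point from the earlier review comments concrete: a literal expression such as `50 / 25` is a compile-time constant expression (JLS 15.29), so javac folds it and the method never contains a division for C2 to optimize; routing the operands through a method call keeps the `idiv` in the bytecode. The helper below is only a stand-in for the idea, the actual `getIntConstant` in the test may look different.

```
// Stand-in for the test's constant helper: a method call is never a constant
// expression, so javac cannot fold the division at compile time.
final class ConstantFoldingSketch {
    static int opaqueConstant(int value) {
        return value;
    }

    static int foldedByJavac() {
        return 50 / 25;                                   // javac emits iconst_2, no idiv
    }

    static int reachesC2() {
        return opaqueConstant(50) / opaqueConstant(25);   // idiv survives for C2 to constant-fold
    }
}
```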
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2443450993 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2443450184 From hgreule at openjdk.org Sun Oct 19 20:07:08 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Sun, 19 Oct 2025 20:07:08 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v7] In-Reply-To: References: <3HJu_S1E43xIC7b_KglB7EUTIqry7lLFmgvC75OlOwc=.aef64d7e-2a60-4fa7-934b-00adf7dd5e9c@github.com> Message-ID: On Sun, 19 Oct 2025 19:15:06 GMT, Tobias Hotz wrote: >> test/hotspot/jtreg/compiler/c2/irTests/IntegerDivValueTests.java line 55: >> >>> 53: // All constants available during parsing >>> 54: return getIntConstant(50) / getIntConstant(25); >>> 55: } >> >> Could you also add a test with randomly generated constants? Then you also don't need to use `getIntConstant` here. It would also make sense to write a short comment for `getIntConstant` to clarify why it's needed. >> >> When dividing by a constant you might also want to make sure that other nodes generated in `transform_int/long_divide` also aren't present here anymore. I'd also assume that e.g., `v / C > Integer.MAX_VALUE / C` for a positive constant C wouldn't be fully optimized away due to https://bugs.openjdk.org/browse/JDK-8366815, can you confirm that? > > I've changed this test to use random constants and added random constants. > Together with stricter IR verification, I also found two cases where JDK-8366815 actually causes two long division to not constant fold due to the transformation being too eager. I've linked the issue in a comment on the two cases and disabled IR verification. Interesting find! I'm wondering if division by a constant (but with a variable dividend) after idealization can *always* give a type as precise as your implementation. I.e., you divide by 12 in the `testIntRange` test, but this seems to already constant-fold on the current master branch. If your changes don't improve anything for constant divisors, that's still totally fine (because it helps with non-constant divisors, and such tests are still valuable to avoid regressions), but it would be interesting to know. I created https://bugs.openjdk.org/browse/JDK-8370196 regarding the MulHiNode deficit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2443468855 From duke at openjdk.org Mon Oct 20 01:22:29 2025 From: duke at openjdk.org (ExE Boss) Date: Mon, 20 Oct 2025 01:22:29 GMT Subject: RFR: 8358749: Fix input checks in Vector API intrinsics In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 06:14:34 GMT, Aleksey Shipilev wrote: >> We have been carrying this patch in Leyden/premain for a while: https://github.com/openjdk/leyden/commit/7faed7fc5c8e1bbd9a16ab22673a77099396179c. I believe it deserves to be in mainline. I polished it a little further. >> >> It is _mostly_ a cleanup, but there are also new checks, on the paths where we do take constants off the arguments. In those cases, I believe the alternative is compiler SEGV-ing. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `hotspot_vector_1 hotspot_vector_2` >> - [x] Linux x86_64 server fastdebug, `jdk_vector` > > Thanks for reviews! Here goes. @shipilev > I believe that comment should be in another bug. Can?thou?create that?bug?report? 
(I don't have an OpenJDK account to create it with) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25673#issuecomment-3420189827 From xgong at openjdk.org Mon Oct 20 02:49:14 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 20 Oct 2025 02:49:14 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v6] In-Reply-To: References: <8cCO3_XSLXtzulIYd3AVHvzQzgkQ9CVVepy61I2QkiI=.8fdee0d5-520a-4987-9b55-cc1b559f37aa@github.com> Message-ID: On Thu, 16 Oct 2025 03:11:27 GMT, Xiaohong Gong wrote: > > I suspect it's likely more complex overall adding a slice operation to mask, that is really only needed for a specific case. (A more general operation would be compress/expand of the mask bits, but i don't believe there are hardware instructions for such operations on mask registers.) > > Yes, I agree with you. Personally, I'd prefer not to introduce such APIs for a vector mask.
After evaluating all available AArch64 SVE instructions and experimenting with various implementations?such as looping over the active elements, extraction, and insertion?I confirmed that the existing algorithm is optimal given the instruction set. However, there is still room for optimization in the following two aspects: >> 1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction. >> 2. For partial subword types, operations to the highest half are unnecessary because those bits are invalid. >> >> This pull request introduces the following changes: >> 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput. >> 2. Eliminates unnecessary compress operations for partial subword type cases. >> 3. For `sve_compress_byte`, one less temporary register is used to alleviate potential register pressure. >> >> Benchmark results demonstrate that these changes significantly improve performance. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE: >> >> Benchmark Unit Before Error After Error Uplift >> Byte128Vector.compress ops/ms 4846.97 26.23 6638.56 31.60 1.36 >> Byte64Vector.compress ops/ms 2447.69 12.95 7167.68 34.49 2.92 >> Short128Vector.compress ops/ms 7174.88 40.94 8398.45 9.48 1.17 >> Short64Vector.compress ops/ms 3618.72 3.04 8618.22 10.91 2.38 >> >> >> This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed. > > erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Enable the IR test for x86 > - Merge branch 'master' into JDK-8366333-compress > - Improve coding style a bit > - Improve some code style > - Merge branch 'master' into JDK-8366333-compress > - Merge branch 'master' into JDK-8366333-compress > - 8366333: AArch64: Enhance SVE subword type implementation of vector compress > > The AArch64 SVE and SVE2 architectures lack an instruction suitable for > subword-type `compress` operations. Therefore, the current implementation > uses the 32-bit SVE `compact` instruction to compress subword types by > first widening the high and low parts to 32 bits, compressing them, and > then narrowing them back to their original type. Finally, the high and > low parts are merged using the `index + tbl` instructions. > > This approach is significantly slower compared to architectures with native > support. After evaluating all available AArch64 SVE instructions and > experimenting with various implementations?such as looping over the active > elements, extraction, and insertion?I confirmed that the existing algorithm > is optimal given the instruction set. However, there is still room for > optimization in the following two aspects: > 1. Merging with `index + tbl` is suboptimal due to the high latency of > the `index` instruction. > 2. For partial subword types, operations to the highest half are unnecessary > because those bits are invalid. > > This pull request introduces the following changes: > 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which > offer lower latency and higher throughput. > 2. Eliminates unnecessary compress operations for partial subword type cases. > 3. For `sve_compress_byte`, one less temporary register is used to alleviate > potential register pressure. > > Benchmark results demonstrate that these changes significantly improve performance. 
> > Benchmarks on Nvidia Grace machine with 128-bit SVE: > ``` > Benchmark Unit Before Error After Error Uplift > Byte128Vector.compress ops/ms 4846.97 26.23 6638.56 31.60 1.36 > Byte64Vector.compress ops/ms 2447.69 12.95 7167.68 34.49 2.92 > Short128Vector.compress ops/ms 7174.88 40.94 8398.45 9.48 1.17 > Short64Vector.compress ops/ms 3618.72 3.04 8618.22 10.91 2.38 > ``` > > This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, > and all... LGTM! Thanks! Reviewed internally. ------------- Marked as reviewed by xgong (Committer). PR Review: https://git.openjdk.org/jdk/pull/27188#pullrequestreview-3354913291 From duke at openjdk.org Mon Oct 20 03:08:04 2025 From: duke at openjdk.org (erifan) Date: Mon, 20 Oct 2025 03:08:04 GMT Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of vector compress [v5] In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 06:24:22 GMT, erifan wrote: >> The AArch64 SVE and SVE2 architectures lack an instruction suitable for subword-type `compress` operations. Therefore, the current implementation uses the 32-bit SVE `compact` instruction to compress subword types by first widening the high and low parts to 32 bits, compressing them, and then narrowing them back to their original type. Finally, the high and low parts are merged using the `index + tbl` instructions. >> >> This approach is significantly slower compared to architectures with native support. After evaluating all available AArch64 SVE instructions and experimenting with various implementations?such as looping over the active elements, extraction, and insertion?I confirmed that the existing algorithm is optimal given the instruction set. However, there is still room for optimization in the following two aspects: >> 1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction. >> 2. For partial subword types, operations to the highest half are unnecessary because those bits are invalid. >> >> This pull request introduces the following changes: >> 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput. >> 2. Eliminates unnecessary compress operations for partial subword type cases. >> 3. For `sve_compress_byte`, one less temporary register is used to alleviate potential register pressure. >> >> Benchmark results demonstrate that these changes significantly improve performance. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE: >> >> Benchmark Unit Before Error After Error Uplift >> Byte128Vector.compress ops/ms 4846.97 26.23 6638.56 31.60 1.36 >> Byte64Vector.compress ops/ms 2447.69 12.95 7167.68 34.49 2.92 >> Short128Vector.compress ops/ms 7174.88 40.94 8398.45 9.48 1.17 >> Short64Vector.compress ops/ms 3618.72 3.04 8618.22 10.91 2.38 >> >> >> This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed. > > erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Enable the IR test for x86 > - Merge branch 'master' into JDK-8366333-compress > - Improve coding style a bit > - Improve some code style > - Merge branch 'master' into JDK-8366333-compress > - Merge branch 'master' into JDK-8366333-compress > - 8366333: AArch64: Enhance SVE subword type implementation of vector compress > > The AArch64 SVE and SVE2 architectures lack an instruction suitable for > subword-type `compress` operations. 
Therefore, the current implementation > uses the 32-bit SVE `compact` instruction to compress subword types by > first widening the high and low parts to 32 bits, compressing them, and > then narrowing them back to their original type. Finally, the high and > low parts are merged using the `index + tbl` instructions. > > This approach is significantly slower compared to architectures with native > support. After evaluating all available AArch64 SVE instructions and > experimenting with various implementations?such as looping over the active > elements, extraction, and insertion?I confirmed that the existing algorithm > is optimal given the instruction set. However, there is still room for > optimization in the following two aspects: > 1. Merging with `index + tbl` is suboptimal due to the high latency of > the `index` instruction. > 2. For partial subword types, operations to the highest half are unnecessary > because those bits are invalid. > > This pull request introduces the following changes: > 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which > offer lower latency and higher throughput. > 2. Eliminates unnecessary compress operations for partial subword type cases. > 3. For `sve_compress_byte`, one less temporary register is used to alleviate > potential register pressure. > > Benchmark results demonstrate that these changes significantly improve performance. > > Benchmarks on Nvidia Grace machine with 128-bit SVE: > ``` > Benchmark Unit Before Error After Error Uplift > Byte128Vector.compress ops/ms 4846.97 26.23 6638.56 31.60 1.36 > Byte64Vector.compress ops/ms 2447.69 12.95 7167.68 34.49 2.92 > Short128Vector.compress ops/ms 7174.88 40.94 8398.45 9.48 1.17 > Short64Vector.compress ops/ms 3618.72 3.04 8618.22 10.91 2.38 > ``` > > This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, > and all... I have tested a lot of different configurations on both aarch64 and x64, including 128/256/512 bits SVE2/SVE/NEON, AVX3/2/1, SSE4/3/2/1. All tests passed, so I'll integrate the PR, thanks for all! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27188#issuecomment-3420370830 From duke at openjdk.org Mon Oct 20 03:08:05 2025 From: duke at openjdk.org (duke) Date: Mon, 20 Oct 2025 03:08:05 GMT Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of vector compress [v5] In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 06:24:22 GMT, erifan wrote: >> The AArch64 SVE and SVE2 architectures lack an instruction suitable for subword-type `compress` operations. Therefore, the current implementation uses the 32-bit SVE `compact` instruction to compress subword types by first widening the high and low parts to 32 bits, compressing them, and then narrowing them back to their original type. Finally, the high and low parts are merged using the `index + tbl` instructions. >> >> This approach is significantly slower compared to architectures with native support. After evaluating all available AArch64 SVE instructions and experimenting with various implementations?such as looping over the active elements, extraction, and insertion?I confirmed that the existing algorithm is optimal given the instruction set. However, there is still room for optimization in the following two aspects: >> 1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction. >> 2. For partial subword types, operations to the highest half are unnecessary because those bits are invalid. 
>> >> This pull request introduces the following changes: >> 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput. >> 2. Eliminates unnecessary compress operations for partial subword type cases. >> 3. For `sve_compress_byte`, one less temporary register is used to alleviate potential register pressure. >> >> Benchmark results demonstrate that these changes significantly improve performance. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE: >> >> Benchmark Unit Before Error After Error Uplift >> Byte128Vector.compress ops/ms 4846.97 26.23 6638.56 31.60 1.36 >> Byte64Vector.compress ops/ms 2447.69 12.95 7167.68 34.49 2.92 >> Short128Vector.compress ops/ms 7174.88 40.94 8398.45 9.48 1.17 >> Short64Vector.compress ops/ms 3618.72 3.04 8618.22 10.91 2.38 >> >> >> This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed. > > erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Enable the IR test for x86 > - Merge branch 'master' into JDK-8366333-compress > - Improve coding style a bit > - Improve some code style > - Merge branch 'master' into JDK-8366333-compress > - Merge branch 'master' into JDK-8366333-compress > - 8366333: AArch64: Enhance SVE subword type implementation of vector compress > > The AArch64 SVE and SVE2 architectures lack an instruction suitable for > subword-type `compress` operations. Therefore, the current implementation > uses the 32-bit SVE `compact` instruction to compress subword types by > first widening the high and low parts to 32 bits, compressing them, and > then narrowing them back to their original type. Finally, the high and > low parts are merged using the `index + tbl` instructions. > > This approach is significantly slower compared to architectures with native > support. After evaluating all available AArch64 SVE instructions and > experimenting with various implementations?such as looping over the active > elements, extraction, and insertion?I confirmed that the existing algorithm > is optimal given the instruction set. However, there is still room for > optimization in the following two aspects: > 1. Merging with `index + tbl` is suboptimal due to the high latency of > the `index` instruction. > 2. For partial subword types, operations to the highest half are unnecessary > because those bits are invalid. > > This pull request introduces the following changes: > 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which > offer lower latency and higher throughput. > 2. Eliminates unnecessary compress operations for partial subword type cases. > 3. For `sve_compress_byte`, one less temporary register is used to alleviate > potential register pressure. > > Benchmark results demonstrate that these changes significantly improve performance. > > Benchmarks on Nvidia Grace machine with 128-bit SVE: > ``` > Benchmark Unit Before Error After Error Uplift > Byte128Vector.compress ops/ms 4846.97 26.23 6638.56 31.60 1.36 > Byte64Vector.compress ops/ms 2447.69 12.95 7167.68 34.49 2.92 > Short128Vector.compress ops/ms 7174.88 40.94 8398.45 9.48 1.17 > Short64Vector.compress ops/ms 3618.72 3.04 8618.22 10.91 2.38 > ``` > > This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, > and all... @erifan Your change (at version c75df30bacb32e446bbe1f6b9eb2916538285609) is now ready to be sponsored by a Committer. 
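[Editorial aside, not part of the thread above: for readers unfamiliar with the operation being benchmarked, below is a minimal, illustrative Java sketch of what the Vector API `compress` operation does semantically. It is not taken from the patch or its tests; the species, mask value, and class name are chosen only for this example, and it needs `--add-modules jdk.incubator.vector` to compile and run.]

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

public class CompressSketch {
    // 128-bit byte species, i.e. 16 lanes, matching the Byte128Vector benchmarks above.
    static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_128;

    public static void main(String[] args) {
        byte[] src = new byte[SPECIES.length()];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;                       // lanes 0, 1, 2, ..., 15
        }
        ByteVector v = ByteVector.fromArray(SPECIES, src, 0);
        // Mask selecting the even lanes (bit i of the long corresponds to lane i).
        VectorMask<Byte> evens = VectorMask.fromLong(SPECIES, 0x5555L);
        // compress packs the selected lanes towards lane 0 and zero-fills the rest;
        // this is the subword-type operation the SVE back end discussed above implements.
        ByteVector packed = v.compress(evens);
        System.out.println(packed);                  // [0, 2, 4, ..., 14, 0, 0, ..., 0]
    }
}
```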
------------- PR Comment: https://git.openjdk.org/jdk/pull/27188#issuecomment-3420373747 From epeter at openjdk.org Mon Oct 20 05:53:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 20 Oct 2025 05:53:52 GMT Subject: RFR: 8369902: C2 SuperWord: wrong result because filterin NaN instead of zero in MemPointerParser::canonicalize_raw_summands [v2] In-Reply-To: <0FaIYVu0DC9tiKMLsw-TFxBlTSDzp74_0pSIAhRd72Q=.0c602e17-24f8-4aca-b8d8-7b8df0a1dcc9@github.com> References: <0FaIYVu0DC9tiKMLsw-TFxBlTSDzp74_0pSIAhRd72Q=.0c602e17-24f8-4aca-b8d8-7b8df0a1dcc9@github.com> Message-ID: > **TLDR** `is_NaN` -> `is_zero`, just like the code comment says. > > Thanks to @mhaessig for debugging the ARM32 bug below. He found the buggy line of code. > > ---------------------------------------- > > **Details** > > It seems there is a little "typo" (logic error) in `MemPointerParser::canonicalize_raw_summands` that slipped through the cracks in https://github.com/openjdk/jdk/pull/24278. The JavaFuzzer now found an example, and independently the issue was also reported on ARM32 [JDK-8368578](https://bugs.openjdk.org/browse/JDK-8368578). > > Filtering out `NaN` instead of `zero` for the `scaleL` has two manifestations: > - If `scaleL` is zero, but does not get filtered out even though it should be: we hit the assert in `MemPointerSummand` constructor, `assert(!_scale.is_zero(), "non-zero scale");`. > - See [JDK-8368578](https://bugs.openjdk.org/browse/JDK-8368578), though those tests seem to only fail on ARM32, and nowhere else. > - I was able to construct a `MemorySegment` regression test, see `TestMemorySegmentFilterSummands.test1`. I suspect that the ARM32 failures happened on an array, as it failed in places like `BigInteger::implMultiplyToLen`. But now I was able to reproduce it with native memory, to get a pointer expression that has the same cancellation issue. > - If `scaleL` is `NaN`, and gets filtered even though it should not be: We get a non-trivial MemPointer that is missing a summand. So we will succeed in optimizing, but with wrong assumptions. We generate a runtime aliasing check that is incorrect, leading to wrong results. > - This was reported by the fuzzer, see attached `TestDoNotFilterNaNSummands`. > - I was also able to create a simpler example with `MemorySegments`, see attached `TestMemorySegmentFilterSummands.test2`. > > **Why did this slip through the cracks?** > > In https://github.com/openjdk/jdk/pull/24278 I added pretty extensive testing, even fuzzer style tests, see `TestAliasingFuzzer.java`. But I think all of those tests exercise `scale` that are in "nice" [int ranges](https://github.com/openjdk/jdk/pull/24278/files#diff-26de03e864a492fe8aa8178818968f2097b99cf36a763605e2fb11fbc04eedffR303-R322). Also the JavaFuzzer does not directly generate such long constants for array accesses (not possible without Unsafe I think), we were lucky that it generated the index with `%` that got optimized to some magic long constant. > > There is already an RFE filed for improvements to `TestAliasingFuzzer.java`: [JDK-8365985](https://bugs.openjdk.org/browse/JDK-836... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentFilterSummands.java Co-authored-by: Manuel Hässig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27848/files - new: https://git.openjdk.org/jdk/pull/27848/files/3398c465..e19a22f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27848&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27848&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27848.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27848/head:pull/27848 PR: https://git.openjdk.org/jdk/pull/27848 From epeter at openjdk.org Mon Oct 20 05:53:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 20 Oct 2025 05:53:56 GMT Subject: RFR: 8369902: C2 SuperWord: wrong result because filterin NaN instead of zero in MemPointerParser::canonicalize_raw_summands [v2] In-Reply-To: <_WGjwVyk2zBmssCwv220X1CWZAQcAPB0uJJCqTaN9EU=.31ca4bd1-622e-4579-9bfc-7f273aac069d@github.com> References: <0FaIYVu0DC9tiKMLsw-TFxBlTSDzp74_0pSIAhRd72Q=.0c602e17-24f8-4aca-b8d8-7b8df0a1dcc9@github.com> <_WGjwVyk2zBmssCwv220X1CWZAQcAPB0uJJCqTaN9EU=.31ca4bd1-622e-4579-9bfc-7f273aac069d@github.com> Message-ID: On Fri, 17 Oct 2025 13:09:49 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentFilterSummands.java >> >> Co-authored-by: Manuel Hässig > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentFilterSummands.java line 71: > >> 69: new Scenario(1, "-XX:+AlignVector", "-XX:-ShortRunningLongLoop"), >> 70: new Scenario(2, "-XX:-AlignVector", "-XX:+ShortRunningLongLoop"), >> 71: new Scenario(3, "-XX:+AlignVector", "-XX:+ShortRunningLongLoop")); > > That might be the perfect opportunity to break out the cross-product scenario: > Suggestion: > > f.addCrossProductScenarios(Set.of("-XX:-AlignVector", "-XX:+AlignVector"), > Set.of("-XX:-ShortRunningLongLoop", "-XX:+ShortRunningLongLoop")); Nice idea, let me try that :) > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentFilterSummands.java line 82: > >> 80: applyIfPlatform = {"64-bit", "true"}, >> 81: applyIf = {"AlignVector", "false"}, >> 82: applyIfCPUFeatureOr = {"avx", "true", "asimd", "true"}) > > I always forget what the best practice is regarding detecting CPU features to not break the CIs of riscv and others. But this should be fine, since you are matching for the CPU features, right? Yes, I do this everywhere, and have not gotten any complaints ;) I think we first check CPU features / platform, and only then flags. 
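[Editorial aside, not part of the review above: a rough sketch of how such a gated IR rule tends to look in the IR framework. The test body, node constraint, and counts below are placeholders, not the actual TestMemorySegmentFilterSummands code; only the annotation attributes mirror the ones quoted above. The gating is evaluated in the order discussed: platform and CPU features first, then VM flag preconditions.]

```java
import compiler.lib.ir_framework.*;

public class GatedIrRuleSketch {
    static int[] a = new int[1024];

    public static void main(String[] args) {
        TestFramework.run();
    }

    @Test
    // Placeholder rule: only check for vector stores on 64-bit platforms with
    // AVX or ASIMD, and only when AlignVector is disabled.
    @IR(counts = {IRNode.STORE_VECTOR, "> 0"},
        applyIfPlatform = {"64-bit", "true"},
        applyIfCPUFeatureOr = {"avx", "true", "asimd", "true"},
        applyIf = {"AlignVector", "false"})
    static void test() {
        for (int i = 0; i < a.length; i++) {
            a[i] += 1;    // trivially vectorizable loop
        }
    }
}
```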
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27848#discussion_r2443865579 PR Review Comment: https://git.openjdk.org/jdk/pull/27848#discussion_r2443865330 From epeter at openjdk.org Mon Oct 20 05:59:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 20 Oct 2025 05:59:07 GMT Subject: RFR: 8369569: Rename methods in regmask.hpp to conform with HotSpot coding style [v2] In-Reply-To: <8KTiD_-r-3Z3BrfjO2hghq5rdPEukPFQ1TIwDnIvM90=.a2b6d535-38ce-4184-a67a-5dcce6c4d6b7@github.com> References: <8KTiD_-r-3Z3BrfjO2hghq5rdPEukPFQ1TIwDnIvM90=.a2b6d535-38ce-4184-a67a-5dcce6c4d6b7@github.com> Message-ID: On Fri, 17 Oct 2025 14:17:08 GMT, Daniel Lund?n wrote: >> A number of methods in regmask.hpp do not conform with the HotSpot coding style. We should make sure they do. >> >> ### Changeset >> - Rename methods in `regmask.hpp` to conform with HotSpot coding style. >> - Similarly rename directly related methods in `chaitin.hpp`. >> - Rename the constant register masks `All` and `Empty` to `ALL` and `EMPTY`. >> - Fix a few additional code style issues at lines touched by the changeset. >> >> Note: this is a syntax-only changeset (no functional changes). >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500704336) >> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Roberto Casta?eda Lozano Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27817#pullrequestreview-3355094444 From epeter at openjdk.org Mon Oct 20 05:59:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 20 Oct 2025 05:59:46 GMT Subject: RFR: 8369902: C2 SuperWord: wrong result because filterin NaN instead of zero in MemPointerParser::canonicalize_raw_summands [v3] In-Reply-To: <0FaIYVu0DC9tiKMLsw-TFxBlTSDzp74_0pSIAhRd72Q=.0c602e17-24f8-4aca-b8d8-7b8df0a1dcc9@github.com> References: <0FaIYVu0DC9tiKMLsw-TFxBlTSDzp74_0pSIAhRd72Q=.0c602e17-24f8-4aca-b8d8-7b8df0a1dcc9@github.com> Message-ID: > **TLDR** `is_NaN` -> `is_zero`, just like the code comment says. > > Thanks to @mhaessig for debugging the ARM32 bug below. He found the buggy line of code. > > ---------------------------------------- > > **Details** > > It seems there is a little "typo" (logic error) in `MemPointerParser::canonicalize_raw_summands` that slipped through the cracks in https://github.com/openjdk/jdk/pull/24278. The JavaFuzzer now found an example, and independently the issue was also reported on ARM32 [JDK-8368578](https://bugs.openjdk.org/browse/JDK-8368578). > > Filtering out `NaN` instead of `zero` for the `scaleL` has two manifestations: > - If `scaleL` is zero, but does not get filtered out even though it should be: we hit the assert in `MemPointerSummand` constructor, `assert(!_scale.is_zero(), "non-zero scale");`. > - See [JDK-8368578](https://bugs.openjdk.org/browse/JDK-8368578), though those tests seem to only fail on ARM32, and nowhere else. > - I was able to construct a `MemorySegment` regression test, see `TestMemorySegmentFilterSummands.test1`. I suspect that the ARM32 failures happened on an array, as it failed in places like `BigInteger::implMultiplyToLen`. 
But now I was able to reproduce it with native memory, to get a pointer expression that has the same cancellation issue. > - If `scaleL` is `NaN`, and gets filtered even though it should not be: We get a non-trivial MemPointer that is missing a summand. So we will succeed in optimizing, but with wrong assumptions. We generate a runtime aliasing check that is incorrect, leading to wrong results. > - This was reported by the fuzzer, see attached `TestDoNotFilterNaNSummands`. > - I was also able to create a simpler example with `MemorySegments`, see attached `TestMemorySegmentFilterSummands.test2`. > > **Why did this slip through the cracks?** > > In https://github.com/openjdk/jdk/pull/24278 I added pretty extensive testing, even fuzzer style tests, see `TestAliasingFuzzer.java`. But I think all of those tests exercise `scale` that are in "nice" [int ranges](https://github.com/openjdk/jdk/pull/24278/files#diff-26de03e864a492fe8aa8178818968f2097b99cf36a763605e2fb11fbc04eedffR303-R322). Also the JavaFuzzer does not directly generate such long constants for array accesses (not possible without Unsafe I think), we were lucky that it generated the index with `%` that got optimized to some magic long constant. > > There is already an RFE filed for improvements to `TestAliasingFuzzer.java`: [JDK-8365985](https://bugs.openjdk.org/browse/JDK-836... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - Merge branch 'master' into JDK-8369902-SW-VPointer-NaN-zero - Update test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentFilterSummands.java Co-authored-by: Manuel H?ssig - add comments to TestAliasingFuzzer.java - typo - add fuzzer test - test improvements and fix - second test - rename test - JDK-8369902 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27848/files - new: https://git.openjdk.org/jdk/pull/27848/files/e19a22f7..3aa0dc1a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27848&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27848&range=01-02 Stats: 17082 lines in 502 files changed: 8392 ins; 6836 del; 1854 mod Patch: https://git.openjdk.org/jdk/pull/27848.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27848/head:pull/27848 PR: https://git.openjdk.org/jdk/pull/27848 From epeter at openjdk.org Mon Oct 20 05:59:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 20 Oct 2025 05:59:47 GMT Subject: RFR: 8369902: C2 SuperWord: wrong result because filterin NaN instead of zero in MemPointerParser::canonicalize_raw_summands [v3] In-Reply-To: <_WGjwVyk2zBmssCwv220X1CWZAQcAPB0uJJCqTaN9EU=.31ca4bd1-622e-4579-9bfc-7f273aac069d@github.com> References: <0FaIYVu0DC9tiKMLsw-TFxBlTSDzp74_0pSIAhRd72Q=.0c602e17-24f8-4aca-b8d8-7b8df0a1dcc9@github.com> <_WGjwVyk2zBmssCwv220X1CWZAQcAPB0uJJCqTaN9EU=.31ca4bd1-622e-4579-9bfc-7f273aac069d@github.com> Message-ID: On Fri, 17 Oct 2025 13:16:28 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8369902-SW-VPointer-NaN-zero >> - Update test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentFilterSummands.java >> >> Co-authored-by: Manuel Hässig >> - add comments to TestAliasingFuzzer.java >> - typo >> - add fuzzer test >> - test improvements and fix >> - second test >> - rename test >> - JDK-8369902 > > Thank you for fixing my bug as well and doing the work to find a 64-bit reproducer, @eme64! Also, thanks for providing an explanation for the NaNs in the MemPointer parsing. > > The change looks good to me. I only have a suggestion to simplify your scenarios. @mhaessig Thanks for reviewing and the suggestion :) I addressed both your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27848#issuecomment-3420672630 From wenanjian at openjdk.org Mon Oct 20 06:23:11 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Mon, 20 Oct 2025 06:23:11 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v11] In-Reply-To: References: Message-ID: On Sat, 18 Oct 2025 12:05:50 GMT, Andrew Haley wrote: >> Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Merge branch 'openjdk:master' into aes_ctr >> - add assertion and change test >> - add zbb and zvbb check >> - Merge branch 'openjdk:master' into aes_ctr >> - Merge branch 'openjdk:master' into aes_ctr >> - fix the counter increase at limit and add test >> - change format >> - update reg use and instruction >> - change some name and format >> - delete useless Label, change L_judge_used to L_slow_loop >> - ... and 2 more: https://git.openjdk.org/jdk/compare/eff6439e...716825a4 > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2626: > >> 2624: // >> 2625: address generate_counterMode_AESCrypt() { >> 2626: assert(UseAESCTRIntrinsics, "need AES instructions (Zvkned extension) support"); > > It's hard for anyone to understand the control flow. If you look at the same routine in the AArch64 port you'll see plenty of comments to help the reader. Thanks, I'll try to add some more comments and maybe some pseudocode describing my control flow, following the AArch64 port ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2443904661 From epeter at openjdk.org Mon Oct 20 07:39:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 20 Oct 2025 07:39:49 GMT Subject: RFR: 8369902: C2 SuperWord: wrong result because filterin NaN instead of zero in MemPointerParser::canonicalize_raw_summands [v4] In-Reply-To: <0FaIYVu0DC9tiKMLsw-TFxBlTSDzp74_0pSIAhRd72Q=.0c602e17-24f8-4aca-b8d8-7b8df0a1dcc9@github.com> References: <0FaIYVu0DC9tiKMLsw-TFxBlTSDzp74_0pSIAhRd72Q=.0c602e17-24f8-4aca-b8d8-7b8df0a1dcc9@github.com> Message-ID: > **TLDR** `is_NaN` -> `is_zero`, just like the code comment says. > > Thanks to @mhaessig for debugging the ARM32 bug below. He found the buggy line of code. > > ---------------------------------------- > > **Details** > > It seems there is a little "typo" (logic error) in `MemPointerParser::canonicalize_raw_summands` that slipped through the cracks in https://github.com/openjdk/jdk/pull/24278.
> > Filtering out `NaN` instead of `zero` for the `scaleL` has two manifestations: > - If `scaleL` is zero, but does not get filtered out even though it should be: we hit the assert in `MemPointerSummand` constructor, `assert(!_scale.is_zero(), "non-zero scale");`. > - See [JDK-8368578](https://bugs.openjdk.org/browse/JDK-8368578), though those tests seem to only fail on ARM32, and nowhere else. > - I was able to construct a `MemorySegment` regression test, see `TestMemorySegmentFilterSummands.test1`. I suspect that the ARM32 failures happened on an array, as it failed in places like `BigInteger::implMultiplyToLen`. But now I was able to reproduce it with native memory, to get a pointer expression that has the same cancellation issue. > - If `scaleL` is `NaN`, and gets filtered even though it should not be: We get a non-trivial MemPointer that is missing a summand. So we will succeed in optimizing, but with wrong assumptions. We generate a runtime aliasing check that is incorrect, leading to wrong results. > - This was reported by the fuzzer, see attached `TestDoNotFilterNaNSummands`. > - I was also able to create a simpler example with `MemorySegments`, see attached `TestMemorySegmentFilterSummands.test2`. > > **Why did this slip through the cracks?** > > In https://github.com/openjdk/jdk/pull/24278 I added pretty extensive testing, even fuzzer style tests, see `TestAliasingFuzzer.java`. But I think all of those tests exercise `scale` that are in "nice" [int ranges](https://github.com/openjdk/jdk/pull/24278/files#diff-26de03e864a492fe8aa8178818968f2097b99cf36a763605e2fb11fbc04eedffR303-R322). Also the JavaFuzzer does not directly generate such long constants for array accesses (not possible without Unsafe I think), we were lucky that it generated the index with `%` that got optimized to some magic long constant. > > There is already an RFE filed for improvements to `TestAliasingFuzzer.java`: [JDK-8365985](https://bugs.openjdk.org/browse/JDK-836... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix up manuel's suggestion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27848/files - new: https://git.openjdk.org/jdk/pull/27848/files/3aa0dc1a..9d550a3a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27848&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27848&range=02-03 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27848.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27848/head:pull/27848 PR: https://git.openjdk.org/jdk/pull/27848 From mhaessig at openjdk.org Mon Oct 20 07:48:04 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 20 Oct 2025 07:48:04 GMT Subject: RFR: 8369902: C2 SuperWord: wrong result because filterin NaN instead of zero in MemPointerParser::canonicalize_raw_summands [v4] In-Reply-To: References: <0FaIYVu0DC9tiKMLsw-TFxBlTSDzp74_0pSIAhRd72Q=.0c602e17-24f8-4aca-b8d8-7b8df0a1dcc9@github.com> Message-ID: On Mon, 20 Oct 2025 07:39:49 GMT, Emanuel Peter wrote: >> **TLDR** `is_NaN` -> `is_zero`, just like the code comment says. >> >> Thanks to @mhaessig for debugging the ARM32 bug below. He found the buggy line of code. >> >> ---------------------------------------- >> >> **Details** >> >> It seems there is a little "typo" (logic error) in `MemPointerParser::canonicalize_raw_summands` that slipped through the cracks in https://github.com/openjdk/jdk/pull/24278. 
The JavaFuzzer now found an example, and independently the issue was also reported on ARM32 [JDK-8368578](https://bugs.openjdk.org/browse/JDK-8368578). >> >> Filtering out `NaN` instead of `zero` for the `scaleL` has two manifestations: >> - If `scaleL` is zero, but does not get filtered out even though it should be: we hit the assert in `MemPointerSummand` constructor, `assert(!_scale.is_zero(), "non-zero scale");`. >> - See [JDK-8368578](https://bugs.openjdk.org/browse/JDK-8368578), though those tests seem to only fail on ARM32, and nowhere else. >> - I was able to construct a `MemorySegment` regression test, see `TestMemorySegmentFilterSummands.test1`. I suspect that the ARM32 failures happened on an array, as it failed in places like `BigInteger::implMultiplyToLen`. But now I was able to reproduce it with native memory, to get a pointer expression that has the same cancellation issue. >> - If `scaleL` is `NaN`, and gets filtered even though it should not be: We get a non-trivial MemPointer that is missing a summand. So we will succeed in optimizing, but with wrong assumptions. We generate a runtime aliasing check that is incorrect, leading to wrong results. >> - This was reported by the fuzzer, see attached `TestDoNotFilterNaNSummands`. >> - I was also able to create a simpler example with `MemorySegments`, see attached `TestMemorySegmentFilterSummands.test2`. >> >> **Why did this slip through the cracks?** >> >> In https://github.com/openjdk/jdk/pull/24278 I added pretty extensive testing, even fuzzer style tests, see `TestAliasingFuzzer.java`. But I think all of those tests exercise `scale` that are in "nice" [int ranges](https://github.com/openjdk/jdk/pull/24278/files#diff-26de03e864a492fe8aa8178818968f2097b99cf36a763605e2fb11fbc04eedffR303-R322). Also the JavaFuzzer does not directly generate such long constants for array accesses (not possible without Unsafe I think), we were lucky that it generated the index with `%` that got optimized to some magic long constant. >> >> There is already an RFE filed for improvements to `TestAliasingFuzzer.java`: [JDK-83... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix up manuel's suggestion Thank you incorporating my suggestion. Looks good to me. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/27848#pullrequestreview-3355325979 From dlunden at openjdk.org Mon Oct 20 07:48:16 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 20 Oct 2025 07:48:16 GMT Subject: RFR: 8370031: Make RegMask copy constructor explicit and replace RegMask operator= with named function Message-ID: The `RegMask` copy constructor is currently non-explicit. We should make it explicit so that we do not unintentionally copy register masks. Additionally, we currently overload `operator=` in `RegMask` to do a deep copy. It is preferable to use an explicit named function instead, according to the HotSpot coding style. ### Changeset - Make the `RegMask` copy constructor explicit. - Fix compilation errors as a result of the now explicit constructor. Specifically, the methods `Matcher::divI_proj_mask`, `Matcher::modI_proj_mask`, `Matcher::divL_proj_mask`, and `Matcher::modL_proj_mask` all use implicit copy construction (likely unintended). Change the methods to return `const RegMask&` instead of `RegMask` and correspondingly change the return value from `RegMask()` to `RegMask::Empty` on some platforms. 
- Rename the old method `RegMask::copy` to `RegMask::assignFrom` to better describe its functionality, and make it public instead of private. - Delete `RegMask` copy assignment (`operator=`) and change all uses to the named function `assignFrom` instead. - Fix various syntax issues at lines touched by the changeset. ### Testing - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18589208499) - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. ------------- Commit messages: - Fix issue Changes: https://git.openjdk.org/jdk/pull/27891/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27891&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370031 Stats: 257 lines in 14 files changed: 34 ins; 35 del; 188 mod Patch: https://git.openjdk.org/jdk/pull/27891.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27891/head:pull/27891 PR: https://git.openjdk.org/jdk/pull/27891 From dlunden at openjdk.org Mon Oct 20 07:52:15 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 20 Oct 2025 07:52:15 GMT Subject: Integrated: 8369569: Rename methods in regmask.hpp to conform with HotSpot coding style In-Reply-To: References: Message-ID: <4HdU0eHdahImR-MduNI1PiA9ngjNKjjmaYGXpH3GlLM=.9ef1cee9-70c7-4709-8e58-3e4f1e8dd1ba@github.com> On Wed, 15 Oct 2025 08:14:26 GMT, Daniel Lund?n wrote: > A number of methods in regmask.hpp do not conform with the HotSpot coding style. We should make sure they do. > > ### Changeset > - Rename methods in `regmask.hpp` to conform with HotSpot coding style. > - Similarly rename directly related methods in `chaitin.hpp`. > - Rename the constant register masks `All` and `Empty` to `ALL` and `EMPTY`. > - Fix a few additional code style issues at lines touched by the changeset. > > Note: this is a syntax-only changeset (no functional changes). > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18500704336) > - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. This pull request has now been integrated. Changeset: 39211e7f Author: Daniel Lund?n URL: https://git.openjdk.org/jdk/commit/39211e7fac74a30c343987e2ef17ab5d855a73dc Stats: 629 lines in 31 files changed: 12 ins; 0 del; 617 mod 8369569: Rename methods in regmask.hpp to conform with HotSpot coding style Reviewed-by: aseoane, rcastanedalo, epeter ------------- PR: https://git.openjdk.org/jdk/pull/27817 From fgao at openjdk.org Mon Oct 20 08:30:06 2025 From: fgao at openjdk.org (Fei Gao) Date: Mon, 20 Oct 2025 08:30:06 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Wed, 3 Sep 2025 17:08:08 GMT, Fei Gao wrote: >> I'm a little sick and don't feel very focused, so I'll have to look at the PR next week. >> >> BTW: I just integrated https://github.com/openjdk/jdk/pull/24278 which may have silent merge conflicts, so it would be good if you merged and tested again. Once you do that I could also run some internal testing, if you like :) > >> BTW: I just integrated https://github.com/openjdk/jdk/pull/24278 which may have silent merge conflicts, so it would be good if you merged and tested again. 
> > Hi @eme64 , I?ve rebased the patch onto the latest JDK, and all tier1 to tier3 tests have passed on my local AArch64 and x86 machines. > >> It would be good if you re-ran the benchmarks. It seems the last ones you did in December of 2024. > We should see that we have various benchmarks, both for array and MemorySegment. > You could look at the array benchmarks from here: https://github.com/openjdk/jdk/pull/22070 > > I also re-verified the benchmark from [PR #22070](https://github.com/openjdk/jdk/pull/22070) on 128-bit, 256-bit, and 512-bit vector machines. The results show no significant regressions and performance changes are consistent with the previous round described in [perf results]( https://bugs.openjdk.org/browse/JDK-8307084?focusedId=14729524&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14729524). > >> Once you do that I could also run some internal testing, if you like :) > > I?d really appreciate it if you could run some internal testing at a time you think is suitable. > Thanks :) > @fg1417 Are you still working on this? @eme64 Apologies for the delay. I?m still working on this and will push the updated changes soon. The update addresses your comments, resolves some test failures after rebasing to the latest JDK, and is currently being tested. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3421046488 From aph at openjdk.org Mon Oct 20 08:47:04 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 20 Oct 2025 08:47:04 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v11] In-Reply-To: References: Message-ID: On Fri, 12 Sep 2025 08:08:53 GMT, Anjian Wen wrote: >> Thanks for the review. I'm still developing it. >> Regarding the growth of the counter array, it should use 8 bytes to store the count. I use 4 Byte here according to OpenSSL aes-ctr code, I will try to fix it later >> https://github.com/openssl/openssl/blob/master/crypto/aes/asm/aes-riscv64-zvkb-zvkned.pl#L242 > >> Are you sure this is correct? See `com.sun.crypto.provider.CounterMode::increment`. > > Hi @theRealAph , according to your advice and code from `com.sun.crypto.provider.CounterMode::increment`, I have modified my patch about counter increase by increasing 2 8Byte. Most of case increasing the first 8 Byte(from 8bit to 15 bit) is enough, it only needs to increase the next 8Byte when the first 8Byte overflows. And I have added a test for limit case, could you please help review again? Your encryption operations should run in constant time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2444214586 From wenanjian at openjdk.org Mon Oct 20 09:32:05 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Mon, 20 Oct 2025 09:32:05 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v11] In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 08:43:58 GMT, Andrew Haley wrote: >>> Are you sure this is correct? See `com.sun.crypto.provider.CounterMode::increment`. >> >> Hi @theRealAph , according to your advice and code from `com.sun.crypto.provider.CounterMode::increment`, I have modified my patch about counter increase by increasing 2 8Byte. Most of case increasing the first 8 Byte(from 8bit to 15 bit) is enough, it only needs to increase the next 8Byte when the first 8Byte overflows. And I have added a test for limit case, could you please help review again? > > Your encryption operations should run in constant time. 
@theRealAph Sorry, I don't quite understand what the "constant time" here means. if you mean counter increase, here I try to optimize counter increase with vectorAdd. If N is the number of counter we should increase, it can theoretically optimize the time use from `O(N * 16) `to `O(N * 2 / (4 * (vector_register_len / 64)))`, as for vector_register_len equals 128, it will optimize about 64 times if the N is large enough. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2444374228 From aph at openjdk.org Mon Oct 20 09:43:06 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 20 Oct 2025 09:43:06 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v11] In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 09:29:11 GMT, Anjian Wen wrote: >> Your encryption operations should run in constant time. > > @theRealAph Sorry, I don't quite understand what the "constant time" here means. > > if you mean counter increase, here I try to optimize counter increase with vectorAdd. If N is the number of counter we should increase, it can theoretically optimize the time use from `O(N * 16) `to `O(N * 2 / (4 * (vector_register_len / 64)))`, as for vector_register_len equals 128, it will optimize about 64 times if the N is large enough. And I don't understand any of that. You should follow the example of the other implementations of CTR mode in HotSpot, so that no matter what the value of the counter is, incrementing it takes the same time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2444422766 From mli at openjdk.org Mon Oct 20 09:43:32 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 20 Oct 2025 09:43:32 GMT Subject: RFR: 8370225: RISC-V: move verify_frame_setup into ASSERT Message-ID: Hi, Can you help to review this patch? `verify_frame_setup` should be only declared/implemented/invoked in debug version. This is a leftover by https://bugs.openjdk.org/browse/JDK-8369947. Thanks! ------------- Commit messages: - initial commit - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ... and 5 more: https://git.openjdk.org/jdk/compare/f158451c...47ee73bc Changes: https://git.openjdk.org/jdk/pull/27894/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27894&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370225 Stats: 9 lines in 2 files changed: 7 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27894.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27894/head:pull/27894 PR: https://git.openjdk.org/jdk/pull/27894 From wenanjian at openjdk.org Mon Oct 20 10:08:12 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Mon, 20 Oct 2025 10:08:12 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v11] In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 09:40:29 GMT, Andrew Haley wrote: >> @theRealAph Sorry, I don't quite understand what the "constant time" here means. >> >> if you mean counter increase, here I try to optimize counter increase with vectorAdd. 
If N is the number of counter we should increase, it can theoretically optimize the time use from `O(N * 16) `to `O(N * 2 / (4 * (vector_register_len / 64)))`, as for vector_register_len equals 128, it will optimize about 64 times if the N is large enough. > > And I don't understand any of that. > > You should follow the example of the other implementations of CTR mode in HotSpot, so that no matter what the value of the counter is, incrementing it takes the same time. @theRealAph 1. Here I mean incrementing more than one counter at the same time, because we can set LMUL to more than 1 on RISC-V. And since the vector register length (VLEN) on RISC-V is not a constant value, we cannot assume the number of counters we can handle at once. 2. I have not found a suitable overflow check in RISC-V RVV, so I use a pre-check to avoid overflow; we may discuss whether there is a more suitable way. 3. Why should we make the counter increment take the same time? Why is this necessary? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2444522378 From bkilambi at openjdk.org Mon Oct 20 10:35:03 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 20 Oct 2025 10:35:03 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 10:45:47 GMT, erifan wrote: > According to the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test failures on sve2 platforms where `MaxVectorSize > 16`. > > This test problem was discovered by simulating a 512-bit sve2 environment using qemu. > > This PR fixes these test failures. LGTM! Thanks for doing this Eric. Overall, the patch looks reasonable. The test passes on the SVE/SVE2 hosts I have access to but did not test it on a >16B SVE2 simulator myself. If your QEMU runs validate that configuration, I'm happy to rely on those results. ------------- Marked as reviewed by bkilambi (Author). PR Review: https://git.openjdk.org/jdk/pull/27723#pullrequestreview-3355991975 From mhaessig at openjdk.org Mon Oct 20 11:49:17 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 20 Oct 2025 11:49:17 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v6] In-Reply-To: <1v9rJr_jz6k9Zqa0dcfhLN1feWAvnnQdiH5n1gc4VX4=.9ec0da19-feaa-4cb1-9f5e-e819a5a4a480@github.com> References: <1v9rJr_jz6k9Zqa0dcfhLN1feWAvnnQdiH5n1gc4VX4=.9ec0da19-feaa-4cb1-9f5e-e819a5a4a480@github.com> Message-ID: On Tue, 14 Oct 2025 16:38:33 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. 
>> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8367531-fix-addDataName' of https://github.com/eme64/jdk into JDK-8367531-fix-addDataName > - improve tutorial for Manuel Thank you for the improved descriptions. It helped a lot with my understanding. I have a few more nits, but otherwise, this looks good. test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 90: > 88: * frame. If a frame is non-transparent, this frame defines an inner > 89: * {@link NameSet}, for the names that are generated inside this frame. Once > 90: * this frame is exited, the name from inside this frame are not available. Suggestion: * Creates a normal frame, which has a {@link #parent}. It can either be * transparent for names, meaning that names are added and accessed to and * from an outer frame. Names that are added in a transparent frame are * still available in the outer frames, as far out as the next non-transparent * frame. If a frame is non-transparent, this frame defines an inner * {@link NameSet}, for the names that are generated inside this frame. Once * this frame is exited, the names from inside this frame are not available. Typos test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 396: > 394: addHashtagReplacement(lt.key(), lt.value()); > 395: }); > 396: Suggestion: Nit: remove superfluous line test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 593: > 591: )), > 592: Hooks.METHOD_HOOK.insert(scope( > 593: let("value", 11), Suggestion: This is unused now. 
test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 1248: > 1246: sample: x. > 1247: sample: z. > 1248: """; Suggestion: // Outer scope DataName: addDataName("outerInt", myInt, MUTABLE), dataNames(MUTABLE).exactOf(myInt).sample((DataName dn) -> scope( let("name1", dn.name()), "sample: #name1.\n", // We can also see the outer DataName: dataNames(MUTABLE).exactOf(myInt).sampleAndLetAs("name2"), "sample: #name2.\n", // Local DataName: addDataName("innerLong", myLong, MUTABLE), dataNames(MUTABLE).exactOf(myLong).sampleAndLetAs("name3"), "sample: #name3.\n" )), // We can still see the outer scope DataName: dataNames(MUTABLE).exactOf(myInt).sampleAndLetAs("name4"), "sample: #name4.\n", // But we cannot see the DataNames that are local to the inner scope. // So here, we will always see "outerLong", and never "innerLong". addDataName("outerLong", myLong, MUTABLE), dataNames(MUTABLE).exactOf(myLong).sampleAndLetAs("name5"), "sample: #name5.\n" )); String code = template.render(); String expected = """ sample: outerInt. sample: outerInt. sample: innerLong. sample: outerInt. sample: outerLong. """; These names make the test a little easier to understand. ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/27255#pullrequestreview-3350990372 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2444200025 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2444319448 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2440494414 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2444014306 From dlunden at openjdk.org Mon Oct 20 11:59:58 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 20 Oct 2025 11:59:58 GMT Subject: RFR: 8370031: Make RegMask copy constructor explicit and replace RegMask operator= with named function [v2] In-Reply-To: References: Message-ID: <4q9CrlwVAibAEIySihIaFIV24xTs1_uT2QpnJag7BX0=.fd0693e7-f11b-4224-9dbb-93a116a64a6f@github.com> > The `RegMask` copy constructor is currently non-explicit. We should make it explicit so that we do not unintentionally copy register masks. > > Additionally, we currently overload `operator=` in `RegMask` to do a deep copy. It is preferable to use an explicit named function instead, according to the HotSpot coding style. > > ### Changeset > > - Make the `RegMask` copy constructor explicit. > - Fix compilation errors as a result of the now explicit constructor. Specifically, the methods `Matcher::divI_proj_mask`, `Matcher::modI_proj_mask`, `Matcher::divL_proj_mask`, and `Matcher::modL_proj_mask` all use implicit copy construction (likely unintended). Change the methods to return `const RegMask&` instead of `RegMask` and correspondingly change the return value from `RegMask()` to `RegMask::Empty` on some platforms. > - Rename the old method `RegMask::copy` to `RegMask::assignFrom` to better describe its functionality, and make it public instead of private. > - Delete `RegMask` copy assignment (`operator=`) and change all uses to the named function `assignFrom` instead. > - Fix various syntax issues at lines touched by the changeset. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18589208499) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. Daniel Lund?n has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - Merge remote-tracking branch 'upstream/master' into regmask-explicit-8370031 - Fix issue ------------- Changes: https://git.openjdk.org/jdk/pull/27891/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27891&range=01 Stats: 257 lines in 14 files changed: 34 ins; 35 del; 188 mod Patch: https://git.openjdk.org/jdk/pull/27891.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27891/head:pull/27891 PR: https://git.openjdk.org/jdk/pull/27891 From mablakatov at openjdk.org Mon Oct 20 12:29:10 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Mon, 20 Oct 2025 12:29:10 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v12] In-Reply-To: References: Message-ID: <6uY2RVNYO93jgx3rvBc8kvFk3KipikpMXM3ua_qD0Ls=.a08f3461-d6fa-4f05-8d5f-6885351e60c0@github.com> > Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used. > > Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is used directly still. > > The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reducion micro-benchmarks. > > Benchmarks results: > > Neoverse-V1 (SVE 256-bit) > > Benchmark (size) Mode master PR Units > ByteMaxVector.MULLanes 1024 thrpt 5447.643 11455.535 ops/ms > ShortMaxVector.MULLanes 1024 thrpt 3388.183 7144.301 ops/ms > IntMaxVector.MULLanes 1024 thrpt 3010.974 4911.485 ops/ms > LongMaxVector.MULLanes 1024 thrpt 1539.137 2562.835 ops/ms > FloatMaxVector.MULLanes 1024 thrpt 1355.551 4158.128 ops/ms > DoubleMaxVector.MULLanes 1024 thrpt 1715.854 3284.189 ops/ms > > > Fujitsu A64FX (SVE 512-bit): > > Benchmark (size) Mode master PR Units > ByteMaxVector.MULLanes 1024 thrpt 1091.692 2887.798 ops/ms > ShortMaxVector.MULLanes 1024 thrpt 597.008 1863.338 ops/ms > IntMaxVector.MULLanes 1024 thrpt 510.642 1348.651 ops/ms > LongMaxVector.MULLanes 1024 thrpt 468.878 878.620 ops/ms > FloatMaxVector.MULLanes 1024 thrpt 376.284 2237.564 ops/ms > DoubleMaxVector.MULLanes 1024 thrpt 431.343 1646.792 ops/ms Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. 
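As a rough illustration of the kind of kernel the MULLanes microbenchmarks above exercise, here is a minimal Vector API multiply-reduction in Java (an assumed shape, not the actual benchmark source; it needs `--add-modules jdk.incubator.vector`). The `reduceLanes(MUL)` call is where the SVE specialization described in the RFR applies for >= 256-bit vectors.

```java
import jdk.incubator.vector.LongVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical multiply-reduction kernel over a long[] (names invented).
public class MulReduceSketch {
    static final VectorSpecies<Long> SPECIES = LongVector.SPECIES_PREFERRED;

    static long mulReduce(long[] a) {
        long product = 1L;
        int i = 0;
        int bound = SPECIES.loopBound(a.length);
        for (; i < bound; i += SPECIES.length()) {
            // Per-chunk MUL reduction; partial products are combined below.
            product *= LongVector.fromArray(SPECIES, a, i).reduceLanes(VectorOperators.MUL);
        }
        for (; i < a.length; i++) { // scalar tail
            product *= a[i];
        }
        return product;
    }

    public static void main(String[] args) {
        long[] data = new long[1024];
        java.util.Arrays.fill(data, 3L);
        System.out.println(mulReduce(data));
    }
}
```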
The pull request now contains 23 commits: - cleanup: remove redundand local variables Change-Id: I6fb6a9a7a236537612caa5d53c5516ed2f260bad - cleanup: remove a trivial switch-case statement Change-Id: Ib914ce02ae9d88057cb0b88d4880df6ca64f8184 - Assert the exact supported VL of 32B in SVE-specific methods Change-Id: I8768c653ff563cd8a7a75cd06a6523a9526d15ec - cleanup: fix long line formatting Change-Id: I173e70a2fa9a45f56fe50d4a6b81699665e3433d - fixup: remove VL asserts in match rules to fix failures on >= 512b SVE platforms Change-Id: I721f5a97076d645905ee1716f7d57ec8c90ef6e9 - Merge branch 'master' into 8343689 Change-Id: Iebe758e4c7b3ab0de5f580199f8909e96b8c6274 - cleanup: start the SVE Integer Misc - Unpredicated section - Merge branch 'master' - Address review comments and simplify the implementation - remove the loops from gt128b methods making them 256b only - fixup: missed fnoregs in instruct reduce_mulL_256b - use an extra vtmp3 reg for the 256b integer method - remove a no longer needed change in reduce_mul_integral_le128b - cleanup: unify comments - Merge commit '8193856af8546332bfa180cb45154a4093b4fd2c' - ... and 13 more: https://git.openjdk.org/jdk/compare/cc563c87...8c9e0845 ------------- Changes: https://git.openjdk.org/jdk/pull/23181/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23181&range=11 Stats: 274 lines in 6 files changed: 207 ins; 2 del; 65 mod Patch: https://git.openjdk.org/jdk/pull/23181.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23181/head:pull/23181 PR: https://git.openjdk.org/jdk/pull/23181 From mhaessig at openjdk.org Mon Oct 20 12:43:04 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 20 Oct 2025 12:43:04 GMT Subject: RFR: 8370031: Make RegMask copy constructor explicit and replace RegMask operator= with named function [v2] In-Reply-To: <4q9CrlwVAibAEIySihIaFIV24xTs1_uT2QpnJag7BX0=.fd0693e7-f11b-4224-9dbb-93a116a64a6f@github.com> References: <4q9CrlwVAibAEIySihIaFIV24xTs1_uT2QpnJag7BX0=.fd0693e7-f11b-4224-9dbb-93a116a64a6f@github.com> Message-ID: <9iuxVrckqdjA7GrRqkSFQPVvhJDojDiNpqmKYga51GY=.f58fca84-b84b-4708-93a1-1c29c06cc38e@github.com> On Mon, 20 Oct 2025 11:59:58 GMT, Daniel Lund?n wrote: >> The `RegMask` copy constructor is currently non-explicit. We should make it explicit so that we do not unintentionally copy register masks. >> >> Additionally, we currently overload `operator=` in `RegMask` to do a deep copy. It is preferable to use an explicit named function instead, according to the HotSpot coding style. >> >> ### Changeset >> >> - Make the `RegMask` copy constructor explicit. >> - Fix compilation errors as a result of the now explicit constructor. Specifically, the methods `Matcher::divI_proj_mask`, `Matcher::modI_proj_mask`, `Matcher::divL_proj_mask`, and `Matcher::modL_proj_mask` all use implicit copy construction (likely unintended). Change the methods to return `const RegMask&` instead of `RegMask` and correspondingly change the return value from `RegMask()` to `RegMask::Empty` on some platforms. >> - Rename the old method `RegMask::copy` to `RegMask::assignFrom` to better describe its functionality, and make it public instead of private. >> - Delete `RegMask` copy assignment (`operator=`) and change all uses to the named function `assignFrom` instead. >> - Fix various syntax issues at lines touched by the changeset. 
>> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18589208499) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > > Daniel Lund?n has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge remote-tracking branch 'upstream/master' into regmask-explicit-8370031 > - Fix issue Thank you for cleaning this up, @dlunde. Your changes look good to me. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/27891#pullrequestreview-3356317308 From epeter at openjdk.org Mon Oct 20 12:57:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 20 Oct 2025 12:57:05 GMT Subject: RFR: 8369898: C2 SuperWord: assert(has_ctrl(i)) failed: should be control, not loop Message-ID: In `PhaseIdealLoop::create_new_if_for_multiversion`, we replace `multiversion_slow_proj` with the new `region` that merges the `new_multiversion_slow_proj` and `new_if_false`. Using `igvn.replace_node` moves all the control inputs of the outputs of the old `multiversion_slow_proj` to the new `region`. This is sufficient during IGVN, but not during loop-opts: the "controllees" of `multiversion_slow_proj` (the nodes that used to answer to `get_ctrl` with `multiversion_slow_proj`) should now be "controllees" of `region` (answer `region` for `get_ctrl`). This is what `lazy_replace` is for: - It puts a "forwarding" in the `_loop_and_ctrl` table: instead of mapping `multiversion_slow_proj` to its loop, it now points to the new ctrl node `region`. - When we call `get_ctrl` on a "controllee" of the old `multiversion_slow_proj`, we then skip over `multiversion_slow_proj` via the "forwarding" to the new `region`. I'm proposing a PR to improve the documentation and some renamings around `get_ctrl` and `lazy_replace`: https://github.com/openjdk/jdk/pull/27892 A previous PR that used `lazy_replace`, in case you want to understand more: https://github.com/openjdk/jdk/pull/15720 ------------- Commit messages: - JDK-8369898 Changes: https://git.openjdk.org/jdk/pull/27889/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27889&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369898 Stats: 99 lines in 2 files changed: 98 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27889.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27889/head:pull/27889 PR: https://git.openjdk.org/jdk/pull/27889 From epeter at openjdk.org Mon Oct 20 12:58:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 20 Oct 2025 12:58:55 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes Message-ID: When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. Here, I'm doing the following: - Add more documentation, and improve it in other cases. - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` - Made some methods private, and added some additional asserts. 
I'd be more than happy for even better names, and suggestions how to improve the documentation further :) Related issues: https://github.com/openjdk/jdk/pull/27889 https://github.com/openjdk/jdk/pull/15720 TODO: improve `VerifyLoopOptimizations` to check that we can call `get_ctrl` on all live nodes after loop-opts. ------------- Commit messages: - code style - missing part - rename lazy methods - make helper method private - wip documentation and renaming - more documentation - JDK-8370220 Changes: https://git.openjdk.org/jdk/pull/27892/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370220 Stats: 103 lines in 6 files changed: 53 ins; 4 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/27892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27892/head:pull/27892 PR: https://git.openjdk.org/jdk/pull/27892 From chagedorn at openjdk.org Mon Oct 20 13:05:38 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 20 Oct 2025 13:05:38 GMT Subject: RFR: 8369898: C2 SuperWord: assert(has_ctrl(i)) failed: should be control, not loop In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 07:11:34 GMT, Emanuel Peter wrote: > In `PhaseIdealLoop::create_new_if_for_multiversion`, we replace `multiversion_slow_proj` with the new `region` that merges the `new_multiversion_slow_proj` and `new_if_false`. Using `igvn.replace_node` moves all the control inputs of the outputs of the old `multiversion_slow_proj` to the new `region`. This is sufficient during IGVN, but not during loop-opts: the "controllees" of `multiversion_slow_proj` (the nodes that used to answer to `get_ctrl` with `multiversion_slow_proj`) should now be "controllees" of `region` (answer `region` for `get_ctrl`). > > This is what `lazy_replace` is for: > - It puts a "forwarding" in the `_loop_and_ctrl` table: instead of mapping `multiversion_slow_proj` to its loop, it now points to the new ctrl node `region`. > - When we call `get_ctrl` on a "controllee" of the old `multiversion_slow_proj`, we then skip over `multiversion_slow_proj` via the "forwarding" to the new `region`. > > I'm proposing a PR to improve the documentation and some renamings around `get_ctrl` and `lazy_replace`: > https://github.com/openjdk/jdk/pull/27892 > > A previous PR that used `lazy_replace`, in case you want to understand more: > https://github.com/openjdk/jdk/pull/15720 The fix looks reasonable to me and I agree that we should improve the documentation for `lazy_replace()` and also think about it's naming - by just looking at the name, one could first guess that the actual node replacement is delayed which is not the case! Maybe @rwestrel also wants to have a look who worked with `lazy_replace()` in the mentioned PR https://github.com/openjdk/jdk/pull/15720. test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionSlowProjReplacementAndGetCtrl.java line 26: > 24: /* > 25: * @test > 26: * @bug 8369902 Suggestion: * @bug 8369898 test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionSlowProjReplacementAndGetCtrl.java line 32: > 30: * same loop-opts-phase. > 31: * @run main/othervm > 32: * -XX:+IgnoreUnrecognizedVMOptions Not needed: Suggestion: test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionSlowProjReplacementAndGetCtrl.java line 92: > 90: > 91: public static void main(String[] strArr) { > 92: for (int i = 0; i < 10_00; i++) { Suggestion: for (int i = 0; i < 1_000; i++) { ------------- Marked as reviewed by chagedorn (Reviewer). 
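For readers trying to picture the forwarding mechanism discussed in this thread, a plain-Java analogue may help (illustrative only, not HotSpot code; all names here are invented). The idea is that when a control node is replaced, only a single old-to-new forwarding entry is recorded, and later `get_ctrl`-style lookups chase and shorten that chain instead of eagerly updating every dependent node.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the ctrl side table: data node -> ctrl node, plus
// forwardings from replaced (dead) ctrl nodes to their live replacements.
final class LazyCtrlTableSketch {
    private final Map<String, String> ctrlOf = new HashMap<>();   // data node -> ctrl node
    private final Map<String, String> forward = new HashMap<>();  // dead ctrl -> live ctrl

    void setCtrl(String dataNode, String ctrlNode) {
        ctrlOf.put(dataNode, ctrlNode);
    }

    // Analogue of installing a lazy forwarding: users of oldCtrl are not touched.
    void replaceCtrlNode(String oldCtrl, String newCtrl) {
        forward.put(oldCtrl, newCtrl);
    }

    // Analogue of get_ctrl: follow forwardings through dead ctrl nodes and
    // store the shortened answer back so the next lookup is direct.
    String getCtrl(String dataNode) {
        String ctrl = ctrlOf.get(dataNode);
        while (ctrl != null && forward.containsKey(ctrl)) {
            ctrl = forward.get(ctrl);
        }
        ctrlOf.put(dataNode, ctrl); // path shortening
        return ctrl;
    }
}
```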
PR Review: https://git.openjdk.org/jdk/pull/27889#pullrequestreview-3356373059 PR Review Comment: https://git.openjdk.org/jdk/pull/27889#discussion_r2444947618 PR Review Comment: https://git.openjdk.org/jdk/pull/27889#discussion_r2444944107 PR Review Comment: https://git.openjdk.org/jdk/pull/27889#discussion_r2444946780 From epeter at openjdk.org Mon Oct 20 13:30:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 20 Oct 2025 13:30:15 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v7] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Manuel's suggestions Co-authored-by: Manuel H?ssig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/f7d64326..68f719d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=05-06 Stats: 13 lines in 4 files changed: 0 ins; 2 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Mon Oct 20 13:38:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 20 Oct 2025 13:38:49 GMT Subject: RFR: 8369898: C2 SuperWord: assert(has_ctrl(i)) failed: should be control, not loop [v2] In-Reply-To: References: Message-ID: > In `PhaseIdealLoop::create_new_if_for_multiversion`, we replace `multiversion_slow_proj` with the new `region` that merges the `new_multiversion_slow_proj` and `new_if_false`. Using `igvn.replace_node` moves all the control inputs of the outputs of the old `multiversion_slow_proj` to the new `region`. This is sufficient during IGVN, but not during loop-opts: the "controllees" of `multiversion_slow_proj` (the nodes that used to answer to `get_ctrl` with `multiversion_slow_proj`) should now be "controllees" of `region` (answer `region` for `get_ctrl`). > > This is what `lazy_replace` is for: > - It puts a "forwarding" in the `_loop_and_ctrl` table: instead of mapping `multiversion_slow_proj` to its loop, it now points to the new ctrl node `region`. > - When we call `get_ctrl` on a "controllee" of the old `multiversion_slow_proj`, we then skip over `multiversion_slow_proj` via the "forwarding" to the new `region`. 
> > I'm proposing a PR to improve the documentation and some renamings around `get_ctrl` and `lazy_replace`: > https://github.com/openjdk/jdk/pull/27892 > > A previous PR that used `lazy_replace`, in case you want to understand more: > https://github.com/openjdk/jdk/pull/15720 Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27889/files - new: https://git.openjdk.org/jdk/pull/27889/files/74eb443a..6e23865a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27889&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27889&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27889.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27889/head:pull/27889 PR: https://git.openjdk.org/jdk/pull/27889 From epeter at openjdk.org Mon Oct 20 13:38:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 20 Oct 2025 13:38:50 GMT Subject: RFR: 8369898: C2 SuperWord: assert(has_ctrl(i)) failed: should be control, not loop [v2] In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 13:03:18 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > The fix looks reasonable to me and I agree that we should improve the documentation for `lazy_replace()` and also think about it's naming - by just looking at the name, one could first guess that the actual node replacement is delayed which is not the case! > > Maybe @rwestrel also wants to have a look who worked with `lazy_replace()` in the mentioned PR https://github.com/openjdk/jdk/pull/15720. @chhagedorn Thanks for the suggestions, they are all applied :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27889#issuecomment-3422096168 From epeter at openjdk.org Mon Oct 20 13:52:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 20 Oct 2025 13:52:31 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v6] In-Reply-To: References: <1v9rJr_jz6k9Zqa0dcfhLN1feWAvnnQdiH5n1gc4VX4=.9ec0da19-feaa-4cb1-9f5e-e819a5a4a480@github.com> Message-ID: On Mon, 20 Oct 2025 11:46:51 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8367531-fix-addDataName' of https://github.com/eme64/jdk into JDK-8367531-fix-addDataName >> - improve tutorial for Manuel > > Thank you for the improved descriptions. It helped a lot with my understanding. > > I have a few more nits, but otherwise, this looks good. @mhaessig Thanks for the suggestions! I applied them all ? 
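To make the ordering issue in the 8367531 thread concrete, here is a small illustration pieced together only from calls quoted above (not a complete, compilable test; `someType` is taken from the quoted snippet). With the old mixed model, an immediate `sample()` ran during lambda execution and could miss a name that `addDataName` would only add later during token evaluation; with the scope/token style, the sampling is deferred as well, so it sees the name in the written order.

```java
var testTemplate = Template.make(() -> body(
    addDataName("name", someType, MUTABLE),
    // Old mixed style (problematic): sample() was answered immediately during
    // lambda execution, before the addDataName token above was evaluated:
    //   let("name1", dataNames(MUTABLE).exactOf(someType).sample().name()),
    // Token/scope style: sampling is itself deferred to token evaluation,
    // so it runs after the name was added:
    dataNames(MUTABLE).exactOf(someType).sampleAndLetAs("name1"),
    "sample: #name1.\n"
));
```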
------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3422148577 From chagedorn at openjdk.org Mon Oct 20 13:59:36 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 20 Oct 2025 13:59:36 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes In-Reply-To: References: Message-ID: <7nfOylxfH7KPKOj291d3N3f67w5rKrGwhc17OKv77NY=.afb319e9-f3be-4330-abe9-b1c19b61dc75@github.com> On Mon, 20 Oct 2025 08:57:33 GMT, Emanuel Peter wrote: > When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. > > Here, I'm doing the following: > - Add more documentation, and improve it in other cases. > - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. > - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` > - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` > - Made some methods private, and added some additional asserts. > > I'd be more than happy for even better names, and suggestions how to improve the documentation further :) > > Related issues: > https://github.com/openjdk/jdk/pull/27889 > https://github.com/openjdk/jdk/pull/15720 > > TODO: improve `VerifyLoopOptimizations` to check that we can call `get_ctrl` on all live nodes after loop-opts. Thanks a lot for following up with a documentation and renaming! Some small suggestions, otherwise, looks good! Note: There are some build failures in GHA. src/hotspot/share/opto/loopnode.hpp line 1080: > 1078: } > 1079: > 1080: // Retreives the ctrl for a data node i. Suggestion: // Retrieves the ctrl for a data node i. src/hotspot/share/opto/loopnode.hpp line 1125: > 1123: // forwarding installed, using "install_lazy_ctrl_and_idom_forwarding". > 1124: // We now have to jump from the old (dead) ctrl node to the new (live) > 1125: // ctrl node, in possibly multiple ctrl/idom forwarding steps. I guess in this context of ctrl, you can omit the idom? Suggestion: // ctrl node, in possibly multiple ctrl forwarding steps. src/hotspot/share/opto/loopnode.hpp line 1154: > 1152: // forwarding in the future. > 1153: // - When querying "idom": from some node get its old idom, which > 1154: // may be dead but has a ctrl forwarding to the new and live Maybe add this: Suggestion: // may be dead but has an idom forwarding (piggy-backing on '_loop_or_ctrl') to the new and live src/hotspot/share/opto/loopnode.hpp line 1160: > 1158: // the entry for the old dead node now, and we do not have to update all > 1159: // the nodes that had the old_node as their "get_ctrl" or "idom". We > 1160: // clean up the forwarding links when we query "get_ctrl" or "idom". Suggestion: // clean up the forwarding links when we query "get_ctrl" or "idom" for these nodes the next time. src/hotspot/share/opto/loopnode.hpp line 1161: > 1159: // the nodes that had the old_node as their "get_ctrl" or "idom". We > 1160: // clean up the forwarding links when we query "get_ctrl" or "idom". > 1161: void install_lazy_ctrl_and_idom_forwarding(Node* old_node, Node* new_node) { Maybe we don't need lazy sice "install forwarding" is already expressive enough: Suggestion: void install_ctrl_and_idom_forwarding(Node* old_node, Node* new_node) { src/hotspot/share/opto/loopnode.hpp line 1176: > 1174: // - Update the node inputs of all uses. 
> 1175: // - Lazily update the ctrl and idom info of all uses, via a ctrl/idom forwarding. > 1176: void replace_ctrl_node_and_forward_ctrl_and_idom(Node *old_node, Node *new_node) { Maybe add here and/or in `install_lazy_ctrl_and_idom_forwarding()` an assert that we have a CFG nodes (i.e. `is_CFG()`) additionally to the `!has_ctrl()` asserts. src/hotspot/share/opto/loopnode.hpp line 1259: > 1257: while (n->in(0) == nullptr) { // Skip dead CFG nodes > 1258: // We encountered a dead CFG node. > 1259: // If everything went right, this dead CFG node should have had a idom/ctrl In the context of idom, you might be able to remove "ctrl": Suggestion: // If everything went right, this dead CFG node should have had an idom src/hotspot/share/opto/loopnode.hpp line 1261: > 1259: // If everything went right, this dead CFG node should have had a idom/ctrl > 1260: // forwarding installed, using "install_lazy_ctrl_and_idom_forwarding". > 1261: // We now have to jump from the old (dead) ctrl node to the new (live) Suggestion: // We now have to jump from the old (dead) idom node to the new (live) src/hotspot/share/opto/loopnode.hpp line 1262: > 1260: // forwarding installed, using "install_lazy_ctrl_and_idom_forwarding". > 1261: // We now have to jump from the old (dead) ctrl node to the new (live) > 1262: // ctrl/idom node, in possibly multiple ctrl/idom forwarding steps. Maybe for clarification since it's somehow surprising at first that we reuse `_loop_or_ctrl`: Suggestion: // idom node, in possibly multiple idom forwarding steps. // Note that we piggy back on `_loop_or_ctrl` do the the forwarding. src/hotspot/share/opto/loopnode.hpp line 1272: > 1270: Node* idom(Node* d) const { > 1271: return idom(d->_idx); > 1272: } While touching it, we might also want to name it `n` for node? `d` could suggest dominator but it's actually the "dominatee". Suggestion: Node* idom(Node* n) const { return idom(n->_idx); } src/hotspot/share/opto/loopnode.hpp line 1274: > 1272: } > 1273: > 1274: Node* idom(uint didx) const { While at it: Maybe also name this `node_index`? src/hotspot/share/opto/loopnode.hpp line 1278: > 1276: // We store the found idom in the side-table again. In most cases, > 1277: // this is a no-op, since we just read from _idom. But in cases where > 1278: // there was a ctrl forwarding via dead ctrl nodes, this shortens the path. Suggestion: // there was an idom forwarding via dead idom nodes, this shortens the path. 
------------- PR Review: https://git.openjdk.org/jdk/pull/27892#pullrequestreview-3356437939 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2445002972 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2445012015 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2445018079 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2445021875 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2445023706 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2444994098 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2445049053 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2445055275 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2445083835 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2445090585 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2445094889 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2445093928 From chagedorn at openjdk.org Mon Oct 20 13:59:37 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 20 Oct 2025 13:59:37 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes In-Reply-To: <7nfOylxfH7KPKOj291d3N3f67w5rKrGwhc17OKv77NY=.afb319e9-f3be-4330-abe9-b1c19b61dc75@github.com> References: <7nfOylxfH7KPKOj291d3N3f67w5rKrGwhc17OKv77NY=.afb319e9-f3be-4330-abe9-b1c19b61dc75@github.com> Message-ID: On Mon, 20 Oct 2025 13:17:29 GMT, Christian Hagedorn wrote: >> When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. >> >> Here, I'm doing the following: >> - Add more documentation, and improve it in other cases. >> - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. >> - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` >> - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` >> - Made some methods private, and added some additional asserts. >> >> I'd be more than happy for even better names, and suggestions how to improve the documentation further :) >> >> Related issues: >> https://github.com/openjdk/jdk/pull/27889 >> https://github.com/openjdk/jdk/pull/15720 >> >> TODO: improve `VerifyLoopOptimizations` to check that we can call `get_ctrl` on all live nodes after loop-opts. > > src/hotspot/share/opto/loopnode.hpp line 1176: > >> 1174: // - Update the node inputs of all uses. >> 1175: // - Lazily update the ctrl and idom info of all uses, via a ctrl/idom forwarding. >> 1176: void replace_ctrl_node_and_forward_ctrl_and_idom(Node *old_node, Node *new_node) { > > Maybe add here and/or in `install_lazy_ctrl_and_idom_forwarding()` an assert that we have a CFG nodes (i.e. `is_CFG()`) additionally to the `!has_ctrl()` asserts. Just noticed this: We often (intuitively?) seem to use "control" when talking about just some control nodes and "ctrl" when talking about the nodes found/fetched from `_loop_or_ctrl`. Under this light, we might want to name the method "replace_control_node_and_forward_ctrl_and_idom" to better distinguish them. But I don't have a strong opinion about it - your call ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2445046836 From bmaillard at openjdk.org Mon Oct 20 14:23:40 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 20 Oct 2025 14:23:40 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive Message-ID: This PR addresses a missed optimization opportunity in `PhaseIterGVN`. The missed optimization is the simplification of redundant conversion patterns of the shape `ConvX2Y->ConvY2X->ConvX2Y`. This optimization pattern is implemented as an ideal optimization on `ConvX2Y` nodes. Because it depends on the input of the input of the node in question, we need to have an appropriate notification mechanism in `PhaseIterGVN::add_users_of_use_to_worklist`. The notification for this pattern was added in [JDK-8359603](https://bugs.openjdk.org/browse/JDK-8359603). However, that fix was based on the wrong assumption that in `PhaseIterGVN::add_users_of_use_to_worklist`, argument `n` is already the optimized node. However in some cases this argument is actually the node that is about to get replaced. This happens for example in `PhaseIterGVN::transform_old`. If we find that node `k` returned by `Ideal` actually already exists by calling `hash_find_insert(k)`, we call `add_users_to_worklist(k)`. As we replace node `k` with `i`, and `i` as a different opcode than `k`, then we cannot use the opcode of `k` to detect the redundant conversion pattern. ```c++ ... // Global Value Numbering i = hash_find_insert(k); // Check for pre-existing node if (i && (i != k)) { // Return the pre-existing node if it isn't dead NOT_PRODUCT(set_progress();) add_users_to_worklist(k); subsume_node(k, i); // Everybody using k now uses i return i; } ... The bug was quite intermittent and only showed up in some cases with `-XX:+StressIGVN`. ### Proposed Fix We make the detection of the pattern less specific by only looking at the opcode of the user of `n`, and not directly the opcode of `n`. This is consistent with the detection of other patterns in `PhaseIterGVN::add_users_of_use_to_worklist`. ### Testing - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8369646) - [x] tier1-3, plus some internal testing - [x] Added a second run for the existing test with `-XX:+StressIGVN` and a fixed stress seed Thank you for reviewing! ------------- Commit messages: - 8369646: Fix excissively restrictive pattern in add_users_of_use_to_worklist - 8369646: Add second test run with -XX:+StressIGVN and stress seed Changes: https://git.openjdk.org/jdk/pull/27900/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27900&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369646 Stats: 15 lines in 2 files changed: 9 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/27900.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27900/head:pull/27900 PR: https://git.openjdk.org/jdk/pull/27900 From chagedorn at openjdk.org Mon Oct 20 14:25:24 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 20 Oct 2025 14:25:24 GMT Subject: RFR: 8369898: C2 SuperWord: assert(has_ctrl(i)) failed: should be control, not loop [v2] In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 13:38:49 GMT, Emanuel Peter wrote: >> In `PhaseIdealLoop::create_new_if_for_multiversion`, we replace `multiversion_slow_proj` with the new `region` that merges the `new_multiversion_slow_proj` and `new_if_false`. 
Using `igvn.replace_node` moves all the control inputs of the outputs of the old `multiversion_slow_proj` to the new `region`. This is sufficient during IGVN, but not during loop-opts: the "controllees" of `multiversion_slow_proj` (the nodes that used to answer to `get_ctrl` with `multiversion_slow_proj`) should now be "controllees" of `region` (answer `region` for `get_ctrl`). >> >> This is what `lazy_replace` is for: >> - It puts a "forwarding" in the `_loop_and_ctrl` table: instead of mapping `multiversion_slow_proj` to its loop, it now points to the new ctrl node `region`. >> - When we call `get_ctrl` on a "controllee" of the old `multiversion_slow_proj`, we then skip over `multiversion_slow_proj` via the "forwarding" to the new `region`. >> >> I'm proposing a PR to improve the documentation and some renamings around `get_ctrl` and `lazy_replace`: >> https://github.com/openjdk/jdk/pull/27892 >> >> A previous PR that used `lazy_replace`, in case you want to understand more: >> https://github.com/openjdk/jdk/pull/15720 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Update looks good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27889#pullrequestreview-3356682142 From kxu at openjdk.org Mon Oct 20 14:27:52 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 20 Oct 2025 14:27:52 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v16] In-Reply-To: References: Message-ID: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). 
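Returning to the 8369646 conversion-pattern RFR earlier in this digest: a hypothetical Java shape (names invented) that can hand C2 a `ConvI2L -> ConvL2I -> ConvI2L` chain looks like the following. Because the value originates from an `int`, the intermediate long-to-int-to-long round trip changes nothing, so the chain folds to a single `ConvI2L`, which is the ideal transformation the worklist notification is meant to re-trigger.

```java
public class ConvChainSketch {
    // int -> long -> int -> long: C2 may see ConvI2L, ConvL2I, ConvI2L here.
    static long roundTrip(int x) {
        long widened = x;             // ConvI2L
        int narrowed = (int) widened; // ConvL2I (no value change: the long came from an int)
        return narrowed;              // ConvI2L again
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += roundTrip(i);      // enough iterations to reach C2 compilation
        }
        System.out.println(sum);
    }
}
```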
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: fix iv increment basic type and truncated increment check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24458/files - new: https://git.openjdk.org/jdk/pull/24458/files/80c2a62a..9005b864 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=14-15 Stats: 9 lines in 2 files changed: 4 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From kxu at openjdk.org Mon Oct 20 14:27:54 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 20 Oct 2025 14:27:54 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v13] In-Reply-To: References: <6zw0uSB1sUHZTyDUXDjiXcB0Chmu0XH1cEngzhG-UNk=.b239a687-cfb7-49a3-993a-34327a83c4de@github.com> Message-ID: On Fri, 10 Oct 2025 08:53:01 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8354383: C2: enable sinking of Type nodes out of loop >> >> Reviewed-by: chagedorn, thartmann >> (cherry picked from commit a2f99fd88bd03337e1ba73b413ffe4e39f3584cf) > > src/hotspot/share/opto/loopnode.cpp line 1814: > >> 1812: _iv_incr = PhaseIdealLoop::LoopIVIncr(incr, _head, _loop); >> 1813: _iv_incr.build(); >> 1814: if (_iv_incr.incr() == nullptr) { > > Why don't you also check with a `is_valid()` method here? added `is_valid()` method to check regardless of iv basic type (which could be different from the loop itself). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2445182264 From bmaillard at openjdk.org Mon Oct 20 14:42:53 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 20 Oct 2025 14:42:53 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v2] In-Reply-To: References: Message-ID: > This PR addresses a missed optimization opportunity in `PhaseIterGVN`. The missed optimization is the simplification of redundant conversion patterns of the shape `ConvX2Y->ConvY2X->ConvX2Y`. > > This optimization pattern is implemented as an ideal optimization on `ConvX2Y` nodes. Because it depends on the input of the input of the node in question, we need to have an appropriate notification mechanism in `PhaseIterGVN::add_users_of_use_to_worklist`. The notification for this pattern was added in [JDK-8359603](https://bugs.openjdk.org/browse/JDK-8359603). > > However, that fix was based on the wrong assumption that in `PhaseIterGVN::add_users_of_use_to_worklist`, argument `n` is already the optimized node. However in some cases this argument is actually the node that is about to get replaced. > > This happens for example in `PhaseIterGVN::transform_old`. If we find that node `k` returned by `Ideal` actually already exists by calling `hash_find_insert(k)`, we call `add_users_to_worklist(k)`. > As we replace node `k` with `i`, and `i` as a different opcode than `k`, then we cannot use the opcode of `k` to detect the redundant conversion pattern. > > ```c++ > ... 
> // Global Value Numbering > i = hash_find_insert(k); // Check for pre-existing node > if (i && (i != k)) { > // Return the pre-existing node if it isn't dead > NOT_PRODUCT(set_progress();) > add_users_to_worklist(k); > subsume_node(k, i); // Everybody using k now uses i > return i; > } > ... > > > The bug was quite intermittent and only showed up in some cases with `-XX:+StressIGVN`. > > ### Proposed Fix > > We make the detection of the pattern less specific by only looking at the opcode of the user of `n`, and not directly the opcode of `n`. This is consistent with the detection of other patterns in `PhaseIterGVN::add_users_of_use_to_worklist`. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8369646) > - [x] tier1-3, plus some internal testing > - [x] Added a second run for the existing test with `-XX:+StressIGVN` and a fixed stress seed > > Thank you for reviewing! Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: Add new issue name to jtreg headers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27900/files - new: https://git.openjdk.org/jdk/pull/27900/files/402def2b..86ed4661 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27900&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27900&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27900.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27900/head:pull/27900 PR: https://git.openjdk.org/jdk/pull/27900 From kvn at openjdk.org Mon Oct 20 16:06:03 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 20 Oct 2025 16:06:03 GMT Subject: RFR: 8369898: C2 SuperWord: assert(has_ctrl(i)) failed: should be control, not loop [v2] In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 13:38:49 GMT, Emanuel Peter wrote: >> In `PhaseIdealLoop::create_new_if_for_multiversion`, we replace `multiversion_slow_proj` with the new `region` that merges the `new_multiversion_slow_proj` and `new_if_false`. Using `igvn.replace_node` moves all the control inputs of the outputs of the old `multiversion_slow_proj` to the new `region`. This is sufficient during IGVN, but not during loop-opts: the "controllees" of `multiversion_slow_proj` (the nodes that used to answer to `get_ctrl` with `multiversion_slow_proj`) should now be "controllees" of `region` (answer `region` for `get_ctrl`). >> >> This is what `lazy_replace` is for: >> - It puts a "forwarding" in the `_loop_and_ctrl` table: instead of mapping `multiversion_slow_proj` to its loop, it now points to the new ctrl node `region`. >> - When we call `get_ctrl` on a "controllee" of the old `multiversion_slow_proj`, we then skip over `multiversion_slow_proj` via the "forwarding" to the new `region`. >> >> I'm proposing a PR to improve the documentation and some renamings around `get_ctrl` and `lazy_replace`: >> https://github.com/openjdk/jdk/pull/27892 >> >> A previous PR that used `lazy_replace`, in case you want to understand more: >> https://github.com/openjdk/jdk/pull/15720 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27889#pullrequestreview-3357063259 From kvn at openjdk.org Mon Oct 20 16:19:04 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 20 Oct 2025 16:19:04 GMT Subject: RFR: 8369902: C2 SuperWord: wrong result because filterin NaN instead of zero in MemPointerParser::canonicalize_raw_summands [v4] In-Reply-To: References: <0FaIYVu0DC9tiKMLsw-TFxBlTSDzp74_0pSIAhRd72Q=.0c602e17-24f8-4aca-b8d8-7b8df0a1dcc9@github.com> Message-ID: On Mon, 20 Oct 2025 07:39:49 GMT, Emanuel Peter wrote: >> **TLDR** `is_NaN` -> `is_zero`, just like the code comment says. >> >> Thanks to @mhaessig for debugging the ARM32 bug below. He found the buggy line of code. >> >> ---------------------------------------- >> >> **Details** >> >> It seems there is a little "typo" (logic error) in `MemPointerParser::canonicalize_raw_summands` that slipped through the cracks in https://github.com/openjdk/jdk/pull/24278. The JavaFuzzer now found an example, and independently the issue was also reported on ARM32 [JDK-8368578](https://bugs.openjdk.org/browse/JDK-8368578). >> >> Filtering out `NaN` instead of `zero` for the `scaleL` has two manifestations: >> - If `scaleL` is zero, but does not get filtered out even though it should be: we hit the assert in `MemPointerSummand` constructor, `assert(!_scale.is_zero(), "non-zero scale");`. >> - See [JDK-8368578](https://bugs.openjdk.org/browse/JDK-8368578), though those tests seem to only fail on ARM32, and nowhere else. >> - I was able to construct a `MemorySegment` regression test, see `TestMemorySegmentFilterSummands.test1`. I suspect that the ARM32 failures happened on an array, as it failed in places like `BigInteger::implMultiplyToLen`. But now I was able to reproduce it with native memory, to get a pointer expression that has the same cancellation issue. >> - If `scaleL` is `NaN`, and gets filtered even though it should not be: We get a non-trivial MemPointer that is missing a summand. So we will succeed in optimizing, but with wrong assumptions. We generate a runtime aliasing check that is incorrect, leading to wrong results. >> - This was reported by the fuzzer, see attached `TestDoNotFilterNaNSummands`. >> - I was also able to create a simpler example with `MemorySegments`, see attached `TestMemorySegmentFilterSummands.test2`. >> >> **Why did this slip through the cracks?** >> >> In https://github.com/openjdk/jdk/pull/24278 I added pretty extensive testing, even fuzzer style tests, see `TestAliasingFuzzer.java`. But I think all of those tests exercise `scale` that are in "nice" [int ranges](https://github.com/openjdk/jdk/pull/24278/files#diff-26de03e864a492fe8aa8178818968f2097b99cf36a763605e2fb11fbc04eedffR303-R322). Also the JavaFuzzer does not directly generate such long constants for array accesses (not possible without Unsafe I think), we were lucky that it generated the index with `%` that got optimized to some magic long constant. >> >> There is already an RFE filed for improvements to `TestAliasingFuzzer.java`: [JDK-83... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix up manuel's suggestion Looks fine. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27848#pullrequestreview-3357105039 From bmaillard at openjdk.org Mon Oct 20 16:19:37 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 20 Oct 2025 16:19:37 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v3] In-Reply-To: References: Message-ID: > This PR addresses a missed optimization opportunity in `PhaseIterGVN`. The missed optimization is the simplification of redundant conversion patterns of the shape `ConvX2Y->ConvY2X->ConvX2Y`. > > This optimization pattern is implemented as an ideal optimization on `ConvX2Y` nodes. Because it depends on the input of the input of the node in question, we need to have an appropriate notification mechanism in `PhaseIterGVN::add_users_of_use_to_worklist`. The notification for this pattern was added in [JDK-8359603](https://bugs.openjdk.org/browse/JDK-8359603). > > However, that fix was based on the wrong assumption that in `PhaseIterGVN::add_users_of_use_to_worklist`, argument `n` is already the optimized node. However in some cases this argument is actually the node that is about to get replaced. > > This happens for example in `PhaseIterGVN::transform_old`. If we find that node `k` returned by `Ideal` actually already exists by calling `hash_find_insert(k)`, we call `add_users_to_worklist(k)`. > As we replace node `k` with `i`, and `i` as a different opcode than `k`, then we cannot use the opcode of `k` to detect the redundant conversion pattern. > > ```c++ > ... > // Global Value Numbering > i = hash_find_insert(k); // Check for pre-existing node > if (i && (i != k)) { > // Return the pre-existing node if it isn't dead > NOT_PRODUCT(set_progress();) > add_users_to_worklist(k); > subsume_node(k, i); // Everybody using k now uses i > return i; > } > ... > > > The bug was quite intermittent and only showed up in some cases with `-XX:+StressIGVN`. > > ### Proposed Fix > > We make the detection of the pattern less specific by only looking at the opcode of the user of `n`, and not directly the opcode of `n`. This is consistent with the detection of other patterns in `PhaseIterGVN::add_users_of_use_to_worklist`. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8369646) > - [x] tier1-3, plus some internal testing > - [x] Added a second run for the existing test with `-XX:+StressIGVN` and a fixed stress seed > > Thank you for reviewing! Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: Missing -XX:+UnlockDiagnosticVMOptions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27900/files - new: https://git.openjdk.org/jdk/pull/27900/files/86ed4661..12706636 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27900&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27900&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27900.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27900/head:pull/27900 PR: https://git.openjdk.org/jdk/pull/27900 From vlivanov at openjdk.org Mon Oct 20 21:08:17 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 20 Oct 2025 21:08:17 GMT Subject: RFR: 8370251: C2: Inlining checks for method handle intrinsics are too strict Message-ID: C2 performs access checks during inlining attempts through method handle intrinsic calls. 
But there are no such checks happening at runtime when executing the calls. (Access checks are performed when corresponding method handle is resolved.) So, inlining may fail due to access checks failure while the call always succeeds at runtime. The fix is to skip access checks when inlining through method handle intrinsics. Testing: hs-tier1 - hs-tier4 ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/27908/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27908&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370251 Stats: 137 lines in 2 files changed: 94 ins; 11 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/27908.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27908/head:pull/27908 PR: https://git.openjdk.org/jdk/pull/27908 From kvn at openjdk.org Mon Oct 20 21:35:01 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 20 Oct 2025 21:35:01 GMT Subject: RFR: 8370251: C2: Inlining checks for method handle intrinsics are too strict In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 21:00:40 GMT, Vladimir Ivanov wrote: > C2 performs access checks during inlining attempts through method handle > intrinsic calls. But there are no such checks happening at runtime when > executing the calls. (Access checks are performed when corresponding method > handle is resolved.) So, inlining may fail due to access checks failure while > the call always succeeds at runtime. > > The fix is to skip access checks when inlining through method handle intrinsics. > > Testing: hs-tier1 - hs-tier4 src/hotspot/share/opto/doCall.cpp line 245: > 243: receiver_method = callee->resolve_invoke(jvms->method()->holder(), > 244: speculative_receiver_type, > 245: check_access); Can you explain why only here you pass `check_access` and expect it is `true` in all other places? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27908#discussion_r2446174649 From liach at openjdk.org Mon Oct 20 22:00:12 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 20 Oct 2025 22:00:12 GMT Subject: RFR: 8370251: C2: Inlining checks for method handle intrinsics are too strict In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 21:32:03 GMT, Vladimir Kozlov wrote: >> C2 performs access checks during inlining attempts through method handle >> intrinsic calls. But there are no such checks happening at runtime when >> executing the calls. (Access checks are performed when corresponding method >> handle is resolved.) So, inlining may fail due to access checks failure while >> the call always succeeds at runtime. >> >> The fix is to skip access checks when inlining through method handle intrinsics. >> >> Testing: hs-tier1 - hs-tier4 > > src/hotspot/share/opto/doCall.cpp line 245: > >> 243: receiver_method = callee->resolve_invoke(jvms->method()->holder(), >> 244: speculative_receiver_type, >> 245: check_access); > > Can you explain why only here you pass `check_access` and expect it is `true` in all other places? Similar question, should we add an assert for check_access before the resolve_invoke in Compile::optimize_inlining? 
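A minimal Java illustration of the behavior the 8370251 RFR relies on (class and method names invented): the access check happens once, when the method handle is resolved through a `Lookup`, and invoking the handle later from a class that could not call the target directly still succeeds. That is why inlining through the method handle intrinsic does not need to repeat a caller-side access check.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class Secrets {
    private static int secret() { return 42; }

    static MethodHandle secretHandle() throws ReflectiveOperationException {
        // Access to the private method is checked here, at resolution time.
        return MethodHandles.lookup()
                .findStatic(Secrets.class, "secret", MethodType.methodType(int.class));
    }
}

public class Caller {
    public static void main(String[] args) throws Throwable {
        MethodHandle mh = Secrets.secretHandle();
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            // No access check at invocation time, even though Caller could not
            // call Secrets.secret() directly.
            sum += (int) mh.invokeExact();
        }
        System.out.println(sum);
    }
}
```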
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27908#discussion_r2446225242 From valeriep at openjdk.org Mon Oct 20 23:17:03 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Mon, 20 Oct 2025 23:17:03 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v12] In-Reply-To: <3nnB5pkNnYqYi6OAH3u83PNmiO607_xwHXCmCeIE7gA=.371aa791-952e-4143-9012-1728dbf31ae9@github.com> References: <3nnB5pkNnYqYi6OAH3u83PNmiO607_xwHXCmCeIE7gA=.371aa791-952e-4143-9012-1728dbf31ae9@github.com> Message-ID: <2uIJVPehvByBLLKXHTHvkysLZIyHXsUvviNj1mbelxE=.379ce84d-7532-4a61-a63a-f4f54791b4e7@github.com> On Sun, 19 Oct 2025 02:18:43 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. >> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... 
> > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Updates for code review comments from @valeriepeng src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 1034: > 1032: int ti0, ti1, ti2, ti3; > 1033: int a0, a1, a2, a3; > 1034: int w = K.length - 4; nit: 4 could be WB? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2446328176 From valeriep at openjdk.org Mon Oct 20 23:51:06 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Mon, 20 Oct 2025 23:51:06 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v12] In-Reply-To: <3nnB5pkNnYqYi6OAH3u83PNmiO607_xwHXCmCeIE7gA=.371aa791-952e-4143-9012-1728dbf31ae9@github.com> References: <3nnB5pkNnYqYi6OAH3u83PNmiO607_xwHXCmCeIE7gA=.371aa791-952e-4143-9012-1728dbf31ae9@github.com> Message-ID: On Sun, 19 Oct 2025 02:18:43 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. >> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... 
> > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Updates for code review comments from @valeriepeng Just have some minor questions and nit. src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 589: > 587: > 588: // Lookup table for inverse substitution transform of last round as > 589: // described in the international journal article referenced. Is there a link that I can look it up also? src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 1180: > 1178: ^ T3[(ti0 >> 16) & 0xFF] & 0xFF0000 > 1179: ^ T0[(ti1 >> 8) & 0xFF] & 0xFF00 > 1180: ^ T1[ti2 & 0xFF] & 0xFF ^ K[w + 3]; Is this last round processing also based on spec or some journal? ------------- PR Review: https://git.openjdk.org/jdk/pull/27807#pullrequestreview-3358325489 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2446391365 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2446389723 From duke at openjdk.org Tue Oct 21 00:05:31 2025 From: duke at openjdk.org (Shawn M Emery) Date: Tue, 21 Oct 2025 00:05:31 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v13] In-Reply-To: References: Message-ID: > General: > ----------- > i) This work is to replace the existing AES cipher under the Cryptix license. > > ii) The lookup tables are employed for performance, but also for operating in constant time. > > iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. > > iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. > > Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. > > Correctness: > ----------------- > The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: > > i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass > > -intrinsics mode for: > > ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass > > iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures > > iv) jdk_security_infra: passed, with 48 known failures > > v) tier1 and tier2: all 110257 tests pass > > Security: > ----------- > In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. > > Performance: > ------------------ > All AES related benchmarks have been executed against the new and original Cryptix code: > > micro:org.openjdk.bench.javax.crypto.AES > > micro:org.openjdk.bench.javax.crypto.full.AESBench > > micro:org.openjdk.bench.javax.crypto.full.AESExtraBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream > > micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. 
> > micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) > > The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption results: > > i) Default (no JVM options, non-intrinsics) mode: > > a) Encryption: the new code performed better for b... Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: Updates for code review comments from @valeriepeng ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27807/files - new: https://git.openjdk.org/jdk/pull/27807/files/5ea6933b..fdfd3892 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27807&range=11-12 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27807.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27807/head:pull/27807 PR: https://git.openjdk.org/jdk/pull/27807 From duke at openjdk.org Tue Oct 21 00:05:34 2025 From: duke at openjdk.org (Shawn M Emery) Date: Tue, 21 Oct 2025 00:05:34 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v12] In-Reply-To: References: <3nnB5pkNnYqYi6OAH3u83PNmiO607_xwHXCmCeIE7gA=.371aa791-952e-4143-9012-1728dbf31ae9@github.com> Message-ID: On Mon, 20 Oct 2025 23:47:51 GMT, Valerie Peng wrote: >> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates for code review comments from @valeriepeng > > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 589: > >> 587: >> 588: // Lookup table for inverse substitution transform of last round as >> 589: // described in the international journal article referenced. > > Is there a link that I can look it up also? Yes, it's the 3rd document cited for this class: https://www.internationaljournalcorner.com/index.php/ijird_ojs/article/view/134688 > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 1034: > >> 1032: int ti0, ti1, ti2, ti3; >> 1033: int a0, a1, a2, a3; >> 1034: int w = K.length - 4; > > nit: 4 could be WB? Yes, I think that logic is acceptable. Fixed. > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 1180: > >> 1178: ^ T3[(ti0 >> 16) & 0xFF] & 0xFF0000 >> 1179: ^ T0[(ti1 >> 8) & 0xFF] & 0xFF00 >> 1180: ^ T1[ti2 & 0xFF] & 0xFF ^ K[w + 3]; > > Is this last round processing also based on spec or some journal? Yes, it's an optimization based on the 3rd document cited for this class. 
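As a side note on the `WB` nit resolved above, here is a minimal sketch (hypothetical helper names; `WB` is assumed to mean the number of 32-bit words per 128-bit AES block, i.e. 4; this is not the AES_Crypt.java code) of what the named constant expresses:

#include <vector>

constexpr int WB = 4;  // 32-bit words per 128-bit AES block (assumed meaning of the constant)

// Each round key occupies WB words of the expanded key K, so the last round key
// starts WB words from the end; K.size() - WB documents itself where a bare 4 does not.
int last_round_key_start(const std::vector<int>& K) {
  return static_cast<int>(K.size()) - WB;
}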
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2446411532 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2446411617 PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2446411563 From duke at openjdk.org Tue Oct 21 00:21:05 2025 From: duke at openjdk.org (Shawn M Emery) Date: Tue, 21 Oct 2025 00:21:05 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v8] In-Reply-To: <3IhmbTDiDNPdMTe_K1OZx6sC67UGjObzOXwX8Ekp7pA=.0e742e44-4dba-4680-8f24-7321f8516071@github.com> References: <2_eqasQo7DtbnrxwxuFYvl_yhVh7P6wzlxwPEl_DB-Q=.ff81a933-2c29-4d69-826b-d1ccf04d2e1c@github.com> <5SGZrrpgKtrf7IKtfEDPFb4LnggKNoeaZpFVGuHu-p4=.1ec8dd26-e125-4c8a-ba22-9006c7da4024@github.com> <3IhmbTDiDNPdMTe_K1OZx6sC67UGjObzOXwX8Ekp7pA=.0e742e44-4dba-4680-8f24-7321f8516071@github.com> Message-ID: <56yK_DbZSYd0QHrNIo3kAy5wxesDss6VTTB0Ii6q_JU=.12a00665-fd8d-4e44-a169-5fec560fc0b2@github.com> On Fri, 17 Oct 2025 07:04:47 GMT, Valerie Peng wrote: >> I've removed this method and inlined this logic in the invGenRoundKeys method. > > Sure, this works as well. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2446428127 From duke at openjdk.org Tue Oct 21 00:21:07 2025 From: duke at openjdk.org (Shawn M Emery) Date: Tue, 21 Oct 2025 00:21:07 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v8] In-Reply-To: <5SGZrrpgKtrf7IKtfEDPFb4LnggKNoeaZpFVGuHu-p4=.1ec8dd26-e125-4c8a-ba22-9006c7da4024@github.com> References: <2_eqasQo7DtbnrxwxuFYvl_yhVh7P6wzlxwPEl_DB-Q=.ff81a933-2c29-4d69-826b-d1ccf04d2e1c@github.com> <5SGZrrpgKtrf7IKtfEDPFb4LnggKNoeaZpFVGuHu-p4=.1ec8dd26-e125-4c8a-ba22-9006c7da4024@github.com> Message-ID: On Fri, 17 Oct 2025 06:52:16 GMT, Shawn M Emery wrote: >> src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 976: >> >>> 974: * @param state [in, out] the round key for inverse mix column processing. >>> 975: */ >>> 976: private static void invMixRKey(int[] state) { >> >> nit: name the method "invMixColumns(int[])". This name matches the spec psuedo code and goes better with the "state" argument name. Or use "invMixRoundKey(int[] roundKey)"? > > I've removed this method and inlined this logic in the invGenRoundKeys method. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2446428552 From fyang at openjdk.org Tue Oct 21 01:16:03 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 21 Oct 2025 01:16:03 GMT Subject: RFR: 8370225: RISC-V: move verify_frame_setup into ASSERT In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 09:36:49 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > `verify_frame_setup` should be only declared/implemented/invoked in debug version. > This is a leftover by https://bugs.openjdk.org/browse/JDK-8369947. > > Thanks! Does this make a difference? It has `NOT_DEBUG_RETURN` which will make it a NOP for release build. 
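For illustration, a minimal sketch of the usage pattern in question, with a hypothetical class name rather than the actual interp_masm_riscv.hpp declaration; the macro definitions it relies on are quoted just below:

#include "utilities/macros.hpp"  // provides DEBUG_ONLY / NOT_DEBUG_RETURN (see the quoted definitions below)

class InterpMacroAssemblerSketch {
 public:
  // Release build: NOT_DEBUG_RETURN expands to an empty inline body '{}', so the call
  // compiles away. Debug (ASSERT) build: it expands to nothing, leaving a plain
  // declaration that the definition below completes.
  void verify_frame_setup() NOT_DEBUG_RETURN;
};

#ifdef ASSERT
void InterpMacroAssemblerSketch::verify_frame_setup() {
  // the actual frame checks live here and are only compiled into debug builds
}
#endif // ASSERT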
376 #ifdef ASSERT 377 #define DEBUG_ONLY(code) code 378 #define NOT_DEBUG(code) 379 #define NOT_DEBUG_RETURN /*next token must be ;*/ 380 #else // ASSERT 381 #define DEBUG_ONLY(code) 382 #define NOT_DEBUG(code) code 383 #define NOT_DEBUG_RETURN {} 384 #endif // ASSERT ------------- PR Review: https://git.openjdk.org/jdk/pull/27894#pullrequestreview-3358447440 From duke at openjdk.org Tue Oct 21 01:23:20 2025 From: duke at openjdk.org (erifan) Date: Tue, 21 Oct 2025 01:23:20 GMT Subject: Integrated: 8366333: AArch64: Enhance SVE subword type implementation of vector compress In-Reply-To: References: Message-ID: On Wed, 10 Sep 2025 08:41:51 GMT, erifan wrote: > The AArch64 SVE and SVE2 architectures lack an instruction suitable for subword-type `compress` operations. Therefore, the current implementation uses the 32-bit SVE `compact` instruction to compress subword types by first widening the high and low parts to 32 bits, compressing them, and then narrowing them back to their original type. Finally, the high and low parts are merged using the `index + tbl` instructions. > > This approach is significantly slower compared to architectures with native support. After evaluating all available AArch64 SVE instructions and experimenting with various implementations?such as looping over the active elements, extraction, and insertion?I confirmed that the existing algorithm is optimal given the instruction set. However, there is still room for optimization in the following two aspects: > 1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction. > 2. For partial subword types, operations to the highest half are unnecessary because those bits are invalid. > > This pull request introduces the following changes: > 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput. > 2. Eliminates unnecessary compress operations for partial subword type cases. > 3. For `sve_compress_byte`, one less temporary register is used to alleviate potential register pressure. > > Benchmark results demonstrate that these changes significantly improve performance. > > Benchmarks on Nvidia Grace machine with 128-bit SVE: > > Benchmark Unit Before Error After Error Uplift > Byte128Vector.compress ops/ms 4846.97 26.23 6638.56 31.60 1.36 > Byte64Vector.compress ops/ms 2447.69 12.95 7167.68 34.49 2.92 > Short128Vector.compress ops/ms 7174.88 40.94 8398.45 9.48 1.17 > Short64Vector.compress ops/ms 3618.72 3.04 8618.22 10.91 2.38 > > > This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed. This pull request has now been integrated. Changeset: 2de8d585 Author: erifan Committer: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/2de8d58552936e5b02b851003ec000373c32a918 Stats: 434 lines in 10 files changed: 317 ins; 24 del; 93 mod 8366333: AArch64: Enhance SVE subword type implementation of vector compress Co-authored-by: Jatin Bhateja Reviewed-by: jbhateja, xgong, galder, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/27188 From duke at openjdk.org Tue Oct 21 01:52:02 2025 From: duke at openjdk.org (erifan) Date: Tue, 21 Oct 2025 01:52:02 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 10:32:04 GMT, Bhavana Kilambi wrote: > LGTM! Thanks for doing this Eric. Overall, the patch looks reasonable. 
The test passes on the SVE/SVE2 hosts I have access to, but I did not test it on a >16B SVE2 simulator myself. If your QEMU runs validate that configuration, I'm happy to rely on those results.

Yes, I have tested the fix in my QEMU environment; the test results show that the PR fixes this failure.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27723#issuecomment-3424351523

From epeter at openjdk.org  Tue Oct 21 05:46:14 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 21 Oct 2025 05:46:14 GMT
Subject: RFR: 8369898: C2 SuperWord: assert(has_ctrl(i)) failed: should be control, not loop [v2]
In-Reply-To: 
References: 
Message-ID: 

On Mon, 20 Oct 2025 14:23:04 GMT, Christian Hagedorn wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Apply suggestions from code review
>>
>>   Co-authored-by: Christian Hagedorn
>
> Update looks good, thanks!

@chhagedorn @vnkozlov Thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27889#issuecomment-3424774478

From epeter at openjdk.org  Tue Oct 21 05:46:15 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 21 Oct 2025 05:46:15 GMT
Subject: RFR: 8369902: C2 SuperWord: wrong result because filterin NaN instead of zero in MemPointerParser::canonicalize_raw_summands [v4]
In-Reply-To: 
References: <0FaIYVu0DC9tiKMLsw-TFxBlTSDzp74_0pSIAhRd72Q=.0c602e17-24f8-4aca-b8d8-7b8df0a1dcc9@github.com>
Message-ID: 

On Mon, 20 Oct 2025 07:45:30 GMT, Manuel Hässig wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   fix up manuel's suggestion
>
> Thank you for incorporating my suggestion. Looks good to me.

@mhaessig @vnkozlov Thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27848#issuecomment-3424772180

From epeter at openjdk.org  Tue Oct 21 05:46:16 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 21 Oct 2025 05:46:16 GMT
Subject: Integrated: 8369898: C2 SuperWord: assert(has_ctrl(i)) failed: should be control, not loop
In-Reply-To: 
References: 
Message-ID: 

On Mon, 20 Oct 2025 07:11:34 GMT, Emanuel Peter wrote:

> In `PhaseIdealLoop::create_new_if_for_multiversion`, we replace `multiversion_slow_proj` with the new `region` that merges the `new_multiversion_slow_proj` and `new_if_false`. Using `igvn.replace_node` moves all the control inputs of the outputs of the old `multiversion_slow_proj` to the new `region`. This is sufficient during IGVN, but not during loop-opts: the "controllees" of `multiversion_slow_proj` (the nodes that used to answer to `get_ctrl` with `multiversion_slow_proj`) should now be "controllees" of `region` (answer `region` for `get_ctrl`).
>
> This is what `lazy_replace` is for:
> - It puts a "forwarding" in the `_loop_or_ctrl` table: instead of mapping `multiversion_slow_proj` to its loop, it now points to the new ctrl node `region`.
> - When we call `get_ctrl` on a "controllee" of the old `multiversion_slow_proj`, we then skip over `multiversion_slow_proj` via the "forwarding" to the new `region`.
>
> I'm proposing a PR to improve the documentation and some renamings around `get_ctrl` and `lazy_replace`:
> https://github.com/openjdk/jdk/pull/27892
>
> A previous PR that used `lazy_replace`, in case you want to understand more:
> https://github.com/openjdk/jdk/pull/15720

This pull request has now been integrated.
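To make the forwarding idea above concrete, a small self-contained sketch (a plain hash map and a trivial Node type stand in for `_loop_or_ctrl` and the real node class; this is not the PhaseIdealLoop code):

#include <unordered_map>

struct Node {
  bool is_dead = false;
};

// Stand-in for the _loop_or_ctrl table: a data node maps to its ctrl node, and a
// dead ctrl node maps to the live node that replaced it (the "forwarding").
static std::unordered_map<Node*, Node*> table;

// lazy_replace-style update: leave a forwarding entry instead of rewriting every
// user of the old ctrl node.
void install_forwarding(Node* old_ctrl, Node* new_ctrl) {
  old_ctrl->is_dead = true;
  table[old_ctrl] = new_ctrl;
}

// get_ctrl-style query: skip over dead nodes via their forwardings, then shorten
// the path so the next query for this node is direct.
Node* get_ctrl(Node* n) {
  Node* c = table.at(n);      // assumes n already has an entry
  while (c->is_dead) {
    c = table.at(c);
  }
  table[n] = c;
  return c;
}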
Changeset: 634746a0 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/634746a0f167da50c2aef010756f607a436696e9 Stats: 98 lines in 2 files changed: 97 ins; 0 del; 1 mod 8369898: C2 SuperWord: assert(has_ctrl(i)) failed: should be control, not loop Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/27889 From epeter at openjdk.org Tue Oct 21 05:46:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 05:46:17 GMT Subject: Integrated: 8369902: C2 SuperWord: wrong result because filterin NaN instead of zero in MemPointerParser::canonicalize_raw_summands In-Reply-To: <0FaIYVu0DC9tiKMLsw-TFxBlTSDzp74_0pSIAhRd72Q=.0c602e17-24f8-4aca-b8d8-7b8df0a1dcc9@github.com> References: <0FaIYVu0DC9tiKMLsw-TFxBlTSDzp74_0pSIAhRd72Q=.0c602e17-24f8-4aca-b8d8-7b8df0a1dcc9@github.com> Message-ID: On Thu, 16 Oct 2025 14:20:29 GMT, Emanuel Peter wrote: > **TLDR** `is_NaN` -> `is_zero`, just like the code comment says. > > Thanks to @mhaessig for debugging the ARM32 bug below. He found the buggy line of code. > > ---------------------------------------- > > **Details** > > It seems there is a little "typo" (logic error) in `MemPointerParser::canonicalize_raw_summands` that slipped through the cracks in https://github.com/openjdk/jdk/pull/24278. The JavaFuzzer now found an example, and independently the issue was also reported on ARM32 [JDK-8368578](https://bugs.openjdk.org/browse/JDK-8368578). > > Filtering out `NaN` instead of `zero` for the `scaleL` has two manifestations: > - If `scaleL` is zero, but does not get filtered out even though it should be: we hit the assert in `MemPointerSummand` constructor, `assert(!_scale.is_zero(), "non-zero scale");`. > - See [JDK-8368578](https://bugs.openjdk.org/browse/JDK-8368578), though those tests seem to only fail on ARM32, and nowhere else. > - I was able to construct a `MemorySegment` regression test, see `TestMemorySegmentFilterSummands.test1`. I suspect that the ARM32 failures happened on an array, as it failed in places like `BigInteger::implMultiplyToLen`. But now I was able to reproduce it with native memory, to get a pointer expression that has the same cancellation issue. > - If `scaleL` is `NaN`, and gets filtered even though it should not be: We get a non-trivial MemPointer that is missing a summand. So we will succeed in optimizing, but with wrong assumptions. We generate a runtime aliasing check that is incorrect, leading to wrong results. > - This was reported by the fuzzer, see attached `TestDoNotFilterNaNSummands`. > - I was also able to create a simpler example with `MemorySegments`, see attached `TestMemorySegmentFilterSummands.test2`. > > **Why did this slip through the cracks?** > > In https://github.com/openjdk/jdk/pull/24278 I added pretty extensive testing, even fuzzer style tests, see `TestAliasingFuzzer.java`. But I think all of those tests exercise `scale` that are in "nice" [int ranges](https://github.com/openjdk/jdk/pull/24278/files#diff-26de03e864a492fe8aa8178818968f2097b99cf36a763605e2fb11fbc04eedffR303-R322). Also the JavaFuzzer does not directly generate such long constants for array accesses (not possible without Unsafe I think), we were lucky that it generated the index with `%` that got optimized to some magic long constant. > > There is already an RFE filed for improvements to `TestAliasingFuzzer.java`: [JDK-8365985](https://bugs.openjdk.org/browse/JDK-836... This pull request has now been integrated. 
Changeset: 207fe55d Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/207fe55d90fd4fa1a53c876865b1c227518c170e Stats: 252 lines in 4 files changed: 250 ins; 0 del; 2 mod 8369902: C2 SuperWord: wrong result because filterin NaN instead of zero in MemPointerParser::canonicalize_raw_summands Co-authored-by: Manuel H?ssig Reviewed-by: mhaessig, kvn ------------- PR: https://git.openjdk.org/jdk/pull/27848 From epeter at openjdk.org Tue Oct 21 05:50:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 05:50:02 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v9] In-Reply-To: References: Message-ID: On Fri, 17 Oct 2025 17:19:09 GMT, Shawn M Emery wrote: >> test/micro/org/openjdk/bench/javax/crypto/AESDecrypt.java line 44: >> >>> 42: >>> 43: @Param("10000000") >>> 44: private int count; >> >> Drive-by comment / question: >> Did you do all benchmarking with this single (quite large) size? How are the results for much smaller sizes? It may be worth it to just get a nice plot that goes over a range of sizes, to see if it behaves as expected. > > The benchmarks listed in the PR description execute tests for data sizes ranging from 16 to 10_000_000 bytes for decryption and encryption. The difference in performance between the old and new code were within SE. Ok, sure. Why not list all the relevant sizes in the benchmark itself then? But totally up to you. This was just a hint/drive-by comment, feel free to mark it as resolved :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27807#discussion_r2446783134 From epeter at openjdk.org Tue Oct 21 06:57:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 06:57:28 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v2] In-Reply-To: References: Message-ID: > When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. > > Here, I'm doing the following: > - Add more documentation, and improve it in other cases. > - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. > - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` > - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` > - Made some methods private, and added some additional asserts. > > I'd be more than happy for even better names, and suggestions how to improve the documentation further :) > > Related issues: > https://github.com/openjdk/jdk/pull/27889 > https://github.com/openjdk/jdk/pull/15720 > > TODO: improve `VerifyLoopOptimizations` to check that we can call `get_ctrl` on all live nodes after loop-opts. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Merge branch 'master' into JDK-8370220-get-ctrl-documentation - code style - missing part - rename lazy methods - make helper method private - wip documentation and renaming - more documentation - JDK-8370220 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27892/files - new: https://git.openjdk.org/jdk/pull/27892/files/21911dc4..c3ab0698 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=00-01 Stats: 20763 lines in 619 files changed: 10651 ins; 7255 del; 2857 mod Patch: https://git.openjdk.org/jdk/pull/27892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27892/head:pull/27892 PR: https://git.openjdk.org/jdk/pull/27892 From epeter at openjdk.org Tue Oct 21 07:18:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 07:18:05 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v3] In-Reply-To: References: Message-ID: <84MqtoPPB-QssENCh58yZai2AXiBUmiJMchtZxclf50=.b43920a8-6e84-4904-81e9-5c2166a697d9@github.com> > When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. > > Here, I'm doing the following: > - Add more documentation, and improve it in other cases. > - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. > - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` > - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` > - Made some methods private, and added some additional asserts. > > I'd be more than happy for even better names, and suggestions how to improve the documentation further :) > > Related issues: > https://github.com/openjdk/jdk/pull/27889 > https://github.com/openjdk/jdk/pull/15720 > > TODO: improve `VerifyLoopOptimizations` to check that we can call `get_ctrl` on all live nodes after loop-opts. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix merge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27892/files - new: https://git.openjdk.org/jdk/pull/27892/files/c3ab0698..e0b6ad68 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=01-02 Stats: 8 lines in 2 files changed: 1 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/27892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27892/head:pull/27892 PR: https://git.openjdk.org/jdk/pull/27892 From epeter at openjdk.org Tue Oct 21 07:21:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 07:21:07 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v3] In-Reply-To: References: <7nfOylxfH7KPKOj291d3N3f67w5rKrGwhc17OKv77NY=.afb319e9-f3be-4330-abe9-b1c19b61dc75@github.com> Message-ID: On Mon, 20 Oct 2025 13:36:13 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/loopnode.hpp line 1176: >> >>> 1174: // - Update the node inputs of all uses. >>> 1175: // - Lazily update the ctrl and idom info of all uses, via a ctrl/idom forwarding. 
>>> 1176: void replace_ctrl_node_and_forward_ctrl_and_idom(Node *old_node, Node *new_node) { >> >> Maybe add here and/or in `install_lazy_ctrl_and_idom_forwarding()` an assert that we have a CFG nodes (i.e. `is_CFG()`) additionally to the `!has_ctrl()` asserts. > > Just noticed this: We often (intuitively?) seem to use "control" when talking about just some control nodes and "ctrl" when talking about the nodes found/fetched from `_loop_or_ctrl`. Under this light, we might want to name the method "replace_control_node_and_forward_ctrl_and_idom" to better distinguish them. But I don't have a strong opinion about it - your call ? I'm not sure we do that consistently. I think I'll just keep it as is, in a shorter form. The name is already quite long ? Adding the `is_CFG`, good idea! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2446988758 From epeter at openjdk.org Tue Oct 21 07:29:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 07:29:41 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v4] In-Reply-To: References: Message-ID: <3TcuvopXlzCCI30K0S5aC07hKiGgQ1xkIOHbc_0ZmNo=.45f10950-7238-4d3f-a51e-53a281e3fa75@github.com> > When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. > > Here, I'm doing the following: > - Add more documentation, and improve it in other cases. > - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. > - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` > - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` > - Made some methods private, and added some additional asserts. > > I'd be more than happy for even better names, and suggestions how to improve the documentation further :) > > Related issues: > https://github.com/openjdk/jdk/pull/27889 > https://github.com/openjdk/jdk/pull/15720 > > TODO: improve `VerifyLoopOptimizations` to check that we can call `get_ctrl` on all live nodes after loop-opts. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: for Christian part 1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27892/files - new: https://git.openjdk.org/jdk/pull/27892/files/e0b6ad68..d9f7926e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/27892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27892/head:pull/27892 PR: https://git.openjdk.org/jdk/pull/27892 From epeter at openjdk.org Tue Oct 21 07:35:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 07:35:20 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v5] In-Reply-To: References: Message-ID: <9RB0Ls1iPyS_xeJWj5DLw5b_un7ueUWZ4bX5ix9F_kw=.3495c8e4-a6ed-430e-b7d8-a6e4acf2b99a@github.com> > When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. > > Here, I'm doing the following: > - Add more documentation, and improve it in other cases. 
> - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. > - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` > - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` > - Made some methods private, and added some additional asserts. > > I'd be more than happy for even better names, and suggestions how to improve the documentation further :) > > Related issues: > https://github.com/openjdk/jdk/pull/27889 > https://github.com/openjdk/jdk/pull/15720 > > TODO: improve `VerifyLoopOptimizations` to check that we can call `get_ctrl` on all live nodes after loop-opts. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27892/files - new: https://git.openjdk.org/jdk/pull/27892/files/d9f7926e..73bb42f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=03-04 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/27892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27892/head:pull/27892 PR: https://git.openjdk.org/jdk/pull/27892 From epeter at openjdk.org Tue Oct 21 07:42:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 07:42:13 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v5] In-Reply-To: <7nfOylxfH7KPKOj291d3N3f67w5rKrGwhc17OKv77NY=.afb319e9-f3be-4330-abe9-b1c19b61dc75@github.com> References: <7nfOylxfH7KPKOj291d3N3f67w5rKrGwhc17OKv77NY=.afb319e9-f3be-4330-abe9-b1c19b61dc75@github.com> Message-ID: On Mon, 20 Oct 2025 13:25:58 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > src/hotspot/share/opto/loopnode.hpp line 1154: > >> 1152: // forwarding in the future. >> 1153: // - When querying "idom": from some node get its old idom, which >> 1154: // may be dead but has a ctrl forwarding to the new and live > > Maybe add this: > Suggestion: > > // may be dead but has an idom forwarding (piggy-backing on '_loop_or_ctrl') to the new and live Thanks for the suggestion. I applied something that is inspired by your suggestion instead :) > src/hotspot/share/opto/loopnode.hpp line 1160: > >> 1158: // the entry for the old dead node now, and we do not have to update all >> 1159: // the nodes that had the old_node as their "get_ctrl" or "idom". We >> 1160: // clean up the forwarding links when we query "get_ctrl" or "idom". > > Suggestion: > > // clean up the forwarding links when we query "get_ctrl" or "idom" for these nodes the next time. Applied it but with a line break ;) > src/hotspot/share/opto/loopnode.hpp line 1161: > >> 1159: // the nodes that had the old_node as their "get_ctrl" or "idom". We >> 1160: // clean up the forwarding links when we query "get_ctrl" or "idom". 
>> 1161: void install_lazy_ctrl_and_idom_forwarding(Node* old_node, Node* new_node) { > > Maybe we don't need lazy sice "install forwarding" is already expressive enough: > > Suggestion: > > void install_ctrl_and_idom_forwarding(Node* old_node, Node* new_node) { Applied systematically :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2447051099 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2447056113 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2447060094 From rrich at openjdk.org Tue Oct 21 07:43:06 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 21 Oct 2025 07:43:06 GMT Subject: RFR: 8369946: Bytecode rewriting causes Java heap corruption on PPC In-Reply-To: References: Message-ID: On Fri, 17 Oct 2025 12:13:52 GMT, Martin Doerr wrote: > Like the aarch64 fix (https://github.com/openjdk/jdk/pull/27748). > PPC64 has additional requirements: > - It implements `fast_invokevfinal` which uses `ResolvedMethodEntry`. > - Speculative loads need to get prevented by memory barrier instructions (even on control dependent paths). > > I've refactored `load_field_entry` and `load_method_entry` into a common function and added support for rewritten "fast" Bytecodes. I'm using `isync` instructions because we already have a control dependency (via Bytecode dispatch). > > The `isync` instruction is relatively cheap in comparison to other memory barriers, but still introduces some performance loss. SPEC jvm98 with -Xint shows about 5% regression in `compress` sub-benchmark. The other sub-benchmarks are not significantly impacted. However, switching off `RewriteBytecodes` would cause a much higher performance loss. > > Note: I had also ported the `verify_field_offset` check and used it in the fastdebug and product build for testing, but couldn't catch any issue. Not included in this PR. I'm not planning to contribute it. Hi Martin, nasty bug this is. Thanks for doing the fixing on ppc. And great work by Justin finding it! I see that we've missed porting previous fixes to ppc that prevent reordering of the bytecode load with loads from ResolvedFieldEntry and -MethodEntry (JDK-8248219 and JDK-8327647). Now this is done in `load_field_or_method_entry()`. The change is good. The comment could be improved a little bit. Thanks, Richard. src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp line 492: > 490: > 491: if (for_fast_bytecode) { > 492: // Prevent loading inconsistent resolved info which may have been written by another thread. Suggestion: // Prevent speculative loading from ResolvedFieldEntry/ResolvedMethodEntry as it can miss the info written by another thread. ------------- Changes requested by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27867#pullrequestreview-3357882991 PR Review Comment: https://git.openjdk.org/jdk/pull/27867#discussion_r2446088467 From mchevalier at openjdk.org Tue Oct 21 07:46:31 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 21 Oct 2025 07:46:31 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v5] In-Reply-To: References: Message-ID: > Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. 
> > When we have something like > > // any loop > while (...) { /* something involving limit */ } > // counted loop with zero trip guard > if (i < limit) { > for (int i = init; i < limit; i++) { ... } > } > > and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. > > But the method `PhaseIdealLoop::do_unroll` has the assert > > https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 > > requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. > > This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. > > But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? > > By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. > > We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. > > We could make the assert a lot smarter and test for ... 
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Check before ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27586/files - new: https://git.openjdk.org/jdk/pull/27586/files/a47f6f70..72136811 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=03-04 Stats: 13 lines in 1 file changed: 13 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27586/head:pull/27586 PR: https://git.openjdk.org/jdk/pull/27586 From epeter at openjdk.org Tue Oct 21 07:47:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 07:47:07 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v5] In-Reply-To: <7nfOylxfH7KPKOj291d3N3f67w5rKrGwhc17OKv77NY=.afb319e9-f3be-4330-abe9-b1c19b61dc75@github.com> References: <7nfOylxfH7KPKOj291d3N3f67w5rKrGwhc17OKv77NY=.afb319e9-f3be-4330-abe9-b1c19b61dc75@github.com> Message-ID: <6fV6WCQAoTBHoBnxKZLvMdINAZgsMRtrw6ol63bqOGQ=.8df8a005-2b10-41ef-b533-74d184bf976a@github.com> On Mon, 20 Oct 2025 13:50:08 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > src/hotspot/share/opto/loopnode.hpp line 1262: > >> 1260: // forwarding installed, using "install_lazy_ctrl_and_idom_forwarding". >> 1261: // We now have to jump from the old (dead) ctrl node to the new (live) >> 1262: // ctrl/idom node, in possibly multiple ctrl/idom forwarding steps. > > Maybe for clarification since it's somehow surprising at first that we reuse `_loop_or_ctrl`: > Suggestion: > > // idom node, in possibly multiple idom forwarding steps. > // Note that we piggy back on `_loop_or_ctrl` do the the forwarding. Applied with even more information. > src/hotspot/share/opto/loopnode.hpp line 1274: > >> 1272: } >> 1273: >> 1274: Node* idom(uint didx) const { > > While at it: Maybe also name this `node_index`? I chose to replace it with `node_idx`, since it comes from `n->_idx`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2447068474 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2447075460 From mchevalier at openjdk.org Tue Oct 21 07:53:04 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 21 Oct 2025 07:53:04 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v4] In-Reply-To: <3jSctyKb4Zi-tG17Yn9xKACgwnJBVU079t5m7VcvoGA=.ef922908-c05a-4046-ad30-365b228ee089@github.com> References: <3jSctyKb4Zi-tG17Yn9xKACgwnJBVU079t5m7VcvoGA=.ef922908-c05a-4046-ad30-365b228ee089@github.com> Message-ID: <-NLFspKWDUZlgX_guoex9FBcKJ_BZhOKne4OFBLZHls=.f44b14ee-ebf6-4c57-b6a1-1771b884e6b0@github.com> On Fri, 17 Oct 2025 14:54:35 GMT, Christian Hagedorn wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> driver -> main > > Could `PhaseIdealLoop::eliminate_useless_zero_trip_guard()` maybe also be an option? We are looping through all loops and check the `OpaqueZeroTripGuardNodes` anyways there. I've added a check at the place suggested by @chhagedorn. Testing seems happy. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3425247642 From epeter at openjdk.org Tue Oct 21 07:57:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 07:57:39 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v6] In-Reply-To: References: Message-ID: <59V3v8eASx0Rblb7t7fSAlMC4YOGOIHP4OGlUmOjbH8=.9894dc12-027d-4e17-b250-1c3c5dbfe395@github.com> > When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. > > Here, I'm doing the following: > - Add more documentation, and improve it in other cases. > - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. > - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` > - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` > - Made some methods private, and added some additional asserts. > > I'd be more than happy for even better names, and suggestions how to improve the documentation further :) > > Related issues: > https://github.com/openjdk/jdk/pull/27889 > https://github.com/openjdk/jdk/pull/15720 Emanuel Peter has updated the pull request incrementally with three additional commits since the last revision: - Apply suggestions from code review Co-authored-by: Tobias Hartmann - more for Christian part 3 - more for Christian part 2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27892/files - new: https://git.openjdk.org/jdk/pull/27892/files/73bb42f5..a91396be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=04-05 Stats: 33 lines in 3 files changed: 10 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/27892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27892/head:pull/27892 PR: https://git.openjdk.org/jdk/pull/27892 From thartmann at openjdk.org Tue Oct 21 07:57:43 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 21 Oct 2025 07:57:43 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v5] In-Reply-To: <9RB0Ls1iPyS_xeJWj5DLw5b_un7ueUWZ4bX5ix9F_kw=.3495c8e4-a6ed-430e-b7d8-a6e4acf2b99a@github.com> References: <9RB0Ls1iPyS_xeJWj5DLw5b_un7ueUWZ4bX5ix9F_kw=.3495c8e4-a6ed-430e-b7d8-a6e4acf2b99a@github.com> Message-ID: On Tue, 21 Oct 2025 07:35:20 GMT, Emanuel Peter wrote: >> When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. >> >> Here, I'm doing the following: >> - Add more documentation, and improve it in other cases. >> - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. >> - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` >> - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` >> - Made some methods private, and added some additional asserts. 
>> >> I'd be more than happy for even better names, and suggestions how to improve the documentation further :) >> >> Related issues: >> https://github.com/openjdk/jdk/pull/27889 >> https://github.com/openjdk/jdk/pull/15720 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn src/hotspot/share/opto/loopUnswitch.cpp line 523: > 521: register_control(region, lp, new_multiversion_slow_proj); > 522: > 523: // Hook region into slow_path, in stead of the multiversion_slow_proj. Suggestion: // Hook region into slow_path, instead of the multiversion_slow_proj. src/hotspot/share/opto/loopnode.hpp line 1176: > 1174: // - Update the node inputs of all uses. > 1175: // - Lazily update the ctrl and idom info of all uses, via a ctrl/idom forwarding. > 1176: void replace_ctrl_node_and_forward_ctrl_and_idom(Node *old_node, Node *new_node) { Drive-by comment (sorry): I think this is way too heavy. Descriptive names are good but for complex methods, not all the semantics can be put into the name but should rather be in a comment. Couldn't we just name these `replace_and_update_ctrl` and `update_ctrl`? I think this entails updating idom which is dependent on ctrl. And the comments explain the details well enough. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2447056340 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2447097742 From epeter at openjdk.org Tue Oct 21 08:11:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 08:11:10 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v5] In-Reply-To: References: <9RB0Ls1iPyS_xeJWj5DLw5b_un7ueUWZ4bX5ix9F_kw=.3495c8e4-a6ed-430e-b7d8-a6e4acf2b99a@github.com> Message-ID: On Tue, 21 Oct 2025 07:51:04 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > src/hotspot/share/opto/loopnode.hpp line 1176: > >> 1174: // - Update the node inputs of all uses. >> 1175: // - Lazily update the ctrl and idom info of all uses, via a ctrl/idom forwarding. >> 1176: void replace_ctrl_node_and_forward_ctrl_and_idom(Node *old_node, Node *new_node) { > > Drive-by comment (sorry): I think this is way too heavy. Descriptive names are good but for complex methods, not all the semantics can be put into the name but should rather be in a comment. > > Couldn't we just name these `replace_and_update_ctrl` and `update_ctrl`? I think this entails updating idom which is dependent on ctrl. And the comments explain the details well enough. @TobiHartmann `update_ctrl`: Sounds too much like `set_ctrl`, but it is not at all similar conceptually. What about: - `install_ctrl_and_idom_forwarding` -> `forward_ctrl` - `replace_ctrl_node_and_forward_ctrl_and_idom` -> `replace_node_and_forward_ctrl` - Because it is really a specialization of `_igvn.replace_node`, so it should have some name similarity. What do you think? 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2447148523

From thartmann at openjdk.org  Tue Oct 21 08:11:11 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Tue, 21 Oct 2025 08:11:11 GMT
Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v5]
In-Reply-To: 
References: <9RB0Ls1iPyS_xeJWj5DLw5b_un7ueUWZ4bX5ix9F_kw=.3495c8e4-a6ed-430e-b7d8-a6e4acf2b99a@github.com>
Message-ID: 

On Tue, 21 Oct 2025 08:06:27 GMT, Emanuel Peter wrote:

>> src/hotspot/share/opto/loopnode.hpp line 1176:
>>
>>> 1174: // - Update the node inputs of all uses.
>>> 1175: // - Lazily update the ctrl and idom info of all uses, via a ctrl/idom forwarding.
>>> 1176: void replace_ctrl_node_and_forward_ctrl_and_idom(Node *old_node, Node *new_node) {
>>
>> Drive-by comment (sorry): I think this is way too heavy. Descriptive names are good but for complex methods, not all the semantics can be put into the name but should rather be in a comment.
>>
>> Couldn't we just name these `replace_and_update_ctrl` and `update_ctrl`? I think this entails updating idom which is dependent on ctrl. And the comments explain the details well enough.
>
> @TobiHartmann
>
> `update_ctrl`: Sounds too much like `set_ctrl`, but it is not at all similar conceptually.
>
> What about:
> - `install_ctrl_and_idom_forwarding` -> `forward_ctrl`
> - `replace_ctrl_node_and_forward_ctrl_and_idom` -> `replace_node_and_forward_ctrl`
> - Because it is really a specialization of `_igvn.replace_node`, so it should have some name similarity.
>
> What do you think?

Yes, that's much better, good with me!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2447155999

From mchevalier at openjdk.org  Tue Oct 21 08:14:37 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 21 Oct 2025 08:14:37 GMT
Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean
Message-ID: 

Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero.

There is one detail: we used to have

    void restore_major_progress(int progress) {
      _major_progress += progress;
    }

It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. It has weird semantics:

Progress before | Progress after verification | Progress after restore
----------------|-----------------------------|-----------------------
0               | 0                           | 0
1               | 0                           | 1
0               | 1                           | 1
1               | 1                           | 2

It is rather an OR than a restore, and a proper boolean version of that would be

    void restore_major_progress(bool progress) {
      _major_progress = _major_progress || progress;
    }

but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural

    void set_major_progress(bool progress) {
      _major_progress = progress;
    }

that actually restores what we saved. Both pass (tier1-6 + some internal tests). Thus, I preferred the simpler semantics.
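As a self-contained illustration of the three variants above (free functions and globals stand in for the `Compile` members; this is not part of the patch):

#include <cassert>

static int  progress_int  = 0;      // the old int counter
static bool progress_bool = false;  // the proposed boolean flag

void restore_major_progress_int(int saved) { progress_int += saved; }                   // old semantics
void restore_major_progress_or(bool saved) { progress_bool = progress_bool || saved; }  // OR variant
void set_major_progress(bool saved)        { progress_bool = saved; }                   // plain restore

int main() {
  // Last row of the table: progress was set before (saved value 1) and is set again
  // by verification, so the old "restore" accumulates to 2 instead of going back to 1.
  progress_int = 1;                  // set again during verification
  restore_major_progress_int(1);     // saved value was 1
  assert(progress_int == 2);

  // The OR variant keeps progress seen on either side.
  progress_bool = false;
  restore_major_progress_or(true);   // saved value was true
  assert(progress_bool);

  // Plain assignment restores exactly the saved value, even if verification set the flag.
  progress_bool = true;              // set during verification
  set_major_progress(false);         // saved value was false
  assert(!progress_bool);
  return 0;
}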
Thanks, Marc ------------- Commit messages: - major_progress as bool Changes: https://git.openjdk.org/jdk/pull/27912/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27912&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370077 Stats: 12 lines in 2 files changed: 1 ins; 1 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/27912.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27912/head:pull/27912 PR: https://git.openjdk.org/jdk/pull/27912 From epeter at openjdk.org Tue Oct 21 08:20:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 08:20:26 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v7] In-Reply-To: References: Message-ID: <5NCbOn36uvy0cTen6j1ys_E4uDD-qWjW5N1B-doqCGU=.4f75ce02-f27c-4d9a-ace8-5073f211d78f@github.com> > When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. > > Here, I'm doing the following: > - Add more documentation, and improve it in other cases. > - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. > - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` > - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` > - Made some methods private, and added some additional asserts. > > I'd be more than happy for even better names, and suggestions how to improve the documentation further :) > > Related issues: > https://github.com/openjdk/jdk/pull/27889 > https://github.com/openjdk/jdk/pull/15720 Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: renaming for Tobias ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27892/files - new: https://git.openjdk.org/jdk/pull/27892/files/a91396be..aaa91d18 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=05-06 Stats: 49 lines in 8 files changed: 0 ins; 1 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/27892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27892/head:pull/27892 PR: https://git.openjdk.org/jdk/pull/27892 From epeter at openjdk.org Tue Oct 21 08:23:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 08:23:14 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v7] In-Reply-To: <7nfOylxfH7KPKOj291d3N3f67w5rKrGwhc17OKv77NY=.afb319e9-f3be-4330-abe9-b1c19b61dc75@github.com> References: <7nfOylxfH7KPKOj291d3N3f67w5rKrGwhc17OKv77NY=.afb319e9-f3be-4330-abe9-b1c19b61dc75@github.com> Message-ID: On Mon, 20 Oct 2025 13:55:49 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> renaming for Tobias > > Thanks a lot for following up with a documentation and renaming! Some small suggestions, otherwise, looks good! > > Note: There are some build failures in GHA. @chhagedorn Thanks for all the suggestions / comments, I think I applied them all now :) @TobiHartmann thanks for the drive-by comment / suggestion. @rwestrel You are somewhat familiar with this code, would you mind reviewing? @vnkozlov You may also want to have quick look, just so you are aware of the renamings. 
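For reference, a standalone sketch of the corner-evaluation idea discussed in this thread (not the divnode.cpp code): when the divisor interval does not contain zero, the quotient's extrema are attained at the four corners. A divisor range that crosses zero is assumed to be split by the caller into its negative and positive parts, and the MIN / -1 corner is mapped to MIN to mirror Java's wrap-around, the special case called out in the PR description.

#include <algorithm>
#include <cstdint>

struct Bounds { int64_t lo; int64_t hi; };

// Bounds of x / y for x in [xlo, xhi], y in [ylo, yhi], with 0 not in [ylo, yhi].
Bounds div_bounds(int64_t xlo, int64_t xhi, int64_t ylo, int64_t yhi) {
  const int64_t xs[2] = {xlo, xhi};
  const int64_t ys[2] = {ylo, yhi};
  Bounds b = {INT64_MAX, INT64_MIN};
  for (int64_t x : xs) {
    for (int64_t y : ys) {
      // Java semantics: MIN / -1 wraps back to MIN (also avoids C++ UB here).
      int64_t q = (x == INT64_MIN && y == -1) ? INT64_MIN : x / y;
      b.lo = std::min(b.lo, q);
      b.hi = std::max(b.hi, q);
    }
  }
  return b;  // the extrema over the whole box are attained at these corners
}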
But feel free to also review ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27892#issuecomment-3425369640 From mhaessig at openjdk.org Tue Oct 21 08:24:24 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 21 Oct 2025 08:24:24 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v7] In-Reply-To: References: <3HJu_S1E43xIC7b_KglB7EUTIqry7lLFmgvC75OlOwc=.aef64d7e-2a60-4fa7-934b-00adf7dd5e9c@github.com> Message-ID: <8ZvN9aTsbmdcUm8x1mI38ReDtheZm1tl_zC2fZ9u9Q8=.7bd7a026-3868-4238-a1bc-6db57438bef6@github.com> On Sun, 19 Oct 2025 19:17:26 GMT, Tobias Hotz wrote: >> src/hotspot/share/opto/divnode.cpp line 651: >> >>> 649: if( (t1 == bot) || (t2 == bot) || >>> 650: (t1 == Type::BOTTOM) || (t2 == Type::BOTTOM) ) >>> 651: return bot; >> >> I think this can be removed - and in cases where one side is the local bottom (i.e., `TypeInt::INT`) and the other is more restricted, the result should even more precise after removing. Could you also add tests for such cases? For example dividing `TypeInt::INT` by some interval with a lower bound of 2, the resulting range can be narrowed. Similarly, dividing some small interval `[lo, hi]` by `TypeInt::INT` should result in a similar interval with bounds adjusted to deal with sign changes. If I didn't miss something, your code should already be able to deal with this, it's just this early return here preventing it. > > I think you are correct. The only part where I am not sure is if every instance where i1/i2 can be Type::BOTTOM, i1/i2 can be cast to TypeInt. > Can someone please confirm the removal is safe? `Type::BOTTOM` should not happen, since we are in a method `DivI/L` and thus know that the inputs should be some particular type. I think removing this check is fine, since you are doing `t1->is_int()` below, which would assert for `t1 == Type::BOTTOM`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2447203932 From mli at openjdk.org Tue Oct 21 08:32:11 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 21 Oct 2025 08:32:11 GMT Subject: RFR: 8370225: RISC-V: cleanup verify_xxx in interp_masm_riscv.hpp [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this trivial patch? > `verify_xxx` verify_xxx in interp_masm_riscv.hpp should be consistent. > > Thanks! Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: add NOT_DEBUG_RETURN ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27894/files - new: https://git.openjdk.org/jdk/pull/27894/files/47ee73bc..6dbe8748 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27894&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27894&range=00-01 Stats: 14 lines in 3 files changed: 0 ins; 12 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27894.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27894/head:pull/27894 PR: https://git.openjdk.org/jdk/pull/27894 From mli at openjdk.org Tue Oct 21 08:32:12 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 21 Oct 2025 08:32:12 GMT Subject: RFR: 8370225: RISC-V: cleanup verify_xxx in interp_masm_riscv.hpp In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 09:36:49 GMT, Hamlin Li wrote: > Hi, > Can you help to review this trivial patch? > `verify_xxx` verify_xxx in interp_masm_riscv.hpp should be consistent. > > Thanks! Ah, you're right. Thanks! I'll make the `verify_` methods in interp_masm_riscv.hpp consistent with each other. 
Can you have another look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27894#issuecomment-3425378756 From rcastanedalo at openjdk.org Tue Oct 21 08:36:38 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 21 Oct 2025 08:36:38 GMT Subject: RFR: 8370031: Make RegMask copy constructor explicit and replace RegMask operator= with named function [v2] In-Reply-To: <4q9CrlwVAibAEIySihIaFIV24xTs1_uT2QpnJag7BX0=.fd0693e7-f11b-4224-9dbb-93a116a64a6f@github.com> References: <4q9CrlwVAibAEIySihIaFIV24xTs1_uT2QpnJag7BX0=.fd0693e7-f11b-4224-9dbb-93a116a64a6f@github.com> Message-ID: On Mon, 20 Oct 2025 11:59:58 GMT, Daniel Lund?n wrote: >> The `RegMask` copy constructor is currently non-explicit. We should make it explicit so that we do not unintentionally copy register masks. >> >> Additionally, we currently overload `operator=` in `RegMask` to do a deep copy. It is preferable to use an explicit named function instead, according to the HotSpot coding style. >> >> ### Changeset >> >> - Make the `RegMask` copy constructor explicit. >> - Fix compilation errors as a result of the now explicit constructor. Specifically, the methods `Matcher::divI_proj_mask`, `Matcher::modI_proj_mask`, `Matcher::divL_proj_mask`, and `Matcher::modL_proj_mask` all use implicit copy construction (likely unintended). Change the methods to return `const RegMask&` instead of `RegMask` and correspondingly change the return value from `RegMask()` to `RegMask::Empty` on some platforms. >> - Rename the old method `RegMask::copy` to `RegMask::assignFrom` to better describe its functionality, and make it public instead of private. >> - Delete `RegMask` copy assignment (`operator=`) and change all uses to the named function `assignFrom` instead. >> - Fix various syntax issues at lines touched by the changeset. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18589208499) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > > Daniel Lund?n has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge remote-tracking branch 'upstream/master' into regmask-explicit-8370031 > - Fix issue Looks good, thanks. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27891#pullrequestreview-3359414515 From thartmann at openjdk.org Tue Oct 21 08:38:23 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 21 Oct 2025 08:38:23 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v7] In-Reply-To: <5NCbOn36uvy0cTen6j1ys_E4uDD-qWjW5N1B-doqCGU=.4f75ce02-f27c-4d9a-ace8-5073f211d78f@github.com> References: <5NCbOn36uvy0cTen6j1ys_E4uDD-qWjW5N1B-doqCGU=.4f75ce02-f27c-4d9a-ace8-5073f211d78f@github.com> Message-ID: On Tue, 21 Oct 2025 08:20:26 GMT, Emanuel Peter wrote: >> When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. >> >> Here, I'm doing the following: >> - Add more documentation, and improve it in other cases. >> - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. 
>> - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` >> - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` >> - Made some methods private, and added some additional asserts. >> >> I'd be more than happy for even better names, and suggestions how to improve the documentation further :) >> >> Related issues: >> https://github.com/openjdk/jdk/pull/27889 >> https://github.com/openjdk/jdk/pull/15720 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > renaming for Tobias Nice improvement, looks good to me! ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27892#pullrequestreview-3359424704 From mhaessig at openjdk.org Tue Oct 21 08:50:17 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 21 Oct 2025 08:50:17 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v8] In-Reply-To: References: Message-ID: On Sun, 19 Oct 2025 19:20:58 GMT, Tobias Hotz wrote: >> This PR improves the value of interger division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with two additional commits since the last revision: > > - Add additional nodes to fail conditions to detect idealized/transformed DivI Nodes that did not constant fold > - Remove checks for bottom and reorganize DivI/DivL Value functions Thank you for addressing all the comments and no worries about the delay. This looks good to me now. I just kicked off testing and will report back with the results. ------------- PR Review: https://git.openjdk.org/jdk/pull/26143#pullrequestreview-3359482818 From mdoerr at openjdk.org Tue Oct 21 08:57:41 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 21 Oct 2025 08:57:41 GMT Subject: RFR: 8369946: Bytecode rewriting causes Java heap corruption on PPC [v2] In-Reply-To: References: Message-ID: > Like the aarch64 fix (https://github.com/openjdk/jdk/pull/27748). > PPC64 has additional requirements: > - It implements `fast_invokevfinal` which uses `ResolvedMethodEntry`. > - Speculative loads need to get prevented by memory barrier instructions (even on control dependent paths). > > I've refactored `load_field_entry` and `load_method_entry` into a common function and added support for rewritten "fast" Bytecodes. I'm using `isync` instructions because we already have a control dependency (via Bytecode dispatch). > > The `isync` instruction is relatively cheap in comparison to other memory barriers, but still introduces some performance loss. SPEC jvm98 with -Xint shows about 5% regression in `compress` sub-benchmark. The other sub-benchmarks are not significantly impacted. However, switching off `RewriteBytecodes` would cause a much higher performance loss. 
> > Note: I had also ported the `verify_field_offset` check and used it in the fastdebug and product build for testing, but couldn't catch any issue. Not included in this PR. I'm not planning to contribute it. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Improve comment. Co-authored-by: Richard Reingruber ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27867/files - new: https://git.openjdk.org/jdk/pull/27867/files/138df669..773b82f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27867&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27867&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27867.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27867/head:pull/27867 PR: https://git.openjdk.org/jdk/pull/27867 From bmaillard at openjdk.org Tue Oct 21 08:59:36 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Tue, 21 Oct 2025 08:59:36 GMT Subject: RFR: 8362832: compiler/macronodes/TestTopInMacroElimination.java hits assert(false) failed: unexpected node Message-ID: This PR prevents hitting an assert caused by encountering `top` while following the memory slice associated with a field when eliminating allocations in macro node elimination. This situation is the result of another elimination (boxing node elimination) that happened at the same macro expansion iteration. ### Analysis The issue appears in the macro expansion phase. We have a nested `synchronized` block, with the outer block synchronizing on `new A()` and the inner one on `TestTopInMacroElimination.class`. In the inner `synchronized` block we have an `Integer.valueOf` call in a loop. In `PhaseMacroExpand::eliminate_boxing_node` we are getting rid of the `Integer.valueOf` call, as it is a non-escaping boxing node. After having eliminated the call, `PhaseMacroExpand::process_users_of_allocation` takes care of the users of the removed node. There, we replace usages of the fallthrough memory projection with `top`. In the same macro expansion iteration, we later attempt to get rid of the `new A()` allocation in `PhaseMacroExpand::create_scalarized_object_description`. There, we have to make sure that all safepoints can still see the object fields as if the allocation was never deleted. For this, we attempt to find the last value on the slice of each specific field (`a` in this case). Because field `a` is never written to, and it is not explicitely initialized, there is no `Store` associated to it and not even a dedicated memory slice (we end up taking the `Bot` input on `MergeMem` nodes). By going up the memory chain, we eventually encounter the `MemBarReleaseLock` whose input was set to `top`. This is where the assert is hit. ### Proposed Fix In the end I opted for an analog fix as the similar [JDK-8325030](https://git.openjdk.org/jdk/pull/23104). If we get `top` from `scan_mem_chain` in `PhaseMacroExpand::value_from_mem`, then we can safely return `top` as well. This means that the safepoint will have `top` as data input, but this will eventually cleaned up by the next round of IGVN. Another (tempting) option would have been to simply return `nullptr` from `PhaseMacroExpand::value_from_mem` when encoutering `top`. However this would result in bailing out from eliminating this allocation temporarily and effectively delaying it to a subsqequent macro expansion round. 
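For illustration, the shape of the guard is roughly as follows (a sketch only, not the exact patch: the surrounding logic of `value_from_mem` is elided, and the parameter/local names are assumed to match the current code):

    // In PhaseMacroExpand::value_from_mem(), after walking the memory chain of the field's slice:
    Node* mem = scan_mem_chain(sfpt_mem, alias_idx, offset, start_mem, alloc, &_igvn);
    if (mem == C->top()) {
      // The slice is dead: another elimination in the same macro expansion round
      // (here, boxing node elimination) replaced the memory projection with top.
      // Returning top keeps the safepoint input well-formed and lets the next
      // round of IGVN clean it up.
      return mem;
    }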
### Testing - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8362832) - [x] tier1-4, plus some internal testing Thank you for reviewing! ------------- Commit messages: - Add comment - Remove file commited by mistake - Add issue number to jtreg headers - 8362832: Return top in value_from_mem as last value on the slice if the path is dead - 8362832: Remove test from problemlist Changes: https://git.openjdk.org/jdk/pull/27903/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27903&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8362832 Stats: 8 lines in 3 files changed: 4 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27903.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27903/head:pull/27903 PR: https://git.openjdk.org/jdk/pull/27903 From aseoane at openjdk.org Tue Oct 21 09:03:24 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Tue, 21 Oct 2025 09:03:24 GMT Subject: RFR: 8367690: C2: Unneeded branch in reduce_phi Message-ID: This PR carries out a minor cleanup found in the Phi reduction code. The combination of the branch and assert is redundant as the assert will always trigger. We can either remove the `else if` branch or change the assert for a `ShouldNotReachHere`. I have gone for the former option here. **Testing:** passes tiers 1-3 ------------- Commit messages: - Merge branch 'openjdk:master' into JDK-8367690 - Merge branch 'openjdk:master' into JDK-8367690 - Cleanup Changes: https://git.openjdk.org/jdk/pull/27849/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27849&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8367690 Stats: 3 lines in 1 file changed: 1 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27849.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27849/head:pull/27849 PR: https://git.openjdk.org/jdk/pull/27849 From rcastanedalo at openjdk.org Tue Oct 21 09:15:07 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 21 Oct 2025 09:15:07 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v3] In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 15:28:01 GMT, Roland Westrelin wrote: >> In the `test1()` method of the test case: >> >> `inlined2()` calls `clone()` for an object loaded from field `field` >> that has inexact type `A` at parse time. The intrinsic for `clone()` >> inserts an `Allocate` and an `ArrayCopy` nodes. When igvn runs, the >> load of `field` is optimized out because it reads back a newly >> allocated `B` written to `field` in the same method. `ArrayCopy` can >> now be optimized because the type of its `src` input is known. The >> type of its `dest` input is the `CheckCastPP` from the allocation of >> the cloned object created at parse time. That one has type `A`. A >> series of `Load`s/`Store`s are created to copy the fields of class `B` >> from `src` (of type `B`) to `dest` of (type `A`). >> >> Writting to `dest` with offsets for fields that don't exist in `A`, >> causes this code in `Compile::flatten_alias_type()`: >> >> >> } else if (offset < 0 || offset >= ik->layout_helper_size_in_bytes()) { >> // Static fields are in the space above the normal instance >> // fields in the java.lang.Class instance. >> if (ik != ciEnv::current()->Class_klass()) { >> to = nullptr; >> tj = TypeOopPtr::BOTTOM; >> offset = tj->offset(); >> } >> >> >> to assign it some slice that doesn't match the one that's used at the >> same offset in `B`. 
>> >> That causes an assert in `ArrayCopyNode::try_clone_instance()` to >> fire. With a release build, execution proceeds. `test1()` also has a >> non escaping allocation. That one causes EA to run and >> `ConnectionGraph::split_unique_types()` to move the store to the non >> escaping allocation to a new slice. In the process, when it iterates >> over `MergeMem` nodes, it notices the stores added by >> `ArrayCopyNode::try_clone_instance()`, finds that some are not on the >> right slice, tries to move them to the correct slice (expecting they >> are from a non escaping EA). That causes some of the `Store`s to be >> disconnected. When the resulting code runs, execution fails as some >> fields are not copied. >> >> The fix I propose is to skip `ArrayCopyNode::try_clone_instance()` >> when `src` and `dest` classes don't match as this seems like a rare >> enough corner case. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8339526 > - review > - Merge branch 'master' into JDK-8339526 > - Update src/hotspot/share/opto/arraycopynode.cpp > > Co-authored-by: Christian Hagedorn > - test & fix Nice catch! The proposed solution looks good to me, I will just run some internal testing on the latest version and a set of benchmarks to increase the confidence that this is indeed a very corner case. test/hotspot/jtreg/compiler/arraycopy/TestCloneUnknownClassAtParseTime.java line 1: > 1: /* Please add a package declaration (`package compiler.arraycopy;`). It would also be valuable if you could incorporate the detailed failure analysis from the PR description into this test file. ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27604#pullrequestreview-3359590599 PR Review Comment: https://git.openjdk.org/jdk/pull/27604#discussion_r2447395755 From haosun at openjdk.org Tue Oct 21 09:18:03 2025 From: haosun at openjdk.org (Hao Sun) Date: Tue, 21 Oct 2025 09:18:03 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE In-Reply-To: References: Message-ID: On Thu, 25 Sep 2025 03:08:47 GMT, Xiaohong Gong wrote: > The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. > > For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. > > These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. > > This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. > > It also modifies the Vector API jtreg tests for well testing. 
Here is the details: > > 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: > > VectorMaskToLong (VectorLongToMask l) => l > > Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. > > 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". > > Performance shows significant improvement on NVIDIA's Grace CPU. > > Here is the performance data with `-XX:UseSVE=2`: > > Benchmark bits inputs Mode Unit Before After Gain > MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 > MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 > MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 > MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 > MaskQueryOperationsBenchmark.testToLongInt 128 2 thrpt ops/ms 101031... LGTM. Thanks for your work. ------------- Marked as reviewed by haosun (Committer). PR Review: https://git.openjdk.org/jdk/pull/27481#pullrequestreview-3359613377 From rcastanedalo at openjdk.org Tue Oct 21 09:28:04 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 21 Oct 2025 09:28:04 GMT Subject: RFR: 8367690: C2: Unneeded branch in reduce_phi In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 14:49:02 GMT, Anton Seoane Ampudia wrote: > This PR carries out a minor cleanup found in the Phi reduction code. > > The combination of the branch and assert is redundant as the assert will always trigger. We can either remove the `else if` branch or change the assert for a `ShouldNotReachHere`. > > I have gone for the former option here. > > **Testing:** passes tiers 1-3 Looks good, thanks! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27849#pullrequestreview-3359665908 From aseoane at openjdk.org Tue Oct 21 09:31:46 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Tue, 21 Oct 2025 09:31:46 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) Message-ID: This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. 
**Testing:** passes tiers 1-5 ------------- Commit messages: - Merge branch 'openjdk:master' into JDK-8347463 - Change to a more specific type - Runtime call had void type but actually returned an object Changes: https://git.openjdk.org/jdk/pull/27913/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347463 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27913/head:pull/27913 PR: https://git.openjdk.org/jdk/pull/27913 From rrich at openjdk.org Tue Oct 21 09:33:15 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 21 Oct 2025 09:33:15 GMT Subject: RFR: 8369946: Bytecode rewriting causes Java heap corruption on PPC [v2] In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 08:57:41 GMT, Martin Doerr wrote: >> Like the aarch64 fix (https://github.com/openjdk/jdk/pull/27748). >> PPC64 has additional requirements: >> - It implements `fast_invokevfinal` which uses `ResolvedMethodEntry`. >> - Speculative loads need to get prevented by memory barrier instructions (even on control dependent paths). >> >> I've refactored `load_field_entry` and `load_method_entry` into a common function and added support for rewritten "fast" Bytecodes. I'm using `isync` instructions because we already have a control dependency (via Bytecode dispatch). >> >> The `isync` instruction is relatively cheap in comparison to other memory barriers, but still introduces some performance loss. SPEC jvm98 with -Xint shows about 5% regression in `compress` sub-benchmark. The other sub-benchmarks are not significantly impacted. However, switching off `RewriteBytecodes` would cause a much higher performance loss. >> >> Note: I had also ported the `verify_field_offset` check and used it in the fastdebug and product build for testing, but couldn't catch any issue. Not included in this PR. I'm not planning to contribute it. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Improve comment. > > Co-authored-by: Richard Reingruber Marked as reviewed by rrich (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27867#pullrequestreview-3359692023 From chagedorn at openjdk.org Tue Oct 21 09:38:14 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 21 Oct 2025 09:38:14 GMT Subject: RFR: 8367690: C2: Unneeded branch in reduce_phi In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 14:49:02 GMT, Anton Seoane Ampudia wrote: > This PR carries out a minor cleanup found in the Phi reduction code. > > The combination of the branch and assert is redundant as the assert will always trigger. We can either remove the `else if` branch or change the assert for a `ShouldNotReachHere`. > > I have gone for the former option here. > > **Testing:** passes tiers 1-3 Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27849#pullrequestreview-3359711380 From chagedorn at openjdk.org Tue Oct 21 09:45:17 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 21 Oct 2025 09:45:17 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v7] In-Reply-To: <5NCbOn36uvy0cTen6j1ys_E4uDD-qWjW5N1B-doqCGU=.4f75ce02-f27c-4d9a-ace8-5073f211d78f@github.com> References: <5NCbOn36uvy0cTen6j1ys_E4uDD-qWjW5N1B-doqCGU=.4f75ce02-f27c-4d9a-ace8-5073f211d78f@github.com> Message-ID: On Tue, 21 Oct 2025 08:20:26 GMT, Emanuel Peter wrote: >> When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. >> >> Here, I'm doing the following: >> - Add more documentation, and improve it in other cases. >> - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. >> - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` >> - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` >> - Made some methods private, and added some additional asserts. >> >> I'd be more than happy for even better names, and suggestions how to improve the documentation further :) >> >> Related issues: >> https://github.com/openjdk/jdk/pull/27889 >> https://github.com/openjdk/jdk/pull/15720 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > renaming for Tobias Good naming update! The update looks good to me apart from some last nits. src/hotspot/share/opto/loopnode.hpp line 1161: > 1159: // future. > 1160: // Note: while the "idom" information is stored in the "_idom" > 1161: // side-table, the idom forwarding piggy-packs on the ctrl Suggestion: // side-table, the idom forwarding piggybacks on the ctrl src/hotspot/share/opto/loopnode.hpp line 1182: > 1180: // - Update the node inputs of all uses. > 1181: // - Lazily update the ctrl and idom info of all uses, via a ctrl/idom forwarding. > 1182: void replace_node_and_forward_ctrl(Node *old_node, Node *new_node) { Suggestion: void replace_node_and_forward_ctrl(Node* old_node, Node* new_node) { src/hotspot/share/opto/loopnode.hpp line 1269: > 1267: // the old (dead) idom node to the new (live) idom node, in possibly > 1268: // multiple idom forwarding steps. > 1269: // Note that we piggy-back on "_loop_or_ctrl" to do the forwarding, Suggestion: // Note that we piggyback on "_loop_or_ctrl" to do the forwarding, ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27892#pullrequestreview-3359723589 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2447507606 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2447512959 PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2447509451 From duke at openjdk.org Tue Oct 21 09:46:17 2025 From: duke at openjdk.org (erifan) Date: Tue, 21 Oct 2025 09:46:17 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE In-Reply-To: References: Message-ID: On Thu, 25 Sep 2025 03:08:47 GMT, Xiaohong Gong wrote: > The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. 
Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. > > For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. > > These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. > > This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. > > It also modifies the Vector API jtreg tests for well testing. Here is the details: > > 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: > > VectorMaskToLong (VectorLongToMask l) => l > > Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. > > 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". > > Performance shows significant improvement on NVIDIA's Grace CPU. > > Here is the performance data with `-XX:UseSVE=2`: > > Benchmark bits inputs Mode Unit Before After Gain > MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 > MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 > MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 > MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 > MaskQueryOperationsBenchmark.testToLongInt 128 2 thrpt ops/ms 101031... LGTM, reviewed internally. ------------- Marked as reviewed by erifan at github.com (no known OpenJDK username). PR Review: https://git.openjdk.org/jdk/pull/27481#pullrequestreview-3359737149 From fyang at openjdk.org Tue Oct 21 10:54:28 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 21 Oct 2025 10:54:28 GMT Subject: RFR: 8370225: RISC-V: cleanup verify_xxx in interp_masm_riscv.hpp [v2] In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 08:32:11 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this trivial patch? >> `verify_xxx` verify_xxx in interp_masm_riscv.hpp should be consistent. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add NOT_DEBUG_RETURN src/hotspot/cpu/riscv/interp_masm_riscv.hpp line 306: > 304: void verify_access_flags(Register access_flags, uint32_t flag, > 305: const char* msg, bool stop_by_hit = true) NOT_DEBUG_RETURN; > 306: void verify_frame_setup() NOT_DEBUG_RETURN; OK. Then can you remove the surrounding ASSERT of the use sites for consistency? It becomes unnecessary after adding `NOT_DEBUG_RETURN`. 
One example: 1075 // start execution 1076 #ifdef ASSERT 1077 __ verify_frame_setup(); 1078 #endif ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27894#discussion_r2447668261 From mli at openjdk.org Tue Oct 21 10:54:29 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 21 Oct 2025 10:54:29 GMT Subject: RFR: 8370225: RISC-V: cleanup verify_xxx in interp_masm_riscv.hpp [v2] In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 10:45:43 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> add NOT_DEBUG_RETURN > > src/hotspot/cpu/riscv/interp_masm_riscv.hpp line 306: > >> 304: void verify_access_flags(Register access_flags, uint32_t flag, >> 305: const char* msg, bool stop_by_hit = true) NOT_DEBUG_RETURN; >> 306: void verify_frame_setup() NOT_DEBUG_RETURN; > > OK. Then can you remove the surrounding ASSERT of the use sites for consistency? It becomes unnecessary after adding `NOT_DEBUG_RETURN`. One example: > > 1075 // start execution > 1076 #ifdef ASSERT > 1077 __ verify_frame_setup(); > 1078 #endif I think the `ASSERT`s around invocation of `verify_frame_setup` are already removed in this patch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27894#discussion_r2447677408 From fyang at openjdk.org Tue Oct 21 10:54:30 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 21 Oct 2025 10:54:30 GMT Subject: RFR: 8370225: RISC-V: cleanup verify_xxx in interp_masm_riscv.hpp [v2] In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 10:50:01 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/interp_masm_riscv.hpp line 306: >> >>> 304: void verify_access_flags(Register access_flags, uint32_t flag, >>> 305: const char* msg, bool stop_by_hit = true) NOT_DEBUG_RETURN; >>> 306: void verify_frame_setup() NOT_DEBUG_RETURN; >> >> OK. Then can you remove the surrounding ASSERT of the use sites for consistency? It becomes unnecessary after adding `NOT_DEBUG_RETURN`. One example: >> >> 1075 // start execution >> 1076 #ifdef ASSERT >> 1077 __ verify_frame_setup(); >> 1078 #endif > > I think the `ASSERT`s around invocation of `verify_frame_setup` are already removed in this patch. Ah, yes for this one. Sorry, I missed that. What about `verify_access_flags`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27894#discussion_r2447680968 From mli at openjdk.org Tue Oct 21 11:06:03 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 21 Oct 2025 11:06:03 GMT Subject: RFR: 8370225: RISC-V: cleanup verify_xxx in interp_masm_riscv.hpp [v2] In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 10:51:38 GMT, Fei Yang wrote: >> I think the `ASSERT`s around invocation of `verify_frame_setup` are already removed in this patch. > > Ah, yes for this one. Sorry, I missed that. What about `verify_access_flags`? I also noticed `verify_access_flags`, but as all its usages are accompanied by some non-`NOT_DEBUG_RETURN` like `load_unsigned_short`, I think it's best to keep it as it is. How do you think? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27894#discussion_r2447707795 From fyang at openjdk.org Tue Oct 21 11:27:26 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 21 Oct 2025 11:27:26 GMT Subject: RFR: 8370225: RISC-V: cleanup verify_xxx in interp_masm_riscv.hpp [v2] In-Reply-To: References: Message-ID: <34ICSgSHhlPgGKV7y_Ri_254S9Y4W-xqRAq5KOdB3z0=.ecfb0532-a19c-4f9b-bd26-0d4df04a7f40@github.com> On Tue, 21 Oct 2025 08:32:11 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this trivial patch? >> `verify_xxx` verify_xxx in interp_masm_riscv.hpp should be consistent. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add NOT_DEBUG_RETURN Thanks! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27894#pullrequestreview-3360088429 From fyang at openjdk.org Tue Oct 21 11:27:28 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 21 Oct 2025 11:27:28 GMT Subject: RFR: 8370225: RISC-V: cleanup verify_xxx in interp_masm_riscv.hpp [v2] In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 11:03:49 GMT, Hamlin Li wrote: >> Ah, yes for this one. Sorry, I missed that. What about `verify_access_flags`? > > I also noticed `verify_access_flags`, but as all its usages are accompanied by some non-`NOT_DEBUG_RETURN` like `load_unsigned_short`, I think it's best to keep it as it is. How do you think? Yes, I see that. I think that's OK. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27894#discussion_r2447781727 From mli at openjdk.org Tue Oct 21 11:30:26 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 21 Oct 2025 11:30:26 GMT Subject: RFR: 8370225: RISC-V: cleanup verify_xxx in interp_masm_riscv.hpp [v2] In-Reply-To: <34ICSgSHhlPgGKV7y_Ri_254S9Y4W-xqRAq5KOdB3z0=.ecfb0532-a19c-4f9b-bd26-0d4df04a7f40@github.com> References: <34ICSgSHhlPgGKV7y_Ri_254S9Y4W-xqRAq5KOdB3z0=.ecfb0532-a19c-4f9b-bd26-0d4df04a7f40@github.com> Message-ID: On Tue, 21 Oct 2025 11:24:26 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> add NOT_DEBUG_RETURN > > Thanks! @RealFYang Thank you for the quick response! I think this is a trivial one, will integrate it later. Otherwise please let me know. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27894#issuecomment-3426109389 From rcastanedalo at openjdk.org Tue Oct 21 11:44:08 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 21 Oct 2025 11:44:08 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v3] In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 15:28:01 GMT, Roland Westrelin wrote: >> In the `test1()` method of the test case: >> >> `inlined2()` calls `clone()` for an object loaded from field `field` >> that has inexact type `A` at parse time. The intrinsic for `clone()` >> inserts an `Allocate` and an `ArrayCopy` nodes. When igvn runs, the >> load of `field` is optimized out because it reads back a newly >> allocated `B` written to `field` in the same method. `ArrayCopy` can >> now be optimized because the type of its `src` input is known. The >> type of its `dest` input is the `CheckCastPP` from the allocation of >> the cloned object created at parse time. That one has type `A`. 
A >> series of `Load`s/`Store`s are created to copy the fields of class `B` >> from `src` (of type `B`) to `dest` of (type `A`). >> >> Writting to `dest` with offsets for fields that don't exist in `A`, >> causes this code in `Compile::flatten_alias_type()`: >> >> >> } else if (offset < 0 || offset >= ik->layout_helper_size_in_bytes()) { >> // Static fields are in the space above the normal instance >> // fields in the java.lang.Class instance. >> if (ik != ciEnv::current()->Class_klass()) { >> to = nullptr; >> tj = TypeOopPtr::BOTTOM; >> offset = tj->offset(); >> } >> >> >> to assign it some slice that doesn't match the one that's used at the >> same offset in `B`. >> >> That causes an assert in `ArrayCopyNode::try_clone_instance()` to >> fire. With a release build, execution proceeds. `test1()` also has a >> non escaping allocation. That one causes EA to run and >> `ConnectionGraph::split_unique_types()` to move the store to the non >> escaping allocation to a new slice. In the process, when it iterates >> over `MergeMem` nodes, it notices the stores added by >> `ArrayCopyNode::try_clone_instance()`, finds that some are not on the >> right slice, tries to move them to the correct slice (expecting they >> are from a non escaping EA). That causes some of the `Store`s to be >> disconnected. When the resulting code runs, execution fails as some >> fields are not copied. >> >> The fix I propose is to skip `ArrayCopyNode::try_clone_instance()` >> when `src` and `dest` classes don't match as this seems like a rare >> enough corner case. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8339526 > - review > - Merge branch 'master' into JDK-8339526 > - Update src/hotspot/share/opto/arraycopynode.cpp > > Co-authored-by: Christian Hagedorn > - test & fix > I will just run (...) a set of benchmarks to increase the confidence that this is indeed a very corner case. I ran DaCapo 23 and did not hit the problematic case once. The regular case (exactly same type) is exercised by more than half of the DaCapo 23 benchmarks. Will come back with test results in a day or two. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27604#issuecomment-3426170845 From fandreuzzi at openjdk.org Tue Oct 21 11:54:08 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Tue, 21 Oct 2025 11:54:08 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 08:07:15 GMT, Marc Chevalier wrote: > Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. > > There is one detail, we used to have > > void restore_major_progress(int progress) { _major_progress += progress; } > > > It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. 
> > It has a weird semantics: > > Progress before | Progress after verification | Progress after restore > ----------------|-----------------------------|----------------------- > 0 | 0 | 0 > 1 | 0 | 1 > 0 | 1 | 1 > 1 | 1 | 2 > > It is rather a or than a restore, and a proper boolean version of that would be > > void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } > > but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural > > void set_major_progress(bool progress) { _major_progress = progress; } > > that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. > > Thanks, > Marc src/hotspot/share/opto/compile.hpp line 325: > 323: bool _allow_macro_nodes; // True if we allow creation of macro nodes. > 324: > 325: bool _major_progress; // Count of something big happening Perhaps the comment should be updated too? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27912#discussion_r2447867922 From jbhateja at openjdk.org Tue Oct 21 11:56:31 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 21 Oct 2025 11:56:31 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v5] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. > > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Limiting register biasing to NDD specific demotable instructions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26283/files - new: https://git.openjdk.org/jdk/pull/26283/files/5ae56d8d..1ab0ac92 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=03-04 Stats: 107 lines in 9 files changed: 89 ins; 1 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From mchevalier at openjdk.org Tue Oct 21 12:01:43 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 21 Oct 2025 12:01:43 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 11:51:29 GMT, Francesco Andreuzzi wrote: >> Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. >> >> There is one detail, we used to have >> >> void restore_major_progress(int progress) { _major_progress += progress; } >> >> >> It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. >> >> It has a weird semantics: >> >> Progress before | Progress after verification | Progress after restore >> ----------------|-----------------------------|----------------------- >> 0 | 0 | 0 >> 1 | 0 | 1 >> 0 | 1 | 1 >> 1 | 1 | 2 >> >> It is rather a or than a restore, and a proper boolean version of that would be >> >> void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } >> >> but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural >> >> void set_major_progress(bool progress) { _major_progress = progress; } >> >> that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. >> >> Thanks, >> Marc > > src/hotspot/share/opto/compile.hpp line 325: > >> 323: bool _allow_macro_nodes; // True if we allow creation of macro nodes. >> 324: >> 325: bool _major_progress; // Count of something big happening > > Perhaps the comment should be updated too? That makes sense. A suggestion? Maybe "Whether something big happened"? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27912#discussion_r2447889101 From jbhateja at openjdk.org Tue Oct 21 12:17:02 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 21 Oct 2025 12:17:02 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v6] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. 
> > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 - Limiting register biasing to NDD specific demotable instructions - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 - Fix jtreg, one less spill - Updating as per reivew suggestions - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 - Some refactoring - 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions ------------- Changes: https://git.openjdk.org/jdk/pull/26283/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=05 Stats: 177 lines in 9 files changed: 158 ins; 8 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From epeter at openjdk.org Tue Oct 21 12:17:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 12:17:23 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v8] In-Reply-To: References: Message-ID: > When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. > > Here, I'm doing the following: > - Add more documentation, and improve it in other cases. > - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. > - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` > - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` > - Made some methods private, and added some additional asserts. 
> > I'd be more than happy for even better names, and suggestions how to improve the documentation further :) > > Related issues: > https://github.com/openjdk/jdk/pull/27889 > https://github.com/openjdk/jdk/pull/15720 Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27892/files - new: https://git.openjdk.org/jdk/pull/27892/files/aaa91d18..ac057395 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=06-07 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27892/head:pull/27892 PR: https://git.openjdk.org/jdk/pull/27892 From jbhateja at openjdk.org Tue Oct 21 12:24:06 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 21 Oct 2025 12:24:06 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v6] In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 12:17:02 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 > - Limiting register biasing to NDD specific demotable instructions > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 > - Fix jtreg, one less spill > - Updating as per reivew suggestions > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 > - Some refactoring > - 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions Current scheme of validation is manual:- 1) Revert https://github.com/openjdk/jdk/pull/27320, since SDE 9.58 does not support APX_NCI_NDD_NF flag yet. 2) Static register allocation ordering change in x86_64.ad to always perference to EGPR R16-R31 during allocation. 3) Register allocation biasing facilitates demotion, which happens in the assembler layer. 4) Added debug messages in demotable assembler routines. 5) Inspected the assembler encoding in Intel xed64 6) Ran following tests with -XX:-UseSuperWord to exercise various NDD demotable instructions with Intel SDE 9.58. - test/hotspot/jtreg/compiler/c2/cr6340864/TestIntVect.java - test/hotspot/jtreg/compiler/c2/cr6340864/TestLongVect.java **By limiting the scope of the fix to NDD specific instructions we have now mitigated any unwanted performance side effects on any other backend OR non-APX x86 backend.** We do have existing tests in place for functional correctness of NDD assembler instructions https://github.com/openjdk/jdk/blob/master/test/hotspot/gtest/x86/x86-asmtest.py ------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3426307551 From epeter at openjdk.org Tue Oct 21 12:50:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 12:50:53 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE In-Reply-To: References: Message-ID: On Thu, 25 Sep 2025 03:08:47 GMT, Xiaohong Gong wrote: > The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. > > For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. > > These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. > > This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. > > It also modifies the Vector API jtreg tests for well testing. Here is the details: > > 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. 
Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: > > VectorMaskToLong (VectorLongToMask l) => l > > Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. > > 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". > > Performance shows significant improvement on NVIDIA's Grace CPU. > > Here is the performance data with `-XX:UseSVE=2`: > > Benchmark bits inputs Mode Unit Before After Gain > MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 > MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 > MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 > MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 > MaskQueryOperationsBenchmark.testToLongInt 128 2 thrpt ops/ms 101031... I gave it a quick glance, and had some comments. I'll run some testing, and review more fully after :) src/hotspot/cpu/aarch64/aarch64_vector.ad line 392: > 390: // Return true if vector mask operation with "opcode" requires the mask to be > 391: // saved in a predicate register. > 392: bool Matcher::vector_mask_requires_predicate(int opcode, const TypeVect* vt) { What would be the alternative, if it is not in a predicate register? src/hotspot/cpu/riscv/riscv_v.ad line 169: > 167: > 168: // Return true if vector mask operation with "opcode" requires the mask to be > 169: // saved with predicate type. This comment is different than on some other platforms. Can you put the comment not at every platform, but rather in one single place: the `.hpp` file? ------------- PR Review: https://git.openjdk.org/jdk/pull/27481#pullrequestreview-3360323815 PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2447970424 PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2447999972 From epeter at openjdk.org Tue Oct 21 12:54:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 12:54:21 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE In-Reply-To: References: Message-ID: On Fri, 10 Oct 2025 03:26:52 GMT, Xiaohong Gong wrote: >> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. >> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. 
By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. >> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. >> >> Here is the performance data with `-XX:UseSVE=2`: >> >> Benchmark bits inputs Mode Unit Before After Gain >> MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 >> MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 >> MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 >> MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 >> MaskQueryOperations... > > Hi, could anyone please help take a look at this PR? Thanks a lot in advance! @XiaohongGong Actually, I just tried to submit via my standard script. It failed because of merging issues. Would you mind merging with master, so we are on the newest state? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3426462839 From dlunden at openjdk.org Tue Oct 21 13:20:14 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Tue, 21 Oct 2025 13:20:14 GMT Subject: RFR: 8370031: Make RegMask copy constructor explicit and replace RegMask operator= with named function [v2] In-Reply-To: <9iuxVrckqdjA7GrRqkSFQPVvhJDojDiNpqmKYga51GY=.f58fca84-b84b-4708-93a1-1c29c06cc38e@github.com> References: <4q9CrlwVAibAEIySihIaFIV24xTs1_uT2QpnJag7BX0=.fd0693e7-f11b-4224-9dbb-93a116a64a6f@github.com> <9iuxVrckqdjA7GrRqkSFQPVvhJDojDiNpqmKYga51GY=.f58fca84-b84b-4708-93a1-1c29c06cc38e@github.com> Message-ID: On Mon, 20 Oct 2025 12:40:42 GMT, Manuel H?ssig wrote: >> Daniel Lund?n has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge remote-tracking branch 'upstream/master' into regmask-explicit-8370031 >> - Fix issue > > Thank you for cleaning this up, @dlunde. Your changes look good to me. Thanks for the reviews @mhaessig and @robcasloz! Sanity pre-integration tests look good, so integrating now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27891#issuecomment-3426583630 From dlunden at openjdk.org Tue Oct 21 13:20:16 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Tue, 21 Oct 2025 13:20:16 GMT Subject: Integrated: 8370031: Make RegMask copy constructor explicit and replace RegMask operator= with named function In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 07:42:01 GMT, Daniel Lund?n wrote: > The `RegMask` copy constructor is currently non-explicit. We should make it explicit so that we do not unintentionally copy register masks. 
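As a side note for readers skimming the thread, here is a minimal sketch of the pattern being proposed. This is a made-up Mask class for illustration only, not the actual HotSpot RegMask code:

    // An explicit copy constructor plus a named deep-copy function instead of a
    // copy-assignment operator, so masks are never copied by accident.
    class Mask {
     private:
      unsigned int _bits[4];

     public:
      Mask() : _bits() {}

      // explicit: copy-initialization such as 'Mask m = other;' or passing a Mask
      // by value no longer compiles; a deliberate 'Mask m(other);' still does.
      explicit Mask(const Mask& other) { assignFrom(other); }

      // Copy assignment is removed; callers have to spell out the deep copy.
      Mask& operator=(const Mask&) = delete;

      void assignFrom(const Mask& other) {
        for (int i = 0; i < 4; i++) {
          _bits[i] = other._bits[i];
        }
      }
    };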
> > Additionally, we currently overload `operator=` in `RegMask` to do a deep copy. It is preferable to use an explicit named function instead, according to the HotSpot coding style. > > ### Changeset > > - Make the `RegMask` copy constructor explicit. > - Fix compilation errors as a result of the now explicit constructor. Specifically, the methods `Matcher::divI_proj_mask`, `Matcher::modI_proj_mask`, `Matcher::divL_proj_mask`, and `Matcher::modL_proj_mask` all use implicit copy construction (likely unintended). Change the methods to return `const RegMask&` instead of `RegMask` and correspondingly change the return value from `RegMask()` to `RegMask::Empty` on some platforms. > - Rename the old method `RegMask::copy` to `RegMask::assignFrom` to better describe its functionality, and make it public instead of private. > - Delete `RegMask` copy assignment (`operator=`) and change all uses to the named function `assignFrom` instead. > - Fix various syntax issues at lines touched by the changeset. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18589208499) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. This pull request has now been integrated. Changeset: 2af4d20a Author: Daniel Lund?n URL: https://git.openjdk.org/jdk/commit/2af4d20abfda4113a2bfcf34dfad87187c0f584d Stats: 257 lines in 14 files changed: 34 ins; 35 del; 188 mod 8370031: Make RegMask copy constructor explicit and replace RegMask operator= with named function Reviewed-by: mhaessig, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/27891 From rcastanedalo at openjdk.org Tue Oct 21 13:51:52 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 21 Oct 2025 13:51:52 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: <4UN1z9fhxeUqUGagnZVEIFOyDb_mP8WaWUBwWO2HjFA=.93b7c9ad-443c-4fff-810d-7fe805ccbfaa@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> <2m1_XtiSsW_LaBRrkX4qv7AKtLOjNgnl4mUp3zisasE=.dda62164-7aa0-4c1a-b83f-fa40ba7902e5@github.com> <4374L3lkQK90wLxxOA7POBmIKNX2DFK-4pO4vj1bkuQ=.5b8d7825-a7f1-497f-ab66-02a85a266659@github.com> <4UN1z9fhxeUqUGagnZVEIFOyDb_mP8WaWUBwWO2HjFA=.93b7c9ad-443c-4fff-810d-7fe805ccbfaa@github.com> Message-ID: On Mon, 29 Sep 2025 08:40:01 GMT, Roland Westrelin wrote: >>> That sounds good to me, thank you for enforcing this Roland! I will re-run testing and have a new look at the changeset within the next days. >> >> Test results of b701d03ed335286587c4d2539dde715b091d30bd on top of jdk-26+14 look good. Will have a look at the code within the next days. > > @robcasloz Thanks for the patches. I added them. Hi @rwestrel, could you please have a look at the merge conflicts of this PR so that we can progress further with the review of this work? > Hi @rwestrel, could you please have a look at the merge conflicts of this PR so that we can progress further with the review of this work? 
The conflict is caused by the integration of [JDK-8360031](https://bugs.openjdk.org/browse/JDK-8360031), which relaxes the assertion in https://github.com/openjdk/jdk/blob/430041d366ddf450c2480c81608dde980dfa6d41/src/hotspot/share/opto/memnode.cpp#L4232 which is also touched by this changeset. Is the current assertion in mainline (after JDK-8360031) still valid in the context of this changeset? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-3423349747 PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-3425213782 From rcastanedalo at openjdk.org Tue Oct 21 13:51:56 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 21 Oct 2025 13:51:56 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> <2m1_XtiSsW_LaBRrkX4qv7AKtLOjNgnl4mUp3zisasE=.dda62164-7aa0-4c1a-b83f-fa40ba7902e5@github.com> <4374L3lkQK90wLxxOA7POBmIKNX2DFK-4pO4vj1bkuQ=.5b8d7825-a7f1-497f-ab66-02a85a266659@github.com> <4UN1z9fhxeUqUGagnZVEIFOyDb_mP8WaWUBwWO2HjFA=.93b7c9ad-443c-4fff-810d-7fe805ccbfaa@github.com> Message-ID: On Tue, 21 Oct 2025 07:41:37 GMT, Roberto Casta?eda Lozano wrote: > Is the current assertion in mainline (after JDK-8360031) still valid in the context of this changeset? I did a bit of testing and updating the asserted invariant to `(outcnt() > 0 && outcnt() <= 2) || Opcode() == Op_Initialize` seems to work. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-3426782209 From dbriemann at openjdk.org Tue Oct 21 14:07:03 2025 From: dbriemann at openjdk.org (David Briemann) Date: Tue, 21 Oct 2025 14:07:03 GMT Subject: RFR: 8369946: Bytecode rewriting causes Java heap corruption on PPC [v2] In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 08:57:41 GMT, Martin Doerr wrote: >> Like the aarch64 fix (https://github.com/openjdk/jdk/pull/27748). >> PPC64 has additional requirements: >> - It implements `fast_invokevfinal` which uses `ResolvedMethodEntry`. >> - Speculative loads need to get prevented by memory barrier instructions (even on control dependent paths). >> >> I've refactored `load_field_entry` and `load_method_entry` into a common function and added support for rewritten "fast" Bytecodes. I'm using `isync` instructions because we already have a control dependency (via Bytecode dispatch). >> >> The `isync` instruction is relatively cheap in comparison to other memory barriers, but still introduces some performance loss. SPEC jvm98 with -Xint shows about 5% regression in `compress` sub-benchmark. The other sub-benchmarks are not significantly impacted. However, switching off `RewriteBytecodes` would cause a much higher performance loss. >> >> Note: I had also ported the `verify_field_offset` check and used it in the fastdebug and product build for testing, but couldn't catch any issue. Not included in this PR. I'm not planning to contribute it. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Improve comment. > > Co-authored-by: Richard Reingruber Marked as reviewed by dbriemann (Committer). 
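For readers unfamiliar with the idiom referenced in the quoted description, here is a rough sketch of how a control dependency plus `isync` orders a later load on PPC64. This is plain C++ with made-up names, not the HotSpot interpreter or macro-assembler code:

    #include <cstdint>

    // First load -> conditional branch that depends on it -> isync -> second load.
    // On PPC64 the branch plus isync acts like an acquire barrier, so the second
    // load cannot be satisfied speculatively before the check, while staying
    // cheaper than a heavyweight sync.
    uint64_t load_after_check(const volatile uint64_t* resolved_flag,
                              const volatile uint64_t* field) {
      uint64_t resolved = *resolved_flag;        // first load
      if (resolved == 0) {                       // branch depends on the loaded value
        return 0;
      }
    #if defined(__powerpc64__)
      __asm__ volatile("isync" ::: "memory");    // control dependency + isync
    #endif
      return *field;                             // ordered after the check above
    }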
------------- PR Review: https://git.openjdk.org/jdk/pull/27867#pullrequestreview-3360992058 From shade at openjdk.org Tue Oct 21 14:14:18 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 21 Oct 2025 14:14:18 GMT Subject: RFR: 8358749: Fix input checks in Vector API intrinsics In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 01:19:11 GMT, ExE Boss wrote: > Can thou create that bug report? (I don?t have an OpenJDK account to create it with) https://bugs.openjdk.org/browse/JDK-8370337 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25673#issuecomment-3426888367 From krk at openjdk.org Tue Oct 21 14:56:43 2025 From: krk at openjdk.org (Kerem Kat) Date: Tue, 21 Oct 2025 14:56:43 GMT Subject: RFR: 8351194: Clean up Hotspot SA after 32-bit x86 removal [v3] In-Reply-To: References: Message-ID: > Remove 32-bit x86 specific code from the HotSpot Serviceability Agent following the removal of 32-bit x86 support. > > - Removed x86-specific implementations and ifdef blocks. > - Renamed files with X86 in the name when they are also used from AMD64, e.g. `X86Frame` ? `AMD64Frame`. > - Cleaned up platform detection logic in `PlatformInfo`. > - Updated documentation references. Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - run RotateLeftNode*IdealizationTests on amd64 too - Merge branch 'master' into clean-x86-sa-JDK-8351194 - Merge branch 'master' into clean-x86-sa-JDK-8351194 - Clean up Hotspot SA after 32-bit x86 removal ------------- Changes: https://git.openjdk.org/jdk/pull/27844/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27844&range=02 Stats: 2781 lines in 47 files changed: 623 ins; 2110 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/27844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27844/head:pull/27844 PR: https://git.openjdk.org/jdk/pull/27844 From krk at openjdk.org Tue Oct 21 14:56:45 2025 From: krk at openjdk.org (Kerem Kat) Date: Tue, 21 Oct 2025 14:56:45 GMT Subject: RFR: 8351194: Clean up Hotspot SA after 32-bit x86 removal [v2] In-Reply-To: <6YqJmhccFDBabVIQpN3nNt_Cpa1zNQg3rRvtiTCEJmo=.c3653095-8ae3-4f0d-8c3b-50d4109b4b69@github.com> References: <6YqJmhccFDBabVIQpN3nNt_Cpa1zNQg3rRvtiTCEJmo=.c3653095-8ae3-4f0d-8c3b-50d4109b4b69@github.com> Message-ID: On Fri, 17 Oct 2025 01:11:12 GMT, David Holmes wrote: >> src/jdk.hotspot.agent/doc/clhsdb.html line 35: >> >>> 33: classes print all loaded Java classes with Klass* >>> 34: detach detach SA from current target >>> 35: dis address [ length ] disassemble (amd64) specified number of instructions from given address >> >> Two issues here. The first is I think this was previously incorrect in that SA supports any architecture for which it can find the hsdis library. You can probably just drop the amd64 reference or add "requires hsdis". >> >> The 2nd issue is with amd64 vs x86_64. It seems in SA the two basically have the same meaning, and you see a lot of C code that checks for both. However, the java code seems to always just reference AMD64 (but also works with x86_64). I'm just wondering if this is consistent with the rest of hotspot, or if we should consider a rename to x86_64 instead of amd64. >> >> BTW, at the platform level there are some amd64 vs x86_64 differences. The one I noted is that MacOSX is considered x86_64 and I think linux and windows are amd64. 
I'm not sure why, but I recently noted a test that had an @requires for `os.arch == "amd64"` and that kept is from running on macosx-x64 until the @requires was expanded to also allow for `os.arch == "x86_64"`. You should take extra care to make sure that these changes work with all the x86_64, including macosx. I see some places in the C code where we check for both amd64 and x86_64 and some where we only check for amd64. Perhaps x86_64 is not used by SA for macosx. > > AMD64 is historical, it should all be changed to x86_64. The only place AMD64 is relevant is in actual AMD processor specific code. Thanks for the reviews. I have updated the html to read "requires hsdis". Regarding checking for `amd64` vs. `x86_64`, I found two cases where one of `x86_64` and `amd64` is checked but not the other: test/hotspot/jtreg/compiler/c2/irTests/RotateLeftNodeLongIdealizationTests.java 34: * @requires os.arch == "x86_64" | os.arch == "aarch64" | (os.arch == "riscv64" & vm.cpu.features ~= ".*zbb.*") test/hotspot/jtreg/compiler/c2/irTests/RotateLeftNodeIntIdealizationTests.java 34: * @requires os.arch == "x86_64" | os.arch == "aarch64" | (os.arch == "riscv64" & vm.cpu.features ~= ".*zbb.*") I checked the C++ sources for `RotateLeftNode::Value` and `RotateLeftNode::Ideal`, I couldn't find any platform-specific logic that would justify excluding `amd64`. I have updated both tests to include `amd64` in their `@requires`. Is there a specific `x86_64` vs. `amd64` check in C you would like to point out? For the total annihilation of the `amd64` naming, I have cut an issue at [JDK-8370339](https://bugs.openjdk.org/browse/JDK-8370339). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27844#discussion_r2448649698 From krk at openjdk.org Tue Oct 21 15:19:35 2025 From: krk at openjdk.org (Kerem Kat) Date: Tue, 21 Oct 2025 15:19:35 GMT Subject: RFR: 8351194: Clean up Hotspot SA after 32-bit x86 removal [v4] In-Reply-To: References: Message-ID: > Remove 32-bit x86 specific code from the HotSpot Serviceability Agent following the removal of 32-bit x86 support. > > - Removed x86-specific implementations and ifdef blocks. > - Renamed files with X86 in the name when they are also used from AMD64, e.g. `X86Frame` ? `AMD64Frame`. > - Cleaned up platform detection logic in `PlatformInfo`. > - Updated documentation references. Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: requires hsdis ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27844/files - new: https://git.openjdk.org/jdk/pull/27844/files/1011b304..f6387b57 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27844&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27844&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27844/head:pull/27844 PR: https://git.openjdk.org/jdk/pull/27844 From epeter at openjdk.org Tue Oct 21 15:57:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 15:57:16 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v8] In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 12:17:23 GMT, Emanuel Peter wrote: >> When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. 
>> >> Here, I'm doing the following: >> - Add more documentation, and improve it in other cases. >> - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. >> - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` >> - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` >> - Made some methods private, and added some additional asserts. >> >> I'd be more than happy for even better names, and suggestions how to improve the documentation further :) >> >> Related issues: >> https://github.com/openjdk/jdk/pull/27889 >> https://github.com/openjdk/jdk/pull/15720 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Seems GHA is giving me some failures I did not see in our internal testing: `compiler.lib.ir_framework.flag.FlagVM compiler.gcbarriers.TestShenandoahBarrierExpansion ` # Internal Error (/home/runner/work/jdk/jdk/src/hotspot/share/opto/loopnode.hpp:1168), pid=5884, tid=5904 # assert(!has_ctrl(old_node) && old_node->is_CFG() && old_node->in(0) == nullptr) failed: must be dead ctrl (CFG) node This is part of the assert I added. Will have to investigate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27892#issuecomment-3427379718 From epeter at openjdk.org Tue Oct 21 16:34:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 16:34:06 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 11:58:22 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/compile.hpp line 325: >> >>> 323: bool _allow_macro_nodes; // True if we allow creation of macro nodes. >>> 324: >>> 325: bool _major_progress; // Count of something big happening >> >> Perhaps the comment should be updated too? > > That makes sense. A suggestion? Maybe "Whether something big happened"? It is kind of scary that there is basically no documentation on this. But it is quite important actually. The current comment is really not very helpful. My understanding is that the flag is set if the loop-opts data-structures are invalid (or at least there is no guarantee that they are valid). So we need to re-build the loop tree. If we ever set the flag, we don't continue with more loop-opts, but spin back to IGVN, clean the graph, and maybe come back to a new loop-opts round. There may be others who have a better understanding / definition though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27912#discussion_r2448970079 From epeter at openjdk.org Tue Oct 21 16:39:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 16:39:17 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 08:07:15 GMT, Marc Chevalier wrote: > Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. > > There is one detail, we used to have > > void restore_major_progress(int progress) { _major_progress += progress; } > > > It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. 
> > It has a weird semantics:
> 
> Progress before | Progress after verification | Progress after restore
> ----------------|-----------------------------|-----------------------
> 0 | 0 | 0
> 1 | 0 | 1
> 0 | 1 | 1
> 1 | 1 | 2
> 
> It is rather an or than a restore, and a proper boolean version of that would be
> 
> void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; }
> 
> but then, I'd argue the name is confusing. It also doesn't fit well with the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural
> 
> void set_major_progress(bool progress) { _major_progress = progress; }
> 
> that actually restores what we saved. Both pass (tier1-6 + some internal tests). Thus, I preferred the simpler semantics.
> 
> Thanks,
> Marc

Thanks for working on this. I agree that the old `restore_major_progress` semantics was a little strange and hard to understand. So good you are trying to simplify. I know that all your testing passed... but if there was a bug we may not notice purely with testing. There could also be performance regressions, in cases where we then don't continue optimizing because we messed up and set `major_progress` where it should not be set. Do you think it could make sense to run some performance testing?

------------- PR Comment: https://git.openjdk.org/jdk/pull/27912#issuecomment-3427587076

From mhaessig at openjdk.org Tue Oct 21 16:39:17 2025
From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=)
Date: Tue, 21 Oct 2025 16:39:17 GMT
Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation
In-Reply-To: 
References: 
Message-ID: 

On Sun, 19 Oct 2025 15:46:06 GMT, Hannes Greule wrote:

> The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis.
> 
> Please let me know what you think.

Thank you for working on this, @SirYwell. This seems like a tricky problem. To be honest, the fix seems a bit hacky. Have you explored any alternatives to this method of delaying the optimizations?

I kicked off some testing in the meantime that I will report back upon completion.

src/hotspot/share/opto/divnode.cpp line 545:

> 543: 
> 544: // Keep this node as-is for now; we want Value() and
> 545: // other optimizations checking for this node type to work

Suggestion:

// Keep this node as-is initially; we want Value() and
// other optimizations checking for this node type to work.

This is a small nit: I find it confusing to talk about "for now" in a method that is called at almost every stage of the compilation. Perhaps "initially" conveys the intention of letting other optimizations do their magic first a bit better.
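For context on why it can pay off to keep the Mod node around long enough for `Value()` to run, here is a small sketch of the range reasoning that a Mod-by-constant allows. This is a hypothetical helper for illustration only, not the HotSpot `ModINode::Value()` code:

    #include <climits>

    struct IntRange { int lo; int hi; };

    // For a non-negative dividend and a positive constant divisor d,
    // x % d lies in [0, min(dividend.hi, d - 1)]. Once the Mod has been
    // rewritten into a shift/multiply sequence, this bound is much harder to see.
    IntRange mod_by_constant_range(IntRange dividend, int d) {
      IntRange r = { INT_MIN, INT_MAX };   // no information in the general case
      if (d > 0 && dividend.lo >= 0) {
        r.lo = 0;
        r.hi = (dividend.hi < d - 1) ? dividend.hi : d - 1;
      }
      return r;
    }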
------------- PR Review: https://git.openjdk.org/jdk/pull/27886#pullrequestreview-3361503632 PR Review Comment: https://git.openjdk.org/jdk/pull/27886#discussion_r2448851036 From epeter at openjdk.org Tue Oct 21 16:42:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 16:42:26 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v6] In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 12:17:02 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 > - Limiting register biasing to NDD specific demotable instructions > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 > - Fix jtreg, one less spill > - Updating as per reivew suggestions > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 > - Some refactoring > - 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions Drive-by question. src/hotspot/cpu/x86/x86_64.ad line 440: > 438: switch(mopc) { > 439: default: > 440: return false; What happens if we wrongly `return false` here? Is that a missed optimization or a correctness issue? 
------------- PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3361696697 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2448987479 From mchevalier at openjdk.org Tue Oct 21 16:50:36 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 21 Oct 2025 16:50:36 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 16:31:30 GMT, Emanuel Peter wrote: >> That makes sense. A suggestion? Maybe "Whether something big happened"? > > It is kind of scary that there is basically no documentation on this. But it is quite important actually. The current comment is really not very helpful. > > My understanding is that the flag is set if the loop-opts data-structures are invalid (or at least there is no guarantee that they are valid). So we need to re-build the loop tree. > > If we ever set the flag, we don't continue with more loop-opts, but spin back to IGVN, clean the graph, and maybe come back to a new loop-opts round. > > There may be others who have a better understanding / definition though. I don't think it's the right place for this kind of comment. It's quite hidden, far from where it's actually useful to know we need to set or check that. I'd say it should rather be on `PhaseIdealLoop` for instance, or `PhaseIdealLoop::optimize`, something like that, as a part of a more global overview of how things work. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27912#discussion_r2449020728 From jbhateja at openjdk.org Tue Oct 21 17:23:37 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 21 Oct 2025 17:23:37 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v6] In-Reply-To: References: Message-ID: <6t654uTcKhq4aH5kdoDsvn5icLPDGjlv8JZXuzaGBOM=.ecb7a27f-ecf9-4235-bb49-4ca1a5288498@github.com> On Tue, 21 Oct 2025 16:38:07 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 >> - Limiting register biasing to NDD specific demotable instructions >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 >> - Fix jtreg, one less spill >> - Updating as per reivew suggestions >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 >> - Some refactoring >> - 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions > > src/hotspot/cpu/x86/x86_64.ad line 440: > >> 438: switch(mopc) { >> 439: default: >> 440: return false; > > What happens if we wrongly `return false` here? Is that a missed optimization or a correctness issue? It will never result in a correctness issue; we will simply ditch register biasing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2449123448 From epeter at openjdk.org Tue Oct 21 17:32:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 21 Oct 2025 17:32:08 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 16:47:32 GMT, Marc Chevalier wrote: >> It is kind of scary that there is basically no documentation on this. But it is quite important actually. The current comment is really not very helpful. 
>> >> My understanding is that the flag is set if the loop-opts data-structures are invalid (or at least there is no guarantee that they are valid). So we need to re-build the loop tree. >> >> If we ever set the flag, we don't continue with more loop-opts, but spin back to IGVN, clean the graph, and maybe come back to a new loop-opts round. >> >> There may be others who have a better understanding / definition though. > > I don't think it's the right place for this kind of comment. It's quite hidden, far from where it's actually useful to know we need to set or check that. I'd say it should rather be on `PhaseIdealLoop` for instance, or `PhaseIdealLoop::optimize`, something like that, as a part of a more global overview of how things work. There should just be some documentation around the `major_progress` family of field/methods. Or at least link from there to where the documentation resides ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27912#discussion_r2449146152 From kvn at openjdk.org Tue Oct 21 17:35:23 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 21 Oct 2025 17:35:23 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 08:07:15 GMT, Marc Chevalier wrote: > Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. > > There is one detail, we used to have > > void restore_major_progress(int progress) { _major_progress += progress; } > > > It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. > > It has a weird semantics: > > Progress before | Progress after verification | Progress after restore > ----------------|-----------------------------|----------------------- > 0 | 0 | 0 > 1 | 0 | 1 > 0 | 1 | 1 > 1 | 1 | 2 > > It is rather a or than a restore, and a proper boolean version of that would be > > void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } > > but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural > > void set_major_progress(bool progress) { _major_progress = progress; } > > that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. > > Thanks, > Marc Or you can keep temporary (just for testing this PR and remove it before integration) original logic in debug VM to compare result of `major_progress()`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27912#issuecomment-3428111241 From hgreule at openjdk.org Tue Oct 21 17:40:46 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 21 Oct 2025 17:40:46 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 16:36:36 GMT, Manuel H?ssig wrote: > Thank you for working on this, @SirYwell. This seems like a tricky problem. To be honest, the fix seems a bit hacky. Have you explored any alternatives to this method of delaying the optimizations? > > I kicked off some testing in the meantime that I will report back upon completion. Thanks for running tests. 
I tried delaying until post loop opts, but that prevents some vectorization and isn't really less hacky I guess. I didn't find any other good existing approach. Calculating Value before Ideal would work, but I assume that it is rarely useful, with Div/Mod being an exception. I a dream world, I guess we would have e-graphs or something similar, which would allow calculating a more precise type from different alternatives. If you can think of a better approach, please let me know. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3428176218 From rcastanedalo at openjdk.org Tue Oct 21 17:45:53 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 21 Oct 2025 17:45:53 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v3] In-Reply-To: References: Message-ID: <-a1m4b9Idb5Zw3-FBgL1mFZ_zehuwGQVydTe4MtSffw=.8deb90df-ea57-4691-8ef0-61d79df050b0@github.com> On Mon, 13 Oct 2025 15:28:01 GMT, Roland Westrelin wrote: >> In the `test1()` method of the test case: >> >> `inlined2()` calls `clone()` for an object loaded from field `field` >> that has inexact type `A` at parse time. The intrinsic for `clone()` >> inserts an `Allocate` and an `ArrayCopy` nodes. When igvn runs, the >> load of `field` is optimized out because it reads back a newly >> allocated `B` written to `field` in the same method. `ArrayCopy` can >> now be optimized because the type of its `src` input is known. The >> type of its `dest` input is the `CheckCastPP` from the allocation of >> the cloned object created at parse time. That one has type `A`. A >> series of `Load`s/`Store`s are created to copy the fields of class `B` >> from `src` (of type `B`) to `dest` of (type `A`). >> >> Writting to `dest` with offsets for fields that don't exist in `A`, >> causes this code in `Compile::flatten_alias_type()`: >> >> >> } else if (offset < 0 || offset >= ik->layout_helper_size_in_bytes()) { >> // Static fields are in the space above the normal instance >> // fields in the java.lang.Class instance. >> if (ik != ciEnv::current()->Class_klass()) { >> to = nullptr; >> tj = TypeOopPtr::BOTTOM; >> offset = tj->offset(); >> } >> >> >> to assign it some slice that doesn't match the one that's used at the >> same offset in `B`. >> >> That causes an assert in `ArrayCopyNode::try_clone_instance()` to >> fire. With a release build, execution proceeds. `test1()` also has a >> non escaping allocation. That one causes EA to run and >> `ConnectionGraph::split_unique_types()` to move the store to the non >> escaping allocation to a new slice. In the process, when it iterates >> over `MergeMem` nodes, it notices the stores added by >> `ArrayCopyNode::try_clone_instance()`, finds that some are not on the >> right slice, tries to move them to the correct slice (expecting they >> are from a non escaping EA). That causes some of the `Store`s to be >> disconnected. When the resulting code runs, execution fails as some >> fields are not copied. >> >> The fix I propose is to skip `ArrayCopyNode::try_clone_instance()` >> when `src` and `dest` classes don't match as this seems like a rare >> enough corner case. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8339526 > - review > - Merge branch 'master' into JDK-8339526 > - Update src/hotspot/share/opto/arraycopynode.cpp > > Co-authored-by: Christian Hagedorn > - test & fix Test results look good. Approving, but please consider my suggestions about the test file. Thanks! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27604#pullrequestreview-3361966729 From kvn at openjdk.org Tue Oct 21 17:46:57 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 21 Oct 2025 17:46:57 GMT Subject: RFR: 8362832: compiler/macronodes/TestTopInMacroElimination.java hits assert(false) failed: unexpected node In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 15:53:59 GMT, Beno?t Maillard wrote: > This PR prevents hitting an assert caused by encountering `top` while following the memory > slice associated with a field when eliminating allocations in macro node elimination. This situation > is the result of another elimination (boxing node elimination) that happened at the same > macro expansion iteration. > > ### Analysis > > The issue appears in the macro expansion phase. We have a nested `synchronized` block, > with the outer block synchronizing on `new A()` and the inner one on `TestTopInMacroElimination.class`. > In the inner `synchronized` block we have an `Integer.valueOf` call in a loop. > > In `PhaseMacroExpand::eliminate_boxing_node` we are getting rid of the `Integer.valueOf` > call, as it is a non-escaping boxing node. After having eliminated the call, > `PhaseMacroExpand::process_users_of_allocation` takes care of the users of the removed node. > There, we replace usages of the fallthrough memory projection with `top`. > > In the same macro expansion iteration, we later attempt to get rid of the `new A()` allocation > in `PhaseMacroExpand::create_scalarized_object_description`. There, we have to make > sure that all safepoints can still see the object fields as if the allocation was never deleted. > For this, we attempt to find the last value on the slice of each specific field (`a` > in this case). Because field `a` is never written to, and it is not explicitely initialized, > there is no `Store` associated to it and not even a dedicated memory slice (we end up > taking the `Bot` input on `MergeMem` nodes). By going up the memory chain, we eventually > encounter the `MemBarReleaseLock` whose input was set to `top`. This is where the assert > is hit. > > ### Proposed Fix > > In the end I opted for an analog fix as the similar [JDK-8325030](https://git.openjdk.org/jdk/pull/23104). > If we get `top` from `scan_mem_chain` in `PhaseMacroExpand::value_from_mem`, then we can safely > return `top` as well. This means that the safepoint will have `top` as data input, but this will > eventually cleaned up by the next round of IGVN. > > Another (tempting) option would have been to simply return `nullptr` from `PhaseMacroExpand::value_from_mem` when encoutering `top`. However this would result in bailing > out from eliminating this allocation temporarily and effectively delaying it to a subsqequent > macro expansion round. > > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8362832) > - [x] tier1-4, plus some internal testing > > Thank you for reviewing! I agree with fix. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27903#pullrequestreview-3361970213 From valeriep at openjdk.org Tue Oct 21 18:22:20 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Tue, 21 Oct 2025 18:22:20 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v13] In-Reply-To: References: Message-ID: <45hkD8RQ3TH-4dvjl_bN9dG0B4BSE3d1wXZAPMxeDSA=.84597563-35d7-43c4-bd7c-cad7da7e9277@github.com> On Tue, 21 Oct 2025 00:05:31 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. >> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Updates for code review comments from @valeriepeng Changes look fine, thanks~ ------------- Marked as reviewed by valeriep (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27807#pullrequestreview-3362117409 From vlivanov at openjdk.org Tue Oct 21 21:23:03 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 21 Oct 2025 21:23:03 GMT Subject: RFR: 8370251: C2: Inlining checks for method handle intrinsics are too strict In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 21:57:10 GMT, Chen Liang wrote: >> src/hotspot/share/opto/doCall.cpp line 245: >> >>> 243: receiver_method = callee->resolve_invoke(jvms->method()->holder(), >>> 244: speculative_receiver_type, >>> 245: check_access); >> >> Can you explain why only here you pass `check_access` and expect it is `true` in all other places? > > Similar question, should we add an assert for check_access before the resolve_invoke in Compile::optimize_inlining? > Can you explain why only here you pass check_access and expect it is true in all other places? @vnkozlov That's the only case which was overlooked in JDK-8062280. All other cases aren't exercised for MH intrinsic methods and the asserts are there to verify that. If they start to fail, it'll signal that there may be a missing optimization opportunity. > should we add an assert for check_access before the resolve_invoke in Compile::optimize_inlining? @liach good question, it makes sense to separately take a closer look at this particular case. My first impression is `check_access` should be passed into ` resolve_invoke` rather than asserting `check_access == true` before `resolve_invoke`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27908#discussion_r2449730433 From dlong at openjdk.org Tue Oct 21 23:22:18 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 21 Oct 2025 23:22:18 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v4] In-Reply-To: References: Message-ID: <4Cj8ndIna8Do2EfcPsFR85vpAsHGPKtwAvrgp2dTkxU=.e996f1ed-c3a7-45be-a723-190db1bbea73@github.com> On Sat, 18 Oct 2025 10:13:42 GMT, Francesco Andreuzzi wrote: >> I am tempted to say yes, for consistency, but it probably won't make much of a difference either way. But now I am wondering, if these cold native wrappers continue to be immortal, then do they really need to give them nmethod entry barriers? Removing the barrier could remove some overhead. Whatever direction we decide to go, it would be good to add a comment here explaining the decision and/or trade-offs. > > Is it actually possible to remove entry barriers for _any_ garbage collectable nmethod? How can we know an nmethod is not used anymore, even when it is made not entrant? `is_cold()` bails out when an nmethod does not support entry barriers: > > // On platforms that don't support nmethod entry barriers, we can't > // trust the temporal aspect of the gc epochs. So we can't detect > // cold nmethods on such platforms. > > So, the decision of removing entry barriers for native nmethods would make the memory leak I'm trying to fix here effectively unfixable? Let me know if I'm missing something. If we mark them as not-entrant, then the is_not_entrant() check below will still catch them, right? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27742#discussion_r2449951151

From iveresov at openjdk.org Wed Oct 22 01:31:02 2025
From: iveresov at openjdk.org (Igor Veresov)
Date: Wed, 22 Oct 2025 01:31:02 GMT
Subject: RFR: 8368321: Rethink compilation delay strategy for lukewarm methods
Message-ID: 

In the current implementation we delay profiling of lukewarm methods (those that were never compiled by C2 during training) by increasing the 2->3 threshold by a factor. That may shift profiling of those too much into the future if a large factor is used. If we use a small factor, however, profiling may happen within the training run window, so to speak. The solution I came up with is to delay profiling until we reach the number of invocations of a method equal to the number we had in the training run. After that we use the normal policy.

Here is an example. I trained our JavacBenchApp for 5 iterations (which is artificially low and therefore many methods would be classified as lukewarm). Then I ran it for 200 iterations with AOT replay.

old-vs-new

While initially the performance is similar it quickly diverges. With the new approach we move to standard handling of lukewarm methods after 5 iterations and they get compiled with C2. With the old approach we don't.

-------------
Commit messages:
 - Delay lukewarmp methods based on training data

Changes: https://git.openjdk.org/jdk/pull/27926/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27926&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8368321
Stats: 69 lines in 5 files changed: 33 ins; 15 del; 21 mod
Patch: https://git.openjdk.org/jdk/pull/27926.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/27926/head:pull/27926
PR: https://git.openjdk.org/jdk/pull/27926

From iveresov at openjdk.org Wed Oct 22 01:36:39 2025
From: iveresov at openjdk.org (Igor Veresov)
Date: Wed, 22 Oct 2025 01:36:39 GMT
Subject: RFR: 8368321: Rethink compilation delay strategy for lukewarm methods [v2]
In-Reply-To: 
References: 
Message-ID: 

> In the current implementation we delay profiling of lukewarm methods (those that were never compiled by C2 during training) by increasing the 2->3 threshold by a factor. That may shift profiling of those too much into the future if a large factor is used. If we use a small factor, however, profiling may happen within the training run window, so to speak. The solution I came up with is to delay profiling until we reach the number of invocations of a method equal to the number we had in the training run. After that we use the normal policy.
> 
> Here is an example. I trained our JavacBenchApp for 5 iterations (which is artificially low and therefore many methods would be classified as lukewarm). Then I ran it for 200 iterations with AOT replay.
> 
> old-vs-new
> 
> While initially the performance is similar it quickly diverges. With the new approach we move to standard handling of lukewarm methods after 5 iterations and they get compiled with C2. With the old approach we don't.
Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Fix zero build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27926/files - new: https://git.openjdk.org/jdk/pull/27926/files/e5d944af..0ecea581 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27926&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27926&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27926/head:pull/27926 PR: https://git.openjdk.org/jdk/pull/27926 From dholmes at openjdk.org Wed Oct 22 02:00:02 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Oct 2025 02:00:02 GMT Subject: RFR: 8351194: Clean up Hotspot SA after 32-bit x86 removal [v2] In-Reply-To: References: <6YqJmhccFDBabVIQpN3nNt_Cpa1zNQg3rRvtiTCEJmo=.c3653095-8ae3-4f0d-8c3b-50d4109b4b69@github.com> Message-ID: <0kOVT0jKO7wshPvx-AbDL0MGtSaCUSGJi6WIHgqrGdM=.c3be17fe-6614-44b4-b185-1f5a2715187b@github.com> On Tue, 21 Oct 2025 14:51:41 GMT, Kerem Kat wrote: > For the total annihilation of the amd64 naming, I have cut an issue at [JDK-8370339](https://bugs.openjdk.org/browse/JDK-8370339). I meant this for the SA code, not the JDK in its entirety. For historical reasons we still define os.arch as "amd64" on Linux and Windows. We need to fix tests that are using the wrong `@requires` values. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27844#discussion_r2450181801 From xgong at openjdk.org Wed Oct 22 02:11:06 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 22 Oct 2025 02:11:06 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 12:19:25 GMT, Emanuel Peter wrote: >> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. >> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. 
>> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. >> >> Here is the performance data with `-XX:UseSVE=2`: >> >> Benchmark bits inputs Mode Unit Before After Gain >> MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 >> MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 >> MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 >> MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 >> MaskQueryOperations... > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 392: > >> 390: // Return true if vector mask operation with "opcode" requires the mask to be >> 391: // saved in a predicate register. >> 392: bool Matcher::vector_mask_requires_predicate(int opcode, const TypeVect* vt) { > > What would be the alternative, if it is not in a predicate register? It will be in a vector register like architectures that do not have predicate feature such as Arm NEON and X86 AVX1/2. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2450199857 From dholmes at openjdk.org Wed Oct 22 02:22:04 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Oct 2025 02:22:04 GMT Subject: RFR: 8351194: Clean up Hotspot SA after 32-bit x86 removal [v2] In-Reply-To: <0kOVT0jKO7wshPvx-AbDL0MGtSaCUSGJi6WIHgqrGdM=.c3be17fe-6614-44b4-b185-1f5a2715187b@github.com> References: <6YqJmhccFDBabVIQpN3nNt_Cpa1zNQg3rRvtiTCEJmo=.c3653095-8ae3-4f0d-8c3b-50d4109b4b69@github.com> <0kOVT0jKO7wshPvx-AbDL0MGtSaCUSGJi6WIHgqrGdM=.c3be17fe-6614-44b4-b185-1f5a2715187b@github.com> Message-ID: On Wed, 22 Oct 2025 01:57:44 GMT, David Holmes wrote: >> Thanks for the reviews. >> >> I have updated the html to read "requires hsdis". >> >> Regarding checking for `amd64` vs. `x86_64`, I found two cases where one of `x86_64` and `amd64` is checked but not the other: >> >> >> test/hotspot/jtreg/compiler/c2/irTests/RotateLeftNodeLongIdealizationTests.java >> 34: * @requires os.arch == "x86_64" | os.arch == "aarch64" | (os.arch == "riscv64" & vm.cpu.features ~= ".*zbb.*") >> >> test/hotspot/jtreg/compiler/c2/irTests/RotateLeftNodeIntIdealizationTests.java >> 34: * @requires os.arch == "x86_64" | os.arch == "aarch64" | (os.arch == "riscv64" & vm.cpu.features ~= ".*zbb.*") >> >> >> I checked the C++ sources for `RotateLeftNode::Value` and `RotateLeftNode::Ideal`, I couldn't find any platform-specific logic that would justify excluding `amd64`. I have updated both tests to include `amd64` in their `@requires`. >> >> Is there a specific `x86_64` vs. `amd64` check in C you would like to point out? >> >> For the total annihilation of the `amd64` naming, I have cut an issue at [JDK-8370339](https://bugs.openjdk.org/browse/JDK-8370339). > >> For the total annihilation of the amd64 naming, I have cut an issue at [JDK-8370339](https://bugs.openjdk.org/browse/JDK-8370339). > > I meant this for the SA code, not the JDK in its entirety. For historical reasons we still define os.arch as "amd64" on Linux and Windows. We need to fix tests that are using the wrong `@requires` values. I filed https://bugs.openjdk.org/browse/JDK-8370378 for 3 compiler tests. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27844#discussion_r2450222785 From dholmes at openjdk.org Wed Oct 22 02:22:11 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Oct 2025 02:22:11 GMT Subject: RFR: 8351194: Clean up Hotspot SA after 32-bit x86 removal [v4] In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 15:19:35 GMT, Kerem Kat wrote: >> Remove 32-bit x86 specific code from the HotSpot Serviceability Agent following the removal of 32-bit x86 support. >> >> - Removed x86-specific implementations and ifdef blocks. >> - Renamed files with X86 in the name when they are also used from AMD64, e.g. `X86Frame` ? `AMD64Frame`. >> - Cleaned up platform detection logic in `PlatformInfo`. >> - Updated documentation references. > > Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: > > requires hsdis test/hotspot/jtreg/compiler/c2/irTests/RotateLeftNodeIntIdealizationTests.java line 34: > 32: * @library /test/lib / > 33: * @run driver compiler.c2.irTests.RotateLeftNodeIntIdealizationTests > 34: * @requires os.arch == "x86_64" | os.arch == "amd64" | os.arch == "aarch64" | (os.arch == "riscv64" & vm.cpu.features ~= ".*zbb.*") Do not fix this as part of this PR as it is unrelated. test/hotspot/jtreg/compiler/c2/irTests/RotateLeftNodeLongIdealizationTests.java line 34: > 32: * @library /test/lib / > 33: * @run driver compiler.c2.irTests.RotateLeftNodeLongIdealizationTests > 34: * @requires os.arch == "x86_64" | os.arch == "amd64" | os.arch == "aarch64" | (os.arch == "riscv64" & vm.cpu.features ~= ".*zbb.*") Do not fix this as part of this PR as it is unrelated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27844#discussion_r2450225977 PR Review Comment: https://git.openjdk.org/jdk/pull/27844#discussion_r2450226326 From xgong at openjdk.org Wed Oct 22 04:15:26 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 22 Oct 2025 04:15:26 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: References: Message-ID: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> > The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. > > For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. > > These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. > > This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. > > It also modifies the Vector API jtreg tests for well testing. Here is the details: > > 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. 
Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: > > VectorMaskToLong (VectorLongToMask l) => l > > Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. > > 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". > > Performance shows significant improvement on NVIDIA's Grace CPU. > > Here is the performance data with `-XX:UseSVE=2`: > > Benchmark bits inputs Mode Unit Before After Gain > MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 > MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 > MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 > MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 > MaskQueryOperationsBenchmark.testToLongInt 128 2 thrpt ops/ms 101031... Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Move function comments to matcher.hpp - Merge 'jdk:master' into JDK-8367292 - 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27481/files - new: https://git.openjdk.org/jdk/pull/27481/files/25538369..d3e5b0fa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27481&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27481&range=00-01 Stats: 62499 lines in 1624 files changed: 37115 ins; 16448 del; 8936 mod Patch: https://git.openjdk.org/jdk/pull/27481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27481/head:pull/27481 PR: https://git.openjdk.org/jdk/pull/27481 From xgong at openjdk.org Wed Oct 22 04:15:27 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 22 Oct 2025 04:15:27 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE In-Reply-To: References: Message-ID: On Fri, 10 Oct 2025 03:26:52 GMT, Xiaohong Gong wrote: >> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. >> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. 
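For readers who want to see the fold spelled out: the identity quoted above corresponds roughly to the following C2 idealization (a minimal sketch; the exact HotSpot code may differ in details):

    // Sketch of the identity that removes a back-to-back mask conversion pair:
    //   VectorMaskToLong (VectorLongToMask l) => l
    // If the input of the to-long node is a from-long node, the original long
    // value can be returned directly, so neither conversion survives IGVN.
    Node* VectorMaskToLongNode::Identity(PhaseGVN* phase) {
      if (in(1)->Opcode() == Op_VectorLongToMask) {
        return in(1)->in(1);
      }
      return this;
    }

This is why a smoke test that feeds `fromLong` straight into `toLong` exercises neither operation once C2 has compiled it.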
Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. >> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. >> >> Here is the performance data with `-XX:UseSVE=2`: >> >> Benchmark bits inputs Mode Unit Before After Gain >> MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 >> MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 >> MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 >> MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 >> MaskQueryOperations... > > Hi, could anyone please help take a look at this PR? Thanks a lot in advance! > @XiaohongGong Actually, I just tried to submit via my standard script. It failed because of merging issues. Would you mind merging with master, so we are on the newest state? Thanks for looking at this PR @eme64 ! I'v rebased the PR to master and addressed your comments. Please let me know if any other issues. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3430424797 From iveresov at openjdk.org Wed Oct 22 04:38:03 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 22 Oct 2025 04:38:03 GMT Subject: RFR: 8368321: Rethink compilation delay strategy for lukewarm methods [v2] In-Reply-To: References: Message-ID: <4ZFW6cjoH_qfZzBDcmPuGj624sVjOGI6mG9h_JnDMv0=.3dc70488-4712-4c86-9003-e89dccce0ea7@github.com> On Wed, 22 Oct 2025 01:36:39 GMT, Igor Veresov wrote: >> In the current implementation we delay profiling of lukewarm methods (those that were never compiled by C2 during training) by increasing the 2->3 threshold by a factor. That may shift profiling of those too much into the future if a large factor is used, if we use a small factor, however, profiling may happen within the training run window so to speak. The solution I came up with it to delay profiling until we reach the number of invocations of a method equal to the number we had in the training run. After that we use the normal policy. >> >> Here is an example. I trained our JavacBenchApp for 5 iterations (which is artificially low and therefore many methods would be classified as lukewarm). Then I ran it for 200 iterations with AOT replay. >> >> old-vs-new >> >> While initially the performance is similar it quickly diverges. With the new approach we move to standard handling of lukewarm methods after 5 iterations and they get compiled with C2. With the old approach we don't. 
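A minimal sketch of the policy described above (the helper name and the training-data accessor are hypothetical, not the actual patch):

    // Hypothetical sketch: keep a lukewarm method at its current tier until it
    // has been invoked as often as it was during the training run; after that,
    // the normal 2->3 transition rules apply again.
    static bool delay_profiling(const methodHandle& mh, int training_invocations) {
      // training_invocations: invocation count recorded for this method in the
      // training run (assumed to be available from the AOT/training data)
      return mh->invocation_count() < training_invocations;
    }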
> > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Fix zero build Testing looks ok ------------- PR Comment: https://git.openjdk.org/jdk/pull/27926#issuecomment-3430461715 From epeter at openjdk.org Wed Oct 22 06:30:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 22 Oct 2025 06:30:09 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 04:11:30 GMT, Xiaohong Gong wrote: >> Hi, could anyone please help take a look at this PR? Thanks a lot in advance! > >> @XiaohongGong Actually, I just tried to submit via my standard script. It failed because of merging issues. Would you mind merging with master, so we are on the newest state? > > Thanks for looking at this PR @eme64 ! I'v rebased the PR to master and addressed your comments. Please let me know if any other issues. @XiaohongGong Thanks for merging, running testing now :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3430677411 From epeter at openjdk.org Wed Oct 22 06:33:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 22 Oct 2025 06:33:54 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v9] In-Reply-To: References: Message-ID: > When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. > > Here, I'm doing the following: > - Add more documentation, and improve it in other cases. > - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. > - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` > - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` > - Made some methods private, and added some additional asserts. > > I'd be more than happy for even better names, and suggestions how to improve the documentation further :) > > Related issues: > https://github.com/openjdk/jdk/pull/27889 > https://github.com/openjdk/jdk/pull/15720 Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8370220-get-ctrl-documentation' of https://github.com/eme64/jdk into JDK-8370220-get-ctrl-documentation - fix shenandoah replace for phis ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27892/files - new: https://git.openjdk.org/jdk/pull/27892/files/ac057395..44e808bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27892/head:pull/27892 PR: https://git.openjdk.org/jdk/pull/27892 From epeter at openjdk.org Wed Oct 22 06:49:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 22 Oct 2025 06:49:05 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v9] In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 06:33:54 GMT, Emanuel Peter wrote: >> When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. 
>> >> Here, I'm doing the following: >> - Add more documentation, and improve it in other cases. >> - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. >> - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` >> - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` >> - Made some methods private, and added some additional asserts. >> >> I'd be more than happy for even better names, and suggestions how to improve the documentation further :) >> >> Related issues: >> https://github.com/openjdk/jdk/pull/27889 >> https://github.com/openjdk/jdk/pull/15720 > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8370220-get-ctrl-documentation' of https://github.com/eme64/jdk into JDK-8370220-get-ctrl-documentation > - fix shenandoah replace for phis src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp line 1780: > 1778: fix_memory_uses(u, n, n, c); > 1779: } else if (_phase->C->get_alias_index(u->adr_type()) == _alias) { > 1780: _phase->igvn().replace_node(u, n); As far as I can see, the `lazy_replace` only did `igvn.replace_node` for non-ctrl nodes anyway. Since we are dealing with `PhiNode`s here, we might as well only use `igvn.replace_node`. I discovered this, because it hit my `!old_node->is_CFG()` check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2450605644 From mchevalier at openjdk.org Wed Oct 22 07:39:20 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 22 Oct 2025 07:39:20 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 17:32:08 GMT, Vladimir Kozlov wrote: >> Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. >> >> There is one detail, we used to have >> >> void restore_major_progress(int progress) { _major_progress += progress; } >> >> >> It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. >> >> It has a weird semantics: >> >> Progress before | Progress after verification | Progress after restore >> ----------------|-----------------------------|----------------------- >> 0 | 0 | 0 >> 1 | 0 | 1 >> 0 | 1 | 1 >> 1 | 1 | 2 >> >> It is rather a or than a restore, and a proper boolean version of that would be >> >> void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } >> >> but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural >> >> void set_major_progress(bool progress) { _major_progress = progress; } >> >> that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. >> >> Thanks, >> Marc > > Or you can keep temporary (just for testing this PR and remove it before integration) original logic in debug VM to compare result of `major_progress()`. @vnkozlov I fear I don't understand what you're suggesting. 
I've tried to add in my `set_major_progress(bool)` an assert to check we are not in the 3rd case, the one where the assignment-semantics and the OR-semantics mismatch (that is, with the `progress` parameter (old progress) unset and the current `_major_progress` set). And indeed the assert does not fire in tier1-6 + some other internal testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27912#issuecomment-3430850080 From mchevalier at openjdk.org Wed Oct 22 07:46:35 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 22 Oct 2025 07:46:35 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v2] In-Reply-To: References: Message-ID: > Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. > > There is one detail, we used to have > > void restore_major_progress(int progress) { _major_progress += progress; } > > > It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. > > It has weird semantics: > > Progress before | Progress after verification | Progress after restore > ----------------|-----------------------------|----------------------- > 0 | 0 | 0 > 1 | 0 | 1 > 0 | 1 | 1 > 1 | 1 | 2 > > It is rather an OR than a restore, and a proper boolean version of that would be > > void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } > > but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural > > void set_major_progress(bool progress) { _major_progress = progress; } > > that actually restores what we saved. Both pass (tier1-6 + some internal tests). Thus, I preferred the simpler semantics. > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Adapt the comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27912/files - new: https://git.openjdk.org/jdk/pull/27912/files/324e4312..96a1bc7c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27912&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27912&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27912.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27912/head:pull/27912 PR: https://git.openjdk.org/jdk/pull/27912 From mchevalier at openjdk.org Wed Oct 22 07:46:36 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 22 Oct 2025 07:46:36 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v2] In-Reply-To: References: Message-ID: <4sEEV2mIkMt9Wslr88hgCJ5Xc1A3JJ2OoPgAq9Z2K7g=.fc11a041-7274-46c1-a14c-28196c51c20b@github.com> On Tue, 21 Oct 2025 17:29:38 GMT, Emanuel Peter wrote: >> I don't think it's the right place for this kind of comment. It's quite hidden, far from where it's actually useful to know we need to set or check that. I'd say it should rather be on `PhaseIdealLoop` for instance, or `PhaseIdealLoop::optimize`, something like that, as a part of a more global overview of how things work. > There should just be some documentation around the `major_progress` family of field/methods.
Or at least link from there to where the documentation resides ;) Probably, but that is out of the scope of this issue. Moreover, I don't know enough of loop optimization to write a helpful and correct comment. It feels like it should just be another RFE done by someone who can do it. The change I'm doing here doesn't actually need to understand how the whole thing works. Also > If we ever set the flag, we don't continue with more loop-opts, but spin back to IGVN, clean the graph, and maybe come back to a new loop-opts round. I don't think it's true. I have the same kind of intuition, but in details, I don't think it holds. We can unroll after peeling (which sets major progress) in the same loop optimization round. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27912#discussion_r2450730268 From epeter at openjdk.org Wed Oct 22 07:59:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 22 Oct 2025 07:59:33 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v9] In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 06:46:21 GMT, Emanuel Peter wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8370220-get-ctrl-documentation' of https://github.com/eme64/jdk into JDK-8370220-get-ctrl-documentation >> - fix shenandoah replace for phis > > src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp line 1780: > >> 1778: fix_memory_uses(u, n, n, c); >> 1779: } else if (_phase->C->get_alias_index(u->adr_type()) == _alias) { >> 1780: _phase->igvn().replace_node(u, n); > > As far as I can see, the `lazy_replace` only did `igvn.replace_node` for non-ctrl nodes anyway. Since we are dealing with `PhiNode`s here, we might as well only use `igvn.replace_node`. > > I discovered this, because it hit my `!old_node->is_CFG()` check. @rwestrel Do you have an opinion on this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2450776790 From thartmann at openjdk.org Wed Oct 22 08:01:22 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 22 Oct 2025 08:01:22 GMT Subject: RFR: 8370378: Some compiler tests inadvertently exclude particular platforms Message-ID: <5Z5QuyoHOd7NpXC1RxeH67JDR6kXfCU7I95obEArrSE=.41cf60aa-08a0-422b-948d-9f5391802e5e@github.com> As @dholmes-ora described in JBS: For historical reasons `os.arch` is `amd64` on x86_64 Linux and Windows, but `x86_64` on macOS. I fixed the tests accordingly. 
Thanks, Tobias ------------- Commit messages: - JDK-8370378 Changes: https://git.openjdk.org/jdk/pull/27931/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27931&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370378 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/27931.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27931/head:pull/27931 PR: https://git.openjdk.org/jdk/pull/27931 From mchevalier at openjdk.org Wed Oct 22 08:12:08 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 22 Oct 2025 08:12:08 GMT Subject: RFR: 8370378: Some compiler tests inadvertently exclude particular platforms In-Reply-To: <5Z5QuyoHOd7NpXC1RxeH67JDR6kXfCU7I95obEArrSE=.41cf60aa-08a0-422b-948d-9f5391802e5e@github.com> References: <5Z5QuyoHOd7NpXC1RxeH67JDR6kXfCU7I95obEArrSE=.41cf60aa-08a0-422b-948d-9f5391802e5e@github.com> Message-ID: <-1PZVIgV2EBIKRTlBwTU3cYM-bjtmDfYBJE1cMzoDCg=.edd5c815-b3ce-4c60-aaf4-7bac7b8b1e17@github.com> On Wed, 22 Oct 2025 07:52:13 GMT, Tobias Hartmann wrote: > As @dholmes-ora described in JBS: For historical reasons `os.arch` is `amd64` on x86_64 Linux and Windows, but `x86_64` on macOS. I fixed the tests accordingly. > > Thanks, > Tobias Interesting. And I guess it's not possible to change how os.arch behaves, or have another property more consistent? I found the 3 same files to fix with `grep "requires.*x86_64" . -rIw | grep -v amd64` (and exchanging x86_64 and amd64) in `test/hotspot/jtreg/compiler/`. Looks good. ------------- Marked as reviewed by mchevalier (Committer). PR Review: https://git.openjdk.org/jdk/pull/27931#pullrequestreview-3364197485 From mli at openjdk.org Wed Oct 22 08:15:25 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 22 Oct 2025 08:15:25 GMT Subject: Integrated: 8370225: RISC-V: cleanup verify_xxx in interp_masm_riscv.hpp In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 09:36:49 GMT, Hamlin Li wrote: > Hi, > Can you help to review this trivial patch? > `verify_xxx` verify_xxx in interp_masm_riscv.hpp should be consistent. > > Thanks! This pull request has now been integrated. Changeset: 27c83c73 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/27c83c730d8b0f87bb51230c35e4fe261c9d2723 Stats: 9 lines in 2 files changed: 0 ins; 7 del; 2 mod 8370225: RISC-V: cleanup verify_xxx in interp_masm_riscv.hpp Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/27894 From chagedorn at openjdk.org Wed Oct 22 08:23:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 22 Oct 2025 08:23:13 GMT Subject: RFR: 8370378: Some compiler tests inadvertently exclude particular platforms In-Reply-To: <5Z5QuyoHOd7NpXC1RxeH67JDR6kXfCU7I95obEArrSE=.41cf60aa-08a0-422b-948d-9f5391802e5e@github.com> References: <5Z5QuyoHOd7NpXC1RxeH67JDR6kXfCU7I95obEArrSE=.41cf60aa-08a0-422b-948d-9f5391802e5e@github.com> Message-ID: On Wed, 22 Oct 2025 07:52:13 GMT, Tobias Hartmann wrote: > As @dholmes-ora described in JBS: For historical reasons `os.arch` is `amd64` on x86_64 Linux and Windows, but `x86_64` on macOS. I fixed the tests accordingly. > > Thanks, > Tobias Looks good and trivial. 
test/hotspot/jtreg/compiler/c2/TestBit.java line 36: > 34: * > 35: * @requires vm.flagless > 36: * @requires os.arch=="aarch64" | os.arch=="amd64" | os.arch=="x86_64" | os.arch == "ppc64" | os.arch == "ppc64le" | os.arch == "riscv64" Suggestion: * @requires os.arch == "aarch64" | os.arch == "amd64" | os.arch=="x86_64" | os.arch == "ppc64" | os.arch == "ppc64le" | os.arch == "riscv64" test/hotspot/jtreg/compiler/c2/irTests/RotateLeftNodeIntIdealizationTests.java line 34: > 32: * @library /test/lib / > 33: * @run driver compiler.c2.irTests.RotateLeftNodeIntIdealizationTests > 34: * @requires os.arch=="amd64" | os.arch=="x86_64" | os.arch == "aarch64" | (os.arch == "riscv64" & vm.cpu.features ~= ".*zbb.*") For consistency: Suggestion: * @requires os.arch == "amd64" | os.arch == "x86_64" | os.arch == "aarch64" | (os.arch == "riscv64" & vm.cpu.features ~= ".*zbb.*") test/hotspot/jtreg/compiler/c2/irTests/RotateLeftNodeLongIdealizationTests.java line 34: > 32: * @library /test/lib / > 33: * @run driver compiler.c2.irTests.RotateLeftNodeLongIdealizationTests > 34: * @requires os.arch=="amd64" | os.arch=="x86_64" | os.arch == "aarch64" | (os.arch == "riscv64" & vm.cpu.features ~= ".*zbb.*") Suggestion: * @requires os.arch == "amd64" | os.arch == "x86_64" | os.arch == "aarch64" | (os.arch == "riscv64" & vm.cpu.features ~= ".*zbb.*") ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27931#pullrequestreview-3364238531 PR Review Comment: https://git.openjdk.org/jdk/pull/27931#discussion_r2450862970 PR Review Comment: https://git.openjdk.org/jdk/pull/27931#discussion_r2450861390 PR Review Comment: https://git.openjdk.org/jdk/pull/27931#discussion_r2450861938 From thartmann at openjdk.org Wed Oct 22 08:31:05 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 22 Oct 2025 08:31:05 GMT Subject: RFR: 8370378: Some compiler tests inadvertently exclude particular platforms [v2] In-Reply-To: <5Z5QuyoHOd7NpXC1RxeH67JDR6kXfCU7I95obEArrSE=.41cf60aa-08a0-422b-948d-9f5391802e5e@github.com> References: <5Z5QuyoHOd7NpXC1RxeH67JDR6kXfCU7I95obEArrSE=.41cf60aa-08a0-422b-948d-9f5391802e5e@github.com> Message-ID: > As @dholmes-ora described in JBS: For historical reasons `os.arch` is `amd64` on x86_64 Linux and Windows, but `x86_64` on macOS. I fixed the tests accordingly. 
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with four additional commits since the last revision: - Update TestBit.java - Update test/hotspot/jtreg/compiler/c2/TestBit.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/c2/irTests/RotateLeftNodeLongIdealizationTests.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/c2/irTests/RotateLeftNodeIntIdealizationTests.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27931/files - new: https://git.openjdk.org/jdk/pull/27931/files/fc92ee6e..39fcc6bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27931&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27931&range=00-01 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27931.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27931/head:pull/27931 PR: https://git.openjdk.org/jdk/pull/27931 From thartmann at openjdk.org Wed Oct 22 08:31:06 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 22 Oct 2025 08:31:06 GMT Subject: RFR: 8370378: Some compiler tests inadvertently exclude particular platforms In-Reply-To: <5Z5QuyoHOd7NpXC1RxeH67JDR6kXfCU7I95obEArrSE=.41cf60aa-08a0-422b-948d-9f5391802e5e@github.com> References: <5Z5QuyoHOd7NpXC1RxeH67JDR6kXfCU7I95obEArrSE=.41cf60aa-08a0-422b-948d-9f5391802e5e@github.com> Message-ID: On Wed, 22 Oct 2025 07:52:13 GMT, Tobias Hartmann wrote: > As @dholmes-ora described in JBS: For historical reasons `os.arch` is `amd64` on x86_64 Linux and Windows, but `x86_64` on macOS. I fixed the tests accordingly. > > Thanks, > Tobias Thanks for the reviews! I fixed the consistency issues. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27931#issuecomment-3431065831 From chagedorn at openjdk.org Wed Oct 22 08:32:43 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 22 Oct 2025 08:32:43 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v2] In-Reply-To: <4sEEV2mIkMt9Wslr88hgCJ5Xc1A3JJ2OoPgAq9Z2K7g=.fc11a041-7274-46c1-a14c-28196c51c20b@github.com> References: <4sEEV2mIkMt9Wslr88hgCJ5Xc1A3JJ2OoPgAq9Z2K7g=.fc11a041-7274-46c1-a14c-28196c51c20b@github.com> Message-ID: On Wed, 22 Oct 2025 07:42:51 GMT, Marc Chevalier wrote: >> There should just be some documentation around the `major_progress` family of field/methods. Or at least link from there to where the documentation resides ;) > > Probably, but that is out of the scope of this issue. Moreover, I don't know enough of loop optimization to write a helpful and correct comment. It feels like it should just be another RFE done by someone who can do it. The change I'm doing here doesn't actually need to understand how the whole thing works. > > Also >> If we ever set the flag, we don't continue with more loop-opts, but spin back to IGVN, clean the graph, and maybe come back to a new loop-opts round. > > I don't think it's true. I have the same kind of intuition, but in details, I don't think it holds. We can unroll after peeling (which sets major progress) in the same loop optimization round. > My understanding is that the flag is set if the loop-opts data-structures are invalid (or at least there is no guarantee that they are valid). So we need to re-build the loop tree. 
Additionally, there are also cases where we think that we could apply more loop-opts even though the structures are still valid: https://github.com/openjdk/jdk/blob/f475eb8ee7c9a3e360b2f1210ed71b629243cd2a/src/hotspot/share/opto/loopnode.cpp#L5113-L5119 One more reason that we should add some documentation about how we use `major_progress`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27912#discussion_r2450905429 From chagedorn at openjdk.org Wed Oct 22 08:33:40 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 22 Oct 2025 08:33:40 GMT Subject: RFR: 8370378: Some compiler tests inadvertently exclude particular platforms [v2] In-Reply-To: References: <5Z5QuyoHOd7NpXC1RxeH67JDR6kXfCU7I95obEArrSE=.41cf60aa-08a0-422b-948d-9f5391802e5e@github.com> Message-ID: On Wed, 22 Oct 2025 08:31:05 GMT, Tobias Hartmann wrote: >> As @dholmes-ora described in JBS: For historical reasons `os.arch` is `amd64` on x86_64 Linux and Windows, but `x86_64` on macOS. I fixed the tests accordingly. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with four additional commits since the last revision: > > - Update TestBit.java > - Update test/hotspot/jtreg/compiler/c2/TestBit.java > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/RotateLeftNodeLongIdealizationTests.java > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/RotateLeftNodeIntIdealizationTests.java > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27931#pullrequestreview-3364300907 From thartmann at openjdk.org Wed Oct 22 08:37:46 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 22 Oct 2025 08:37:46 GMT Subject: RFR: 8370378: Some compiler tests inadvertently exclude particular platforms [v2] In-Reply-To: References: <5Z5QuyoHOd7NpXC1RxeH67JDR6kXfCU7I95obEArrSE=.41cf60aa-08a0-422b-948d-9f5391802e5e@github.com> Message-ID: On Wed, 22 Oct 2025 08:31:05 GMT, Tobias Hartmann wrote: >> As @dholmes-ora described in JBS: For historical reasons `os.arch` is `amd64` on x86_64 Linux and Windows, but `x86_64` on macOS. I fixed the tests accordingly. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with four additional commits since the last revision: > > - Update TestBit.java > - Update test/hotspot/jtreg/compiler/c2/TestBit.java > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/RotateLeftNodeLongIdealizationTests.java > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/RotateLeftNodeIntIdealizationTests.java > > Co-authored-by: Christian Hagedorn Thanks again! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27931#issuecomment-3431097410 From mdoerr at openjdk.org Wed Oct 22 08:37:57 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 22 Oct 2025 08:37:57 GMT Subject: RFR: 8369946: Bytecode rewriting causes Java heap corruption on PPC [v2] In-Reply-To: References: Message-ID: <4hv2bTxk1-MZS_LAZJTZfqAkrU7muwk7gf5WTdrBBCs=.95c22b84-afb5-4a6f-80be-9ccd1e8f1858@github.com> On Tue, 21 Oct 2025 08:57:41 GMT, Martin Doerr wrote: >> Like the aarch64 fix (https://github.com/openjdk/jdk/pull/27748). >> PPC64 has additional requirements: >> - It implements `fast_invokevfinal` which uses `ResolvedMethodEntry`. 
>> - Speculative loads need to get prevented by memory barrier instructions (even on control dependent paths). >> >> I've refactored `load_field_entry` and `load_method_entry` into a common function and added support for rewritten "fast" Bytecodes. I'm using `isync` instructions because we already have a control dependency (via Bytecode dispatch). >> >> The `isync` instruction is relatively cheap in comparison to other memory barriers, but still introduces some performance loss. SPEC jvm98 with -Xint shows about 5% regression in `compress` sub-benchmark. The other sub-benchmarks are not significantly impacted. However, switching off `RewriteBytecodes` would cause a much higher performance loss. >> >> Note: I had also ported the `verify_field_offset` check and used it in the fastdebug and product build for testing, but couldn't catch any issue. Not included in this PR. I'm not planning to contribute it. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Improve comment. > > Co-authored-by: Richard Reingruber Thanks for the reviews! I'll start working on backports. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27867#issuecomment-3431092379 From mdoerr at openjdk.org Wed Oct 22 08:37:58 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 22 Oct 2025 08:37:58 GMT Subject: Integrated: 8369946: Bytecode rewriting causes Java heap corruption on PPC In-Reply-To: References: Message-ID: <2gpEG1grUkkdJXoqFY14T6GHYLqAFKFkokfO-6F-EOg=.a6e5984f-c5b9-49f7-8c88-63c2cee9991b@github.com> On Fri, 17 Oct 2025 12:13:52 GMT, Martin Doerr wrote: > Like the aarch64 fix (https://github.com/openjdk/jdk/pull/27748). > PPC64 has additional requirements: > - It implements `fast_invokevfinal` which uses `ResolvedMethodEntry`. > - Speculative loads need to get prevented by memory barrier instructions (even on control dependent paths). > > I've refactored `load_field_entry` and `load_method_entry` into a common function and added support for rewritten "fast" Bytecodes. I'm using `isync` instructions because we already have a control dependency (via Bytecode dispatch). > > The `isync` instruction is relatively cheap in comparison to other memory barriers, but still introduces some performance loss. SPEC jvm98 with -Xint shows about 5% regression in `compress` sub-benchmark. The other sub-benchmarks are not significantly impacted. However, switching off `RewriteBytecodes` would cause a much higher performance loss. > > Note: I had also ported the `verify_field_offset` check and used it in the fastdebug and product build for testing, but couldn't catch any issue. Not included in this PR. I'm not planning to contribute it. This pull request has now been integrated. 
Changeset: 6bf3581b Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/6bf3581bbacc2ed8f6411d23a5ab332376c53c87 Stats: 36 lines in 3 files changed: 12 ins; 4 del; 20 mod 8369946: Bytecode rewriting causes Java heap corruption on PPC Reviewed-by: rrich, dbriemann ------------- PR: https://git.openjdk.org/jdk/pull/27867 From mchevalier at openjdk.org Wed Oct 22 08:50:35 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 22 Oct 2025 08:50:35 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v2] In-Reply-To: References: <4sEEV2mIkMt9Wslr88hgCJ5Xc1A3JJ2OoPgAq9Z2K7g=.fc11a041-7274-46c1-a14c-28196c51c20b@github.com> Message-ID: On Wed, 22 Oct 2025 08:29:38 GMT, Christian Hagedorn wrote: >> Probably, but that is out of the scope of this issue. Moreover, I don't know enough of loop optimization to write a helpful and correct comment. It feels like it should just be another RFE done by someone who can do it. The change I'm doing here doesn't actually need to understand how the whole thing works. >> >> Also >>> If we ever set the flag, we don't continue with more loop-opts, but spin back to IGVN, clean the graph, and maybe come back to a new loop-opts round. >> >> I don't think it's true. I have the same kind of intuition, but in details, I don't think it holds. We can unroll after peeling (which sets major progress) in the same loop optimization round. > >> My understanding is that the flag is set if the loop-opts data-structures are invalid (or at least there is no guarantee that they are valid). So we need to re-build the loop tree. > > Additionally, there are also cases where we think that we could apply more loop-opts even though the structures are still valid: > https://github.com/openjdk/jdk/blob/f475eb8ee7c9a3e360b2f1210ed71b629243cd2a/src/hotspot/share/opto/loopnode.cpp#L5113-L5119 > > One more reason that we should add some documentation about how we use `major_progress`. I don't think anyone is saying we should not have such comments. I just think it's out of scope here and mainly I just don't know enough to write anything useful and correct. But if somebody more knowledgeable in this gives me a patch that adds such documentation, I can sneak it in this PR. Otherwise, it will be another time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27912#discussion_r2450978799 From aseoane at openjdk.org Wed Oct 22 09:02:34 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 22 Oct 2025 09:02:34 GMT Subject: RFR: 8367690: C2: Unneeded branch in reduce_phi In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 14:49:02 GMT, Anton Seoane Ampudia wrote: > This PR carries out a minor cleanup found in the Phi reduction code. > > The combination of the branch and assert is redundant as the assert will always trigger. We can either remove the `else if` branch or change the assert for a `ShouldNotReachHere`. > > I have gone for the former option here. > > **Testing:** passes tiers 1-3 Thanks all! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27849#issuecomment-3431208509 From duke at openjdk.org Wed Oct 22 09:02:35 2025 From: duke at openjdk.org (duke) Date: Wed, 22 Oct 2025 09:02:35 GMT Subject: RFR: 8367690: C2: Unneeded branch in reduce_phi In-Reply-To: References: Message-ID: <8tR6dDphi_waljWnGXS_faO9w9I2RCXdzgshck9kIXc=.bcae27e6-2b28-47f2-ad28-52f17b39054a@github.com> On Thu, 16 Oct 2025 14:49:02 GMT, Anton Seoane Ampudia wrote: > This PR carries out a minor cleanup found in the Phi reduction code. 
> > The combination of the branch and assert is redundant as the assert will always trigger. We can either remove the `else if` branch or change the assert for a `ShouldNotReachHere`. > > I have gone for the former option here. > > **Testing:** passes tiers 1-3 @anton-seoane Your change (at version c9624d03f79c7236226a4030bfc144c860dd60c4) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27849#issuecomment-3431212119 From aph at openjdk.org Wed Oct 22 09:11:45 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 22 Oct 2025 09:11:45 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v11] In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 10:05:21 GMT, Anjian Wen wrote: > * I have not found a suitable overflow check in riscv RVV, so I use pre check to avoid overflow, here we may discuss is there a more suitable way. Sure, OK. > * Why we should make the counter increment same time? Why is this necessary? Because we don't want to leak any information about the internal state of the cipher to an observer. You must assume that an observer can precisely measure execution time, power consumption, and so on. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2451088474 From aseoane at openjdk.org Wed Oct 22 09:11:59 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 22 Oct 2025 09:11:59 GMT Subject: Integrated: 8367690: C2: Unneeded branch in reduce_phi In-Reply-To: References: Message-ID: <9yp18lOU0ijJrhzsvKNXYRTMBzFY-hzuevUH3pYHtlw=.04472bf5-c160-48de-a857-a014b50f1e9d@github.com> On Thu, 16 Oct 2025 14:49:02 GMT, Anton Seoane Ampudia wrote: > This PR carries out a minor cleanup found in the Phi reduction code. > > The combination of the branch and assert is redundant as the assert will always trigger. We can either remove the `else if` branch or change the assert for a `ShouldNotReachHere`. > > I have gone for the former option here. > > **Testing:** passes tiers 1-3 This pull request has now been integrated. Changeset: bdfd5e84 Author: Anton Seoane Ampudia Committer: Roberto Casta?eda Lozano URL: https://git.openjdk.org/jdk/commit/bdfd5e843a7d3db50edf4375e50449b0ce528f8a Stats: 3 lines in 1 file changed: 1 ins; 2 del; 0 mod 8367690: C2: Unneeded branch in reduce_phi Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/27849 From chagedorn at openjdk.org Wed Oct 22 09:16:05 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 22 Oct 2025 09:16:05 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v5] In-Reply-To: References: Message-ID: <5EEpV0giAyitcB6fESy9zPvowHhjKuyYh3BKbvQ63-A=.cf4ec845-e0f0-46b8-bf04-d192fac13d8d@github.com> On Tue, 21 Oct 2025 07:46:31 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... 
} >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. >> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Check before Thanks for the update! Two more suggestions but then I think it looks good from my side. src/hotspot/share/opto/loopnode.cpp line 4771: > 4769: // See PhaseIdealLoop::do_unroll > 4770: // This property is desirable, but it maybe not hold after cloning a loop. > 4771: // In such a case, we bailout unrolling, and rely on IGVN to cleanup stuff. Suggestion: // In such a case, we bail out from unrolling, and rely on IGVN to cleanup stuff. src/hotspot/share/opto/loopnode.cpp line 4779: > 4777: if (!head->is_pre_loop() && !head->is_post_loop()) { > 4778: assert(opaque->outcnt() == 1 && opaque->in(1) == head->limit(), "IGVN should have cleaned that up!"); > 4779: } I think we can just check `is_main_loop()` here. 
`is_canonical_loop_entry()` will bail out if we see anything else than main or post loops: https://github.com/openjdk/jdk/blob/bdfd5e843a7d3db50edf4375e50449b0ce528f8a/src/hotspot/share/opto/loopnode.cpp#L6330-L6333 Suggestion: if (head->is_main_loop()) { assert(opaque->outcnt() == 1 && opaque->in(1) == head->limit(), "IGVN should have cleaned that up!"); } ------------- PR Review: https://git.openjdk.org/jdk/pull/27586#pullrequestreview-3364466165 PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2451038240 PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2451097797 From chagedorn at openjdk.org Wed Oct 22 09:16:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 22 Oct 2025 09:16:07 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v5] In-Reply-To: <5EEpV0giAyitcB6fESy9zPvowHhjKuyYh3BKbvQ63-A=.cf4ec845-e0f0-46b8-bf04-d192fac13d8d@github.com> References: <5EEpV0giAyitcB6fESy9zPvowHhjKuyYh3BKbvQ63-A=.cf4ec845-e0f0-46b8-bf04-d192fac13d8d@github.com> Message-ID: On Wed, 22 Oct 2025 09:10:40 GMT, Christian Hagedorn wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Check before > > src/hotspot/share/opto/loopnode.cpp line 4779: > >> 4777: if (!head->is_pre_loop() && !head->is_post_loop()) { >> 4778: assert(opaque->outcnt() == 1 && opaque->in(1) == head->limit(), "IGVN should have cleaned that up!"); >> 4779: } > > I think we can just check `is_main_loop()` here. `is_canonical_loop_entry()` will bail out if we see anything else than main or post loops: > > https://github.com/openjdk/jdk/blob/bdfd5e843a7d3db50edf4375e50449b0ce528f8a/src/hotspot/share/opto/loopnode.cpp#L6330-L6333 > > > Suggestion: > > if (head->is_main_loop()) { > assert(opaque->outcnt() == 1 && opaque->in(1) == head->limit(), "IGVN should have cleaned that up!"); > } Maybe you can even go a step further and also check that for the post loop, `outcnt()` is also 1 which should hold as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2451106340 From amitkumar at openjdk.org Wed Oct 22 09:57:32 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 22 Oct 2025 09:57:32 GMT Subject: RFR: 8370389: JavaFrameAnchor on s390 has unnecessary barriers Message-ID: No hardware barriers are necessary. All members are volatile and the profiler is run from a signal handler. ------------- Commit messages: - remove barriers Changes: https://git.openjdk.org/jdk/pull/27930/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27930&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370389 Stats: 19 lines in 1 file changed: 5 ins; 11 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27930.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27930/head:pull/27930 PR: https://git.openjdk.org/jdk/pull/27930 From lucy at openjdk.org Wed Oct 22 10:39:01 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 22 Oct 2025 10:39:01 GMT Subject: RFR: 8370389: JavaFrameAnchor on s390 has unnecessary barriers In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 07:21:45 GMT, Amit Kumar wrote: > No hardware barriers are necessary. All members are volatile and the profiler is run from a signal handler. LGTM. ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27930#pullrequestreview-3365112084 From chagedorn at openjdk.org Wed Oct 22 10:39:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 22 Oct 2025 10:39:06 GMT Subject: RFR: 8349835: C2: simplify IGV property printing [v2] In-Reply-To: References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: <5qtZxVebyVn6WML3Q4508dXPwxkw-CWhD_pE6UaNfF8=.76830409-b57d-410f-a30b-c7d01b62df7f@github.com> On Wed, 1 Oct 2025 12:28:38 GMT, Saranya Natarajan wrote: >> The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). >> >> ### Fix >> Implemented the suggested refactoring. >> >> ### Testing >> Github Actions, Tier 1-3 > > Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: > > - fixing test failure > - addressing review comments Thanks for the update, that looks much better! Some more small follow-up comments. src/hotspot/share/opto/idealGraphPrinter.cpp line 1114: > 1112: } > 1113: > 1114: void PrintProperties::print_node_properties(Node* node, Compile* C){ Suggestion: void PrintProperties::print_node_properties(Node* node, Compile* C) { src/hotspot/share/opto/idealGraphPrinter.cpp line 1180: > 1178: if (flag) { > 1179: _printer->print_prop(name, val); > 1180: } We should not use implicit conversion of ints, same below: Suggestion: if (flag != 0) { _printer->print_prop(name, IdealGraphPrinter::TRUE_VALUE); } } void PrintProperties::print_property(int flag, const char* name, const char* val) { if (flag != 0) { _printer->print_prop(name, val); } } void PrintProperties::print_property(int flag, const char* name, int val) { if (flag != 0) { _printer->print_prop(name, val); } src/hotspot/share/opto/idealGraphPrinter.hpp line 34: > 32: #include "utilities/xmlstream.hpp" > 33: > 34: Suggestion: src/hotspot/share/opto/idealGraphPrinter.hpp line 172: > 170: }; > 171: > 172: class PrintProperties Do you really need it in the header file? You could also just move it the the source file directly where we use the class. src/hotspot/share/opto/idealGraphPrinter.hpp line 175: > 173: { > 174: private: > 175: IdealGraphPrinter *_printer; For new code, we should put the the `*` at the type. 
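Taken together, the comments above suggest roughly the following shape for the helper (an illustrative sketch, defined in idealGraphPrinter.cpp rather than the header; public entry points such as print_node_properties are omitted):

    // Sketch only: '*' placed at the type, explicit flag != 0 checks instead of
    // implicit int-to-bool conversion, one overload per value kind.
    class PrintProperties {
     private:
      IdealGraphPrinter* _printer;

      void print_property(int flag, const char* name) {
        if (flag != 0) {
          _printer->print_prop(name, IdealGraphPrinter::TRUE_VALUE);
        }
      }

      void print_property(int flag, const char* name, const char* val) {
        if (flag != 0) {
          _printer->print_prop(name, val);
        }
      }

      void print_property(int flag, const char* name, int val) {
        if (flag != 0) {
          _printer->print_prop(name, val);
        }
      }
    };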
------------- PR Review: https://git.openjdk.org/jdk/pull/26902#pullrequestreview-3364567310 PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2451131733 PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2451184881 PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2451116971 PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2451120931 PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2451558310 From thartmann at openjdk.org Wed Oct 22 10:39:10 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 22 Oct 2025 10:39:10 GMT Subject: Integrated: 8370378: Some compiler tests inadvertently exclude particular platforms In-Reply-To: <5Z5QuyoHOd7NpXC1RxeH67JDR6kXfCU7I95obEArrSE=.41cf60aa-08a0-422b-948d-9f5391802e5e@github.com> References: <5Z5QuyoHOd7NpXC1RxeH67JDR6kXfCU7I95obEArrSE=.41cf60aa-08a0-422b-948d-9f5391802e5e@github.com> Message-ID: <5lJDxPehEV5vwT4koj0RA4qOxaijeqgGXmh4a9vPNDI=.01f02c3b-9441-4274-9702-398203b4f2b1@github.com> On Wed, 22 Oct 2025 07:52:13 GMT, Tobias Hartmann wrote: > As @dholmes-ora described in JBS: For historical reasons `os.arch` is `amd64` on x86_64 Linux and Windows, but `x86_64` on macOS. I fixed the tests accordingly. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 60104575 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/60104575b221eb3d78a4d56839d55953d4036c21 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod 8370378: Some compiler tests inadvertently exclude particular platforms Reviewed-by: chagedorn, mchevalier ------------- PR: https://git.openjdk.org/jdk/pull/27931 From chagedorn at openjdk.org Wed Oct 22 12:07:50 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 22 Oct 2025 12:07:50 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v2] In-Reply-To: References: <4sEEV2mIkMt9Wslr88hgCJ5Xc1A3JJ2OoPgAq9Z2K7g=.fc11a041-7274-46c1-a14c-28196c51c20b@github.com> Message-ID: On Wed, 22 Oct 2025 08:48:14 GMT, Marc Chevalier wrote: >>> My understanding is that the flag is set if the loop-opts data-structures are invalid (or at least there is no guarantee that they are valid). So we need to re-build the loop tree. >> >> Additionally, there are also cases where we think that we could apply more loop-opts even though the structures are still valid: >> https://github.com/openjdk/jdk/blob/f475eb8ee7c9a3e360b2f1210ed71b629243cd2a/src/hotspot/share/opto/loopnode.cpp#L5113-L5119 >> >> One more reason that we should add some documentation about how we use `major_progress`. > > I don't think anyone is saying we should not have such comments. I just think it's out of scope here and mainly I just don't know enough to write anything useful and correct. But if somebody more knowledgeable in this gives me a patch that adds such documentation, I can sneak it in this PR. Otherwise, it will be another time. Absolutely, it should not hold up this PR. It was more meant to be an endorsement that we should definitely add some documentation some point. 
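For reference, the difference between the two variants discussed in this thread only shows up in the third row of the table from the PR description. A minimal sketch of the assignment variant together with the assert mentioned earlier (illustrative only, not the actual patch):

    // The only input where assignment and OR semantics disagree is
    // progress == false while _major_progress is currently true.
    void set_major_progress(bool progress) {
      assert(progress || !_major_progress, "verification unexpectedly set major progress");
      _major_progress = progress;
    }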
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27912#discussion_r2451859576 From mdoerr at openjdk.org Wed Oct 22 14:12:22 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 22 Oct 2025 14:12:22 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods [v4] In-Reply-To: References: Message-ID: <9aDXEH6BtLRbcP5JPESx_swLFymWz_0ZyRk8k-RAXsQ=.5864f3e6-0788-4335-8a7f-5524196abd9d@github.com> On Tue, 7 Oct 2025 17:45:38 GMT, Martin Doerr wrote: >> We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it. >> >> We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses. >> >> I've tested this proposal by the following code on x86_64: >> >> diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp >> index a6b4efbe4f2..d715e69c850 100644 >> --- a/src/hotspot/cpu/x86/interp_masm_x86.cpp >> +++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp >> @@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() { >> void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) { >> prepare_to_jump_from_interpreted(); >> >> + if (UseNewCode) { >> + Label ok; >> + movptr(temp, Address(method, Method::from_interpreted_offset())); >> + cmpptr(temp, Address(method, Method::interpreter_entry_offset())); >> + je(ok); >> + movptr(rax, Address(method, Method::from_compiled_offset())); >> + movptr(rbx, rax); >> + addptr(rbx, 128); >> + hlt(); >> + bind(ok); >> + } >> + >> if (JvmtiExport::can_post_interpreter_events()) { >> Label run_compiled_code; >> // JVMTI events, such as single-stepping, are implemented partly by avoiding running >> >> >> The output is (requires hsdis library, otherwise we only get the hex dump): >> >> RAX=0x00007fa3e072c100 is at entry_point+0 in (nmethod*)0x00007fa3e072c008 >> Compiled method (c1) 2521 1 3 java.lang.Byte::toUnsignedInt (6 bytes) >> total in heap [0x00007fa3e072c008,0x00007fa3e072c1f8] = 496 >> main code [0x00007fa3e072c100,0x00007fa3e072c1b8] = 184 >> stub code [0x00007fa3e072c1b8,0x00007fa3e072c1f8] = 64 >> mutable data [0x00007fa37c0160a0,0x00007fa37c0160d0] = 48 >> relocation [0x00007fa37c0160a0,0x00007fa37c0160c8] = 40 >> metadata [0x00007fa37c0160c8,0x00007fa37c0160d0] = 8 >> immutable data [0x00007fa37c015cc0,0x00007fa37c015d24] = 100 >> dependencies [0x00007fa37c015cc0,0x00007fa37c015... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Use frame_complete_offset for better start address computation. Improve comments. Updated example output above. We could potentially dump more code if `ExtensiveErrorReports` is enabled, but I'd like that topic open for future enhancements and go ahead with this initial proposal. It has already proven helpful to analyze a bug since we are testing it together with other changes. 
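The anchor-based approach described in this thread can be illustrated independently of HotSpot: treat the relocation targets as a sorted set of known instruction starts and begin decoding at the closest one at or before the point of interest. The Java sketch below is illustrative only; the offsets and the window size are made-up values, not anything taken from the real nmethod layout.

    import java.util.TreeSet;

    // Sketch of the anchor idea: on a variable-length ISA we cannot decode from an
    // arbitrary byte, so we start at the nearest known instruction start (here, a
    // hypothetical list of relocation offsets) at or before the requested window.
    public class CodeSnippetAnchorSketch {
        public static void main(String[] args) {
            TreeSet<Integer> instructionStarts = new TreeSet<>();
            for (int offset : new int[]{0, 16, 37, 64, 128, 184}) { // made-up relocation offsets
                instructionStarts.add(offset);
            }
            int target = 100;  // offset the faulting register points at
            int window = 32;   // how many bytes before the target we would like to show
            Integer start = instructionStarts.floor(target - window);
            if (start == null) {
                start = instructionStarts.first();
            }
            System.out.println("disassemble from offset " + start + " to about " + (target + window));
        }
    }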
------------- PR Comment: https://git.openjdk.org/jdk/pull/27530#issuecomment-3432547388 From fandreuzzi at openjdk.org Wed Oct 22 14:17:44 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Wed, 22 Oct 2025 14:17:44 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v4] In-Reply-To: <4Cj8ndIna8Do2EfcPsFR85vpAsHGPKtwAvrgp2dTkxU=.e996f1ed-c3a7-45be-a723-190db1bbea73@github.com> References: <4Cj8ndIna8Do2EfcPsFR85vpAsHGPKtwAvrgp2dTkxU=.e996f1ed-c3a7-45be-a723-190db1bbea73@github.com> Message-ID: On Tue, 21 Oct 2025 23:19:48 GMT, Dean Long wrote: >> Is it actually possible to remove entry barriers for _any_ garbage collectable nmethod? How can we know an nmethod is not used anymore, even when it is made not entrant? `is_cold()` bails out when an nmethod does not support entry barriers: >> >> // On platforms that don't support nmethod entry barriers, we can't >> // trust the temporal aspect of the gc epochs. So we can't detect >> // cold nmethods on such platforms. >> >> So, the decision of removing entry barriers for native nmethods would make the memory leak I'm trying to fix here effectively unfixable? Let me know if I'm missing something. > > If we mark them as not-entrant, then the is_not_entrant() check below will still catch them, right? I see, I assumed an nmethod couldn't be marked as on-stack without entry barriers, but that doesn't seem to be the case. But on second thought, do you agree with the fix I'm proposing in this PR? I think the following two work items could be implemented and reviewed in two separate chunks: - allow not-entrant nmethod to be collected during GC - review and possibly remove entry barriers for native wrappers Would you agree with this @dean-long ? I could create another ticket to handle the second part. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27742#discussion_r2452257035 From mbaesken at openjdk.org Wed Oct 22 14:19:36 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 22 Oct 2025 14:19:36 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods [v4] In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 17:45:38 GMT, Martin Doerr wrote: >> We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it. >> >> We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses. 
>> >> I've tested this proposal by the following code on x86_64: >> >> diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp >> index a6b4efbe4f2..d715e69c850 100644 >> --- a/src/hotspot/cpu/x86/interp_masm_x86.cpp >> +++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp >> @@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() { >> void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) { >> prepare_to_jump_from_interpreted(); >> >> + if (UseNewCode) { >> + Label ok; >> + movptr(temp, Address(method, Method::from_interpreted_offset())); >> + cmpptr(temp, Address(method, Method::interpreter_entry_offset())); >> + je(ok); >> + movptr(rax, Address(method, Method::from_compiled_offset())); >> + movptr(rbx, rax); >> + addptr(rbx, 128); >> + hlt(); >> + bind(ok); >> + } >> + >> if (JvmtiExport::can_post_interpreter_events()) { >> Label run_compiled_code; >> // JVMTI events, such as single-stepping, are implemented partly by avoiding running >> >> >> The output is (requires hsdis library, otherwise we only get the hex dump): >> >> RAX=0x00007fa3e072c100 is at entry_point+0 in (nmethod*)0x00007fa3e072c008 >> Compiled method (c1) 2521 1 3 java.lang.Byte::toUnsignedInt (6 bytes) >> total in heap [0x00007fa3e072c008,0x00007fa3e072c1f8] = 496 >> main code [0x00007fa3e072c100,0x00007fa3e072c1b8] = 184 >> stub code [0x00007fa3e072c1b8,0x00007fa3e072c1f8] = 64 >> mutable data [0x00007fa37c0160a0,0x00007fa37c0160d0] = 48 >> relocation [0x00007fa37c0160a0,0x00007fa37c0160c8] = 40 >> metadata [0x00007fa37c0160c8,0x00007fa37c0160d0] = 8 >> immutable data [0x00007fa37c015cc0,0x00007fa37c015d24] = 100 >> dependencies [0x00007fa37c015cc0,0x00007fa37c015... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Use frame_complete_offset for better start address computation. Improve comments. Could the new 'nmethod::print_code_snippet' coding crash under some bad circumstances ? If so, maybe we might need a way to disable it easily. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27530#issuecomment-3432590628 From mdoerr at openjdk.org Wed Oct 22 14:26:06 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 22 Oct 2025 14:26:06 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods [v4] In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 14:16:19 GMT, Matthias Baesken wrote: > Could the new 'nmethod::print_code_snippet' coding crash under some bad circumstances ? If so, maybe we might need a way to disable it easily. Unlikely, and if it does, we still have the raw values and `REATTEMPT_STEP_IF` in `VMError::report`. So, it doesn't look more dangerous to me than other things we are doing, there. Do you have any specific concern? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27530#issuecomment-3432627188 From mchevalier at openjdk.org Wed Oct 22 14:53:33 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 22 Oct 2025 14:53:33 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v6] In-Reply-To: References: Message-ID: > Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. 
Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. > > When we have something like > > // any loop > while (...) { /* something involving limit */ } > // counted loop with zero trip guard > if (i < limit) { > for (int i = init; i < limit; i++) { ... } > } > > and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. > > But the method `PhaseIdealLoop::do_unroll` has the assert > > https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 > > requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. > > This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. > > But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? > > By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. > > We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. > > We could make the assert a lot smarter and test for ... 
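As a reading aid for the scenario quoted above, the loop shape can be written out as a complete Java method. This is only a sketch of the shape (a first loop whose condition involves `limit`, then a zero-trip-guarded counted loop on the same `limit`), not a reproducer of the assert, and it assumes `limit <= data.length`.

    // Illustrative shape only: after peeling the first loop, the two uses of
    // `limit` below may be rewired through different Phi nodes, which is the
    // situation discussed in this thread.
    public class PeelShapeSketch {
        static int shape(int[] data, int limit) {   // assumes limit <= data.length
            int j = 0;
            while (j < limit) {        // "any loop" involving limit
                j += 3;
            }
            int sum = 0;
            if (0 < limit) {           // zero trip guard, written out explicitly here
                for (int i = 0; i < limit; i++) {   // counted loop
                    sum += data[i];
                }
            }
            return sum + j;
        }

        public static void main(String[] args) {
            System.out.println(shape(new int[]{1, 2, 3, 4}, 4));
        }
    }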
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27586/files - new: https://git.openjdk.org/jdk/pull/27586/files/72136811..8f684b16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=04-05 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27586/head:pull/27586 PR: https://git.openjdk.org/jdk/pull/27586 From mchevalier at openjdk.org Wed Oct 22 14:53:36 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 22 Oct 2025 14:53:36 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v5] In-Reply-To: References: <5EEpV0giAyitcB6fESy9zPvowHhjKuyYh3BKbvQ63-A=.cf4ec845-e0f0-46b8-bf04-d192fac13d8d@github.com> Message-ID: On Wed, 22 Oct 2025 09:12:35 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/loopnode.cpp line 4779: >> >>> 4777: if (!head->is_pre_loop() && !head->is_post_loop()) { >>> 4778: assert(opaque->outcnt() == 1 && opaque->in(1) == head->limit(), "IGVN should have cleaned that up!"); >>> 4779: } >> >> I think we can just check `is_main_loop()` here. `is_canonical_loop_entry()` will bail out if we see anything else than main or post loops: >> >> https://github.com/openjdk/jdk/blob/bdfd5e843a7d3db50edf4375e50449b0ce528f8a/src/hotspot/share/opto/loopnode.cpp#L6330-L6333 >> >> >> Suggestion: >> >> if (head->is_main_loop()) { >> assert(opaque->outcnt() == 1 && opaque->in(1) == head->limit(), "IGVN should have cleaned that up!"); >> } > > Maybe you can even go a step further and also check that for the post loop, `outcnt()` is also 1 which should hold as well. Changed and added. Is that what you had in mind? Testing seems happy, at least. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2452376350 From mhaessig at openjdk.org Wed Oct 22 15:21:12 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 22 Oct 2025 15:21:12 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Sun, 19 Oct 2025 15:46:06 GMT, Hannes Greule wrote: > The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. > > Please let me know what you think. Thinking about it a bit more, I think your fix is too superficial. If the discovery of the constant is slightly delayed, nothing is folded again. Consider the followig program for an example: class Test { static boolean test(int x, boolean flag) { Integer a; if (flag) { a = 171384; } else { a = 2902; } return x % a >= a; } public static void main(String[] args) { for (int i = 0; i < 20000; i++) { if (test(i, false)) { throw new RuntimeException("wrong result"); } } } } In my opinion, the benefits do not outweigh the drawbacks for this PR. A better solution would probably be to delay the expansion of the Mod and Div nodes to post-loop optimizations and extend Superword to expand Div/Mod nodes to shifts. However, this is quite a bit of complexity, which raises if this complexity is worth it (@eme64 probably has opinions and/or guidance on this). 
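For readers following this thread: the transformation whose timing is being discussed is the usual strength reduction of division and modulo by a constant into shifts (and, for general constants, a multiply-high based rewrite). The sketch below only shows the simplest power-of-two case and is illustrative, not C2's actual implementation; the correction term is what makes the shift match Java's truncating division for negative dividends.

    // Division/modulo by a constant power of two rewritten with shifts. The added
    // ((x >> 31) & 7) rounds negative dividends so the result matches Java's
    // truncating '/' operator; without it the shift alone would round toward
    // negative infinity. General divisors use a multiply-high rewrite instead.
    public class DivByConstantSketch {
        static int divBy8(int x) {
            return (x + ((x >> 31) & 7)) >> 3;
        }

        static int modBy8(int x) {
            return x - (divBy8(x) << 3);
        }

        public static void main(String[] args) {
            int[] samples = {-17, -9, -8, -1, 0, 1, 7, 8, 9, 1_000_000, Integer.MIN_VALUE, Integer.MAX_VALUE};
            for (int x : samples) {
                if (divBy8(x) != x / 8 || modBy8(x) != x % 8) {
                    throw new AssertionError("mismatch for " + x);
                }
            }
            System.out.println("shift-based div/mod by 8 matches / and % on all samples");
        }
    }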
------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3432972151 From epeter at openjdk.org Wed Oct 22 15:24:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 22 Oct 2025 15:24:23 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v2] In-Reply-To: References: <4sEEV2mIkMt9Wslr88hgCJ5Xc1A3JJ2OoPgAq9Z2K7g=.fc11a041-7274-46c1-a14c-28196c51c20b@github.com> Message-ID: <8QQ8kda3XfiK160D_coGZKakItbYUuUzqCUgTMnYxOo=.fe946849-b071-4615-80ba-3ecc2d966fc0@github.com> On Wed, 22 Oct 2025 12:04:50 GMT, Christian Hagedorn wrote: >> I don't think anyone is saying we should not have such comments. I just think it's out of scope here and mainly I just don't know enough to write anything useful and correct. But if somebody more knowledgeable in this gives me a patch that adds such documentation, I can sneak it in this PR. Otherwise, it will be another time. > > Absolutely, it should not hold up this PR. It was more meant to be an endorsement that we should definitely add some documentation some point. Yes, I'd say just file an RFE, and link to the conversation here. Feel free to assign it to me if you don't want to own it ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27912#discussion_r2452470205 From epeter at openjdk.org Wed Oct 22 15:24:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 22 Oct 2025 15:24:24 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v2] In-Reply-To: <8QQ8kda3XfiK160D_coGZKakItbYUuUzqCUgTMnYxOo=.fe946849-b071-4615-80ba-3ecc2d966fc0@github.com> References: <4sEEV2mIkMt9Wslr88hgCJ5Xc1A3JJ2OoPgAq9Z2K7g=.fc11a041-7274-46c1-a14c-28196c51c20b@github.com> <8QQ8kda3XfiK160D_coGZKakItbYUuUzqCU gTMnYxOo=.fe946849-b071-4615-80ba-3ecc2d966fc0@github.com> Message-ID: <3AoE-jqtzheJc6Dwor2Eg5i4rysiiy25jLBnYc766Yk=.3c9ce052-ad01-4fd6-b343-844fa8b930f2@github.com> On Wed, 22 Oct 2025 15:19:48 GMT, Emanuel Peter wrote: >> Absolutely, it should not hold up this PR. It was more meant to be an endorsement that we should definitely add some documentation some point. > > Yes, I'd say just file an RFE, and link to the conversation here. Feel free to assign it to me if you don't want to own it ;) Let's ask @vnkozlov , @TobiHartmann and @rwestrel , do any of you have a good definition for the `major_progress` concept? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27912#discussion_r2452474828 From chagedorn at openjdk.org Wed Oct 22 15:25:26 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 22 Oct 2025 15:25:26 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v6] In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 14:53:33 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. 
>> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > review src/hotspot/share/opto/loopnode.cpp line 4781: > 4779: } > 4780: if (head->is_post_loop()) { > 4781: assert(opaque->outcnt() == 1, "IGVN should have cleaned that up!"); I think it should not happen that they have more than one output, so the assert message is a bit misleading. Maybe change it to "opaque node should not be shared" or something like this. You could optionally also extend the assert message above. But otherwise, it looks good, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2452478826 From mchevalier at openjdk.org Wed Oct 22 15:35:08 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 22 Oct 2025 15:35:08 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v7] In-Reply-To: References: Message-ID: > Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. > > When we have something like > > // any loop > while (...) { /* something involving limit */ } > // counted loop with zero trip guard > if (i < limit) { > for (int i = init; i < limit; i++) { ... } > } > > and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. 
> > But the method `PhaseIdealLoop::do_unroll` has the assert > > https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 > > requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. > > This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. > > But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? > > By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. > > We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. > > We could make the assert a lot smarter and test for ... Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27586/files - new: https://git.openjdk.org/jdk/pull/27586/files/8f684b16..f6112af2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=05-06 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27586/head:pull/27586 PR: https://git.openjdk.org/jdk/pull/27586 From mchevalier at openjdk.org Wed Oct 22 15:35:10 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 22 Oct 2025 15:35:10 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v6] In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 15:22:26 GMT, Christian Hagedorn wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/loopnode.cpp line 4781: > >> 4779: } >> 4780: if (head->is_post_loop()) { >> 4781: assert(opaque->outcnt() == 1, "IGVN should have cleaned that up!"); > > I think it should not happen that they have more than one output, so the assert message is a bit misleading. Maybe change it to "opaque node should not be shared" or something like this. You could optionally also extend the assert message above. But otherwise, it looks good, thanks! I've split the assert to give more meaningful messages. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2452501386 From rriggs at openjdk.org Wed Oct 22 16:28:48 2025 From: rriggs at openjdk.org (Roger Riggs) Date: Wed, 22 Oct 2025 16:28:48 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v8] In-Reply-To: References: Message-ID: On Fri, 19 Sep 2025 13:10:33 GMT, Chen Liang wrote: >> In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Separate design doc > - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate > - More review updates > - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate > - Move intrinsic to be a subsection; just one most common function of the annotation > - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate > - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate > - Update src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java > > Co-authored-by: Raffaello Giulietti > - Shorter first sentence > - Updates, thanks to John > - ... and 2 more: https://git.openjdk.org/jdk/compare/d1c63fdb...e4afa49d Changes requested: Stick to /**. .... **/ javadoc markup. And remove the characters from intrinsics.md file. ------------- Changes requested by rriggs (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24777#pullrequestreview-3366577067 From kvn at openjdk.org Wed Oct 22 16:36:21 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 22 Oct 2025 16:36:21 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 07:36:35 GMT, Marc Chevalier wrote: >> Or you can keep temporary (just for testing this PR and remove it before integration) original logic in debug VM to compare result of `major_progress()`. > > @vnkozlov I fear I don't understand what you're suggesting. > > I've tried to add in my `set_major_progress(bool)` an assert to check we are not in the 3rd case, the one where the assignment-semantics and the OR-semantics mismatch (that is with `progress` parameter (old progress) unset and current `_major_progress` set). And indeed the assert does not fire in tier1-6+some other internal testing. @marc-chevalier Here is what I am proposing to check if functionality is preserved and answer @eme64 concern. 1. make sure `_major_progress` accessed/updated through accessors methods. 2. add "new" field `_old_major_progress` 3. Restore old accessors methods but rename them with prefix `old_` and use them to update/access `_old_major_progress` 4. In new `major_progress()` add `assert((_old_major_progress > 0) == _major_progress, "should match")`. You can print values if they are not matching. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27912#issuecomment-3433257601 From aph at openjdk.org Wed Oct 22 16:52:35 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 22 Oct 2025 16:52:35 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods [v4] In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 17:45:38 GMT, Martin Doerr wrote: >> We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it. >> >> We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses. >> >> I've tested this proposal by the following code on x86_64: >> >> diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp >> index a6b4efbe4f2..d715e69c850 100644 >> --- a/src/hotspot/cpu/x86/interp_masm_x86.cpp >> +++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp >> @@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() { >> void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) { >> prepare_to_jump_from_interpreted(); >> >> + if (UseNewCode) { >> + Label ok; >> + movptr(temp, Address(method, Method::from_interpreted_offset())); >> + cmpptr(temp, Address(method, Method::interpreter_entry_offset())); >> + je(ok); >> + movptr(rax, Address(method, Method::from_compiled_offset())); >> + movptr(rbx, rax); >> + addptr(rbx, 128); >> + hlt(); >> + bind(ok); >> + } >> + >> if (JvmtiExport::can_post_interpreter_events()) { >> Label run_compiled_code; >> // JVMTI events, such as single-stepping, are implemented partly by avoiding running >> >> >> The output is (requires hsdis library, otherwise we only get the hex dump): >> >> RAX=0x00007fa3e072c100 is at entry_point+0 in (nmethod*)0x00007fa3e072c008 >> Compiled method (c1) 2521 1 3 java.lang.Byte::toUnsignedInt (6 bytes) >> total in heap [0x00007fa3e072c008,0x00007fa3e072c1f8] = 496 >> main code [0x00007fa3e072c100,0x00007fa3e072c1b8] = 184 >> stub code [0x00007fa3e072c1b8,0x00007fa3e072c1f8] = 64 >> mutable data [0x00007fa37c0160a0,0x00007fa37c0160d0] = 48 >> relocation [0x00007fa37c0160a0,0x00007fa37c0160c8] = 40 >> metadata [0x00007fa37c0160c8,0x00007fa37c0160d0] = 8 >> immutable data [0x00007fa37c015cc0,0x00007fa37c015d24] = 100 >> dependencies [0x00007fa37c015cc0,0x00007fa37c015... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Use frame_complete_offset for better start address computation. Improve comments. OK. This looks useful. ------------- Marked as reviewed by aph (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27530#pullrequestreview-3366658702 From kvn at openjdk.org Wed Oct 22 17:05:00 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 22 Oct 2025 17:05:00 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 16:33:42 GMT, Vladimir Kozlov wrote: >> @vnkozlov I fear I don't understand what you're suggesting. >> >> I've tried to add in my `set_major_progress(bool)` an assert to check we are not in the 3rd case, the one where the assignment-semantics and the OR-semantics mismatch (that is with `progress` parameter (old progress) unset and current `_major_progress` set). And indeed the assert does not fire in tier1-6+some other internal testing. > > @marc-chevalier > > Here is what I am proposing to check if functionality is preserved and answer @eme64 concern. > > 1. make sure `_major_progress` accessed/updated through accessors methods. > 2. add "new" field `_old_major_progress` > 3. Restore old accessors methods but rename them with prefix `old_` and use them to update/access `_old_major_progress` > 4. In new `major_progress()` add `assert((_old_major_progress > 0) == _major_progress, "should match")`. You can print values if they are not matching. > Let's ask @vnkozlov , @TobiHartmann and @rwestrel , do any of you have a good definition for the major_progress concept? My understanding of major_progress is to mark major change to graph which may invalidate built loop tree and dominators information and requires exit current round of loop optimization (build_and_optimize()) and run IGVN to clean graph before starting next round of loop optimizations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27912#issuecomment-3433348223 From kvn at openjdk.org Wed Oct 22 17:12:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 22 Oct 2025 17:12:51 GMT Subject: RFR: 8370378: Some compiler tests inadvertently exclude particular platforms [v2] In-Reply-To: References: <5Z5QuyoHOd7NpXC1RxeH67JDR6kXfCU7I95obEArrSE=.41cf60aa-08a0-422b-948d-9f5391802e5e@github.com> Message-ID: On Wed, 22 Oct 2025 08:31:05 GMT, Tobias Hartmann wrote: >> As @dholmes-ora described in JBS: For historical reasons `os.arch` is `amd64` on x86_64 Linux and Windows, but `x86_64` on macOS. I fixed the tests accordingly. 
>> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with four additional commits since the last revision: > > - Update TestBit.java > - Update test/hotspot/jtreg/compiler/c2/TestBit.java > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/RotateLeftNodeLongIdealizationTests.java > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/RotateLeftNodeIntIdealizationTests.java > > Co-authored-by: Christian Hagedorn You could use `os.simpleArch == "x64"` in such cases: https://github.com/openjdk/jdk/blob/master/test/jtreg-ext/requires/VMProps.java#L173 ------------- PR Comment: https://git.openjdk.org/jdk/pull/27931#issuecomment-3433372131 From hgreule at openjdk.org Wed Oct 22 18:15:32 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Wed, 22 Oct 2025 18:15:32 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: <-B2MGRxBjoFvhhILg9WNvLGuOWyU7aY69O3M8nh82Hs=.f5b375ac-f65f-480e-9a4b-04882d339de3@github.com> On Sun, 19 Oct 2025 15:46:06 GMT, Hannes Greule wrote: > The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. > > Please let me know what you think. > Thinking about it a bit more, I think your fix is too superficial. If the discovery of the constant is slightly delayed, nothing is folded again. Consider the followig program for an example: > > ```java > class Test { > static boolean test(int x, boolean flag) { > Integer a; > if (flag) { > a = 171384; > } else { > a = 2902; > } > > return x % a >= a; > } > > public static void main(String[] args) { > for (int i = 0; i < 20000; i++) { > if (test(i, false)) { > throw new RuntimeException("wrong result"); > } > } > } > } > ``` > > In my opinion, the benefits do not outweigh the drawbacks for this PR. A better solution would probably be to delay the expansion of the Mod and Div nodes to post-loop optimizations and extend Superword to expand Div/Mod nodes to shifts. However, this is quite a bit of complexity, which raises if this complexity is worth it (@eme64 probably has opinions and/or guidance on this). I'm not sure about the drawbacks here, but I think optimizing this on the superword level doesn't make things less complicated. If cases where we end up idealizing before calling Value are a more general problem, I'd say it's worth to also address it on exactly that level: make sure that Value is called before Ideal. I'm just hesitant because I'm not aware of any other situations where this matters. One middle ground here would be some kind of `Node::InitialValue(...)` (or a better name :) ) that just calls `bottom_type()` by default and can be overridden for nodes like Mod and Div. Joining that value with the Value calculated later would solve this problem on a different level, but more effectively. But it would also be a more invasive change overall. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3433611940 From kxu at openjdk.org Wed Oct 22 19:07:32 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 22 Oct 2025 19:07:32 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v17] In-Reply-To: References: Message-ID: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. 
This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: mark LoopExitTest::is_valid_with_bt() const ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24458/files - new: https://git.openjdk.org/jdk/pull/24458/files/9005b864..0a3fff1b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=15-16 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From kxu at openjdk.org Wed Oct 22 19:07:34 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 22 Oct 2025 19:07:34 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v3] In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 16:30:51 GMT, Christian Hagedorn wrote: >> Resolved conflict with [JDK-8357951](https://bugs.openjdk.org/browse/JDK-8357951). @chhagedorn I'd appreciate a re-review. Thank you so much! > > Thanks @tabjy for coming back with an update and pinging me again! Sorry, I completely missed it the first time. I will be on vacation starting tomorrow for two weeks but I'm happy to take another look when I'm back :-) @chhagedorn Sorry this took longer than expected. I left a few replies under some of your specific comments. All other issues were addressed. Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3433811945 From kxu at openjdk.org Wed Oct 22 19:07:37 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 22 Oct 2025 19:07:37 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v13] In-Reply-To: References: <6zw0uSB1sUHZTyDUXDjiXcB0Chmu0XH1cEngzhG-UNk=.b239a687-cfb7-49a3-993a-34327a83c4de@github.com> Message-ID: <3ULoRhuYvjY4zpPlgdNW9WFGwpMyKjlcadE_bMoZEiU=.52f26d56-74e6-4e51-81df-ff270f9806fd@github.com> On Fri, 10 Oct 2025 08:57:19 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8354383: C2: enable sinking of Type nodes out of loop >> >> Reviewed-by: chagedorn, thartmann >> (cherry picked from commit a2f99fd88bd03337e1ba73b413ffe4e39f3584cf) > > src/hotspot/share/opto/loopnode.cpp line 1750: > >> 1748: >> 1749: _is_valid = true; >> 1750: return true; > > In the old code, we returned a `nullptr` from `loop_iv_incr()` and then bailed out in `is_counted_loop()`. But here we seem to set `_incr` regardless and also set `_is_valid` to true. This seems incorrect. You're right. This is fixed. > src/hotspot/share/opto/loopnode.hpp line 2073: > >> 2071: >> 2072: bool _insert_stride_overflow_limit_check = false; >> 2073: bool _insert_init_trip_limit_check = false; > > Can you move all fields to the top? This makes it easier to find them. Move everything except `LoopStructure _structure` to the top. `LoopStructure` must be declared before referencing. 
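The detect-then-convert split described in this PR follows a common pattern: run a side-effect-free analysis that records everything the transformation would need, and only mutate once a valid configuration has been found. The Java sketch below is hypothetical (none of these names exist in HotSpot) and only illustrates the pattern, not the actual counted-loop checks.

    // Hypothetical illustration of the detect/convert split: detect() performs the
    // checks and caches what convert() needs, without touching anything; convert()
    // is only legal after a successful detect(). Several candidate configurations
    // can therefore be tried cheaply before committing to a conversion.
    public class LoopConverterSketch {
        private boolean valid = false;
        private int stride;

        boolean detect(int[] candidateStrides) {
            for (int s : candidateStrides) {
                if (s != 0) {          // stand-in for the real legality checks
                    stride = s;
                    valid = true;
                    return true;
                }
            }
            return false;
        }

        String convert() {
            if (!valid) {
                throw new IllegalStateException("detect() must succeed before convert()");
            }
            return "converted loop with stride " + stride;  // stand-in for the graph surgery
        }

        public static void main(String[] args) {
            LoopConverterSketch c = new LoopConverterSketch();
            if (c.detect(new int[]{0, 0, 4})) {
                System.out.println(c.convert());
            }
        }
    }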
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2453081111 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2453087070 From kxu at openjdk.org Wed Oct 22 19:07:38 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 22 Oct 2025 19:07:38 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v17] In-Reply-To: References: <6zw0uSB1sUHZTyDUXDjiXcB0Chmu0XH1cEngzhG-UNk=.b239a687-cfb7-49a3-993a-34327a83c4de@github.com> Message-ID: On Fri, 10 Oct 2025 08:49:58 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> mark LoopExitTest::is_valid_with_bt() const > > src/hotspot/share/opto/loopnode.cpp line 1884: > >> 1882: } >> 1883: >> 1884: // Trip-counter increment must be commutative & associative. > > This comment did not really make sense. I checked its history and it started to be misplaced here: > > https://github.com/openjdk/jdk/commit/baaa8f79ed93d4dc1444fed81599ab0f7c2dd395#diff-dc3fdd0572cfc2cb65bce10f08db4054dbaf1b3b94f8ad7883f6c120b4773cfeR332-R342 > > I suggest to move the comment again to the right place in your patch in `LoopIVIncr::build()`. Done. Thank you for suggesting this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2453082203 From mli at openjdk.org Wed Oct 22 20:55:12 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 22 Oct 2025 20:55:12 GMT Subject: RFR: 8370454: C2 SuperWord: unsigned comparison information is lost for VectorMaskCmp Message-ID: Hi, Can you help to review the patch? @eme64 Currently, in SLP if we support transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned comparison information is lost, it's in CmpU, but current code only check Bool for the information. For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. This loss of unsigned comparison information blocks the optimization proposed in https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. Thanks ------------- Commit messages: - refactor - revert blank line - initial commit - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ... 
and 7 more: https://git.openjdk.org/jdk/compare/f158451c...6271a8e7 Changes: https://git.openjdk.org/jdk/pull/27942/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27942&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370454 Stats: 8 lines in 2 files changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27942/head:pull/27942 PR: https://git.openjdk.org/jdk/pull/27942 From fandreuzzi at openjdk.org Wed Oct 22 20:58:24 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Wed, 22 Oct 2025 20:58:24 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v5] In-Reply-To: References: Message-ID: <7XgDXusUBZA5Q-PRV2YL8eWTuTxXOFqO6ev7ScAQfUc=.7ce457ee-bddd-45a8-9199-470e1d507177@github.com> > I propose to amend `nmethod::is_cold` to let GC collect not-entrant native `nmethod` instances. > > Passes tier1 and tier2 (fastdebug). Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: nn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27742/files - new: https://git.openjdk.org/jdk/pull/27742/files/8ea13b6e..98754ee8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27742&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27742&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27742.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27742/head:pull/27742 PR: https://git.openjdk.org/jdk/pull/27742 From dlong at openjdk.org Wed Oct 22 21:14:28 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 22 Oct 2025 21:14:28 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v2] In-Reply-To: <3AoE-jqtzheJc6Dwor2Eg5i4rysiiy25jLBnYc766Yk=.3c9ce052-ad01-4fd6-b343-844fa8b930f2@github.com> References: <4sEEV2mIkMt9Wslr88hgCJ5Xc1A3JJ2OoPgAq9Z2K7g=.fc11a041-7274-46c1-a14c-28196c51c20b@github.com> <8QQ8kda3XfiK160D_coGZKakItbYUuUzqCU gTMnYxOo=.fe946849-b071-4615-80ba-3ecc2d966fc0@github.com> <3AoE-jqtzheJc6Dwor2Eg5i4rysiiy25jLBnYc766Yk=.3c9ce052-ad01-4fd6-b343-844fa8b930f2@github.com> Message-ID: On Wed, 22 Oct 2025 15:21:18 GMT, Emanuel Peter wrote: >> Yes, I'd say just file an RFE, and link to the conversation here. Feel free to assign it to me if you don't want to own it ;) > > Let's ask @vnkozlov , @TobiHartmann and @rwestrel , do any of you have a good definition for the `major_progress` concept? Until jdk11, there were comments in the code like this: "If _major_progress, then more loop optimizations follow" but those uses of major_progress() have been changed to !post_loop_opts_phase(). So "major progress" seems to imply "expect more loop opts". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27912#discussion_r2453364617 From dlong at openjdk.org Wed Oct 22 21:59:11 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 22 Oct 2025 21:59:11 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v2] In-Reply-To: References: Message-ID: <0G5cUVDDDzfPkgkkvplaKYwk5gwRdmq9sHCHYmw5Ei0=.e2e2637c-612d-46ed-abda-f8e0ce7732da@github.com> On Wed, 22 Oct 2025 07:46:35 GMT, Marc Chevalier wrote: >> Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. >> >> There is one detail, we used to have >> >> void restore_major_progress(int progress) { _major_progress += progress; } >> >> >> It is used after some verification code (maybe not only?) 
that may reset the major progress, using the progress saved before the said code. >> >> It has a weird semantics: >> >> Progress before | Progress after verification | Progress after restore >> ----------------|-----------------------------|----------------------- >> 0 | 0 | 0 >> 1 | 0 | 1 >> 0 | 1 | 1 >> 1 | 1 | 2 >> >> It is rather a or than a restore, and a proper boolean version of that would be >> >> void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } >> >> but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural >> >> void set_major_progress(bool progress) { _major_progress = progress; } >> >> that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Adapt the comment It doesn't look safe to me to change the restore_major_progress() behavior in build_and_optimize(). It certainly looks possible to call set_major_progress() in the do_max_unroll case (lines 5146 - 5163) at least. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27912#issuecomment-3434324489 From dlong at openjdk.org Thu Oct 23 03:02:06 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 23 Oct 2025 03:02:06 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v4] In-Reply-To: References: <4Cj8ndIna8Do2EfcPsFR85vpAsHGPKtwAvrgp2dTkxU=.e996f1ed-c3a7-45be-a723-190db1bbea73@github.com> Message-ID: On Wed, 22 Oct 2025 14:15:12 GMT, Francesco Andreuzzi wrote: >> If we mark them as not-entrant, then the is_not_entrant() check below will still catch them, right? > > I see, I assumed an nmethod couldn't be marked as on-stack without entry barriers, but that doesn't seem to be the case. > > But on second thought, do you agree with the fix I'm proposing in this PR? I think the following two work items could be implemented and reviewed in separate changesets: > - Allow not-entrant nmethod to be collected during GC (I removed `is_static_method()` from L2599, so native nmethods are treated just like normal nmethods) > - Evaluate the implications of removing entry barriers for native nmethods, thus letting GC reclaim them whenever `!is_maybe_on_stack() && is_not_entrant()`, but without the overhead of entry barriers. > > I'm proposing this because I guess the latter will need more discussion and is technically not needed to fix the memory leak I address in this PR. Do you agree @dean-long ? I could create another ticket to handle the second item. Yes, I'm fine with it being a separate issue. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27742#discussion_r2453797323 From duke at openjdk.org Thu Oct 23 03:24:08 2025 From: duke at openjdk.org (Shawn M Emery) Date: Thu, 23 Oct 2025 03:24:08 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v13] In-Reply-To: <45hkD8RQ3TH-4dvjl_bN9dG0B4BSE3d1wXZAPMxeDSA=.84597563-35d7-43c4-bd7c-cad7da7e9277@github.com> References: <45hkD8RQ3TH-4dvjl_bN9dG0B4BSE3d1wXZAPMxeDSA=.84597563-35d7-43c4-bd7c-cad7da7e9277@github.com> Message-ID: On Tue, 21 Oct 2025 18:19:18 GMT, Valerie Peng wrote: >> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates for code review comments from @valeriepeng > > Changes look fine, thanks~ Thank you @valeriepeng, @iwanowww, and @eme64 for your comments. After code review comment updates, all recommended regression tests have been executed and have passed (with all known failures). Benchmarks have been reran in each of the modes supported by intrinsics and these results have matched to those of pre-code review update. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27807#issuecomment-3434921841 From duke at openjdk.org Thu Oct 23 03:24:09 2025 From: duke at openjdk.org (duke) Date: Thu, 23 Oct 2025 03:24:09 GMT Subject: RFR: 8326609: New AES implementation with updates specified in FIPS 197 [v13] In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 00:05:31 GMT, Shawn M Emery wrote: >> General: >> ----------- >> i) This work is to replace the existing AES cipher under the Cryptix license. >> >> ii) The lookup tables are employed for performance, but also for operating in constant time. >> >> iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. >> >> iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. >> >> Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. >> >> Correctness: >> ----------------- >> The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: >> >> i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass >> >> -intrinsics mode for: >> >> ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass >> >> iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures >> >> iv) jdk_security_infra: passed, with 48 known failures >> >> v) tier1 and tier2: all 110257 tests pass >> >> Security: >> ----------- >> In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. 
>> >> Performance: >> ------------------ >> All AES related benchmarks have been executed against the new and original Cryptix code: >> >> micro:org.openjdk.bench.javax.crypto.AES >> >> micro:org.openjdk.bench.javax.crypto.full.AESBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESExtraBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMBench >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream >> >> micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. >> >> micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) >> >> The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption re... > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Updates for code review comments from @valeriepeng @smemery Your change (at version fdfd38929d3c7b725cf44312eba5d2f0d7da7b0a) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27807#issuecomment-3434923911 From epeter at openjdk.org Thu Oct 23 05:28:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 05:28:01 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Sun, 19 Oct 2025 15:46:06 GMT, Hannes Greule wrote: > The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. > > Please let me know what you think. We had a bit of an offline discussion in the office yesterday. Here a summary of my thoughts. Ordering optimizations/phases in compilers is a difficult problem, it is not at all unique to this problem or even C2, all compilers have this problem. Doing what @SirYwell does here, with delaying to IGVN is a relatively simple fix, and it at least addresses all cases where the `divisor` and the `comparison` are already parse time constants. I would consider that a win already. But the solution is a bit hacky. The alternative that was suggested: delay it to post-loop-opts. But that is equally hacky really, it would have the same kind of delay logic where it is proposed now, just with a different "destination" (IGVN vs post-loop-opts). And it has the downside of preventing auto vectorization (SuperWord does not know how to deal with `Div/Mod`, no hardware I know of implements vectorized integer division, only floating division is supported). But delaying to post-loop-opts allows cases like @mhaessig showed, where control flow collapses during IGVN. We could also make a similar example where control flow collapses only during loop-opts, in some cases only after SuperWord even (though that would be very rare). It is really difficult to handle all cases, and I don't know if we really need to. But it is hard to know which cases we should focus on. Here a super intense solution that would be the most powerful I can think of right now: - Delay `transform_int_divide` to post-loop-opts, so we can wait for constants to appear during IGVN and loop-opts. - That would mean we have to accept regressions for the currently vectorizing cases, or we have to do some `transform_int_divide` inside SuperWord: add an `VTransform::optimize` pass somehow. 
This would take a "medium" amount of engineering, and it would be more C++ code to maintain and test. - Yet another possibility: during loop-opts, try to do `transform_int_divide` not just with constant divisor, but also loop-invariant divisor. We would have to find a way to do the logic of `transform_int_divide` that finds the magic constants in C2 IR instead of C++ code (there seem to be some "failure" cases in the computation, not sure if we can resolve those). If the loop has sufficient iterations, it can be profitable to do the magic constant calculation before the loop, and do only mul/shift/add inside the loop. But this seems like an optional add-on. But it would be really powerful. And it would make the `VTransform::optimize` (SuperWord) step unnecessary. So my current thinking is: We have to do some kind of delay anyway, either to IGVN or post-loop-opts, or elsewhere. For now, IGVN is a step in the right direction. The "delay mechanism" is a bit hacky, but we use it in multiple places already (grep for `record_for_igvn`). It is not @SirYwell 's fault that our delay mechanism is so hacky. So I would vote for going with delay to IGVN for now, to at least support the parse-time constants. Then file some RFE that tracks the other ideas, and see if someone wants to pick that up (figure out a loop-opts pass that works for loop-invariant divisors, and otherwise delay to post-loop-opts). src/hotspot/share/opto/divnode.cpp line 545: > 543: > 544: // Keep this node as-is for now; we want Value() and > 545: // other optimizations checking for this node type to work Do we only need `Value` done first on the `Div` node, or also on uses of it? It might be worth explaining it in a bit more detail here. If it was just about calling `Value` on the `Div` first, we could probably check what `Value` returns here. But I fear that is not enough, right? Because it is the `Value` here that returns some range, and then some use sees that this range has specific characteristics, and can constant fold a comparison, for example. Did I get this right? ------------- PR Review: https://git.openjdk.org/jdk/pull/27886#pullrequestreview-3368244192 PR Review Comment: https://git.openjdk.org/jdk/pull/27886#discussion_r2453923234 From epeter at openjdk.org Thu Oct 23 05:36:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 05:36:03 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v2] In-Reply-To: <0G5cUVDDDzfPkgkkvplaKYwk5gwRdmq9sHCHYmw5Ei0=.e2e2637c-612d-46ed-abda-f8e0ce7732da@github.com> References: <0G5cUVDDDzfPkgkkvplaKYwk5gwRdmq9sHCHYmw5Ei0=.e2e2637c-612d-46ed-abda-f8e0ce7732da@github.com> Message-ID: On Wed, 22 Oct 2025 21:56:26 GMT, Dean Long wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Adapt the comment > > It doesn't look safe to me to change the restore_major_progress() behavior in build_and_optimize(). It certainly looks possible to call set_major_progress() in the do_max_unroll case (lines 5146 - 5163) at least. Given @dean-long and @vnkozlov 's answers, I would suggest something like this: If major progress is set: Marks that the loop tree information (get_ctrl, idom, get_loop, etc) could be invalid, and we need to rebuild the loop tree. It also indicates that we have made progress, and so it is likely that we can make even more progress in a next round of loop optimizations. If major progress is not set: Loop tree information is valid. 
If major progress is not set at the end of a loop opts phase, then we can stop loop opts, because we do not expect any further progress even if we ran more loop opts phases.

Suggestions for improvements?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27912#issuecomment-3435159446

From epeter at openjdk.org Thu Oct 23 05:42:02 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 23 Oct 2025 05:42:02 GMT
Subject: RFR: 8370454: C2 SuperWord: unsigned comparison information is lost for VectorMaskCmp
In-Reply-To: 
References: 
Message-ID: <0kNAkZ3pumvLlXoKcwCXlTJN-JrNiqhiLpHmTKUDCVA=.44ea24f4-7613-4627-8caa-0881779be339@github.com>

On Wed, 22 Oct 2025 20:48:17 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review the patch? @eme64
>
> Currently, in SLP, if we support the transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned comparison information is lost: it is in CmpU, but the current code only checks Bool for the information. For details please check the code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`.
>
> This loss of unsigned comparison information blocks the optimization proposed in https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341.
>
> Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), so it seems to me a feasible solution would be to get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp.
>
> Thanks

@Hamlin-Li That looks like a great improvement :) All I am missing are some test cases. And if possible: IR rules?

src/hotspot/share/opto/superword.cpp line 1704:

> 1702: cmp0->Opcode() == Op_CmpUL ||
> 1703: cmp0->Opcode() == Op_CmpU3 ||
> 1704: cmp0->Opcode() == Op_CmpUL3;

Maybe it is time to create a `switch` statement below ;)

src/hotspot/share/opto/superword.cpp line 1751:

> 1749: }
> 1750: } else if (is_unsigned) {
> 1751: mask = BoolTest::unsigned_mask(mask);

This means more cases could now vectorize. Do we have good test cases for this? We should be able to get IR tests for this, right? Do x86 or aarch64 backends (or other platforms) not already have vector instructions for this?

-------------

PR Review: https://git.openjdk.org/jdk/pull/27942#pullrequestreview-3368300725
PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2453970205
PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2453971569

From epeter at openjdk.org Thu Oct 23 06:02:12 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 23 Oct 2025 06:02:12 GMT
Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2]
In-Reply-To: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com>
References: 
Message-ID: <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com>

On Wed, 22 Oct 2025 04:15:26 GMT, Xiaohong Gong wrote:

>> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures.
>> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. >> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. >> >> Here is the performance data with `-XX:UseSVE=2`: >> >> Benchmark bits inputs Mode Unit Before After Gain >> MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 >> MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 >> MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 >> MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 >> MaskQueryOperations... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Move function comments to matcher.hpp > - Merge 'jdk:master' into JDK-8367292 > - 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE Tests passed :) Now I have some understanding questions ;) src/hotspot/cpu/aarch64/aarch64_vector.ad line 405: > 403: return true; > 404: } > 405: } The name suggests that if you return false here, then it is still ok to use a predicate instruction. The name suggests that if your return true, then you must use a predicate instruction. But then your comment for `Op_VectorLongToMask` and `Op_VectorMaskToLong` seems to suggest that we return false and do not want that a predicate instruction is used, but instead a packed vector. So now I'm a bit confused. I'm also wondering: Since there are two options (mask in packed vector vs predicate), does the availability of one always imply the availability of the other? Or could some platform have only one, and another platform only the other? And: can you please explain the `if (vt->isa_vectmask() == nullptr) {` check, also for the other platforms? 
src/hotspot/share/opto/vectorIntrinsics.cpp line 627: > 625: if (!Matcher::vector_mask_requires_predicate(mopc, mask_vec->bottom_type()->is_vect())) { > 626: mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); > 627: } What does `VectorStoreMaskNode` do exactly? Could you maybe add some short comment above the class definition of `VectorStoreMaskNode`? I'm guessing it turns a predicate into a packed vector, right? If that is correct, then it would make more sense to check something like Suggestion: if (Matcher::vector_mask_must_be_packed_vector(mopc, mask_vec->bottom_type()->is_vect())) { mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); } ------------- PR Review: https://git.openjdk.org/jdk/pull/27481#pullrequestreview-3368324099 PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2453989790 PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2453997194 From epeter at openjdk.org Thu Oct 23 06:06:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 06:06:06 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> Message-ID: On Thu, 23 Oct 2025 05:56:05 GMT, Emanuel Peter wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Move function comments to matcher.hpp >> - Merge 'jdk:master' into JDK-8367292 >> - 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE > > src/hotspot/share/opto/vectorIntrinsics.cpp line 627: > >> 625: if (!Matcher::vector_mask_requires_predicate(mopc, mask_vec->bottom_type()->is_vect())) { >> 626: mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); >> 627: } > > What does `VectorStoreMaskNode` do exactly? > Could you maybe add some short comment above the class definition of `VectorStoreMaskNode`? > > I'm guessing it turns a predicate into a packed vector, right? > If that is correct, then it would make more sense to check something like > Suggestion: > > if (Matcher::vector_mask_must_be_packed_vector(mopc, mask_vec->bottom_type()->is_vect())) { > mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); > } I'm wondering if the name `VectorStoreMaskNode` is even very good. Is it about storing a mask, or a mask for storing? But is it really limited to storing things, or could it also be for loads? Or is it rather a conversion? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2454008884 From mchevalier at openjdk.org Thu Oct 23 06:29:05 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 23 Oct 2025 06:29:05 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v2] In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 07:46:35 GMT, Marc Chevalier wrote: >> Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. 
>> >> There is one detail, we used to have >> >> void restore_major_progress(int progress) { _major_progress += progress; } >> >> >> It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. >> >> It has a weird semantics: >> >> Progress before | Progress after verification | Progress after restore >> ----------------|-----------------------------|----------------------- >> 0 | 0 | 0 >> 1 | 0 | 1 >> 0 | 1 | 1 >> 1 | 1 | 2 >> >> It is rather a or than a restore, and a proper boolean version of that would be >> >> void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } >> >> but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural >> >> void set_major_progress(bool progress) { _major_progress = progress; } >> >> that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Adapt the comment I've filed this JDK-8370443 forgot to post. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27912#issuecomment-3435365640 From mchevalier at openjdk.org Thu Oct 23 06:54:03 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 23 Oct 2025 06:54:03 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v2] In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 07:46:35 GMT, Marc Chevalier wrote: >> Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. >> >> There is one detail, we used to have >> >> void restore_major_progress(int progress) { _major_progress += progress; } >> >> >> It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. >> >> It has a weird semantics: >> >> Progress before | Progress after verification | Progress after restore | What would be the assignment semantics >> ----------------|-----------------------------|-----------------------|- >> 0 | 0 | 0 | 0 >> 1 | 0 | 1 | 1 >> 0 | 1 | 1 | 0 (mismatch!) >> 1 | 1 | 2 | 1 (same truthiness) >> >> It is rather a or than a restore, and a proper boolean version of that would be >> >> void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } >> >> but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural >> >> void set_major_progress(bool progress) { _major_progress = progress; } >> >> that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Adapt the comment There are 2 things: 1. 
Changing `restore_major_progress` from addition (or OR) semantics to assignment (@dean-long's concern): I've added

    void set_major_progress(bool progress) {
      precond(!(!progress && _major_progress));
      _major_progress = progress;
    }

to see if we ever hit the case where `old_progress` is false and `_major_progress` is true. That is the only case where the former OR-semantics is not the same as the new set semantics. It passes tier1-6 + other internal tests. I can replace the assignment with a `||` (the boolean `+`) if we still have doubts, but then, it seems tests are not exercising this path.

2. Is the type change correct overall.

I did something like what @vnkozlov describes: have the bool and the int version of the major progress side by side, have the methods act on both at the same time: on the int as it used to, on the bool as I propose here. Add the proposed assert in the getter. I've also made sure to assign both the int and the bool version for the 2 places in `compile.cpp` that assign `_major_progress` directly. It passes tier1-3 + other internal tests. This also makes sure there is no observable difference between the `+=` for the `int` version, and the assignment for the `bool` version.

Here is what I've tested with: [testing.patch](https://github.com/user-attachments/files/23091637/testing.patch)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27912#issuecomment-3435421146

From chagedorn at openjdk.org Thu Oct 23 06:59:03 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 23 Oct 2025 06:59:03 GMT
Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v2]
In-Reply-To: 
References: <0G5cUVDDDzfPkgkkvplaKYwk5gwRdmq9sHCHYmw5Ei0=.e2e2637c-612d-46ed-abda-f8e0ce7732da@github.com>
Message-ID: 

On Thu, 23 Oct 2025 05:33:46 GMT, Emanuel Peter wrote:

>> It doesn't look safe to me to change the restore_major_progress() behavior in build_and_optimize(). It certainly looks possible to call set_major_progress() in the do_max_unroll case (lines 5146 - 5163) at least.
>
> Given @dean-long and @vnkozlov 's answers, I would suggest something like this:
>
> If major progress is set:
> Marks that the loop tree information (get_ctrl, idom, get_loop, etc) could be invalid, and we need to rebuild the loop tree.
> It also indicates that we have made progress, and so it is likely that we can make even more progress in a next round of loop optimizations.
> If major progress is not set:
> Loop tree information is valid.
> If major progress is not set at the end of a loop opts phase, then we can stop loop opts, because we do not expect any further progress even if we ran more loop opts phases.
>
> Suggestions for improvements?

Thanks for the suggestion above @eme64!

> Marks that the loop tree information (get_ctrl, idom, get_loop, etc) could be invalid, and we need to rebuild the loop tree.

Is it really invalid or just not as accurate as it could be but still correct? At least during major loop optimizations like Loop Peeling etc., we try to keep things right and even recompute broken things when we are done, as for example after Loop Unswitching (which seems unnecessary if we assume things can really be invalid when major progress is set): https://github.com/openjdk/jdk/blob/ffcb1585ed6c2a2bff28be6854d44a672aa31a0b/src/hotspot/share/opto/loopUnswitch.cpp#L314

> It also indicates that we have made progress, and so it is likely that we can make even more progress in a next round of loop optimizations.

I think "made progress" is somewhat hard to quantify.
We could do plenty of progress in IGVN but not decide to do another loop opts round. But we could remove one useless bool in IGVN and decide to set major progress: https://github.com/openjdk/jdk/blob/ffcb1585ed6c2a2bff28be6854d44a672aa31a0b/src/hotspot/share/opto/ifnode.cpp#L1449 Given that, I would rephrase it to something like: "It also indicates that the graph was changed in a way that is promising to be able to apply more loop optimizations." It seems that "major progress" is probably the wrong term since it not only indicates a major progress but also a "please apply more loop opts". Good example for that: https://github.com/openjdk/jdk/blob/ffcb1585ed6c2a2bff28be6854d44a672aa31a0b/src/hotspot/share/opto/loopnode.cpp#L5291-L5298 What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27912#issuecomment-3435433072 From hgreule at openjdk.org Thu Oct 23 07:24:06 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 23 Oct 2025 07:24:06 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 04:58:33 GMT, Emanuel Peter wrote: >> The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. >> >> Please let me know what you think. > > src/hotspot/share/opto/divnode.cpp line 545: > >> 543: >> 544: // Keep this node as-is for now; we want Value() and >> 545: // other optimizations checking for this node type to work > > Do we only need `Value` done first on the `Div` node, or also on uses of it? > It might be worth explaining it in a bit more detail here. > > If it was just about calling `Value` on the `Div` first, we could probably check what `Value` returns here. But I fear that is not enough, right? Because it is the `Value` here that returns some range, and then some use sees that this range has specific characteristics, and can constant fold a comparison, for example. Did I get this right? So, the *main* reason why I'm including Div here is mainly because of #26143; before that the DivI/LNode::Value() is actually less precise than Value on the nodes created by `transform_int_divide`. With #26143, some results are more precise even for constant divisors. In such case, uses can benefit from seeing the (then) more precise range. (@ichttt found a case where the replacement fails to constant-fold, but that's just due to missing constant folding in MulHiLNode) A secondary reason is other optimizations checking for Div inputs, though I didn't find any existing check that would actually benefit. There *might* be optimization opportunities that want to detect division, but that's just Generally from what I've found the benefit is bigger for Mod nodes, because there calling Value on the replacements is significantly worse. And there we also encounter typical usages in combination with range checks. Do you want me to expand both Div and Mod comments to cover more concrete benefits, depending on the operation? 
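A minimal Java sketch (hypothetical; not one of the PR's actual test cases) of the kind of pattern under discussion, where the precision of `Value()` on the `Mod` node itself matters:

    public class ModRangeExample {
        // For any int x, r = x % 100 is always in [-99, 99], so the branch
        // below is dead code. Proving that via the type of the ModI node is
        // straightforward once Value() has run on it.
        static int modThenCompare(int x) {
            int r = x % 100;
            if (r > 100) {   // always false
                return -1;
            }
            return r;
        }

        public static void main(String[] args) {
            for (int i = -1_000; i < 1_000; i++) {
                if (modThenCompare(i) > 100) throw new AssertionError();
            }
        }
    }

If the `ModI` node is instead expanded into the mul-high/shift/subtract sequence before its type is computed, folding the branch depends on how precise `Value()` is on the replacement nodes, which is exactly the ordering question being discussed here.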
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27886#discussion_r2454163486 From xgong at openjdk.org Thu Oct 23 07:25:05 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 23 Oct 2025 07:25:05 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> Message-ID: On Thu, 23 Oct 2025 06:03:13 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 627: >> >>> 625: if (!Matcher::vector_mask_requires_predicate(mopc, mask_vec->bottom_type()->is_vect())) { >>> 626: mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); >>> 627: } >> >> What does `VectorStoreMaskNode` do exactly? >> Could you maybe add some short comment above the class definition of `VectorStoreMaskNode`? >> >> I'm guessing it turns a predicate into a packed vector, right? >> If that is correct, then it would make more sense to check something like >> Suggestion: >> >> if (Matcher::vector_mask_must_be_packed_vector(mopc, mask_vec->bottom_type()->is_vect())) { >> mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); >> } > > I'm wondering if the name `VectorStoreMaskNode` is even very good. Is it about storing a mask, or a mask for storing? But is it really limited to storing things, or could it also be for loads? Or is it rather a conversion? `VectorStoreMask` is a opposite operation of `VectorLoadMask`. We can treat it as a layout conversion for a vector mask. It is used to convert a vector mask (either a unpacked vector or a predicate) to a packed vector status (i.e. 8-bit element size). Because, in Java API, elements of a `VectorMask` is stored into a boolean array. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2454163415 From xgong at openjdk.org Thu Oct 23 07:25:06 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 23 Oct 2025 07:25:06 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> Message-ID: <4l5TryoeE4xWUvCtyOU4hSUArWtuXomHRuMiQfAflDM=.9faf7b8b-5135-4696-8921-d61e624d684f@github.com> On Thu, 23 Oct 2025 07:21:02 GMT, Xiaohong Gong wrote: >> I'm wondering if the name `VectorStoreMaskNode` is even very good. Is it about storing a mask, or a mask for storing? But is it really limited to storing things, or could it also be for loads? Or is it rather a conversion? > > `VectorStoreMask` is a opposite operation of `VectorLoadMask`. We can treat it as a layout conversion for a vector mask. It is used to convert a vector mask (either a unpacked vector or a predicate) to a packed vector status (i.e. 8-bit element size). Because, in Java API, elements of a `VectorMask` is stored into a boolean array. > if (Matcher::vector_mask_must_be_packed_vector(mopc, mask_vec->bottom_type()->is_vect())) { > mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); > } Is the function name `vector_mask_must_be_packed` fine to you? This looks smarter to me. 
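For reference, a small Java usage sketch (assuming the incubating `jdk.incubator.vector` module; not code from the patch) of the operations whose mask layout is being discussed, including the `boolean[]` view that motivates the packed 8-bit layout:

    import jdk.incubator.vector.IntVector;
    import jdk.incubator.vector.VectorMask;
    import jdk.incubator.vector.VectorSpecies;

    public class MaskLongExample {
        static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_128; // 4 int lanes

        public static void main(String[] args) {
            // Bits 0, 1 and 3 set -> lanes 0, 1 and 3 are true.
            VectorMask<Integer> m = VectorMask.fromLong(SPECIES, 0b1011L);

            long bits = m.toLong();        // 0b1011 again
            boolean[] lanes = m.toArray(); // {true, true, false, true}

            System.out.println(bits + " " + java.util.Arrays.toString(lanes));
        }
    }

Run with `--add-modules jdk.incubator.vector`; the class name is made up for illustration.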
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2454169830 From shade at openjdk.org Thu Oct 23 07:28:31 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 23 Oct 2025 07:28:31 GMT Subject: RFR: 8370318: AES-GCM vector intrinsic may read out of bounds (x86_64, AVX-512) Message-ID: See the bug for symptoms and discussion. In short, in newly added intrinsic in JDK 24, there is a potential read out of Java heap if key array is at the edge of it, which will crash JVM. And that read is redundant for the code path in question, we only use it in the subsequent blocks that we never actually enter in the problematic case. So we never see any failures in testing: the only observable effect is SEGV on uncommitted heap access. It is somewhat similar to [JDK-8330611](https://bugs.openjdk.org/browse/JDK-8330611) we have fixed in other place. But this one can be caught with the explicit range check in debug code. Additional testing: - [x] Linux x86_64 server fastdebug, `com/sun/crypto/provider/Cipher compiler/codegen/aes` (fails with range check only, passes with entire patch) - [x] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - No need to shorten branches after tighter check - Tighter check - Fix Changes: https://git.openjdk.org/jdk/pull/27951/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27951&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370318 Stats: 22 lines in 2 files changed: 21 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27951.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27951/head:pull/27951 PR: https://git.openjdk.org/jdk/pull/27951 From shade at openjdk.org Thu Oct 23 07:31:25 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 23 Oct 2025 07:31:25 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v3] In-Reply-To: References: Message-ID: On Wed, 24 Sep 2025 13:08:14 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Drop atomic counters > - Initial version Not now, bot, still looking for reviewers. 
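To illustrate item 2 of the plan quoted above ("only do atomic receiver slot installations"), a rough Java analogy (class and method names are invented for illustration; the real code lives in HotSpot's interpreter and C1 profiling code, not in Java): a receiver class is only ever installed into an empty row with a compare-and-set, so racing threads cannot overwrite each other's entries, while the counters themselves stay non-atomic, matching the quoted description:

    import java.util.concurrent.atomic.AtomicReferenceArray;

    public class ReceiverProfileSketch {
        // One row per distinct receiver class seen at this call site.
        private final AtomicReferenceArray<Class<?>> receivers;
        private final long[] counts; // counts stay plain; only slot installation is atomic

        ReceiverProfileSketch(int rows) {
            receivers = new AtomicReferenceArray<>(rows);
            counts = new long[rows];
        }

        void record(Class<?> receiver) {
            for (int i = 0; i < receivers.length(); i++) {
                Class<?> seen = receivers.get(i);
                if (seen == receiver) {            // already installed: just bump the count
                    counts[i]++;
                    return;
                }
                if (seen == null) {
                    // Claim the empty row atomically; if the CAS fails, re-read,
                    // since another thread may have installed the same receiver.
                    if (receivers.compareAndSet(i, null, receiver)
                            || receivers.get(i) == receiver) {
                        counts[i]++;
                        return;
                    }
                }
            }
            // All rows taken by other types: a real profile would bump a polymorphic counter.
        }
    }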
------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3435525186 From xgong at openjdk.org Thu Oct 23 07:35:13 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 23 Oct 2025 07:35:13 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> Message-ID: On Wed, 22 Oct 2025 04:15:26 GMT, Xiaohong Gong wrote: >> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. >> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. >> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. >> >> Here is the performance data with `-XX:UseSVE=2`: >> >> Benchmark bits inputs Mode Unit Before After Gain >> MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 >> MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 >> MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 >> MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 >> MaskQueryOperations... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Move function comments to matcher.hpp > - Merge 'jdk:master' into JDK-8367292 > - 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE Hi @fg1417 , @Bhavana-Kilambi could you please help take a look at this PR especially the backend changes? 
Thanks a lot!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3435540917

From xgong at openjdk.org Thu Oct 23 07:35:15 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Thu, 23 Oct 2025 07:35:15 GMT
Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2]
In-Reply-To: <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com>
References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com>
 <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com>
Message-ID: <2f9k7vksJ4apEclf8z85MyLeg6bbRnhptLpsoydeTMI=.df8ad2fc-5b56-4554-982a-691eb7849a84@github.com>

On Thu, 23 Oct 2025 05:51:21 GMT, Emanuel Peter wrote:

> The name suggests that if you return false here, then it is still ok to use a predicate instruction. The name suggests that if you return true, then you must use a predicate instruction.
>
> But then your comment for `Op_VectorLongToMask` and `Op_VectorMaskToLong` seems to suggest that we return false and do not want that a predicate instruction is used, but instead a packed vector.
>
> So now I'm a bit confused.

The type of a vector mask is different on architectures that support the predicate feature and on those that do not (please see my detailed answer below). Hence, for some vector operations, the expected input mask register/layout is different.

Please note that there are two kinds of layout for a mask if it is stored in a **vector register**. It might be 1) a packed layout with 8-bit element width, or 2) an unpacked layout with 8/16/32/64-bit element width according to the vector type. For data-related mask operations like `VectorBlend`, it is 2), while for some bit-related mask operations like `VectorMaskTrueCount, VectorMaskFirstTrue, toLong, fromLong, ...`, it is 1), because the implementation is more efficient that way.

My intention is to use this function to guide which IR is expected to be generated for a vector mask operation. Before this patch, the mid-end made the distinction by just checking the type of a vector mask, assuming that a predicate instruction will be generated for a predicate type, while vector instructions are generated for a vector type. However, as I mentioned in this PR, some mask operations might not support native predicate instructions on predicate architectures. Instead, they are implemented with the same vector instructions as on NEON. We have to do the mask layout conversion inside codegen, which is inefficient. Generating the same IR pattern as on NEON is more efficient. So, if this function returns false, it means the input/output mask for the specified opcode needs to be saved in a vector register with the packed layout, even if the architecture supports the predicate feature. This is decided by the IR's implementation.

> I'm also wondering: Since there are two options (mask in packed vector vs predicate), does the availability of one always imply the availability of the other? Or could some platform have only one, and another platform only the other?

There are three kinds of layout for a mask: 1) packed vector with 8-bit element size, 2) unpacked vector with 8/16/32/64-bit element size, and 3) predicate.

1) The packed vector with 8-bit layout is a temporary form of the mask which exists on all architectures. For example, it is the result of a `LoadVector` from a boolean array.
2) The unpacked vector with 8/16/32/64-bit layout is a real vector mask which is unpacked from 1).
The `VectorLoadMask` IR is used to implement the unpack operation. It exists on platform that does not support the predicate feature (e.g. NEON, SSE/AVX1/AVX2). 3) The predicate layout is a real vector mask which is also converted from 1). The `VectorLoadMask` IR is used to implement the conversion. It just exists on platforms that do support the predicate feature (e.g. SVE, AVX-512, RVV). > And: can you please explain the `if (vt->isa_vectmask() == nullptr) {` check, also for the other platforms? `if (vt->isa_vectmask() == nullptr)` means current vector mask's type is normal `TypeVect` instead of `TypeVectMask`. The type of vector mask is defined based on whether current architecture supports the predicate feature, since the register for a mask is different with vector register on such architectures. If arch supports the predicate feature (such as Arm SVE, AVX512, and RVV), then the type will be defined as `TypeVectMask`, otherwise it is the normal `TypeVect` (e.g. Arm NEON, X86 SSE/AVX1/AVX2). Please see the definition here: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L2444-L2451. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2454193088 From hgreule at openjdk.org Thu Oct 23 07:39:02 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 23 Oct 2025 07:39:02 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 05:25:32 GMT, Emanuel Peter wrote: >> The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. >> >> Please let me know what you think. > > We had a bit of an offline discussion in the office yesterday. Here a summary of my thoughts. > > Ordering optimizations/phases in compilers is a difficult problem, it is not at all unique to this problem or even C2, all compilers have this problem. > > Doing what @SirYwell does here, with delaying to IGVN is a relatively simple fix, and it at least addresses all cases where the `divisor` and the `comparison` are already parse time constants. I would consider that a win already. But the solution is a bit hacky. > > The alternative that was suggested: delay it to post-loop-opts. But that is equally hacky really, it would have the same kind of delay logic where it is proposed now, just with a different "destination" (IGVN vs post-loop-opts). And it has the downside of preventing auto vectorization (SuperWord does not know how to deal with `Div/Mod`, no hardware I know of implements vectorized integer division, only floating division is supported). But delaying to post-loop-opts allows cases like @mhaessig showed, where control flow collapses during IGVN. We could also make a similar example where control flow collapses only during loop-opts, in some cases only after SuperWord even (though that would be very rare). > > It is really difficult to handle all cases, and I don't know if we really need to. But it is hard to know which cases we should focus on. > > Here a super intense solution that would be the most powerful I can think of right now: > - Delay `transform_int_divide` to post-loop-opts, so we can wait for constants to appear during IGVN and loop-opts. > - That would mean we have to accept regressions for the currently vectorizing cases, or we have to do some `transform_int_divide` inside SuperWord: add an `VTransform::optimize` pass somehow. 
This would take a "medium" amount of engineering, and it would be more C++ code to maintain and test. > - Yet another possibility: during loop-opts, try to do `transform_int_divide` not just with constant divisor, but also loop-invariant divisor. We would have to find a way to do the logic of `transform_int_divide` that finds the magic constants in C2 IR instead of C++ code (there seem to be some "failure" cases in the computation, not sure if we can resolve those). If the loop has sufficient iterations, it can be profitable to do the magic constant calculation before the loop, and do only mul/shift/add inside the loop. But this seems like an optional add-on. But it would be really powerful. And it would make the `VTransform::optimiz... Thanks for the summary @eme64. I totally agree that it's a bit hacky, but the current state is the least invasive. I'd also be interested in going further steps in the same direction, but I feel like the work increases significantly more than the benefits (at least as long as we don't generalize it to also optimize for loop invariant non-constants, but that's also a lot of work). @mhaessig do you have test results already? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3435552083 From mbaesken at openjdk.org Thu Oct 23 07:43:11 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 23 Oct 2025 07:43:11 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods [v4] In-Reply-To: References: Message-ID: <2y8Fj47o-JZvc1t0gRzBX2h2rSdN__43z0w9oWR7-HI=.10f0621e-f65f-4944-9278-90f1f3da248c@github.com> On Tue, 7 Oct 2025 17:45:38 GMT, Martin Doerr wrote: >> We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it. >> >> We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses. 
>> >> I've tested this proposal by the following code on x86_64: >> >> diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp >> index a6b4efbe4f2..d715e69c850 100644 >> --- a/src/hotspot/cpu/x86/interp_masm_x86.cpp >> +++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp >> @@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() { >> void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) { >> prepare_to_jump_from_interpreted(); >> >> + if (UseNewCode) { >> + Label ok; >> + movptr(temp, Address(method, Method::from_interpreted_offset())); >> + cmpptr(temp, Address(method, Method::interpreter_entry_offset())); >> + je(ok); >> + movptr(rax, Address(method, Method::from_compiled_offset())); >> + movptr(rbx, rax); >> + addptr(rbx, 128); >> + hlt(); >> + bind(ok); >> + } >> + >> if (JvmtiExport::can_post_interpreter_events()) { >> Label run_compiled_code; >> // JVMTI events, such as single-stepping, are implemented partly by avoiding running >> >> >> The output is (requires hsdis library, otherwise we only get the hex dump): >> >> RAX=0x00007fa3e072c100 is at entry_point+0 in (nmethod*)0x00007fa3e072c008 >> Compiled method (c1) 2521 1 3 java.lang.Byte::toUnsignedInt (6 bytes) >> total in heap [0x00007fa3e072c008,0x00007fa3e072c1f8] = 496 >> main code [0x00007fa3e072c100,0x00007fa3e072c1b8] = 184 >> stub code [0x00007fa3e072c1b8,0x00007fa3e072c1f8] = 64 >> mutable data [0x00007fa37c0160a0,0x00007fa37c0160d0] = 48 >> relocation [0x00007fa37c0160a0,0x00007fa37c0160c8] = 40 >> metadata [0x00007fa37c0160c8,0x00007fa37c0160d0] = 8 >> immutable data [0x00007fa37c015cc0,0x00007fa37c015d24] = 100 >> dependencies [0x00007fa37c015cc0,0x00007fa37c015... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Use frame_complete_offset for better start address computation. Improve comments. Marked as reviewed by mbaesken (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27530#pullrequestreview-3368631720 From shade at openjdk.org Thu Oct 23 08:06:08 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 23 Oct 2025 08:06:08 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods [v4] In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 17:45:38 GMT, Martin Doerr wrote: >> We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it. >> >> We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses. 
>> >> I've tested this proposal by the following code on x86_64: >> >> diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp >> index a6b4efbe4f2..d715e69c850 100644 >> --- a/src/hotspot/cpu/x86/interp_masm_x86.cpp >> +++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp >> @@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() { >> void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) { >> prepare_to_jump_from_interpreted(); >> >> + if (UseNewCode) { >> + Label ok; >> + movptr(temp, Address(method, Method::from_interpreted_offset())); >> + cmpptr(temp, Address(method, Method::interpreter_entry_offset())); >> + je(ok); >> + movptr(rax, Address(method, Method::from_compiled_offset())); >> + movptr(rbx, rax); >> + addptr(rbx, 128); >> + hlt(); >> + bind(ok); >> + } >> + >> if (JvmtiExport::can_post_interpreter_events()) { >> Label run_compiled_code; >> // JVMTI events, such as single-stepping, are implemented partly by avoiding running >> >> >> The output is (requires hsdis library, otherwise we only get the hex dump): >> >> RAX=0x00007fa3e072c100 is at entry_point+0 in (nmethod*)0x00007fa3e072c008 >> Compiled method (c1) 2521 1 3 java.lang.Byte::toUnsignedInt (6 bytes) >> total in heap [0x00007fa3e072c008,0x00007fa3e072c1f8] = 496 >> main code [0x00007fa3e072c100,0x00007fa3e072c1b8] = 184 >> stub code [0x00007fa3e072c1b8,0x00007fa3e072c1f8] = 64 >> mutable data [0x00007fa37c0160a0,0x00007fa37c0160d0] = 48 >> relocation [0x00007fa37c0160a0,0x00007fa37c0160c8] = 40 >> metadata [0x00007fa37c0160c8,0x00007fa37c0160d0] = 8 >> immutable data [0x00007fa37c015cc0,0x00007fa37c015d24] = 100 >> dependencies [0x00007fa37c015cc0,0x00007fa37c015... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Use frame_complete_offset for better start address computation. Improve comments. Looks fine. I see no clear point in referencing `ZBarrierRelocationFormatStoreGoodAfterMov` in the comments, but I have no strong opinion either. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27530#pullrequestreview-3368704596 From chagedorn at openjdk.org Thu Oct 23 08:06:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 23 Oct 2025 08:06:12 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v3] In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 18:59:48 GMT, Kangcheng Xu wrote: >> Thanks @tabjy for coming back with an update and pinging me again! Sorry, I completely missed it the first time. I will be on vacation starting tomorrow for two weeks but I'm happy to take another look when I'm back :-) > > @chhagedorn Sorry this took longer than expected. I left a few replies under some of your specific comments. All other issues were addressed. Thank you! No worries, thanks @tabjy for addressing my suggestions and comments! I won't be able to continue this week but will have another look next week. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3435635503 From chagedorn at openjdk.org Thu Oct 23 08:07:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 23 Oct 2025 08:07:07 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v7] In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 15:35:08 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. >> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > review That looks good to me, thanks! src/hotspot/share/opto/loopnode.cpp line 4779: > 4777: if (head->is_main_loop()) { > 4778: assert(opaque->outcnt() == 1, "opaque node should not be shared"); > 4779: assert(opaque->in(1) == head->limit(), "After IGVN cleanup, input of opaque node must be the limit."); Good solution to split it up with two separate messages ? ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27586#pullrequestreview-3368708351 PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2454272993 From mchevalier at openjdk.org Thu Oct 23 08:17:06 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 23 Oct 2025 08:17:06 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed In-Reply-To: References: Message-ID: On Wed, 1 Oct 2025 09:32:17 GMT, Roberto Casta?eda Lozano wrote: >> I mean that I went in the peeling policy to write conditions such as "first time, return 'no peeling' only the second time, run normally", to try to reproduce the right sequence on one particular input. So it's not very far of stressing: instead of deciding randomly, I hardcode in the C++ whether it should run normally or exit immediately for each call to the peeling policy. It's mostly useful for the case where the natural run would peel, but I don't want it to happen too early. >> >> Unlike stressing that would not return "should peel" when the normal heuristic would not peeling. But that's the only difference. I was trying to get closer from a natural (no stress) example, but at the end, it ended up being too similar to stressing to be really conclusive about real world cases. > >> I mean that I went in the peeling policy to write conditions such as "first time, return 'no peeling' only the second time, run normally", to try to reproduce the right sequence on one particular input. So it's not very far of stressing: instead of deciding randomly, I hardcode in the C++ whether it should run normally or exit immediately for each call to the peeling policy. It's mostly useful for the case where the natural run would peel, but I don't want it to happen too early. >> >> Unlike stressing that would not return "should peel" when the normal heuristic would not peeling. But that's the only difference. I was trying to get closer from a natural (no stress) example, but at the end, it ended up being too similar to stressing to be really conclusive about real world cases. > > I see, thanks for the clarification. Thanks! @robcasloz or @rwestrel do you want to give an updated opinion? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3435675670 From stefank at openjdk.org Thu Oct 23 08:18:08 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 Oct 2025 08:18:08 GMT Subject: RFR: 8369658: Client emulation mode sets MaxRAM too late [v2] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 08:28:02 GMT, Joel Sikstr?m wrote: >> Hello, >> >> While working on the proposal for the potential deprecation of MaxRAM (see [JDK-8369347](https://bugs.openjdk.org/browse/JDK-8369347)) I saw that `CompilerConfig::ergo_initialize()` sets the value for `MaxRAM` after ergonomic heap sizing is already done, which is the only place in the VM that cares about `MaxRAM`. I suggest we move setting the value of `MaxRAM` to `Arguments::set_heap_size()` to fix this. >> >> Even though the `MaxRAM` flag might be deprecated, the code should still account for the fact that client emulation mode might lower the maximum amount of physical memory that can be used for the Java heap. If the flag is removed, we'd still want to lower the maximum memory, so it makes sense to have the code in `Arguments::set_heap_size()` in both cases. >> >> Testing: >> * Oracle's tier1-2 >> * Local test with `java -XX:+NeverActAsServerClassMachine -Xlog:gc+init` to see that the lower limit is reflected in ergonomic heap sizing. 
> > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Add comment Looks good. But the next step needs to be deprecate the client emulation mode. ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27765#pullrequestreview-3368743381 From fandreuzzi at openjdk.org Thu Oct 23 08:25:04 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Thu, 23 Oct 2025 08:25:04 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v4] In-Reply-To: References: <4Cj8ndIna8Do2EfcPsFR85vpAsHGPKtwAvrgp2dTkxU=.e996f1ed-c3a7-45be-a723-190db1bbea73@github.com> Message-ID: On Thu, 23 Oct 2025 02:59:05 GMT, Dean Long wrote: >> I see, I assumed an nmethod couldn't be marked as on-stack without entry barriers, but that doesn't seem to be the case. >> >> But on second thought, do you agree with the fix I'm proposing in this PR? I think the following two work items could be implemented and reviewed in separate changesets: >> - Allow not-entrant nmethod to be collected during GC (I removed `is_static_method()` from L2599, so native nmethods are treated just like normal nmethods) >> - Evaluate the implications of removing entry barriers for native nmethods, thus letting GC reclaim them whenever `!is_maybe_on_stack() && is_not_entrant()`, but without the overhead of entry barriers. >> >> I'm proposing this because I guess the latter will need more discussion and is technically not needed to fix the memory leak I address in this PR. Do you agree @dean-long ? I could create another ticket to handle the second item. > > Yes, I'm fine with it being a separate issue. https://bugs.openjdk.org/browse/JDK-8370472 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27742#discussion_r2454316998 From jsikstro at openjdk.org Thu Oct 23 08:25:17 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 23 Oct 2025 08:25:17 GMT Subject: RFR: 8369658: Client emulation mode sets MaxRAM too late [v2] In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 09:07:08 GMT, Axel Boldt-Christmas wrote: >> Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment > > Marked as reviewed by aboldtch (Reviewer). Thank you for the reviews! @xmas92 @stefank ------------- PR Comment: https://git.openjdk.org/jdk/pull/27765#issuecomment-3435696840 From jsikstro at openjdk.org Thu Oct 23 08:25:18 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 23 Oct 2025 08:25:18 GMT Subject: Integrated: 8369658: Client emulation mode sets MaxRAM too late In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 11:25:04 GMT, Joel Sikstr?m wrote: > Hello, > > While working on the proposal for the potential deprecation of MaxRAM (see [JDK-8369347](https://bugs.openjdk.org/browse/JDK-8369347)) I saw that `CompilerConfig::ergo_initialize()` sets the value for `MaxRAM` after ergonomic heap sizing is already done, which is the only place in the VM that cares about `MaxRAM`. I suggest we move setting the value of `MaxRAM` to `Arguments::set_heap_size()` to fix this. > > Even though the `MaxRAM` flag might be deprecated, the code should still account for the fact that client emulation mode might lower the maximum amount of physical memory that can be used for the Java heap. 
If the flag is removed, we'd still want to lower the maximum memory, so it makes sense to have the code in `Arguments::set_heap_size()` in both cases. > > Testing: > * Oracle's tier1-2 > * Local test with `java -XX:+NeverActAsServerClassMachine -Xlog:gc+init` to see that the lower limit is reflected in ergonomic heap sizing. This pull request has now been integrated. Changeset: dcf46a0a Author: Joel Sikstr?m URL: https://git.openjdk.org/jdk/commit/dcf46a0a195d7386ed0bc872f60eb9c586425cc8 Stats: 32 lines in 3 files changed: 23 ins; 5 del; 4 mod 8369658: Client emulation mode sets MaxRAM too late Reviewed-by: aboldtch, stefank ------------- PR: https://git.openjdk.org/jdk/pull/27765 From epeter at openjdk.org Thu Oct 23 08:28:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 08:28:05 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v2] In-Reply-To: References: <0G5cUVDDDzfPkgkkvplaKYwk5gwRdmq9sHCHYmw5Ei0=.e2e2637c-612d-46ed-abda-f8e0ce7732da@github.com> Message-ID: On Thu, 23 Oct 2025 06:56:10 GMT, Christian Hagedorn wrote: > Is it really invalid or just not as accurate as it could be but still correct? That is a good question. I'm sure there are places where it is indeed invalid, and not just "not as accurate as it could be". What is an example of "not as accurate as it could be" where the information is not invalid? I suppose we would have to continue the work on `VerifyLoopOptimizations`, and see how far we can push that. Currently our verification is very basic, and most of `VerifyLoopOptimizations` is still commented out, because there are violations somewhere. Maybe we cannot get around doing `VerifyLoopOptimizations` and the documentation of `major_progress` together. Still: it could be worth to at least add some documentation, even if we do not have 100% confidence. We should at least write down what we do think we know, and put a caveat that we are not sure, and need to improve the situation in the future. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27912#issuecomment-3435714750 From fandreuzzi at openjdk.org Thu Oct 23 08:44:44 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Thu, 23 Oct 2025 08:44:44 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v6] In-Reply-To: References: Message-ID: > I propose to amend `nmethod::is_cold` to let GC collect not-entrant native `nmethod` instances. > > Passes tier1 and tier2 (fastdebug). Francesco Andreuzzi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: - cc - Merge branch 'master' into JDK-8369219 - nn - update foundOne - fix summary - nn - Merge branch 'master' into JDK-8369219 - trigger - nn - othervm - ... 
and 5 more: https://git.openjdk.org/jdk/compare/61d601ae...b6d94cf8 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27742/files - new: https://git.openjdk.org/jdk/pull/27742/files/98754ee8..b6d94cf8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27742&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27742&range=04-05 Stats: 32964 lines in 799 files changed: 19857 ins; 8954 del; 4153 mod Patch: https://git.openjdk.org/jdk/pull/27742.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27742/head:pull/27742 PR: https://git.openjdk.org/jdk/pull/27742 From aph at openjdk.org Thu Oct 23 08:47:03 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 23 Oct 2025 08:47:03 GMT Subject: RFR: 8370389: JavaFrameAnchor on s390 has unnecessary barriers In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 07:21:45 GMT, Amit Kumar wrote: > No hardware barriers are necessary. All members are volatile and the profiler is run from a signal handler. Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27930#pullrequestreview-3368847875 From epeter at openjdk.org Thu Oct 23 08:47:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 08:47:04 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Sun, 19 Oct 2025 15:46:06 GMT, Hannes Greule wrote: > The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. > > Please let me know what you think. Manuel and I discussed in the office a little more :) Can you show us a concrete example, where `Div` gets `Idealized` early, and then the generated nodes do not propagate the value range sufficiently precise for the comparison to constant fold? I suspect that it is the value range "truncation" on the lower bits that are lost in `MulHiLNode`, but it would be nice to see that example ;) Because if there is a solution that just improves the `Value` of the mul/shift/... nodes, that would probably be preferable. But if we in the end need to build a `Value` optimization that pattern matches again through the nodes that `transform_int_divide` generated, that would probably be less nice, given the complexity. And then we should do the delay. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3435782079 From stuefe at openjdk.org Thu Oct 23 09:01:14 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Oct 2025 09:01:14 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods [v4] In-Reply-To: References: Message-ID: On Tue, 7 Oct 2025 17:45:38 GMT, Martin Doerr wrote: >> We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it. >> >> We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses. 
>> >> I've tested this proposal by the following code on x86_64: >> >> diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp >> index a6b4efbe4f2..d715e69c850 100644 >> --- a/src/hotspot/cpu/x86/interp_masm_x86.cpp >> +++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp >> @@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() { >> void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) { >> prepare_to_jump_from_interpreted(); >> >> + if (UseNewCode) { >> + Label ok; >> + movptr(temp, Address(method, Method::from_interpreted_offset())); >> + cmpptr(temp, Address(method, Method::interpreter_entry_offset())); >> + je(ok); >> + movptr(rax, Address(method, Method::from_compiled_offset())); >> + movptr(rbx, rax); >> + addptr(rbx, 128); >> + hlt(); >> + bind(ok); >> + } >> + >> if (JvmtiExport::can_post_interpreter_events()) { >> Label run_compiled_code; >> // JVMTI events, such as single-stepping, are implemented partly by avoiding running >> >> >> The output is (requires hsdis library, otherwise we only get the hex dump): >> >> RAX=0x00007fa3e072c100 is at entry_point+0 in (nmethod*)0x00007fa3e072c008 >> Compiled method (c1) 2521 1 3 java.lang.Byte::toUnsignedInt (6 bytes) >> total in heap [0x00007fa3e072c008,0x00007fa3e072c1f8] = 496 >> main code [0x00007fa3e072c100,0x00007fa3e072c1b8] = 184 >> stub code [0x00007fa3e072c1b8,0x00007fa3e072c1f8] = 64 >> mutable data [0x00007fa37c0160a0,0x00007fa37c0160d0] = 48 >> relocation [0x00007fa37c0160a0,0x00007fa37c0160c8] = 40 >> metadata [0x00007fa37c0160c8,0x00007fa37c0160d0] = 8 >> immutable data [0x00007fa37c015cc0,0x00007fa37c015d24] = 100 >> dependencies [0x00007fa37c015cc0,0x00007fa37c015... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Use frame_complete_offset for better start address computation. Improve comments. src/hotspot/share/code/nmethod.cpp line 4026: > 4024: if (iter.has_current()) end = iter.addr(); > 4025: } > 4026: IIUC, the size of the printout is somewhat random. In the extreme cases, this may be either (close to) start-of-method to end-of-method, so almost the whole method. Or, it may be from an address very close to the address, so a very small snippet. Tying the end address to a relocation is not strictly necessary, no? We could just print to `MIN2(code end, addr + 64)? Disassembler should be fine if the printout stops in the middle of an instruction, as long as instruction addresses are correct? And could we start printing at the relocation preceding-or-at `addr - 64` instead, to ensure we have at least 64 bytes of printout before the crash address? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27530#discussion_r2454415847 From rcastanedalo at openjdk.org Thu Oct 23 09:18:46 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 23 Oct 2025 09:18:46 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v7] In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 15:35:08 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. 
That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. >> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > review I agree with the current solution, thanks Marc for considering alternative approaches and thanks Christian for putting in the work to find non-stress reproducers! Regarding the tests, please consider - merging all test cases into a single file (after all, we are testing different variations of the same scenario); - removing all outdated references to stress peeling (i.e. we now know better than "It seems to happen only with stress peeling"), including file names; and - removing redundant test summaries. If the tests are slow to run (are they?) and you want to avoid running all of them under all different configurations after merging them into a single file, you can simply pass as an argument to `main` which test you want to run and use a switch statement or similar to run only that test, see e.g. https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/codegen/TestGCMStorePlacement.java. Also, please make sure to run extended testing before integration (e.g. additional rounds of fuzzing), it is easy to miss corner cases when we introduce new assertions etc. src/hotspot/share/opto/loopTransform.cpp line 1929: > 1927: return; > 1928: } > 1929: // Zero-trip test uses an 'opaque' node which is not shared, otherwise bailout. 
Suggestion: // Zero-trip test uses an 'opaque' node which is not shared, otherwise bail out. src/hotspot/share/opto/loopnode.cpp line 4770: > 4768: #ifdef ASSERT > 4769: // See PhaseIdealLoop::do_unroll > 4770: // This property is desirable, but it maybe not hold after cloning a loop. Suggestion: // This property is desirable, but it may not hold after cloning a loop. src/hotspot/share/opto/loopnode.cpp line 4771: > 4769: // See PhaseIdealLoop::do_unroll > 4770: // This property is desirable, but it maybe not hold after cloning a loop. > 4771: // In such a case, we bail out from unrolling, and rely on IGVN to cleanup stuff. Suggestion: // In such a case, we bail out from unrolling, and rely on IGVN to clean up stuff. src/hotspot/share/opto/loopnode.cpp line 4776: > 4774: // On the other hand, if this assert passes, bailing out in do_unroll means that > 4775: // this property was broken in the current round of loop optimization, which is > 4776: // acceptable. This comment would be more useful if it was a bit more precise: could you clarify (in the comment) what do you mean by "desirable", "stuff", "mess", and "bad"? ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27586#pullrequestreview-3368880961 PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2454403689 PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2454404879 PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2454405504 PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2454421215 From bkilambi at openjdk.org Thu Oct 23 09:23:33 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 23 Oct 2025 09:23:33 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: <4l5TryoeE4xWUvCtyOU4hSUArWtuXomHRuMiQfAflDM=.9faf7b8b-5135-4696-8921-d61e624d684f@github.com> References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> <4l5TryoeE4xWUvCtyOU4hSUArWtuXomHRuMiQfAflDM=.9faf7b8b-5135-4696-8921-d61e624d684f@github.com> Message-ID: On Thu, 23 Oct 2025 07:22:37 GMT, Xiaohong Gong wrote: >> `VectorStoreMask` is a opposite operation of `VectorLoadMask`. We can treat it as a layout conversion for a vector mask. It is used to convert a vector mask (either a unpacked vector or a predicate) to a packed vector status (i.e. 8-bit element size). Because, in Java API, elements of a `VectorMask` is stored into a boolean array. > >> if (Matcher::vector_mask_must_be_packed_vector(mopc, mask_vec->bottom_type()->is_vect())) { >> mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); >> } > > Is the function name `vector_mask_must_be_packed` fine to you? This looks smarter to me. Hi @XiaohongGong I am a bit confused with this condition here - `mask_vec->bottom_type()->isa_vectmask() == nullptr` So this means that `mask_vec` is not of type `TypeVectMask` right? Which means it is not a vector predicate/mask type? Then how can the `VectorStoreMaskNode` convert mask_vec predicate to a packed vector? 
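For context, the boolean-array view mentioned in the quoted explanation is visible directly in the Java Vector API. A minimal sketch (not code from the PR; the class and method names are made up, and it needs --add-modules jdk.incubator.vector):

    import jdk.incubator.vector.IntVector;
    import jdk.incubator.vector.VectorMask;
    import jdk.incubator.vector.VectorSpecies;

    class MaskBooleanView {
        static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

        // One boolean per mask lane; the "packed" in-register form discussed
        // above mirrors this with one 8-bit element per lane.
        static boolean[] roundTrip(boolean[] bits) { // bits.length >= SPECIES.length()
            VectorMask<Integer> m = VectorMask.fromArray(SPECIES, bits, 0);
            return m.toArray();
        }
    }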
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2454474880 From xgong at openjdk.org Thu Oct 23 09:55:04 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 23 Oct 2025 09:55:04 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> <4l5TryoeE4xWUvCtyOU4hSUArWtuXomHRuMiQfAflDM=.9faf7b8b-5135-4696-8921-d61e624d684f@github.com> Message-ID: <0Eavy5pO0jAAKOzZYAoRyykKNSGUDUinBBk2CUMcK1c=.a6c59caf-d3a8-45ba-a368-9849b735e854@github.com> On Thu, 23 Oct 2025 09:20:50 GMT, Bhavana Kilambi wrote: >>> if (Matcher::vector_mask_must_be_packed_vector(mopc, mask_vec->bottom_type()->is_vect())) { >>> mask_vec = gvn().transform(VectorStoreMaskNode::make(gvn(), mask_vec, elem_bt, num_elem)); >>> } >> >> Is the function name `vector_mask_must_be_packed` fine to you? This looks smarter to me. > > Hi @XiaohongGong I am a bit confused with this condition here - > > `mask_vec->bottom_type()->isa_vectmask() == nullptr` > > So this means that `mask_vec` is not of type `TypeVectMask` right? Which means it is not a vector predicate/mask type? Then how can the `VectorStoreMaskNode` convert mask_vec predicate to a packed vector? Yes, this means the mask is a type of `TypeVect`. This just happens on architectures that do not support the predicate feature like NEON. On these architectures, `VectorStoreMaskNode` will convert the unpacked vector to a packed one. Some vector mask operations' implementation works on the packed mask layout on these architectures. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2454561463 From mli at openjdk.org Thu Oct 23 10:09:09 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 10:09:09 GMT Subject: RFR: 8370454: C2 SuperWord: unsigned comparison information is lost for VectorMaskCmp In-Reply-To: <0kNAkZ3pumvLlXoKcwCXlTJN-JrNiqhiLpHmTKUDCVA=.44ea24f4-7613-4627-8caa-0881779be339@github.com> References: <0kNAkZ3pumvLlXoKcwCXlTJN-JrNiqhiLpHmTKUDCVA=.44ea24f4-7613-4627-8caa-0881779be339@github.com> Message-ID: <_5bfkmM126iAa65jiyRNzWVpgDYRXNPLNZTcD3Wswtw=.a255a514-8f7d-49df-bda9-653e1d9348f8@github.com> On Thu, 23 Oct 2025 05:36:53 GMT, Emanuel Peter wrote: >> Hi, >> Can you help to review the patch? @eme64 >> >> Currently, in SLP if we support transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned comparison information is lost, it's in CmpU, but current code only check Bool for the information. For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. >> >> This loss of unsigned comparison information blocks the optimization proposed in https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. >> >> Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. 
>> >> Thanks > > src/hotspot/share/opto/superword.cpp line 1704: > >> 1702: cmp0->Opcode() == Op_CmpUL || >> 1703: cmp0->Opcode() == Op_CmpU3 || >> 1704: cmp0->Opcode() == Op_CmpUL3; > > Maybe it is time to create a `switch` statement below ;) Yes, will fix! :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2454598742 From mdoerr at openjdk.org Thu Oct 23 10:13:50 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 23 Oct 2025 10:13:50 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods [v5] In-Reply-To: References: Message-ID: > We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it. > > We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses. > > I've tested this proposal by the following code on x86_64: > > diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp > index a6b4efbe4f2..d715e69c850 100644 > --- a/src/hotspot/cpu/x86/interp_masm_x86.cpp > +++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp > @@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() { > void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) { > prepare_to_jump_from_interpreted(); > > + if (UseNewCode) { > + Label ok; > + movptr(temp, Address(method, Method::from_interpreted_offset())); > + cmpptr(temp, Address(method, Method::interpreter_entry_offset())); > + je(ok); > + movptr(rax, Address(method, Method::from_compiled_offset())); > + movptr(rbx, rax); > + addptr(rbx, 128); > + hlt(); > + bind(ok); > + } > + > if (JvmtiExport::can_post_interpreter_events()) { > Label run_compiled_code; > // JVMTI events, such as single-stepping, are implemented partly by avoiding running > > > The output is (requires hsdis library, otherwise we only get the hex dump): > > RAX=0x00007fa3e072c100 is at entry_point+0 in (nmethod*)0x00007fa3e072c008 > Compiled method (c1) 2521 1 3 java.lang.Byte::toUnsignedInt (6 bytes) > total in heap [0x00007fa3e072c008,0x00007fa3e072c1f8] = 496 > main code [0x00007fa3e072c100,0x00007fa3e072c1b8] = 184 > stub code [0x00007fa3e072c1b8,0x00007fa3e072c1f8] = 64 > mutable data [0x00007fa37c0160a0,0x00007fa37c0160d0] = 48 > relocation [0x00007fa37c0160a0,0x00007fa37c0160c8] = 40 > metadata [0x00007fa37c0160c8,0x00007fa37c0160d0] = 8 > immutable data [0x00007fa37c015cc0,0x00007fa37c015d24] = 100 > dependencies [0x00007fa37c015cc0,0x00007fa37c015cc8] = 8 > scopes pcs [0x00007fa37c015cc8,0x00007fa37c015d08] = 64 > scopes data ... Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge remote-tracking branch 'origin' into 8368787_hs_err_nmethod_code - Ensure to print at least 64 Bytes ahead in hex dump. 
- Use frame_complete_offset for better start address computation. Improve comments. - Move printing code to nmethod.cpp. - Always print hex dump. Plus disassembly when hsdis loaded. - 8368787: Error reporting: hs_err files should print instructions when referencing code in nemthods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27530/files - new: https://git.openjdk.org/jdk/pull/27530/files/81dd1c8e..99c248f3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27530&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27530&range=03-04 Stats: 55381 lines in 1480 files changed: 33452 ins; 13573 del; 8356 mod Patch: https://git.openjdk.org/jdk/pull/27530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27530/head:pull/27530 PR: https://git.openjdk.org/jdk/pull/27530 From mli at openjdk.org Thu Oct 23 10:16:39 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 10:16:39 GMT Subject: RFR: 8370454: C2 SuperWord: unsigned comparison information is lost for VectorMaskCmp [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review the patch? @eme64 > > Currently, in SLP if we support transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned comparison information is lost, it's in CmpU, but current code only check Bool for the information. For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. > > This loss of unsigned comparison information blocks the optimization proposed in https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. > > Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. > > Thanks Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: - tests - switch ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27942/files - new: https://git.openjdk.org/jdk/pull/27942/files/6271a8e7..eee8acd8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27942&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27942&range=00-01 Stats: 202 lines in 2 files changed: 194 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/27942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27942/head:pull/27942 PR: https://git.openjdk.org/jdk/pull/27942 From mli at openjdk.org Thu Oct 23 10:16:41 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 10:16:41 GMT Subject: RFR: 8370454: C2 SuperWord: unsigned comparison information is lost for VectorMaskCmp [v2] In-Reply-To: <0kNAkZ3pumvLlXoKcwCXlTJN-JrNiqhiLpHmTKUDCVA=.44ea24f4-7613-4627-8caa-0881779be339@github.com> References: <0kNAkZ3pumvLlXoKcwCXlTJN-JrNiqhiLpHmTKUDCVA=.44ea24f4-7613-4627-8caa-0881779be339@github.com> Message-ID: On Thu, 23 Oct 2025 05:38:01 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: >> >> - tests >> - switch > > src/hotspot/share/opto/superword.cpp line 1751: > >> 1749: } >> 1750: } else if (is_unsigned) { >> 1751: mask = BoolTest::unsigned_mask(mask); > > This means more cases could now vectorize. Do we have good test cases for this? We should be able to get IR tests for this, right? 
Do x86 or aarch64 backends (or other platforms) not already have vector instructions for this? Good question! At first, I think this pr is just a preparation for my other prs, but seems it's not just an improvement, it can also fix some existing issue, and this issue is not covered by existing compiler test. I just found out that without this patch, there could failure when comparing unsigned int/long in SLP on x86, and I think the same issue should also exist on arm. I'll add extra test to reflect this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2454610981 From mdoerr at openjdk.org Thu Oct 23 10:18:08 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 23 Oct 2025 10:18:08 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods [v4] In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 08:58:22 GMT, Thomas Stuefe wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Use frame_complete_offset for better start address computation. Improve comments. > > src/hotspot/share/code/nmethod.cpp line 4026: > >> 4024: if (iter.has_current()) end = iter.addr(); >> 4025: } >> 4026: > > IIUC, the size of the printout is somewhat random. In the extreme cases, this may be either (close to) start-of-method to end-of-method, so almost the whole method. Or, it may be from an address very close to the address, so a very small snippet. > > Tying the end address to a relocation is not strictly necessary, no? We could just print to `MIN2(code end, addr + 64)? Disassembler should be fine if the printout stops in the middle of an instruction, as long as instruction addresses are correct? > > And could we start printing at the relocation preceding-or-at `addr - 64` instead, to ensure we have at least 64 bytes of printout before the crash address? Right, the size is somewhat random. Relocations seem to be the most fine-grained information we currently have. In addition, they typically point to some meaningful points in the code. This PR disassembles the smallest possible snippet around the given address using relocations as start and end. Right, having a relocation as end address is technically not strictly required. However, I've seen that the disassembler on x86 produced garbage as well when the end is not an instruction boundary. I agree with you that we usually want at least 64 Bytes ahead. On the other hand, some people don't want too much, either. See [JDK-8274986](https://bugs.openjdk.org/browse/JDK-8274986). So, I changed only the hex dump for which we can afford printing more without bloating the hs_err file too much. Please take a look at my new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27530#discussion_r2454621885 From mli at openjdk.org Thu Oct 23 10:20:15 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 10:20:15 GMT Subject: RFR: 8370454: C2 SuperWord: unsigned comparison information is lost for VectorMaskCmp [v2] In-Reply-To: <0kNAkZ3pumvLlXoKcwCXlTJN-JrNiqhiLpHmTKUDCVA=.44ea24f4-7613-4627-8caa-0881779be339@github.com> References: <0kNAkZ3pumvLlXoKcwCXlTJN-JrNiqhiLpHmTKUDCVA=.44ea24f4-7613-4627-8caa-0881779be339@github.com> Message-ID: On Thu, 23 Oct 2025 05:39:10 GMT, Emanuel Peter wrote: > @Hamlin-Li That looks like a great improvement :) Thank you for having a look! :) > All I am missing are some test cases. And if possible: IR rules ? 
Yes, I also added some tests and IR rules of course. :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27942#issuecomment-3436161113 From mhaessig at openjdk.org Thu Oct 23 10:22:17 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 23 Oct 2025 10:22:17 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v8] In-Reply-To: References: Message-ID: On Sun, 19 Oct 2025 19:20:58 GMT, Tobias Hotz wrote: >> This PR improves the value of interger division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with two additional commits since the last revision: > > - Add additional nodes to fail conditions to detect idealized/transformed DivI Nodes that did not constant fold > - Remove checks for bottom and reorganize DivI/DivL Value functions Unfortunately, testing revealed a problem in one of our internal tests, that I am unable to share. The Test is run with `-Xomp -XX:-TieredCompilation` and the error is STDOUT: 485 ConvI2L === _ 486 [[ 483 ]] #long:minint..maxint, 0u..maxulong, widen: 3 !orig=153 !jvms: b4735597::checkLong @ bci:4 (line 99) b4735597::run @ bci:19 (line 75) 152 ConL === 0 [[ 169 164 483 568 ]] #long:minlong 489 IfTrue === 488 [[ 483 490 ]] #1 !orig=161 !jvms: b4735597::checkLong @ bci:5 (line 99) b4735597::run @ bci:19 (line 75) 483 DivL === 489 152 485 [[ 482 504 ]] !orig=169 !jvms: b4735597::checkLong @ bci:5 (line 99) b4735597::run @ bci:19 (line 75) told = long:maxint..2305843009213693952, widen: 3 tnew = long:minlong..9007199254740992, 0u..maxulong, widen: 3 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/src/jdk-pr-26143/open/src/hotspot/share/opto/phaseX.cpp:2763), pid=92732, tid=92759 # fatal error: Not monotonic # # JRE version: Java(TM) SE Runtime Environment (26.0) (fastdebug build 26-internal-mhassig.open) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 26-internal-mhassig.open, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x185c479] PhaseCCP::verify_type(Node*, Type const*, Type const*)+0x169 Stack: [0x0000ffff79bf4000,0x0000ffff79df2000], sp=0x0000ffff79dec8c0, free space=2018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x15a7f0c] PhaseCCP::verify_type(Node*, Type const*, Type const*)+0x1a4 (phaseX.cpp:2763) V [libjvm.so+0x15b0350] PhaseCCP::analyze()+0x2cc (phaseX.cpp:2806) V [libjvm.so+0x9be4e8] Compile::Optimize()+0x780 (compile.cpp:2489) V [libjvm.so+0x9c0db0] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x16a4 (compile.cpp:860) V [libjvm.so+0x7ec560] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x2dc (c2compiler.cpp:147) V [libjvm.so+0x9cfacc] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xb08 
(compileBroker.cpp:2345) V [libjvm.so+0x9d09f8] CompileBroker::compiler_thread_loop()+0x638 (compileBroker.cpp:1989) V [libjvm.so+0xed1ea8] JavaThread::thread_main_inner()+0x108 (javaThread.cpp:771) V [libjvm.so+0x18455bc] Thread::call_run()+0xac (thread.cpp:243) V [libjvm.so+0x15248bc] thread_native_entry(Thread*)+0x12c (os_linux.cpp:883) C [libc.so.6+0x80b50] start_thread+0x300 test/hotspot/jtreg/compiler/c2/irTests/IntegerDivValueTests.java line 1: > 1: /* I only noticed now: please move this test out of the `irTests` directory. It is a historic mistake. Probably `compiler/igvn` would be a good location. ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/26143#pullrequestreview-3368956083 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2454453693 From epeter at openjdk.org Thu Oct 23 10:38:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 10:38:07 GMT Subject: RFR: 8370454: C2 SuperWord: unsigned comparison information is lost for VectorMaskCmp [v2] In-Reply-To: References: <0kNAkZ3pumvLlXoKcwCXlTJN-JrNiqhiLpHmTKUDCVA=.44ea24f4-7613-4627-8caa-0881779be339@github.com> Message-ID: On Thu, 23 Oct 2025 10:17:32 GMT, Hamlin Li wrote: >> @Hamlin-Li That looks like a great improvement :) >> >> All I am missing are some test cases. And if possible: IR rules ? > >> @Hamlin-Li That looks like a great improvement :) > > Thank you for having a look! :) > >> All I am missing are some test cases. And if possible: IR rules ? > > Yes, I also added some tests and IR rules of course. :) @Hamlin-Li I just looked at your Bug report [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481). Can you add a reproducer, and on what platforms you have experienced the failure? If you are fixing a bug here, we should focus on the bug only. After all, we may want to backport the bugfix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27942#issuecomment-3436224804 From mhaessig at openjdk.org Thu Oct 23 10:39:16 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 23 Oct 2025 10:39:16 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v7] In-Reply-To: References: Message-ID: <-4wD6Ft3zjiPLu4J5JaX1GPOJWKU-NwzKSLkWrD9ObI=.779a7235-773e-47bc-b313-ceb148cb4b5e@github.com> On Mon, 20 Oct 2025 13:30:15 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). 
>> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Manuel's suggestions > > Co-authored-by: Manuel H?ssig Thank you for addressing my comments. Looks good to me. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/27255#pullrequestreview-3369288657 From dlunden at openjdk.org Thu Oct 23 10:41:05 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 23 Oct 2025 10:41:05 GMT Subject: RFR: 8362832: compiler/macronodes/TestTopInMacroElimination.java hits assert(false) failed: unexpected node In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 15:53:59 GMT, Beno?t Maillard wrote: > This PR prevents hitting an assert caused by encountering `top` while following the memory > slice associated with a field when eliminating allocations in macro node elimination. This situation > is the result of another elimination (boxing node elimination) that happened at the same > macro expansion iteration. > > ### Analysis > > The issue appears in the macro expansion phase. We have a nested `synchronized` block, > with the outer block synchronizing on `new A()` and the inner one on `TestTopInMacroElimination.class`. > In the inner `synchronized` block we have an `Integer.valueOf` call in a loop. > > In `PhaseMacroExpand::eliminate_boxing_node` we are getting rid of the `Integer.valueOf` > call, as it is a non-escaping boxing node. After having eliminated the call, > `PhaseMacroExpand::process_users_of_allocation` takes care of the users of the removed node. > There, we replace usages of the fallthrough memory projection with `top`. > > In the same macro expansion iteration, we later attempt to get rid of the `new A()` allocation > in `PhaseMacroExpand::create_scalarized_object_description`. There, we have to make > sure that all safepoints can still see the object fields as if the allocation was never deleted. 
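(For illustration, the triggering shape described in this analysis is roughly the following; this is a reconstruction from the description above, not the actual jtreg test, and the loop bound is made up.)

    class TestTopInMacroElimination {
        static class A {
            Object a; // never written, so there is no Store on its field slice
        }

        static int run() {
            int sum = 0;
            synchronized (new A()) {                             // outer lock on a non-escaping allocation
                synchronized (TestTopInMacroElimination.class) { // inner lock on the class
                    for (int i = 0; i < 1_000; i++) {
                        sum += Integer.valueOf(i);               // non-escaping boxing call in a loop
                    }
                }
            }
            return sum;
        }
    }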
> For this, we attempt to find the last value on the slice of each specific field (`a` > in this case). Because field `a` is never written to, and it is not explicitely initialized, > there is no `Store` associated to it and not even a dedicated memory slice (we end up > taking the `Bot` input on `MergeMem` nodes). By going up the memory chain, we eventually > encounter the `MemBarReleaseLock` whose input was set to `top`. This is where the assert > is hit. > > ### Proposed Fix > > In the end I opted for an analog fix as the similar [JDK-8325030](https://git.openjdk.org/jdk/pull/23104). > If we get `top` from `scan_mem_chain` in `PhaseMacroExpand::value_from_mem`, then we can safely > return `top` as well. This means that the safepoint will have `top` as data input, but this will > eventually cleaned up by the next round of IGVN. > > Another (tempting) option would have been to simply return `nullptr` from `PhaseMacroExpand::value_from_mem` when encoutering `top`. However this would result in bailing > out from eliminating this allocation temporarily and effectively delaying it to a subsqequent > macro expansion round. > > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8362832) > - [x] tier1-4, plus some internal testing > > Thank you for reviewing! Thanks for the fix @benoitmaillard! > This means that the safepoint will have top as data input, but this will eventually cleaned up by the next round of IGVN. Is it valid for safepoints to even temporarily have top as data input? Even if this gets cleaned up eventually by IGVN, it seems potentially risky to have it in this state. src/hotspot/share/opto/macro.cpp line 506: > 504: } else if (mem->is_top()) { > 505: // slice is on a dead path, returning top prevents bailing out > 506: // from the elimination, and we let IGVN clean up later Suggestion: // The slice is on a dead path. Returning top prevents bailing out // from the elimination, and IGVN can later clean up. ------------- PR Review: https://git.openjdk.org/jdk/pull/27903#pullrequestreview-3369288457 PR Review Comment: https://git.openjdk.org/jdk/pull/27903#discussion_r2454676696 From mli at openjdk.org Thu Oct 23 10:48:09 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 10:48:09 GMT Subject: RFR: 8370454: C2 SuperWord: unsigned comparison information is lost for VectorMaskCmp [v2] In-Reply-To: References: <0kNAkZ3pumvLlXoKcwCXlTJN-JrNiqhiLpHmTKUDCVA=.44ea24f4-7613-4627-8caa-0881779be339@github.com> Message-ID: On Thu, 23 Oct 2025 10:17:32 GMT, Hamlin Li wrote: >> @Hamlin-Li That looks like a great improvement :) >> >> All I am missing are some test cases. And if possible: IR rules ? > >> @Hamlin-Li That looks like a great improvement :) > > Thank you for having a look! :) > >> All I am missing are some test cases. And if possible: IR rules ? > > Yes, I also added some tests and IR rules of course. :) > @Hamlin-Li I just looked at your Bug report [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481). Can you add a reproducer, and on what platforms you have experienced the failure? Please check the added test cases, if it's run with JDK master, the test cases will fail on x86. > If you are fixing a bug here, we should focus on the bug only. After all, we may want to backport the bugfix. Yes. In fact at first I did not realise the existence of the bug. After read your previous question, I did more investigation and found out that I'm working a bug fix at the same time. 
I can change the title of this pr to "8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP" if you wish. Maybe I should close [JDK-8370454](https://bugs.openjdk.org/browse/JDK-8370454) at the same time? Please let me know. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27942#issuecomment-3436256229 From epeter at openjdk.org Thu Oct 23 10:53:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 10:53:05 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: <2f9k7vksJ4apEclf8z85MyLeg6bbRnhptLpsoydeTMI=.df8ad2fc-5b56-4554-982a-691eb7849a84@github.com> References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> <2f9k7vksJ4apEclf8z85MyLeg6bbRnhptLpsoydeTMI=.df8ad2fc-5b56-4554-982a-691eb7849a84@github.com> Message-ID: On Thu, 23 Oct 2025 07:31:20 GMT, Xiaohong Gong wrote: >> src/hotspot/cpu/aarch64/aarch64_vector.ad line 405: >> >>> 403: return true; >>> 404: } >>> 405: } >> >> The name suggests that if you return false here, then it is still ok to use a predicate instruction. >> The name suggests that if your return true, then you must use a predicate instruction. >> >> But then your comment for `Op_VectorLongToMask` and `Op_VectorMaskToLong` seems to suggest that we return false and do not want that a predicate instruction is used, but instead a packed vector. >> >> So now I'm a bit confused. >> >> I'm also wondering: >> Since there are two options (mask in packed vector vs predicate), does the availability of one always imply the availability of the other? Or could some platform have only one, and another platform only the other? >> >> And: can you please explain the `if (vt->isa_vectmask() == nullptr) {` check, also for the other platforms? > >> The name suggests that if you return false here, then it is still ok to use a predicate instruction. The name suggests that if your return true, then you must use a predicate instruction. >> >> But then your comment for `Op_VectorLongToMask` and `Op_VectorMaskToLong` seems to suggest that we return false and do not want that a predicate instruction is used, but instead a packed vector. >> >> So now I'm a bit confused. > > The type for a vector mask is different on architectures that supports the predicate feature or not (please see my details answer below). Hence, for some vector operations, the expected input mask register/layout is different. Please note that there are two kind of layout for a mask if it is stored in a **vector register**. It might be 1) a packed layout with 8-bit element width, or 2) a unpacked layout with 8/16/32/64-bit element width according to the vector type. For the data relative mask operations like `VectorBlend`, it is 2), while for some bit relative mask operations like `VectorMaskTrueCount, VectorMaskFirstTrue, toLong, fromLong, ...`, it is 1) , because the implementation will be more efficient. > > My intention is to use this function guide what the expected IR is generated for a vector mask operation. Before this patch, mid-end do the difference by just checking the type of a vector mask, as it assumes the predicate instruction will be generated for a predicate type, while the vector instructions are generated for a vector type. However, as I mentioned in this PR, some mask operations might not support native predicate instructions on predicate architectures. 
Instead, they are implemented with the same vector instructions like NEON. We have to do the mask layout conversion inside codegen, which is in-efficient. Generating the same IR pattern like NEON is more efficient. > > So, if this function returns false, it means the input/output mask for a specified opcode requires to be saved into a vector register with the packed layout, even the architecture supports predicate feature. This is decided by the IR's implementation. > >> >> I'm also wondering: Since there are two options (mask in packed vector vs predicate), does the availability of one always imply the availability of the other? Or could some platform have only one, and another platform only the other? >> > > There are three kind of options for a mask: 1) packed vector with 8-bit element size, 2) unpacked vector with 8/16/32/64-bit element size, and 3) predicate. > > 1) The packed vect... Thanks for all the explanations! Do you think some of that could be moved to code comments? I think that would be quite helpful. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2454705672 From epeter at openjdk.org Thu Oct 23 10:53:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 10:53:07 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: <0Eavy5pO0jAAKOzZYAoRyykKNSGUDUinBBk2CUMcK1c=.a6c59caf-d3a8-45ba-a368-9849b735e854@github.com> References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> <4l5TryoeE4xWUvCtyOU4hSUArWtuXomHRuMiQfAflDM=.9faf7b8b-5135-4696-8921-d61e624d684f@github.com> <0Eavy5pO0jAAKOzZYAoRyykKNSGUDUinBBk2CUMcK1c=.a6c59caf-d3a8-45ba-a368-9849b735e854@github.com> Message-ID: <20otCwtIy0nolE3sBw3b-YoXUa3SL_xsBKB9Cw_kD2o=.62cc0b37-9b0c-4ecb-9509-5c48ac4f3a75@github.com> On Thu, 23 Oct 2025 09:52:10 GMT, Xiaohong Gong wrote: >> Hi @XiaohongGong I am a bit confused with this condition here - >> >> `mask_vec->bottom_type()->isa_vectmask() == nullptr` >> >> So this means that `mask_vec` is not of type `TypeVectMask` right? Which means it is not a vector predicate/mask type? Then how can the `VectorStoreMaskNode` convert mask_vec predicate to a packed vector? > > Yes, this means the mask is a type of `TypeVect`. This just happens on architectures that do not support the predicate feature like NEON. On these architectures, `VectorStoreMaskNode` will convert the unpacked vector to a packed one. Some vector mask operations' implementation works on the packed mask layout on these architectures. @XiaohongGong thanks for all he explanations. From what you say, it seems that `vector_mask_must_be_packed` is good. 
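For readers following the thread, a minimal usage sketch of the two operations this PR optimizes (not code from the PR; the class and method names are made up, and it needs --add-modules jdk.incubator.vector):

    import jdk.incubator.vector.IntVector;
    import jdk.incubator.vector.VectorMask;
    import jdk.incubator.vector.VectorSpecies;

    class MaskLongRoundTrip {
        static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

        // Bit i of 'bits' becomes lane i of the mask; toLong() packs the lanes
        // back into the low SPECIES.length() bits of the result.
        static long roundTrip(long bits) {
            VectorMask<Integer> m = VectorMask.fromLong(SPECIES, bits);
            return m.toLong();
        }
    }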
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2454698901 From epeter at openjdk.org Thu Oct 23 10:53:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 10:53:07 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: <20otCwtIy0nolE3sBw3b-YoXUa3SL_xsBKB9Cw_kD2o=.62cc0b37-9b0c-4ecb-9509-5c48ac4f3a75@github.com> References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> <4l5TryoeE4xWUvCtyOU4hSUArWtuXomHRuMiQfAflDM=.9faf7b8b-5135-4696-8921-d61e624d684f@github.com> <0Eavy5pO0jAAKOzZYAoRyykKNSGUDUinBBk2CUMcK1c=.a6c59caf-d3a8-45ba-a368-9849b735e854@github.com> <20otCwtIy0nolE3sBw3b-YoXUa3SL_xsBKB9Cw_kD2o=.62cc0b37-9b0c-4ecb-9509-5c48ac4f3a75@github.com> Message-ID: <4Q3oTRS_-2DbBIJzsz3u67tZfA8joIiMJ4x0z4rZqlo=.4e238026-4cf9-4564-b5d3-1539062736f6@github.com> On Thu, 23 Oct 2025 10:47:15 GMT, Emanuel Peter wrote: >> Yes, this means the mask is a type of `TypeVect`. This just happens on architectures that do not support the predicate feature like NEON. On these architectures, `VectorStoreMaskNode` will convert the unpacked vector to a packed one. Some vector mask operations' implementation works on the packed mask layout on these architectures. > > @XiaohongGong thanks for all he explanations. From what you say, it seems that `vector_mask_must_be_packed` is good. > VectorStoreMask is a opposite operation of VectorLoadMask. We can treat it as a layout conversion for a vector mask. It is used to convert a vector mask (either a unpacked vector or a predicate) to a packed vector status (i.e. 8-bit element size). Because, in Java API, elements of a VectorMask is stored into a boolean array. Thanks for the explanation! So it really only does the conversion, right? And no loading / storing? If that is true, we may want to rename them to `ConvPredicate2PackedVectorMaskNode`, or alike. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2454703914 From stuefe at openjdk.org Thu Oct 23 11:09:09 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Oct 2025 11:09:09 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods [v5] In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 10:13:50 GMT, Martin Doerr wrote: >> We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it. >> >> We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses. 
>> >> I've tested this proposal by the following code on x86_64: >> >> diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp >> index a6b4efbe4f2..d715e69c850 100644 >> --- a/src/hotspot/cpu/x86/interp_masm_x86.cpp >> +++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp >> @@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() { >> void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) { >> prepare_to_jump_from_interpreted(); >> >> + if (UseNewCode) { >> + Label ok; >> + movptr(temp, Address(method, Method::from_interpreted_offset())); >> + cmpptr(temp, Address(method, Method::interpreter_entry_offset())); >> + je(ok); >> + movptr(rax, Address(method, Method::from_compiled_offset())); >> + movptr(rbx, rax); >> + addptr(rbx, 128); >> + hlt(); >> + bind(ok); >> + } >> + >> if (JvmtiExport::can_post_interpreter_events()) { >> Label run_compiled_code; >> // JVMTI events, such as single-stepping, are implemented partly by avoiding running >> >> >> The output is (requires hsdis library, otherwise we only get the hex dump): >> >> RAX=0x00007f1a75000100 is at entry_point+0 in (nmethod*)0x00007f1a75000008 >> Compiled method (c1) 2504 1 3 java.lang.Byte::toUnsignedInt (6 bytes) >> total in heap [0x00007f1a75000008,0x00007f1a750001f8] = 496 >> main code [0x00007f1a75000100,0x00007f1a750001b8] = 184 >> stub code [0x00007f1a750001b8,0x00007f1a750001f8] = 64 >> mutable data [0x00007f1a1001e0b0,0x00007f1a1001e0e0] = 48 >> relocation [0x00007f1a1001e0b0,0x00007f1a1001e0d8] = 40 >> metadata [0x00007f1a1001e0d8,0x00007f1a1001e0e0] = 8 >> immutable data [0x00007f1a1001dcd0,0x00007f1a1001dd30] = 96 >> dependencies [0x00007f1a1001dcd0,0x00007f1a1001dc... > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge remote-tracking branch 'origin' into 8368787_hs_err_nmethod_code > - Ensure to print at least 64 Bytes ahead in hex dump. > - Use frame_complete_offset for better start address computation. Improve comments. > - Move printing code to nmethod.cpp. > - Always print hex dump. Plus disassembly when hsdis loaded. > - 8368787: Error reporting: hs_err files should print instructions when referencing code in nemthods Looks good to me. Thanks for taking my comment into account. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27530#pullrequestreview-3369392877 From hgreule at openjdk.org Thu Oct 23 11:25:04 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 23 Oct 2025 11:25:04 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 08:44:19 GMT, Emanuel Peter wrote: > Manuel and I discussed in the office a little more :) > > Can you show us a concrete example, where `Div` gets `Idealized` early, and then the generated nodes do not propagate the value range sufficiently precise for the comparison to constant fold? > > I suspect that it is the value range "truncation" on the lower bits that are lost in `MulHiLNode`, but it would be nice to see that example ;) > > Because if there is a solution that just improves the `Value` of the mul/shift/... nodes, that would probably be preferable. 
> > But if we in the end need to build a `Value` optimization that pattern matches again through the nodes that `transform_int_divide` generated, that would probably be less nice, given the complexity. And then we should do the delay. One very straightforward example would be something like static boolean divFold(int a) { return a / 100_000 >= 21475; } which isn't folded to `false` with early idealization but works with the changes from this PR and #26143 both applied. From my analysis, this comes from the rounding adjustments: We need to round towards zero, so we need to add 1 (=subtract -1) for negative values. We achieve that by a right shift to produce either a 0 or a -1 and then do the subtraction with that value. [screenshot of the generated nodes omitted] The subtraction isn't aware of the relation between the param being negative and the adjustment, and as you said, to recognize that relation, you'd more or less need to recognize that these operations form a division. Now, I *think* this is the only case, and it's only off by 1 (and if the sign of the dividend is known, it also isn't a problem), so I'm wondering if there are any common patterns where this would be relevant, otherwise it might really make sense to just delay Mod and accept this edge case for Div. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3436423466 From epeter at openjdk.org Thu Oct 23 11:25:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 11:25:05 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v2] In-Reply-To: References: <0kNAkZ3pumvLlXoKcwCXlTJN-JrNiqhiLpHmTKUDCVA=.44ea24f4-7613-4627-8caa-0881779be339@github.com> Message-ID: On Thu, 23 Oct 2025 10:45:23 GMT, Hamlin Li wrote: >>> @Hamlin-Li That looks like a great improvement :) >> >> Thank you for having a look! :) >> >>> All I am missing are some test cases. And if possible: IR rules ? >> >> Yes, I also added some tests and IR rules of course. :) > >> @Hamlin-Li I just looked at your Bug report [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481). Can you add a reproducer, and on what platforms you have experienced the failure? > > Please check the added test cases, if it's run with JDK master, the test cases will fail on x86. > >> If you are fixing a bug here, we should focus on the bug only. After all, we may want to backport the bugfix. > > Yes. In fact at first I did not realise the existence of the bug. After reading your previous question, I did more investigation and found out that I'm working on a bug fix at the same time. I can change the title of this pr to "8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP" if you wish. > Maybe I should close [JDK-8370454](https://bugs.openjdk.org/browse/JDK-8370454) at the same time? > Please let me know. @Hamlin-Li Nice, thanks for filing [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481). I now extracted a reproducer from your PR here, and attached it to the JIRA issue. Yes, I think it would be better if you changed the title to: `8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP` I think you could already close [JDK-8370454](https://bugs.openjdk.org/browse/JDK-8370454) as a duplicate of the bug report :) Thanks very much for finding this, and even more for fixing it ?
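For readers skimming the thread, the failing pattern boils down to a loop of roughly this shape (an illustrative sketch only, not the exact reproducer attached to the JIRA issue; the array names and the constants cc/dd are made up):

    // Sketch of a loop of the kind SuperWord may turn into
    // VectorMaskCmp + VectorBlend when CMove vectorization applies.
    // The compare is unsigned, so an element like a[i] = -1 (0xFFFFFFFF)
    // must compare as larger than b[i] = 1. If the unsigned-ness of the
    // CmpU is dropped when building the VectorMaskCmp, the vectorized
    // loop selects the wrong value for such elements.
    static void cmoveUnsignedGT(int[] a, int[] b, int[] r, int cc, int dd) {
        for (int i = 0; i < a.length; i++) {
            r[i] = Integer.compareUnsigned(a[i], b[i]) > 0 ? cc : dd;
        }
    }

Comparing the output of such a loop with and without -XX:+UseSuperWord, after enough warmup for C2 to compile it, is one way to observe wrong values of this kind.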
------------- PR Comment: https://git.openjdk.org/jdk/pull/27942#issuecomment-3436390144 From epeter at openjdk.org Thu Oct 23 11:25:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 11:25:09 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v2] In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 10:16:39 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? @eme64 >> >> Currently, in SLP if we support transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned comparison information is lost, it's in CmpU, but current code only check Bool for the information. For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. >> >> This loss of unsigned comparison information blocks the optimization proposed in https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. >> >> Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. >> >> Thanks > > Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: > > - tests > - switch You should also change the PR description, especially you should describe what went wrong at what point. Well, you mostly already explain. I think the issue is that we don't really carry the "unsigned-ness" of the comparison, and then end up doing signed instead of unsigned comparison... src/hotspot/share/opto/superword.cpp line 1755: > 1753: mask = BoolTest::unsigned_mask(mask); > 1754: break; > 1755: } // switch Given that we just missed some cases in the old "if": you should definitively have a `default` case, that hits an assert. That way, we don't have missing cases that silently do the wrong thing. test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 743: > 741: r[i] = Integer.compareUnsigned(a[i], b[i]) > 0 ? cc : dd; > 742: } > 743: } Why does this not vectorize, but `testCMoveUIGTforF` does? Is that not strange? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27942#issuecomment-3436400459 PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2454782958 PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2454786435 From mli at openjdk.org Thu Oct 23 11:25:09 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 11:25:09 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v2] In-Reply-To: References: <0kNAkZ3pumvLlXoKcwCXlTJN-JrNiqhiLpHmTKUDCVA=.44ea24f4-7613-4627-8caa-0881779be339@github.com> Message-ID: On Thu, 23 Oct 2025 10:45:23 GMT, Hamlin Li wrote: >>> @Hamlin-Li That looks like a great improvement :) >> >> Thank you for having a look! :) >> >>> All I am missing are some test cases. And if possible: IR rules ? >> >> Yes, I also added some tests and IR rules of course. :) > >> @Hamlin-Li I just looked at your Bug report [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481). Can you add a reproducer, and on what platforms you have experienced the failure? > > Please check the added test cases, if it's run with JDK master, the test cases will fail on x86. > >> If you are fixing a bug here, we should focus on the bug only. 
After all, we may want to backport the bugfix. > > Yes. In fact at first I did not realise the existence of the bug. After read your previous question, I did more investigation and found out that I'm working a bug fix at the same time. I can change the title of this pr to "8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP" if you wish. > Maybe I should close [JDK-8370454](https://bugs.openjdk.org/browse/JDK-8370454) at the same time? > Please let me know. > @Hamlin-Li Nice, thanks for filing [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481). I now extracted a reproducer from your PR here, and attached it to the JIRA issue. Thank you for doing it! > Yes, I think it would be better if you changed the title to: `8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP` > > I think you could already close [JDK-8370454](https://bugs.openjdk.org/browse/JDK-8370454) as a duplicate of the bug report :) It's done! > Thanks very much for finding this, and even more for fixing it ? Thank you! :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27942#issuecomment-3436418812 From epeter at openjdk.org Thu Oct 23 11:42:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 11:42:06 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Sun, 19 Oct 2025 15:46:06 GMT, Hannes Greule wrote: > The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. > > Please let me know what you think. Thanks very much for the explanation and the nice graph ? That helps a lot. It also means that even for cases like @mhaessig showed above: https://github.com/openjdk/jdk/pull/27886#issuecomment-3432972151 We could still constant fold the comparison.... as long as the comparison is "relaxed enough". It might be worth having a handfull of examples like that: some that still constant fold, and some that don't because the comparison is too "sharp", and the "rounding error" too large. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3436481188 From epeter at openjdk.org Thu Oct 23 11:42:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 11:42:07 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 07:21:03 GMT, Hannes Greule wrote: >> src/hotspot/share/opto/divnode.cpp line 545: >> >>> 543: >>> 544: // Keep this node as-is for now; we want Value() and >>> 545: // other optimizations checking for this node type to work >> >> Do we only need `Value` done first on the `Div` node, or also on uses of it? >> It might be worth explaining it in a bit more detail here. >> >> If it was just about calling `Value` on the `Div` first, we could probably check what `Value` returns here. But I fear that is not enough, right? Because it is the `Value` here that returns some range, and then some use sees that this range has specific characteristics, and can constant fold a comparison, for example. Did I get this right? > > So, the *main* reason why I'm including Div here is mainly because of #26143; before that the DivI/LNode::Value() is actually less precise than Value on the nodes created by `transform_int_divide`. With #26143, some results are more precise even for constant divisors. In such case, uses can benefit from seeing the (then) more precise range. 
(@ichttt found a case where the replacement fails to constant-fold, but that's just due to missing constant folding in MulHiLNode) > > A secondary reason is other optimizations checking for Div inputs, though I didn't find any existing check that would actually benefit. There *might* be optimization opportunities that want to detect division, but that's just > > Generally from what I've found the benefit is bigger for Mod nodes, because there calling Value on the replacements is significantly worse. And there we also encounter typical usages in combination with range checks. > > Do you want me to expand both Div and Mod comments to cover more concrete benefits, depending on the operation? Yes, I think it would make sense to have an explanation at both ends. Your nice example with the "rounding error" of 0..1 for `Div` makes a lot of sense. Seeing a similar example for `Mod` (where it could be worse, you say) would also be nice ? You can copy the comments for the I/L cases, or only put it at one of them, and link from the other. There is an issue with a PR that refactors mod/div so that we only have one implementation each, and they can clean this up. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27886#discussion_r2454825775 From mdoerr at openjdk.org Thu Oct 23 11:58:06 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 23 Oct 2025 11:58:06 GMT Subject: RFR: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods [v5] In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 10:13:50 GMT, Martin Doerr wrote: >> We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it. >> >> We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses. 
>> >> I've tested this proposal by the following code on x86_64: >> >> diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp >> index a6b4efbe4f2..d715e69c850 100644 >> --- a/src/hotspot/cpu/x86/interp_masm_x86.cpp >> +++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp >> @@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() { >> void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) { >> prepare_to_jump_from_interpreted(); >> >> + if (UseNewCode) { >> + Label ok; >> + movptr(temp, Address(method, Method::from_interpreted_offset())); >> + cmpptr(temp, Address(method, Method::interpreter_entry_offset())); >> + je(ok); >> + movptr(rax, Address(method, Method::from_compiled_offset())); >> + movptr(rbx, rax); >> + addptr(rbx, 128); >> + hlt(); >> + bind(ok); >> + } >> + >> if (JvmtiExport::can_post_interpreter_events()) { >> Label run_compiled_code; >> // JVMTI events, such as single-stepping, are implemented partly by avoiding running >> >> >> The output is (requires hsdis library, otherwise we only get the hex dump): >> >> RAX=0x00007f1a75000100 is at entry_point+0 in (nmethod*)0x00007f1a75000008 >> Compiled method (c1) 2504 1 3 java.lang.Byte::toUnsignedInt (6 bytes) >> total in heap [0x00007f1a75000008,0x00007f1a750001f8] = 496 >> main code [0x00007f1a75000100,0x00007f1a750001b8] = 184 >> stub code [0x00007f1a750001b8,0x00007f1a750001f8] = 64 >> mutable data [0x00007f1a1001e0b0,0x00007f1a1001e0e0] = 48 >> relocation [0x00007f1a1001e0b0,0x00007f1a1001e0d8] = 40 >> metadata [0x00007f1a1001e0d8,0x00007f1a1001e0e0] = 8 >> immutable data [0x00007f1a1001dcd0,0x00007f1a1001dd30] = 96 >> dependencies [0x00007f1a1001dcd0,0x00007f1a1001dc... > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge remote-tracking branch 'origin' into 8368787_hs_err_nmethod_code > - Ensure to print at least 64 Bytes ahead in hex dump. > - Use frame_complete_offset for better start address computation. Improve comments. > - Move printing code to nmethod.cpp. > - Always print hex dump. Plus disassembly when hsdis loaded. > - 8368787: Error reporting: hs_err files should print instructions when referencing code in nemthods Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27530#issuecomment-3436535832 From mli at openjdk.org Thu Oct 23 12:41:11 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 12:41:11 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v2] In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 11:20:31 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: >> >> - tests >> - switch > > src/hotspot/share/opto/superword.cpp line 1755: > >> 1753: mask = BoolTest::unsigned_mask(mask); >> 1754: break; >> 1755: } // switch > > Given that we just missed some cases in the old "if": you should definitively have a `default` case, that hits an assert. That way, we don't have missing cases that silently do the wrong thing. Not sure I understand you correctly. Do you mean add some code like below? 
@@ -1752,6 +1752,8 @@ VTransformBoolTest PackSet::get_bool_test(const Node_List* bool_pack) const { case Op_CmpUL3: mask = BoolTest::unsigned_mask(mask); break; + default: + ShouldNotReachHere(); // or assert } // switch But besides of Op_CmpF/D, Op_CmpUxx, there could be other Cmp ops call into `PackSet::get_bool_test`. Or maybe I misunderstood you? > test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 743: > >> 741: r[i] = Integer.compareUnsigned(a[i], b[i]) > 0 ? cc : dd; >> 742: } >> 743: } > > Why does this not vectorize, but `testCMoveUIGTforF` does? Is that not strange? These test cases are converted from signed ones above, for example `testCMoveUIGTforI` is from `testCMoveIGTforI`. To answer your question, I think the reason is at the comment above `testCMoveIGTforI`. In https://github.com/openjdk/jdk/pull/25336, I'm going to enable this vectorization, but previously https://github.com/openjdk/jdk/pull/25336 was blocked by unsigned comparison issue (check the discussion at: https://github.com/openjdk/jdk/pull/25336#discussion_r2123518238). With this pr pushed in, I think I can restart the work of https://github.com/openjdk/jdk/pull/25336. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2454988006 PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2454985371 From bkilambi at openjdk.org Thu Oct 23 12:57:45 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 23 Oct 2025 12:57:45 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> Message-ID: On Wed, 22 Oct 2025 04:15:26 GMT, Xiaohong Gong wrote: >> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. >> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. 
>> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. >> >> Here is the performance data with `-XX:UseSVE=2`: >> >> Benchmark bits inputs Mode Unit Before After Gain >> MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 >> MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 >> MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 >> MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 >> MaskQueryOperations... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Move function comments to matcher.hpp > - Merge 'jdk:master' into JDK-8367292 > - 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 1432: > 1430: } > 1431: > 1432: // The function is same as above "sve_vmask_tolong", but it uses SVE2's BDEP SVE's "BEXT" instruction? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2455038697 From epeter at openjdk.org Thu Oct 23 13:27:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 13:27:30 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v2] In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 12:38:11 GMT, Hamlin Li wrote: >> src/hotspot/share/opto/superword.cpp line 1755: >> >>> 1753: mask = BoolTest::unsigned_mask(mask); >>> 1754: break; >>> 1755: } // switch >> >> Given that we just missed some cases in the old "if": you should definitively have a `default` case, that hits an assert. That way, we don't have missing cases that silently do the wrong thing. > > Not sure I understand you correctly. > Do you mean add some code like below? > > @@ -1752,6 +1752,8 @@ VTransformBoolTest PackSet::get_bool_test(const Node_List* bool_pack) const { > case Op_CmpUL3: > mask = BoolTest::unsigned_mask(mask); > break; > + default: > + ShouldNotReachHere(); // or assert > } // switch > > But besides of Op_CmpF/D, Op_CmpUxx, there could be other Cmp ops call into `PackSet::get_bool_test`. > > Or maybe I misunderstood you? There could be, but then we should explicitly name them, and make sure that they are all ok. Just see what cases you need to add, until all testing passes. I hope it would only be a handfull of cmp nodes. Doing it all explicitly means we know better that we thought about all cases. >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 743: >> >>> 741: r[i] = Integer.compareUnsigned(a[i], b[i]) > 0 ? cc : dd; >>> 742: } >>> 743: } >> >> Why does this not vectorize, but `testCMoveUIGTforF` does? Is that not strange? > > These test cases are converted from signed ones above, for example `testCMoveUIGTforI` is from `testCMoveIGTforI`. > > To answer your question, I think the reason is at the comment above `testCMoveIGTforI`. 
> In https://github.com/openjdk/jdk/pull/25336, I'm going to enable this vectorization, but previously https://github.com/openjdk/jdk/pull/25336 was blocked by unsigned comparison issue (check the discussion at: https://github.com/openjdk/jdk/pull/25336#discussion_r2123518238). > With this pr pushed in, I think I can restart the work of https://github.com/openjdk/jdk/pull/25336. Hmm ok, sounds good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455122473 PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455127584 From mli at openjdk.org Thu Oct 23 13:44:41 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 13:44:41 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v3] In-Reply-To: References: Message-ID: <0eno5QZSEwsIACJpvJX2CoCVWiEYKEGOGwas6Z1oy48=.a897eee7-a5c7-4fd2-902b-e1a0a6e1d805@github.com> > Hi, > Can you help to review the patch? @eme64 > > ## Issue > > Currently, in SLP when transform from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned-ness in CmpU is lost, then end up doing a signed instead of unsigned comparison in VectorMaskCmp. > For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. > > ## ?Fix > Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. > > Thanks > > This pr could also lead to more optimizations, like: https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: add all possible Cmp cases; add comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27942/files - new: https://git.openjdk.org/jdk/pull/27942/files/eee8acd8..2bad7aea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27942&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27942&range=01-02 Stats: 12 lines in 1 file changed: 12 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27942/head:pull/27942 PR: https://git.openjdk.org/jdk/pull/27942 From mli at openjdk.org Thu Oct 23 13:44:42 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 13:44:42 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v2] In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 13:22:36 GMT, Emanuel Peter wrote: >> Not sure I understand you correctly. >> Do you mean add some code like below? >> >> @@ -1752,6 +1752,8 @@ VTransformBoolTest PackSet::get_bool_test(const Node_List* bool_pack) const { >> case Op_CmpUL3: >> mask = BoolTest::unsigned_mask(mask); >> break; >> + default: >> + ShouldNotReachHere(); // or assert >> } // switch >> >> But besides of Op_CmpF/D, Op_CmpUxx, there could be other Cmp ops call into `PackSet::get_bool_test`. >> >> Or maybe I misunderstood you? > > There could be, but then we should explicitly name them, and make sure that they are all ok. > Just see what cases you need to add, until all testing passes. I hope it would only be a handfull of cmp nodes. > > Doing it all explicitly means we know better that we thought about all cases. 
Sure, added all the possible cases. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455181935 From epeter at openjdk.org Thu Oct 23 14:37:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 14:37:49 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v3] In-Reply-To: <0eno5QZSEwsIACJpvJX2CoCVWiEYKEGOGwas6Z1oy48=.a897eee7-a5c7-4fd2-902b-e1a0a6e1d805@github.com> References: <0eno5QZSEwsIACJpvJX2CoCVWiEYKEGOGwas6Z1oy48=.a897eee7-a5c7-4fd2-902b-e1a0a6e1d805@github.com> Message-ID: On Thu, 23 Oct 2025 13:44:41 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? @eme64 >> >> ## Issue >> >> Currently, in SLP when transform from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned-ness in CmpU is lost, then end up doing a signed instead of unsigned comparison in VectorMaskCmp. >> For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. >> >> ## ?Fix >> Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. >> >> Thanks >> >> This pr could also lead to more optimizations, like: https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add all possible Cmp cases; add comments src/hotspot/share/opto/superword.cpp line 1752: > 1750: case Op_CmpUL: > 1751: case Op_CmpU3: > 1752: case Op_CmpUL3: Do these really ever survive to here? I thought they should all be turned into other ops? Do you have an example? src/hotspot/share/opto/superword.cpp line 1758: > 1756: break; > 1757: case Op_CmpN: > 1758: case Op_CmpP: Did you ever encounter these? I'd be surprised if these work... Because they work with pointers, and we don't support pointers, only primitives. src/hotspot/share/opto/superword.cpp line 1760: > 1758: case Op_CmpP: > 1759: case Op_CmpD3: > 1760: case Op_CmpF3: These would probably have the same issue as the regular `CmpF/CmpD`, no? So are we sure we can "allow-list" them here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455362948 PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455361259 PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455365835 From epeter at openjdk.org Thu Oct 23 14:37:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 14:37:50 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v3] In-Reply-To: References: <0eno5QZSEwsIACJpvJX2CoCVWiEYKEGOGwas6Z1oy48=.a897eee7-a5c7-4fd2-902b-e1a0a6e1d805@github.com> Message-ID: <-NzrnoPmv_hPZ8bmJyJNljuolJtn4It7O7Tj_FMpiTs=.bc076285-d882-460b-ba23-2c3b2e39ea06@github.com> On Thu, 23 Oct 2025 14:33:04 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> add all possible Cmp cases; add comments > > src/hotspot/share/opto/superword.cpp line 1752: > >> 1750: case Op_CmpUL: >> 1751: case Op_CmpU3: >> 1752: case Op_CmpUL3: > > Do these really ever survive to here? I thought they should all be turned into other ops? 
Do you have an example? If these never come through here, then just let them go into the default case. > src/hotspot/share/opto/superword.cpp line 1760: > >> 1758: case Op_CmpP: >> 1759: case Op_CmpD3: >> 1760: case Op_CmpF3: > > These would probably have the same issue as the regular `CmpF/CmpD`, no? So are we sure we can "allow-list" them here? If these never come through here, then just let them go into the default case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455369138 PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455369339 From mli at openjdk.org Thu Oct 23 15:02:13 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 15:02:13 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v4] In-Reply-To: References: Message-ID: > Hi, > Can you help to review the patch? @eme64 > > ## Issue > > Currently, in SLP when transform from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned-ness in CmpU is lost, then end up doing a signed instead of unsigned comparison in VectorMaskCmp. > For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. > > ## ?Fix > Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. > > Thanks > > This pr could also lead to more optimizations, like: https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: catch unexpected Cmp ops ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27942/files - new: https://git.openjdk.org/jdk/pull/27942/files/2bad7aea..34fafd2c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27942&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27942&range=02-03 Stats: 8 lines in 1 file changed: 1 ins; 7 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27942/head:pull/27942 PR: https://git.openjdk.org/jdk/pull/27942 From mli at openjdk.org Thu Oct 23 15:02:15 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 15:02:15 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v2] In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 11:17:04 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: >> >> - tests >> - switch > > You should also change the PR description, especially you should describe what went wrong at what point. > > > Well, you mostly already explain. I think the issue is that we don't really carry the "unsigned-ness" of the comparison, and then end up doing signed instead of unsigned comparison... @eme64 I just keep Op_CmpI/L. Test running, will update the result or code accordingly later. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27942#issuecomment-3437491323 From mli at openjdk.org Thu Oct 23 15:02:18 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 15:02:18 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v3] In-Reply-To: <-NzrnoPmv_hPZ8bmJyJNljuolJtn4It7O7Tj_FMpiTs=.bc076285-d882-460b-ba23-2c3b2e39ea06@github.com> References: <0eno5QZSEwsIACJpvJX2CoCVWiEYKEGOGwas6Z1oy48=.a897eee7-a5c7-4fd2-902b-e1a0a6e1d805@github.com> <-NzrnoPmv_hPZ8bmJyJNljuolJtn4It7O7Tj_FMpiTs=.bc076285-d882-460b-ba23-2c3b2e39ea06@github.com> Message-ID: <68GzmO5aZZv0sVilic-LTdJ1mSVo6zQDgkgs-YJkRK8=.dd91ff90-4c49-4a71-8e52-b75ef4343285@github.com> On Thu, 23 Oct 2025 14:35:03 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 1752: >> >>> 1750: case Op_CmpUL: >>> 1751: case Op_CmpU3: >>> 1752: case Op_CmpUL3: >> >> Do these really ever survive to here? I thought they should all be turned into other ops? Do you have an example? > > If these never come through here, then just let them go into the default case. will fix. >> src/hotspot/share/opto/superword.cpp line 1760: >> >>> 1758: case Op_CmpP: >>> 1759: case Op_CmpD3: >>> 1760: case Op_CmpF3: >> >> These would probably have the same issue as the regular `CmpF/CmpD`, no? So are we sure we can "allow-list" them here? > > If these never come through here, then just let them go into the default case. will fix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455463230 PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455464693 From hgreule at openjdk.org Thu Oct 23 15:09:00 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 23 Oct 2025 15:09:00 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 11:39:13 GMT, Emanuel Peter wrote: > We could still constant fold the comparison.... as long as the comparison is "relaxed enough". It might be worth having a handfull of examples like that: some that still constant fold, and some that don't because the comparison is too "sharp", and the "rounding error" too large. What do you think? Do you mean as part of the comment? That should be doable and provide useful context, yes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3437522539 From mli at openjdk.org Thu Oct 23 15:02:20 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 15:02:20 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v3] In-Reply-To: References: <0eno5QZSEwsIACJpvJX2CoCVWiEYKEGOGwas6Z1oy48=.a897eee7-a5c7-4fd2-902b-e1a0a6e1d805@github.com> Message-ID: <5N45ZthJFpUoe2KczPtsdVIWvuK226keIHwc3hjQefY=.e06899a3-fbe3-41b4-a84b-ff4ebfbf5bc9@github.com> On Thu, 23 Oct 2025 14:32:33 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> add all possible Cmp cases; add comments > > src/hotspot/share/opto/superword.cpp line 1758: > >> 1756: break; >> 1757: case Op_CmpN: >> 1758: case Op_CmpP: > > Did you ever encounter these? I'd be surprised if these work... Because they work with pointers, and we don't support pointers, only primitives. will fix. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455461536 From epeter at openjdk.org Thu Oct 23 15:13:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 15:13:10 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v4] In-Reply-To: References: Message-ID: <9JTWuuX7fwGdU0ORTyrYavpMsI2DT_Y-rxDO6t0DkB0=.8847d8ce-6635-483a-91d4-4e0138414cde@github.com> On Thu, 23 Oct 2025 15:02:13 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? @eme64 >> >> ## Issue >> >> Currently, in SLP when transform from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned-ness in CmpU is lost, then end up doing a signed instead of unsigned comparison in VectorMaskCmp. >> For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. >> >> ## ?Fix >> Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. >> >> Thanks >> >> This pr could also lead to more optimizations, like: https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > catch unexpected Cmp ops src/hotspot/share/opto/superword.cpp line 1757: > 1755: case Op_CmpI: > 1756: case Op_CmpL: > 1757: break; We could add a comment why these are ok without any action. Suggestion: // The mask of signed int/long scalar comparisons has the same semantics // as the mask for vector elementwise int/long comparison with VectorMaskCmp. break; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455531987 From epeter at openjdk.org Thu Oct 23 15:23:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 15:23:18 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v4] In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 15:02:13 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? @eme64 >> >> ## Issue >> >> Currently, in SLP when transform from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned-ness in CmpU is lost, then end up doing a signed instead of unsigned comparison in VectorMaskCmp. >> For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. >> >> ## ?Fix >> Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. >> >> Thanks >> >> This pr could also lead to more optimizations, like: https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > catch unexpected Cmp ops src/hotspot/share/opto/superword.cpp line 1752: > 1750: case Op_CmpUL: > 1751: // Carry unsigned-ness information from CmpUxx to VTransformBoolTest, > 1752: // which will be passed to e.g. VectorMaskCmp. 
Suggestion: // When we have CmpU->Bool, the mask of the Bool has no unsigned-ness information, // but the mask is implicitly unsigned only because of the CmpU. Since we will replace // the CmpU->Bool with a single VectorMaskCmp, we need to now make the unsigned-ness // explicit. This would give more of a reason why we need to do the "unsign...ing". What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455569832 From mli at openjdk.org Thu Oct 23 15:23:20 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 15:23:20 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v4] In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 15:17:14 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> catch unexpected Cmp ops > > src/hotspot/share/opto/superword.cpp line 1752: > >> 1750: case Op_CmpUL: >> 1751: // Carry unsigned-ness information from CmpUxx to VTransformBoolTest, >> 1752: // which will be passed to e.g. VectorMaskCmp. > > Suggestion: > > // When we have CmpU->Bool, the mask of the Bool has no unsigned-ness information, > // but the mask is implicitly unsigned only because of the CmpU. Since we will replace > // the CmpU->Bool with a single VectorMaskCmp, we need to now make the unsigned-ness > // explicit. > > This would give more of a reason why we need to do the "unsign...ing". What do you think? Yes, it's better! Thanks! > src/hotspot/share/opto/superword.cpp line 1757: > >> 1755: case Op_CmpI: >> 1756: case Op_CmpL: >> 1757: break; > > We could add a comment why these are ok without any action. > Suggestion: > > // The mask of signed int/long scalar comparisons has the same semantics > // as the mask for vector elementwise int/long comparison with VectorMaskCmp. > break; Nice! I can commit your suggestion by simply click the button. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455580398 PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455570421 From epeter at openjdk.org Thu Oct 23 15:23:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 15:23:21 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v4] In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 15:17:25 GMT, Hamlin Li wrote: >> src/hotspot/share/opto/superword.cpp line 1757: >> >>> 1755: case Op_CmpI: >>> 1756: case Op_CmpL: >>> 1757: break; >> >> We could add a comment why these are ok without any action. >> Suggestion: >> >> // The mask of signed int/long scalar comparisons has the same semantics >> // as the mask for vector elementwise int/long comparison with VectorMaskCmp. >> break; > > Nice! I can commit your suggestion by simply click the button. 
Yes, the benefits of GitHub ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455580088 From epeter at openjdk.org Thu Oct 23 15:25:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 15:25:44 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v5] In-Reply-To: <4JU6tXdnIFqPXbDMV14FBOr7yvA4mQz6Cd4s1wOjmY0=.05b28d35-bc90-49b5-a321-d37ab3070799@github.com> References: <4JU6tXdnIFqPXbDMV14FBOr7yvA4mQz6Cd4s1wOjmY0=.05b28d35-bc90-49b5-a321-d37ab3070799@github.com> Message-ID: <33pm95WaCjJp6jwm4LaSx9F_ZGPy_uZGlwukkN3o0XM=.b3c414d5-4598-40bf-b0d9-ea9088e4f881@github.com> On Thu, 23 Oct 2025 15:23:16 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? @eme64 >> >> ## Issue >> >> Currently, in SLP when transform from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned-ness in CmpU is lost, then end up doing a signed instead of unsigned comparison in VectorMaskCmp. >> For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. >> >> ## ?Fix >> Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. >> >> Thanks >> >> This pr could also lead to more optimizations, like: https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. > > Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/superword.cpp > > Co-authored-by: Emanuel Peter > - Update src/hotspot/share/opto/superword.cpp > > Co-authored-by: Emanuel Peter test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 1131: > 1129: "testCMoveULGTforL", > 1130: "testCMoveULGTforF", > 1131: "testCMoveULGTforD", I just realized that we only have `GT` cases. But what about `GE,LE,LT`? Are those covered somewhere else? Otherwise we only test a fraction of `static mask unsigned_mask(mask btm) { return mask(btm | unsigned_compare); }` What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2455597841 From epeter at openjdk.org Thu Oct 23 15:21:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 23 Oct 2025 15:21:38 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 15:05:47 GMT, Hannes Greule wrote: > Do you mean as part of the comment? That should be doable and provide useful context, yes. Exactly, yes please ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3437587573 From mli at openjdk.org Thu Oct 23 15:23:16 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 15:23:16 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v5] In-Reply-To: References: Message-ID: <4JU6tXdnIFqPXbDMV14FBOr7yvA4mQz6Cd4s1wOjmY0=.05b28d35-bc90-49b5-a321-d37ab3070799@github.com> > Hi, > Can you help to review the patch? @eme64 > > ## Issue > > Currently, in SLP when transform from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned-ness in CmpU is lost, then end up doing a signed instead of unsigned comparison in VectorMaskCmp. 
> For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. > > ## ?Fix > Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. > > Thanks > > This pr could also lead to more optimizations, like: https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/superword.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/superword.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27942/files - new: https://git.openjdk.org/jdk/pull/27942/files/34fafd2c..650f9200 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27942&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27942&range=03-04 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27942/head:pull/27942 PR: https://git.openjdk.org/jdk/pull/27942 From kvn at openjdk.org Thu Oct 23 16:41:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 Oct 2025 16:41:53 GMT Subject: RFR: 8370318: AES-GCM vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 07:20:48 GMT, Aleksey Shipilev wrote: > See the bug for symptoms and discussion. > > In short, in newly added intrinsic in JDK 24, there is a potential read out of Java heap if key array is at the edge of it, which will crash JVM. And that read is redundant for the code path in question, we only use it in the subsequent blocks that we never actually enter in the problematic case. So we never see any failures in testing: the only observable effect is SEGV on uncommitted heap access. It is somewhat similar to [JDK-8330611](https://bugs.openjdk.org/browse/JDK-8330611) we have fixed in other place. But this one can be caught with the explicit range check in debug code. > > I opted to keep this patch very simple, because I would backport it to 25u shortly after we integrate to mainline. It just moves the read down to the block where it is actually needed. Note that `aes_192` and `aes_256` labels are red herring in this code, they are unbound; you can even remove them without any bulid errors. The actual thing that drives path selection is `NROUNDS` -- that one is derived from the key array length -- and we are just doing the read too early. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `com/sun/crypto/provider/Cipher compiler/codegen/aes` (fails with range check only, passes with entire patch) > - [x] Linux x86_64 server fastdebug, `all` on AVX-512 machine Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27951#pullrequestreview-3371090729 From kvn at openjdk.org Thu Oct 23 16:58:48 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 Oct 2025 16:58:48 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v2] In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 06:51:09 GMT, Marc Chevalier wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Adapt the comment > There are 2 things: > 1. `restore_major_progress` from addition (or OR) semantics to assignment (@dean-long's concern): I've added > > void set_major_progress(bool progress) { precond(!(!progress && _major_progress)); _major_progress = progress; } > > To see if we are ever in the case that the `old_progress` is false and `_major_progress` is true. That is the only case where the former OR-semantics is not the same as the new set semantics. It passes tier1-6 + other internal tests. I can replace the assignment with a `||` (the boolean `+`) if we still have doubts, but then, it seems tests are not exercising this path. > 2. Is the type change correct overall. I've done something as @vnkozlov describes: have side by side the bool and the int version of the major progress, have the methods act on both at the same time: on the int as it used to, on the bool as I propose here. Add the proposed assert in the getter. I've also made sure to assign both the int and the bool version for the 2 places in `compile.cpp` that assign `_major_progress` directly. It passes tier1-3 + other internal tests. This also makes sure there is no observable difference between the `+=` for the `int` version, and the assignment for the `bool` version. > > Here is what I've tested with: > [testing.patch](https://github.com/user-attachments/files/23091637/testing.patch) Thank you, @marc-chevalier, for confirming that the new logic works as the old one by running different experiments. @eme64, I agree with your suggested comment. @chhagedorn, yes, in most cases setting major_progress indicates that the calculated loop information (get_ctrl, idom, get_loop, etc) is no longer valid and needs to be recalculated. I agree that in some cases we are indeed too conservative by setting major_progress. But the only downside is compilation time.
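To make the difference concrete, here is a small stand-alone sketch of the two semantics discussed above (simplified names, plain C++ assert instead of HotSpot's precond; this is not the actual Compile class):

    #include <cassert>

    // Stand-alone sketch, not the real Compile class: an illustration of
    // OR-semantics vs. plain assignment for the _major_progress flag.
    class MajorProgressSketch {
      bool _major_progress = false;
    public:
      // Assignment semantics, plus the check from the experiment quoted above.
      // It fires exactly in the only case where the two semantics differ:
      // clearing the flag while it is currently set.
      void set_major_progress(bool progress) {
        assert(!(!progress && _major_progress) && "would drop a set flag");
        _major_progress = progress;
      }
      // Former behavior (addition / OR on the int counter): a flag raised
      // in the meantime survives the restore.
      void restore_or(bool old_progress) {
        _major_progress = _major_progress || old_progress;
      }
      // New behavior: plain assignment, the saved value wins.
      void restore_assign(bool old_progress) {
        _major_progress = old_progress;
      }
      bool major_progress() const { return _major_progress; }
    };

Since the testing described above never hit that precondition, the two restore variants agreed on all exercised paths, which is what makes the plain assignment safe in practice.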
>> >> old-vs-new >> >> While initially the performance is similar it quickly diverges. With the new approach we move to standard handling of lukewarm methods after 5 iterations and they get compiled with C2. With the old approach we don't. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Fix zero build Seems fine. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27926#pullrequestreview-3371313857 From kvn at openjdk.org Thu Oct 23 17:17:01 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 Oct 2025 17:17:01 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 In-Reply-To: <0ZFCDTQAFKjZ5XDtUXWsvC4v0lyot7Trdf3wlIxrb-M=.988407f3-6353-447a-98d6-f890b795fd1d@github.com> References: <8UVgF0vLaZhY7zvKWPBLXoU9p71SevknlPYC5LPsfCo=.8531cacc-ff26-42be-a517-88ff87628d29@github.com> <0ZFCDTQAFKjZ5XDtUXWsvC4v0lyot7Trdf3wlIxrb-M=.988407f3-6353-447a-98d6-f890b795fd1d@github.com> Message-ID: On Wed, 15 Oct 2025 19:17:37 GMT, Chad Rakoczy wrote: >>> I'm not sure if you saw this because of the bot comments but I'm not able to reproduce the COH failure >> >> @chadrako, I added test output to [8369150](https://bugs.openjdk.org/browse/JDK-8369150) bug report. >> >> Do you unload old method after coping and let GC do it normal way? > >> Do you unload old method after coping and let GC do it normal way? > > When an nmethod is relocated the old is marked not entrant. Then yes it is unloaded normally by the GC. The issue is most likely the GC deciding not to unload it for whatever reason. I'll see if there is a more deterministic way to test this @chadrako what is status of this work. If you are struggling to reproduce [8369150](https://bugs.openjdk.org/browse/JDK-8369150) you can fix it separately. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3438176293 From duke at openjdk.org Thu Oct 23 17:33:23 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 23 Oct 2025 17:33:23 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 [v2] In-Reply-To: <0ZFCDTQAFKjZ5XDtUXWsvC4v0lyot7Trdf3wlIxrb-M=.988407f3-6353-447a-98d6-f890b795fd1d@github.com> References: <8UVgF0vLaZhY7zvKWPBLXoU9p71SevknlPYC5LPsfCo=.8531cacc-ff26-42be-a517-88ff87628d29@github.com> <0ZFCDTQAFKjZ5XDtUXWsvC4v0lyot7Trdf3wlIxrb-M=.988407f3-6353-447a-98d6-f890b795fd1d@github.com> Message-ID: On Wed, 15 Oct 2025 19:17:37 GMT, Chad Rakoczy wrote: >>> I'm not sure if you saw this because of the bot comments but I'm not able to reproduce the COH failure >> >> @chadrako, I added test output to [8369150](https://bugs.openjdk.org/browse/JDK-8369150) bug report. >> >> Do you unload old method after coping and let GC do it normal way? > >> Do you unload old method after coping and let GC do it normal way? > > When an nmethod is relocated the old is marked not entrant. Then yes it is unloaded normally by the GC. The issue is most likely the GC deciding not to unload it for whatever reason. I'll see if there is a more deterministic way to test this > @chadrako what is status of this work. If you are struggling to reproduce [8369150](https://bugs.openjdk.org/browse/JDK-8369150) you can fix it separately. I haven't been able to reproduce that failure. 
I'll reopen [8369150](https://bugs.openjdk.org/browse/JDK-8369150) so it can be completed separately ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3438252118 From duke at openjdk.org Thu Oct 23 17:33:21 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 23 Oct 2025 17:33:21 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 [v2] In-Reply-To: References: Message-ID: <6dX4hnm-wd-kZ-FVuZMe2RY0zAstL8x-eKjqCUnAhRI=.fdcbf683-03d7-4610-bd7e-650c94cae155@github.com> > [JDK-8369147](https://bugs.openjdk.org/browse/JDK-8369147) > > Fixes tests added in [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) > > `DeoptimizeRelocatedNMethod.java` and `RelocateNMethod.java` failed because they attempted to relocate nmethods to the `MethodProfiled` code heap which does not exist when `TieredCompilation` is false. Updated the tests to use `MethodNonProfiled` heap which exists regardless of `TieredCompilation` > > `StressNMethodRelocation.java` runs for 60 seconds and also compiles 1024 methods with C2. This was causing the test to timeout if the compilation took too much time. Increasing the timeout to 5 minutes should give C2 enough time to compile the functions > > `NMethodRelocationTest.java` runs using SerialGC which caused a multiple GC error when trying to run with another GC. Added a requires to force SerialGC Chad Rakoczy has updated the pull request incrementally with four additional commits since the last revision: - Fix requires - Reproblem list serviceability/jvmti/NMethodRelocation/NMethodRelocationTest.java - Compile for 30 seconds instead of 1024 methods - Fix requires ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27659/files - new: https://git.openjdk.org/jdk/pull/27659/files/6cbda05f..769800cb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27659&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27659&range=00-01 Stats: 52 lines in 6 files changed: 16 ins; 2 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/27659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27659/head:pull/27659 PR: https://git.openjdk.org/jdk/pull/27659 From kvn at openjdk.org Thu Oct 23 17:45:20 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 Oct 2025 17:45:20 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 [v2] In-Reply-To: References: <8UVgF0vLaZhY7zvKWPBLXoU9p71SevknlPYC5LPsfCo=.8531cacc-ff26-42be-a517-88ff87628d29@github.com> <0ZFCDTQAFKjZ5XDtUXWsvC4v0lyot7Trdf3wlIxrb-M=.988407f3-6353-447a-98d6-f890b795fd1d@github.com> Message-ID: On Thu, 23 Oct 2025 17:30:27 GMT, Chad Rakoczy wrote: >>> Do you unload old method after coping and let GC do it normal way? >> >> When an nmethod is relocated the old is marked not entrant. Then yes it is unloaded normally by the GC. The issue is most likely the GC deciding not to unload it for whatever reason. I'll see if there is a more deterministic way to test this > >> @chadrako what is status of this work. If you are struggling to reproduce [8369150](https://bugs.openjdk.org/browse/JDK-8369150) you can fix it separately. > > I haven't been able to reproduce that failure. I'll reopen [8369150](https://bugs.openjdk.org/browse/JDK-8369150) so it can be completed separately @chadrako, is PR ready for testing now? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3438306723 From duke at openjdk.org Thu Oct 23 18:47:24 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 23 Oct 2025 18:47:24 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 [v2] In-Reply-To: References: <8UVgF0vLaZhY7zvKWPBLXoU9p71SevknlPYC5LPsfCo=.8531cacc-ff26-42be-a517-88ff87628d29@github.com> <0ZFCDTQAFKjZ5XDtUXWsvC4v0lyot7Trdf3wlIxrb-M=.988407f3-6353-447a-98d6-f890b795fd1d@github.com> Message-ID: On Thu, 23 Oct 2025 17:30:27 GMT, Chad Rakoczy wrote: >>> Do you unload old method after coping and let GC do it normal way? >> >> When an nmethod is relocated the old is marked not entrant. Then yes it is unloaded normally by the GC. The issue is most likely the GC deciding not to unload it for whatever reason. I'll see if there is a more deterministic way to test this > >> @chadrako what is status of this work. If you are struggling to reproduce [8369150](https://bugs.openjdk.org/browse/JDK-8369150) you can fix it separately. > > I haven't been able to reproduce that failure. I'll reopen [8369150](https://bugs.openjdk.org/browse/JDK-8369150) so it can be completed separately > @chadrako, is PR ready for testing now? Yes ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3438589415 From kvn at openjdk.org Thu Oct 23 18:59:48 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 Oct 2025 18:59:48 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 [v2] In-Reply-To: <6dX4hnm-wd-kZ-FVuZMe2RY0zAstL8x-eKjqCUnAhRI=.fdcbf683-03d7-4610-bd7e-650c94cae155@github.com> References: <6dX4hnm-wd-kZ-FVuZMe2RY0zAstL8x-eKjqCUnAhRI=.fdcbf683-03d7-4610-bd7e-650c94cae155@github.com> Message-ID: On Thu, 23 Oct 2025 17:33:21 GMT, Chad Rakoczy wrote: >> [JDK-8369147](https://bugs.openjdk.org/browse/JDK-8369147) >> >> Fixes tests added in [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) >> >> `DeoptimizeRelocatedNMethod.java` and `RelocateNMethod.java` failed because they attempted to relocate nmethods to the `MethodProfiled` code heap which does not exist when `TieredCompilation` is false. Updated the tests to use `MethodNonProfiled` heap which exists regardless of `TieredCompilation` >> >> `StressNMethodRelocation.java` runs for 60 seconds and also compiles 1024 methods with C2. This was causing the test to timeout if the compilation took too much time. Increasing the timeout to 5 minutes should give C2 enough time to compile the functions >> >> `NMethodRelocationTest.java` runs using SerialGC which caused a multiple GC error when trying to run with another GC. 
Added a requires to force SerialGC > > Chad Rakoczy has updated the pull request incrementally with four additional commits since the last revision: > > - Fix requires > - Reproblem list serviceability/jvmti/NMethodRelocation/NMethodRelocationTest.java > - Compile for 30 seconds instead of 1024 methods > - Fix requires I submitted testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3438643384 From mli at openjdk.org Thu Oct 23 19:18:37 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 19:18:37 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v2] In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 11:17:04 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: >> >> - tests >> - switch > > You should also change the PR description, especially you should describe what went wrong at what point. > > > Well, you mostly already explain. I think the issue is that we don't really carry the "unsigned-ness" of the comparison, and then end up doing signed instead of unsigned comparison... > @eme64 I just keep Op_CmpI/L. Test running, will update the result or code accordingly later. I ran all the tests under test/hotspot/jtreg/compiler on x86, the `default` case (i.e. `ShouldNotReachHere`) is not triggerred. Plus github CI, I think we are good at these `switch cases`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27942#issuecomment-3438716433 From mli at openjdk.org Thu Oct 23 19:18:42 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 19:18:42 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v5] In-Reply-To: <33pm95WaCjJp6jwm4LaSx9F_ZGPy_uZGlwukkN3o0XM=.b3c414d5-4598-40bf-b0d9-ea9088e4f881@github.com> References: <4JU6tXdnIFqPXbDMV14FBOr7yvA4mQz6Cd4s1wOjmY0=.05b28d35-bc90-49b5-a321-d37ab3070799@github.com> <33pm95WaCjJp6jwm4LaSx9F_ZGPy_uZGlwukkN3o0XM=.b3c414d5-4598-40bf-b0d9-ea9088e4f881@github.com> Message-ID: On Thu, 23 Oct 2025 15:23:23 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/opto/superword.cpp >> >> Co-authored-by: Emanuel Peter >> - Update src/hotspot/share/opto/superword.cpp >> >> Co-authored-by: Emanuel Peter > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 1131: > >> 1129: "testCMoveULGTforL", >> 1130: "testCMoveULGTforF", >> 1131: "testCMoveULGTforD", > > I just realized that we only have `GT` cases. But what about `GE,LE,LT`? Are those covered somewhere else? > > Otherwise we only test a fraction of > `static mask unsigned_mask(mask btm) { return mask(btm | unsigned_compare); }` > > What do you think? Make sense, I'll add more tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2456869594 From vlivanov at openjdk.org Thu Oct 23 19:29:58 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 23 Oct 2025 19:29:58 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Sun, 19 Oct 2025 15:46:06 GMT, Hannes Greule wrote: > The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. > > Please let me know what you think. 
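A minimal sketch of the kind of code the quoted description refers to; the method name, mask, and divisor are invented for illustration and are not taken from the PR's tests:

    // While the ModI node is still in the graph, Value() can narrow (i % 10) to [0, 9]
    // for a non-negative dividend, so the comparison below may fold to a constant.
    // Once the Mod has been expanded early into its multiply/shift/subtract form, that
    // range may no longer be recovered, which is what delaying the transformation targets.
    static boolean modRangeExample(int i) {
        i = i & 1023;     // i is now in [0, 1023]
        int r = i % 10;   // ModI type: [0, 9] once Value() has run
        return r < 10;    // may fold to 'true' while that type is available
    }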
It seems the root cause is a divergence in functionality in GVN between different representations of Div/Mod nodes. How hard would it be to align the implementation and improve GVN handling for other representations, so the same type is constructed irrespective of the IR shape? Alternatively, as part of the expansion, new representation can be wrapped in CastII/CastLL with the narrower type of original Div/Mod node. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3438763340 From mli at openjdk.org Thu Oct 23 19:38:43 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 19:38:43 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v6] In-Reply-To: References: Message-ID: > Hi, > Can you help to review the patch? @eme64 > > ## Issue > > Currently, in SLP when transform from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned-ness in CmpU is lost, then end up doing a signed instead of unsigned comparison in VectorMaskCmp. > For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. > > ## ?Fix > Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. > > Thanks > > This pr could also lead to more optimizations, like: https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: more tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27942/files - new: https://git.openjdk.org/jdk/pull/27942/files/650f9200..309880d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27942&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27942&range=04-05 Stats: 159 lines in 1 file changed: 159 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27942/head:pull/27942 PR: https://git.openjdk.org/jdk/pull/27942 From duke at openjdk.org Thu Oct 23 19:40:38 2025 From: duke at openjdk.org (Shawn M Emery) Date: Thu, 23 Oct 2025 19:40:38 GMT Subject: Integrated: 8326609: New AES implementation with updates specified in FIPS 197 In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 19:33:41 GMT, Shawn M Emery wrote: > General: > ----------- > i) This work is to replace the existing AES cipher under the Cryptix license. > > ii) The lookup tables are employed for performance, but also for operating in constant time. > > iii) Several loops have been unrolled for optimization purposes, but are harder to read and don't meet coding style guidelines. > > iv) None of the AES related intrinsics has been modified in this PR, but the new code has been updated to use the intrinsics related hooks for the AES block and key table arguments. > > Note: I've purposefully not seen the original Cryptix code, so when making a code review comment please don't quote the code in the AESCrypt.java file. 
> > Correctness: > ----------------- > The following AES-specific regression tests have passed in intrinsics (default) and non-intrinsic (-Xint) modes: > > i) test/jdk/com/sun/crypto/provider/Cipher/AES: all 27 tests pass > > -intrinsics mode for: > > ii) test/hotspot/jtreg/compiler/codegen/aes: all 4 tests pass > > iii) jck:api/java_security, jck:api/javax_crypto, jck:api/javax_net, jck:api/javax_security, jck:api/org_ietf, and jck:api/javax_xml/crypto: passed, with 10 known failures > > iv) jdk_security_infra: passed, with 48 known failures > > v) tier1 and tier2: all 110257 tests pass > > Security: > ----------- > In order to prevent side-channel (timing and differential power analysis) attacks the code has been constructed to operate in constant time and does not use conditionals based on the key or key expansion table. This is accomplished by using lookup tables in both the cipher and inverse cipher of AES. > > Performance: > ------------------ > All AES related benchmarks have been executed against the new and original Cryptix code: > > micro:org.openjdk.bench.javax.crypto.AES > > micro:org.openjdk.bench.javax.crypto.full.AESBench > > micro:org.openjdk.bench.javax.crypto.full.AESExtraBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMBench > > micro:org.openjdk.bench.javax.crypto.full.AESGCMByteBuffer > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherInputStream > > micro:org.openjdk.bench.javax.crypto.full.AESGCMCipherOutputStream > > micro:org.openjdk.bench.javax.crypto.full.AESKeyWrapBench. > > micro:org.openjdk.bench.java.security.CipherSuiteBench (AES) > > The benchmarks were executed in different compiler modes (default (no compiler options), -Xint, and -Xcomp) and on two different architectures (x86 and arm64) with the following encryption results: > > i) Default (no JVM options, non-intrinsics) mode: > > a) Encryption: the new code performed better for b... This pull request has now been integrated. Changeset: 62f11cd4 Author: Shawn M Emery Committer: Valerie Peng URL: https://git.openjdk.org/jdk/commit/62f11cd4070f21ad82eebbb5319bdbbf4e13f9cf Stats: 2991 lines in 18 files changed: 1476 ins; 1473 del; 42 mod 8326609: New AES implementation with updates specified in FIPS 197 Reviewed-by: valeriep ------------- PR: https://git.openjdk.org/jdk/pull/27807 From mli at openjdk.org Thu Oct 23 19:42:24 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 23 Oct 2025 19:42:24 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v5] In-Reply-To: References: <4JU6tXdnIFqPXbDMV14FBOr7yvA4mQz6Cd4s1wOjmY0=.05b28d35-bc90-49b5-a321-d37ab3070799@github.com> <33pm95WaCjJp6jwm4LaSx9F_ZGPy_uZGlwukkN3o0XM=.b3c414d5-4598-40bf-b0d9-ea9088e4f881@github.com> Message-ID: On Thu, 23 Oct 2025 19:15:34 GMT, Hamlin Li wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 1131: >> >>> 1129: "testCMoveULGTforL", >>> 1130: "testCMoveULGTforF", >>> 1131: "testCMoveULGTforD", >> >> I just realized that we only have `GT` cases. But what about `GE,LE,LT`? Are those covered somewhere else? >> >> Otherwise we only test a fraction of >> `static mask unsigned_mask(mask btm) { return mask(btm | unsigned_compare); }` >> >> What do you think? > > Make sense, I'll add more tests. Added tests to cover UI{GE|LT|LE}forF and UL{GE|LT|LE}forD. 
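A sketch of the kind of kernel these test names describe (the exact bodies and constants are not shown in this thread, so the name and literals here are illustrative):

    // Unsigned int compare feeding a conditional move that produces a float; with SLP this
    // should become VectorMaskCmp (unsigned) + VectorBlend rather than a signed compare.
    static void cmoveUIGTforF(int[] a, int[] b, float[] r) {
        for (int i = 0; i < r.length; i++) {
            r[i] = (Integer.compareUnsigned(a[i], b[i]) > 0) ? 0.5f : 1.0f;
        }
    }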
Other tests for example UI{GE|LT|LE}forD UL{GE|LT|LE}forF could be added when I work on https://github.com/openjdk/jdk/pull/25336 or https://github.com/openjdk/jdk/pull/25341, as currently they are not vectorized. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2456977884 From sviswanathan at openjdk.org Thu Oct 23 23:30:04 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 23 Oct 2025 23:30:04 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v5] In-Reply-To: References: Message-ID: <7OtrTykH5ClTZP4wv_wCCME-qfObci9VWNdGqaOyV4c=.83990bd4-66cf-431e-af58-247d47ea52bc@github.com> On Tue, 21 Oct 2025 11:56:31 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Limiting register biasing to NDD specific demotable instructions src/hotspot/cpu/x86/x86_64.ad line 445: > 443: case addI_rReg_rReg_mem_ndd_rule: > 444: case addL_rReg_ndd_rule: > 445: case addL_rReg_rReg_imm_ndd_rule: The following rules are missing: addL_rReg_rReg_mem_ndd_rule minI_rReg_ndd maxI_rReg_ndd ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2457813554 From kvn at openjdk.org Fri Oct 24 00:26:02 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 Oct 2025 00:26:02 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 [v2] In-Reply-To: <6dX4hnm-wd-kZ-FVuZMe2RY0zAstL8x-eKjqCUnAhRI=.fdcbf683-03d7-4610-bd7e-650c94cae155@github.com> References: <6dX4hnm-wd-kZ-FVuZMe2RY0zAstL8x-eKjqCUnAhRI=.fdcbf683-03d7-4610-bd7e-650c94cae155@github.com> Message-ID: On Thu, 23 Oct 2025 17:33:21 GMT, Chad Rakoczy wrote: >> [JDK-8369147](https://bugs.openjdk.org/browse/JDK-8369147) >> >> Fixes tests added in [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) >> >> `DeoptimizeRelocatedNMethod.java` and `RelocateNMethod.java` failed because they attempted to relocate nmethods to the `MethodProfiled` code heap which does not exist when `TieredCompilation` is false. Updated the tests to use `MethodNonProfiled` heap which exists regardless of `TieredCompilation` >> >> `StressNMethodRelocation.java` runs for 60 seconds and also compiles 1024 methods with C2. This was causing the test to timeout if the compilation took too much time. Increasing the timeout to 5 minutes should give C2 enough time to compile the functions >> >> `NMethodRelocationTest.java` runs using SerialGC which caused a multiple GC error when trying to run with another GC. Added a requires to force SerialGC > > Chad Rakoczy has updated the pull request incrementally with four additional commits since the last revision: > > - Fix requires > - Reproblem list serviceability/jvmti/NMethodRelocation/NMethodRelocationTest.java > - Compile for 30 seconds instead of 1024 methods > - Fix requires Unfortunately all sub-tests with `-XX:+UseShenandoahGC` failed because Oracle JDK does not include this GC: Error occurred during initialization of VM Option -XX:+UseShenandoahGC not supported ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3440046663 From kvn at openjdk.org Fri Oct 24 00:33:02 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 Oct 2025 00:33:02 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 [v2] In-Reply-To: References: <8UVgF0vLaZhY7zvKWPBLXoU9p71SevknlPYC5LPsfCo=.8531cacc-ff26-42be-a517-88ff87628d29@github.com> <0ZFCDTQAFKjZ5XDtUXWsvC4v0lyot7Trdf3wlIxrb-M=.988407f3-6353-447a-98d6-f890b795fd1d@github.com> Message-ID: <04axkDv5tJbnZz4o8TYCU0g9l0NqtDYGMkxAAzdiCvs=.43dfa056-97b5-4e30-95f6-1835db991bd9@github.com> On Thu, 23 Oct 2025 18:45:08 GMT, Chad Rakoczy wrote: >>> @chadrako what is status of this work. If you are struggling to reproduce [8369150](https://bugs.openjdk.org/browse/JDK-8369150) you can fix it separately. >> >> I haven't been able to reproduce that failure. I'll reopen [8369150](https://bugs.openjdk.org/browse/JDK-8369150) so it can be completed separately > >> @chadrako, is PR ready for testing now? > > Yes @chadrako I think my suggestion was not correct. We should revert back to your first changes for `@requires`. 
Original code was correct and only `serviceability/jvmti/NMethodRelocation/NMethodRelocationTest.java` missed it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3440078040 From xgong at openjdk.org Fri Oct 24 01:49:03 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 24 Oct 2025 01:49:03 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> <2f9k7vksJ4apEclf8z85MyLeg6bbRnhptLpsoydeTMI=.df8ad2fc-5b56-4554-982a-691eb7849a84@github.com> Message-ID: On Thu, 23 Oct 2025 10:50:29 GMT, Emanuel Peter wrote: >>> The name suggests that if you return false here, then it is still ok to use a predicate instruction. The name suggests that if your return true, then you must use a predicate instruction. >>> >>> But then your comment for `Op_VectorLongToMask` and `Op_VectorMaskToLong` seems to suggest that we return false and do not want that a predicate instruction is used, but instead a packed vector. >>> >>> So now I'm a bit confused. >> >> The type for a vector mask is different on architectures that supports the predicate feature or not (please see my details answer below). Hence, for some vector operations, the expected input mask register/layout is different. Please note that there are two kind of layout for a mask if it is stored in a **vector register**. It might be 1) a packed layout with 8-bit element width, or 2) a unpacked layout with 8/16/32/64-bit element width according to the vector type. For the data relative mask operations like `VectorBlend`, it is 2), while for some bit relative mask operations like `VectorMaskTrueCount, VectorMaskFirstTrue, toLong, fromLong, ...`, it is 1) , because the implementation will be more efficient. >> >> My intention is to use this function guide what the expected IR is generated for a vector mask operation. Before this patch, mid-end do the difference by just checking the type of a vector mask, as it assumes the predicate instruction will be generated for a predicate type, while the vector instructions are generated for a vector type. However, as I mentioned in this PR, some mask operations might not support native predicate instructions on predicate architectures. Instead, they are implemented with the same vector instructions like NEON. We have to do the mask layout conversion inside codegen, which is in-efficient. Generating the same IR pattern like NEON is more efficient. >> >> So, if this function returns false, it means the input/output mask for a specified opcode requires to be saved into a vector register with the packed layout, even the architecture supports predicate feature. This is decided by the IR's implementation. >> >>> >>> I'm also wondering: Since there are two options (mask in packed vector vs predicate), does the availability of one always imply the availability of the other? Or could some platform have only one, and another platform only the other? >>> >> >> There are three kind of options for a mask: 1) packed vector with 8-bit element size, 2) unpacked vector with 8/16/32/64-bit element size, and 3) pred... > > Thanks for all the explanations! Do you think some of that could be moved to code comments? I think that would be quite helpful. Maybe I can add brief comments before this method like before? 
BTW, there are some comments added in the code generation of these two IRs. Would you mind checking the changes in `C2_Macroassembler_aarch64.cpp|hpp` and seeing whether the comment is helpful? Thanks so much! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2458327157 From xgong at openjdk.org Fri Oct 24 02:04:05 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 24 Oct 2025 02:04:05 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: <4Q3oTRS_-2DbBIJzsz3u67tZfA8joIiMJ4x0z4rZqlo=.4e238026-4cf9-4564-b5d3-1539062736f6@github.com> References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> <4l5TryoeE4xWUvCtyOU4hSUArWtuXomHRuMiQfAflDM=.9faf7b8b-5135-4696-8921-d61e624d684f@github.com> <0Eavy5pO0jAAKOzZYAoRyykKNSGUDUinBBk2CUMcK1c=.a6c59caf-d3a8-45ba-a368-9849b735e854@github.com> <20otCwtIy0nolE3sBw3b-YoXUa3SL_xsBKB9Cw_kD2o=.62cc0b37-9b0c-4ecb-9509-5c48ac4f3a75@github.com> <4Q3oTRS_-2DbBIJzsz3u67tZfA8joIiMJ4x0z4rZqlo=.4e238026-4cf9-4564-b5d3-1539062736f6@github.com> Message-ID: On Thu, 23 Oct 2025 10:49:37 GMT, Emanuel Peter wrote: > > VectorStoreMask is a opposite operation of VectorLoadMask. We can treat it as a layout conversion for a vector mask. It is used to convert a vector mask (either a unpacked vector or a predicate) to a packed vector status (i.e. 8-bit element size). Because, in Java API, elements of a VectorMask is stored into a boolean array. > > Thanks for the explanation! So it really only does the conversion, right? And no loading / storing? If that is true, we may want to rename them to `ConvPredicate2PackedVectorMaskNode`, or alike. What do you think? Yes, it just finished the layout conversion for a mask, without no loading/storing. These two IRs are frequently used in Vector mask relative operations in C2 mid-end and each backend, such as load/store/VectorBox/VectorUnbox and object re-materialize during deoptimization, and so on. They were added at the beginning support of API I think. Renaming is not an easy work. The main function of these two IRs are the conversion between different mask layout from compiler and java API: 1) To load a vector mask from memory (a boolean array), it needs to load the values into a boolean array with `LoadVectorNode`, and then convert the boolean vector to a data vector with `VectorLoadMask`. 2) To store a vector mask from compiler to a memory, it needs to convert the data vector to a boolean vector with `VectorStoreMask`, and then store the boolean vector to memory with `StoreVectorNode`. Hence, these two APIs are a part of `VectorMask.fromArray()/intoArray()` relatively. Although they really do not any memory access operation. If we have a better name for these two IRs, I think that would be another topic that we can revisit with a separate thread. WDYT? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2458378440 From shade at openjdk.org Fri Oct 24 05:45:02 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 24 Oct 2025 05:45:02 GMT Subject: RFR: 8370318: AES-GCM vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 07:20:48 GMT, Aleksey Shipilev wrote: > See the bug for symptoms and discussion. 
> > In short, in newly added intrinsic in JDK 24, there is a potential read out of Java heap if key array is at the edge of it, which will crash JVM. And that read is redundant for the code path in question, we only use it in the subsequent blocks that we never actually enter in the problematic case. So we never see any failures in testing: the only observable effect is SEGV on uncommitted heap access. It is somewhat similar to [JDK-8330611](https://bugs.openjdk.org/browse/JDK-8330611) we have fixed in other place. But this one can be caught with the explicit range check in debug code. > > I opted to keep this patch very simple, because I would backport it to 25u shortly after we integrate to mainline. It just moves the read down to the block where it is actually needed. Note that `aes_192` and `aes_256` labels are red herring in this code, they are unbound; you can even remove them without any bulid errors. The actual thing that drives path selection is `NROUNDS` -- that one is derived from the key array length -- and we are just doing the read too early. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `com/sun/crypto/provider/Cipher compiler/codegen/aes` (fails with range check only, passes with entire patch) > - [x] Linux x86_64 server fastdebug, `all` on AVX-512 machine Thanks! I think I need another Review before I can integrate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27951#issuecomment-3441141803 From amitkumar at openjdk.org Fri Oct 24 05:46:14 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 24 Oct 2025 05:46:14 GMT Subject: RFR: 8370389: JavaFrameAnchor on s390 has unnecessary barriers In-Reply-To: References: Message-ID: <76NgnPuw3_U9COVLaq4LN1ibKXpdcjjtmfI991paFos=.4b3e332f-ba2a-419f-b0c8-ac07573329de@github.com> On Wed, 22 Oct 2025 07:21:45 GMT, Amit Kumar wrote: > No hardware barriers are necessary. All members are volatile and the profiler is run from a signal handler. Thank for the approvals :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27930#issuecomment-3441143460 From amitkumar at openjdk.org Fri Oct 24 05:46:15 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 24 Oct 2025 05:46:15 GMT Subject: Integrated: 8370389: JavaFrameAnchor on s390 has unnecessary barriers In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 07:21:45 GMT, Amit Kumar wrote: > No hardware barriers are necessary. All members are volatile and the profiler is run from a signal handler. This pull request has now been integrated. Changeset: 87645afa Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/87645afa052a87ab2af9602c8fafc2a707c77c19 Stats: 19 lines in 1 file changed: 5 ins; 11 del; 3 mod 8370389: JavaFrameAnchor on s390 has unnecessary barriers Reviewed-by: lucy, aph ------------- PR: https://git.openjdk.org/jdk/pull/27930 From xgong at openjdk.org Fri Oct 24 05:54:23 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 24 Oct 2025 05:54:23 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: Message-ID: > The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. 
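For reference, a minimal usage sketch of the two Java APIs under discussion (requires --add-modules jdk.incubator.vector; the species and bit pattern are arbitrary here):

    import jdk.incubator.vector.IntVector;
    import jdk.incubator.vector.VectorMask;
    import jdk.incubator.vector.VectorSpecies;

    public class FromToLongDemo {
        static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_128; // 4 int lanes

        public static void main(String[] args) {
            // fromLong: lane i of the mask is set iff bit i of the argument is set.
            VectorMask<Integer> m = VectorMask.fromLong(SPECIES, 0b0101L); // lanes 0 and 2
            // toLong is the inverse: pack the lane flags back into the low bits of a long.
            long bits = m.toLong(); // 0b0101 == 5
            System.out.println(m.trueCount() + " lanes set, bits = " + bits);
        }
    }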
> > For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. > > These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. > > This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. > > It also modifies the Vector API jtreg tests for well testing. Here is the details: > > 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: > > VectorMaskToLong (VectorLongToMask l) => l > > Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. > > 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". > > Performance shows significant improvement on NVIDIA's Grace CPU. > > Here is the performance data with `-XX:UseSVE=2`: > > Benchmark bits inputs Mode Unit Before After Gain > MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 > MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 > MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 > MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 > MaskQueryOperationsBenchmark.testToLongInt 128 2 thrpt ops/ms 101031... 
Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Rename the matcher function and fix comment issue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27481/files - new: https://git.openjdk.org/jdk/pull/27481/files/d3e5b0fa..612c612f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27481&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27481&range=01-02 Stats: 58 lines in 11 files changed: 14 ins; 6 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/27481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27481/head:pull/27481 PR: https://git.openjdk.org/jdk/pull/27481 From xgong at openjdk.org Fri Oct 24 06:04:02 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 24 Oct 2025 06:04:02 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: <4Q3oTRS_-2DbBIJzsz3u67tZfA8joIiMJ4x0z4rZqlo=.4e238026-4cf9-4564-b5d3-1539062736f6@github.com> References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> <4l5TryoeE4xWUvCtyOU4hSUArWtuXomHRuMiQfAflDM=.9faf7b8b-5135-4696-8921-d61e624d684f@github.com> <0Eavy5pO0jAAKOzZYAoRyykKNSGUDUinBBk2CUMcK1c=.a6c59caf-d3a8-45ba-a368-9849b735e854@github.com> <20otCwtIy0nolE3sBw3b-YoXUa3SL_xsBKB9Cw_kD2o=.62cc0b37-9b0c-4ecb-9509-5c48ac4f3a75@github.com> <4Q3oTRS_-2DbBIJzsz3u67tZfA8joIiMJ4x0z4rZqlo=.4e238026-4cf9-4564-b5d3-1539062736f6@github.com> Message-ID: On Thu, 23 Oct 2025 10:49:37 GMT, Emanuel Peter wrote: >> @XiaohongGong thanks for all he explanations. From what you say, it seems that `vector_mask_must_be_packed` is good. > >> VectorStoreMask is a opposite operation of VectorLoadMask. We can treat it as a layout conversion for a vector mask. It is used to convert a vector mask (either a unpacked vector or a predicate) to a packed vector status (i.e. 8-bit element size). Because, in Java API, elements of a VectorMask is stored into a boolean array. > > Thanks for the explanation! So it really only does the conversion, right? And no loading / storing? If that is true, we may want to rename them to `ConvPredicate2PackedVectorMaskNode`, or alike. What do you think? Hi @eme64 , I updated a commit with renaming the matcher function to `mask_op_uses_packed_vector`. Is this fine to you? The main concern here is that only the specified vector mask ops [(VectorMaskOpNode)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectornode.hpp#L1343) need the packed vector mask. Name `vector_mask_must_be_packed` might extend the scope to all vector/mask operations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2458988592 From epeter at openjdk.org Fri Oct 24 07:13:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 07:13:15 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 19:27:05 GMT, Vladimir Ivanov wrote: >> The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. >> >> Please let me know what you think. > > It seems the root cause is a divergence in functionality in GVN between different representations of Div/Mod nodes. 
> > How hard would it be to align the implementation and improve GVN handling for other representations, so the same type is constructed irrespective of the IR shape? > > Alternatively, as part of the expansion, new representation can be wrapped in CastII/CastLL with the narrower type of original Div/Mod node. @iwanowww I'm not quite following your suggestions / questions. > It seems the root cause is a divergence in functionality in GVN between different representations of Div/Mod nodes. > How hard would it be to align the implementation and improve GVN handling for other representations, so the same type is constructed irrespective of the IR shape? Do you consider the "expanded" versions of Div/Mod as a "different representation of Div/Mod"? If yes: we could pattern match for such "expanded" versions of Div/Mod, but it would be quite complex: you would have to parse through patterns like displayed [up here](https://github.com/openjdk/jdk/pull/27886#issuecomment-3436423466). Do you think that is a good idea? I already mentioned the idea [up here](https://github.com/openjdk/jdk/pull/27886#issuecomment-3435782079), but did not think it was desirable due to the complexity. > Alternatively, as part of the expansion, new representation can be wrapped in CastII/CastLL with the narrower type of original Div/Mod node. How does this "wrapping" help? After parsing, the CastII at the bottom of the "expanded" Div would just have the whole int range. How would the type of the CastII ever be improved, without pattern matching the "expanded" Div? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3441433782 From epeter at openjdk.org Fri Oct 24 07:15:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 07:15:36 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v5] In-Reply-To: References: <4JU6tXdnIFqPXbDMV14FBOr7yvA4mQz6Cd4s1wOjmY0=.05b28d35-bc90-49b5-a321-d37ab3070799@github.com> <33pm95WaCjJp6jwm4LaSx9F_ZGPy_uZGlwukkN3o0XM=.b3c414d5-4598-40bf-b0d9-ea9088e4f881@github.com> Message-ID: On Thu, 23 Oct 2025 19:39:16 GMT, Hamlin Li wrote: >> Make sense, I'll add more tests. > > Added tests to cover UI{GE|LT|LE}forF and UL{GE|LT|LE}forD. > > Other tests for example UI{GE|LT|LE}forD UL{GE|LT|LE}forF could be added when I work on https://github.com/openjdk/jdk/pull/25336 or https://github.com/openjdk/jdk/pull/25341, as currently they are not vectorized. If you already have the tests in code, it may be good to just put all tests in now. Of course with adjusted IR rules. That would allow us to verify correctness on all combinations, and backport the tests as well. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2459112304 From jbhateja at openjdk.org Fri Oct 24 07:17:41 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 24 Oct 2025 07:17:41 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v7] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. 
> > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26283/files - new: https://git.openjdk.org/jdk/pull/26283/files/e7de7ae6..31079431 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=05-06 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From jbhateja at openjdk.org Fri Oct 24 07:17:43 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 24 Oct 2025 07:17:43 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v5] In-Reply-To: <7OtrTykH5ClTZP4wv_wCCME-qfObci9VWNdGqaOyV4c=.83990bd4-66cf-431e-af58-247d47ea52bc@github.com> References: <7OtrTykH5ClTZP4wv_wCCME-qfObci9VWNdGqaOyV4c=.83990bd4-66cf-431e-af58-247d47ea52bc@github.com> Message-ID: On Thu, 23 Oct 2025 23:07:15 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Limiting register biasing to NDD specific demotable instructions > > src/hotspot/cpu/x86/x86_64.ad line 445: > >> 443: case addI_rReg_rReg_mem_ndd_rule: >> 444: case addL_rReg_ndd_rule: >> 445: case addL_rReg_rReg_imm_ndd_rule: > > The following rules are missing: > addL_rReg_rReg_mem_ndd_rule > minI_rReg_ndd > maxI_rReg_ndd Thanks @sviswa7 , DONE ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2459112090 From epeter at openjdk.org Fri Oct 24 07:19:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 
07:19:18 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> <4l5TryoeE4xWUvCtyOU4hSUArWtuXomHRuMiQfAflDM=.9faf7b8b-5135-4696-8921-d61e624d684f@github.com> <0Eavy5pO0jAAKOzZYAoRyykKNSGUDUinBBk2CUMcK1c=.a6c59caf-d3a8-45ba-a368-9849b735e854@github.com> <20otCwtIy0nolE3sBw3b-YoXUa3SL_xsBKB9Cw_kD2o=.62cc0b37-9b0c-4ecb-9509-5c48ac4f3a75@github.com> <4Q3oTRS_-2DbBIJzsz3u67tZfA8joIiMJ4x0z4rZqlo=.4e238026-4cf9-4564-b5d3-1539062736f6@github.com> Message-ID: <1qB16WqDleABsguKwI8xSgWBf1NFQ7uOZByQHIIXdOU=.e30842bf-67f4-4fb3-b877-b91b288912bc@github.com> On Fri, 24 Oct 2025 06:01:14 GMT, Xiaohong Gong wrote: >>> VectorStoreMask is a opposite operation of VectorLoadMask. We can treat it as a layout conversion for a vector mask. It is used to convert a vector mask (either a unpacked vector or a predicate) to a packed vector status (i.e. 8-bit element size). Because, in Java API, elements of a VectorMask is stored into a boolean array. >> >> Thanks for the explanation! So it really only does the conversion, right? And no loading / storing? If that is true, we may want to rename them to `ConvPredicate2PackedVectorMaskNode`, or alike. What do you think? > > Hi @eme64 , I updated a commit with renaming the matcher function to `mask_op_uses_packed_vector`. Is this fine to you? The main concern here is that only the specified vector mask ops [(VectorMaskOpNode)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectornode.hpp#L1343) need the packed vector mask. Name `vector_mask_must_be_packed` might extend the scope to all vector/mask operations. > Hence, these two APIs are a part of VectorMask.fromArray()/intoArray() relatively. Although they really do not any memory access operation. If we have a better name for these two IRs, I think that would be another topic that we can revisit with a separate thread. WDYT? Sure, we can do it in a future RFE. It is just that bad naming makes it harder for me to review your PR, and so I'm a bit annoying for you probably. I'm sorry for that. 
Thanks for your patience and explaining things :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2459118785 From epeter at openjdk.org Fri Oct 24 07:19:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 07:19:18 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: <1qB16WqDleABsguKwI8xSgWBf1NFQ7uOZByQHIIXdOU=.e30842bf-67f4-4fb3-b877-b91b288912bc@github.com> References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> <4l5TryoeE4xWUvCtyOU4hSUArWtuXomHRuMiQfAflDM=.9faf7b8b-5135-4696-8921-d61e624d684f@github.com> <0Eavy5pO0jAAKOzZYAoRyykKNSGUDUinBBk2CUMcK1c=.a6c59caf-d3a8-45ba-a368-9849b735e854@github.com> <20otCwtIy0nolE3sBw3b-YoXUa3SL_xsBKB9Cw_kD2o=.62cc0b37-9b0c-4ecb-9509-5c48ac4f3a75@github.com> <4Q3oTRS_-2DbBIJzsz3u67tZfA8joIiMJ4x0z4rZqlo=.4e238026-4cf9-4564-b5d3-1539062736f6@github.com> <1qB16WqDleABsguKwI8xSgWBf1NFQ7uOZByQHIIXdOU=.e30842bf-67f4-4fb3-b877-b91b288912bc@github.com> Message-ID: <3Vc1sIRj7GOSPv3E1tz6xOOmjTuN40yWsfbTvm5LdS0=.005267aa-3f4e-4f78-b55d-4900d8d7065e@github.com> On Fri, 24 Oct 2025 07:15:44 GMT, Emanuel Peter wrote: >> Hi @eme64 , I updated a commit with renaming the matcher function to `mask_op_uses_packed_vector`. Is this fine to you? The main concern here is that only the specified vector mask ops [(VectorMaskOpNode)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectornode.hpp#L1343) need the packed vector mask. Name `vector_mask_must_be_packed` might extend the scope to all vector/mask operations. > >> Hence, these two APIs are a part of VectorMask.fromArray()/intoArray() relatively. Although they really do not any memory access operation. If we have a better name for these two IRs, I think that would be another topic that we can revisit with a separate thread. WDYT? > > Sure, we can do it in a future RFE. It is just that bad naming makes it harder for me to review your PR, and so I'm a bit annoying for you probably. I'm sorry for that. Thanks for your patience and explaining things :) > Hi @eme64 , I updated a commit with renaming the matcher function to mask_op_uses_packed_vector. Is this fine to you? The main concern here is that only the specified vector mask ops [(VectorMaskOpNode)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectornode.hpp#L1343) need the packed vector mask. Name vector_mask_must_be_packed might extend the scope to all vector/mask operations. Sounds, good. I'll have a look at the code. More precise names are always preferable. And some code comments can help refine the definition further: what are the guarantees if you return true or false? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2459121523 From hgreule at openjdk.org Fri Oct 24 07:26:04 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 24 Oct 2025 07:26:04 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 07:09:25 GMT, Emanuel Peter wrote: > It seems the root cause is a divergence in functionality in GVN between different representations of Div/Mod nodes. > > How hard would it be to align the implementation and improve GVN handling for other representations, so the same type is constructed irrespective of the IR shape? 
For Div, we can have either the magic multiply/shift variant, or shifting for powers of 2. Each of these can have slightly different shapes depending on e.g., the sign of the divisor, the sign of the dividend, and the sign of the magic constant. For DivL, we also use MulHi if supported and a different sequence of instructions otherwise. For Mod, we either do what we do for Div, multiply again and subtract to get the remainder; or we just directly use And, or we do the Mersenne number optimization (related: https://bugs.openjdk.org/browse/JDK-8370135) which unrolls the same few operations multiple times. Generally, there also isn't one guaranteed result node (like e.g., Sub) where we could place code that recognizes these patterns and provides better results, so I don't think this is feasible (for Div it *might* be doable, at least I only found that off-by-one overapproximation that could be dealt with in Sub). > Alternatively, as part of the expansion, new representation can be wrapped in CastII/CastLL with the narrower type of original Div/Mod node. I thought about that a bit as well and I think it has the same downside as the current approach: As soon as we don't use the Div/Mod node anymore, making the inputs more precise doesn't help anymore. We still have the Cast node, but that node doesn't know how to recalculate/improve its own type. (Additionally, but less of a problem, a Cast node would require optimizations checking their inputs for Div/Mod nodes to uncast). Basically, this comes back to what e-graphs do better: remember multiple alternative constructs for the same semantic operation. Without considering whether that's realistic, if a Cast node would keep the original operation alive somehow (but that operation isn't further optimized itself, I guess), then the Cast node could recalculate its type depending on multiple variants and choose the more specific result even at later stages of optimization. That said, I'm not in the position if using Cast nodes is more idiomatic, and I'm open to rework the PR to use Cast nodes if you want. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3441493233 From epeter at openjdk.org Fri Oct 24 07:34:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 07:34:08 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 05:54:23 GMT, Xiaohong Gong wrote: >> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. >> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. 
By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. >> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. >> >> Here is the performance data with `-XX:UseSVE=2`: >> >> Benchmark bits inputs Mode Unit Before After Gain >> MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 >> MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 >> MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 >> MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 >> MaskQueryOperations... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Rename the matcher function and fix comment issue src/hotspot/cpu/aarch64/aarch64_vector.ad line 395: > 393: // By default, all the mask query operations without predicate support > 394: // requires the mask to be saved in a boolean vector. > 395: bool Matcher::mask_op_uses_packed_vector(int opcode, const TypeVect* vt) { I find `uses` to be ambiguous. does `mask_op` require packed vector (nothing else accepted), or just allow packed vector (and other options are also accepted)? Your `Return true if` comment above suggests it is a `requires` case, right? Could you please also add a `Return false if` comment? src/hotspot/cpu/aarch64/aarch64_vector.ad line 402: > 400: // These ops are implemented with predicate instructions if input > 401: // mask is a predciate. > 402: return vt->isa_vectmask() == nullptr; If we had an assert above, what else that `vt` could be other than `vectmask`, it would help in understanding this logic here ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2459136062 PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2459144173 From epeter at openjdk.org Fri Oct 24 07:34:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 07:34:09 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: Message-ID: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> On Fri, 24 Oct 2025 07:22:16 GMT, Emanuel Peter wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename the matcher function and fix comment issue > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 395: > >> 393: // By default, all the mask query operations without predicate support >> 394: // requires the mask to be saved in a boolean vector. 
>> 395: bool Matcher::mask_op_uses_packed_vector(int opcode, const TypeVect* vt) { > > I find `uses` to be ambiguous. does `mask_op` require packed vector (nothing else accepted), or just allow packed vector (and other options are also accepted)? > > Your `Return true if` comment above suggests it is a `requires` case, right? > > Could you please also add a `Return false if` comment? Could we have some sort of assert on `vt` here? What input types are allowed? `isa_vect`, `isa_vectmask`, and what else? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2459141135 From epeter at openjdk.org Fri Oct 24 07:34:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 07:34:10 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Fri, 24 Oct 2025 07:24:28 GMT, Emanuel Peter wrote: >> src/hotspot/cpu/aarch64/aarch64_vector.ad line 395: >> >>> 393: // By default, all the mask query operations without predicate support >>> 394: // requires the mask to be saved in a boolean vector. >>> 395: bool Matcher::mask_op_uses_packed_vector(int opcode, const TypeVect* vt) { >> >> I find `uses` to be ambiguous. does `mask_op` require packed vector (nothing else accepted), or just allow packed vector (and other options are also accepted)? >> >> Your `Return true if` comment above suggests it is a `requires` case, right? >> >> Could you please also add a `Return false if` comment? > > Could we have some sort of assert on `vt` here? What input types are allowed? `isa_vect`, `isa_vectmask`, and what else? There could be additional confusion: is the `packed vector` for the mask, or for all its inputs? is the `vt` for the mask type, or the output type of the `mask_op`? Suggestion: `mask_op_uses_packed_vector` -> `mask_op_uses_packed_vector_mask` `vt` -> `mask_vt` What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2459149531 From epeter at openjdk.org Fri Oct 24 07:45:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 07:45:07 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: <3Vc1sIRj7GOSPv3E1tz6xOOmjTuN40yWsfbTvm5LdS0=.005267aa-3f4e-4f78-b55d-4900d8d7065e@github.com> References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> <4l5TryoeE4xWUvCtyOU4hSUArWtuXomHRuMiQfAflDM=.9faf7b8b-5135-4696-8921-d61e624d684f@github.com> <0Eavy5pO0jAAKOzZYAoRyykKNSGUDUinBBk2CUMcK1c=.a6c59caf-d3a8-45ba-a368-9849b735e854@github.com> <20otCwtIy0nolE3sBw3b-YoXUa3SL_xsBKB9Cw_kD2o=.62cc0b37-9b0c-4ecb-9509-5c48ac4f3a75@github.com> <4Q3oTRS_-2DbBIJzsz3u67tZfA8joIiMJ4x0z4rZqlo=.4e238026-4cf9-4564-b5d3-1539062736f6@github.com> <1qB16WqDleABsguKwI8xSgWBf1NFQ7uOZByQHIIXdOU=.e30842bf-67f4-4fb3-b877-b91b288912bc@github.com> <3Vc1sIRj7GOSPv3E1tz6xOOmjTuN40yWsfbTvm5LdS0=.005267aa-3f4e-4f78-b55d-4900d8d7065e@github.com> Message-ID: On Fri, 24 Oct 2025 07:17:08 GMT, Emanuel Peter wrote: >>> Hence, these two APIs are a part of VectorMask.fromArray()/intoArray() relatively. Although they really do not any memory access operation. 
If we have a better name for these two IRs, I think that would be another topic that we can revisit with a separate thread. WDYT? >> >> Sure, we can do it in a future RFE. It is just that bad naming makes it harder for me to review your PR, and so I'm a bit annoying for you probably. I'm sorry for that. Thanks for your patience and explaining things :) > >> Hi @eme64 , I updated a commit with renaming the matcher function to mask_op_uses_packed_vector. Is this fine to you? The main concern here is that only the specified vector mask ops [(VectorMaskOpNode)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectornode.hpp#L1343) need the packed vector mask. Name vector_mask_must_be_packed might extend the scope to all vector/mask operations. > > Sounds, good. I'll have a look at the code. More precise names are always preferable. And some code comments can help refine the definition further: what are the guarantees if you return true or false? Maybe @PaulSandoz has a good idea for a better naming of `VectorLoadMask` and `VectorStoreMask`? @XiaohongGong Is there any good place where we already document the different kinds of masks, and how they can be converted, and how they are used? If not: it would be really great if we could add that to `vectornode.hpp`. I also see that `TypeVectMask` has no class comment. We really should improve things there. It would make reviewing Vector API code so much easier. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2459181885 From epeter at openjdk.org Fri Oct 24 07:47:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 07:47:10 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 07:23:02 GMT, Hannes Greule wrote: >> @iwanowww I'm not quite following your suggestions / questions. >> >>> It seems the root cause is a divergence in functionality in GVN between different representations of Div/Mod nodes. >> >>> How hard would it be to align the implementation and improve GVN handling for other representations, so the same type is constructed irrespective of the IR shape? >> >> Do you consider the "expanded" versions of Div/Mod as a "different representation of Div/Mod"? >> If yes: we could pattern match for such "expanded" versions of Div/Mod, but it would be quite complex: you would have to parse through patterns like displayed [up here](https://github.com/openjdk/jdk/pull/27886#issuecomment-3436423466). Do you think that is a good idea? I already mentioned the idea [up here](https://github.com/openjdk/jdk/pull/27886#issuecomment-3435782079), but did not think it was desirable due to the complexity. >> >>> Alternatively, as part of the expansion, new representation can be wrapped in CastII/CastLL with the narrower type of original Div/Mod node. >> >> How does this "wrapping" help? After parsing, the CastII at the bottom of the "expanded" Div would just have the whole int range. How would the type of the CastII ever be improved, without pattern matching the "expanded" Div? > >> It seems the root cause is a divergence in functionality in GVN between different representations of Div/Mod nodes. >> >> How hard would it be to align the implementation and improve GVN handling for other representations, so the same type is constructed irrespective of the IR shape? > > For Div, we can have either the magic multiply/shift variant, or shifting for powers of 2. 
Each of these can have slightly different shapes depending on e.g., the sign of the divisor, the sign of the dividend, and the sign of the magic constant. > For DivL, we also use MulHi if supported and a different sequence of instructions otherwise. > > For Mod, we either do what we do for Div, multiply again and subtract to get the remainder; or we just directly use And, or we do the Mersenne number optimization (related: https://bugs.openjdk.org/browse/JDK-8370135) which unrolls the same few operations multiple times. > > Generally, there also isn't one guaranteed result node (like e.g., Sub) where we could place code that recognizes these patterns and provides better results, so I don't think this is feasible (for Div it *might* be doable, at least I only found that off-by-one overapproximation that could be dealt with in Sub). > >> Alternatively, as part of the expansion, new representation can be wrapped in CastII/CastLL with the narrower type of original Div/Mod node. > > I thought about that a bit as well and I think it has the same downside as the current approach: As soon as we don't use the Div/Mod node anymore, making the inputs more precise doesn't help anymore. We still have the Cast node, but that node doesn't know how to recalculate/improve its own type. > (Additionally, but less of a problem, a Cast node would require optimizations checking their inputs for Div/Mod nodes to uncast). > > Basically, this comes back to what e-graphs do better: remember multiple alternative constructs for the same semantic operation. Without considering whether that's realistic, if a Cast node would keep the original operation alive somehow (but that operation isn't further optimized itself, I guess), then the Cast node could recalculate its type depending on multiple variants and choose the more specific result even at later stages of optimization. > > That said, I'm not in the position if using Cast nodes is more idiomatic, and I'm open to rework the PR to use Cast nodes if you want. @SirYwell We were probably typing at the same time ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3441588487 From duke at openjdk.org Fri Oct 24 08:04:03 2025 From: duke at openjdk.org (erifan) Date: Fri, 24 Oct 2025 08:04:03 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Fri, 24 Oct 2025 07:28:24 GMT, Emanuel Peter wrote: >> Could we have some sort of assert on `vt` here? What input types are allowed? `isa_vect`, `isa_vectmask`, and what else? > > There could be additional confusion: is the `packed vector` for the mask, or for all its inputs? is the `vt` for the mask type, or the output type of the `mask_op`? > > Suggestion: > `mask_op_uses_packed_vector` -> `mask_op_uses_packed_vector_mask` > `vt` -> `mask_vt` > > What do you think? > I find uses to be ambiguous. does mask_op require packed vector (nothing else accepted), or just allow packed vector (and other options are also accepted)? Currently, there are three layouts for vector masks in Aarch64 registers: 1. Predicate register (requires SVE support), **m-bit** mask in **p** register for **m*8-byte** vector element. 2. Unpacked vector register, **m-byte** mask in **v** register for **m-byte** vector element. 3. Packed vector register, **1-byte** mask in **v** register for **m-byte** vector element. @XiaohongGong explained this earlier. 
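As a purely illustrative aside (hypothetical names, not the HotSpot Matcher API), the three layouts can be written down as an enum; this is also roughly the shape of the enum-returning query floated later in this thread:

// Sketch only: the three AArch64 vector-mask layouts described above.
// MaskLayout and the helper are made up for illustration; the patch under
// review answers this kind of question with bool-returning Matcher queries.
enum class MaskLayout {
  PredicateRegister,  // SVE p register: m-bit mask for an m*8-byte vector element
  UnpackedVector,     // v register: an m-byte mask lane per m-byte data lane
  PackedVector        // v register: a 1-byte mask lane per data lane (boolean vector)
};

// Illustrative only: a bool query such as mask_op_uses_packed_vector() can only
// say "packed boolean vector or not"; an enum-returning variant could
// distinguish all three layouts.
inline MaskLayout layout_from_bool_query(bool uses_packed_vector_mask) {
  return uses_packed_vector_mask ? MaskLayout::PackedVector
                                 : MaskLayout::PredicateRegister;
}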
So, if SVE is supported, an IR with a mask input, such as `VectorMaskToLong`, can use any layout. Different layouts will simply result in different code generation. For performance reasons, we use different layouts for different IRs and in different cases. So, there's no mandatory layout; it's just about better performance. So, personally I tend to **use**. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2459236637 From mchevalier at openjdk.org Fri Oct 24 08:06:19 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 24 Oct 2025 08:06:19 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v3] In-Reply-To: References: Message-ID: > Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. > > There is one detail, we used to have > > void restore_major_progress(int progress) { _major_progress += progress; } > > > It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. > > It has a weird semantics: > > Progress before | Progress after verification | Progress after restore | What would be the assignment semantics > ----------------|-----------------------------|-----------------------|- > 0 | 0 | 0 | 0 > 1 | 0 | 1 | 1 > 0 | 1 | 1 | 0 (mismatch!) > 1 | 1 | 2 | 1 (same truthiness) > > It is rather a or than a restore, and a proper boolean version of that would be > > void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } > > but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural > > void set_major_progress(bool progress) { _major_progress = progress; } > > that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: More comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27912/files - new: https://git.openjdk.org/jdk/pull/27912/files/96a1bc7c..9e5612a1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27912&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27912&range=01-02 Stats: 10 lines in 1 file changed: 9 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27912.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27912/head:pull/27912 PR: https://git.openjdk.org/jdk/pull/27912 From xgong at openjdk.org Fri Oct 24 08:12:01 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 24 Oct 2025 08:12:01 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 10:45:47 GMT, erifan wrote: > According the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test faiulres on sve2 plaftforms where `MaxVectorSize > 16`. > > This test problem was discovered by simulating a 512-bit sve2 environment using qemu. > > This PR fixes these test failures. LGTM! Thanks for your fix! 
------------- Marked as reviewed by xgong (Committer). PR Review: https://git.openjdk.org/jdk/pull/27723#pullrequestreview-3375022126 From mdoerr at openjdk.org Fri Oct 24 08:29:18 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 24 Oct 2025 08:29:18 GMT Subject: Integrated: 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods In-Reply-To: References: Message-ID: <2eAlOtc_KBr0Tn-43Uxgg3wk5HVsFLo23HFfOmhJOGU=.48533aef-e0e9-4e9f-99fa-e4b8495e5912@github.com> On Fri, 26 Sep 2025 16:12:14 GMT, Martin Doerr wrote: > We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it. > > We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses. > > I've tested this proposal by the following code on x86_64: > > diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp > index a6b4efbe4f2..d715e69c850 100644 > --- a/src/hotspot/cpu/x86/interp_masm_x86.cpp > +++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp > @@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() { > void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) { > prepare_to_jump_from_interpreted(); > > + if (UseNewCode) { > + Label ok; > + movptr(temp, Address(method, Method::from_interpreted_offset())); > + cmpptr(temp, Address(method, Method::interpreter_entry_offset())); > + je(ok); > + movptr(rax, Address(method, Method::from_compiled_offset())); > + movptr(rbx, rax); > + addptr(rbx, 128); > + hlt(); > + bind(ok); > + } > + > if (JvmtiExport::can_post_interpreter_events()) { > Label run_compiled_code; > // JVMTI events, such as single-stepping, are implemented partly by avoiding running > > > The output is (requires hsdis library, otherwise we only get the hex dump): > > RAX=0x00007f1a75000100 is at entry_point+0 in (nmethod*)0x00007f1a75000008 > Compiled method (c1) 2504 1 3 java.lang.Byte::toUnsignedInt (6 bytes) > total in heap [0x00007f1a75000008,0x00007f1a750001f8] = 496 > main code [0x00007f1a75000100,0x00007f1a750001b8] = 184 > stub code [0x00007f1a750001b8,0x00007f1a750001f8] = 64 > mutable data [0x00007f1a1001e0b0,0x00007f1a1001e0e0] = 48 > relocation [0x00007f1a1001e0b0,0x00007f1a1001e0d8] = 40 > metadata [0x00007f1a1001e0d8,0x00007f1a1001e0e0] = 8 > immutable data [0x00007f1a1001dcd0,0x00007f1a1001dd30] = 96 > dependencies [0x00007f1a1001dcd0,0x00007f1a1001dcd8] = 8 > scopes pcs [0x00007f1a1001dcd8,0x00007f1a1001dd18] = 64 > scopes data [... This pull request has now been integrated. 
Changeset: b31bbfcf Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/b31bbfcf2f13fa5b16762f5384d95c2b5d9c5705 Stats: 42 lines in 3 files changed: 42 ins; 0 del; 0 mod 8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods Reviewed-by: stuefe, aph, mbaesken, shade ------------- PR: https://git.openjdk.org/jdk/pull/27530 From epeter at openjdk.org Fri Oct 24 08:30:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 08:30:04 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Fri, 24 Oct 2025 08:01:38 GMT, erifan wrote: >> There could be additional confusion: is the `packed vector` for the mask, or for all its inputs? is the `vt` for the mask type, or the output type of the `mask_op`? >> >> Suggestion: >> `mask_op_uses_packed_vector` -> `mask_op_uses_packed_vector_mask` >> `vt` -> `mask_vt` >> >> What do you think? > >> I find uses to be ambiguous. does mask_op require packed vector (nothing else accepted), or just allow packed vector (and other options are also accepted)? > > Currently, there are three layouts for vector masks in Aarch64 registers: > 1. Predicate register (requires SVE support), **m-bit** mask in **p** register for **m*8-byte** vector element. > 2. Unpacked vector register, **m-byte** mask in **v** register for **m-byte** vector element. > 3. Packed vector register, **1-byte** mask in **v** register for **m-byte** vector element. > > @XiaohongGong explained this earlier. > > So, if SVE is supported, an IR with a mask input, such as `VectorMaskToLong`, can use any layout. Different layouts will simply result in different code generation. For performance reasons, we use different layouts for different IRs and in different cases. So, there's no mandatory layout; it's just about better performance. So, personally I tend to **use**. Ah, I see, thanks for the explanations. Is this documented somewhere in code comments? That would really save you having to explain it repeatedly: you could just point to the code comments ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2459303010 From epeter at openjdk.org Fri Oct 24 08:55:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 08:55:03 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Fri, 24 Oct 2025 08:27:49 GMT, Emanuel Peter wrote: >>> I find uses to be ambiguous. does mask_op require packed vector (nothing else accepted), or just allow packed vector (and other options are also accepted)? >> >> Currently, there are three layouts for vector masks in Aarch64 registers: >> 1. Predicate register (requires SVE support), **m-bit** mask in **p** register for **m*8-byte** vector element. >> 2. Unpacked vector register, **m-byte** mask in **v** register for **m-byte** vector element. >> 3. Packed vector register, **1-byte** mask in **v** register for **m-byte** vector element. >> >> @XiaohongGong explained this earlier. >> >> So, if SVE is supported, an IR with a mask input, such as `VectorMaskToLong`, can use any layout. Different layouts will simply result in different code generation. 
For performance reasons, we use different layouts for different IRs and in different cases. So, there's no mandatory layout; it's just about better performance. So, personally I tend to **use**. > > Ah, I see, thanks for the explanations. Is this documented somewhere in code comments? That would really save you having to explain it repeatedly: you could just point to the code comments ;) What about `prefers` instead of `uses`? Or `should_use`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2459396208 From duke at openjdk.org Fri Oct 24 08:59:13 2025 From: duke at openjdk.org (erifan) Date: Fri, 24 Oct 2025 08:59:13 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Fri, 24 Oct 2025 08:52:13 GMT, Emanuel Peter wrote: >> Ah, I see, thanks for the explanations. Is this documented somewhere in code comments? That would really save you having to explain it repeatedly: you could just point to the code comments ;) > > What about `prefers` instead of `uses`? Or `should_use`? I don't know of such a comment anywhere; if there isn't one, maybe this function is a good place to add it. Also, I wonder if it would be better to return an enum constant? For example: PREDICATE_MASK PACKED_MASK UNPACKED_MASK ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2459407774 From mli at openjdk.org Fri Oct 24 09:02:20 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 24 Oct 2025 09:02:20 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v5] In-Reply-To: References: <4JU6tXdnIFqPXbDMV14FBOr7yvA4mQz6Cd4s1wOjmY0=.05b28d35-bc90-49b5-a321-d37ab3070799@github.com> <33pm95WaCjJp6jwm4LaSx9F_ZGPy_uZGlwukkN3o0XM=.b3c414d5-4598-40bf-b0d9-ea9088e4f881@github.com> Message-ID: <3HBIMP0fqwmDkMmpDMRPhlo7bAPgKccAJZfeu3e2WV0=.318dc21e-1afe-461a-affd-08febba6094f@github.com> On Fri, 24 Oct 2025 07:12:29 GMT, Emanuel Peter wrote: >> Added tests to cover UI{GE|LT|LE}forF and UL{GE|LT|LE}forD. >> >> Other tests for example UI{GE|LT|LE}forD UL{GE|LT|LE}forF could be added when I work on https://github.com/openjdk/jdk/pull/25336 or https://github.com/openjdk/jdk/pull/25341, as currently they are not vectorized. > > If you already have the tests in code, it may be good to just put all tests in now. Of course with adjusted IR rules. That would allow us to verify correctness on all combinations, and backport the tests as well. What do you think? Sure, added all the combinations for unsigned comparison. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2459407325 From mli at openjdk.org Fri Oct 24 09:02:19 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 24 Oct 2025 09:02:19 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v7] In-Reply-To: References: Message-ID: > Hi, > Can you help to review the patch? @eme64 > > ## Issue > > Currently, in SLP when transform from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned-ness in CmpU is lost, then end up doing a signed instead of unsigned comparison in VectorMaskCmp. > For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`.
> > ## ?Fix > Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. > > Thanks > > This pr could also lead to more optimizations, like: https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: more tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27942/files - new: https://git.openjdk.org/jdk/pull/27942/files/309880d8..ecb38321 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27942&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27942&range=05-06 Stats: 414 lines in 1 file changed: 413 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27942/head:pull/27942 PR: https://git.openjdk.org/jdk/pull/27942 From duke at openjdk.org Fri Oct 24 09:04:16 2025 From: duke at openjdk.org (erifan) Date: Fri, 24 Oct 2025 09:04:16 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Fri, 24 Oct 2025 08:56:35 GMT, erifan wrote: >> What about `prefers` instead of `uses`? Or `should_use`? > > I don't know where there is such a comment, if no, maybe this function is a good place to comment this. > > Also I wonder if it's better to return an enum constant? For example: > > PREDICATE_MASK > PACKED_MASK > UNPACKED_MASK By the way, I'm fine to the current implementation. Thanks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2459421864 From duke at openjdk.org Fri Oct 24 09:10:20 2025 From: duke at openjdk.org (erifan) Date: Fri, 24 Oct 2025 09:10:20 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Fri, 24 Oct 2025 09:01:35 GMT, erifan wrote: >> I don't know where there is such a comment, if no, maybe this function is a good place to comment this. >> >> Also I wonder if it's better to return an enum constant? For example: >> >> PREDICATE_MASK >> PACKED_MASK >> UNPACKED_MASK > > By the way, I'm fine to the current implementation. Thanks Both `prefer` and `use` look good to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2459438526 From hgreule at openjdk.org Fri Oct 24 09:10:36 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 24 Oct 2025 09:10:36 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation [v2] In-Reply-To: References: Message-ID: > The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. > > Please let me know what you think. 
Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: expand comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27886/files - new: https://git.openjdk.org/jdk/pull/27886/files/6a392224..6a8d842f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27886&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27886&range=00-01 Stats: 25 lines in 1 file changed: 17 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/27886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27886/head:pull/27886 PR: https://git.openjdk.org/jdk/pull/27886 From hgreule at openjdk.org Fri Oct 24 09:10:38 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 24 Oct 2025 09:10:38 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 07:23:02 GMT, Hannes Greule wrote: >> @iwanowww I'm not quite following your suggestions / questions. >> >>> It seems the root cause is a divergence in functionality in GVN between different representations of Div/Mod nodes. >> >>> How hard would it be to align the implementation and improve GVN handling for other representations, so the same type is constructed irrespective of the IR shape? >> >> Do you consider the "expanded" versions of Div/Mod as a "different representation of Div/Mod"? >> If yes: we could pattern match for such "expanded" versions of Div/Mod, but it would be quite complex: you would have to parse through patterns like displayed [up here](https://github.com/openjdk/jdk/pull/27886#issuecomment-3436423466). Do you think that is a good idea? I already mentioned the idea [up here](https://github.com/openjdk/jdk/pull/27886#issuecomment-3435782079), but did not think it was desirable due to the complexity. >> >>> Alternatively, as part of the expansion, new representation can be wrapped in CastII/CastLL with the narrower type of original Div/Mod node. >> >> How does this "wrapping" help? After parsing, the CastII at the bottom of the "expanded" Div would just have the whole int range. How would the type of the CastII ever be improved, without pattern matching the "expanded" Div? > >> It seems the root cause is a divergence in functionality in GVN between different representations of Div/Mod nodes. >> >> How hard would it be to align the implementation and improve GVN handling for other representations, so the same type is constructed irrespective of the IR shape? > > For Div, we can have either the magic multiply/shift variant, or shifting for powers of 2. Each of these can have slightly different shapes depending on e.g., the sign of the divisor, the sign of the dividend, and the sign of the magic constant. > For DivL, we also use MulHi if supported and a different sequence of instructions otherwise. > > For Mod, we either do what we do for Div, multiply again and subtract to get the remainder; or we just directly use And, or we do the Mersenne number optimization (related: https://bugs.openjdk.org/browse/JDK-8370135) which unrolls the same few operations multiple times. > > Generally, there also isn't one guaranteed result node (like e.g., Sub) where we could place code that recognizes these patterns and provides better results, so I don't think this is feasible (for Div it *might* be doable, at least I only found that off-by-one overapproximation that could be dealt with in Sub). 
> >> Alternatively, as part of the expansion, new representation can be wrapped in CastII/CastLL with the narrower type of original Div/Mod node. > > I thought about that a bit as well and I think it has the same downside as the current approach: As soon as we don't use the Div/Mod node anymore, making the inputs more precise doesn't help anymore. We still have the Cast node, but that node doesn't know how to recalculate/improve its own type. > (Additionally, but less of a problem, a Cast node would require optimizations checking their inputs for Div/Mod nodes to uncast). > > Basically, this comes back to what e-graphs do better: remember multiple alternative constructs for the same semantic operation. Without considering whether that's realistic, if a Cast node would keep the original operation alive somehow (but that operation isn't further optimized itself, I guess), then the Cast node could recalculate its type depending on multiple variants and choose the more specific result even at later stages of optimization. > > That said, I'm not in the position if using Cast nodes is more idiomatic, and I'm open to rework the PR to use Cast nodes if you want. > @SirYwell We were probably typing at the same time ;) Indeed :) I still updated the comments now as they are useful independently of the ongoing discussion. Feel free to give your opinion on that :) I also noticed that `x % 2 == 0` isn't optimized to `(x & 1) == 0`, which is a bit surprising. I found but that seems to be different, so maybe there is a regression somewhere *if* that worked before? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3442004572 From epeter at openjdk.org Fri Oct 24 09:56:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 09:56:25 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v15] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Fri, 24 Oct 2025 09:40:45 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/memnode.hpp line 1424: >> >>> 1422: template ProjNode* apply_to_narrow_mem_projs(DUIterator& i, Callback callback) const { >>> 1423: return apply_to_narrow_mem_projs_any_iterator(UsesIterator(i, this), callback); >>> 1424: } >> >> Do we really need to have a `public` API where we can pass in an iterator? What if someone uses it with an iterator for the wrong node? >> >> The only use for it seems to be [ConnectionGraph::split_unique_types](https://github.com/openjdk/jdk/pull/24570/files#diff-03f7ae3cf79ff61be6e4f0590b7809a87825b073341fdbfcf36143b99c304474R4467) with `DUIterator`. >> >> Is there a reason you want to pass the iterator explicitly? > > Also: why does it return anything? What kind of callback is exected here? I think the public API deserves a bit of documentation. The use in [ConnectionGraph::split_unique_types](https://github.com/openjdk/jdk/pull/24570/files#diff-03f7ae3cf79ff61be6e4f0590b7809a87825b073341fdbfcf36143b99c304474R4467) really does not need a return value, and also could work fine with a `callback` that return `void`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459562042 From epeter at openjdk.org Fri Oct 24 09:56:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 09:56:23 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v15] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Mon, 29 Sep 2025 08:44:51 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, The `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. >> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initalize`. Each >> of the `Proj` should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type. >> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in is own `NarrowMemProj` is >> added to the memory sugraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA split types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`. >> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - review > - Roberto's patches @rwestrel Thanks for your continued work on this, and you patience with the slow reviews ? I stumbled a bit over the many overloads of `apply_to_narrow_mem_projs`, and left some comments for that below. 
There are really only one use of `already_has_narrow_mem_proj_with_adr_type` and one of `apply_to_narrow_mem_projs`, and you have a whole lot of methods that implement a lot of abstractions that confuse me. Is there a plan behind this, or just an artefact of many refactorings? Could we not just implement those 2 methods directly using `apply_to_projs_any_iterator`? I'll continue reviewing other parts now. src/hotspot/share/opto/escape.cpp line 4457: > 4455: const TypePtr* new_adr_type = tinst->add_offset(adr_type->offset()); > 4456: bool already_has_narrow_mem_proj_with_adr_type = init->already_has_narrow_mem_proj_with_adr_type(new_adr_type); > 4457: if (adr_type != new_adr_type && !already_has_narrow_mem_proj_with_adr_type) { Nit: we could avoid the iteration inside `already_has_narrow_mem_proj_with_adr_type` if we did the `adr_type != new_adr_type` first. Feel free to ignore if you think this is a micro-optimization ;) src/hotspot/share/opto/escape.cpp line 4467: > 4465: return MultiNode::CONTINUE; > 4466: }; > 4467: init->apply_to_narrow_mem_projs(i, process_narrow_proj); Is there a reason why this is not a `init->for_each_narrow_mem_proj(callback)`, that has an internal iterator? Because with this API, I'm wondering: What would happen if I feed `apply_to_narrow_mem_projs` an iterator that does not belong to the `init`? src/hotspot/share/opto/memnode.cpp line 5491: > 5489: }; > 5490: return apply_to_narrow_mem_projs(find_proj, adr_type) != nullptr; > 5491: } Am I seeing this right: `already_has_narrow_mem_proj_with_adr_type` calls `apply_to_narrow_mem_projs` with `callback = find_proj`. `find_proj` returns `BREAK_AND_RETURN_CURRENT_PROJ`, which is an element from `ApplyToProjs`. That would mean that when we call the `callback`, we get an enum element, and not a boolean, right? If that is the case, you should probably not do an implicit comparison on this line: ` if (proj->adr_type() == adr_type && callback(proj->as_NarrowMemProj())) {` Hotspot style guide does not like implicit conversion. You should use an explicit comparison. I think it would also be more clear what is happening. Currently, I'm a bit confused. All the overloadings of `apply_to_narrow_mem_projs` make it a bit hard to see what goes where :/ I wonder if we really need all that complexity. src/hotspot/share/opto/memnode.hpp line 1408: > 1406: template ProjNode* apply_to_narrow_mem_projs_any_iterator(Iterator i, Callback callback) const { > 1407: auto filter = [&](ProjNode* proj) { > 1408: if (proj->is_NarrowMemProj() && callback(proj->as_NarrowMemProj())) { What does the `callback` return here? Are we sure this is not an implicit zero/null check, that the hotspot style guide would not be happy with? src/hotspot/share/opto/memnode.hpp line 1424: > 1422: template ProjNode* apply_to_narrow_mem_projs(DUIterator& i, Callback callback) const { > 1423: return apply_to_narrow_mem_projs_any_iterator(UsesIterator(i, this), callback); > 1424: } Do we really need to have a `public` API where we can pass in an iterator? What if someone uses it with an iterator for the wrong node? The only use for it seems to be [ConnectionGraph::split_unique_types](https://github.com/openjdk/jdk/pull/24570/files#diff-03f7ae3cf79ff61be6e4f0590b7809a87825b073341fdbfcf36143b99c304474R4467) with `DUIterator`. Is there a reason you want to pass the iterator explicitly? 
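On the implicit-conversion comment above, one explicit spelling could look like this (sketch only; just the condition from the quoted helper, using the enum value named in the review):

// Sketch: compare the callback's ApplyToProjs result against the enum constant
// explicitly instead of letting it convert to bool.
if (proj->adr_type() == adr_type &&
    callback(proj->as_NarrowMemProj()) == BREAK_AND_RETURN_CURRENT_PROJ) {
  // stop iterating and hand back this projection, as the original helper does
}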
src/hotspot/share/opto/memnode.hpp line 1428: > 1426: template ProjNode* apply_to_narrow_mem_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback) const { > 1427: return apply_to_narrow_mem_projs_any_iterator(UsesIteratorFast(imax, i, this), callback); > 1428: } Is there any use for this method? I could not find any. ------------- PR Review: https://git.openjdk.org/jdk/pull/24570#pullrequestreview-3375302849 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459494172 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459451788 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459537600 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459569501 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459482037 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459456185 From epeter at openjdk.org Fri Oct 24 09:56:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 09:56:24 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v15] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Fri, 24 Oct 2025 09:12:43 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - review >> - Roberto's patches > > src/hotspot/share/opto/escape.cpp line 4467: > >> 4465: return MultiNode::CONTINUE; >> 4466: }; >> 4467: init->apply_to_narrow_mem_projs(i, process_narrow_proj); > > Is there a reason why this is not a `init->for_each_narrow_mem_proj(callback)`, that has an internal iterator? > > Because with this API, I'm wondering: What would happen if I feed `apply_to_narrow_mem_projs` an iterator that does not belong to the `init`? If it was a plain `for_each_narrow_mem_proj`, your `callback` would also not have to return anything, you could remove this line: `return MultiNode::CONTINUE;`. > src/hotspot/share/opto/memnode.hpp line 1424: > >> 1422: template ProjNode* apply_to_narrow_mem_projs(DUIterator& i, Callback callback) const { >> 1423: return apply_to_narrow_mem_projs_any_iterator(UsesIterator(i, this), callback); >> 1424: } > > Do we really need to have a `public` API where we can pass in an iterator? What if someone uses it with an iterator for the wrong node? > > The only use for it seems to be [ConnectionGraph::split_unique_types](https://github.com/openjdk/jdk/pull/24570/files#diff-03f7ae3cf79ff61be6e4f0590b7809a87825b073341fdbfcf36143b99c304474R4467) with `DUIterator`. > > Is there a reason you want to pass the iterator explicitly? Also: why does it return anything? What kind of callback is exected here? I think the public API deserves a bit of documentation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459559137 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459554017 From mchevalier at openjdk.org Fri Oct 24 09:58:08 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 24 Oct 2025 09:58:08 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v3] In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 08:06:19 GMT, Marc Chevalier wrote: >> Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. 
>> >> There is one detail, we used to have >> >> void restore_major_progress(int progress) { _major_progress += progress; } >> >> >> It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. >> >> It has a weird semantics: >> >> Progress before | Progress after verification | Progress after restore | What would be the assignment semantics >> ----------------|-----------------------------|-----------------------|- >> 0 | 0 | 0 | 0 >> 1 | 0 | 1 | 1 >> 0 | 1 | 1 | 0 (mismatch!) >> 1 | 1 | 2 | 1 (same truthiness) >> >> It is rather a or than a restore, and a proper boolean version of that would be >> >> void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } >> >> but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural >> >> void set_major_progress(bool progress) { _major_progress = progress; } >> >> that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > More comment I've added a comment, also using Christian's input. > 2. Is the type change correct overall. I've did something as @vnkozlov describes: have side by side the bool and the int version of the major progress, have the methods acts on both at the same time: on the int as it used to, on the bool as I propose here. Add the proposed assert in the getter. I've also made sure to assign both the int and the bool version for the 2 places in compile.cpp that assign _major_progress directly. It passes tier1-3 + other internal tests. This also makes sure there is no observable difference between the += for the int version, and the assignment for the bool version. I've pushed that to tier 4-6, with success. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27912#issuecomment-3442256107 From mchevalier at openjdk.org Fri Oct 24 10:04:32 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 24 Oct 2025 10:04:32 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v8] In-Reply-To: References: Message-ID: <_9_fUHVqFJ9mXpEtj-H6qt4tJc0Z1owPEYcvPjo39-8=.fdcb4e71-d99a-4e1e-8fd6-400dee008362@github.com> > Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. > > When we have something like > > // any loop > while (...) { /* something involving limit */ } > // counted loop with zero trip guard > if (i < limit) { > for (int i = init; i < limit; i++) { ... } > } > > and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. 
> > But the method `PhaseIdealLoop::do_unroll` has the assert > > https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 > > requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. > > This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. > > But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? > > By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. > > We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. > > We could make the assert a lot smarter and test for ... Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Comments + merge tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27586/files - new: https://git.openjdk.org/jdk/pull/27586/files/f6112af2..0035c8fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=06-07 Stats: 377 lines in 5 files changed: 141 ins; 219 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/27586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27586/head:pull/27586 PR: https://git.openjdk.org/jdk/pull/27586 From mchevalier at openjdk.org Fri Oct 24 10:04:34 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 24 Oct 2025 10:04:34 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v7] In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 15:35:08 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. 
>> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > review I've merged the tests, fixed the comments, and tested tier1-6 + some internal tests, without (related) problems. I've also made sure that each `@run` (but the last, the one without flags) still reproduces on master. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3442286237 From epeter at openjdk.org Fri Oct 24 10:41:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 10:41:16 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v15] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: <_qiLczbSCNh63QdZCymWBBv0nh82yEO5euGDw9RyMH8=.8142c826-00bd-4aab-9574-cc3004b04dd3@github.com> On Mon, 29 Sep 2025 08:44:51 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, The `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. 
A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. >> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initalize`. Each >> of the `Proj` should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type. >> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in is own `NarrowMemProj` is >> added to the memory sugraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA split types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`. >> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - review > - Roberto's patches Second batch of comments. Will continue later at `class NarrowMemProjNode`. src/hotspot/share/opto/library_call.cpp line 5612: > 5610: return MultiNode::CONTINUE; > 5611: }; > 5612: init->apply_to_projs(move_proj, TypeFunc::Memory); A "for each" using callback with `void` return would create a little less noise. src/hotspot/share/opto/macro.cpp line 1609: > 1607: // null, this allocation does have an InitializeNode but this logic can't locate it (see comment in > 1608: // PhaseMacroExpand::initialize_object()). > 1609: MemBarNode* mb = MemBarNode::make(C, Op_MemBarStoreStore, Compile::AliasIdxRaw); Would it not be nicer to have the explanation here than below, and refer from below to here? Would help the reading flow ;) src/hotspot/share/opto/macro.cpp line 1638: > 1636: Node* ctrl = new ProjNode(init, TypeFunc::Control); > 1637: transform_later(ctrl); > 1638: Node* existing_raw_mem_proj = nullptr; Tiny suggestion: `existing_raw_mem_proj` -> `old_raw_mem_proj`, to emphasize that it is old, and will be replaced. src/hotspot/share/opto/macro.cpp line 1646: > 1644: return MultiNode::CONTINUE; > 1645: }; > 1646: init->apply_to_projs(find_raw_mem, TypeFunc::Memory); A "for each" with `void` return callback would reduce noise. src/hotspot/share/opto/memnode.cpp line 5468: > 5466: }; > 5467: apply_to_projs(imax, i, replace_proj, TypeFunc::Memory); > 5468: } Ouff, it's a little sad that we modify the iterator both outside and inside the method call. But I don't have a better solution now either. It may just be what we have to do. 
src/hotspot/share/opto/multnode.cpp line 63: > 61: }; > 62: return apply_to_projs(find_proj, which_proj, is_io_use); > 63: } A `find_first` API with a boolean predicate could reduce some noise here. src/hotspot/share/opto/multnode.cpp line 67: > 65: template ProjNode* MultiNode::apply_to_projs(Callback callback, uint which_proj, bool is_io_use) const { > 66: auto filter = [&](ProjNode* proj) { > 67: if (proj->_is_io_use == is_io_use && callback(proj)) { implicit zero check? src/hotspot/share/opto/multnode.cpp line 81: > 79: return CONTINUE; > 80: }; > 81: apply_to_projs(count_projs, which_proj); for each pattern would have been nice here. ------------- PR Review: https://git.openjdk.org/jdk/pull/24570#pullrequestreview-3375556080 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459657397 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459675950 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459698388 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459680660 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459717105 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459729036 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459730207 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459733750 From epeter at openjdk.org Fri Oct 24 10:41:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 10:41:17 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v15] In-Reply-To: <_qiLczbSCNh63QdZCymWBBv0nh82yEO5euGDw9RyMH8=.8142c826-00bd-4aab-9574-cc3004b04dd3@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <_qiLczbSCNh63QdZCymWBBv0nh82yEO5euGDw9RyMH8=.8142c826-00bd-4aab-9574-cc3004b04dd3@github.com> Message-ID: On Fri, 24 Oct 2025 10:18:04 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - review >> - Roberto's patches > > src/hotspot/share/opto/macro.cpp line 1646: > >> 1644: return MultiNode::CONTINUE; >> 1645: }; >> 1646: init->apply_to_projs(find_raw_mem, TypeFunc::Memory); > > A "for each" with `void` return callback would reduce noise. Alternatives: - `init->unique_raw_mem_proj()` - `init->unique_out(raw_mem_proj_predicate)` with boolean predicate. Both of those would assert if we find multiple or none. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2459689999 From rcastanedalo at openjdk.org Fri Oct 24 12:33:37 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 24 Oct 2025 12:33:37 GMT Subject: RFR: 8370569: IGV: dump more graph properties at bytecode parsing Message-ID: This changeset makes it easier to trace C2's individual bytecode parsing steps to the input class files and the output of type flow analysis, by: 1. dumping the `map` node (holding the JVM state), basic block (reverse post-order index), and method name as properties of the graph that C2 dumps after each parsed bytecode at IGV dump level 6; 2. appending this information to the graph name (hence generalizing [JDK-8356779](https://bugs.openjdk.org/browse/JDK-8356779)); and 3. making the graph name suffix configurable via `Options -> Graph Name Suffix`. 
By default, it appends `(map: [map], block #[block] at [method])` to the names of graphs containing these properties (bytecode parsing dumps), and nothing otherwise. Here are the `Outline` and `Properties` windows for $ java -Xbatch -XX:CompileOnly=java.lang.String::charAt -XX:PrintIdealGraphLevel=6 before (left) and after (right) the changeset: before-after Please let me know if there are other bytecode parsing graph properties that could be useful to dump, and whether you think the default graph name suffix contains the right amount of information. #### Testing - tier1. - Tested automatically that dumping thousands of graphs does not trigger any assertion failure on HotSpot or IGV. ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/27975/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27975&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370569 Stats: 155 lines in 9 files changed: 106 ins; 13 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/27975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27975/head:pull/27975 PR: https://git.openjdk.org/jdk/pull/27975 From rcastanedalo at openjdk.org Fri Oct 24 12:55:05 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 24 Oct 2025 12:55:05 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v8] In-Reply-To: <_9_fUHVqFJ9mXpEtj-H6qt4tJc0Z1owPEYcvPjo39-8=.fdcb4e71-d99a-4e1e-8fd6-400dee008362@github.com> References: <_9_fUHVqFJ9mXpEtj-H6qt4tJc0Z1owPEYcvPjo39-8=.fdcb4e71-d99a-4e1e-8fd6-400dee008362@github.com> Message-ID: <1dCuq-198PNU347VmUisH0qBi3sOjU9LPD0jRlJl9dw=.81de5eb5-69ba-4d53-ad31-87128605ac7b@github.com> On Fri, 24 Oct 2025 10:04:32 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. >> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. 
>> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Comments + merge tests Thank you Marc for addressing my comments, I only have one suggestion left: renaming `TooStrictAssertForUnrollAfterStressPeeling` with simply `TooStrictAssertForUnrollAfterPeeling` (since most scenarios can be exercised without the stress mode). ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27586#pullrequestreview-3376486951 From epeter at openjdk.org Fri Oct 24 13:42:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 13:42:27 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v15] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Mon, 29 Sep 2025 08:44:51 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, The `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. >> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initalize`. Each >> of the `Proj` should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type. >> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in is own `NarrowMemProj` is >> added to the memory sugraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. 
>> >> One tricky issue is that when EA split types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`. >> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - review > - Roberto's patches Last batch of comments for this round. I'm quite happy with the solution @rwestrel , thanks for coming up with this ? I think really all my suggestions / requests are about the complexity / implementation around the `apply_...` methods. It seems to me that it makes sense to internally implement it with the `apply_...` functionality that take callbacks which return a `ApplyToProjs`. But I think many of the `public` API methods could have a simpler semantic, for example many could just be `for_each_...` calls that just take a callback with void return / no return. src/hotspot/share/opto/memnode.cpp line 5476: > 5474: > 5475: > 5476: template ProjNode* InitializeNode::apply_to_narrow_mem_projs(Callback callback, const TypePtr* adr_type) const { Another nit: we will only ever return a `NarrowMemProj`, so you might as well make the return value more precise ;) src/hotspot/share/opto/multnode.cpp line 273: > 271: ProjNode::dump_compact_spec(st); > 272: MemNode::dump_adr_type(_adr_type, st); > 273: } Can you show us an example out put of `dump`? I'm just wondering if there maybe needs to be a space between the two, and if it is immediately readable :) src/hotspot/share/opto/multnode.hpp line 215: > 213: } > 214: public: > 215: NarrowMemProjNode(Node* src, const TypePtr* adr_type) Can you feed it any other `src` than a `InitializeNode*`? Suggestion: NarrowMemProjNode(InitializeNode* src, const TypePtr* adr_type) src/hotspot/share/opto/multnode.hpp line 232: > 230: }; > 231: > 232: template ProjNode* MultiNode::apply_to_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback, uint which_proj) const { Does this not belong right after the `MultiNode`? Or even in `multnode.cpp`? src/hotspot/share/opto/multnode.hpp line 234: > 232: template ProjNode* MultiNode::apply_to_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback, uint which_proj) const { > 233: auto filter = [&](ProjNode* proj) { > 234: if (proj->_con == which_proj && callback(proj)) { Implicit zero check? src/hotspot/share/opto/phaseX.cpp line 2598: > 2596: add_users_to_worklist0(proj, worklist); > 2597: return MultiNode::CONTINUE; > 2598: }; `for_each` call below would mean we would not need to return anything. 
------------- PR Review: https://git.openjdk.org/jdk/pull/24570#pullrequestreview-3376663539 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2460499158 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2460487363 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2460422959 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2460434657 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2460443335 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2460448870 From shade at openjdk.org Fri Oct 24 13:49:11 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 24 Oct 2025 13:49:11 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v6] In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 08:44:44 GMT, Francesco Andreuzzi wrote: >> I propose to amend `nmethod::is_cold` to let GC collect not-entrant native `nmethod` instances. >> >> Passes tier1 and tier2 (fastdebug). > > Francesco Andreuzzi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - cc > - Merge branch 'master' into JDK-8369219 > - nn > - update foundOne > - fix summary > - nn > - Merge branch 'master' into JDK-8369219 > - trigger > - nn > - othervm > - ... and 5 more: https://git.openjdk.org/jdk/compare/8965993a...b6d94cf8 Looks reasonable, but the test needs a bit more work. test/hotspot/jtreg/gc/NativeWrapperCollection/NativeWrapperCollection.java line 62: > 60: WB.enqueueMethodForCompilation(method, 1 /* compLevel */); > 61: while (WB.isMethodQueuedForCompilation(method)) { > 62: Thread.onSpinWait(); We are just waiting for compilation here. It is counter-productive to wait with a busy-loop. Insert a sleep for ~10...100ms instead. Same thing for the loop below. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27742#pullrequestreview-3376819734 PR Review Comment: https://git.openjdk.org/jdk/pull/27742#discussion_r2460528080 From bkilambi at openjdk.org Fri Oct 24 13:50:11 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 24 Oct 2025 13:50:11 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: Message-ID: <_XquQ5TR_3ZGRTZ1c3iVCRGkNBDYyXvXuhimlGQFKq4=.bae46e77-9028-4d92-8ae4-29a5eef4b27a@github.com> On Fri, 24 Oct 2025 05:54:23 GMT, Xiaohong Gong wrote: >> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. >> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. 
By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. >> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. >> >> Here is the performance data with `-XX:UseSVE=2`: >> >> Benchmark bits inputs Mode Unit Before After Gain >> MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 >> MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 >> MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 >> MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 >> MaskQueryOperations... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Rename the matcher function and fix comment issue src/hotspot/share/opto/vectorIntrinsics.cpp line 2548: > 2546: } > 2547: // VectorMaskToLongNode requires the input is either a mask or a vector with BOOLEAN type. > 2548: if (Matcher::mask_op_uses_packed_vector(Op_VectorMaskToLong, opd->bottom_type()->is_vect())) { So without this patch, it'd generate `VectorMaskToLong -> URShiftLNode -> AndLNode` (as the earlier `if` condition would have been false) and in the backend, the implementation for `VectorMaskToLong` contains code to convert the mask in a predicate to a packed vector (followed by the actual `VectorMaskToLong` related code). With this patch, it now generates `VectorStoreMaskNode -> VectorMaskToLong -> URShiftLNode ... `(the backend implementation is now separated at the IR level). Does the major performance uplift come from this Ideal optimization - `VectorMaskToLongNode::Ideal_MaskAll()` where the `VectorStoreMaskNode` gets optimized away? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2460569950 From fandreuzzi at openjdk.org Fri Oct 24 14:05:04 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Fri, 24 Oct 2025 14:05:04 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v7] In-Reply-To: References: Message-ID: > I propose to amend `nmethod::is_cold` to let GC collect not-entrant native `nmethod` instances. > > Passes tier1 and tier2 (fastdebug). 
Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: sleep ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27742/files - new: https://git.openjdk.org/jdk/pull/27742/files/b6d94cf8..5d0c7056 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27742&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27742&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27742.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27742/head:pull/27742 PR: https://git.openjdk.org/jdk/pull/27742 From fandreuzzi at openjdk.org Fri Oct 24 14:05:09 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Fri, 24 Oct 2025 14:05:09 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v6] In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 13:39:19 GMT, Aleksey Shipilev wrote: >> Francesco Andreuzzi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: >> >> - cc >> - Merge branch 'master' into JDK-8369219 >> - nn >> - update foundOne >> - fix summary >> - nn >> - Merge branch 'master' into JDK-8369219 >> - trigger >> - nn >> - othervm >> - ... and 5 more: https://git.openjdk.org/jdk/compare/4e995ea0...b6d94cf8 > > test/hotspot/jtreg/gc/NativeWrapperCollection/NativeWrapperCollection.java line 62: > >> 60: WB.enqueueMethodForCompilation(method, 1 /* compLevel */); >> 61: while (WB.isMethodQueuedForCompilation(method)) { >> 62: Thread.onSpinWait(); > > We are just waiting for compilation here. It is counter-productive to wait with a busy-loop. Insert a sleep for ~10...100ms instead. Same thing for the loop below. Thanks, that sounds reasonable. I got this pattern from another test, but it looks counterproductive here indeed. 5d0c70562614c5a3cd3391f5191bf60c5b51e82c ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27742#discussion_r2460634429 From shade at openjdk.org Fri Oct 24 14:31:50 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 24 Oct 2025 14:31:50 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v7] In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 14:05:04 GMT, Francesco Andreuzzi wrote: >> I propose to amend `nmethod::is_cold` to let GC collect not-entrant native `nmethod` instances. >> >> Passes tier1 and tier2 (fastdebug). > > Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: > > sleep Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27742#pullrequestreview-3377171194 From jbhateja at openjdk.org Fri Oct 24 14:45:32 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 24 Oct 2025 14:45:32 GMT Subject: RFR: 8370409: Incorrect computation in Float16 reduction loop Message-ID: Current floatToFloat16 intrinsic implementation always sign-extends the 16-bit short result to a 32-bit value in anticipation of safe consumption by subsequent integral (comparison) operation[s]. However, the safest way to compare two Float16 values is to use Float16.compare/compareTo method, given that floating point comparisons can also be unordered. 
e.g., both 64512 and -1024 are equivalent bit representations of the Float16 -Inf value, but are not numerically equivalent with integral comparison. jshell> Float16.compare(Float16.shortBitsToFloat16((short)-1024), Float16.shortBitsToFlot16((short)64512)) $3 ==> 0 In the scalar intrinsic of Float16.add/sub/mul/div/min/max, we always return a boxed value, which is then operated upon by the subsequent Float16 APIs. While Float.floatToFloat16 intrinsic always returns a 'short' value, this is special in the sense that even though the carrier type is 'short' but it encodes an IEEE 754 half precision value, being a short carrier if they get exposed to integral operators, then as per JVM specification, short must be sign-extended before operation. Given that our Float16 binary operations inference is based on generic pattern match and is agnostic to how that graph pallet got created, i.e., either through Float16.* APIs or by explicit Float.float16ToFloat/floatToFloat16 operations, hence it's safe to sign-extend the result in all cases. Kindly review the patch and share your feedback. Best Regards, Jatin ------------- Commit messages: - Adding random test inputs - 8370409: Incorrect computation in Float16 reduction loop Changes: https://git.openjdk.org/jdk/pull/27977/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27977&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370409 Stats: 167 lines in 2 files changed: 167 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27977.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27977/head:pull/27977 PR: https://git.openjdk.org/jdk/pull/27977 From duke at openjdk.org Fri Oct 24 14:45:33 2025 From: duke at openjdk.org ( (__Main__)) Date: Fri, 24 Oct 2025 14:45:33 GMT Subject: RFR: 8370409: Incorrect computation in Float16 reduction loop In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 14:36:21 GMT, Jatin Bhateja wrote: > Current floatToFloat16 intrinsic implementation always sign-extends the 16-bit short result to a 32-bit value in anticipation of safe consumption by subsequent integral (comparison) operation[s]. However, the safest way to compare two Float16 values is to use Float16.compare/compareTo method, given that floating point comparisons can also be unordered. > > e.g., both 64512 and -1024 are equivalent bit representations of the Float16 -Inf value, but are not numerically equivalent with integral comparison. > jshell> Float16.compare(Float16.shortBitsToFloat16((short)-1024), Float16.shortBitsToFlot16((short)64512)) > $3 ==> 0 > > In the scalar intrinsic of Float16.add/sub/mul/div/min/max, we always return a boxed value, which is then operated upon by the subsequent Float16 APIs. While Float.floatToFloat16 intrinsic always returns a 'short' value, this is special in the sense that even though the carrier type is 'short' but it encodes an IEEE 754 half precision value, being a short carrier if they get exposed to integral operators, then as per JVM specification, short must be sign-extended before operation. > > Given that our Float16 binary operations inference is based on generic pattern match and is agnostic to how that graph pallet got created, i.e., either through Float16.* APIs or by explicit Float.float16ToFloat/floatToFloat16 operations, hence it's safe to sign-extend the result in all cases. > > Kindly review the patch and share your feedback. > > Best Regards, > Jatin Marked as reviewed by 123MAIN-pk at github.com (no known OpenJDK username). 
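For reference, the sign-extension point above can be reproduced with only the java.lang.Float half-precision conversions (available since JDK 20), without the Float16 class itself; this is just an illustration, not part of the patch. The FP16 bit pattern of -Inf is 0xFC00, which is -1024 when sign-extended to int and 64512 when zero-extended, so a plain integral comparison of the two widenings fails even though both denote the same value.

    public class HalfBits {
        public static void main(String[] args) {
            short bits = Float.floatToFloat16(Float.NEGATIVE_INFINITY); // 0xFC00
            int signExtended = bits;                                    // -1024
            int zeroExtended = bits & 0xFFFF;                           // 64512
            System.out.println(signExtended + " vs " + zeroExtended);   // -1024 vs 64512
            System.out.println(signExtended == zeroExtended);           // false: integral compare differs
            System.out.println(Float.float16ToFloat(bits));             // -Infinity either way
        }
    }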
------------- PR Review: https://git.openjdk.org/jdk/pull/27977#pullrequestreview-3377229309 From epeter at openjdk.org Fri Oct 24 15:04:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 15:04:20 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v3] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 08:46:36 GMT, Qizheng Xing wrote: >> test/hotspot/jtreg/compiler/loopopts/TestRedundantSafepointElimination.java line 66: >> >>> (failed to retrieve contents of file, check the PR for context) >> So these do not end up being CountedLoop? > > The first one (`loopConst`) is a counted loop. The second one (`loopVar`) is not, because it calls `empty` (not inlined) which may modify `loopCount`. > > Both the two loops should have no safepoints, since the `empty` call always polls the safepoint. @MaxXSoft Can you please add some code comments in the test? Just if the rule ever fails, it would be nice to know why the rule was added ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2460914707 From epeter at openjdk.org Fri Oct 24 15:17:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 24 Oct 2025 15:17:27 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v7] In-Reply-To: References: Message-ID: On Fri, 17 Oct 2025 09:32:48 GMT, Qizheng Xing wrote: >> In `PhaseIdealLoop`, `IdealLoopTree::check_safepts` method checks if any call that is guaranteed to have a safepoint dominates the tail of the loop. In the previous implementation, `check_safepts` would stop if it found a local non-call safepoint. At this time, if there was a call before the safepoint in the dom-path, this safepoint would not be eliminated. >> >> loop-safepoint >> >> This patch changes the behavior of `check_safepts` to not stop when it finds a non-local safepoint. This makes simple loops with one method call ~3.8% faster (on aarch64). >> >> >> Benchmark Mode Cnt Score Error Units >> LoopSafepoint.loopVar avgt 15 208296.259 ? 1350.409 ns/op # baseline >> LoopSafepoint.loopVar avgt 15 200692.874 ? 616.770 ns/op # this patch >> >> >> Testing: tier1-2 on x86_64 and aarch64. > > Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision: > > - Update microbench > - Add IR tests for nested loops This now looks really good to me, thanks for all the additions! I think it would be best if @rwestrel also gave this a look. One question I have, maybe @rwestrel can weight in here too: how does all of this play with `LongCountedLoops`? I suppose they decay to int loops at some point... We don't have to worry about this in this PR, I'm just asking the question because it came to mind :) I'll run some internal testing now, before approving from my side. ------------- PR Review: https://git.openjdk.org/jdk/pull/23057#pullrequestreview-3377437061 From mhaessig at openjdk.org Fri Oct 24 15:26:19 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 24 Oct 2025 15:26:19 GMT Subject: RFR: 8370579: PPC: fix inswri immediate argument order Message-ID: This cleanup PR swaps and renames the immediate arguments of the two `insrwi` instruction macros in `ppc.ad` such that they correspond to the order and names in the manual. This involved swapping the arguments in all six usages. I hope this saves the next person trying to reason about this some confused hours. 
Testing: - [ ] Github Actions - [ ] Running some relevant tests (`compiler/c2/TestCharShortByteSwap.java`, `jdk/java/lang/Short/BitTwiddle.java`, `jdk/java/lang/Integer/BitTwiddle.java`, `compiler/codegen/Test6431242.java`) in qemu ------------- Commit messages: - ppc.ad: swap arguments of insrwi Changes: https://git.openjdk.org/jdk/pull/27978/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27978&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370579 Stats: 13 lines in 1 file changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/27978.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27978/head:pull/27978 PR: https://git.openjdk.org/jdk/pull/27978 From kvn at openjdk.org Fri Oct 24 16:07:06 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 Oct 2025 16:07:06 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v3] In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 08:06:19 GMT, Marc Chevalier wrote: >> Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. >> >> There is one detail, we used to have >> >> void restore_major_progress(int progress) { _major_progress += progress; } >> >> >> It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. >> >> It has a weird semantics: >> >> Progress before | Progress after verification | Progress after restore | What would be the assignment semantics >> ----------------|-----------------------------|-----------------------|- >> 0 | 0 | 0 | 0 >> 1 | 0 | 1 | 1 >> 0 | 1 | 1 | 0 (mismatch!) >> 1 | 1 | 2 | 1 (same truthiness) >> >> It is rather a or than a restore, and a proper boolean version of that would be >> >> void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } >> >> but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural >> >> void set_major_progress(bool progress) { _major_progress = progress; } >> >> that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > More comment Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27912#pullrequestreview-3377672615 From qamai at openjdk.org Fri Oct 24 17:08:39 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 24 Oct 2025 17:08:39 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v7] In-Reply-To: References: Message-ID: <0uLLE8EsQwRujEmG0lCXXtkOGzR7_Zc4AvUcS-L-3mM=.b831ecd8-86cc-497a-9f95-c3cec47888dc@github.com> On Fri, 24 Oct 2025 07:17:41 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. 
>> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions src/hotspot/share/opto/chaitin.cpp line 1693: > 1691: // then bias its color towards its input's def. > 1692: if (lrin1 != 0 && lrg->_copy_bias == 0 && _ifg->test_edge_sq(lidx, lrin1) == 0) { > 1693: lrg->_copy_bias = lrin1; I believe biasing is a hint, so it will not attempt to assign the same colour if they interfere. In that case is it better if we don't care about interference here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2461314061 From vlivanov at openjdk.org Fri Oct 24 18:08:11 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 24 Oct 2025 18:08:11 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 07:09:25 GMT, Emanuel Peter wrote: > Do you consider the "expanded" versions of Div/Mod as a "different representation of Div/Mod"? Yes, exactly. > we could pattern match for such "expanded" versions of Div/Mod, but it would be quite complex: you would have to parse through patterns like displayed https://github.com/openjdk/jdk/pull/27886#issuecomment-3436423466. Do you think that is a good idea? Sure, It may be way above the complexity budget we are willing to spend on it. The expansion code I see for Div/Mod nodes doesn't look too complicated, but matching the pattern may require more effort. The positive thing is it'll optimize the pattern irrespective of the origin (either expanded Div/Mod or explicitly optimized in the code by the user). So, the question is how much complexity it requires vs scenarios it covers. > How does this "wrapping" help? 
After parsing, the CastII at the bottom of the "expanded" Div would just have the whole int range. How would the type of the CastII ever be improved, without pattern matching the "expanded" Div? It's not fully clear to me what is the scope of problematic scenarios. If it's only about Ideal() expanding the node before Value() has a chance to run, then wrapping the result of expansion in a CastII/CastLL node and attaching Value() as its type should be enough (when the produced type is narrower than Type::INT). If we want to keep the expanded shape while being able to compute its type as if it were the original node, then a new flavor of Cast node may help. The one which keeps the node type and its inputs and can run Value() as if it were the original node. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3444298123 From kvn at openjdk.org Fri Oct 24 19:46:02 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 Oct 2025 19:46:02 GMT Subject: RFR: 8370251: C2: Inlining checks for method handle intrinsics are too strict In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 21:00:40 GMT, Vladimir Ivanov wrote: > C2 performs access checks during inlining attempts through method handle > intrinsic calls. But there are no such checks happening at runtime when > executing the calls. (Access checks are performed when corresponding method > handle is resolved.) So, inlining may fail due to access checks failure while > the call always succeeds at runtime. > > The fix is to skip access checks when inlining through method handle intrinsics. > > Testing: hs-tier1 - hs-tier4 @iwanowww thank you for answering my question.
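For readers following the Mod/Div-by-constant discussion above: an "expanded" Div is the shift/add (and, for general constants, multiply-high) sequence that C2 emits instead of a hardware division. The snippet below is only a hand-written Java illustration of the simplest case, signed division by a power of two; it is not the compiler's code, just the arithmetic identity that such pattern matching would have to recognize.

    public class DivByConstDemo {
        // Signed x / 8 without a division: add a bias derived from the sign bit,
        // then arithmetic-shift right. This is the classic power-of-two expansion.
        static int div8(int x) {
            int bias = (x >> 31) >>> 29;   // 7 if x is negative, 0 otherwise
            return (x + bias) >> 3;
        }

        public static void main(String[] args) {
            int[] samples = { 25, -25, 7, -7, 0, Integer.MIN_VALUE, Integer.MAX_VALUE };
            for (int x : samples) {
                if (div8(x) != x / 8) {
                    throw new AssertionError("mismatch for " + x);
                }
            }
            System.out.println("div8 matches x / 8 on all samples");
        }
    }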
>> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > More comment I did some experiements, and it looks like we don't hit case 3 in build_and_optimize() because when not verifying, major progress is already set. But I'm concerned about ShenandoahBarrierC2Support::expand(), which clears major progress before calling build_and_optimize(). ------------- PR Comment: https://git.openjdk.org/jdk/pull/27912#issuecomment-3444921504 From vlivanov at openjdk.org Fri Oct 24 21:33:01 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 24 Oct 2025 21:33:01 GMT Subject: RFR: 8368321: Rethink compilation delay strategy for lukewarm methods [v2] In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 01:36:39 GMT, Igor Veresov wrote: >> In the current implementation we delay profiling of lukewarm methods (those that were never compiled by C2 during training) by increasing the 2->3 threshold by a factor. That may shift profiling of those too much into the future if a large factor is used, if we use a small factor, however, profiling may happen within the training run window so to speak. The solution I came up with it to delay profiling until we reach the number of invocations of a method equal to the number we had in the training run. After that we use the normal policy. >> >> Here is an example. I trained our JavacBenchApp for 5 iterations (which is artificially low and therefore many methods would be classified as lukewarm). Then I ran it for 200 iterations with AOT replay. >> >> old-vs-new >> >> While initially the performance is similar it quickly diverges. With the new approach we move to standard handling of lukewarm methods after 5 iterations and they get compiled with C2. With the old approach we don't. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Fix zero build Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27926#pullrequestreview-3378908589 From vlivanov at openjdk.org Fri Oct 24 22:02:45 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 24 Oct 2025 22:02:45 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v18] In-Reply-To: References: Message-ID: > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. 
Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: - Merge branch 'master' into 8290892.rf - cleanup - update - Merge remote-tracking branch 'origin/master' into 8290892.rf - Merge branch 'master' into 8290892.rf - scalarization support - Remove comment - Add PreserveReachabilityFencesOnConstants test - Minor fix - minor fixes - ... and 16 more: https://git.openjdk.org/jdk/compare/97e5ac6e...a1101cda ------------- Changes: https://git.openjdk.org/jdk/pull/25315/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=17 Stats: 1504 lines in 38 files changed: 1442 ins; 20 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From vlivanov at openjdk.org Fri Oct 24 22:02:46 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 24 Oct 2025 22:02:46 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v17] In-Reply-To: References: Message-ID: On Fri, 17 Oct 2025 20:01:50 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. >> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. 
Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > cleanup Any reviews, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3445049145 From duke at openjdk.org Sat Oct 25 13:43:06 2025 From: duke at openjdk.org (Tobias Hotz) Date: Sat, 25 Oct 2025 13:43:06 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v8] In-Reply-To: References: Message-ID: On Sun, 19 Oct 2025 19:20:58 GMT, Tobias Hotz wrote: >> This PR improves the value of interger division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with two additional commits since the last revision: > > - Add additional nodes to fail conditions to detect idealized/transformed DivI Nodes that did not constant fold > - Remove checks for bottom and reorganize DivI/DivL Value functions Yeah this sucks, as it stands I do not have enough info to debug this... I noticed that the new types looks wrong if I read the input nodes correctly. It should be min_long / -2, but it is min_long / -1024. I can however not reproduce this locally: grafik I have the same input nodes in this example, but the correct type for the div node here. I've starred at the code now for a bit and I am still not sure what could have caused this. And I am afraid I can't debug without the test case... 
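As a side note on the DivI/DivL Value() discussion above: the two corner cases in question (min_long divided by a small negative divisor, and the special MIN / -1 case) are easy to check in plain Java. This is only an independent sanity check of the arithmetic, not part of the patch: in Java, Long.MIN_VALUE / -1 wraps back to Long.MIN_VALUE instead of throwing, which is exactly why Value() has to treat that corner specially.

    public class DivCornersDemo {
        public static void main(String[] args) {
            System.out.println(Long.MIN_VALUE / -2);  // 4611686018427387904 (2^62)
            System.out.println(Long.MIN_VALUE / -1);  // -9223372036854775808: overflow wraps to MIN_VALUE
            System.out.println(Long.MIN_VALUE / 2);   // -4611686018427387904
        }
    }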
------------- PR Comment: https://git.openjdk.org/jdk/pull/26143#issuecomment-3446716303 From qamai at openjdk.org Sat Oct 25 16:43:08 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 25 Oct 2025 16:43:08 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v6] In-Reply-To: References: Message-ID: On Fri, 3 Oct 2025 16:05:52 GMT, Quan Anh Mai wrote: >> Hi, >> >> The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. `ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`. This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix test options Can I have a second review, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25284#issuecomment-3446888226 From mchevalier at openjdk.org Sun Oct 26 18:57:52 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Sun, 26 Oct 2025 18:57:52 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v9] In-Reply-To: References: Message-ID: > Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. > > When we have something like > > // any loop > while (...) { /* something involving limit */ } > // counted loop with zero trip guard > if (i < limit) { > for (int i = init; i < limit; i++) { ... } > } > > and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. > > But the method `PhaseIdealLoop::do_unroll` has the assert > > https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 > > requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. > > This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. 
> > But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? > > By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. > > We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. > > We could make the assert a lot smarter and test for ... Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Rename test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27586/files - new: https://git.openjdk.org/jdk/pull/27586/files/0035c8fc..7db83901 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=07-08 Stats: 16 lines in 1 file changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/27586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27586/head:pull/27586 PR: https://git.openjdk.org/jdk/pull/27586 From mchevalier at openjdk.org Sun Oct 26 18:57:54 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Sun, 26 Oct 2025 18:57:54 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v8] In-Reply-To: <_9_fUHVqFJ9mXpEtj-H6qt4tJc0Z1owPEYcvPjo39-8=.fdcb4e71-d99a-4e1e-8fd6-400dee008362@github.com> References: <_9_fUHVqFJ9mXpEtj-H6qt4tJc0Z1owPEYcvPjo39-8=.fdcb4e71-d99a-4e1e-8fd6-400dee008362@github.com> Message-ID: On Fri, 24 Oct 2025 10:04:32 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. >> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. 
I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Comments + merge tests Nicely spotted, I've missed it. Thanks! Renamed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3448792533 From mchevalier at openjdk.org Sun Oct 26 21:20:38 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Sun, 26 Oct 2025 21:20:38 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v4] In-Reply-To: References: Message-ID: > Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. > > There is one detail, we used to have > > void restore_major_progress(int progress) { _major_progress += progress; } > > > It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. > > It has a weird semantics: > > Progress before | Progress after verification | Progress after restore | What would be the assignment semantics > ----------------|-----------------------------|-----------------------|- > 0 | 0 | 0 | 0 > 1 | 0 | 1 | 1 > 0 | 1 | 1 | 0 (mismatch!) > 1 | 1 | 2 | 1 (same truthiness) > > It is rather a or than a restore, and a proper boolean version of that would be > > void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } > > but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural > > void set_major_progress(bool progress) { _major_progress = progress; } > > that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. 
> > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: put back the OR in restORe ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27912/files - new: https://git.openjdk.org/jdk/pull/27912/files/9e5612a1..c0b0bdec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27912&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27912&range=02-03 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/27912.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27912/head:pull/27912 PR: https://git.openjdk.org/jdk/pull/27912 From mchevalier at openjdk.org Sun Oct 26 21:20:39 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Sun, 26 Oct 2025 21:20:39 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v3] In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 08:06:19 GMT, Marc Chevalier wrote: >> Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. >> >> There is one detail, we used to have >> >> void restore_major_progress(int progress) { _major_progress += progress; } >> >> >> It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. >> >> It has a weird semantics: >> >> Progress before | Progress after verification | Progress after restore | What would be the assignment semantics >> ----------------|-----------------------------|-----------------------|- >> 0 | 0 | 0 | 0 >> 1 | 0 | 1 | 1 >> 0 | 1 | 1 | 0 (mismatch!) >> 1 | 1 | 2 | 1 (same truthiness) >> >> It is rather a or than a restore, and a proper boolean version of that would be >> >> void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } >> >> but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural >> >> void set_major_progress(bool progress) { _major_progress = progress; } >> >> that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > More comment I see. Then, let's use back the OR version. I've tested it as described before, also with success, it should be fine, and is indeed closer from the old behavior anyway. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27912#issuecomment-3448931312 From qxing at openjdk.org Mon Oct 27 02:01:11 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Mon, 27 Oct 2025 02:01:11 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v3] In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 14:57:57 GMT, Emanuel Peter wrote: >> The first one (`loopConst`) is a counted loop. The second one (`loopVar`) is not, because it calls `empty` (not inlined) which may modify `loopCount`. >> >> Both the two loops should have no safepoints, since the `empty` call always polls the safepoint. > > @MaxXSoft Can you please add some code comments in the test? 
Just if the rule ever fails, it would be nice to know why the rule was added ;) @eme64 Added IR test comments in commit b42ffb46a590e2c1d8c2e8ad6cd765138a36f3cd to explain why such safepoints exist. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2464261417 From xgong at openjdk.org Mon Oct 27 02:16:01 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 27 Oct 2025 02:16:01 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> <4l5TryoeE4xWUvCtyOU4hSUArWtuXomHRuMiQfAflDM=.9faf7b8b-5135-4696-8921-d61e624d684f@github.com> <0Eavy5pO0jAAKOzZYAoRyykKNSGUDUinBBk2CUMcK1c=.a6c59caf-d3a8-45ba-a368-9849b735e854@github.com> <20otCwtIy0nolE3sBw3b-YoXUa3SL_xsBKB9Cw_kD2o=.62cc0b37-9b0c-4ecb-9509-5c48ac4f3a75@github.com> <4Q3oTRS_-2DbBIJzsz3u67tZfA8joIiMJ4x0z4rZqlo=.4e238026-4cf9-4564-b5d3-1539062736f6@github.com> <1qB16WqDleABsguKwI8xSgWBf1NFQ7uOZByQHIIXdOU=.e30842bf-67f4-4fb3-b877-b91b288912bc@github.com> <3Vc1sIRj7GOSPv3E1tz6xOOmjTuN40yWsfbTvm5LdS0=.005267aa-3f4e-4f78-b55d-4900d8d7065e@github.com> Message-ID: On Fri, 24 Oct 2025 07:42:25 GMT, Emanuel Peter wrote: >>> Hi @eme64 , I updated a commit with renaming the matcher function to mask_op_uses_packed_vector. Is this fine to you? The main concern here is that only the specified vector mask ops [(VectorMaskOpNode)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectornode.hpp#L1343) need the packed vector mask. Name vector_mask_must_be_packed might extend the scope to all vector/mask operations. >> >> Sounds, good. I'll have a look at the code. More precise names are always preferable. And some code comments can help refine the definition further: what are the guarantees if you return true or false? > > Maybe @PaulSandoz has a good idea for a better naming of `VectorLoadMask` and `VectorStoreMask`? > > @XiaohongGong Is there any good place where we already document the different kinds of masks, and how they can be converted, and how they are used? If not: it would be really great if we could add that to `vectornode.hpp`. I also see that `TypeVectMask` has no class comment. We really should improve things there. It would make reviewing Vector API code so much easier. Hi @eme64 , I'm afraid that there is not a place that we document these things now. And I agree that clearly comments might be necessary. I'v created a separate JBS to file https://bugs.openjdk.org/browse/JDK-8370666. Thanks for your suggestion! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2464272917 From xgong at openjdk.org Mon Oct 27 02:27:04 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 27 Oct 2025 02:27:04 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Fri, 24 Oct 2025 09:07:49 GMT, erifan wrote: >> By the way, I'm fine to the current implementation. Thanks > > Both `prefer` and `use` look good to me. > Could we have some sort of assert on `vt` here? What input types are allowed? `isa_vect`, `isa_vectmask`, and what else? 
The argument type is `TypeVect`, which is the super class of all vector types, and it makes sure the input `vt` is a kind of vector type. Please note that `TypeVectMask` is a kind of `TypeVect` as well.

> I find `uses` to be ambiguous. Does `mask_op` require a packed vector (nothing else accepted), or just allow a packed vector (and other options are also accepted)?
>
> Your `Return true if` comment above suggests it is a `requires` case, right?
>
> Could you please also add a `Return false if` comment?

Yes, it is `require` here. I will add a comment for the `return false` path in the next commit.

> There could be additional confusion: is the `packed vector` for the mask, or for all its inputs? is the `vt` for the mask type, or the output type of the `mask_op`?
>
> Suggestion: `mask_op_uses_packed_vector` -> `mask_op_uses_packed_vector_mask` `vt` -> `mask_vt`
>
> What do you think?

`vt` is the type of the input/output `mask`. `mask_op` here means the operation is vector mask specific, so there are no other vector inputs besides the mask. Hence I used `vt` here, which I think is smarter and aligned with other functions.

> I don't know where there is such a comment, if no, maybe this function is a good place to comment this.
>
> Also I wonder if it's better to return an enum constant? For example:
>
> ```
> PREDICATE_MASK
> PACKED_MASK
> UNPACKED_MASK
> ```

To me, this would make the code harder to understand. Each of these mask ops either accepts a predicate mask or a packed vector mask, so I prefer the current status.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2464282085

From xgong at openjdk.org  Mon Oct 27 02:32:05 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Mon, 27 Oct 2025 02:32:05 GMT
Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3]
In-Reply-To: 
References: 
Message-ID: <9MTwN9coXc_qLnj64bDCX5rtgBk0ec7XGcTA-cgQ_Lw=.00d42c4f-e29a-4c7f-af4b-e72c93917ffd@github.com>

On Fri, 24 Oct 2025 07:25:55 GMT, Emanuel Peter wrote:

>> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Rename the matcher function and fix comment issue
>
> src/hotspot/cpu/aarch64/aarch64_vector.ad line 402:
>
>> 400:   // These ops are implemented with predicate instructions if input
>> 401:   // mask is a predciate.
>> 402:   return vt->isa_vectmask() == nullptr;
>
> If we had an assert above saying what else `vt` could be other than `vectmask`, it would help in understanding this logic here ;)

`vt` is one of the normal `TypeVect` types (i.e. `TypeVectA|S|D|X|Y|Z`), chosen based on the vector length in bytes like for other vector nodes. It is a plain `TypeVect` on architectures that do not support the predicate feature. The mask is represented the same as a vector on those platforms.
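As background for the `TypeVectMask`-vs-`TypeVect` distinction discussed in this thread, here is a small self-contained sketch, not HotSpot code, of the two mask representations: a predicate mask with one bit per lane, and a packed vector mask with one 0/1 byte per lane (the form `VectorStoreMask` produces and `VectorLoadMask` consumes). The `PredicateMask`/`PackedMask` names and helper functions below are made up for the example.

```c++
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy illustration of the two mask representations in the thread: a
// "predicate" mask (one bit per lane, SVE-style) and a "packed" vector mask
// (one byte per lane). All names here are invented for the example.
using PredicateMask = std::bitset<64>;       // one bit per lane
using PackedMask    = std::vector<uint8_t>;  // one 0/1 byte per lane

PackedMask store_mask(const PredicateMask& p, size_t lanes) {
  PackedMask packed(lanes);
  for (size_t i = 0; i < lanes; i++) packed[i] = p[i] ? 1 : 0;
  return packed;
}

PredicateMask load_mask(const PackedMask& packed) {
  PredicateMask p;
  for (size_t i = 0; i < packed.size(); i++) p[i] = (packed[i] != 0);
  return p;
}

int main() {
  PredicateMask p;
  p[0] = p[3] = true;
  // Round-tripping is the identity, which is the value-level reason the
  // existing VectorStoreMask(VectorLoadMask(v)) => v folding mentioned in
  // this thread is legal.
  assert(load_mask(store_mask(p, 8)) == p);
  return 0;
}
```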
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2464287016 From xgong at openjdk.org Mon Oct 27 02:41:07 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 27 Oct 2025 02:41:07 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: <_XquQ5TR_3ZGRTZ1c3iVCRGkNBDYyXvXuhimlGQFKq4=.bae46e77-9028-4d92-8ae4-29a5eef4b27a@github.com> References: <_XquQ5TR_3ZGRTZ1c3iVCRGkNBDYyXvXuhimlGQFKq4=.bae46e77-9028-4d92-8ae4-29a5eef4b27a@github.com> Message-ID: On Fri, 24 Oct 2025 13:47:26 GMT, Bhavana Kilambi wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename the matcher function and fix comment issue > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2548: > >> 2546: } >> 2547: // VectorMaskToLongNode requires the input is either a mask or a vector with BOOLEAN type. >> 2548: if (Matcher::mask_op_uses_packed_vector(Op_VectorMaskToLong, opd->bottom_type()->is_vect())) { > > So without this patch, it'd generate `VectorMaskToLong -> URShiftLNode -> AndLNode` (as the earlier `if` condition would have been false) and in the backend, the implementation for `VectorMaskToLong` contains code to convert the mask in a predicate to a packed vector (followed by the actual `VectorMaskToLong` related code). With this patch, it now generates `VectorStoreMaskNode -> VectorMaskToLong -> URShiftLNode ... `(the backend implementation is now separated at the IR level). > Does the major performance uplift come from this Ideal optimization - `VectorMaskToLongNode::Ideal_MaskAll()` where the `VectorStoreMaskNode` gets optimized away? Yes, the IR changes you pointed above is right. The major performance uplift comes from the existing optimization of `VectorStoreMask (VectorLoadMask v) => v`. As you know, `VectorLoadMask` will be generated by some APIs like `VectorMask.fromArray()`. With this change, `VectorMask.fromLong()` also generates this IR. The mask conversions (V->P and P->V) between these APIs can be saved. Another performance uplift comes from the flexible vector register allocation. Before, the vector register is specified as the same for different instructions. But now, it depends on RA. In this case, it potentially breaks the un-expected data-dependence across loop iterations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2464294733 From xgong at openjdk.org Mon Oct 27 03:52:01 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 27 Oct 2025 03:52:01 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Fri, 24 Oct 2025 08:52:13 GMT, Emanuel Peter wrote: >> Ah, I see, thanks for the explanations. Is this documented somewhere in code comments? That would really save you having to explain it repeatedly: you could just point to the code comments ;) > > What about `prefers` instead of `uses`? Or `should_use`? Hi @eme64 @erifan , thanks for all your comments on this function. After a deep thinking, I think `mask_op_prefers_predicate()` is optimal. The implementation will be reverted back to the first version. Following is my consideration: 1) Before my patch, what a mask is for these mask ops based on the architectures. It is distinguished just based whether the mask is a predicate type or a vector. 
- On architectures that support the predicate feature, the mask's type is `TypeVectMask` which denotes a predicate type. And the backend is implemented with predicate instructions and requires the predicate input/output. - On architectures that do not support the predicate feature, the original mask's type is an unpacked `TypeVect` varying from `TypeVectA` to `TypeVectZ` based on the vector length with different element data size. As these ops are special that the implementation do not have any relationship with the element width in each lane, packing the mask to 8-bit element width would be friendly to performance. Hence, in IR-level, the original vector mask will be packed with a `VectorStoreMask` before passed to these ops. 2) I don't want to break current solution/idea of mask handling for these ops. In my patch, what I want to change is **using a helper function** to check whether the specified op is implemented with predicate instruction or not, **instead of** just checking the original mask type. If true, the mask is a predicate without any conversions needed. If not, the mask needs to be packed with a `VectorStoreMask`. Changing to check whether the mask is a `packed` vector makes things more confusing to me. Because it is just a temporary status of mask and special to these mask ops. We have to consider other ops that also use a vector mask. By default, the mask is either an unpacked vector or a predicate. Thanks, Xiaohong ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2464360599 From xgong at openjdk.org Mon Oct 27 06:21:02 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 27 Oct 2025 06:21:02 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Mon, 27 Oct 2025 03:47:05 GMT, Xiaohong Gong wrote: >> What about `prefers` instead of `uses`? Or `should_use`? > > Hi @eme64 @erifan , thanks for all your comments on this function. After a deep thinking, I think `mask_op_prefers_predicate()` is optimal. The implementation will be reverted back to the first version. Following is my consideration: > > 1) Before my patch, what a mask is for these mask ops based on the architectures. It is distinguished just based whether the mask is a predicate type or a vector. > - On architectures that support the predicate feature, the mask's type is `TypeVectMask` which denotes a predicate type. And the backend is implemented with predicate instructions and requires the predicate input/output. > - On architectures that do not support the predicate feature, the original mask's type is an unpacked `TypeVect` varying from `TypeVectA` to `TypeVectZ` based on the vector length with different element data size. As these ops are special that the implementation do not have any relationship with the element width in each lane, packing the mask to 8-bit element width would be friendly to performance. Hence, in IR-level, the original vector mask will be packed with a `VectorStoreMask` before passed to these ops. > > 2) I don't want to break current solution/idea of mask handling for these ops. In my patch, what I want to change is **using a helper function** to check whether the specified op is implemented with predicate instruction or not, **instead of** just checking the original mask type. If true, the mask is a predicate without any conversions needed. 
If not, the mask needs to be packed with a `VectorStoreMask`.
>
> Changing to check whether the mask is a `packed` vector makes things more confusing to me, because that is just a temporary status of the mask and special to these mask ops. We have to consider other ops that also use a vector mask. By default, the mask is either an unpacked vector or a predicate.
>
> Thanks,
> Xiaohong

The implementation on AArch64 would be like:

    bool Matcher::mask_op_prefers_predicate(int opcode, const TypeVect* vt) {
      // Only SVE supports the predicate feature.
      if (UseSVE == 0) {
        // On architectures that do not support the predicate feature, a vector
        // mask is stored in a normal vector with a type of "TypeVect" varying
        // from "TypeVectA" to "TypeVectZ" based on the vector length in bytes.
        // It cannot be a "TypeVectMask".
        assert(vt->isa_vectmask() == nullptr, "mask type does not match");
        return false;
      }
      assert(vt->isa_vectmask(), "The mask type must be a TypeVectMask on SVE");
      switch (opcode) {
        case Op_VectorMaskToLong:
        case Op_VectorLongToMask:
          // SVE does not have native predicate instructions for these two ops.
          // Instead, they are implemented with vector instructions. Hence, to
          // improve the performance, we prefer saving the mask in a vector as
          // the input/output of these IRs.
          return false;
        default:
          // By default, all the mask operations are implemented with predicate
          // instructions with a predicate input/output.
          return true;
      }
    }

And the comment before the helper function in matcher.hpp:

    // Identify if a vector mask operation requires the input/output mask to be
    // saved with a predicate type (i.e. TypeVectMask) or not. Return true if it
    // requires a predicate type, and return false if it requires a vector type.
    static bool mask_op_prefers_predicate(int opcode, const TypeVect* vt);

Is that clearer? Thanks!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2464540038

From aseoane at openjdk.org  Mon Oct 27 06:35:27 2025
From: aseoane at openjdk.org (Anton Seoane Ampudia)
Date: Mon, 27 Oct 2025 06:35:27 GMT
Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v2]
In-Reply-To: 
References: 
Message-ID: 

> This PR introduces a fix for an intermittent assert crash due to a non-oop found in the stack when deoptimizing.
>
> The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop.
>
> However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash.
>
> An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in the form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future.
>
> **Testing:** passes tiers 1-5

Anton Seoane Ampudia has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8347463 - Merge branch 'JDK-8347463' of github.com:anton-seoane/jdk into JDK-8347463 - Merge branch 'openjdk:master' into JDK-8347463 - Documentation for future similar cases - Test for JDK-8347463 - Change to a more specific type - Runtime call had void type but actually returned an object ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27913/files - new: https://git.openjdk.org/jdk/pull/27913/files/a6225ebd..c85142ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=00-01 Stats: 11497 lines in 242 files changed: 7821 ins; 2031 del; 1645 mod Patch: https://git.openjdk.org/jdk/pull/27913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27913/head:pull/27913 PR: https://git.openjdk.org/jdk/pull/27913 From aseoane at openjdk.org Mon Oct 27 06:35:27 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Mon, 27 Oct 2025 06:35:27 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 09:19:22 GMT, Anton Seoane Ampudia wrote: > This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. > > The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. > > However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. > > An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. > > **Testing:** passes tiers 1-5 Switching back to draft after conversation with @robcasloz, to perform further investigation ------------- PR Comment: https://git.openjdk.org/jdk/pull/27913#issuecomment-3426511124 From epeter at openjdk.org Mon Oct 27 06:52:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 27 Oct 2025 06:52:04 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 18:04:57 GMT, Vladimir Ivanov wrote: >> @iwanowww I'm not quite following your suggestions / questions. >> >>> It seems the root cause is a divergence in functionality in GVN between different representations of Div/Mod nodes. >> >>> How hard would it be to align the implementation and improve GVN handling for other representations, so the same type is constructed irrespective of the IR shape? >> >> Do you consider the "expanded" versions of Div/Mod as a "different representation of Div/Mod"? >> If yes: we could pattern match for such "expanded" versions of Div/Mod, but it would be quite complex: you would have to parse through patterns like displayed [up here](https://github.com/openjdk/jdk/pull/27886#issuecomment-3436423466). Do you think that is a good idea? 
I already mentioned the idea [up here](https://github.com/openjdk/jdk/pull/27886#issuecomment-3435782079), but did not think it was desirable due to the complexity. >> >>> Alternatively, as part of the expansion, new representation can be wrapped in CastII/CastLL with the narrower type of original Div/Mod node. >> >> How does this "wrapping" help? After parsing, the CastII at the bottom of the "expanded" Div would just have the whole int range. How would the type of the CastII ever be improved, without pattern matching the "expanded" Div? > >> Do you consider the "expanded" versions of Div/Mod as a "different representation of Div/Mod"? > > Yes, exactly. > >> we could pattern match for such "expanded" versions of Div/Mod, but it would be quite complex: you would have to parse through patterns like displayed https://github.com/openjdk/jdk/pull/27886#issuecomment-3436423466. Do you think that is a good idea? > > Sure, It may be way above the complexity budget we are willing to spend on it. The expansion code I see for Div/Mod nodes doesn't look too complicated, but matching the pattern may require more effort. The positive thing is it'll optimize the pattern irrespective of the origin (either expanded Div/Mod or explicitly optimized in the code by the user). So, the question is how much complexity it requires vs scenarios it covers. > >> How does this "wrapping" help? After parsing, the CastII at the bottom of the "expanded" Div would just have the whole int range. How would the type of the CastII ever be improved, without pattern matching the "expanded" Div? > > It's not fully clear to me what is the scope of problematic scenarios. If it's only about Ideal() expanding the node before Value() has a chance to run, then wrapping the result of expansion in CastII/CastLL node and attach Value() as it's type should be enough (when produced type is narrower than Type::INT). > > If we want to to keep expanded shape while being able to compute its type as if it were the original node, then a new flavor of Cast node may help. The one which keeps the node type and its inputs and can run Value() as if it were the original node. @iwanowww I see, so we could implement something like a `CastII` with multiple inputs, which we know must all be identical at runtime. The first input is the one we will in the end pick. But during `Value`, we take the intersection of all input ranges. So if another (not the first input) has a narrower type, we can use that type. I suppose that would be feasible. Do you have a good name for such a node? What I don't know: how does that interact with other IGVN optimizations, especially those that want to pattern match specific nodes? Inserting such special cast nodes could interrupt `Ideal` optimizations, current pattern matching would not know how to deal with it. Probably it is not a big issue, but I'm not sure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3449759936 From rcastanedalo at openjdk.org Mon Oct 27 07:26:08 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 27 Oct 2025 07:26:08 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v9] In-Reply-To: References: Message-ID: On Sun, 26 Oct 2025 18:57:52 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. 
This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. >> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Rename test Thanks! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27586#pullrequestreview-3382106045 From jbhateja at openjdk.org Mon Oct 27 07:32:43 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 27 Oct 2025 07:32:43 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v8] In-Reply-To: References: Message-ID: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. 
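As a side note for readers of this thread, here is a tiny self-contained sketch in the spirit of the select-phase policy described above and the copy-bias hint discussed further down. It is not `chaitin.cpp`: `ToyLiveRange`, `select_first_fit` and `select_biased` are invented names, and real live ranges carry register masks rather than a plain colour range.

```c++
#include <vector>

// Toy model of "pick the first colour not used by an interfering neighbour,
// unless a copy-bias hint is available and legal". Hypothetical names only.
struct ToyLiveRange {
  std::vector<int> neighbour_colours;  // colours already given to interfering LRGs
  int              copy_bias = -1;     // preferred colour (e.g. an input's colour), -1 if none
};

static bool colour_is_free(const ToyLiveRange& lrg, int colour) {
  for (int used : lrg.neighbour_colours) {
    if (used == colour) return false;
  }
  return true;
}

// Baseline policy: first fit over the allowed colour range [0, num_colours).
static int select_first_fit(const ToyLiveRange& lrg, int num_colours) {
  for (int c = 0; c < num_colours; c++) {
    if (colour_is_free(lrg, c)) return c;
  }
  return -1;  // would have to spill
}

// Biased policy: try the hint first, so e.g. a def can land in its input's
// register; fall back to first fit when the hint interferes.
static int select_biased(const ToyLiveRange& lrg, int num_colours) {
  if (lrg.copy_bias >= 0 && lrg.copy_bias < num_colours && colour_is_free(lrg, lrg.copy_bias)) {
    return lrg.copy_bias;
  }
  return select_first_fit(lrg, num_colours);
}

int main() {
  ToyLiveRange def;
  def.neighbour_colours = {0, 1};
  def.copy_bias = 3;                        // suppose the first input got colour 3
  return select_biased(def, 16) == 3 ? 0 : 1;
}
```

The bias only changes which of several legal colours is tried first; when the hint is taken by an interfering neighbour, selection falls back to the normal first-fit walk.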
> > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Removing redundant interferecne check from biasing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26283/files - new: https://git.openjdk.org/jdk/pull/26283/files/31079431..e2f95f31 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From jbhateja at openjdk.org Mon Oct 27 07:32:45 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 27 Oct 2025 07:32:45 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v7] In-Reply-To: <0uLLE8EsQwRujEmG0lCXXtkOGzR7_Zc4AvUcS-L-3mM=.b831ecd8-86cc-497a-9f95-c3cec47888dc@github.com> References: <0uLLE8EsQwRujEmG0lCXXtkOGzR7_Zc4AvUcS-L-3mM=.b831ecd8-86cc-497a-9f95-c3cec47888dc@github.com> Message-ID: On Fri, 24 Oct 2025 17:05:35 GMT, Quan Anh Mai wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > src/hotspot/share/opto/chaitin.cpp line 1693: > >> 1691: // then bias its color towards its input's def. >> 1692: if (lrin1 != 0 && lrg->_copy_bias == 0 && _ifg->test_edge_sq(lidx, lrin1) == 0) { >> 1693: lrg->_copy_bias = lrin1; > > I believe biasing is a hint, so it will not attempt to assign the same colour if they interfere. In that case is it better if we don't care about interference here? 
Hi @merykitty , Just above this check, we are checking for neighbour interference and eliminating the neighbours' register assignments from the register mask of the current def's live range; two nodes in the interference graph (IFG) interfere iff their live ranges overlap. Biasing is indeed an allocation hint, which is currently guarded by the non-infereference check. During actual register assignment later in the flow, there are two possible cases:- 1) Bias LRG was part of IFG and was already assigned a register, in the simplification phase we remove the LO degree LRGs from the interference graph as they are guaranteed to get a color (register), this in turn reduces the degree of their neighbours and eventually some of them may become K colorable where K is the number of register, anyways cutting the long story short, in this case we assign the bias color to definition if it belong to the subset of def register mask. 2) Bias LRG was part of the yanked list, which gets populated during the simplification stage and holds all the LRG removed from the interference graph, this generally captures LO degree LRGs, in this case since the LRG is not part of IFG hence and is not assigned a register yet, but we still constrain the definition register mask with the bias register mask. So, the additional guard check is strict and may be relaxed as it makes the second case non-reachable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2464656367 From chagedorn at openjdk.org Mon Oct 27 07:49:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 27 Oct 2025 07:49:06 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v9] In-Reply-To: References: Message-ID: On Sun, 26 Oct 2025 18:57:52 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. >> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. 
At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Rename test test/hotspot/jtreg/compiler/loopopts/TooStrictAssertForUnrollAfterPeeling.java line 54: > 52: * -XX:-SplitIfBlocks > 53: * -XX:-UseOnStackReplacement > 54: * -XX:LoopMaxUnroll=2 Are these flags all required to trigger the issue or what is the motivation behind having this run compared to the above only? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2464694073 From chagedorn at openjdk.org Mon Oct 27 08:03:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 27 Oct 2025 08:03:07 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v4] In-Reply-To: References: Message-ID: On Sun, 26 Oct 2025 21:20:38 GMT, Marc Chevalier wrote: >> Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. >> >> There is one detail, we used to have >> >> void restore_major_progress(int progress) { _major_progress += progress; } >> >> >> It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. >> >> It has a weird semantics: >> >> Progress before | Progress after verification | Progress after restore | What would be the assignment semantics >> ----------------|-----------------------------|-----------------------|- >> 0 | 0 | 0 | 0 >> 1 | 0 | 1 | 1 >> 0 | 1 | 1 | 0 (mismatch!) >> 1 | 1 | 2 | 1 (same truthiness) >> >> It is rather a or than a restore, and a proper boolean version of that would be >> >> void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } >> >> but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural >> >> void set_major_progress(bool progress) { _major_progress = progress; } >> >> that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > put back the OR in restORe Thanks for adding the summary, looks good to me, too. 
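To connect the comment under review with how the flag is consumed, here is a schematic, self-contained sketch of a loop-opts driver gated on a major-progress flag. It is not `compile.cpp`; `ToyCompile` and `toy_loop_opts_round` are made-up names and the real driver has more conditions, but it shows the "stop loop opts once a round sets no major progress" idea from the comment.

```c++
// Schematic only, not HotSpot code: a "major progress" flag driving repeated
// rounds of loop optimizations.
struct ToyCompile {
  bool _major_progress = true;   // start optimistically
  bool major_progress() const { return _major_progress; }
  void clear_major_progress()  { _major_progress = false; }
  void set_major_progress()    { _major_progress = true; }
};

// One loop-opts round: reports whether it changed the graph in a way that
// makes another round worthwhile.
static bool toy_loop_opts_round(ToyCompile& C, int round) {
  (void)C;
  // Pretend the first two rounds transform something; after that, nothing.
  return round < 2;
}

int main() {
  ToyCompile C;
  int round = 0;
  while (C.major_progress()) {     // stop when a round made no major progress
    C.clear_major_progress();
    if (toy_loop_opts_round(C, round++)) {
      C.set_major_progress();
    }
  }
  return round;                    // 3 rounds: two productive, one confirming the fixpoint
}
```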
src/hotspot/share/opto/compile.hpp line 332: > 330: * If major progress is not set at the end of a loop opts phase, then we can stop loop opts, because we do not expect any further progress if we did more loop ops phases. > 331: * > 332: * This is not 100% accurate, the semantics of major progress has become less clear over time, but this is the general idea. Suggestion: * It also indicates that the graph was changed in a way that is promising to be able to apply more loop optimization. * If major progress is not set: * Loop tree information is valid. * If major progress is not set at the end of a loop opts phase, then we can stop loop opts, because we do not expect any further progress if we did more loop opts phases. * * This is not 100% accurate, the semantics of major progress has become less clear over time, but this is the general idea. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27912#pullrequestreview-3382206993 PR Review Comment: https://git.openjdk.org/jdk/pull/27912#discussion_r2464725986 From chagedorn at openjdk.org Mon Oct 27 08:14:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 27 Oct 2025 08:14:06 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v3] In-Reply-To: References: Message-ID: <_ASXYjiMmoPRBYSFQqTHB2N9yng6jMXyQH6lbIOOktY=.62676c5c-94a4-4442-8fba-86fa3fea6564@github.com> On Mon, 20 Oct 2025 16:19:37 GMT, Beno?t Maillard wrote: >> This PR addresses a missed optimization opportunity in `PhaseIterGVN`. The missed optimization is the simplification of redundant conversion patterns of the shape `ConvX2Y->ConvY2X->ConvX2Y`. >> >> This optimization pattern is implemented as an ideal optimization on `ConvX2Y` nodes. Because it depends on the input of the input of the node in question, we need to have an appropriate notification mechanism in `PhaseIterGVN::add_users_of_use_to_worklist`. The notification for this pattern was added in [JDK-8359603](https://bugs.openjdk.org/browse/JDK-8359603). >> >> However, that fix was based on the wrong assumption that in `PhaseIterGVN::add_users_of_use_to_worklist`, argument `n` is already the optimized node. However in some cases this argument is actually the node that is about to get replaced. >> >> This happens for example in `PhaseIterGVN::transform_old`. If we find that node `k` returned by `Ideal` actually already exists by calling `hash_find_insert(k)`, we call `add_users_to_worklist(k)`. >> As we replace node `k` with `i`, and `i` as a different opcode than `k`, then we cannot use the opcode of `k` to detect the redundant conversion pattern. >> >> ```c++ >> ... >> // Global Value Numbering >> i = hash_find_insert(k); // Check for pre-existing node >> if (i && (i != k)) { >> // Return the pre-existing node if it isn't dead >> NOT_PRODUCT(set_progress();) >> add_users_to_worklist(k); >> subsume_node(k, i); // Everybody using k now uses i >> return i; >> } >> ... >> >> >> The bug was quite intermittent and only showed up in some cases with `-XX:+StressIGVN`. >> >> ### Proposed Fix >> >> We make the detection of the pattern less specific by only looking at the opcode of the user of `n`, and not directly the opcode of `n`. This is consistent with the detection of other patterns in `PhaseIterGVN::add_users_of_use_to_worklist`. 
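For context, the folded shape mentioned above can be checked at the value level. The following standalone snippet (not C2 code) verifies, for X = long and Y = int, that the outer round-trip in a `ConvX2Y->ConvY2X->ConvX2Y` chain is a no-op, which is what allows the chain to collapse to the innermost conversion; two's-complement narrowing is assumed here.

```c++
#include <cassert>
#include <cstdint>

// Value-level justification for folding ConvX2Y(ConvY2X(ConvX2Y(x))) down to
// ConvX2Y(x), shown for X = long, Y = int (truncation and sign extension).
static int32_t conv_l2i(int64_t v) { return static_cast<int32_t>(v); }  // two's-complement narrowing assumed
static int64_t conv_i2l(int32_t v) { return static_cast<int64_t>(v); }

int main() {
  const int64_t samples[] = {0, 1, -1, 0x7fffffffLL, 0x80000000LL,
                             0x0123456789abcdefLL, INT64_MIN, INT64_MAX};
  for (int64_t x : samples) {
    int32_t once   = conv_l2i(x);                        // ConvL2I(x)
    int32_t thrice = conv_l2i(conv_i2l(conv_l2i(x)));    // ConvL2I(ConvI2L(ConvL2I(x)))
    assert(once == thrice);                              // the middle round-trip is the identity
  }
  return 0;
}
```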
>> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8369646) >> - [x] tier1-3, plus some internal testing >> - [x] Added a second run for the existing test with `-XX:+StressIGVN` and a fixed stress seed >> >> Thank you for reviewing! > > Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Missing -XX:+UnlockDiagnosticVMOptions Otherwise, looks good, thanks! test/hotspot/jtreg/compiler/c2/TestEliminateRedundantConversionSequences.java line 32: > 30: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions > 31: * -XX:CompileCommand=compileonly,compiler.c2.TestEliminateRedundantConversionSequences::test* > 32: * -XX:-TieredCompilation -Xbatch -XX:VerifyIterativeGVN=1110 You could either add a separate run with `-XX:+StressIGVN` without a fixed seed or just add `-XX:+StressIGVN` here. I guess the latter is good enough. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27900#pullrequestreview-3382231930 PR Review Comment: https://git.openjdk.org/jdk/pull/27900#discussion_r2464744473 From bmaillard at openjdk.org Mon Oct 27 08:16:21 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 27 Oct 2025 08:16:21 GMT Subject: RFR: 8362832: compiler/macronodes/TestTopInMacroElimination.java hits assert(false) failed: unexpected node [v2] In-Reply-To: References: Message-ID: > This PR prevents hitting an assert caused by encountering `top` while following the memory > slice associated with a field when eliminating allocations in macro node elimination. This situation > is the result of another elimination (boxing node elimination) that happened at the same > macro expansion iteration. > > ### Analysis > > The issue appears in the macro expansion phase. We have a nested `synchronized` block, > with the outer block synchronizing on `new A()` and the inner one on `TestTopInMacroElimination.class`. > In the inner `synchronized` block we have an `Integer.valueOf` call in a loop. > > In `PhaseMacroExpand::eliminate_boxing_node` we are getting rid of the `Integer.valueOf` > call, as it is a non-escaping boxing node. After having eliminated the call, > `PhaseMacroExpand::process_users_of_allocation` takes care of the users of the removed node. > There, we replace usages of the fallthrough memory projection with `top`. > > In the same macro expansion iteration, we later attempt to get rid of the `new A()` allocation > in `PhaseMacroExpand::create_scalarized_object_description`. There, we have to make > sure that all safepoints can still see the object fields as if the allocation was never deleted. > For this, we attempt to find the last value on the slice of each specific field (`a` > in this case). Because field `a` is never written to, and it is not explicitely initialized, > there is no `Store` associated to it and not even a dedicated memory slice (we end up > taking the `Bot` input on `MergeMem` nodes). By going up the memory chain, we eventually > encounter the `MemBarReleaseLock` whose input was set to `top`. This is where the assert > is hit. > > ### Proposed Fix > > In the end I opted for an analog fix as the similar [JDK-8325030](https://git.openjdk.org/jdk/pull/23104). > If we get `top` from `scan_mem_chain` in `PhaseMacroExpand::value_from_mem`, then we can safely > return `top` as well. 
This means that the safepoint will have `top` as data input, but this will
> eventually be cleaned up by the next round of IGVN.
>
> Another (tempting) option would have been to simply return `nullptr` from `PhaseMacroExpand::value_from_mem` when encountering `top`. However, this would result in bailing
> out from eliminating this allocation temporarily and effectively delaying it to a subsequent
> macro expansion round.
>
> ### Testing
> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8362832)
> - [x] tier1-4, plus some internal testing
>
> Thank you for reviewing!

Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/share/opto/macro.cpp

  Co-authored-by: Daniel Lundén

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27903/files
  - new: https://git.openjdk.org/jdk/pull/27903/files/c3c92f53..0955e23d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27903&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27903&range=00-01

Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/27903.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/27903/head:pull/27903

PR: https://git.openjdk.org/jdk/pull/27903

From bmaillard at openjdk.org  Mon Oct 27 08:25:08 2025
From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard)
Date: Mon, 27 Oct 2025 08:25:08 GMT
Subject: RFR: 8362832: compiler/macronodes/TestTopInMacroElimination.java hits assert(false) failed: unexpected node [v2]
In-Reply-To: 
References: 
Message-ID: 

On Thu, 23 Oct 2025 10:37:59 GMT, Daniel Lundén wrote:

>> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Update src/hotspot/share/opto/macro.cpp
>>
>>   Co-authored-by: Daniel Lundén
>
> Thanks for the fix @benoitmaillard!
>
>> This means that the safepoint will have top as data input, but this will
>> eventually be cleaned up by the next round of IGVN.
>
> Is it valid for safepoints to even temporarily have top as data input? Even if this gets cleaned up eventually by IGVN, it seems potentially risky to have it in this state.

Thanks for your review @dlunde. I would argue that this is acceptable, as we know the safepoint will be removed as soon as IGVN runs (since it is on a dead path). I see this as simply propagating dead path information and ensuring that it does not interfere with optimizing away the allocation.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27903#issuecomment-3450022364

From chagedorn at openjdk.org  Mon Oct 27 08:30:03 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 27 Oct 2025 08:30:03 GMT
Subject: RFR: 8370569: IGV: dump more graph properties at bytecode parsing
In-Reply-To: 
References: 
Message-ID: 

On Fri, 24 Oct 2025 12:22:11 GMT, Roberto Castañeda Lozano wrote:

> This changeset makes it easier to trace C2's individual bytecode parsing steps to the input class files and the output of type flow analysis, by:
>
> 1. dumping the `map` node (holding the JVM state), basic block (reverse post-order index), and method name as properties of the graph that C2 dumps after each parsed bytecode at IGV dump level 6;
>
> 2. appending this information to the graph name (hence generalizing [JDK-8356779](https://bugs.openjdk.org/browse/JDK-8356779)); and
>
> 3. making the graph name suffix configurable via `Options -> Graph Name Suffix`.
By default, it appends `(map: [map], block #[block] at [method])` to the names of graphs containing these properties (bytecode parsing dumps), and nothing otherwise. > > Here are the `Outline` and `Properties` windows for > > $ java -Xbatch -XX:CompileOnly=java.lang.String::charAt -XX:PrintIdealGraphLevel=6 > > before (left) and after (right) the changeset: > > before-after > > Please let me know if there are other bytecode parsing graph properties that could be useful to dump, and whether you think the default graph name suffix contains the right amount of information. > > #### Testing > - tier1. > - Tested automatically that dumping thousands of graphs does not trigger any assertion failure on HotSpot or IGV. Nice improvement! I tried your patch out and I sometimes see a missing index for `map`. That might be expected but when looking at it, it rather suggests that something is off. If there is no `map`, maybe we can use "none" or completely remove the "map" entry for that graph. Image src/utils/IdealGraphVisualizer/Settings/src/main/java/com/sun/hotspot/igv/settings/ViewPanel.java line 168: > 166: private void graphNameSuffixFieldActionPerformed(java.awt.event.ActionEvent evt) {//GEN-FIRST:event_graphNameSuffixFieldActionPerformed > 167: // TODO add your handling code here: > 168: }//GEN-LAST:event_graphNameSuffixFieldActionPerformed Can you explain why this nop-action is needed? ------------- PR Review: https://git.openjdk.org/jdk/pull/27975#pullrequestreview-3382239183 PR Review Comment: https://git.openjdk.org/jdk/pull/27975#discussion_r2464750262 From bmaillard at openjdk.org Mon Oct 27 08:42:37 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 27 Oct 2025 08:42:37 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v4] In-Reply-To: References: Message-ID: > This PR addresses a missed optimization opportunity in `PhaseIterGVN`. The missed optimization is the simplification of redundant conversion patterns of the shape `ConvX2Y->ConvY2X->ConvX2Y`. > > This optimization pattern is implemented as an ideal optimization on `ConvX2Y` nodes. Because it depends on the input of the input of the node in question, we need to have an appropriate notification mechanism in `PhaseIterGVN::add_users_of_use_to_worklist`. The notification for this pattern was added in [JDK-8359603](https://bugs.openjdk.org/browse/JDK-8359603). > > However, that fix was based on the wrong assumption that in `PhaseIterGVN::add_users_of_use_to_worklist`, argument `n` is already the optimized node. However in some cases this argument is actually the node that is about to get replaced. > > This happens for example in `PhaseIterGVN::transform_old`. If we find that node `k` returned by `Ideal` actually already exists by calling `hash_find_insert(k)`, we call `add_users_to_worklist(k)`. > As we replace node `k` with `i`, and `i` as a different opcode than `k`, then we cannot use the opcode of `k` to detect the redundant conversion pattern. > > ```c++ > ... > // Global Value Numbering > i = hash_find_insert(k); // Check for pre-existing node > if (i && (i != k)) { > // Return the pre-existing node if it isn't dead > NOT_PRODUCT(set_progress();) > add_users_to_worklist(k); > subsume_node(k, i); // Everybody using k now uses i > return i; > } > ... > > > The bug was quite intermittent and only showed up in some cases with `-XX:+StressIGVN`. 
> > ### Proposed Fix > > We make the detection of the pattern less specific by only looking at the opcode of the user of `n`, and not directly the opcode of `n`. This is consistent with the detection of other patterns in `PhaseIterGVN::add_users_of_use_to_worklist`. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8369646) > - [x] tier1-3, plus some internal testing > - [x] Added a second run for the existing test with `-XX:+StressIGVN` and a fixed stress seed > > Thank you for reviewing! Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: Add -XX:+StressIGVN to run without fixed seed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27900/files - new: https://git.openjdk.org/jdk/pull/27900/files/12706636..16842d01 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27900&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27900&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27900.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27900/head:pull/27900 PR: https://git.openjdk.org/jdk/pull/27900 From bmaillard at openjdk.org Mon Oct 27 08:42:39 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 27 Oct 2025 08:42:39 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v3] In-Reply-To: <_ASXYjiMmoPRBYSFQqTHB2N9yng6jMXyQH6lbIOOktY=.62676c5c-94a4-4442-8fba-86fa3fea6564@github.com> References: <_ASXYjiMmoPRBYSFQqTHB2N9yng6jMXyQH6lbIOOktY=.62676c5c-94a4-4442-8fba-86fa3fea6564@github.com> Message-ID: On Mon, 27 Oct 2025 08:08:41 GMT, Christian Hagedorn wrote: >> Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Missing -XX:+UnlockDiagnosticVMOptions > > test/hotspot/jtreg/compiler/c2/TestEliminateRedundantConversionSequences.java line 32: > >> 30: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions >> 31: * -XX:CompileCommand=compileonly,compiler.c2.TestEliminateRedundantConversionSequences::test* >> 32: * -XX:-TieredCompilation -Xbatch -XX:VerifyIterativeGVN=1110 > > You could either add a separate run with `-XX:+StressIGVN` without a fixed seed or just add `-XX:+StressIGVN` here. I guess the latter is good enough. I agree, and I just added `-XX:+StressIGVN` to the existing run. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27900#discussion_r2464815311 From shade at openjdk.org Mon Oct 27 08:53:06 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 27 Oct 2025 08:53:06 GMT Subject: RFR: 8370318: AES-GCM vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 07:20:48 GMT, Aleksey Shipilev wrote: > See the bug for symptoms and discussion. > > In short, in newly added intrinsic in JDK 24, there is a potential read out of Java heap if key array is at the edge of it, which will crash JVM. And that read is redundant for the code path in question, we only use it in the subsequent blocks that we never actually enter in the problematic case. So we never see any failures in testing: the only observable effect is SEGV on uncommitted heap access. It is somewhat similar to [JDK-8330611](https://bugs.openjdk.org/browse/JDK-8330611) we have fixed in other place. But this one can be caught with the explicit range check in debug code. 
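The hazard described above is easiest to see in schematic form. The sketch below is not the actual AES-GCM stub: `ToyKey` and the two helpers are invented names, and the real code deals with vector loads of the expanded key schedule, but it shows why reading the extra key words before checking the number of rounds can touch memory past the end of a short key.

```c++
#include <cstddef>
#include <cstdint>

// Schematic illustration (not the stub code) of reading "longer key" data
// before checking how many rounds are actually needed.
struct ToyKey {
  const uint32_t* words;   // expanded key schedule
  size_t          nwords;  // 44 for AES-128, 52 for AES-192, 60 for AES-256
};

uint32_t last_round_word_eager(const ToyKey& k) {
  // BAD: unconditionally read as if the key were AES-256 sized...
  uint32_t w256 = k.words[59];          // out of bounds for an AES-128 key
  if (k.nwords == 60) return w256;      // ...even though it is only used here
  return k.words[k.nwords - 1];
}

uint32_t last_round_word_lazy(const ToyKey& k) {
  // GOOD: only read the extra words on the path that actually needs them.
  if (k.nwords == 60) return k.words[59];
  return k.words[k.nwords - 1];
}

int main() {
  uint32_t aes128_schedule[44] = {0};
  ToyKey k{aes128_schedule, 44};
  return last_round_word_lazy(k) == 0 ? 0 : 1;  // never reads past the 44 words
}
```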
> > I opted to keep this patch very simple, because I would backport it to 25u shortly after we integrate to mainline. It just moves the read down to the block where it is actually needed. Note that `aes_192` and `aes_256` labels are red herring in this code, they are unbound; you can even remove them without any bulid errors. The actual thing that drives path selection is `NROUNDS` -- that one is derived from the key array length -- and we are just doing the read too early. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `com/sun/crypto/provider/Cipher compiler/codegen/aes` (fails with range check only, passes with entire patch) > - [x] Linux x86_64 server fastdebug, `all` on AVX-512 machine Friendly reminder. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27951#issuecomment-3450118595 From roland at openjdk.org Mon Oct 27 09:18:11 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 27 Oct 2025 09:18:11 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v9] In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 07:56:12 GMT, Emanuel Peter wrote: >> src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp line 1780: >> >>> 1778: fix_memory_uses(u, n, n, c); >>> 1779: } else if (_phase->C->get_alias_index(u->adr_type()) == _alias) { >>> 1780: _phase->igvn().replace_node(u, n); >> >> As far as I can see, the `lazy_replace` only did `igvn.replace_node` for non-ctrl nodes anyway. Since we are dealing with `PhiNode`s here, we might as well only use `igvn.replace_node`. >> >> I discovered this, because it hit my `!old_node->is_CFG()` check. > > @rwestrel Do you have an opinion on this? That change looks good to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2464914445 From rcastanedalo at openjdk.org Mon Oct 27 09:20:18 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 27 Oct 2025 09:20:18 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v2] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 06:35:27 GMT, Anton Seoane Ampudia wrote: >> This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. >> >> The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. >> >> However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. >> >> An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. >> >> **Testing:** passes tiers 1-5 > > Anton Seoane Ampudia has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
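To make the shape of the problem discussed above concrete outside of stub code: the expanded AES key schedule has 44, 52 or 60 words for 128/192/256-bit keys, and the highest words only exist for the longer keys. The Java sketch below is an analogy rather than the intrinsic itself (in Java an out-of-bounds read throws instead of faulting), but it shows why the read has to sit inside the block whose path actually needs it.

```java
public class GuardedKeyReadExample {
    // Expanded AES-128/192/256 key schedules have 44, 52 and 60 4-byte words.
    static int lastRoundKeyWord(int[] expandedKey) {
        // Wrong shape: reading word 59 up front would be out of bounds for a
        // 44-word (AES-128) key, even though the value is only needed below.
        // int w = expandedKey[59];

        if (expandedKey.length >= 60) {
            return expandedKey[59]; // AES-256 path: read only where it is needed
        } else if (expandedKey.length >= 52) {
            return expandedKey[51]; // AES-192 path
        }
        return expandedKey[43];     // AES-128 path
    }

    public static void main(String[] args) {
        System.out.println(lastRoundKeyWord(new int[44])); // AES-128-sized key
    }
}
```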
The pull request contains seven additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8347463 > - Merge branch 'JDK-8347463' of github.com:anton-seoane/jdk into JDK-8347463 > - Merge branch 'openjdk:master' into JDK-8347463 > - Documentation for future similar cases > - Test for JDK-8347463 > - Change to a more specific type > - Runtime call had void type but actually returned an object Thanks for getting to the bottom of this, Ant?n! The changeset looks good to me, modulo a few documentation and test comments. It would be good if someone from the JFR team (@mgronlun or @egahlin?) could have a look at this change as well. src/hotspot/share/opto/runtime.hpp line 55: > 53: // signature. Even if you don't plan on consuming the output of the call, C2 > 54: // needs this information to correctly track returned oops and avoid strange > 55: // deoptimization crashes (JDK-8347463). I agree with the intent of this comment, but the "strange deoptimization crashes" part could be made a bit more precise. Also, the first sentence is grammatically incorrect. Here is my suggestion: Suggestion: // // Please ensure the return type of the runtime call matches its signature, // even if the return value is unused. This is crucial for correct handling // of runtime calls that return an oop and may trigger deoptimization // on return. See rematerialize_objects() in deoptimization.cpp. test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 29: > 27: > 28: import jdk.jfr.Event; > 29: import jdk.jfr.Name; Unused, please remove. test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 34: > 32: /** > 33: * @test > 34: * @summary Tests that the getEventWriter call to write_checkpoint correctly Suggestion: * @summary Tests that the getEventWriter call to write_checkpoint correctly * @bug 8347463 test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 36: > 34: * @summary Tests that the getEventWriter call to write_checkpoint correctly > 35: * reports returning an oop > 36: * @requires vm.hasJFR & vm.continuations Do we need `vm.continuations`? test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 38: > 36: * @requires vm.hasJFR & vm.continuations > 37: * @library /test/lib / > 38: * @modules jdk.jfr/jdk.jfr.internal Do we need this? test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 53: > 51: // for the write_checkpoint call. Instead of explicitly checking for > 52: // it, we look for a non-void return type (which comes hand-in-hand > 53: // with the returns_oop information) No need to mention the bug number here, better to declare at the top using `@bug`: Suggestion: // Crash was due to the returns_oop field not being set // for the write_checkpoint call. Instead of explicitly checking for // it, we look for a non-void return type (which comes hand-in-hand // with the returns_oop information). 
test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 55: > 53: // with the returns_oop information) > 54: @Test > 55: @IR(failOn = { IRNode.STATIC_CALL_OF_METHOD, "write_checkpoint.*void"}) You could replace this check with a more precise, positive one: Suggestion: @IR(counts = { IRNode.STATIC_CALL_OF_METHOD, "write_checkpoint\s+java/lang/Object\s+\*", "1" }) test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 63: > 61: } > 62: > 63: } Please remove unnecessary whitespace. ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27913#pullrequestreview-3382397915 PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2464866520 PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2464871994 PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2464870520 PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2464894276 PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2464903315 PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2464906522 PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2464909980 PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2464907436 From roland at openjdk.org Mon Oct 27 09:25:07 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 27 Oct 2025 09:25:07 GMT Subject: RFR: 8370251: C2: Inlining checks for method handle intrinsics are too strict In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 21:00:40 GMT, Vladimir Ivanov wrote: > C2 performs access checks during inlining attempts through method handle > intrinsic calls. But there are no such checks happening at runtime when > executing the calls. (Access checks are performed when corresponding method > handle is resolved.) So, inlining may fail due to access checks failure while > the call always succeeds at runtime. > > The fix is to skip access checks when inlining through method handle intrinsics. > > Testing: hs-tier1 - hs-tier4 Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27908#pullrequestreview-3382491855 From aseoane at openjdk.org Mon Oct 27 09:27:58 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Mon, 27 Oct 2025 09:27:58 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v3] In-Reply-To: References: Message-ID: > This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. > > The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. > > However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. > > An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. 
> > **Testing:** passes tiers 1-5 Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/runtime.hpp Co-authored-by: Roberto Casta?eda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27913/files - new: https://git.openjdk.org/jdk/pull/27913/files/c85142ef..51fb388b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=01-02 Stats: 5 lines in 1 file changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/27913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27913/head:pull/27913 PR: https://git.openjdk.org/jdk/pull/27913 From roland at openjdk.org Mon Oct 27 09:32:02 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 27 Oct 2025 09:32:02 GMT Subject: RFR: 8370318: AES-GCM vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 07:20:48 GMT, Aleksey Shipilev wrote: > See the bug for symptoms and discussion. > > In short, in newly added intrinsic in JDK 24, there is a potential read out of Java heap if key array is at the edge of it, which will crash JVM. And that read is redundant for the code path in question, we only use it in the subsequent blocks that we never actually enter in the problematic case. So we never see any failures in testing: the only observable effect is SEGV on uncommitted heap access. It is somewhat similar to [JDK-8330611](https://bugs.openjdk.org/browse/JDK-8330611) we have fixed in other place. But this one can be caught with the explicit range check in debug code. > > I opted to keep this patch very simple, because I would backport it to 25u shortly after we integrate to mainline. It just moves the read down to the block where it is actually needed. Note that `aes_192` and `aes_256` labels are red herring in this code, they are unbound; you can even remove them without any bulid errors. The actual thing that drives path selection is `NROUNDS` -- that one is derived from the key array length -- and we are just doing the read too early. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `com/sun/crypto/provider/Cipher compiler/codegen/aes` (fails with range check only, passes with entire patch) > - [x] Linux x86_64 server fastdebug, `all` on AVX-512 machine Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27951#pullrequestreview-3382536124 From aseoane at openjdk.org Mon Oct 27 09:35:07 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Mon, 27 Oct 2025 09:35:07 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v2] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 09:01:08 GMT, Roberto Casta?eda Lozano wrote: >> Anton Seoane Ampudia has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into JDK-8347463 >> - Merge branch 'JDK-8347463' of github.com:anton-seoane/jdk into JDK-8347463 >> - Merge branch 'openjdk:master' into JDK-8347463 >> - Documentation for future similar cases >> - Test for JDK-8347463 >> - Change to a more specific type >> - Runtime call had void type but actually returned an object > > test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 29: > >> 27: >> 28: import jdk.jfr.Event; >> 29: import jdk.jfr.Name; > > Unused, please remove. I think you mean just line 29? AFAIK we need the header and importing the basics for testing (line 29 is unused, yes) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2464973697 From dlunden at openjdk.org Mon Oct 27 09:36:05 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 27 Oct 2025 09:36:05 GMT Subject: RFR: 8362832: compiler/macronodes/TestTopInMacroElimination.java hits assert(false) failed: unexpected node [v2] In-Reply-To: References: Message-ID: <_kVxu5TDIF4CgpKhW0MnXfGZiWr1YzlgIC_9ljNbfXM=.1498ff54-398c-47d7-acf4-a4c1452663b0@github.com> On Mon, 27 Oct 2025 08:16:21 GMT, Beno?t Maillard wrote: >> This PR prevents hitting an assert caused by encountering `top` while following the memory >> slice associated with a field when eliminating allocations in macro node elimination. This situation >> is the result of another elimination (boxing node elimination) that happened at the same >> macro expansion iteration. >> >> ### Analysis >> >> The issue appears in the macro expansion phase. We have a nested `synchronized` block, >> with the outer block synchronizing on `new A()` and the inner one on `TestTopInMacroElimination.class`. >> In the inner `synchronized` block we have an `Integer.valueOf` call in a loop. >> >> In `PhaseMacroExpand::eliminate_boxing_node` we are getting rid of the `Integer.valueOf` >> call, as it is a non-escaping boxing node. After having eliminated the call, >> `PhaseMacroExpand::process_users_of_allocation` takes care of the users of the removed node. >> There, we replace usages of the fallthrough memory projection with `top`. >> >> In the same macro expansion iteration, we later attempt to get rid of the `new A()` allocation >> in `PhaseMacroExpand::create_scalarized_object_description`. There, we have to make >> sure that all safepoints can still see the object fields as if the allocation was never deleted. >> For this, we attempt to find the last value on the slice of each specific field (`a` >> in this case). Because field `a` is never written to, and it is not explicitely initialized, >> there is no `Store` associated to it and not even a dedicated memory slice (we end up >> taking the `Bot` input on `MergeMem` nodes). By going up the memory chain, we eventually >> encounter the `MemBarReleaseLock` whose input was set to `top`. This is where the assert >> is hit. >> >> ### Proposed Fix >> >> In the end I opted for an analog fix as the similar [JDK-8325030](https://git.openjdk.org/jdk/pull/23104). >> If we get `top` from `scan_mem_chain` in `PhaseMacroExpand::value_from_mem`, then we can safely >> return `top` as well. This means that the safepoint will have `top` as data input, but this will >> eventually cleaned up by the next round of IGVN. >> >> Another (tempting) option would have been to simply return `nullptr` from `PhaseMacroExpand::value_from_mem` when encoutering `top`. 
However this would result in bailing >> out from eliminating this allocation temporarily and effectively delaying it to a subsqequent >> macro expansion round. >> >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8362832)... > > Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Daniel Lund?n > Thanks for your review @dlunde. I would argue that this is acceptable, as we know the safepoint will be removed as soon as IGVN runs (since it is on a dead path). I see this as simply propagating dead path information and ensuring that it does not interfere with optimizing away the allocation. All right, seems harmless enough then. Thanks! ------------- Marked as reviewed by dlunden (Committer). PR Review: https://git.openjdk.org/jdk/pull/27903#pullrequestreview-3382561188 From aseoane at openjdk.org Mon Oct 27 09:41:24 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Mon, 27 Oct 2025 09:41:24 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v4] In-Reply-To: References: Message-ID: > This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. > > The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. > > However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. > > An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. 
> > **Testing:** passes tiers 1-5 Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java Co-authored-by: Roberto Casta?eda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27913/files - new: https://git.openjdk.org/jdk/pull/27913/files/51fb388b..e91cd483 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27913/head:pull/27913 PR: https://git.openjdk.org/jdk/pull/27913 From rcastanedalo at openjdk.org Mon Oct 27 09:41:27 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 27 Oct 2025 09:41:27 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v2] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 09:32:31 GMT, Anton Seoane Ampudia wrote: >> test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 29: >> >>> 27: >>> 28: import jdk.jfr.Event; >>> 29: import jdk.jfr.Name; >> >> Unused, please remove. > > I think you mean just line 29? AFAIK we need the header and importing the basics for testing (line 29 is unused, yes) Right, just line 29. >> test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 38: >> >>> 36: * @requires vm.hasJFR & vm.continuations >>> 37: * @library /test/lib / >>> 38: * @modules jdk.jfr/jdk.jfr.internal >> >> Do we need this? > > We don't. Removed as well! (Referring to line 38 only). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2464990761 PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2464991906 From aseoane at openjdk.org Mon Oct 27 09:41:30 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Mon, 27 Oct 2025 09:41:30 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v2] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 09:07:49 GMT, Roberto Casta?eda Lozano wrote: >> Anton Seoane Ampudia has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into JDK-8347463 >> - Merge branch 'JDK-8347463' of github.com:anton-seoane/jdk into JDK-8347463 >> - Merge branch 'openjdk:master' into JDK-8347463 >> - Documentation for future similar cases >> - Test for JDK-8347463 >> - Change to a more specific type >> - Runtime call had void type but actually returned an object > > test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 36: > >> 34: * @summary Tests that the getEventWriter call to write_checkpoint correctly >> 35: * reports returning an oop >> 36: * @requires vm.hasJFR & vm.continuations > > Do we need `vm.continuations`? We don't. 
Removed the requirement > test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 38: > >> 36: * @requires vm.hasJFR & vm.continuations >> 37: * @library /test/lib / >> 38: * @modules jdk.jfr/jdk.jfr.internal > > Do we need this? We don't. Removed as well! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2464983164 PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2464989996 From aseoane at openjdk.org Mon Oct 27 09:53:16 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Mon, 27 Oct 2025 09:53:16 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v5] In-Reply-To: References: Message-ID: <3Dnm4je4PJq-qCbRaaxg7slgI7i3Iz5Tn19_jnYk7I0=.18015b63-6ad6-46e5-ae52-808f94140839@github.com> > This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. > > The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. > > However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. > > An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. > > **Testing:** passes tiers 1-5 Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: Apply review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27913/files - new: https://git.openjdk.org/jdk/pull/27913/files/e91cd483..9191c3b3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=03-04 Stats: 8 lines in 1 file changed: 1 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27913/head:pull/27913 PR: https://git.openjdk.org/jdk/pull/27913 From aseoane at openjdk.org Mon Oct 27 09:53:19 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Mon, 27 Oct 2025 09:53:19 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v2] In-Reply-To: References: Message-ID: <_P2PY-zh8RX87bQtxQuZ-54vzcwK4uvdm15dauAwo7w=.7c17f75d-b8ae-463b-95b2-58e3ba8c8444@github.com> On Mon, 27 Oct 2025 09:17:29 GMT, Roberto Casta?eda Lozano wrote: >> Anton Seoane Ampudia has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into JDK-8347463 >> - Merge branch 'JDK-8347463' of github.com:anton-seoane/jdk into JDK-8347463 >> - Merge branch 'openjdk:master' into JDK-8347463 >> - Documentation for future similar cases >> - Test for JDK-8347463 >> - Change to a more specific type >> - Runtime call had void type but actually returned an object > > Thanks for getting to the bottom of this, Ant?n! The changeset looks good to me, modulo a few documentation and test comments. > It would be good if someone from the JFR team (@mgronlun or @egahlin?) could have a look at this change as well. Thanks for your review @robcasloz! I have updated the changeset according to your comments ------------- PR Comment: https://git.openjdk.org/jdk/pull/27913#issuecomment-3450399965 From epeter at openjdk.org Mon Oct 27 10:22:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 27 Oct 2025 10:22:53 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v10] In-Reply-To: References: Message-ID: > When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. > > Here, I'm doing the following: > - Add more documentation, and improve it in other cases. > - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. > - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` > - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` > - Made some methods private, and added some additional asserts. > > I'd be more than happy for even better names, and suggestions how to improve the documentation further :) > > Related issues: > https://github.com/openjdk/jdk/pull/27889 > https://github.com/openjdk/jdk/pull/15720 Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: - Merge branch 'master' into JDK-8370220-get-ctrl-documentation - Merge branch 'JDK-8370220-get-ctrl-documentation' of https://github.com/eme64/jdk into JDK-8370220-get-ctrl-documentation - Apply suggestions from code review Co-authored-by: Christian Hagedorn - fix shenandoah replace for phis - renaming for Tobias - Apply suggestions from code review Co-authored-by: Tobias Hartmann - more for Christian part 3 - more for Christian part 2 - Apply suggestions from code review Co-authored-by: Christian Hagedorn - for Christian part 1 - ... 
and 9 more: https://git.openjdk.org/jdk/compare/aa5cb3a4...e4bcb769 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27892/files - new: https://git.openjdk.org/jdk/pull/27892/files/44e808bb..e4bcb769 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27892&range=08-09 Stats: 14620 lines in 312 files changed: 8959 ins; 3739 del; 1922 mod Patch: https://git.openjdk.org/jdk/pull/27892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27892/head:pull/27892 PR: https://git.openjdk.org/jdk/pull/27892 From epeter at openjdk.org Mon Oct 27 10:22:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 27 Oct 2025 10:22:54 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v7] In-Reply-To: References: <5NCbOn36uvy0cTen6j1ys_E4uDD-qWjW5N1B-doqCGU=.4f75ce02-f27c-4d9a-ace8-5073f211d78f@github.com> Message-ID: On Tue, 21 Oct 2025 09:42:41 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> renaming for Tobias > > Good naming update! The update looks good to me apart from some last nits. @chhagedorn @TobiHartmann Would either of you be able to re-approve? I've fixed a little issue for shenandoah, and Roland approved that change. I merged with master and am currently running some last testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27892#issuecomment-3450525961 From epeter at openjdk.org Mon Oct 27 10:22:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 27 Oct 2025 10:22:55 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v9] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 09:15:45 GMT, Roland Westrelin wrote: >> @rwestrel Do you have an opinion on this? > > That change looks good to me. @rwestrel Thank, good to know :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27892#discussion_r2465115642 From rcastanedalo at openjdk.org Mon Oct 27 10:23:36 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 27 Oct 2025 10:23:36 GMT Subject: RFR: 8370569: IGV: dump more graph properties at bytecode parsing [v2] In-Reply-To: References: Message-ID: <7b9lDpDqgp8WI8ou4nWz3uxBxS3mrlmhWJI3cayhxsw=.e381b7e1-6da6-4c4c-8c83-d6ca7840122c@github.com> > This changeset makes it easier to trace C2's individual bytecode parsing steps to the input class files and the output of type flow analysis, by: > > 1. dumping the `map` node (holding the JVM state), basic block (reverse post-order index), and method name as properties of the graph that C2 dumps after each parsed bytecode at IGV dump level 6; > > 2. appending this information to the graph name (hence generalizing [JDK-8356779](https://bugs.openjdk.org/browse/JDK-8356779)); and > > 3. making the graph name suffix configurable via `Options -> Graph Name Suffix`. By default, it appends `(map: [map], block #[block] at [method])` to the names of graphs containing these properties (bytecode parsing dumps), and nothing otherwise. 
> > Here are the `Outline` and `Properties` windows for > > $ java -Xbatch -XX:CompileOnly=java.lang.String::charAt -XX:PrintIdealGraphLevel=6 > > before (left) and after (right) the changeset: > > before-after > > Please let me know if there are other bytecode parsing graph properties that could be useful to dump, and whether you think the default graph name suffix contains the right amount of information. > > #### Testing > - tier1. > - Tested automatically that dumping thousands of graphs does not trigger any assertion failure on HotSpot or IGV. Roberto Casta?eda Lozano has updated the pull request incrementally with two additional commits since the last revision: - Simplify code generated by NetBeans for the 'graphNameSuffixField' text field - Use '-' for non-existing maps ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27975/files - new: https://git.openjdk.org/jdk/pull/27975/files/49ee5740..4123bb0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27975&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27975&range=00-01 Stats: 16 lines in 3 files changed: 2 ins; 13 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27975/head:pull/27975 PR: https://git.openjdk.org/jdk/pull/27975 From rcastanedalo at openjdk.org Mon Oct 27 10:23:37 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 27 Oct 2025 10:23:37 GMT Subject: RFR: 8370569: IGV: dump more graph properties at bytecode parsing [v2] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 08:27:50 GMT, Christian Hagedorn wrote: > I tried your patch out and I sometimes see a missing index for map. That might be expected but when looking at it, it rather suggests that something is off. If there is no map, maybe we can use "none" or completely remove the "map" entry for that graph. It is expected, but I agree that the presentation could be improved. I went with printing `-` if there is no map, for compactness (commit 050fe0ddf645c5748e0ad7896686a7e6294afb3d), but I am OK with changing it to `none` if you think that is better. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27975#issuecomment-3450531366 From mgronlun at openjdk.org Mon Oct 27 10:25:26 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 27 Oct 2025 10:25:26 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v5] In-Reply-To: <3Dnm4je4PJq-qCbRaaxg7slgI7i3Iz5Tn19_jnYk7I0=.18015b63-6ad6-46e5-ae52-808f94140839@github.com> References: <3Dnm4je4PJq-qCbRaaxg7slgI7i3Iz5Tn19_jnYk7I0=.18015b63-6ad6-46e5-ae52-808f94140839@github.com> Message-ID: On Mon, 27 Oct 2025 09:53:16 GMT, Anton Seoane Ampudia wrote: >> This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. >> >> The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. >> >> However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. 
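As a rough illustration of the naming scheme discussed in this thread: the suffix format `(map: [map], block #[block] at [method])` and the `-` placeholder for a missing `map` are taken from the discussion above; everything else in this standalone sketch (it is not IGV code) is assumed for the example.

```java
import java.util.Map;

public class GraphNameSuffixExample {
    // Builds the default suffix "(map: [map], block #[block] at [method])",
    // using "-" when the graph has no map property, and no suffix at all when
    // none of the bytecode-parsing properties are present.
    static String suffix(Map<String, String> properties) {
        if (!properties.containsKey("map") && !properties.containsKey("block")
                && !properties.containsKey("method")) {
            return "";
        }
        return " (map: " + properties.getOrDefault("map", "-")
             + ", block #" + properties.getOrDefault("block", "?")
             + " at " + properties.getOrDefault("method", "?") + ")";
    }

    public static void main(String[] args) {
        // Hypothetical bytecode-parsing graph name plus suffix:
        System.out.println("12 iload_1" + suffix(Map.of(
                "map", "165", "block", "2", "method", "java.lang.String::charAt")));
        // Graph without bytecode-parsing properties keeps its plain name:
        System.out.println("After Parsing" + suffix(Map.of()));
    }
}
```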
>> >> An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. >> >> **Testing:** passes tiers 1-5 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Apply review comments Marked as reviewed by mgronlun (Reviewer). Thanks for getting to the cause and fixing this. ------------- PR Review: https://git.openjdk.org/jdk/pull/27913#pullrequestreview-3382775048 PR Comment: https://git.openjdk.org/jdk/pull/27913#issuecomment-3450540758 From rcastanedalo at openjdk.org Mon Oct 27 10:28:28 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 27 Oct 2025 10:28:28 GMT Subject: RFR: 8370569: IGV: dump more graph properties at bytecode parsing [v2] In-Reply-To: <7b9lDpDqgp8WI8ou4nWz3uxBxS3mrlmhWJI3cayhxsw=.e381b7e1-6da6-4c4c-8c83-d6ca7840122c@github.com> References: <7b9lDpDqgp8WI8ou4nWz3uxBxS3mrlmhWJI3cayhxsw=.e381b7e1-6da6-4c4c-8c83-d6ca7840122c@github.com> Message-ID: On Mon, 27 Oct 2025 10:23:36 GMT, Roberto Casta?eda Lozano wrote: >> This changeset makes it easier to trace C2's individual bytecode parsing steps to the input class files and the output of type flow analysis, by: >> >> 1. dumping the `map` node (holding the JVM state), basic block (reverse post-order index), and method name as properties of the graph that C2 dumps after each parsed bytecode at IGV dump level 6; >> >> 2. appending this information to the graph name (hence generalizing [JDK-8356779](https://bugs.openjdk.org/browse/JDK-8356779)); and >> >> 3. making the graph name suffix configurable via `Options -> Graph Name Suffix`. By default, it appends `(map: [map], block #[block] at [method])` to the names of graphs containing these properties (bytecode parsing dumps), and nothing otherwise. >> >> Here are the `Outline` and `Properties` windows for >> >> $ java -Xbatch -XX:CompileOnly=java.lang.String::charAt -XX:PrintIdealGraphLevel=6 >> >> before (left) and after (right) the changeset: >> >> before-after >> >> Please let me know if there are other bytecode parsing graph properties that could be useful to dump, and whether you think the default graph name suffix contains the right amount of information. >> >> #### Testing >> - tier1. >> - Tested automatically that dumping thousands of graphs does not trigger any assertion failure on HotSpot or IGV. > > Roberto Casta?eda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Simplify code generated by NetBeans for the 'graphNameSuffixField' text field > - Use '-' for non-existing maps Thanks for the review, Christian! I just addressed your comments, please let me know what you think about the new version. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27975#issuecomment-3450550628 From rcastanedalo at openjdk.org Mon Oct 27 10:28:30 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 27 Oct 2025 10:28:30 GMT Subject: RFR: 8370569: IGV: dump more graph properties at bytecode parsing [v2] In-Reply-To: References: Message-ID: <4UX5Vaj1aGBtVya6GBPLaHAUMUF5Lg64XZeBlci_Q0o=.55273dc4-b6af-4896-bdd9-404549d1bdd1@github.com> On Mon, 27 Oct 2025 08:11:28 GMT, Christian Hagedorn wrote: >> Roberto Casta?eda Lozano has updated the pull request incrementally with two additional commits since the last revision: >> >> - Simplify code generated by NetBeans for the 'graphNameSuffixField' text field >> - Use '-' for non-existing maps > > src/utils/IdealGraphVisualizer/Settings/src/main/java/com/sun/hotspot/igv/settings/ViewPanel.java line 168: > >> 166: private void graphNameSuffixFieldActionPerformed(java.awt.event.ActionEvent evt) {//GEN-FIRST:event_graphNameSuffixFieldActionPerformed >> 167: // TODO add your handling code here: >> 168: }//GEN-LAST:event_graphNameSuffixFieldActionPerformed > > Can you explain why this nop-action is needed? Good catch. It is not needed, just the result of some spurious clicking on the NetBeans form editor that resulted in that code being auto-generated. I removed the action declaration and other GUI properties of the new text field to align the generated code with that of the other text fields in the Options form (commit 4123bb0c9ccb8b60993f4b575c36f9751cefd1b9). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27975#discussion_r2465139130 From chagedorn at openjdk.org Mon Oct 27 10:34:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 27 Oct 2025 10:34:07 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v10] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 10:22:53 GMT, Emanuel Peter wrote: >> When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. >> >> Here, I'm doing the following: >> - Add more documentation, and improve it in other cases. >> - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. >> - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` >> - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` >> - Made some methods private, and added some additional asserts. >> >> I'd be more than happy for even better names, and suggestions how to improve the documentation further :) >> >> Related issues: >> https://github.com/openjdk/jdk/pull/27889 >> https://github.com/openjdk/jdk/pull/15720 > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 19 additional commits since the last revision: > > - Merge branch 'master' into JDK-8370220-get-ctrl-documentation > - Merge branch 'JDK-8370220-get-ctrl-documentation' of https://github.com/eme64/jdk into JDK-8370220-get-ctrl-documentation > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - fix shenandoah replace for phis > - renaming for Tobias > - Apply suggestions from code review > > Co-authored-by: Tobias Hartmann > - more for Christian part 3 > - more for Christian part 2 > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - for Christian part 1 > - ... and 9 more: https://git.openjdk.org/jdk/compare/2da7861e...e4bcb769 Still good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27892#pullrequestreview-3382807850 From chagedorn at openjdk.org Mon Oct 27 10:35:04 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 27 Oct 2025 10:35:04 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v4] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 08:42:37 GMT, Beno?t Maillard wrote: >> This PR addresses a missed optimization opportunity in `PhaseIterGVN`. The missed optimization is the simplification of redundant conversion patterns of the shape `ConvX2Y->ConvY2X->ConvX2Y`. >> >> This optimization pattern is implemented as an ideal optimization on `ConvX2Y` nodes. Because it depends on the input of the input of the node in question, we need to have an appropriate notification mechanism in `PhaseIterGVN::add_users_of_use_to_worklist`. The notification for this pattern was added in [JDK-8359603](https://bugs.openjdk.org/browse/JDK-8359603). >> >> However, that fix was based on the wrong assumption that in `PhaseIterGVN::add_users_of_use_to_worklist`, argument `n` is already the optimized node. However in some cases this argument is actually the node that is about to get replaced. >> >> This happens for example in `PhaseIterGVN::transform_old`. If we find that node `k` returned by `Ideal` actually already exists by calling `hash_find_insert(k)`, we call `add_users_to_worklist(k)`. >> As we replace node `k` with `i`, and `i` as a different opcode than `k`, then we cannot use the opcode of `k` to detect the redundant conversion pattern. >> >> ```c++ >> ... >> // Global Value Numbering >> i = hash_find_insert(k); // Check for pre-existing node >> if (i && (i != k)) { >> // Return the pre-existing node if it isn't dead >> NOT_PRODUCT(set_progress();) >> add_users_to_worklist(k); >> subsume_node(k, i); // Everybody using k now uses i >> return i; >> } >> ... >> >> >> The bug was quite intermittent and only showed up in some cases with `-XX:+StressIGVN`. >> >> ### Proposed Fix >> >> We make the detection of the pattern less specific by only looking at the opcode of the user of `n`, and not directly the opcode of `n`. This is consistent with the detection of other patterns in `PhaseIterGVN::add_users_of_use_to_worklist`. >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8369646) >> - [x] tier1-3, plus some internal testing >> - [x] Added a second run for the existing test with `-XX:+StressIGVN` and a fixed stress seed >> >> Thank you for reviewing! 
> > Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Add -XX:+StressIGVN to run without fixed seed Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27900#pullrequestreview-3382811026 From chagedorn at openjdk.org Mon Oct 27 10:35:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 27 Oct 2025 10:35:07 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v3] In-Reply-To: References: <_ASXYjiMmoPRBYSFQqTHB2N9yng6jMXyQH6lbIOOktY=.62676c5c-94a4-4442-8fba-86fa3fea6564@github.com> Message-ID: On Mon, 27 Oct 2025 08:39:21 GMT, Beno?t Maillard wrote: >> test/hotspot/jtreg/compiler/c2/TestEliminateRedundantConversionSequences.java line 32: >> >>> 30: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions >>> 31: * -XX:CompileCommand=compileonly,compiler.c2.TestEliminateRedundantConversionSequences::test* >>> 32: * -XX:-TieredCompilation -Xbatch -XX:VerifyIterativeGVN=1110 >> >> You could either add a separate run with `-XX:+StressIGVN` without a fixed seed or just add `-XX:+StressIGVN` here. I guess the latter is good enough. > > I agree, and I just added `-XX:+StressIGVN` to the existing run. Looks good, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27900#discussion_r2465162966 From shade at openjdk.org Mon Oct 27 10:38:15 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 27 Oct 2025 10:38:15 GMT Subject: Integrated: 8370318: AES-GCM vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 07:20:48 GMT, Aleksey Shipilev wrote: > See the bug for symptoms and discussion. > > In short, in newly added intrinsic in JDK 24, there is a potential read out of Java heap if key array is at the edge of it, which will crash JVM. And that read is redundant for the code path in question, we only use it in the subsequent blocks that we never actually enter in the problematic case. So we never see any failures in testing: the only observable effect is SEGV on uncommitted heap access. It is somewhat similar to [JDK-8330611](https://bugs.openjdk.org/browse/JDK-8330611) we have fixed in other place. But this one can be caught with the explicit range check in debug code. > > I opted to keep this patch very simple, because I would backport it to 25u shortly after we integrate to mainline. It just moves the read down to the block where it is actually needed. Note that `aes_192` and `aes_256` labels are red herring in this code, they are unbound; you can even remove them without any bulid errors. The actual thing that drives path selection is `NROUNDS` -- that one is derived from the key array length -- and we are just doing the read too early. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `com/sun/crypto/provider/Cipher compiler/codegen/aes` (fails with range check only, passes with entire patch) > - [x] Linux x86_64 server fastdebug, `all` on AVX-512 machine This pull request has now been integrated. 
Changeset: 7bb490c4 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/7bb490c4bf7ae55547e4468da0795dac0a873d2b Stats: 22 lines in 2 files changed: 21 ins; 1 del; 0 mod 8370318: AES-GCM vector intrinsic may read out of bounds (x86_64, AVX-512) Reviewed-by: kvn, roland ------------- PR: https://git.openjdk.org/jdk/pull/27951 From shade at openjdk.org Mon Oct 27 10:38:13 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 27 Oct 2025 10:38:13 GMT Subject: RFR: 8370318: AES-GCM vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 07:20:48 GMT, Aleksey Shipilev wrote: > See the bug for symptoms and discussion. > > In short, in newly added intrinsic in JDK 24, there is a potential read out of Java heap if key array is at the edge of it, which will crash JVM. And that read is redundant for the code path in question, we only use it in the subsequent blocks that we never actually enter in the problematic case. So we never see any failures in testing: the only observable effect is SEGV on uncommitted heap access. It is somewhat similar to [JDK-8330611](https://bugs.openjdk.org/browse/JDK-8330611) we have fixed in other place. But this one can be caught with the explicit range check in debug code. > > I opted to keep this patch very simple, because I would backport it to 25u shortly after we integrate to mainline. It just moves the read down to the block where it is actually needed. Note that `aes_192` and `aes_256` labels are red herring in this code, they are unbound; you can even remove them without any bulid errors. The actual thing that drives path selection is `NROUNDS` -- that one is derived from the key array length -- and we are just doing the read too early. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `com/sun/crypto/provider/Cipher compiler/codegen/aes` (fails with range check only, passes with entire patch) > - [x] Linux x86_64 server fastdebug, `all` on AVX-512 machine Thank you! Here goes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27951#issuecomment-3450582979 From aseoane at openjdk.org Mon Oct 27 10:40:32 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Mon, 27 Oct 2025 10:40:32 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v6] In-Reply-To: References: Message-ID: > This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. > > The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. > > However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. > > An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. 
> > **Testing:** passes tiers 1-5 Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java Co-authored-by: Roberto Casta?eda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27913/files - new: https://git.openjdk.org/jdk/pull/27913/files/9191c3b3..280141be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27913/head:pull/27913 PR: https://git.openjdk.org/jdk/pull/27913 From rcastanedalo at openjdk.org Mon Oct 27 10:40:35 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 27 Oct 2025 10:40:35 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v5] In-Reply-To: <3Dnm4je4PJq-qCbRaaxg7slgI7i3Iz5Tn19_jnYk7I0=.18015b63-6ad6-46e5-ae52-808f94140839@github.com> References: <3Dnm4je4PJq-qCbRaaxg7slgI7i3Iz5Tn19_jnYk7I0=.18015b63-6ad6-46e5-ae52-808f94140839@github.com> Message-ID: On Mon, 27 Oct 2025 09:53:16 GMT, Anton Seoane Ampudia wrote: >> This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. >> >> The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. >> >> However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. >> >> An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. >> >> **Testing:** passes tiers 1-5 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Apply review comments Thanks for addressing my comments, just have two last suggestions, looks good otherwise! test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 51: > 49: // Crash was due to the returns_oop field not being set > 50: // for the write_checkpoint call. Instead of explicitly checking for > 51: // it, we look for a non-void return type (which comes hand-in-hand Suggestion: // it, we look for an oop return type (which comes hand-in-hand test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 55: > 53: @Test > 54: @IR(counts = { IRNode.STATIC_CALL_OF_METHOD, "write_checkpoint\s+java/lang/Object\s+\\*", "1" }) > 55: public void myTest() { Maybe use a more descriptive name than `myTest`, for example summarizing the effect of the test (`testStartRecordingAndEmitEvent`) or its intent (`testWriteCheckpointReturnType`). ------------- Changes requested by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27913#pullrequestreview-3382802589 PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2465156043 PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2465162595 From aseoane at openjdk.org Mon Oct 27 10:49:50 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Mon, 27 Oct 2025 10:49:50 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v7] In-Reply-To: References: Message-ID: > This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. > > The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. > > However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. > > An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. > > **Testing:** passes tiers 1-5 Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: Change test method name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27913/files - new: https://git.openjdk.org/jdk/pull/27913/files/280141be..922d03f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27913/head:pull/27913 PR: https://git.openjdk.org/jdk/pull/27913 From rcastanedalo at openjdk.org Mon Oct 27 10:49:51 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 27 Oct 2025 10:49:51 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v7] In-Reply-To: References: Message-ID: <_Atow0xqKu5DW3kdJms5ZzPuCpQnUquCQwq9A2CWe5I=.7e2af709-d864-4b19-a16b-59b373ab9954@github.com> On Mon, 27 Oct 2025 10:46:46 GMT, Anton Seoane Ampudia wrote: >> This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. >> >> The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. >> >> However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. >> >> An IR Framework test is introduced to exercise the error explicitly. 
Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. >> >> **Testing:** passes tiers 1-5 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Change test method name Thanks! Please test the latest changes, at least through some low tiers, before integration. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27913#pullrequestreview-3382860198 From aseoane at openjdk.org Mon Oct 27 10:49:53 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Mon, 27 Oct 2025 10:49:53 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v5] In-Reply-To: References: <3Dnm4je4PJq-qCbRaaxg7slgI7i3Iz5Tn19_jnYk7I0=.18015b63-6ad6-46e5-ae52-808f94140839@github.com> Message-ID: On Mon, 27 Oct 2025 10:32:34 GMT, Roberto Casta?eda Lozano wrote: >> Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply review comments > > test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 55: > >> 53: @Test >> 54: @IR(counts = { IRNode.STATIC_CALL_OF_METHOD, "write_checkpoint\s+java/lang/Object\s+\\*", "1" }) >> 55: public void myTest() { > > Maybe use a more descriptive name than `myTest`, for example summarizing the effect of the test (`testStartRecordingAndEmitEvent`) or its intent (`testWriteCheckpointReturnType`). For sure. I don't know how I forgot to change such a placeholder name... I have gone for the latter suggestion ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2465189899 From apangin at openjdk.org Mon Oct 27 11:28:14 2025 From: apangin at openjdk.org (Andrei Pangin) Date: Mon, 27 Oct 2025 11:28:14 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v7] In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 14:05:04 GMT, Francesco Andreuzzi wrote: >> I propose to amend `nmethod::is_cold` to let GC collect not-entrant native `nmethod` instances. >> >> Passes tier1 and tier2 (fastdebug). > > Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: > > sleep Marked as reviewed by apangin (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/27742#pullrequestreview-3383005818 From epeter at openjdk.org Mon Oct 27 11:36:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 27 Oct 2025 11:36:54 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v8] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. 
> > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 105 additional commits since the last revision: - Merge branch 'JDK-8367531-fix-addDataName' of https://github.com/eme64/jdk into JDK-8367531-fix-addDataName - Manuel's suggestions Co-authored-by: Manuel H?ssig - Merge branch 'master' into JDK-8367531-fix-addDataName - Merge branch 'JDK-8367531-fix-addDataName' of https://github.com/eme64/jdk into JDK-8367531-fix-addDataName - Update test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java Co-authored-by: Manuel H?ssig - improve tutorial for Manuel - fix TestMethodArguments.java after merge with master - fix tests after integration of Expressions/Operations - Merge branch 'master' into JDK-8367531-fix-addDataName - fix test - ... 
and 95 more: https://git.openjdk.org/jdk/compare/97335a98...317e3e8b ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/68f719d7..317e3e8b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=06-07 Stats: 35405 lines in 890 files changed: 19678 ins; 10988 del; 4739 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Mon Oct 27 11:40:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 27 Oct 2025 11:40:34 GMT Subject: RFR: 8370332: C2 SuperWord: SIGSEGV because PhaseIdealLoop::split_thru_phi left dead nodes in loop _body Message-ID: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> Analysis: `split_thru_phi` can split a node out of the loop, through some loop phi. As a consequence, that node and the phi we split through can become dead. But `split_thru_phi` did not have any logic to yank the dead node and phi from the `_body`. If this happens in the same loop-opts-phase as a later SuperWord, and that SuperWord pass somehow accesses that loop `_body`, then we may find dead nodes, which is not expected. It is not ok that `split_thru_phi` leaves dead nodes in the `_body`, so they have to be yanked. What I did additionally: I went through all uses of `split_thru_phi`, and moved the `replace_node` from the call-site to the method itself. Removing the node and yanking from `_body` conceptually belongs together, so they should be together in code. I suspect that `split_thru_phi` was broken for a long time already. But JDK26 changes in SuperWord started to check inputs of all nodes in `_body`, and that fails with dead nodes. Future Work: - Continue work on making `VerifyLoopOptimizations` work again, we should assert that there are no dead nodes in the `_body`. We may do that with the following task, or a subsequent one. - [JDK-8370332](https://bugs.openjdk.org/browse/JDK-8370332) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop ------------- Commit messages: - fix tab - added test - only for loops - solution wip - JDK-8370332 Changes: https://git.openjdk.org/jdk/pull/27955/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27955&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370332 Stats: 115 lines in 2 files changed: 100 ins; 10 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/27955.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27955/head:pull/27955 PR: https://git.openjdk.org/jdk/pull/27955 From fandreuzzi at openjdk.org Mon Oct 27 11:46:51 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Mon, 27 Oct 2025 11:46:51 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v7] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 11:25:59 GMT, Andrei Pangin wrote: >> Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: >> >> sleep > > Marked as reviewed by apangin (Author). Thanks for the review @apangin, @dean-long, @shipilev. 
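Relating to the split_thru_phi analysis quoted above (JDK-8370332): keeping the node replacement and the loop-body clean-up together could look roughly like this; the helper name is invented for illustration and this is not the actual patch:

    // Hypothetical helper: replace the split node and make sure neither it nor a
    // now-dead phi lingers in the loop _body where a later SuperWord pass would see it.
    void replace_and_clean_body(PhaseIterGVN& igvn, IdealLoopTree* loop, Node* n, Node* phi_result) {
      igvn.replace_node(n, phi_result); // all uses of n now go through the merged clones
      loop->_body.yank(n);              // do not leave the dead node behind in _body
    }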
------------- PR Comment: https://git.openjdk.org/jdk/pull/27742#issuecomment-3450860153 From duke at openjdk.org Mon Oct 27 11:46:53 2025 From: duke at openjdk.org (duke) Date: Mon, 27 Oct 2025 11:46:53 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v7] In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 14:05:04 GMT, Francesco Andreuzzi wrote: >> I propose to amend `nmethod::is_cold` to let GC collect not-entrant native `nmethod` instances. >> >> Passes tier1 and tier2 (fastdebug). > > Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: > > sleep @fandreuz Your change (at version 5d0c70562614c5a3cd3391f5191bf60c5b51e82c) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27742#issuecomment-3450864736 From mhaessig at openjdk.org Mon Oct 27 12:11:09 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 27 Oct 2025 12:11:09 GMT Subject: RFR: 8370579: PPC: fix inswri immediate argument order In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 15:19:06 GMT, Manuel H?ssig wrote: > This cleanup PR swaps and renames the immediate arguments of the two `insrwi` instruction macros in `ppc.ad` such that they correspond to the order and names in the manual. This involved swapping the arguments in all six usages. I hope this saves the next person trying to reason about this some confused hours. > > Testing: > - [x] Github Actions > - [x] Running some relevant tests (`compiler/c2/TestCharShortByteSwap.java`, `jdk/java/lang/Short/ByteSwap.java`, `jdk/java/lang/Integer/BitTwiddle.java`, `compiler/codegen/Test6431242.java`) in qemu Perhaps @MBaesken or @TheRealMDoerr could give this a tier1 spin on their ppc CI? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27978#issuecomment-3450956197 From dskantz at openjdk.org Mon Oct 27 12:33:49 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Mon, 27 Oct 2025 12:33:49 GMT Subject: RFR: 8362117: C2: compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java fails with a wrong result due to invalidated liveness assumptions for data phis [v2] In-Reply-To: <11lcsXkMGpKMQr60NCKofzldqpnJka1XZtrGRrUai3o=.c2201234-bbf2-465a-b237-cd9fe8505491@github.com> References: <11lcsXkMGpKMQr60NCKofzldqpnJka1XZtrGRrUai3o=.c2201234-bbf2-465a-b237-cd9fe8505491@github.com> Message-ID: On Wed, 3 Sep 2025 08:02:04 GMT, Daniel Skantz wrote: >> This PR addresses a wrong compilation during string optimizations. >> >> During stacked string concatenation of two StringBuilder links SB1 and SB2, the pattern "append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString" may be observed, where toString is the end of SB1, and the simple diamond is part of SB2. >> >> After JDK-8291775, the Bool test to the diamond If is set to a constant zero to allow for folding the simple diamond away during IGVN, while not letting the top() value from the result projection of SB1 propagate through the graph too quickly. The assumption was that any data Phi of the Region would go away during PhaseRemoveUseless as they are no longer live -- I think that in the case of JDK-8291775, the user of phi was the constructor of SB2. However, in the attached test case, the Phi stays live as it's a parameter (input to an append) of SB2 and will be used during the transformation in `copy_string`. 
When the diamond region is later folded, the Phi's user picks up the wrong input corresponding to the false branch. >> >> The proposed solution is to disable the stacked concatenation optimization for this specific pattern. This might be pragmatic as it's an edge case and there's already a bug tail: JDK-8271341-> JDK-8291775 -> JDK-8362117. >> >> Testing: T1-3 (aed5952). >> >> Extra testing: ran T1-3 on Linux with an instrumented build and verified that the pattern I am excluding in this PR is not seen during any other compilation than that of the proposed regression test. > > Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision: > > - store intermediate calculations > - direction convention Thanks! I get the impression that JDK-6892658 was written with no complex dependencies in mind, and that the result projection was assumed to be directly used by the constructor or as a parameter in the second StringBuilder chain, so it would have been safe to remove. In JDK-7179138, we added support for some null checks, which we might also want to keep supporting. In the string concat optimization, the toString and append calls are all removed, and instead individual stores or array copies are emitted for each of the arguments that make up the StringBuilder. I don't see an easy way to retain the information needed to support arbitrary conditional code such as checks for equality on this result. Previous attempts to support additional patterns have been complicated: JDK-7179138 has several follow-up bugs, and JDK-8341696 was backed out. I am thinking to either go for a spot fix (in this PR I am validating an assumption made in JDK-8291775), or possibly a more general constraint procedure where the result projection uses are bounded to what is known and safe: (potentially null-checked) uses in constructor or append calls of the second StringBuilder. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27028#issuecomment-3451041567 From aseoane at openjdk.org Mon Oct 27 12:42:30 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Mon, 27 Oct 2025 12:42:30 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v8] In-Reply-To: References: Message-ID: > This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. > > The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. > > However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. > > An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. 
> > **Testing:** passes tiers 1-5 Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: Revert to failOn check as getEventWriter is not always inlined ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27913/files - new: https://git.openjdk.org/jdk/pull/27913/files/922d03f1..9ff84e73 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27913/head:pull/27913 PR: https://git.openjdk.org/jdk/pull/27913 From aseoane at openjdk.org Mon Oct 27 13:03:54 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Mon, 27 Oct 2025 13:03:54 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v9] In-Reply-To: References: Message-ID: > This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. > > The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. > > However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. > > An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. > > **Testing:** passes tiers 1-5 Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: Adjust comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27913/files - new: https://git.openjdk.org/jdk/pull/27913/files/9ff84e73..aa09c1bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27913/head:pull/27913 PR: https://git.openjdk.org/jdk/pull/27913 From dfenacci at openjdk.org Mon Oct 27 13:04:35 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 27 Oct 2025 13:04:35 GMT Subject: RFR: 8349835: C2: simplify IGV property printing [v2] In-Reply-To: References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: On Wed, 1 Oct 2025 12:28:38 GMT, Saranya Natarajan wrote: >> The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). >> >> ### Fix >> Implemented the suggested refactoring. 
>> >> ### Testing >> Github Actions, Tier 1-3 > > Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: > > - fixing test failure > - addressing review comments Changes requested by dfenacci (Committer). src/hotspot/share/opto/idealGraphPrinter.cpp line 1131: > 1129: print_property(C->matcher()->is_dontcare(node), "is_dontcare"); > 1130: print_property(!(C->matcher()->is_dontcare(node)),"is_dontcare", IdealGraphPrinter::FALSE_VALUE); > 1131: print_property((C->matcher()->find_old_node(node) != nullptr), "old_node_idx", C->matcher()->find_old_node(node)->_idx); I think we might have an issue here: `C->matcher()->find_old_node(node)->_idx` is always evaluated no matter if `C->matcher()->find_old_node(node) != nullptr` or not. src/hotspot/share/opto/idealGraphPrinter.hpp line 180: > 178: PrintProperties(IdealGraphPrinter *printer) : _printer(printer) {} > 179: void print_node_properties(Node *node, Compile *C); > 180: void print_lrg_properties(const LRG &lrg, const char *buffer); is passing by reference done to avoid copying? ------------- PR Review: https://git.openjdk.org/jdk/pull/26902#pullrequestreview-3383168804 PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2465575666 PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2465423152 From thartmann at openjdk.org Mon Oct 27 13:41:18 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 27 Oct 2025 13:41:18 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v10] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 10:22:53 GMT, Emanuel Peter wrote: >> When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. >> >> Here, I'm doing the following: >> - Add more documentation, and improve it in other cases. >> - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. >> - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` >> - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` >> - Made some methods private, and added some additional asserts. >> >> I'd be more than happy for even better names, and suggestions how to improve the documentation further :) >> >> Related issues: >> https://github.com/openjdk/jdk/pull/27889 >> https://github.com/openjdk/jdk/pull/15720 > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: > > - Merge branch 'master' into JDK-8370220-get-ctrl-documentation > - Merge branch 'JDK-8370220-get-ctrl-documentation' of https://github.com/eme64/jdk into JDK-8370220-get-ctrl-documentation > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - fix shenandoah replace for phis > - renaming for Tobias > - Apply suggestions from code review > > Co-authored-by: Tobias Hartmann > - more for Christian part 3 > - more for Christian part 2 > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - for Christian part 1 > - ... and 9 more: https://git.openjdk.org/jdk/compare/ca257175...e4bcb769 Marked as reviewed by thartmann (Reviewer). 
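For the idealGraphPrinter.cpp line 1131 comment above, one possible shape of the fix is to hoist the lookup into a local so the index is only read from a non-null result (a sketch reusing the names from the quoted snippet, not the actual follow-up patch):

    Node* old_node = C->matcher()->find_old_node(node);
    if (old_node != nullptr) {
      // the guard already holds here, so the condition argument is simply true
      print_property(true, "old_node_idx", old_node->_idx);
    }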
------------- PR Review: https://git.openjdk.org/jdk/pull/27892#pullrequestreview-3383556091 From roland at openjdk.org Mon Oct 27 13:46:47 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 27 Oct 2025 13:46:47 GMT Subject: RFR: 8370332: C2 SuperWord: SIGSEGV because PhaseIdealLoop::split_thru_phi left dead nodes in loop _body In-Reply-To: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> References: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> Message-ID: On Thu, 23 Oct 2025 14:23:38 GMT, Emanuel Peter wrote: > Analysis: > `split_thru_phi` can split a node out of the loop, through some loop phi. As a consequence, that node and the phi we split through can become dead. But `split_thru_phi` did not have any logic to yank the dead node and phi from the `_body`. If this happens in the same loop-opts-phase as a later SuperWord, and that SuperWord pass somehow accesses that loop `_body`, then we may find dead nodes, which is not expected. > > It is not ok that `split_thru_phi` leaves dead nodes in the `_body`, so they have to be yanked. > > What I did additionally: I went through all uses of `split_thru_phi`, and moved the `replace_node` from the call-site to the method itself. Removing the node and yanking from `_body` conceptually belongs together, so they should be together in code. > > I suspect that `split_thru_phi` was broken for a long time already. But JDK26 changes in SuperWord started to check inputs of all nodes in `_body`, and that fails with dead nodes. > > Future Work: > - Continue work on making `VerifyLoopOptimizations` work again, we should assert that there are no dead nodes in the `_body`. We may do that with the following task, or a subsequent one. > - [JDK-8370332](https://bugs.openjdk.org/browse/JDK-8370332) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop src/hotspot/share/opto/loopopts.cpp line 241: > 239: Node* in = n->in(j); > 240: // Check that in is a phi, and n was its only use. > 241: if (in->is_Phi() && in->in(0) == region && Does that work if, say, we're splitting: `(Add (Phi ..) (Phi ..)` With a single `Phi` as input twice? Doesn't the `Phi` have 2 uses then (the `Add`, twice)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27955#discussion_r2465736832 From chagedorn at openjdk.org Mon Oct 27 14:01:03 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 27 Oct 2025 14:01:03 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v17] In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 19:07:32 GMT, Kangcheng Xu wrote: >> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. >> >> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. >> >> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > mark LoopExitTest::is_valid_with_bt() const Nice update, it looks very good already! I did a complete pass and left some more comments. 
I'm a bit worried about breaking something which might not be noticed because we just don't create a counted loop anymore. Have you thought about some ways to test this? One idea could be to do some runs with some custom logging in place when a counted loop was successfully created and then compare the output to a baseline without your patch and the same logging in place. We should just have some confidence that we do not introduce (performance) regressions. src/hotspot/share/opto/loopnode.cpp line 1638: > 1636: #ifdef ASSERT > 1637: void PhaseIdealLoop::check_counted_loop_shape(IdealLoopTree* loop, Node* x, BasicType bt) { > 1638: Node* back_control = loop_exit_control(x, loop); How far away are we from just using `LoopStructure` and then `LoopStructure::is_valid()` instead? src/hotspot/share/opto/loopnode.cpp line 1698: > 1696: _mask = BoolTest(_mask).commute(); // And commute the exit test > 1697: } > 1698: if (_phase->is_member(_loop, _phase->get_ctrl(_limit))) { // Limit must be loop-invariant Here and related loop-(in)variant checks: Could we use `IdealLoopTree::is_invariant()`? src/hotspot/share/opto/loopnode.cpp line 1709: > 1707: > 1708: // Canonicalize the loop condition if it is 'ne'. > 1709: bool PhaseIdealLoop::LoopExitTest::canonicalize_mask(jlong stride_con) { You do not seem to use the return value and thus it can be removed. src/hotspot/share/opto/loopnode.cpp line 1731: > 1729: > 1730: // Should not reach > 1731: return false; With the assert above for `stride_con` being 1 or -1, do we need this `if` here? Or are you concerned about when this does not hold in product? But since we are not using the return value anyways, I don't think it makes a difference and you could just drop the `if`. src/hotspot/share/opto/loopnode.cpp line 1764: > 1762: // Get merge point > 1763: _xphi = incr->in(1); > 1764: _node = incr->in(2); Should we name `_node` simply `_stride_con`? src/hotspot/share/opto/loopnode.cpp line 1803: > 1801: _is_valid = false; > 1802: > 1803: _back_control = _phase->loop_exit_control(_head, _loop); You already call `loop_exit_control()` in the constructor. How about doing the following in the constructor instead? _back_control(_phase->loop_exit_control(_head, _loop)) _exit_test(_back_control, _loop, _phase) src/hotspot/share/opto/loopnode.cpp line 1821: > 1819: // if (!_iv_incr.is_valid_with_bt(_iv_bt)) { > 1820: // return; > 1821: // } leftover? src/hotspot/share/opto/loopnode.cpp line 2141: > 2139: // If that is not the case, we need to canonicalize the loop exit check by using different values for adjusted_limit > 2140: // (see LoopStructure::final_limit_correction()). > 2141: // Note that after canonicalization: Add some space: Suggestion: // // Note that after canonicalization: src/hotspot/share/opto/loopnode.cpp line 2405: > 2403: // } > 2404: // > 2405: // If the array is shorter than 0x8000 this exits through a AIOOB Suggestion: // If the array is shorter than 0x8000 this exits through an AIOOB src/hotspot/share/opto/loopnode.cpp line 2412: > 2410: const TypeInteger* incr_t = igvn.type(_iv_incr.incr())->is_integer(_iv_bt); > 2411: if (limit_t->hi_as_long() > incr_t->hi_as_long()) { > 2412: // if the limit can have a higher value than the increment (before the0 phi) Suggestion: // if the limit can have a higher value than the increment (before the phi) src/hotspot/share/opto/loopnode.cpp line 2477: > 2475: } > 2476: > 2477: bool CountedLoopConverter::is_safepoint_invalid(SafePointNode* sfpt) { Can be made `const`. 
src/hotspot/share/opto/loopnode.cpp line 2500: > 2498: PhaseIterGVN* igvn = &_phase->igvn(); > 2499: Node* init_control = _head->in(LoopNode::EntryControl); > 2500: const jlong stride_con = _structure.stride().compute_non_zero_stride_con(_structure.exit_test().mask(), _iv_bt); I've noticed that you use this pattern a few times. How about having a `LoopStructure::stride_con()` method instead? src/hotspot/share/opto/loopnode.cpp line 2505: > 2503: Node* cmp_limit = CmpNode::make(_structure.exit_test().limit(), igvn->integercon((stride_con > 0 > 2504: ? max_signed_integer(_iv_bt) > 2505: : min_signed_integer(_iv_bt)) It might be easier to read when we extract the `intercon()` call to a separate variable in a line above. src/hotspot/share/opto/loopnode.cpp line 2513: > 2511: Node* init_trip = _structure.phi()->in(LoopNode::EntryControl); > 2512: if (_insert_init_trip_limit_check) { > 2513: Node* cmp_limit = CmpNode::make(init_trip, _structure.exit_test().limit(), _iv_bt); You also seem to be using `_structure.exit_test().limit()` a lot. We could also provide a `structure.limit()` method instead. src/hotspot/share/opto/loopnode.cpp line 2559: > 2557: } > 2558: _phase->set_subtree_ctrl(adjusted_limit, false); > 2559: _phase->set_subtree_ctrl(adjusted_limit, false); Duplicated Suggestion: src/hotspot/share/opto/loopnode.cpp line 2615: > 2613: Node* iffalse = iff->as_If()->proj_out(!(iftrue_op == Op_IfTrue)); > 2614: > 2615: // Need to swap loop-exit and loop-back control? Suggestion: // Need to swap loop-exit and loop-back control? src/hotspot/share/opto/loopnode.cpp line 2747: > 2745: // If there is one, then we do not need to create an additional Loop Limit Check Predicate. > 2746: bool CountedLoopConverter::has_dominating_loop_limit_check(Node* init_trip, Node* limit, const jlong stride_con, > 2747: const BasicType iv_bt, Node* loop_entry) { Can be made `const` and indentation is off. src/hotspot/share/opto/loopnode.hpp line 277: > 275: > 276: // Match increment with optional truncation > 277: class TruncatedIncrement { You could move this code down to the other loop structure classes. src/hotspot/share/opto/loopnode.hpp line 1338: > 1336: _back_control(back_control), > 1337: _loop(loop), > 1338: _phase(phase) {} Maybe also add an assert here that `back_control` is non-null. src/hotspot/share/opto/loopnode.hpp line 2021: > 2019: PhaseIdealLoop::LoopIVStride _stride; > 2020: PhiNode* _phi = nullptr; > 2021: SafePointNode* _sfpt = nullptr; Can we name it `_safepoint`? src/hotspot/share/opto/loopnode.hpp line 2062: > 2060: #ifdef ASSERT > 2061: bool _checked_for_counted_loop = false; > 2062: #endif For single lines, you can use `DEBUG_ONLY()`, same at other places as well in the patch. Suggestion: DEBUG_ONLY(bool _checked_for_counted_loop = false;) src/hotspot/share/opto/loopnode.hpp line 2093: > 2091: assert(head != nullptr, ""); > 2092: assert(loop != nullptr, ""); > 2093: assert(iv_bt == T_INT || iv_bt == T_LONG, ""); // Loops can be either int or long. Maybe add an assertion message: Suggestion: assert(phase != nullptr, "must be"); // Fail early if mandatory parameters are null. assert(head != nullptr, "must be"); assert(loop != nullptr, "must be"); assert(iv_bt == T_INT || iv_bt == T_LONG, "either int or long loops"); src/hotspot/share/opto/loopopts.cpp line 4273: > 4271: // With an extra phi for the candidate iv? 
> 4272: // Or the region node is the loop head > 4273: if (!loop_exit.incr()->is_Phi() || loop_exit.incr()->in(0) == head) { You seem to query `incr()` many times - might be worth to extract to a separate `loop_incr` variable and reuse it. ------------- PR Review: https://git.openjdk.org/jdk/pull/24458#pullrequestreview-3383037106 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465494653 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465359232 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465561539 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465569195 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465429129 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465334860 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465349597 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465517176 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465402120 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465401282 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465572745 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465627831 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465648729 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465653480 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465676111 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465706245 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465716571 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465736725 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465343886 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465575237 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465582423 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465327913 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2465756399 From mdoerr at openjdk.org Mon Oct 27 14:12:38 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 27 Oct 2025 14:12:38 GMT Subject: RFR: 8370579: PPC: fix inswri immediate argument order In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 15:19:06 GMT, Manuel H?ssig wrote: > This cleanup PR swaps and renames the immediate arguments of the two `insrwi` instruction macros in `ppc.ad` such that they correspond to the order and names in the manual. This involved swapping the arguments in all six usages. I hope this saves the next person trying to reason about this some confused hours. > > Testing: > - [x] Github Actions > - [x] Running some relevant tests (`compiler/c2/TestCharShortByteSwap.java`, `jdk/java/lang/Short/ByteSwap.java`, `jdk/java/lang/Integer/BitTwiddle.java`, `compiler/codegen/Test6431242.java`) in qemu Looks good and tier1 has passed. Thanks for cleaning this up! The old code was really hard to read. Maybe @MBaesken can provide a 2nd review. ------------- Marked as reviewed by mdoerr (Reviewer). 
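The incr() caching suggested in the loopopts.cpp line 4273 comment above could look roughly like this (sketch only; the surrounding control flow is assumed):

    // Query the increment once and reuse it for the subsequent checks.
    Node* loop_incr = loop_exit.incr();
    if (!loop_incr->is_Phi() || loop_incr->in(0) == head) {
      return false; // assumed early exit
    }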
PR Review: https://git.openjdk.org/jdk/pull/27978#pullrequestreview-3383718812 From chagedorn at openjdk.org Mon Oct 27 14:20:48 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 27 Oct 2025 14:20:48 GMT Subject: RFR: 8370569: IGV: dump more graph properties at bytecode parsing [v2] In-Reply-To: <4UX5Vaj1aGBtVya6GBPLaHAUMUF5Lg64XZeBlci_Q0o=.55273dc4-b6af-4896-bdd9-404549d1bdd1@github.com> References: <4UX5Vaj1aGBtVya6GBPLaHAUMUF5Lg64XZeBlci_Q0o=.55273dc4-b6af-4896-bdd9-404549d1bdd1@github.com> Message-ID: On Mon, 27 Oct 2025 10:24:26 GMT, Roberto Casta?eda Lozano wrote: >> src/utils/IdealGraphVisualizer/Settings/src/main/java/com/sun/hotspot/igv/settings/ViewPanel.java line 168: >> >>> 166: private void graphNameSuffixFieldActionPerformed(java.awt.event.ActionEvent evt) {//GEN-FIRST:event_graphNameSuffixFieldActionPerformed >>> 167: // TODO add your handling code here: >>> 168: }//GEN-LAST:event_graphNameSuffixFieldActionPerformed >> >> Can you explain why this nop-action is needed? > > Good catch. It is not needed, just the result of some spurious clicking on the NetBeans form editor that resulted in that code being auto-generated. I removed the action declaration and other GUI properties of the new text field to align the generated code with that of the other text fields in the Options form (commit 4123bb0c9ccb8b60993f4b575c36f9751cefd1b9). Can this method then be removed? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27975#discussion_r2465862929 From chagedorn at openjdk.org Mon Oct 27 14:20:46 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 27 Oct 2025 14:20:46 GMT Subject: RFR: 8370569: IGV: dump more graph properties at bytecode parsing [v2] In-Reply-To: <7b9lDpDqgp8WI8ou4nWz3uxBxS3mrlmhWJI3cayhxsw=.e381b7e1-6da6-4c4c-8c83-d6ca7840122c@github.com> References: <7b9lDpDqgp8WI8ou4nWz3uxBxS3mrlmhWJI3cayhxsw=.e381b7e1-6da6-4c4c-8c83-d6ca7840122c@github.com> Message-ID: On Mon, 27 Oct 2025 10:23:36 GMT, Roberto Casta?eda Lozano wrote: >> This changeset makes it easier to trace C2's individual bytecode parsing steps to the input class files and the output of type flow analysis, by: >> >> 1. dumping the `map` node (holding the JVM state), basic block (reverse post-order index), and method name as properties of the graph that C2 dumps after each parsed bytecode at IGV dump level 6; >> >> 2. appending this information to the graph name (hence generalizing [JDK-8356779](https://bugs.openjdk.org/browse/JDK-8356779)); and >> >> 3. making the graph name suffix configurable via `Options -> Graph Name Suffix`. By default, it appends `(map: [map], block #[block] at [method])` to the names of graphs containing these properties (bytecode parsing dumps), and nothing otherwise. >> >> Here are the `Outline` and `Properties` windows for >> >> $ java -Xbatch -XX:CompileOnly=java.lang.String::charAt -XX:PrintIdealGraphLevel=6 >> >> before (left) and after (right) the changeset: >> >> before-after >> >> Please let me know if there are other bytecode parsing graph properties that could be useful to dump, and whether you think the default graph name suffix contains the right amount of information. >> >> #### Testing >> - tier1. >> - Tested automatically that dumping thousands of graphs does not trigger any assertion failure on HotSpot or IGV. 
> > Roberto Casta?eda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Simplify code generated by NetBeans for the 'graphNameSuffixField' text field > - Use '-' for non-existing maps Looks good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27975#pullrequestreview-3383753959 From rcastanedalo at openjdk.org Mon Oct 27 14:32:52 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 27 Oct 2025 14:32:52 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v9] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 13:03:54 GMT, Anton Seoane Ampudia wrote: >> This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. >> >> The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. >> >> However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. >> >> An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. >> >> **Testing:** passes tiers 1-5 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Adjust comment Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27913#pullrequestreview-3383819530 From roland at openjdk.org Mon Oct 27 14:44:02 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 27 Oct 2025 14:44:02 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v7] In-Reply-To: References: Message-ID: On Fri, 17 Oct 2025 09:32:48 GMT, Qizheng Xing wrote: >> In `PhaseIdealLoop`, `IdealLoopTree::check_safepts` method checks if any call that is guaranteed to have a safepoint dominates the tail of the loop. In the previous implementation, `check_safepts` would stop if it found a local non-call safepoint. At this time, if there was a call before the safepoint in the dom-path, this safepoint would not be eliminated. >> >> loop-safepoint >> >> This patch changes the behavior of `check_safepts` to not stop when it finds a non-local safepoint. This makes simple loops with one method call ~3.8% faster (on aarch64). >> >> >> Benchmark Mode Cnt Score Error Units >> LoopSafepoint.loopVar avgt 15 208296.259 ? 1350.409 ns/op # baseline >> LoopSafepoint.loopVar avgt 15 200692.874 ? 616.770 ns/op # this patch >> >> >> Testing: tier1-2 on x86_64 and aarch64. > > Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision: > > - Update microbench > - Add IR tests for nested loops Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23057#pullrequestreview-3383871439 From roland at openjdk.org Mon Oct 27 14:44:06 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 27 Oct 2025 14:44:06 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v3] In-Reply-To: References: Message-ID: On Tue, 16 Sep 2025 05:51:43 GMT, Emanuel Peter wrote: >> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve documentation comments > > src/hotspot/share/opto/loopnode.cpp line 3840: > >> 3838: // inside any nested loop, then that loop is okay >> 3839: // E) Otherwise, if an outer loop's ncsfpt on the idom-path is nested in >> 3840: // an inner loop, we need to prevent the inner loop from deleting it > > Nice, that's indeed an improvement :) It would be nice to make sure all cases here have an IR test which is not the case AFAICT. Can you open a JBS issue for that? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2465947623 From epeter at openjdk.org Mon Oct 27 14:50:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 27 Oct 2025 14:50:47 GMT Subject: RFR: 8370332: C2 SuperWord: SIGSEGV because PhaseIdealLoop::split_thru_phi left dead nodes in loop _body In-Reply-To: References: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> Message-ID: On Mon, 27 Oct 2025 13:43:57 GMT, Roland Westrelin wrote: >> Analysis: >> `split_thru_phi` can split a node out of the loop, through some loop phi. As a consequence, that node and the phi we split through can become dead. But `split_thru_phi` did not have any logic to yank the dead node and phi from the `_body`. If this happens in the same loop-opts-phase as a later SuperWord, and that SuperWord pass somehow accesses that loop `_body`, then we may find dead nodes, which is not expected. >> >> It is not ok that `split_thru_phi` leaves dead nodes in the `_body`, so they have to be yanked. >> >> What I did additionally: I went through all uses of `split_thru_phi`, and moved the `replace_node` from the call-site to the method itself. Removing the node and yanking from `_body` conceptually belongs together, so they should be together in code. >> >> I suspect that `split_thru_phi` was broken for a long time already. But JDK26 changes in SuperWord started to check inputs of all nodes in `_body`, and that fails with dead nodes. >> >> Future Work: >> - Continue work on making `VerifyLoopOptimizations` work again, we should assert that there are no dead nodes in the `_body`. We may do that with the following task, or a subsequent one. >> - [JDK-8370332](https://bugs.openjdk.org/browse/JDK-8370332) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop > > src/hotspot/share/opto/loopopts.cpp line 241: > >> 239: Node* in = n->in(j); >> 240: // Check that in is a phi, and n was its only use. >> 241: if (in->is_Phi() && in->in(0) == region && > > Does that work if, say, we're splitting: > > `(Add (Phi ..) (Phi ..)` > > With a single `Phi` as input twice? Doesn't the `Phi` have 2 uses then (the `Add`, twice)? Hmm, good point. I'll have to see if I can find a reproducer for that... 
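To make the corner case raised above concrete: with (Add (Phi ..) (Phi ..)) built from a single phi feeding both input slots, the phi has two outgoing edges even though the Add is its only user, so a "single use" test has to walk the edges rather than rely on an edge count of one. A small sketch of such a check, not taken from the patch:

    // Returns true if every outgoing edge of phi (possibly several) leads to n.
    static bool is_only_user(Node* phi, Node* n) {
      for (DUIterator_Fast imax, i = phi->fast_outs(imax); i < imax; i++) {
        if (phi->fast_out(i) != n) {
          return false; // some other user keeps the phi alive
        }
      }
      return true;
    }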
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27955#discussion_r2465977342 From roland at openjdk.org Mon Oct 27 15:01:11 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 27 Oct 2025 15:01:11 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v7] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 14:41:38 GMT, Roland Westrelin wrote: >> Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update microbench >> - Add IR tests for nested loops > > Looks good to me. > One question I have, maybe @rwestrel can weight in here too: how does all of this play with `LongCountedLoops`? I suppose they decay to int loops at some point... Right. But that's only guaranteed if there's a safepoint right above the exit condition of the long counted loop. I don't think that change makes a difference there given, for `LongCountedLoop`s, `IdealLoopTree::remove_safepoints()` is called with `keep_one = true`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-3451720944 From shade at openjdk.org Mon Oct 27 15:05:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 27 Oct 2025 15:05:51 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v7] In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 14:05:04 GMT, Francesco Andreuzzi wrote: >> I propose to amend `nmethod::is_cold` to let GC collect not-entrant native `nmethod` instances. >> >> Passes tier1 and tier2 (fastdebug). > > Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: > > sleep @dean-long -- you are fine with this patch, right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27742#issuecomment-3451745425 From iveresov at openjdk.org Mon Oct 27 15:13:03 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Mon, 27 Oct 2025 15:13:03 GMT Subject: RFR: 8368321: Rethink compilation delay strategy for lukewarm methods [v2] In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 01:36:39 GMT, Igor Veresov wrote: >> In the current implementation we delay profiling of lukewarm methods (those that were never compiled by C2 during training) by increasing the 2->3 threshold by a factor. That may shift profiling of those too much into the future if a large factor is used, if we use a small factor, however, profiling may happen within the training run window so to speak. The solution I came up with it to delay profiling until we reach the number of invocations of a method equal to the number we had in the training run. After that we use the normal policy. >> >> Here is an example. I trained our JavacBenchApp for 5 iterations (which is artificially low and therefore many methods would be classified as lukewarm). Then I ran it for 200 iterations with AOT replay. >> >> old-vs-new >> >> While initially the performance is similar it quickly diverges. With the new approach we move to standard handling of lukewarm methods after 5 iterations and they get compiled with C2. With the old approach we don't. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Fix zero build Thank you Vladimirs for the reviews! 
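The delay policy described in the quoted summary above boils down to a simple comparison; this sketch uses made-up names purely for illustration and is not the integrated change:

    // Keep a lukewarm method out of profiling until it has been invoked at least
    // as many times as observed during the training run, then apply the normal policy.
    static bool delay_lukewarm_profiling(int invocations_now, int invocations_in_training_run) {
      return invocations_now < invocations_in_training_run;
    }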
------------- PR Comment: https://git.openjdk.org/jdk/pull/27926#issuecomment-3451777687 From iveresov at openjdk.org Mon Oct 27 15:13:04 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Mon, 27 Oct 2025 15:13:04 GMT Subject: Integrated: 8368321: Rethink compilation delay strategy for lukewarm methods In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 01:22:44 GMT, Igor Veresov wrote: > In the current implementation we delay profiling of lukewarm methods (those that were never compiled by C2 during training) by increasing the 2->3 threshold by a factor. That may shift profiling of those too much into the future if a large factor is used; if we use a small factor, however, profiling may happen within the training run window, so to speak. The solution I came up with is to delay profiling until we reach the number of invocations of a method equal to the number we had in the training run. After that we use the normal policy. > > Here is an example. I trained our JavacBenchApp for 5 iterations (which is artificially low and therefore many methods would be classified as lukewarm). Then I ran it for 200 iterations with AOT replay. > > old-vs-new > > While initially the performance is similar it quickly diverges. With the new approach we move to standard handling of lukewarm methods after 5 iterations and they get compiled with C2. With the old approach we don't. This pull request has now been integrated. Changeset: 1e49376e Author: Igor Veresov URL: https://git.openjdk.org/jdk/commit/1e49376ece39e8f9b5c72b58688b1e195a0014be Stats: 70 lines in 5 files changed: 34 ins; 15 del; 21 mod 8368321: Rethink compilation delay strategy for lukewarm methods Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/27926 From kxu at openjdk.org Mon Oct 27 15:14:54 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 27 Oct 2025 15:14:54 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v18] In-Reply-To: References: Message-ID: <00l5alXS4j8RVZevho6qoFOWWHFbdcTNMeMYer1Z3E8=.a4449561-7afb-48f1-80c4-98a60477ae38@github.com> > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759).
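The detect-then-convert split described in the quoted refactoring summary above would be used roughly as follows; the constructor arguments are assumed for illustration:

    // Detection does not mutate the graph; conversion only runs once a counted loop is found.
    PhaseIdealLoop::CountedLoopConverter converter(this, loop, head, T_INT); // argument list assumed
    if (converter.is_counted_loop()) {
      converter.convert();
    }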
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopnode.hpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24458/files - new: https://git.openjdk.org/jdk/pull/24458/files/0a3fff1b..ead1ab34 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=16-17 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From aseoane at openjdk.org Mon Oct 27 15:36:22 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Mon, 27 Oct 2025 15:36:22 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v9] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 14:29:43 GMT, Roberto Casta?eda Lozano wrote: >> Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: >> >> Adjust comment > > Marked as reviewed by rcastanedalo (Reviewer). Reverted the more precise IR Framework check that @robcasloz suggested as the method where the actual call to `write_checkpoint` happened was not always being inlined, resulting in a failure. I am back to the original check for a non-void return type. Quick check shows no issues (ran tiers 1-3) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27913#issuecomment-3451927579 From rcastanedalo at openjdk.org Mon Oct 27 15:54:35 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 27 Oct 2025 15:54:35 GMT Subject: RFR: 8370569: IGV: dump more graph properties at bytecode parsing [v3] In-Reply-To: References: Message-ID: > This changeset makes it easier to trace C2's individual bytecode parsing steps to the input class files and the output of type flow analysis, by: > > 1. dumping the `map` node (holding the JVM state), basic block (reverse post-order index), and method name as properties of the graph that C2 dumps after each parsed bytecode at IGV dump level 6; > > 2. appending this information to the graph name (hence generalizing [JDK-8356779](https://bugs.openjdk.org/browse/JDK-8356779)); and > > 3. making the graph name suffix configurable via `Options -> Graph Name Suffix`. By default, it appends `(map: [map], block #[block] at [method])` to the names of graphs containing these properties (bytecode parsing dumps), and nothing otherwise. > > Here are the `Outline` and `Properties` windows for > > $ java -Xbatch -XX:CompileOnly=java.lang.String::charAt -XX:PrintIdealGraphLevel=6 > > before (left) and after (right) the changeset: > > before-after > > Please let me know if there are other bytecode parsing graph properties that could be useful to dump, and whether you think the default graph name suffix contains the right amount of information. > > #### Testing > - tier1. > - Tested automatically that dumping thousands of graphs does not trigger any assertion failure on HotSpot or IGV. 
Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: Remove leftover from NetBeans code generation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27975/files - new: https://git.openjdk.org/jdk/pull/27975/files/4123bb0c..8da58972 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27975&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27975&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27975/head:pull/27975 PR: https://git.openjdk.org/jdk/pull/27975 From rcastanedalo at openjdk.org Mon Oct 27 15:54:36 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 27 Oct 2025 15:54:36 GMT Subject: RFR: 8370569: IGV: dump more graph properties at bytecode parsing [v3] In-Reply-To: References: <4UX5Vaj1aGBtVya6GBPLaHAUMUF5Lg64XZeBlci_Q0o=.55273dc4-b6af-4896-bdd9-404549d1bdd1@github.com> Message-ID: On Mon, 27 Oct 2025 14:16:39 GMT, Christian Hagedorn wrote: >> Good catch. It is not needed, just the result of some spurious clicking on the NetBeans form editor that resulted in that code being auto-generated. I removed the action declaration and other GUI properties of the new text field to align the generated code with that of the other text fields in the Options form (commit 4123bb0c9ccb8b60993f4b575c36f9751cefd1b9). > > Can this method then be removed? Right, sorry, removed now (commit 8da5897204f1e51b4347b3a1110e33506d660dd1). I was expecting that NetBeans would detect it as dead and remove it automagically when re-generating `ViewPanel.java` from `ViewPanel.form`, but forgot to check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27975#discussion_r2466200094 From vlivanov at openjdk.org Mon Oct 27 16:17:23 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 27 Oct 2025 16:17:23 GMT Subject: RFR: 8370251: C2: Inlining checks for method handle intrinsics are too strict In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 21:00:40 GMT, Vladimir Ivanov wrote: > C2 performs access checks during inlining attempts through method handle > intrinsic calls. But there are no such checks happening at runtime when > executing the calls. (Access checks are performed when corresponding method > handle is resolved.) So, inlining may fail due to access checks failure while > the call always succeeds at runtime. > > The fix is to skip access checks when inlining through method handle intrinsics. > > Testing: hs-tier1 - hs-tier4 Thanks for the reviews, Vladimir and Roland. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27908#issuecomment-3452135480 From vlivanov at openjdk.org Mon Oct 27 16:17:24 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 27 Oct 2025 16:17:24 GMT Subject: Integrated: 8370251: C2: Inlining checks for method handle intrinsics are too strict In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 21:00:40 GMT, Vladimir Ivanov wrote: > C2 performs access checks during inlining attempts through method handle > intrinsic calls. But there are no such checks happening at runtime when > executing the calls. (Access checks are performed when corresponding method > handle is resolved.) So, inlining may fail due to access checks failure while > the call always succeeds at runtime. > > The fix is to skip access checks when inlining through method handle intrinsics. 
> > Testing: hs-tier1 - hs-tier4 This pull request has now been integrated. Changeset: 583ff202 Author: Vladimir Ivanov URL: https://git.openjdk.org/jdk/commit/583ff202b1cc1f018d798a34d93359301840cf06 Stats: 137 lines in 2 files changed: 94 ins; 11 del; 32 mod 8370251: C2: Inlining checks for method handle intrinsics are too strict Reviewed-by: kvn, roland ------------- PR: https://git.openjdk.org/jdk/pull/27908 From psandoz at openjdk.org Mon Oct 27 16:54:42 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 27 Oct 2025 16:54:42 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> <4l5TryoeE4xWUvCtyOU4hSUArWtuXomHRuMiQfAflDM=.9faf7b8b-5135-4696-8921-d61e624d684f@github.com> <0Eavy5pO0jAAKOzZYAoRyykKNSGUDUinBBk2CUMcK1c=.a6c59caf-d3a8-45ba-a368-9849b735e854@github.com> <20otCwtIy0nolE3sBw3b-YoXUa3SL_xsBKB9Cw_kD2o=.62cc0b37-9b0c-4ecb-9509-5c48ac4f3a75@github.com> <4Q3oTRS_-2DbBIJzsz3u67tZfA8joIiMJ4x0z4rZqlo=.4e238026-4cf9-4564-b5d3-1539062736f6@github.com> <1qB16WqDleABsguKwI8xSgWBf1NFQ7uOZByQHIIXdOU=.e30842bf-67f4-4fb3-b877-b91b288912bc@github.com> <3Vc1sIRj7GOSPv3E1tz6xOOmjTuN40yWsfbTvm5LdS0=.005267aa-3f4e-4f78-b55d-4900d8d7065e@github.com> Message-ID: On Mon, 27 Oct 2025 02:13:22 GMT, Xiaohong Gong wrote: >> Maybe @PaulSandoz has a good idea for a better naming of `VectorLoadMask` and `VectorStoreMask`? >> >> @XiaohongGong Is there any good place where we already document the different kinds of masks, and how they can be converted, and how they are used? If not: it would be really great if we could add that to `vectornode.hpp`. I also see that `TypeVectMask` has no class comment. We really should improve things there. It would make reviewing Vector API code so much easier. > > Hi @eme64 , I'm afraid that there is not a place that we document these things now. And I agree that clearly comments might be necessary. I'v created a separate JBS to record https://bugs.openjdk.org/browse/JDK-8370666. Thanks for your suggestion! > Maybe @PaulSandoz has a good idea for a better naming of `VectorLoadMask` and `VectorStoreMask`? > IIUC these nodes represent conversions or casts: - `VectorLoadMask` - converts a vector register of 8-bit lanes representing a mask to a platform-specific mask register - `VectorStoreMask` - converts a platform-specific mask register to a vector register of 8-bit lanes representing the mask In theory we could model such conversations using `VectorOperators` as we do other conversions, which might hold some clues as to their names. There is already `VectorMaskCastNode`, but i believe that operates on the platform-specific mask register, casting between different vector species of the same length. So perhaps we could rename to the following: - `VectorLoadMask` -> `VectorCastB2MaskNode` - `VectorStoreMask` -> `VectorCastMask2BNode` Having a naming convention for the various mask representations might further help and influence those names: - `BVectMask`, vector register of 8-bit lanes representing the mask - `NVectMask`, vector register of N-bit lanes representing the mask; and - `PVectMask`, representing the platform-specific predicate/mask register, which might be the same as `NVectMask` on certain hardware. Does that help? 
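To make the two conversions above concrete, here is a small standalone C++ sketch of what VectorLoadMask and VectorStoreMask do conceptually: moving a mask between an 8-bit-lane vector representation and a packed-bit representation standing in for a platform predicate register. The helper names and the uint64_t packing are illustrative only, not the actual C2 node implementations.

    #include <cstdint>
    #include <cstdio>

    // "VectorLoadMask" direction: 8-bit lanes -> packed bits (predicate-like form).
    static uint64_t byte_lanes_to_bits(const uint8_t* lanes, int num_lanes) {
      uint64_t bits = 0;
      for (int i = 0; i < num_lanes; i++) {
        if (lanes[i] != 0) {
          bits |= (uint64_t)1 << i;   // set bit i when lane i is active
        }
      }
      return bits;
    }

    // "VectorStoreMask" direction: packed bits -> one 0/1 byte per lane.
    static void bits_to_byte_lanes(uint64_t bits, uint8_t* lanes, int num_lanes) {
      for (int i = 0; i < num_lanes; i++) {
        lanes[i] = (uint8_t)((bits >> i) & 1);
      }
    }

    int main() {
      uint8_t lanes[8] = {1, 0, 1, 1, 0, 0, 1, 0};
      uint64_t bits = byte_lanes_to_bits(lanes, 8);
      uint8_t back[8];
      bits_to_byte_lanes(bits, back, 8);
      printf("bits = 0x%llx\n", (unsigned long long)bits);
      return 0;
    }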
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2466385252 From rehn at openjdk.org Mon Oct 27 17:06:54 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 27 Oct 2025 17:06:54 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls Message-ID: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Hi, please consider. Sanity tested, running t1. Thanks, Robbin ------------- Commit messages: - Draft Changes: https://git.openjdk.org/jdk/pull/28005/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28005&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370708 Stats: 19 lines in 2 files changed: 13 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28005.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28005/head:pull/28005 PR: https://git.openjdk.org/jdk/pull/28005 From hgreule at openjdk.org Mon Oct 27 18:07:04 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 27 Oct 2025 18:07:04 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 18:04:57 GMT, Vladimir Ivanov wrote: > If we want to to keep expanded shape while being able to compute its type as if it were the original node, then a new flavor of Cast node may help. The one which keeps the node type and its inputs and can run Value() as if it were the original node. This is what we'd like to achieve, yes. This PR is basically just a simple workaround. So I guess it comes down to: Do we want to have a simple workaround for common cases? And if so, 1. Do we want to use this delay mechanism, or 2. Do we want to use Cast nodes I assume that the proper solution in form of a Cast-like node requires some more effort, and I'm not sure if anyone has the resources to work on that in the near future. > What I don't know: how does that interact with other IGVN optimizations, especially those that want to pattern match specific nodes? Inserting such special cast nodes could interrupt `Ideal` optimizations, current pattern matching would not know how to deal with it. Probably it is not a big issue, but I'm not sure. This isn't much different from methods like `uncast` I think. New methods like `get_in_of_type(index, opcode)` could help in such cases (check for the different ins of the cast), and maybe be even useful for other code in general. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3452639986 From vlivanov at openjdk.org Mon Oct 27 18:10:04 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 27 Oct 2025 18:10:04 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 18:04:50 GMT, Hannes Greule wrote: > how does that interact with other IGVN optimizations, especially those that want to pattern match specific nodes? Indeed, IR pattern matching can be affected as we already see with `ConstraintCastPP` nodes, so affected use sites have to be migrated to `Node::uncast()`/`Node::eqv_uncast()` helper methods. 
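For illustration, a sketch of what such a lookup helper could look like. `get_in_of_type` is only the hypothetical method proposed above, not existing HotSpot API; `Node::in()`, `Node::uncast()` and `is_ConstraintCast()` are the existing calls mentioned in the thread.

    // Hypothetical helper (sketch only): fetch input `index` of `n`, looking through a
    // ConstraintCast so pattern matching still recognizes the node behind an inserted cast.
    Node* get_in_of_type(const Node* n, uint index, int opcode) {
      Node* in = n->in(index);
      if (in == nullptr) {
        return nullptr;
      }
      if (in->Opcode() != opcode && in->is_ConstraintCast()) {
        in = in->uncast();   // strip casts, as call sites migrated to uncast() do today
      }
      return (in->Opcode() == opcode) ? in : nullptr;
    }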
------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3452649490 From duke at openjdk.org Mon Oct 27 18:11:09 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 27 Oct 2025 18:11:09 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 [v2] In-Reply-To: References: <8UVgF0vLaZhY7zvKWPBLXoU9p71SevknlPYC5LPsfCo=.8531cacc-ff26-42be-a517-88ff87628d29@github.com> <0ZFCDTQAFKjZ5XDtUXWsvC4v0lyot7Trdf3wlIxrb-M=.988407f3-6353-447a-98d6-f890b795fd1d@github.com> Message-ID: On Thu, 23 Oct 2025 18:45:08 GMT, Chad Rakoczy wrote: >>> @chadrako what is status of this work. If you are struggling to reproduce [8369150](https://bugs.openjdk.org/browse/JDK-8369150) you can fix it separately. >> >> I haven't been able to reproduce that failure. I'll reopen [8369150](https://bugs.openjdk.org/browse/JDK-8369150) so it can be completed separately > >> @chadrako, is PR ready for testing now? > > Yes > @chadrako I think my suggestion was not correct. We should revert back to your first changes for `@requires`. Original code was correct and only `serviceability/jvmti/NMethodRelocation/NMethodRelocationTest.java` missed it. Since the tests get run with different GCs anyways I don't think we need to explicitly require the GC that they run with and just have one test config ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3452653450 From kvn at openjdk.org Mon Oct 27 18:25:21 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 27 Oct 2025 18:25:21 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 [v2] In-Reply-To: References: <8UVgF0vLaZhY7zvKWPBLXoU9p71SevknlPYC5LPsfCo=.8531cacc-ff26-42be-a517-88ff87628d29@github.com> <0ZFCDTQAFKjZ5XDtUXWsvC4v0lyot7Trdf3wlIxrb-M=.988407f3-6353-447a-98d6-f890b795fd1d@github.com> Message-ID: On Mon, 27 Oct 2025 18:08:40 GMT, Chad Rakoczy wrote: > Since the tests get run with different GCs anyways I don't think we need to explicitly require the GC that they run with and just have one test config Agree. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3452705075 From vlivanov at openjdk.org Mon Oct 27 18:44:11 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 27 Oct 2025 18:44:11 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: <7u2xtXRTNr7N0wlHDhY9oOQvMobaPHHxxNz4mfYsork=.f8da7cf8-9fa8-4dea-815f-9f9301d4d451@github.com> On Mon, 27 Oct 2025 18:04:50 GMT, Hannes Greule wrote: > This PR is basically just a simple workaround. I'm not against proposed solution, just want to be sure we know its limitations and have a proper tool to avoid such bugs in the future. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3452768800 From vlivanov at openjdk.org Mon Oct 27 18:44:15 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 27 Oct 2025 18:44:15 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation [v2] In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 09:10:36 GMT, Hannes Greule wrote: >> The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. >> >> Please let me know what you think. 
> > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > expand comments src/hotspot/share/opto/divnode.cpp line 556: > 554: // Less precise comparisons still work after transform_int_divide, e.g., > 555: // comparing with >= 21_476 does not conflict with the off-by-one overapproximation. > 556: if (phase->is_IterGVN() == nullptr) { `can_reshape == true` is equivalent and IMO a bit clearer than a subtype check. src/hotspot/share/opto/divnode.cpp line 1129: > 1127: // After idealizing, we have a subtraction from x, which means without > 1128: // recognizing that as a modulo operation, we end up with a range of TypeInt::INT. > 1129: if (phase->is_IterGVN() == nullptr) { Should it go after `!ti->is_con()` check? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27886#discussion_r2466699555 PR Review Comment: https://git.openjdk.org/jdk/pull/27886#discussion_r2466702392 From duke at openjdk.org Mon Oct 27 19:01:31 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 27 Oct 2025 19:01:31 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 [v3] In-Reply-To: References: Message-ID: > [JDK-8369147](https://bugs.openjdk.org/browse/JDK-8369147) > > Fixes tests added in [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) > > `DeoptimizeRelocatedNMethod.java` and `RelocateNMethod.java` failed because they attempted to relocate nmethods to the `MethodProfiled` code heap which does not exist when `TieredCompilation` is false. Updated the tests to use `MethodNonProfiled` heap which exists regardless of `TieredCompilation` > > `StressNMethodRelocation.java` runs for 60 seconds and also compiles 1024 methods with C2. This was causing the test to timeout if the compilation took too much time. Increasing the timeout to 5 minutes should give C2 enough time to compile the functions > > `NMethodRelocationTest.java` runs using SerialGC which caused a multiple GC error when trying to run with another GC. Added a requires to force SerialGC Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Remove explicit test config for different GCs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27659/files - new: https://git.openjdk.org/jdk/pull/27659/files/769800cb..c412bbed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27659&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27659&range=01-02 Stats: 257 lines in 3 files changed: 0 ins; 249 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/27659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27659/head:pull/27659 PR: https://git.openjdk.org/jdk/pull/27659 From duke at openjdk.org Mon Oct 27 20:00:36 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 27 Oct 2025 20:00:36 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache Message-ID: [JDK-8370527](https://bugs.openjdk.org/browse/JDK-8370527) [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced an `immutable_data_references_counter` which keeps track of the number of nmethods using the immutable data so it can be shared between relocated nmethods. The old code reads the counter, decrements the counter, and then checks the first read to see if it is zero. Since the check is performed on the initial read it will never be zero which causes immutable data to never be freed. 
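The failure mode is easiest to see in reduced form. A minimal sketch of the pattern described above, with illustrative names rather than the actual nmethod fields or HotSpot atomics:

    #include <atomic>
    #include <cstdio>

    std::atomic<int> ref_count{1};   // one nmethod currently references the immutable data

    void release_old_behavior() {
      int before = ref_count.load();   // read first ...
      ref_count.fetch_sub(1);          // ... then decrement
      if (before == 0) {               // bug: 'before' was at least 1 here, so this never fires
        printf("free immutable data\n");
      }
    }

    void release_fixed_behavior() {
      int before = ref_count.fetch_sub(1);   // value observed before the decrement
      if (before == 1) {                     // we just dropped the last reference: safe to free
        printf("free immutable data\n");
      }
    }

    int main() {
      release_fixed_behavior();
      return 0;
    }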
------------- Commit messages: - Update reference_count check to equals 1 Changes: https://git.openjdk.org/jdk/pull/28008/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28008&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370527 Stats: 8 lines in 1 file changed: 6 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28008.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28008/head:pull/28008 PR: https://git.openjdk.org/jdk/pull/28008 From dlong at openjdk.org Mon Oct 27 20:54:08 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 27 Oct 2025 20:54:08 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v9] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 13:03:54 GMT, Anton Seoane Ampudia wrote: >> This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. >> >> The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. >> >> However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. >> >> An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. >> >> **Testing:** passes tiers 1-5 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Adjust comment src/hotspot/share/opto/runtime.hpp line 55: > 53: // even if the return value is unused. This is crucial for correct handling > 54: // of runtime calls that return an oop and may trigger deoptimization > 55: // on return. See rematerialize_objects() in deoptimization.cpp. I was wondering why this is only a problem for deoptimization, and not regular safepoints that trigger a GC. But the comment in rematerialize_objects() explains that the return value is not part of the GC oopmap. test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 49: > 47: } > 48: > 49: // Crash was due to the returns_oop field not being set I was confused when I could not find "returns_oop". It turns out the names are CallNode::returns_pointer() and ScopeDesc::return_oop(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2467041649 PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2467037934 From dlong at openjdk.org Mon Oct 27 21:30:16 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 27 Oct 2025 21:30:16 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v7] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 15:03:38 GMT, Aleksey Shipilev wrote: >> Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: >> >> sleep > > @dean-long -- you are fine with this patch, right? @shipilev , yes, just let me run it through our testing... 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27742#issuecomment-3453401843 From duke at openjdk.org Mon Oct 27 21:54:02 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 27 Oct 2025 21:54:02 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 19:48:10 GMT, Chad Rakoczy wrote: > [JDK-8370527](https://bugs.openjdk.org/browse/JDK-8370527) > > [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced an `immutable_data_references_counter` which keeps track of the number of nmethods using the immutable data so it can be shared between relocated nmethods. The old code reads the counter, decrements the counter, and then checks the first read to see if it is zero. Since the check is performed on the initial read it will never be zero which causes immutable data to never be freed. @lmesnik My ad hoc testing shows that the immutable data is now being freed. Could you re-run the stress test to verify the memory leak is fixed on your end? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28008#issuecomment-3453467946 From snatarajan at openjdk.org Mon Oct 27 22:03:44 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 27 Oct 2025 22:03:44 GMT Subject: RFR: 8349835: C2: simplify IGV property printing [v3] In-Reply-To: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: > The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). > > ### Fix > Implemented the suggested refactoring. 
> > ### Testing > Github Actions, Tier 1-3 Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: addressing review comments#2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26902/files - new: https://git.openjdk.org/jdk/pull/26902/files/30ef8eed..31452c6c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=01-02 Stats: 17 lines in 2 files changed: 3 ins; 2 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/26902.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26902/head:pull/26902 PR: https://git.openjdk.org/jdk/pull/26902 From snatarajan at openjdk.org Mon Oct 27 22:10:03 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 27 Oct 2025 22:10:03 GMT Subject: RFR: 8349835: C2: simplify IGV property printing [v2] In-Reply-To: References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: On Mon, 27 Oct 2025 13:00:26 GMT, Damon Fenacci wrote: >> Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: >> >> - fixing test failure >> - addressing review comments > > src/hotspot/share/opto/idealGraphPrinter.cpp line 1131: > >> 1129: print_property(C->matcher()->is_dontcare(node), "is_dontcare"); >> 1130: print_property(!(C->matcher()->is_dontcare(node)),"is_dontcare", IdealGraphPrinter::FALSE_VALUE); >> 1131: print_property((C->matcher()->find_old_node(node) != nullptr), "old_node_idx", C->matcher()->find_old_node(node)->_idx); > > I think we might have an issue here: `C->matcher()->find_old_node(node)->_idx` is always evaluated no matter if `C->matcher()->find_old_node(node) != nullptr` or not. Yes, I have fixed this now. > src/hotspot/share/opto/idealGraphPrinter.hpp line 180: > >> 178: PrintProperties(IdealGraphPrinter *printer) : _printer(printer) {} >> 179: void print_node_properties(Node *node, Compile *C); >> 180: void print_lrg_properties(const LRG &lrg, const char *buffer); > > is passing by reference done to avoid copying? That is my only reason for doing this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2467191433 PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2467193122 From snatarajan at openjdk.org Mon Oct 27 22:10:05 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 27 Oct 2025 22:10:05 GMT Subject: RFR: 8349835: C2: simplify IGV property printing [v2] In-Reply-To: <5qtZxVebyVn6WML3Q4508dXPwxkw-CWhD_pE6UaNfF8=.76830409-b57d-410f-a30b-c7d01b62df7f@github.com> References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> <5qtZxVebyVn6WML3Q4508dXPwxkw-CWhD_pE6UaNfF8=.76830409-b57d-410f-a30b-c7d01b62df7f@github.com> Message-ID: <9e1r4VDSzP6VL3GMf8JQSDUcvwzjzy5XGKOFURXpGhk=.ce419221-79b3-44f6-b944-093b2d244f10@github.com> On Wed, 22 Oct 2025 09:15:41 GMT, Christian Hagedorn wrote: >> Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: >> >> - fixing test failure >> - addressing review comments > > src/hotspot/share/opto/idealGraphPrinter.hpp line 172: > >> 170: }; >> 171: >> 172: class PrintProperties > > Do you really need it in the header file? You could also just move it the the source file directly where we use the class. My reasoning is keep the interface and implementation separate. 
I have kept it this way. Will that be okay ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2467196698 From dlong at openjdk.org Mon Oct 27 23:57:01 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 27 Oct 2025 23:57:01 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v4] In-Reply-To: References: Message-ID: On Sun, 26 Oct 2025 21:20:38 GMT, Marc Chevalier wrote: >> Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. >> >> There is one detail, we used to have >> >> void restore_major_progress(int progress) { _major_progress += progress; } >> >> >> It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. >> >> It has a weird semantics: >> >> Progress before | Progress after verification | Progress after restore | What would be the assignment semantics >> ----------------|-----------------------------|-----------------------|- >> 0 | 0 | 0 | 0 >> 1 | 0 | 1 | 1 >> 0 | 1 | 1 | 0 (mismatch!) >> 1 | 1 | 2 | 1 (same truthiness) >> >> It is rather a or than a restore, and a proper boolean version of that would be >> >> void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } >> >> but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural >> >> void set_major_progress(bool progress) { _major_progress = progress; } >> >> that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > put back the OR in restORe Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27912#pullrequestreview-3386010210 From dlong at openjdk.org Tue Oct 28 01:25:04 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 28 Oct 2025 01:25:04 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v7] In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 14:05:04 GMT, Francesco Andreuzzi wrote: >> I propose to amend `nmethod::is_cold` to let GC collect not-entrant native `nmethod` instances. >> >> Passes tier1 and tier2 (fastdebug). > > Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: > > sleep Testing passed. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27742#pullrequestreview-3386137721 From lmesnik at openjdk.org Tue Oct 28 01:31:00 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 28 Oct 2025 01:31:00 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 21:51:15 GMT, Chad Rakoczy wrote: > @lmesnik My ad hoc testing shows that the immutable data is now being freed. Could you re-run the stress test to verify the memory leak is fixed on your end? I'll let you know about results in a couple of days. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28008#issuecomment-3454042421 From fyang at openjdk.org Tue Oct 28 02:56:03 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 28 Oct 2025 02:56:03 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls In-Reply-To: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: On Mon, 27 Oct 2025 16:47:35 GMT, Robbin Ehn wrote: > Hi, please consider. > > Sanity tested, running t1. > > Thanks, Robbin src/hotspot/cpu/riscv/riscv.ad line 1413: > 1411: __ build_frame(framesize); > 1412: > 1413: if (VerifyStackAtCalls) { I think this should be reflected in `MachPrologNode::format` at the same time. src/hotspot/cpu/riscv/riscv.ad line 1414: > 1412: > 1413: if (VerifyStackAtCalls) { > 1414: __ li(t2, MAJIK_DWORD); Can you use `mv` instead here for consistency with other places where we move an immediate? If `li` is prefered, we might want to do a separate change handling all the places. But I don't have a strong bias. src/hotspot/cpu/riscv/riscv.ad line 2440: > 2438: Label CHECK_PASSED; > 2439: __ ld(t1, Address(sp, framesize)); > 2440: __ li(t2, MAJIK_DWORD); Similar here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2467731875 PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2467734980 PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2467735404 From xgong at openjdk.org Tue Oct 28 05:52:38 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 28 Oct 2025 05:52:38 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v4] In-Reply-To: References: Message-ID: > The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. > > For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. > > These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. > > This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. > > It also modifies the Vector API jtreg tests for well testing. Here is the details: > > 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: > > VectorMaskToLong (VectorLongToMask l) => l > > Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. > > 2) Refine existing IR tests to verify the expected IR patterns after this patch. 
Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". > > Performance shows significant improvement on NVIDIA's Grace CPU. > > Here is the performance data with `-XX:UseSVE=2`: > > Benchmark bits inputs Mode Unit Before After Gain > MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 > MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 > MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 > MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 > MaskQueryOperationsBenchmark.testToLongInt 128 2 thrpt ops/ms 101031... Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Rename matcher helper function to "mask_op_prefers_predicate" and add more comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27481/files - new: https://git.openjdk.org/jdk/pull/27481/files/612c612f..3a40fc2a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27481&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27481&range=02-03 Stats: 74 lines in 10 files changed: 27 ins; 14 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/27481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27481/head:pull/27481 PR: https://git.openjdk.org/jdk/pull/27481 From xgong at openjdk.org Tue Oct 28 05:57:08 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 28 Oct 2025 05:57:08 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 06:27:26 GMT, Emanuel Peter wrote: >>> @XiaohongGong Actually, I just tried to submit via my standard script. It failed because of merging issues. Would you mind merging with master, so we are on the newest state? >> >> Thanks for looking at this PR @eme64 ! I'v rebased the PR to master and addressed your comments. Please let me know if any other issues. > > @XiaohongGong Thanks for merging, running testing now :) Hi @eme64 , I updated a commit to rename the helper matcher function and add some comments, assertion inside the function. Would you mind taking another look at the latest change? Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3454721580 From shade at openjdk.org Tue Oct 28 06:36:17 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 28 Oct 2025 06:36:17 GMT Subject: RFR: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache [v7] In-Reply-To: References: Message-ID: <8fUKwFpPuZ_jlFuMApxWPtNHxP5pBF9_UeunSurJI0M=.7cd493c1-b72b-41da-ae91-468f0161c670@github.com> On Fri, 24 Oct 2025 14:05:04 GMT, Francesco Andreuzzi wrote: >> I propose to amend `nmethod::is_cold` to let GC collect not-entrant native `nmethod` instances. >> >> Passes tier1 and tier2 (fastdebug). > > Francesco Andreuzzi has updated the pull request incrementally with one additional commit since the last revision: > > sleep Let's go then, thanks! 
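The IR identity mentioned in the PR description above, VectorMaskToLong (VectorLongToMask l) => l, is why the original smoke test compiled away. Roughly, such a fold is expressed in an Identity() routine like the following sketch (simplified, not necessarily the exact vectornode.cpp source):

    // Sketch: VectorMaskToLong(VectorLongToMask(l)) ==> l
    Node* VectorMaskToLongNode::Identity(PhaseGVN* phase) {
      if (in(1)->Opcode() == Op_VectorLongToMask) {
        return in(1)->in(1);   // drop the round-trip conversion and use the original long
      }
      return this;
    }

Hence the test changes described above, which keep the two operations from folding into each other so that each intrinsic is actually compiled and verified.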
------------- PR Comment: https://git.openjdk.org/jdk/pull/27742#issuecomment-3454803955 From fandreuzzi at openjdk.org Tue Oct 28 06:36:18 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Tue, 28 Oct 2025 06:36:18 GMT Subject: Integrated: 8369219: JNI::RegisterNatives causes a memory leak in CodeCache In-Reply-To: References: Message-ID: <7lvrKEO7GxCux1q5MjcH4oorPx3_eZO1WM5kxUtYnLQ=.3a252059-e5de-4e0f-b7e1-28951ad3d5ba@github.com> On Fri, 10 Oct 2025 11:50:50 GMT, Francesco Andreuzzi wrote: > I propose to amend `nmethod::is_cold` to let GC collect not-entrant native `nmethod` instances. > > Passes tier1 and tier2 (fastdebug). This pull request has now been integrated. Changeset: 05ee55ef Author: Francesco Andreuzzi Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/05ee55efcf138a28c895c395c49934390d10ee45 Stats: 146 lines in 3 files changed: 145 ins; 0 del; 1 mod 8369219: JNI::RegisterNatives causes a memory leak in CodeCache Reviewed-by: shade, apangin, dlong ------------- PR: https://git.openjdk.org/jdk/pull/27742 From epeter at openjdk.org Tue Oct 28 06:37:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 06:37:07 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v7] In-Reply-To: References: Message-ID: On Fri, 17 Oct 2025 09:32:48 GMT, Qizheng Xing wrote: >> In `PhaseIdealLoop`, `IdealLoopTree::check_safepts` method checks if any call that is guaranteed to have a safepoint dominates the tail of the loop. In the previous implementation, `check_safepts` would stop if it found a local non-call safepoint. At this time, if there was a call before the safepoint in the dom-path, this safepoint would not be eliminated. >> >> loop-safepoint >> >> This patch changes the behavior of `check_safepts` to not stop when it finds a non-local safepoint. This makes simple loops with one method call ~3.8% faster (on aarch64). >> >> >> Benchmark Mode Cnt Score Error Units >> LoopSafepoint.loopVar avgt 15 208296.259 ? 1350.409 ns/op # baseline >> LoopSafepoint.loopVar avgt 15 200692.874 ? 616.770 ns/op # this patch >> >> >> Testing: tier1-2 on x86_64 and aarch64. > > Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision: > > - Update microbench > - Add IR tests for nested loops @MaxXSoft Thanks for the work on this, and your patience with the review! @rwestrel suggested to file an RFE for additional tests, to cover all the mentioned cases A) - E). I think that would be a good idea, so please do that before integration, post the link here and link it to this RFE on JBS :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23057#pullrequestreview-3386876614 From epeter at openjdk.org Tue Oct 28 06:37:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 06:37:09 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v3] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 14:40:05 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopnode.cpp line 3840: >> >>> 3838: // inside any nested loop, then that loop is okay >>> 3839: // E) Otherwise, if an outer loop's ncsfpt on the idom-path is nested in >>> 3840: // an inner loop, we need to prevent the inner loop from deleting it >> >> Nice, that's indeed an improvement :) > > It would be nice to make sure all cases here have an IR test which is not the case AFAICT. 
Can you open a JBS issue for that? @rwestrel @MaxXSoft That is a good idea. Do either of you want to take care of those tests? Just filing an RFE will probably get the ideas lost in the ether ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2468089817 From epeter at openjdk.org Tue Oct 28 06:39:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 06:39:04 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v4] In-Reply-To: References: Message-ID: On Sun, 26 Oct 2025 21:20:38 GMT, Marc Chevalier wrote: >> Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. >> >> There is one detail, we used to have >> >> void restore_major_progress(int progress) { _major_progress += progress; } >> >> >> It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. >> >> It has a weird semantics: >> >> Progress before | Progress after verification | Progress after restore | What would be the assignment semantics >> ----------------|-----------------------------|-----------------------|- >> 0 | 0 | 0 | 0 >> 1 | 0 | 1 | 1 >> 0 | 1 | 1 | 0 (mismatch!) >> 1 | 1 | 2 | 1 (same truthiness) >> >> It is rather a or than a restore, and a proper boolean version of that would be >> >> void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } >> >> but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural >> >> void set_major_progress(bool progress) { _major_progress = progress; } >> >> that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > put back the OR in restORe LGTM, though you should apply @chhagedorn 's suggestion before integration ;) And thanks for adding the documentation :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27912#pullrequestreview-3386881323 From epeter at openjdk.org Tue Oct 28 06:45:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 06:45:18 GMT Subject: RFR: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes [v10] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 10:31:42 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 19 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8370220-get-ctrl-documentation >> - Merge branch 'JDK-8370220-get-ctrl-documentation' of https://github.com/eme64/jdk into JDK-8370220-get-ctrl-documentation >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - fix shenandoah replace for phis >> - renaming for Tobias >> - Apply suggestions from code review >> >> Co-authored-by: Tobias Hartmann >> - more for Christian part 3 >> - more for Christian part 2 >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - for Christian part 1 >> - ... and 9 more: https://git.openjdk.org/jdk/compare/ffb48287...e4bcb769 > > Still good! @chhagedorn @TobiHartmann Thanks for the review / approval! @rwestrel Thanks for approving the shanandoah changes! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27892#issuecomment-3454862103 From epeter at openjdk.org Tue Oct 28 06:45:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 06:45:19 GMT Subject: Integrated: 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 08:57:33 GMT, Emanuel Peter wrote: > When working on https://github.com/openjdk/jdk/pull/27889, I was irritated by the lack of documentation and suboptimal naming. > > Here, I'm doing the following: > - Add more documentation, and improve it in other cases. > - Rename "lazy" methods: "lazy" could indicate that we delay it somehow until later, but it is unclear what is delayed. > - `lazy_replace` -> `replace_ctrl_node_and_forward_ctrl_and_idom` > - `lazy_update` -> `install_lazy_ctrl_and_idom_forwarding` > - Made some methods private, and added some additional asserts. > > I'd be more than happy for even better names, and suggestions how to improve the documentation further :) > > Related issues: > https://github.com/openjdk/jdk/pull/27889 > https://github.com/openjdk/jdk/pull/15720 This pull request has now been integrated. Changeset: d5ce6669 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/d5ce66698d2f15c5f8316110a6118a10baa4013d Stats: 126 lines in 8 files changed: 63 ins; 4 del; 59 mod 8370220: C2: rename methods and improve documentation around get_ctrl and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/27892 From hgreule at openjdk.org Tue Oct 28 06:47:03 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 28 Oct 2025 06:47:03 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation [v2] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 18:40:09 GMT, Vladimir Ivanov wrote: >> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: >> >> expand comments > > src/hotspot/share/opto/divnode.cpp line 556: > >> 554: // Less precise comparisons still work after transform_int_divide, e.g., >> 555: // comparing with >= 21_476 does not conflict with the off-by-one overapproximation. >> 556: if (phase->is_IterGVN() == nullptr) { > > `can_reshape == true` is equivalent and IMO a bit clearer than a subtype check. I'll change it. I didn't use it before due to e.g., https://bugs.openjdk.org/browse/JDK-8255443. 
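As a reference for the guard being discussed, a minimal sketch of how the delayed transform reads in Ideal() — simplified, not the exact divnode.cpp change:

    Node* DivINode::Ideal(PhaseGVN* phase, bool can_reshape) {
      // During parse-time GVN (can_reshape is false) keep the Div in its original
      // shape so Value() can still compute a tight type for it.
      if (!can_reshape) {
        return nullptr;
      }
      // During IGVN it is safe to expand division by a constant into shifts/magic multiplies.
      // ... expansion logic elided in this sketch ...
      return nullptr;
    }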
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27886#discussion_r2468109400 From mchevalier at openjdk.org Tue Oct 28 07:22:59 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 28 Oct 2025 07:22:59 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v5] In-Reply-To: References: Message-ID: > Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. > > There is one detail, we used to have > > void restore_major_progress(int progress) { _major_progress += progress; } > > > It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. > > It has a weird semantics: > > Progress before | Progress after verification | Progress after restore | What would be the assignment semantics > ----------------|-----------------------------|-----------------------|- > 0 | 0 | 0 | 0 > 1 | 0 | 1 | 1 > 0 | 1 | 1 | 0 (mismatch!) > 1 | 1 | 2 | 1 (same truthiness) > > It is rather a or than a restore, and a proper boolean version of that would be > > void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } > > but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural > > void set_major_progress(bool progress) { _major_progress = progress; } > > that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: typoes in comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27912/files - new: https://git.openjdk.org/jdk/pull/27912/files/c0b0bdec..a774b904 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27912&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27912&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27912.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27912/head:pull/27912 PR: https://git.openjdk.org/jdk/pull/27912 From mchevalier at openjdk.org Tue Oct 28 07:23:01 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 28 Oct 2025 07:23:01 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v4] In-Reply-To: References: Message-ID: On Sun, 26 Oct 2025 21:20:38 GMT, Marc Chevalier wrote: >> Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. >> >> There is one detail, we used to have >> >> void restore_major_progress(int progress) { _major_progress += progress; } >> >> >> It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. >> >> It has a weird semantics: >> >> Progress before | Progress after verification | Progress after restore | What would be the assignment semantics >> ----------------|-----------------------------|-----------------------|- >> 0 | 0 | 0 | 0 >> 1 | 0 | 1 | 1 >> 0 | 1 | 1 | 0 (mismatch!) 
>> 1 | 1 | 2 | 1 (same truthiness) >> >> It is rather a or than a restore, and a proper boolean version of that would be >> >> void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } >> >> but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural >> >> void set_major_progress(bool progress) { _major_progress = progress; } >> >> that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > put back the OR in restORe Typoes in comment fixed. I'll need at least one approval again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27912#issuecomment-3454954121 From rehn at openjdk.org Tue Oct 28 07:24:40 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Oct 2025 07:24:40 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v2] In-Reply-To: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: > Hi, please consider. > > Sanity tested, running t1. > > Thanks, Robbin Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: li->mv, format, space ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28005/files - new: https://git.openjdk.org/jdk/pull/28005/files/f7242f22..257b0499 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28005&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28005&range=00-01 Stats: 8 lines in 1 file changed: 5 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28005.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28005/head:pull/28005 PR: https://git.openjdk.org/jdk/pull/28005 From rehn at openjdk.org Tue Oct 28 07:24:41 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Oct 2025 07:24:41 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls In-Reply-To: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: On Mon, 27 Oct 2025 16:47:35 GMT, Robbin Ehn wrote: > Hi, please consider. > > Sanity tested, running t1. > > Thanks, Robbin Fixed, thanks for having a look! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28005#issuecomment-3454965132 From rehn at openjdk.org Tue Oct 28 07:24:43 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Oct 2025 07:24:43 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v2] In-Reply-To: References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: On Tue, 28 Oct 2025 02:50:18 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> li->mv, format, space > > src/hotspot/cpu/riscv/riscv.ad line 1413: > >> 1411: __ build_frame(framesize); >> 1412: >> 1413: if (VerifyStackAtCalls) { > > I think this should be reflected in `MachPrologNode::format` at the same time. Fixed > src/hotspot/cpu/riscv/riscv.ad line 1414: > >> 1412: >> 1413: if (VerifyStackAtCalls) { >> 1414: __ li(t2, MAJIK_DWORD); > > Can you use `mv` instead here for consistency with other places where we move an immediate? If `li` is prefered, we might want to do a separate change handling all the places. But I don't have a strong bias. Fixed > src/hotspot/cpu/riscv/riscv.ad line 2440: > >> 2438: Label CHECK_PASSED; >> 2439: __ ld(t1, Address(sp, framesize)); >> 2440: __ li(t2, MAJIK_DWORD); > > Similar here. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2468195870 PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2468196406 PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2468196934 From mchevalier at openjdk.org Tue Oct 28 07:30:08 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 28 Oct 2025 07:30:08 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v4] In-Reply-To: <3jSctyKb4Zi-tG17Yn9xKACgwnJBVU079t5m7VcvoGA=.ef922908-c05a-4046-ad30-365b228ee089@github.com> References: <3jSctyKb4Zi-tG17Yn9xKACgwnJBVU079t5m7VcvoGA=.ef922908-c05a-4046-ad30-365b228ee089@github.com> Message-ID: On Fri, 17 Oct 2025 14:54:35 GMT, Christian Hagedorn wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> driver -> main > > Could `PhaseIdealLoop::eliminate_useless_zero_trip_guard()` maybe also be an option? We are looping through all loops and check the `OpaqueZeroTripGuardNodes` anyways there. Since you made most of the test (at least), which is the biggest part of this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3454981348 From mchevalier at openjdk.org Tue Oct 28 07:30:13 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 28 Oct 2025 07:30:13 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v9] In-Reply-To: References: Message-ID: <9T_qIFDSxnt0RfSKknq6jkZnSlkEHslHL5NuquhMAOI=.6b7dc2e5-1341-4b9b-bbca-27d0eaca5d78@github.com> On Mon, 27 Oct 2025 07:45:39 GMT, Christian Hagedorn wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename test > > test/hotspot/jtreg/compiler/loopopts/TooStrictAssertForUnrollAfterPeeling.java line 54: > >> 52: * -XX:-SplitIfBlocks >> 53: * -XX:-UseOnStackReplacement >> 54: * -XX:LoopMaxUnroll=2 > > Are these flags all required to trigger the issue or what is the motivation behind having this run compared to the above only? 
That's the one in the reproducer you've crafted that give a simpler graph, if I remember correctly. I think it's valuable because the graph shape is different so it might trigger some asserts differently, exercise other paths, and if it breaks again, maybe someone who will have to look at it will be happy to find a run with a simpler graph. Maybe I can add in the summary that "if it helps investigate an issue, the @run 3 and 5 (with more flags) are expected to give a simpler graph". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2468209976 From fyang at openjdk.org Tue Oct 28 07:35:03 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 28 Oct 2025 07:35:03 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v2] In-Reply-To: References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: <-TkszzeF-gAhWYSchHdLE1auScvuZOJnMDGtGXyUiXU=.765372e3-1d31-42d4-a8ac-78a6b4fa8bb2@github.com> On Tue, 28 Oct 2025 07:24:40 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> Sanity tested, running t1. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > li->mv, format, space Thanks for the quick update. src/hotspot/cpu/riscv/riscv.ad line 2443: > 2441: // Check that stack depth is unchanged: find majik cookie on stack > 2442: int framesize = ra_->reg2offset_unchecked(OptoReg::add(ra_->_matcher._old_SP, -3 * VMRegImpl::slots_per_word)); > 2443: Label CHECK_PASSED; Nit: We rarely use big names for labels. Maybe `Label stack_ok`? ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28005#pullrequestreview-3387053687 PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2468230474 From rehn at openjdk.org Tue Oct 28 07:40:03 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Oct 2025 07:40:03 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v2] In-Reply-To: <-TkszzeF-gAhWYSchHdLE1auScvuZOJnMDGtGXyUiXU=.765372e3-1d31-42d4-a8ac-78a6b4fa8bb2@github.com> References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> <-TkszzeF-gAhWYSchHdLE1auScvuZOJnMDGtGXyUiXU=.765372e3-1d31-42d4-a8ac-78a6b4fa8bb2@github.com> Message-ID: <9QzlaoiI4pElGifUETW55snFtqeHKW2Xno7f5gakwDM=.afffc4ec-976f-450a-85b4-2968fd2511b6@github.com> On Tue, 28 Oct 2025 07:31:42 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> li->mv, format, space > > src/hotspot/cpu/riscv/riscv.ad line 2443: > >> 2441: // Check that stack depth is unchanged: find majik cookie on stack >> 2442: int framesize = ra_->reg2offset_unchecked(OptoReg::add(ra_->_matcher._old_SP, -3 * VMRegImpl::slots_per_word)); >> 2443: Label CHECK_PASSED; > > Nit: We rarely use big names for labels. Maybe `Label stack_ok`? Yea, sure. Personally I prefer labels to stand out as control transfer is a bit tricky to keep track of in assembly. 
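For context on the VerifyStackAtCalls checks being reviewed above, a condensed sketch of the idea. Offsets and register choices follow the riscv.ad snippets quoted in this thread, but this is illustrative and not the exact patch:

    // Prolog: plant a known cookie just below the frame so its location encodes the frame size.
    __ mv(t2, MAJIK_DWORD);
    __ sd(t2, Address(sp, framesize - 3 * wordSize));

    // Before a call: the cookie must still be where the prolog put it, otherwise
    // the stack depth changed unexpectedly.
    Label stack_ok;
    __ ld(t1, Address(sp, framesize - 3 * wordSize));
    __ mv(t2, MAJIK_DWORD);
    __ beq(t1, t2, stack_ok);
    __ stop("VerifyStackAtCalls: stack depth changed");
    __ bind(stack_ok);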
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2468247372 From rehn at openjdk.org Tue Oct 28 07:48:50 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Oct 2025 07:48:50 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v3] In-Reply-To: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: > Hi, please consider. > > Sanity tested, running t1. > > Thanks, Robbin Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Label name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28005/files - new: https://git.openjdk.org/jdk/pull/28005/files/257b0499..1a2059f6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28005&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28005&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28005.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28005/head:pull/28005 PR: https://git.openjdk.org/jdk/pull/28005 From rehn at openjdk.org Tue Oct 28 07:48:50 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Oct 2025 07:48:50 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v2] In-Reply-To: References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: <8s4gVqDXSB4t4Xs1vl_Scz69NChhOL8nV_ymClv-FYs=.caacf9d8-dfc3-4763-af25-e0866ceaa2d7@github.com> On Tue, 28 Oct 2025 07:24:40 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> Sanity tested, running t1. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > li->mv, format, space Thanks for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28005#issuecomment-3455024715 From rehn at openjdk.org Tue Oct 28 07:48:53 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Oct 2025 07:48:53 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v2] In-Reply-To: <9QzlaoiI4pElGifUETW55snFtqeHKW2Xno7f5gakwDM=.afffc4ec-976f-450a-85b4-2968fd2511b6@github.com> References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> <-TkszzeF-gAhWYSchHdLE1auScvuZOJnMDGtGXyUiXU=.765372e3-1d31-42d4-a8ac-78a6b4fa8bb2@github.com> <9QzlaoiI4pElGifUETW55snFtqeHKW2Xno7f5gakwDM=.afffc4ec-976f-450a-85b4-2968fd2511b6@github.com> Message-ID: <9IJ7OQ8XjsmX_6UA0eoQOfPOqv1vBmw1Pcu3pYtaKRc=.2d995e4e-d95d-4708-9922-6e5d780c586d@github.com> On Tue, 28 Oct 2025 07:37:22 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/riscv.ad line 2443: >> >>> 2441: // Check that stack depth is unchanged: find majik cookie on stack >>> 2442: int framesize = ra_->reg2offset_unchecked(OptoReg::add(ra_->_matcher._old_SP, -3 * VMRegImpl::slots_per_word)); >>> 2443: Label CHECK_PASSED; >> >> Nit: We rarely use big names for labels. Maybe `Label stack_ok`? > > Yea, sure. > > Personally I prefer labels to stand out as control transfer is a bit tricky to keep track of in assembly. 
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2468267302 From fyang at openjdk.org Tue Oct 28 08:14:02 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 28 Oct 2025 08:14:02 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v3] In-Reply-To: References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: On Tue, 28 Oct 2025 07:48:50 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> Sanity tested, running t1. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Label name src/hotspot/cpu/riscv/riscv.ad line 1375: > 1373: if (VerifyStackAtCalls) { > 1374: st->print("mv t2, %ld\n\t", MAJIK_DWORD); > 1375: st->print("sd t2, [sp, #%d]\n\t", - 3 * wordSize); Hmm ... I missed this one. Shouldn't the offset of the address be: `framesize - 3 * wordSize`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2468361310 From rcastanedalo at openjdk.org Tue Oct 28 08:15:03 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 28 Oct 2025 08:15:03 GMT Subject: RFR: 8349835: C2: simplify IGV property printing [v3] In-Reply-To: References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: On Mon, 27 Oct 2025 22:03:44 GMT, Saranya Natarajan wrote: >> The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). >> >> ### Fix >> Implemented the suggested refactoring. >> >> ### Testing >> Github Actions, Tier 1-3 > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments#2 Thanks for working on this, Saranya! We do not have good test coverage of IGV graph dumping, so only running regular tier testing might not catch possible regressions introduced by this changeset. Consider also comparing the XML graphs dumped before and after the changeset for a few well-known methods with deterministic compilation, e.g.: $ ${BASELINE_JAVA} -Xbatch -XX:PrintIdealGraphLevel=6 -XX:CompileOnly=java.lang.String::charAt -XX:PrintIdealGraphFile=before.xml $ ${PATCHED_JAVA} -Xbatch -XX:PrintIdealGraphLevel=6 -XX:CompileOnly=java.lang.String::charAt -XX:PrintIdealGraphFile=after.xml $ diff before.xml after.xml The only expected changes between the two files would be things like memory addresses, process and thread IDs, etc. But all the other properties should remain the same. ------------- PR Review: https://git.openjdk.org/jdk/pull/26902#pullrequestreview-3387223174 From epeter at openjdk.org Tue Oct 28 08:20:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 08:20:04 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Mon, 27 Oct 2025 06:17:10 GMT, Xiaohong Gong wrote: >> Hi @eme64 @erifan , thanks for all your comments on this function. After a deep thinking, I think `mask_op_prefers_predicate()` is optimal. The implementation will be reverted back to the first version. 
Following is my consideration: >> >> 1) Before my patch, what a mask is for these mask ops based on the architectures. It is distinguished just based whether the mask is a predicate type or a vector. >> - On architectures that support the predicate feature, the mask's type is `TypeVectMask` which denotes a predicate type. And the backend is implemented with predicate instructions and requires the predicate input/output. >> - On architectures that do not support the predicate feature, the original mask's type is an unpacked `TypeVect` varying from `TypeVectA` to `TypeVectZ` based on the vector length with different element data size. As these ops are special that the implementation do not have any relationship with the element width in each lane, packing the mask to 8-bit element width would be friendly to performance. Hence, in IR-level, the original vector mask will be packed with a `VectorStoreMask` before passed to these ops. >> >> 2) I don't want to break current solution/idea of mask handling for these ops. In my patch, what I want to change is **using a helper function** to check whether the specified op is implemented with predicate instruction or not, **instead of** just checking the original mask type. If true, the mask is a predicate without any conversions needed. If not, the mask needs to be packed with a `VectorStoreMask`. >> >> 3) This function can also be used for other ops in future. Let backend choose how a vector mask is represented (with a predicate register or a vector register). Currently, it is clear that the mask type is defined based on whether platform supports predicate or not. But it might be the case that the performance will be better if mask is implemented with vector than predicate on a predicate supported platform. For such ops, we can also use this function to guide how the mask is represented in IR-level. >> >> Changing to check whether the mask is a `packed` vector making things more confusing to me. Because it is just a temporary status of mask and special to these mask ops. We have to consider other ops that also use a vector mask. In general, the mask is represented with either an unpacked vector or a predicate. >> >> Thanks, >> Xiaohong > > The implementation on AArch64 would be like: > > bool Matcher::mask_op_prefers_predicate(int opcode, const TypeVect* vt) { > // Only SVE supports the predicate feature. > if (UseSVE == 0) { > // On architectures that do not support the predicate feature, vector > // mask is stored in a normal vector with the type of "TypeVect" varing > // from "TypeVectA" to "TypeVectZ" based on the vector length in bytes. > // It cannot be a "TypeVectMask". > assert(vt->isa_vectmask() == nullptr, "mask type not match"); > return false; > } > > assert(vt->isa_vectmask(), "The mask type must be a TypeVectMask on SVE"); > switch (opcode) { > case Op_VectorMaskToLong: > case Op_VectorLongToMask: > // SVE does not have native predicate instructions for these two ops. > // Instead, they are implemented with vector instructions. Hence, to > // improve the performance, we prefer saving the mask in a vector as > // the input/output of these IRs. > return false; > default: > // By default, all the mask operations are implemented with predicate > // instructions with a predicate input/output. > return true; > } > } > > And the comments before the helper function in matcher.hpp: > > // Identify if a vector mask operation requires the input/output mask to be > // saved with a predicate type (i.e. TypeVectMask) or not. 
Return true if it > // requires a predicate type. And return false if it requires a vector type. > static bool mask_op_prefers_predicate(int opcode, const TypeVect* vt); > > > Is that more clear? Thanks! Yes, these are a step int the right direction! :) Thanks a lot for the explanations, very helpful! Please make sure that they are all represented in the code comments, so we don't lose them to this GitHub thread! `// Identify if a vector mask operation requires the input/output mask to be` The language of `requires` slipped again into your explanation. Is that intended? Probably not? You should use a condensed version from your GitHub comments above, I think that would be very helpful :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2468385896 From mbaesken at openjdk.org Tue Oct 28 08:34:01 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 28 Oct 2025 08:34:01 GMT Subject: RFR: 8370579: PPC: fix inswri immediate argument order In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 15:19:06 GMT, Manuel H?ssig wrote: > This cleanup PR swaps and renames the immediate arguments of the two `insrwi` instruction macros in `ppc.ad` such that they correspond to the order and names in the manual. This involved swapping the arguments in all six usages. I hope this saves the next person trying to reason about this some confused hours. > > Testing: > - [x] Github Actions > - [x] Running some relevant tests (`compiler/c2/TestCharShortByteSwap.java`, `jdk/java/lang/Short/ByteSwap.java`, `jdk/java/lang/Integer/BitTwiddle.java`, `compiler/codegen/Test6431242.java`) in qemu Marked as reviewed by mbaesken (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27978#pullrequestreview-3387323680 From aseoane at openjdk.org Tue Oct 28 08:51:06 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Tue, 28 Oct 2025 08:51:06 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v9] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 20:49:40 GMT, Dean Long wrote: >> Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: >> >> Adjust comment > > test/hotspot/jtreg/compiler/intrinsics/TestReturnsOopSetForJFRWriteCheckpoint.java line 49: > >> 47: } >> 48: >> 49: // Crash was due to the returns_oop field not being set > > I was confused when I could not find "returns_oop". It turns out the names are CallNode::returns_pointer() and ScopeDesc::return_oop(). Oh, I got them mixed up. I will update things accordingly, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2468526424 From mhaessig at openjdk.org Tue Oct 28 09:02:44 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 28 Oct 2025 09:02:44 GMT Subject: RFR: 8370579: PPC: fix inswri immediate argument order In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 14:10:24 GMT, Martin Doerr wrote: >> This cleanup PR swaps and renames the immediate arguments of the two `insrwi` instruction macros in `ppc.ad` such that they correspond to the order and names in the manual. This involved swapping the arguments in all six usages. I hope this saves the next person trying to reason about this some confused hours. 
>> >> Testing: >> - [x] Github Actions >> - [x] Running some relevant tests (`compiler/c2/TestCharShortByteSwap.java`, `jdk/java/lang/Short/ByteSwap.java`, `jdk/java/lang/Integer/BitTwiddle.java`, `compiler/codegen/Test6431242.java`) in qemu > > Looks good and tier1 has passed. Thanks for cleaning this up! The old code was really hard to read. Maybe @MBaesken can provide a 2nd review. Thank you for testing and reviewing, @TheRealMDoerr & @MBaesken! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27978#issuecomment-3455283367 From mhaessig at openjdk.org Tue Oct 28 09:02:45 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 28 Oct 2025 09:02:45 GMT Subject: Integrated: 8370579: PPC: fix inswri immediate argument order In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 15:19:06 GMT, Manuel H?ssig wrote: > This cleanup PR swaps and renames the immediate arguments of the two `insrwi` instruction macros in `ppc.ad` such that they correspond to the order and names in the manual. This involved swapping the arguments in all six usages. I hope this saves the next person trying to reason about this some confused hours. > > Testing: > - [x] Github Actions > - [x] Running some relevant tests (`compiler/c2/TestCharShortByteSwap.java`, `jdk/java/lang/Short/ByteSwap.java`, `jdk/java/lang/Integer/BitTwiddle.java`, `compiler/codegen/Test6431242.java`) in qemu This pull request has now been integrated. Changeset: 96259936 Author: Manuel H?ssig URL: https://git.openjdk.org/jdk/commit/9625993611bb6acf84d428bea4a65d33b9d66e5f Stats: 13 lines in 1 file changed: 0 ins; 0 del; 13 mod 8370579: PPC: fix inswri immediate argument order Reviewed-by: mdoerr, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/27978 From mli at openjdk.org Tue Oct 28 09:13:08 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 28 Oct 2025 09:13:08 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v5] In-Reply-To: References: <4JU6tXdnIFqPXbDMV14FBOr7yvA4mQz6Cd4s1wOjmY0=.05b28d35-bc90-49b5-a321-d37ab3070799@github.com> <33pm95WaCjJp6jwm4LaSx9F_ZGPy_uZGlwukkN3o0XM=.b3c414d5-4598-40bf-b0d9-ea9088e4f881@github.com> Message-ID: On Fri, 24 Oct 2025 07:12:29 GMT, Emanuel Peter wrote: >> Added tests to cover UI{GE|LT|LE}forF and UL{GE|LT|LE}forD. >> >> Other tests for example UI{GE|LT|LE}forD UL{GE|LT|LE}forF could be added when I work on https://github.com/openjdk/jdk/pull/25336 or https://github.com/openjdk/jdk/pull/25341, as currently they are not vectorized. > > If you already have the tests in code, it may be good to just put all tests in now. Of course with adjusted IR rules. That would allow us to verify correctness on all combinations, and backport the tests as well. What do you think? Hi @eme64 Can you have a another look? Thanks! 
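For context, the UI{GE|LT|LE}forF style tests discussed above exercise loops in which an unsigned int comparison selects between float values. The following is only an illustrative sketch of that kind of kernel; the class and method names are invented here and are not taken from TestVectorConditionalMove.java:

    // A CMove kernel driven by an unsigned compare: the kind of pattern the
    // JDK-8370481 fix lets SuperWord vectorize correctly (CmpU + CMoveF
    // becoming an unsigned VectorMaskCmp + VectorBlend).
    public class UnsignedCMoveSketch {
        static float[] kernel(int[] a, int[] b, float[] x, float[] y) {
            float[] r = new float[a.length];
            for (int i = 0; i < a.length; i++) {
                // unsigned "a[i] >= b[i]" selecting between two float inputs
                r[i] = Integer.compareUnsigned(a[i], b[i]) >= 0 ? x[i] : y[i];
            }
            return r;
        }
    }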
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2468618311 From rehn at openjdk.org Tue Oct 28 09:19:09 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Oct 2025 09:19:09 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v3] In-Reply-To: References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: On Tue, 28 Oct 2025 08:11:15 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> Label name > > src/hotspot/cpu/riscv/riscv.ad line 1375: > >> 1373: if (VerifyStackAtCalls) { >> 1374: st->print("mv t2, %ld\n\t", MAJIK_DWORD); >> 1375: st->print("sd t2, [sp, #%d]\n\t", - 3 * wordSize); > > Hmm ... I missed this one. Shouldn't the offset of the address be: `framesize - 3 * wordSize`? The two above are printed like: st->print("sd fp, [sp, #%d]\n\t", - 2 * wordSize); st->print("sd ra, [sp, #%d]\n\t", - wordSize); I just followed that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2468643768 From aseoane at openjdk.org Tue Oct 28 09:19:53 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Tue, 28 Oct 2025 09:19:53 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v10] In-Reply-To: References: Message-ID: > This PR introduces a fix for an intermittent assert crash due to a non-oop found in the stack when deoptimizing. > > The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. > > However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. > > An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in the form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future.
> > **Testing:** passes tiers 1-5 Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: Rename test and update comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27913/files - new: https://git.openjdk.org/jdk/pull/27913/files/aa09c1bb..1f922569 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=08-09 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27913/head:pull/27913 PR: https://git.openjdk.org/jdk/pull/27913 From shade at openjdk.org Tue Oct 28 09:25:13 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 28 Oct 2025 09:25:13 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 19:48:10 GMT, Chad Rakoczy wrote: > [JDK-8370527](https://bugs.openjdk.org/browse/JDK-8370527) > > [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced an `immutable_data_references_counter` which keeps track of the number of nmethods using the immutable data so it can be shared between relocated nmethods. The old code reads the counter, decrements the counter, and then checks the first read to see if it is zero. Since the check is performed on the initial read it will never be zero which causes immutable data to never be freed. I don't understand this patch. So for release builds, we get stuck at `reference_count=1` and do continuous `os::free(_immutable_data)`? How's that correct? ------------- PR Review: https://git.openjdk.org/jdk/pull/28008#pullrequestreview-3387604354 From epeter at openjdk.org Tue Oct 28 09:38:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 09:38:25 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v7] In-Reply-To: References: Message-ID: <0Az1aLCOKf3dXJpKJ8yIjeK4OF0gRjb4i0H7dRS4fl4=.19557e74-9d9a-4f0a-809c-d4ca1c105057@github.com> On Fri, 24 Oct 2025 09:02:19 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? @eme64 >> >> ## Issue >> >> Currently, in SLP when transform from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned-ness in CmpU is lost, then end up doing a signed instead of unsigned comparison in VectorMaskCmp. >> For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. >> >> ## ?Fix >> Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. >> >> Thanks >> >> This pr could also lead to more optimizations, like: https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. 
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > more tests src/hotspot/share/opto/superword.cpp line 1703: > 1701: switch (cmp0->Opcode()) { > 1702: case Op_CmpF: > 1703: case Op_CmpD: { Suggestion: case Op_CmpD: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2468734826 From roland at openjdk.org Tue Oct 28 09:38:45 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 28 Oct 2025 09:38:45 GMT Subject: RFR: 8369435: C2: transform (LShiftX (SubX con0 a), con1) into (SubX con0< References: <9bFkcaxVVCBhXOpokcS1lMtBDhCyf4eCeWc6wN7Jkpo=.78d2d6b6-4f7e-4361-a361-bab524c35fb2@github.com> Message-ID: > We already transform: > > (LShiftX (AddX a con0), con1) into (AddX (LShiftX a con1) con0< > THis is a variant with SubX. I found that this helps RCE. Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Beno?t Maillard - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Beno?t Maillard ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27842/files - new: https://git.openjdk.org/jdk/pull/27842/files/8b581911..e2533202 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27842&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27842&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27842.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27842/head:pull/27842 PR: https://git.openjdk.org/jdk/pull/27842 From aseoane at openjdk.org Tue Oct 28 09:39:05 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Tue, 28 Oct 2025 09:39:05 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v11] In-Reply-To: References: Message-ID: > This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. > > The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. > > However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. > > An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. 
> > **Testing:** passes tiers 1-5 Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: Rename class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27913/files - new: https://git.openjdk.org/jdk/pull/27913/files/1f922569..80f572a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=09-10 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27913/head:pull/27913 PR: https://git.openjdk.org/jdk/pull/27913 From epeter at openjdk.org Tue Oct 28 09:41:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 09:41:35 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v7] In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 09:02:19 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? @eme64 >> >> ## Issue >> >> Currently, in SLP when transform from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned-ness in CmpU is lost, then end up doing a signed instead of unsigned comparison in VectorMaskCmp. >> For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. >> >> ## ?Fix >> Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. >> >> Thanks >> >> This pr could also lead to more optimizations, like: https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > more tests The code looks really good now (modulo one tiny suggestion). I'll run some internal testing now... src/hotspot/share/opto/superword.cpp line 1748: > 1746: } > 1747: break; > 1748: } Suggestion: For consistency: let's remove the braces for these cases ;) You also don't have any below. test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 1526: > 1524: @Warmup(0) > 1525: @Run(test = {// Signed > 1526: "testCMoveIGTforI", Thank you very much for adding all the unsigned tests! We should eventually also add more signed tests. Are you planning on doing that anyway in the future? Either way: we should have an RFE, and link it to this bug here. 
If you don't want to work on it, then please assign it to me ;) ------------- PR Review: https://git.openjdk.org/jdk/pull/27942#pullrequestreview-3387682395 PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2468737088 PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2468750039 From epeter at openjdk.org Tue Oct 28 09:45:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 09:45:30 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <_XquQ5TR_3ZGRTZ1c3iVCRGkNBDYyXvXuhimlGQFKq4=.bae46e77-9028-4d92-8ae4-29a5eef4b27a@github.com> Message-ID: On Mon, 27 Oct 2025 02:38:44 GMT, Xiaohong Gong wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 2548: >> >>> 2546: } >>> 2547: // VectorMaskToLongNode requires the input is either a mask or a vector with BOOLEAN type. >>> 2548: if (Matcher::mask_op_uses_packed_vector(Op_VectorMaskToLong, opd->bottom_type()->is_vect())) { >> >> So without this patch, it'd generate `VectorMaskToLong -> URShiftLNode -> AndLNode` (as the earlier `if` condition would have been false) and in the backend, the implementation for `VectorMaskToLong` contains code to convert the mask in a predicate to a packed vector (followed by the actual `VectorMaskToLong` related code). With this patch, it now generates `VectorStoreMaskNode -> VectorMaskToLong -> URShiftLNode ... `(the backend implementation is now separated at the IR level). >> Does the major performance uplift come from this Ideal optimization - `VectorMaskToLongNode::Ideal_MaskAll()` where the `VectorStoreMaskNode` gets optimized away? > > Yes, the IR changes you pointed above is right. > > The major performance uplift comes from the existing optimization of `VectorStoreMask (VectorLoadMask v) => v`. As you know, `VectorLoadMask` will be generated by some APIs like `VectorMask.fromArray()`. With this change, `VectorMask.fromLong()` also generates this IR. The mask conversions (V->P and P->V) between these APIs can be saved. > > Another performance uplift comes from the flexible vector register allocation. Before, the vector register is specified as the same for different instructions. But now, it depends on RA. In this case, it potentially breaks the un-expected data-dependence across loop iterations. @XiaohongGong If this is only about `VectorStoreMask (VectorLoadMask v) => v`, why not solve the issue with an `Ideal` optimization? Would that be an alternative? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2468764825 From roland at openjdk.org Tue Oct 28 09:46:06 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 28 Oct 2025 09:46:06 GMT Subject: RFR: 8369435: C2: transform (LShiftX (SubX con0 a), con1) into (SubX con0< References: <9bFkcaxVVCBhXOpokcS1lMtBDhCyf4eCeWc6wN7Jkpo=.78d2d6b6-4f7e-4361-a361-bab524c35fb2@github.com> Message-ID: > We already transform: > > (LShiftX (AddX a con0), con1) into (AddX (LShiftX a con1) con0< > THis is a variant with SubX. I found that this helps RCE. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'master' into JDK-8369435 - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Beno?t Maillard - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Beno?t Maillard - review - test & fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27842/files - new: https://git.openjdk.org/jdk/pull/27842/files/e2533202..a5cce41e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27842&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27842&range=02-03 Stats: 26386 lines in 704 files changed: 15428 ins; 6512 del; 4446 mod Patch: https://git.openjdk.org/jdk/pull/27842.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27842/head:pull/27842 PR: https://git.openjdk.org/jdk/pull/27842 From rcastanedalo at openjdk.org Tue Oct 28 09:50:50 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 28 Oct 2025 09:50:50 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v11] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 09:39:05 GMT, Anton Seoane Ampudia wrote: >> This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. >> >> The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. >> >> However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. >> >> An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. >> >> **Testing:** passes tiers 1-5 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Rename class test/hotspot/jtreg/compiler/intrinsics/TestReturnOopSetForJFRWriteCheckpoint.java line 38: > 36: * @requires vm.hasJFR > 37: * @library /test/lib / > 38: * @run driver compiler.intrinsics.TestReturnsOopSetForJFRWriteCheckpoint You will have to update this line as well after the class name change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2468766731 From aseoane at openjdk.org Tue Oct 28 09:50:47 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Tue, 28 Oct 2025 09:50:47 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v12] In-Reply-To: References: Message-ID: > This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. > > The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. > > However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. 
If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. > > An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. > > **Testing:** passes tiers 1-5 Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: Final renaming touches ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27913/files - new: https://git.openjdk.org/jdk/pull/27913/files/80f572a2..72202ad6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27913&range=10-11 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27913/head:pull/27913 PR: https://git.openjdk.org/jdk/pull/27913 From aseoane at openjdk.org Tue Oct 28 09:50:51 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Tue, 28 Oct 2025 09:50:51 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v11] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 09:43:30 GMT, Roberto Casta?eda Lozano wrote: >> Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename class > > test/hotspot/jtreg/compiler/intrinsics/TestReturnOopSetForJFRWriteCheckpoint.java line 38: > >> 36: * @requires vm.hasJFR >> 37: * @library /test/lib / >> 38: * @run driver compiler.intrinsics.TestReturnsOopSetForJFRWriteCheckpoint > > You will have to update this line as well after the class name change. Oh, right. My bad. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27913#discussion_r2468774982 From roland at openjdk.org Tue Oct 28 09:52:54 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 28 Oct 2025 09:52:54 GMT Subject: RFR: 8369435: C2: transform (LShiftX (SubX con0 a), con1) into (SubX con0< References: <9bFkcaxVVCBhXOpokcS1lMtBDhCyf4eCeWc6wN7Jkpo=.78d2d6b6-4f7e-4361-a361-bab524c35fb2@github.com> Message-ID: > We already transform: > > (LShiftX (AddX a con0), con1) into (AddX (LShiftX a con1) con0< > THis is a variant with SubX. I found that this helps RCE. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27842/files - new: https://git.openjdk.org/jdk/pull/27842/files/a5cce41e..9cd1d8cb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27842&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27842&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27842.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27842/head:pull/27842 PR: https://git.openjdk.org/jdk/pull/27842 From roland at openjdk.org Tue Oct 28 09:52:54 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 28 Oct 2025 09:52:54 GMT Subject: RFR: 8369435: C2: transform (LShiftX (SubX con0 a), con1) into (SubX con0< References: <9bFkcaxVVCBhXOpokcS1lMtBDhCyf4eCeWc6wN7Jkpo=.78d2d6b6-4f7e-4361-a361-bab524c35fb2@github.com> Message-ID: On Fri, 17 Oct 2025 15:36:10 GMT, Beno?t Maillard wrote: > Update: testing looks good @benoitmaillard Thanks for running testing. I also updated the change with your new/fixed comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27842#issuecomment-3455525569 From mli at openjdk.org Tue Oct 28 09:56:54 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 28 Oct 2025 09:56:54 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v8] In-Reply-To: References: Message-ID: > Hi, > Can you help to review the patch? @eme64 > > ## Issue > > Currently, in SLP when transform from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned-ness in CmpU is lost, then end up doing a signed instead of unsigned comparison in VectorMaskCmp. > For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. > > ## ?Fix > Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. > > Thanks > > This pr could also lead to more optimizations, like: https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. 
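As a concrete instance of the issue described above, the same operands order differently under signed and unsigned comparison, so losing the unsigned flag changes which value a CMove selects. A tiny illustration, not part of the patch:

    // -1 is 0xFFFFFFFF when compared as unsigned, so it is the larger operand there.
    public class SignedVsUnsignedCompare {
        public static void main(String[] args) {
            int signedSel = (-1 > 1) ? 10 : 20;                                // 20: -1 < 1 signed
            int unsignedSel = (Integer.compareUnsigned(-1, 1) > 0) ? 10 : 20;  // 10: 0xFFFFFFFF > 1 unsigned
            System.out.println(signedSel + " vs " + unsignedSel);              // prints "20 vs 10"
        }
    }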
Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/superword.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/superword.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27942/files - new: https://git.openjdk.org/jdk/pull/27942/files/ecb38321..b356d39e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27942&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27942&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27942/head:pull/27942 PR: https://git.openjdk.org/jdk/pull/27942 From mli at openjdk.org Tue Oct 28 09:56:55 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 28 Oct 2025 09:56:55 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v7] In-Reply-To: References: Message-ID: <75UFHGFK4-k0z_kmiRPKquF74dR57S7yH1RfwUarRkw=.d22b6a35-e4e7-4f01-bc7a-49dde9469e03@github.com> On Tue, 28 Oct 2025 09:39:08 GMT, Emanuel Peter wrote: > The code looks really good now (modulo one tiny suggestion). Fixed with github help, thanks for the *suggestion*! :) > I'll run some internal testing now... Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27942#issuecomment-3455541312 From epeter at openjdk.org Tue Oct 28 09:57:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 09:57:17 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v4] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 05:52:38 GMT, Xiaohong Gong wrote: >> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. >> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. >> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. 
`fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. >> >> Here is the performance data with `-XX:UseSVE=2`: >> >> Benchmark bits inputs Mode Unit Before After Gain >> MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 >> MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 >> MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 >> MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 >> MaskQueryOperations... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Rename matcher helper function to "mask_op_prefers_predicate" and add > more comments @XiaohongGong Thanks for the updates. I left a few more comments. And thanks for filing: https://bugs.openjdk.org/browse/JDK-8370666 Are you planning on working on that, or do you know someone else? I could try, but I'm less familiar with all the concepts, and would need a lot of help. src/hotspot/cpu/aarch64/aarch64_vector.ad line 401: > 399: } > 400: > 401: assert(vt->isa_vectmask(), "The mask type must be a TypeVectMask on SVE"); Suggestion: assert(vt->isa_vectmask() != nullptr, "The mask type must be a TypeVectMask on SVE"); Hotspot style guide does not like implicit null/zero checks ;) src/hotspot/share/opto/matcher.hpp line 339: > 337: // saved with a predicate type (i.e. TypeVectMask) or not. Return true if it > 338: // requires a predicate type. And return false if it requires a vector type. > 339: static bool mask_op_prefers_predicate(int opcode, const TypeVect* vt); You need to decide if it is a `prefers` or a `requires` concept. You had some really good explanations here, and I think it would be great if you used some of that here. https://github.com/openjdk/jdk/pull/27481#discussion_r2464360599 ------------- PR Review: https://git.openjdk.org/jdk/pull/27481#pullrequestreview-3387731863 PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2468775304 PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2468794050 From epeter at openjdk.org Tue Oct 28 09:57:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 09:57:19 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2] In-Reply-To: References: <1nyscX3Q2Lz20MypQNxqXWPJ9QVtIqpjxrdzkOcw-1k=.af499d8e-2770-43a1-975f-5a2863c5c900@github.com> <6oM9gPUpvrTaebiuwuXA-y0BXWMliOLjjCCpkbQqw5M=.a3d6eb95-ce6a-4c2c-8107-dc9a172cbca7@github.com> <4l5TryoeE4xWUvCtyOU4hSUArWtuXomHRuMiQfAflDM=.9faf7b8b-5135-4696-8921-d61e624d684f@github.com> <0Eavy5pO0jAAKOzZYAoRyykKNSGUDUinBBk2CUMcK1c=.a6c59caf-d3a8-45ba-a368-9849b735e854@github.com> <20otCwtIy0nolE3sBw3b-YoXUa3SL_xsBKB9Cw_kD2o=.62cc0b37-9b0c-4ecb-9509-5c48ac4f3a75@github.com> <4Q3oTRS_-2DbBIJzsz3u67tZfA8joIiMJ4x0z4rZqlo=.4e238026-4cf9-4564-b5d3-1539062736f6@github.com> <1qB16WqDleABsguKwI8xSgWBf1NFQ7uOZByQHIIXdOU=.e30842bf-67f4-4fb3-b877-b91b288912bc@github.com> <3Vc1sIRj7GOSPv3E1tz6xOOmjTuN40yWsfbTvm5LdS0=.005267aa-3f4e-4f78-b55d-4900d8d7065e@github.com> Message-ID: On Mon, 27 Oct 2025 16:52:25 GMT, Paul Sandoz wrote: >> Hi @eme64 , I'm afraid that there is not a place that we document these things now. And I agree that clearly comments might be necessary. I'v created a separate JBS to record https://bugs.openjdk.org/browse/JDK-8370666. Thanks for your suggestion! 
> >> Maybe @PaulSandoz has a good idea for a better naming of `VectorLoadMask` and `VectorStoreMask`? >> > > IIUC these nodes represent conversions or casts: > - `VectorLoadMask` - converts a vector register of 8-bit lanes representing a mask to a platform-specific mask register > - `VectorStoreMask` - converts a platform-specific mask register to a vector register of 8-bit lanes representing the mask > > In theory we could model such conversations using `VectorOperators` as we do other conversions, which might hold some clues as to their names. There is already `VectorMaskCastNode`, but i believe that operates on the platform-specific mask register, casting between different vector species of the same length. > > So perhaps we could rename to the following: > > - `VectorLoadMask` -> `VectorCastB2MaskNode` > - `VectorStoreMask` -> `VectorCastMask2BNode` > > Having a naming convention for the various mask representations might further help and influence those names: > - `BVectMask`, vector register of 8-bit lanes representing the mask > - `NVectMask`, vector register of N-bit lanes representing the mask; and > - `PVectMask`, representing the platform-specific predicate/mask register, which might be the same as `NVectMask` on certain hardware. > > Does that help? @PaulSandoz That sounds like a great idea! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2468786919 From mli at openjdk.org Tue Oct 28 10:03:42 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 28 Oct 2025 10:03:42 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v7] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 09:38:36 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> more tests > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 1526: > >> 1524: @Warmup(0) >> 1525: @Run(test = {// Signed >> 1526: "testCMoveIGTforI", > > Thank you very much for adding all the unsigned tests! > > We should eventually also add more signed tests. Are you planning on doing that anyway in the future? Either way: we should have an RFE, and link it to this bug here. If you don't want to work on it, then please assign it to me ;) Yes, I agree we should add more tests for signed ones. I'll do it later, it's tracked here: https://bugs.openjdk.org/browse/JDK-8370794. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2468821316 From roland at openjdk.org Tue Oct 28 10:05:46 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 28 Oct 2025 10:05:46 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v4] In-Reply-To: References: Message-ID: > In the `test1()` method of the test case: > > `inlined2()` calls `clone()` for an object loaded from field `field` > that has inexact type `A` at parse time. The intrinsic for `clone()` > inserts an `Allocate` and an `ArrayCopy` nodes. When igvn runs, the > load of `field` is optimized out because it reads back a newly > allocated `B` written to `field` in the same method. `ArrayCopy` can > now be optimized because the type of its `src` input is known. The > type of its `dest` input is the `CheckCastPP` from the allocation of > the cloned object created at parse time. That one has type `A`. 
A > series of `Load`s/`Store`s are created to copy the fields of class `B` > from `src` (of type `B`) to `dest` (of type `A`). > > Writing to `dest` with offsets for fields that don't exist in `A`, > causes this code in `Compile::flatten_alias_type()`: > > > } else if (offset < 0 || offset >= ik->layout_helper_size_in_bytes()) { > // Static fields are in the space above the normal instance > // fields in the java.lang.Class instance. > if (ik != ciEnv::current()->Class_klass()) { > to = nullptr; > tj = TypeOopPtr::BOTTOM; > offset = tj->offset(); > } > > > to assign it some slice that doesn't match the one that's used at the > same offset in `B`. > > That causes an assert in `ArrayCopyNode::try_clone_instance()` to > fire. With a release build, execution proceeds. `test1()` also has a > non escaping allocation. That one causes EA to run and > `ConnectionGraph::split_unique_types()` to move the store to the non > escaping allocation to a new slice. In the process, when it iterates > over `MergeMem` nodes, it notices the stores added by > `ArrayCopyNode::try_clone_instance()`, finds that some are not on the > right slice, tries to move them to the correct slice (expecting they > are from a non escaping EA). That causes some of the `Store`s to be > disconnected. When the resulting code runs, execution fails as some > fields are not copied. > > The fix I propose is to skip `ArrayCopyNode::try_clone_instance()` > when `src` and `dest` classes don't match as this seems like a rare > enough corner case. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - package fix - Merge branch 'master' into JDK-8339526 - review - Merge branch 'master' into JDK-8339526 - review - Merge branch 'master' into JDK-8339526 - Update src/hotspot/share/opto/arraycopynode.cpp Co-authored-by: Christian Hagedorn - test & fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27604/files - new: https://git.openjdk.org/jdk/pull/27604/files/6dedf517..d69605e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27604&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27604&range=02-03 Stats: 37560 lines in 944 files changed: 20751 ins; 11773 del; 5036 mod Patch: https://git.openjdk.org/jdk/pull/27604.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27604/head:pull/27604 PR: https://git.openjdk.org/jdk/pull/27604 From roland at openjdk.org Tue Oct 28 10:05:49 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 28 Oct 2025 10:05:49 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v3] In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 11:41:41 GMT, Roberto Castañeda Lozano wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8339526 >> - review >> - Merge branch 'master' into JDK-8339526 >> - Update src/hotspot/share/opto/arraycopynode.cpp >> >> Co-authored-by: Christian Hagedorn >> - test & fix > >> I will just run (...) a set of benchmarks to increase the confidence that this is indeed a very corner case.
> > I ran DaCapo 23 and did not hit the problematic case once. The regular case (exactly same type) is exercised by more than half of the DaCapo 23 benchmarks. > > Will come back with test results in a day or two. @robcasloz thanks for running tests. > test/hotspot/jtreg/compiler/arraycopy/TestCloneUnknownClassAtParseTime.java line 1: > >> 1: /* > > Please add a package declaration (`package compiler.arraycopy;`). It would also be valuable if you could incorporate the detailed failure analysis from the PR description into this test file. Done in new commit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27604#issuecomment-3455569071 PR Review Comment: https://git.openjdk.org/jdk/pull/27604#discussion_r2468815342 From epeter at openjdk.org Tue Oct 28 10:07:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 10:07:26 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v16] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 08:57:33 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Fix include order @MaxXSoft Ok, now it looks really good. I have one minor nit. Running some internal testing now. test/hotspot/jtreg/compiler/c2/gvn/TestCountBitsRange.java line 84: > 82: LIMITS_64_5 = INTS_64.next(); > 83: LIMITS_64_6 = INTS_64.next(); > 84: LIMITS_64_7 = INTS_64.next(); Why not assign them directly? You just need to declare the generators first. Would save us a couple of lines. ------------- PR Review: https://git.openjdk.org/jdk/pull/25928#pullrequestreview-3377615582 PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2461105244 From roland at openjdk.org Tue Oct 28 10:13:32 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 28 Oct 2025 10:13:32 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop [v4] In-Reply-To: References: Message-ID: > In: > > > for (int i = 100; i < 1100; i++) { > v += floatArray[i - 100]; > Objects.checkIndex(i, longRange); > } > > > The int counted loop has both an int range check and a long range. The > int range check is optimized first. Assertion predicates are inserted > above the loop. One predicates checks that: > > > init - 100 > > The loop is then transformed to enable the optimization of the long > range check. The loop is short running, so there's no need to create a > loop nest. The counted loop is mostly left as is but, the loop's > bounds are changed from: > > > for (int i = 100; i < 1100; i++) { > > > to: > > > for (int i = 0; i < 1000; i++) { > > > The reason for that the long range check transformation expects the > loop to start at 0. > > Pre/main/post loops are created. 
Template Assertion predicates are > added above the main loop. The loop is unrolled. Initialized assertion > predicates are created. The one created from the condition: > > > init - 100 > > checks the value of `i` out of the pre loop which is 1. That check fails. > > The root cause of the failure is that when bounds of the counted loop > are changed, template assertion predicates need to be updated with and > adjusted init input. > > When the bounds of the loop are known, the assertion predicates can be > updated in place. Otherwise, when the loop is speculated to be short > running, the assertion predicates are updated when they are cloned. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Merge branch 'master' into JDK-8366888 - whitespaces - review - Merge branch 'master' into JDK-8366888 - Update src/hotspot/share/opto/predicates.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/predicates.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - whitespaces - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27250/files - new: https://git.openjdk.org/jdk/pull/27250/files/4ed60fc0..ce97c772 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27250&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27250&range=02-03 Stats: 37559 lines in 943 files changed: 20749 ins; 11773 del; 5037 mod Patch: https://git.openjdk.org/jdk/pull/27250.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27250/head:pull/27250 PR: https://git.openjdk.org/jdk/pull/27250 From roland at openjdk.org Tue Oct 28 10:13:36 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 28 Oct 2025 10:13:36 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop [v3] In-Reply-To: References: Message-ID: On Mon, 13 Oct 2025 15:15:56 GMT, Roland Westrelin wrote: >> In: >> >> >> for (int i = 100; i < 1100; i++) { >> v += floatArray[i - 100]; >> Objects.checkIndex(i, longRange); >> } >> >> >> The int counted loop has both an int range check and a long range. The >> int range check is optimized first. Assertion predicates are inserted >> above the loop. One predicates checks that: >> >> >> init - 100 > >> >> The loop is then transformed to enable the optimization of the long >> range check. The loop is short running, so there's no need to create a >> loop nest. The counted loop is mostly left as is but, the loop's >> bounds are changed from: >> >> >> for (int i = 100; i < 1100; i++) { >> >> >> to: >> >> >> for (int i = 0; i < 1000; i++) { >> >> >> The reason for that the long range check transformation expects the >> loop to start at 0. >> >> Pre/main/post loops are created. Template Assertion predicates are >> added above the main loop. The loop is unrolled. Initialized assertion >> predicates are created. The one created from the condition: >> >> >> init - 100 > >> >> checks the value of `i` out of the pre loop which is 1. That check fails. >> >> The root cause of the failure is that when bounds of the counted loop >> are changed, template assertion predicates need to be updated with and >> adjusted init input. 
>> >> When the bounds of the loop are known, the assertion predicates can be >> updated in place. Otherwise, when the loop is speculated to be short >> running, the assertion predicates are updated when they are cloned. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8366888 > - Update src/hotspot/share/opto/predicates.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/predicates.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - whitespaces > - fix Anyone for a second review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27250#issuecomment-3455628712 From epeter at openjdk.org Tue Oct 28 10:16:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 10:16:08 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v3] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 16:15:05 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - remove std::hash > - remove unordered_map, add some comments for all_instances_size Nice, that looks better already. Just looked over the diff, now going to look at the whole patch again. test/hotspot/gtest/opto/test_rangeinference.cpp line 285: > 283: // Quick helper for the tediousness below > 284: auto f = [](auto x, auto y) { > 285: return x < y ? RBTreeOrdering::LT : RBTreeOrdering::GT; Suggestion: assert(x != y, "we only handle lt and gt cases"); return x < y ? RBTreeOrdering::LT : RBTreeOrdering::GT; Would that be correct? ------------- PR Review: https://git.openjdk.org/jdk/pull/27618#pullrequestreview-3387858662 PR Review Comment: https://git.openjdk.org/jdk/pull/27618#discussion_r2468878934 From xgong at openjdk.org Tue Oct 28 10:18:08 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 28 Oct 2025 10:18:08 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v4] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 09:53:15 GMT, Emanuel Peter wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename matcher helper function to "mask_op_prefers_predicate" and add >> more comments > > src/hotspot/share/opto/matcher.hpp line 339: > >> 337: // saved with a predicate type (i.e. TypeVectMask) or not. Return true if it >> 338: // requires a predicate type. And return false if it requires a vector type. >> 339: static bool mask_op_prefers_predicate(int opcode, const TypeVect* vt); > > You need to decide if it is a `prefers` or a `requires` concept. 
> You had some really good explanations here, and I think it would be great if you used some of that here. > https://github.com/openjdk/jdk/pull/27481#discussion_r2464360599 To me, `requires` is more accurate. That means the backend implementation requires the mask to be saved in a predicate register, while a vector input is not accepted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2468890615 From epeter at openjdk.org Tue Oct 28 10:20:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 10:20:05 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v3] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 16:15:05 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - remove std::hash > - remove unordered_map, add some comments for all_instances_size Looks really good. I'll run some internal testing now... ------------- PR Review: https://git.openjdk.org/jdk/pull/27618#pullrequestreview-3387883706 From xgong at openjdk.org Tue Oct 28 10:23:06 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 28 Oct 2025 10:23:06 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Tue, 28 Oct 2025 08:17:07 GMT, Emanuel Peter wrote: >> The implementation on AArch64 would be like: >> >> bool Matcher::mask_op_prefers_predicate(int opcode, const TypeVect* vt) { >> // Only SVE supports the predicate feature. >> if (UseSVE == 0) { >> // On architectures that do not support the predicate feature, vector >> // mask is stored in a normal vector with the type of "TypeVect" varing >> // from "TypeVectA" to "TypeVectZ" based on the vector length in bytes. >> // It cannot be a "TypeVectMask". >> assert(vt->isa_vectmask() == nullptr, "mask type not match"); >> return false; >> } >> >> assert(vt->isa_vectmask(), "The mask type must be a TypeVectMask on SVE"); >> switch (opcode) { >> case Op_VectorMaskToLong: >> case Op_VectorLongToMask: >> // SVE does not have native predicate instructions for these two ops. >> // Instead, they are implemented with vector instructions. Hence, to >> // improve the performance, we prefer saving the mask in a vector as >> // the input/output of these IRs. >> return false; >> default: >> // By default, all the mask operations are implemented with predicate >> // instructions with a predicate input/output. >> return true; >> } >> } >> >> And the comments before the helper function in matcher.hpp: >> >> // Identify if a vector mask operation requires the input/output mask to be >> // saved with a predicate type (i.e. TypeVectMask) or not. Return true if it >> // requires a predicate type. And return false if it requires a vector type. >> static bool mask_op_prefers_predicate(int opcode, const TypeVect* vt); >> >> >> Is that more clear? 
Thanks! > > Yes, these are a step int the right direction! :) > > Thanks a lot for the explanations, very helpful! Please make sure that they are all represented in the code comments, so we don't lose them to this GitHub thread! > > `// Identify if a vector mask operation requires the input/output mask to be` > The language of `requires` slipped again into your explanation. Is that intended? Probably not? > You should use a condensed version from your GitHub comments above, I think that would be very helpful :) Current comment might be confusing. I will use `requires` both in comments and method name. Hope this would be more clear. WDYT? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2468909747 From xgong at openjdk.org Tue Oct 28 10:23:07 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 28 Oct 2025 10:23:07 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <_XquQ5TR_3ZGRTZ1c3iVCRGkNBDYyXvXuhimlGQFKq4=.bae46e77-9028-4d92-8ae4-29a5eef4b27a@github.com> Message-ID: On Tue, 28 Oct 2025 09:43:03 GMT, Emanuel Peter wrote: >> Yes, the IR changes you pointed above is right. >> >> The major performance uplift comes from the existing optimization of `VectorStoreMask (VectorLoadMask v) => v`. As you know, `VectorLoadMask` will be generated by some APIs like `VectorMask.fromArray()`. With this change, `VectorMask.fromLong()` also generates this IR. The mask conversions (V->P and P->V) between these APIs can be saved. >> >> Another performance uplift comes from the flexible vector register allocation. Before, the vector register is specified as the same for different instructions. But now, it depends on RA. In this case, it potentially breaks the un-expected data-dependence across loop iterations. > > @XiaohongGong If this is only about `VectorStoreMask (VectorLoadMask v) => v`, why not solve the issue with an `Ideal` optimization? Would that be an alternative? `VectorStoreMask (VectorLoadMask v) => v` is already existed in C2. Spliting the `VectorLongToMask` and `VectorMaskToLong` can reuse this transformation. That's why the performance can be improved. Because redundent mask conversions are optimized out in some case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2468900531 From epeter at openjdk.org Tue Oct 28 10:24:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 10:24:06 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v7] In-Reply-To: References: Message-ID: <2FRuuf61gpfe7UZ6Uk43SrOBR36mTvLA2Ve5GWlHj54=.abc0e3dc-c1df-4811-9efb-ae9017c101af@github.com> On Tue, 28 Oct 2025 10:00:44 GMT, Hamlin Li wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 1526: >> >>> 1524: @Warmup(0) >>> 1525: @Run(test = {// Signed >>> 1526: "testCMoveIGTforI", >> >> Thank you very much for adding all the unsigned tests! >> >> We should eventually also add more signed tests. Are you planning on doing that anyway in the future? Either way: we should have an RFE, and link it to this bug here. If you don't want to work on it, then please assign it to me ;) > > Yes, I agree we should add more tests for signed ones. > I'll do it later, it's tracked here: https://bugs.openjdk.org/browse/JDK-8370794. Excellent, thank you :) @Hamlin-Li So thank you for working on this! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2468906726 PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2468910446 From epeter at openjdk.org Tue Oct 28 10:24:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 10:24:07 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v7] In-Reply-To: <2FRuuf61gpfe7UZ6Uk43SrOBR36mTvLA2Ve5GWlHj54=.abc0e3dc-c1df-4811-9efb-ae9017c101af@github.com> References: <2FRuuf61gpfe7UZ6Uk43SrOBR36mTvLA2Ve5GWlHj54=.abc0e3dc-c1df-4811-9efb-ae9017c101af@github.com> Message-ID: <_bz-9zS5tdX212MpQI6PZqWhuUVj3BRahv6sF6u_bLM=.f9691038-879a-4434-b41b-1ee828b08248@github.com> On Tue, 28 Oct 2025 10:19:54 GMT, Emanuel Peter wrote: >> Yes, I agree we should add more tests for signed ones. >> I'll do it later, it's tracked here: https://bugs.openjdk.org/browse/JDK-8370794. > > Excellent, thank you :) This will also be excellent preparation work for if-conversion. I hope I can start prototyping in the next months :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27942#discussion_r2468909016 From epeter at openjdk.org Tue Oct 28 10:30:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 10:30:03 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Tue, 28 Oct 2025 10:20:49 GMT, Xiaohong Gong wrote: >> Yes, these are a step int the right direction! :) >> >> Thanks a lot for the explanations, very helpful! Please make sure that they are all represented in the code comments, so we don't lose them to this GitHub thread! >> >> `// Identify if a vector mask operation requires the input/output mask to be` >> The language of `requires` slipped again into your explanation. Is that intended? Probably not? >> You should use a condensed version from your GitHub comments above, I think that would be very helpful :) > > Current comment might be confusing. I will use `requires` both in comments and method name. Hope this would be more clear. WDYT? Well, we went with `prefers` because you said that on `aarch64` both are implemented, see our conversation above. So we are now spinning in circles. I would approach it like this: Write down what it means if the method returns true, and what it means if it returns false. Make sure to use `requires`, if anything else is not permitted/implemented. Use `prefers` if both are permitted/implemented, but one is preferred. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2468923970 From epeter at openjdk.org Tue Oct 28 10:30:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 10:30:04 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Tue, 28 Oct 2025 10:24:59 GMT, Emanuel Peter wrote: >> Current comment might be confusing. I will use `requires` both in comments and method name. Hope this would be more clear. WDYT? > > Well, we went with `prefers` because you said that on `aarch64` both are implemented, see our conversation above. So we are now spinning in circles. > > I would approach it like this: > Write down what it means if the method returns true, and what it means if it returns false. 
Make sure to use `requires`, if anything else is not permitted/implemented. Use `prefers` if both are permitted/implemented, but one is preferred. Another idea: use a return `Enum`. Then you can give things names, which can sometimes be more helpful than `true/false`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2468933077 From epeter at openjdk.org Tue Oct 28 10:30:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 10:30:06 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v4] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 10:15:34 GMT, Xiaohong Gong wrote: >> src/hotspot/share/opto/matcher.hpp line 339: >> >>> 337: // saved with a predicate type (i.e. TypeVectMask) or not. Return true if it >>> 338: // requires a predicate type. And return false if it requires a vector type. >>> 339: static bool mask_op_prefers_predicate(int opcode, const TypeVect* vt); >> >> You need to decide if it is a `prefers` or a `requires` concept. >> You had some really good explanations here, and I think it would be great if you used some of that here. >> https://github.com/openjdk/jdk/pull/27481#discussion_r2464360599 > > To me, `requires` is more accurate. That means the backend implementation requires the mask to be saved in a predicate register, while a vector input is not accepted. See answer here: https://github.com/openjdk/jdk/pull/27481#discussion_r2468923970 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2468930663 From mli at openjdk.org Tue Oct 28 10:35:06 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 28 Oct 2025 10:35:06 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v2] In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 11:17:04 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: >> >> - tests >> - switch > > You should also change the PR description, especially you should describe what went wrong at what point. > > > Well, you mostly already explain. I think the issue is that we don't really carry the "unsigned-ness" of the comparison, and then end up doing signed instead of unsigned comparison... Hi, while @eme64 running some tests, can someone else have another look at this pr? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27942#issuecomment-3455754135 From chagedorn at openjdk.org Tue Oct 28 10:45:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 28 Oct 2025 10:45:06 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v9] In-Reply-To: <9T_qIFDSxnt0RfSKknq6jkZnSlkEHslHL5NuquhMAOI=.6b7dc2e5-1341-4b9b-bbca-27d0eaca5d78@github.com> References: <9T_qIFDSxnt0RfSKknq6jkZnSlkEHslHL5NuquhMAOI=.6b7dc2e5-1341-4b9b-bbca-27d0eaca5d78@github.com> Message-ID: On Tue, 28 Oct 2025 07:24:55 GMT, Marc Chevalier wrote: >> test/hotspot/jtreg/compiler/loopopts/TooStrictAssertForUnrollAfterPeeling.java line 54: >> >>> 52: * -XX:-SplitIfBlocks >>> 53: * -XX:-UseOnStackReplacement >>> 54: * -XX:LoopMaxUnroll=2 >> >> Are these flags all required to trigger the issue or what is the motivation behind having this run compared to the above only? > > That's the one in the reproducer you've crafted that give a simpler graph, if I remember correctly. 
I think it's valuable because the graph shape is different so it might trigger some asserts differently, exercise other paths, and if it breaks again, maybe someone who will have to look at it will be happy to find a run with a simpler graph. Maybe I can add in the summary that "if it helps investigate an issue, the @run 3 and 5 (with more flags) are expected to give a simpler graph". Thanks for the explanation. But it would also trigger without the additional flags? For the reproducer, I just disabled as many optimizations as possible to get an easier graph which I often do while debugging. The problem I see is that we could define such additional runs in many of our tests to get some simpler or different graph shapes. But I would argue that this should rather be part of a separate stress job instead. This also keeps the execution time short for tier1. But you could certainly leave a comment in the test how to get a simpler graph if required. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2468993153 From xgong at openjdk.org Tue Oct 28 10:50:05 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 28 Oct 2025 10:50:05 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Tue, 28 Oct 2025 10:27:39 GMT, Emanuel Peter wrote: >> Well, we went with `prefers` because you said that on `aarch64` both are implemented, see our conversation above. So we are now spinning in circles. >> >> I would approach it like this: >> Write down what it means if the method returns true, and what it means if it returns false. Make sure to use `requires`, if anything else is not permitted/implemented. Use `prefers` if both are permitted/implemented, but one is preferred. > > Another idea: use a return `Enum`. Then you can give things names, which can sometimes be more helpful than `true/false`. I'm sorry that I might not explain too clear in above comments. A mask op can either be implemented with vector registers or predicate register on AArch64. But we can just choose one of them for a specific architecture. On NEON, it must use vector. But on SVE, in general mask ops use predicate instructions. However, for several special ops like ops in this PR, they are implemented with vector instructions. And there is no predicate version supported. Anyway, I will change to use `requires` any where. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2469018071 From snatarajan at openjdk.org Tue Oct 28 11:55:02 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 28 Oct 2025 11:55:02 GMT Subject: RFR: 8370569: IGV: dump more graph properties at bytecode parsing [v3] In-Reply-To: References: Message-ID: <5lH-NWvPMZGLThh87G2aUtYHHKZFvqjFwll41Qn4JE8=.e52d1cc2-3818-45a7-b559-5b6e6c4fd117@github.com> On Mon, 27 Oct 2025 15:54:35 GMT, Roberto Casta?eda Lozano wrote: >> This changeset makes it easier to trace C2's individual bytecode parsing steps to the input class files and the output of type flow analysis, by: >> >> 1. dumping the `map` node (holding the JVM state), basic block (reverse post-order index), and method name as properties of the graph that C2 dumps after each parsed bytecode at IGV dump level 6; >> >> 2. appending this information to the graph name (hence generalizing [JDK-8356779](https://bugs.openjdk.org/browse/JDK-8356779)); and >> >> 3. 
making the graph name suffix configurable via `Options -> Graph Name Suffix`. By default, it appends `(map: [map], block #[block] at [method])` to the names of graphs containing these properties (bytecode parsing dumps), and nothing otherwise. >> >> Here are the `Outline` and `Properties` windows for >> >> $ java -Xbatch -XX:CompileOnly=java.lang.String::charAt -XX:PrintIdealGraphLevel=6 >> >> before (left) and after (right) the changeset: >> >> before-after >> >> Please let me know if there are other bytecode parsing graph properties that could be useful to dump, and whether you think the default graph name suffix contains the right amount of information. >> >> #### Testing >> - tier1. >> - Tested automatically that dumping thousands of graphs does not trigger any assertion failure on HotSpot or IGV. > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Remove leftover from NetBeans code generation Thank you for the improvement. I have tested some random programs and the changes look good. Marked as reviewed by snatarajan (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27975#pullrequestreview-3388356191 PR Review: https://git.openjdk.org/jdk/pull/27975#pullrequestreview-3388363651 From fyang at openjdk.org Tue Oct 28 12:14:04 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 28 Oct 2025 12:14:04 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v3] In-Reply-To: References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: <7FzSzpKxWXt8FQn1sudqcNpH1UoTn114nx4K6cgPRjk=.89e678d7-a4b1-4322-9242-79eb15db8a1e@github.com> On Tue, 28 Oct 2025 09:16:17 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/riscv.ad line 1375: >> >>> 1373: if (VerifyStackAtCalls) { >>> 1374: st->print("mv t2, %ld\n\t", MAJIK_DWORD); >>> 1375: st->print("sd t2, [sp, #%d]\n\t", - 3 * wordSize); >> >> Hmm ... I missed this one. Shouldn't the offset of the address be: `framesize - 3 * wordSize`? > > The two above is printed like: > > st->print("sd fp, [sp, #%d]\n\t", - 2 * wordSize); > st->print("sd ra, [sp, #%d]\n\t", - wordSize); > > I just followed that. Ah, I see. Seems we need to update this instruction sequence to match what `build_frame` does in `MachPrologNode::emit`. I guess that was once missed when we change `MachPrologNode::emit`. 4871 void MacroAssembler::build_frame(int framesize) { 4872 assert(framesize >= 2, "framesize must include space for FP/RA"); 4873 assert(framesize % (2*wordSize) == 0, "must preserve 2*wordSize alignment"); 4874 sub(sp, sp, framesize); 4875 sd(fp, Address(sp, framesize - 2 * wordSize)); 4876 sd(ra, Address(sp, framesize - wordSize)); 4877 if (PreserveFramePointer) { add(fp, sp, framesize); } 4878 } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2469353195 From rcastanedalo at openjdk.org Tue Oct 28 12:26:03 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 28 Oct 2025 12:26:03 GMT Subject: RFR: 8370569: IGV: dump more graph properties at bytecode parsing [v3] In-Reply-To: <5lH-NWvPMZGLThh87G2aUtYHHKZFvqjFwll41Qn4JE8=.e52d1cc2-3818-45a7-b559-5b6e6c4fd117@github.com> References: <5lH-NWvPMZGLThh87G2aUtYHHKZFvqjFwll41Qn4JE8=.e52d1cc2-3818-45a7-b559-5b6e6c4fd117@github.com> Message-ID: On Tue, 28 Oct 2025 11:51:11 GMT, Saranya Natarajan wrote: > Thank you for the improvement. I have tested some random programs and the changes look good. 
Thanks for reviewing, Saranya! @chhagedorn I removed the dead code you referred to, are you happy with the current version? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27975#issuecomment-3456207012 From chagedorn at openjdk.org Tue Oct 28 12:31:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 28 Oct 2025 12:31:06 GMT Subject: RFR: 8370569: IGV: dump more graph properties at bytecode parsing [v3] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 15:54:35 GMT, Roberto Casta?eda Lozano wrote: >> This changeset makes it easier to trace C2's individual bytecode parsing steps to the input class files and the output of type flow analysis, by: >> >> 1. dumping the `map` node (holding the JVM state), basic block (reverse post-order index), and method name as properties of the graph that C2 dumps after each parsed bytecode at IGV dump level 6; >> >> 2. appending this information to the graph name (hence generalizing [JDK-8356779](https://bugs.openjdk.org/browse/JDK-8356779)); and >> >> 3. making the graph name suffix configurable via `Options -> Graph Name Suffix`. By default, it appends `(map: [map], block #[block] at [method])` to the names of graphs containing these properties (bytecode parsing dumps), and nothing otherwise. >> >> Here are the `Outline` and `Properties` windows for >> >> $ java -Xbatch -XX:CompileOnly=java.lang.String::charAt -XX:PrintIdealGraphLevel=6 >> >> before (left) and after (right) the changeset: >> >> before-after >> >> Please let me know if there are other bytecode parsing graph properties that could be useful to dump, and whether you think the default graph name suffix contains the right amount of information. >> >> #### Testing >> - tier1. >> - Tested automatically that dumping thousands of graphs does not trigger any assertion failure on HotSpot or IGV. > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Remove leftover from NetBeans code generation Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27975#pullrequestreview-3388531270 From chagedorn at openjdk.org Tue Oct 28 12:31:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 28 Oct 2025 12:31:07 GMT Subject: RFR: 8370569: IGV: dump more graph properties at bytecode parsing [v3] In-Reply-To: References: <4UX5Vaj1aGBtVya6GBPLaHAUMUF5Lg64XZeBlci_Q0o=.55273dc4-b6af-4896-bdd9-404549d1bdd1@github.com> Message-ID: On Mon, 27 Oct 2025 15:50:52 GMT, Roberto Casta?eda Lozano wrote: >> Can this method then be removed? > > Right, sorry, removed now (commit 8da5897204f1e51b4347b3a1110e33506d660dd1). I was expecting that NetBeans would detect it as dead and remove it automagically when re-generating `ViewPanel.java` from `ViewPanel.form`, but forgot to check. No worries! Thanks for cleaning it up, it looks good to me now ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27975#discussion_r2469399695 From chagedorn at openjdk.org Tue Oct 28 12:32:11 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 28 Oct 2025 12:32:11 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v5] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 07:22:59 GMT, Marc Chevalier wrote: >> Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. 
>> >> There is one detail, we used to have >> >> void restore_major_progress(int progress) { _major_progress += progress; } >> >> >> It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. >> >> It has a weird semantics: >> >> Progress before | Progress after verification | Progress after restore | What would be the assignment semantics >> ----------------|-----------------------------|-----------------------|- >> 0 | 0 | 0 | 0 >> 1 | 0 | 1 | 1 >> 0 | 1 | 1 | 0 (mismatch!) >> 1 | 1 | 2 | 1 (same truthiness) >> >> It is rather a or than a restore, and a proper boolean version of that would be >> >> void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } >> >> but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural >> >> void set_major_progress(bool progress) { _major_progress = progress; } >> >> that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > typoes in comment Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27912#pullrequestreview-3388535875 From rcastanedalo at openjdk.org Tue Oct 28 12:38:07 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 28 Oct 2025 12:38:07 GMT Subject: RFR: 8370569: IGV: dump more graph properties at bytecode parsing [v3] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 15:54:35 GMT, Roberto Casta?eda Lozano wrote: >> This changeset makes it easier to trace C2's individual bytecode parsing steps to the input class files and the output of type flow analysis, by: >> >> 1. dumping the `map` node (holding the JVM state), basic block (reverse post-order index), and method name as properties of the graph that C2 dumps after each parsed bytecode at IGV dump level 6; >> >> 2. appending this information to the graph name (hence generalizing [JDK-8356779](https://bugs.openjdk.org/browse/JDK-8356779)); and >> >> 3. making the graph name suffix configurable via `Options -> Graph Name Suffix`. By default, it appends `(map: [map], block #[block] at [method])` to the names of graphs containing these properties (bytecode parsing dumps), and nothing otherwise. >> >> Here are the `Outline` and `Properties` windows for >> >> $ java -Xbatch -XX:CompileOnly=java.lang.String::charAt -XX:PrintIdealGraphLevel=6 >> >> before (left) and after (right) the changeset: >> >> before-after >> >> Please let me know if there are other bytecode parsing graph properties that could be useful to dump, and whether you think the default graph name suffix contains the right amount of information. >> >> #### Testing >> - tier1. >> - Tested automatically that dumping thousands of graphs does not trigger any assertion failure on HotSpot or IGV. > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Remove leftover from NetBeans code generation Thanks for reviewing, Christian! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27975#issuecomment-3456258436 From roland at openjdk.org Tue Oct 28 12:42:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 28 Oct 2025 12:42:14 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v15] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Fri, 24 Oct 2025 13:31:16 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - review >> - Roberto's patches > > src/hotspot/share/opto/multnode.cpp line 273: > >> 271: ProjNode::dump_compact_spec(st); >> 272: MemNode::dump_adr_type(_adr_type, st); >> 273: } > > Can you show us an example out put of `dump`? I'm just wondering if there maybe needs to be a space between the two, and if it is immediately readable :) Actually, `Node::dump` already takes care of dumping the `adr_type`. So I removed the `dump` methods from `NarrowMemProjNode`. Here is an example output: 59 Initialize === 83 1 62 1 1 1 121 [[ 124 123 63 64 65 ]] !jvms: TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57) TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57) 63 NarrowMemProj === 59 [[ 15 ]] #2 Memory: @java/lang/Object *, idx=4; !jvms: TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57) 64 NarrowMemProj === 59 [[ 15 ]] #2 Memory: @java/lang/Object+8 * [narrowklass], idx=5; !jvms: TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57) 65 NarrowMemProj === 59 [[ 15 ]] #2 Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; !jvms: TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57) 66 CheckCastPP === 125 86 [[ 79 ]] #float[int:1] (java/lang/Cloneable,java/io/Serializable):NotNull:exact * !jvms: TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2469429116 From rehn at openjdk.org Tue Oct 28 12:44:37 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Oct 2025 12:44:37 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v4] In-Reply-To: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: > Hi, please consider. > > Sanity tested and no issues with MAJIK t1 (with +VSC). 
> > Thanks, Robbin Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Fixed format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28005/files - new: https://git.openjdk.org/jdk/pull/28005/files/1a2059f6..0dfed06f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28005&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28005&range=02-03 Stats: 6 lines in 1 file changed: 3 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28005.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28005/head:pull/28005 PR: https://git.openjdk.org/jdk/pull/28005 From rehn at openjdk.org Tue Oct 28 12:44:38 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Oct 2025 12:44:38 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v3] In-Reply-To: <7FzSzpKxWXt8FQn1sudqcNpH1UoTn114nx4K6cgPRjk=.89e678d7-a4b1-4322-9242-79eb15db8a1e@github.com> References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> <7FzSzpKxWXt8FQn1sudqcNpH1UoTn114nx4K6cgPRjk=.89e678d7-a4b1-4322-9242-79eb15db8a1e@github.com> Message-ID: On Tue, 28 Oct 2025 12:10:53 GMT, Fei Yang wrote: >> The two above is printed like: >> >> st->print("sd fp, [sp, #%d]\n\t", - 2 * wordSize); >> st->print("sd ra, [sp, #%d]\n\t", - wordSize); >> >> I just followed that. > > Ah, I see. Seems we need to update this instruction sequence to match what `build_frame` does in `MachPrologNode::emit`. I guess that was once missed when we change `MachPrologNode::emit`. > > > 4871 void MacroAssembler::build_frame(int framesize) { > 4872 assert(framesize >= 2, "framesize must include space for FP/RA"); > 4873 assert(framesize % (2*wordSize) == 0, "must preserve 2*wordSize alignment"); > 4874 sub(sp, sp, framesize); > 4875 sd(fp, Address(sp, framesize - 2 * wordSize)); > 4876 sd(ra, Address(sp, framesize - wordSize)); > 4877 if (PreserveFramePointer) { add(fp, sp, framesize); } > 4878 } Fixed! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2469432061 From rcastanedalo at openjdk.org Tue Oct 28 12:47:17 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 28 Oct 2025 12:47:17 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v4] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 10:05:46 GMT, Roland Westrelin wrote: >> In the `test1()` method of the test case: >> >> `inlined2()` calls `clone()` for an object loaded from field `field` >> that has inexact type `A` at parse time. The intrinsic for `clone()` >> inserts an `Allocate` and an `ArrayCopy` nodes. When igvn runs, the >> load of `field` is optimized out because it reads back a newly >> allocated `B` written to `field` in the same method. `ArrayCopy` can >> now be optimized because the type of its `src` input is known. The >> type of its `dest` input is the `CheckCastPP` from the allocation of >> the cloned object created at parse time. That one has type `A`. A >> series of `Load`s/`Store`s are created to copy the fields of class `B` >> from `src` (of type `B`) to `dest` of (type `A`). >> >> Writting to `dest` with offsets for fields that don't exist in `A`, >> causes this code in `Compile::flatten_alias_type()`: >> >> >> } else if (offset < 0 || offset >= ik->layout_helper_size_in_bytes()) { >> // Static fields are in the space above the normal instance >> // fields in the java.lang.Class instance. 
>> if (ik != ciEnv::current()->Class_klass()) { >> to = nullptr; >> tj = TypeOopPtr::BOTTOM; >> offset = tj->offset(); >> } >> >> >> to assign it some slice that doesn't match the one that's used at the >> same offset in `B`. >> >> That causes an assert in `ArrayCopyNode::try_clone_instance()` to >> fire. With a release build, execution proceeds. `test1()` also has a >> non escaping allocation. That one causes EA to run and >> `ConnectionGraph::split_unique_types()` to move the store to the non >> escaping allocation to a new slice. In the process, when it iterates >> over `MergeMem` nodes, it notices the stores added by >> `ArrayCopyNode::try_clone_instance()`, finds that some are not on the >> right slice, tries to move them to the correct slice (expecting they >> are from a non escaping EA). That causes some of the `Store`s to be >> disconnected. When the resulting code runs, execution fails as some >> fields are not copied. >> >> The fix I propose is to skip `ArrayCopyNode::try_clone_instance()` >> when `src` and `dest` classes don't match as this seems like a rare >> enough corner case. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - package fix > - Merge branch 'master' into JDK-8339526 > - review > - Merge branch 'master' into JDK-8339526 > - review > - Merge branch 'master' into JDK-8339526 > - Update src/hotspot/share/opto/arraycopynode.cpp > > Co-authored-by: Christian Hagedorn > - test & fix Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27604#pullrequestreview-3388590620 From rcastanedalo at openjdk.org Tue Oct 28 13:34:21 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 28 Oct 2025 13:34:21 GMT Subject: Integrated: 8370569: IGV: dump more graph properties at bytecode parsing In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 12:22:11 GMT, Roberto Casta?eda Lozano wrote: > This changeset makes it easier to trace C2's individual bytecode parsing steps to the input class files and the output of type flow analysis, by: > > 1. dumping the `map` node (holding the JVM state), basic block (reverse post-order index), and method name as properties of the graph that C2 dumps after each parsed bytecode at IGV dump level 6; > > 2. appending this information to the graph name (hence generalizing [JDK-8356779](https://bugs.openjdk.org/browse/JDK-8356779)); and > > 3. making the graph name suffix configurable via `Options -> Graph Name Suffix`. By default, it appends `(map: [map], block #[block] at [method])` to the names of graphs containing these properties (bytecode parsing dumps), and nothing otherwise. > > Here are the `Outline` and `Properties` windows for > > $ java -Xbatch -XX:CompileOnly=java.lang.String::charAt -XX:PrintIdealGraphLevel=6 > > before (left) and after (right) the changeset: > > before-after > > Please let me know if there are other bytecode parsing graph properties that could be useful to dump, and whether you think the default graph name suffix contains the right amount of information. > > #### Testing > - tier1. > - Tested automatically that dumping thousands of graphs does not trigger any assertion failure on HotSpot or IGV. This pull request has now been integrated. 
Changeset: 5c5367c3 Author: Roberto Casta?eda Lozano URL: https://git.openjdk.org/jdk/commit/5c5367c3124ed8c950539a6a90c631727146c5bc Stats: 140 lines in 9 files changed: 91 ins; 13 del; 36 mod 8370569: IGV: dump more graph properties at bytecode parsing Reviewed-by: chagedorn, snatarajan ------------- PR: https://git.openjdk.org/jdk/pull/27975 From roland at openjdk.org Tue Oct 28 13:35:19 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 28 Oct 2025 13:35:19 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v15] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Fri, 24 Oct 2025 13:21:04 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - review >> - Roberto's patches > > src/hotspot/share/opto/multnode.hpp line 232: > >> 230: }; >> 231: >> 232: template ProjNode* MultiNode::apply_to_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback, uint which_proj) const { > > Does this not belong right after the `MultiNode`? Or even in `multnode.cpp`? It needs the `ProjNode` declaration because it accesses `proj->_con` and can't be in `multnode.cpp` because of the template parameter. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2469591960 From roland at openjdk.org Tue Oct 28 13:38:48 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 28 Oct 2025 13:38:48 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v16] In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: > An `Initialize` node for an `Allocate` node is created with a memory > `Proj` of adr type raw memory. In order for stores to be captured, the > memory state out of the allocation is a `MergeMem` with slices for the > various object fields/array elements set to the raw memory `Proj` of > the `Initialize` node. If `Phi`s need to be created during later > transformations from this memory state, the `Phi` for a particular > slice gets its adr type from the type of the `Proj` which is raw > memory. If during macro expansion, the `Allocate` is found to have no > use and so can be removed, the `Proj` out of the `Initialize` is > replaced by the memory state on input to the `Allocate`. A `Phi` for > some slice for a field of an object will end up with the raw memory > state on input to the `Allocate` node. As a result, memory state at > the `Phi` is incorrect and incorrect execution can happen. > > The fix I propose is, rather than have a single `Proj` for the memory > state out of the `Initialize` with adr type raw memory, to use one > `Proj` per slice added to the memory state after the `Initialize`. Each > of the `Proj`s should return the right adr type for its slice. For that > I propose having a new type of `Proj`: `NarrowMemProj` that captures > the right adr type. > > Logic for the construction of the `Allocate`/`Initialize` subgraph is > tweaked so the right adr type captured in its own `NarrowMemProj` is > added to the memory subgraph.
Code that removes an allocation or moves > it also has to be changed so it correctly takes the multiple memory > projections out of the `Initialize` node into account. > > One tricky issue is that when EA split types for a scalar replaceable > `Allocate` node: > > 1- the adr type captured in the `NarrowMemProj` becomes out of sync > with the type of the slices for the allocation > > 2- before EA, the memory state for one particular field out of the > `Initialize` node can be used for a `Store` to the just allocated > object or some other. So we can have a chain of `Store`s, some to > the newly allocated object, some to some other objects, all of them > using the state of `NarrowMemProj` out of the `Initialize`. After > split unique types, the `NarrowMemProj` is for the slice of a > particular allocation. So `Store`s to some other objects shouldn't > use that memory state but the memory state before the `Allocate`. > > For that, I added logic to update the adr type of `NarrowMemProj` > during split unique types and update the memory input of `Store`s that > don't depend on the memory state ... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits: - review - Merge branch 'master' into JDK-8327963 - review - Roberto's patches - review - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Roberto Casta?eda Lozano - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Roberto Casta?eda Lozano - Update src/hotspot/share/opto/graphKit.cpp Co-authored-by: Roberto Casta?eda Lozano - Update src/hotspot/share/opto/graphKit.cpp Co-authored-by: Roberto Casta?eda Lozano - Update src/hotspot/share/opto/multnode.hpp Co-authored-by: Roberto Casta?eda Lozano - ... and 47 more: https://git.openjdk.org/jdk/compare/96259936...957be06e ------------- Changes: https://git.openjdk.org/jdk/pull/24570/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=15 Stats: 955 lines in 24 files changed: 864 ins; 25 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/24570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570 PR: https://git.openjdk.org/jdk/pull/24570 From roland at openjdk.org Tue Oct 28 13:38:48 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 28 Oct 2025 13:38:48 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v6] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <8BJorsTgiK1pTElabu0NZFko5n4mlpAhadlt87w_v2s=.f19e86d1-d646-46eb-860f-cbbadf37ada3@github.com> Message-ID: On Tue, 27 May 2025 07:56:46 GMT, Emanuel Peter wrote: >>> > I also think it would be good to investigate, separately, early elimination of dead array allocations, even after the integration of this work. Dead allocations may inhibit later optimizations so it would be good to eliminate them as early as possible anyway. One difficulty (not addressed in [c28f81a](https://github.com/openjdk/jdk/commit/c28f81a7ef2a4f3d3cb761ea23a80c09276e7e58)) is that early array elimination should still generate the nonnegative array size check code. >>> >>> That makes sense. It would be useful to have a bugs to track that one. >> >> Turns out there is one already: [JDK-8180290](https://bugs.openjdk.org/browse/JDK-8180290), I just added a comment there. 
> >> > > I also think it would be good to investigate, separately, early elimination of dead array allocations, even after the integration of this work. Dead allocations may inhibit later optimizations so it would be good to eliminate them as early as possible anyway. One difficulty (not addressed in [c28f81a](https://github.com/openjdk/jdk/commit/c28f81a7ef2a4f3d3cb761ea23a80c09276e7e58)) is that early array elimination should still generate the nonnegative array size check code. >> > >> > >> > That makes sense. It would be useful to have a bugs to track that one. >> >> Turns out there is one already: [JDK-8180290](https://bugs.openjdk.org/browse/JDK-8180290), I just added a comment there. > > Should we link it on JIRA? @eme64 I pushed a new commit which should address all your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-3456505508 From mablakatov at openjdk.org Tue Oct 28 13:57:08 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Tue, 28 Oct 2025 13:57:08 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v13] In-Reply-To: References: Message-ID: > Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used. > > Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is used directly still. > > The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reducion micro-benchmarks. > > Benchmarks results: > > Neoverse-V1 (SVE 256-bit) > > Benchmark (size) Mode master PR Units > ByteMaxVector.MULLanes 1024 thrpt 5447.643 11455.535 ops/ms > ShortMaxVector.MULLanes 1024 thrpt 3388.183 7144.301 ops/ms > IntMaxVector.MULLanes 1024 thrpt 3010.974 4911.485 ops/ms > LongMaxVector.MULLanes 1024 thrpt 1539.137 2562.835 ops/ms > FloatMaxVector.MULLanes 1024 thrpt 1355.551 4158.128 ops/ms > DoubleMaxVector.MULLanes 1024 thrpt 1715.854 3284.189 ops/ms > > > Fujitsu A64FX (SVE 512-bit): > > Benchmark (size) Mode master PR Units > ByteMaxVector.MULLanes 1024 thrpt 1091.692 2887.798 ops/ms > ShortMaxVector.MULLanes 1024 thrpt 597.008 1863.338 ops/ms > IntMaxVector.MULLanes 1024 thrpt 510.642 1348.651 ops/ms > LongMaxVector.MULLanes 1024 thrpt 468.878 878.620 ops/ms > FloatMaxVector.MULLanes 1024 thrpt 376.284 2237.564 ops/ms > DoubleMaxVector.MULLanes 1024 thrpt 431.343 1646.792 ops/ms Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 24 commits: - Merge commit 'c8679713402186b24608fa4c91397b6a4fd5ebf3' into 8343689 Change-Id: Icfa70da585e034774e4ff0f60b8f0c9ce0598399 - cleanup: remove redundand local variables Change-Id: I6fb6a9a7a236537612caa5d53c5516ed2f260bad - cleanup: remove a trivial switch-case statement Change-Id: Ib914ce02ae9d88057cb0b88d4880df6ca64f8184 - Assert the exact supported VL of 32B in SVE-specific methods Change-Id: I8768c653ff563cd8a7a75cd06a6523a9526d15ec - cleanup: fix long line formatting Change-Id: I173e70a2fa9a45f56fe50d4a6b81699665e3433d - fixup: remove VL asserts in match rules to fix failures on >= 512b SVE platforms Change-Id: I721f5a97076d645905ee1716f7d57ec8c90ef6e9 - Merge branch 'master' into 8343689 Change-Id: Iebe758e4c7b3ab0de5f580199f8909e96b8c6274 - cleanup: start the SVE Integer Misc - Unpredicated section - Merge branch 'master' - Address review comments and simplify the implementation - remove the loops from gt128b methods making them 256b only - fixup: missed fnoregs in instruct reduce_mulL_256b - use an extra vtmp3 reg for the 256b integer method - remove a no longer needed change in reduce_mul_integral_le128b - cleanup: unify comments - ... and 14 more: https://git.openjdk.org/jdk/compare/c8679713...e564d6c1 ------------- Changes: https://git.openjdk.org/jdk/pull/23181/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23181&range=12 Stats: 272 lines in 5 files changed: 206 ins; 2 del; 64 mod Patch: https://git.openjdk.org/jdk/pull/23181.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23181/head:pull/23181 PR: https://git.openjdk.org/jdk/pull/23181 From eastigeevich at openjdk.org Tue Oct 28 14:56:53 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 28 Oct 2025 14:56:53 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 09:22:25 GMT, Aleksey Shipilev wrote: > I don't understand this patch. So for release builds, we get stuck at `reference_count=1` and do continuous `os::free(_immutable_data)`? How's that correct? I think we should not continuously free as long as we call `nmethod::purge` once per nmethod. The counter is a part of the immutable data, not a part of nmethod, which is shared among copies of an nmethod. int reference_count = get_immutable_data_references_counter(); We have: nmethod_orig ---| nmethod_copy1 ---|---> {data, counter == 3} nmethod_copy3 ---| The counter becoming 1 means the only one nmethod points at `{data, counter == 1}`. Executing the code // Free memory if this is the last nmethod referencing immutable data if (reference_count == 1) { // Updating the counter here is not necessary since the memory is // being freed so only do it for debug builds to eliminate a write DEBUG_ONLY(set_immutable_data_references_counter(reference_count - 1);) os::free(_immutable_data); will release the memory used for the immutable data. This also makes the counter garbage. As a result there will be no more nmethods sharing the pointer to the immutable data. There will be a problem if `purge` is called again for the same nmethod. `get_immutable_data_references_counter` might return garbage. `purge` does not free CodeCache memory used by the nmethod. Maybe we should check `_immutable_data` is not `blob_end()` either explicitly or with an assert. 
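A minimal, self-contained C++ sketch of the release pattern under discussion may help here. This is not HotSpot code: SharedBlob, blob_new, blob_retain and blob_release are invented names, and real nmethod purging also involves CodeCache locking and synchronization that the sketch ignores.

#include <cassert>
#include <cstdlib>
#include <cstring>

// Sketch only: the counter lives inside the shared block itself, mirroring
// the idea that the references counter is part of the immutable data rather
// than of any single nmethod copy.
struct SharedBlob {
  int  refcount;   // how many owners currently point at this block
  char data[64];   // stands in for the immutable data payload
};

// Allocate a block owned by exactly one copy.
SharedBlob* blob_new() {
  SharedBlob* b = static_cast<SharedBlob*>(malloc(sizeof(SharedBlob)));
  b->refcount = 1;
  memset(b->data, 0, sizeof(b->data));
  return b;
}

// A relocated copy starts sharing the block.
SharedBlob* blob_retain(SharedBlob* b) {
  b->refcount++;
  return b;
}

// Called once per owner when that owner is purged. Decrement first and test
// the new value; testing the value read before the decrement can never see
// zero, which is the "never freed" pattern described in the bug report.
void blob_release(SharedBlob*& b) {
  if (b == nullptr) {
    return;                         // this owner holds no shared data
  }
  int remaining = --b->refcount;    // real code would need a lock or an atomic
  assert(remaining >= 0 && "over-release");
  if (remaining == 0) {
    free(b);                        // last owner frees the shared block
  }
  b = nullptr;                      // a repeated release by this owner is a no-op
}

int main() {
  SharedBlob* original = blob_new();            // refcount == 1
  SharedBlob* copy     = blob_retain(original); // refcount == 2
  blob_release(original);                       // refcount == 1, block kept
  blob_release(copy);                           // refcount == 0, block freed
  return 0;
}

Note that checking remaining == 0 after the decrement and checking count == 1 before freeing are equivalent here, provided every owner releases exactly once.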
------------- PR Comment: https://git.openjdk.org/jdk/pull/28008#issuecomment-3456910478 From fyang at openjdk.org Tue Oct 28 15:19:25 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 28 Oct 2025 15:19:25 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v4] In-Reply-To: References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: On Tue, 28 Oct 2025 12:44:37 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> Sanity tested and no issues with MAJIK t1 (with +VSC). >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Fixed format src/hotspot/cpu/riscv/riscv.ad line 1375: > 1373: if (VerifyStackAtCalls) { > 1374: st->print("mv t2, %ld\n\t", MAJIK_DWORD); > 1375: st->print("sd t2, [sp, #%d]\n\t", - 3 * wordSize); Thanks for the update. You might want to change this into `st->print("sd t2, [sp, #%d]\n\t", framesize - 3 * wordSize);` at the same time. BTW: My local `hs:tier1` with `-XX:+VerifyStackAtCalls` using fastdebug build is good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2469973749 From duke at openjdk.org Tue Oct 28 15:47:00 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 28 Oct 2025 15:47:00 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 14:54:27 GMT, Evgeny Astigeevich wrote: > Maybe we should check _immutable_data is not blob_end() either explicitly or with an assert. It does get checked before we read the reference counter ------------- PR Comment: https://git.openjdk.org/jdk/pull/28008#issuecomment-3457165644 From kvn at openjdk.org Tue Oct 28 16:07:20 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 28 Oct 2025 16:07:20 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 [v3] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 19:01:31 GMT, Chad Rakoczy wrote: >> [JDK-8369147](https://bugs.openjdk.org/browse/JDK-8369147) >> >> Fixes tests added in [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) >> >> `DeoptimizeRelocatedNMethod.java` and `RelocateNMethod.java` failed because they attempted to relocate nmethods to the `MethodProfiled` code heap which does not exist when `TieredCompilation` is false. Updated the tests to use `MethodNonProfiled` heap which exists regardless of `TieredCompilation` >> >> `StressNMethodRelocation.java` runs for 60 seconds and also compiles 1024 methods with C2. This was causing the test to timeout if the compilation took too much time. Increasing the timeout to 5 minutes should give C2 enough time to compile the functions >> >> `NMethodRelocationTest.java` runs using SerialGC which caused a multiple GC error when trying to run with another GC. Added a requires to force SerialGC > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Remove explicit test config for different GCs My testing of v02 passed. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27659#pullrequestreview-3389604431 From duke at openjdk.org Tue Oct 28 16:16:15 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 28 Oct 2025 16:16:15 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: <04Um17ZQ89DAZTTtIkSVsylmXRzwoekyoYHOu1KmhsU=.0ef9d38b-aeb8-450f-8a1e-f901851a1b8b@github.com> On Tue, 28 Oct 2025 14:54:27 GMT, Evgeny Astigeevich wrote: > I don't understand this patch. So for release builds, we get stuck at `reference_count=1` and do continuous `os::free(_immutable_data)`? How's that correct? The reference counter will still be 1 but the memory will be freed. We first check `if (_immutable_data != blob_end())` to verify that the nmethod has immutable data. If it doesn't we never read the reference counter Also after either 1) the reference counter has been updated or 2) the memory has been freed we set `_immutable_data = blob_end();` so that if purge is called again (which I'm not sure if that ever actually happens) it won't even attempt to read the reference counter so it's not an issue of double freeing or reading freed memory ------------- PR Comment: https://git.openjdk.org/jdk/pull/28008#issuecomment-3457316819 From shade at openjdk.org Tue Oct 28 16:24:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 28 Oct 2025 16:24:30 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 19:48:10 GMT, Chad Rakoczy wrote: > [JDK-8370527](https://bugs.openjdk.org/browse/JDK-8370527) > > [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced an `immutable_data_references_counter` which keeps track of the number of nmethods using the immutable data so it can be shared between relocated nmethods. The old code reads the counter, decrements the counter, and then checks the first read to see if it is zero. Since the check is performed on the initial read it will never be zero which causes immutable data to never be freed. Well, it sounds like "reference counter" is misleading then. You seem to be saying that "since the initial value is 1 even for unreferenced data, then the final value should also be 1, and that is when we free". But why that value is not 0? That would make much more sense for a _reference counter_. In other words, I think the problem is with initial value, not with the final one. Right? Plus, drop the `DEBUG_ONLY` thing: it only makes debug and release bits perform differently, which obscures the testing, i.e. you cannot readily trust that fastdebug and release builds do the same thing. Only ever do `DEBUG_ONLY` counter updates for _debugging_ counters. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28008#issuecomment-3457363352 From duke at openjdk.org Tue Oct 28 16:28:43 2025 From: duke at openjdk.org (duke) Date: Tue, 28 Oct 2025 16:28:43 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 [v3] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 19:01:31 GMT, Chad Rakoczy wrote: >> [JDK-8369147](https://bugs.openjdk.org/browse/JDK-8369147) >> >> Fixes tests added in [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) >> >> `DeoptimizeRelocatedNMethod.java` and `RelocateNMethod.java` failed because they attempted to relocate nmethods to the `MethodProfiled` code heap which does not exist when `TieredCompilation` is false. 
Updated the tests to use `MethodNonProfiled` heap which exists regardless of `TieredCompilation` >> >> `StressNMethodRelocation.java` runs for 60 seconds and also compiles 1024 methods with C2. This was causing the test to timeout if the compilation took too much time. Increasing the timeout to 5 minutes should give C2 enough time to compile the functions >> >> `NMethodRelocationTest.java` runs using SerialGC which caused a multiple GC error when trying to run with another GC. Added a requires to force SerialGC > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Remove explicit test config for different GCs @chadrako Your change (at version c412bbed813fef97a169a6dd6a76cf502e1446b4) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3457381339 From epeter at openjdk.org Tue Oct 28 16:30:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 16:30:24 GMT Subject: RFR: 8370332: C2 SuperWord: SIGSEGV because PhaseIdealLoop::split_thru_phi left dead nodes in loop _body [v2] In-Reply-To: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> References: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> Message-ID: > Analysis: > `split_thru_phi` can split a node out of the loop, through some loop phi. As a consequence, that node and the phi we split through can become dead. But `split_thru_phi` did not have any logic to yank the dead node and phi from the `_body`. If this happens in the same loop-opts-phase as a later SuperWord, and that SuperWord pass somehow accesses that loop `_body`, then we may find dead nodes, which is not expected. > > It is not ok that `split_thru_phi` leaves dead nodes in the `_body`, so they have to be yanked. > > What I did additionally: I went through all uses of `split_thru_phi`, and moved the `replace_node` from the call-site to the method itself. Removing the node and yanking from `_body` conceptually belongs together, so they should be together in code. > > I suspect that `split_thru_phi` was broken for a long time already. But JDK26 changes in SuperWord started to check inputs of all nodes in `_body`, and that fails with dead nodes. > > Future Work: > - Continue work on making `VerifyLoopOptimizations` work again, we should assert that there are no dead nodes in the `_body`. We may do that with the following task, or a subsequent one. 
> - [JDK-8370332](https://bugs.openjdk.org/browse/JDK-8370332) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: allow unique out with multiple uses ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27955/files - new: https://git.openjdk.org/jdk/pull/27955/files/9488f50a..98dbf27b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27955&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27955&range=00-01 Stats: 21 lines in 1 file changed: 14 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/27955.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27955/head:pull/27955 PR: https://git.openjdk.org/jdk/pull/27955 From epeter at openjdk.org Tue Oct 28 16:32:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 16:32:56 GMT Subject: RFR: 8370332: C2 SuperWord: SIGSEGV because PhaseIdealLoop::split_thru_phi left dead nodes in loop _body [v2] In-Reply-To: References: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> Message-ID: On Tue, 28 Oct 2025 16:30:24 GMT, Emanuel Peter wrote: >> Analysis: >> `split_thru_phi` can split a node out of the loop, through some loop phi. As a consequence, that node and the phi we split through can become dead. But `split_thru_phi` did not have any logic to yank the dead node and phi from the `_body`. If this happens in the same loop-opts-phase as a later SuperWord, and that SuperWord pass somehow accesses that loop `_body`, then we may find dead nodes, which is not expected. >> >> It is not ok that `split_thru_phi` leaves dead nodes in the `_body`, so they have to be yanked. >> >> What I did additionally: I went through all uses of `split_thru_phi`, and moved the `replace_node` from the call-site to the method itself. Removing the node and yanking from `_body` conceptually belongs together, so they should be together in code. >> >> I suspect that `split_thru_phi` was broken for a long time already. But JDK26 changes in SuperWord started to check inputs of all nodes in `_body`, and that fails with dead nodes. >> >> Future Work: >> - Continue work on making `VerifyLoopOptimizations` work again, we should assert that there are no dead nodes in the `_body`. We may do that with the following task, or a subsequent one. >> - [JDK-8370332](https://bugs.openjdk.org/browse/JDK-8370332) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > allow unique out with multiple uses @rwestrel I tried to find a reproducer that crashes the same way, but I could not make it work. Still: I now adjusted the code and I think it should work. But we would only really get confidence once I continue work on `VerifyLoopOptimizations`, for example checking that there are no dead nodes in `_body`. Are you ok with this? Any other suggestions? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27955#issuecomment-3457401349 From duke at openjdk.org Tue Oct 28 16:35:33 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 28 Oct 2025 16:35:33 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v2] In-Reply-To: References: Message-ID: > [JDK-8370527](https://bugs.openjdk.org/browse/JDK-8370527) > > [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced an `immutable_data_references_counter` which keeps track of the number of nmethods using the immutable data so it can be shared between relocated nmethods. The old code reads the counter, decrements the counter, and then checks the first read to see if it is zero. Since the check is performed on the initial read it will never be zero which causes immutable data to never be freed. Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28008/files - new: https://git.openjdk.org/jdk/pull/28008/files/4333a628..60de0f94 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28008&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28008&range=00-01 Stats: 10 lines in 1 file changed: 3 ins; 6 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28008.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28008/head:pull/28008 PR: https://git.openjdk.org/jdk/pull/28008 From eastigeevich at openjdk.org Tue Oct 28 16:40:12 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 28 Oct 2025 16:40:12 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v2] In-Reply-To: References: Message-ID: <2g5fmbW46jlCkrftIet564aTu-IfhpPob58PlNu_P5k=.e1c2d95b-6d35-485d-8caf-0a45c02f5b2e@github.com> On Tue, 28 Oct 2025 16:35:33 GMT, Chad Rakoczy wrote: >> [JDK-8370527](https://bugs.openjdk.org/browse/JDK-8370527) >> >> [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced an `immutable_data_references_counter` which keeps track of the number of nmethods using the immutable data so it can be shared between relocated nmethods. The old code reads the counter, decrements the counter, and then checks the first read to see if it is zero. Since the check is performed on the initial read it will never be zero which causes immutable data to never be freed. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup lgtm ------------- Marked as reviewed by eastigeevich (Committer). PR Review: https://git.openjdk.org/jdk/pull/28008#pullrequestreview-3389765202 From duke at openjdk.org Tue Oct 28 16:40:14 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 28 Oct 2025 16:40:14 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 16:21:51 GMT, Aleksey Shipilev wrote: > You seem to be saying that "since the initial value is 1 even for unreferenced data, then the final value should also be 1, and that is when we free". The final value should be 0 when it is freed but I didn't think it was necessary to actually write that 0 memory that was about to be freed anyways I fixed the code to decrement the reference counter regardless and then check if it zero before freeing. 
I was trying to get fancy but it is more trouble than it is worth ------------- PR Comment: https://git.openjdk.org/jdk/pull/28008#issuecomment-3457425416 From kvn at openjdk.org Tue Oct 28 16:49:38 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 28 Oct 2025 16:49:38 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 [v3] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 19:01:31 GMT, Chad Rakoczy wrote: >> [JDK-8369147](https://bugs.openjdk.org/browse/JDK-8369147) >> >> Fixes tests added in [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) >> >> `DeoptimizeRelocatedNMethod.java` and `RelocateNMethod.java` failed because they attempted to relocate nmethods to the `MethodProfiled` code heap which does not exist when `TieredCompilation` is false. Updated the tests to use `MethodNonProfiled` heap which exists regardless of `TieredCompilation` >> >> `StressNMethodRelocation.java` runs for 60 seconds and also compiles 1024 methods with C2. This was causing the test to timeout if the compilation took too much time. Increasing the timeout to 5 minutes should give C2 enough time to compile the functions >> >> `NMethodRelocationTest.java` runs using SerialGC which caused a multiple GC error when trying to run with another GC. Added a requires to force SerialGC > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Remove explicit test config for different GCs Waiting second review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3457468260 From shade at openjdk.org Tue Oct 28 16:52:56 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 28 Oct 2025 16:52:56 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v2] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 16:35:33 GMT, Chad Rakoczy wrote: >> [JDK-8370527](https://bugs.openjdk.org/browse/JDK-8370527) >> >> [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced an `immutable_data_references_counter` which keeps track of the number of nmethods using the immutable data so it can be shared between relocated nmethods. The old code reads the counter, decrements the counter, and then checks the first read to see if it is zero. Since the check is performed on the initial read it will never be zero which causes immutable data to never be freed. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup Ah, so _there_! I confused myself. This one is readable: the counter `0` means we can free. It would be even better if you did `inc_immutable_data_refcount()` and `dec_immutable_data_refcount()`, and did e.g.: if (dec_immutable_data_refcount() == 0) { os::free(_immutable_data); } int dec_immutable_data_refcount() { int refcount = get(...); assert(refcount > 0, "Must be positive"); set(refcount - 1); return refcount - 1; } Because the next thing you know this would need to be replaced with Atomics a year later. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28008#pullrequestreview-3389823108 From epeter at openjdk.org Tue Oct 28 17:13:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 17:13:01 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v16] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Tue, 28 Oct 2025 13:38:48 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, The `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. >> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initalize`. Each >> of the `Proj` should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type. >> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in is own `NarrowMemProj` is >> added to the memory sugraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA split types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`. >> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 57 commits: > > - review > - Merge branch 'master' into JDK-8327963 > - review > - Roberto's patches > - review > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Roberto Casta?eda Lozano > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Roberto Casta?eda Lozano > - Update src/hotspot/share/opto/graphKit.cpp > > Co-authored-by: Roberto Casta?eda Lozano > - Update src/hotspot/share/opto/graphKit.cpp > > Co-authored-by: Roberto Casta?eda Lozano > - Update src/hotspot/share/opto/multnode.hpp > > Co-authored-by: Roberto Casta?eda Lozano > - ... and 47 more: https://git.openjdk.org/jdk/compare/96259936...957be06e @rwestrel Thanks for the updates, it already looks better :) I had a few minutes to look over the `apply_..` solutions. I left a few comments, and hope that we can make the code just a little slicker still ;) src/hotspot/share/opto/memnode.cpp line 5484: > 5482: }; > 5483: return apply_to_narrow_mem_projs(filter); > 5484: } It seems to me that the upper method is only used by the lower here. Why not just collapse them? It would also reduce the "overloading noise". src/hotspot/share/opto/memnode.hpp line 1428: > 1426: template NarrowMemProjNode* apply_to_narrow_mem_projs(DUIterator& i, Callback callback) const { > 1427: return apply_to_narrow_mem_projs_any_iterator(UsesIterator(i, this), callback); > 1428: } Is this one still needed? src/hotspot/share/opto/multnode.hpp line 141: > 139: > 140: // Same but for matching _con and _is_io_use > 141: template ProjNode* apply_to_projs(Callback callback, uint which_proj, bool is_io_use) const; Do these need to be `public`? Or could they be `protected`, so they are only available to subtypes? And do we really need all the variants of `apply_to_projs`, or could we collapse them a little? ------------- PR Review: https://git.openjdk.org/jdk/pull/24570#pullrequestreview-3389846667 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2470344461 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2470334385 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2470369879 From epeter at openjdk.org Tue Oct 28 17:13:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 17:13:03 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v16] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Tue, 28 Oct 2025 16:55:39 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits: >> >> - review >> - Merge branch 'master' into JDK-8327963 >> - review >> - Roberto's patches >> - review >> - Update src/hotspot/share/opto/macro.cpp >> >> Co-authored-by: Roberto Casta?eda Lozano >> - Update src/hotspot/share/opto/macro.cpp >> >> Co-authored-by: Roberto Casta?eda Lozano >> - Update src/hotspot/share/opto/graphKit.cpp >> >> Co-authored-by: Roberto Casta?eda Lozano >> - Update src/hotspot/share/opto/graphKit.cpp >> >> Co-authored-by: Roberto Casta?eda Lozano >> - Update src/hotspot/share/opto/multnode.hpp >> >> Co-authored-by: Roberto Casta?eda Lozano >> - ... 
and 47 more: https://git.openjdk.org/jdk/compare/96259936...957be06e > > src/hotspot/share/opto/memnode.hpp line 1428: > >> 1426: template NarrowMemProjNode* apply_to_narrow_mem_projs(DUIterator& i, Callback callback) const { >> 1427: return apply_to_narrow_mem_projs_any_iterator(UsesIterator(i, this), callback); >> 1428: } > > Is this one still needed? It also seems that the upper two can be merged. Maybe all "overloadings" of `apply_to_narrow_mem_projs` can be merged, no? Or are there really multiple uses? > src/hotspot/share/opto/multnode.hpp line 141: > >> 139: >> 140: // Same but for matching _con and _is_io_use >> 141: template ProjNode* apply_to_projs(Callback callback, uint which_proj, bool is_io_use) const; > > Do these need to be `public`? Or could they be `protected`, so they are only available to subtypes? > > And do we really need all the variants of `apply_to_projs`, or could we collapse them a little? It is just a lot of boilerplate, would be nice if it was a little slicker ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2470349998 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2470370657 From epeter at openjdk.org Tue Oct 28 17:13:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 17:13:03 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v16] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Tue, 28 Oct 2025 16:59:41 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/memnode.hpp line 1428: >> >>> 1426: template NarrowMemProjNode* apply_to_narrow_mem_projs(DUIterator& i, Callback callback) const { >>> 1427: return apply_to_narrow_mem_projs_any_iterator(UsesIterator(i, this), callback); >>> 1428: } >> >> Is this one still needed? > > It also seems that the upper two can be merged. Maybe all "overloadings" of `apply_to_narrow_mem_projs` can be merged, no? Or are there really multiple uses? Basically we could call `apply_to_narrow_mem_projs_any_iterator` directly from the two uses: - `already_has_narrow_mem_proj_with_adr_type` - `for_each_narrow_mem_proj_with_new_uses` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2470355697 From rehn at openjdk.org Tue Oct 28 18:03:10 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Oct 2025 18:03:10 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v4] In-Reply-To: References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: On Tue, 28 Oct 2025 15:15:58 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed format > > src/hotspot/cpu/riscv/riscv.ad line 1375: > >> 1373: if (VerifyStackAtCalls) { >> 1374: st->print("mv t2, %ld\n\t", MAJIK_DWORD); >> 1375: st->print("sd t2, [sp, #%d]\n\t", - 3 * wordSize); > > Thanks for the update. You might want to change this into `st->print("sd t2, [sp, #%d]\n\t", framesize - 3 * wordSize);` at the same time. BTW: My local `hs:tier1` with `-XX:+VerifyStackAtCalls` using fastdebug build is good. Oh, did I miss that, sorry! 
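Coming back to the `apply_to_projs` / `apply_to_narrow_mem_projs` boilerplate discussed in the 8327963 review above: the suggestion amounts to funnelling all the variants through one generic "walk the projections and hand each to a callback" helper. A small self-contained Java sketch of that shape (illustrative only, not the C2 code, which is C++ templates over DUIterator):

    import java.util.List;
    import java.util.function.Predicate;

    class Proj {
        final int con;
        final boolean isIoUse;
        Proj(int con, boolean isIoUse) { this.con = con; this.isIoUse = isIoUse; }
    }

    class MultiNode {
        final List<Proj> outs;
        MultiNode(List<Proj> outs) { this.outs = outs; }

        // Single generic helper: returns the first projection the callback accepts.
        // The specialized variants become thin wrappers instead of copies of the loop.
        Proj applyToProjs(Predicate<Proj> callback) {
            for (Proj p : outs) {
                if (callback.test(p)) {
                    return p;
                }
            }
            return null;
        }

        Proj projOut(int whichProj, boolean isIoUse) {
            return applyToProjs(p -> p.con == whichProj && p.isIoUse == isIoUse);
        }
    }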
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2470551572 From epeter at openjdk.org Tue Oct 28 18:04:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Oct 2025 18:04:06 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v16] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 08:57:33 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Fix include order Testing passed, code looks good to me too now :) Thanks for all the work you put into this, really appreciated ? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25928#pullrequestreview-3390142520 From mli at openjdk.org Tue Oct 28 20:51:55 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 28 Oct 2025 20:51:55 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v8] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 17:59:10 GMT, Emanuel Peter wrote: > Testing passed - Approved. Thanks for all the work, and looking forward to what you have still planned :) @eme64 Thank you for reviewing and testing! I'll tag you when I send out other prs. :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27942#issuecomment-3458430292 From dlong at openjdk.org Tue Oct 28 20:58:47 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 28 Oct 2025 20:58:47 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 [v3] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 19:01:31 GMT, Chad Rakoczy wrote: >> [JDK-8369147](https://bugs.openjdk.org/browse/JDK-8369147) >> >> Fixes tests added in [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) >> >> `DeoptimizeRelocatedNMethod.java` and `RelocateNMethod.java` failed because they attempted to relocate nmethods to the `MethodProfiled` code heap which does not exist when `TieredCompilation` is false. Updated the tests to use `MethodNonProfiled` heap which exists regardless of `TieredCompilation` >> >> `StressNMethodRelocation.java` runs for 60 seconds and also compiles 1024 methods with C2. This was causing the test to timeout if the compilation took too much time. Increasing the timeout to 5 minutes should give C2 enough time to compile the functions >> >> `NMethodRelocationTest.java` runs using SerialGC which caused a multiple GC error when trying to run with another GC. Added a requires to force SerialGC > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Remove explicit test config for different GCs Marked as reviewed by dlong (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/27659#pullrequestreview-3390766815 From duke at openjdk.org Tue Oct 28 21:05:28 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 28 Oct 2025 21:05:28 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v3] In-Reply-To: References: Message-ID: > [JDK-8370527](https://bugs.openjdk.org/browse/JDK-8370527) > > [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced an `immutable_data_references_counter` which keeps track of the number of nmethods using the immutable data so it can be shared between relocated nmethods. The old code reads the counter, decrements the counter, and then checks the first read to see if it is zero. Since the check is performed on the initial read it will never be zero which causes immutable data to never be freed. Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: - Update NMethod.java for parity - Add functions to inc or dec ref count ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28008/files - new: https://git.openjdk.org/jdk/pull/28008/files/60de0f94..b441dc24 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28008&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28008&range=01-02 Stats: 105 lines in 3 files changed: 28 ins; 9 del; 68 mod Patch: https://git.openjdk.org/jdk/pull/28008.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28008/head:pull/28008 PR: https://git.openjdk.org/jdk/pull/28008 From duke at openjdk.org Tue Oct 28 21:09:53 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 28 Oct 2025 21:09:53 GMT Subject: Integrated: 8369147: Various issues with new tests added by JDK-8316694 In-Reply-To: References: Message-ID: On Mon, 6 Oct 2025 20:13:46 GMT, Chad Rakoczy wrote: > [JDK-8369147](https://bugs.openjdk.org/browse/JDK-8369147) > > Fixes tests added in [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) > > `DeoptimizeRelocatedNMethod.java` and `RelocateNMethod.java` failed because they attempted to relocate nmethods to the `MethodProfiled` code heap which does not exist when `TieredCompilation` is false. Updated the tests to use `MethodNonProfiled` heap which exists regardless of `TieredCompilation` > > `StressNMethodRelocation.java` runs for 60 seconds and also compiles 1024 methods with C2. This was causing the test to timeout if the compilation took too much time. Increasing the timeout to 5 minutes should give C2 enough time to compile the functions > > `NMethodRelocationTest.java` runs using SerialGC which caused a multiple GC error when trying to run with another GC. Added a requires to force SerialGC This pull request has now been integrated. Changeset: 73f93920 Author: Chad Rakoczy Committer: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/73f93920b950b4ce5fa177db50010e95265d6a7f Stats: 288 lines in 6 files changed: 5 ins; 260 del; 23 mod 8369147: Various issues with new tests added by JDK-8316694 Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/27659 From lmesnik at openjdk.org Tue Oct 28 21:12:47 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 28 Oct 2025 21:12:47 GMT Subject: RFR: 8370846: Support execution of mlvm testing with test thread factory Message-ID: The MainWrapper used test thread factory has generated lambda method. So the AbsentInformationException is expected. The actual source path is not checked. 
Tested by run mlvm tests with and without test thread factory. Also jdk/test/lib/thread/TestThreadFactory.java updated to provide TestThreadFactory. isTestThreadFactorySet() that could be used by tests instead of checking property "test.thread.factory" directly. ------------- Commit messages: - fixed checks - 8370846: Support execution of mlvm testing with test thread factory Changes: https://git.openjdk.org/jdk/pull/28028/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28028&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370846 Stats: 37 lines in 2 files changed: 32 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28028.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28028/head:pull/28028 PR: https://git.openjdk.org/jdk/pull/28028 From vlivanov at openjdk.org Tue Oct 28 22:25:09 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 28 Oct 2025 22:25:09 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: Message-ID: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." 
> > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Fix merge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25315/files - new: https://git.openjdk.org/jdk/pull/25315/files/a1101cda..ed324159 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=17-18 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From lmesnik at openjdk.org Tue Oct 28 22:35:34 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 28 Oct 2025 22:35:34 GMT Subject: RFR: 8370846: Support execution of mlvm testing with test thread factory [v2] In-Reply-To: References: Message-ID: > The MainWrapper used test thread factory has generated lambda method. So the AbsentInformationException is expected. The actual source path is not checked. > > Tested by run mlvm tests with and without test thread factory. > > Also > jdk/test/lib/thread/TestThreadFactory.java > updated to provide TestThreadFactory. isTestThreadFactorySet() > that could be used by tests instead of checking property "test.thread.factory" directly. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: improved comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28028/files - new: https://git.openjdk.org/jdk/pull/28028/files/bf057a58..660ec003 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28028&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28028&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28028.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28028/head:pull/28028 PR: https://git.openjdk.org/jdk/pull/28028 From duke at openjdk.org Wed Oct 29 00:38:34 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 29 Oct 2025 00:38:34 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: > [JDK-8370527](https://bugs.openjdk.org/browse/JDK-8370527) > > [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced an `immutable_data_references_counter` which keeps track of the number of nmethods using the immutable data so it can be shared between relocated nmethods. The old code reads the counter, decrements the counter, and then checks the first read to see if it is zero. Since the check is performed on the initial read it will never be zero which causes immutable data to never be freed. 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Fix NMethod.java immutable data ref count ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28008/files - new: https://git.openjdk.org/jdk/pull/28008/files/b441dc24..26bdc3ce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28008&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28008&range=02-03 Stats: 13 lines in 2 files changed: 1 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/28008.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28008/head:pull/28008 PR: https://git.openjdk.org/jdk/pull/28008 From duke at openjdk.org Wed Oct 29 00:52:01 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 29 Oct 2025 00:52:01 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v2] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 16:50:09 GMT, Aleksey Shipilev wrote: > Ah, so _there_! I confused myself. This one is readable: the counter `0` means we can free. It would be even better if you did `inc_immutable_data_refcount()` and `dec_immutable_data_refcount()`, and did e.g.: > > ``` > if (dec_immutable_data_refcount() == 0) { > os::free(_immutable_data); > } > > int dec_immutable_data_refcount() { > int refcount = get(...); > assert(refcount > 0, "Must be positive"); > set(refcount - 1); > return refcount - 1; > } > ``` > > Because the next thing you know this would need to be replaced with Atomics a year later. I agree this makes the code cleaner. I replaced the getter and setter for the counter with `init_immutable_data_ref_count`, `inc_immutable_data_ref_count`, and `dec_immutable_data_ref_count`. I also shortened the counter name from `immutable_data_references_counter` to `immutable_data_ref_count` I modified `NMethod.java` to calculate the offsets that same way as is done in the JVM. I missed this in [JDK-8369642](https://bugs.openjdk.org/browse/JDK-8369642) The last notable change is that I modified the [immutable data size calculation](https://github.com/chadrako/jdk/blob/26bdc3ceb4ab9ad9cb9a4218bb87ce2d7546fa22/src/hotspot/share/code/nmethod.cpp#L1155) to only include a reference counter if there is immutable data ------------- PR Comment: https://git.openjdk.org/jdk/pull/28008#issuecomment-3459206813 From dlong at openjdk.org Wed Oct 29 01:22:05 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 29 Oct 2025 01:22:05 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v12] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 09:50:47 GMT, Anton Seoane Ampudia wrote: >> This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. >> >> The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. >> >> However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. >> >> An IR Framework test is introduced to exercise the error explicitly. 
Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. >> >> **Testing:** passes tiers 1-5 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Final renaming touches Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27913#pullrequestreview-3391390772 From duke at openjdk.org Wed Oct 29 01:26:43 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 29 Oct 2025 01:26:43 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v5] In-Reply-To: References: Message-ID: > [JDK-8370527](https://bugs.openjdk.org/browse/JDK-8370527) > > [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced an `immutable_data_references_counter` which keeps track of the number of nmethods using the immutable data so it can be shared between relocated nmethods. The old code reads the counter, decrements the counter, and then checks the first read to see if it is zero. Since the check is performed on the initial read it will never be zero which causes immutable data to never be freed. Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Add include to fix build issue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28008/files - new: https://git.openjdk.org/jdk/pull/28008/files/26bdc3ce..6739c4fa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28008&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28008&range=03-04 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28008.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28008/head:pull/28008 PR: https://git.openjdk.org/jdk/pull/28008 From duke at openjdk.org Wed Oct 29 02:52:12 2025 From: duke at openjdk.org (duke) Date: Wed, 29 Oct 2025 02:52:12 GMT Subject: Withdrawn: 8361417: JVMCI getModifiers incorrect for inner classes In-Reply-To: References: Message-ID: <_pOTDXcK2qzlMiJwjuZfM3HXnAM3EvWYTs6_lsuFxd0=.e272a768-bbfa-4dcd-bc53-ce8740307b68@github.com> On Fri, 4 Jul 2025 16:10:22 GMT, Doug Simon wrote: > The result of `ResolvedJavaType.getModifiers()` should always have been the same as `Class.getModifiers()`. This is currently not the case for inner classes. Instead, the value is derived from `Klass::_access_flags` where as it should be derived from the `InnerClasses` attribute (as it is for `Class`). > > This PR aligns `ResolvedJavaType.getModifiers()` with `Class.getModifiers()`. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/26135 From wenanjian at openjdk.org Wed Oct 29 03:40:38 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 29 Oct 2025 03:40:38 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v12] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. 
Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: Update the logic to ensure counter increase time same ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/716825a4..2014a3c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=10-11 Stats: 185 lines in 1 file changed: 61 ins; 121 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From wenanjian at openjdk.org Wed Oct 29 03:40:41 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 29 Oct 2025 03:40:41 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v11] In-Reply-To: References: Message-ID: <5iYXi3_8n6rS2hMkJuwa-_LJ4ij8R7EAflF8s5-VNyI=.c5fe8db8-30af-42d8-b4e0-fbeb38ee4832@github.com> On Sat, 18 Oct 2025 11:31:37 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. > > Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Merge branch 'openjdk:master' into aes_ctr > - add assertion and change test > - add zbb and zvbb check > - Merge branch 'openjdk:master' into aes_ctr > - Merge branch 'openjdk:master' into aes_ctr > - fix the counter increase at limit and add test > - change format > - update reg use and instruction > - change some name and format > - delete useless Label, change L_judge_used to L_slow_loop > - ... and 2 more: https://git.openjdk.org/jdk/compare/eff6439e...716825a4 @theRealAph @RealFYang I still can not find any suitable way for RVV to increase more than one counter and keep the increase time same. So I change to increase one counter each time and extract a separate function to perform counter increase to ensure that the time for each increment is equal refer to the aarch64 implementation; additionally, I have added some comments and Pseudocode ------------- PR Comment: https://git.openjdk.org/jdk/pull/25281#issuecomment-3459489205 From duke at openjdk.org Wed Oct 29 04:02:23 2025 From: duke at openjdk.org (duke) Date: Wed, 29 Oct 2025 04:02:23 GMT Subject: Withdrawn: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 19:45:41 GMT, Aleksey Shipilev wrote: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. 
> > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/24018 From duke at openjdk.org Wed Oct 29 06:24:46 2025 From: duke at openjdk.org (erifan) Date: Wed, 29 Oct 2025 06:24:46 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v2] In-Reply-To: References: Message-ID: > According the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test faiulres on sve2 plaftforms where `MaxVectorSize > 16`. > > This test problem was discovered by simulating a 512-bit sve2 environment using qemu. > > This PR fixes these test failures. erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8369456-select-from-two-vectors-failure - 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms According the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test faiulres on sve2 plaftforms where `MaxVectorSize > 16`. This test problem was discovered by simulating a 512-bit sve2 environment using qemu. This PR fixes these test failures. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27723/files - new: https://git.openjdk.org/jdk/pull/27723/files/90842d7a..b1025a01 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27723&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27723&range=00-01 Stats: 52562 lines in 1216 files changed: 32288 ins; 13926 del; 6348 mod Patch: https://git.openjdk.org/jdk/pull/27723.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27723/head:pull/27723 PR: https://git.openjdk.org/jdk/pull/27723 From wenanjian at openjdk.org Wed Oct 29 07:03:48 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 29 Oct 2025 07:03:48 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v13] In-Reply-To: References: Message-ID: <9NCXWsBW5TTtNLxDqIInodSU-nLiaf86r2dyMtoKklM=.0964bb38-e5cb-499d-a9fc-4efdca0ecfd0@github.com> > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. 
On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: delete useless reg ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/2014a3c7..4039116c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=11-12 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From xgong at openjdk.org Wed Oct 29 07:58:05 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 29 Oct 2025 07:58:05 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 05:54:02 GMT, Xiaohong Gong wrote: >> @XiaohongGong Thanks for merging, running testing now :) > > Hi @eme64 , I updated a commit to rename the helper matcher function and add some comments, assertion inside the function. Would you mind taking another look at the latest change? Thanks a lot! > @XiaohongGong Thanks for the updates. I left a few more comments. > > And thanks for filing: https://bugs.openjdk.org/browse/JDK-8370666 Are you planning on working on that, or do you know someone else? I could try, but I'm less familiar with all the concepts, and would need a lot of help. Yeah, I'd be glad to work on that in the future, but I have some more urgent tasks to handle right now. I can probably start on it in about 3?4 weeks. Would that be okay with you? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3460248930 From epeter at openjdk.org Wed Oct 29 08:22:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 08:22:08 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 07:55:45 GMT, Xiaohong Gong wrote: > > @XiaohongGong Thanks for the updates. I left a few more comments. > > And thanks for filing: https://bugs.openjdk.org/browse/JDK-8370666 Are you planning on working on that, or do you know someone else? I could try, but I'm less familiar with all the concepts, and would need a lot of help. > > Yeah, I'd be glad to work on that in the future, but I have some more urgent tasks to handle right now. I can probably start on it in about 3?4 weeks. Would that be okay with you? That would be excellent! I'm not trying to rush you, it would just be nice if we could do it in the next months :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3460319722 From xgong at openjdk.org Wed Oct 29 08:27:08 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 29 Oct 2025 08:27:08 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 07:55:45 GMT, Xiaohong Gong wrote: >> Hi @eme64 , I updated a commit to rename the helper matcher function and add some comments, assertion inside the function. Would you mind taking another look at the latest change? Thanks a lot! > >> @XiaohongGong Thanks for the updates. I left a few more comments. 
>> >> And thanks for filing: https://bugs.openjdk.org/browse/JDK-8370666 Are you planning on working on that, or do you know someone else? I could try, but I'm less familiar with all the concepts, and would need a lot of help. > > Yeah, I'd be glad to work on that in the future, but I have some more urgent tasks to handle right now. I can probably start on it in about 3?4 weeks. Would that be okay with you? > > > @XiaohongGong Thanks for the updates. I left a few more comments. > > > And thanks for filing: https://bugs.openjdk.org/browse/JDK-8370666 Are you planning on working on that, or do you know someone else? I could try, but I'm less familiar with all the concepts, and would need a lot of help. > > > > > > Yeah, I'd be glad to work on that in the future, but I have some more urgent tasks to handle right now. I can probably start on it in about 3?4 weeks. Would that be okay with you? > > That would be excellent! I'm not trying to rush you, it would just be nice if we could do it in the next months :) OK, I will try my best starting with it a few weeks later. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3460334852 From epeter at openjdk.org Wed Oct 29 08:34:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 08:34:00 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination Message-ID: **Analysis** We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. **Solution** We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 (the idea is to bail out of the elimination if any of the found stores are mismatched.) **Details** How the bad sequence develops, and which components are involved. 1) The `SafePoint` contains a `ConL` and 3 `ConI`. (Correct would have been 4 `ConI`) 6 ConI === 23 [[ 4 ]] #int:16777216 7 ConI === 23 [[ 4 ]] #int:256 8 ConI === 23 [[ 4 ]] #int:1048576 9 ConL === 23 [[ 4 ]] #long:68719476737 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) 2) This is then encoded into an `ObjectValue`. 
A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 If I understand it right, there zero is just a placeholder. And so we get: (rr) p sv->print_fields_on(tty) Fields: 0, 68719476737, 1048576, 256, 16777216 We can see the `zero`, followed by the `ConL`, and then 3 `ConI`. This sequence is then serialized into a stream, and stored in the `nmethod`, as part of the compilation. 3) Once we deopt, we deserialize from the stream, and reconstruct the `sv` (`ObjectValue`). See `rematerialize_objects`. 4) In `Deoptimization::realloc_objects`, we allocate a new array, with `5` elements, because the `sv` has `5` elements. 5) In `Deoptimization::reassign_type_array_elements`, we step through all elements of the `sv`, and fill the values into the array. When we encounter the `[int=0, long=ConL]`, we interpret them as a pair: https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/runtime/deoptimization.cpp#L1378-L1408 We do not store an int zero and then a long, but rather we "remove" the zero, and just store the long. The zero was just a placeholder. And so finally, we arrive with this sequence in the array: (rr) p obj->print_on (tty) [I {0x00000006230b9a10} - klass: {type array int} - flags: is_cloneable_fast - length: 5 - 0: 0x1 1 - 1: 0x10 16 - 2: 0x100000 1048576 - 3: 0x100 256 - 4: 0x1000000 16777216 ------------- Commit messages: - Apply suggestions from code review - more assert adjustment - ignore debug flag - id for tests, and fix up the assert - pass int for short slot - another test - improve test - wip new IR test - fix up asserts - improved comments in test - ... and 4 more: https://git.openjdk.org/jdk/compare/7bb490c4...9114d379 Changes: https://git.openjdk.org/jdk/pull/27997/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27997&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370405 Stats: 357 lines in 5 files changed: 357 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27997.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27997/head:pull/27997 PR: https://git.openjdk.org/jdk/pull/27997 From epeter at openjdk.org Wed Oct 29 08:34:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 08:34:04 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 10:40:18 GMT, Emanuel Peter wrote: > **Analysis** > We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. > During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. > But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. > > Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. > Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. > > **Solution** > We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. 
> > But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: > https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 > (the idea is to bail out of the elimination if any of the found stores are mismatched.) > > **Details** > > How the bad sequence develops, and which components are involved. > > 1) The `SafePoint` contains a `ConL` and 3 `ConI`. (Correct would have been 4 `ConI`) > > 6 ConI === 23 [[ 4 ]] #int:16777216 > 7 ConI === 23 [[ 4 ]] #int:256 > 8 ConI === 23 [[ 4 ]] #int:1048576 > 9 ConL === 23 [[ 4 ]] #long:68719476737 > 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] > 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) > > > 2) This is then encoded into an `ObjectValue`. A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: > https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 > If I understand it right, there zero is just a placeholder. > > And so we get: > > (rr) p sv->print_fields_on(tty) > Fields: 0, 68719476737, 1048576, 256, 16777216 > > We can see the `zero`, followed by the `ConL`, and then 3 `ConI`. > > This sequence is then serialized into a stream, and stored in the `nmethod`, as part of the compilation. > > 3) Once we deopt,... src/hotspot/share/opto/macro.cpp line 874: > 872: assert(false, "field_val does not fit field_type"); > 873: } > 874: #endif I'm not yet happy with this assert. It is not super easy to get it right, but currently it is a bit weak. Do reviewers have any good ideas here? test/hotspot/jtreg/compiler/c2/TestMergeStoresAndAllocationElimination.java line 40: > 38: */ > 39: > 40: public class TestMergeStoresAndAllocationElimination { This is the reproducer for the wrong result. test/hotspot/jtreg/compiler/escapeAnalysis/TestRematerializeObjects.java line 47: > 45: /** > 46: * More complicated test cases can be found in {@link TestRematerializeObjectsFuzzing}. > 47: */ Suggestion: test/hotspot/jtreg/compiler/escapeAnalysis/TestRematerializeObjects.java line 48: > 46: * More complicated test cases can be found in {@link TestRematerializeObjectsFuzzing}. > 47: */ > 48: public class TestRematerializeObjects { These tests would not reproduce, but at least they let me check for the success of MergeStores and allocation elimination, and we get some examples that run through `PhaseMacroExpand::create_scalarized_object_description`. 
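The rough shape of the pattern under discussion looks like the sketch below (a hypothetical Java example for illustration only, not the code in this PR): a non-escaping `int[]` receives adjacent constant stores that MergeStores may combine into a single mismatched `StoreL`, while a rarely taken branch keeps the array live in the debug info so it has to be rematerialized on deoptimization.

// Minimal sketch, assuming C2 scalar-replaces the non-escaping array and
// MergeStores combines the two adjacent constant int stores into one StoreL.
public class MergeStoresScalarReplacementSketch {
    static boolean deoptTrigger = false;

    static int sketch() {
        int[] a = new int[4];         // non-escaping allocation
        a[0] = 1;                     // adjacent constant stores:
        a[1] = 16;                    // candidates for merging into a single StoreL
        a[2] = 1 << 20;
        a[3] = 1 << 24;
        if (deoptTrigger) {           // rarely taken -> uncommon trap with 'a' live
            return a[0] + a[1] + a[2] + a[3];
        }
        return 0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100_000; i++) {
            sketch();                 // warm up and trigger C2 compilation
        }
        deoptTrigger = true;          // now take the trap and rematerialize 'a'
        System.out.println(sketch());
    }
}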
test/hotspot/jtreg/compiler/escapeAnalysis/TestRematerializeObjects.java line 52: > 50: public static void main(String[] args) { > 51: TestFramework framework = new TestFramework(TestRematerializeObjects.class); > 52: //framework.addFlags("-XX:-TieredCompilation", "-Xbatch", "-XX:-CICompileOSR"); Suggestion: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2472111485 PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2472116101 PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2472117508 PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2472121111 PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2472118058 From qxing at openjdk.org Wed Oct 29 09:02:16 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Wed, 29 Oct 2025 09:02:16 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v3] In-Reply-To: References: Message-ID: <4dxuy7RYykOejdvyiYvsTwivcfnOkhucFp5JZPUbDWU=.e36545ce-18e9-4ec8-a670-02bb99fa569a@github.com> On Mon, 27 Oct 2025 14:40:05 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopnode.cpp line 3840: >> >>> 3838: // inside any nested loop, then that loop is okay >>> 3839: // E) Otherwise, if an outer loop's ncsfpt on the idom-path is nested in >>> 3840: // an inner loop, we need to prevent the inner loop from deleting it >> >> Nice, that's indeed an improvement :) > > It would be nice to make sure all cases here have an IR test which is not the case AFAICT. Can you open a JBS issue for that? @rwestrel @eme64 Do IR tests in `TestRedundantSafepointElimination.java` in this patch cover all these cases? Specifically: * Case A: `testTopLevelCountedLoop`, `testTopLevelCountedLoopWithDomCall` * Case B: tests containing nested loops * Case C: `testOuterLoopWithDomCall` * Case D: `testOuterLoopWithLocalNonCallSafepoint` * Case E: `testLoopNeedsToPreserveSafepoint` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2472199900 From roland at openjdk.org Wed Oct 29 09:06:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 29 Oct 2025 09:06:41 GMT Subject: RFR: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores [v2] In-Reply-To: References: <2cQ6gXCmY3uz_H3H_Sks0KZbQP4G5V3PRP8QSxiRG6g=.44a63dc4-1078-4762-bef5-ac3e4a995ca3@github.com> Message-ID: On Wed, 8 Oct 2025 11:39:38 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8339526 >> - Update src/hotspot/share/opto/arraycopynode.cpp >> >> Co-authored-by: Christian Hagedorn >> - test & fix > > Marked as reviewed by chagedorn (Reviewer). 
@chhagedorn @robcasloz thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/27604#issuecomment-3460456986 From roland at openjdk.org Wed Oct 29 09:06:43 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 29 Oct 2025 09:06:43 GMT Subject: Integrated: 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores In-Reply-To: References: Message-ID: On Thu, 2 Oct 2025 10:39:21 GMT, Roland Westrelin wrote: > In the `test1()` method of the test case: > > `inlined2()` calls `clone()` for an object loaded from field `field` > that has inexact type `A` at parse time. The intrinsic for `clone()` > inserts an `Allocate` and an `ArrayCopy` nodes. When igvn runs, the > load of `field` is optimized out because it reads back a newly > allocated `B` written to `field` in the same method. `ArrayCopy` can > now be optimized because the type of its `src` input is known. The > type of its `dest` input is the `CheckCastPP` from the allocation of > the cloned object created at parse time. That one has type `A`. A > series of `Load`s/`Store`s are created to copy the fields of class `B` > from `src` (of type `B`) to `dest` of (type `A`). > > Writting to `dest` with offsets for fields that don't exist in `A`, > causes this code in `Compile::flatten_alias_type()`: > > > } else if (offset < 0 || offset >= ik->layout_helper_size_in_bytes()) { > // Static fields are in the space above the normal instance > // fields in the java.lang.Class instance. > if (ik != ciEnv::current()->Class_klass()) { > to = nullptr; > tj = TypeOopPtr::BOTTOM; > offset = tj->offset(); > } > > > to assign it some slice that doesn't match the one that's used at the > same offset in `B`. > > That causes an assert in `ArrayCopyNode::try_clone_instance()` to > fire. With a release build, execution proceeds. `test1()` also has a > non escaping allocation. That one causes EA to run and > `ConnectionGraph::split_unique_types()` to move the store to the non > escaping allocation to a new slice. In the process, when it iterates > over `MergeMem` nodes, it notices the stores added by > `ArrayCopyNode::try_clone_instance()`, finds that some are not on the > right slice, tries to move them to the correct slice (expecting they > are from a non escaping EA). That causes some of the `Store`s to be > disconnected. When the resulting code runs, execution fails as some > fields are not copied. > > The fix I propose is to skip `ArrayCopyNode::try_clone_instance()` > when `src` and `dest` classes don't match as this seems like a rare > enough corner case. This pull request has now been integrated. Changeset: 5a2b0ca7 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/5a2b0ca7fea7d1a283aa90696c3989ae189148ec Stats: 101 lines in 2 files changed: 101 ins; 0 del; 0 mod 8339526: C2: store incorrectly removed for clone() transformed to series of loads/stores Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/27604 From aseoane at openjdk.org Wed Oct 29 09:13:42 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 29 Oct 2025 09:13:42 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v12] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 09:50:47 GMT, Anton Seoane Ampudia wrote: >> This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. 
>> >> The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. >> >> However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. >> >> An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. >> >> **Testing:** passes tiers 1-5 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Final renaming touches Thanks to all for the reviews and suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27913#issuecomment-3460487776 From duke at openjdk.org Wed Oct 29 09:13:43 2025 From: duke at openjdk.org (duke) Date: Wed, 29 Oct 2025 09:13:43 GMT Subject: RFR: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) [v12] In-Reply-To: References: Message-ID: <5qPYze8djM2BRQeWv29Wq1n9T5CRs-fCQvpuI0lZUBI=.04fe36fd-0ecd-4bd6-af64-04992a23053b@github.com> On Tue, 28 Oct 2025 09:50:47 GMT, Anton Seoane Ampudia wrote: >> This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. >> >> The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. >> >> However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. >> >> An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. >> >> **Testing:** passes tiers 1-5 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Final renaming touches @anton-seoane Your change (at version 72202ad633c0e8b54b6df6f688f92b39bd7f780f) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27913#issuecomment-3460497759 From rcastanedalo at openjdk.org Wed Oct 29 09:21:48 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 29 Oct 2025 09:21:48 GMT Subject: RFR: 8370853: IGV: SEGV in IdealGraphPrinter::print after JDK-8370569 Message-ID: This (trivial?) changeset initializes `IdealGraphPrinter::_parse` to `nullptr` on construction. 
This prevents segmentation faults when this field is accessed in: https://github.com/openjdk/jdk/blob/20bcf0eddaee0a57142bcc614cc5415b53c16460/src/hotspot/share/opto/idealGraphPrinter.cpp#L1007-L1017 This failure is triggered when IGV graph dumping is run with a print level lower than 6 on platforms where `IdealGraphPrinter::_parse` is not implicitly initialized to `nullptr`. On print level 6, `IdealGraphPrinter::_parse` is initialized by https://github.com/openjdk/jdk/blob/5a2b0ca7fea7d1a283aa90696c3989ae189148ec/src/hotspot/share/opto/parse2.cpp#L2785 before it is accessed. **Testing:** tier1-3 (including the affected test `TestVectorInsertByte.java` on the affected platform `macosx-aarch64-debug`). ------------- Commit messages: - Explicitly initialize IdealGraphPrinter::_parse to nullptr Changes: https://git.openjdk.org/jdk/pull/28040/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28040&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370853 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28040.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28040/head:pull/28040 PR: https://git.openjdk.org/jdk/pull/28040 From epeter at openjdk.org Wed Oct 29 09:22:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 09:22:41 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v3] In-Reply-To: References: Message-ID: On Wed, 15 Oct 2025 16:15:05 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - remove std::hash > - remove unordered_map, add some comments for all_instances_size @merykitty Thanks for working on this! Especially I'm happy with the extra gtest-ing that we are now able to do on the types. This optimization will be the entry point for many KnownBits optimizations, that is exciting! This still needs a second thorough review though, since it is not trivial ;) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27618#pullrequestreview-3392347802 From shade at openjdk.org Wed Oct 29 09:27:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 29 Oct 2025 09:27:42 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v5] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 01:26:43 GMT, Chad Rakoczy wrote: >> [JDK-8370527](https://bugs.openjdk.org/browse/JDK-8370527) >> >> [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced an `immutable_data_references_counter` which keeps track of the number of nmethods using the immutable data so it can be shared between relocated nmethods. The old code reads the counter, decrements the counter, and then checks the first read to see if it is zero. Since the check is performed on the initial read it will never be zero which causes immutable data to never be freed. 
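The logic error described above comes down to testing the pre-decrement value; a minimal sketch of the broken and the corrected check (plain Java for illustration, not the actual HotSpot code, names adapted):

// Sketch of the reference-counting bug: the old code tests the value read
// *before* the decrement, so the "last user reached zero" case is never seen.
class ImmutableDataRefCountSketch {
    int immutableDataReferencesCounter = 1;            // last remaining user

    void releaseBuggy() {
        int before = immutableDataReferencesCounter;   // reads 1
        immutableDataReferencesCounter = before - 1;   // counter is now 0
        if (before == 0) {                             // BUG: checks the old value
            freeImmutableData();                       // never reached -> leak
        }
    }

    void releaseFixed() {
        if (--immutableDataReferencesCounter == 0) {   // check the decremented value
            freeImmutableData();                       // freed by the last user
        }
    }

    void freeImmutableData() { /* placeholder for the actual deallocation */ }
}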
> > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add include to fix build issue Looks reasonable to me, thanks. FWIW, I was happy with the (simpler) previous version of the patch, and was content with doing this refactoring later. Maybe split them out, if you want to spend more time on this? ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28008#pullrequestreview-3392367891 From dfenacci at openjdk.org Wed Oct 29 09:32:40 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 29 Oct 2025 09:32:40 GMT Subject: RFR: 8370853: IGV: SEGV in IdealGraphPrinter::print after JDK-8370569 In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 09:12:16 GMT, Roberto Casta?eda Lozano wrote: > This (trivial?) changeset initializes `IdealGraphPrinter::_parse` to `nullptr` on construction. This prevents segmentation faults when this field is accessed in: https://github.com/openjdk/jdk/blob/20bcf0eddaee0a57142bcc614cc5415b53c16460/src/hotspot/share/opto/idealGraphPrinter.cpp#L1007-L1017 > This failure is triggered when IGV graph dumping is run with a print level lower than 6 on platforms where `IdealGraphPrinter::_parse` is not implicitly initialized to `nullptr`. On print level 6, `IdealGraphPrinter::_parse` is initialized by https://github.com/openjdk/jdk/blob/5a2b0ca7fea7d1a283aa90696c3989ae189148ec/src/hotspot/share/opto/parse2.cpp#L2785 before it is accessed. > > **Testing:** tier1-3 (including the affected test `TestVectorInsertByte.java` on the affected platform `macosx-aarch64-debug`). LGTM. Thanks for fixing it @robcasloz ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/28040#pullrequestreview-3392386384 From rcastanedalo at openjdk.org Wed Oct 29 09:32:41 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 29 Oct 2025 09:32:41 GMT Subject: RFR: 8370853: IGV: SEGV in IdealGraphPrinter::print after JDK-8370569 In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 09:12:16 GMT, Roberto Casta?eda Lozano wrote: > This (trivial?) changeset initializes `IdealGraphPrinter::_parse` to `nullptr` on construction. This prevents segmentation faults when this field is accessed in: https://github.com/openjdk/jdk/blob/20bcf0eddaee0a57142bcc614cc5415b53c16460/src/hotspot/share/opto/idealGraphPrinter.cpp#L1007-L1017 > This failure is triggered when IGV graph dumping is run with a print level lower than 6 on platforms where `IdealGraphPrinter::_parse` is not implicitly initialized to `nullptr`. On print level 6, `IdealGraphPrinter::_parse` is initialized by https://github.com/openjdk/jdk/blob/5a2b0ca7fea7d1a283aa90696c3989ae189148ec/src/hotspot/share/opto/parse2.cpp#L2785 before it is accessed. > > **Testing:** tier1-3 (including the affected test `TestVectorInsertByte.java` on the affected platform `macosx-aarch64-debug`). This failure illustrates the need for basic regression tests that exercise IGV graph dumping. I reported an RFE for adding them: [JDK-8370870](https://bugs.openjdk.org/browse/JDK-8370870). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28040#issuecomment-3460565666 From rcastanedalo at openjdk.org Wed Oct 29 09:32:41 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 29 Oct 2025 09:32:41 GMT Subject: RFR: 8370853: IGV: SEGV in IdealGraphPrinter::print after JDK-8370569 In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 09:29:52 GMT, Roberto Casta?eda Lozano wrote: >> This (trivial?) changeset initializes `IdealGraphPrinter::_parse` to `nullptr` on construction. This prevents segmentation faults when this field is accessed in: https://github.com/openjdk/jdk/blob/20bcf0eddaee0a57142bcc614cc5415b53c16460/src/hotspot/share/opto/idealGraphPrinter.cpp#L1007-L1017 >> This failure is triggered when IGV graph dumping is run with a print level lower than 6 on platforms where `IdealGraphPrinter::_parse` is not implicitly initialized to `nullptr`. On print level 6, `IdealGraphPrinter::_parse` is initialized by https://github.com/openjdk/jdk/blob/5a2b0ca7fea7d1a283aa90696c3989ae189148ec/src/hotspot/share/opto/parse2.cpp#L2785 before it is accessed. >> >> **Testing:** tier1-3 (including the affected test `TestVectorInsertByte.java` on the affected platform `macosx-aarch64-debug`). > > This failure illustrates the need for basic regression tests that exercise IGV graph dumping. I reported an RFE for adding them: [JDK-8370870](https://bugs.openjdk.org/browse/JDK-8370870). > LGTM. Thanks for fixing it @robcasloz Thanks for reviewing, Damon! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28040#issuecomment-3460567084 From aseoane at openjdk.org Wed Oct 29 09:40:20 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 29 Oct 2025 09:40:20 GMT Subject: Integrated: 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) In-Reply-To: References: Message-ID: <5ST5jacnFXvYNEnb-LpeXBsjEO0-DGF8JiXFYFBHlnk=.c32a7ef3-4d25-4e98-af83-8c67c72f87d7@github.com> On Tue, 21 Oct 2025 09:19:22 GMT, Anton Seoane Ampudia wrote: > This PR introduces a fix for a intermittent assert crash due to a non-oop found in the stack when deoptimizing. > > The `inline_native_GetEventWriter` JFR intrinsic performs a call into the runtime, which can safepoint, to write a checkpoint for the vthread. This call returns a global handle (`jobject`) that then gets resolved to a raw oop. > > However, the corresponding `jfr_write_checkpoint_Type` does not set any return, modelling the call as `void`. If a safepoint hits in the small window after the stub returns but before the writer oop is used, and the GC moves the object in that window, the deoptimization path cannot resolve a handle that it never recorded, leading to the subsequent crash. > > An IR Framework test is introduced to exercise the error explicitly. Additionally, related documentation in form of comments in the appropriate file (`runtime.hpp`) is added to hopefully prevent similar cases in the future. > > **Testing:** passes tiers 1-5 This pull request has now been integrated. 
Changeset: 8457f38f Author: Anton Seoane Ampudia Committer: Roberto Casta?eda Lozano URL: https://git.openjdk.org/jdk/commit/8457f38f14182e2a55ff5d243cdacb06c9003c49 Stats: 76 lines in 3 files changed: 74 ins; 0 del; 2 mod 8347463: jdk/jfr/threading/TestManyVirtualThreads.java crashes with assert(oopDesc::is_oop_or_null(val)) Reviewed-by: dlong, rcastanedalo, mgronlun ------------- PR: https://git.openjdk.org/jdk/pull/27913 From aseoane at openjdk.org Wed Oct 29 09:41:16 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 29 Oct 2025 09:41:16 GMT Subject: RFR: 8370853: IGV: SEGV in IdealGraphPrinter::print after JDK-8370569 In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 09:12:16 GMT, Roberto Casta?eda Lozano wrote: > This (trivial?) changeset initializes `IdealGraphPrinter::_parse` to `nullptr` on construction. This prevents segmentation faults when this field is accessed in: https://github.com/openjdk/jdk/blob/20bcf0eddaee0a57142bcc614cc5415b53c16460/src/hotspot/share/opto/idealGraphPrinter.cpp#L1007-L1017 > This failure is triggered when IGV graph dumping is run with a print level lower than 6 on platforms where `IdealGraphPrinter::_parse` is not implicitly initialized to `nullptr`. On print level 6, `IdealGraphPrinter::_parse` is initialized by https://github.com/openjdk/jdk/blob/5a2b0ca7fea7d1a283aa90696c3989ae189148ec/src/hotspot/share/opto/parse2.cpp#L2785 before it is accessed. > > **Testing:** tier1-3 (including the affected test `TestVectorInsertByte.java` on the affected platform `macosx-aarch64-debug`). Looks good to me too! ------------- Marked as reviewed by aseoane (Author). PR Review: https://git.openjdk.org/jdk/pull/28040#pullrequestreview-3392422662 From qxing at openjdk.org Wed Oct 29 09:45:04 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Wed, 29 Oct 2025 09:45:04 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v17] In-Reply-To: References: Message-ID: > The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. > > This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. 
For example, the following implementation runs ~115% faster on x86-64 with this patch: > > > public static int numberOfNibbles(int i) { > int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); > return Math.max((mag + 3) / 4, 1); > } > > > Testing: tier1, IR test Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: Make code more compact ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25928/files - new: https://git.openjdk.org/jdk/pull/25928/files/9bb3f7d7..092d968d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=15-16 Stats: 49 lines in 1 file changed: 10 ins; 29 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/25928.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928 PR: https://git.openjdk.org/jdk/pull/25928 From epeter at openjdk.org Wed Oct 29 09:45:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 09:45:05 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v17] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 09:41:18 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Make code more compact Nice, even better :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25928#pullrequestreview-3392417607 From qxing at openjdk.org Wed Oct 29 09:45:06 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Wed, 29 Oct 2025 09:45:06 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v9] In-Reply-To: <9xCpJGY6CFKPAt4VtDY23_Tr3SE9tUebdMF3pAYWhFA=.281e0b84-bfad-466b-b290-918cf1fa83d1@github.com> References: <9xCpJGY6CFKPAt4VtDY23_Tr3SE9tUebdMF3pAYWhFA=.281e0b84-bfad-466b-b290-918cf1fa83d1@github.com> Message-ID: On Tue, 9 Sep 2025 08:40:35 GMT, Emanuel Peter wrote: >> Hi @jatin-bhateja, I've added a micro benchmark that includes the `numberOfNibbles` implementation from this PR description and your micro kernel. >> >> Here's my test results on an Intel(R) Xeon(R) Platinum: >> >> >> # Baseline: >> Benchmark Mode Cnt Score Error Units >> CountLeadingZeros.benchClzLongConstrained avgt 15 1517.888 ? 5.691 ns/op >> CountLeadingZeros.benchNumberOfNibbles avgt 15 1094.422 ? 1.753 ns/op >> >> # This patch: >> Benchmark Mode Cnt Score Error Units >> CountLeadingZeros.benchClzLongConstrained avgt 15 0.948 ? 0.002 ns/op >> CountLeadingZeros.benchNumberOfNibbles avgt 15 942.438 ? 1.742 ns/op > > @MaxXSoft Feel free to just ping me again when you want another review :) > FYI: I'll be on a longer vacation starting in about a week, so don't expect me to respond then. 
@eme64 Thank you for the review! @merykitty @jatin-bhateja Do you have any other suggestions regarding the latest changes in this patch? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3460610149 From qxing at openjdk.org Wed Oct 29 09:45:08 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Wed, 29 Oct 2025 09:45:08 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v16] In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 15:53:50 GMT, Emanuel Peter wrote: >> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix include order > > test/hotspot/jtreg/compiler/c2/gvn/TestCountBitsRange.java line 84: > >> 82: LIMITS_64_5 = INTS_64.next(); >> 83: LIMITS_64_6 = INTS_64.next(); >> 84: LIMITS_64_7 = INTS_64.next(); > > Why not assign them directly? You just need to declare the generators first. Would save us a couple of lines. Updated ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2472302364 From mhaessig at openjdk.org Wed Oct 29 09:47:34 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 29 Oct 2025 09:47:34 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 10:40:18 GMT, Emanuel Peter wrote: > Note: @oliviermattmann found this bug with his whitebox fuzzer. See also https://github.com/openjdk/jdk/pull/27991 > > **Analysis** > We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. > During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. > But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. > > Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. > Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. > > **Solution** > We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. > > But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: > https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 > (the idea is to bail out of the elimination if any of the found stores are mismatched.) > > **Details** > > How the bad sequence develops, and which components are involved. > > 1) The `SafePoint` contains a `ConL` and 3 `ConI`. (Correct would have been 4 `ConI`) > > 6 ConI === 23 [[ 4 ]] #int:16777216 > 7 ConI === 23 [[ 4 ]] #int:256 > 8 ConI === 23 [[ 4 ]] #int:1048576 > 9 ConL === 23 [[ 4 ]] #long:68719476737 > 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] > 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) > > > 2) This is then encoded into an `ObjectValue`. 
A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: > https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 > If I understand it right, there zero is just a placeholder. > > And so we get: > > (rr) p sv->print_fields_on(tty) > Fields: 0, 68719476737, 1048576, 256, 16777216 > > We can see the `zero`, followed by the `ConL`, and then 3 `ConI`. > > This se... Thank you for fixing this and providing such a detailed explanation in the reproducer! If I understand correctly, this fix prevents mismatched accesses to occur when rematerializing rather than fixing rematerialization to handle mismatched accesses and the assert is to find cases that still trigger the incorrect corner case of rematerialization? This begs the naive question: How much harder is it to fix rematerialization to handle mismatched accesses? ------------- PR Review: https://git.openjdk.org/jdk/pull/27997#pullrequestreview-3392451330 From rehn at openjdk.org Wed Oct 29 09:48:16 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 29 Oct 2025 09:48:16 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v5] In-Reply-To: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: > Hi, please consider. > > Sanity tested and no issues with MAJIK t1 (with +VSC). > > Thanks, Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'master' into vsc - Forgot fix format for VSAC - Fixed format - Label name - li->mv, format, space - Draft ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28005/files - new: https://git.openjdk.org/jdk/pull/28005/files/0dfed06f..00bd0deb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28005&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28005&range=03-04 Stats: 5529 lines in 237 files changed: 2986 ins; 1718 del; 825 mod Patch: https://git.openjdk.org/jdk/pull/28005.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28005/head:pull/28005 PR: https://git.openjdk.org/jdk/pull/28005 From rehn at openjdk.org Wed Oct 29 09:48:18 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 29 Oct 2025 09:48:18 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v4] In-Reply-To: References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: <7LWvO7xFVb8ZchKYPXI5Bwr3jowqCs2CGLhcMMuBKMc=.b9abe444-c304-4b46-9f00-ad96e20cd879@github.com> On Tue, 28 Oct 2025 17:59:37 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/riscv.ad line 1375: >> >>> 1373: if (VerifyStackAtCalls) { >>> 1374: st->print("mv t2, %ld\n\t", MAJIK_DWORD); >>> 1375: st->print("sd t2, [sp, #%d]\n\t", - 3 * wordSize); >> >> Thanks for the update. You might want to change this into `st->print("sd t2, [sp, #%d]\n\t", framesize - 3 * wordSize);` at the same time. BTW: My local `hs:tier1` with `-XX:+VerifyStackAtCalls` using fastdebug build is good. > > Oh, did I miss that, sorry! 
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2472330996 From duke at openjdk.org Wed Oct 29 09:50:51 2025 From: duke at openjdk.org (erifan) Date: Wed, 29 Oct 2025 09:50:51 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v2] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 06:24:46 GMT, erifan wrote: >> According the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test faiulres on sve2 plaftforms where `MaxVectorSize > 16`. >> >> This test problem was discovered by simulating a 512-bit sve2 environment using qemu. >> >> This PR fixes these test failures. > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8369456-select-from-two-vectors-failure > - 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms > > According the AD file, partial cases where `vector_length_in_bytes > 8` > of the vector API `selectFrom` are not supported on the AArch64 SVE2 > platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out > these cases, leading to test faiulres on sve2 plaftforms where > `MaxVectorSize > 16`. > > This test problem was discovered by simulating a 512-bit sve2 > environment using qemu. > > This PR fixes these test failures. Hi, could anyone help take a look at this PR, it's a simple test bug fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27723#issuecomment-3460633323 From epeter at openjdk.org Wed Oct 29 09:52:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 09:52:51 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 09:44:43 GMT, Manuel H?ssig wrote: >> Note: @oliviermattmann found this bug with his whitebox fuzzer. See also https://github.com/openjdk/jdk/pull/27991 >> >> **Analysis** >> We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. >> During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. >> But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. >> >> Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. >> Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. >> >> **Solution** >> We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. 
>> >> But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: >> https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 >> (the idea is to bail out of the elimination if any of the found stores are mismatched.) >> >> **Details** >> >> How the bad sequence develops, and which components are involved. >> >> 1) The `SafePoint` contains a `ConL` and 3 `ConI`. (Correct would have been 4 `ConI`) >> >> 6 ConI === 23 [[ 4 ]] #int:16777216 >> 7 ConI === 23 [[ 4 ]] #int:256 >> 8 ConI === 23 [[ 4 ]] #int:1048576 >> 9 ConL === 23 [[ 4 ]] #long:68719476737 >> 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] >> 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) >> >> >> 2) This is then encoded into an `ObjectValue`. A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: >> https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 >> If I understand it right, there zero is just a placeholder. >> >> And so we get: >> >> (rr) p sv->print_fields_on(tty) >> Fields: 0, 68719476737, 1048576, 256, 167772... > > Thank you for fixing this and providing such a detailed explanation in the reproducer! > > If I understand correctly, this fix prevents mismatched accesses to occur when rematerializing rather than fixing rematerialization to handle mismatched accesses and the assert is to find cases that still trigger the incorrect corner case of rematerialization? > This begs the naive question: How much harder is it to fix rematerialization to handle mismatched accesses? @mhaessig Yes, correct. This is just a fix, so bailout is better than adding more capabilities. We can do that in a future RFE. I think we should be able to handle some mismatched cases. At least those where we have "reasonable alignment": - `int[]` where we have a `mismatched StoreL` that covers 2 int elements exactly -> low hanging fruit. - `int[]` with a `mismatched StoreB` -> tricky, because we would need to somehow splice together values. I fear there will be a lot of edge-cases quickly, and a lot of potential to get wrong executions if we get it wrong ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27997#issuecomment-3460642090 From haosun at openjdk.org Wed Oct 29 09:56:52 2025 From: haosun at openjdk.org (Hao Sun) Date: Wed, 29 Oct 2025 09:56:52 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v2] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 06:24:46 GMT, erifan wrote: >> According the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test faiulres on sve2 plaftforms where `MaxVectorSize > 16`. >> >> This test problem was discovered by simulating a 512-bit sve2 environment using qemu. >> >> This PR fixes these test failures. > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8369456-select-from-two-vectors-failure > - 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms > > According the AD file, partial cases where `vector_length_in_bytes > 8` > of the vector API `selectFrom` are not supported on the AArch64 SVE2 > platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out > these cases, leading to test faiulres on sve2 plaftforms where > `MaxVectorSize > 16`. > > This test problem was discovered by simulating a 512-bit sve2 > environment using qemu. > > This PR fixes these test failures. LGTM. This patch is reviewed and tested internally. ------------- Marked as reviewed by haosun (Committer). PR Review: https://git.openjdk.org/jdk/pull/27723#pullrequestreview-3392486550 From epeter at openjdk.org Wed Oct 29 10:01:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 10:01:13 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 08:25:18 GMT, Emanuel Peter wrote: >> Note: @oliviermattmann found this bug with his whitebox fuzzer. See also https://github.com/openjdk/jdk/pull/27991 >> >> **Analysis** >> We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. >> During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. >> But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. >> >> Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. >> Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. >> >> **Solution** >> We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. >> >> But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: >> https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 >> (the idea is to bail out of the elimination if any of the found stores are mismatched.) >> >> **Details** >> >> How the bad sequence develops, and which components are involved. >> >> 1) The `SafePoint` contains a `ConL` and 3 `ConI`. (Correct would have been 4 `ConI`) >> >> 6 ConI === 23 [[ 4 ]] #int:16777216 >> 7 ConI === 23 [[ 4 ]] #int:256 >> 8 ConI === 23 [[ 4 ]] #int:1048576 >> 9 ConL === 23 [[ 4 ]] #long:68719476737 >> 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] >> 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) >> >> >> 2) This is then encoded into an `ObjectValue`. 
A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: >> https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 >> If I understand it right, there zero is just a placeholder. >> >> And so we get: >> >> (rr) p sv->print_fields_on(tty) >> Fields: 0, 68719476737, 1048576, 256, 167772... > > src/hotspot/share/opto/macro.cpp line 874: > >> 872: assert(false, "field_val does not fit field_type"); >> 873: } >> 874: #endif > > I'm not yet happy with this assert. It is not super easy to get it right, but currently it is a bit weak. > > Do reviewers have any good ideas here? I can make the assert strong for primitive types, and that is where the bug happened. I tried to make it work for pointers too, but I got this example: value_type: java/lang/Object:NotNull * field_type: java/lang/Object (java/util/Enumeration) * It happens in `ClassLoader.getResources` (about line 1445). We seem to get back the `value_type` from a nested call to `parent.getResources(name);`, but we don't seem to capture that this has the `Enumeration` interface. But the `field_type` (store to `tmp[0]`) knows about that, and so it is a "narrower" type. Is this expected? - If yes: can I even write an assert here? - If no: is this something we need/should fix? 1436 public Enumeration getResources(String name) throws IOException { 1437 Objects.requireNonNull(name); 1438 @SuppressWarnings("unchecked") 1439 Enumeration[] tmp = (Enumeration[]) new Enumeration[2]; 1440 if (parent != null) { 1441 tmp[0] = parent.getResources(name); <------ look here 1442 } else { 1443 tmp[0] = BootLoader.findResources(name); 1444 } 1445 tmp[1] = findResources(name); 1446 1447 return new CompoundEnumeration<>(tmp); 1448 } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2472374332 From epeter at openjdk.org Wed Oct 29 10:09:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 10:09:37 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v2] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 06:24:46 GMT, erifan wrote: >> According the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test faiulres on sve2 plaftforms where `MaxVectorSize > 16`. >> >> This test problem was discovered by simulating a 512-bit sve2 environment using qemu. >> >> This PR fixes these test failures. > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8369456-select-from-two-vectors-failure > - 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms > > According the AD file, partial cases where `vector_length_in_bytes > 8` > of the vector API `selectFrom` are not supported on the AArch64 SVE2 > platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out > these cases, leading to test faiulres on sve2 plaftforms where > `MaxVectorSize > 16`. > > This test problem was discovered by simulating a 512-bit sve2 > environment using qemu. > > This PR fixes these test failures. Drive-by comment. 
test/hotspot/jtreg/compiler/vectorapi/TestSelectFromTwoVectorOp.java line 221: > 219: @IR(counts = {IRNode.SELECT_FROM_TWO_VECTOR_VB, IRNode.VECTOR_SIZE_64, ">0"}, > 220: applyIfCPUFeature = {"sve2", "true"}, > 221: applyIf = {"MaxVectorSize", "64"}) Would it make sense to add some IR rule for cases with `MaxVectorSize > 64`? Because now you just weakened the test, rather than ensuring that there is a test for larger sizes. ------------- PR Review: https://git.openjdk.org/jdk/pull/27723#pullrequestreview-3392525780 PR Review Comment: https://git.openjdk.org/jdk/pull/27723#discussion_r2472393201 From epeter at openjdk.org Wed Oct 29 10:09:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 10:09:38 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v2] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 10:05:05 GMT, Emanuel Peter wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8369456-select-from-two-vectors-failure >> - 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms >> >> According the AD file, partial cases where `vector_length_in_bytes > 8` >> of the vector API `selectFrom` are not supported on the AArch64 SVE2 >> platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out >> these cases, leading to test faiulres on sve2 plaftforms where >> `MaxVectorSize > 16`. >> >> This test problem was discovered by simulating a 512-bit sve2 >> environment using qemu. >> >> This PR fixes these test failures. > > test/hotspot/jtreg/compiler/vectorapi/TestSelectFromTwoVectorOp.java line 221: > >> 219: @IR(counts = {IRNode.SELECT_FROM_TWO_VECTOR_VB, IRNode.VECTOR_SIZE_64, ">0"}, >> 220: applyIfCPUFeature = {"sve2", "true"}, >> 221: applyIf = {"MaxVectorSize", "64"}) > > Would it make sense to add some IR rule for cases with `MaxVectorSize > 64`? Because now you just weakened the test, rather than ensuring that there is a test for larger sizes. Maybe it would be enough to just remove the `, IRNode.VECTOR_SIZE_64`, so that the test could check for the largest vector length available on the platform? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27723#discussion_r2472395771 From eastigeevich at openjdk.org Wed Oct 29 10:11:41 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 29 Oct 2025 10:11:41 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v5] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 01:26:43 GMT, Chad Rakoczy wrote: >> [JDK-8370527](https://bugs.openjdk.org/browse/JDK-8370527) >> >> [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced an `immutable_data_references_counter` which keeps track of the number of nmethods using the immutable data so it can be shared between relocated nmethods. The old code reads the counter, decrements the counter, and then checks the first read to see if it is zero. Since the check is performed on the initial read it will never be zero which causes immutable data to never be freed. 
> > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add include to fix build issue lgtm ------------- Marked as reviewed by eastigeevich (Committer). PR Review: https://git.openjdk.org/jdk/pull/28008#pullrequestreview-3392537437 From rcastanedalo at openjdk.org Wed Oct 29 10:11:48 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 29 Oct 2025 10:11:48 GMT Subject: RFR: 8370853: IGV: SEGV in IdealGraphPrinter::print after JDK-8370569 In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 09:37:55 GMT, Anton Seoane Ampudia wrote: > Looks good to me too! Thanks for reviewing, Ant?n. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28040#issuecomment-3460708904 From xgong at openjdk.org Wed Oct 29 10:13:03 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 29 Oct 2025 10:13:03 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v5] In-Reply-To: References: Message-ID: > The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. > > For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. > > These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. > > This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. > > It also modifies the Vector API jtreg tests for well testing. Here is the details: > > 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: > > VectorMaskToLong (VectorLongToMask l) => l > > Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. > > 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". > > Performance shows significant improvement on NVIDIA's Grace CPU. > > Here is the performance data with `-XX:UseSVE=2`: > > Benchmark bits inputs Mode Unit Before After Gain > MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 > MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 > MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 > MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 > MaskQueryOperationsBenchmark.testToLongInt 128 2 thrpt ops/ms 101031... 
Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Update comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27481/files - new: https://git.openjdk.org/jdk/pull/27481/files/3a40fc2a..40c2df04 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27481&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27481&range=03-04 Stats: 26 lines in 3 files changed: 1 ins; 2 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/27481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27481/head:pull/27481 PR: https://git.openjdk.org/jdk/pull/27481 From xgong at openjdk.org Wed Oct 29 10:19:02 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 29 Oct 2025 10:19:02 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Tue, 28 Oct 2025 10:27:39 GMT, Emanuel Peter wrote: >> Well, we went with `prefers` because you said that on `aarch64` both are implemented, see our conversation above. So we are now spinning in circles. >> >> I would approach it like this: >> Write down what it means if the method returns true, and what it means if it returns false. Make sure to use `requires`, if anything else is not permitted/implemented. Use `prefers` if both are permitted/implemented, but one is preferred. > > Another idea: use a return `Enum`. Then you can give things names, which can sometimes be more helpful than `true/false`. Hi @eme64 , I updated a commit which mainly changes the comments. The function name `mask_op_prefers_predicate` remains unchanged. After giving it careful thought overnight, I believe this name is more accurate. I?m sorry if my earlier explanation caused any confusion. Would you mind checking whether it's fine to you? Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2472425982 From thartmann at openjdk.org Wed Oct 29 10:22:10 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 29 Oct 2025 10:22:10 GMT Subject: RFR: 8370853: IGV: SEGV in IdealGraphPrinter::print after JDK-8370569 In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 09:12:16 GMT, Roberto Casta?eda Lozano wrote: > This (trivial?) changeset initializes `IdealGraphPrinter::_parse` to `nullptr` on construction. This prevents segmentation faults when this field is accessed in: https://github.com/openjdk/jdk/blob/20bcf0eddaee0a57142bcc614cc5415b53c16460/src/hotspot/share/opto/idealGraphPrinter.cpp#L1007-L1017 > This failure is triggered when IGV graph dumping is run with a print level lower than 6 on platforms where `IdealGraphPrinter::_parse` is not implicitly initialized to `nullptr`. On print level 6, `IdealGraphPrinter::_parse` is initialized by https://github.com/openjdk/jdk/blob/5a2b0ca7fea7d1a283aa90696c3989ae189148ec/src/hotspot/share/opto/parse2.cpp#L2785 before it is accessed. > > **Testing:** tier1-3 (including the affected test `TestVectorInsertByte.java` on the affected platform `macosx-aarch64-debug`). Good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). 
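A sketch of what such a fix looks like, with a simplified constructor and print method; the real signatures and member lists are longer, so this is only the assumed shape, not the actual patch:

```c++
// Give _parse a defined value on construction instead of relying on the
// allocated memory happening to be zeroed, and guard its use.
IdealGraphPrinter::IdealGraphPrinter()
  : _parse(nullptr) {
  // ...
}

void IdealGraphPrinter::print(/* ... */) {
  if (_parse != nullptr) {
    // dump parse-time (bytecode) information only when a Parse phase was registered
  }
}
```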
PR Review: https://git.openjdk.org/jdk/pull/28040#pullrequestreview-3392580597 From thartmann at openjdk.org Wed Oct 29 10:25:21 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 29 Oct 2025 10:25:21 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 10:40:18 GMT, Emanuel Peter wrote: > Note: @oliviermattmann found this bug with his whitebox fuzzer. See also https://github.com/openjdk/jdk/pull/27991 > > **Analysis** > We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. > During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. > But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. > > Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. > Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. > > **Solution** > We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. > > But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: > https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 > (the idea is to bail out of the elimination if any of the found stores are mismatched.) > > **Details** > > How the bad sequence develops, and which components are involved. > > 1) The `SafePoint` contains a `ConL` and 3 `ConI`. (Correct would have been 4 `ConI`) > > 6 ConI === 23 [[ 4 ]] #int:16777216 > 7 ConI === 23 [[ 4 ]] #int:256 > 8 ConI === 23 [[ 4 ]] #int:1048576 > 9 ConL === 23 [[ 4 ]] #long:68719476737 > 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] > 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) > > > 2) This is then encoded into an `ObjectValue`. A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: > https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 > If I understand it right, there zero is just a placeholder. > > And so we get: > > (rr) p sv->print_fields_on(tty) > Fields: 0, 68719476737, 1048576, 256, 16777216 > > We can see the `zero`, followed by the `ConL`, and then 3 `ConI`. > > This se... src/hotspot/share/runtime/deoptimization.cpp line 1393: > 1391: tty->print_cr("Deopt rematerialization found [int, long] in a int/flat array."); > 1392: sv->print_fields_on(tty); > 1393: assert(false, "never hit this case in testing, seems to be a strange case"); Looks like this code came from https://openjdk.org/jeps/243, so it's worth double-checking with the Graal team if it's still needed. 
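The fix direction named in the description above (bail out of allocation elimination when a captured store is mismatched) could look roughly like this; a sketch only, the exact location and surrounding bookkeeping in macro.cpp may differ:

```c++
// Sketch: while walking the stores that capture field values of a candidate
// allocation, give up if any store is mismatched, e.g. a StoreL that
// MergeStores formed out of two adjacent int stores.
if (mem->is_Store() && mem->as_Store()->is_mismatched_access()) {
  return false;   // keep the allocation; scalarizing it field-by-field would be wrong
}
```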
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2472443012 From epeter at openjdk.org Wed Oct 29 10:30:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 10:30:34 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 10:22:17 GMT, Tobias Hartmann wrote: >> Note: @oliviermattmann found this bug with his whitebox fuzzer. See also https://github.com/openjdk/jdk/pull/27991 >> >> **Analysis** >> We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. >> During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. >> But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. >> >> Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. >> Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. >> >> **Solution** >> We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. >> >> But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: >> https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 >> (the idea is to bail out of the elimination if any of the found stores are mismatched.) >> >> **Details** >> >> How the bad sequence develops, and which components are involved. >> >> 1) The `SafePoint` contains a `ConL` and 3 `ConI`. (Correct would have been 4 `ConI`) >> >> 6 ConI === 23 [[ 4 ]] #int:16777216 >> 7 ConI === 23 [[ 4 ]] #int:256 >> 8 ConI === 23 [[ 4 ]] #int:1048576 >> 9 ConL === 23 [[ 4 ]] #long:68719476737 >> 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] >> 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) >> >> >> 2) This is then encoded into an `ObjectValue`. A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: >> https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 >> If I understand it right, there zero is just a placeholder. >> >> And so we get: >> >> (rr) p sv->print_fields_on(tty) >> Fields: 0, 68719476737, 1048576, 256, 167772... > > src/hotspot/share/runtime/deoptimization.cpp line 1393: > >> 1391: tty->print_cr("Deopt rematerialization found [int, long] in a int/flat array."); >> 1392: sv->print_fields_on(tty); >> 1393: assert(false, "never hit this case in testing, seems to be a strange case"); > > Looks like this code came from https://openjdk.org/jeps/243, so it's worth double-checking with the Graal team if it's still needed. Right. I though I would just put an assert here, and if none of our testing fails, we can eventually remove it. 
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2472457736

From rcastanedalo at openjdk.org Wed Oct 29 10:52:34 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 29 Oct 2025 10:52:34 GMT
Subject: RFR: 8370853: IGV: SEGV in IdealGraphPrinter::print after JDK-8370569
In-Reply-To: 
References: 
Message-ID: 

On Wed, 29 Oct 2025 10:19:41 GMT, Tobias Hartmann wrote:

> Good and trivial.

Thanks, Tobias!

-------------
PR Comment: https://git.openjdk.org/jdk/pull/28040#issuecomment-3460880431

From rcastanedalo at openjdk.org Wed Oct 29 10:52:35 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 29 Oct 2025 10:52:35 GMT
Subject: Integrated: 8370853: IGV: SEGV in IdealGraphPrinter::print after JDK-8370569
In-Reply-To: 
References: 
Message-ID: 

On Wed, 29 Oct 2025 09:12:16 GMT, Roberto Castañeda Lozano wrote:

> This (trivial?) changeset initializes `IdealGraphPrinter::_parse` to `nullptr` on construction. This prevents segmentation faults when this field is accessed in: https://github.com/openjdk/jdk/blob/20bcf0eddaee0a57142bcc614cc5415b53c16460/src/hotspot/share/opto/idealGraphPrinter.cpp#L1007-L1017
> This failure is triggered when IGV graph dumping is run with a print level lower than 6 on platforms where `IdealGraphPrinter::_parse` is not implicitly initialized to `nullptr`. On print level 6, `IdealGraphPrinter::_parse` is initialized by https://github.com/openjdk/jdk/blob/5a2b0ca7fea7d1a283aa90696c3989ae189148ec/src/hotspot/share/opto/parse2.cpp#L2785 before it is accessed.
>
> **Testing:** tier1-3 (including the affected test `TestVectorInsertByte.java` on the affected platform `macosx-aarch64-debug`).

This pull request has now been integrated.

Changeset: 05ef8f46
Author: Roberto Castañeda Lozano
URL: https://git.openjdk.org/jdk/commit/05ef8f4611fb9908f40ed8944da3429acdf82ef5
Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod

8370853: IGV: SEGV in IdealGraphPrinter::print after JDK-8370569

Reviewed-by: dfenacci, aseoane, thartmann

-------------
PR: https://git.openjdk.org/jdk/pull/28040

From fyang at openjdk.org Wed Oct 29 10:56:32 2025
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 29 Oct 2025 10:56:32 GMT
Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v5]
In-Reply-To: 
References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com>
Message-ID: 

On Wed, 29 Oct 2025 09:48:16 GMT, Robbin Ehn wrote:

>> Hi, please consider.
>>
>> Sanity tested and no issues with MAJIK t1 (with +VSC).
>>
>> Thanks, Robbin
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>
>  - Merge branch 'master' into vsc
>  - Forgot fix format for VSAC
>  - Fixed format
>  - Label name
>  - li->mv, format, space
>  - Draft

Thanks!
PR Review: https://git.openjdk.org/jdk/pull/28005#pullrequestreview-3392721834 From qamai at openjdk.org Wed Oct 29 12:39:54 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 29 Oct 2025 12:39:54 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v8] In-Reply-To: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> References: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> Message-ID: <0P3YPFcgQ47dmo5VIsAoXRn8ZKoWK0SpMD33Gr9I6Ro=.0a0ebe06-e76a-431d-9296-9df9c0f38d30@github.com> On Mon, 27 Oct 2025 07:32:43 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Removing redundant interferecne check from biasing Thanks for working on this, I think this patch looks good. src/hotspot/share/opto/chaitin.cpp line 1497: > 1495: tempmask.clear_to_sets(lrg.num_regs()); > 1496: OptoReg::Name reg = find_first_set(lrg, tempmask); > 1497: if (OptoReg::is_valid(reg)) Style: should have `{ }` for this `if`. ------------- Marked as reviewed by qamai (Committer). 
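For reference, the quoted snippet with the braces added; the body of the `if` is not shown in the review, so it is elided here:

```c++
OptoReg::Name reg = find_first_set(lrg, tempmask);
if (OptoReg::is_valid(reg)) {
  // ... original single-statement body
}
```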
PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3393116699 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2472850708 From epeter at openjdk.org Wed Oct 29 12:48:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 12:48:15 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 10:27:26 GMT, Emanuel Peter wrote: >> src/hotspot/share/runtime/deoptimization.cpp line 1393: >> >>> 1391: tty->print_cr("Deopt rematerialization found [int, long] in a int/flat array."); >>> 1392: sv->print_fields_on(tty); >>> 1393: assert(false, "never hit this case in testing, seems to be a strange case"); >> >> Looks like this code came from https://openjdk.org/jeps/243, so it's worth double-checking with the Graal team if it's still needed. > > Right. I though I would just put an assert here, and if none of our testing fails, we can eventually remove it. @dougxc It seems in all our testing this assert does not fail. Do you think it may impact Graal? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2472885290 From epeter at openjdk.org Wed Oct 29 13:45:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 13:45:15 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v8] In-Reply-To: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> References: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> Message-ID: <6vTvleAaqmzZLssla5DZcGE9B6CP9IdW4iAe9ea82vQ=.5eedfc50-9748-4f8e-804a-d30104f3087b@github.com> On Mon, 27 Oct 2025 07:32:43 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. 
>> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Removing redundant interferecne check from biasing src/hotspot/share/opto/chaitin.cpp line 1668: > 1666: > 1667: auto is_commutative_oper = [](MachNode* def) { > 1668: if (def) { Fly-by comment: Suggestion: if (def != nullptr) { Hotspot style guide does not allow implicit null check ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2473146806 From dlunden at openjdk.org Wed Oct 29 14:11:14 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 29 Oct 2025 14:11:14 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v8] In-Reply-To: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> References: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> Message-ID: On Mon, 27 Oct 2025 07:32:43 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. 
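The same lambda with the explicit comparison the style guide asks for; the body is elided because it is not part of the quoted snippet:

```c++
auto is_commutative_oper = [](MachNode* def) {
  if (def != nullptr) {
    // ...
  }
  // ...
};
```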
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Removing redundant interferecne check from biasing I'd like to have a look at this as well before it is integrated! Reviewing now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3461738432 From dnsimon at openjdk.org Wed Oct 29 14:27:24 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 29 Oct 2025 14:27:24 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 12:45:54 GMT, Emanuel Peter wrote: >> Right. I though I would just put an assert here, and if none of our testing fails, we can eventually remove it. > > @dougxc It seems in all our testing this assert does not fail. Do you think it may impact Graal? This is to support Truffle where `long` and `double` fields can be encoded in `int[]` arrays. It's a bit like https://bugs.openjdk.org/browse/JDK-8231756 where fields are encoded in `byte[]` arrays. @tkrodriguez or @woess can you please confirm we still need this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2473429050 From epeter at openjdk.org Wed Oct 29 14:33:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 14:33:01 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 14:24:51 GMT, Doug Simon wrote: >> @dougxc It seems in all our testing this assert does not fail. Do you think it may impact Graal? > > This is to support Truffle where `long` and `double` fields can be encoded in `int[]` arrays. It's a bit like https://bugs.openjdk.org/browse/JDK-8231756 where fields are encoded in `byte[]` arrays. @tkrodriguez or @woess can you please confirm we still need this. @dougxc @tkrodriguez @woess Can we guard some of the logic in `#if INCLUDE_JVMCI` though? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2473463243 From epeter at openjdk.org Wed Oct 29 14:38:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 14:38:14 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 09:58:14 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/macro.cpp line 874: >> >>> 872: assert(false, "field_val does not fit field_type"); >>> 873: } >>> 874: #endif >> >> I'm not yet happy with this assert. It is not super easy to get it right, but currently it is a bit weak. >> >> Do reviewers have any good ideas here? > > I can make the assert strong for primitive types, and that is where the bug happened. > > I tried to make it work for pointers too, but I got this example: > > value_type: java/lang/Object:NotNull * > field_type: java/lang/Object (java/util/Enumeration) * > > It happens in `ClassLoader.getResources` (about line 1445). > We seem to get back the `value_type = field_val->bottom_type()` from a nested call to `parent.getResources(name);`, but we don't seem to capture that this has the `Enumeration` interface. > But the `field_type` (store to `tmp[0]`) knows about that, and so it is a "narrower" type. > Is this expected? > - If yes: can I even write an assert here? > - If no: is this something we need/should fix? 
> > > 1436 public Enumeration getResources(String name) throws IOException { > 1437 Objects.requireNonNull(name); > 1438 @SuppressWarnings("unchecked") > 1439 Enumeration[] tmp = (Enumeration[]) new Enumeration[2]; > 1440 if (parent != null) { > 1441 tmp[0] = parent.getResources(name); <------ look here > 1442 } else { > 1443 tmp[0] = BootLoader.findResources(name); > 1444 } > 1445 tmp[1] = findResources(name); > 1446 > 1447 return new CompoundEnumeration<>(tmp); > 1448 } @TobiHartmann gave me this: https://github.com/openjdk/jdk/pull/10901 https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2019-May/033803.html I need to study it more, but the idea is that interfaces cannot be trusted, and I could just try to ignore the interface part of the type. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2473497571 From qamai at openjdk.org Wed Oct 29 14:41:26 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 29 Oct 2025 14:41:26 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 14:36:00 GMT, Emanuel Peter wrote: >> I can make the assert strong for primitive types, and that is where the bug happened. >> >> I tried to make it work for pointers too, but I got this example: >> >> value_type: java/lang/Object:NotNull * >> field_type: java/lang/Object (java/util/Enumeration) * >> >> It happens in `ClassLoader.getResources` (about line 1445). >> We seem to get back the `value_type = field_val->bottom_type()` from a nested call to `parent.getResources(name);`, but we don't seem to capture that this has the `Enumeration` interface. >> But the `field_type` (store to `tmp[0]`) knows about that, and so it is a "narrower" type. >> Is this expected? >> - If yes: can I even write an assert here? >> - If no: is this something we need/should fix? >> >> >> 1436 public Enumeration getResources(String name) throws IOException { >> 1437 Objects.requireNonNull(name); >> 1438 @SuppressWarnings("unchecked") >> 1439 Enumeration[] tmp = (Enumeration[]) new Enumeration[2]; >> 1440 if (parent != null) { >> 1441 tmp[0] = parent.getResources(name); <------ look here >> 1442 } else { >> 1443 tmp[0] = BootLoader.findResources(name); >> 1444 } >> 1445 tmp[1] = findResources(name); >> 1446 >> 1447 return new CompoundEnumeration<>(tmp); >> 1448 } > > @TobiHartmann gave me this: > https://github.com/openjdk/jdk/pull/10901 > https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2019-May/033803.html > I need to study it more, but the idea is that interfaces cannot be trusted, and I could just try to ignore the interface part of the type. It seems that we don't trust the return type of a method call (look at `TypeFunc::make(ciMethod*)` where we pass `ignore_interfaces` into `TypeTuple::make_domain` and `TypeTuple::make_range`). I don't understand why, though, interfaces are weird. However, `tmp` is trusted because it is created with `anewarray`. In `Parse::do_anewarray`, we pass `trusted_interfaces` to `TypeKlassPtr::make`. The question is, why are we storing an `Object` into an array of `Enumeration`s? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2473513179 From tonyp at openjdk.org Wed Oct 29 14:52:46 2025 From: tonyp at openjdk.org (Antonios Printezis) Date: Wed, 29 Oct 2025 14:52:46 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v8] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 09:56:54 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? @eme64 >> >> ## Issue >> >> Currently, in SLP when transform from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned-ness in CmpU is lost, then end up doing a signed instead of unsigned comparison in VectorMaskCmp. >> For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. >> >> ## ?Fix >> Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. >> >> Thanks >> >> This pr could also lead to more optimizations, like: https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. > > Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/superword.cpp > > Co-authored-by: Emanuel Peter > - Update src/hotspot/share/opto/superword.cpp > > Co-authored-by: Emanuel Peter LGTM ------------- Marked as reviewed by tonyp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27942#pullrequestreview-3394022496 From epeter at openjdk.org Wed Oct 29 14:53:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 14:53:47 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 14:38:23 GMT, Quan Anh Mai wrote: >> @TobiHartmann gave me this: >> https://github.com/openjdk/jdk/pull/10901 >> https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2019-May/033803.html >> I need to study it more, but the idea is that interfaces cannot be trusted, and I could just try to ignore the interface part of the type. > > It seems that we don't trust the return type of a method call (look at `TypeFunc::make(ciMethod*)` where we pass `ignore_interfaces` into `TypeTuple::make_domain` and `TypeTuple::make_range`). I don't understand why, though, interfaces are weird. > > However, `tmp` is trusted because it is created with `anewarray`. In `Parse::do_anewarray`, we pass `trusted_interfaces` to `TypeKlassPtr::make`. > > The question is, why are we storing an `Object` into an array of `Enumeration`s? Right, I follow you all the way up to your question. I would answer like this: `tmp` knows that its elements (the fields) have type `java/lang/Object (java/util/Enumeration)`, so they must be `Object` of interface `Enumeration`. But the projection from the call does not trust the interface, and so it just knows that it produces an `Object`. I'll try to strip interface information from both, and see if I get a match that way. Does that sound reasonable? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2473586053 From mli at openjdk.org Wed Oct 29 14:55:52 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 29 Oct 2025 14:55:52 GMT Subject: RFR: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP [v8] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 14:49:50 GMT, Antonios Printezis wrote: > LGTM @gctony Thank you for reviewing! :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27942#issuecomment-3462017836 From qamai at openjdk.org Wed Oct 29 14:58:59 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 29 Oct 2025 14:58:59 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 14:50:48 GMT, Emanuel Peter wrote: >> It seems that we don't trust the return type of a method call (look at `TypeFunc::make(ciMethod*)` where we pass `ignore_interfaces` into `TypeTuple::make_domain` and `TypeTuple::make_range`). I don't understand why, though, interfaces are weird. >> >> However, `tmp` is trusted because it is created with `anewarray`. In `Parse::do_anewarray`, we pass `trusted_interfaces` to `TypeKlassPtr::make`. >> >> The question is, why are we storing an `Object` into an array of `Enumeration`s? > > Right, I follow you all the way up to your question. I would answer like this: > > `tmp` knows that its elements (the fields) have type `java/lang/Object (java/util/Enumeration)`, so they must be `Object` of interface `Enumeration`. But the projection from the call does not trust the interface, and so it just knows that it produces an `Object`. > > I'll try to strip interface information from both, and see if I get a match that way. Does that sound reasonable? @eme64 But `aastore` does a check cast before the store. Since `Object` is not a subtype of `Object(Enumeration)`, should we do a check cast and the value to store is a `CheckCastPP` with the appropriate type? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2473619411 From lmesnik at openjdk.org Wed Oct 29 14:59:01 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 29 Oct 2025 14:59:01 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v2] In-Reply-To: References: Message-ID: <4J2beErwndG7oZtRcQ1t6Fmpd-kATlHH-S9KnSEluSA=.2b865c8c-893f-4588-b821-7049efb01328@github.com> On Wed, 29 Oct 2025 00:49:06 GMT, Chad Rakoczy wrote: >> Ah, so _there_! I confused myself. This one is readable: the counter `0` means we can free. It would be even better if you did `inc_immutable_data_refcount()` and `dec_immutable_data_refcount()`, and did e.g.: >> >> >> if (dec_immutable_data_refcount() == 0) { >> os::free(_immutable_data); >> } >> >> int dec_immutable_data_refcount() { >> int refcount = get(...); >> assert(refcount > 0, "Must be positive"); >> set(refcount - 1); >> return refcount - 1; >> } >> >> >> Because the next thing you know this would need to be replaced with Atomics a year later. > >> Ah, so _there_! I confused myself. This one is readable: the counter `0` means we can free. 
It would be even better if you did `inc_immutable_data_refcount()` and `dec_immutable_data_refcount()`, and did e.g.: >> >> ``` >> if (dec_immutable_data_refcount() == 0) { >> os::free(_immutable_data); >> } >> >> int dec_immutable_data_refcount() { >> int refcount = get(...); >> assert(refcount > 0, "Must be positive"); >> set(refcount - 1); >> return refcount - 1; >> } >> ``` >> >> Because the next thing you know this would need to be replaced with Atomics a year later. > > I agree this makes the code cleaner. > > I replaced the getter and setter for the counter with `init_immutable_data_ref_count`, `inc_immutable_data_ref_count`, and `dec_immutable_data_ref_count`. I also shortened the counter name from `immutable_data_references_counter` to `immutable_data_ref_count` > > I modified `NMethod.java` to calculate the offsets that same way as is done in the JVM. I missed this in [JDK-8369642](https://bugs.openjdk.org/browse/JDK-8369642) > > The last notable change is that I modified the [immutable data size calculation](https://github.com/chadrako/jdk/blob/26bdc3ceb4ab9ad9cb9a4218bb87ce2d7546fa22/src/hotspot/share/code/nmethod.cpp#L1155) to only include a reference counter if there is immutable data @chadrako The testing pass now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28008#issuecomment-3462037452 From epeter at openjdk.org Wed Oct 29 15:03:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 15:03:34 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 14:56:17 GMT, Quan Anh Mai wrote: >> Right, I follow you all the way up to your question. I would answer like this: >> >> `tmp` knows that its elements (the fields) have type `java/lang/Object (java/util/Enumeration)`, so they must be `Object` of interface `Enumeration`. But the projection from the call does not trust the interface, and so it just knows that it produces an `Object`. >> >> I'll try to strip interface information from both, and see if I get a match that way. Does that sound reasonable? > > @eme64 But `aastore` does a check cast before the store. Since `Object` is not a subtype of `Object(Enumeration)`, should we do a check cast and the value to store is a `CheckCastPP` with the appropriate type? I'm not following. Can you spell it out with a bit more detail? I'm not very familiar with how we deal with oops and interfaces in general, so I still need to read up a bit more now. Just to make clear: we are not talking about a regular store here, but rather capturing the value that would be stored, and instead pass it to the deopt SafePoint. But you probably are aware of that ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2473645592 From mli at openjdk.org Wed Oct 29 15:03:48 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 29 Oct 2025 15:03:48 GMT Subject: Integrated: 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP In-Reply-To: References: Message-ID: On Wed, 22 Oct 2025 20:48:17 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? @eme64 > > ## Issue > > Currently, in SLP when transform from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), the unsigned-ness in CmpU is lost, then end up doing a signed instead of unsigned comparison in VectorMaskCmp. > For details please check code at `SuperWordVTransformBuilder::make_vector_vtnode_for_pack` and `PackSet::get_bool_test`. 
> > ## ?Fix > Currently, `BoolTest` does not support an unsigned construction (`BoolTest( mask btm ) : _test(btm) { assert((btm & unsigned_compare) == 0, "unsupported");}`), seems to me a feasible solution would be get the unsigned information from CmpU (which could be an input of Bool) and pass it to VectorMaskCmp. > > Thanks > > This pr could also lead to more optimizations, like: https://github.com/openjdk/jdk/pull/25336 and https://github.com/openjdk/jdk/pull/25341. This pull request has now been integrated. Changeset: eab5644a Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/eab5644a96e20409f31622d2e6c33636a7a49768 Stats: 782 lines in 3 files changed: 779 ins; 0 del; 3 mod 8370481: C2 SuperWord: Long/Integer.compareUnsigned return wrong value in SLP Reviewed-by: epeter, tonyp ------------- PR: https://git.openjdk.org/jdk/pull/27942 From epeter at openjdk.org Wed Oct 29 15:06:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 15:06:46 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 15:00:54 GMT, Emanuel Peter wrote: >> @eme64 But `aastore` does a check cast before the store. Since `Object` is not a subtype of `Object(Enumeration)`, should we do a check cast and the value to store is a `CheckCastPP` with the appropriate type? > > I'm not following. Can you spell it out with a bit more detail? I'm not very familiar with how we deal with oops and interfaces in general, so I still need to read up a bit more now. > > Just to make clear: we are not talking about a regular store here, but rather capturing the value that would be stored, and instead pass it to the deopt SafePoint. But you probably are aware of that ;) @rwestrel Do you have an idea how to strip away the interface information? Or would you follow another idea? I can also take the simple route here, and for now only assert for primitive types. Because the bug comes from MergeStores, and that only works for primitive types. And then I can file a follow-up RFE for someone to strengthen the assert, and possibly fix up other things. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2473661647 From qamai at openjdk.org Wed Oct 29 15:36:17 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 29 Oct 2025 15:36:17 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 15:03:40 GMT, Emanuel Peter wrote: >> I'm not following. Can you spell it out with a bit more detail? I'm not very familiar with how we deal with oops and interfaces in general, so I still need to read up a bit more now. >> >> Just to make clear: we are not talking about a regular store here, but rather capturing the value that would be stored, and instead pass it to the deopt SafePoint. But you probably are aware of that ;) > > @rwestrel Do you have an idea how to strip away the interface information? Or would you follow another idea? > > I can also take the simple route here, and for now only assert for primitive types. Because the bug comes from MergeStores, and that only works for primitive types. And then I can file a follow-up RFE for someone to strengthen the assert, and possibly fix up other things. A `TypeInstPtr` has a `_klass` and an `_interfaces`. 
The `_klass` is a non-interface `ciInstanceKlass` and the `_interfaces` contains the list of interfaces that `TypeInstPtr` must satisfy. For example, a `String` object would have the type (accessed by `PhaseGVN::type`) being `String (all the interfaces that String satisfies transitively)`, an `Enumeration` would have the type being `Object (Enumeration)`, a `List` object would have the type being `Object (List, Iterable, Collection)`. A `TypeInstPtr` is a subtype of another `TypeInstPtr` iff the `_klass` of the first one is a subtype of the `_klass` of the second one, and the `_interfaces` of the first `TypeInstPtr` is a superset of the `_interfaces` of the second `TypeInstPtr`. The logic is very convoluted because the implementation of `join` is very confusing, but that is the spirit. In the type system, the thing that is present is the thing that is trusted. As a result, because we don't trust the return type of `parent.getResources(name)`, we strip the interface part and the type of the result is `Object`. In contrast, we trust the type of a value returned by `anewarray`, so the type of `tmp` is an array of `Object (Enumeration)`. When executing the `aastore` bytecode. The compiler first does `Parse::array_store_check`, this, in turns, calls `GraphKit::gen_checkcast`. It can be seen that `obj` is of type `Object` while `a_e_klass` is the klass pointer of `Object (Enumeration)`. Since `Object` is not a subtype of `Object (Enumeration)`, we should have generate a check cast, then `obj` would be casted to `Object (Enumeration)` with a `CheckCastPP`. This `CheckCastPP` is then used as the input of the `EncodePNode`, then to the `StoreNNode`. So the type being stored should not be `Object`, but `Object (Enumeration)`. That is the spirit, I wonder what is unexpected here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2473840361 From epeter at openjdk.org Wed Oct 29 15:56:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 15:56:12 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 15:33:05 GMT, Quan Anh Mai wrote: >> @rwestrel Do you have an idea how to strip away the interface information? Or would you follow another idea? >> >> I can also take the simple route here, and for now only assert for primitive types. Because the bug comes from MergeStores, and that only works for primitive types. And then I can file a follow-up RFE for someone to strengthen the assert, and possibly fix up other things. > > A `TypeInstPtr` has a `_klass` and an `_interfaces`. The `_klass` is a non-interface `ciInstanceKlass` and the `_interfaces` contains the list of interfaces that `TypeInstPtr` must satisfy. For example, a `String` object would have the type (accessed by `PhaseGVN::type`) being `String (all the interfaces that String satisfies transitively)`, an `Enumeration` would have the type being `Object (Enumeration)`, a `List` object would have the type being `Object (List, Iterable, Collection)`. > > A `TypeInstPtr` is a subtype of another `TypeInstPtr` iff the `_klass` of the first one is a subtype of the `_klass` of the second one, and the `_interfaces` of the first `TypeInstPtr` is a superset of the `_interfaces` of the second `TypeInstPtr`. The logic is very convoluted because the implementation of `join` is very confusing, but that is the spirit. > > In the type system, the thing that is present is the thing that is trusted. 
As a result, because we don't trust the return type of `parent.getResources(name)`, we strip the interface part and the type of the result is `Object`. In contrast, we trust the type of a value returned by `anewarray`, so the type of `tmp` is an array of `Object (Enumeration)`. > > When executing the `aastore` bytecode. The compiler first does `Parse::array_store_check`, this, in turns, calls `GraphKit::gen_checkcast`. It can be seen that `obj` is of type `Object` while `a_e_klass` is the klass pointer of `Object (Enumeration)`. Since `Object` is not a subtype of `Object (Enumeration)`, we should have generate a check cast, then `obj` would be casted to `Object (Enumeration)` with a `CheckCastPP`. This `CheckCastPP` is then used as the input of the `EncodePNode`, then to the `StoreNNode`. So the type being stored should not be `Object`, but `Object (Enumeration)`. > > That is the spirit, I wonder what is unexpected here. Ah, I see. You are wondering if there should be a `Parse::array_store_check`, why is there not a checkcast so that we cast from `Object` -> `Object (Enumeration)`. Great question. I'll investigate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2473968831 From dlunden at openjdk.org Wed Oct 29 16:08:25 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 29 Oct 2025 16:08:25 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v8] In-Reply-To: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> References: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> Message-ID: On Mon, 27 Oct 2025 07:32:43 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. 
>> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Removing redundant interferecne check from biasing Thanks for working on this @jatin-bhateja! I think the code changes themselves look sound, but I would like a bit more information about the performance and code size improvements. I'm also running some additional testing and benchmarking, and will let you know when I have the results. > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. Can you elaborate on how you measured this improvement? > Thorough validations are underway using the latest Intel Software Development Emulator version 9.58. Great, can you elaborate more on this? What types of validations? Also, here is a patch with some simple style and wording fixes: https://github.com/dlunde/jdk/commit/d2b511804c757c89c5662028ea9e4a9dff43b641. I know you just moved some of the affected code around, but we might as well fix a few style issues while we are at it. ------------- Changes requested by dlunden (Committer). PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3394597821 From epeter at openjdk.org Wed Oct 29 16:23:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 16:23:34 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: References: Message-ID: <_vdMcb8lPgqA7RCDCir9rVLnxHAS7SZF-Wa8HElvzjg=.e6e9e011-d0cc-4588-82f3-c113f7166a3c@github.com> On Wed, 29 Oct 2025 15:53:10 GMT, Emanuel Peter wrote: >> A `TypeInstPtr` has a `_klass` and an `_interfaces`. The `_klass` is a non-interface `ciInstanceKlass` and the `_interfaces` contains the list of interfaces that `TypeInstPtr` must satisfy. For example, a `String` object would have the type (accessed by `PhaseGVN::type`) being `String (all the interfaces that String satisfies transitively)`, an `Enumeration` would have the type being `Object (Enumeration)`, a `List` object would have the type being `Object (List, Iterable, Collection)`. >> >> A `TypeInstPtr` is a subtype of another `TypeInstPtr` iff the `_klass` of the first one is a subtype of the `_klass` of the second one, and the `_interfaces` of the first `TypeInstPtr` is a superset of the `_interfaces` of the second `TypeInstPtr`. The logic is very convoluted because the implementation of `join` is very confusing, but that is the spirit. >> >> In the type system, the thing that is present is the thing that is trusted. As a result, because we don't trust the return type of `parent.getResources(name)`, we strip the interface part and the type of the result is `Object`. In contrast, we trust the type of a value returned by `anewarray`, so the type of `tmp` is an array of `Object (Enumeration)`. >> >> When executing the `aastore` bytecode. The compiler first does `Parse::array_store_check`, this, in turns, calls `GraphKit::gen_checkcast`. It can be seen that `obj` is of type `Object` while `a_e_klass` is the klass pointer of `Object (Enumeration)`. 
Since `Object` is not a subtype of `Object (Enumeration)`, we should have generate a check cast, then `obj` would be casted to `Object (Enumeration)` with a `CheckCastPP`. This `CheckCastPP` is then used as the input of the `EncodePNode`, then to the `StoreNNode`. So the type being stored should not be `Object`, but `Object (Enumeration)`. >> >> That is the spirit, I wonder what is unexpected here. > > Ah, I see. You are wondering if there should be a `Parse::array_store_check`, why is there not a checkcast so that we cast from `Object` -> `Object (Enumeration)`. Great question. I'll investigate. Quick summary of investigation: The `EncodeP` is the `field_val`: (rr) p field_val->dump_bfs(3,0,"#") dist dump --------------------------------------------- 3 303 CallDynamicJava === 203 299 202 8 1 (10 40 1 1 90 90 69 ) [[ 304 305 306 308 317 316 ]] # Dynamic java.lang.ClassLoader::findResources java/lang/Object * ( java/security/SecureClassLoader:NotNull *, java/lang/String (java/io/Serializable,java/lang/Comparable,java/lang/CharSequence,java/lang/constant/Constable,java/lang/constant/ConstantDesc):exact * ) ClassLoader::getResources @ bci:42 (line 1445) !jvms: ClassLoader::getResources @ bci:42 (line 1445) 3 332 If === 311 331 [[ 333 334 ]] P=0.999999, C=-1.000000 !jvms: ClassLoader::getResources @ bci:45 (line 1445) 2 308 Proj === 303 [[ 335 330 ]] #5 Oop:java/lang/Object * !jvms: ClassLoader::getResources @ bci:42 (line 1445) 2 334 IfTrue === 332 [[ 376 335 ]] #1 !jvms: ClassLoader::getResources @ bci:45 (line 1445) 1 335 CastPP === 334 308 [[ 358 343 353 ]] #java/lang/Object:NotNull * Oop:java/lang/Object:NotNull * !jvms: ClassLoader::getResources @ bci:45 (line 1445) 0 358 EncodeP === _ 335 [[ 359 ]] #narrowoop: java/lang/Object:NotNull * !jvms: ClassLoader::getResources @ bci:45 (line 1445) It seems the value comes from the call, via a projection, and a null-check cast. But I see no type/interface check cast for `Enumeration`. This is the store that is being eliminated: 359 StoreN === 377 393 322 358 [[ 365 ]] @narrowoop: java/lang/Object *[int:>=0] (java/lang/Cloneable,java/io/Serializable)+any * [narrow], idx=6; Memory: @narrowoop: java/lang/Object (java/util/Enumeration) *[int:2] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[1] *,iid=73 [narrow], idx=12; !jvms: ClassLoader::getResources @ bci:45 (line 1445) Tracing back where the store comes from: `Parse::array_store`. (rr) p elemtype->dump() narrowoop: java/lang/Object (java/util/Enumeration) * Inside `Parse::array_store_check` -> `GraphKit::gen_checkcast` we create a null check, with the CastPP we saw above: 335 CastPP === 334 308 [[ 24 ]] #java/lang/Object:NotNull * Oop:java/lang/Object:NotNull * !jvms: ClassLoader::getResources @ bci:45 (line 1445) More in next comment... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2474151074 From epeter at openjdk.org Wed Oct 29 16:40:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 16:40:14 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination In-Reply-To: <_vdMcb8lPgqA7RCDCir9rVLnxHAS7SZF-Wa8HElvzjg=.e6e9e011-d0cc-4588-82f3-c113f7166a3c@github.com> References: <_vdMcb8lPgqA7RCDCir9rVLnxHAS7SZF-Wa8HElvzjg=.e6e9e011-d0cc-4588-82f3-c113f7166a3c@github.com> Message-ID: On Wed, 29 Oct 2025 16:20:51 GMT, Emanuel Peter wrote: >> Ah, I see. 
You are wondering if there should be a `Parse::array_store_check`, why is there not a checkcast so that we cast from `Object` -> `Object (Enumeration)`. Great question. I'll investigate. > > Quick summary of investigation: > > The `EncodeP` is the `field_val`: > > (rr) p field_val->dump_bfs(3,0,"#") > dist dump > --------------------------------------------- > 3 303 CallDynamicJava === 203 299 202 8 1 (10 40 1 1 90 90 69 ) [[ 304 305 306 308 317 316 ]] # Dynamic java.lang.ClassLoader::findResources java/lang/Object * ( java/security/SecureClassLoader:NotNull *, java/lang/String (java/io/Serializable,java/lang/Comparable,java/lang/CharSequence,java/lang/constant/Constable,java/lang/constant/ConstantDesc):exact * ) ClassLoader::getResources @ bci:42 (line 1445) !jvms: ClassLoader::getResources @ bci:42 (line 1445) > 3 332 If === 311 331 [[ 333 334 ]] P=0.999999, C=-1.000000 !jvms: ClassLoader::getResources @ bci:45 (line 1445) > 2 308 Proj === 303 [[ 335 330 ]] #5 Oop:java/lang/Object * !jvms: ClassLoader::getResources @ bci:42 (line 1445) > 2 334 IfTrue === 332 [[ 376 335 ]] #1 !jvms: ClassLoader::getResources @ bci:45 (line 1445) > 1 335 CastPP === 334 308 [[ 358 343 353 ]] #java/lang/Object:NotNull * Oop:java/lang/Object:NotNull * !jvms: ClassLoader::getResources @ bci:45 (line 1445) > 0 358 EncodeP === _ 335 [[ 359 ]] #narrowoop: java/lang/Object:NotNull * !jvms: ClassLoader::getResources @ bci:45 (line 1445) > > It seems the value comes from the call, via a projection, and a null-check cast. But I see no type/interface check cast for `Enumeration`. > > This is the store that is being eliminated: > > 359 StoreN === 377 393 322 358 [[ 365 ]] @narrowoop: java/lang/Object *[int:>=0] (java/lang/Cloneable,java/io/Serializable)+any * [narrow], idx=6; Memory: @narrowoop: java/lang/Object (java/util/Enumeration) *[int:2] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[1] *,iid=73 [narrow], idx=12; !jvms: ClassLoader::getResources @ bci:45 (line 1445) > > > Tracing back where the store comes from: `Parse::array_store`. > > (rr) p elemtype->dump() > narrowoop: java/lang/Object (java/util/Enumeration) * > > Inside `Parse::array_store_check` -> `GraphKit::gen_checkcast` we create a null check, with the CastPP we saw above: > > 335 CastPP === 334 308 [[ 24 ]] #java/lang/Object:NotNull * Oop:java/lang/Object:NotNull * !jvms: ClassLoader::getResources @ bci:45 (line 1445) > > > More in next comment... Then we come here: 3355 // Generate the subtype check 3356 Node* improved_superklass = superklass; 3357 if (improved_klass_ptr_type != klass_ptr_type && improved_klass_ptr_type->singleton()) { 3358 improved_superklass = makecon(improved_klass_ptr_type); 3359 } 3360 Node* not_subtype_ctrl = gen_subtype_check(not_null_obj, improved_superklass); 3361 3362 // Plug in success path into the merge 3363 cast_obj = _gvn.transform(new CheckCastPPNode(control(), not_null_obj, toop)); This gives us a `cast_obj` that knows about `Enumeration`: 350 CheckCastPP === 348 335 [[ ]] #java/lang/Object (java/util/Enumeration):NotNull * Oop:java/lang/Object (java/util/Enumeration):NotNull * !jvms: ClassLoader::getResources @ bci:45 (line 1445) So things are looking promising for now, `res` is that `350 CheckCastPP`. But out in `array_store`, this is not what gets picked up when we do: `val = pop();`. Instead we get the null-check only `335 CastPP`. So somehow it must have been lost? Tracking the slot, I see that `Parse::array_store_check` does `replace_in_map(value, cast);`. 
But we don't seem to do that for `GraphKit::gen_checkcast`. @merykitty @rwestrel Maybe we should investigate this separately from this bugfix here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2474210489 From kvn at openjdk.org Wed Oct 29 17:02:23 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 29 Oct 2025 17:02:23 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v2] In-Reply-To: <4J2beErwndG7oZtRcQ1t6Fmpd-kATlHH-S9KnSEluSA=.2b865c8c-893f-4588-b821-7049efb01328@github.com> References: <4J2beErwndG7oZtRcQ1t6Fmpd-kATlHH-S9KnSEluSA=.2b865c8c-893f-4588-b821-7049efb01328@github.com> Message-ID: On Wed, 29 Oct 2025 14:56:14 GMT, Leonid Mesnik wrote: >>> Ah, so _there_! I confused myself. This one is readable: the counter `0` means we can free. It would be even better if you did `inc_immutable_data_refcount()` and `dec_immutable_data_refcount()`, and did e.g.: >>> >>> ``` >>> if (dec_immutable_data_refcount() == 0) { >>> os::free(_immutable_data); >>> } >>> >>> int dec_immutable_data_refcount() { >>> int refcount = get(...); >>> assert(refcount > 0, "Must be positive"); >>> set(refcount - 1); >>> return refcount - 1; >>> } >>> ``` >>> >>> Because the next thing you know this would need to be replaced with Atomics a year later. >> >> I agree this makes the code cleaner. >> >> I replaced the getter and setter for the counter with `init_immutable_data_ref_count`, `inc_immutable_data_ref_count`, and `dec_immutable_data_ref_count`. I also shortened the counter name from `immutable_data_references_counter` to `immutable_data_ref_count` >> >> I modified `NMethod.java` to calculate the offsets that same way as is done in the JVM. I missed this in [JDK-8369642](https://bugs.openjdk.org/browse/JDK-8369642) >> >> The last notable change is that I modified the [immutable data size calculation](https://github.com/chadrako/jdk/blob/26bdc3ceb4ab9ad9cb9a4218bb87ce2d7546fa22/src/hotspot/share/code/nmethod.cpp#L1155) to only include a reference counter if there is immutable data > > @chadrako The testing pass now. @lmesnik please add link to testing in confidential comment in JBS ------------- PR Comment: https://git.openjdk.org/jdk/pull/28008#issuecomment-3462693312 From epeter at openjdk.org Wed Oct 29 17:07:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 17:07:41 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: References: Message-ID: > Note: @oliviermattmann found this bug with his whitebox fuzzer. See also https://github.com/openjdk/jdk/pull/27991 > > **Analysis** > We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. > During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. > But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. > > Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. > Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. 
> > **Solution** > We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. > > But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: > https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 > (the idea is to bail out of the elimination if any of the found stores are mismatched.) > > **Details** > > How the bad sequence develops, and which components are involved. > > 1) The `SafePoint` contains a `ConL` and 3 `ConI`. (Correct would have been 4 `ConI`) > > 6 ConI === 23 [[ 4 ]] #int:16777216 > 7 ConI === 23 [[ 4 ]] #int:256 > 8 ConI === 23 [[ 4 ]] #int:1048576 > 9 ConL === 23 [[ 4 ]] #long:68719476737 > 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] > 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) > > > 2) This is then encoded into an `ObjectValue`. A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: > https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 > If I understand it right, there zero is just a placeholder. > > And so we get: > > (rr) p sv->print_fields_on(tty) > Fields: 0, 68719476737, 1048576, 256, 16777216 > > We can see the `zero`, followed by the `ConL`, and then 3 `ConI`. > > This se... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - Merge branch 'master' into JDK-8370405-alloc-elimination-and-MergeStores - only verify primitive types - Apply suggestions from code review - more assert adjustment - ignore debug flag - id for tests, and fix up the assert - pass int for short slot - another test - improve test - wip new IR test - ... 
and 6 more: https://git.openjdk.org/jdk/compare/ec5c366a...b6e032c2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27997/files - new: https://git.openjdk.org/jdk/pull/27997/files/9114d379..b6e032c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27997&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27997&range=00-01 Stats: 6694 lines in 293 files changed: 3969 ins; 1760 del; 965 mod Patch: https://git.openjdk.org/jdk/pull/27997.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27997/head:pull/27997 PR: https://git.openjdk.org/jdk/pull/27997 From qamai at openjdk.org Wed Oct 29 17:07:42 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 29 Oct 2025 17:07:42 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: References: <_vdMcb8lPgqA7RCDCir9rVLnxHAS7SZF-Wa8HElvzjg=.e6e9e011-d0cc-4588-82f3-c113f7166a3c@github.com> Message-ID: <-zYyllocIDIPMRhGJzDLTgeUaBkWy1LFY0BfF2kK94k=.804d52b6-b010-4061-8ca9-bee006ed7549@github.com> On Wed, 29 Oct 2025 16:37:08 GMT, Emanuel Peter wrote: >> Quick summary of investigation: >> >> The `EncodeP` is the `field_val`: >> >> (rr) p field_val->dump_bfs(3,0,"#") >> dist dump >> --------------------------------------------- >> 3 303 CallDynamicJava === 203 299 202 8 1 (10 40 1 1 90 90 69 ) [[ 304 305 306 308 317 316 ]] # Dynamic java.lang.ClassLoader::findResources java/lang/Object * ( java/security/SecureClassLoader:NotNull *, java/lang/String (java/io/Serializable,java/lang/Comparable,java/lang/CharSequence,java/lang/constant/Constable,java/lang/constant/ConstantDesc):exact * ) ClassLoader::getResources @ bci:42 (line 1445) !jvms: ClassLoader::getResources @ bci:42 (line 1445) >> 3 332 If === 311 331 [[ 333 334 ]] P=0.999999, C=-1.000000 !jvms: ClassLoader::getResources @ bci:45 (line 1445) >> 2 308 Proj === 303 [[ 335 330 ]] #5 Oop:java/lang/Object * !jvms: ClassLoader::getResources @ bci:42 (line 1445) >> 2 334 IfTrue === 332 [[ 376 335 ]] #1 !jvms: ClassLoader::getResources @ bci:45 (line 1445) >> 1 335 CastPP === 334 308 [[ 358 343 353 ]] #java/lang/Object:NotNull * Oop:java/lang/Object:NotNull * !jvms: ClassLoader::getResources @ bci:45 (line 1445) >> 0 358 EncodeP === _ 335 [[ 359 ]] #narrowoop: java/lang/Object:NotNull * !jvms: ClassLoader::getResources @ bci:45 (line 1445) >> >> It seems the value comes from the call, via a projection, and a null-check cast. But I see no type/interface check cast for `Enumeration`. >> >> This is the store that is being eliminated: >> >> 359 StoreN === 377 393 322 358 [[ 365 ]] @narrowoop: java/lang/Object *[int:>=0] (java/lang/Cloneable,java/io/Serializable)+any * [narrow], idx=6; Memory: @narrowoop: java/lang/Object (java/util/Enumeration) *[int:2] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[1] *,iid=73 [narrow], idx=12; !jvms: ClassLoader::getResources @ bci:45 (line 1445) >> >> >> Tracing back where the store comes from: `Parse::array_store`. >> >> (rr) p elemtype->dump() >> narrowoop: java/lang/Object (java/util/Enumeration) * >> >> Inside `Parse::array_store_check` -> `GraphKit::gen_checkcast` we create a null check, with the CastPP we saw above: >> >> 335 CastPP === 334 308 [[ 24 ]] #java/lang/Object:NotNull * Oop:java/lang/Object:NotNull * !jvms: ClassLoader::getResources @ bci:45 (line 1445) >> >> >> More in next comment... 
> > Then we come here: > > 3355 // Generate the subtype check > 3356 Node* improved_superklass = superklass; > 3357 if (improved_klass_ptr_type != klass_ptr_type && improved_klass_ptr_type->singleton()) { > 3358 improved_superklass = makecon(improved_klass_ptr_type); > 3359 } > 3360 Node* not_subtype_ctrl = gen_subtype_check(not_null_obj, improved_superklass); > 3361 > 3362 // Plug in success path into the merge > 3363 cast_obj = _gvn.transform(new CheckCastPPNode(control(), not_null_obj, toop)); > > This gives us a `cast_obj` that knows about `Enumeration`: > > 350 CheckCastPP === 348 335 [[ ]] #java/lang/Object (java/util/Enumeration):NotNull * Oop:java/lang/Object (java/util/Enumeration):NotNull * !jvms: ClassLoader::getResources @ bci:45 (line 1445) > > So things are looking promising for now, `res` is that `350 CheckCastPP`. But out in `array_store`, this is not what gets picked up when we do: `val = pop();`. > Instead we get the null-check only `335 CastPP`. So somehow it must have been lost? > Tracking the slot, I see that `Parse::array_store_check` does `replace_in_map(value, cast);`. > But we don't seem to do that for `GraphKit::gen_checkcast`. > > @merykitty @rwestrel Maybe we should investigate this separately from this bugfix here? Yes I agree that we should investigate this separately. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2474273004 From qamai at openjdk.org Wed Oct 29 17:07:44 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 29 Oct 2025 17:07:44 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: <-zYyllocIDIPMRhGJzDLTgeUaBkWy1LFY0BfF2kK94k=.804d52b6-b010-4061-8ca9-bee006ed7549@github.com> References: <_vdMcb8lPgqA7RCDCir9rVLnxHAS7SZF-Wa8HElvzjg=.e6e9e011-d0cc-4588-82f3-c113f7166a3c@github.com> <-zYyllocIDIPMRhGJzDLTgeUaBkWy1LFY0BfF2kK94k=.804d52b6-b010-4061-8ca9-bee006ed7549@github.com> Message-ID: On Wed, 29 Oct 2025 17:00:08 GMT, Quan Anh Mai wrote: >> Then we come here: >> >> 3355 // Generate the subtype check >> 3356 Node* improved_superklass = superklass; >> 3357 if (improved_klass_ptr_type != klass_ptr_type && improved_klass_ptr_type->singleton()) { >> 3358 improved_superklass = makecon(improved_klass_ptr_type); >> 3359 } >> 3360 Node* not_subtype_ctrl = gen_subtype_check(not_null_obj, improved_superklass); >> 3361 >> 3362 // Plug in success path into the merge >> 3363 cast_obj = _gvn.transform(new CheckCastPPNode(control(), not_null_obj, toop)); >> >> This gives us a `cast_obj` that knows about `Enumeration`: >> >> 350 CheckCastPP === 348 335 [[ ]] #java/lang/Object (java/util/Enumeration):NotNull * Oop:java/lang/Object (java/util/Enumeration):NotNull * !jvms: ClassLoader::getResources @ bci:45 (line 1445) >> >> So things are looking promising for now, `res` is that `350 CheckCastPP`. But out in `array_store`, this is not what gets picked up when we do: `val = pop();`. >> Instead we get the null-check only `335 CastPP`. So somehow it must have been lost? >> Tracking the slot, I see that `Parse::array_store_check` does `replace_in_map(value, cast);`. >> But we don't seem to do that for `GraphKit::gen_checkcast`. >> >> @merykitty @rwestrel Maybe we should investigate this separately from this bugfix here? > > Yes I agree that we should investigate this separately. We have these lines: // Note I do NOT always 'replace_in_map(obj,result)' here. 
// if( tk->klass()->can_be_primary_super() ) // This means that if I successfully store an Object into an array-of-String // I 'forget' that the Object is really now known to be a String. I have to // do this because we don't have true union types for interfaces - if I store // a Baz into an array-of-Interface and then tell the optimizer it's an // Interface, I forget that it's also a Baz and cannot do Baz-like field // references to it. FIX THIS WHEN UNION TYPES APPEAR! // replace_in_map( obj, res ); But we do have union types now. So this seems doable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2474282523 From qamai at openjdk.org Wed Oct 29 17:12:57 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 29 Oct 2025 17:12:57 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 17:07:41 GMT, Emanuel Peter wrote: >> Note: @oliviermattmann found this bug with his whitebox fuzzer. See also https://github.com/openjdk/jdk/pull/27991 >> >> **Analysis** >> We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. >> During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. >> But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. >> >> Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. >> Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. >> >> **Solution** >> We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. >> >> But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: >> https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 >> (the idea is to bail out of the elimination if any of the found stores are mismatched.) >> >> **Details** >> >> How the bad sequence develops, and which components are involved. >> >> 1) The `SafePoint` contains a `ConL` and 3 `ConI`. (Correct would have been 4 `ConI`) >> >> 6 ConI === 23 [[ 4 ]] #int:16777216 >> 7 ConI === 23 [[ 4 ]] #int:256 >> 8 ConI === 23 [[ 4 ]] #int:1048576 >> 9 ConL === 23 [[ 4 ]] #long:68719476737 >> 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] >> 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) >> >> >> 2) This is then encoded into an `ObjectValue`. A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: >> https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 >> If I understand it right, there zero is just a placeholder. >> >> And so we get: >> >> (rr) p sv->print_fields_on(tty) >> Fields: 0, 68719476737, 1048576, 256, 167772... 
> > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - Merge branch 'master' into JDK-8370405-alloc-elimination-and-MergeStores > - only verify primitive types > - Apply suggestions from code review > - more assert adjustment > - ignore debug flag > - id for tests, and fix up the assert > - pass int for short slot > - another test > - improve test > - wip new IR test > - ... and 6 more: https://git.openjdk.org/jdk/compare/b652ace1...b6e032c2 Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27997#pullrequestreview-3394885138 From epeter at openjdk.org Wed Oct 29 17:12:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 17:12:57 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 17:07:30 GMT, Quan Anh Mai wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8370405-alloc-elimination-and-MergeStores >> - only verify primitive types >> - Apply suggestions from code review >> - more assert adjustment >> - ignore debug flag >> - id for tests, and fix up the assert >> - pass int for short slot >> - another test >> - improve test >> - wip new IR test >> - ... and 6 more: https://git.openjdk.org/jdk/compare/b652ace1...b6e032c2 > > Marked as reviewed by qamai (Committer). Thanks for suggesting the fix, I just copied the fix from Valhalla :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27997#issuecomment-3462733878 From epeter at openjdk.org Wed Oct 29 17:12:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 17:12:59 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: References: <_vdMcb8lPgqA7RCDCir9rVLnxHAS7SZF-Wa8HElvzjg=.e6e9e011-d0cc-4588-82f3-c113f7166a3c@github.com> <-zYyllocIDIPMRhGJzDLTgeUaBkWy1LFY0BfF2kK94k=.804d52b6-b010-4061-8ca9-bee006ed7549@github.com> Message-ID: On Wed, 29 Oct 2025 17:03:56 GMT, Quan Anh Mai wrote: >> Yes I agree that we should investigate this separately. > > We have these lines: > > // Note I do NOT always 'replace_in_map(obj,result)' here. > // if( tk->klass()->can_be_primary_super() ) > // This means that if I successfully store an Object into an array-of-String > // I 'forget' that the Object is really now known to be a String. I have to > // do this because we don't have true union types for interfaces - if I store > // a Baz into an array-of-Interface and then tell the optimizer it's an > // Interface, I forget that it's also a Baz and cannot do Baz-like field > // references to it. FIX THIS WHEN UNION TYPES APPEAR! > // replace_in_map( obj, res ); > > But we do have union types now. So this seems doable. I modified the check to only check primitive types now. And filed this RFE: [JDK-8370901](https://bugs.openjdk.org/browse/JDK-8370901) @merykitty @rwestrel would either of you want to look into that? 
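For reference, here is a stripped-down Java sketch of the bytecode shape under discussion (loosely modeled on `ClassLoader.getResources`; the class and method names are made up for illustration, this is not the actual JDK source). The value returned by the virtual call is only trusted as `Object` by C2, so the `aastore` into the `Enumeration[]` goes through `array_store_check`/`gen_checkcast`, and the question is whether the narrowed `Object (Enumeration)` type survives in the map afterwards:

```
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

class GetResourcesShape {
    // Stand-in for the virtual call whose interface return type C2 does not
    // trust: at the IR level the returned oop is only known to be Object.
    static Enumeration<String> findSomething() {
        return Collections.enumeration(List.of("x"));
    }

    @SuppressWarnings("unchecked")
    static Enumeration<String>[] test() {
        Enumeration<String>[] tmp = (Enumeration<String>[]) new Enumeration<?>[2];
        tmp[0] = findSomething(); // aastore -> array_store_check -> gen_checkcast
        tmp[1] = findSomething(); // the CheckCastPP created here knows about Enumeration,
                                  // but is not fed back into the map via replace_in_map
        return tmp;
    }
}
```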
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2474295135 From kvn at openjdk.org Wed Oct 29 17:22:35 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 29 Oct 2025 17:22:35 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 17:07:41 GMT, Emanuel Peter wrote: >> Note: @oliviermattmann found this bug with his whitebox fuzzer. See also https://github.com/openjdk/jdk/pull/27991 >> >> **Analysis** >> We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. >> During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. >> But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. >> >> Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. >> Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. >> >> **Solution** >> We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. >> >> But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: >> https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 >> (the idea is to bail out of the elimination if any of the found stores are mismatched.) >> >> **Details** >> >> How the bad sequence develops, and which components are involved. >> >> 1) The `SafePoint` contains a `ConL` and 3 `ConI`. (Correct would have been 4 `ConI`) >> >> 6 ConI === 23 [[ 4 ]] #int:16777216 >> 7 ConI === 23 [[ 4 ]] #int:256 >> 8 ConI === 23 [[ 4 ]] #int:1048576 >> 9 ConL === 23 [[ 4 ]] #long:68719476737 >> 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] >> 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) >> >> >> 2) This is then encoded into an `ObjectValue`. A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: >> https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 >> If I understand it right, there zero is just a placeholder. >> >> And so we get: >> >> (rr) p sv->print_fields_on(tty) >> Fields: 0, 68719476737, 1048576, 256, 167772... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - Merge branch 'master' into JDK-8370405-alloc-elimination-and-MergeStores > - only verify primitive types > - Apply suggestions from code review > - more assert adjustment > - ignore debug flag > - id for tests, and fix up the assert > - pass int for short slot > - another test > - improve test > - wip new IR test > - ... 
and 6 more: https://git.openjdk.org/jdk/compare/73c2a8fe...b6e032c2 > But now we run MergeStores, and merge two of the StoreI into a mismatched StoreL. Since associated allocation is marked as `_is_scalar_replaceable` we should be able to check that before trying to merge stores and avoid this mismatching. This will allow to eliminate allocation. I would still keep your check because mismatching may come from different place in a future. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27997#issuecomment-3462766568 From epeter at openjdk.org Wed Oct 29 17:36:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 17:36:49 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 17:20:15 GMT, Vladimir Kozlov wrote: > > But now we run MergeStores, and merge two of the StoreI into a mismatched StoreL. > > Since associated allocation is marked as `_is_scalar_replaceable` we should be able to check that before trying to merge stores and avoid this mismatching. This will allow to eliminate allocation. > > I would still keep your check because mismatching may come from different place in a future. I'm not sure I'm understanding you corretly. Are you saying we should find a way to do MergeStores after allocation elimination? @TobiHartmann Suggested I should move MergeStores as late as possible. I agree we should move MergeStores, but I'm not sure we should do it as part of this fix. I think it is quite rare that allocation elimination could succeed after loop-opts (and not already before), and that we also succeed with MergeStores. MergeStores is in since JDK21, and nobody filed a bug report because they noticed a regression (yet). What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27997#issuecomment-3462815309 From mli at openjdk.org Wed Oct 29 17:37:23 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 29 Oct 2025 17:37:23 GMT Subject: RFR: 8370794: C2 SuperWord: Long/Integer.compareUnsigned return wrong value for EQ/NE in SLP Message-ID: Hi, Can you help to review this patch? [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481) introduces this regression for unsigned I/L EQ/NE in SLP. ==================== In [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481), we fixed an issue related to transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), and added tests for unsigned ones. As discussion in [1], we should also add more tests for transformation from (Bool + Cmp + CMove) to (VectorMaskCmp + VectorBlend) for the signed ones. [1] https://github.com/openjdk/jdk/pull/27942#discussion_r2468750039 Thanks! ------------- Commit messages: - fix regression for unsigned EQ/NE - initial commit - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ... 
and 7 more: https://git.openjdk.org/jdk/compare/eab5644a...696ae0d7 Changes: https://git.openjdk.org/jdk/pull/28047/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28047&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370794 Stats: 1438 lines in 3 files changed: 1337 ins; 37 del; 64 mod Patch: https://git.openjdk.org/jdk/pull/28047.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28047/head:pull/28047 PR: https://git.openjdk.org/jdk/pull/28047 From mli at openjdk.org Wed Oct 29 17:37:24 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 29 Oct 2025 17:37:24 GMT Subject: RFR: 8370794: C2 SuperWord: Long/Integer.compareUnsigned return wrong value for EQ/NE in SLP In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 16:38:54 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481) introduces this regression for unsigned I/L EQ/NE in SLP. > > ==================== > > In [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481), we fixed an issue related to transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), and added tests for unsigned ones. > As discussion in [1], we should also add more tests for transformation from (Bool + Cmp + CMove) to (VectorMaskCmp + VectorBlend) for the signed ones. > > [1] https://github.com/openjdk/jdk/pull/27942#discussion_r2468750039 > > Thanks! @eme64 When I develop this test pr, found out that, after https://github.com/openjdk/jdk/pull/27942 there could be some failing scenarios for unsigned EQ/NE (found by the newly developed unsigned EQ/NE test). We could fix it with the following patch, I can put the fix patch and related EQ/NE signed/unsigned tests in this pr, but seems it's better to put them in another separate pr? Please let me know what do you think. Thanks diff --git a/src/hotspot/share/opto/subnode.cpp b/src/hotspot/share/opto/subnode.cpp index 9c6c7498dd0..a10cd2a8d5b 100644 --- a/src/hotspot/share/opto/subnode.cpp +++ b/src/hotspot/share/opto/subnode.cpp @@ -1398,6 +1398,21 @@ const Type *BoolTest::cc2logical( const Type *CC ) const { return TypeInt::BOOL; } +BoolTest::mask BoolTest::unsigned_mask(BoolTest::mask btm) { + switch(btm) { + case eq: + case ne: + return btm; + case lt: + case le: + case gt: + case ge: + return mask(btm | unsigned_compare); + default: + ShouldNotReachHere(); + } +} + //------------------------------dump_spec------------------------------------- // Print special per-node info void BoolTest::dump_on(outputStream *st) const { diff --git a/src/hotspot/share/opto/subnode.hpp b/src/hotspot/share/opto/subnode.hpp index 2c3d9cfd35e..463d9e020cb 100644 --- a/src/hotspot/share/opto/subnode.hpp +++ b/src/hotspot/share/opto/subnode.hpp @@ -331,7 +331,7 @@ struct BoolTest { mask negate( ) const { return negate_mask(_test); } // Return the negative mask for the given mask, for both signed and unsigned comparison. 
static mask negate_mask(mask btm) { return mask(btm ^ 4); } - static mask unsigned_mask(mask btm) { return mask(btm | unsigned_compare); } + static mask unsigned_mask(mask btm); bool is_canonical( ) const { return (_test == BoolTest::ne || _test == BoolTest::lt || _test == BoolTest::le || _test == BoolTest::overflow); } bool is_less( ) const { return _test == BoolTest::lt || _test == BoolTest::le; } bool is_greater( ) const { return _test == BoolTest::gt || _test == BoolTest::ge; } ------------- PR Comment: https://git.openjdk.org/jdk/pull/28047#issuecomment-3462672014 From epeter at openjdk.org Wed Oct 29 17:37:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Oct 2025 17:37:24 GMT Subject: RFR: 8370794: C2 SuperWord: Long/Integer.compareUnsigned return wrong value for EQ/NE in SLP In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 16:52:10 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481) introduces this regression for unsigned I/L EQ/NE in SLP. >> >> ==================== >> >> In [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481), we fixed an issue related to transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), and added tests for unsigned ones. >> As discussion in [1], we should also add more tests for transformation from (Bool + Cmp + CMove) to (VectorMaskCmp + VectorBlend) for the signed ones. >> >> [1] https://github.com/openjdk/jdk/pull/27942#discussion_r2468750039 >> >> Thanks! > > @eme64 When I develop this test pr, found out that, after https://github.com/openjdk/jdk/pull/27942 there could be some failing scenarios for unsigned EQ/NE (found by the newly developed unsigned EQ/NE test). We could fix it with the following patch, I can put the fix patch and related EQ/NE signed/unsigned tests in this pr, but seems it's better to put them in another separate pr? Please let me know what do you think. Thanks > > > diff --git a/src/hotspot/share/opto/subnode.cpp b/src/hotspot/share/opto/subnode.cpp > index 9c6c7498dd0..a10cd2a8d5b 100644 > --- a/src/hotspot/share/opto/subnode.cpp > +++ b/src/hotspot/share/opto/subnode.cpp > @@ -1398,6 +1398,21 @@ const Type *BoolTest::cc2logical( const Type *CC ) const { > return TypeInt::BOOL; > } > > +BoolTest::mask BoolTest::unsigned_mask(BoolTest::mask btm) { > + switch(btm) { > + case eq: > + case ne: > + return btm; > + case lt: > + case le: > + case gt: > + case ge: > + return mask(btm | unsigned_compare); > + default: > + ShouldNotReachHere(); > + } > +} > + > //------------------------------dump_spec------------------------------------- > // Print special per-node info > void BoolTest::dump_on(outputStream *st) const { > diff --git a/src/hotspot/share/opto/subnode.hpp b/src/hotspot/share/opto/subnode.hpp > index 2c3d9cfd35e..463d9e020cb 100644 > --- a/src/hotspot/share/opto/subnode.hpp > +++ b/src/hotspot/share/opto/subnode.hpp > @@ -331,7 +331,7 @@ struct BoolTest { > mask negate( ) const { return negate_mask(_test); } > // Return the negative mask for the given mask, for both signed and unsigned comparison. 
> static mask negate_mask(mask btm) { return mask(btm ^ 4); } > - static mask unsigned_mask(mask btm) { return mask(btm | unsigned_compare); } > + static mask unsigned_mask(mask btm); > bool is_canonical( ) const { return (_test == BoolTest::ne || _test == BoolTest::lt || _test == BoolTest::le || _test == BoolTest::overflow); } > bool is_less( ) const { return _test == BoolTest::lt || _test == BoolTest::le; } > bool is_greater( ) const { return _test == BoolTest::gt || _test == BoolTest::ge; } @Hamlin-Li By failing you mean they now produce a wrong result? If yes, maybe we can convert this issue here to a bugfix? And just add all the additional tests to ensure we don't have any other bugs. Is it just a regression from the last patch, or an older issue? Probably just a regression from https://github.com/openjdk/jdk/pull/27942, because before we just did nothing with the mask, and that is what you want to go back to, right? BTW: thanks for writing all the tests, it seems to be the only way to ensure we get it all right :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28047#issuecomment-3462688904 From mli at openjdk.org Wed Oct 29 17:37:25 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 29 Oct 2025 17:37:25 GMT Subject: RFR: 8370794: C2 SuperWord: Long/Integer.compareUnsigned return wrong value for EQ/NE in SLP In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 16:52:10 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481) introduces this regression for unsigned I/L EQ/NE in SLP. >> >> ==================== >> >> In [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481), we fixed an issue related to transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), and added tests for unsigned ones. >> As discussion in [1], we should also add more tests for transformation from (Bool + Cmp + CMove) to (VectorMaskCmp + VectorBlend) for the signed ones. >> >> [1] https://github.com/openjdk/jdk/pull/27942#discussion_r2468750039 >> >> Thanks! > > @eme64 When I develop this test pr, found out that, after https://github.com/openjdk/jdk/pull/27942 there could be some failing scenarios for unsigned EQ/NE (found by the newly developed unsigned EQ/NE test). We could fix it with the following patch, I can put the fix patch and related EQ/NE signed/unsigned tests in this pr, but seems it's better to put them in another separate pr? Please let me know what do you think. 
Thanks > > > diff --git a/src/hotspot/share/opto/subnode.cpp b/src/hotspot/share/opto/subnode.cpp > index 9c6c7498dd0..a10cd2a8d5b 100644 > --- a/src/hotspot/share/opto/subnode.cpp > +++ b/src/hotspot/share/opto/subnode.cpp > @@ -1398,6 +1398,21 @@ const Type *BoolTest::cc2logical( const Type *CC ) const { > return TypeInt::BOOL; > } > > +BoolTest::mask BoolTest::unsigned_mask(BoolTest::mask btm) { > + switch(btm) { > + case eq: > + case ne: > + return btm; > + case lt: > + case le: > + case gt: > + case ge: > + return mask(btm | unsigned_compare); > + default: > + ShouldNotReachHere(); > + } > +} > + > //------------------------------dump_spec------------------------------------- > // Print special per-node info > void BoolTest::dump_on(outputStream *st) const { > diff --git a/src/hotspot/share/opto/subnode.hpp b/src/hotspot/share/opto/subnode.hpp > index 2c3d9cfd35e..463d9e020cb 100644 > --- a/src/hotspot/share/opto/subnode.hpp > +++ b/src/hotspot/share/opto/subnode.hpp > @@ -331,7 +331,7 @@ struct BoolTest { > mask negate( ) const { return negate_mask(_test); } > // Return the negative mask for the given mask, for both signed and unsigned comparison. > static mask negate_mask(mask btm) { return mask(btm ^ 4); } > - static mask unsigned_mask(mask btm) { return mask(btm | unsigned_compare); } > + static mask unsigned_mask(mask btm); > bool is_canonical( ) const { return (_test == BoolTest::ne || _test == BoolTest::lt || _test == BoolTest::le || _test == BoolTest::overflow); } > bool is_less( ) const { return _test == BoolTest::lt || _test == BoolTest::le; } > bool is_greater( ) const { return _test == BoolTest::gt || _test == BoolTest::ge; } > @Hamlin-Li By failing you mean they now produce a wrong result? If yes, maybe we can convert this issue here to a bugfix? And just add all the additional tests to ensure we don't have any other bugs. > > Is it just a regression from the last patch, or an older issue? Probably just a regression from #27942, because before we just did nothing with the mask, and that is what you want to go back to, right? Yes, it's a regression from #27942. I'll update the bug and pr, also add fix and tests for unsigned EQ/NE. > BTW: thanks for writing all the tests, it seems to be the only way to ensure we get it all right :) Thanks for the quick response! :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28047#issuecomment-3462781362 From kvn at openjdk.org Wed Oct 29 17:44:41 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 29 Oct 2025 17:44:41 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: References: Message-ID: <0arzN_oUqMaMPPK2NYsUIxKKW3hhimOZ82JJ7D1Chw4=.46ca057a-8fb5-4eb5-92cd-c3f20b29dc32@github.com> On Wed, 29 Oct 2025 17:07:41 GMT, Emanuel Peter wrote: >> Note: @oliviermattmann found this bug with his whitebox fuzzer. See also https://github.com/openjdk/jdk/pull/27991 >> >> **Analysis** >> We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. >> During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. >> But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. 
>> >> Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. >> Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. >> >> **Solution** >> We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. >> >> But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: >> https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 >> (the idea is to bail out of the elimination if any of the found stores are mismatched.) >> >> **Details** >> >> How the bad sequence develops, and which components are involved. >> >> 1) The `SafePoint` contains a `ConL` and 3 `ConI`. (Correct would have been 4 `ConI`) >> >> 6 ConI === 23 [[ 4 ]] #int:16777216 >> 7 ConI === 23 [[ 4 ]] #int:256 >> 8 ConI === 23 [[ 4 ]] #int:1048576 >> 9 ConL === 23 [[ 4 ]] #long:68719476737 >> 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] >> 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) >> >> >> 2) This is then encoded into an `ObjectValue`. A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: >> https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 >> If I understand it right, there zero is just a placeholder. >> >> And so we get: >> >> (rr) p sv->print_fields_on(tty) >> Fields: 0, 68719476737, 1048576, 256, 167772... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - Merge branch 'master' into JDK-8370405-alloc-elimination-and-MergeStores > - only verify primitive types > - Apply suggestions from code review > - more assert adjustment > - ignore debug flag > - id for tests, and fix up the assert > - pass int for short slot > - another test > - improve test > - wip new IR test > - ... and 6 more: https://git.openjdk.org/jdk/compare/bbe734ba...b6e032c2 No. From your description EA happens first and marked allocation as scalarizable and MergeStores happens after that. I suggest to not execute MergeStores for stores associated with allocation marked `_is_scalar_replaceable`. Unless I misinterpreted situation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27997#issuecomment-3462840156 PR Comment: https://git.openjdk.org/jdk/pull/27997#issuecomment-3462841661 From kvn at openjdk.org Wed Oct 29 17:50:41 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 29 Oct 2025 17:50:41 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: References: Message-ID: <28JzI3gxRKU4fIQ7544Xl9MW1ZGYR7IazhAsMPilpF0=.6c993c80-ab41-49e1-9084-98e226ac7c8a@github.com> On Wed, 29 Oct 2025 17:07:41 GMT, Emanuel Peter wrote: >> Note: @oliviermattmann found this bug with his whitebox fuzzer. 
See also https://github.com/openjdk/jdk/pull/27991 >> >> **Analysis** >> We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. >> During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. >> But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. >> >> Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. >> Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. >> >> **Solution** >> We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. >> >> But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: >> https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 >> (the idea is to bail out of the elimination if any of the found stores are mismatched.) >> >> **Details** >> >> How the bad sequence develops, and which components are involved. >> >> 1) The `SafePoint` contains a `ConL` and 3 `ConI`. (Correct would have been 4 `ConI`) >> >> 6 ConI === 23 [[ 4 ]] #int:16777216 >> 7 ConI === 23 [[ 4 ]] #int:256 >> 8 ConI === 23 [[ 4 ]] #int:1048576 >> 9 ConL === 23 [[ 4 ]] #long:68719476737 >> 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] >> 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) >> >> >> 2) This is then encoded into an `ObjectValue`. A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: >> https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 >> If I understand it right, there zero is just a placeholder. >> >> And so we get: >> >> (rr) p sv->print_fields_on(tty) >> Fields: 0, 68719476737, 1048576, 256, 167772... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - Merge branch 'master' into JDK-8370405-alloc-elimination-and-MergeStores > - only verify primitive types > - Apply suggestions from code review > - more assert adjustment > - ignore debug flag > - id for tests, and fix up the assert > - pass int for short slot > - another test > - improve test > - wip new IR test > - ... and 6 more: https://git.openjdk.org/jdk/compare/c2c3fce1...b6e032c2 If MergeStores happens before EA then yes, we should move MergeStores after EA. Or during EA check mismatching accesses and not mark such allocation as scalarizable. But this is less preferable. And I am fine to do that in separate changes. 
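To make the interaction concrete, here is a minimal Java sketch of the shape involved (illustrative only, not the regression test from this PR, and the constants are arbitrary): EA can mark the non-escaping `int[]` scalar replaceable, while the two adjacent `int` stores are the kind of pattern MergeStores may fuse into a single mismatched `long` store; whichever transformation effectively comes first decides whether the allocation can still be safely eliminated.

```
public class MergeStoresVsScalarReplacement {
    static int sink;

    static int test(boolean rare) {
        int[] a = new int[2];   // never escapes: EA can mark the allocation scalar replaceable
        a[0] = 0x0102_0304;     // two adjacent int stores that MergeStores
        a[1] = 0x0506_0708;     // may fuse into one mismatched long store
        if (rare) {             // rarely taken branch: at the uncommon trap the
            sink = a[0] + a[1]; // eliminated array has to be rematerialized from
        }                       // the captured field values
        return a[1] - a[0];
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100_000; i++) {
            test(false);        // warm up without taking the rare path
        }
        System.out.println(test(true));
    }
}
```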
------------- PR Comment: https://git.openjdk.org/jdk/pull/27997#issuecomment-3462853113 PR Comment: https://git.openjdk.org/jdk/pull/27997#issuecomment-3462857857 From duke at openjdk.org Wed Oct 29 17:58:46 2025 From: duke at openjdk.org (Kirill Shirokov) Date: Wed, 29 Oct 2025 17:58:46 GMT Subject: RFR: 8344345: test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces Message-ID: This PR addresses the trailing whitespaces for a .py test. They were introduced in commit 916694f2c1e7fc8d6a88e7026bc2d29ba2923849 and not detected by jcheck, since checking *.py for whitespaces is not enabled in .jcheck/conf. So, a separate question is: do you think that a pattern for Python files should be added to [checks "whitespace"] section of .jcheck/conf? ------------- Commit messages: - 8344345: File test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces Changes: https://git.openjdk.org/jdk/pull/27058/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27058&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344345 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/27058.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27058/head:pull/27058 PR: https://git.openjdk.org/jdk/pull/27058 From phh at openjdk.org Wed Oct 29 17:58:46 2025 From: phh at openjdk.org (Paul Hohensee) Date: Wed, 29 Oct 2025 17:58:46 GMT Subject: RFR: 8344345: test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces In-Reply-To: References: Message-ID: On Tue, 2 Sep 2025 18:22:45 GMT, Kirill Shirokov wrote: > This PR addresses the trailing whitespaces for a .py test. > > They were introduced in commit 916694f2c1e7fc8d6a88e7026bc2d29ba2923849 and not detected by jcheck, since checking *.py for whitespaces is not enabled in .jcheck/conf. > > So, a separate question is: do you think that a pattern for Python files should be added to [checks "whitespace"] section of .jcheck/conf? Yes, please file an issue. ------------- Marked as reviewed by phh (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27058#pullrequestreview-3181709866 From duke at openjdk.org Wed Oct 29 18:55:50 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 29 Oct 2025 18:55:50 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v5] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 09:24:55 GMT, Aleksey Shipilev wrote: > Looks reasonable to me, thanks. FWIW, I was happy with the (simpler) previous version of the patch, and was content with doing this refactoring later. Maybe split them out, if you want to spend more time on this? I think I?m happy with this approach as is. I don?t think it needs to be split out into a separate refactor. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28008#issuecomment-3463280558 From duke at openjdk.org Wed Oct 29 20:23:10 2025 From: duke at openjdk.org (Tobias Hotz) Date: Wed, 29 Oct 2025 20:23:10 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v9] In-Reply-To: References: Message-ID: <2rpTqTSOzGtT6SCXvjIrzH1iPBj1zMXXBH0RdQxQiok=.e59eb3a8-5f2d-4638-8f58-ab4c29c95a05@github.com> > This PR improves the value of interger division nodes. > Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guranteed to find the extrema that we can use for min/max. 
Some special logic is required for MIN_INT / -1, though, as this is a special case > We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. > This also cleans up and unifies the code paths for DivINode and DivLNode. > I've added some tests to validate the optimization. Without the changes, some of these tests fail. Tobias Hotz has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: - Add new asserts and change special case calculations - Merge branch 'master' of https://github.com/openjdk/jdk into better_interger_div_type - Add additional nodes to fail conditions to detect idealized/transformed DivI Nodes that did not constant fold - Remove checks for bottom and reorganize DivI/DivL Value functions - Adjust long constant folding test as well - Adjust test, assert and comments - Remove too strict assert from old code path - Fix if condition - Simplify the special case path - Add a simple path for non-special-case corner calculation - ... and 15 more: https://git.openjdk.org/jdk/compare/32697bf6...45a91bd0 ------------- Changes: https://git.openjdk.org/jdk/pull/26143/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=08 Stats: 713 lines in 2 files changed: 609 ins; 90 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/26143.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26143/head:pull/26143 PR: https://git.openjdk.org/jdk/pull/26143 From vlivanov at openjdk.org Wed Oct 29 21:32:06 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 29 Oct 2025 21:32:06 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v8] In-Reply-To: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> References: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> Message-ID: On Mon, 27 Oct 2025 07:32:43 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. 
>> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Removing redundant interferecne check from biasing Hard-coded lists in `Matcher::should_attempt_register_biasing()` and `is_commutative_oper` look fragile and hard to verify. (Especially `is_commutative_oper` which is used to check the root of matched ideal tree.) With proper ADLC support, that information can be placed on individual AD instructions which would make it clearer what is affected. ------------- PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3396363276 From sviswanathan at openjdk.org Wed Oct 29 22:30:09 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 29 Oct 2025 22:30:09 GMT Subject: RFR: 8370409: Incorrect computation in Float16 reduction loop In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 14:36:21 GMT, Jatin Bhateja wrote: > Current floatToFloat16 intrinsic implementation always sign-extends the 16-bit short result to a 32-bit value in anticipation of safe consumption by subsequent integral (comparison) operation[s]. However, the safest way to compare two Float16 values is to use Float16.compare/compareTo method, given that floating point comparisons can also be unordered. > > e.g., both 64512 and -1024 are equivalent bit representations of the Float16 -Inf value, but are not numerically equivalent with integral comparison. > jshell> Float16.compare(Float16.shortBitsToFloat16((short)-1024), Float16.shortBitsToFlot16((short)64512)) > $3 ==> 0 > > In the scalar intrinsic of Float16.add/sub/mul/div/min/max, we always return a boxed value, which is then operated upon by the subsequent Float16 APIs. While Float.floatToFloat16 intrinsic always returns a 'short' value, this is special in the sense that even though the carrier type is 'short' but it encodes an IEEE 754 half precision value, being a short carrier if they get exposed to integral operators, then as per JVM specification, short must be sign-extended before operation. > > Given that our Float16 binary operations inference is based on generic pattern match and is agnostic to how that graph pallet got created, i.e., either through Float16.* APIs or by explicit Float.float16ToFloat/floatToFloat16 operations, hence it's safe to sign-extend the result in all cases. > > Kindly review the patch and share your feedback. > > Best Regards, > Jatin Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27977#pullrequestreview-3396601848 From dlong at openjdk.org Wed Oct 29 22:47:08 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 29 Oct 2025 22:47:08 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v8] In-Reply-To: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> References: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> Message-ID: <79AScufLeBvh-9BFYBHAktT8fIQlOkuIAWrB5HrfrkM=.87162308-51ee-4369-8985-302105e77622@github.com> On Mon, 27 Oct 2025 07:32:43 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Removing redundant interferecne check from biasing src/hotspot/cpu/x86/x86_64.ad line 498: > 496: case xorL_rReg_im1_ndd_rule: > 497: case xorL_rReg_ndd_rule: > 498: case xorL_rReg_rReg_mem_ndd_rule: Having a list that needs adjusting as new rules are added seems fragile. Is there a way to detect that a rule is missing here? Is there an alternative way of implementing this? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2475827936 From qamai at openjdk.org Thu Oct 30 00:24:26 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 30 Oct 2025 00:24:26 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v4] In-Reply-To: References: Message-ID: > Hi, > > This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Add assertion for the helper in CTPComparator Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27618/files - new: https://git.openjdk.org/jdk/pull/27618/files/513e3e9e..bd617d7b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27618&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27618&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27618.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27618/head:pull/27618 PR: https://git.openjdk.org/jdk/pull/27618 From qamai at openjdk.org Thu Oct 30 05:21:32 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 30 Oct 2025 05:21:32 GMT Subject: RFR: 8370914: C2: Reimplement Type::join Message-ID: Hi, Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. There are a lot of benefits of this: - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. 
Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the places they happen makes it easier to catch unexpected interactions. It also helps us avoid sprinkling a bunch of cases in each meet and join method. Future work: - More cleanup can be made. I purposely avoid modifying the `xmeet` methods too much. There is a lot of room for simplification since the number of operand combinations has decreased significantly. - Remove the remaining remnants of `dual` such as `TypeInt::_dual` and `TypePtr::above_centerline`. - Stronger invariants can probably be asserted. For example, it seems that we can enforce that no instance with the concrete type being `TypeOopPtr` can be made, or `TypePtr` cannot be made with `BotPTR`. Please take a look and leave your reviews, thanks a lot. ------------- Commit messages: - whitespace - Reimplement Type::join Changes: https://git.openjdk.org/jdk/pull/28051/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370914 Stats: 1905 lines in 7 files changed: 850 ins; 634 del; 421 mod Patch: https://git.openjdk.org/jdk/pull/28051.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28051/head:pull/28051 PR: https://git.openjdk.org/jdk/pull/28051 From jbhateja at openjdk.org Thu Oct 30 06:31:06 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 30 Oct 2025 06:31:06 GMT Subject: RFR: 8370409: Incorrect computation in Float16 reduction loop In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 22:26:58 GMT, Sandhya Viswanathan wrote: >> Current floatToFloat16 intrinsic implementation always sign-extends the 16-bit short result to a 32-bit value in anticipation of safe consumption by subsequent integral (comparison) operation[s]. However, the safest way to compare two Float16 values is to use Float16.compare/compareTo method, given that floating point comparisons can also be unordered. >> >> e.g., both 64512 and -1024 are equivalent bit representations of the Float16 -Inf value, but are not numerically equivalent with integral comparison. >> jshell> Float16.compare(Float16.shortBitsToFloat16((short)-1024), Float16.shortBitsToFlot16((short)64512)) >> $3 ==> 0 >> >> In the scalar intrinsic of Float16.add/sub/mul/div/min/max, we always return a boxed value, which is then operated upon by the subsequent Float16 APIs. While Float.floatToFloat16 intrinsic always returns a 'short' value, this is special in the sense that even though the carrier type is 'short' but it encodes an IEEE 754 half precision value, being a short carrier if they get exposed to integral operators, then as per JVM specification, short must be sign-extended before operation. 
>> >> Given that our Float16 binary operations inference is based on generic pattern match and is agnostic to how that graph pallet got created, i.e., either through Float16.* APIs or by explicit Float.float16ToFloat/floatToFloat16 operations, hence it's safe to sign-extend the result in all cases. >> >> Kindly review the patch and share your feedback. >> >> Best Regards, >> Jatin > > Looks good to me. Thanks @sviswa7, Hi @TobiHartmann, @eme64, let me know if this is good to land. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27977#issuecomment-3466297359 From duke at openjdk.org Thu Oct 30 06:54:10 2025 From: duke at openjdk.org (Tobias Hotz) Date: Thu, 30 Oct 2025 06:54:10 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v2] In-Reply-To: References: Message-ID: On Wed, 1 Oct 2025 07:29:24 GMT, Manuel H?ssig wrote: >> Thanks for the fast review! The main reason for all the if cases is that min_int / (-1) is undefined behavior in C++, as it overflows. All code has to be careful that this special case can't happen in C++ code, and that's the main motivation behind all the ifs. I've added a comment that describes that. >> Otherwise, you would be right: Redudant calculations are no problem, min and max would take care of that. >> >> Regarding testing: I only ran tier1 tests on my machine and GHA > > @ichttt, are you still working on this? :slightly_smiling_face: @mhaessig I've added some new asserts to try and detect where it went wrong and merged the latest upstream. Can you run the failing test again please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26143#issuecomment-3466352959 From epeter at openjdk.org Thu Oct 30 06:54:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 06:54:11 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: <28JzI3gxRKU4fIQ7544Xl9MW1ZGYR7IazhAsMPilpF0=.6c993c80-ab41-49e1-9084-98e226ac7c8a@github.com> References: <28JzI3gxRKU4fIQ7544Xl9MW1ZGYR7IazhAsMPilpF0=.6c993c80-ab41-49e1-9084-98e226ac7c8a@github.com> Message-ID: On Wed, 29 Oct 2025 17:47:52 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8370405-alloc-elimination-and-MergeStores >> - only verify primitive types >> - Apply suggestions from code review >> - more assert adjustment >> - ignore debug flag >> - id for tests, and fix up the assert >> - pass int for short slot >> - another test >> - improve test >> - wip new IR test >> - ... and 6 more: https://git.openjdk.org/jdk/compare/617bfc00...b6e032c2 > > And I am fine to do that in separate changes. @vnkozlov Thanks for the additional ideas and clarifications. > I suggest to not execute MergeStores for stores associated with allocation marked _is_scalar_replaceable. Right. We could keep things as they are now: EA analysis (mark stores scalar replacable), then MergeStores (but avoid scalar replacable stores), then do allocation elimination (if possoible, and no related store is still in a loop, for example). Downside: if allocation elimination still fails, we would perform neither allocation elimination nor MergeStores. 
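For concreteness, roughly the kind of shape where the two optimizations meet (a hypothetical minimal sketch, not the actual fuzzer reproducer; the method and parameter names are made up): a non-escaping int[] whose adjacent constant stores MergeStores can fuse into one wider, mismatched store, while Escape Analysis would like to scalar-replace the same array and capture its per-element values at a trap.

    static int sketch(boolean rarePath) {
        int[] a = new int[4];   // non-escaping: candidate for scalar replacement
        a[0] = 1;               // two adjacent 32-bit stores that MergeStores
        a[1] = 2;               // may fuse into a single 64-bit store
        if (rarePath) {         // uncommon path: deoptimization here needs the per-element values
            return a[0] + a[1] + a[2] + a[3];
        }
        return 0;
    }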
The best course of action is to try to push MergeStores as late as possible, to disentangle it from other optimizations. We have already had to move MergeStores from post-loop-opts to a separate later phase: https://github.com/openjdk/jdk/pull/23944 So this would just be taking a further step into that direction. So where would I move `process_for_merge_stores_igvn` in that follow-up RFE? - After macro expansion? - After Barrier expansion? - After `optimize_logic_cones`? - After `process_late_inline_calls_no_inline`? Later is not possible, because (at least for now), we do need igvn. We could also try to go later, but that would require detaching MergeStores from IGVN, and would take a bit of a redesign. FYI: there was a PR for extending to `MergeLoads`: https://github.com/openjdk/jdk/pull/24023 In connection with that, we already considered refactoring MergeStores to avoid reliance on IGVN, and so we could push the optimization even later. The question is a bit how much time I should spend on this. Just moving `process_for_merge_stores_igvn` a little later is very little effort (just move the line, and some thorough testing). Refactoring to avoid IGVN is much more effort. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27997#issuecomment-3466352871 From qamai at openjdk.org Thu Oct 30 07:06:06 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 30 Oct 2025 07:06:06 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 17:07:41 GMT, Emanuel Peter wrote: >> Note: @oliviermattmann found this bug with his whitebox fuzzer. See also https://github.com/openjdk/jdk/pull/27991 >> >> **Analysis** >> We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. >> During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. >> But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. >> >> Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. >> Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. >> >> **Solution** >> We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. >> >> But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: >> https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 >> (the idea is to bail out of the elimination if any of the found stores are mismatched.) >> >> **Details** >> >> How the bad sequence develops, and which components are involved. >> >> 1) The `SafePoint` contains a `ConL` and 3 `ConI`. 
(Correct would have been 4 `ConI`) >> >> 6 ConI === 23 [[ 4 ]] #int:16777216 >> 7 ConI === 23 [[ 4 ]] #int:256 >> 8 ConI === 23 [[ 4 ]] #int:1048576 >> 9 ConL === 23 [[ 4 ]] #long:68719476737 >> 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] >> 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) >> >> >> 2) This is then encoded into an `ObjectValue`. A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: >> https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 >> If I understand it right, there zero is just a placeholder. >> >> And so we get: >> >> (rr) p sv->print_fields_on(tty) >> Fields: 0, 68719476737, 1048576, 256, 167772... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - Merge branch 'master' into JDK-8370405-alloc-elimination-and-MergeStores > - only verify primitive types > - Apply suggestions from code review > - more assert adjustment > - ignore debug flag > - id for tests, and fix up the assert > - pass int for short slot > - another test > - improve test > - wip new IR test > - ... and 6 more: https://git.openjdk.org/jdk/compare/60350647...b6e032c2 Regardless, I think this patch makes sense. Bailing out of scalar elimination when we are doing it is better than when we are running EA, and we should generally try to do it if we can. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27997#issuecomment-3466382031 From epeter at openjdk.org Thu Oct 30 07:14:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 07:14:04 GMT Subject: RFR: 8370409: Incorrect computation in Float16 reduction loop In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 14:36:21 GMT, Jatin Bhateja wrote: > Current floatToFloat16 intrinsic implementation always sign-extends the 16-bit short result to a 32-bit value in anticipation of safe consumption by subsequent integral (comparison) operation[s]. However, the safest way to compare two Float16 values is to use Float16.compare/compareTo method, given that floating point comparisons can also be unordered. > > e.g., both 64512 and -1024 are equivalent bit representations of the Float16 -Inf value, but are not numerically equivalent with integral comparison. > jshell> Float16.compare(Float16.shortBitsToFloat16((short)-1024), Float16.shortBitsToFlot16((short)64512)) > $3 ==> 0 > > In the scalar intrinsic of Float16.add/sub/mul/div/min/max, we always return a boxed value, which is then operated upon by the subsequent Float16 APIs. While Float.floatToFloat16 intrinsic always returns a 'short' value, this is special in the sense that even though the carrier type is 'short' but it encodes an IEEE 754 half precision value, being a short carrier if they get exposed to integral operators, then as per JVM specification, short must be sign-extended before operation. 
> > Given that our Float16 binary operations inference is based on generic pattern match and is agnostic to how that graph pallet got created, i.e., either through Float16.* APIs or by explicit Float.float16ToFloat/floatToFloat16 operations, hence it's safe to sign-extend the result in all cases. > > Kindly review the patch and share your feedback. > > Best Regards, > Jatin @jatin-bhateja Thanks for fixing this! I have a few nits below. I'll run testing once you addressed them :) test/hotspot/jtreg/compiler/c2/TestFloat16Reduction.java line 33: > 31: * @library /test/lib / > 32: * @run main/othervm -XX:-TieredCompilation > 33: * compiler.c2.TestFloat16Reduction Was the flag required for reproducing the issue? If it was not required: just remove it If it was required: add a run without the flag, in addition to a run with the flag. test/hotspot/jtreg/compiler/c2/TestFloat16Reduction.java line 161: > 159: GOLDEN_MAX = MAXReduceLong(); > 160: GOLDEN_MIN = MINReduceLong(); > 161: } A total nit, and optional: you could make the fields static, and just assign values as you declare the fields. That would save you doing it all in the constructor. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27977#pullrequestreview-3397713210 PR Review Comment: https://git.openjdk.org/jdk/pull/27977#discussion_r2476638829 PR Review Comment: https://git.openjdk.org/jdk/pull/27977#discussion_r2476644247 From epeter at openjdk.org Thu Oct 30 07:14:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 07:14:06 GMT Subject: RFR: 8370409: Incorrect computation in Float16 reduction loop In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 07:03:42 GMT, Emanuel Peter wrote: >> Current floatToFloat16 intrinsic implementation always sign-extends the 16-bit short result to a 32-bit value in anticipation of safe consumption by subsequent integral (comparison) operation[s]. However, the safest way to compare two Float16 values is to use Float16.compare/compareTo method, given that floating point comparisons can also be unordered. >> >> e.g., both 64512 and -1024 are equivalent bit representations of the Float16 -Inf value, but are not numerically equivalent with integral comparison. >> jshell> Float16.compare(Float16.shortBitsToFloat16((short)-1024), Float16.shortBitsToFlot16((short)64512)) >> $3 ==> 0 >> >> In the scalar intrinsic of Float16.add/sub/mul/div/min/max, we always return a boxed value, which is then operated upon by the subsequent Float16 APIs. While Float.floatToFloat16 intrinsic always returns a 'short' value, this is special in the sense that even though the carrier type is 'short' but it encodes an IEEE 754 half precision value, being a short carrier if they get exposed to integral operators, then as per JVM specification, short must be sign-extended before operation. >> >> Given that our Float16 binary operations inference is based on generic pattern match and is agnostic to how that graph pallet got created, i.e., either through Float16.* APIs or by explicit Float.float16ToFloat/floatToFloat16 operations, hence it's safe to sign-extend the result in all cases. >> >> Kindly review the patch and share your feedback. >> >> Best Regards, >> Jatin > > test/hotspot/jtreg/compiler/c2/TestFloat16Reduction.java line 33: > >> 31: * @library /test/lib / >> 32: * @run main/othervm -XX:-TieredCompilation >> 33: * compiler.c2.TestFloat16Reduction > > Was the flag required for reproducing the issue? 
> If it was not required: just remove it > If it was required: add a run without the flag, in addition to a run with the flag. Also: the flat `-XX:-TieredCompilation` is now applied to the VM that runs the TestFramework, but that is not necessary. You could just do `framework.addFlags("-XX:-TieredCompilation")`, so that the flag only gets applied to the test VM. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27977#discussion_r2476644092 From rsunderbabu at openjdk.org Thu Oct 30 07:16:37 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Thu, 30 Oct 2025 07:16:37 GMT Subject: RFR: 8293484: AArch64: TestUseSHA512IntrinsicsOptionOnSupportedCPU.java fails on CPU with SHA512 feature support Message-ID: We have a host of tests under test/hotspot/jtreg/compiler/intrinsics/sha which checks if the SHA intrinsics flags' enable/disable setting is in sync with CPU support in the underlying platform. There might be situations where the intrinsics might not be enabled despite the hardware supporting the relevant instructions. For example, there might be reliability issues or performance issues. In such situations, the tests will fail. Till now, the approach has been to exclude the platforms where the support is yet to be provided and remove the exclusion after. This necessitates additional work on the test front. A more compact design would be make predicate probes to rely on intrinsics availability in the platform as opposed to hardware support availability. The migration to intrinsics availability would especially help update releases where feature backport might not be complete. PS: This fix can/should be propagated to other such tests as well. Once this PR gets approval, I will work on similar tests. ------------- Commit messages: - 8293484: AArch64: TestUseSHA512IntrinsicsOptionOnSupportedCPU.java fails on CPU with SHA512 feature support Changes: https://git.openjdk.org/jdk/pull/28053/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28053&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293484 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28053/head:pull/28053 PR: https://git.openjdk.org/jdk/pull/28053 From duke at openjdk.org Thu Oct 30 07:18:53 2025 From: duke at openjdk.org (Harshit470250) Date: Thu, 30 Oct 2025 07:18:53 GMT Subject: RFR: 8370920: [s390] C2: add instruction size in s390.ad file Message-ID: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> This pr adds the size of the match rule nodes. There were a lot of nodes for which the size was variable, for those node I have taken the maximum possible size. ------------- Commit messages: - Merge remote-tracking branch 'origin/master' - remove whitespace - Resolved a bug - remove TODO comments - final size added - Final sizes - oop_decoder and load_const_optimized - error fix and added more sizes - upto line 9078 - upto line 9283 - ... 
and 1 more: https://git.openjdk.org/jdk/compare/f3dfdfa3...c253449a Changes: https://git.openjdk.org/jdk/pull/28054/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370920 Stats: 155 lines in 1 file changed: 2 ins; 0 del; 153 mod Patch: https://git.openjdk.org/jdk/pull/28054.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28054/head:pull/28054 PR: https://git.openjdk.org/jdk/pull/28054 From epeter at openjdk.org Thu Oct 30 07:22:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 07:22:07 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Wed, 29 Oct 2025 10:16:41 GMT, Xiaohong Gong wrote: >> Another idea: use a return `Enum`. Then you can give things names, which can sometimes be more helpful than `true/false`. > > Hi @eme64 , I updated a commit which mainly changes the comments. The function name `mask_op_prefers_predicate` remains unchanged. After giving it careful thought overnight, I believe this name is more accurate. I?m sorry if my earlier explanation caused any confusion. Would you mind checking whether it's fine to you? Thanks! Nice, it looks much better now, thanks for the updates :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2476667488 From epeter at openjdk.org Thu Oct 30 07:22:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 07:22:12 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v5] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 10:13:03 GMT, Xiaohong Gong wrote: >> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. >> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. >> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. 
Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. >> >> Here is the performance data with `-XX:UseSVE=2`: >> >> Benchmark bits inputs Mode Unit Before After Gain >> MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 >> MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 >> MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 >> MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 >> MaskQueryOperations... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Update comments src/hotspot/share/opto/matcher.hpp line 340: > 338: // - Return true if it prefers a predicate type (i.e. TypeVectMask). > 339: // - Return false if it prefers a general vector type (i.e. TypeVectA to TypeVectZ). > 340: static bool mask_op_prefers_predicate(int opcode, const TypeVect* vt); Nice, this looks much clearer now, thanks for the updates :) I'll have a look at the whole PR at a later point. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2476666651 From epeter at openjdk.org Thu Oct 30 07:34:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 07:34:09 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v5] In-Reply-To: References: Message-ID: <5YRBuKmUoT6YIb0tbKzavjTITGfECr1mUpO3suWQyww=.1a4b619c-a222-440f-9773-0fcc843c9988@github.com> On Wed, 29 Oct 2025 10:13:03 GMT, Xiaohong Gong wrote: >> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. >> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. >> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. 
`fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. >> >> Here is the performance data with `-XX:UseSVE=2`: >> >> Benchmark bits inputs Mode Unit Before After Gain >> MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 >> MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 >> MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 >> MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 >> MaskQueryOperations... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Update comments Generally, this patch looks reasonable, but I'm not a aarch64 or x64 specialist for these ops. I think we have sufficient aarch64 specialists look at this already. But I'd like to ping @sviswa7 and @jatin-bhateja to sanity check the x64 changes, IR rules etc :) After approval from x64 folks, I can offer to do some internal testing :) test/jdk/jdk/incubator/vector/Long128VectorTests.java line 6850: > 6848: var vmask = VectorMask.fromLong(SPECIES, inputLong); > 6849: // Insert "not()" to avoid the "fromLong/toLong" being optimized out by compiler. > 6850: long outputLong = vmask.not().toLong(); That sounds a bit fragile. Is there something that would catch if it did ever get optimized away? ------------- PR Review: https://git.openjdk.org/jdk/pull/27481#pullrequestreview-3397777045 PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2476685457 From mhaessig at openjdk.org Thu Oct 30 07:37:10 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 30 Oct 2025 07:37:10 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v2] In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 06:51:17 GMT, Tobias Hotz wrote: >> @ichttt, are you still working on this? :slightly_smiling_face: > > @mhaessig I've added some new asserts to try and detect where it went wrong and merged the latest upstream. Can you run the failing test again please? Thank you @ichttt, I will kick off a run. I have been trying to extract a reproducer for you, but was not able to reproduce the failure again. Perhaps the asserts will help. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26143#issuecomment-3466468483 From rrich at openjdk.org Thu Oct 30 07:54:40 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 30 Oct 2025 07:54:40 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots Message-ID: With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. Only with incoming stack arguments unaligned offsets can be generated. But also then alignment padding is only added if vector registers larger than 64 bit are used. So the costs are effectively zero. 
Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. I havn't implemented it as I thought the risk is negligible. Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. ##### Testing with fastdebug builds on AARCH64 and PPC64: hotspot_vector_1 hotspot_vector_2 jdk_vector jdk_vector_sanity ##### The change passed our CI testing: Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. Testing was done on the main platforms and also on Linux/PPC64le and AIX. C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. It is covered by the following tests: compiler/vectorapi/VectorRearrangeTest.java jdk/incubator/vector/Byte128VectorLoadStoreTests.java jdk/incubator/vector/Double256VectorLoadStoreTests.java jdk/incubator/vector/Float128VectorTests.java jdk/incubator/vector/Long256VectorLoadStoreTests.java jdk/incubator/vector/Short128VectorLoadStoreTests.java jdk/incubator/vector/Vector64ConversionTests.java ------------- Commit messages: - PPC: OptoAssembly for vector spilling - Assert aligned sp offsets in vector spilling - Delete TMP and !UseNewCode - Align Matcher::_new_SP for better vector spilling - TMP: trace unaligned vector spilling - Add test Changes: https://git.openjdk.org/jdk/pull/27969/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27969&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370473 Stats: 202 lines in 7 files changed: 156 ins; 29 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/27969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27969/head:pull/27969 PR: https://git.openjdk.org/jdk/pull/27969 From dnsimon at openjdk.org Thu Oct 30 07:59:05 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 30 Oct 2025 07:59:05 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 14:29:54 GMT, Emanuel Peter wrote: >> This is to support Truffle where `long` and `double` fields can be encoded in `int[]` arrays. It's a bit like https://bugs.openjdk.org/browse/JDK-8231756 where fields are encoded in `byte[]` arrays. @tkrodriguez or @woess can you please confirm we still need this. > > @dougxc @tkrodriguez @woess Can we guard some of the logic in `#if INCLUDE_JVMCI` though? Yes, guarding this with `#if INCLUDE_JVMCI` is fine. More background thanks to @woess: This is [just how we encode long/double constants in an int array in the debug info](https://github.com/oracle/graal/blob/4cc7e42d1422e8502172d599216d7a0b6d263d52/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/core/gen/DebugInfoBuilder.java#L167). While for the `byte[]` case, we put illegal/marker values in the remaining slots, for the `int[]` case we put a int 0 constant into the second slot, when we store a long constant + illegal pair. 
In the debug info, this is reversed so that the second slot is actually [the low part](https://github.com/openjdk/jdk/blob/17fd801b24162dfbac6d4e63ef5048a0fb146074/src/hotspot/share/runtime/deoptimization.cpp#L1396), i.e. the long constant, and the first slot would then contain the int 0 constant (as checked by `sv->field_at(i)->is_constant_int()`). The illegal marker (added [here](https://github.com/openjdk/jdk/commit/12f8b52fb8b4741b5a0087223ca11c210a561038#diff-6de46252a64a081f101658db197dbe77ac35313688971c5aa51700aa0772c9e1R113)) could b e used instead to be consistent with what we do in the isVirtualByteArray case but changing that now probably isn?t worth it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2476745599 From duke at openjdk.org Thu Oct 30 08:53:40 2025 From: duke at openjdk.org (erifan) Date: Thu, 30 Oct 2025 08:53:40 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v2] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 10:06:02 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/vectorapi/TestSelectFromTwoVectorOp.java line 221: >> >>> 219: @IR(counts = {IRNode.SELECT_FROM_TWO_VECTOR_VB, IRNode.VECTOR_SIZE_64, ">0"}, >>> 220: applyIfCPUFeature = {"sve2", "true"}, >>> 221: applyIf = {"MaxVectorSize", "64"}) >> >> Would it make sense to add some IR rule for cases with `MaxVectorSize > 64`? Because now you just weakened the test, rather than ensuring that there is a test for larger sizes. > > Maybe it would be enough to just remove the `, IRNode.VECTOR_SIZE_64`, so that the test could check for the largest vector length available on the platform? Hi @eme64 thanks for your review. As the vector species is fixed `ByteVector.SPECIES_512`, removing `IRNode.VECTOR_SIZE_64` doesn't work. Adding a rule for cases where `MaxVectorSize > 64` makes sense to me. I'll add the rule in the next commit, thanks~ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27723#discussion_r2476919178 From epeter at openjdk.org Thu Oct 30 08:54:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 08:54:38 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 07:56:27 GMT, Doug Simon wrote: >> @dougxc @tkrodriguez @woess Can we guard some of the logic in `#if INCLUDE_JVMCI` though? > > Yes, guarding this with `#if INCLUDE_JVMCI` is fine. > > More background thanks to @woess: > > This is [just how we encode long/double constants in an int array in the debug info](https://github.com/oracle/graal/blob/4cc7e42d1422e8502172d599216d7a0b6d263d52/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/core/gen/DebugInfoBuilder.java#L167). While for the `byte[]` case, we put illegal/marker values in the remaining slots, for the `int[]` case we put a int 0 constant into the second slot, when we store a long constant + illegal pair. In the debug info, this is reversed so that the second slot is actually [the low part](https://github.com/openjdk/jdk/blob/17fd801b24162dfbac6d4e63ef5048a0fb146074/src/hotspot/share/runtime/deoptimization.cpp#L1396), i.e. the long constant, and the first slot would then contain the int 0 constant (as checked by `sv->field_at(i)->is_constant_int()`). 
The illegal marker (added [here](https://github.com/openjdk/jdk/commit/12f8b52fb8b4741b5a0087223ca11c210a561038#diff-6de46252a64a081f101658db197dbe77ac35313688971c5aa51700aa0772c9e1R113)) could be used instead to be consistent with what we do in the isVirtualByteArray case but changing that now probably isn?t worth it. @dougxc Thanks for getting more info! I'll try to guard it with `#if INCLUDE_JVMCI` :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2476925291 From jbhateja at openjdk.org Thu Oct 30 09:01:38 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 30 Oct 2025 09:01:38 GMT Subject: RFR: 8370409: Incorrect computation in Float16 reduction loop [v2] In-Reply-To: References: Message-ID: <0GIK4lMx8MCx1F-F0doUmYMOFIGde_nje4JGtn7X_cQ=.5227ed9d-b120-44d2-8b58-53b584fbc334@github.com> > Current floatToFloat16 intrinsic implementation always sign-extends the 16-bit short result to a 32-bit value in anticipation of safe consumption by subsequent integral (comparison) operation[s]. However, the safest way to compare two Float16 values is to use Float16.compare/compareTo method, given that floating point comparisons can also be unordered. > > e.g., both 64512 and -1024 are equivalent bit representations of the Float16 -Inf value, but are not numerically equivalent with integral comparison. > jshell> Float16.compare(Float16.shortBitsToFloat16((short)-1024), Float16.shortBitsToFlot16((short)64512)) > $3 ==> 0 > > In the scalar intrinsic of Float16.add/sub/mul/div/min/max, we always return a boxed value, which is then operated upon by the subsequent Float16 APIs. While Float.floatToFloat16 intrinsic always returns a 'short' value, this is special in the sense that even though the carrier type is 'short' but it encodes an IEEE 754 half precision value, being a short carrier if they get exposed to integral operators, then as per JVM specification, short must be sign-extended before operation. > > Given that our Float16 binary operations inference is based on generic pattern match and is agnostic to how that graph pallet got created, i.e., either through Float16.* APIs or by explicit Float.float16ToFloat/floatToFloat16 operations, hence it's safe to sign-extend the result in all cases. > > Kindly review the patch and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27977/files - new: https://git.openjdk.org/jdk/pull/27977/files/bd6f290b..f5e1a857 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27977&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27977&range=00-01 Stats: 31 lines in 1 file changed: 3 ins; 13 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/27977.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27977/head:pull/27977 PR: https://git.openjdk.org/jdk/pull/27977 From jbhateja at openjdk.org Thu Oct 30 09:01:39 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 30 Oct 2025 09:01:39 GMT Subject: RFR: 8370409: Incorrect computation in Float16 reduction loop [v2] In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 07:06:36 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/TestFloat16Reduction.java line 33: >> >>> 31: * @library /test/lib / >>> 32: * @run main/othervm -XX:-TieredCompilation >>> 33: * compiler.c2.TestFloat16Reduction >> >> Was the flag required for reproducing the issue? 
>> If it was not required: just remove it >> If it was required: add a run without the flag, in addition to a run with the flag. > > Also: the flat `-XX:-TieredCompilation` is now applied to the VM that runs the TestFramework, but that is not necessary. You could just do `framework.addFlags("-XX:-TieredCompilation")`, so that the flag only gets applied to the test VM. My bad, we don't need these, looks like a copy-paste error, IR framework uses white box APIs ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27977#discussion_r2476947396 From jbhateja at openjdk.org Thu Oct 30 09:01:40 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 30 Oct 2025 09:01:40 GMT Subject: RFR: 8370409: Incorrect computation in Float16 reduction loop [v2] In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 07:06:41 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > test/hotspot/jtreg/compiler/c2/TestFloat16Reduction.java line 161: > >> 159: GOLDEN_MAX = MAXReduceLong(); >> 160: GOLDEN_MIN = MINReduceLong(); >> 161: } > > A total nit, and optional: you could make the fields static, and just assign values as you declare the fields. That would save you doing it all in the constructor. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27977#discussion_r2476947040 From jbhateja at openjdk.org Thu Oct 30 09:06:03 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 30 Oct 2025 09:06:03 GMT Subject: RFR: 8370409: Incorrect computation in Float16 reduction loop [v3] In-Reply-To: References: Message-ID: <4HXEOVQ2X0dIl7uFo1xuUJugrCUnYHU7Pp8uxpBv1Nw=.a982e500-e7e0-461f-81fc-419461a40830@github.com> > Current floatToFloat16 intrinsic implementation always sign-extends the 16-bit short result to a 32-bit value in anticipation of safe consumption by subsequent integral (comparison) operation[s]. However, the safest way to compare two Float16 values is to use Float16.compare/compareTo method, given that floating point comparisons can also be unordered. > > e.g., both 64512 and -1024 are equivalent bit representations of the Float16 -Inf value, but are not numerically equivalent with integral comparison. > jshell> Float16.compare(Float16.shortBitsToFloat16((short)-1024), Float16.shortBitsToFlot16((short)64512)) > $3 ==> 0 > > In the scalar intrinsic of Float16.add/sub/mul/div/min/max, we always return a boxed value, which is then operated upon by the subsequent Float16 APIs. While Float.floatToFloat16 intrinsic always returns a 'short' value, this is special in the sense that even though the carrier type is 'short' but it encodes an IEEE 754 half precision value, being a short carrier if they get exposed to integral operators, then as per JVM specification, short must be sign-extended before operation. > > Given that our Float16 binary operations inference is based on generic pattern match and is agnostic to how that graph pallet got created, i.e., either through Float16.* APIs or by explicit Float.float16ToFloat/floatToFloat16 operations, hence it's safe to sign-extend the result in all cases. > > Kindly review the patch and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Change GOLDEN constants to static variables ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27977/files - new: https://git.openjdk.org/jdk/pull/27977/files/f5e1a857..91970a18 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27977&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27977&range=01-02 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/27977.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27977/head:pull/27977 PR: https://git.openjdk.org/jdk/pull/27977 From epeter at openjdk.org Thu Oct 30 09:23:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 09:23:17 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v3] In-Reply-To: References: Message-ID: > Note: @oliviermattmann found this bug with his whitebox fuzzer. See also https://github.com/openjdk/jdk/pull/27991 > > **Analysis** > We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. > During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. > But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. > > Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. > Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. > > **Solution** > We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. > > But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: > https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 > (the idea is to bail out of the elimination if any of the found stores are mismatched.) > > **Details** > > How the bad sequence develops, and which components are involved. > > 1) The `SafePoint` contains a `ConL` and 3 `ConI`. (Correct would have been 4 `ConI`) > > 6 ConI === 23 [[ 4 ]] #int:16777216 > 7 ConI === 23 [[ 4 ]] #int:256 > 8 ConI === 23 [[ 4 ]] #int:1048576 > 9 ConL === 23 [[ 4 ]] #long:68719476737 > 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] > 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) > > > 2) This is then encoded into an `ObjectValue`. A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: > https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 > If I understand it right, there zero is just a placeholder. > > And so we get: > > (rr) p sv->print_fields_on(tty) > Fields: 0, 68719476737, 1048576, 256, 16777216 > > We can see the `zero`, followed by the `ConL`, and then 3 `ConI`. > > This se... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: rm assert, guard with INCLUDE_JVMCI ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27997/files - new: https://git.openjdk.org/jdk/pull/27997/files/b6e032c2..c1d9cacc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27997&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27997&range=01-02 Stats: 11 lines in 1 file changed: 6 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27997.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27997/head:pull/27997 PR: https://git.openjdk.org/jdk/pull/27997 From epeter at openjdk.org Thu Oct 30 09:28:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 09:28:08 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v3] In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 08:51:59 GMT, Emanuel Peter wrote: >> Yes, guarding this with `#if INCLUDE_JVMCI` is fine. >> >> More background thanks to @woess: >> >> This is [just how we encode long/double constants in an int array in the debug info](https://github.com/oracle/graal/blob/4cc7e42d1422e8502172d599216d7a0b6d263d52/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/core/gen/DebugInfoBuilder.java#L167). While for the `byte[]` case, we put illegal/marker values in the remaining slots, for the `int[]` case we put a int 0 constant into the second slot, when we store a long constant + illegal pair. In the debug info, this is reversed so that the second slot is actually [the low part](https://github.com/openjdk/jdk/blob/17fd801b24162dfbac6d4e63ef5048a0fb146074/src/hotspot/share/runtime/deoptimization.cpp#L1396), i.e. the long constant, and the first slot would then contain the int 0 constant (as checked by `sv->field_at(i)->is_constant_int()`). The illegal marker (added [here](https://github.com/openjdk/jdk/commit/12f8b52fb8b4741b5a0087223ca11c210a561038#diff-6de46252a64a081f101658db197dbe77ac35313688971c5aa51700aa0772c9e1R113)) coul d be used instead to be consistent with what we do in the isVirtualByteArray case but changing that now probably isn?t worth it. > > @dougxc Thanks for getting more info! I'll try to guard it with `#if INCLUDE_JVMCI` :) Ok, it is adjusted :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27997#discussion_r2477068579 From epeter at openjdk.org Thu Oct 30 09:40:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 09:40:29 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: <28JzI3gxRKU4fIQ7544Xl9MW1ZGYR7IazhAsMPilpF0=.6c993c80-ab41-49e1-9084-98e226ac7c8a@github.com> References: <28JzI3gxRKU4fIQ7544Xl9MW1ZGYR7IazhAsMPilpF0=.6c993c80-ab41-49e1-9084-98e226ac7c8a@github.com> Message-ID: On Wed, 29 Oct 2025 17:47:52 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8370405-alloc-elimination-and-MergeStores >> - only verify primitive types >> - Apply suggestions from code review >> - more assert adjustment >> - ignore debug flag >> - id for tests, and fix up the assert >> - pass int for short slot >> - another test >> - improve test >> - wip new IR test >> - ... 
and 6 more: https://git.openjdk.org/jdk/compare/57042cfd...b6e032c2 > > And I am fine to do that in separate changes. @vnkozlov I filed this: [JDK-8370936](https://bugs.openjdk.org/browse/JDK-8370936) C2 MergeStores: move process_for_merge_stores_igvn after allocation elimination I also fixed the `assert` that Graal would have tripped over, and refactored the code with `#if INCLUDE_JVMCI`. Update: @merykitty assigned this to himself, thank you! [JDK-8370901](https://bugs.openjdk.org/browse/JDK-8370901) C2: strenghten assert in create_scalarized_object_description after JDK-8370405 Is there anything else I can improve here? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27997#issuecomment-3466959188 From mhaessig at openjdk.org Thu Oct 30 10:25:36 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 30 Oct 2025 10:25:36 GMT Subject: RFR: 8370914: C2: Reimplement Type::join In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 03:12:05 GMT, Quan Anh Mai wrote: > Hi, > > Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. > > However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). > > In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. > > This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. There are a lot of benefits of this: > > - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. > - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. > > This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the pl... 
Impressive work @merykitty! It would be good if you could explain how your new lattice definition compares to our current definition. Especially, in regard to the completeness of join, which we currently get for free from symmetry and a complete meet, and associativity and distributivity and how we have to implement the relation in Value() to keep those properties. While I see that your definition is potentially easier to reason about, I think that you first need to convince everyone that your definition has no significant drawbacks to what we currently have simply because this change is so fundamental. src/hotspot/share/opto/type.cpp line 977: > 975: }; > 976: > 977: void Type::check_symmetrical(const Type* t1, const Type* t2, VerifyMeet& verify) { This will need a rename ;) ------------- PR Review: https://git.openjdk.org/jdk/pull/28051#pullrequestreview-3398626162 PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2477334295 From mchevalier at openjdk.org Thu Oct 30 10:26:17 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 30 Oct 2025 10:26:17 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v10] In-Reply-To: References: Message-ID: > Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. > > When we have something like > > // any loop > while (...) { /* something involving limit */ } > // counted loop with zero trip guard > if (i < limit) { > for (int i = init; i < limit; i++) { ... } > } > > and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. > > But the method `PhaseIdealLoop::do_unroll` has the assert > > https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 > > requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. > > This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. > > But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? > > By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. 
Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. > > We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. > > We could make the assert a lot smarter and test for ... Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Fewer flag combinations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27586/files - new: https://git.openjdk.org/jdk/pull/27586/files/7db83901..6e556d6c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27586&range=08-09 Stats: 36 lines in 1 file changed: 10 ins; 26 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27586/head:pull/27586 PR: https://git.openjdk.org/jdk/pull/27586 From mchevalier at openjdk.org Thu Oct 30 10:26:18 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 30 Oct 2025 10:26:18 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v9] In-Reply-To: References: <9T_qIFDSxnt0RfSKknq6jkZnSlkEHslHL5NuquhMAOI=.6b7dc2e5-1341-4b9b-bbca-27d0eaca5d78@github.com> Message-ID: On Tue, 28 Oct 2025 10:40:44 GMT, Christian Hagedorn wrote: >> That's the one in the reproducer you've crafted that give a simpler graph, if I remember correctly. I think it's valuable because the graph shape is different so it might trigger some asserts differently, exercise other paths, and if it breaks again, maybe someone who will have to look at it will be happy to find a run with a simpler graph. Maybe I can add in the summary that "if it helps investigate an issue, the @run 3 and 5 (with more flags) are expected to give a simpler graph". > > Thanks for the explanation. But it would also trigger without the additional flags? For the reproducer, I just disabled as many optimizations as possible to get an easier graph which I often do while debugging. The problem I see is that we could define such additional runs in many of our tests to get some simpler or different graph shapes. But I would argue that this should rather be part of a separate stress job instead. This also keeps the execution time short for tier1. But you could certainly leave a comment in the test how to get a simpler graph if required. I've removed these runs with additional flags and commented that one can use them to simplify graph in case they need to investigate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27586#discussion_r2477357852 From epeter at openjdk.org Thu Oct 30 10:37:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 10:37:55 GMT Subject: RFR: 8369435: C2: transform (LShiftX (SubX con0 a), con1) into (SubX con0< References: <9bFkcaxVVCBhXOpokcS1lMtBDhCyf4eCeWc6wN7Jkpo=.78d2d6b6-4f7e-4361-a361-bab524c35fb2@github.com> Message-ID: On Tue, 28 Oct 2025 09:52:54 GMT, Roland Westrelin wrote: >> We already transform: >> >> (LShiftX (AddX a con0), con1) into (AddX (LShiftX a con1) con0<> >> THis is a variant with SubX. I found that this helps RCE. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > whitespace Looks good to me! 
I closed https://bugs.openjdk.org/browse/JDK-8359688 as a duplicate, since you were able to enable the IR rule here. Amazing, one more MemorySegment case fixed :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27842#pullrequestreview-3398728127 From roland at openjdk.org Thu Oct 30 10:37:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 30 Oct 2025 10:37:56 GMT Subject: RFR: 8369435: C2: transform (LShiftX (SubX con0 a), con1) into (SubX con0< References: <9bFkcaxVVCBhXOpokcS1lMtBDhCyf4eCeWc6wN7Jkpo=.78d2d6b6-4f7e-4361-a361-bab524c35fb2@github.com> Message-ID: On Thu, 16 Oct 2025 12:11:31 GMT, Quan Anh Mai wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> whitespace > > LGTM @merykitty @benoitmaillard @eme64 thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/27842#issuecomment-3467252378 From roland at openjdk.org Thu Oct 30 10:37:58 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 30 Oct 2025 10:37:58 GMT Subject: Integrated: 8369435: C2: transform (LShiftX (SubX con0 a), con1) into (SubX con0< References: <9bFkcaxVVCBhXOpokcS1lMtBDhCyf4eCeWc6wN7Jkpo=.78d2d6b6-4f7e-4361-a361-bab524c35fb2@github.com> Message-ID: On Thu, 16 Oct 2025 09:36:03 GMT, Roland Westrelin wrote: > We already transform: > > (LShiftX (AddX a con0), con1) into (AddX (LShiftX a con1) con0< > THis is a variant with SubX. I found that this helps RCE. This pull request has now been integrated. Changeset: 80fcfaf4 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/80fcfaf41aa2d6af30f15877e4466647dbca424e Stats: 251 lines in 5 files changed: 151 ins; 98 del; 2 mod 8369435: C2: transform (LShiftX (SubX con0 a), con1) into (SubX con0< References: Message-ID: On Thu, 30 Oct 2025 10:23:00 GMT, Manuel H?ssig wrote: >> Hi, >> >> Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. >> >> However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). >> >> In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. >> >> This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. 
There are a lot of benefits of this: >> >> - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. >> - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. >> >> This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these p... > > Impressive work @merykitty! > > It would be good if you could explain how your new lattice definition compares to our current definition. Especially, in regard to the completeness of join, which we currently get for free from symmetry and a complete meet, and associativity and distributivity and how we have to implement the relation in Value() to keep those properties. > > While I see that your definition is potentially easier to reason about, I think that you first need to convince everyone that your definition has no significant drawbacks to what we currently have simply because this change is so fundamental. @mhaessig Thanks for taking a look. > It would be good if you could explain how your new lattice definition compares to our current definition. The thing is that it is easier and more intuitive to think of a `Type` object as a set of possible values. For example, a `TypeInt` represents a set of `int` values, and a node (e.g. an `AddI`) at runtime must take values that are elements of the `TypeInt` of that node. Thinking of a `Type` as a lattice point is both more unintuitive and easier to misstep. > Especially, in regard to the completeness of join, which we currently get for free from symmetry and a complete meet It is not free, though. It requires us to prove that the type representation is symmetric and implement `xmeet` in a way that satisfies such symmetry. If the symmetry is not trivial, such as in the case of `TypeInstPtr`, it results in a meet that is harder to understand than implementing meet and join separately. It is because instead of reasoning about sets and their elements, we have to keep track of the core symmetry, and how our lattice points work with such symmetry, definitely not free. The core issue is that the existence of dual types doubles the set of values that can participate in a set. For example, in `TypeInstPtr`, instead of implementing a meet and a join, each with 9 possible combinations, all of which are intuitive (because they represent real sets), we have to implement a meet that takes into consideration 25 input combinations. 
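To make the two formulations concrete, here is a minimal, self-contained sketch (illustrative only; `Interval`, `TypeJoinSketch` and the representation are invented for this example and are not C2's `TypeInt`). Meet and join are written directly as operations on sets of ints, and the dual-based formulation `join(a, b) = dual(meet(dual(a), dual(b)))` is included so the two approaches can be compared:

```java
// Illustrative sketch only: this is NOT C2's TypeInt; names and representation
// are made up. An Interval stands for the set of ints in [lo, hi]; a result
// with lo > hi stands for the empty set (the role of TOP).
record Interval(int lo, int hi) {
    // meet = smallest interval containing the union of the two sets
    Interval meet(Interval other) {
        return new Interval(Math.min(lo, other.lo), Math.max(hi, other.hi));
    }

    // direct join = set intersection; lo > hi in the result means "no value"
    Interval join(Interval other) {
        return new Interval(Math.max(lo, other.lo), Math.min(hi, other.hi));
    }

    // dual-based formulation: encode the dual by swapping the bounds, reuse meet
    Interval dual() { return new Interval(hi, lo); }

    Interval joinViaDual(Interval other) {
        return dual().meet(other.dual()).dual();
    }
}

public class TypeJoinSketch {
    public static void main(String[] args) {
        Interval a = new Interval(0, 10);
        Interval b = new Interval(5, 20);
        System.out.println(a.meet(b));        // lo=0, hi=20 (union hull)
        System.out.println(a.join(b));        // lo=5, hi=10 (intersection)
        System.out.println(a.joinViaDual(b)); // lo=5, hi=10 (same set, via duals)
    }
}
```

Both joins compute the same set, but the dual-based one only works because swapping the bounds happens to make the same meet formula compute an intersection; that kind of representation trick is exactly what the discussion above is about, and it gets much harder to see once the type has more fields than a pair of bounds.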
------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3467357470 From chagedorn at openjdk.org Thu Oct 30 11:21:34 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 30 Oct 2025 11:21:34 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v10] In-Reply-To: References: Message-ID: <4Ng3t3NYSubp0itC5Y8ejNlbSYgb6lUAj18KqL5Ob9w=.f6929de5-501c-400a-b178-58272504c72d@github.com> On Thu, 30 Oct 2025 10:26:17 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. >> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Fewer flag combinations Thanks for the update, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27586#pullrequestreview-3399006352 From epeter at openjdk.org Thu Oct 30 11:38:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 11:38:36 GMT Subject: RFR: 8370794: C2 SuperWord: Long/Integer.compareUnsigned return wrong value for EQ/NE in SLP In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 16:38:54 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481) introduces this regression for unsigned I/L EQ/NE in SLP. > > ==================== > > In [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481), we fixed an issue related to transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), and added tests for unsigned ones. > As discussion in [1], we should also add more tests for transformation from (Bool + Cmp + CMove) to (VectorMaskCmp + VectorBlend) for the signed ones. > > [1] https://github.com/openjdk/jdk/pull/27942#discussion_r2468750039 > > Thanks! > > Tests running... That looks really good, thanks for writing all the tests, that's amazing :) I'll run some internal tests now... ------------- PR Review: https://git.openjdk.org/jdk/pull/28047#pullrequestreview-3399114367 From epeter at openjdk.org Thu Oct 30 12:26:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 12:26:07 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v4] In-Reply-To: References: Message-ID: <1UNdzkgCUH6tju9WzaTQaBdeT8Xv9T4TWnk2Jg3SMoA=.6ee10e45-8c73-444a-a9da-ca0c03bdaf79@github.com> On Mon, 27 Oct 2025 08:42:37 GMT, Beno?t Maillard wrote: >> This PR addresses a missed optimization opportunity in `PhaseIterGVN`. The missed optimization is the simplification of redundant conversion patterns of the shape `ConvX2Y->ConvY2X->ConvX2Y`. >> >> This optimization pattern is implemented as an ideal optimization on `ConvX2Y` nodes. Because it depends on the input of the input of the node in question, we need to have an appropriate notification mechanism in `PhaseIterGVN::add_users_of_use_to_worklist`. The notification for this pattern was added in [JDK-8359603](https://bugs.openjdk.org/browse/JDK-8359603). >> >> However, that fix was based on the wrong assumption that in `PhaseIterGVN::add_users_of_use_to_worklist`, argument `n` is already the optimized node. However in some cases this argument is actually the node that is about to get replaced. >> >> This happens for example in `PhaseIterGVN::transform_old`. If we find that node `k` returned by `Ideal` actually already exists by calling `hash_find_insert(k)`, we call `add_users_to_worklist(k)`. >> As we replace node `k` with `i`, and `i` as a different opcode than `k`, then we cannot use the opcode of `k` to detect the redundant conversion pattern. >> >> ```c++ >> ... >> // Global Value Numbering >> i = hash_find_insert(k); // Check for pre-existing node >> if (i && (i != k)) { >> // Return the pre-existing node if it isn't dead >> NOT_PRODUCT(set_progress();) >> add_users_to_worklist(k); >> subsume_node(k, i); // Everybody using k now uses i >> return i; >> } >> ... >> >> >> The bug was quite intermittent and only showed up in some cases with `-XX:+StressIGVN`. >> >> ### Proposed Fix >> >> We make the detection of the pattern less specific by only looking at the opcode of the user of `n`, and not directly the opcode of `n`. 
This is consistent with the detection of other patterns in `PhaseIterGVN::add_users_of_use_to_worklist`. >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8369646) >> - [x] tier1-3, plus some internal testing >> - [x] Added a second run for the existing test with `-XX:+StressIGVN` and a fixed stress seed >> >> Thank you for reviewing! > > Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Add -XX:+StressIGVN to run without fixed seed Looks good to me :) src/hotspot/share/opto/phaseX.cpp line 2568: > 2566: // ConvI2F->ConvF2I->ConvI2F > 2567: // Note: there may be other 3-nodes conversion chains that would require to be added here, but these > 2568: // are the only ones that are known to trigger missed optimizations otherwise You may want to update the description, and give a bit of extra information. Because you are saying `n` does not have to be a conversion, but it may be that `n` is about to be replaced with a conversion, right? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27900#pullrequestreview-3399343300 PR Review Comment: https://git.openjdk.org/jdk/pull/27900#discussion_r2477902652 From epeter at openjdk.org Thu Oct 30 12:29:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 12:29:57 GMT Subject: RFR: 8362832: compiler/macronodes/TestTopInMacroElimination.java hits assert(false) failed: unexpected node [v2] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 08:16:21 GMT, Beno?t Maillard wrote: >> This PR prevents hitting an assert caused by encountering `top` while following the memory >> slice associated with a field when eliminating allocations in macro node elimination. This situation >> is the result of another elimination (boxing node elimination) that happened at the same >> macro expansion iteration. >> >> ### Analysis >> >> The issue appears in the macro expansion phase. We have a nested `synchronized` block, >> with the outer block synchronizing on `new A()` and the inner one on `TestTopInMacroElimination.class`. >> In the inner `synchronized` block we have an `Integer.valueOf` call in a loop. >> >> In `PhaseMacroExpand::eliminate_boxing_node` we are getting rid of the `Integer.valueOf` >> call, as it is a non-escaping boxing node. After having eliminated the call, >> `PhaseMacroExpand::process_users_of_allocation` takes care of the users of the removed node. >> There, we replace usages of the fallthrough memory projection with `top`. >> >> In the same macro expansion iteration, we later attempt to get rid of the `new A()` allocation >> in `PhaseMacroExpand::create_scalarized_object_description`. There, we have to make >> sure that all safepoints can still see the object fields as if the allocation was never deleted. >> For this, we attempt to find the last value on the slice of each specific field (`a` >> in this case). Because field `a` is never written to, and it is not explicitely initialized, >> there is no `Store` associated to it and not even a dedicated memory slice (we end up >> taking the `Bot` input on `MergeMem` nodes). By going up the memory chain, we eventually >> encounter the `MemBarReleaseLock` whose input was set to `top`. This is where the assert >> is hit. >> >> ### Proposed Fix >> >> In the end I opted for an analog fix as the similar [JDK-8325030](https://git.openjdk.org/jdk/pull/23104). 
>> If we get `top` from `scan_mem_chain` in `PhaseMacroExpand::value_from_mem`, then we can safely >> return `top` as well. This means that the safepoint will have `top` as data input, but this will >> eventually cleaned up by the next round of IGVN. >> >> Another (tempting) option would have been to simply return `nullptr` from `PhaseMacroExpand::value_from_mem` when encoutering `top`. However this would result in bailing >> out from eliminating this allocation temporarily and effectively delaying it to a subsqequent >> macro expansion round. >> >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8362832)... > > Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Daniel Lund?n src/hotspot/share/opto/macro.cpp line 506: > 504: } else if (mem->is_top()) { > 505: // The slice is on a dead path. Returning top prevents bailing out > 506: // from the elimination, and IGVN can later clean up. You could make it more specific, and say what you say in your PR description: `return nullptr` would lead to elimination bailout, but we want to prevent that. Just forwarding the `top` is also legal, and `IGVN` can just clean things up, and remove whatever receives top. Does this mean that there could be paths that don't get `top`, and so for those paths it is nice that we are able to remove the allocation, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27903#discussion_r2477913239 From qamai at openjdk.org Thu Oct 30 12:30:56 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 30 Oct 2025 12:30:56 GMT Subject: RFR: 8370914: C2: Reimplement Type::join In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 03:12:05 GMT, Quan Anh Mai wrote: > Hi, > > Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. > > However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). > > In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. > > This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. There are a lot of benefits of this: > > - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. 
Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. > - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. > > This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the pl... Furthermore, I believe our current representation is also not fully symmetric. For example, when meeting 2 `TypeInstPtr` with `TopPTR`, instead of intersecting their `_interfaces`, we union them. This is because when joining 2 `TypeInstPtr` with `BotPTR`, the duals of them would be 2 `TopPTR`, and since it is a join, we need to union their `_interfaces`. However, if we are really meeting 2 `TopPTR`, unioning the `_interfaces` would definitely be incorrect, meeting the `_interfaces` requires us to intersect their set of interfaces. As a result, it can be seen that the symmetry of `TypeInstPtr` relies on the fact that a "normal" `TopPTR` must have empty `_interfaces`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3467753246 From mhaessig at openjdk.org Thu Oct 30 12:39:20 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 30 Oct 2025 12:39:20 GMT Subject: RFR: 8370914: C2: Reimplement Type::join In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 03:12:05 GMT, Quan Anh Mai wrote: > Hi, > > Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. > > However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). > > In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. > > This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. 
There are a lot of benefits of this: > > - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. > - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. > > This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the pl... I'm not trying to argue for symmetry. All I am trying to say is that our current type lattice has some properties, mainly associativity and distributivity, that let us apply meets and joins in arbitrary order and reach the same optimal result. Also, we need a lattice, otherwise not every pair of types have a meet or join, which would be quite cumbersome/impossible to use. So I want to understand mathematically how you uphold these properties. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3467781324 From qamai at openjdk.org Thu Oct 30 13:23:46 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 30 Oct 2025 13:23:46 GMT Subject: RFR: 8370914: C2: Reimplement Type::join In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 03:12:05 GMT, Quan Anh Mai wrote: > Hi, > > Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. > > However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). > > In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. > > This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. There are a lot of benefits of this: > > - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. 
Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. > - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. > > This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the pl... For associativity and distributivity, I think we need to check for comformance of each implementation. Luckily, if for a `Type` object, these properties satisfy for each of its fields, then the properties satisfy for the object as a whole. > Also, we need a lattice, otherwise not every pair of types have a meet or join, which would be quite cumbersome/impossible to use. We actually rarely need to meet/join unrelated types (e.g joining a float and an int). As a result, disallowing those operations may help us catch unwanted errors. We still have `Type::TOP` and `Type::BOTTOM`, though. Anyway, thinking of `Type`s as lattice points is both unnecessary and makes the problems harder to comprehend. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3467969137 From epeter at openjdk.org Thu Oct 30 15:59:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 15:59:08 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Tue, 28 Oct 2025 22:25:09 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. >> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. 
Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Fix merge That is how far I got today. Only was able to look at details, I'll have to get the overview tomorrow or next week. Thanks for keeping on working on this, this is a complex one! I left off at `reachability.cpp` at `PhaseIdealLoop::insert_rf`, marking it for myself ;) src/hotspot/share/opto/c2_globals.hpp line 86: > 84: \ > 85: product(bool, StressReachabilityFences, false, DIAGNOSTIC, \ > 86: "Aggressively insert reachability fences for all oop arguments") \ It could be nice if you gave some more detail here, what these flags do. src/hotspot/share/opto/compile.cpp line 4029: > 4027: Node* in = n->in(j); > 4028: if (in->is_DecodeNarrowPtr() && (is_uncommon || !in->has_non_debug_uses())) { > 4029: n->set_req(j, in->in(1)); Can you say why you changed this code here? Is it equivalent? src/hotspot/share/opto/escape.cpp line 1230: > 1228: SafePointNode* sfpt = safepoints.at(spi)->as_SafePoint(); > 1229: > 1230: sfpt->remove_non_debug_edges(non_debug_edges_worklist); This looks a bit "hacky". Can you add some code comments why we need to do it this way? src/hotspot/share/opto/loopTransform.cpp line 76: > 74: return head()->as_OuterStripMinedLoop()->outer_loop_exit(); > 75: } else { > 76: // For now, conservatively report multiple loop exits exist. Can this happen? Do you have an example? src/hotspot/share/opto/loopnode.cpp line 5119: > 5117: if (stop_early) { > 5118: assert(do_expensive_nodes || do_optimize_reachability_fences, "why are we here?"); > 5119: if (do_optimize_reachability_fences && optimize_reachability_fences()) { Can you explain why you call `optimize_reachability_fences` here and also below? src/hotspot/share/opto/loopnode.hpp line 1150: > 1148: > 1149: void remove_dead_node(Node* dead) { > 1150: assert(dead->outcnt() == 0 && !dead->is_top(), "node must be dead"); Could you assert `dead->is_dead()` here? We should probably also not call this on a `CFG` node, otherwise we might destroy the "ctrl forwarding", see: https://git.openjdk.org/jdk/pull/27892 I'm only putting so much scrutiny here, because you are adding a new public method to `PhaseIdealLoop`, and that would require that it is clear how to use it, and not to use it. src/hotspot/share/opto/reachability.cpp line 89: > 87: // In terms of dominance relation it can be formulated as "a referent has a user which is dominated by the redundant RF". 
> 88: // Until loop opts are over, only RF nodes are considered as usages (controlled by rf_only flag). > 89: static bool is_redundant_rf_helper(Node* ctrl, Node* referent, PhaseIdealLoop* phase, PhaseGVN& gvn, bool rf_only) { Nit: `_helper` is fine if it is used as some internal method, i.e. only `is_redundant_rf` uses `is_redundant_rf_helper`. But it seems you are using it from different places. Can you find a better name? src/hotspot/share/opto/reachability.cpp line 102: > 100: return true; > 101: } > 102: } Can you explain in a code comment? src/hotspot/share/opto/reachability.cpp line 118: > 116: } > 117: } else { > 118: assert(rf_only, ""); Does `phase == nullptr` imply `rf_only`? If so, you should add an assert at the top of the method. src/hotspot/share/opto/reachability.cpp line 180: > 178: return false; // uncommon traps are exit points > 179: } > 180: return true; Looks like we have established "significance" by principle of exclusion. That feels a little brittle, what if there is yet another category we would have to exclude? Would that lead to correctness issues, or only be inefficient? Also: "significant" is a bit of a vague term. Significant for what? "reachability tracking purposes", of course, we are in `reachability.hpp` ;) But can you be more specific? ------------- PR Review: https://git.openjdk.org/jdk/pull/25315#pullrequestreview-3399439848 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2477973204 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478484569 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478466641 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478491570 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478507770 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478533282 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478583211 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478587566 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478608309 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478647914 From epeter at openjdk.org Thu Oct 30 15:59:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 15:59:10 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Thu, 30 Oct 2025 15:03:48 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix merge > > src/hotspot/share/opto/escape.cpp line 1230: > >> 1228: SafePointNode* sfpt = safepoints.at(spi)->as_SafePoint(); >> 1229: >> 1230: sfpt->remove_non_debug_edges(non_debug_edges_worklist); > > This looks a bit "hacky". Can you add some code comments why we need to do it this way? Same for the other occurances ;) > src/hotspot/share/opto/reachability.cpp line 89: > >> 87: // In terms of dominance relation it can be formulated as "a referent has a user which is dominated by the redundant RF". >> 88: // Until loop opts are over, only RF nodes are considered as usages (controlled by rf_only flag). >> 89: static bool is_redundant_rf_helper(Node* ctrl, Node* referent, PhaseIdealLoop* phase, PhaseGVN& gvn, bool rf_only) { > > Nit: `_helper` is fine if it is used as some internal method, i.e. 
only `is_redundant_rf` uses `is_redundant_rf_helper`. But it seems you are using it from different places. Can you find a better name? Can you write a comment what `ctrl`? Is it the `referent_ctrl`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478471846 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478596033 From epeter at openjdk.org Thu Oct 30 15:59:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Oct 2025 15:59:14 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Thu, 30 Oct 2025 15:39:50 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/reachability.cpp line 89: >> >>> 87: // In terms of dominance relation it can be formulated as "a referent has a user which is dominated by the redundant RF". >>> 88: // Until loop opts are over, only RF nodes are considered as usages (controlled by rf_only flag). >>> 89: static bool is_redundant_rf_helper(Node* ctrl, Node* referent, PhaseIdealLoop* phase, PhaseGVN& gvn, bool rf_only) { >> >> Nit: `_helper` is fine if it is used as some internal method, i.e. only `is_redundant_rf` uses `is_redundant_rf_helper`. But it seems you are using it from different places. Can you find a better name? > > Can you write a comment what `ctrl`? Is it the `referent_ctrl`? Ah no, in all cases I could see it was actually the `rf` itself, right? Why not give it a more specific name? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478599560 From duke at openjdk.org Thu Oct 30 16:05:01 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 30 Oct 2025 16:05:01 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v2] In-Reply-To: <4J2beErwndG7oZtRcQ1t6Fmpd-kATlHH-S9KnSEluSA=.2b865c8c-893f-4588-b821-7049efb01328@github.com> References: <4J2beErwndG7oZtRcQ1t6Fmpd-kATlHH-S9KnSEluSA=.2b865c8c-893f-4588-b821-7049efb01328@github.com> Message-ID: On Wed, 29 Oct 2025 14:56:14 GMT, Leonid Mesnik wrote: >>> Ah, so _there_! I confused myself. This one is readable: the counter `0` means we can free. It would be even better if you did `inc_immutable_data_refcount()` and `dec_immutable_data_refcount()`, and did e.g.: >>> >>> ``` >>> if (dec_immutable_data_refcount() == 0) { >>> os::free(_immutable_data); >>> } >>> >>> int dec_immutable_data_refcount() { >>> int refcount = get(...); >>> assert(refcount > 0, "Must be positive"); >>> set(refcount - 1); >>> return refcount - 1; >>> } >>> ``` >>> >>> Because the next thing you know this would need to be replaced with Atomics a year later. >> >> I agree this makes the code cleaner. >> >> I replaced the getter and setter for the counter with `init_immutable_data_ref_count`, `inc_immutable_data_ref_count`, and `dec_immutable_data_ref_count`. I also shortened the counter name from `immutable_data_references_counter` to `immutable_data_ref_count` >> >> I modified `NMethod.java` to calculate the offsets that same way as is done in the JVM. I missed this in [JDK-8369642](https://bugs.openjdk.org/browse/JDK-8369642) >> >> The last notable change is that I modified the [immutable data size calculation](https://github.com/chadrako/jdk/blob/26bdc3ceb4ab9ad9cb9a4218bb87ce2d7546fa22/src/hotspot/share/code/nmethod.cpp#L1155) to only include a reference counter if there is immutable data > > @chadrako The testing pass now. 
@lmesnik @vnkozlov Is testing still running or can this patch be integrated? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28008#issuecomment-3468766652 From kvn at openjdk.org Thu Oct 30 16:23:40 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 30 Oct 2025 16:23:40 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v2] In-Reply-To: <4J2beErwndG7oZtRcQ1t6Fmpd-kATlHH-S9KnSEluSA=.2b865c8c-893f-4588-b821-7049efb01328@github.com> References: <4J2beErwndG7oZtRcQ1t6Fmpd-kATlHH-S9KnSEluSA=.2b865c8c-893f-4588-b821-7049efb01328@github.com> Message-ID: <3u2t7aKEFuXHd7SgkSw4AxPGmD6TKdZWA1niXn0uoS4=.c7ddd0d5-b37c-4980-8f52-aedd5dd55baa@github.com> On Wed, 29 Oct 2025 14:56:14 GMT, Leonid Mesnik wrote: >>> Ah, so _there_! I confused myself. This one is readable: the counter `0` means we can free. It would be even better if you did `inc_immutable_data_refcount()` and `dec_immutable_data_refcount()`, and did e.g.: >>> >>> ``` >>> if (dec_immutable_data_refcount() == 0) { >>> os::free(_immutable_data); >>> } >>> >>> int dec_immutable_data_refcount() { >>> int refcount = get(...); >>> assert(refcount > 0, "Must be positive"); >>> set(refcount - 1); >>> return refcount - 1; >>> } >>> ``` >>> >>> Because the next thing you know this would need to be replaced with Atomics a year later. >> >> I agree this makes the code cleaner. >> >> I replaced the getter and setter for the counter with `init_immutable_data_ref_count`, `inc_immutable_data_ref_count`, and `dec_immutable_data_ref_count`. I also shortened the counter name from `immutable_data_references_counter` to `immutable_data_ref_count` >> >> I modified `NMethod.java` to calculate the offsets that same way as is done in the JVM. I missed this in [JDK-8369642](https://bugs.openjdk.org/browse/JDK-8369642) >> >> The last notable change is that I modified the [immutable data size calculation](https://github.com/chadrako/jdk/blob/26bdc3ceb4ab9ad9cb9a4218bb87ce2d7546fa22/src/hotspot/share/code/nmethod.cpp#L1155) to only include a reference counter if there is immutable data > > @chadrako The testing pass now. > @lmesnik @vnkozlov Is testing still running or can this patch be integrated? I don't know what tests @lmesnik ran. That is why I asked him. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28008#issuecomment-3468863824 From kvn at openjdk.org Thu Oct 30 16:51:38 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 30 Oct 2025 16:51:38 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: References: <28JzI3gxRKU4fIQ7544Xl9MW1ZGYR7IazhAsMPilpF0=.6c993c80-ab41-49e1-9084-98e226ac7c8a@github.com> Message-ID: <8qC94yLfWCHW9yKuRo8KI8Sm1rMCapA6QtTgwVBflZI=.b5074b5f-9ef3-4fd3-813d-987049c7eeb6@github.com> On Thu, 30 Oct 2025 06:51:14 GMT, Emanuel Peter wrote: > So where would I move process_for_merge_stores_igvn in that follow-up RFE? >From EA POV moving it after last macro nodes elimination and before `expand_barriers` is fine. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27997#issuecomment-3469007075 From kvn at openjdk.org Thu Oct 30 16:44:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 30 Oct 2025 16:44:52 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v3] In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 09:23:17 GMT, Emanuel Peter wrote: >> Note: @oliviermattmann found this bug with his whitebox fuzzer. See also https://github.com/openjdk/jdk/pull/27991 >> >> **Analysis** >> We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. >> During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. >> But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. >> >> Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. >> Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. >> >> **Solution** >> We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. >> >> But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: >> https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 >> (the idea is to bail out of the elimination if any of the found stores are mismatched.) >> >> **Details** >> >> How the bad sequence develops, and which components are involved. >> >> 1) The `SafePoint` contains a `ConL` and 3 `ConI`. (Correct would have been 4 `ConI`) >> >> 6 ConI === 23 [[ 4 ]] #int:16777216 >> 7 ConI === 23 [[ 4 ]] #int:256 >> 8 ConI === 23 [[ 4 ]] #int:1048576 >> 9 ConL === 23 [[ 4 ]] #long:68719476737 >> 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] >> 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) >> >> >> 2) This is then encoded into an `ObjectValue`. A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: >> https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 >> If I understand it right, there zero is just a placeholder. >> >> And so we get: >> >> (rr) p sv->print_fields_on(tty) >> Fields: 0, 68719476737, 1048576, 256, 167772... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > rm assert, guard with INCLUDE_JVMCI To be clear, as I said before, current changes are fine and should be done anyway. All other improvements can be done separately. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27997#pullrequestreview-3400616527 From kvn at openjdk.org Thu Oct 30 17:06:34 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 30 Oct 2025 17:06:34 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v3] In-Reply-To: References: Message-ID: <2FxiDgEOjVKfibJ1PGDPLUZnnQPeOTWD_i9wMtfWPbQ=.d0c59a04-d6ae-4988-b2fb-9d34da3c2223@github.com> On Thu, 30 Oct 2025 09:23:17 GMT, Emanuel Peter wrote: >> Note: @oliviermattmann found this bug with his whitebox fuzzer. See also https://github.com/openjdk/jdk/pull/27991 >> >> **Analysis** >> We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. >> During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. >> But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. >> >> Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. >> Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. >> >> **Solution** >> We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. >> >> But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: >> https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 >> (the idea is to bail out of the elimination if any of the found stores are mismatched.) >> >> **Details** >> >> How the bad sequence develops, and which components are involved. >> >> 1) The `SafePoint` contains a `ConL` and 3 `ConI`. (Correct would have been 4 `ConI`) >> >> 6 ConI === 23 [[ 4 ]] #int:16777216 >> 7 ConI === 23 [[ 4 ]] #int:256 >> 8 ConI === 23 [[ 4 ]] #int:1048576 >> 9 ConL === 23 [[ 4 ]] #long:68719476737 >> 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] >> 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) >> >> >> 2) This is then encoded into an `ObjectValue`. A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: >> https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 >> If I understand it right, there zero is just a placeholder. >> >> And so we get: >> >> (rr) p sv->print_fields_on(tty) >> Fields: 0, 68719476737, 1048576, 256, 167772... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > rm assert, guard with INCLUDE_JVMCI I filed [JDK-8370964](https://bugs.openjdk.org/browse/JDK-8370964) to investigate similar issue with `ClearArrayNode`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27997#issuecomment-3469063547 From hgreule at openjdk.org Thu Oct 30 18:23:46 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 30 Oct 2025 18:23:46 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation [v3] In-Reply-To: References: Message-ID: > The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. > > Please let me know what you think. Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27886/files - new: https://git.openjdk.org/jdk/pull/27886/files/6a8d842f..c32bb551 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27886&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27886&range=01-02 Stats: 20 lines in 1 file changed: 11 ins; 5 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/27886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27886/head:pull/27886 PR: https://git.openjdk.org/jdk/pull/27886 From lmesnik at openjdk.org Thu Oct 30 21:59:05 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 30 Oct 2025 21:59:05 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v2] In-Reply-To: References: <4J2beErwndG7oZtRcQ1t6Fmpd-kATlHH-S9KnSEluSA=.2b865c8c-893f-4588-b821-7049efb01328@github.com> Message-ID: <1eHT8RSOFH39pbWnb5UcaXpr8Mzs132SLqFvpP-q_vQ=.15a6efcc-d413-42f1-b1b0-0088cf38f6d8@github.com> On Thu, 30 Oct 2025 16:02:39 GMT, Chad Rakoczy wrote: >> @chadrako The testing pass now. > > @lmesnik @vnkozlov Is testing still running or can this patch be integrated? @chadrako @vnkozlov The test finished and passed. I added confidential comment with link in JBS. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28008#issuecomment-3470415015 From kvn at openjdk.org Thu Oct 30 22:40:12 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 30 Oct 2025 22:40:12 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v2] In-Reply-To: References: <4J2beErwndG7oZtRcQ1t6Fmpd-kATlHH-S9KnSEluSA=.2b865c8c-893f-4588-b821-7049efb01328@github.com> Message-ID: On Thu, 30 Oct 2025 16:02:39 GMT, Chad Rakoczy wrote: >> @chadrako The testing pass now. > > @lmesnik @vnkozlov Is testing still running or can this patch be integrated? > @chadrako @vnkozlov The test finished and passed. I added confidential comment with link in JBS. Thank you @lmesnik. It seems only failed test was run. I will submit additional testing (hs-tier1-5) and let you know results. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28008#issuecomment-3470504056 From duke at openjdk.org Thu Oct 30 23:55:18 2025 From: duke at openjdk.org (duke) Date: Thu, 30 Oct 2025 23:55:18 GMT Subject: Withdrawn: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type In-Reply-To: References: Message-ID: On Wed, 6 Aug 2025 23:33:23 GMT, Francisco Ferrari Bihurriet wrote: > Hi, this pull request is a second take of 1383fec41756322bf2832c55633e46395b937b40, by updating the `CmpUNode` type as either `TypeInt::CC_LE` (case 1a) or `TypeInt::CC_LT` (case 1b) instead of updating the `BoolNode` type as `TypeInt::ONE`. > > With this approach a56cd371a2c497e4323756f8b8a08a0bba059bf2 becomes unnecessary. 
Additionally, having the right type in `CmpUNode` could potentially enable further optimizations. > > #### Testing > > In order to evaluate the changes, the following testing has been performed: > > * `jdk:tier1` (see [GitHub Actions run](https://github.com/franferrax/jdk/actions/runs/16789994433)) > * [`TestBoolNodeGVN.java`](https://github.com/openjdk/jdk/blob/jdk-26+9/test/hotspot/jtreg/compiler/c2/gvn/TestBoolNodeGVN.java), created for [JDK-8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value](https://bugs.openjdk.org/browse/JDK-8327381) (1383fec41756322bf2832c55633e46395b937b40) > * I also checked it breaks if I remove the `CmpUNode::Value_cmpu_and_mask` call > * Private reproducer for [JDK-8349584: Improve compiler processing](https://bugs.openjdk.org/browse/JDK-8349584) (a56cd371a2c497e4323756f8b8a08a0bba059bf2) > * A local slowdebug run of the `test/hotspot/jtreg/compiler/c2` category on _Fedora Linux x86_64_ > * Same results as with `master` (f95af744b07a9ec87e2507b3d584cbcddc827bbd) This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/26666 From xgong at openjdk.org Fri Oct 31 01:21:08 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 31 Oct 2025 01:21:08 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v2] In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 08:50:25 GMT, erifan wrote: >> Maybe it would be enough to just remove the `, IRNode.VECTOR_SIZE_64`, so that the test could check for the largest vector length available on the platform? > Hi @eme64 thanks for your review. As the vector species is fixed `ByteVector.SPECIES_512`, removing `IRNode.VECTOR_SIZE_64` doesn't work. Adding a rule for cases where `MaxVectorSize > 64` makes sense to me. I'll add the rule in the next commit, thanks~ I'm afraid that there is not a machine which really runs with `MaxVectorSize > 64` both on X86 and AArch64. Can we just check the `MaxVectorSize = 64` case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27723#discussion_r2479908586 From xgong at openjdk.org Fri Oct 31 01:26:03 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 31 Oct 2025 01:26:03 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3] In-Reply-To: References: <6R47fPRorPlTKGZrk-KbM0ipHtyu7INOyJlaXMyfBLo=.1c9d45fa-add6-462c-9426-8830db3077b4@github.com> Message-ID: On Thu, 30 Oct 2025 07:19:50 GMT, Emanuel Peter wrote: >> Hi @eme64 , I updated a commit which mainly changes the comments. The function name `mask_op_prefers_predicate` remains unchanged. After giving it careful thought overnight, I believe this name is more accurate. I'm sorry if my earlier explanation caused any confusion. Would you mind checking whether it's fine to you? Thanks! > Nice, it looks much better now, thanks for the updates :) Thanks for your help on reviewing this part @eme64 @erifan ! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2479914775 From xgong at openjdk.org Fri Oct 31 01:39:03 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 31 Oct 2025 01:39:03 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v5] In-Reply-To: <5YRBuKmUoT6YIb0tbKzavjTITGfECr1mUpO3suWQyww=.1a4b619c-a222-440f-9773-0fcc843c9988@github.com> References: <5YRBuKmUoT6YIb0tbKzavjTITGfECr1mUpO3suWQyww=.1a4b619c-a222-440f-9773-0fcc843c9988@github.com> Message-ID: On Thu, 30 Oct 2025 07:27:43 GMT, Emanuel Peter wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Update comments > > test/jdk/jdk/incubator/vector/Long128VectorTests.java line 6850: > >> 6848: var vmask = VectorMask.fromLong(SPECIES, inputLong); >> 6849: // Insert "not()" to avoid the "fromLong/toLong" being optimized out by compiler. >> 6850: long outputLong = vmask.not().toLong(); > > That sounds a bit fragile. Is there something that would catch if it did ever get optimized away? I'm not sure. But currently `fromLong` + `toLong` would be identified to a long input: https://github.com/openjdk/jdk/blob/6347f10bf1dd3959cc1f2aba32e72ca8d9d56e82/src/hotspot/share/opto/vectornode.cpp#L1926-L1931 So the original tests cannot test these two APIs exactly. But as a smoke test, it was used to verify the correctness of java-level APIs instead of the hotspot intrinsification. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2479931719 From xgong at openjdk.org Fri Oct 31 01:43:06 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 31 Oct 2025 01:43:06 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v5] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 10:13:03 GMT, Xiaohong Gong wrote: >> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. >> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. 
>> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. >> >> Here is the performance data with `-XX:UseSVE=2`: >> >> Benchmark bits inputs Mode Unit Before After Gain >> MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 >> MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 >> MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 >> MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 >> MaskQueryOperations... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Update comments Hi @sviswa7, @jatin-bhateja , could you please help take a look at this PR especially the X86 changes? Thanks so much! Hi @PaulSandoz , @sviswa7, would you mind taking look at the changes on jdk.incubator.vector tests part? It would be more helpful if I can get any feedback from you. Thanks a lot in advance! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3470974812 From duke at openjdk.org Fri Oct 31 02:19:07 2025 From: duke at openjdk.org (erifan) Date: Fri, 31 Oct 2025 02:19:07 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v2] In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 01:18:00 GMT, Xiaohong Gong wrote: >> Hi @eme64 thanks for your review. As the vector species is fixed `ByteVector.SPECIES_512`, removing `IRNode.VECTOR_SIZE_64` doesn't work. Adding a rule for cases where `MaxVectorSize > 64` makes sense to me. I'll add the rule in the next commit, thanks~ > > I'm afraid that there is not a machine which really runs with `MaxVectorSize > 64` both on X86 and AArch64. Can we just check the `MaxVectorSize = 64` case? Yes, we currently do not have any >128 bits SVE2 machines. According to https://github.com/openjdk/jdk/blob/4f9f086847f531ab1791727d74955cfd8ec56811/src/hotspot/cpu/aarch64/aarch64_vector.ad#L273 these cases are currently unsupported. This is why the test failed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27723#discussion_r2479973852 From xgong at openjdk.org Fri Oct 31 02:21:09 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 31 Oct 2025 02:21:09 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v11] In-Reply-To: References: <8T7swIJ17tLLg4FO_N5UZ0HsMYrz31ywBiMZohefGTE=.386eeb0d-8541-4c35-8a68-6caf31ea867e@github.com> Message-ID: <1lf_Ps_3CAik5IvWKzPzNwuCz7-Py1uAl2Ce1WLXfC4=.30671958-61ef-48a1-bb36-b9fdfbd449d7@github.com> On Wed, 10 Sep 2025 15:54:24 GMT, Mikhail Ablakatov wrote: > > Do you intend to ignore ops with >32B vector size? May I ask the reason? > > The reason is the lack of relevant hardware. The only publicly available platform that implements 512b SVE I'm aware of is Fujitsu A64FX. I used to have access to that platform but no longer which makes it difficult to test and benchmark changes for 512b SVE. Stripping that functionality and keeping the implementation in bounds of 256b SVE reduces complexity of this patch. > The 512-bit SVE cases can be tested with QEMU. I'm fine with this PR. I will trigger a test with my AArch64 environment which support 512-bit SVE. 
I assume you'v verified the correctness on other SVE machines? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23181#issuecomment-3471039132 From dlong at openjdk.org Fri Oct 31 03:12:01 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 31 Oct 2025 03:12:01 GMT Subject: RFR: 8370914: C2: Reimplement Type::join In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 03:12:05 GMT, Quan Anh Mai wrote: > Hi, > > Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. > > However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). > > In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. > > This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. There are a lot of benefits of this: > > - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. > - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. > > This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the pl... I like this idea, but even if it is 100% correct, how do we prove that? Can we exhaustively test all possibilities? We might consider introducing the new join as a parallel implementation, so it can compare the result with the old join as a sanity check. Then after we are convinced of its correctness, we remove the old join. As for unit testing, all we have is check_symmetrical(), right? With a change this fundamental, I think we will need correspondingly better tests. 
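For illustration, the cross-check could be as small as the following sketch (names are hypothetical: `join_new_impl` stands for the reimplemented operation, and the dual-based expression stands for the current join; this is not code from the PR):

```
// Debug-only sanity check: compute the join both ways and require agreement.
const Type* Type::join(const Type* t) const {
  const Type* new_result = join_new_impl(t);                 // hypothetical new implementation
#ifdef ASSERT
  const Type* old_result = dual()->meet(t->dual())->dual();  // legacy dual-based join
  assert(new_result == old_result, "new join must agree with dual-based join");
#endif
  return new_result;
}
```

Once such a check has survived broad testing, the dual-based path and the assert can be dropped.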
------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3471128648 From kvn at openjdk.org Fri Oct 31 04:21:07 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 31 Oct 2025 04:21:07 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v5] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 01:26:43 GMT, Chad Rakoczy wrote: >> [JDK-8370527](https://bugs.openjdk.org/browse/JDK-8370527) >> >> [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced an `immutable_data_references_counter` which keeps track of the number of nmethods using the immutable data so it can be shared between relocated nmethods. The old code reads the counter, decrements the counter, and then checks the first read to see if it is zero. Since the check is performed on the initial read it will never be zero which causes immutable data to never be freed. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add include to fix build issue My testing passed ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28008#pullrequestreview-3402541613 From epeter at openjdk.org Fri Oct 31 04:52:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 31 Oct 2025 04:52:07 GMT Subject: RFR: 8370794: C2 SuperWord: Long/Integer.compareUnsigned return wrong value for EQ/NE in SLP In-Reply-To: References: Message-ID: <1MCzjBuHpcNGUkZNZAFWMVqRq7q_lTSVvlR3n3urWto=.743fc4de-d975-4954-9b1d-a02689dd75da@github.com> On Wed, 29 Oct 2025 16:38:54 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481) introduces this regression for unsigned I/L EQ/NE in SLP. > > ==================== > > In [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481), we fixed an issue related to transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), and added tests for unsigned ones. > As discussion in [1], we should also add more tests for transformation from (Bool + Cmp + CMove) to (VectorMaskCmp + VectorBlend) for the signed ones. > > [1] https://github.com/openjdk/jdk/pull/27942#discussion_r2468750039 > > Thanks! > > Tests running... Nice work! The tests are passing on my side :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28047#pullrequestreview-3402581124 From epeter at openjdk.org Fri Oct 31 05:50:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 31 Oct 2025 05:50:14 GMT Subject: RFR: 8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer Message-ID: It seems we keep finding issues in `CompressBitsNode::Value`, using the `TemplateFramework` https://github.com/openjdk/jdk/pull/26885. This is a JDK26 regression of the bugfix https://github.com/openjdk/jdk/pull/23947, which was itself reported by my prototype of the `TemplateFramework`. The bug is simple: On windows `1UL` is only a 32-bit value, and not a 64-bit value. We should use `1ULL` instead. 
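Roughly, the difference looks like this (a standalone sketch, not the HotSpot code; on an LLP64 toolchain such as MSVC, `unsigned long` is only 32 bits, so the 32-bit shift is undefined and in practice leaves the value unshifted):

```
#include <cstdint>
#include <cstdio>

int main() {
  int bitcount = 32;  // e.g. the population count of mask 0xffff_ffffL
  // With a 32-bit unsigned long (Windows), shifting by 32 is undefined; on x86 the
  // shift count gets masked, so (1UL << 32) evaluates to 1 and the bound becomes 0.
  uint64_t hi_bad  = (uint64_t)((1UL  << bitcount) - 1);
  // With a 64-bit unsigned long long the shift is well defined: 0xFFFFFFFF as intended.
  uint64_t hi_good = (uint64_t)((1ULL << bitcount) - 1);
  printf("bad=%llu good=%llu\n", (unsigned long long)hi_bad, (unsigned long long)hi_good);
  return 0;
}
```

With the 32-bit shift the computed upper bound collapses to 0, so the whole node is treated as the constant zero.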
Impacted lines: https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L276 https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L379 This means that simple cases like these wrongly constant fold to zero: - `Long.compress(-2683206580L, Integer.toUnsignedLong(x))` - `Long.compress(x, 0xffff_ffffL)` ------------------------------------------------------------------ This sort of bug (`1UL` vs `1ULL`) is of course very subtle, and easy to miss in a code review. So that is why testing is paramount. Why was this not caught in the testing of https://github.com/openjdk/jdk/pull/23947? After all there were quite a few tests there, right? There were simply not enough tests, or not the right ones ;) I did at the time ask for a "range-based" test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2853896251). I then doubled down and even proposed a conctete test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2935548411) that would create "**range-based**" inputs: public static test(int mask, int src) { mask = Math.max(CON1, Math.min(CON2, mask)); src = Math.max(CON2, Math.min(CON4, src)); result = Integer.compress(src, mask); int sum = 0; if (sum > LIMIT_1) { sum += 1; } if (sum > LIMIT_2) { sum += 2; } if (sum > LIMIT_3) { sum += 4; } if (sum > LIMIT_4) { sum += 8; } if (sum > LIMIT_5) { sum += 16; } if (sum > LIMIT_6) { sum += 32; } if (sum > LIMIT_7) { sum += 64; } if (sum > LIMIT_8) { sum += 128; } return new int[] {sum, result}; } What is implortant here: both the `src` and `mask` must have random ranges. But the test that ended up being integrated only made the `src` "range-based" using the `min/max`. **Without the `mask` being tested "range-based", the bug here could not have been caught by that test**. I was asked again for my review (https://github.com/openjdk/jdk/pull/23947#issuecomment-3062355806), but I had to go on vacation, and was not able to catch the issue (https://github.com/openjdk/jdk/pull/23947). ------------- Commit messages: - the constant case - fix v1 - rm templated test - more tests - x7 - x6 - x5 - x4 - x3 - x2 - ... and 1 more: https://git.openjdk.org/jdk/compare/2c07214d...794ef7b3 Changes: https://git.openjdk.org/jdk/pull/28062/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28062&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370459 Stats: 111 lines in 2 files changed: 96 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/28062.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28062/head:pull/28062 PR: https://git.openjdk.org/jdk/pull/28062 From epeter at openjdk.org Fri Oct 31 05:55:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 31 Oct 2025 05:55:27 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v17] In-Reply-To: References: Message-ID: On Wed, 23 Jul 2025 01:52:54 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This bugfix patch fixes incorrect value computation for Integer/Long. compress APIs. >> >> Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value., In this case, an erroneous value range estimation results in a constant value. 
Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. >> >> New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update intrinsicnode.cpp @jatin-bhateja @vnkozlov @TobiHartmann The TemplateFramework fuzzer found a bug in this bugfix: https://github.com/openjdk/jdk/pull/28062 It is quite subtle, and would have been hard to spot in the review. But we could have done a better job with tests. src/hotspot/share/opto/intrinsicnode.cpp line 276: > 274: if (maskcon != -1L) { > 275: int bitcount = population_count(static_cast(bt == T_INT ? maskcon & 0xFFFFFFFFL : maskcon)); > 276: hi = (1UL << bitcount) - 1; Replace `1UL` -> `1ULL` for Windows. src/hotspot/share/opto/intrinsicnode.cpp line 379: > 377: // We can further constrain the upper bound of bit compression if the number of bits > 378: // which can be set(one) is less than the maximum number of bits of integral type. > 379: hi = MIN2((jlong)((1UL << result_bit_width) - 1L), hi); Replace `1UL` -> `1ULL` for Windows. test/hotspot/jtreg/compiler/c2/gvn/TestBitCompressValueTransform.java line 541: > 539: public long test18(long src, long mask) { > 540: src = Math.max(BOUND_LO_L, Math.min(src, BOUND_HI_L)); > 541: long res = Long.compress(src, mask); Here and in all other similar tests: It seems we are only creating "random input ranges" for `src`, and not for `mask`. Bummer :frowning_face: ------------- PR Review: https://git.openjdk.org/jdk/pull/23947#pullrequestreview-3402676772 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2480227530 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2480228236 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2480230411 From xgong at openjdk.org Fri Oct 31 06:25:08 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 31 Oct 2025 06:25:08 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v6] In-Reply-To: References: Message-ID: On Wed, 17 Sep 2025 08:48:16 GMT, Xiaohong Gong wrote: >> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform. >> >> ### Background >> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register. >> >> ### Implementation >> >> #### Challenges >> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints. 
>>
>> For a 512-bit SVE machine, loading a `byte` vector with different vector species requires different approaches: >> - SPECIES_64: Single operation with mask (8 elements, 256-bit) >> - SPECIES_128: Single operation, full register (16 elements, 512-bit) >> - SPECIES_256: Two operations + merge (32 elements, 1024-bit) >> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit) >> >> Use `ByteVector.SPECIES_512` as an example: >> - It contains 64 elements. So the index vector size should be `64 * 32` bits, which is 4 times of the SVE vector register size. >> - It requires 4 times of vector gather-loads to finish the whole operation. >> >> >> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...] >> int[] idx = [0, 1, 2, 3, ..., 63, ...] >> >> 4 gather-load: >> idx_v1 = [15 14 13 ... 1 0] gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa] >> idx_v2 = [31 30 29 ... 17 16] gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb] >> idx_v3 = [47 46 45 ... 33 32] gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc] >> idx_v4 = [63 62 61 ... 49 48] gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd] >> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa] >> >> >> #### Solution >> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end. >> >> Here are the main changes: >> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher. >> - Added `VectorSliceNode` for result mer... > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Add more comments for IRs and added method > - Merge branch 'jdk:master' into JDK-8351623-sve > - Merge 'jdk:master' into JDK-8351623-sve > - Address review comments > - Refine IR pattern and clean backend rules > - Fix indentation issue and move the helper matcher method to header files > - Merge branch jdk:master into JDK-8351623-sve > - 8351623: VectorAPI: Add SVE implementation of subword gather load operation Hi @iwanowww , @PaulSandoz , and @eme64 : I've recently completed a prototype that moves the implementation into the Java API level: [Refactor subword gather API in Java](https://github.com/XiaohongGong/jdk/pull/8). Do you think it would be a good time to open a draft PR for easier review? Below is a brief summary of the changes compared with the previous version. **Main idea** - Invoke VectorSupport.loadWithMap() multiple times in Java when needed, where each call handles a single vector gather load. - In the compiler, the gathered result is represented as an int vector and then cast to the original subword vector species. Cross-lane shifting aligns the elements correctly. - The partial results are merged in Java using the Vector.or() API. **Advantages** - No need to pass all vector indices to HotSpot. - The design is platform agnostic. **Limitations** - The Java implementation is less clean to accommodate compiler optimizations. - Compiler changes remain nontrivial due to required vector/mask casting, resizing, and slicing. - Additional IR ideal and match rules are needed for optimal SVE code generation. - The API's performance will **degrade significantly** (about 30% ~ 50%) on platforms that **do not** support compiler intrinsification. 
Since a single previous API call is now split into multiple calls that cannot be intrinsified, the overhead of generating multiple vector objects in pure Java can be substantial. Does this impact matter? I plan to rebase and update the compiler-change PR using the same node and match rules as well, so we can clearly compare both approaches. Any thoughts or feedback would be much appreciated. Thanks so much! Best Regards, Xiaohong ------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3471488867 From xgong at openjdk.org Fri Oct 31 06:48:12 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 31 Oct 2025 06:48:12 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v11] In-Reply-To: <1lf_Ps_3CAik5IvWKzPzNwuCz7-Py1uAl2Ce1WLXfC4=.30671958-61ef-48a1-bb36-b9fdfbd449d7@github.com> References: <8T7swIJ17tLLg4FO_N5UZ0HsMYrz31ywBiMZohefGTE=.386eeb0d-8541-4c35-8a68-6caf31ea867e@github.com> <1lf_Ps_3CAik5IvWKzPzNwuCz7-Py1uAl2Ce1WLXfC4=.30671958-61ef-48a1-bb36-b9fdfbd449d7@github.com> Message-ID: On Fri, 31 Oct 2025 02:18:29 GMT, Xiaohong Gong wrote: > > > Do you intend to ignore ops with >32B vector size? May I ask the reason? > > > > > > The reason is the lack of relevant hardware. The only publicly available platform that implements 512b SVE I'm aware of is Fujitsu A64FX. I used to have access to that platform but no longer which makes it difficult to test and benchmark changes for 512b SVE. Stripping that functionality and keeping the implementation in bounds of 256b SVE reduces complexity of this patch. > > The 512-bit SVE cases can be tested with QEMU. I'm fine with this PR. I will trigger a test with my AArch64 environment which support 512-bit SVE. I assume you'v verified the correctness on other SVE machines? The tests pass on my side. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23181#issuecomment-3471533386 From xgong at openjdk.org Fri Oct 31 06:48:10 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 31 Oct 2025 06:48:10 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v13] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 13:57:08 GMT, Mikhail Ablakatov wrote: >> Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used. >> >> Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is used directly still. >> >> The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reducion micro-benchmarks. 
>> >> Benchmarks results: >> >> Neoverse-V1 (SVE 256-bit) >> >> Benchmark (size) Mode master PR Units >> ByteMaxVector.MULLanes 1024 thrpt 5447.643 11455.535 ops/ms >> ShortMaxVector.MULLanes 1024 thrpt 3388.183 7144.301 ops/ms >> IntMaxVector.MULLanes 1024 thrpt 3010.974 4911.485 ops/ms >> LongMaxVector.MULLanes 1024 thrpt 1539.137 2562.835 ops/ms >> FloatMaxVector.MULLanes 1024 thrpt 1355.551 4158.128 ops/ms >> DoubleMaxVector.MULLanes 1024 thrpt 1715.854 3284.189 ops/ms >> >> >> Fujitsu A64FX (SVE 512-bit): >> >> Benchmark (size) Mode master PR Units >> ByteMaxVector.MULLanes 1024 thrpt 1091.692 2887.798 ops/ms >> ShortMaxVector.MULLanes 1024 thrpt 597.008 1863.338 ops/ms >> IntMaxVector.MULLanes 1024 thrpt 510.642 1348.651 ops/ms >> LongMaxVector.MULLanes 1024 thrpt 468.878 878.620 ops/ms >> FloatMaxVector.MULLanes 1024 thrpt 376.284 2237.564 ops/ms >> DoubleMaxVector.MULLanes 1024 thrpt 431.343 1646.792 ops/ms > > Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: > > - Merge commit 'c8679713402186b24608fa4c91397b6a4fd5ebf3' into 8343689 > > Change-Id: Icfa70da585e034774e4ff0f60b8f0c9ce0598399 > - cleanup: remove redundand local variables > > Change-Id: I6fb6a9a7a236537612caa5d53c5516ed2f260bad > - cleanup: remove a trivial switch-case statement > > Change-Id: Ib914ce02ae9d88057cb0b88d4880df6ca64f8184 > - Assert the exact supported VL of 32B in SVE-specific methods > > Change-Id: I8768c653ff563cd8a7a75cd06a6523a9526d15ec > - cleanup: fix long line formatting > > Change-Id: I173e70a2fa9a45f56fe50d4a6b81699665e3433d > - fixup: remove VL asserts in match rules to fix failures on >= 512b SVE platforms > > Change-Id: I721f5a97076d645905ee1716f7d57ec8c90ef6e9 > - Merge branch 'master' into 8343689 > > Change-Id: Iebe758e4c7b3ab0de5f580199f8909e96b8c6274 > - cleanup: start the SVE Integer Misc - Unpredicated section > - Merge branch 'master' > - Address review comments and simplify the implementation > > - remove the loops from gt128b methods making them 256b only > - fixup: missed fnoregs in instruct reduce_mulL_256b > - use an extra vtmp3 reg for the 256b integer method > - remove a no longer needed change in reduce_mul_integral_le128b > - cleanup: unify comments > - ... and 14 more: https://git.openjdk.org/jdk/compare/c8679713...e564d6c1 LGTM! Thanks for your work! ------------- Marked as reviewed by xgong (Committer). PR Review: https://git.openjdk.org/jdk/pull/23181#pullrequestreview-3402778189 From jbhateja at openjdk.org Fri Oct 31 06:57:06 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 31 Oct 2025 06:57:06 GMT Subject: RFR: 8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 14:51:53 GMT, Emanuel Peter wrote: > It seems we keep finding issues in `CompressBitsNode::Value`, using the `TemplateFramework` https://github.com/openjdk/jdk/pull/26885. > > This is a JDK26 regression of the bugfix https://github.com/openjdk/jdk/pull/23947, which was itself reported by my prototype of the `TemplateFramework`. > > The bug is simple: On windows `1UL` is only a 32-bit value, and not a 64-bit value. We should use `1ULL` instead. 
Impacted lines: > https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L276 > https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L379 > > This means that simple cases like these wrongly constant fold to zero: > - `Long.compress(-2683206580L, Integer.toUnsignedLong(x))` > - `Long.compress(x, 0xffff_ffffL)` > > ------------------------------------------------------------------ > > This sort of bug (`1UL` vs `1ULL`) is of course very subtle, and easy to miss in a code review. So that is why testing is paramount. > > Why was this not caught in the testing of https://github.com/openjdk/jdk/pull/23947? After all there were quite a few tests there, right? There were simply not enough tests, or not the right ones ;) > > I did at the time ask for a "range-based" test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2853896251). I then doubled down and even proposed a conctete test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2935548411) that would create "**range-based**" inputs: > > public static test(int mask, int src) { > mask = Math.max(CON1, Math.min(CON2, mask)); > src = Math.max(CON2, Math.min(CON4, src)); > result = Integer.compress(src, mask); > int sum = 0; > if (sum > LIMIT_1) { sum += 1; } > if (sum > LIMIT_2) { sum += 2; } > if (sum > LIMIT_3) { sum += 4; } > if (sum > LIMIT_4) { sum += 8; } > if (sum > LIMIT_5) { sum += 16; } > if (sum > LIMIT_6) { sum += 32; } > if (sum > LIMIT_7) { sum += 64; } > if (sum > LIMIT_8) { sum += 128; } > return new int[] {sum, result}; > } > > What is implortant here: both the `src` and `mask` must have random ranges. But the test that ended up being integrated only made the `src` "range-based" using the `min/max`. **Without the `mask` being tested "range-based", the bug here could not have been caught by that test**. > > I was asked again for my review (https://github.com/openjdk/jdk/pull/23947#issuecomment-3062355806), but I had to go on vacation, and was not able to catch the issue (https://github.com/openj... Hi @eme64, Your fix looks good to me! Thanks for addressing this. Best Regards ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28062#pullrequestreview-3402795538 From jbhateja at openjdk.org Fri Oct 31 06:59:04 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 31 Oct 2025 06:59:04 GMT Subject: RFR: 8370409: Incorrect computation in Float16 reduction loop [v3] In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 07:11:28 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Change GOLDEN constants to static variables > > @jatin-bhateja Thanks for fixing this! I have a few nits below. 
I'll run testing once you addressed them :) Hi @eme64 , please let me know if the results are green ------------- PR Comment: https://git.openjdk.org/jdk/pull/27977#issuecomment-3471554555 From thartmann at openjdk.org Fri Oct 31 07:28:06 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 31 Oct 2025 07:28:06 GMT Subject: RFR: 8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 14:51:53 GMT, Emanuel Peter wrote: > It seems we keep finding issues in `CompressBitsNode::Value`, using the `TemplateFramework` https://github.com/openjdk/jdk/pull/26885. > > This is a JDK26 regression of the bugfix https://github.com/openjdk/jdk/pull/23947, which was itself reported by my prototype of the `TemplateFramework`. > > The bug is simple: On windows `1UL` is only a 32-bit value, and not a 64-bit value. We should use `1ULL` instead. Impacted lines: > https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L276 > https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L379 > > This means that simple cases like these wrongly constant fold to zero: > - `Long.compress(-2683206580L, Integer.toUnsignedLong(x))` > - `Long.compress(x, 0xffff_ffffL)` > > ------------------------------------------------------------------ > > This sort of bug (`1UL` vs `1ULL`) is of course very subtle, and easy to miss in a code review. So that is why testing is paramount. > > Why was this not caught in the testing of https://github.com/openjdk/jdk/pull/23947? After all there were quite a few tests there, right? There were simply not enough tests, or not the right ones ;) > > I did at the time ask for a "range-based" test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2853896251). I then doubled down and even proposed a conctete test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2935548411) that would create "**range-based**" inputs: > > public static test(int mask, int src) { > mask = Math.max(CON1, Math.min(CON2, mask)); > src = Math.max(CON2, Math.min(CON4, src)); > result = Integer.compress(src, mask); > int sum = 0; > if (sum > LIMIT_1) { sum += 1; } > if (sum > LIMIT_2) { sum += 2; } > if (sum > LIMIT_3) { sum += 4; } > if (sum > LIMIT_4) { sum += 8; } > if (sum > LIMIT_5) { sum += 16; } > if (sum > LIMIT_6) { sum += 32; } > if (sum > LIMIT_7) { sum += 64; } > if (sum > LIMIT_8) { sum += 128; } > return new int[] {sum, result}; > } > > What is implortant here: both the `src` and `mask` must have random ranges. But the test that ended up being integrated only made the `src` "range-based" using the `min/max`. **Without the `mask` being tested "range-based", the bug here could not have been caught by that test**. > > I was asked again for my review (https://github.com/openjdk/jdk/pull/23947#issuecomment-3062355806), but I had to go on vacation, and was not able to catch the issue (https://github.com/openj... Looks good to me. Thank you for your persistence on improving our testing! Good that we caught this in-time for JDK 26. 
test/hotspot/jtreg/compiler/c2/gvn/TestBitCompressValueTransform.java line 679: > 677: @IR (counts = { IRNode.COMPRESS_BITS, " >0 " }, applyIfCPUFeature = { "bmi2", "true" }) > 678: public static long test20(int x) { > 679: // Analysis of when this used to produce wrong results on Windows: Suggestion: // Analysis of when this is used to produce wrong results on Windows: test/hotspot/jtreg/compiler/c2/gvn/TestBitCompressValueTransform.java line 700: > 698: // > 699: // But watch out: on windows 1UL is only a 32 bit value. Intended was probably 1ULL. > 700: // So when we caluculate "1UL << 32", we just get 1. And so then hi would be 0 now. Suggestion: // So when we calculate "1UL << 32", we just get 1. And so then hi would be 0 now. test/hotspot/jtreg/compiler/c2/gvn/TestBitCompressValueTransform.java line 704: > 702: // > 703: // We create type [lo, hi]: > 704: // windowns: [0, 0] -> constant zero Suggestion: // Windows: [0, 0] -> constant zero test/hotspot/jtreg/compiler/c2/gvn/TestBitCompressValueTransform.java line 728: > 726: @IR (counts = { IRNode.COMPRESS_BITS, " >0 " }, applyIfCPUFeature = { "bmi2", "true" }) > 727: public static long test21(long x) { > 728: // Analysis of when this used to produce wrong results on Windows: Suggestion: // Analysis of when this is used to produce wrong results on Windows: ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28062#pullrequestreview-3402858715 PR Review Comment: https://git.openjdk.org/jdk/pull/28062#discussion_r2480368185 PR Review Comment: https://git.openjdk.org/jdk/pull/28062#discussion_r2480369611 PR Review Comment: https://git.openjdk.org/jdk/pull/28062#discussion_r2480369974 PR Review Comment: https://git.openjdk.org/jdk/pull/28062#discussion_r2480368555 From epeter at openjdk.org Fri Oct 31 07:33:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 31 Oct 2025 07:33:20 GMT Subject: RFR: 8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer [v2] In-Reply-To: References: Message-ID: > It seems we keep finding issues in `CompressBitsNode::Value`, using the `TemplateFramework` https://github.com/openjdk/jdk/pull/26885. > > This is a JDK26 regression of the bugfix https://github.com/openjdk/jdk/pull/23947, which was itself reported by my prototype of the `TemplateFramework`. > > The bug is simple: On windows `1UL` is only a 32-bit value, and not a 64-bit value. We should use `1ULL` instead. Impacted lines: > https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L276 > https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L379 > > This means that simple cases like these wrongly constant fold to zero: > - `Long.compress(-2683206580L, Integer.toUnsignedLong(x))` > - `Long.compress(x, 0xffff_ffffL)` > > ------------------------------------------------------------------ > > This sort of bug (`1UL` vs `1ULL`) is of course very subtle, and easy to miss in a code review. So that is why testing is paramount. > > Why was this not caught in the testing of https://github.com/openjdk/jdk/pull/23947? After all there were quite a few tests there, right? There were simply not enough tests, or not the right ones ;) > > I did at the time ask for a "range-based" test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2853896251). 
I then doubled down and even proposed a conctete test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2935548411) that would create "**range-based**" inputs: > > public static test(int mask, int src) { > mask = Math.max(CON1, Math.min(CON2, mask)); > src = Math.max(CON2, Math.min(CON4, src)); > result = Integer.compress(src, mask); > int sum = 0; > if (sum > LIMIT_1) { sum += 1; } > if (sum > LIMIT_2) { sum += 2; } > if (sum > LIMIT_3) { sum += 4; } > if (sum > LIMIT_4) { sum += 8; } > if (sum > LIMIT_5) { sum += 16; } > if (sum > LIMIT_6) { sum += 32; } > if (sum > LIMIT_7) { sum += 64; } > if (sum > LIMIT_8) { sum += 128; } > return new int[] {sum, result}; > } > > What is implortant here: both the `src` and `mask` must have random ranges. But the test that ended up being integrated only made the `src` "range-based" using the `min/max`. **Without the `mask` being tested "range-based", the bug here could not have been caught by that test**. > > I was asked again for my review (https://github.com/openjdk/jdk/pull/23947#issuecomment-3062355806), but I had to go on vacation, and was not able to catch the issue (https://github.com/openj... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28062/files - new: https://git.openjdk.org/jdk/pull/28062/files/794ef7b3..41b8c365 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28062&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28062&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28062.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28062/head:pull/28062 PR: https://git.openjdk.org/jdk/pull/28062 From thartmann at openjdk.org Fri Oct 31 07:37:07 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 31 Oct 2025 07:37:07 GMT Subject: RFR: 8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer [v2] In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 07:33:20 GMT, Emanuel Peter wrote: >> It seems we keep finding issues in `CompressBitsNode::Value`, using the `TemplateFramework` https://github.com/openjdk/jdk/pull/26885. >> >> This is a JDK26 regression of the bugfix https://github.com/openjdk/jdk/pull/23947, which was itself reported by my prototype of the `TemplateFramework`. >> >> The bug is simple: On windows `1UL` is only a 32-bit value, and not a 64-bit value. We should use `1ULL` instead. Impacted lines: >> https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L276 >> https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L379 >> >> This means that simple cases like these wrongly constant fold to zero: >> - `Long.compress(-2683206580L, Integer.toUnsignedLong(x))` >> - `Long.compress(x, 0xffff_ffffL)` >> >> ------------------------------------------------------------------ >> >> This sort of bug (`1UL` vs `1ULL`) is of course very subtle, and easy to miss in a code review. So that is why testing is paramount. >> >> Why was this not caught in the testing of https://github.com/openjdk/jdk/pull/23947? After all there were quite a few tests there, right? 
There were simply not enough tests, or not the right ones ;) >> >> I did at the time ask for a "range-based" test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2853896251). I then doubled down and even proposed a conctete test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2935548411) that would create "**range-based**" inputs: >> >> public static test(int mask, int src) { >> mask = Math.max(CON1, Math.min(CON2, mask)); >> src = Math.max(CON2, Math.min(CON4, src)); >> result = Integer.compress(src, mask); >> int sum = 0; >> if (sum > LIMIT_1) { sum += 1; } >> if (sum > LIMIT_2) { sum += 2; } >> if (sum > LIMIT_3) { sum += 4; } >> if (sum > LIMIT_4) { sum += 8; } >> if (sum > LIMIT_5) { sum += 16; } >> if (sum > LIMIT_6) { sum += 32; } >> if (sum > LIMIT_7) { sum += 64; } >> if (sum > LIMIT_8) { sum += 128; } >> return new int[] {sum, result}; >> } >> >> What is implortant here: both the `src` and `mask` must have random ranges. But the test that ended up being integrated only made the `src` "range-based" using the `min/max`. **Without the `mask` being tested "range-based", the bug here could not have been caught by that test**. >> >> I was asked again for my review (https://github.com/openjdk/jdk/pull/23947#issuecomment-3062355806), but I had to... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Tobias Hartmann Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28062#pullrequestreview-3402886574 From fyang at openjdk.org Fri Oct 31 07:44:14 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 31 Oct 2025 07:44:14 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v13] In-Reply-To: <9NCXWsBW5TTtNLxDqIInodSU-nLiaf86r2dyMtoKklM=.0964bb38-e5cb-499d-a9fc-4efdca0ecfd0@github.com> References: <9NCXWsBW5TTtNLxDqIInodSU-nLiaf86r2dyMtoKklM=.0964bb38-e5cb-499d-a9fc-4efdca0ecfd0@github.com> Message-ID: On Wed, 29 Oct 2025 07:03:48 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > delete useless reg Hi, I am having a look at the latest version. Some minor comments. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2610: > 2608: } > 2609: > 2610: void increase_counter_128(Register counter, Register tmp) { Maybe pass another tmp register? Otherwise, you need to assert that the input `tmp` and `t0` are different registers. Also can you add some code comment for this routine? src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2611: > 2609: > 2610: void increase_counter_128(Register counter, Register tmp) { > 2611: __ addi(t0, counter, 8); Note that the address for `ld/sd` instructions can accept an immediate offset. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2619: > 2617: __ mv(t0, 0x0ul); > 2618: __ sltu(tmp, t0, tmp); > 2619: __ xori(t0, tmp, 1); Seems a simple `seqz` will do? src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2711: > 2709: > 2710: __ lw(used, Address(used_ptr)); > 2711: __ beqz(input_len, L_exit); Are the `lw` and `sw` of `used_ptr` necessary when `input_len` is zero? 
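In other words, something along these lines would avoid touching `used_ptr` at all in the empty case (just a sketch of the reordering, untested; the matching `sw` on the exit path would need the same treatment):

```
// Bail out before loading the 'used' counter when there is no input to process.
__ beqz(input_len, L_exit);
__ lw(used, Address(used_ptr));
```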
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2738: > 2736: // Encrypt bytes left with last encryptedCounter > 2737: __ bind(L_next); > 2738: __ mv(t0, block_size); Can we materialize this value in a dedicate tmp register? Then we can save these `mv` instructions in the loop. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2755: > 2753: > 2754: __ bind(L_main); > 2755: __ vsetivli(x0, 4, Assembler::e32, Assembler::m1); This seems to me redundant as we already have the same `vsetivli` on entry. ------------- PR Review: https://git.openjdk.org/jdk/pull/25281#pullrequestreview-3402878829 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2480387615 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2480385815 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2480383816 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2480393779 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2480399578 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2480395560 From duke at openjdk.org Fri Oct 31 08:03:06 2025 From: duke at openjdk.org (duke) Date: Fri, 31 Oct 2025 08:03:06 GMT Subject: RFR: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache [v5] In-Reply-To: References: Message-ID: <-TJg1xAhrqSMlUKFVH0y1myggbGHBcaIigZUyzVv2V0=.db78bfa9-2cde-4906-af2b-0fde26348174@github.com> On Wed, 29 Oct 2025 01:26:43 GMT, Chad Rakoczy wrote: >> [JDK-8370527](https://bugs.openjdk.org/browse/JDK-8370527) >> >> [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced an `immutable_data_references_counter` which keeps track of the number of nmethods using the immutable data so it can be shared between relocated nmethods. The old code reads the counter, decrements the counter, and then checks the first read to see if it is zero. Since the check is performed on the initial read it will never be zero which causes immutable data to never be freed. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add include to fix build issue @chadrako Your change (at version 6739c4fa36f44b80fe100682c24d1ed715b817d6) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28008#issuecomment-3471709662 From dlong at openjdk.org Fri Oct 31 08:18:09 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 31 Oct 2025 08:18:09 GMT Subject: RFR: 8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer [v2] In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 07:33:20 GMT, Emanuel Peter wrote: >> It seems we keep finding issues in `CompressBitsNode::Value`, using the `TemplateFramework` https://github.com/openjdk/jdk/pull/26885. >> >> This is a JDK26 regression of the bugfix https://github.com/openjdk/jdk/pull/23947, which was itself reported by my prototype of the `TemplateFramework`. >> >> The bug is simple: On windows `1UL` is only a 32-bit value, and not a 64-bit value. We should use `1ULL` instead. 
Impacted lines: >> https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L276 >> https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L379 >> >> This means that simple cases like these wrongly constant fold to zero: >> - `Long.compress(-2683206580L, Integer.toUnsignedLong(x))` >> - `Long.compress(x, 0xffff_ffffL)` >> >> ------------------------------------------------------------------ >> >> This sort of bug (`1UL` vs `1ULL`) is of course very subtle, and easy to miss in a code review. So that is why testing is paramount. >> >> Why was this not caught in the testing of https://github.com/openjdk/jdk/pull/23947? After all there were quite a few tests there, right? There were simply not enough tests, or not the right ones ;) >> >> I did at the time ask for a "range-based" test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2853896251). I then doubled down and even proposed a conctete test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2935548411) that would create "**range-based**" inputs: >> >> public static test(int mask, int src) { >> mask = Math.max(CON1, Math.min(CON2, mask)); >> src = Math.max(CON2, Math.min(CON4, src)); >> result = Integer.compress(src, mask); >> int sum = 0; >> if (sum > LIMIT_1) { sum += 1; } >> if (sum > LIMIT_2) { sum += 2; } >> if (sum > LIMIT_3) { sum += 4; } >> if (sum > LIMIT_4) { sum += 8; } >> if (sum > LIMIT_5) { sum += 16; } >> if (sum > LIMIT_6) { sum += 32; } >> if (sum > LIMIT_7) { sum += 64; } >> if (sum > LIMIT_8) { sum += 128; } >> return new int[] {sum, result}; >> } >> >> What is implortant here: both the `src` and `mask` must have random ranges. But the test that ended up being integrated only made the `src` "range-based" using the `min/max`. **Without the `mask` being tested "range-based", the bug here could not have been caught by that test**. >> >> I was asked again for my review (https://github.com/openjdk/jdk/pull/23947#issuecomment-3062355806), but I had to... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Tobias Hartmann src/hotspot/share/opto/intrinsicnode.cpp line 379: > 377: // We can further constrain the upper bound of bit compression if the number of bits > 378: // which can be set(one) is less than the maximum number of bits of integral type. > 379: hi = MIN2((jlong)((1ULL << result_bit_width) - 1L), hi); It seems weird having `- 1L` mixed with ULL now. It might be better to use right_n_bits_typed() here and at line 276. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28062#discussion_r2480470937 From mli at openjdk.org Fri Oct 31 09:02:08 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 31 Oct 2025 09:02:08 GMT Subject: RFR: 8370794: C2 SuperWord: Long/Integer.compareUnsigned return wrong value for EQ/NE in SLP In-Reply-To: <1MCzjBuHpcNGUkZNZAFWMVqRq7q_lTSVvlR3n3urWto=.743fc4de-d975-4954-9b1d-a02689dd75da@github.com> References: <1MCzjBuHpcNGUkZNZAFWMVqRq7q_lTSVvlR3n3urWto=.743fc4de-d975-4954-9b1d-a02689dd75da@github.com> Message-ID: On Fri, 31 Oct 2025 04:49:33 GMT, Emanuel Peter wrote: > Nice work! The tests are passing on my side :) @eme64 Thank you for reviewing and testing! 
:) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28047#issuecomment-3471980707 From mchevalier at openjdk.org Fri Oct 31 11:02:24 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 31 Oct 2025 11:02:24 GMT Subject: RFR: 8370077: C2: make Compile::_major_progress a boolean [v5] In-Reply-To: References: Message-ID: <8fSy1gEuSk6aOEH2HH3mM5taZu0JwZYCFHFPrSjL7_I=.727bc0bd-fc03-4b51-aee6-25ce3a3ed1d1@github.com> On Tue, 28 Oct 2025 07:22:59 GMT, Marc Chevalier wrote: >> Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. >> >> There is one detail, we used to have >> >> void restore_major_progress(int progress) { _major_progress += progress; } >> >> >> It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. >> >> It has a weird semantics: >> >> Progress before | Progress after verification | Progress after restore | What would be the assignment semantics >> ----------------|-----------------------------|-----------------------|- >> 0 | 0 | 0 | 0 >> 1 | 0 | 1 | 1 >> 0 | 1 | 1 | 0 (mismatch!) >> 1 | 1 | 2 | 1 (same truthiness) >> >> It is rather a or than a restore, and a proper boolean version of that would be >> >> void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } >> >> but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural >> >> void set_major_progress(bool progress) { _major_progress = progress; } >> >> that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > typoes in comment Thank you all for review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27912#issuecomment-3472525303 From mchevalier at openjdk.org Fri Oct 31 11:03:51 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 31 Oct 2025 11:03:51 GMT Subject: RFR: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed [v10] In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 10:26:17 GMT, Marc Chevalier wrote: >> Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. >> >> When we have something like >> >> // any loop >> while (...) { /* something involving limit */ } >> // counted loop with zero trip guard >> if (i < limit) { >> for (int i = init; i < limit; i++) { ... } >> } >> >> and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. >> >> But the method `PhaseIdealLoop::do_unroll` has the assert >> >> https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 >> >> requiring that both `limit` are the same node. 
But as explained, it might not be the case after peeling the first loop. >> >> This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. >> >> But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? >> >> By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. >> >> We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. >>... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Fewer flag combinations Thanks @robcasloz & @chhagedorn for review and test! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27586#issuecomment-3472519791 From mchevalier at openjdk.org Fri Oct 31 11:03:54 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 31 Oct 2025 11:03:54 GMT Subject: Integrated: 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed In-Reply-To: References: Message-ID: On Wed, 1 Oct 2025 00:23:20 GMT, Marc Chevalier wrote: > Loop peeling works by cloning the loop body, which implies to replace the uses of the data in the loop to be replaced by a phi between the original loop and the clone. This is done by `PhaseIdealLoop::fix_data_uses` and can create a maze of phis. Multiple users of the same original data will get a fresh `PhiNode`, there is no logic trying to reuse them, or simplify. That's IGVN's job. > > When we have something like > > // any loop > while (...) { /* something involving limit */ } > // counted loop with zero trip guard > if (i < limit) { > for (int i = init; i < limit; i++) { ... } > } > > and we peel the first loop, the limits in the zero trip guard and in the counted loop condition are not the same node anymore but a fresh `PhiNode`. > > But the method `PhaseIdealLoop::do_unroll` has the assert > > https://github.com/openjdk/jdk/blob/444007fc234aeff75025831c2d1b5538c87fa8f1/src/hotspot/share/opto/loopTransform.cpp#L1929-L1930 > > requiring that both `limit` are the same node. But as explained, it might not be the case after peeling the first loop. > > This situation doesn't happen if IGVN happens between peeling the first loop and unrolling the second. 
While there is no formal invariant that this must always be true, I couldn't reproduce the same situation without stress peeling: either peeling happens too early, or not at all, or something else happens so that major progress is set before unrolling, which always saves the day. I've tried to hack on an example to make the peeling decision happen "naturally" (using the normal heuristic), but in the right situation, not too early or too late. At this point it was so hardcoded that it's not significantly different than a run with stress peeling. > > But with stress peeling, this situation seems to happen, rarely, but sometimes. What should we do? > > By creating many `PhiNode`s `PhaseIdealLoop::fix_data_uses` is doing exactly what we expect. We could make it a lot smarter to try to reuse the `PhiNode`s previously constructed, but that would be hard because the inputs of the fresh phis are recursively adjusted, so we can't share ahead of time when inputs are the same. Duplicating when inputs start to differ would also lead to too many copies since phis look indeed different and some more top down clean up can actually collapse them all. > > We could run IGVN to clean up the thing after each peeling: it was deemed not desirable as many things are expected to happen immediately after peeling. > > We could make the assert a lot smarter and test for ... This pull request has now been integrated. Changeset: 02f8874c Author: Marc Chevalier URL: https://git.openjdk.org/jdk/commit/02f8874c2d105a86cbfd3b84b591fefb4e509806 Stats: 225 lines in 3 files changed: 223 ins; 0 del; 2 mod 8361608: C2: assert(opaq->outcnt() == 1 && opaq->in(1) == limit) failed Co-authored-by: Christian Hagedorn Reviewed-by: chagedorn, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/27586 From mchevalier at openjdk.org Fri Oct 31 11:05:39 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 31 Oct 2025 11:05:39 GMT Subject: Integrated: 8370077: C2: make Compile::_major_progress a boolean In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 08:07:15 GMT, Marc Chevalier wrote: > Simply change `Compile::_major_progress` from `int` to `bool` since we are only checking if it's non-zero. > > There is one detail, we used to have > > void restore_major_progress(int progress) { _major_progress += progress; } > > > It is used after some verification code (maybe not only?) that may reset the major progress, using the progress saved before the said code. > > It has a weird semantics: > > Progress before | Progress after verification | Progress after restore | What would be the assignment semantics > ----------------|-----------------------------|-----------------------|- > 0 | 0 | 0 | 0 > 1 | 0 | 1 | 1 > 0 | 1 | 1 | 0 (mismatch!) > 1 | 1 | 2 | 1 (same truthiness) > > It is rather a or than a restore, and a proper boolean version of that would be > > void restore_major_progress(bool progress) { _major_progress = _major_progress || progress; } > > but then, I'd argue the name is confusing. It also doesn't fit so well the idea that we just want to be back to the situation before the verification code. I suspect the unsaid assumption, is that the 3rd line (progress clear before, set by verification) is not possible. Anyway, I've tried with this or-semantics, or with a more natural > > void set_major_progress(bool progress) { _major_progress = progress; } > > that actually restore what we saved. Both pass (tier1-6 + some internal tests). Thus, I prefered the simpler semantics. 
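The table quoted above can also be checked mechanically. The small program below is a self-check, not part of the patch: it enumerates the four rows and prints, for each, the truthiness of the old `+=` restore, the boolean-OR variant, and the plain assignment; only the "0 then 1" row differs.

    public class RestoreSemanticsCheck {
        public static void main(String[] args) {
            for (int before = 0; before <= 1; before++) {        // progress saved before verification
                for (int after = 0; after <= 1; after++) {       // progress right after verification
                    boolean plusEq = (after + before) != 0;      // old: _major_progress += progress
                    boolean orSem  = (after != 0) || (before != 0);
                    boolean assign = (before != 0);              // set_major_progress(progress)
                    System.out.println(before + "," + after + " -> +=: " + plusEq
                            + "  or: " + orSem + "  assign: " + assign);
                }
            }
        }
    }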
> > Thanks, > Marc This pull request has now been integrated. Changeset: 8ca485cf Author: Marc Chevalier URL: https://git.openjdk.org/jdk/commit/8ca485cf98889d1757170a4ec883c93c888a7140 Stats: 17 lines in 2 files changed: 10 ins; 1 del; 6 mod 8370077: C2: make Compile::_major_progress a boolean Reviewed-by: chagedorn, kvn, dlong, epeter ------------- PR: https://git.openjdk.org/jdk/pull/27912 From epeter at openjdk.org Fri Oct 31 11:47:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 31 Oct 2025 11:47:37 GMT Subject: RFR: 8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer [v3] In-Reply-To: References: Message-ID: > It seems we keep finding issues in `CompressBitsNode::Value`, using the `TemplateFramework` https://github.com/openjdk/jdk/pull/26885. > > This is a JDK26 regression of the bugfix https://github.com/openjdk/jdk/pull/23947, which was itself reported by my prototype of the `TemplateFramework`. > > The bug is simple: On windows `1UL` is only a 32-bit value, and not a 64-bit value. We should use `1ULL` instead. Impacted lines: > https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L276 > https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L379 > > This means that simple cases like these wrongly constant fold to zero: > - `Long.compress(-2683206580L, Integer.toUnsignedLong(x))` > - `Long.compress(x, 0xffff_ffffL)` > > ------------------------------------------------------------------ > > This sort of bug (`1UL` vs `1ULL`) is of course very subtle, and easy to miss in a code review. So that is why testing is paramount. > > Why was this not caught in the testing of https://github.com/openjdk/jdk/pull/23947? After all there were quite a few tests there, right? There were simply not enough tests, or not the right ones ;) > > I did at the time ask for a "range-based" test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2853896251). I then doubled down and even proposed a conctete test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2935548411) that would create "**range-based**" inputs: > > public static test(int mask, int src) { > mask = Math.max(CON1, Math.min(CON2, mask)); > src = Math.max(CON2, Math.min(CON4, src)); > result = Integer.compress(src, mask); > int sum = 0; > if (sum > LIMIT_1) { sum += 1; } > if (sum > LIMIT_2) { sum += 2; } > if (sum > LIMIT_3) { sum += 4; } > if (sum > LIMIT_4) { sum += 8; } > if (sum > LIMIT_5) { sum += 16; } > if (sum > LIMIT_6) { sum += 32; } > if (sum > LIMIT_7) { sum += 64; } > if (sum > LIMIT_8) { sum += 128; } > return new int[] {sum, result}; > } > > What is implortant here: both the `src` and `mask` must have random ranges. But the test that ended up being integrated only made the `src` "range-based" using the `min/max`. **Without the `mask` being tested "range-based", the bug here could not have been caught by that test**. > > I was asked again for my review (https://github.com/openjdk/jdk/pull/23947#issuecomment-3062355806), but I had to go on vacation, and was not able to catch the issue (https://github.com/openj... 
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8370459-expression-fuzz-failure' of https://github.com/eme64/jdk into JDK-8370459-expression-fuzz-failure - use right_n_bits_typed for Dean ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28062/files - new: https://git.openjdk.org/jdk/pull/28062/files/41b8c365..d2b88bee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28062&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28062&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28062.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28062/head:pull/28062 PR: https://git.openjdk.org/jdk/pull/28062 From epeter at openjdk.org Fri Oct 31 11:47:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 31 Oct 2025 11:47:39 GMT Subject: RFR: 8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer [v2] In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 08:15:43 GMT, Dean Long wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Tobias Hartmann > > src/hotspot/share/opto/intrinsicnode.cpp line 379: > >> 377: // We can further constrain the upper bound of bit compression if the number of bits >> 378: // which can be set(one) is less than the maximum number of bits of integral type. >> 379: hi = MIN2((jlong)((1ULL << result_bit_width) - 1L), hi); > > It seems weird having `- 1L` mixed with ULL now. It might be better to use right_n_bits_typed() here and at line 276. Great idea, I did not know about `right_n_bits_typed` :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28062#discussion_r2481143360 From duke at openjdk.org Fri Oct 31 14:14:19 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 31 Oct 2025 14:14:19 GMT Subject: Integrated: 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 19:48:10 GMT, Chad Rakoczy wrote: > [JDK-8370527](https://bugs.openjdk.org/browse/JDK-8370527) > > [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694) introduced an `immutable_data_references_counter` which keeps track of the number of nmethods using the immutable data so it can be shared between relocated nmethods. The old code reads the counter, decrements the counter, and then checks the first read to see if it is zero. Since the check is performed on the initial read it will never be zero which causes immutable data to never be freed. This pull request has now been integrated. Changeset: 8236800d Author: Chad Rakoczy Committer: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/8236800deb5b99a027b0944f6c512c0f31d030df Stats: 106 lines in 4 files changed: 30 ins; 7 del; 69 mod 8370527: Memory leak after 8316694: Implement relocation of nmethod within CodeCache Reviewed-by: shade, eastigeevich, kvn ------------- PR: https://git.openjdk.org/jdk/pull/28008 From qamai at openjdk.org Fri Oct 31 14:28:49 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 31 Oct 2025 14:28:49 GMT Subject: RFR: 8370914: C2: Reimplement Type::join In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 03:09:22 GMT, Dean Long wrote: >> Hi, >> >> Currently, `Type::join` is implemented using `Type::dual`. 
The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. >> >> However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). >> >> In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. >> >> This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. There are a lot of benefits of this: >> >> - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. >> - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. >> >> This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these p... > > I like this idea, but even if it is 100% correct, how do we prove that? Can we exhaustively test all possibilities? We might consider introducing the new join as a parallel implementation, so it can compare the result with the old join as a sanity check. Then after we are convinced of its correctness, we remove the old join. As for unit testing, all we have is check_symmetrical(), right? With a change this fundamental, I think we will need correspondingly better tests. @dean-long Thanks for your great suggestion. I have tried implemented the idea to compare the results of the old and the new approach each time we call `meet` or `join` between 2 types. The implementation can be found on [this branch](https://github.com/merykitty/jdk/tree/typejoindraft). Unfortunately, it exposes incorrect results in the current approach. 
For example:

=== Join May Be Incorrect ===
t1 = jdk/internal/jrtfs/JrtFileSystemProvider:exact *
t2 = sun/nio/fs/LinuxFileSystemProvider *
t1 joins t2 = null
(t1->dual() meets t2->dual())->dual() = java/nio/file/spi/FileSystemProvider:AnyNull *,iid=top

Apparently, both `t1` and `t2` are nullable, the result of the join cannot be empty, which an `AnyNull` represents. If I add this to master:

diff --git a/src/hotspot/share/opto/type.cpp b/src/hotspot/share/opto/type.cpp
index f62eea893cd..da3e929634b 100644
--- a/src/hotspot/share/opto/type.cpp
+++ b/src/hotspot/share/opto/type.cpp
@@ -983,6 +983,15 @@ void Type::check_symmetrical(const Type* t, const Type* mt, const VerifyMeet& ve
   // Interface:AnyNull meet Oop:AnyNull == Interface:AnyNull
   // Interface:NotNull meet Oop:NotNull == java/lang/Object:NotNull

+  const Type* join = verify.meet(dual(), t->dual())->dual();
+  if (isa_instptr() && t->isa_instptr() && maybe_null() && t->maybe_null() && join->empty()) {
+    tty->print("Cannot be empty:\n");
+    tty->print("t1 : "); dump(); tty->cr();
+    tty->print("t2 : "); t->dump(); tty->cr();
+    tty->print("t1 joins t2: "); join->dump(); tty->cr();
+    assert(false, "incorrect join");
+  }
+
   if (t2t != t->_dual || t2this != this->_dual) {
     tty->print_cr("=== Meet Not Symmetric ===");
     tty->print("t = "); t->dump(); tty->cr();

Similar failures can be observed when I run `make test-tier1`:

Cannot be empty:
t1 : com/sun/tools/javac/code/Symbol$PackageSymbol (com/sun/tools/javac/jvm/PoolConstant,javax/lang/model/AnnotatedConstruct,javax/lang/model/element/Element,javax/lang/model/element/QualifiedNameable,javax/lang/model/element/PackageElement) *
t2 : com/sun/tools/javac/code/Symbol$ModuleSymbol (com/sun/tools/javac/jvm/PoolConstant,javax/lang/model/AnnotatedConstruct,javax/lang/model/element/Element,javax/lang/model/element/QualifiedNameable,javax/lang/model/element/ModuleElement) *
t1 joins t2: com/sun/tools/javac/code/Symbol$TypeSymbol (com/sun/tools/javac/jvm/PoolConstant,javax/lang/model/AnnotatedConstruct,javax/lang/model/element/Element,javax/lang/model/element/QualifiedNameable):AnyNull *,iid=top

------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3473334502 From epeter at openjdk.org Fri Oct 31 15:02:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 31 Oct 2025 15:02:27 GMT Subject: RFR: 8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer [v4] In-Reply-To: References: Message-ID: > It seems we keep finding issues in `CompressBitsNode::Value`, using the `TemplateFramework` https://github.com/openjdk/jdk/pull/26885. > > This is a JDK26 regression of the bugfix https://github.com/openjdk/jdk/pull/23947, which was itself reported by my prototype of the `TemplateFramework`. > > The bug is simple: On windows `1UL` is only a 32-bit value, and not a 64-bit value. We should use `1ULL` instead. Impacted lines: > https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L276 > https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L379 > > This means that simple cases like these wrongly constant fold to zero: > - `Long.compress(-2683206580L, Integer.toUnsignedLong(x))` > - `Long.compress(x, 0xffff_ffffL)` > > ------------------------------------------------------------------ > > This sort of bug (`1UL` vs `1ULL`) is of course very subtle, and easy to miss in a code review.
So that is why testing is paramount. > > Why was this not caught in the testing of https://github.com/openjdk/jdk/pull/23947? After all there were quite a few tests there, right? There were simply not enough tests, or not the right ones ;) > > I did at the time ask for a "range-based" test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2853896251). I then doubled down and even proposed a conctete test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2935548411) that would create "**range-based**" inputs: > > public static test(int mask, int src) { > mask = Math.max(CON1, Math.min(CON2, mask)); > src = Math.max(CON2, Math.min(CON4, src)); > result = Integer.compress(src, mask); > int sum = 0; > if (sum > LIMIT_1) { sum += 1; } > if (sum > LIMIT_2) { sum += 2; } > if (sum > LIMIT_3) { sum += 4; } > if (sum > LIMIT_4) { sum += 8; } > if (sum > LIMIT_5) { sum += 16; } > if (sum > LIMIT_6) { sum += 32; } > if (sum > LIMIT_7) { sum += 64; } > if (sum > LIMIT_8) { sum += 128; } > return new int[] {sum, result}; > } > > What is implortant here: both the `src` and `mask` must have random ranges. But the test that ended up being integrated only made the `src` "range-based" using the `min/max`. **Without the `mask` being tested "range-based", the bug here could not have been caught by that test**. > > I was asked again for my review (https://github.com/openjdk/jdk/pull/23947#issuecomment-3062355806), but I had to go on vacation, and was not able to catch the issue (https://github.com/openj... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix last commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28062/files - new: https://git.openjdk.org/jdk/pull/28062/files/d2b88bee..8bf89e95 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28062&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28062&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28062.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28062/head:pull/28062 PR: https://git.openjdk.org/jdk/pull/28062 From epeter at openjdk.org Fri Oct 31 15:07:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 31 Oct 2025 15:07:25 GMT Subject: RFR: 8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer [v4] In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 15:02:27 GMT, Emanuel Peter wrote: >> It seems we keep finding issues in `CompressBitsNode::Value`, using the `TemplateFramework` https://github.com/openjdk/jdk/pull/26885. >> >> This is a JDK26 regression of the bugfix https://github.com/openjdk/jdk/pull/23947, which was itself reported by my prototype of the `TemplateFramework`. >> >> The bug is simple: On windows `1UL` is only a 32-bit value, and not a 64-bit value. We should use `1ULL` instead. Impacted lines: >> https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L276 >> https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L379 >> >> This means that simple cases like these wrongly constant fold to zero: >> - `Long.compress(-2683206580L, Integer.toUnsignedLong(x))` >> - `Long.compress(x, 0xffff_ffffL)` >> >> ------------------------------------------------------------------ >> >> This sort of bug (`1UL` vs `1ULL`) is of course very subtle, and easy to miss in a code review. 
So that is why testing is paramount. >> >> Why was this not caught in the testing of https://github.com/openjdk/jdk/pull/23947? After all there were quite a few tests there, right? There were simply not enough tests, or not the right ones ;) >> >> I did at the time ask for a "range-based" test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2853896251). I then doubled down and even proposed a conctete test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2935548411) that would create "**range-based**" inputs: >> >> public static test(int mask, int src) { >> mask = Math.max(CON1, Math.min(CON2, mask)); >> src = Math.max(CON2, Math.min(CON4, src)); >> result = Integer.compress(src, mask); >> int sum = 0; >> if (sum > LIMIT_1) { sum += 1; } >> if (sum > LIMIT_2) { sum += 2; } >> if (sum > LIMIT_3) { sum += 4; } >> if (sum > LIMIT_4) { sum += 8; } >> if (sum > LIMIT_5) { sum += 16; } >> if (sum > LIMIT_6) { sum += 32; } >> if (sum > LIMIT_7) { sum += 64; } >> if (sum > LIMIT_8) { sum += 128; } >> return new int[] {sum, result}; >> } >> >> What is implortant here: both the `src` and `mask` must have random ranges. But the test that ended up being integrated only made the `src` "range-based" using the `min/max`. **Without the `mask` being tested "range-based", the bug here could not have been caught by that test**. >> >> I was asked again for my review (https://github.com/openjdk/jdk/pull/23947#issuecomment-3062355806), but I had to... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix last commit @dean-long Does it look better now? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28062#issuecomment-3473491497 From roland at openjdk.org Fri Oct 31 15:39:58 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 31 Oct 2025 15:39:58 GMT Subject: RFR: 8370332: C2 SuperWord: SIGSEGV because PhaseIdealLoop::split_thru_phi left dead nodes in loop _body [v2] In-Reply-To: References: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> Message-ID: On Tue, 28 Oct 2025 16:30:24 GMT, Emanuel Peter wrote: >> Analysis: >> `split_thru_phi` can split a node out of the loop, through some loop phi. As a consequence, that node and the phi we split through can become dead. But `split_thru_phi` did not have any logic to yank the dead node and phi from the `_body`. If this happens in the same loop-opts-phase as a later SuperWord, and that SuperWord pass somehow accesses that loop `_body`, then we may find dead nodes, which is not expected. >> >> It is not ok that `split_thru_phi` leaves dead nodes in the `_body`, so they have to be yanked. >> >> What I did additionally: I went through all uses of `split_thru_phi`, and moved the `replace_node` from the call-site to the method itself. Removing the node and yanking from `_body` conceptually belongs together, so they should be together in code. >> >> I suspect that `split_thru_phi` was broken for a long time already. But JDK26 changes in SuperWord started to check inputs of all nodes in `_body`, and that fails with dead nodes. >> >> Future Work: >> - Continue work on making `VerifyLoopOptimizations` work again, we should assert that there are no dead nodes in the `_body`. We may do that with the following task, or a subsequent one. 
>> - [JDK-8370332](https://bugs.openjdk.org/browse/JDK-8370332) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > allow unique out with multiple uses Looks reasonable to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27955#pullrequestreview-3404867950 From dfenacci at openjdk.org Fri Oct 31 16:39:55 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 31 Oct 2025 16:39:55 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel Message-ID: ## Issue Today, the only practical ways to run IR Framework scenarios in parallel seems to be: * spawning threads manually in a single test, or * letting jtreg treat each scenario as a separate test (the only way to potentially distribute across hosts). This makes it a bit cumbersome to use host CPU cores efficiently when running multiple scenarios within the same test. ## Change This change introduces a method `TestFramework::startParallel` to execute multiple scenarios concurrently. The implementation: * launches one task per scenario and runs them in parallel (by default, the maximum concurrency should match the host?s available cores) * captures each task?s `System.out` into a dedicated buffer and flushes it when the task completes to avoid interleaved output between scenarios (Note: only call paths within the `compile.lib.ir_framework` package are modified to per-task output streams. `ProcessTools` methods still write directly to `stdout`, so their output may interleave). * adds an option `-DForceSequentialScenarios=true` to force all scenarios to be run sequentially. ## Testing * Tier 1-3+ * explicit `ir_framework.tests` runs * added IR-Framework test `TestDForceSequentialScenarios.java` to test forcing sequential testing (checkin the output order) and added a parallel run to `TestScenatios.java` (as well as adding `ForceSequentialScenarios` flag to `TestDFlags.java`) As reference: a comparison of the execution time between sequential and parallel of all IR-Framework tests using scenarios on our machines (linux x64/aarch64, macosx x64/aarch64, windows x64 with different number of cores, so the results for a single test might not be relevant) gave me an average speedup of 1.9. 
------------- Commit messages: - JDK-8370315: move declaration - JDK-8370315: remove garbage - JDK-8370315: add flag and tests - JDK-8370315: revert a few line changes - JDK-8370315: move getLastTestVMOutput - JDK-8370315: revert removal of lastTestVMOutput - JDK-8370315: fix output processing - JDK-8370315: [IR-Framework] Allow scenarios to be run in parallel Changes: https://git.openjdk.org/jdk/pull/28065/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28065&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370315 Stats: 191 lines in 8 files changed: 139 ins; 2 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/28065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28065/head:pull/28065 PR: https://git.openjdk.org/jdk/pull/28065 From dfenacci at openjdk.org Fri Oct 31 16:39:56 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 31 Oct 2025 16:39:56 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 16:27:11 GMT, Damon Fenacci wrote: > ## Issue > Today, the only practical ways to run IR Framework scenarios in parallel seems to be: > * spawning threads manually in a single test, or > * letting jtreg treat each scenario as a separate test (the only way to potentially distribute across hosts). > > This makes it a bit cumbersome to use host CPU cores efficiently when running multiple scenarios within the same test. > > ## Change > This change introduces a method `TestFramework::startParallel` to execute multiple scenarios concurrently. The implementation: > * launches one task per scenario and runs them in parallel (by default, the maximum concurrency should match the host?s available cores) > * captures each task?s `System.out` into a dedicated buffer and flushes it when the task completes to avoid interleaved output between scenarios (Note: only call paths within the `compile.lib.ir_framework` package are modified to per-task output streams. `ProcessTools` methods still write directly to `stdout`, so their output may interleave). > * adds an option `-DForceSequentialScenarios=true` to force all scenarios to be run sequentially. > > ## Testing > * Tier 1-3+ > * explicit `ir_framework.tests` runs > * added IR-Framework test `TestDForceSequentialScenarios.java` to test forcing sequential testing (checkin the output order) and added a parallel run to `TestScenatios.java` (as well as adding `ForceSequentialScenarios` flag to `TestDFlags.java`) > > As reference: a comparison of the execution time between sequential and parallel of all IR-Framework tests using scenarios on our machines (linux x64/aarch64, macosx x64/aarch64, windows x64 with different number of cores, so the results for a single test might not be relevant) gave me an average speedup of 1.9. The full list of tests using IR-Framework scenarios: compiler/c2/irTests/gc/ReferenceClearTests.java compiler/c2/irTests/gc/ReferenceRefersToTests.java compiler/c2/irTests/igvn/TestCleanMemPhi.java compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java compiler/c2/irTests/TestFloat16ScalarOperations.java compiler/c2/irTests/TestPostParseCallDevirtualization.java compiler/c2/irTests/TestScheduleSmallMethod.java compiler/c2/irTests/TestVectorizationMismatchedAccess.java? 
compiler/gcbarriers/TestG1BarrierGeneration.java compiler/loopopts/superword/ProdRed_Double.java compiler/loopopts/superword/ProdRed_Float.java compiler/loopopts/superword/ProdRed_Int.java compiler/loopopts/superword/RedTest_int.java compiler/loopopts/superword/RedTest_long.java compiler/loopopts/superword/SumRed_Double.java compiler/loopopts/superword/SumRed_Float.java compiler/loopopts/superword/SumRed_Int.java compiler/loopopts/superword/SumRed_Long.java compiler/loopopts/superword/SumRedAbsNeg_Double.java compiler/loopopts/superword/SumRedAbsNeg_Float.java compiler/loopopts/superword/SumRedSqrt_Double.java compiler/loopopts/superword/TestMemorySegment_8359688.java compiler/loopopts/superword/TestMemorySegment_ReassociateInvariants1.java compiler/loopopts/superword/TestMemorySegment_ReassociateInvariants2.java compiler/loopopts/superword/TestMemorySegmentByteSizeLongLoopLimit.java compiler/loopopts/superword/TestMemorySegmentField.java compiler/loopopts/superword/TestMemorySegmentFilterSummands.java compiler/loopopts/TestArrayFillIntrinsic.java compiler/vectorization/TestFloatConversionsVector.java Maybe we could run some of them concurrently (e.g. the ones that show more than a certain speedup to avoid parallelizing short tests that might waste most of the time spawning threads). ------------- PR Comment: https://git.openjdk.org/jdk/pull/28065#issuecomment-3473887621 From roland at openjdk.org Fri Oct 31 16:43:24 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 31 Oct 2025 16:43:24 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v3] In-Reply-To: <4dxuy7RYykOejdvyiYvsTwivcfnOkhucFp5JZPUbDWU=.e36545ce-18e9-4ec8-a670-02bb99fa569a@github.com> References: <4dxuy7RYykOejdvyiYvsTwivcfnOkhucFp5JZPUbDWU=.e36545ce-18e9-4ec8-a670-02bb99fa569a@github.com> Message-ID: On Wed, 29 Oct 2025 08:59:28 GMT, Qizheng Xing wrote: >> It would be nice to make sure all cases here have an IR test which is not the case AFAICT. Can you open a JBS issue for that? > > @rwestrel @eme64 Do IR tests in `TestRedundantSafepointElimination.java` in this patch cover all these cases? Specifically: > > * Case A: `testTopLevelCountedLoop`, `testTopLevelCountedLoopWithDomCall` > * Case B: tests containing nested loops > * Case C: `testOuterLoopWithDomCall` > * Case D: `testOuterLoopWithLocalNonCallSafepoint` > * Case E: `testLoopNeedsToPreserveSafepoint` I didn't realize case E was covered. Good then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2482066789 From roland at openjdk.org Fri Oct 31 16:47:34 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 31 Oct 2025 16:47:34 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() Message-ID: In test cases, `mh` is initially not constant so the method handle invoke can't be inlined. It is later found to be constant, so it can be turned into a direct call by `Compile::process_late_inline_calls_no_inline()`. In the meantime, the `CallNode` for the mh invoke is cloned (by loop switching). In the process, only a shallow copy of the `JVMState` for the call is made. The initial `CallNode` is the first to be processed by `Compile::process_late_inline_calls_no_inline()` and that causes that `CallNode` to become dead. The cloned `CallNode` is then processed. The `JVMState` for that one references the initial `CallNode` in its caller's `JVMState`. 
Because that node is dead, that causes a crash. The fix I propose is to make a deep copy of the `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is assigned to the node. The other failure I see with these tests is: # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! because even though the `CallNode` is cloned, there's still only one late inline recorded. The fix here is to increment `_number_of_mh_late_inlines` when the node is cloned. This was reported by the netty developers. ------------- Commit messages: - more - test - fix Changes: https://git.openjdk.org/jdk/pull/28088/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370939 Stats: 116 lines in 3 files changed: 113 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28088.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28088/head:pull/28088 PR: https://git.openjdk.org/jdk/pull/28088 From qamai at openjdk.org Fri Oct 31 17:49:40 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 31 Oct 2025 17:49:40 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v2] In-Reply-To: References: Message-ID: <9linIWtYWiuwuKqBg0l31Q2-6kDev-8sh3TUPJ6qoF4=.c085bbcf-5c8f-420a-849a-1ad2f18af833@github.com> > Hi, > > Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. > > However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). > > In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. > > This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. There are a lot of benefits of this: > > - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. > - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). 
This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. > > This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the pl... Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Keep old version for verification ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28051/files - new: https://git.openjdk.org/jdk/pull/28051/files/672d2613..1960854f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=00-01 Stats: 313 lines in 3 files changed: 274 ins; 3 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/28051.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28051/head:pull/28051 PR: https://git.openjdk.org/jdk/pull/28051 From qamai at openjdk.org Fri Oct 31 17:49:43 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 31 Oct 2025 17:49:43 GMT Subject: RFR: 8370914: C2: Reimplement Type::join In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 03:12:05 GMT, Quan Anh Mai wrote: > Hi, > > Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. > > However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). > > In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. > > This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. There are a lot of benefits of this: > > - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. > - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). 
This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. > > This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the pl... I have restored the dual computation and use it for verification of the implementation of `xjoin`. It seems that apart from the cases I wrote above, the results of the 2 approaches match. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3474167588 From dlong at openjdk.org Fri Oct 31 19:30:04 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 31 Oct 2025 19:30:04 GMT Subject: RFR: 8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer [v4] In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 15:02:27 GMT, Emanuel Peter wrote: >> It seems we keep finding issues in `CompressBitsNode::Value`, using the `TemplateFramework` https://github.com/openjdk/jdk/pull/26885. >> >> This is a JDK26 regression of the bugfix https://github.com/openjdk/jdk/pull/23947, which was itself reported by my prototype of the `TemplateFramework`. >> >> The bug is simple: On windows `1UL` is only a 32-bit value, and not a 64-bit value. We should use `1ULL` instead. Impacted lines: >> https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L276 >> https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L379 >> >> This means that simple cases like these wrongly constant fold to zero: >> - `Long.compress(-2683206580L, Integer.toUnsignedLong(x))` >> - `Long.compress(x, 0xffff_ffffL)` >> >> ------------------------------------------------------------------ >> >> This sort of bug (`1UL` vs `1ULL`) is of course very subtle, and easy to miss in a code review. So that is why testing is paramount. >> >> Why was this not caught in the testing of https://github.com/openjdk/jdk/pull/23947? After all there were quite a few tests there, right? There were simply not enough tests, or not the right ones ;) >> >> I did at the time ask for a "range-based" test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2853896251). I then doubled down and even proposed a conctete test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2935548411) that would create "**range-based**" inputs: >> >> public static test(int mask, int src) { >> mask = Math.max(CON1, Math.min(CON2, mask)); >> src = Math.max(CON2, Math.min(CON4, src)); >> result = Integer.compress(src, mask); >> int sum = 0; >> if (sum > LIMIT_1) { sum += 1; } >> if (sum > LIMIT_2) { sum += 2; } >> if (sum > LIMIT_3) { sum += 4; } >> if (sum > LIMIT_4) { sum += 8; } >> if (sum > LIMIT_5) { sum += 16; } >> if (sum > LIMIT_6) { sum += 32; } >> if (sum > LIMIT_7) { sum += 64; } >> if (sum > LIMIT_8) { sum += 128; } >> return new int[] {sum, result}; >> } >> >> What is implortant here: both the `src` and `mask` must have random ranges. But the test that ended up being integrated only made the `src` "range-based" using the `min/max`. **Without the `mask` being tested "range-based", the bug here could not have been caught by that test**. >> >> I was asked again for my review (https://github.com/openjdk/jdk/pull/23947#issuecomment-3062355806), but I had to... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix last commit Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28062#pullrequestreview-3405841379 From dlong at openjdk.org Fri Oct 31 19:30:06 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 31 Oct 2025 19:30:06 GMT Subject: RFR: 8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer [v4] In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 15:04:47 GMT, Emanuel Peter wrote: > @dean-long Does it look better now? Yes, much better, thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28062#issuecomment-3474608006 From vlivanov at openjdk.org Fri Oct 31 21:30:03 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 31 Oct 2025 21:30:03 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 16:39:07 GMT, Roland Westrelin wrote: > In test cases, `mh` is initially not constant so the method handle > invoke can't be inlined. It is later found to be constant, so it can > be turned into a direct call by > `Compile::process_late_inline_calls_no_inline()`. In the meantime, the > `CallNode` for the mh invoke is cloned (by loop switching). In the > process, only a shallow copy of the `JVMState` for the call is > made. The initial `CallNode` is the first to be processed by > `Compile::process_late_inline_calls_no_inline()` and that causes that > `CallNode` to become dead. The cloned `CallNode` is then > processed. The `JVMState` for that one references the initial > `CallNode` in its caller's `JVMState`. Because that node is dead, that > causes a crash. The fix I propose is to make a deep copy of the > `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is > assigned to the node. > > The other failure I see with these tests is: > > > # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 > # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! > > > because even though the `CallNode` is cloned, there's still only one > late inline recorded. The fix here is to increment > `_number_of_mh_late_inlines` when the node is cloned. > > This was reported by the netty developers. Overall, looks reasonable. src/hotspot/share/opto/node.cpp line 567: > 565: n->as_Call()->set_generator(cloned_cg); > 566: if (cloned_cg->is_mh_late_inline()) { > 567: C->inc_number_of_mh_late_inlines(); Do you need to decrement the counter when a CallNode with `generator()->is_mh_late_inline()` goes dead? ------------- PR Review: https://git.openjdk.org/jdk/pull/28088#pullrequestreview-3406244440 PR Review Comment: https://git.openjdk.org/jdk/pull/28088#discussion_r2482724020 From vlivanov at openjdk.org Fri Oct 31 21:51:17 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 31 Oct 2025 21:51:17 GMT Subject: RFR: 8280469: C2: CHA support for interface calls when inlining through method handle linker Message-ID: Expand the optimization for interface calls introduced by [JDK-6986483](https://bugs.openjdk.org/browse/JDK-6986483) to calls through `MethodHandle.linkToInterface`. The implementation is straightforward except the fact that symbolic information is lost during `MemberName` resolution. 
The fix uses declaring class instead, but it's more conservative than what is done for invokeinterface case. Testing: hs-tier1 - hs-tier5 ------------- Commit messages: - Handle -XX:+StressMethodHandleLinkerInlining - fix Changes: https://git.openjdk.org/jdk/pull/28094/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28094&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8280469 Stats: 205 lines in 4 files changed: 173 ins; 2 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/28094.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28094/head:pull/28094 PR: https://git.openjdk.org/jdk/pull/28094 From liach at openjdk.org Fri Oct 31 22:36:02 2025 From: liach at openjdk.org (Chen Liang) Date: Fri, 31 Oct 2025 22:36:02 GMT Subject: RFR: 8280469: C2: CHA support for interface calls when inlining through method handle linker In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 21:34:27 GMT, Vladimir Ivanov wrote: > Expand the optimization for interface calls introduced by [JDK-6986483](https://bugs.openjdk.org/browse/JDK-6986483) to calls through `MethodHandle.linkToInterface`. > > The implementation is straightforward except the fact that symbolic information is lost during `MemberName` resolution. The fix uses declaring class instead, but it's more conservative than what is done for invokeinterface case. > > Testing: hs-tier1 - hs-tier5 A small improvement indeed. I wonder if the test verifies the `declared_interface` for the new monomorphic target - I don't see where it does so, yet I believe this may be error-prone. src/hotspot/share/opto/doCall.cpp line 345: > 343: if (orig_callee->intrinsic_id() == vmIntrinsics::_linkToInterface) { > 344: // MemberName doesn't keep symbolic information once resolution is over, but > 345: // resolved method holder can be used as a conservative approximation. Is "symbolic information" the referenced interface and the "resolved method holder" the declaring interface? I think including "referenced" vs "declared" would be more clear. ------------- PR Review: https://git.openjdk.org/jdk/pull/28094#pullrequestreview-3406367769 PR Review Comment: https://git.openjdk.org/jdk/pull/28094#discussion_r2482808203
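To make the "referenced" versus "declaring" interface distinction in the last review comment concrete, here is a plain-Java illustration with invented interface and class names: at a call site `j.m()` where `j` has static type `J`, the invokeinterface references `J`, while the resolved method's holder, which is what the fix relies on, is the declaring interface `I`.

    interface I { default int m() { return 1; } }  // declares m(): the declaring interface
    interface J extends I { }                      // only inherits m(): the referenced interface

    public class ReferencedVsDeclared {
        static int call(J j) { return j.m(); }     // invokeinterface names J; resolution lands on I.m
        public static void main(String[] args) {
            System.out.println(call(new J() { })); // prints 1
        }
    }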