From jbhateja at openjdk.org Sat Nov 1 01:52:15 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sat, 1 Nov 2025 01:52:15 GMT
Subject: Withdrawn: 8370409: Incorrect computation in Float16 reduction loop
In-Reply-To:
References:
Message-ID:

On Fri, 24 Oct 2025 14:36:21 GMT, Jatin Bhateja wrote:

> The current floatToFloat16 intrinsic implementation always sign-extends the 16-bit short result to a 32-bit value in anticipation of safe consumption by subsequent integral (comparison) operation[s]. However, the safest way to compare two Float16 values is to use the Float16.compare/compareTo methods, given that floating point comparisons can also be unordered.
>
> e.g., both 64512 and -1024 are equivalent bit representations of the Float16 -Inf value, but are not numerically equivalent with integral comparison.
> jshell> Float16.compare(Float16.shortBitsToFloat16((short)-1024), Float16.shortBitsToFloat16((short)64512))
> $3 ==> 0
>
> In the scalar intrinsic of Float16.add/sub/mul/div/min/max, we always return a boxed value, which is then operated upon by the subsequent Float16 APIs. The Float.floatToFloat16 intrinsic, in contrast, always returns a 'short' value. This is special in the sense that even though the carrier type is 'short', it encodes an IEEE 754 half-precision value. Because the carrier is a short, if such a value is exposed to integral operators then, as per the JVM specification, the short must be sign-extended before the operation.
>
> Given that our Float16 binary operation inference is based on generic pattern matching and is agnostic to how that graph pallet got created, i.e., either through Float16.* APIs or by explicit Float.float16ToFloat/floatToFloat16 operations, it's safe to sign-extend the result in all cases.
>
> Kindly review the patch and share your feedback.
>
> Best Regards,
> Jatin

This pull request has been closed without being integrated.
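
The point about 64512 and -1024 can be checked with plain integer arithmetic. The following is a minimal sketch, not code from the PR: the class and method names are made up for illustration, and the -Inf check is a hand-rolled comparison against the IEEE 754 binary16 pattern 0xFC00 rather than a call into the Float16 API.

```java
public class Float16BitsDemo {
    // Hypothetical helper: widen a 16-bit carrier the way the JVM does for 'short' (sign extension).
    static int signExtended(short bits) {
        return bits;
    }

    // Hypothetical helper: widen the same carrier as an unsigned 16-bit quantity (zero extension).
    static int zeroExtended(short bits) {
        return Short.toUnsignedInt(bits);
    }

    // Hand-rolled check: binary16 -Inf has sign=1, exponent=11111, mantissa=0, i.e. 0xFC00.
    static boolean isNegativeInfinityBits(short bits) {
        return (bits & 0xFFFF) == 0xFC00;
    }

    public static void main(String[] args) {
        short a = (short) -1024;
        short b = (short) 64512;
        System.out.println(a == b);                    // same 16 bits, so true
        System.out.println(signExtended(b));           // -1024
        System.out.println(zeroExtended(b));           // 64512
        System.out.println(isNegativeInfinityBits(a)); // true
        // The two widenings disagree as ints, which is why an integral
        // comparison on mixed extensions gives a different answer than Float16.compare.
        System.out.println(signExtended(a) == zeroExtended(b)); // false
    }
}
```

The sketch only shows why the choice of extension matters once the short carrier reaches an integral operator; it says nothing about which extension the intrinsic should use.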
------------- PR: https://git.openjdk.org/jdk/pull/27977 From syan at openjdk.org Sat Nov 1 04:46:05 2025 From: syan at openjdk.org (SendaoYan) Date: Sat, 1 Nov 2025 04:46:05 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 16:39:07 GMT, Roland Westrelin wrote: > In test cases, `mh` is initially not constant so the method handle > invoke can't be inlined. It is later found to be constant, so it can > be turned into a direct call by > `Compile::process_late_inline_calls_no_inline()`. In the meantime, the > `CallNode` for the mh invoke is cloned (by loop switching). In the > process, only a shallow copy of the `JVMState` for the call is > made. The initial `CallNode` is the first to be processed by > `Compile::process_late_inline_calls_no_inline()` and that causes that > `CallNode` to become dead. The cloned `CallNode` is then > processed. The `JVMState` for that one references the initial > `CallNode` in its caller's `JVMState`. Because that node is dead, that > causes a crash. The fix I propose is to make a deep copy of the > `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is > assigned to the node. > > The other failure I see with these tests is: > > > # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 > # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! > > > because even though the `CallNode` is cloned, there's still only one > late inline recorded. The fix here is to increment > `_number_of_mh_late_inlines` when the node is cloned. > > This was reported by the netty developers. 
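
The shallow-versus-deep copy problem described above can be modelled outside HotSpot. This is a toy sketch under loud assumptions: `CallSite` and `FrameState` are hypothetical stand-ins invented here, not the real `CallNode` and `JVMState` classes, and the copy methods only mimic the idea of copying the youngest frame versus the whole caller chain.

```java
public class JvmStateCopyDemo {
    /** Hypothetical stand-in for a call node; 'dead' models removal from the graph. */
    static final class CallSite {
        final String name;
        boolean dead;
        CallSite(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for per-frame state: points at its owning call and at the caller's state. */
    static final class FrameState {
        final FrameState caller;
        final CallSite owner;

        FrameState(FrameState caller, CallSite owner) {
            this.caller = caller;
            this.owner = owner;
        }

        // Copies only the youngest frame; caller frames stay shared with the original.
        FrameState shallowCopyFor(CallSite newOwner) {
            return new FrameState(caller, newOwner);
        }

        // Copies the whole chain and re-points every frame at the new owner.
        FrameState deepCopyFor(CallSite newOwner) {
            FrameState callerCopy = (caller == null) ? null : caller.deepCopyFor(newOwner);
            return new FrameState(callerCopy, newOwner);
        }

        // Walks the chain and reports whether any frame still points at a dead call.
        boolean referencesDeadCall() {
            for (FrameState s = this; s != null; s = s.caller) {
                if (s.owner.dead) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        CallSite original = new CallSite("original");
        FrameState outer = new FrameState(null, original);
        FrameState inner = new FrameState(outer, original);

        CallSite clone = new CallSite("clone");
        FrameState shallow = inner.shallowCopyFor(clone);
        FrameState deep = inner.deepCopyFor(clone);

        original.dead = true; // the original call is processed first and dies

        System.out.println(shallow.referencesDeadCall()); // true: caller frame is shared
        System.out.println(deep.referencesDeadCall());    // false: whole chain was copied
    }
}
```

In the toy model the shallow copy still reaches the dead original through its shared caller frame, which mirrors the crash scenario in the description; the deep copy does not.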
test/hotspot/jtreg/compiler/inlining/TestLateMHClonedCallNode.java line 28: > 26: * @bug 8370939 > 27: * @summary C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() > 28: * @run main/othervm -XX:-BackgroundCompilation -XX:CompileOnly=TestLateMHClonedCallNode::test1 -XX:CompileOnly=TestLateMHClonedCallNode::test2 TestLateMHClonedCallNode Maybe we can split this as two lines ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28088#discussion_r2483106650 From duke at openjdk.org Sat Nov 1 12:27:48 2025 From: duke at openjdk.org (Zihao Lin) Date: Sat, 1 Nov 2025 12:27:48 GMT Subject: RFR: 8370878: C1: Clean up unnecessary ConversionStub constructor Message-ID: C1: Clean up unnecessary ConversionStub constructor Remove class which should not reach. ------------- Commit messages: - C1: Clean up unnecessary ConversionStub constructor Changes: https://git.openjdk.org/jdk/pull/28096/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28096&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370878 Stats: 42 lines in 3 files changed: 0 ins; 39 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28096.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28096/head:pull/28096 PR: https://git.openjdk.org/jdk/pull/28096 From duke at openjdk.org Sat Nov 1 14:14:37 2025 From: duke at openjdk.org (Zihao Lin) Date: Sat, 1 Nov 2025 14:14:37 GMT Subject: RFR: 8370878: C1: Clean up unnecessary ConversionStub constructor [v2] In-Reply-To: References: Message-ID: > C1: Clean up unnecessary ConversionStub constructor > Remove class which should not reach. 
Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: fix arm ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28096/files - new: https://git.openjdk.org/jdk/pull/28096/files/5b4df065..a73b5282 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28096&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28096&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28096.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28096/head:pull/28096 PR: https://git.openjdk.org/jdk/pull/28096 From duke at openjdk.org Sat Nov 1 15:04:37 2025 From: duke at openjdk.org (Zihao Lin) Date: Sat, 1 Nov 2025 15:04:37 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue Message-ID: If nodes both are constant, support constant folding. ------------- Commit messages: - C2: Improve (U)MulHiLNode::MulHiValue Changes: https://git.openjdk.org/jdk/pull/28097/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370196 Stats: 10 lines in 1 file changed: 10 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28097.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28097/head:pull/28097 PR: https://git.openjdk.org/jdk/pull/28097 From wenanjian at openjdk.org Sat Nov 1 15:29:56 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Sat, 1 Nov 2025 15:29:56 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v14] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. 
Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: update some register use and instruction use ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/4039116c..26ea7628 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=12-13 Stats: 44 lines in 1 file changed: 5 ins; 8 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From wenanjian at openjdk.org Sat Nov 1 15:29:58 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Sat, 1 Nov 2025 15:29:58 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v13] In-Reply-To: References: <9NCXWsBW5TTtNLxDqIInodSU-nLiaf86r2dyMtoKklM=.0964bb38-e5cb-499d-a9fc-4efdca0ecfd0@github.com> Message-ID: <2_UqadjXSXckE8l38MkkX8AhDONqn4qRgPbSP_Pylcs=.d1289413-ce94-4775-9150-020347fded57@github.com> On Fri, 31 Oct 2025 07:41:09 GMT, Fei Yang wrote: > Hi, I am having a look at the latest version. Some minor comments. thanks, I have modified the code and solve all these comments ------------- PR Comment: https://git.openjdk.org/jdk/pull/25281#issuecomment-3476459666 From wenanjian at openjdk.org Sun Nov 2 02:04:51 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Sun, 2 Nov 2025 02:04:51 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v15] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. 
Anjian Wen has updated the pull request incrementally with one additional commit since the last revision:

  modify some var names

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25281/files
  - new: https://git.openjdk.org/jdk/pull/25281/files/26ea7628..1cf06a35

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=13-14

  Stats: 20 lines in 1 file changed: 0 ins; 0 del; 20 mod
  Patch: https://git.openjdk.org/jdk/pull/25281.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281

PR: https://git.openjdk.org/jdk/pull/25281

From hgreule at openjdk.org Sun Nov 2 09:56:00 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Sun, 2 Nov 2025 09:56:00 GMT
Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue
In-Reply-To:
References:
Message-ID:

On Sat, 1 Nov 2025 14:50:27 GMT, Zihao Lin wrote:

> If nodes both are constant, support constant folding.

Thanks for working on this. A few things:
- You need tests to cover this change. The `Math.multiplyHigh(...)` and `Math.unsignedMultiplyHigh(...)` methods can be used to test this from the Java world. See e.g., #26143 or #25254 for inspiration.
- The current method is for both unsigned and signed multiplication. You either have to deal with that directly there or get rid of that method and implement it directly in the respective `Value(...)` methods (the latter might be cleaner imo).
- For unsigned multiplication, you can use the unsigned bounds (_uhi, _ulo)
- I think extending from simple constant folding to intervals isn't that much more work. From my understanding, there shouldn't be any overflows that need to be handled. This would also automatically deal with cases like `multiplyHigh(x, 0)` etc.
- The bottom checks are unneeded and can be removed (in fact, they would otherwise prevent proper calculation of the previous example)
- Make sure to follow the code style: `T* v`; `if (a) {` spacing.
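
As a rough illustration of the first bullet, the values that a folded `MulHiL`/`UMulHiL` node must produce can be cross-checked against `BigInteger`. This is only a sketch of the reference arithmetic, not a jtreg test from the PR: the class and helper names are hypothetical, and `Math.unsignedMultiplyHigh` assumes JDK 18 or later.

```java
import java.math.BigInteger;

public class MulHiReference {
    // Reference for Math.multiplyHigh: upper 64 bits of the signed 128-bit product.
    static long refMultiplyHigh(long a, long b) {
        return BigInteger.valueOf(a)
                .multiply(BigInteger.valueOf(b))
                .shiftRight(64)
                .longValue();
    }

    // Reference for Math.unsignedMultiplyHigh: upper 64 bits of the unsigned 128-bit product.
    static long refUnsignedMultiplyHigh(long a, long b) {
        BigInteger ua = new BigInteger(Long.toUnsignedString(a));
        BigInteger ub = new BigInteger(Long.toUnsignedString(b));
        return ua.multiply(ub).shiftRight(64).longValue();
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, 42L, Long.MIN_VALUE, Long.MAX_VALUE, 0xffff_ffffL};
        for (long a : samples) {
            for (long b : samples) {
                if (Math.multiplyHigh(a, b) != refMultiplyHigh(a, b)) {
                    throw new AssertionError("signed mismatch for " + a + ", " + b);
                }
                if (Math.unsignedMultiplyHigh(a, b) != refUnsignedMultiplyHigh(a, b)) {
                    throw new AssertionError("unsigned mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("all samples agree");
    }
}
```

An actual regression test would additionally need to force C2 compilation and feed constants or clamped ranges through the method under test, as the referenced PRs do; that part is left out here.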
------------- PR Comment: https://git.openjdk.org/jdk/pull/28097#issuecomment-3477676797 From qamai at openjdk.org Sun Nov 2 15:48:37 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 2 Nov 2025 15:48:37 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v3] In-Reply-To: References: Message-ID: > Hi, > > Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. > > However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). > > In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. > > This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. There are a lot of benefits of this: > > - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. 
Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. > - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. > > This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the pl... Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Move dual to ASSERT only ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28051/files - new: https://git.openjdk.org/jdk/pull/28051/files/1960854f..8c9f560e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=01-02 Stats: 83 lines in 2 files changed: 49 ins; 1 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/28051.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28051/head:pull/28051 PR: https://git.openjdk.org/jdk/pull/28051 From fyang at openjdk.org Mon Nov 3 03:11:09 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 3 Nov 2025 03:11:09 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v15] In-Reply-To: References: Message-ID: On Sun, 2 Nov 2025 02:04:51 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. 
>
> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision:
>
>   modify some var names

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2623:

> 2621:     __ rev8(tmp1, tmp1);
> 2622:     __ sd(tmp1, Address(counter));
> 2623:   }

Can you add some code comment and maybe assertions about the input registers? Like:

  // Big-endian 128-bit + 64-bit -> 128-bit addition.
  void be_inc_counter_128(Register counter, Register tmp1, Register tmp2) {
    assert_different_registers(counter, tmp1, tmp2, t0);
    __ ld(tmp1, Address(counter, 8));  // Load 128-bits from counter
    __ ld(tmp2, Address(counter));
    __ rev8(tmp1, tmp1);               // Convert big-endian to little-endian
    __ rev8(tmp2, tmp2);
    __ addi(tmp1, tmp1, 1);
    __ seqz(t0, tmp1);                 // Check for result overflow
    __ add(tmp2, tmp2, t0);            // Add 1 if overflow otherwise 0
    __ rev8(tmp1, tmp1);               // Convert little-endian to big-endian
    __ rev8(tmp2, tmp2);
    __ sd(tmp1, Address(counter, 8));  // Store 128-bits from counter
    __ sd(tmp2, Address(counter));
  }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2485208392

From xgong at openjdk.org Mon Nov 3 06:44:10 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Mon, 3 Nov 2025 06:44:10 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v7]
In-Reply-To:
References:
Message-ID: <5PrIYztcDbOPX9aY35VB-t8agqVbNvjHv6-ypPtdm7M=.a05cb69c-fee9-4b42-a7cc-26a9d79d40f8@github.com>

> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform.
>
> ### Background
> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements).
Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register.
>
> ### Implementation
>
> #### Challenges
> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints.
>
> For a 512-bit SVE machine, loading a `byte` vector with different vector species requires different approaches:
> - SPECIES_64: Single operation with mask (8 elements, 256-bit)
> - SPECIES_128: Single operation, full register (16 elements, 512-bit)
> - SPECIES_256: Two operations + merge (32 elements, 1024-bit)
> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit)
>
> Use `ByteVector.SPECIES_512` as an example:
> - It contains 64 elements. So the index vector size should be `64 * 32` bits, which is 4 times the SVE vector register size.
> - It requires 4 vector gather-loads to finish the whole operation.
>
>
> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...]
> int[] idx = [0, 1, 2, 3, ..., 63, ...]
>
> 4 gather-load:
> idx_v1 = [15 14 13 ... 1 0]     gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa]
> idx_v2 = [31 30 29 ... 17 16]   gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb]
> idx_v3 = [47 46 45 ... 33 32]   gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc]
> idx_v4 = [63 62 61 ... 49 48]   gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd]
> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa]
>
>
> #### Solution
> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end.
>
> Here are the main changes:
> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher.
> - Added `VectorSliceNode` for result merging.
> - Added `VectorMaskWidenNode` for mask splitting and type conversion fo...

Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits:

 - Merge branch 'jdk:master' into JDK-8351623-sve
 - Add more comments for IRs and added method
 - Merge branch 'jdk:master' into JDK-8351623-sve
 - Merge 'jdk:master' into JDK-8351623-sve
 - Address review comments
 - Refine IR pattern and clean backend rules
 - Fix indentation issue and move the helper matcher method to header files
 - Merge branch jdk:master into JDK-8351623-sve
 - 8351623: VectorAPI: Add SVE implementation of subword gather load operation

-------------

Changes: https://git.openjdk.org/jdk/pull/26236/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26236&range=06
  Stats: 1071 lines in 20 files changed: 907 ins; 24 del; 140 mod
  Patch: https://git.openjdk.org/jdk/pull/26236.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26236/head:pull/26236

PR: https://git.openjdk.org/jdk/pull/26236

From epeter at openjdk.org Mon Nov 3 06:48:25 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 3 Nov 2025 06:48:25 GMT
Subject: RFR: 8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 31 Oct 2025 19:27:02 GMT, Dean Long wrote:

>> @dean-long Does it look better now?
>
>> @dean-long Does it look better now?
>
> Yes, much better, thanks!

@dean-long @TobiHartmann @jatin-bhateja thanks for the quick reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/28062#issuecomment-3479126542 From epeter at openjdk.org Mon Nov 3 06:48:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Nov 2025 06:48:26 GMT Subject: Integrated: 8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 14:51:53 GMT, Emanuel Peter wrote: > It seems we keep finding issues in `CompressBitsNode::Value`, using the `TemplateFramework` https://github.com/openjdk/jdk/pull/26885. > > This is a JDK26 regression of the bugfix https://github.com/openjdk/jdk/pull/23947, which was itself reported by my prototype of the `TemplateFramework`. > > The bug is simple: On windows `1UL` is only a 32-bit value, and not a 64-bit value. We should use `1ULL` instead. Impacted lines: > https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L276 > https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L379 > > This means that simple cases like these wrongly constant fold to zero: > - `Long.compress(-2683206580L, Integer.toUnsignedLong(x))` > - `Long.compress(x, 0xffff_ffffL)` > > ------------------------------------------------------------------ > > This sort of bug (`1UL` vs `1ULL`) is of course very subtle, and easy to miss in a code review. So that is why testing is paramount. > > Why was this not caught in the testing of https://github.com/openjdk/jdk/pull/23947? After all there were quite a few tests there, right? There were simply not enough tests, or not the right ones ;) > > I did at the time ask for a "range-based" test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2853896251). 
I then doubled down and even proposed a concrete test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2935548411) that would create "**range-based**" inputs:
>
>     public static int[] test(int mask, int src) {
>         mask = Math.max(CON1, Math.min(CON2, mask));
>         src = Math.max(CON2, Math.min(CON4, src));
>         int result = Integer.compress(src, mask);
>         int sum = 0;
>         if (result > LIMIT_1) { sum += 1; }
>         if (result > LIMIT_2) { sum += 2; }
>         if (result > LIMIT_3) { sum += 4; }
>         if (result > LIMIT_4) { sum += 8; }
>         if (result > LIMIT_5) { sum += 16; }
>         if (result > LIMIT_6) { sum += 32; }
>         if (result > LIMIT_7) { sum += 64; }
>         if (result > LIMIT_8) { sum += 128; }
>         return new int[] {sum, result};
>     }
>
> What is important here: both the `src` and `mask` must have random ranges. But the test that ended up being integrated only made the `src` "range-based" using the `min/max`. **Without the `mask` being tested "range-based", the bug here could not have been caught by that test**.
>
> I was asked again for my review (https://github.com/openjdk/jdk/pull/23947#issuecomment-3062355806), but I had to go on vacation, and was not able to catch the issue (https://github.com/openj...

This pull request has now been integrated.

Changeset: 0ca0852d
Author:    Emanuel Peter
URL:       https://git.openjdk.org/jdk/commit/0ca0852d78d643c211d36b753a734dac0cd2800a
Stats:     111 lines in 2 files changed: 96 ins; 0 del; 15 mod

8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer

Reviewed-by: dlong, jbhateja, thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/28062

From epeter at openjdk.org Mon Nov 3 06:58:19 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 3 Nov 2025 06:58:19 GMT
Subject: Integrated: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination
In-Reply-To:
References:
Message-ID:

On Mon, 27 Oct 2025 10:40:18 GMT, Emanuel Peter wrote:

> Note: @oliviermattmann found this bug with his whitebox fuzzer.
See also https://github.com/openjdk/jdk/pull/27991 > > **Analysis** > We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. > During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. > But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. > > Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. > Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. > > **Solution** > We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. > > But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: > https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 > (the idea is to bail out of the elimination if any of the found stores are mismatched.) > > **Details** > > How the bad sequence develops, and which components are involved. > > 1) The `SafePoint` contains a `ConL` and 3 `ConI`. 
(Correct would have been 4 `ConI`)
>
>    6  ConI  === 23  [[ 4 ]]  #int:16777216
>    7  ConI  === 23  [[ 4 ]]  #int:256
>    8  ConI  === 23  [[ 4 ]]  #int:1048576
>    9  ConL  === 23  [[ 4 ]]  #long:68719476737
>   54  DefinitionSpillCopy  === _ 27  [[ 16 12 4 ]]
>    4  CallStaticJavaDirect  === 47 29 30 26 32 33 0 34 0 54 9 8 7 6  [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21)
>
>
> 2) This is then encoded into an `ObjectValue`. A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see:
> https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925
> If I understand it right, there zero is just a placeholder.
>
> And so we get:
>
> (rr) p sv->print_fields_on(tty)
> Fields: 0, 68719476737, 1048576, 256, 16777216
>
> We can see the `zero`, followed by the `ConL`, and then 3 `ConI`.
>
> This se...

This pull request has now been integrated.

Changeset: 09a047f0
Author:    Emanuel Peter
URL:       https://git.openjdk.org/jdk/commit/09a047f00c88d14505c42a966dedbc87b9be5bdf
Stats:     375 lines in 5 files changed: 375 ins; 0 del; 0 mod

8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination

Co-authored-by: Olivier Mattmann
Co-authored-by: Quan Anh Mai
Reviewed-by: kvn, qamai

-------------

PR: https://git.openjdk.org/jdk/pull/27997

From epeter at openjdk.org Mon Nov 3 06:58:18 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 3 Nov 2025 06:58:18 GMT
Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 30 Oct 2025 07:03:51 GMT, Quan Anh Mai wrote:

>> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 16 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8370405-alloc-elimination-and-MergeStores >> - only verify primitive types >> - Apply suggestions from code review >> - more assert adjustment >> - ignore debug flag >> - id for tests, and fix up the assert >> - pass int for short slot >> - another test >> - improve test >> - wip new IR test >> - ... and 6 more: https://git.openjdk.org/jdk/compare/6dd1ad30...b6e032c2 > > Regardless, I think this patch makes sense. Bailing out of scalar elimination when we are doing it is better than when we are running EA, and we should generally try to do it if we can. @merykitty @vnkozlov Thanks for the review and discussion! @dougxc Thanks for checking for Graal and getting us a quick response :) And thanks to Olivier Mattmann <[olivier.mattmann at bluewin.ch](mailto:olivier.mattmann at bluewin.ch)> for finding the bug! @mhaessig I decided to file this RFE, in case someone wants to invest time in it: [JDK-8371122](https://bugs.openjdk.org/browse/JDK-8371122) C2 Allocation Elimination: handle some mismatched accesses to arrays ------------- PR Comment: https://git.openjdk.org/jdk/pull/27997#issuecomment-3479146291 From epeter at openjdk.org Mon Nov 3 07:20:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Nov 2025 07:20:19 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v9] In-Reply-To: References: Message-ID: <-fbhk6x06Qi7XeEWmP9KCfUByQGjJJXj-CmPk1YcHGs=.d7125bd5-9b96-4a4f-a83c-a3886261958d@github.com> > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. 
It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. 
This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 106 additional commits since the last revision: - Merge branch 'master' into JDK-8367531-fix-addDataName - Merge branch 'JDK-8367531-fix-addDataName' of https://github.com/eme64/jdk into JDK-8367531-fix-addDataName - Manuel's suggestions Co-authored-by: Manuel H?ssig - Merge branch 'master' into JDK-8367531-fix-addDataName - Merge branch 'JDK-8367531-fix-addDataName' of https://github.com/eme64/jdk into JDK-8367531-fix-addDataName - Update test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java Co-authored-by: Manuel H?ssig - improve tutorial for Manuel - fix TestMethodArguments.java after merge with master - fix tests after integration of Expressions/Operations - Merge branch 'master' into JDK-8367531-fix-addDataName - ... 
and 96 more: https://git.openjdk.org/jdk/compare/34aa819b...18b895f3 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/317e3e8b..18b895f3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=07-08 Stats: 50185 lines in 611 files changed: 26632 ins; 20326 del; 3227 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Mon Nov 3 07:20:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Nov 2025 07:20:29 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v8] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 11:36:54 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). 
>> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 105 additional commits since the last revision: > > - Merge branch 'JDK-8367531-fix-addDataName' of https://github.com/eme64/jdk into JDK-8367531-fix-addDataName > - Manuel's suggestions > > Co-authored-by: Manuel Hässig > - Merge branch 'master' into JDK-8367531-fix-addDataName > - Merge branch 'JDK-8367531-fix-addDataName' of https://github.com/eme64/jdk into JDK-8367531-fix-addDataName > - Update test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java > > Co-authored-by: Manuel Hässig > - improve tutorial for Manuel > - fix TestMethodArguments.java after merge with master > - fix tests after integration of Expressions/Operations > - Merge branch 'master' into JDK-8367531-fix-addDataName > - fix test > - ... and 95 more: https://git.openjdk.org/jdk/compare/5d7270a8...317e3e8b @chhagedorn @robcasloz gentle ping ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3479206563 From chagedorn at openjdk.org Mon Nov 3 08:10:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 3 Nov 2025 08:10:06 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 16:27:11 GMT, Damon Fenacci wrote: > ## Issue > Today, the only practical ways to run IR Framework scenarios in parallel seem to be: > * spawning threads manually in a single test, or > * letting jtreg treat each scenario as a separate test (the only way to potentially distribute across hosts). > > This makes it a bit cumbersome to use host CPU cores efficiently when running multiple scenarios within the same test. > > ## Change > This change introduces a method `TestFramework::startParallel` to execute multiple scenarios concurrently.
The implementation: > * launches one task per scenario and runs them in parallel (by default, the maximum concurrency should match the host's available cores) > * captures each task's `System.out` into a dedicated buffer and flushes it when the task completes to avoid interleaved output between scenarios (Note: only call paths within the `compile.lib.ir_framework` package are modified to use per-task output streams. `ProcessTools` methods still write directly to `stdout`, so their output may interleave). > * adds an option `-DForceSequentialScenarios=true` to force all scenarios to be run sequentially. > > ## Testing > * Tier 1-3+ > * explicit `ir_framework.tests` runs > * added IR-Framework test `TestDForceSequentialScenarios.java` to test forcing sequential testing (checking the output order) and added a parallel run to `TestScenarios.java` (as well as adding `ForceSequentialScenarios` flag to `TestDFlags.java`) > > As a reference: a comparison of the execution time between sequential and parallel runs of all IR-Framework tests using scenarios on our machines (linux x64/aarch64, macosx x64/aarch64, windows x64 with different number of cores, so the results for a single test might not be relevant) gave me an average speedup of 1.9. Overall looks good, thanks for improving this! I left a few suggestions. Now the only question remaining is which tests would already benefit from using the parallel version. I guess we can investigate that separately. test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 456: > 454: } > 455: } else { > 456: startWithScenarios(!FORCE_SEQUENTIAL_SCENARIOS && parallel); Maybe we can already handle `FORCE_SEQUENTIAL_SCENARIOS` directly in `startParallel()`. Then `parallel` really means parallel. You could also add an additional API comment for `startParallel()` that we can force disable it with the corresponding property flag.
test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 762: > 760: outcome = new Outcome(scenario, null, null); > 761: } catch (TestFormatException e) { > 762: outcome = new Outcome(scenario, e, null); Why do you collect the format exceptions here and only throw them later? Is a fail-fast not possible? test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 772: > 770: System.out.println(output); > 771: } > 772: } This will probably also not be sorted by scenario index? Could we also just gather it here and then dump it after the stream? Maybe we can put `output` into `Outcome` as well as the exceptions by using a `ConcurrentSkipListMap` map in the parallel case or a normal `TreeMap` in the non-parallel case. test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 787: > 785: outcomes.stream() > 786: .filter(o -> o.other() != null) > 787: .forEach(o -> exceptionMap.put(o.scenario(), o.other())); You could use a `ConcurrentSkipListMap` in the parallel case instead of a tree map. This would allow us to modify the map in parallel in the stream processing and simplify the code. Moreover, it will be sorted by scenario index which I'm not sure is currently the case? test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 847: > 845: * test VM flags, which also determine if IR matching should be done, and then starts the test VM to execute all tests. > 846: */ > 847: private void start(Scenario scenario, PrintStream printStream) { It might be time to refactor this method and create a scenario version and a non-scenario version. But that's for another day... 
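The review comments above suggest storing each scenario's captured output and exception in an `Outcome`, keyed by scenario index in a `ConcurrentSkipListMap`, so that parallel tasks can write results concurrently and the final report comes out sorted. A minimal sketch of that idea follows. It is not the code from the PR: `ParallelScenarioSketch`, `ScenarioTask`, `runAll`, and this shape of `Outcome` are hypothetical names chosen for illustration, and a plain parallel stream stands in for whatever executor the framework actually uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.stream.IntStream;

public final class ParallelScenarioSketch {
    /** Hypothetical stand-in for one scenario's work; it writes to its own stream. */
    interface ScenarioTask {
        void run(PrintStream out) throws Exception;
    }

    /** Hypothetical per-scenario result: captured output plus the failure, if any. */
    record Outcome(String output, Exception failure) {}

    /** Runs all tasks in parallel and returns the outcomes sorted by scenario index. */
    static Map<Integer, Outcome> runAll(List<ScenarioTask> tasks) {
        // A skip-list map is safe for concurrent puts and iterates in key order.
        ConcurrentSkipListMap<Integer, Outcome> outcomes = new ConcurrentSkipListMap<>();
        IntStream.range(0, tasks.size()).parallel().forEach(index -> {
            // Each task gets a private buffer, so output cannot interleave.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            Exception failure = null;
            try (PrintStream out = new PrintStream(buffer, true)) {
                tasks.get(index).run(out);
            } catch (Exception e) {
                failure = e; // collected here, reported after all tasks finish
            }
            outcomes.put(index, new Outcome(buffer.toString(), failure));
        });
        return outcomes;
    }

    public static void main(String[] args) {
        List<ScenarioTask> tasks = List.of(
            out -> out.println("scenario 0 done"),
            out -> { throw new IllegalStateException("scenario 1 failed"); },
            out -> out.println("scenario 2 done"));
        // Report in scenario order, regardless of completion order.
        runAll(tasks).forEach((index, outcome) -> {
            System.out.print(outcome.output());
            if (outcome.failure() != null) {
                System.out.println("scenario " + index + ": " + outcome.failure().getMessage());
            }
        });
    }
}
```

Collecting failures instead of throwing immediately keeps the report complete for every scenario; a fail-fast variant would need a way to cancel the remaining tasks, which this sketch does not attempt.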
test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 927: > 925: if (testVMProcess == null) { > 926: throw new TestFrameworkException("TestVMProcess is null"); > 927: } You can use this utility method instead: Suggestion: TestFramework.check(testVMProcess != null, "TestVMProcess must not be null"); test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenarios.java line 2: > 1: /* > 2: * Copyright (c) 2021, 2023, Oracle and/or its affiliates. All rights reserved. We can finally comment on hidden non-modified code. You should also update the copyright here: Suggestion: * Copyright (c) 2021, 2025, Oracle and/or its affiliates. All rights reserved. ------------- PR Review: https://git.openjdk.org/jdk/pull/28065#pullrequestreview-3409620122 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2485608824 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2485465527 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2485582995 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2485476754 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2485599238 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2485585464 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2485612986 From chagedorn at openjdk.org Mon Nov 3 08:37:02 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 3 Nov 2025 08:37:02 GMT Subject: RFR: 8370878: C1: Clean up unnecessary ConversionStub constructor [v2] In-Reply-To: References: Message-ID: On Sat, 1 Nov 2025 14:14:37 GMT, Zihao Lin wrote: >> C1: Clean up unnecessary ConversionStub constructor >> Remove class which should not reach. > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > fix arm Looks good, thanks for cleaning it up! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28096#pullrequestreview-3409906428 From wenanjian at openjdk.org Mon Nov 3 08:38:49 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Mon, 3 Nov 2025 08:38:49 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v16] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch, which implements the _counterMode_AESCrypt intrinsic with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: change some instruction's sequence to make it more hardware-friendly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/1cf06a35..5bb019b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=14-15 Stats: 15 lines in 1 file changed: 4 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From duke at openjdk.org Mon Nov 3 08:40:02 2025 From: duke at openjdk.org (duke) Date: Mon, 3 Nov 2025 08:40:02 GMT Subject: RFR: 8370878: C1: Clean up unnecessary ConversionStub constructor [v2] In-Reply-To: References: Message-ID: On Sat, 1 Nov 2025 14:14:37 GMT, Zihao Lin wrote: >> C1: Clean up unnecessary ConversionStub constructor >> Remove class which should not reach. > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > fix arm @linzihao1999 Your change (at version a73b5282146954dfb7727033babee1b762fbe9ac) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28096#issuecomment-3479423118 From mhaessig at openjdk.org Mon Nov 3 09:17:17 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 3 Nov 2025 09:17:17 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v9] In-Reply-To: <2rpTqTSOzGtT6SCXvjIrzH1iPBj1zMXXBH0RdQxQiok=.e59eb3a8-5f2d-4638-8f58-ab4c29c95a05@github.com> References: <2rpTqTSOzGtT6SCXvjIrzH1iPBj1zMXXBH0RdQxQiok=.e59eb3a8-5f2d-4638-8f58-ab4c29c95a05@github.com> Message-ID: On Wed, 29 Oct 2025 20:23:10 GMT, Tobias Hotz wrote: >> This PR improves the value of integer division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request with a new target base due to a merge or a rebase.
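The four-corner idea in the quoted PR description can be illustrated with a small standalone sketch. This is not the C2 implementation: `DivRangeSketch`, `Range`, `corners`, and `divRange` are hypothetical names, the arithmetic is done in `long` for simplicity, and the `MIN_INT / -1` case is handled by conservatively widening to the full `int` range rather than by the special-case logic the PR mentions. The sketch relies on truncating division being monotone in each operand once the divisor's sign is fixed, so the extrema over a box of inputs sit at its corners; a divisor range that crosses zero is split into its negative and positive halves.

```java
public final class DivRangeSketch {
    /** Hypothetical closed interval [lo, hi] of possible quotients. */
    record Range(long lo, long hi) {}

    // Quotient bounds when the divisor interval [blo, bhi] does not contain zero:
    // evaluate the four corners and take their minimum and maximum.
    private static Range corners(long alo, long ahi, long blo, long bhi) {
        long c1 = alo / blo, c2 = alo / bhi, c3 = ahi / blo, c4 = ahi / bhi;
        return new Range(Math.min(Math.min(c1, c2), Math.min(c3, c4)),
                         Math.max(Math.max(c1, c2), Math.max(c3, c4)));
    }

    private static Range union(Range x, Range y) {
        if (x == null) return y;
        if (y == null) return x;
        return new Range(Math.min(x.lo(), y.lo()), Math.max(x.hi(), y.hi()));
    }

    /**
     * Bounds for a / b with a in [alo, ahi] and b in [blo, bhi].
     * Returns null when the divisor can only be zero (the division always throws).
     */
    static Range divRange(int alo, int ahi, int blo, int bhi) {
        Range result = null;
        // A zero divisor throws, so only the negative and positive parts matter.
        if (blo < 0) result = union(result, corners(alo, ahi, blo, Math.min(bhi, -1)));
        if (bhi > 0) result = union(result, corners(alo, ahi, Math.max(blo, 1), bhi));
        // Integer.MIN_VALUE / -1 is 2^31 in long arithmetic but wraps to
        // Integer.MIN_VALUE in int arithmetic. Assumption of this sketch:
        // give up and return the full int range in that case.
        if (result != null && result.hi() > Integer.MAX_VALUE) {
            return new Range(Integer.MIN_VALUE, Integer.MAX_VALUE);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(divRange(10, 20, 2, 5));
        System.out.println(divRange(-7, 9, -3, 4));
        System.out.println(divRange(Integer.MIN_VALUE, 0, -1, -1));
    }
}
```

The conservative fallback loses precision only when the dividend is the constant `Integer.MIN_VALUE`; as soon as the dividend range also contains `Integer.MIN_VALUE + 1`, dividing by -1 already reaches `Integer.MAX_VALUE`, so the full range is the tight answer.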
The pull request now contains 25 commits: > > - Add new asserts and change special case calculations > - Merge branch 'master' of https://github.com/openjdk/jdk into better_interger_div_type > - Add additional nodes to fail conditions to detect idealized/transformed DivI Nodes that did not constant fold > - Remove checks for bottom and reorganize DivI/DivL Value functions > - Adjust long constant folding test as well > - Adjust test, assert and comments > - Remove too strict assert from old code path > - Fix if condition > - Simplify the special case path > - Add a simple path for non-special-case corner calculation > - ... and 15 more: https://git.openjdk.org/jdk/compare/32697bf6...45a91bd0 Testing tier1 up to tier6 passed. If you move the test out of the `irTest` directory, this will be good to go. ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/26143#pullrequestreview-3410075919 From chagedorn at openjdk.org Mon Nov 3 09:28:11 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 3 Nov 2025 09:28:11 GMT Subject: RFR: 8370332: C2 SuperWord: SIGSEGV because PhaseIdealLoop::split_thru_phi left dead nodes in loop _body [v2] In-Reply-To: References: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> Message-ID: On Tue, 28 Oct 2025 16:30:24 GMT, Emanuel Peter wrote: >> Analysis: >> `split_thru_phi` can split a node out of the loop, through some loop phi. As a consequence, that node and the phi we split through can become dead. But `split_thru_phi` did not have any logic to yank the dead node and phi from the `_body`. If this happens in the same loop-opts-phase as a later SuperWord, and that SuperWord pass somehow accesses that loop `_body`, then we may find dead nodes, which is not expected. >> >> It is not ok that `split_thru_phi` leaves dead nodes in the `_body`, so they have to be yanked. 
>> >> What I did additionally: I went through all uses of `split_thru_phi`, and moved the `replace_node` from the call-site to the method itself. Removing the node and yanking from `_body` conceptually belongs together, so they should be together in code. >> >> I suspect that `split_thru_phi` was broken for a long time already. But JDK26 changes in SuperWord started to check inputs of all nodes in `_body`, and that fails with dead nodes. >> >> Future Work: >> - Continue work on making `VerifyLoopOptimizations` work again, we should assert that there are no dead nodes in the `_body`. We may do that with the following task, or a subsequent one. >> - [JDK-8370332](https://bugs.openjdk.org/browse/JDK-8370332) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > allow unique out with multiple uses That looks good to me and I agree to add verification for that case with `VerifyLoopOptimizations` at some point. src/hotspot/share/opto/loopopts.cpp line 235: > 233: // just split through now has no use any more, it also > 234: // has to be removed. > 235: IdealLoopTree* region_loop = get_loop(region); The method is already quite large. This could probably nicely be extracted to a "yank_old_nodes()" method or something like that. You can keep the ` _igvn.replace_node(n, phi)` in this method. src/hotspot/share/opto/loopopts.cpp line 236: > 234: // has to be removed. > 235: IdealLoopTree* region_loop = get_loop(region); > 236: if (region->is_Loop() && region_loop->_child == nullptr) { I think you can use `region_loop->is_innermost()` instead: Suggestion: if (region->is_Loop() && region_loop->is_innermost()) { ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27955#pullrequestreview-3410062237 PR Review Comment: https://git.openjdk.org/jdk/pull/27955#discussion_r2485788488 PR Review Comment: https://git.openjdk.org/jdk/pull/27955#discussion_r2485773527 From epeter at openjdk.org Mon Nov 3 10:20:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Nov 2025 10:20:52 GMT Subject: RFR: 8370332: C2 SuperWord: SIGSEGV because PhaseIdealLoop::split_thru_phi left dead nodes in loop _body [v3] In-Reply-To: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> References: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> Message-ID: > Analysis: > `split_thru_phi` can split a node out of the loop, through some loop phi. As a consequence, that node and the phi we split through can become dead. But `split_thru_phi` did not have any logic to yank the dead node and phi from the `_body`. If this happens in the same loop-opts-phase as a later SuperWord, and that SuperWord pass somehow accesses that loop `_body`, then we may find dead nodes, which is not expected. > > It is not ok that `split_thru_phi` leaves dead nodes in the `_body`, so they have to be yanked. > > What I did additionally: I went through all uses of `split_thru_phi`, and moved the `replace_node` from the call-site to the method itself. Removing the node and yanking from `_body` conceptually belongs together, so they should be together in code. > > I suspect that `split_thru_phi` was broken for a long time already. But JDK26 changes in SuperWord started to check inputs of all nodes in `_body`, and that fails with dead nodes. > > Future Work: > - Continue work on making `VerifyLoopOptimizations` work again, we should assert that there are no dead nodes in the `_body`. We may do that with the following task, or a subsequent one. 
> - [JDK-8370332](https://bugs.openjdk.org/browse/JDK-8370332) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27955/files - new: https://git.openjdk.org/jdk/pull/27955/files/98dbf27b..833085f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27955&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27955&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27955.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27955/head:pull/27955 PR: https://git.openjdk.org/jdk/pull/27955 From luhenry at openjdk.org Mon Nov 3 10:30:04 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 3 Nov 2025 10:30:04 GMT Subject: RFR: 8370794: C2 SuperWord: Long/Integer.compareUnsigned return wrong value for EQ/NE in SLP In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 16:38:54 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481) introduces this regression for unsigned I/L EQ/NE in SLP. > > ==================== > > In [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481), we fixed an issue related to transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), and added tests for unsigned ones. > As discussion in [1], we should also add more tests for transformation from (Bool + Cmp + CMove) to (VectorMaskCmp + VectorBlend) for the signed ones. > > [1] https://github.com/openjdk/jdk/pull/27942#discussion_r2468750039 > > Thanks! > > Tests running... Marked as reviewed by luhenry (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28047#pullrequestreview-3410365345 From mli at openjdk.org Mon Nov 3 10:38:47 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 3 Nov 2025 10:38:47 GMT Subject: RFR: 8370794: C2 SuperWord: Long/Integer.compareUnsigned return wrong value for EQ/NE in SLP In-Reply-To: References: Message-ID: On Mon, 3 Nov 2025 10:27:40 GMT, Ludovic Henry wrote: >> Hi, >> Can you help to review this patch? >> >> [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481) introduces this regression for unsigned I/L EQ/NE in SLP. >> >> ==================== >> >> In [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481), we fixed an issue related to transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), and added tests for unsigned ones. >> As discussion in [1], we should also add more tests for transformation from (Bool + Cmp + CMove) to (VectorMaskCmp + VectorBlend) for the signed ones. >> >> [1] https://github.com/openjdk/jdk/pull/27942#discussion_r2468750039 >> >> Thanks! >> >> Tests running... > > Marked as reviewed by luhenry (Committer). @luhenry Thank you for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28047#issuecomment-3479868473 From mli at openjdk.org Mon Nov 3 10:38:48 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 3 Nov 2025 10:38:48 GMT Subject: Integrated: 8370794: C2 SuperWord: Long/Integer.compareUnsigned return wrong value for EQ/NE in SLP In-Reply-To: References: Message-ID: <7GtQ60n-HoBET6UBpVSsy1ux-n8b6wp6Cjl-wR9T0Js=.17ebdc1e-b8c2-47fd-9801-0891ef61b386@github.com> On Wed, 29 Oct 2025 16:38:54 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481) introduces this regression for unsigned I/L EQ/NE in SLP. 
> > ==================== > > In [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481), we fixed an issue related to transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), and added tests for unsigned ones. > As discussion in [1], we should also add more tests for transformation from (Bool + Cmp + CMove) to (VectorMaskCmp + VectorBlend) for the signed ones. > > [1] https://github.com/openjdk/jdk/pull/27942#discussion_r2468750039 > > Thanks! > > Tests running... This pull request has now been integrated. Changeset: 667744c3 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/667744c353e4d6abbe5cbf85746e5e0e44dafaf8 Stats: 1438 lines in 3 files changed: 1337 ins; 37 del; 64 mod 8370794: C2 SuperWord: Long/Integer.compareUnsigned return wrong value for EQ/NE in SLP Reviewed-by: epeter, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/28047
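For readers unfamiliar with the pattern named in this change, the following scalar loop shows the semantics a vectorized (VectorMaskCmp + VectorBlend) version has to preserve. It is only an illustration: `UnsignedSelectSketch` and its method names are hypothetical, and the loop shape is an assumption about the kind of code involved, not the regression test from the PR.

```java
import java.util.Arrays;

public final class UnsignedSelectSketch {
    // EQ case: pick c[i] when a[i] and b[i] are equal as unsigned ints, else d[i].
    static int[] selectIfUnsignedEqual(int[] a, int[] b, int[] c, int[] d) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = Integer.compareUnsigned(a[i], b[i]) == 0 ? c[i] : d[i];
        }
        return r;
    }

    // LT case for contrast: here unsigned and signed ordering disagree,
    // for example 0 is unsigned-less than 0x80000000 (Integer.MIN_VALUE).
    static int[] selectIfUnsignedLess(int[] a, int[] b, int[] c, int[] d) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = Integer.compareUnsigned(a[i], b[i]) < 0 ? c[i] : d[i];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {-1, 1, 0};
        int[] b = {-1, 2, Integer.MIN_VALUE};
        int[] c = {10, 20, 30};
        int[] d = {-10, -20, -30};
        System.out.println(Arrays.toString(selectIfUnsignedEqual(a, b, c, d))); // [10, -20, -30]
        System.out.println(Arrays.toString(selectIfUnsignedLess(a, b, c, d)));  // [-10, 20, 30]
    }
}
```

Equality and inequality give the same answer for signed and unsigned operands, which is why EQ/NE can be handled differently from the ordering predicates in the vectorized form; the scalar result above is the reference any such transformation must match.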