From jbhateja at openjdk.org Sat Nov 1 01:52:15 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sat, 1 Nov 2025 01:52:15 GMT
Subject: Withdrawn: 8370409: Incorrect computation in Float16 reduction loop
In-Reply-To:
References:
Message-ID:

On Fri, 24 Oct 2025 14:36:21 GMT, Jatin Bhateja wrote:

> The current floatToFloat16 intrinsic implementation always sign-extends the 16-bit short result to a 32-bit value in anticipation of safe consumption by subsequent integral (comparison) operation[s]. However, the safest way to compare two Float16 values is to use the Float16.compare/compareTo methods, given that floating-point comparisons can also be unordered.
>
> E.g., both 64512 and -1024 are equivalent bit representations of the Float16 -Inf value, but they are not numerically equal under integral comparison.
>
> jshell> Float16.compare(Float16.shortBitsToFloat16((short)-1024), Float16.shortBitsToFloat16((short)64512))
> $3 ==> 0
>
> In the scalar intrinsics of Float16.add/sub/mul/div/min/max, we always return a boxed value, which is then operated upon by the subsequent Float16 APIs. The Float.floatToFloat16 intrinsic, in contrast, always returns a 'short' value. This is special in the sense that, even though the carrier type is 'short', it encodes an IEEE 754 half-precision value; if such a short carrier is exposed to integral operators, then, as per the JVM specification, it must be sign-extended before the operation.
>
> Given that our Float16 binary operations inference is based on generic pattern matching and is agnostic to how that graph pallet got created, i.e., either through Float16.* APIs or by explicit Float.float16ToFloat/floatToFloat16 operations, it's safe to sign-extend the result in all cases.
>
> Kindly review the patch and share your feedback.
>
> Best Regards,
> Jatin

This pull request has been closed without being integrated.
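
For reference, a minimal sketch of the bit-pattern point in the quoted description, using plain integer arithmetic only. The class and method names are made up for illustration and are not part of the patch. 0xFC00 is the Float16 encoding of -Inf; it reads as 64512 when zero-extended and as -1024 when sign-extended.

```java
// Hypothetical demo class, not taken from the pull request.
class Float16BitsDemo {
    // True when two int values carry the same low 16 bits,
    // i.e. when they denote the same Float16 bit pattern.
    static boolean sameHalfBits(int a, int b) {
        return (short) a == (short) b;
    }

    public static void main(String[] args) {
        int zeroExtended = 0xFC00;          // 64512
        int signExtended = (short) 0xFC00;  // -1024 after sign extension

        // An integral comparison sees two different numbers.
        System.out.println(zeroExtended == signExtended);               // false
        // Comparing only the 16-bit payload sees the same encoding.
        System.out.println(sameHalfBits(zeroExtended, signExtended));   // true
        // Converting the short back without sign extension recovers 64512.
        System.out.println(Short.toUnsignedInt((short) signExtended));  // 64512
    }
}
```

This only illustrates why an integral comparison of the two carriers disagrees; it says nothing about how C2 represents the value internally.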
------------- PR: https://git.openjdk.org/jdk/pull/27977 From syan at openjdk.org Sat Nov 1 04:46:05 2025 From: syan at openjdk.org (SendaoYan) Date: Sat, 1 Nov 2025 04:46:05 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 16:39:07 GMT, Roland Westrelin wrote: > In test cases, `mh` is initially not constant so the method handle > invoke can't be inlined. It is later found to be constant, so it can > be turned into a direct call by > `Compile::process_late_inline_calls_no_inline()`. In the meantime, the > `CallNode` for the mh invoke is cloned (by loop switching). In the > process, only a shallow copy of the `JVMState` for the call is > made. The initial `CallNode` is the first to be processed by > `Compile::process_late_inline_calls_no_inline()` and that causes that > `CallNode` to become dead. The cloned `CallNode` is then > processed. The `JVMState` for that one references the initial > `CallNode` in its caller's `JVMState`. Because that node is dead, that > causes a crash. The fix I propose is to make a deep copy of the > `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is > assigned to the node. > > The other failure I see with these tests is: > > > # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 > # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! > > > because even though the `CallNode` is cloned, there's still only one > late inline recorded. The fix here is to increment > `_number_of_mh_late_inlines` when the node is cloned. > > This was reported by the netty developers. 
test/hotspot/jtreg/compiler/inlining/TestLateMHClonedCallNode.java line 28:

> 26:  * @bug 8370939
> 27:  * @summary C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline()
> 28:  * @run main/othervm -XX:-BackgroundCompilation -XX:CompileOnly=TestLateMHClonedCallNode::test1 -XX:CompileOnly=TestLateMHClonedCallNode::test2 TestLateMHClonedCallNode

Maybe we can split this into two lines.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28088#discussion_r2483106650

From duke at openjdk.org Sat Nov 1 12:27:48 2025
From: duke at openjdk.org (Zihao Lin)
Date: Sat, 1 Nov 2025 12:27:48 GMT
Subject: RFR: 8370878: C1: Clean up unnecessary ConversionStub constructor
Message-ID:

C1: Clean up unnecessary ConversionStub constructor

Remove a class that should not be reached.

-------------

Commit messages:
 - C1: Clean up unnecessary ConversionStub constructor

Changes: https://git.openjdk.org/jdk/pull/28096/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28096&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8370878
  Stats: 42 lines in 3 files changed: 0 ins; 39 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/28096.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28096/head:pull/28096

PR: https://git.openjdk.org/jdk/pull/28096

From duke at openjdk.org Sat Nov 1 14:14:37 2025
From: duke at openjdk.org (Zihao Lin)
Date: Sat, 1 Nov 2025 14:14:37 GMT
Subject: RFR: 8370878: C1: Clean up unnecessary ConversionStub constructor [v2]
In-Reply-To:
References:
Message-ID:

> C1: Clean up unnecessary ConversionStub constructor
>
> Remove a class that should not be reached.

Zihao Lin has updated the pull request incrementally with one additional commit since the last revision:

  fix arm

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28096/files
  - new: https://git.openjdk.org/jdk/pull/28096/files/5b4df065..a73b5282

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28096&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28096&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/28096.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28096/head:pull/28096

PR: https://git.openjdk.org/jdk/pull/28096

From duke at openjdk.org Sat Nov 1 15:04:37 2025
From: duke at openjdk.org (Zihao Lin)
Date: Sat, 1 Nov 2025 15:04:37 GMT
Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue
Message-ID:

If both nodes are constant, support constant folding.

-------------

Commit messages:
 - C2: Improve (U)MulHiLNode::MulHiValue

Changes: https://git.openjdk.org/jdk/pull/28097/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8370196
  Stats: 10 lines in 1 file changed: 10 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/28097.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28097/head:pull/28097

PR: https://git.openjdk.org/jdk/pull/28097

From wenanjian at openjdk.org Sat Nov 1 15:29:56 2025
From: wenanjian at openjdk.org (Anjian Wen)
Date: Sat, 1 Nov 2025 15:29:56 GMT
Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v14]
In-Reply-To:
References:
Message-ID:

> Hi everyone, please help review this patch, which implements _counterMode_AESCrypt with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed.
Anjian Wen has updated the pull request incrementally with one additional commit since the last revision:

  update some register use and instruction use

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25281/files
  - new: https://git.openjdk.org/jdk/pull/25281/files/4039116c..26ea7628

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=12-13

  Stats: 44 lines in 1 file changed: 5 ins; 8 del; 31 mod
  Patch: https://git.openjdk.org/jdk/pull/25281.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281

PR: https://git.openjdk.org/jdk/pull/25281

From wenanjian at openjdk.org Sat Nov 1 15:29:58 2025
From: wenanjian at openjdk.org (Anjian Wen)
Date: Sat, 1 Nov 2025 15:29:58 GMT
Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v13]
In-Reply-To:
References: <9NCXWsBW5TTtNLxDqIInodSU-nLiaf86r2dyMtoKklM=.0964bb38-e5cb-499d-a9fc-4efdca0ecfd0@github.com>
Message-ID: <2_UqadjXSXckE8l38MkkX8AhDONqn4qRgPbSP_Pylcs=.d1289413-ce94-4775-9150-020347fded57@github.com>

On Fri, 31 Oct 2025 07:41:09 GMT, Fei Yang wrote:

> Hi, I am having a look at the latest version. Some minor comments.

Thanks, I have modified the code and addressed all of these comments.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25281#issuecomment-3476459666

From wenanjian at openjdk.org Sun Nov 2 02:04:51 2025
From: wenanjian at openjdk.org (Anjian Wen)
Date: Sun, 2 Nov 2025 02:04:51 GMT
Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v15]
In-Reply-To:
References:
Message-ID:

> Hi everyone, please help review this patch, which implements _counterMode_AESCrypt with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed.

Anjian Wen has updated the pull request incrementally with one additional commit since the last revision:

  modify some var names

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25281/files
  - new: https://git.openjdk.org/jdk/pull/25281/files/26ea7628..1cf06a35

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=13-14

  Stats: 20 lines in 1 file changed: 0 ins; 0 del; 20 mod
  Patch: https://git.openjdk.org/jdk/pull/25281.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281

PR: https://git.openjdk.org/jdk/pull/25281

From hgreule at openjdk.org Sun Nov 2 09:56:00 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Sun, 2 Nov 2025 09:56:00 GMT
Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue
In-Reply-To:
References:
Message-ID:

On Sat, 1 Nov 2025 14:50:27 GMT, Zihao Lin wrote:

> If both nodes are constant, support constant folding.

Thanks for working on this. A few things:

- You need tests to cover this change. The `Math.multiplyHigh(...)` and `Math.unsignedMultiplyHigh(...)` methods can be used to test this from the Java world. See e.g., #26143 or #25254 for inspiration.
- The current method is for both unsigned and signed multiplication. You either have to deal with that directly there or get rid of that method and implement it directly in the respective `Value(...)` methods (the latter might be cleaner imo).
- For unsigned multiplication, you can use the unsigned bounds (_uhi, _ulo).
- I think extending from simple constant folding to intervals isn't that much more work. From my understanding, there shouldn't be any overflows that need to be handled. This would also automatically deal with cases like `multiplyHigh(x, 0)` etc.
- The bottom checks are unneeded and can be removed (in fact, they would otherwise prevent proper calculation of the previous example).
- Make sure to follow the code style: `T* v`; `if (a) {` spacing.
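
A rough sketch of a Java-level check along the lines of the first point, assuming a plain `BigInteger` reference rather than the IR-framework tests in the PRs mentioned. `MulHiReference` and its methods are hypothetical names used only for illustration. `Math.unsignedMultiplyHigh` needs JDK 18 or later, so only the reference computation for it is shown.

```java
import java.math.BigInteger;

// Hypothetical helper, not taken from any of the referenced PRs.
class MulHiReference {
    // Reference for Math.multiplyHigh: upper 64 bits of the signed 128-bit product.
    static long signedHigh(long a, long b) {
        return BigInteger.valueOf(a).multiply(BigInteger.valueOf(b)).shiftRight(64).longValue();
    }

    // Reference for Math.unsignedMultiplyHigh: both operands read as unsigned 64-bit values.
    static long unsignedHigh(long a, long b) {
        BigInteger ua = new BigInteger(Long.toUnsignedString(a));
        BigInteger ub = new BigInteger(Long.toUnsignedString(b));
        return ua.multiply(ub).shiftRight(64).longValue();
    }

    // Both operands are compile-time constants, so a JIT compiler has the
    // opportunity to fold this call; the result must match the reference.
    static long foldedSigned() {
        return Math.multiplyHigh(Long.MIN_VALUE, 3L);
    }

    // One operand is constant zero, the other is unknown: the interval-based
    // case mentioned above, where the result is always 0.
    static long timesZero(long x) {
        return Math.multiplyHigh(x, 0L);
    }

    public static void main(String[] args) {
        System.out.println(foldedSigned() == signedHigh(Long.MIN_VALUE, 3L));
        System.out.println(timesZero(args.length) == 0L);
        System.out.println(unsignedHigh(-1L, -1L));
    }
}
```

A real regression test would additionally have to make sure the methods get compiled by C2 and, ideally, verify the folding on the IR; that part is left out here.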
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28097#issuecomment-3477676797

From qamai at openjdk.org Sun Nov 2 15:48:37 2025
From: qamai at openjdk.org (Quan Anh Mai)
Date: Sun, 2 Nov 2025 15:48:37 GMT
Subject: RFR: 8370914: C2: Reimplement Type::join [v3]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection.
>
> However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`).
>
> In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand.
>
> This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierarchy. There are a lot of benefits to this:
>
> - It makes the operation much more intuitive, and changes to `Type` classes can be made easier.
>   Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think of a `Type` as a set of possible values.
> - It tightens the invariants of the classes in the hierarchy. Instead of having 5 possible `ptr()` values when a `TypeInstPtr` participates in a meet/join operation, there are only 3 left (`AnyNull` is nonsensical and `TopPTR` must be an `AnyPtr`). This, in turn, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about.
>
> This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the pl...

Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:

  Move dual to ASSERT only

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28051/files
  - new: https://git.openjdk.org/jdk/pull/28051/files/1960854f..8c9f560e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=01-02

  Stats: 83 lines in 2 files changed: 49 ins; 1 del; 33 mod
  Patch: https://git.openjdk.org/jdk/pull/28051.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28051/head:pull/28051

PR: https://git.openjdk.org/jdk/pull/28051

From fyang at openjdk.org Mon Nov 3 03:11:09 2025
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 3 Nov 2025 03:11:09 GMT
Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v15]
In-Reply-To:
References:
Message-ID:

On Sun, 2 Nov 2025 02:04:51 GMT, Anjian Wen wrote:

>> Hi everyone, please help review this patch, which implements _counterMode_AESCrypt with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed.
>
> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision:
>
>   modify some var names

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2623:

> 2621:     __ rev8(tmp1, tmp1);
> 2622:     __ sd(tmp1, Address(counter));
> 2623:   }

Can you add some code comments and maybe assertions about the input registers? Like:

  // Big-endian 128-bit + 64-bit -> 128-bit addition.
  void be_inc_counter_128(Register counter, Register tmp1, Register tmp2) {
    assert_different_registers(counter, tmp1, tmp2, t0);
    __ ld(tmp1, Address(counter, 8));  // Load 128-bits from counter
    __ ld(tmp2, Address(counter));
    __ rev8(tmp1, tmp1);               // Convert big-endian to little-endian
    __ rev8(tmp2, tmp2);
    __ addi(tmp1, tmp1, 1);
    __ seqz(t0, tmp1);                 // Check for result overflow
    __ add(tmp2, tmp2, t0);            // Add 1 if overflow otherwise 0
    __ rev8(tmp1, tmp1);               // Convert little-endian to big-endian
    __ rev8(tmp2, tmp2);
    __ sd(tmp1, Address(counter, 8));  // Store 128-bits to counter
    __ sd(tmp2, Address(counter));
  }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2485208392

From xgong at openjdk.org Mon Nov 3 06:44:10 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Mon, 3 Nov 2025 06:44:10 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v7]
In-Reply-To:
References:
Message-ID: <5PrIYztcDbOPX9aY35VB-t8agqVbNvjHv6-ypPtdm7M=.a05cb69c-fee9-4b42-a7cc-26a9d79d40f8@github.com>

> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for the AArch64 SVE platform.
>
> ### Background
> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register.
>
> ### Implementation
>
> #### Challenges
> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints.
>
> For a 512-bit SVE machine, loading a `byte` vector with different vector species requires different approaches:
> - SPECIES_64: Single operation with mask (8 elements, 256-bit)
> - SPECIES_128: Single operation, full register (16 elements, 512-bit)
> - SPECIES_256: Two operations + merge (32 elements, 1024-bit)
> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit)
>
> Use `ByteVector.SPECIES_512` as an example:
> - It contains 64 elements. So the index vector size should be `64 * 32` bits, which is 4 times the SVE vector register size.
> - It requires 4 vector gather-loads to finish the whole operation.
>
>
> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...]
> int[] idx = [0, 1, 2, 3, ..., 63, ...]
>
> 4 gather-load:
> idx_v1 = [15 14 13 ... 1 0]    gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa]
> idx_v2 = [31 30 29 ... 17 16]  gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb]
> idx_v3 = [47 46 45 ... 33 32]  gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc]
> idx_v4 = [63 62 61 ... 49 48]  gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd]
> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa]
>
>
> #### Solution
> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end.
>
> Here are the main changes:
> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher.
> - Added `VectorSliceNode` for result merging.
> - Added `VectorMaskWidenNode` for mask splitting and type conversion fo...

Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits:

 - Merge branch 'jdk:master' into JDK-8351623-sve
 - Add more comments for IRs and added method
 - Merge branch 'jdk:master' into JDK-8351623-sve
 - Merge 'jdk:master' into JDK-8351623-sve
 - Address review comments
 - Refine IR pattern and clean backend rules
 - Fix indentation issue and move the helper matcher method to header files
 - Merge branch jdk:master into JDK-8351623-sve
 - 8351623: VectorAPI: Add SVE implementation of subword gather load operation

-------------

Changes: https://git.openjdk.org/jdk/pull/26236/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26236&range=06
  Stats: 1071 lines in 20 files changed: 907 ins; 24 del; 140 mod
  Patch: https://git.openjdk.org/jdk/pull/26236.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26236/head:pull/26236

PR: https://git.openjdk.org/jdk/pull/26236

From epeter at openjdk.org Mon Nov 3 06:48:25 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 3 Nov 2025 06:48:25 GMT
Subject: RFR: 8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 31 Oct 2025 19:27:02 GMT, Dean Long wrote:

>> @dean-long Does it look better now?
>
>> @dean-long Does it look better now?
>
> Yes, much better, thanks!

@dean-long @TobiHartmann @jatin-bhateja thanks for the quick reviews!
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28062#issuecomment-3479126542

From epeter at openjdk.org Mon Nov 3 06:48:26 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 3 Nov 2025 06:48:26 GMT
Subject: Integrated: 8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer
In-Reply-To:
References:
Message-ID:

On Thu, 30 Oct 2025 14:51:53 GMT, Emanuel Peter wrote:

> It seems we keep finding issues in `CompressBitsNode::Value`, using the `TemplateFramework` https://github.com/openjdk/jdk/pull/26885.
>
> This is a JDK26 regression of the bugfix https://github.com/openjdk/jdk/pull/23947, which was itself reported by my prototype of the `TemplateFramework`.
>
> The bug is simple: On Windows, `1UL` is only a 32-bit value, not a 64-bit value. We should use `1ULL` instead. Impacted lines:
> https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L276
> https://github.com/openjdk/jdk/blob/b02c1256768bc9983d4dba899cd19219e11a380a/src/hotspot/share/opto/intrinsicnode.cpp#L379
>
> This means that simple cases like these wrongly constant fold to zero:
> - `Long.compress(-2683206580L, Integer.toUnsignedLong(x))`
> - `Long.compress(x, 0xffff_ffffL)`
>
> ------------------------------------------------------------------
>
> This sort of bug (`1UL` vs `1ULL`) is of course very subtle, and easy to miss in a code review. So that is why testing is paramount.
>
> Why was this not caught in the testing of https://github.com/openjdk/jdk/pull/23947? After all there were quite a few tests there, right? There were simply not enough tests, or not the right ones ;)
>
> I did at the time ask for a "range-based" test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2853896251). I then doubled down and even proposed a concrete test (https://github.com/openjdk/jdk/pull/23947#issuecomment-2935548411) that would create "**range-based**" inputs:
>
>     // Returns {sum, result}; sum records which LIMIT_* thresholds result exceeds.
>     public static int[] test(int mask, int src) {
>         // Clamp both inputs to constant ranges, so that both are "range-based".
>         mask = Math.max(CON1, Math.min(CON2, mask));
>         src = Math.max(CON3, Math.min(CON4, src));
>         int result = Integer.compress(src, mask);
>         int sum = 0;
>         if (result > LIMIT_1) { sum += 1; }
>         if (result > LIMIT_2) { sum += 2; }
>         if (result > LIMIT_3) { sum += 4; }
>         if (result > LIMIT_4) { sum += 8; }
>         if (result > LIMIT_5) { sum += 16; }
>         if (result > LIMIT_6) { sum += 32; }
>         if (result > LIMIT_7) { sum += 64; }
>         if (result > LIMIT_8) { sum += 128; }
>         return new int[] {sum, result};
>     }
>
> What is important here: both the `src` and `mask` must have random ranges. But the test that ended up being integrated only made the `src` "range-based" using the `min/max`. **Without the `mask` being tested "range-based", the bug here could not have been caught by that test**.
>
> I was asked again for my review (https://github.com/openjdk/jdk/pull/23947#issuecomment-3062355806), but I had to go on vacation, and was not able to catch the issue (https://github.com/openj...

This pull request has now been integrated.

Changeset: 0ca0852d
Author:    Emanuel Peter
URL:       https://git.openjdk.org/jdk/commit/0ca0852d78d643c211d36b753a734dac0cd2800a
Stats:     111 lines in 2 files changed: 96 ins; 0 del; 15 mod

8370459: C2: CompressBitsNode::Value produces wrong result on Windows (1UL vs 1ULL), found by ExpressionFuzzer

Reviewed-by: dlong, jbhateja, thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/28062

From epeter at openjdk.org Mon Nov 3 06:58:19 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 3 Nov 2025 06:58:19 GMT
Subject: Integrated: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination
In-Reply-To:
References:
Message-ID:

On Mon, 27 Oct 2025 10:40:18 GMT, Emanuel Peter wrote:

> Note: @oliviermattmann found this bug with his whitebox fuzzer.
See also https://github.com/openjdk/jdk/pull/27991 > > **Analysis** > We run Escape Analysis, and see that a local array allocation could possibly be removed, we only have matching `StoreI` to the `int[]`. But there is one `StoreI` that is still in a loop, and so we wait with the actual allocation removal until later, hoping it may go away, or drop out of the loop. > During loop opts, the `StoreI` drops out of the loop, now there should be nothing in the way of allocation removal. > But now we run `MergeStores`, and merge two of the `StoreI` into a mismatched `StoreL`. > > Then, we eventually remove the allocation, but don't check again if any new mismatched store has appeared. > Instead of a `ConI`, we receive a `ConL`, for the first of the two merged `StoreI`. The second merged `StoreI` instead captures the state before the `StoreL`, and that is wrong. > > **Solution** > We should have some assert, that checks that the captured `field_val` corresponds to the expected `field_type`. > > But the real fix was suggested by @merykitty : apparently he just had a similar issue in Valhalla: > https://github.com/openjdk/valhalla/blame/60af17ff5995cfa5de075332355f7f475c163865/src/hotspot/share/opto/macro.cpp#L709-L713 > (the idea is to bail out of the elimination if any of the found stores are mismatched.) > > **Details** > > How the bad sequence develops, and which components are involved. > > 1) The `SafePoint` contains a `ConL` and 3 `ConI`. 
(Correct would have been 4 `ConI`) > > 6 ConI === 23 [[ 4 ]] #int:16777216 > 7 ConI === 23 [[ 4 ]] #int:256 > 8 ConI === 23 [[ 4 ]] #int:1048576 > 9 ConL === 23 [[ 4 ]] #long:68719476737 > 54 DefinitionSpillCopy === _ 27 [[ 16 12 4 ]] > 4 CallStaticJavaDirect === 47 29 30 26 32 33 0 34 0 54 9 8 7 6 [[ 5 3 52 ]] Static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # void ( int ) C=0.000100 Test::test @ bci:38 (line 21) reexecute !jvms: Test::test @ bci:38 (line 21) > > > 2) This is then encoded into an `ObjectValue`. A `Type::Long` / `ConL` is converted into a `[int=0, long=ConL]` pair, see: > https://github.com/openjdk/jdk/blob/da7121aff9eccb046b82a75093034f1cdbd9b9e4/src/hotspot/share/opto/output.cpp#L920-L925 > If I understand it right, there zero is just a placeholder. > > And so we get: > > (rr) p sv->print_fields_on(tty) > Fields: 0, 68719476737, 1048576, 256, 16777216 > > We can see the `zero`, followed by the `ConL`, and then 3 `ConI`. > > This se... This pull request has now been integrated. Changeset: 09a047f0 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/09a047f00c88d14505c42a966dedbc87b9be5bdf Stats: 375 lines in 5 files changed: 375 ins; 0 del; 0 mod 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination Co-authored-by: Olivier Mattmann Co-authored-by: Quan Anh Mai Reviewed-by: kvn, qamai ------------- PR: https://git.openjdk.org/jdk/pull/27997 From epeter at openjdk.org Mon Nov 3 06:58:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Nov 2025 06:58:18 GMT Subject: RFR: 8370405: C2: mismatched store from MergeStores wrongly scalarized in allocation elimination [v2] In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 07:03:51 GMT, Quan Anh Mai wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 16 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8370405-alloc-elimination-and-MergeStores >> - only verify primitive types >> - Apply suggestions from code review >> - more assert adjustment >> - ignore debug flag >> - id for tests, and fix up the assert >> - pass int for short slot >> - another test >> - improve test >> - wip new IR test >> - ... and 6 more: https://git.openjdk.org/jdk/compare/6dd1ad30...b6e032c2 > > Regardless, I think this patch makes sense. Bailing out of scalar elimination when we are doing it is better than when we are running EA, and we should generally try to do it if we can. @merykitty @vnkozlov Thanks for the review and discussion! @dougxc Thanks for checking for Graal and getting us a quick response :) And thanks to Olivier Mattmann <[olivier.mattmann at bluewin.ch](mailto:olivier.mattmann at bluewin.ch)> for finding the bug! @mhaessig I decided to file this RFE, in case someone wants to invest time in it: [JDK-8371122](https://bugs.openjdk.org/browse/JDK-8371122) C2 Allocation Elimination: handle some mismatched accesses to arrays ------------- PR Comment: https://git.openjdk.org/jdk/pull/27997#issuecomment-3479146291 From epeter at openjdk.org Mon Nov 3 07:20:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Nov 2025 07:20:19 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v9] In-Reply-To: References: Message-ID: <-fbhk6x06Qi7XeEWmP9KCfUByQGjJJXj-CmPk1YcHGs=.d7125bd5-9b96-4a4f-a83c-a3886261958d@github.com> > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. 
It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. 
> This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f...

Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 106 additional commits since the last revision:

 - Merge branch 'master' into JDK-8367531-fix-addDataName
 - Merge branch 'JDK-8367531-fix-addDataName' of https://github.com/eme64/jdk into JDK-8367531-fix-addDataName
 - Manuel's suggestions

   Co-authored-by: Manuel Hässig
 - Merge branch 'master' into JDK-8367531-fix-addDataName
 - Merge branch 'JDK-8367531-fix-addDataName' of https://github.com/eme64/jdk into JDK-8367531-fix-addDataName
 - Update test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java

   Co-authored-by: Manuel Hässig
 - improve tutorial for Manuel
 - fix TestMethodArguments.java after merge with master
 - fix tests after integration of Expressions/Operations
 - Merge branch 'master' into JDK-8367531-fix-addDataName
 - ... and 96 more: https://git.openjdk.org/jdk/compare/34aa819b...18b895f3

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27255/files
  - new: https://git.openjdk.org/jdk/pull/27255/files/317e3e8b..18b895f3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=07-08

  Stats: 50185 lines in 611 files changed: 26632 ins; 20326 del; 3227 mod
  Patch: https://git.openjdk.org/jdk/pull/27255.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255

PR: https://git.openjdk.org/jdk/pull/27255

From epeter at openjdk.org Mon Nov 3 07:20:29 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 3 Nov 2025 07:20:29 GMT
Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v8]
In-Reply-To:
References:
Message-ID:

On Mon, 27 Oct 2025 11:36:54 GMT, Emanuel Peter wrote:

>> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is.
>>
>> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes.
>>
>> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue.
>>
>> **Major issue with Template Framework: lambda vs token order**
>>
>> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost.
>> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...).
>> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 105 additional commits since the last revision: > > - Merge branch 'JDK-8367531-fix-addDataName' of https://github.com/eme64/jdk into JDK-8367531-fix-addDataName > - Manuel's suggestions > > Co-authored-by: Manuel Hässig > - Merge branch 'master' into JDK-8367531-fix-addDataName > - Merge branch 'JDK-8367531-fix-addDataName' of https://github.com/eme64/jdk into JDK-8367531-fix-addDataName > - Update test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java > > Co-authored-by: Manuel Hässig > - improve tutorial for Manuel > - fix TestMethodArguments.java after merge with master > - fix tests after integration of Expressions/Operations > - Merge branch 'master' into JDK-8367531-fix-addDataName > - fix test > - ... and 95 more: https://git.openjdk.org/jdk/compare/5d7270a8...317e3e8b @chhagedorn @robcasloz gentle ping ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3479206563 From chagedorn at openjdk.org Mon Nov 3 08:10:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 3 Nov 2025 08:10:06 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 16:27:11 GMT, Damon Fenacci wrote: > ## Issue > Today, the only practical ways to run IR Framework scenarios in parallel seem to be: > * spawning threads manually in a single test, or > * letting jtreg treat each scenario as a separate test (the only way to potentially distribute across hosts). > > This makes it a bit cumbersome to use host CPU cores efficiently when running multiple scenarios within the same test. > > ## Change > This change introduces a method `TestFramework::startParallel` to execute multiple scenarios concurrently.
The implementation: > * launches one task per scenario and runs them in parallel (by default, the maximum concurrency should match the host's available cores) > * captures each task's `System.out` into a dedicated buffer and flushes it when the task completes to avoid interleaved output between scenarios (Note: only call paths within the `compile.lib.ir_framework` package are modified to per-task output streams. `ProcessTools` methods still write directly to `stdout`, so their output may interleave). > * adds an option `-DForceSequentialScenarios=true` to force all scenarios to be run sequentially. > > ## Testing > * Tier 1-3+ > * explicit `ir_framework.tests` runs > * added IR-Framework test `TestDForceSequentialScenarios.java` to test forcing sequential testing (checking the output order) and added a parallel run to `TestScenarios.java` (as well as adding `ForceSequentialScenarios` flag to `TestDFlags.java`) > > For reference: a comparison of the execution time between sequential and parallel of all IR-Framework tests using scenarios on our machines (linux x64/aarch64, macosx x64/aarch64, windows x64 with different number of cores, so the results for a single test might not be relevant) gave me an average speedup of 1.9. Overall looks good, thanks for improving this! I left a few suggestions. Now the only question remaining is which tests would already benefit from using the parallel version. I guess we can investigate that separately. test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 456: > 454: } > 455: } else { > 456: startWithScenarios(!FORCE_SEQUENTIAL_SCENARIOS && parallel); Maybe we can already handle `FORCE_SEQUENTIAL_SCENARIOS` directly in `startParallel()`. Then `parallel` really means parallel. You could also add an additional API comment for `startParallel()` that we can force disable it with the corresponding property flag.
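A minimal sketch of the suggestion above, under stated assumptions: `ScenarioRunnerSketch` and `lastMode` are hypothetical stand-ins invented for illustration, and only `startParallel()`, `startWithScenarios(...)`, `FORCE_SEQUENTIAL_SCENARIOS` and the `ForceSequentialScenarios` property are taken from this thread. It is not the actual `TestFramework` code.

```java
// Hypothetical stand-in for TestFramework; not the real IR Framework class.
final class ScenarioRunnerSketch {
    // Mirrors -DForceSequentialScenarios=true from the PR description.
    private static final boolean FORCE_SEQUENTIAL_SCENARIOS =
            Boolean.getBoolean("ForceSequentialScenarios");

    // Records how the last run was dispatched (for illustration only).
    String lastMode = "none";

    void start() {
        startWithScenarios(false);
    }

    /**
     * Runs scenarios in parallel, unless -DForceSequentialScenarios=true is set,
     * in which case the scenarios are run sequentially.
     */
    void startParallel() {
        // The flag is resolved here, at the API entry point.
        startWithScenarios(!FORCE_SEQUENTIAL_SCENARIOS);
    }

    // With the flag handled by the caller, 'parallel' really means parallel here.
    private void startWithScenarios(boolean parallel) {
        lastMode = parallel ? "parallel" : "sequential";
    }

    public static void main(String[] args) {
        ScenarioRunnerSketch runner = new ScenarioRunnerSketch();
        runner.startParallel();
        System.out.println(runner.lastMode);
    }
}
```

Resolving the property in `startParallel()` keeps the internal dispatch free of flag checks, and the Javadoc on `startParallel()` is one place where the override could be documented.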
test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 762: > 760: outcome = new Outcome(scenario, null, null); > 761: } catch (TestFormatException e) { > 762: outcome = new Outcome(scenario, e, null); Why do you collect the format exceptions here and only throw them later? Is a fail-fast not possible? test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 772: > 770: System.out.println(output); > 771: } > 772: } This will probably also not be sorted by scenario index? Could we also just gather it here and then dump it after the stream? Maybe we can put `output` into `Outcome` as well as the exceptions by using a `ConcurrentSkipListMap` map in the parallel case or a normal `TreeMap` in the non-parallel case. test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 787: > 785: outcomes.stream() > 786: .filter(o -> o.other() != null) > 787: .forEach(o -> exceptionMap.put(o.scenario(), o.other())); You could use a `ConcurrentSkipListMap` in the parallel case instead of a tree map. This would allow us to modify the map in parallel in the stream processing and simplify the code. Moreover, it will be sorted by scenario index which I'm not sure is currently the case? test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 847: > 845: * test VM flags, which also determine if IR matching should be done, and then starts the test VM to execute all tests. > 846: */ > 847: private void start(Scenario scenario, PrintStream printStream) { It might be time to refactor this method and create a scenario version and a non-scenario version. But that's for another day... 
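The `ConcurrentSkipListMap` idea from the comments above could look roughly like the following sketch. All names here (`OrderedOutcomesSketch`, `Outcome`, `runScenario`, `runAll`) are hypothetical, the "scenario" is simulated, and the real `Outcome` type in the PR may have a different shape. The sketch only shows that outcomes, including captured output and failures, can be stored from a parallel stream and still be read back sorted by scenario index.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch; not the IR Framework implementation.
final class OrderedOutcomesSketch {
    // Assumed shape: per-scenario captured output plus an optional failure.
    record Outcome(int scenarioIndex, String output, Exception failure) {}

    // Simulated scenario run: writes to a private buffer instead of System.out.
    static Outcome runScenario(int index) {
        StringBuilder buffer = new StringBuilder();
        buffer.append("Scenario #").append(index).append(" output");
        // Odd indices fail, purely to have some failures to collect.
        Exception failure = (index % 2 == 0)
                ? null
                : new IllegalStateException("scenario " + index + " failed");
        return new Outcome(index, buffer.toString(), failure);
    }

    // Tasks may finish in any order; the skip list map keeps keys sorted.
    static Map<Integer, Outcome> runAll(List<Integer> indices) {
        ConcurrentSkipListMap<Integer, Outcome> outcomes = new ConcurrentSkipListMap<>();
        indices.parallelStream().forEach(i -> outcomes.put(i, runScenario(i)));
        return outcomes;
    }

    public static void main(String[] args) {
        Map<Integer, Outcome> outcomes = runAll(List.of(3, 1, 2, 0));
        // Dump buffered output after the stream, in scenario index order.
        outcomes.values().forEach(o -> System.out.println(o.output()));
        // Report failures afterwards, also in scenario index order.
        outcomes.values().stream()
                .filter(o -> o.failure() != null)
                .forEach(o -> System.out.println("failed: scenario " + o.scenarioIndex()));
    }
}
```

A fail-fast variant would need to cancel the remaining tasks once one throws, which this sketch does not attempt.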
test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 927: > 925: if (testVMProcess == null) { > 926: throw new TestFrameworkException("TestVMProcess is null"); > 927: } You can use this utility method instead: Suggestion: TestFramework.check(testVMProcess != null, "TestVMProcess must not be null"); test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenarios.java line 2: > 1: /* > 2: * Copyright (c) 2021, 2023, Oracle and/or its affiliates. All rights reserved. We can finally comment on hidden non-modified code. You should also update the copyright here: Suggestion: * Copyright (c) 2021, 2025, Oracle and/or its affiliates. All rights reserved. ------------- PR Review: https://git.openjdk.org/jdk/pull/28065#pullrequestreview-3409620122 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2485608824 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2485465527 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2485582995 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2485476754 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2485599238 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2485585464 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2485612986 From chagedorn at openjdk.org Mon Nov 3 08:37:02 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 3 Nov 2025 08:37:02 GMT Subject: RFR: 8370878: C1: Clean up unnecessary ConversionStub constructor [v2] In-Reply-To: References: Message-ID: On Sat, 1 Nov 2025 14:14:37 GMT, Zihao Lin wrote: >> C1: Clean up unnecessary ConversionStub constructor >> Remove class which should not reach. > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > fix arm Looks good, thanks for cleaning it up! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28096#pullrequestreview-3409906428 From wenanjian at openjdk.org Mon Nov 3 08:38:49 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Mon, 3 Nov 2025 08:38:49 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v16] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which implements the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: change some instruction's sequence to make it more hardware-friendly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/1cf06a35..5bb019b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=14-15 Stats: 15 lines in 1 file changed: 4 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From duke at openjdk.org Mon Nov 3 08:40:02 2025 From: duke at openjdk.org (duke) Date: Mon, 3 Nov 2025 08:40:02 GMT Subject: RFR: 8370878: C1: Clean up unnecessary ConversionStub constructor [v2] In-Reply-To: References: Message-ID: On Sat, 1 Nov 2025 14:14:37 GMT, Zihao Lin wrote: >> C1: Clean up unnecessary ConversionStub constructor >> Remove class which should not reach. > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > fix arm @linzihao1999 Your change (at version a73b5282146954dfb7727033babee1b762fbe9ac) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28096#issuecomment-3479423118 From mhaessig at openjdk.org Mon Nov 3 09:17:17 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 3 Nov 2025 09:17:17 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v9] In-Reply-To: <2rpTqTSOzGtT6SCXvjIrzH1iPBj1zMXXBH0RdQxQiok=.e59eb3a8-5f2d-4638-8f58-ab4c29c95a05@github.com> References: <2rpTqTSOzGtT6SCXvjIrzH1iPBj1zMXXBH0RdQxQiok=.e59eb3a8-5f2d-4638-8f58-ab4c29c95a05@github.com> Message-ID: On Wed, 29 Oct 2025 20:23:10 GMT, Tobias Hotz wrote: >> This PR improves the value of integer division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 25 commits: > > - Add new asserts and change special case calculations > - Merge branch 'master' of https://github.com/openjdk/jdk into better_interger_div_type > - Add additional nodes to fail conditions to detect idealized/transformed DivI Nodes that did not constant fold > - Remove checks for bottom and reorganize DivI/DivL Value functions > - Adjust long constant folding test as well > - Adjust test, assert and comments > - Remove too strict assert from old code path > - Fix if condition > - Simplify the special case path > - Add a simple path for non-special-case corner calculation > - ... and 15 more: https://git.openjdk.org/jdk/compare/32697bf6...45a91bd0 Testing tier1 up to tier6 passed. If you move the test out of the `irTest` directory, this will be good to go. ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/26143#pullrequestreview-3410075919 From chagedorn at openjdk.org Mon Nov 3 09:28:11 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 3 Nov 2025 09:28:11 GMT Subject: RFR: 8370332: C2 SuperWord: SIGSEGV because PhaseIdealLoop::split_thru_phi left dead nodes in loop _body [v2] In-Reply-To: References: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> Message-ID: On Tue, 28 Oct 2025 16:30:24 GMT, Emanuel Peter wrote: >> Analysis: >> `split_thru_phi` can split a node out of the loop, through some loop phi. As a consequence, that node and the phi we split through can become dead. But `split_thru_phi` did not have any logic to yank the dead node and phi from the `_body`. If this happens in the same loop-opts-phase as a later SuperWord, and that SuperWord pass somehow accesses that loop `_body`, then we may find dead nodes, which is not expected. >> >> It is not ok that `split_thru_phi` leaves dead nodes in the `_body`, so they have to be yanked. 
>> >> What I did additionally: I went through all uses of `split_thru_phi`, and moved the `replace_node` from the call-site to the method itself. Removing the node and yanking from `_body` conceptually belongs together, so they should be together in code. >> >> I suspect that `split_thru_phi` was broken for a long time already. But JDK26 changes in SuperWord started to check inputs of all nodes in `_body`, and that fails with dead nodes. >> >> Future Work: >> - Continue work on making `VerifyLoopOptimizations` work again, we should assert that there are no dead nodes in the `_body`. We may do that with the following task, or a subsequent one. >> - [JDK-8370332](https://bugs.openjdk.org/browse/JDK-8370332) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > allow unique out with multiple uses That looks good to me and I agree to add verification for that case with `VerifyLoopOptimizations` at some point. src/hotspot/share/opto/loopopts.cpp line 235: > 233: // just split through now has no use any more, it also > 234: // has to be removed. > 235: IdealLoopTree* region_loop = get_loop(region); The method is already quite large. This could probably nicely be extracted to a "yank_old_nodes()" method or something like that. You can keep the ` _igvn.replace_node(n, phi)` in this method. src/hotspot/share/opto/loopopts.cpp line 236: > 234: // has to be removed. > 235: IdealLoopTree* region_loop = get_loop(region); > 236: if (region->is_Loop() && region_loop->_child == nullptr) { I think you can use `region_loop->is_innermost()` instead: Suggestion: if (region->is_Loop() && region_loop->is_innermost()) { ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27955#pullrequestreview-3410062237 PR Review Comment: https://git.openjdk.org/jdk/pull/27955#discussion_r2485788488 PR Review Comment: https://git.openjdk.org/jdk/pull/27955#discussion_r2485773527 From epeter at openjdk.org Mon Nov 3 10:20:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Nov 2025 10:20:52 GMT Subject: RFR: 8370332: C2 SuperWord: SIGSEGV because PhaseIdealLoop::split_thru_phi left dead nodes in loop _body [v3] In-Reply-To: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> References: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> Message-ID: > Analysis: > `split_thru_phi` can split a node out of the loop, through some loop phi. As a consequence, that node and the phi we split through can become dead. But `split_thru_phi` did not have any logic to yank the dead node and phi from the `_body`. If this happens in the same loop-opts-phase as a later SuperWord, and that SuperWord pass somehow accesses that loop `_body`, then we may find dead nodes, which is not expected. > > It is not ok that `split_thru_phi` leaves dead nodes in the `_body`, so they have to be yanked. > > What I did additionally: I went through all uses of `split_thru_phi`, and moved the `replace_node` from the call-site to the method itself. Removing the node and yanking from `_body` conceptually belongs together, so they should be together in code. > > I suspect that `split_thru_phi` was broken for a long time already. But JDK26 changes in SuperWord started to check inputs of all nodes in `_body`, and that fails with dead nodes. > > Future Work: > - Continue work on making `VerifyLoopOptimizations` work again, we should assert that there are no dead nodes in the `_body`. We may do that with the following task, or a subsequent one. 
> - [JDK-8370332](https://bugs.openjdk.org/browse/JDK-8370332) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27955/files - new: https://git.openjdk.org/jdk/pull/27955/files/98dbf27b..833085f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27955&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27955&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27955.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27955/head:pull/27955 PR: https://git.openjdk.org/jdk/pull/27955 From luhenry at openjdk.org Mon Nov 3 10:30:04 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 3 Nov 2025 10:30:04 GMT Subject: RFR: 8370794: C2 SuperWord: Long/Integer.compareUnsigned return wrong value for EQ/NE in SLP In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 16:38:54 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481) introduces this regression for unsigned I/L EQ/NE in SLP. > > ==================== > > In [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481), we fixed an issue related to transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), and added tests for unsigned ones. > As discussion in [1], we should also add more tests for transformation from (Bool + Cmp + CMove) to (VectorMaskCmp + VectorBlend) for the signed ones. > > [1] https://github.com/openjdk/jdk/pull/27942#discussion_r2468750039 > > Thanks! > > Tests running... Marked as reviewed by luhenry (Committer). 
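As a rough illustration of the kind of scalar loop the 8370794 description refers to, here is a sketch of a select on an unsigned EQ comparison. The class and method names are hypothetical, and whether C2 turns this exact loop into the (Bool + CmpU + CMove) shape mentioned in the thread is an assumption, not something the thread confirms. The sketch only pins down the scalar semantics that any vectorized version has to preserve.

```java
import java.util.Arrays;

// Hypothetical example; not taken from the tests added by the PR.
final class CompareUnsignedSelectSketch {
    // Selects c[i] when a[i] and b[i] are equal as unsigned ints, otherwise d[i].
    static void selectOnUnsignedEq(int[] a, int[] b, int[] c, int[] d, int[] r) {
        for (int i = 0; i < r.length; i++) {
            r[i] = (Integer.compareUnsigned(a[i], b[i]) == 0) ? c[i] : d[i];
        }
    }

    public static void main(String[] args) {
        int[] a = {0, -1, 5, Integer.MIN_VALUE};
        int[] b = {0, 1, 5, Integer.MAX_VALUE};
        int[] c = {10, 11, 12, 13};
        int[] d = {20, 21, 22, 23};
        int[] r = new int[4];
        selectOnUnsignedEq(a, b, c, d, r);
        System.out.println(Arrays.toString(r));
    }
}
```

For EQ and NE, the unsigned and signed comparisons agree on every input, so a plain `a[i] == b[i]` loop can serve as a reference when checking results.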
------------- PR Review: https://git.openjdk.org/jdk/pull/28047#pullrequestreview-3410365345 From mli at openjdk.org Mon Nov 3 10:38:47 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 3 Nov 2025 10:38:47 GMT Subject: RFR: 8370794: C2 SuperWord: Long/Integer.compareUnsigned return wrong value for EQ/NE in SLP In-Reply-To: References: Message-ID: On Mon, 3 Nov 2025 10:27:40 GMT, Ludovic Henry wrote: >> Hi, >> Can you help to review this patch? >> >> [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481) introduces this regression for unsigned I/L EQ/NE in SLP. >> >> ==================== >> >> In [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481), we fixed an issue related to transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), and added tests for unsigned ones. >> As discussion in [1], we should also add more tests for transformation from (Bool + Cmp + CMove) to (VectorMaskCmp + VectorBlend) for the signed ones. >> >> [1] https://github.com/openjdk/jdk/pull/27942#discussion_r2468750039 >> >> Thanks! >> >> Tests running... > > Marked as reviewed by luhenry (Committer). @luhenry Thank you for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28047#issuecomment-3479868473 From mli at openjdk.org Mon Nov 3 10:38:48 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 3 Nov 2025 10:38:48 GMT Subject: Integrated: 8370794: C2 SuperWord: Long/Integer.compareUnsigned return wrong value for EQ/NE in SLP In-Reply-To: References: Message-ID: <7GtQ60n-HoBET6UBpVSsy1ux-n8b6wp6Cjl-wR9T0Js=.17ebdc1e-b8c2-47fd-9801-0891ef61b386@github.com> On Wed, 29 Oct 2025 16:38:54 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481) introduces this regression for unsigned I/L EQ/NE in SLP. 
> > ==================== > > In [JDK-8370481](https://bugs.openjdk.org/browse/JDK-8370481), we fixed an issue related to transformation from (Bool + CmpU + CMove) to (VectorMaskCmp + VectorBlend), and added tests for unsigned ones. > As discussion in [1], we should also add more tests for transformation from (Bool + Cmp + CMove) to (VectorMaskCmp + VectorBlend) for the signed ones. > > [1] https://github.com/openjdk/jdk/pull/27942#discussion_r2468750039 > > Thanks! > > Tests running... This pull request has now been integrated. Changeset: 667744c3 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/667744c353e4d6abbe5cbf85746e5e0e44dafaf8 Stats: 1438 lines in 3 files changed: 1337 ins; 37 del; 64 mod 8370794: C2 SuperWord: Long/Integer.compareUnsigned return wrong value for EQ/NE in SLP Reviewed-by: epeter, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/28047 From epeter at openjdk.org Mon Nov 3 12:26:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Nov 2025 12:26:41 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model Message-ID: Note: this looks like a large change, but only about 400-500 lines are VM changes. 2.5k comes from new tests. Finally: after a long list of refactorings, we can implement the Cost-Model. The refactorings and this implementation was first PoC'd here: https://github.com/openjdk/jdk/pull/20964 Main goal: - Carefully allow the vectorization of reduction cases that lead to speedups, and prevent those that do not (or may cause regressions). - Open up new vectorization opportunities in the future, that introduce expensive vector nodes that are only profitable in some cases but not others. **Why cost-model?** Usually, vectorization leads to speedups because we replace multiple scalar operations with a single vector operation. The scalar and vector operation have a very similar cost per instruction, and so going from 4 scalar ops to a single vector op may yield a 4x speedup. 
This is a bit simplistic, but it is the general idea. But: some vector ops are expensive. Sometimes, the vector op can be more expensive than the multiple scalar ops it replaces. This is the case with some reduction ops. Or we may introduce a vector op that does not have any corresponding scalar op (e.g. in the case of shuffle). This prevents simple heuristics that only focus on single operations. Weighing the total cost of the scalar loop vs the vector loop allows us a more "holistic" approach. There may be expensive vector ops, but other cheaper vector ops may still make it profitable. **Implementation** Items: - New `VTransform::is_profitable`: checks the cost-model and performs some other cost-related checks. - `VLoopAnalyzer::cost`: scalar loop cost - `VTransformGraph::cost`: vector loop cost - Old reduction heuristic with `_num_work_vecs` and `_num_reductions` used to check for "simple" reductions where the only "work" vector was the reduction itself. Reductions were not considered profitable if they were "simple". I was able to lift those restrictions. - Adapted existing tests. - Wrote a new comprehensive test, matching the related JMH benchmark, which we use below. **Testing** Regular correctness testing and performance testing, in addition to the JMH micro benchmarks below. ------------------------------ **Some History** I have been bothered by "simple" reductions not vectorizing for a long time. It was also a part of [my JVMLS2025 presentation](https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/). During JDK9, reductions were first vectorized, but then restricted for "simple" and "2-element" reductions: - [JDK-8074981](https://bugs.openjdk.org/browse/JDK-8074981) Integer/FP scalar reduction optimization - Vectorized reduction, but led to a regression for some cases. - [JDK-8078563](https://bugs.openjdk.org/browse/JDK-8078563) Restrict reduction optimization - Disabled vectorization for many cases.
It seems we disabled a bit too many cases, because the regression really only happened with the float/double add/mul cases with linear reductions. And the int/long reductions were not affected but still disabled. We filed the following RFE for investigation: - [JDK-8188313](https://bugs.openjdk.org/browse/JDK-8188313) C2: Consider enabling auto-vectorization for simple reductions (disabled by JDK-8078563) - Was never addressed. During JDK21, I further improved reductions: - [JDK-8302652](https://bugs.openjdk.org/browse/JDK-8302652) [SuperWord] Reduction should happen after loop, when possible - Now "simple" and "2-element" reductions of the int/long variety would be even more worth it, but still disabled because of [JDK-8078563](https://bugs.openjdk.org/browse/JDK-8078563). Other reports: - [JDK-8345044](https://bugs.openjdk.org/browse/JDK-8345044) Sum of array elements not vectorized - [JDK-8336000](https://bugs.openjdk.org/browse/JDK-8336000) C2 SuperWord: report that 2-element reductions do not vectorize - [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516) C2 SuperWord: reconsider Reduction heuristic for UnorderedReduction And I've been mapping out the reduction performance with benchmarks: https://github.com/openjdk/jdk/pull/25387 You can see that we already used to vectorize a lot of cases, but especially did not vectorize: - "simple" reductions - "2-element" reductions Future Work, discovered while writing the attached IR test: - [JDK-8370671](https://bugs.openjdk.org/browse/JDK-8370671) C2 SuperWord [x86]: implement Long.max/min reduction for AVX2 - [JDK-8370673](https://bugs.openjdk.org/browse/JDK-8370673) C2 SuperWord [x86]: implement long mul reduction - [JDK-8370677](https://bugs.openjdk.org/browse/JDK-8370677) C2 SuperWord [aarch64]: implement sequential reduction for add/mul D/F - [JDK-8370685](https://bugs.openjdk.org/browse/JDK-8370685) C2 SuperWord: investigate why longMulBig does not vectorize - 
[JDK-8370686](https://bugs.openjdk.org/browse/JDK-8370686) C2 SuperWord [aarch64]: investigate long mul reductions performance on NEON ------------------------------------------------- **Reduction Benchmarks** Results from the benchmark https://github.com/openjdk/jdk/pull/25387 that is related to the attached IR test. Legend: - `master`: performance before this patch - `P1`: default with this patch, i.e. `-XX:AutoVectorizationOverrideProfitability=1`, relying on new cost-model. - `P0`: patch, but auto vectorization disabled, i.e. `-XX:AutoVectorizationOverrideProfitability=0`. - `P2`: patch, but auto vectorization forced, i.e. `-XX:AutoVectorizationOverrideProfitability=2`. How to look at the results below: - On the left, we have the raw performance numbers, and the errors. - On the right, we have the performance differences, marked with colors. - First focus on `P1 vs master`. Lower is better (marked green). - `P1 vs P0` gives you a view on how many cases already profit from auto vectorization in total. - `P1 vs P2` shows us how forced vectorization affects performance. There is basically no impact any more. See results from https://github.com/openjdk/jdk/pull/25387 to see that we used to have a lot of cases where forcing vectorization led to speedups. Note: some of the min/max benchmarks are not very stable. That is due to random input data: in some cases the scalar performance is better because it uses branching. `linux_x64` (AVX512): [image] `windows_x64` (AVX2 - ): [image] `macosx_x64_sandybridge`: [image] `linux_aarch64` (NEON): [image] `macosx_aarch64` (NEON): [image] ------------- Commit messages: - simplify cost-model impl - fix IR rules for aarch64 NEON - rm assert - fix aarch64 long mul reduction perf issue - Merge branch 'master' into JDK-8340093-cost-model - fix ir test a bit more - fix some asimd ir rules - fix asimd add/mul f/d rules - AVX=0 ir rule adjustments - avx2 exception for mul long - ... 
and 27 more: https://git.openjdk.org/jdk/compare/c97d50d7...3f7ef58e Changes: https://git.openjdk.org/jdk/pull/27803/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27803&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340093 Stats: 2944 lines in 13 files changed: 2850 ins; 65 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/27803.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27803/head:pull/27803 PR: https://git.openjdk.org/jdk/pull/27803 From epeter at openjdk.org Mon Nov 3 12:31:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Nov 2025 12:31:43 GMT Subject: RFR: 8370332: C2 SuperWord: SIGSEGV because PhaseIdealLoop::split_thru_phi left dead nodes in loop _body [v4] In-Reply-To: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> References: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> Message-ID: > Analysis: > `split_thru_phi` can split a node out of the loop, through some loop phi. As a consequence, that node and the phi we split through can become dead. But `split_thru_phi` did not have any logic to yank the dead node and phi from the `_body`. If this happens in the same loop-opts-phase as a later SuperWord, and that SuperWord pass somehow accesses that loop `_body`, then we may find dead nodes, which is not expected. > > It is not ok that `split_thru_phi` leaves dead nodes in the `_body`, so they have to be yanked. > > What I did additionally: I went through all uses of `split_thru_phi`, and moved the `replace_node` from the call-site to the method itself. Removing the node and yanking from `_body` conceptually belongs together, so they should be together in code. > > I suspect that `split_thru_phi` was broken for a long time already. But JDK26 changes in SuperWord started to check inputs of all nodes in `_body`, and that fails with dead nodes. 
> > Future Work: > - Continue work on making `VerifyLoopOptimizations` work again, we should assert that there are no dead nodes in the `_body`. We may do that with the following task, or a subsequent one. > - [JDK-8370332](https://bugs.openjdk.org/browse/JDK-8370332) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Refactor for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27955/files - new: https://git.openjdk.org/jdk/pull/27955/files/833085f2..02c411f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27955&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27955&range=02-03 Stats: 78 lines in 4 files changed: 46 ins; 31 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27955.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27955/head:pull/27955 PR: https://git.openjdk.org/jdk/pull/27955 From epeter at openjdk.org Mon Nov 3 12:37:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Nov 2025 12:37:45 GMT Subject: RFR: 8370332: C2 SuperWord: SIGSEGV because PhaseIdealLoop::split_thru_phi left dead nodes in loop _body [v2] In-Reply-To: References: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> Message-ID: On Mon, 3 Nov 2025 09:25:33 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> allow unique out with multiple uses > > That looks good to me and I agree to add verification for that case with `VerifyLoopOptimizations` at some point. @chhagedorn Thanks for reviewing and the suggestions! I addressed them all. Does it look good to you now? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27955#issuecomment-3480307003 From chagedorn at openjdk.org Mon Nov 3 12:43:37 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 3 Nov 2025 12:43:37 GMT Subject: RFR: 8370332: C2 SuperWord: SIGSEGV because PhaseIdealLoop::split_thru_phi left dead nodes in loop _body [v4] In-Reply-To: References: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> Message-ID: On Mon, 3 Nov 2025 12:31:43 GMT, Emanuel Peter wrote: >> Analysis: >> `split_thru_phi` can split a node out of the loop, through some loop phi. As a consequence, that node and the phi we split through can become dead. But `split_thru_phi` did not have any logic to yank the dead node and phi from the `_body`. If this happens in the same loop-opts-phase as a later SuperWord, and that SuperWord pass somehow accesses that loop `_body`, then we may find dead nodes, which is not expected. >> >> It is not ok that `split_thru_phi` leaves dead nodes in the `_body`, so they have to be yanked. >> >> What I did additionally: I went through all uses of `split_thru_phi`, and moved the `replace_node` from the call-site to the method itself. Removing the node and yanking from `_body` conceptually belongs together, so they should be together in code. >> >> I suspect that `split_thru_phi` was broken for a long time already. But JDK26 changes in SuperWord started to check inputs of all nodes in `_body`, and that fails with dead nodes. >> >> Future Work: >> - Continue work on making `VerifyLoopOptimizations` work again, we should assert that there are no dead nodes in the `_body`. We may do that with the following task, or a subsequent one. 
>> - [JDK-8370332](https://bugs.openjdk.org/browse/JDK-8370332) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Refactor for Christian Even better to further extract `unique_multiple_edges_out_or_null()` - looks good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27955#pullrequestreview-3410858827 From roland at openjdk.org Mon Nov 3 13:06:53 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 3 Nov 2025 13:06:53 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v2] In-Reply-To: References: Message-ID: > In test cases, `mh` is initially not constant so the method handle > invoke can't be inlined. It is later found to be constant, so it can > be turned into a direct call by > `Compile::process_late_inline_calls_no_inline()`. In the meantime, the > `CallNode` for the mh invoke is cloned (by loop switching). In the > process, only a shallow copy of the `JVMState` for the call is > made. The initial `CallNode` is the first to be processed by > `Compile::process_late_inline_calls_no_inline()` and that causes that > `CallNode` to become dead. The cloned `CallNode` is then > processed. The `JVMState` for that one references the initial > `CallNode` in its caller's `JVMState`. Because that node is dead, that > causes a crash. The fix I propose is to make a deep copy of the > `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is > assigned to the node. > > The other failure I see with these tests is: > > > # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 > # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! 
> > > because even though the `CallNode` is cloned, there's still only one > late inline recorded. The fix here is to increment > `_number_of_mh_late_inlines` when the node is cloned. > > This was reported by the netty developers. Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - more - more ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28088/files - new: https://git.openjdk.org/jdk/pull/28088/files/f48281f5..4757b9c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=00-01 Stats: 9 lines in 4 files changed: 3 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28088.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28088/head:pull/28088 PR: https://git.openjdk.org/jdk/pull/28088 From roland at openjdk.org Mon Nov 3 13:11:04 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 3 Nov 2025 13:11:04 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v3] In-Reply-To: References: Message-ID: > In test cases, `mh` is initially not constant so the method handle > invoke can't be inlined. It is later found to be constant, so it can > be turned into a direct call by > `Compile::process_late_inline_calls_no_inline()`. In the meantime, the > `CallNode` for the mh invoke is cloned (by loop switching). In the > process, only a shallow copy of the `JVMState` for the call is > made. The initial `CallNode` is the first to be processed by > `Compile::process_late_inline_calls_no_inline()` and that causes that > `CallNode` to become dead. The cloned `CallNode` is then > processed. The `JVMState` for that one references the initial > `CallNode` in its caller's `JVMState`. Because that node is dead, that > causes a crash. 
The fix I propose is to make a deep copy of the > `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is > assigned to the node. > > The other failure I see with these tests is: > > > # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 > # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! > > > because even though the `CallNode` is cloned, there's still only one > late inline recorded. The fix here is to increment > `_number_of_mh_late_inlines` when the node is cloned. > > This was reported by the netty developers. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: more ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28088/files - new: https://git.openjdk.org/jdk/pull/28088/files/4757b9c2..1a646503 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=01-02 Stats: 4 lines in 2 files changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28088.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28088/head:pull/28088 PR: https://git.openjdk.org/jdk/pull/28088 From roland at openjdk.org Mon Nov 3 13:15:57 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 3 Nov 2025 13:15:57 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v3] In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 21:27:22 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> more > > src/hotspot/share/opto/node.cpp line 567: > >> 565: n->as_Call()->set_generator(cloned_cg); >> 566: if (cloned_cg->is_mh_late_inline()) { >> 567: C->inc_number_of_mh_late_inlines(); > > Do you need to decrement the counter when a CallNode with 
`generator()->is_mh_late_inline()` goes dead? I think that would make sense. But the only use of that counter (excluding asserts) seems to be: https://github.com/openjdk/jdk/blob/ef464d69399e50aee126a4756fe9a9a19e44d3c4/src/hotspot/share/opto/compile.cpp#L829 Maybe, then, it's simpler to not bother with maintaining an accurate count. See new commits. 8352963 added a new call `inc_number_of_mh_late_inlines()` that I remove here because I don't think it's needed. I had a look at the PR for that one and I don't see it discussed. @dafedafe do you remember why you added it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28088#discussion_r2486445565 From roland at openjdk.org Mon Nov 3 13:15:59 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 3 Nov 2025 13:15:59 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v3] In-Reply-To: References: Message-ID: <_1u3etcPJOGDantVBorvctDOnlz0leYjzY0oVWs_DGM=.5d668961-dcc6-4781-88c1-21e18bdcb106@github.com> On Sat, 1 Nov 2025 04:42:54 GMT, SendaoYan wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> more > > test/hotspot/jtreg/compiler/inlining/TestLateMHClonedCallNode.java line 28: > >> 26: * @bug 8370939 >> 27: * @summary C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() >> 28: * @run main/othervm -XX:-BackgroundCompilation -XX:CompileOnly=TestLateMHClonedCallNode::test1 -XX:CompileOnly=TestLateMHClonedCallNode::test2 TestLateMHClonedCallNode > > Maybe we can split this as two lines Done in new commits. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28088#discussion_r2486447258 From hgreule at openjdk.org Mon Nov 3 13:43:52 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 3 Nov 2025 13:43:52 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model In-Reply-To: References: Message-ID: On Tue, 14 Oct 2025 16:10:22 GMT, Emanuel Peter wrote: > Note: this looks like a large change, but only about 400-500 lines are VM changes. 2.5k comes from new tests. > > Finally: after a long list of refactorings, we can implement the Cost-Model. The refactorings and this implementation was first PoC'd here: https://github.com/openjdk/jdk/pull/20964 > > Main goal: > - Carefully allow the vectorization of reduction cases that lead to speedups, and prevent those that do not (or may cause regressions). > - Open up new vectorization opportunities in the future, that introduce expensive vector nodes that are only profitable in some cases but not others. > > **Why cost-model?** > > Usually, vectorization leads to speedups because we replace multiple scalar operations with a single vector operation. The scalar and vector operation have a very similar cost per instruction, and so going from 4 scalar ops to a single vector op may yield a 4x speedup. This is a bit simplistic, but the general idea. > > But: some vector ops are expensive. Sometimes, the vector op can be more expensive than the multiple scalar ops it replaces. This is the case with some reduction ops. Or we may introduce a vector op that does not have any corresponding scalar op (e.g. in the case of shuffle). This prevents simple heuristics that only focus on single operations. > > Weighing the total cost of the scalar loop vs the vector loop allows us a more "holistic" approach. There may be expensive vector ops, but other cheaper vector ops may still make it profitable. > > **Implementation** > > Items: > - New `VTransform::is_profitable`: checks cost-model and some other cost related checks. 
> - `VLoopAnalyzer::cost`: scalar loop cost > - `VTransformGraph::cost`: vector loop cost > - Old reduction heuristic with `_num_work_vecs` and `_num_reductions` used to count check for "simple" reductions where the only "work" vector was the reduction itself. Reductions were not considered profitable if they were "simple". I was able to lift those restrictions. > - Adapted existing tests. > - Wrote a new comprehensive test, matching the related JMH benchmark, which we use below. > > **Testing** > Regular correctness testing, and performance testing. In addition to the JMH micro benchmarks below. > > ------------------------------ > > **Some History** > > I have been bothered by "simple" reductions not vectorizing for a long time. It was also a part of [my JVMLS2025 presentation](https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/). > > During JDK9, reductions were first vectorized, but then restricted for... Nice work :) src/hotspot/share/opto/vectorization.cpp line 544: > 542: } > 543: > 544: // Cost-model heuristic for nodes that do not contribute to computatinal Suggestion: // Cost-model heuristic for nodes that do not contribute to computational src/hotspot/share/opto/vectorization.cpp line 634: > 632: // Each reduction is composed of multiple instructions, each estimated with a unit cost. > 633: // Linear: shuffle and reduce Recursive: shuffle and reduce > 634: float c = requires_strict_order ? 2 * vlen : 2 * exact_log2(vlen); "unit cost" sounds a bit too simple given that there is some kind of estimation going on already. Maybe it would make sense to add some discussion how strict order affects the shape of the resulting vectorized code? I assume cases where the reduction can be moved after the loop are covered somewhere else? 
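The reduction cost expression quoted above can be read as follows: a reduction that must preserve evaluation order combines the lanes one after another, so its cost grows linearly with `vlen`, while a reduction that may be reassociated can halve the vector repeatedly, so its cost grows with `log2(vlen)`. Each step is estimated as one shuffle plus one reduce, hence the factor 2. Below is a minimal Java sketch of that arithmetic only, assuming `vlen` is a power of two. The class and method names are hypothetical and this is not the HotSpot implementation.

```java
// Hypothetical illustration of the quoted expression
//   requires_strict_order ? 2 * vlen : 2 * exact_log2(vlen)
// The names below are made up for this sketch and do not exist in HotSpot.
final class ReductionCostSketch {

    // Estimated cost of one vector reduction over vlen lanes.
    // Assumption: vlen is a power of two and at least 2.
    static float reductionCost(int vlen, boolean requiresStrictOrder) {
        if (vlen < 2 || Integer.bitCount(vlen) != 1) {
            throw new IllegalArgumentException("vlen must be a power of two >= 2");
        }
        // For a power of two, the number of trailing zero bits is its exact log2.
        int log2 = Integer.numberOfTrailingZeros(vlen);
        // Linear: one shuffle + one reduce per lane.
        // Recursive: one shuffle + one reduce per halving step.
        return requiresStrictOrder ? 2 * vlen : 2 * log2;
    }

    public static void main(String[] args) {
        for (int vlen = 2; vlen <= 16; vlen *= 2) {
            System.out.println("vlen=" + vlen
                    + " strict=" + reductionCost(vlen, true)
                    + " reassociable=" + reductionCost(vlen, false));
        }
    }
}
```

For `vlen = 8` this gives 16 for the strict-order case versus 6 for the reassociable case, which shows why strict-order reductions are much harder to make profitable as vectors get wider.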
------------- PR Review: https://git.openjdk.org/jdk/pull/27803#pullrequestreview-3411055615 PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2486505401 PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2486504265 From epeter at openjdk.org Mon Nov 3 13:58:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Nov 2025 13:58:55 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v2] In-Reply-To: References: Message-ID: > Note: this looks like a large change, but only about 400-500 lines are VM changes. 2.5k comes from new tests. > > Finally: after a long list of refactorings, we can implement the Cost-Model. The refactorings and this implementation was first PoC'd here: https://github.com/openjdk/jdk/pull/20964 > > Main goal: > - Carefully allow the vectorization of reduction cases that lead to speedups, and prevent those that do not (or may cause regressions). > - Open up new vectorization opportunities in the future, that introduce expensive vector nodes that are only profitable in some cases but not others. > > **Why cost-model?** > > Usually, vectorization leads to speedups because we replace multiple scalar operations with a single vector operation. The scalar and vector operation have a very similar cost per instruction, and so going from 4 scalar ops to a single vector op may yield a 4x speedup. This is a bit simplistic, but the general idea. > > But: some vector ops are expensive. Sometimes, the vector op can be more expensive than the multiple scalar ops it replaces. This is the case with some reduction ops. Or we may introduce a vector op that does not have any corresponding scalar op (e.g. in the case of shuffle). This prevents simple heuristics that only focus on single operations. > > Weighing the total cost of the scalar loop vs the vector loop allows us a more "holistic" approach. There may be expensive vector ops, but other cheaper vector ops may still make it profitable. 
> > **Implementation** > > Items: > - New `VTransform::is_profitable`: checks cost-model and some other cost related checks. > - `VLoopAnalyzer::cost`: scalar loop cost > - `VTransformGraph::cost`: vector loop cost > - Old reduction heuristic with `_num_work_vecs` and `_num_reductions` used to count check for "simple" reductions where the only "work" vector was the reduction itself. Reductions were not considered profitable if they were "simple". I was able to lift those restrictions. > - Adapted existing tests. > - Wrote a new comprehensive test, matching the related JMH benchmark, which we use below. > > **Testing** > Regular correctness testing, and performance testing. In addition to the JMH micro benchmarks below. > > ------------------------------ > > **Some History** > > I have been bothered by "simple" reductions not vectorizing for a long time. It was also a part of [my JVMLS2025 presentation](https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/). > > During JDK9, reductions were first vectorized, but then restricted for... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/vectorization.cpp Co-authored-by: Hannes Greule ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27803/files - new: https://git.openjdk.org/jdk/pull/27803/files/3f7ef58e..22dab5a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27803&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27803&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27803.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27803/head:pull/27803 PR: https://git.openjdk.org/jdk/pull/27803 From epeter at openjdk.org Mon Nov 3 14:12:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Nov 2025 14:12:40 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v3] In-Reply-To: References: Message-ID: > Note: this looks like a large change, but only about 400-500 lines are VM changes. 2.5k comes from new tests. > > Finally: after a long list of refactorings, we can implement the Cost-Model. The refactorings and this implementation was first PoC'd here: https://github.com/openjdk/jdk/pull/20964 > > Main goal: > - Carefully allow the vectorization of reduction cases that lead to speedups, and prevent those that do not (or may cause regressions). > - Open up new vectorization opportunities in the future, that introduce expensive vector nodes that are only profitable in some cases but not others. > > **Why cost-model?** > > Usually, vectorization leads to speedups because we replace multiple scalar operations with a single vector operation. The scalar and vector operation have a very similar cost per instruction, and so going from 4 scalar ops to a single vector op may yield a 4x speedup. This is a bit simplistic, but the general idea. > > But: some vector ops are expensive. 
Sometimes, the vector op can be more expensive than the multiple scalar ops it replaces. This is the case with some reduction ops. Or we may introduce a vector op that does not have any corresponding scalar op (e.g. in the case of shuffle). This prevents simple heuristics that only focus on single operations. > > Weighing the total cost of the scalar loop vs the vector loop allows us a more "holistic" approach. There may be expensive vector ops, but other cheaper vector ops may still make it profitable. > > **Implementation** > > Items: > - New `VTransform::is_profitable`: checks cost-model and some other cost related checks. > - `VLoopAnalyzer::cost`: scalar loop cost > - `VTransformGraph::cost`: vector loop cost > - Old reduction heuristic with `_num_work_vecs` and `_num_reductions` used to count check for "simple" reductions where the only "work" vector was the reduction itself. Reductions were not considered profitable if they were "simple". I was able to lift those restrictions. > - Adapted existing tests. > - Wrote a new comprehensive test, matching the related JMH benchmark, which we use below. > > **Testing** > Regular correctness testing, and performance testing. In addition to the JMH micro benchmarks below. > > ------------------------------ > > **Some History** > > I have been bothered by "simple" reductions not vectorizing for a long time. It was also a part of [my JVMLS2025 presentation](https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/). > > During JDK9, reductions were first vectorized, but then restricted for... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: More comments for SirYwell ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27803/files - new: https://git.openjdk.org/jdk/pull/27803/files/22dab5a4..d79df4fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27803&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27803&range=01-02 Stats: 6 lines in 2 files changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27803.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27803/head:pull/27803 PR: https://git.openjdk.org/jdk/pull/27803 From epeter at openjdk.org Mon Nov 3 14:12:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Nov 2025 14:12:42 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v3] In-Reply-To: References: Message-ID: On Mon, 3 Nov 2025 13:41:13 GMT, Hannes Greule wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> More comments for SirYwell > > Nice work :) @SirYwell Thanks for the comments and suggestions :) I sent a small update, hope that helps. And I also sent some GitHub comments that may help additionally to understand some of the small changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27803#issuecomment-3480747593 From roland at openjdk.org Mon Nov 3 15:43:50 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 3 Nov 2025 15:43:50 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v17] In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: > An `Initialize` node for an `Allocate` node is created with a memory > `Proj` of adr type raw memory. 
In order for stores to be captured, the > memory state out of the allocation is a `MergeMem` with slices for the > various object fields/array element set to the raw memory `Proj` of > the `Initialize` node. If `Phi`s need to be created during later > transformations from this memory state, the `Phi` for a particular > slice gets its adr type from the type of the `Proj` which is raw > memory. If during macro expansion, the `Allocate` is found to have no > use and so can be removed, the `Proj` out of the `Initialize` is > replaced by the memory state on input to the `Allocate`. A `Phi` for > some slice for a field of an object will end up with the raw memory > state on input to the `Allocate` node. As a result, memory state at > the `Phi` is incorrect and incorrect execution can happen. > > The fix I propose is, rather than have a single `Proj` for the memory > state out of the `Initialize` with adr type raw memory, to use one > `Proj` per slice added to the memory state after the `Initialize`. Each > of the `Proj` should return the right adr type for its slice. For that > I propose having a new type of `Proj`: `NarrowMemProj` that captures > the right adr type. > > Logic for the construction of the `Allocate`/`Initialize` subgraph is > tweaked so the right adr type captured in its own `NarrowMemProj` is > added to the memory subgraph. Code that removes an allocation or moves > it also has to be changed so it correctly takes the multiple memory > projections out of the `Initialize` node into account. > > One tricky issue is that when EA splits types for a scalar replaceable > `Allocate` node: > > 1- the adr type captured in the `NarrowMemProj` becomes out of sync > with the type of the slices for the allocation > > 2- before EA, the memory state for one particular field out of the > `Initialize` node can be used for a `Store` to the just allocated > object or some other.
So we can have a chain of `Store`s, some to > the newly allocated object, some to some other objects, all of them > using the state of `NarrowMemProj` out of the `Initialize`. After > split unique types, the `NarrowMemProj` is for the slice of a > particular allocation. So `Store`s to some other objects shouldn't > use that memory state but the memory state before the `Allocate`. > > For that, I added logic to update the adr type of `NarrowMemProj` > during split unique types and update the memory input of `Store`s that > don't depend on the memory state ... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24570/files - new: https://git.openjdk.org/jdk/pull/24570/files/957be06e..755bb766 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=15-16 Stats: 50 lines in 3 files changed: 3 ins; 39 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570 PR: https://git.openjdk.org/jdk/pull/24570 From roland at openjdk.org Mon Nov 3 15:43:50 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 3 Nov 2025 15:43:50 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v16] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Tue, 28 Oct 2025 17:10:14 GMT, Emanuel Peter wrote: > I had a few minutes to look over the `apply_..` solutions. I left a few comments, and hope that we can make the code just a little slicker still ;) @eme64 I pushed an update based on your comments. Can you have another look? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-3481179679 From dfenacci at openjdk.org Mon Nov 3 15:47:25 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 3 Nov 2025 15:47:25 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel In-Reply-To: References: Message-ID: On Mon, 3 Nov 2025 06:47:45 GMT, Christian Hagedorn wrote: >> ## Issue >> Today, the only practical ways to run IR Framework scenarios in parallel seem to be: >> * spawning threads manually in a single test, or >> * letting jtreg treat each scenario as a separate test (the only way to potentially distribute across hosts). >> >> This makes it a bit cumbersome to use host CPU cores efficiently when running multiple scenarios within the same test. >> >> ## Change >> This change introduces a method `TestFramework::startParallel` to execute multiple scenarios concurrently. The implementation: >> * launches one task per scenario and runs them in parallel (by default, the maximum concurrency should match the host's available cores) >> * captures each task's `System.out` into a dedicated buffer and flushes it when the task completes to avoid interleaved output between scenarios (Note: only call paths within the `compiler.lib.ir_framework` package are modified to per-task output streams. `ProcessTools` methods still write directly to `stdout`, so their output may interleave). >> * adds an option `-DForceSequentialScenarios=true` to force all scenarios to be run sequentially.
>> >> ## Testing >> * Tier 1-3+ >> * explicit `ir_framework.tests` runs >> * added IR-Framework test `TestDForceSequentialScenarios.java` to test forcing sequential testing (checking the output order) and added a parallel run to `TestScenarios.java` (as well as adding `ForceSequentialScenarios` flag to `TestDFlags.java`) >> >> As reference: a comparison of the execution time between sequential and parallel of all IR-Framework tests using scenarios on our machines (linux x64/aarch64, macosx x64/aarch64, windows x64 with different number of cores, so the results for a single test might not be relevant) gave me an average speedup of 1.9. > > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 762: > >> 760: outcome = new Outcome(scenario, null, null); >> 761: } catch (TestFormatException e) { >> 762: outcome = new Outcome(scenario, e, null); > > Why do you collect the format exceptions here and only throw them later? Is a fail-fast not possible? Actually it is (maybe a bit more tricky but possible). Changing this... 
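The approach described in the PR text above (one task per scenario, a dedicated output buffer per task, failures collected rather than thrown immediately) can be sketched in plain Java. This is only an illustration under stated assumptions: `ScenarioTask`, `Outcome` and `runAll` are hypothetical names chosen for this sketch, and it does not reproduce the actual `TestFramework` code or its `Outcome` class.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the IR Framework implementation.
final class ParallelScenarioSketch {

    // Stand-in for "run one scenario", writing only to the stream it is given.
    interface ScenarioTask {
        void run(PrintStream out) throws Exception;
    }

    // Captured output and failure (null on success) of one task.
    record Outcome(int index, String output, Exception failure) {}

    static List<Outcome> runAll(List<ScenarioTask> tasks) throws Exception {
        // Assumption: cap concurrency at the number of available cores.
        int threads = Math.max(1, Math.min(tasks.size(),
                Runtime.getRuntime().availableProcessors()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Outcome>> futures = new ArrayList<>();
            for (int i = 0; i < tasks.size(); i++) {
                final int index = i;
                final ScenarioTask task = tasks.get(i);
                futures.add(pool.submit(() -> {
                    // Each task gets its own buffer, so output never interleaves.
                    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                    Exception failure = null;
                    try (PrintStream out =
                            new PrintStream(buffer, true, StandardCharsets.UTF_8)) {
                        task.run(out);
                    } catch (Exception e) {
                        failure = e; // collected here, reported by the caller
                    }
                    return new Outcome(index,
                            buffer.toString(StandardCharsets.UTF_8), failure);
                }));
            }
            // Collect in submission order so the result list is deterministic.
            List<Outcome> outcomes = new ArrayList<>();
            for (Future<Outcome> future : futures) {
                outcomes.add(future.get());
            }
            return outcomes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        ScenarioTask ok = out -> out.println("scenario 0 passed");
        ScenarioTask bad = out -> { throw new IllegalStateException("bad format"); };
        for (Outcome o : runAll(List.of(ok, bad))) {
            System.out.print(o.output());
            if (o.failure() != null) {
                System.out.println("scenario " + o.index() + " failed: " + o.failure());
            }
        }
    }
}
```

Collecting failures keeps the buffered output of every scenario available for reporting. A fail-fast variant, as asked about in the review, would instead cancel the remaining futures as soon as one task reports a failure, which is a design trade-off rather than something this sketch settles.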
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2486955518 From duke at openjdk.org Mon Nov 3 16:11:28 2025 From: duke at openjdk.org (Ruben) Date: Mon, 3 Nov 2025 16:11:28 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Wed, 16 Jul 2025 14:41:58 GMT, Andrew Haley wrote: >> AtomicLong.CompareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg: >> >> ;; cmpxchg { >> 0x0000e708d144cf60: mov x8, x2 >> 0x0000e708d144cf64: casal x8, x3, [x0] >> 0x0000e708d144cf68: cmp x8, x2 >> ;; 0x1F1F1F1F1F1F1F1F >> 0x0000e708d144cf6c: mov x8, #0x1f1f1f1f1f1f1f1f >> ;; } cmpxchg >> 0x0000e708d144cf70: cset x8, ne // ne = any >> 0x0000e708d144cf74: dmb ish >> >> >> According to the Oracle Java Specification, AtomicLong.CompareAndSet [1] has the same memory effects as specified by VarHandle.compareAndSet which has the following effects: [2] >> >>> Atomically sets the value of a variable to the >>> newValue with the memory semantics of setVolatile if >>> the variable's current value, referred to as the witness >>> value, == the expectedValue, as accessed with the memory >>> semantics of getVolatile. >> >> >> >> Hence the release on the store due to setVolatile only occurs if the compare is successful. Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen. >> >> Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant) >> >> This is also reflected by C2 not having a dmb for the same respective method. 
>> >> [1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) >> [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...) > > I think we still need a DMB after non-LSE CMPXCHG, which gets failures without this DMB: > > > AArch64 MP > > { > 0:X0=x; 0:X2=y; > 1:X0=y; 1:X4=x; > } > P0 | P1 ; > LDAR W1,[X0] | MOV W2,#1 ; > | L0: ; > LDR W3,[X2] | LDAXR W1,[X0] ; > | STLXR W8,W2,[X0] ; > | CBNZ W8,L0; > | DMB ISH; > | MOV W3,#1 ; > | STR W3,[X4] ; > exists (0:X1=1 /\ 0:X3=0 /\ 1:X1=0) Hi @theRealAph, I've pushed changes for this PR to a new branch https://github.com/openjdk/jdk/compare/master...ruben-arm:jdk:pr-8360654 as Samuel is currently not available. Once he is back, he can update this PR's branch. In the meanwhile, I'm planning to run more of the `jcstress` testing. I'd appreciate your feedback on the version in the new branch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26000#issuecomment-3481315382 From bmaillard at openjdk.org Mon Nov 3 17:17:33 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 3 Nov 2025 17:17:33 GMT Subject: RFR: 8362832: compiler/macronodes/TestTopInMacroElimination.java hits assert(false) failed: unexpected node [v3] In-Reply-To: References: Message-ID: > This PR prevents hitting an assert caused by encountering `top` while following the memory > slice associated with a field when eliminating allocations in macro node elimination. This situation > is the result of another elimination (boxing node elimination) that happened at the same > macro expansion iteration. > > ### Analysis > > The issue appears in the macro expansion phase. We have a nested `synchronized` block, > with the outer block synchronizing on `new A()` and the inner one on `TestTopInMacroElimination.class`. 
> In the inner `synchronized` block we have an `Integer.valueOf` call in a loop. > > In `PhaseMacroExpand::eliminate_boxing_node` we are getting rid of the `Integer.valueOf` > call, as it is a non-escaping boxing node. After having eliminated the call, > `PhaseMacroExpand::process_users_of_allocation` takes care of the users of the removed node. > There, we replace usages of the fallthrough memory projection with `top`. > > In the same macro expansion iteration, we later attempt to get rid of the `new A()` allocation > in `PhaseMacroExpand::create_scalarized_object_description`. There, we have to make > sure that all safepoints can still see the object fields as if the allocation was never deleted. > For this, we attempt to find the last value on the slice of each specific field (`a` > in this case). Because field `a` is never written to, and it is not explicitly initialized, > there is no `Store` associated with it and not even a dedicated memory slice (we end up > taking the `Bot` input on `MergeMem` nodes). By going up the memory chain, we eventually > encounter the `MemBarReleaseLock` whose input was set to `top`. This is where the assert > is hit. > > ### Proposed Fix > > In the end I opted for a fix analogous to the one for the similar [JDK-8325030](https://git.openjdk.org/jdk/pull/23104). > If we get `top` from `scan_mem_chain` in `PhaseMacroExpand::value_from_mem`, then we can safely > return `top` as well. This means that the safepoint will have `top` as data input, but this will > eventually be cleaned up by the next round of IGVN. > > Another (tempting) option would have been to simply return `nullptr` from `PhaseMacroExpand::value_from_mem` when encountering `top`. However, this would result in bailing > out from eliminating this allocation temporarily and effectively delaying it to a subsequent > macro expansion round.
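Editorial illustration (not part of the original message): the Analysis above describes the code shape in prose. A rough Java sketch of that shape might look like the following. It is only an assumption about what such code could look like, it is not the actual `TestTopInMacroElimination.java` test, and the names `NestedSyncSketch`, `A` and `work` are hypothetical. Running it is not claimed to reproduce the assert.

```java
// Hypothetical sketch of the code shape described in the Analysis section.
// This is NOT the real regression test; all names are made up for illustration.
final class NestedSyncSketch {
    static final class A {
        int a; // never written to and not explicitly initialized
    }

    static int work(int iterations) {
        int sum = 0;
        // Outer lock on a fresh allocation that does not escape.
        synchronized (new A()) {
            // Inner lock on a class object.
            synchronized (NestedSyncSketch.class) {
                for (int i = 0; i < iterations; i++) {
                    // Boxing call whose result does not escape the loop body.
                    Integer boxed = Integer.valueOf(i);
                    sum += boxed;
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int result = 0;
        // Repeat so that a JIT compiler has a chance to compile work().
        for (int i = 0; i < 20_000; i++) {
            result = work(100);
        }
        System.out.println(result); // sum of 0..99
    }
}
```

Both the `new A()` allocation and the `Integer.valueOf` result are candidates for elimination in such a shape, which is the combination the message discusses.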
> > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8362832) > - [x] tier1-4, plus some internal testing > > Thank you for reviewing! Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: Make comment more specific ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27903/files - new: https://git.openjdk.org/jdk/pull/27903/files/0955e23d..0cd7417d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27903&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27903&range=01-02 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27903.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27903/head:pull/27903 PR: https://git.openjdk.org/jdk/pull/27903 From bmaillard at openjdk.org Mon Nov 3 17:17:36 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 3 Nov 2025 17:17:36 GMT Subject: RFR: 8362832: compiler/macronodes/TestTopInMacroElimination.java hits assert(false) failed: unexpected node [v2] In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 12:27:21 GMT, Emanuel Peter wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/macro.cpp >> >> Co-authored-by: Daniel Lundén > > src/hotspot/share/opto/macro.cpp line 506: > >> 504: } else if (mem->is_top()) { >> 505: // The slice is on a dead path. Returning top prevents bailing out >> 506: // from the elimination, and IGVN can later clean up. > > You could make it more specific, and say what you say in your PR description: > `return nullptr` would lead to elimination bailout, but we want to prevent that. Just forwarding the `top` is also legal, and `IGVN` can just clean things up, and remove whatever receives top.
> > Does this mean that there could be paths that don't get `top`, and so for those paths it is nice that we are able to remove the allocation, right? Done, thanks for the suggestion! > Does this mean that there could be paths that don't get top, and so for those paths it is nice that we are able to remove the allocation, right? Yes, exactly. We don't want a dead path to delay the removal of the allocation that would also have an effect on paths that are not dead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27903#discussion_r2487260068 From vlivanov at openjdk.org Mon Nov 3 18:38:13 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 3 Nov 2025 18:38:13 GMT Subject: RFR: 8280469: C2: CHA support for interface calls when inlining through method handle linker [v2] In-Reply-To: References: Message-ID: <2GnXbYUICH6o4udyZQEqlCL6-jz9-CzSnUrZmkSbP4E=.a1d35eb5-5a62-4aff-9544-e1e0716761db@github.com> > Expand the optimization for interface calls introduced by [JDK-6986483](https://bugs.openjdk.org/browse/JDK-6986483) to calls through `MethodHandle.linkToInterface`. > > The implementation is straightforward except the fact that symbolic information is lost during `MemberName` resolution. The fix uses declaring class instead, but it's more conservative than what is done for invokeinterface case. 
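Editorial illustration (not part of the original message): for readers less familiar with the kind of call site JDK-8280469 is about, the sketch below invokes an interface method through a `MethodHandle`. The assumption here, which the thread itself does not spell out, is that a handle obtained with `findVirtual` on an interface type is dispatched through the `MethodHandle.linkToInterface` linker. The names `LinkToInterfaceSketch`, `Shape` and `Triangle` are hypothetical, and nothing in this sketch shows whether C2 actually applies CHA to the call.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical example; names are invented for illustration only.
final class LinkToInterfaceSketch {
    interface Shape {
        int sides();
    }

    // The only implementor of Shape in this sketch.
    static final class Triangle implements Shape {
        @Override
        public int sides() {
            return 3;
        }
    }

    // A constant method handle for the interface method Shape.sides().
    static final MethodHandle SIDES;
    static {
        try {
            SIDES = MethodHandles.lookup()
                    .findVirtual(Shape.class, "sides", MethodType.methodType(int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int callThroughHandle(Shape s) {
        try {
            // invokeExact requires the exact handle type (Shape)int at the call site.
            return (int) SIDES.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callThroughHandle(new Triangle()));
    }
}
```

With a single implementor loaded, a plain `invokeinterface` call to `sides()` is the case covered by the earlier JDK-6986483 optimization; the handle-based call above is the variant this PR targets.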
> > Testing: hs-tier1 - hs-tier5 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: naming ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28094/files - new: https://git.openjdk.org/jdk/pull/28094/files/0bf3f2b6..12561a6d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28094&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28094&range=00-01 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28094.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28094/head:pull/28094 PR: https://git.openjdk.org/jdk/pull/28094 From vlivanov at openjdk.org Mon Nov 3 18:38:16 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 3 Nov 2025 18:38:16 GMT Subject: RFR: 8280469: C2: CHA support for interface calls when inlining through method handle linker [v2] In-Reply-To: References: Message-ID: <8rnvPrYpvmjycfDubWKkFzGoe9yCUMY7Fs5eOsY42fA=.807e4684-989a-4014-b48c-057dd8362086@github.com> On Fri, 31 Oct 2025 22:33:30 GMT, Chen Liang wrote: > I wonder if the test verifies the declared_interface for the new monomorphic target There are test cases which depend on `contextClass()` which is equivalent to `declared_interface` in corresponding VM code. > src/hotspot/share/opto/doCall.cpp line 345: > >> 343: if (orig_callee->intrinsic_id() == vmIntrinsics::_linkToInterface) { >> 344: // MemberName doesn't keep symbolic information once resolution is over, but >> 345: // resolved method holder can be used as a conservative approximation. > > Is "symbolic information" the referenced interface and the "resolved method holder" the declaring interface? I think including "referenced" vs "declared" would be more clear. Indeed, the terminology is inconsistent (even across hotspot code base). Runtime code uses "resolved class" (REFC) and "declaring class" (DECC) while, as you can see here, "declared class" means REFC in compiler code. 
I rewrote the comment trying to make it clearer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28094#issuecomment-3481956676 PR Review Comment: https://git.openjdk.org/jdk/pull/28094#discussion_r2487478970 From kxu at openjdk.org Mon Nov 3 19:12:39 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 3 Nov 2025 19:12:39 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v19] In-Reply-To: References: Message-ID: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759).
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: follow-up review 3383037106 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24458/files - new: https://git.openjdk.org/jdk/pull/24458/files/ead1ab34..7395eb99 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=17-18 Stats: 72 lines in 3 files changed: 8 ins; 16 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From kxu at openjdk.org Mon Nov 3 19:25:10 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 3 Nov 2025 19:25:10 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v20] In-Reply-To: References: Message-ID: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 38 commits: - fix bad merge - Merge remote-tracking branch 'origin/master' into counted-loop-refactor # Conflicts: # src/hotspot/share/opto/loopnode.cpp - follow-up review 3383037106 - Update src/hotspot/share/opto/loopnode.hpp Co-authored-by: Christian Hagedorn - mark LoopExitTest::is_valid_with_bt() const - fix iv increment basic type and truncated increment check - add safepoint opcode condition - follow-up review 3321712957 - 8354383: C2: enable sinking of Type nodes out of loop Reviewed-by: chagedorn, thartmann (cherry picked from commit a2f99fd88bd03337e1ba73b413ffe4e39f3584cf) - Merge remote-tracking branch 'origin/master' into counted-loop-refactor # Conflicts: # src/hotspot/share/opto/loopnode.cpp - ... and 28 more: https://git.openjdk.org/jdk/compare/9f972008...de71e7c8 ------------- Changes: https://git.openjdk.org/jdk/pull/24458/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=19 Stats: 1197 lines in 3 files changed: 606 ins; 283 del; 308 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From dlunden at openjdk.org Mon Nov 3 20:00:27 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 3 Nov 2025 20:00:27 GMT Subject: RFR: 8362832: compiler/macronodes/TestTopInMacroElimination.java hits assert(false) failed: unexpected node [v3] In-Reply-To: References: Message-ID: On Mon, 3 Nov 2025 17:17:33 GMT, Benoît Maillard wrote: >> This PR prevents hitting an assert caused by encountering `top` while following the memory >> slice associated with a field when eliminating allocations in macro node elimination. This situation >> is the result of another elimination (boxing node elimination) that happened at the same >> macro expansion iteration. >> >> ### Analysis >> >> The issue appears in the macro expansion phase.
We have a nested `synchronized` block, >> with the outer block synchronizing on `new A()` and the inner one on `TestTopInMacroElimination.class`. >> In the inner `synchronized` block we have an `Integer.valueOf` call in a loop. >> >> In `PhaseMacroExpand::eliminate_boxing_node` we are getting rid of the `Integer.valueOf` >> call, as it is a non-escaping boxing node. After having eliminated the call, >> `PhaseMacroExpand::process_users_of_allocation` takes care of the users of the removed node. >> There, we replace usages of the fallthrough memory projection with `top`. >> >> In the same macro expansion iteration, we later attempt to get rid of the `new A()` allocation >> in `PhaseMacroExpand::create_scalarized_object_description`. There, we have to make >> sure that all safepoints can still see the object fields as if the allocation was never deleted. >> For this, we attempt to find the last value on the slice of each specific field (`a` >> in this case). Because field `a` is never written to, and it is not explicitly initialized, >> there is no `Store` associated with it and not even a dedicated memory slice (we end up >> taking the `Bot` input on `MergeMem` nodes). By going up the memory chain, we eventually >> encounter the `MemBarReleaseLock` whose input was set to `top`. This is where the assert >> is hit. >> >> ### Proposed Fix >> >> In the end I opted for a fix analogous to the one for the similar [JDK-8325030](https://git.openjdk.org/jdk/pull/23104). >> If we get `top` from `scan_mem_chain` in `PhaseMacroExpand::value_from_mem`, then we can safely >> return `top` as well. This means that the safepoint will have `top` as data input, but this will >> eventually be cleaned up by the next round of IGVN. >> >> Another (tempting) option would have been to simply return `nullptr` from `PhaseMacroExpand::value_from_mem` when encountering `top`.
However, this would result in bailing >> out from eliminating this allocation temporarily and effectively delaying it to a subsequent >> macro expansion round. >> >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8362832)... > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Make comment more specific Marked as reviewed by dlunden (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27903#pullrequestreview-3412686540 From vlivanov at openjdk.org Mon Nov 3 21:07:59 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 3 Nov 2025 21:07:59 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Thu, 30 Oct 2025 15:08:09 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix merge > > src/hotspot/share/opto/compile.cpp line 4029: > >> 4027: Node* in = n->in(j); >> 4028: if (in->is_DecodeNarrowPtr() && (is_uncommon || !in->has_non_debug_uses())) { >> 4029: n->set_req(j, in->in(1)); > > Can you say why you changed this code here? Is it equivalent? It's a refactoring and it is equivalent except reachability fence handling in `Node::has_non_debug_uses()`. > Could you assert dead->is_dead() here? Isn't it too strong here? `Node::is_dead()` requires all inputs to be nulled, but `PhaseIterGVN::remove_globally_dead_node()` zaps node input array with nulls.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2487849689 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2487851194 From vlivanov at openjdk.org Mon Nov 3 21:14:51 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 3 Nov 2025 21:14:51 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Thu, 30 Oct 2025 15:10:14 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix merge > > src/hotspot/share/opto/loopTransform.cpp line 76: > >> 74: return head()->as_OuterStripMinedLoop()->outer_loop_exit(); >> 75: } else { >> 76: // For now, conservatively report multiple loop exits exist. > > Can this happen? Do you have an example? The outer check is `is_loop()`, but special cases are for counted loops and strip-mined outer loops. So, any non-counted loop handling should land here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2487869750 From vlivanov at openjdk.org Mon Nov 3 21:26:43 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 3 Nov 2025 21:26:43 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Thu, 30 Oct 2025 15:40:51 GMT, Emanuel Peter wrote: >> Can you write a comment what `ctrl`? Is it the `referent_ctrl`? > > Ah no, in all cases I could see it was actually the `rf` itself, right? Why not give it a more specific name? > But it seems you are using it from different places. Can you find a better name? 
Both IGVN (`ReachabilityFenceNode::Identity()`) and `PhaseIdealLoop` perform redundant RF elimination, so `is_redundant_rf_helper` is there so they can share the same implementation. I could name it `is_redundant_rf`, but it doesn't look like an improvement to me. It's tailored specifically for those 2 particular use cases and it is not intended to be used outside `reachability.cpp`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2487894458 From vlivanov at openjdk.org Mon Nov 3 22:09:18 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 3 Nov 2025 22:09:18 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: <59uQ1nJm-iOr7-kNcauhuBC7wEQLs08e4Gi81SwJ9TU=.ffd0c0f9-abd4-483a-8967-d07dde2bd0a3@github.com> On Mon, 3 Nov 2025 21:05:35 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/opto/loopnode.hpp line 1150: >> >>> 1148: >>> 1149: void remove_dead_node(Node* dead) { >>> 1150: assert(dead->outcnt() == 0 && !dead->is_top(), "node must be dead"); >> >> Could you assert `dead->is_dead()` here? >> We should probably also not call this on a `CFG` node, otherwise we might destroy the "ctrl forwarding", see: https://git.openjdk.org/jdk/pull/27892 >> >> I'm only putting so much scrutiny here, because you are adding a new public method to `PhaseIdealLoop`, and that would require that it is clear how to use it, and not to use it. > >> Could you assert dead->is_dead() here? > > Isn't it too strong here? `Node::is_dead()` requires all inputs to be nulled, but `PhaseIterGVN::remove_globally_dead_node()` zaps node input array with nulls. > We should probably also not call this on a CFG node, otherwise we might destroy the "ctrl forwarding" Interesting. RF is a CFG node. How does broken ctrl forwarding manifest? Does `VerifyLoopOptimizations` catch it? 
I didn't see any failures with `-XX:+VerifyLoopOptimizations`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2487980936 From vlivanov at openjdk.org Mon Nov 3 22:09:21 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 3 Nov 2025 22:09:21 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Thu, 30 Oct 2025 15:54:05 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix merge > > src/hotspot/share/opto/reachability.cpp line 180: > >> 178: return false; // uncommon traps are exit points >> 179: } >> 180: return true; > > Looks like we have established "significance" by principle of exclusion. That feels a little brittle, what if there is yet another category we would have to exclude? Would that lead to correctness issues, or only be inefficient? > > Also: "significant" is a bit of a vague term. Significant for what? "reachability tracking purposes", of course, we are in `reachability.hpp` ;) > But can you be more specific? `is_significant_sfpt()` encodes a white list consisting of cases which can be safely ignored when it comes to reachability tracking. An overlooked case is a missed optimization opportunity. > But can you be more specific? Are you suggesting to expand the comment or change the name? Speaking of the name, it's a local definition inside `reachability.cpp`: // Detect safepoint nodes which are important for reachability tracking purposes. static bool is_significant_sfpt(Node* n) { The term `Significant` is declarative. Alternatively, an imperative term (like, "ignore" and `ignore_sfpt`) can be used. But I find the current version clearer.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2487976292 From vlivanov at openjdk.org Mon Nov 3 22:34:52 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 3 Nov 2025 22:34:52 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Thu, 30 Oct 2025 15:15:05 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix merge > > src/hotspot/share/opto/loopnode.cpp line 5119: > >> 5117: if (stop_early) { >> 5118: assert(do_expensive_nodes || do_optimize_reachability_fences, "why are we here?"); >> 5119: if (do_optimize_reachability_fences && optimize_reachability_fences()) { > > Can you explain why you call `optimize_reachability_fences` here and also below? The intention is to optimize RF nodes irrespective of whether loop optimizations are performed or not. (It mimics similar logic for expensive nodes.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2488049270 From vlivanov at openjdk.org Mon Nov 3 22:49:48 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 3 Nov 2025 22:49:48 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v20] In-Reply-To: References: Message-ID: > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. 
The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." 
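Editorial illustration (not part of the original message): as background for the API this PR optimizes, here is a minimal Java sketch of how `Reference.reachabilityFence()` is typically used, in the `try`/`finally` style suggested by its Javadoc. The classes `ReachabilityFenceSketch` and `Handle` and the method `useExternal` are hypothetical stand-ins; there is no cleaner or finalizer here, so the fence is purely illustrative, and the sketch says nothing about how C2 represents the fence internally.

```java
import java.lang.ref.Reference;

// Hypothetical example; names are invented for illustration only.
final class ReachabilityFenceSketch {
    // Stand-in for an object that owns some external resource identified by id.
    static final class Handle {
        private final long id;

        Handle(long id) {
            this.id = id;
        }

        long id() {
            return id;
        }
    }

    // Stand-in for work performed on the external resource.
    static long useExternal(long id) {
        return id * 2;
    }

    static long action(Handle h) {
        long id = h.id();
        try {
            // From here on only the raw id is used. Without the fence below, the
            // runtime could in principle treat h as unreachable at this point.
            return useExternal(id);
        } finally {
            // Keeps h strongly reachable until this point is reached.
            Reference.reachabilityFence(h);
        }
    }

    public static void main(String[] args) {
        System.out.println(action(new Handle(21)));
    }
}
```

In real code the object would usually have a `Cleaner` or similar mechanism that releases the external resource once the object becomes unreachable, which is why keeping it reachable across `useExternal` matters.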
> > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25315/files - new: https://git.openjdk.org/jdk/pull/25315/files/ed324159..0f5d3747 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=18-19 Stats: 30 lines in 4 files changed: 8 ins; 3 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From vlivanov at openjdk.org Mon Nov 3 22:49:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 3 Nov 2025 22:49:49 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Thu, 30 Oct 2025 15:04:56 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/escape.cpp line 1230: >> >>> 1228: SafePointNode* sfpt = safepoints.at(spi)->as_SafePoint(); >>> 1229: >>> 1230: sfpt->remove_non_debug_edges(non_debug_edges_worklist); >> >> This looks a bit "hacky". Can you add some code comments why we need to do it this way? > > Same for the other occurances ;) Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2488076378 From vlivanov at openjdk.org Mon Nov 3 22:49:51 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 3 Nov 2025 22:49:51 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Thu, 30 Oct 2025 15:43:06 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix merge > > src/hotspot/share/opto/reachability.cpp line 118: > >> 116: } >> 117: } else { >> 118: assert(rf_only, ""); > > Does `phase == nullptr` imply `rf_only`? If so, you should add an assert at the top of the method. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2488076845 From liach at openjdk.org Tue Nov 4 02:58:02 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 4 Nov 2025 02:58:02 GMT Subject: RFR: 8280469: C2: CHA support for interface calls when inlining through method handle linker [v2] In-Reply-To: <2GnXbYUICH6o4udyZQEqlCL6-jz9-CzSnUrZmkSbP4E=.a1d35eb5-5a62-4aff-9544-e1e0716761db@github.com> References: <2GnXbYUICH6o4udyZQEqlCL6-jz9-CzSnUrZmkSbP4E=.a1d35eb5-5a62-4aff-9544-e1e0716761db@github.com> Message-ID: <9mtsDCMffzknlk6Pflz3W_Whd1lvntBepTOaT8ckc9I=.25eb56fe-cfbc-4e50-8162-c56225e1b2e8@github.com> On Mon, 3 Nov 2025 18:38:13 GMT, Vladimir Ivanov wrote: >> Expand the optimization for interface calls introduced by [JDK-6986483](https://bugs.openjdk.org/browse/JDK-6986483) to calls through `MethodHandle.linkToInterface`. >> >> The implementation is straightforward except the fact that symbolic information is lost during `MemberName` resolution. The fix uses declaring class instead, but it's more conservative than what is done for invokeinterface case. 
>> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > naming This version looks good to me. Note I am not a professional c2 engineer so I won't leave an explicit approval. ------------- PR Review: https://git.openjdk.org/jdk/pull/28094#pullrequestreview-3413773864 From fjiang at openjdk.org Tue Nov 4 04:14:04 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 4 Nov 2025 04:14:04 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v5] In-Reply-To: References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: On Wed, 29 Oct 2025 09:48:16 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> Sanity tested and no issues with MAJIK t1 (with +VSC). >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into vsc > - Forgot fix format for VSAC > - Fixed format > - Label name > - li->mv, format, space > - Draft src/hotspot/cpu/riscv/riscv.ad line 1187: > 1185: } > 1186: > 1187: constexpr uint64_t MAJIK_DWORD = 0xabbaabbaabbaabbaull; Looking at `PhaseChaitin::dump_frame`, I found the hard-coded MAJIK word for x86. Should we add RISCV MAJIK DWORD here? 
https://github.com/openjdk/jdk/blob/576f9694b092f2a11a6a4e5a82c2b0e12203bd9c/src/hotspot/share/opto/chaitin.cpp#L2409-L2411 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2488588626 From duke at openjdk.org Tue Nov 4 05:41:22 2025 From: duke at openjdk.org (erifan) Date: Tue, 4 Nov 2025 05:41:22 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v3] In-Reply-To: References: Message-ID: > According to the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test failures on sve2 platforms where `MaxVectorSize > 16`. > > This test problem was discovered by simulating a 512-bit sve2 environment using qemu. > > This PR fixes these test failures. erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8369456-select-from-two-vectors-failure - Merge branch 'master' into JDK-8369456-select-from-two-vectors-failure - 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms According to the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test failures on sve2 platforms where `MaxVectorSize > 16`. This test problem was discovered by simulating a 512-bit sve2 environment using qemu. This PR fixes these test failures.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/27723/files - new: https://git.openjdk.org/jdk/pull/27723/files/b1025a01..146ce68f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27723&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27723&range=01-02 Stats: 52100 lines in 518 files changed: 27750 ins; 20629 del; 3721 mod Patch: https://git.openjdk.org/jdk/pull/27723.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27723/head:pull/27723 PR: https://git.openjdk.org/jdk/pull/27723 From epeter at openjdk.org Tue Nov 4 07:08:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 07:08:50 GMT Subject: RFR: 8370332: C2 SuperWord: SIGSEGV because PhaseIdealLoop::split_thru_phi left dead nodes in loop _body [v2] In-Reply-To: References: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> Message-ID: On Fri, 31 Oct 2025 15:37:27 GMT, Roland Westrelin wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> allow unique out with multiple uses > > Looks reasonable to me. @rwestrel Thanks for reviewing and catching some additional issues! @chhagedorn Thanks for reviewing and pushing me to better code! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27955#issuecomment-3484126830 From epeter at openjdk.org Tue Nov 4 07:12:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 07:12:37 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v4] In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 07:05:42 GMT, erifan wrote: >> According to the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test failures on sve2 platforms where `MaxVectorSize > 16`.
>> >> This test problem was discovered by simulating a 512-bit sve2 environment using qemu. >> >> This PR fixes these test failures. > > erifan has updated the pull request incrementally with one additional commit since the last revision: > > Add IR test rules for unsupported partial cases on aarch64 Looks better now :) I'm running some internal testing before approval. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27723#issuecomment-3484190865 From duke at openjdk.org Tue Nov 4 07:12:40 2025 From: duke at openjdk.org (erifan) Date: Tue, 4 Nov 2025 07:12:40 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v2] In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 02:15:53 GMT, erifan wrote: >> I'm afraid that there is not a machine which really runs with `MaxVectorSize > 64` both on X86 and AArch64. Can we just check the `MaxVectorSize = 64` case? > > Yes, we currently do not have any >128 bits SVE2 machines. According to https://github.com/openjdk/jdk/blob/4f9f086847f531ab1791727d74955cfd8ec56811/src/hotspot/cpu/aarch64/aarch64_vector.ad#L273 these cases are currently unsupported. This is why the test failed. I added a rule for cases where the `MaxVectorSize` > `16/32/64`, so that these cases will also be tested once they are supported. @eme64 Would you mind take another look, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27723#discussion_r2488806937 From duke at openjdk.org Tue Nov 4 07:12:36 2025 From: duke at openjdk.org (erifan) Date: Tue, 4 Nov 2025 07:12:36 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v4] In-Reply-To: References: Message-ID: > According the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. 
But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test failures on sve2 platforms where `MaxVectorSize > 16`. > > This test problem was discovered by simulating a 512-bit sve2 environment using qemu. > > This PR fixes these test failures. erifan has updated the pull request incrementally with one additional commit since the last revision: Add IR test rules for unsupported partial cases on aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27723/files - new: https://git.openjdk.org/jdk/pull/27723/files/146ce68f..3147164f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27723&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27723&range=02-03 Stats: 54 lines in 1 file changed: 54 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27723.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27723/head:pull/27723 PR: https://git.openjdk.org/jdk/pull/27723 From epeter at openjdk.org Tue Nov 4 07:12:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 07:12:42 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v2] In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 05:50:27 GMT, erifan wrote: >> Yes, we currently do not have any >128-bit SVE2 machines. According to https://github.com/openjdk/jdk/blob/4f9f086847f531ab1791727d74955cfd8ec56811/src/hotspot/cpu/aarch64/aarch64_vector.ad#L273 these cases are currently unsupported. This is why the test failed. > > I added a rule for cases where the `MaxVectorSize` > `16/32/64`, so that these cases will also be tested once they are supported. > @eme64 Would you mind taking another look? Thanks! @erifan That looks like a good set of updates to me :) General question: can `sve2` not use the vectors from `neon` or `sve`, which do support those vector sizes? But that would be a separate issue anyway; for now we are just trying to fix up the test.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27723#discussion_r2488941517 From epeter at openjdk.org Tue Nov 4 07:38:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 07:38:44 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v17] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Mon, 3 Nov 2025 15:43:50 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array elements set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, the `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. >> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initialize`. Each >> of the `Proj` should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type.
>> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in is own `NarrowMemProj` is >> added to the memory sugraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA split types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`. >> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review @rwestrel Thanks a lot for the updates! It now looks much better to me. I'll run internal testing again before approval :) src/hotspot/share/opto/memnode.hpp line 1418: > 1416: } > 1417: return res->as_NarrowMemProj(); > 1418: } Nit, optional: Could we not remove some fluff here with a `isa_NarrowMemProj`? You would not have to check for `res == nullptr`. src/hotspot/share/opto/memnode.hpp line 1422: > 1420: public: > 1421: > 1422: template void for_each_narrow_mem_proj_with_new_uses(Callback callback) const { Can you add a code comment what the "with new uses" part means? 
Probably that if we add more uses during iteration, we will eventually also visit those? ------------- PR Review: https://git.openjdk.org/jdk/pull/24570#pullrequestreview-3414372906 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2488962166 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2488966664 From epeter at openjdk.org Tue Nov 4 07:38:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 07:38:48 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v15] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Fri, 24 Oct 2025 13:18:46 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - review >> - Roberto's patches > > src/hotspot/share/opto/multnode.hpp line 215: > >> 213: } >> 214: public: >> 215: NarrowMemProjNode(Node* src, const TypePtr* adr_type) > > Can you feed it any other `src` than a `InitializeNode*`? > Suggestion: > > NarrowMemProjNode(InitializeNode* src, const TypePtr* adr_type) @rwestrel Do you not like this suggestion? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2488969574 From epeter at openjdk.org Tue Nov 4 07:42:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 07:42:49 GMT Subject: Integrated: 8370332: C2 SuperWord: SIGSEGV because PhaseIdealLoop::split_thru_phi left dead nodes in loop _body In-Reply-To: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> References: <8li_zfadVOIp8CU483eRah-t-2QjCyH3UfCkZhGHgrE=.038bda9a-1efd-4c98-a09f-3b47782817d2@github.com> Message-ID: On Thu, 23 Oct 2025 14:23:38 GMT, Emanuel Peter wrote: > Analysis: > `split_thru_phi` can split a node out of the loop, through some loop phi. As a consequence, that node and the phi we split through can become dead. But `split_thru_phi` did not have any logic to yank the dead node and phi from the `_body`. If this happens in the same loop-opts-phase as a later SuperWord, and that SuperWord pass somehow accesses that loop `_body`, then we may find dead nodes, which is not expected. > > It is not ok that `split_thru_phi` leaves dead nodes in the `_body`, so they have to be yanked. > > What I did additionally: I went through all uses of `split_thru_phi`, and moved the `replace_node` from the call-site to the method itself. Removing the node and yanking from `_body` conceptually belongs together, so they should be together in code. > > I suspect that `split_thru_phi` was broken for a long time already. But JDK26 changes in SuperWord started to check inputs of all nodes in `_body`, and that fails with dead nodes. > > Future Work: > - Continue work on making `VerifyLoopOptimizations` work again, we should assert that there are no dead nodes in the `_body`. We may do that with the following task, or a subsequent one. > - [JDK-8370332](https://bugs.openjdk.org/browse/JDK-8370332) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop This pull request has now been integrated. 
Changeset: 75e37b06 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/75e37b06c3e37ee49719a0c0d6b4ab2c4ff76098 Stats: 144 lines in 5 files changed: 129 ins; 10 del; 5 mod 8370332: C2 SuperWord: SIGSEGV because PhaseIdealLoop::split_thru_phi left dead nodes in loop _body Reviewed-by: chagedorn, roland ------------- PR: https://git.openjdk.org/jdk/pull/27955 From rehn at openjdk.org Tue Nov 4 08:01:36 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 4 Nov 2025 08:01:36 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v5] In-Reply-To: References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: On Tue, 4 Nov 2025 04:11:48 GMT, Feilong Jiang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into vsc >> - Forgot fix format for VSAC >> - Fixed format >> - Label name >> - li->mv, format, space >> - Draft > > src/hotspot/cpu/riscv/riscv.ad line 1187: > >> 1185: } >> 1186: >> 1187: constexpr uint64_t MAJIK_DWORD = 0xabbaabbaabbaabbaull; > > Looking at `PhaseChaitin::dump_frame`, I found the hard-coded MAJIK word for x86. Should we add RISCV MAJIK DWORD here? > https://github.com/openjdk/jdk/blob/576f9694b092f2a11a6a4e5a82c2b0e12203bd9c/src/hotspot/share/opto/chaitin.cpp#L2409-L2411 Oh, yes. The reason I did not want to use "BADB100D" is the sign extension of B if we were off by 4 bytes. I.e., I want a zero bit on all 4-byte borders. And ABBA....ABBA is at least easier for me to find when eyeballing. I have to think about what to do here: do we want the same majik on all platforms, etc.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2489075618 From duke at openjdk.org Tue Nov 4 08:07:38 2025 From: duke at openjdk.org (erifan) Date: Tue, 4 Nov 2025 08:07:38 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v2] In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 07:05:42 GMT, Emanuel Peter wrote: >> I added a rule for cases where the `MaxVectorSize` > `16/32/64`, so that these cases will also be tested once they are supported. >> @eme64 Would you mind take another look, thanks! > > @erifan That looks like a good set of updates to me :) > > General question: can `sve2` not use the vectors from `neon` or `sve`, which do support those vector sizes? But that would be a separate issue anyway, we are for now just trying to fix up the test. Hi @eme64 machines that support SVE2 will necessarily also support SVE1 and NEON. The situation is like this: 1. sve1 and sve2 use different implementations. sve1 supports all cases, but sve2 currently does not support partial cases where `vector_length_in_byte > 16B`. 2. From [aarch64_vector.ad](https://github.com/openjdk/jdk/blob/4f9f086847f531ab1791727d74955cfd8ec56811/src/hotspot/cpu/aarch64/aarch64_vector.ad#L273) these cases are not supported on sve2 because there is no such hardware, not because it's impossible to support them. Currently, all available machines on the market that support SVE2 are 128-bit, so even if we implement support for a larger width, it won't be usable. 3. 
The sve1 implementation can also run on sve2 machines with `-XX:UseSVE=1` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27723#discussion_r2489103183 From epeter at openjdk.org Tue Nov 4 08:28:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 08:28:03 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v2] In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 08:05:13 GMT, erifan wrote: >> @erifan That looks like a good set of updates to me :) >> >> General question: can `sve2` not use the vectors from `neon` or `sve`, which do support those vector sizes? But that would be a separate issue anyway, we are for now just trying to fix up the test. > > Hi @eme64 machines that support SVE2 will necessarily also support SVE1 and NEON. The situation is like this: > > 1. sve1 and sve2 use different implementations. sve1 supports all cases, but sve2 currently does not support partial cases where `vector_length_in_byte > 16B`. > 2. From [aarch64_vector.ad](https://github.com/openjdk/jdk/blob/4f9f086847f531ab1791727d74955cfd8ec56811/src/hotspot/cpu/aarch64/aarch64_vector.ad#L273) these cases are not supported on sve2 because there is no such hardware, not because it's impossible to support them. Currently, all available machines on the market that support SVE2 are 128-bit, so even if we implement support for a larger width, it won't be usable. > 3. The sve1 implementation can also run on sve2 machines with `-XX:UseSVE=1` @erifan Ok, thanks for the explanations! I thought it was probably just because the VM implementation is limited at the moment. Once hardware is available, there would probably be a bigger investment in making SVE2 work for larger vector lengths. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27723#discussion_r2489203476 From duke at openjdk.org Tue Nov 4 08:33:20 2025 From: duke at openjdk.org (erifan) Date: Tue, 4 Nov 2025 08:33:20 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v2] In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 08:24:56 GMT, Emanuel Peter wrote: >> Hi @eme64 machines that support SVE2 will necessarily also support SVE1 and NEON. The situation is like this: >> >> 1. sve1 and sve2 use different implementations. sve1 supports all cases, but sve2 currently does not support partial cases where `vector_length_in_byte > 16B`. >> 2. From [aarch64_vector.ad](https://github.com/openjdk/jdk/blob/4f9f086847f531ab1791727d74955cfd8ec56811/src/hotspot/cpu/aarch64/aarch64_vector.ad#L273) these cases are not supported on sve2 because there is no such hardware, not because it's impossible to support them. Currently, all available machines on the market that support SVE2 are 128-bit, so even if we implement support for a larger width, it won't be usable. >> 3. The sve1 implementation can also run on sve2 machines with `-XX:UseSVE=1` > > @erifan Ok, thanks for the explanations! I thought it was probably just because the VM implementation is limited at the moment. Once hardware is available, there would probably be a bigger investment in making SVE2 work for larger vector lengths. Yeah, that's usually the case. Anyway, this only concerns performance and doesn't affect correctness, since there's a default Java implementation. 
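To make the "default Java implementation" point concrete, here is a minimal scalar sketch of what a two-vector `selectFrom` computes. It assumes the semantics are: wrap each index into `[0, 2 * VLENGTH)`, then pick from the first vector for indexes below `VLENGTH` and from the second vector otherwise. The class and method names are hypothetical, plain `int[]` arrays stand in for vectors, and this is not the JDK's actual fallback code.

```java
import java.util.Arrays;

// Hypothetical scalar model of a two-vector selectFrom; not the JDK's code.
public class SelectFromTwoVectorsSketch {
    // indexes, v1 and v2 are assumed to have the same length (the "vector length").
    public static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = indexes.length;
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            // Wrap the index into [0, 2 * vlen); floorMod also handles negative indexes.
            int idx = Math.floorMod(indexes[i], 2 * vlen);
            // Low half of the index range selects from v1, high half from v2.
            result[i] = (idx < vlen) ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] indexes = {0, 5, 2, 7};
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        System.out.println(Arrays.toString(selectFrom(indexes, v1, v2)));
    }
}
```

A loop like this is correct for any vector length, which is why a missing SVE2 match rule for wider vectors would cost performance but not correctness.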
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27723#discussion_r2489227810 From epeter at openjdk.org Tue Nov 4 08:33:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 08:33:28 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Mon, 3 Nov 2025 22:32:29 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/opto/loopnode.cpp line 5119: >> >>> 5117: if (stop_early) { >>> 5118: assert(do_expensive_nodes || do_optimize_reachability_fences, "why are we here?"); >>> 5119: if (do_optimize_reachability_fences && optimize_reachability_fences()) { >> >> Can you explain why you call `optimize_reachability_fences` here and also below? > > The intention is to optimize RF nodes irrespective of whether loop optimizations are performed or not. (It mimics similar logic for expensive nodes.) Can you make that explicit with a code comment please? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2489216359 From epeter at openjdk.org Tue Nov 4 08:33:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 08:33:30 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: <59uQ1nJm-iOr7-kNcauhuBC7wEQLs08e4Gi81SwJ9TU=.ffd0c0f9-abd4-483a-8967-d07dde2bd0a3@github.com> References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> <59uQ1nJm-iOr7-kNcauhuBC7wEQLs08e4Gi81SwJ9TU=.ffd0c0f9-abd4-483a-8967-d07dde2bd0a3@github.com> Message-ID: On Mon, 3 Nov 2025 22:06:14 GMT, Vladimir Ivanov wrote: > Does VerifyLoopOptimizations catch it? Probably not, because it is not very strong yet. 
I will probably soon work on it again, to make sure we have stronger invariants, or at least enforcing our implicit invariants ;) In your case, you probably are using your new `remove_dead_node` in the way it "should" be used. But since this is a public API, someone will probably come along in the future and use it in unintended ways. That's why I'm asking for more asserts that at least block some wrong usages ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2489225737 From roland at openjdk.org Tue Nov 4 08:35:52 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 4 Nov 2025 08:35:52 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v17] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Tue, 4 Nov 2025 07:17:02 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/memnode.hpp line 1418: > >> 1416: } >> 1417: return res->as_NarrowMemProj(); >> 1418: } > > Nit, optional: Could we not remove some fluff here with a `isa_NarrowMemProj`? You would not have to check for `res == nullptr`. I don't think we can. Wouldn't we have to call a method on a possibly `null` res? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2489235039 From epeter at openjdk.org Tue Nov 4 08:39:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 08:39:02 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Mon, 3 Nov 2025 22:03:42 GMT, Vladimir Ivanov wrote: > is_significant_sfpt() encodes a white list consisting of cases which can be safely ignored when it comes to reachability tracking. An overlooked case is a missed optimization opportunity. Sounds good. Can you add a code comment for that, please? Ok, I'm fine with keeping the name. But it might make sense to link to where the "significance" term is defined. Because otherwise it is a concept without any clear definition, and hard for the reader to understand. You have to infer the definition from the usage, and that often leads to unclear definitions that shift over time, and eventually the concept even is incoherent. A clear definition can also help if we have a bug: we can clearly see what we missed and what we might have to do to fix it. 
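For readers less familiar with the API whose reachability tracking is discussed here, a minimal sketch of how `Reference.reachabilityFence` is typically used from Java code. The class, field, and method names are hypothetical; the array stands in for external state (for example native memory) that a cleaner would release once the owning object becomes unreachable. It says nothing about how C2 represents the fence internally.

```java
import java.lang.ref.Reference;

// Hypothetical example class; only Reference.reachabilityFence is a real JDK API here.
public class ReachabilityFenceSketch {
    // Stand-in for a resource that must stay valid while sum() is running.
    private final long[] slots = {1, 2, 3};

    public long sum() {
        try {
            long[] s = slots;   // after this load, 'this' is not otherwise used below
            long total = 0;
            for (long v : s) {
                total += v;
            }
            return total;
        } finally {
            // Keeps 'this' strongly reachable at least until this point.
            Reference.reachabilityFence(this);
        }
    }

    public static void main(String[] args) {
        System.out.println(new ReachabilityFenceSketch().sum());
    }
}
```

The `try`/`finally` shape follows the usage pattern shown in the `Reference.reachabilityFence` documentation.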
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2489249463 From bmaillard at openjdk.org Tue Nov 4 08:44:17 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Tue, 4 Nov 2025 08:44:17 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v4] In-Reply-To: <1UNdzkgCUH6tju9WzaTQaBdeT8Xv9T4TWnk2Jg3SMoA=.6ee10e45-8c73-444a-a9da-ca0c03bdaf79@github.com> References: <1UNdzkgCUH6tju9WzaTQaBdeT8Xv9T4TWnk2Jg3SMoA=.6ee10e45-8c73-444a-a9da-ca0c03bdaf79@github.com> Message-ID: <5S3qdxwC7jHLdzDGe74ls5zIgb1K1S5xlgp-jxFkKSI=.2f0b73d7-92e2-45f5-afa2-18ba3dacf932@github.com> On Thu, 30 Oct 2025 12:23:18 GMT, Emanuel Peter wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Add -XX:+StressIGVN to run without fixed seed > > src/hotspot/share/opto/phaseX.cpp line 2568: > >> 2566: // ConvI2F->ConvF2I->ConvI2F >> 2567: // Note: there may be other 3-nodes conversion chains that would require to be added here, but these >> 2568: // are the only ones that are known to trigger missed optimizations otherwise > > You may want to update the description, and give a bit of extra information. Because you are saying `n` does not have to be a conversion, but it may be that `n` is about to be replaced with a conversion, right? Yes, in some cases `add_users_of_use_to_worklist` is called with the node about to be replaced as argument `n`. The point is that we might replace `n` with a node that already has other uses, and we only want to notify the uses for which there is a potential change. But this is in no way specific to this one optimization, so I think adding something here would cause more confusion than anything else. Perhaps we should update the description of `add_users_of_use_to_worklist` then?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27900#discussion_r2489271376 From epeter at openjdk.org Tue Nov 4 08:45:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 08:45:16 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Mon, 3 Nov 2025 22:44:46 GMT, Vladimir Ivanov wrote: >> Same for the other occurances ;) > > Done. @iwanowww Thanks for the comments! I'm still not 100% happy with it. It really feels like we are introducing some tech-debt here. What should the next person do who also needs to attach something else to the SafePoint? It is also easy to miss places where we have to special case the extra edges. I don't have a solution here, I'm just not extremely satisfied. Is there a better long-term solution? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2489276839 From roland at openjdk.org Tue Nov 4 08:47:47 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 4 Nov 2025 08:47:47 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v18] In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: > An `Initialize` node for an `Allocate` node is created with a memory > `Proj` of adr type raw memory. In order for stores to be captured, the > memory state out of the allocation is a `MergeMem` with slices for the > various object fields/array element set to the raw memory `Proj` of > the `Initialize` node. 
If `Phi`s need to be created during later > transformations from this memory state, the `Phi` for a particular > slice gets its adr type from the type of the `Proj` which is raw > memory. If during macro expansion, the `Allocate` is found to have no > use and so can be removed, the `Proj` out of the `Initialize` is > replaced by the memory state on input to the `Allocate`. A `Phi` for > some slice for a field of an object will end up with the raw memory > state on input to the `Allocate` node. As a result, memory state at > the `Phi` is incorrect and incorrect execution can happen. > > The fix I propose is, rather than have a single `Proj` for the memory > state out of the `Initialize` with adr type raw memory, to use one > `Proj` per slice added to the memory state after the `Initialize`. Each > of the `Proj` should return the right adr type for its slice. For that > I propose having a new type of `Proj`: `NarrowMemProj` that captures > the right adr type. > > Logic for the construction of the `Allocate`/`Initialize` subgraph is > tweaked so the right adr type captured in its own `NarrowMemProj` is > added to the memory subgraph. Code that removes an allocation or moves > it also has to be changed so it correctly takes the multiple memory > projections out of the `Initialize` node into account. > > One tricky issue is that when EA splits types for a scalar replaceable > `Allocate` node: > > 1- the adr type captured in the `NarrowMemProj` becomes out of sync > with the type of the slices for the allocation > > 2- before EA, the memory state for one particular field out of the > `Initialize` node can be used for a `Store` to the just allocated > object or some other. So we can have a chain of `Store`s, some to > the newly allocated object, some to some other objects, all of them > using the state of `NarrowMemProj` out of the `Initialize`. After > split unique types, the `NarrowMemProj` is for the slice of a > particular allocation.
So `Store`s to some other objects shouldn't > use that memory state but the memory state before the `Allocate`. > > For that, I added logic to update the adr type of `NarrowMemProj` > during split unique types and update the memory input of `Store`s that > don't depend on the memory state ... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24570/files - new: https://git.openjdk.org/jdk/pull/24570/files/755bb766..7da2da1e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=16-17 Stats: 10 lines in 3 files changed: 6 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570 PR: https://git.openjdk.org/jdk/pull/24570 From epeter at openjdk.org Tue Nov 4 08:47:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 08:47:49 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v17] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: <7_ThGEBEjmk6FUxRgLbkuKe3HHDcG20VKUBfQtqPMZI=.5876df6c-e036-4525-bd2c-0257086c12e4@github.com> On Tue, 4 Nov 2025 08:32:25 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/memnode.hpp line 1418: >> >>> 1416: } >>> 1417: return res->as_NarrowMemProj(); >>> 1418: } >> >> Nit, optional: Could we not remove some fluff here with a `isa_NarrowMemProj`? You would not have to check for `res == nullptr`. > > I don't think we can. Wouldn't we have to call a method on a possibly `null` res? You are right. Ignore. My post-jogging brain saw things that were not there ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2489261245 From roland at openjdk.org Tue Nov 4 08:47:52 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 4 Nov 2025 08:47:52 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v15] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: <2LRgq5mLyp30XxD4td3sgnXN30VNkh06xRofZBV68i8=.1435d91d-cbaa-4a8e-ad77-ae4ff0506037@github.com> On Tue, 4 Nov 2025 07:20:49 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/multnode.hpp line 215: >> >>> 213: } >>> 214: public: >>> 215: NarrowMemProjNode(Node* src, const TypePtr* adr_type) >> >> Can you feed it any other `src` than a `InitializeNode*`? >> Suggestion: >> >> NarrowMemProjNode(InitializeNode* src, const TypePtr* adr_type) > > @rwestrel Do you not like this suggestion? I thought I took care of that one but obviously not. Done now. It requires moving the definition in the cpp file. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2489266971 From roland at openjdk.org Tue Nov 4 08:47:50 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 4 Nov 2025 08:47:50 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v17] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Tue, 4 Nov 2025 07:19:17 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/memnode.hpp line 1422: > >> 1420: public: >> 1421: >> 1422: template void for_each_narrow_mem_proj_with_new_uses(Callback callback) const { > > Can you add a code comment what the "with new uses" part means? Probably that if we add more uses during iteration, we will eventually also visit those? Done in new commit. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2489256617 From epeter at openjdk.org Tue Nov 4 08:56:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 08:56:47 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v4] In-Reply-To: <5S3qdxwC7jHLdzDGe74ls5zIgb1K1S5xlgp-jxFkKSI=.2f0b73d7-92e2-45f5-afa2-18ba3dacf932@github.com> References: <1UNdzkgCUH6tju9WzaTQaBdeT8Xv9T4TWnk2Jg3SMoA=.6ee10e45-8c73-444a-a9da-ca0c03bdaf79@github.com> <5S3qdxwC7jHLdzDGe74ls5zIgb1K1S5xlgp-jxFkKSI=.2f0b73d7-92e2-45f5-afa2-18ba3dacf932@github.com> Message-ID: On Tue, 4 Nov 2025 08:41:24 GMT, Beno?t Maillard wrote: >> src/hotspot/share/opto/phaseX.cpp line 2568: >> >>> 2566: // ConvI2F->ConvF2I->ConvI2F >>> 2567: // Note: there may be other 3-nodes conversion chains that would require to be added here, but these >>> 2568: // are the only ones that are known to trigger missed optimizations otherwise >> >> You may want to update the description, and give a bit of extra information. Because you are saying `n` does not have to be a conversion, but it may be that `n` is about to be replaced with a conversion, right? > > Yes, in some cases `add_users_of_use_to_worklist` is called with the node about to be replaced as argument `n`. The point is that we might replace `n` with a node that already has other uses, and we only want to notify the uses for which there is a potential change. > But this is in no way specific to this one optimization, so I think adding something here would cause more confusion than anything else. Perhaps we should update the description of `add_users_of_use_to_worklist` then? Right, this is not specific to this optimization here. Why not add something at the level of `add_users_of_use_to_worklist`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27900#discussion_r2489325144 From duke at openjdk.org Tue Nov 4 09:19:19 2025 From: duke at openjdk.org (Harshit470250) Date: Tue, 4 Nov 2025 09:19:19 GMT Subject: RFR: 8370920: [s390] C2: add instruction size in s390.ad file [v2] In-Reply-To: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> References: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> Message-ID: > This PR adds the size of the match rule nodes. > > There were a lot of nodes for which the size was variable; for those nodes I have taken the maximum possible size. Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge branch 'openjdk:master' into master - Merge remote-tracking branch 'origin/master' - remove whitespace - Resolved a bug - remove TODO comments - final size added - Final sizes - oop_decoder and load_const_optimized - error fix and added more sizes - upto line 9078 - ... 
and 2 more: https://git.openjdk.org/jdk/compare/212d2340...4e02e366 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28054/files - new: https://git.openjdk.org/jdk/pull/28054/files/c253449a..4e02e366 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=00-01 Stats: 49592 lines in 393 files changed: 25497 ins; 20831 del; 3264 mod Patch: https://git.openjdk.org/jdk/pull/28054.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28054/head:pull/28054 PR: https://git.openjdk.org/jdk/pull/28054 From epeter at openjdk.org Tue Nov 4 09:23:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 09:23:47 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v10] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). 
> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 107 additional commits since the last revision: - Merge branch 'master' into JDK-8367531-fix-addDataName - Merge branch 'master' into JDK-8367531-fix-addDataName - Merge branch 'JDK-8367531-fix-addDataName' of https://github.com/eme64/jdk into JDK-8367531-fix-addDataName - Manuel's suggestions Co-authored-by: Manuel Hässig - Merge branch 'master' into JDK-8367531-fix-addDataName - Merge branch 'JDK-8367531-fix-addDataName' of https://github.com/eme64/jdk into JDK-8367531-fix-addDataName - Update test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java Co-authored-by: Manuel Hässig - improve tutorial for Manuel - fix TestMethodArguments.java after merge with master - fix tests after integration of Expressions/Operations - ... and 97 more: https://git.openjdk.org/jdk/compare/f8c537f9...b708f0ac ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/18b895f3..b708f0ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=08-09 Stats: 6583 lines in 81 files changed: 3221 ins; 2304 del; 1058 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Tue Nov 4 09:36:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 09:36:08 GMT Subject: RFR: 8362832: compiler/macronodes/TestTopInMacroElimination.java hits assert(false) failed: unexpected node [v3] In-Reply-To: References: Message-ID: On Mon, 3 Nov 2025 17:17:33 GMT, Benoît Maillard wrote: >> This PR prevents hitting an assert caused by encountering `top` while following the memory >> slice associated with a field when eliminating allocations in macro node elimination. 
This situation >> is the result of another elimination (boxing node elimination) that happened at the same >> macro expansion iteration. >> >> ### Analysis >> >> The issue appears in the macro expansion phase. We have a nested `synchronized` block, >> with the outer block synchronizing on `new A()` and the inner one on `TestTopInMacroElimination.class`. >> In the inner `synchronized` block we have an `Integer.valueOf` call in a loop. >> >> In `PhaseMacroExpand::eliminate_boxing_node` we are getting rid of the `Integer.valueOf` >> call, as it is a non-escaping boxing node. After having eliminated the call, >> `PhaseMacroExpand::process_users_of_allocation` takes care of the users of the removed node. >> There, we replace usages of the fallthrough memory projection with `top`. >> >> In the same macro expansion iteration, we later attempt to get rid of the `new A()` allocation >> in `PhaseMacroExpand::create_scalarized_object_description`. There, we have to make >> sure that all safepoints can still see the object fields as if the allocation was never deleted. >> For this, we attempt to find the last value on the slice of each specific field (`a` >> in this case). Because field `a` is never written to, and it is not explicitly initialized, >> there is no `Store` associated to it and not even a dedicated memory slice (we end up >> taking the `Bot` input on `MergeMem` nodes). By going up the memory chain, we eventually >> encounter the `MemBarReleaseLock` whose input was set to `top`. This is where the assert >> is hit. >> >> ### Proposed Fix >> >> In the end I opted for a fix analogous to the one for the similar [JDK-8325030](https://git.openjdk.org/jdk/pull/23104). >> If we get `top` from `scan_mem_chain` in `PhaseMacroExpand::value_from_mem`, then we can safely >> return `top` as well. This means that the safepoint will have `top` as data input, but this will >> eventually be cleaned up by the next round of IGVN. 
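The code shape described in the Analysis above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name `NestedSyncSketch`, the method `test`, and the loop body are hypothetical and are not taken from `TestTopInMacroElimination.java`, and running this sketch is not claimed to reproduce the assert. It only mirrors the three ingredients the message names: an outer `synchronized` on a fresh `new A()` whose field `a` is never written, an inner `synchronized` on a class object, and an `Integer.valueOf` call in a loop.

```java
// Hypothetical sketch of the described shape; not the actual regression test.
class NestedSyncSketch {
    // Field 'a' is never written and has no explicit initializer,
    // matching the field described in the analysis.
    static class A {
        int a;
    }

    static int test(int n) {
        int sum = 0;
        // Outer lock on a non-escaping allocation (candidate for elimination).
        synchronized (new A()) {
            // Inner lock on a class object.
            synchronized (NestedSyncSketch.class) {
                for (int i = 0; i < n; i++) {
                    // Non-escaping boxing call (candidate for boxing elimination).
                    Integer boxed = Integer.valueOf(i);
                    sum += boxed;
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(test(4)); // 0 + 1 + 2 + 3 = 6
    }
}
```

Whether C2 actually eliminates both the boxing call and the allocation in the same macro expansion iteration depends on the compilation flags and profile; the real test presumably pins those down.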
>> >> Another (tempting) option would have been to simply return `nullptr` from `PhaseMacroExpand::value_from_mem` when encountering `top`. However, this would result in bailing >> out from eliminating this allocation temporarily and effectively delaying it to a subsequent >> macro expansion round. >> >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8362832)... > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Make comment more specific Looks good to me, thanks for the improved comments! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27903#pullrequestreview-3415111388 From qxing at openjdk.org Tue Nov 4 09:42:00 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 4 Nov 2025 09:42:00 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:22:13 GMT, Emanuel Peter wrote: >> The second question: >> >>> If we now removed safepoints in places where we would actually have needed them: how would we find out? I suppose we would get longer time to safepoint - higher latency in some cases. How would we catch this with our tests? >> >> I tried running tier1 tests with `JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+SafepointTimeout -XX:+AbortVMOnSafepointTimeout -XX:SafepointTimeoutDelay=1000`, and there were no failures. >> >> Running with `-XX:SafepointTimeoutDelay=500` caused 1 random JDK test case to fail. But then I tried to build a JDK without this patch, and it still had the random failure with this option. > > @MaxXSoft Would you mind improving the documentation comments, so that they are easier to understand? Maybe you can even add more comments around your code change, to "prove" why it is ok to do what we would do with your change? @eme64 @rwestrel Thank you for your review! 
@rwestrel confirmed that the IR test covers all the cases mentioned in the `IdealLoopTree::check_safepts` comment, so this patch can now be integrated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-3484911936 From bmaillard at openjdk.org Tue Nov 4 09:42:01 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Tue, 4 Nov 2025 09:42:01 GMT Subject: RFR: 8362832: compiler/macronodes/TestTopInMacroElimination.java hits assert(false) failed: unexpected node [v3] In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 17:43:53 GMT, Vladimir Kozlov wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Make comment more specific > > I agree with fix. Thank you for your reviews @vnkozlov @dlunde @eme64! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27903#issuecomment-3484913511 From epeter at openjdk.org Tue Nov 4 09:42:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 09:42:02 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v4] In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 00:24:26 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantage of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. 
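The remark above about analyzing the negative and positive ranges separately can be illustrated with a small Java sketch. This is not the C2 implementation from the PR; the class `KnownLeadingBitsSketch` and the method `knownLeadingBits` are hypothetical names, and the sketch only assumes a signed `int` range `[lo, hi]` with `lo <= hi`. The idea shown: every value in a range shares the bits above the highest bit in which `lo` and `hi` differ, so a range that crosses zero shares no leading bits at all, while each of its two halves may share many.

```java
// Hypothetical illustration; not the code from the pull request.
final class KnownLeadingBitsSketch {
    // Number of leading bits that are identical for every int in the
    // signed range [lo, hi]. Bits above the highest differing bit of
    // lo and hi are common to all values in between.
    static int knownLeadingBits(int lo, int hi) {
        if (lo > hi) {
            throw new IllegalArgumentException("empty range");
        }
        return Integer.numberOfLeadingZeros(lo ^ hi);
    }

    public static void main(String[] args) {
        // The whole range [-4, 3] crosses zero: even the sign bit differs.
        System.out.println(knownLeadingBits(-4, 3));  // 0
        // Negative half [-4, -1]: the top 30 bits are all ones.
        System.out.println(knownLeadingBits(-4, -1)); // 30
        // Non-negative half [0, 3]: the top 30 bits are all zeros.
        System.out.println(knownLeadingBits(0, 3));   // 30
    }
}
```

A value inference could then compute a result for each half and join the two results, instead of giving up on the leading bits because the combined range crosses zero.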
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Add assertion for the helper in CTPComparator > > Co-authored-by: Emanuel Peter Still good :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27618#pullrequestreview-3415158637 From bmaillard at openjdk.org Tue Nov 4 09:45:48 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Tue, 4 Nov 2025 09:45:48 GMT Subject: Integrated: 8362832: compiler/macronodes/TestTopInMacroElimination.java hits assert(false) failed: unexpected node In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 15:53:59 GMT, Benoît Maillard wrote: > This PR prevents hitting an assert caused by encountering `top` while following the memory > slice associated with a field when eliminating allocations in macro node elimination. This situation > is the result of another elimination (boxing node elimination) that happened at the same > macro expansion iteration. > > ### Analysis > > The issue appears in the macro expansion phase. We have a nested `synchronized` block, > with the outer block synchronizing on `new A()` and the inner one on `TestTopInMacroElimination.class`. > In the inner `synchronized` block we have an `Integer.valueOf` call in a loop. > > In `PhaseMacroExpand::eliminate_boxing_node` we are getting rid of the `Integer.valueOf` > call, as it is a non-escaping boxing node. After having eliminated the call, > `PhaseMacroExpand::process_users_of_allocation` takes care of the users of the removed node. > There, we replace usages of the fallthrough memory projection with `top`. > > In the same macro expansion iteration, we later attempt to get rid of the `new A()` allocation > in `PhaseMacroExpand::create_scalarized_object_description`. There, we have to make > sure that all safepoints can still see the object fields as if the allocation was never deleted. 
> For this, we attempt to find the last value on the slice of each specific field (`a` > in this case). Because field `a` is never written to, and it is not explicitly initialized, > there is no `Store` associated to it and not even a dedicated memory slice (we end up > taking the `Bot` input on `MergeMem` nodes). By going up the memory chain, we eventually > encounter the `MemBarReleaseLock` whose input was set to `top`. This is where the assert > is hit. > > ### Proposed Fix > > In the end I opted for a fix analogous to the one for the similar [JDK-8325030](https://git.openjdk.org/jdk/pull/23104). > If we get `top` from `scan_mem_chain` in `PhaseMacroExpand::value_from_mem`, then we can safely > return `top` as well. This means that the safepoint will have `top` as data input, but this will > eventually be cleaned up by the next round of IGVN. > > Another (tempting) option would have been to simply return `nullptr` from `PhaseMacroExpand::value_from_mem` when encountering `top`. However, this would result in bailing > out from eliminating this allocation temporarily and effectively delaying it to a subsequent > macro expansion round. > > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8362832) > - [x] tier1-4, plus some internal testing > > Thank you for reviewing! This pull request has now been integrated. 
Changeset: a98b9e7c Author: Benoît Maillard URL: https://git.openjdk.org/jdk/commit/a98b9e7cfa433d4bf2acbf59a1c9d3714c3c065d Stats: 9 lines in 3 files changed: 5 ins; 3 del; 1 mod 8362832: compiler/macronodes/TestTopInMacroElimination.java hits assert(false) failed: unexpected node Reviewed-by: dlunden, epeter, kvn ------------- PR: https://git.openjdk.org/jdk/pull/27903 From epeter at openjdk.org Tue Nov 4 09:46:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 09:46:53 GMT Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph [v5] In-Reply-To: References: Message-ID: <32xjyMkb7Bo-nj_bCQD-4o9zoIwZSaxS_CUoXm6zYN0=.5431dc65-12c0-4e9e-abfc-3e37927d80ec@github.com> On Fri, 5 Sep 2025 08:14:50 GMT, Marc Chevalier wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> One more ResourceMark > > Totally not in this change, yes. And indeed, we could just use the macro to define a bit more. But I fear it will be a controversial topic. @marc-chevalier @mhaessig What's the status of this issue? Maybe it got a little deprioritized because of the resistance I gave. I'm good with the approach now though, after our offline conversation some weeks ago. So please let me know if I should review again :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26362#issuecomment-3484935613 From duke at openjdk.org Tue Nov 4 09:46:53 2025 From: duke at openjdk.org (duke) Date: Tue, 4 Nov 2025 09:46:53 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v7] In-Reply-To: References: Message-ID: On Fri, 17 Oct 2025 09:32:48 GMT, Qizheng Xing wrote: >> In `PhaseIdealLoop`, `IdealLoopTree::check_safepts` method checks if any call that is guaranteed to have a safepoint dominates the tail of the loop. In the previous implementation, `check_safepts` would stop if it found a local non-call safepoint. 
At this time, if there was a call before the safepoint in the dom-path, this safepoint would not be eliminated. >> >> loop-safepoint >> >> This patch changes the behavior of `check_safepts` to not stop when it finds a non-local safepoint. This makes simple loops with one method call ~3.8% faster (on aarch64). >> >> >> Benchmark Mode Cnt Score Error Units >> LoopSafepoint.loopVar avgt 15 208296.259 ± 1350.409 ns/op # baseline >> LoopSafepoint.loopVar avgt 15 200692.874 ± 616.770 ns/op # this patch >> >> >> Testing: tier1-2 on x86_64 and aarch64. > > Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision: > > - Update microbench > - Add IR tests for nested loops @MaxXSoft Your change (at version b132bddc29f63b1c98bb71c42eb8be7234bf073c) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-3484926486 From epeter at openjdk.org Tue Nov 4 09:48:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 09:48:01 GMT Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v4] In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 07:12:36 GMT, erifan wrote: >> According to the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test failures on sve2 platforms where `MaxVectorSize > 16`. >> >> This test problem was discovered by simulating a 512-bit sve2 environment using qemu. >> >> This PR fixes these test failures. > > erifan has updated the pull request incrementally with one additional commit since the last revision: > > Add IR test rules for unsupported partial cases on aarch64 Tests passed, approved! ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27723#pullrequestreview-3415209495 From qamai at openjdk.org Tue Nov 4 09:59:50 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 4 Nov 2025 09:59:50 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v17] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 09:45:04 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 cannot know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Make code more compact Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25928#pullrequestreview-3415289323 From qamai at openjdk.org Tue Nov 4 10:09:07 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 4 Nov 2025 10:09:07 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v5] In-Reply-To: References: Message-ID: > Hi, > > This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantage of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. 
I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge branch 'master' into andorxor - Add assertion for the helper in CTPComparator Co-authored-by: Emanuel Peter - remove std::hash - remove unordered_map, add some comments for all_instances_size - Emanuel's reviews - Improve Value inferences of And, Or, Xor and implement gtest for general Value inferences ------------- Changes: https://git.openjdk.org/jdk/pull/27618/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27618&range=04 Stats: 964 lines in 9 files changed: 630 ins; 313 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/27618.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27618/head:pull/27618 PR: https://git.openjdk.org/jdk/pull/27618 From dfenacci at openjdk.org Tue Nov 4 10:27:56 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 4 Nov 2025 10:27:56 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v3] In-Reply-To: References: Message-ID: On Mon, 3 Nov 2025 13:12:27 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/node.cpp line 567: >> >>> 565: n->as_Call()->set_generator(cloned_cg); >>> 566: if (cloned_cg->is_mh_late_inline()) { >>> 567: C->inc_number_of_mh_late_inlines(); >> >> Do you need to decrement the counter when a CallNode with `generator()->is_mh_late_inline()` goes dead? > > I think that would make sense. But the only use of that counter (excluding asserts) seems to be: > > https://github.com/openjdk/jdk/blob/ef464d69399e50aee126a4756fe9a9a19e44d3c4/src/hotspot/share/opto/compile.cpp#L829 > > Maybe, then, it's simpler to not bother with maintaining an accurate count. See new commits. 
> > 8352963 added a new call `inc_number_of_mh_late_inlines()` that I remove here because I don't think it's needed. I had a look at the PR for that one and I don't see it discussed. @dafedafe do you remember why you added it? AFAIR at some point I was getting the same assert failure `assert(_number_of_mh_late_inlines > 0)` and noticed that we were re-registering method handles for late inlining without incrementing the counter. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28088#discussion_r2489784177 From epeter at openjdk.org Tue Nov 4 11:21:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 11:21:01 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v18] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Tue, 4 Nov 2025 08:47:47 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, The `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. 
>> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initialize`. Each >> of the `Proj` should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type. >> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in its own `NarrowMemProj` is >> added to the memory subgraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA splits types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`. >> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Nice, thanks for the updates @rwestrel ! And thanks for working on this issue, it was a tricky one :) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24570#pullrequestreview-3415820558 From roland at openjdk.org Tue Nov 4 11:21:02 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 4 Nov 2025 11:21:02 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> <2m1_XtiSsW_LaBRrkX4qv7AKtLOjNgnl4mUp3zisasE=.dda62164-7aa0-4c1a-b83f-fa40ba7902e5@github.com> <4374L3lkQK90wLxxOA7POBmIKNX2DFK-4pO4vj1bkuQ=.5b8d7825-a7f1-497f-ab66-02a85a266659@github.com> <4UN1z9fhxeUqUGagnZVEIFOyDb_mP8WaWUBwWO2HjFA=.93b7c9ad-443c-4fff-810d-7fe805ccbfaa@github.com> Message-ID: <3eK8oOLGNpXu9jhHYACrh_FmWErkLN6IbWiHxSGYeqc=.82d0dc91-970a-4c24-8742-db4e2656a418@github.com> On Tue, 21 Oct 2025 13:49:12 GMT, Roberto Castañeda Lozano wrote: >>> Hi @rwestrel, could you please have a look at the merge conflicts of this PR so that we can progress further with the review of this work? >> >> The conflict is caused by the integration of [JDK-8360031](https://bugs.openjdk.org/browse/JDK-8360031), which relaxes the assertion in https://github.com/openjdk/jdk/blob/430041d366ddf450c2480c81608dde980dfa6d41/src/hotspot/share/opto/memnode.cpp#L4232 which is also touched by this changeset. Is the current assertion in mainline (after JDK-8360031) still valid in the context of this changeset? > >> Is the current assertion in mainline (after JDK-8360031) still valid in the context of this changeset? > > I did a bit of testing and updating the asserted invariant to `(outcnt() > 0 && outcnt() <= 2) || Opcode() == Op_Initialize` seems to work. 
@robcasloz @eme64 thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-3485415006 From roland at openjdk.org Tue Nov 4 11:21:04 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 4 Nov 2025 11:21:04 GMT Subject: Integrated: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Thu, 10 Apr 2025 11:39:36 GMT, Roland Westrelin wrote: > An `Initialize` node for an `Allocate` node is created with a memory > `Proj` of adr type raw memory. In order for stores to be captured, the > memory state out of the allocation is a `MergeMem` with slices for the > various object fields/array element set to the raw memory `Proj` of > the `Initialize` node. If `Phi`s need to be created during later > transformations from this memory state, The `Phi` for a particular > slice gets its adr type from the type of the `Proj` which is raw > memory. If during macro expansion, the `Allocate` is found to have no > use and so can be removed, the `Proj` out of the `Initialize` is > replaced by the memory state on input to the `Allocate`. A `Phi` for > some slice for a field of an object will end up with the raw memory > state on input to the `Allocate` node. As a result, memory state at > the `Phi` is incorrect and incorrect execution can happen. > > The fix I propose is, rather than have a single `Proj` for the memory > state out of the `Initialize` with adr type raw memory, to use one > `Proj` per slice added to the memory state after the `Initialize`. Each > of the `Proj` should return the right adr type for its slice. For that > I propose having a new type of `Proj`: `NarrowMemProj` that captures > the right adr type. 
> > Logic for the construction of the `Allocate`/`Initialize` subgraph is > tweaked so the right adr type captured in its own `NarrowMemProj` is > added to the memory subgraph. Code that removes an allocation or moves > it also has to be changed so it correctly takes the multiple memory > projections out of the `Initialize` node into account. > > One tricky issue is that when EA splits types for a scalar replaceable > `Allocate` node: > > 1- the adr type captured in the `NarrowMemProj` becomes out of sync > with the type of the slices for the allocation > > 2- before EA, the memory state for one particular field out of the > `Initialize` node can be used for a `Store` to the just allocated > object or some other. So we can have a chain of `Store`s, some to > the newly allocated object, some to some other objects, all of them > using the state of `NarrowMemProj` out of the `Initialize`. After > split unique types, the `NarrowMemProj` is for the slice of a > particular allocation. So `Store`s to some other objects shouldn't > use that memory state but the memory state before the `Allocate`. > > For that, I added logic to update the adr type of `NarrowMemProj` > during split unique types and update the memory input of `Store`s that > don't depend on the memory state ... This pull request has now been integrated. 
Changeset: e6546683 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/e6546683a8dd9a64255ce4c5606089068ec92e5d Stats: 922 lines in 24 files changed: 831 ins; 25 del; 66 mod 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed Co-authored-by: Emanuel Peter Co-authored-by: Roberto Castañeda Lozano Reviewed-by: epeter, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/24570 From rehn at openjdk.org Tue Nov 4 12:37:12 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 4 Nov 2025 12:37:12 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v6] In-Reply-To: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: > Hi, please consider. > > Sanity tested and no issues with MAJIK t1 (with +VSC). > > Thanks, Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: - chaitin dump print majik cookie - Merge branch 'master' into vsc - Merge branch 'master' into vsc - Forgot fix format for VSAC - Fixed format - Label name - li->mv, format, space - Draft ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28005/files - new: https://git.openjdk.org/jdk/pull/28005/files/00bd0deb..989bef21 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28005&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28005&range=04-05 Stats: 55741 lines in 616 files changed: 29123 ins; 22700 del; 3918 mod Patch: https://git.openjdk.org/jdk/pull/28005.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28005/head:pull/28005 PR: https://git.openjdk.org/jdk/pull/28005 From rehn at openjdk.org Tue Nov 4 12:37:15 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 4 Nov 2025 12:37:15 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v5] In-Reply-To: References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: On Tue, 4 Nov 2025 07:58:34 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/riscv.ad line 1187: >> >>> 1185: } >>> 1186: >>> 1187: constexpr uint64_t MAJIK_DWORD = 0xabbaabbaabbaabbaull; >> >> Looking at `PhaseChaitin::dump_frame`, I found the hard-coded MAJIK word for x86. Should we add RISCV MAJIK DWORD here? >> https://github.com/openjdk/jdk/blob/576f9694b092f2a11a6a4e5a82c2b0e12203bd9c/src/hotspot/share/opto/chaitin.cpp#L2409-L2411 > > Oh, yes. > > The reason I did not want to use "BADB100D" is because of sign extension of B if we would be off by 4 bytes. > I.e. I want a zero bit on all 4-byte borders. And ABBA....ABBA is at least easier for me to find when eye-balling. > > I have to think what to do here: do we want the same majik on all platforms.... etc...
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28005#discussion_r2490349411 From fjiang at openjdk.org Tue Nov 4 14:12:08 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 4 Nov 2025 14:12:08 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v6] In-Reply-To: References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: On Tue, 4 Nov 2025 12:37:12 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> Sanity tested and no issues with MAJIK t1 (with +VSC). >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - chaitin dump print majik cookie > - Merge branch 'master' into vsc > - Merge branch 'master' into vsc > - Forgot fix format for VSAC > - Fixed format > - Label name > - li->mv, format, space > - Draft Thanks! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/28005#pullrequestreview-3416683306 From epeter at openjdk.org Tue Nov 4 16:04:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 16:04:22 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: Message-ID: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. 
Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. 
But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: small adjustments after call with Roberto and Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/b708f0ac..69cff741 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=09-10 Stats: 26 lines in 2 files changed: 19 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From kvn at openjdk.org Tue Nov 4 17:03:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 4 Nov 2025 17:03:46 GMT Subject: RFR: 8280469: C2: CHA support for interface calls when inlining through method handle linker [v2] In-Reply-To: <2GnXbYUICH6o4udyZQEqlCL6-jz9-CzSnUrZmkSbP4E=.a1d35eb5-5a62-4aff-9544-e1e0716761db@github.com> References: <2GnXbYUICH6o4udyZQEqlCL6-jz9-CzSnUrZmkSbP4E=.a1d35eb5-5a62-4aff-9544-e1e0716761db@github.com> Message-ID: On Mon, 3 Nov 2025 18:38:13 GMT, Vladimir Ivanov wrote: >> Expand the optimization for interface calls introduced by [JDK-6986483](https://bugs.openjdk.org/browse/JDK-6986483) to calls through `MethodHandle.linkToInterface`. >> >> The implementation is straightforward except the fact that symbolic information is lost during `MemberName` resolution. The fix uses declaring class instead, but it's more conservative than what is done for invokeinterface case. 
>> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > naming test/hotspot/jtreg/compiler/cha/StrengthReduceInterfaceCall.java line 75: > 73: > 74: // Implementation limitation: CHA is not performed by C1 during inlining through MH linkers. > 75: if (!jdk.test.whitebox.code.Compiler.isC1Enabled()) { Should you check that C2 is **enabled**? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28094#discussion_r2491326645 From kvn at openjdk.org Tue Nov 4 17:11:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 4 Nov 2025 17:11:46 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v3] In-Reply-To: References: Message-ID: On Mon, 3 Nov 2025 14:12:40 GMT, Emanuel Peter wrote: >> Note: this looks like a large change, but only about 400-500 lines are VM changes. 2.5k comes from new tests. >> >> Finally: after a long list of refactorings, we can implement the Cost-Model. The refactorings and this implementation was first PoC'd here: https://github.com/openjdk/jdk/pull/20964 >> >> Main goal: >> - Carefully allow the vectorization of reduction cases that lead to speedups, and prevent those that do not (or may cause regressions). >> - Open up new vectorization opportunities in the future, that introduce expensive vector nodes that are only profitable in some cases but not others. >> >> **Why cost-model?** >> >> Usually, vectorization leads to speedups because we replace multiple scalar operations with a single vector operation. The scalar and vector operation have a very similar cost per instruction, and so going from 4 scalar ops to a single vector op may yield a 4x speedup. This is a bit simplistic, but the general idea. >> >> But: some vector ops are expensive. Sometimes, the vector op can be more expensive than the multiple scalar ops it replaces. This is the case with some reduction ops. 
Or we may introduce a vector op that does not have any corresponding scalar op (e.g. in the case of shuffle). This prevents simple heuristics that only focus on single operations. >> >> Weighing the total cost of the scalar loop vs the vector loop allows us a more "holistic" approach. There may be expensive vector ops, but other cheaper vector ops may still make it profitable. >> >> **Implementation** >> >> Items: >> - New `VTransform::is_profitable`: checks cost-model and some other cost related checks. >> - `VLoopAnalyzer::cost`: scalar loop cost >> - `VTransformGraph::cost`: vector loop cost >> - Old reduction heuristic with `_num_work_vecs` and `_num_reductions` used to count check for "simple" reductions where the only "work" vector was the reduction itself. Reductions were not considered profitable if they were "simple". I was able to lift those restrictions. >> - Adapted existing tests. >> - Wrote a new comprehensive test, matching the related JMH benchmark, which we use below. >> >> **Testing** >> Regular correctness testing, and performance testing. In addition to the JMH micro benchmarks below. >> >> ------------------------------ >> >> **Some History** >> >> I have been bothered by "simple" reductions not vectorizing for a long time. It was also a part of [my JVMLS2025 presentation](https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/). >> > ... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > More comments for SirYwell src/hotspot/share/opto/superword.cpp line 1910: > 1908: #ifndef PRODUCT > 1909: if (_trace._info) { > 1910: tty->print_cr("\nForced bailout of vectorization (AutoVectorizationOverrideProfitability=0)."); Side note. Consider separate RFE to change this to UL for such outputs. 
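The cost-model idea in the quoted description (weigh the total cost of the scalar loop against the total cost of the vector loop, instead of judging single operations) can be sketched in plain Java. This is only an illustration under stated assumptions: the class `CostModelSketch`, its method names and every cost number are hypothetical and are not taken from `VLoopAnalyzer::cost`, `VTransformGraph::cost` or `VTransform::is_profitable`.

```java
public class CostModelSketch {
    // Sum of the (made-up) per-operation costs of one loop body.
    static float totalCost(float[] opCosts) {
        float sum = 0;
        for (float c : opCosts) {
            sum += c;
        }
        return sum;
    }

    // One vector iteration processes `lanes` elements, so compare it
    // against `lanes` iterations of the scalar body.
    static boolean isProfitable(float[] scalarBody, float[] vectorBody, int lanes) {
        return totalCost(vectorBody) < lanes * totalCost(scalarBody);
    }

    public static void main(String[] args) {
        // Hypothetical "simple" reduction: a load plus an add in scalar code,
        // a vector load plus an expensive reduction op in vector code.
        float[] scalarBody = {1f, 1f};
        float[] vectorBody = {1f, 6f};
        System.out.println(isProfitable(scalarBody, vectorBody, 4));
        System.out.println(isProfitable(scalarBody, vectorBody, 2));
    }
}
```

With these assumed numbers the expensive reduction is still worth it at 4 lanes (7 versus 8) but not at 2 lanes (7 versus 4), which is the kind of trade-off a per-operation heuristic cannot express.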
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2491363663 From epeter at openjdk.org Tue Nov 4 17:15:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Nov 2025 17:15:47 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v3] In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 17:09:05 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> More comments for SirYwell > > src/hotspot/share/opto/superword.cpp line 1910: > >> 1908: #ifndef PRODUCT >> 1909: if (_trace._info) { >> 1910: tty->print_cr("\nForced bailout of vectorization (AutoVectorizationOverrideProfitability=0)."); > > Side note. Consider separate RFE to change this to UL for such outputs. Absolutely. The tricky part is that the current `TraceAutoVectorization` is a compile command that can be enabled with method name filtering. Is that already available via UL now? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2491382314 From kvn at openjdk.org Tue Nov 4 18:22:25 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 4 Nov 2025 18:22:25 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v3] In-Reply-To: References: Message-ID: On Mon, 3 Nov 2025 14:12:40 GMT, Emanuel Peter wrote: >> Note: this looks like a large change, but only about 400-500 lines are VM changes. 2.5k comes from new tests. >> >> Finally: after a long list of refactorings, we can implement the Cost-Model. The refactorings and this implementation was first PoC'd here: https://github.com/openjdk/jdk/pull/20964 >> >> Main goal: >> - Carefully allow the vectorization of reduction cases that lead to speedups, and prevent those that do not (or may cause regressions). >> - Open up new vectorization opportunities in the future, that introduce expensive vector nodes that are only profitable in some cases but not others. 
>> >> **Why cost-model?** >> >> Usually, vectorization leads to speedups because we replace multiple scalar operations with a single vector operation. The scalar and vector operation have a very similar cost per instruction, and so going from 4 scalar ops to a single vector op may yield a 4x speedup. This is a bit simplistic, but the general idea. >> >> But: some vector ops are expensive. Sometimes, the vector op can be more expensive than the multiple scalar ops it replaces. This is the case with some reduction ops. Or we may introduce a vector op that does not have any corresponding scalar op (e.g. in the case of shuffle). This prevents simple heuristics that only focus on single operations. >> >> Weighing the total cost of the scalar loop vs the vector loop allows us a more "holistic" approach. There may be expensive vector ops, but other cheaper vector ops may still make it profitable. >> >> **Implementation** >> >> Items: >> - New `VTransform::is_profitable`: checks cost-model and some other cost related checks. >> - `VLoopAnalyzer::cost`: scalar loop cost >> - `VTransformGraph::cost`: vector loop cost >> - Old reduction heuristic with `_num_work_vecs` and `_num_reductions` used to count check for "simple" reductions where the only "work" vector was the reduction itself. Reductions were not considered profitable if they were "simple". I was able to lift those restrictions. >> - Adapted existing tests. >> - Wrote a new comprehensive test, matching the related JMH benchmark, which we use below. >> >> **Testing** >> Regular correctness testing, and performance testing. In addition to the JMH micro benchmarks below. >> >> ------------------------------ >> >> **Some History** >> >> I have been bothered by "simple" reductions not vectorizing for a long time. It was also a part of [my JVMLS2025 presentation](https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/). >> > ... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > More comments for SirYwell This looks fine and not complex. I have only nitpicks. src/hotspot/share/opto/vectorization.cpp line 572: > 570: > 571: // Compute the cost over all operations in the (scalar) loop. > 572: float VLoopAnalyzer::cost() const { Consider renaming it to `cost_for_scalar()` and `cost_for_scalar()` to `cost_for_scalar_node()` src/hotspot/share/opto/vectorization.cpp line 580: > 578: > 579: float sum = 0; > 580: for (int j = 0; j < body().body().length(); j++) { What does `body().body()` mean? ------------- PR Review: https://git.openjdk.org/jdk/pull/27803#pullrequestreview-3417923718 PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2491543549 PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2491532857 From kvn at openjdk.org Tue Nov 4 18:29:08 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 4 Nov 2025 18:29:08 GMT Subject: RFR: 8280469: C2: CHA support for interface calls when inlining through method handle linker [v2] In-Reply-To: References: <2GnXbYUICH6o4udyZQEqlCL6-jz9-CzSnUrZmkSbP4E=.a1d35eb5-5a62-4aff-9544-e1e0716761db@github.com> Message-ID: On Tue, 4 Nov 2025 17:00:00 GMT, Vladimir Kozlov wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> naming > > test/hotspot/jtreg/compiler/cha/StrengthReduceInterfaceCall.java line 75: > >> 73: >> 74: // Implementation limitation: CHA is not performed by C1 during inlining through MH linkers. >> 75: if (!jdk.test.whitebox.code.Compiler.isC1Enabled()) { > > Should you check that C2 is **enabled**? Maybe it should be in @requires?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28094#discussion_r2491652520 From kvn at openjdk.org Tue Nov 4 18:39:27 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 4 Nov 2025 18:39:27 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v3] In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 17:13:22 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 1910: >> >>> 1908: #ifndef PRODUCT >>> 1909: if (_trace._info) { >>> 1910: tty->print_cr("\nForced bailout of vectorization (AutoVectorizationOverrideProfitability=0)."); >> >> Side note. Consider separate RFE to change this to UL for such outputs. > > Absolutely. The tricky part is that the current `TraceAutoVectorization` is a compile command that can be enabled with method name filtering. Is that already available via UL now? Unfortunately no. I think this is what @anton-seoane worked on before. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2491678845 From vlivanov at openjdk.org Tue Nov 4 19:39:18 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 4 Nov 2025 19:39:18 GMT Subject: RFR: 8280469: C2: CHA support for interface calls when inlining through method handle linker [v2] In-Reply-To: References: <2GnXbYUICH6o4udyZQEqlCL6-jz9-CzSnUrZmkSbP4E=.a1d35eb5-5a62-4aff-9544-e1e0716761db@github.com> Message-ID: On Tue, 4 Nov 2025 18:26:52 GMT, Vladimir Kozlov wrote: >> test/hotspot/jtreg/compiler/cha/StrengthReduceInterfaceCall.java line 75: >> >>> 73: >>> 74: // Implementation limitation: CHA is not performed by C1 during inlining through MH linkers. >>> 75: if (!jdk.test.whitebox.code.Compiler.isC1Enabled()) { >> >> Should you check that C2 is **enabled**? > > May it should be in @requires ? > Should you check that C2 is enabled? The test has `@requires !vm.graal.enabled`. Do you prefer to have it spelled as `@requires vm.compiler2.enabled` instead? > May it should be in @requires? 
Original test cases apply to both C1 and C2. I could introduce a separate test for MH invoker cases, but IMO keeping relevant test logic co-located is preferred compared to avoiding a configuration check at runtime. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28094#discussion_r2491830184 From vlivanov at openjdk.org Tue Nov 4 19:43:55 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 4 Nov 2025 19:43:55 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v3] In-Reply-To: References: Message-ID: On Mon, 3 Nov 2025 13:11:04 GMT, Roland Westrelin wrote: >> In test cases, `mh` is initially not constant so the method handle >> invoke can't be inlined. It is later found to be constant, so it can >> be turned into a direct call by >> `Compile::process_late_inline_calls_no_inline()`. In the meantime, the >> `CallNode` for the mh invoke is cloned (by loop switching). In the >> process, only a shallow copy of the `JVMState` for the call is >> made. The initial `CallNode` is the first to be processed by >> `Compile::process_late_inline_calls_no_inline()` and that causes that >> `CallNode` to become dead. The cloned `CallNode` is then >> processed. The `JVMState` for that one references the initial >> `CallNode` in its caller's `JVMState`. Because that node is dead, that >> causes a crash. The fix I propose is to make a deep copy of the >> `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is >> assigned to the node. >> >> The other failure I see with these tests is: >> >> >> # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 >> # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! >> >> >> because even though the `CallNode` is cloned, there's still only one >> late inline recorded. 
The fix here is to increment >> `_number_of_mh_late_inlines` when the node is cloned. >> >> This was reported by the netty developers. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > more src/hotspot/share/opto/compile.hpp line 1090: > 1088: } > 1089: > 1090: void inc_number_of_mh_late_inlines() { _number_of_mh_late_inlines++; } Does it make sense to get rid of `inc_number_of_mh_late_inlines()` and turn `_number_of_mh_late_inlines` into a boolean with a setter? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28088#discussion_r2491844939 From vlivanov at openjdk.org Tue Nov 4 21:08:17 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 4 Nov 2025 21:08:17 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: <_5LBssd0pSBQKv9WuNXXOyUOtmn0xPpwWbaZe9qEbj0=.7dd0c58d-6aa9-4829-aaf3-8756778fbe3e@github.com> On Tue, 4 Nov 2025 08:42:24 GMT, Emanuel Peter wrote: >> Done. > > @iwanowww Thanks for the comments! > > I'm still not 100% happy with it. It really feels like we are introducing some tech-debt here. What should the next person do who also needs to attach something else to the SafePoint? It is also easy to miss places where we have to special case the extra edges. I don't have a solution here, I'm just not extremely satisfied. Is there a better long-term solution? I don't think we can improve things much without paying existing tech debt related to managing non-debug edges. The two cases I spotted relate to scalarization and are not aware of non-debug inputs a safepoint may have. Both implementations assume there's only debug info attached and tweak it by appending new edges and extending debug info range to cover them. Without a proper API to manage safepoint-attached debug information, it'll always be fragile.
JDK-8370133 "C2: Manage non-debug safepoint edges in structural manner" should address the root cause and introduce a proper way to work with safepoint-attached debug and non-debug information. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2492046349 From sviswanathan at openjdk.org Wed Nov 5 01:02:36 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 5 Nov 2025 01:02:36 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v5] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 10:13:03 GMT, Xiaohong Gong wrote: >> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. >> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. 
Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. >> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. >> >> Here is the performance data with `-XX:UseSVE=2`: >> >> Benchmark bits inputs Mode Unit Before After Gain >> MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 >> MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 >> MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 >> MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 >> MaskQueryOperations... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Update comments Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27481#pullrequestreview-3419152164 From jkarthikeyan at openjdk.org Wed Nov 5 01:28:31 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 5 Nov 2025 01:28:31 GMT Subject: RFR: 8350468: x86: Improve implementation of vectorized numberOfLeadingZeros for int and long [v2] In-Reply-To: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com> References: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com> Message-ID: > Hi all, > This is a patch that optimizes the x86 backend implementation of `CountLeadingZerosV` for int and long. 
In the review of [JDK-8349637](https://bugs.openjdk.org/browse/JDK-8349637) an [optimized algorithm](https://github.com/openjdk/jdk/pull/23579#issuecomment-2661332497) was proposed by @rgiulietti, which this PR implements. For integer operands, the optimized algorithm reduces the number of vector instructions from 19 to 13. The same algorithm does not work for long operands, however, since avx2 lacks a vectorized long->double conversion instruction. Instead, I found an optimized algorithm to reuse the code for int and compute the leading zeros for long with only 4 additional instructions. I added a benchmark and on my Zen 3 machine I get these results:
>
>                                    Baseline                   Patch
> Benchmark              Mode  Cnt     Score   Error  Units     Score   Error  Units  Improvement
> LeadingZeros.testInt   avgt   15    91.097 ± 3.276  ns/op    68.665 ± 1.740  ns/op  (+ 28.1%)
> LeadingZeros.testLong  avgt   15   342.545 ± 4.470  ns/op   228.668 ± 5.994  ns/op  (+ 39.9%)
>
> I've updated the unit tests to more thoroughly test longs and they pass on my machine. Thoughts and reviews would be appreciated! Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Fix typo in comment - Merge branch 'master' into optimize-leading-zero - Optimize numberOfLeadingZeros ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26610/files - new: https://git.openjdk.org/jdk/pull/26610/files/7e207220..05195505 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26610&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26610&range=00-01 Stats: 379610 lines in 5554 files changed: 260632 ins; 84059 del; 34919 mod Patch: https://git.openjdk.org/jdk/pull/26610.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26610/head:pull/26610 PR: https://git.openjdk.org/jdk/pull/26610 From jkarthikeyan at openjdk.org Wed Nov 5 01:28:32 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 5 Nov 2025 01:28:32 GMT Subject: RFR: 8350468: x86: Improve implementation of vectorized numberOfLeadingZeros for int and long In-Reply-To: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com> References: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com> Message-ID: On Mon, 4 Aug 2025 02:20:31 GMT, Jasmine Karthikeyan wrote: > Hi all, > This is a patch that optimizes the x86 backend implementation of `CountLeadingZerosV` for int and long. In the review of [JDK-8349637)](https://bugs.openjdk.org/browse/JDK-8349637) an [optimized algorithm]( https://github.com/openjdk/jdk/pull/23579#issuecomment-2661332497) was proposed by @rgiulietti, which this PR implements. For integer operands, the optimized algorithm reduces the number of vector instructions from 19 to 13. The same algorithm does not work for long operands, however, since avx2 lacks a vectorized long->double conversion instruction. Instead, I found an optimized algorithm to reuse the code for int and compute the leading zeros for long with only 4 additional instructions. 
I added a benchmark and on my Zen 3 machine I get these results:
>
>                                    Baseline                   Patch
> Benchmark              Mode  Cnt     Score   Error  Units     Score   Error  Units  Improvement
> LeadingZeros.testInt   avgt   15    91.097 ± 3.276  ns/op    68.665 ± 1.740  ns/op  (+ 28.1%)
> LeadingZeros.testLong  avgt   15   342.545 ± 4.470  ns/op   228.668 ± 5.994  ns/op  (+ 39.9%)
>
> I've updated the unit tests to more thoroughly test longs and they pass on my machine. Thoughts and reviews would be appreciated! Thanks for the reviews and the testing! I've pushed a commit that fixes a typo and merges the latest changes. A re-review would be appreciated! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26610#issuecomment-3488692128 From jkarthikeyan at openjdk.org Wed Nov 5 01:28:35 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 5 Nov 2025 01:28:35 GMT Subject: RFR: 8350468: x86: Improve implementation of vectorized numberOfLeadingZeros for int and long [v2] In-Reply-To: References: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com> Message-ID: On Tue, 23 Sep 2025 23:20:46 GMT, Sandhya Viswanathan wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Fix typo in comment >> - Merge branch 'master' into optimize-leading-zero >> - Optimize numberOfLeadingZeros > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6289: > >> 6287: // Move the top half result to the bottom half of xtmp1, setting the top half to 0. >> 6288: vpsrlq(xtmp1, dst, 32, vec_enc); >> 6289: // By moving the top half result to the right by 6 bytes, if the top half was empty (i.e. 32 is returned) the result bit will > > I think you mean 6 bits here and not 6 bytes. This is a good catch, I did mean bits here. I've fixed this in the latest commit.
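The general idea of reusing the int computation to obtain the leading-zero count of a long can be illustrated with a scalar Java sketch. This is an assumption-laden illustration, not the vector instruction sequence from the patch: the class `LeadingZerosSketch`, the method `nlzLong` and the branch-free combination step are hypothetical and were chosen only to show how two 32-bit results may be merged.

```java
public class LeadingZerosSketch {
    // Leading zeros of a long, built from the int results of its two halves.
    static int nlzLong(long x) {
        int hi = (int) (x >>> 32);
        int lo = (int) x;
        int hiZeros = Integer.numberOfLeadingZeros(hi); // in 0..32
        int loZeros = Integer.numberOfLeadingZeros(lo); // in 0..32
        // hiZeros >>> 5 is 1 only when hiZeros == 32, i.e. the upper half is zero.
        // Negating it gives an all-ones mask in that case and 0 otherwise.
        int mask = -(hiZeros >>> 5);
        // The lower-half count only contributes when the upper half is empty.
        return hiZeros + (loZeros & mask);
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, 1L << 31, 1L << 32, -1L, Long.MAX_VALUE};
        for (long s : samples) {
            System.out.println(Long.toHexString(s) + " -> " + nlzLong(s));
        }
    }
}
```

The sketch can be checked against `Long.numberOfLeadingZeros`, which it is expected to match for every input.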
> test/hotspot/jtreg/compiler/vectorization/TestNumberOfContinuousZeros.java line 49:
>
>> 47:
>> 48: public class TestNumberOfContinuousZeros {
>> 49: private static final int[] SPECIAL_INT = { 0, 0x01FFFFFF, 0x03FFFFFE, 0x07FFFFFC, 0x0FFFFFF8, 0x1FFFFFF0, 0x3FFFFFE0, 0xFFFFFFFF };
>
> Please also update the copyright year for the file to 2025.

I think this file already has a 2025 copyright; since I'm an individual contributor I'd be covered by the Oracle one.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26610#discussion_r2492497362
PR Review Comment: https://git.openjdk.org/jdk/pull/26610#discussion_r2492498871

From duke at openjdk.org Wed Nov 5 01:33:56 2025
From: duke at openjdk.org (duke)
Date: Wed, 5 Nov 2025 01:33:56 GMT
Subject: RFR: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 4 Nov 2025 07:12:36 GMT, erifan wrote:

>> According to the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test failures on sve2 platforms where `MaxVectorSize > 16`.
>>
>> This test problem was discovered by simulating a 512-bit sve2 environment using qemu.
>>
>> This PR fixes these test failures.
>
> erifan has updated the pull request incrementally with one additional commit since the last revision:
>
> Add IR test rules for unsupported partial cases on aarch64

@erifan Your change (at version 3147164fea8f373af331d5c8af980fd07d822511) is now ready to be sponsored by a Committer.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/27723#issuecomment-3488709168

From duke at openjdk.org Wed Nov 5 02:22:18 2025
From: duke at openjdk.org (erifan)
Date: Wed, 5 Nov 2025 02:22:18 GMT
Subject: Integrated: 8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms
In-Reply-To:
References:
Message-ID:

On Thu, 9 Oct 2025 10:45:47 GMT, erifan wrote:

> According to the AD file, partial cases where `vector_length_in_bytes > 8` of the vector API `selectFrom` are not supported on the AArch64 SVE2 platform. But the test `TestSelectFromTwoVectorOp.java` didn't rule out these cases, leading to test failures on sve2 platforms where `MaxVectorSize > 16`.
>
> This test problem was discovered by simulating a 512-bit sve2 environment using qemu.
>
> This PR fixes these test failures.

This pull request has now been integrated.

Changeset: 4e6cadf4
Author: erifan
Committer: Hao Sun
URL: https://git.openjdk.org/jdk/commit/4e6cadf4550c58b3ff97dfa0cead4b5b1399324c
Stats: 116 lines in 3 files changed: 94 ins; 1 del; 21 mod

8369456: [TESTBUG] Fix the test failure of TestSelectFromTwoVectorOp.java on sve2 platforms

Reviewed-by: epeter, bkilambi, xgong, haosun

-------------

PR: https://git.openjdk.org/jdk/pull/27723

From qamai at openjdk.org Wed Nov 5 03:22:12 2025
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 5 Nov 2025 03:22:12 GMT
Subject: RFR: 8350468: x86: Improve implementation of vectorized numberOfLeadingZeros for int and long [v2]
In-Reply-To:
References: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com>
Message-ID:

On Wed, 5 Nov 2025 01:28:31 GMT, Jasmine Karthikeyan wrote:

>> Hi all,
>> This is a patch that optimizes the x86 backend implementation of `CountLeadingZerosV` for int and long.
In the review of [JDK-8349637](https://bugs.openjdk.org/browse/JDK-8349637) an [optimized algorithm](https://github.com/openjdk/jdk/pull/23579#issuecomment-2661332497) was proposed by @rgiulietti, which this PR implements. For integer operands, the optimized algorithm reduces the number of vector instructions from 19 to 13. The same algorithm does not work for long operands, however, since avx2 lacks a vectorized long->double conversion instruction. Instead, I found an optimized algorithm to reuse the code for int and compute the leading zeros for long with only 4 additional instructions. I added a benchmark and on my Zen 3 machine I get these results:
>>
>>                                       Baseline                    Patch
>> Benchmark               Mode  Cnt    Score    Error  Units      Score    Error  Units  Improvement
>> LeadingZeros.testInt    avgt   15   91.097 ±  3.276  ns/op     68.665 ±  1.740  ns/op  (+ 28.1%)
>> LeadingZeros.testLong   avgt   15  342.545 ±  4.470  ns/op    228.668 ±  5.994  ns/op  (+ 39.9%)
>>
>> I've updated the unit tests to more thoroughly test longs and they pass on my machine. Thoughts and reviews would be appreciated!
>
> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
> - Fix typo in comment
> - Merge branch 'master' into optimize-leading-zero
> - Optimize numberOfLeadingZeros

Marked as reviewed by qamai (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/26610#pullrequestreview-3419543784 From haosun at openjdk.org Wed Nov 5 03:36:01 2025 From: haosun at openjdk.org (Hao Sun) Date: Wed, 5 Nov 2025 03:36:01 GMT Subject: RFR: 8293484: AArch64: TestUseSHA512IntrinsicsOptionOnSupportedCPU.java fails on CPU with SHA512 feature support In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 07:09:38 GMT, Ramkumar Sunderbabu wrote: > We have a host of tests under test/hotspot/jtreg/compiler/intrinsics/sha which checks if the SHA intrinsics flags' enable/disable setting is in sync with CPU support in the underlying platform. There might be situations where the intrinsics might not be enabled despite the hardware supporting the relevant instructions. For example, there might be reliability issues or performance issues. In such situations, the tests will fail. > > Till now, the approach has been to exclude the platforms where the support is yet to be provided and remove the exclusion after. This necessitates additional work on the test front. > > A more compact design would be make predicate probes to rely on intrinsics availability in the platform as opposed to hardware support availability. The migration to intrinsics availability would especially help update releases where feature backport might not be complete. > > PS: This fix can/should be propagated to other such tests as well. Once this PR gets approval, I will work on similar tests. Hi, I suppose the failure may occur if we run this test case on CPU **with** SHA512 feature, but **disabling** SHA512Intrinsics. As **@requires vm.flagless** is set in this jtreg case, if we specify `-XX:-UseSHA512Intrinsics`, this test case is not tested actually. Here shows the log in my machine. 
$ make test TEST=test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java JTREG="VM_OPTIONS=-XX:-UseSHA512Intrinsics"
Building target 'test' in configuration '/tmp/local-build-fastdebug'
Running tests using JTREG control variable 'VM_OPTIONS=-XX:-UseSHA512Intrinsics'
Test selection 'test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java', will run:
* jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java
Clean up dirs for jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java

Running test 'jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java'
Test results: no tests selected
Report written to /tmp/local-build-fastdebug/test-results/jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java/html/report.html
Results written to /tmp/local-build-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java
Finished running test 'jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java'
Test report is stored in /tmp/local-build-fastdebug/test-results/jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java

==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR  SKIP
   jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java
                                                         0     0     0     0     0
==============================
TEST SUCCESS

If so, I don't think it's a bug. Is there anything I misunderstood?
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28053#issuecomment-3489100024

From sviswanathan at openjdk.org Wed Nov 5 03:45:10 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Wed, 5 Nov 2025 03:45:10 GMT
Subject: RFR: 8350468: x86: Improve implementation of vectorized numberOfLeadingZeros for int and long [v2]
In-Reply-To:
References: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com>
Message-ID:

On Wed, 5 Nov 2025 01:28:31 GMT, Jasmine Karthikeyan wrote:

>> Hi all,
>> This is a patch that optimizes the x86 backend implementation of `CountLeadingZerosV` for int and long. In the review of [JDK-8349637](https://bugs.openjdk.org/browse/JDK-8349637) an [optimized algorithm](https://github.com/openjdk/jdk/pull/23579#issuecomment-2661332497) was proposed by @rgiulietti, which this PR implements. For integer operands, the optimized algorithm reduces the number of vector instructions from 19 to 13. The same algorithm does not work for long operands, however, since avx2 lacks a vectorized long->double conversion instruction. Instead, I found an optimized algorithm to reuse the code for int and compute the leading zeros for long with only 4 additional instructions. I added a benchmark and on my Zen 3 machine I get these results:
>>
>>                                       Baseline                    Patch
>> Benchmark               Mode  Cnt    Score    Error  Units      Score    Error  Units  Improvement
>> LeadingZeros.testInt    avgt   15   91.097 ±  3.276  ns/op     68.665 ±  1.740  ns/op  (+ 28.1%)
>> LeadingZeros.testLong   avgt   15  342.545 ±  4.470  ns/op    228.668 ±  5.994  ns/op  (+ 39.9%)
>>
>> I've updated the unit tests to more thoroughly test longs and they pass on my machine. Thoughts and reviews would be appreciated!
>
> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: > > - Fix typo in comment > - Merge branch 'master' into optimize-leading-zero > - Optimize numberOfLeadingZeros Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26610#pullrequestreview-3419613018 From duke at openjdk.org Wed Nov 5 05:08:57 2025 From: duke at openjdk.org (Zihao Lin) Date: Wed, 5 Nov 2025 05:08:57 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v8] In-Reply-To: References: Message-ID: > This patch remove slice parameter from LoadNode::make > > I have done more work which remove slice paramater from StoreNode::make. > > Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thank a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: - fix assert - add more assert - rid of access.addr().type() - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - Fix build - Fix test failed - 8344116: C2: remove slice parameter from LoadNode::make ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24258/files - new: https://git.openjdk.org/jdk/pull/24258/files/ea83736e..6d122039 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=06-07 Stats: 526337 lines in 7522 files changed: 349612 ins; 122587 del; 54138 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From duke at openjdk.org Wed Nov 5 05:09:05 2025 From: duke at openjdk.org (Zihao Lin) Date: Wed, 5 Nov 2025 05:09:05 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v6] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 13:04:12 GMT, Roland Westrelin wrote: >> Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into 8344116 >> - Fix build >> - Fix test failed >> - 8344116: C2: remove slice parameter from LoadNode::make > > src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 223: > >> 221: MergeMemNode* mm = opt_access.mem(); >> 222: PhaseGVN& gvn = opt_access.gvn(); >> 223: Node* mem = mm->memory_at(gvn.C->get_alias_index(access.addr().type())); > > Can we get rid of all uses of `access.addr().type()`? 
Got rid of all uses of `access.addr().type()`.

> src/hotspot/share/gc/shared/c2/cardTableBarrierSetC2.cpp line 105:
>
>> 103: // stores. In theory we could relax the load from ctrl() to
>> 104: // no_ctrl, but that doesn't buy much latitude.
>> 105: Node* card_val = __ load( __ ctrl(), card_adr, TypeInt::BYTE, T_BYTE);
>
> We could assert that `C->get_alias_index(kit->type(card_adr)) == Compile::AliasIdxRaw`, that is, that the computed slice is the same as the hardcoded slice. Similar asserts could be added for every location where a slice/address type is removed in this patch.

Sure, I added more asserts for this change.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2484816831
PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2492987998

From fyang at openjdk.org Wed Nov 5 07:25:05 2025
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 5 Nov 2025 07:25:05 GMT
Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v6]
In-Reply-To:
References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com>
Message-ID:

On Tue, 4 Nov 2025 12:37:12 GMT, Robbin Ehn wrote:

>> Hi, please consider.
>>
>> Sanity tested and no issues with MAJIK t1 (with +VSC).
>>
>> Thanks, Robbin
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
>
> - chaitin dump print majik cookie
> - Merge branch 'master' into vsc
> - Merge branch 'master' into vsc
> - Forgot fix format for VSAC
> - Fixed format
> - Label name
> - li->mv, format, space
> - Draft

Marked as reviewed by fyang (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/28005#pullrequestreview-3420264864 From aseoane at openjdk.org Wed Nov 5 08:10:04 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 5 Nov 2025 08:10:04 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v3] In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 18:36:58 GMT, Vladimir Kozlov wrote: >> Absolutely. The tricky part is that the current `TraceAutoVectorization` is a compile command that can be enabled with method name filtering. Is that already available via UL now? > > Unfortunately no. I think this is what @anton-seoane worked on before. Yes, I have taken the task again so sooner than later CompileCommand filtering for UL will be enabled for cases such as this ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2493432550 From hgreule at openjdk.org Wed Nov 5 08:38:26 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Wed, 5 Nov 2025 08:38:26 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v3] In-Reply-To: References: Message-ID: On Mon, 3 Nov 2025 13:59:28 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectorization.cpp line 634: >> >>> 632: // Each reduction is composed of multiple instructions, each estimated with a unit cost. >>> 633: // Linear: shuffle and reduce Recursive: shuffle and reduce >>> 634: float c = requires_strict_order ? 2 * vlen : 2 * exact_log2(vlen); >> >> "unit cost" sounds a bit too simple given that there is some kind of estimation going on already. Maybe it would make sense to add some discussion how strict order affects the shape of the resulting vectorized code? >> >> I assume cases where the reduction can be moved after the loop are covered somewhere else? > > Thanks for the comment :) > > By "unit cost" I mean unit cost per hardware instruction. Reduction ops use multiple instructions, so we count the instructions, and return that count. 
> > Yes, if we move reductions out of the loop, then the reduction node is not in the loop anymore, and instead we have vector accumulators. And then we count the cost of the vector accumulators. > > That's why I need methods like `VTransformGraph::mark_vtnodes_in_loop` to know what nodes are in the loop (the new vector accumulators, and not the reductions if moved out of the loop). > > I think I'll improve the comments a little to make that more clear :) Ah, when referring to hardware instructions this makes perfectly sense, somehow I assumed "unit cost of a node". Thanks for clarifying! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2493503295 From rehn at openjdk.org Wed Nov 5 09:21:34 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 5 Nov 2025 09:21:34 GMT Subject: RFR: 8370708: RISC-V: Add VerifyStackAtCalls [v6] In-Reply-To: References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: On Tue, 4 Nov 2025 14:09:41 GMT, Feilong Jiang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - chaitin dump print majik cookie >> - Merge branch 'master' into vsc >> - Merge branch 'master' into vsc >> - Forgot fix format for VSAC >> - Fixed format >> - Label name >> - li->mv, format, space >> - Draft > > Thanks! Thanks @feilongjiang and @RealFYang ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28005#issuecomment-3490093894 From mli at openjdk.org Wed Nov 5 09:23:33 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 5 Nov 2025 09:23:33 GMT Subject: RFR: 8371297: C2: assert triggerred in BoolTest::BoolTest Message-ID: Hi, Can you help to review this patch? 
Not sure how this one slipped in (https://github.com/openjdk/jdk/pull/28047/), I think it passed my local test and github CI test at that time. Please check https://bugs.openjdk.org/browse/JDK-8371297 for details. Thanks! ------------- Commit messages: - initial commit - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ... and 8 more: https://git.openjdk.org/jdk/compare/32508230...0c877e26 Changes: https://git.openjdk.org/jdk/pull/28141/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28141&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371297 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28141.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28141/head:pull/28141 PR: https://git.openjdk.org/jdk/pull/28141 From aseoane at openjdk.org Wed Nov 5 09:23:39 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 5 Nov 2025 09:23:39 GMT Subject: RFR: 8356761: IGV: dump escape analysis information Message-ID: This PR introduces new IGV dumps, property fields and filters related to escape analysis information. The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. 
With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as:

- Node escape "level".
- Scalar replaceability.
- Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)).

This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped.

Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status.

**Testing:** passes tiers 1-3, manual testing in IGV

-------------

Commit messages:
 - Merge master
 - Remove unused include
 - Use print_method
 - Merge branch 'master' of github.com:openjdk/jdk into JDK-8356761
 - Merge
 - Add EA filters and better dumping experience
 - Fix ordering
 - More granularity
 - Reorder and rename
 - Sort includes
 - ...
and 5 more: https://git.openjdk.org/jdk/compare/642ba4cf...872b1b48 Changes: https://git.openjdk.org/jdk/pull/28060/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28060&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356761 Stats: 170 lines in 8 files changed: 165 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28060.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28060/head:pull/28060 PR: https://git.openjdk.org/jdk/pull/28060 From rehn at openjdk.org Wed Nov 5 09:25:08 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 5 Nov 2025 09:25:08 GMT Subject: Integrated: 8370708: RISC-V: Add VerifyStackAtCalls In-Reply-To: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> References: <0XnBhfrjvDzQT196de1LU8PS-DX8tJSO5mCRuRkDI-k=.0f66bc44-a497-4ce5-8f68-49890122d65f@github.com> Message-ID: On Mon, 27 Oct 2025 16:47:35 GMT, Robbin Ehn wrote: > Hi, please consider. > > Sanity tested and no issues with MAJIK t1 (with +VSC). > > Thanks, Robbin This pull request has now been integrated. Changeset: 0737a562 Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/0737a5625269773dcf70b95f8b8ac90b3b6cc444 Stats: 31 lines in 3 files changed: 21 ins; 7 del; 3 mod 8370708: RISC-V: Add VerifyStackAtCalls Reviewed-by: fyang, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/28005 From aph at openjdk.org Wed Nov 5 09:47:18 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 5 Nov 2025 09:47:18 GMT Subject: RFR: 8293484: AArch64: TestUseSHA512IntrinsicsOptionOnSupportedCPU.java fails on CPU with SHA512 feature support In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 07:09:38 GMT, Ramkumar Sunderbabu wrote: > We have a host of tests under test/hotspot/jtreg/compiler/intrinsics/sha which checks if the SHA intrinsics flags' enable/disable setting is in sync with CPU support in the underlying platform. 
There might be situations where the intrinsics might not be enabled despite the hardware supporting the relevant instructions. For example, there might be reliability issues or performance issues. In such situations, the tests will fail. > > Till now, the approach has been to exclude the platforms where the support is yet to be provided and remove the exclusion after. This necessitates additional work on the test front. > > A more compact design would be make predicate probes to rely on intrinsics availability in the platform as opposed to hardware support availability. The migration to intrinsics availability would especially help update releases where feature backport might not be complete. > > PS: This fix can/should be propagated to other such tests as well. Once this PR gets approval, I will work on similar tests. > Hi, I suppose the failure may occur if we run this test case on CPU **with** SHA512 feature, but **disabling** SHA512Intrinsics. > > As **@requires vm.flagless** is set in this jtreg case, if we specify `-XX:-UseSHA512Intrinsics`, this test case is not tested actually. Here shows the log in my machine. 
>
> ```shell
> $ make test TEST=test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java JTREG="VM_OPTIONS=-XX:-UseSHA512Intrinsics"
> Building target 'test' in configuration '/tmp/local-build-fastdebug'
> Running tests using JTREG control variable 'VM_OPTIONS=-XX:-UseSHA512Intrinsics'
> Test selection 'test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java', will run:
> * jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java
> Clean up dirs for jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java
>
> Running test 'jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java'
> Test results: no tests selected
> Report written to /tmp/local-build-fastdebug/test-results/jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java/html/report.html
> Results written to /tmp/local-build-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java
> Finished running test 'jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java'
> Test report is stored in /tmp/local-build-fastdebug/test-results/jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java
>
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR  SKIP
>    jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java
>                                                          0     0     0     0     0
> ==============================
> TEST SUCCESS
> ```
>
> If so, I don't think it's a bug. Is there anything I misunderstood?

That is correct.
    // Determine if the compiler corresponding to the compilation level 'compLevel'
    // and to the compilation context 'compilation_context' provides an intrinsic
    // for the method 'method'. An intrinsic is available for method 'method' if:
    //  - the intrinsic is enabled (by using the appropriate command-line flag) and
    //  - the platform on which the VM is running provides the instructions necessary
    //    for the compiler to generate the intrinsic code.
    //
    // The compilation context is related to using the DisableIntrinsic flag on a
    // per-method level, see hotspot/src/share/vm/compiler/abstractCompiler.hpp
    // for more details.
    public boolean isIntrinsicAvailable(Executable method,
                                        Executable compilationContext,
                                        int compLevel) {

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28053#issuecomment-3490197209

From epeter at openjdk.org Wed Nov 5 09:50:47 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 5 Nov 2025 09:50:47 GMT
Subject: RFR: 8340093: C2 SuperWord: implement cost model [v4]
In-Reply-To:
References:
Message-ID:

> Note: this looks like a large change, but only about 400-500 lines are VM changes. 2.5k comes from new tests.
>
> Finally: after a long list of refactorings, we can implement the Cost-Model. The refactorings and this implementation was first PoC'd here: https://github.com/openjdk/jdk/pull/20964
>
> Main goal:
> - Carefully allow the vectorization of reduction cases that lead to speedups, and prevent those that do not (or may cause regressions).
> - Open up new vectorization opportunities in the future, that introduce expensive vector nodes that are only profitable in some cases but not others.
>
> **Why cost-model?**
>
> Usually, vectorization leads to speedups because we replace multiple scalar operations with a single vector operation. The scalar and vector operation have a very similar cost per instruction, and so going from 4 scalar ops to a single vector op may yield a 4x speedup. This is a bit simplistic, but the general idea.
> > But: some vector ops are expensive. Sometimes, the vector op can be more expensive than the multiple scalar ops it replaces. This is the case with some reduction ops. Or we may introduce a vector op that does not have any corresponding scalar op (e.g. in the case of shuffle). This prevents simple heuristics that only focus on single operations. > > Weighing the total cost of the scalar loop vs the vector loop allows us a more "holistic" approach. There may be expensive vector ops, but other cheaper vector ops may still make it profitable. > > **Implementation** > > Items: > - New `VTransform::is_profitable`: checks cost-model and some other cost related checks. > - `VLoopAnalyzer::cost`: scalar loop cost > - `VTransformGraph::cost`: vector loop cost > - Old reduction heuristic with `_num_work_vecs` and `_num_reductions` used to count check for "simple" reductions where the only "work" vector was the reduction itself. Reductions were not considered profitable if they were "simple". I was able to lift those restrictions. > - Adapted existing tests. > - Wrote a new comprehensive test, matching the related JMH benchmark, which we use below. > > **Testing** > Regular correctness testing, and performance testing. In addition to the JMH micro benchmarks below. > > ------------------------------ > > **Some History** > > I have been bothered by "simple" reductions not vectorizing for a long time. It was also a part of [my JVMLS2025 presentation](https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/). > > During JDK9, reductions were first vectorized, but then restricted for... 
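
The trade-off described above can be illustrated with a toy calculation in Java. This is a rough sketch, not the code in the PR: it borrows the reduction estimate quoted elsewhere in this thread (`2 * vlen` instructions for a strictly ordered reduction, `2 * exact_log2(vlen)` otherwise) and assumes, purely for illustration, that one vector iteration replaces `vlen` scalar iterations and that every other operation costs one unit. The class and method names are hypothetical, and the sketch keeps the reduction inside the loop rather than modelling the case where it is moved out.

```java
final class ReductionCostSketch {
    // Estimated instruction count for one in-loop vector reduction.
    // Strict order: a linear chain of shuffle+reduce steps (2 per element).
    // Otherwise: a recursive halving of shuffle+reduce steps (2 per level).
    static float reductionCost(int vlen, boolean requiresStrictOrder) {
        if (vlen < 2 || Integer.bitCount(vlen) != 1) {
            throw new IllegalArgumentException("vlen must be a power of two >= 2");
        }
        int log2 = Integer.numberOfTrailingZeros(vlen); // exact log2 for powers of two
        return requiresStrictOrder ? 2f * vlen : 2f * log2;
    }

    // Assumption for this sketch: one vector iteration does the work of
    // vlen scalar iterations, so the scalar cost is scaled accordingly.
    static boolean isProfitable(float scalarCostPerIteration,
                                float vectorCostPerIteration,
                                int vlen) {
        return vectorCostPerIteration < scalarCostPerIteration * vlen;
    }

    public static void main(String[] args) {
        int vlen = 8;
        float scalar = 2f;                                    // one load + one add
        float vecUnordered = 1f + reductionCost(vlen, false); // vector load + 6
        float vecStrict = 1f + reductionCost(vlen, true);     // vector load + 16
        System.out.println("unordered profitable: " + isProfitable(scalar, vecUnordered, vlen));
        System.out.println("strict profitable:    " + isProfitable(scalar, vecStrict, vlen));
    }
}
```

With these toy numbers the unordered reduction wins (7 units against 16) while the strictly ordered one does not (17 against 16), which matches the intuition that an expensive vector op can outweigh the scalar ops it replaces.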
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: rename cost methods for Vladimir K ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27803/files - new: https://git.openjdk.org/jdk/pull/27803/files/d79df4fc..23906b81 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27803&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27803&range=02-03 Stats: 31 lines in 5 files changed: 0 ins; 0 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/27803.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27803/head:pull/27803 PR: https://git.openjdk.org/jdk/pull/27803 From epeter at openjdk.org Wed Nov 5 09:50:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 09:50:49 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v3] In-Reply-To: References: Message-ID: <59Pv7my4ZuQ0zbG-HVjrABTiHsnjguqvXqPAoo3S-ko=.eb97cbd9-672e-4872-ad1f-fb85556f45e0@github.com> On Tue, 4 Nov 2025 18:19:29 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> More comments for SirYwell > > This looks fine and not complex. I have only nit picks. @vnkozlov Thanks for reviewing and the suggestions. I renamed some cost functions, and I like it better this way now too :) > src/hotspot/share/opto/vectorization.cpp line 572: > >> 570: >> 571: // Compute the cost over all operations in the (scalar) loop. >> 572: float VLoopAnalyzer::cost() const { > > consider renaming it to `cost_for_scalar()` and `cost_for_scalar()` to `cost_for_scalar_node()` I'll do some renamings to make it explicit which are for nodes, and which for the loop. > src/hotspot/share/opto/vectorization.cpp line 580: > >> 578: >> 579: float sum = 0; >> 580: for (int j = 0; j < body().body().length(); j++) { > > What is `body().body()` mean? `VLoopAnalyzer` (`this`) has multiple analysis subcomponents. 
One of them is the `VLoopBody`, i.e. `this->body()` / `this->_body`. It in turn exposes a `GrowableArray` via `body()`, which holds the nodes of the loop. Maybe `loopBody().nodes()` would sound better here. If you prefer that, I can file a separate renaming RFE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27803#issuecomment-3490211351 PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2493712328 PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2493630689 From epeter at openjdk.org Wed Nov 5 09:50:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 09:50:50 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v3] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 08:07:50 GMT, Anton Seoane Ampudia wrote: >> Unfortunately no. I think this is what @anton-seoane worked on before. > > Yes, I have taken the task again, so sooner rather than later CompileCommand filtering for UL will be enabled for cases such as this Ok, that's what I thought. For now, I'll extend the tracing the way I've been doing, and once we have UL available with method-level filtering, then I can migrate it all in one single PR :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2493717133 From epeter at openjdk.org Wed Nov 5 09:53:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 09:53:11 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: <_5LBssd0pSBQKv9WuNXXOyUOtmn0xPpwWbaZe9qEbj0=.7dd0c58d-6aa9-4829-aaf3-8756778fbe3e@github.com> References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> <_5LBssd0pSBQKv9WuNXXOyUOtmn0xPpwWbaZe9qEbj0=.7dd0c58d-6aa9-4829-aaf3-8756778fbe3e@github.com> Message-ID: On Tue, 4 Nov 2025 21:05:52 GMT, Vladimir Ivanov wrote: >> @iwanowww Thanks for the comments! >> >> I'm still not 100% happy with it.
It really feels like we are introducing some tech-debt here. What should the next person do who also needs to attach something else to the SafePoint? It is also easy to miss places where we have to special case the extra edges. I don't have a solution here, I'm just not extremely satisfied. Is there a better long-term solution? > > I don't think we can improve things much without paying existing tech debt related to managing non-debug edges. > The two cases I spotted relate to scalarization and are not aware of non-debug inputs a safepoint may have. Both implementations assume there's only debug info attached and tweak it by appending new edges and extending the debug info range to cover them. Without a proper API to manage safepoint-attached debug information, it'll always be fragile. > > JDK-8370133 "C2: Manage non-debug safepoint edges in structural manner" should address the root cause and introduce a proper way to work with safepoint-attached debug and non-debug information. Ah ok, you are already tracking it with an RFE. Great, that is good enough for me :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2493747244 From chagedorn at openjdk.org Wed Nov 5 09:58:16 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Nov 2025 09:58:16 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> Message-ID: On Tue, 4 Nov 2025 16:04:22 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is.
>> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... 
>> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > small adjustments after call with Roberto and Christian Nice improvement! It makes it much more intuitive to use and avoids user mistakes where it is not immediately evident how to fix missing data names because you need to know more framework internals. I'm already submitting some comments for `TestTutorial` which I walked through now. Very well written and it includes everything a normal user should know to get started without the need to further dig into the framework. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 371: > 369: // replacements escape the scope. > 370: transparentScope( > 371: let("x", 11), // escape escopes the "transparentScope". Did you mean the following? Suggestion: let("x", 11), // escapes the "transparentScope". test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 499: > 497: var template2 = Template.make("x", (Integer x) -> scope( > 498: """ > 499: // Let us go back to where we anchored the hook with anchor() and define a field named $field1 there. Maybe add hint: Suggestion: // Let us go back to where we anchored the hook with anchor() (see 'templateClass' below) and define a field // named $field1 there. 
test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 501: > 499: // Let us go back to where we anchored the hook with anchor() and define a field named $field1 there. > 500: """, > 501: myHook.insert(scope( // <- insertion scope I'm not yet clear on that one. What happens when we insert a transparentScope here and for example add a `let("y", 42)`. Where could we then use `#y` (might not be a thing one might want to do, though)? Anywhere in the anchor scope and/or in the caller scope? test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 720: > 718: // variable from an outer Template. Luckily, the outer Templates have added their > 719: // fields and variables, and you can now access them with "dataNames()". You can > 720: // count them, get a list of them, or sample a random one. (Sorry for also suggesting things for existing code, I just read through some of that to better understand the changes and found some things to improve) Maybe for completeness we could add that any let() definition is only available in the current scope and nested scopes but do not cross a "template boundary"? test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 736: > 734: // > 735: // To get started, we show an example where all DataNames have the same type, and where > 736: // all Names are mutable. For simplicity, our type represents the primitive int type. Side note: Now that we also have the library available, could we also use that one instead of defining our own `MySimpleInt`? If so, you might want to show an example and hint here that you could also just define your own type as an expert user. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 764: > 762: addDataName($("f1"), mySimpleInt, MUTABLE, 1), > 763: addDataName($("f2"), mySimpleInt, MUTABLE, 1), > 764: addDataName($("f3"), mySimpleInt, MUTABLE), // omit weight, default is 1. 
It seems implicitly obvious but maybe we can add here that data names will only be available after adding them: Suggestion: // Also note that DataNames are only available once they are defined: // Nothing defined, yet: dataNames() = {} addDataName($("f1"), mySimpleInt, MUTABLE, 1), // Only now dataNames() contains f1: dataNames() = {f1} addDataName($("f2"), mySimpleInt, MUTABLE, 1), // dataNames() = {f1, f2} addDataName($("f3"), mySimpleInt, MUTABLE), // omit weight, default is 1. // dataNames() = {f1, f2, f3} test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 778: > 776: // the hashtag replacement "a". > 777: """, > 778: dataNames(MUTABLE).exactOf(mySimpleInt).sampleAndLetAs("a"), What do you think about just naming it `sampleTo("a")`? Or do you think the mention of `let` is crucial here? test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 784: > 782: // If we are also interested in the type of the field, we can do: > 783: """, > 784: dataNames(MUTABLE).exactOf(mySimpleInt).sampleAndLetAs("b", "bType"), Just a suggestion: We could be more explicit to mention that the second one is the type like `sampleToWithType()` or something like that. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 860: > 858: // Define a local variable. > 859: // Note: it is very important that we use a "transparentScope" for the template here, > 860: // so that the DataName can escape to outer scopes. Might be obvious but you could add that it does not mean it's available for the entire outer scope but only for everything that follows this DataName definition/insertion point of this template. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 898: > 896: // action based on the value. For that we have to capture the count > 897: // with a lambda and inner scope as above. 
If we only need to have > 898: // the count as a hashtag replacement, we can also use the follwing Suggestion: // the count as a hashtag replacement, we can also use the following ------------- PR Review: https://git.openjdk.org/jdk/pull/27255#pullrequestreview-3420222142 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493257567 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493447391 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493479782 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493566912 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493581451 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493716808 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493594823 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493599920 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493642120 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493682462 From epeter at openjdk.org Wed Nov 5 11:40:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 11:40:39 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v12] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. 
And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... 
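The "lambda vs token order" problem described above can be reproduced in a few lines with a toy model. The sketch below is not the Template Framework API: `TokenOrderSketch`, `State`, `Token`, `addName`, `countNow` and `countToken` are hypothetical names, chosen only to show how an immediate query can run before a deferred, state-changing token has been evaluated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of deferred tokens vs. immediate queries.
class TokenOrderSketch {
    // Rendering state: the names that have been added so far.
    static final class State {
        final List<String> names = new ArrayList<>();
    }

    // A token is evaluated later, in list order, and returns rendered text.
    interface Token {
        String eval(State state);
    }

    // Deferred side effect: the name is only added once the token is evaluated.
    static Token addName(String name) {
        return state -> {
            state.names.add(name);
            return "";
        };
    }

    // Immediate query: reads the state right away, while tokens are still being created.
    static int countNow(State state) {
        return state.names.size();
    }

    // Deferred query: reads the state when the token is evaluated.
    static Token countToken() {
        return state -> String.valueOf(state.names.size());
    }

    // Evaluate all tokens in order and concatenate their output.
    static String render(State state, List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) {
            sb.append(t.eval(state));
        }
        return sb.toString();
    }

    // Mixed style: the query "floats" above the token that adds the name.
    static String mixed() {
        State state = new State();
        Token add = addName("x");
        int seen = countNow(state); // runs before `add` has been evaluated
        Token print = s -> String.valueOf(seen);
        return render(state, List.of(add, print));
    }

    // All-in on tokens: the query is evaluated after the name was added.
    static String allTokens() {
        State state = new State();
        return render(state, List.of(addName("x"), countToken()));
    }

    public static void main(String[] args) {
        System.out.println("mixed:     " + mixed());
        System.out.println("allTokens: " + allTokens());
    }
}
```

In this toy model, `mixed()` renders a count of 0 although the name is added "before" the query in source order, and `allTokens()` renders 1. Making queries deferred as well restores the source-order intuition, at the price that the result is only available inside later tokens.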
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply Christian's suggestions directly Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/69cff741..5a7481d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=10-11 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Wed Nov 5 11:40:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 11:40:40 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> Message-ID: On Wed, 5 Nov 2025 07:05:21 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> small adjustments after call with Roberto and Christian > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 371: > >> 369: // replacements escape the scope. >> 370: transparentScope( >> 371: let("x", 11), // escape escopes the "transparentScope". > > Did you mean the following? > Suggestion: > > let("x", 11), // escapes the "transparentScope". Probably ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494080909 From chagedorn at openjdk.org Wed Nov 5 11:52:28 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Nov 2025 11:52:28 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> Message-ID: On Tue, 4 Nov 2025 16:04:22 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. 
For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > small adjustments after call with Roberto and Christian Comments for `TestTemplate.java`. It was more a skim of the file and I stopped here and there for having a closer look but I trust you here that you covered all the functionality - at least it suggests that you were quite thorough :-) test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 469: > 467: var hook1 = new Hook("Hook1"); > 468: > 469: var template0 = Template.make(() -> scope("t0 isAnchored: ", hook1.isAnchored(a -> scope(a)), "\n")); You could directly use `Template::scope` instead of `a -> scope(a)`. Same below. 
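For readers less familiar with the shorthand suggested in the last comment: a lambda that only forwards its argument to a static method can generally be written as a method reference. The sketch below uses hypothetical stand-ins (`MethodRefSketch`, its `scope` and `hasAny` methods) and does not reproduce the real signatures from the Template Framework.

```java
import java.util.function.Function;

// Hypothetical stand-ins; the real Template Framework signatures may differ.
class MethodRefSketch {
    // Stand-in for a static factory that wraps a value.
    static String scope(Object value) {
        return "scope(" + value + ")";
    }

    // Stand-in for a query that hands its result to a user-supplied function.
    static String hasAny(boolean result, Function<Boolean, String> f) {
        return f.apply(result);
    }

    // Explicit lambda that only forwards its argument.
    static String withLambda() {
        return hasAny(true, h -> scope(h));
    }

    // Equivalent method reference.
    static String withMethodRef() {
        return hasAny(true, MethodRefSketch::scope);
    }

    public static void main(String[] args) {
        System.out.println(withLambda());
        System.out.println(withMethodRef());
    }
}
```

Both variants produce the same result here. The method reference is shorter, while the lambda leaves room to add more tokens to the scope later without changing the shape of the call.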
test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 1259: > 1257: let("global", "GLOBAL"), > 1258: "g: #global. $a\n", > 1259: // Create a dummy DataName soe we can create the scope. Suggestion: // Create a dummy DataName so we can create the scope. test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 1334: > 1332: dataNames(MUTABLE_OR_IMMUTABLE).exactOf(myInt).hasAny(h -> scope(h)), > 1333: ", ", > 1334: dataNames(MUTABLE_OR_IMMUTABLE).exactOf(myInt).count(c -> scope(c)), Here you can simplify it again: Suggestion: dataNames(MUTABLE_OR_IMMUTABLE).exactOf(myInt).hasAny(Template::scope), ", ", dataNames(MUTABLE_OR_IMMUTABLE).exactOf(myInt).count(Template::scope), test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 1423: > 1421: dataNames(mutability).exactOf(myInt).hasAny(h -> scope(h)), > 1422: ", ", > 1423: dataNames(mutability).exactOf(myInt).count(c -> scope(c)), Suggestion: dataNames(mutability).exactOf(myInt).hasAny(Template::scope), ", ", dataNames(mutability).exactOf(myInt).count(Template::scope), test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 1570: > 1568: dataNames(mutability).exactOf(myInt).hasAny(h -> scope(h)), > 1569: ", ", > 1570: dataNames(mutability).exactOf(myInt).count(c -> scope(c)), There are many more occurrences further down. Feel free to fix them but I guess it's not that important. Suggestion: dataNames(mutability).exactOf(myInt).hasAny(Template::scope), ", ", dataNames(mutability).exactOf(myInt).count(Template::scope), test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 1901: > 1899: "int #v1 = x + 1;\n" > 1900: )), > 1901: // Using "transparentScope", is is available. "it is available"? 
Same below test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 2632: > 2630: "q: #q.\n", > 2631: listDataNames.asToken(), > 2632: // A "setFuelCostScope" nesting behaves the same as "transparentScope", as we are not useing fuel here. Suggestion: // A "setFuelCostScope" nesting behaves the same as "transparentScope", as we are not using fuel here. ------------- PR Review: https://git.openjdk.org/jdk/pull/27255#pullrequestreview-3420983428 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493782215 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493829389 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493840342 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493851152 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493852455 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493863474 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493870923 From epeter at openjdk.org Wed Nov 5 11:52:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 11:52:30 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> Message-ID: <12evYYJTBLOaUYCaUaEncCi8H6ge8hSxAFVHqEqeNpE=.8c904ca4-29a5-4a4a-a350-cbd56fc1a0db@github.com> On Wed, 5 Nov 2025 08:26:18 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> small adjustments after call with Roberto and Christian > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 501: > >> 499: // Let us go back to where we anchored the hook with anchor() and define a field named $field1 there. 
>> 500: """, >> 501: myHook.insert(scope( // <- insertion scope > > I'm not yet clear on that one. What happens when we insert a transparentScope here and for example add a `let("y", 42)`. Where could we then use `#y` (might not be a thing one might want to do, though)? Anywhere in the anchor scope and/or in the caller scope? I have examples like that in `TestTemplate.java`. A good example is in `testHookAndScopes2`. 3054 hook1.insert(transparentScope( 3055 let("nameTransparentScope", "x1c"), // escapes to caller 3056 addStructuralName("x1c", myStructuralTypeA), // escapes to anchor scope 3057 "inserted transparentScope: #nameTransparentScope\n", 3058 "local1: #local1\n", 3059 listNamesTemplate.asToken() 3060 )), Note: `let` escapes to the caller, and `addDataName` escapes to the anchor scope. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494125235 From epeter at openjdk.org Wed Nov 5 11:52:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 11:52:31 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: <12evYYJTBLOaUYCaUaEncCi8H6ge8hSxAFVHqEqeNpE=.8c904ca4-29a5-4a4a-a350-cbd56fc1a0db@github.com> References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> <12evYYJTBLOaUYCaUaEncCi8H6ge8hSxAFVHqEqeNpE=.8c904ca4-29a5-4a4a-a350-cbd56fc1a0db@github.com> Message-ID: On Wed, 5 Nov 2025 11:46:13 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 501: >> >>> 499: // Let us go back to where we anchored the hook with anchor() and define a field named $field1 there. >>> 500: """, >>> 501: myHook.insert(scope( // <- insertion scope >> >> I'm not yet clear on that one. What happens when we insert a transparentScope here and for example add a `let("y", 42)`.
Where could we then use `#y` (might not be a thing one might want to do, though)? Anywhere in the anchor scope and/or in the caller scope? > > I have examples like that in `TestTemplate.java`. A good example is in `testHookAndScopes2`. > > 3054 hook1.insert(transparentScope( > 3055 let("nameTransparentScope", "x1c"), // escapes to caller > 3056 addStructuralName("x1c", myStructuralTypeA), // escapes to anchor scope > 3057 "inserted transparentScope: #nameTransparentScope\n", > 3058 "local1: #local1\n", > 3059 listNamesTemplate.asToken() > 3060 )), > > > Note: `let` escapes to the caller, and `addDataName` escapes to the anchor scope. This is not exactly a great way to code though... most likely we don't really want the `let` to escape, and so a `hashtagScope` would be best. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494128890 From epeter at openjdk.org Wed Nov 5 11:52:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 11:52:32 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> <12evYYJTBLOaUYCaUaEncCi8H6ge8hSxAFVHqEqeNpE=.8c904ca4-29a5-4a4a-a350-cbd56fc1a0db@github.com> Message-ID: On Wed, 5 Nov 2025 11:47:31 GMT, Emanuel Peter wrote: >> I have examples like that in `TestTemplate.java`. A good example is in `testHookAndScopes2`. >> >> 3054 hook1.insert(transparentScope( >> 3055 let("nameTransparentScope", "x1c"), // escapes to caller >> 3056 addStructuralName("x1c", myStructuralTypeA), // escapes to anchor scope >> 3057 "inserted transparentScope: #nameTransparentScope\n", >> 3058 "local1: #local1\n", >> 3059 listNamesTemplate.asToken() >> 3060 )), >> >> >> Note: `let` escapes to the caller, and `addDataName` escapes to the anchor scope. > > This is not exactly a great way to code though...
most likely we don't really want the `let` to escape, and so a `hashtagScope` would be best. I don't mention all of this in the tutorial, because it is more complexity than the user probably wants to digest at this point. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494133650 From chagedorn at openjdk.org Wed Nov 5 11:52:33 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Nov 2025 11:52:33 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> Message-ID: On Wed, 5 Nov 2025 10:15:33 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> small adjustments after call with Roberto and Christian > > test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 1259: > >> 1257: let("global", "GLOBAL"), >> 1258: "g: #global. $a\n", >> 1259: // Create a dummy DataName soe we can create the scope. > > Suggestion: > > // Create a dummy DataName so we can create the scope. Is a dummy DataName required or could you also just write `scope(...)`? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2493836170 From epeter at openjdk.org Wed Nov 5 12:00:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 12:00:21 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> <12evYYJTBLOaUYCaUaEncCi8H6ge8hSxAFVHqEqeNpE=.8c904ca4-29a5-4a4a-a350-cbd56fc1a0db@github.com> Message-ID: <8-F0JFT2Vp076PJ0x2bgxPgW-vFYzjCRPuPt1qj3WjA=.4ad5b0ab-ae09-4d4f-bdbd-eedde30d077c@github.com> On Wed, 5 Nov 2025 11:49:24 GMT, Emanuel Peter wrote: >> This is not exactly a great way to code though... most likely we don't really want the `let` to escape, and so a `hashtagScope` would be best. > > I don't mention all of this in the tutorial, because it is more complexity than the user probably wants to digest at this point. Do you think we should talk about it somewhere, maybe add a further example at the very end, that discusses this? Alternatively, we could also add extra documentation at the `Hook.insert` documentation. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494149096 From epeter at openjdk.org Wed Nov 5 12:00:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 12:00:21 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: <8-F0JFT2Vp076PJ0x2bgxPgW-vFYzjCRPuPt1qj3WjA=.4ad5b0ab-ae09-4d4f-bdbd-eedde30d077c@github.com> References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> <12evYYJTBLOaUYCaUaEncCi8H6ge8hSxAFVHqEqeNpE=.8c904ca4-29a5-4a4a-a350-cbd56fc1a0db@github.com> <8-F0JFT2Vp076PJ0x2bgxPgW-vFYzjCRPuPt1qj3WjA=.4ad5b0ab-ae09-4d4f-bdbd-eedde30d077c@github.com> Message-ID: On Wed, 5 Nov 2025 11:55:07 GMT, Emanuel Peter wrote: >> I don't mention all of this in the tutorial, because it is more complexity than the user probably wants to digest at this point. > > Do you think we should talk about it somewhere, maybe add a further example at the very end, that discusses this? > > Alternatively, we could also add extra documentation at the `Hook.insert` documentation. I'm updating the description in `Hook.java` a little, to mention this. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494155766 From epeter at openjdk.org Wed Nov 5 12:04:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 12:04:18 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> Message-ID: On Wed, 5 Nov 2025 08:57:10 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> small adjustments after call with Roberto and Christian > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 720: > >> 718: // variable from an outer Template. Luckily, the outer Templates have added their >> 719: // fields and variables, and you can now access them with "dataNames()". You can >> 720: // count them, get a list of them, or sample a random one. > > (Sorry for also suggesting things for existing code, I just read through some of that to better understand the changes and found some things to improve) > > Maybe for completeness we could add that any let() definition is only available in the current scope and nested scopes but do not cross a "template boundary"? Ok, but that does not really belong in this section, right? Here, we deal with the `DataName`s... 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494168609 From epeter at openjdk.org Wed Nov 5 12:08:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 12:08:18 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> Message-ID: On Wed, 5 Nov 2025 12:01:49 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 720: >> >>> 718: // variable from an outer Template. Luckily, the outer Templates have added their >>> 719: // fields and variables, and you can now access them with "dataNames()". You can >>> 720: // count them, get a list of them, or sample a random one. >> >> (Sorry for also suggesting things for existing code, I just read through some of that to better understand the changes and found some things to improve) >> >> Maybe for completeness we could add that any let() definition is only available in the current scope and nested scopes but do not cross a "template boundary"? > > Ok, but that does not really belong in this section, right? Here, we deal with the `DataName`s... I'm adding a comment just before `generateWithHashtagAndDollarReplacements`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494179923 From epeter at openjdk.org Wed Nov 5 12:22:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 12:22:42 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> Message-ID: On Wed, 5 Nov 2025 09:02:18 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> small adjustments after call with Roberto and Christian > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 736: > >> 734: // >> 735: // To get started, we show an example where all DataNames have the same type, and where >> 736: // all Names are mutable. For simplicity, our type represents the primitive int type. > > Side note: Now that we also have the library available, could we also use that one instead of defining our own `MySimpleInt`? If so, you might want to show an example and hint here that you could also just define your own type as an expert user. Hmm. You have a point. The situation has shifted a little. Still, I think it is good if people understand how types work, so I'd leave it in as an example. So for now, I'm just adding a note that we already have some types modeled in the library, for example with `PrimitiveType`. But we have a bit of a fundamental issue here: if you expect `TestTutorial.java` to always give you the best practice with the most fresh things from the library, then we would have to constantly rework the tutorial. That would be a lot of work, especially if we continue adding more and more features. This concern is also a bit separate to the current RFE's intentions. 
So maybe we have to revisit the issue of how to present the library capabilities in the most efficient way, and to see if `TestTutorial.java` is the right place. Or maybe this tutorial just shows the core capabilities of the template framework, and we have some other tutorial that gives more insight into the library? > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 778: > >> 776: // the hashtag replacement "a". >> 777: """, >> 778: dataNames(MUTABLE).exactOf(mySimpleInt).sampleAndLetAs("a"), > > What do you think about just naming it `sampleTo("a")`? Or do you think the mention of `let` is crucial here? I think I slightly prefer `sampleAndLetAs` because of the `let`. But if you can get @robcasloz or @mhaessig to agree with you, I'm happy to change it ;) > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 784: > >> 782: // If we are also interested in the type of the field, we can do: >> 783: """, >> 784: dataNames(MUTABLE).exactOf(mySimpleInt).sampleAndLetAs("b", "bType"), > > Just a suggestion: We could be more explicit to mention that the second one is the type like `sampleToWithType()` or something like that. Hmm. What about `sampleAndLetNameAs` and `SampleAndLetNameAndTypeAs`? But that feels a bit too verbose. And I don't feel great about dropping the `let`. People can always follow the method name to its definition, and find the javadocs there... 
:man_shrugging: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494213676 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494218306 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494228917 From epeter at openjdk.org Wed Nov 5 12:25:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 12:25:42 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> Message-ID: On Wed, 5 Nov 2025 09:21:14 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> small adjustments after call with Roberto and Christian > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 860: > >> 858: // Define a local variable. >> 859: // Note: it is very important that we use a "transparentScope" for the template here, >> 860: // so that the DataName can escape to outer scopes. > > Might be obvious but you could add that it does not mean it's available for the entire outer scope but only for everything that follows this DataName definition/insertion point of this template. I'm adding a comment there. Maybe I should also add a comment in the javadocs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494241690 From galder at openjdk.org Wed Nov 5 12:36:13 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 5 Nov 2025 12:36:13 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v4] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 09:50:47 GMT, Emanuel Peter wrote: >> Note: this looks like a large change, but only about 400-500 lines are VM changes. 2.5k comes from new tests.
>> >> Finally: after a long list of refactorings, we can implement the Cost-Model. The refactorings and this implementation were first PoC'd here: https://github.com/openjdk/jdk/pull/20964 >> >> Main goal: >> - Carefully allow the vectorization of reduction cases that lead to speedups, and prevent those that do not (or may cause regressions). >> - Open up new vectorization opportunities in the future, that introduce expensive vector nodes that are only profitable in some cases but not others. >> >> **Why cost-model?** >> >> Usually, vectorization leads to speedups because we replace multiple scalar operations with a single vector operation. The scalar and vector operation have a very similar cost per instruction, and so going from 4 scalar ops to a single vector op may yield a 4x speedup. This is a bit simplistic, but the general idea. >> >> But: some vector ops are expensive. Sometimes, the vector op can be more expensive than the multiple scalar ops it replaces. This is the case with some reduction ops. Or we may introduce a vector op that does not have any corresponding scalar op (e.g. in the case of shuffle). This prevents simple heuristics that only focus on single operations. >> >> Weighing the total cost of the scalar loop vs the vector loop allows us a more "holistic" approach. There may be expensive vector ops, but other cheaper vector ops may still make it profitable. >> >> **Implementation** >> >> Items: >> - New `VTransform::is_profitable`: checks cost-model and some other cost related checks. >> - `VLoopAnalyzer::cost`: scalar loop cost >> - `VTransformGraph::cost`: vector loop cost >> - Old reduction heuristic with `_num_work_vecs` and `_num_reductions` used to count check for "simple" reductions where the only "work" vector was the reduction itself. Reductions were not considered profitable if they were "simple". I was able to lift those restrictions. >> - Adapted existing tests.
>> - Wrote a new comprehensive test, matching the related JMH benchmark, which we use below. >> >> **Testing** >> Regular correctness testing, and performance testing. In addition to the JMH micro benchmarks below. >> >> ------------------------------ >> >> **Some History** >> >> I have been bothered by "simple" reductions not vectorizing for a long time. It was also a part of [my JVMLS2025 presentation](https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/). >> > ... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > rename cost methods for Vladimir K > [JDK-8370671](https://bugs.openjdk.org/browse/JDK-8370671) C2 SuperWord [x86]: implement Long.max/min reduction for AVX2 This is familiar to me. I discovered this when I was intrinsifying MinL/MaxL for [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513) and one of my servers only had AVX2. Vectorization kicked in with AVX512 so I left it there. > Note: some of the min/max benchmarks are not very stable. That is due to random input data: in some cases the scalar performance is better because it uses branching. Looking at the results, seems like most instability is with doubles? In any case, on the topic of instability of min/max and branching, https://github.com/openjdk/jdk/pull/20098#issuecomment-2379386872 has a good analysis on past observations with the JMH benchmark now called `MinMaxVector`. This benchmark shapes the data such that data in the arrays is laid out to achieve a certain % of branch taken. It might not be fully applicable to the instabilities you observe but might help direct attention. WRT to the code changes in this PR, I don't have anything else to say other than I'm glad basic cases like [JDK-8345044](https://bugs.openjdk.org/browse/JDK-8345044) are getting solved.
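To make the "certain % of branch taken" idea concrete, here is a minimal sketch of how input arrays could be shaped so that the comparison `a[i] > b[i]` is true for a chosen share of the elements. This is an assumption-laden illustration only: the class `BranchShapedData` and its methods are hypothetical names, and the real `MinMaxVector` benchmark may lay out its data differently.

```java
import java.util.Random;

// Hypothetical helper, not part of the JDK or of the MinMaxVector benchmark.
class BranchShapedData {
    // Fills a and b so that a[i] > b[i] holds for round(n * takenPercent / 100)
    // indices. The positions of those indices are shuffled with a fixed seed.
    static void fill(long[] a, long[] b, int takenPercent, long seed) {
        int n = a.length;
        int taken = (int) Math.round(n * (takenPercent / 100.0));
        boolean[] take = new boolean[n];
        for (int i = 0; i < taken; i++) {
            take[i] = true;
        }
        Random random = new Random(seed);
        // Fisher-Yates shuffle, so the "taken" positions are not clustered.
        for (int i = n - 1; i > 0; i--) {
            int j = random.nextInt(i + 1);
            boolean tmp = take[i];
            take[i] = take[j];
            take[j] = tmp;
        }
        for (int i = 0; i < n; i++) {
            b[i] = random.nextInt(1000);
            // One above b[i] makes the comparison true, one below makes it false.
            a[i] = take[i] ? b[i] + 1 : b[i] - 1;
        }
    }

    // Counts how often a scalar "a[i] > b[i] ? a[i] : b[i]" would pick a[i].
    static int countTaken(long[] a, long[] b) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] a = new long[1000];
        long[] b = new long[1000];
        fill(a, b, 80, 42L);
        System.out.println(countTaken(a, b) + " of " + a.length + " comparisons pick a[i]");
    }
}
```

With a fixed percentage like this, a branching scalar loop sees a predictable taken rate, which is one way to separate branch-prediction effects from plain noise when comparing scalar and vector results.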
------------- PR Review: https://git.openjdk.org/jdk/pull/27803#pullrequestreview-3421720613 From epeter at openjdk.org Wed Nov 5 12:38:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 12:38:15 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v13] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). 
Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: address Christian's comments for TestTutorial.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/5a7481d5..e74241f0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=11-12 Stats: 48 lines in 3 files changed: 45 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Wed Nov 5 12:38:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 12:38:18 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> Message-ID: 
On Wed, 5 Nov 2025 09:43:40 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> small adjustments after call with Roberto and Christian > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 764: > >> 762: addDataName($("f1"), mySimpleInt, MUTABLE, 1), >> 763: addDataName($("f2"), mySimpleInt, MUTABLE, 1), >> 764: addDataName($("f3"), mySimpleInt, MUTABLE), // omit weight, default is 1. > > It seems implicitly obvious but maybe we can add here that data names will only be available after adding them: > > Suggestion: > > // Also note that DataNames are only available once they are defined: > > // Nothing defined, yet: dataNames() = {} > addDataName($("f1"), mySimpleInt, MUTABLE, 1), > // Only now dataNames() contains f1: dataNames() = {f1} > addDataName($("f2"), mySimpleInt, MUTABLE, 1), > // dataNames() = {f1, f2} > addDataName($("f3"), mySimpleInt, MUTABLE), // omit weight, default is 1. > // dataNames() = {f1, f2, f3} Good idea. I have lots of those examples in `TestTemplates.java`, but we should also mention it here :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494277139 From epeter at openjdk.org Wed Nov 5 12:51:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 12:51:15 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: References: Message-ID: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. 
It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. 
This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - for Christian - Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/e74241f0..713d9c1e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=12-13 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Wed Nov 5 12:51:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 12:51:20 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> Message-ID: On Wed, 5 Nov 2025 11:45:12 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> small adjustments after call with Roberto and Christian > > Comments for `TestTemplate.java`. It was more a skim of the file and I stopped here and there for having a closer look but I trust you here that you covered all the functionality - at least it suggests that you were quite thorough :-) @chhagedorn Thanks a lot for your review, I really appreciate the time you put in! 
I think I addressed/answered all your points :) > test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 469: > >> 467: var hook1 = new Hook("Hook1"); >> 468: >> 469: var template0 = Template.make(() -> scope("t0 isAnchored: ", hook1.isAnchored(a -> scope(a)), "\n")); > > You could directly use `Template::scope` instead of `a -> scope(a)`. Same below. But is that really more readable? I fear not really. And it is also longer... I'm a little torn with these. @robcasloz @mhaessig what are your votes? Before: `"t2 isAnchored: ", hook1.isAnchored(a -> scope(a)), "\n",` After: `"t2 isAnchored: ", hook1.isAnchored(Template::scope), "\n",` > test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 1334: > >> 1332: dataNames(MUTABLE_OR_IMMUTABLE).exactOf(myInt).hasAny(h -> scope(h)), >> 1333: ", ", >> 1334: dataNames(MUTABLE_OR_IMMUTABLE).exactOf(myInt).count(c -> scope(c)), > > Here you can simplify it again: > > Suggestion: > > dataNames(MUTABLE_OR_IMMUTABLE).exactOf(myInt).hasAny(Template::scope), > ", ", > dataNames(MUTABLE_OR_IMMUTABLE).exactOf(myInt).count(Template::scope), Is it really simpler? It looks longer to me ;) Probably the same question of taste... let's see if the others have an opinion here :) > test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 1901: > >> 1899: "int #v1 = x + 1;\n" >> 1900: )), >> 1901: // Using "transparentScope", is is available. > > "it is available"? Same below fixed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3491031530 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494300120 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494313357 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494316726 From epeter at openjdk.org Wed Nov 5 12:51:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 12:51:22 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> Message-ID: On Wed, 5 Nov 2025 10:17:21 GMT, Christian Hagedorn wrote: >> test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 1259: >> >>> 1257: let("global", "GLOBAL"), >>> 1258: "g: #global. $a\n", >>> 1259: // Create a dummy DataName soe we can create the scope. >> >> Suggestion: >> >> // Create a dummy DataName so we can create the scope. > > Is a dummy DataName required or could you also just write `scope(...)`? We want to do `sample`. If there is no `addDataName` before, we will get an exception. I changed the comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494309370 From epeter at openjdk.org Wed Nov 5 12:55:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Nov 2025 12:55:18 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v4] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 12:33:58 GMT, Galder Zamarreño wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> rename cost methods for Vladimir K > >> [JDK-8370671](https://bugs.openjdk.org/browse/JDK-8370671) C2 SuperWord [x86]: implement Long.max/min reduction for AVX2 > > This is familiar to me.
I discovered this when I was intrinsifying MinL/MaxL for [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513) and one of my servers only had AVX2. Vectorization kicked in with AVX512 so I left it there. > >> Note: some of the min/max benchmarks are not very stable. That is due to random input data: in some cases the scalar performance is better because it uses branching. > > Looking at the results, seems like most instability is with doubles? In any case, on the topic of instability of min/max and branching, https://github.com/openjdk/jdk/pull/20098#issuecomment-2379386872 has a good analysis on past observations with the JMH benchmark now called `MinMaxVector`. This benchmark shapes the data such that data in the arrays is laid out to achieve a certain % of branch taken. It might not be fully applicable to the instabilities you observe but might help direct attention. > > WRT to the code changes in this PR, I don't have anything else to say other than I'm glad basic cases like [JDK-8345044](https://bugs.openjdk.org/browse/JDK-8345044) are getting solved. @galderz Right, I did remember that you have had a better benchmark, and that's why I understood more quickly that the issue here with the doubles is just noise :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27803#issuecomment-3491050395 From roland at openjdk.org Wed Nov 5 13:01:29 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 5 Nov 2025 13:01:29 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v4] In-Reply-To: References: Message-ID: <-9uTmVk3XFV39gQjQp5NQsedrwYRUN2TVIaAOMB1pvA=.9819a01c-e05a-4569-a59c-0f90d3c4c161@github.com> > In test cases, `mh` is initially not constant so the method handle > invoke can't be inlined. It is later found to be constant, so it can > be turned into a direct call by > `Compile::process_late_inline_calls_no_inline()`.
In the meantime, the > `CallNode` for the mh invoke is cloned (by loop switching). In the > process, only a shallow copy of the `JVMState` for the call is > made. The initial `CallNode` is the first to be processed by > `Compile::process_late_inline_calls_no_inline()` and that causes that > `CallNode` to become dead. The cloned `CallNode` is then > processed. The `JVMState` for that one references the initial > `CallNode` in its caller's `JVMState`. Because that node is dead, that > causes a crash. The fix I propose is to make a deep copy of the > `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is > assigned to the node. > > The other failure I see with these tests is: > > > # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 > # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! > > > because even though the `CallNode` is cloned, there's still only one > late inline recorded. The fix here is to increment > `_number_of_mh_late_inlines` when the node is cloned. > > This was reported by the netty developers. 
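The shallow-versus-deep copy problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is written in Java rather than the C++ of HotSpot, and `ToyNode`, `ToyState`, and their methods are hypothetical stand-ins, not the real `CallNode` or `JVMState` API. It shows the general hazard: a shallow copy shares the caller chain, so the clone can still reach the original node after that node dies.

```java
// Toy model only; none of these types exist in HotSpot.
class JvmStateCopyDemo {
    // Stand-in for a call node that can become dead.
    static final class ToyNode {
        final String name;
        boolean dead;

        ToyNode(String name) {
            this.name = name;
        }
    }

    // Stand-in for one frame of a state chain: it links to the caller's
    // state and refers to the node it belongs to.
    static final class ToyState {
        final ToyState caller; // null for the outermost frame
        final ToyNode owner;

        ToyState(ToyState caller, ToyNode owner) {
            this.caller = caller;
            this.owner = owner;
        }

        // Shallow copy: only the youngest frame is new, callers are shared.
        ToyState shallowCopyFor(ToyNode newOwner) {
            return new ToyState(caller, newOwner);
        }

        // Deep copy: every frame in the chain is duplicated for the new owner.
        ToyState deepCopyFor(ToyNode newOwner) {
            ToyState callerCopy = (caller == null) ? null : caller.deepCopyFor(newOwner);
            return new ToyState(callerCopy, newOwner);
        }

        // Walks the caller chain and reports whether any frame refers to a dead node.
        boolean reachesDeadNode() {
            for (ToyState s = this; s != null; s = s.caller) {
                if (s.owner.dead) {
                    return true;
                }
            }
            return false;
        }
    }

    // Clones a two-frame state chain, kills the original node, and reports
    // whether the clone's chain still reaches the dead node.
    static boolean cloneReachesDeadNode(boolean deep) {
        ToyNode original = new ToyNode("original call");
        ToyState outer = new ToyState(null, original);
        ToyState inner = new ToyState(outer, original);

        ToyNode clone = new ToyNode("cloned call");
        ToyState cloned = deep ? inner.deepCopyFor(clone) : inner.shallowCopyFor(clone);

        original.dead = true; // the original is processed first and dies
        return cloned.reachesDeadNode();
    }

    public static void main(String[] args) {
        System.out.println("shallow copy reaches dead node: " + cloneReachesDeadNode(false));
        System.out.println("deep copy reaches dead node: " + cloneReachesDeadNode(true));
    }
}
```

In this toy, the shallow clone still reaches the dead original through the shared caller frame, while the deep clone does not, which mirrors the motivation given for deep-copying the state when a node with an assigned generator is cloned.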
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28088/files - new: https://git.openjdk.org/jdk/pull/28088/files/1a646503..7f796587 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=02-03 Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/28088.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28088/head:pull/28088 PR: https://git.openjdk.org/jdk/pull/28088 From roland at openjdk.org Wed Nov 5 13:01:32 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 5 Nov 2025 13:01:32 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v3] In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 19:40:59 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> more > > src/hotspot/share/opto/compile.hpp line 1090: > >> 1088: } >> 1089: >> 1090: void inc_number_of_mh_late_inlines() { _number_of_mh_late_inlines++; } > > Does it make sense to get rid of `inc_number_of_mh_late_inlines()` and turn `_number_of_mh_late_inlines` into a a boolean with a setter? Done in new commit. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28088#discussion_r2494363766 From roland at openjdk.org Wed Nov 5 13:01:33 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 5 Nov 2025 13:01:33 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v4] In-Reply-To: References: Message-ID: <3ARFIgZH4AaInkMBuleV6uwHIlrq1s5zvzzMmcaiUtE=.c6b6ee78-cf6e-445c-8781-fa886f57b69b@github.com> On Tue, 4 Nov 2025 10:24:48 GMT, Damon Fenacci wrote: >> I think that would make sense. But the only use of that counter (excluding asserts) seems to be: >> >> https://github.com/openjdk/jdk/blob/ef464d69399e50aee126a4756fe9a9a19e44d3c4/src/hotspot/share/opto/compile.cpp#L829 >> >> Maybe, then, it's simpler to not bother with maintaining an accurate count. See new commits. >> >> 8352963 added a new call `inc_number_of_mh_late_inlines()` that I remove here because I don't think it's needed. I had a look at the PR for that one and I don't see it discussed. @dafedafe do you remember why you added it? > > AFAIR at some point I was getting the same assert failure `assert(_number_of_mh_late_inlines > 0)` and noticed that we were re-registering method handles for late inlining without incrementing the counter. Do you remember what tests that was with? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28088#discussion_r2494362354 From chagedorn at openjdk.org Wed Nov 5 13:19:22 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Nov 2025 13:19:22 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> References: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> Message-ID: On Wed, 5 Nov 2025 12:51:15 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. 
For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - for Christian > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Here is the review for `Template.java`. I went into quite some detail but thought it could be worth it given that it's the API. I left various comment suggestions (no semantic suggestions) in the hope that they improve the documentation. I will do another pass over your update for the previous review later :-) test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 918: > 916: */ > 917: static float fuel() { > 918: // Note, since the fuel amount does not change during a template within? 
Suggestion: // Note, since the fuel amount does not change within a template ------------- PR Review: https://git.openjdk.org/jdk/pull/27255#pullrequestreview-3421499913 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494393702 From chagedorn at openjdk.org Wed Nov 5 13:19:35 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Nov 2025 13:19:35 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v12] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 11:40:39 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. 
For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply Christian's suggestions directly > > Co-authored-by: Christian Hagedorn test/hotspot/jtreg/compiler/lib/template_framework/ScopeTokenImpl.java line 30: > 28: /** > 29: * Represents a scope with its tokens. 
Boolean flags indicate if names, > 30: * hashtag replacements and {@link setFuelCost} are local, or escape to The IDE cannot resolve it: Suggestion: * hashtag replacements and {@link Template#setFuelCost} are local, or escape to test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 215: > 213: * > 214: * > 215: * {@link setFuelCostScope} transparent transparent non-transparent Missing `#`: Suggestion: * hashtag {@link DataName} and {@link StructuralName} {@link #setFuelCost} * * * {@link #scope} non-transparent non-transparent non-transparent * * * {@link #hashtagScope} non-transparent transparent transparent * * * {@link #nameScope} transparent non-transparent transparent * * * {@link #setFuelCostScope} transparent transparent non-transparent test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 586: > 584: * {@link setFuelCost}. This means that any such name, hashtag-replacement or > 585: * {@link setFuelCost} declared inside the scope is local and does not escape to outer > 586: * scopes. Maybe put some more emphasis that it's non-transparent for anything. Otherwise it reads like non-transparent for a number of things but you're left wondering if it's a complete list. Suggestion: Suggestion: * Creates a {@link ScopeToken} that represents a scope that is completely * non-transparent, not allowing anything to escape. This * means that no {@link DataName}, {@link StructuralName}s, hashtag-replacement * or {@link #setFuelCost} defined inside the scope is available outside. All * these usages are only local to the defining scope here. * *

test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 593: > 591: *

> 592: * If you require a scope that is transparent for some or all of the above, consider > 593: * using {@link transparentScope}, {@link nameScope}, {@link hashtagScope}, or {@link setFuelCostScope}. Maybe you could be more specific and cross reference the matrix in the interface comment further up: Suggestion: * If you require a scope that is either fully transparent (i.e. everything escapes) * or only restricts a specific kind to not escape, consider using one of the other * provided scopes: {@link #transparentScope}, {@link #nameScope}, {@link #hashtagScope}, * or {@link #setFuelCostScope}. A "scope-transparency-matrix" can also be found in * the interface comment for {@link Template}. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 596: > 594: * > 595: *

> 596: * The most common use of {@link scope} is in the construction of templates: You miss some `#` prefixes for methods for Javadocs. Same for methods below. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 618: > 616: * variables, etc. In rare cases where the scope of the template needs to be > 617: * transparent (e.g. because we need to insert a variable or field into an > 618: * outer scope), it is recommended to use {@link transparentScope}. Maybe we can focus more on the fact that it really does not matter whether you use `scope()` or `transparentScope()` - hashtag replacements and fuel cost will not cross the template boundary. Feel free to take over this suggestion to better highlight that: Suggestion: * Note that regardless of the chosen scope for {@code Template.make}, * hashtag-replacements and {@link #setFuelCost} are always implicitly * non-transparent (i.e. non-escaping) for hashtag-replacements and * {@link #setFuelCost} (e.g. a {@link #let} will not escape the template * scope even when using {@link #transparentScope}. As a default, it is * recommended to use {@link #scope} for {@code Template.make} since in * most cases template scopes align with code scopes that are * non-transparent for fields, variables, etc. In rare cases, where the * scope of the template needs to be transparent (e.g. because we need * to insert a variable or field into an outer scope), it is recommended * to use {@link #transparentScope}. This allows to make {@link DataName}s * and {@link StructuralName}s available outside this template crossing * the template boundary. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 632: > 630: * ), > 631: * // CODE3: more code in the outer scope, names and hashtags from CODE2 are > 632: * // not available any more because of the non-transparent "scope". Suggestion: * // not available anymore because of the non-transparent "scope". 
test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 657: > 655: * {@link setFuelCost}. This means that any such name, hashtag-replacement or > 656: * {@link setFuelCost} declared inside the scope escapes that scope and is still > 657: * available in the outer scope. Analogously to above: Suggestion: * Creates a {@link ScopeToken} that represents a completely transparent scope, allowing * anything to escape anything. This means that {@link DataName}s, {@link StructuralName}s, * hashtag-replacements and {@link #setFuelCost} declared inside the scope will be available * in the outer scope. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 664: > 662: *

> 663: * If you require a scope that is non-transparent for some or all of the above, consider > 664: * using {@link scope}, {@link nameScope}, {@link hashtagScope}, or {@link setFuelCostScope}. Analogously to above: Suggestion: * If you require a scope that is non-transparent (i.e. nothing escapes) or only restricts * a specific kind to not escape, consider using one of the other provided scopes: * {@link #scope}, {@link #nameScope}, {@link #hashtagScope}, or {@link #setFuelCostScope}. * A "scope-transparency-matrix" can also be found in the interface comment for {@link Template}. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 679: > 677: * Creates a {@link ScopeToken} which represents a scope that is non-transparent for > 678: * {@link DataName}s and {@link StructuralName}s, but transparent for hashtag-replacements > 679: * and {@link setFuelCost}. Adapting from above suggestions: Suggestion: * Creates a {@link ScopeToken} that represents a scope that is non-transparent for * {@link DataName}s and {@link StructuralName}s (i.e. cannot escape), but * transparent for hashtag-replacements and {@link #setFuelCost} (i.e. available * in outer scope). * *

test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 686: > 684: *

> 685: * If you require a scope that is non-transparent for some or all of the above, consider > 686: * using {@link scope}, {@link transparentScope}, {@link hashtagScope}, or {@link setFuelCostScope}. Suggestion: * If you require a scope that is transparent or uses a different restriction, consider * using one of the other provided scopes: {@link #scope}, {@link #transparentScope}, * {@link #hashtagScope}, or {@link #setFuelCostScope}. A "scope-transparency-matrix" can * also be found in the interface comment for {@link Template}. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 701: > 699: * Creates a {@link ScopeToken} which represents a scope that is non-transparent for > 700: * hashtag-replacements, but transparent for {@link DataName}s and {@link StructuralName}s > 701: * and {@link setFuelCost}. Suggestion: * Creates a {@link ScopeToken} that represents a scope that is non-transparent for * hashtag-replacements (i.e. cannot escape), but transparent for {@link DataName}s * and {@link StructuralName}s and {@link #setFuelCost} (i.e. available in outer scope). * *

test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 708: > 706: *

> 707: * If you require a scope that is non-transparent for some or all of the above, consider > 708: * using {@link scope}, {@link transparentScope}, {@link nameScope}, {@link setFuelCostScope}. Suggestion: * If you require a scope that is transparent or uses a different restriction, consider * using one of the other provided scopes: {@link #scope}, {@link #transparentScope}, * {@link #nameScope}, or {@link #setFuelCostScope}. A "scope-transparency-matrix" can * also be found in the interface comment for {@link Template}. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 742: > 740: * Creates a {@link ScopeToken} which represents a scope that is non-transparent for > 741: * {@link setFuelCost}. but transparent for {@link DataName}s and {@link StructuralName}s > 742: * and hashtag-replacements. Last but not least also adapted from above: Suggestion: * Creates a {@link ScopeToken} that represents a scope that is non-transparent for * {@link #setFuelCost} (i.e. cannot escape), but transparent for hashtag-replacements, * {@link DataName}s and {@link StructuralName}s (i.e. available in outer scope). test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 749: > 747: *

> 748: * If you require a scope that is non-transparent for some or all of the above, consider > 749: * using {@link scope}, {@link transparentScope}, {@link nameScope}, {@link setFuelCostScope}. Suggestion: * If you require a scope that is transparent or uses a different restriction, consider * using one of the other provided scopes: {@link #scope}, {@link #transparentScope}, * {@link #hashtagScope}, or {@link #nameScope}. A "scope-transparency-matrix" can * also be found in the interface comment for {@link Template}. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 812: > 810: */ > 811: static String $(String name) { > 812: // Note, since the dollar replacements do not change during a template within? Suggestion: // Note, since the dollar replacements do not change within a template ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494138179 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494153205 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494220738 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494257064 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494196276 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494282318 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494284210 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494309710 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494321316 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494328403 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494338507 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494348553 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494347425 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494370502 PR Review Comment: 
https://git.openjdk.org/jdk/pull/27255#discussion_r2494373147 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494382680 From roland at openjdk.org Wed Nov 5 13:23:18 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 5 Nov 2025 13:23:18 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v8] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 05:08:57 GMT, Zihao Lin wrote: >> This patch removes the slice parameter from LoadNode::make >> >> I have done more work which removes the slice parameter from StoreNode::make. >> >> Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 >> >> Hi team, I am new, I'd appreciate any guidance. Thanks a lot! > > Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - fix assert > - add more assert > - rid of access.addr().type() > - Merge branch 'openjdk:master' into 8344116 > - Merge branch 'openjdk:master' into 8344116 > - Merge branch 'openjdk:master' into 8344116 > - Fix build > - Fix test failed > - 8344116: C2: remove slice parameter from LoadNode::make Can we remove `C2AccessValuePtr` entirely and use: Node* _addr; where, currently, there's: C2AccessValuePtr& _addr; ? 
src/hotspot/share/opto/callnode.cpp line 1740: > 1738: Node* klass_node = in(AllocateNode::KlassNode); > 1739: Node* proto_adr = phase->transform(new AddPNode(klass_node, klass_node, phase->MakeConX(in_bytes(Klass::prototype_header_offset())))); > 1740: mark_node = LoadNode::make(*phase, control, mem, proto_adr, TypeX_X, TypeX_X->basic_type(), MemNode::unordered); We could assert that C->get_alias_index(kit->type(card_adr) == Compile::AliasIdxRaw ------------- PR Review: https://git.openjdk.org/jdk/pull/24258#pullrequestreview-3421940817 PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2494424924 From chagedorn at openjdk.org Wed Nov 5 13:52:23 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Nov 2025 13:52:23 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> References: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> Message-ID: On Wed, 5 Nov 2025 12:51:15 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. 
>> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - for Christian > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Review for your update for `TestTutorial` and `TestTemplate`. 
Some more suggestions but otherwise, these two files look good now! test/hotspot/jtreg/compiler/lib/template_framework/Hook.java line 88: > 86: * the structure of the generated code, and the inserted scope thus belongs nested into > 87: * the anchor scope. On the other hand, hashtag replacements and {@link setFuelCost} > 88: * rather belongs to the code generation that happens within the context of a template. Some small typos and improvement suggestions including Javadoc updates: Suggestion: * Note that if we use {@link #insert} with {@link Template#transparentScope}, then * {@link DataName}s and {@link StructuralName}s escape from the inserted scope to the * anchor scope, but hashtag replacements and {@link Template#setFuelCost} escape to * the caller, i.e. from where we inserted the scope. This makes sense if we consider * {@link DataName}s belonging to the structure of the generated code and the inserted * scope belonging to the anchor scope. On the other hand, hashtag replacements and * {@link Template#setFuelCost} rather belong to the code generation that happens * within the context of a template. 
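Editorial sketch of the "lambda vs token order" issue quoted in this message: when a state change is deferred into a token but a query runs immediately while the token list is still being built, the query "floats" above the state change and does not see it. The Java below is a toy model under that assumption and is not the Template Framework API; `Token`, `addName`, `sampleImmediate`, `sampleDeferred` and `render` are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: this is not the Template Framework API.
public class TokenOrderSketch {
    // A token is evaluated later, in list order, against the shared name state.
    interface Token {
        void evaluate(Set<String> names, StringBuilder out);
    }

    // Deferred state change: the name becomes visible only at token evaluation.
    static Token addName(String name) {
        return (names, out) -> names.add(name);
    }

    // Immediate query: inspects the state while the token list is still being built.
    static Token sampleImmediate(Set<String> names, String name) {
        boolean found = names.contains(name);
        return (n, out) -> out.append(name).append(found ? " found" : " missing").append('\n');
    }

    // Deferred query: inspects the state at token evaluation, after earlier tokens ran.
    static Token sampleDeferred(String name) {
        return (names, out) ->
            out.append(name).append(names.contains(name) ? " found" : " missing").append('\n');
    }

    static String render(Set<String> names, List<Token> tokens) {
        StringBuilder out = new StringBuilder();
        for (Token t : tokens) {
            t.evaluate(names, out);
        }
        return out.toString();
    }

    // Mixed model: the query runs before the addName token is evaluated.
    static String mixed() {
        Set<String> names = new HashSet<>();
        return render(names, List.of(addName("x"), sampleImmediate(names, "x")));
    }

    // All-tokens model: both steps are deferred, so list order is execution order.
    static String allTokens() {
        Set<String> names = new HashSet<>();
        return render(names, List.of(addName("x"), sampleDeferred("x")));
    }

    public static void main(String[] args) {
        System.out.print(mixed());
        System.out.print(allTokens());
    }
}
```

In this model the mixed variant reports the name as missing although it is added one element earlier in the list, while the all-tokens variant finds it. That is the motivation given in the quoted description for making queries tokens as well.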
------------- PR Review: https://git.openjdk.org/jdk/pull/27255#pullrequestreview-3422039256 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494494940 From chagedorn at openjdk.org Wed Nov 5 13:52:25 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Nov 2025 13:52:25 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> <12evYYJTBLOaUYCaUaEncCi8H6ge8hSxAFVHqEqeNpE=.8c904ca4-29a5-4a4a-a350-cbd56fc1a0db@github.com> <8-F0JFT2Vp076PJ0x2bgxPgW-vFYzjCRPuPt1qj3WjA=.4ad5b0ab-ae09-4d4f-bdbd-eedde30d077c@github.com> Message-ID: On Wed, 5 Nov 2025 11:57:35 GMT, Emanuel Peter wrote: >> Do you think we should talk about it somewhere, maybe add a further example at the very end, that discusses this? >> >> Alternatively, we could also add extra documentation at the `Hook.insert` documentation. > > I'm updating the description in `Hook.java` a little, to mention this. Thanks for the example and updating `Hook.java`! > Do you think we should talk about it somewhere, maybe add a further example at the very end, that discusses this? It might not hurt. The question just naturally occurred to me at that point in the tutorial. But I'm not sure if others feel the same way. If you add an example you could mention that this is advanced or some expert use or something like that. You could also reference to `Hook.java` for more details. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494507409 From chagedorn at openjdk.org Wed Nov 5 13:52:27 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Nov 2025 13:52:27 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> Message-ID: <6ioF8Q1RilKfDCuH8S9G9aynSh_jD7yc-CFOPTQxMX4=.4c222e43-8336-4d3f-90f8-0b40c944cbce@github.com> On Wed, 5 Nov 2025 12:05:49 GMT, Emanuel Peter wrote: >> Ok, but that does not really belong in this section, right? Here, we deal with the `DataName`s... > > I'm adding a comment just before `generateWithHashtagAndDollarReplacements`. > Ok, but that does not really belong in this section, right? Here, we deal with the DataNames... I agree with that, it again just popped up in my mind while reading these lines :-) > I'm adding a comment just before generateWithHashtagAndDollarReplacements. Perfect, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494511663 From chagedorn at openjdk.org Wed Nov 5 13:52:30 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Nov 2025 13:52:30 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> Message-ID: On Wed, 5 Nov 2025 12:15:15 GMT, Emanuel Peter wrote: > Still, I think it is good if people understand how types work, so I'd leave it in as an example. Yes, absolutely. I would not remove this example that shows how you can control everything in your own way. > So for now, I'm just adding a note that we already have some types modeled in the library, for example with PrimitiveType. That sounds good! 
> But we have a bit of a fundamental issue here: if you expect TestTutorial.java to always give you the best practice with the most fresh things from the library, then we would have to constantly rework the tutorial. That would be a lot of work, especially if we continue adding more and more features. Totally, that is the blessing and curse of having a thorough README. > This concern is also a bit separate to the current RFE's intentions. So maybe we have to revisit the issue of how to present the library capabilities in the most efficient way, and to see if TestTutorial.java is the right place. Or maybe this tutorial just shows the core capabilities of the template framework, and we have some other tutorial that gives more insight into the library? Could be an option to use `TestTutorial.java` for that or have some other tutorials like the IR Framework which presents different example files for different features. We could come back to this discussion later again. But either way, I think leaving the note below is perfectly fine for now to not give the impression that you need to code everything from scratch if you just want to use some common cases. >> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 778: >> >>> 776: // the hashtag replacement "a". >>> 777: """, >>> 778: dataNames(MUTABLE).exactOf(mySimpleInt).sampleAndLetAs("a"), >> >> What do you think about just naming it `sampleTo("a")`? Or do you think the mention of `let` is crucial here? > > I think I slightly prefer `sampleAndLetAs` because of the `let`. But if you can get @robcasloz or @mhaessig to agree with you, I'm happy to change it ;) Fair point. To me it just sounded a bit verbose but maybe Roberto and/or Manuel can break the tie :-) And if there is no clear consensus just stick with what you have now. > But that feels a bit too verbose. Indeed. > People can always follow the method name to its definition, and find the javadocs there... That's true. 
You can leave it as it is now. >> test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 469: >> >>> 467: var hook1 = new Hook("Hook1"); >>> 468: >>> 469: var template0 = Template.make(() -> scope("t0 isAnchored: ", hook1.isAnchored(a -> scope(a)), "\n")); >> >> You could directly use `Template::scope` instead of `a -> scope(a)`. Same below. > > But is that really more readable? I fear not really. And it is also longer... I'm a little torn with these. @robcasloz @mhaessig what are your votes? > > Before: > `"t2 isAnchored: ", hook1.isAnchored(a -> scope(a)), "\n",` > After: > `"t2 isAnchored: ", hook1.isAnchored(Template::scope), "\n",` I just suggested it because the IDE marked it as improvement opportunity. Feel free to ignore. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494537592 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494548702 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494565818 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2494578751 From chagedorn at openjdk.org Wed Nov 5 13:58:54 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Nov 2025 13:58:54 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> References: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> Message-ID: On Wed, 5 Nov 2025 12:51:15 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. 
It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. 
This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - for Christian > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn I think I exhausted my reviewer fuel for today and will resume tomorrow by calling `setFuelCost(100%)` again :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3491342961 From duke at openjdk.org Wed Nov 5 14:04:53 2025 From: duke at openjdk.org (Zihao Lin) Date: Wed, 5 Nov 2025 14:04:53 GMT Subject: RFR: 8370878: C1: Clean up unnecessary ConversionStub constructor [v2] In-Reply-To: References: Message-ID: On Mon, 3 Nov 2025 08:34:00 GMT, Christian Hagedorn wrote: >> Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: >> >> fix arm > > Looks good, thanks for cleaning it up! Hi @chhagedorn Can you help to sponsor this change? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28096#issuecomment-3491371691 From chagedorn at openjdk.org Wed Nov 5 14:10:00 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Nov 2025 14:10:00 GMT Subject: RFR: 8356761: IGV: dump escape analysis information In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 13:49:44 GMT, Anton Seoane Ampudia wrote: > This PR introduces new IGV dumps, property fields and filters related to escape analysis information. > > The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. 
Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. > > With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: > > - Node escape "level". > - Scalar replaceability. > - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). > > This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. > > Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. > > **Testing:** passes tiers 1-3, manual testing in IGV Nice improvement! I have not reviewed this PR, yet, but I just want to raise a general concern that our model of having different `PrintIdealGraphLevel` values might not fit anymore for all the different concepts (different loop opts, IGVN steps, Superword steps, parsing steps and now EA steps etc.). Maybe the time has come to use a different solution to allow some better filtering for different needs. It should not per se block this PR but could be a justification to actually tackle this.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28060#issuecomment-3491408399 From dfenacci at openjdk.org Wed Nov 5 15:05:48 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 5 Nov 2025 15:05:48 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel [v2] In-Reply-To: References: Message-ID: > ## Issue > Today, the only practical ways to run IR Framework scenarios in parallel seems to be: > * spawning threads manually in a single test, or > * letting jtreg treat each scenario as a separate test (the only way to potentially distribute across hosts). > > This makes it a bit cumbersome to use host CPU cores efficiently when running multiple scenarios within the same test. > > ## Change > This change introduces a method `TestFramework::startParallel` to execute multiple scenarios concurrently. The implementation: > * launches one task per scenario and runs them in parallel (by default, the maximum concurrency should match the host's available cores) > * captures each task's `System.out` into a dedicated buffer and flushes it when the task completes to avoid interleaved output between scenarios (Note: only call paths within the `compile.lib.ir_framework` package are modified to per-task output streams. `ProcessTools` methods still write directly to `stdout`, so their output may interleave). > * adds an option `-DForceSequentialScenarios=true` to force all scenarios to be run sequentially.
> > ## Testing > * Tier 1-3+ > * explicit `ir_framework.tests` runs > * added IR-Framework test `TestDForceSequentialScenarios.java` to test forcing sequential testing (checking the output order) and added a parallel run to `TestScenatios.java` (as well as adding `ForceSequentialScenarios` flag to `TestDFlags.java`) > > As reference: a comparison of the execution time between sequential and parallel of all IR-Framework tests using scenarios on our machines (linux x64/aarch64, macosx x64/aarch64, windows x64 with different number of cores, so the results for a single test might not be relevant) gave me an average speedup of 1.9. Damon Fenacci has updated the pull request incrementally with four additional commits since the last revision: - JDK-8370315: add test with scenarios to format exception tests - JDK-8370315: make failures thread local - JDK-8370315: don't process all scenarios if TestFormatException - JDK-8370315: move flag check to startParallel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28065/files - new: https://git.openjdk.org/jdk/pull/28065/files/67950832..c9295971 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28065&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28065&range=00-01 Stats: 54 lines in 3 files changed: 23 ins; 4 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/28065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28065/head:pull/28065 PR: https://git.openjdk.org/jdk/pull/28065 From shade at openjdk.org Wed Nov 5 15:21:12 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 5 Nov 2025 15:21:12 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code [v3] In-Reply-To: References: Message-ID: > We use CTW testing for making sure compilers behave well. But we compile the code that is not executed at all, and since our inlining heuristics often look back at profiles, we end up not actually inlining all too much!
This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations. > > There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data. > > After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much, they are impractical to run in standard configurations, see data in RFE. We will enable some of that testing in special testing pipelines. > > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allows stress infra to inline some more on top of that.
> > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'master' into JDK-8360557-ctw-inlining - Update src/hotspot/share/compiler/compiler_globals.hpp Co-authored-by: Tobias Hartmann - Revert separate patch - Final - Proper option name and bump the limits - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26068/files - new: https://git.openjdk.org/jdk/pull/26068/files/dedbcfed..2a3b01b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=01-02 Stats: 427766 lines in 6339 files changed: 288673 ins; 96491 del; 42602 mod Patch: https://git.openjdk.org/jdk/pull/26068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068 PR: https://git.openjdk.org/jdk/pull/26068 From dfenacci at openjdk.org Wed Nov 5 15:54:49 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 5 Nov 2025 15:54:49 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel [v3] In-Reply-To: References: Message-ID: > ## Issue > Today, the only practical ways to run IR Framework scenarios in parallel seems to be: > * spawning threads manually in a single test, or > * letting jtreg treat each scenario as a separate test (the only way to potentially distribute across hosts). > > This makes it a bit cumbersome to use host CPU cores efficiently when running multiple scenarios within the same test. > > ## Change > This change introduces a method `TestFramework::startParallel` to execute multiple scenarios concurrently. 
The implementation: > * launches one task per scenario and runs them in parallel (by default, the maximum concurrency should match the host's available cores) > * captures each task's `System.out` into a dedicated buffer and flushes it when the task completes to avoid interleaved output between scenarios (Note: only call paths within the `compile.lib.ir_framework` package are modified to per-task output streams. `ProcessTools` methods still write directly to `stdout`, so their output may interleave). > * adds an option `-DForceSequentialScenarios=true` to force all scenarios to be run sequentially. > > ## Testing > * Tier 1-3+ > * explicit `ir_framework.tests` runs > * added IR-Framework test `TestDForceSequentialScenarios.java` to test forcing sequential testing (checking the output order) and added a parallel run to `TestScenatios.java` (as well as adding `ForceSequentialScenarios` flag to `TestDFlags.java`) > > As reference: a comparison of the execution time between sequential and parallel of all IR-Framework tests using scenarios on our machines (linux x64/aarch64, macosx x64/aarch64, windows x64 with different number of cores, so the results for a single test might not be relevant) gave me an average speedup of 1.9.
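The per-task buffering described in the PR text can be pictured with a minimal sketch. It is not the IR Framework code: `ParallelScenarioSketch`, `runScenario` and `runAll` are hypothetical names, and the only assumption taken from the description is that each task writes to its own buffer, which is emitted as one block once the task has finished.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScenarioSketch {
    // Hypothetical stand-in for one scenario run: it writes only to the stream it is given.
    static void runScenario(int index, PrintStream out) {
        out.println("Scenario #" + index + " start");
        out.println("Scenario #" + index + " done");
    }

    // Runs all scenarios concurrently and returns each scenario's output as one block,
    // in submission order, so lines from different scenarios never interleave.
    static List<String> runAll(int scenarioCount) throws Exception {
        // Assumption: concurrency is capped at the number of available cores.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < scenarioCount; i++) {
                final int index = i;
                futures.add(pool.submit(() -> {
                    // Dedicated buffer per task.
                    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                    try (PrintStream out = new PrintStream(buffer, true, StandardCharsets.UTF_8)) {
                        runScenario(index, out);
                    }
                    return buffer.toString(StandardCharsets.UTF_8);
                }));
            }
            List<String> outputs = new ArrayList<>();
            for (Future<String> future : futures) {
                outputs.add(future.get()); // waits for the task to complete
            }
            return outputs;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        for (String block : runAll(4)) {
            System.out.print(block);
        }
    }
}
```

Anything that bypasses the per-task stream and writes straight to `System.out` would still interleave, which matches the caveat about `ProcessTools` in the description.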
Damon Fenacci has updated the pull request incrementally with three additional commits since the last revision: - JDK-8370315: use check function - Apply suggestion from @chhagedorn Co-authored-by: Christian Hagedorn - JDK-8370315: add exceptions as they arise ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28065/files - new: https://git.openjdk.org/jdk/pull/28065/files/c9295971..4e9cc0a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28065&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28065&range=01-02 Stats: 16 lines in 2 files changed: 1 ins; 8 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/28065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28065/head:pull/28065 PR: https://git.openjdk.org/jdk/pull/28065 From kvn at openjdk.org Wed Nov 5 15:57:25 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 5 Nov 2025 15:57:25 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v4] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 09:50:47 GMT, Emanuel Peter wrote: >> Note: this looks like a large change, but only about 400-500 lines are VM changes. 2.5k comes from new tests. >> >> Finally: after a long list of refactorings, we can implement the Cost-Model. The refactorings and this implementation was first PoC'd here: https://github.com/openjdk/jdk/pull/20964 >> >> Main goal: >> - Carefully allow the vectorization of reduction cases that lead to speedups, and prevent those that do not (or may cause regressions). >> - Open up new vectorization opportunities in the future, that introduce expensive vector nodes that are only profitable in some cases but not others. >> >> **Why cost-model?** >> >> Usually, vectorization leads to speedups because we replace multiple scalar operations with a single vector operation. The scalar and vector operation have a very similar cost per instruction, and so going from 4 scalar ops to a single vector op may yield a 4x speedup. 
This is a bit simplistic, but the general idea. >> >> But: some vector ops are expensive. Sometimes, the vector op can be more expensive than the multiple scalar ops it replaces. This is the case with some reduction ops. Or we may introduce a vector op that does not have any corresponding scalar op (e.g. in the case of shuffle). This prevents simple heuristics that only focus on single operations. >> >> Weighing the total cost of the scalar loop vs the vector loop allows us a more "holistic" approach. There may be expensive vector ops, but other cheaper vector ops may still make it profitable. >> >> **Implementation** >> >> Items: >> - New `VTransform::is_profitable`: checks cost-model and some other cost related checks. >> - `VLoopAnalyzer::cost`: scalar loop cost >> - `VTransformGraph::cost`: vector loop cost >> - Old reduction heuristic with `_num_work_vecs` and `_num_reductions` used to count check for "simple" reductions where the only "work" vector was the reduction itself. Reductions were not considered profitable if they were "simple". I was able to lift those restrictions. >> - Adapted existing tests. >> - Wrote a new comprehensive test, matching the related JMH benchmark, which we use below. >> >> **Testing** >> Regular correctness testing, and performance testing. In addition to the JMH micro benchmarks below. >> >> ------------------------------ >> >> **Some History** >> >> I have been bothered by "simple" reductions not vectorizing for a long time. It was also a part of [my JVMLS2025 presentation](https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/). >> > ... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > rename cost methods for Vladimir K Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27803#pullrequestreview-3422990040 From kvn at openjdk.org Wed Nov 5 15:57:27 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 5 Nov 2025 15:57:27 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v3] In-Reply-To: <59Pv7my4ZuQ0zbG-HVjrABTiHsnjguqvXqPAoo3S-ko=.eb97cbd9-672e-4872-ad1f-fb85556f45e0@github.com> References: <59Pv7my4ZuQ0zbG-HVjrABTiHsnjguqvXqPAoo3S-ko=.eb97cbd9-672e-4872-ad1f-fb85556f45e0@github.com> Message-ID: On Wed, 5 Nov 2025 09:17:44 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectorization.cpp line 580: >> >>> 578: >>> 579: float sum = 0; >>> 580: for (int j = 0; j < body().body().length(); j++) { >> >> What is `body().body()` mean? > > `VLoopAnalyzer` (`this`) has multiple analysis subcomponents. One of them is the `VLoopBody`, i.e. `this->body()` / `this->_body.` And it has access to a `GrowableArray` `body()`, which maps the nodes of the loop. > > Maybe `loopBody().nodes()` would sound better here. If you prefer that, I file a separate renaming RFE. Yes, would be nice if you move `body().body()` into separate method with comment explaining it. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2495154530 From dfenacci at openjdk.org Wed Nov 5 16:02:41 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 5 Nov 2025 16:02:41 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel [v3] In-Reply-To: References: Message-ID: <4R5GzQKJ5EMgmZnWDmPT69rsNtOLWQVaua1JqsRqHd8=.4211beb0-2ca2-42fc-a288-98dce58e50e1@github.com> On Mon, 3 Nov 2025 15:44:34 GMT, Damon Fenacci wrote: >> test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 762: >> >>> 760: outcome = new Outcome(scenario, null, null); >>> 761: } catch (TestFormatException e) { >>> 762: outcome = new Outcome(scenario, e, null); >> >> Why do you collect the format exceptions here and only throw them later? Is a fail-fast not possible? 
> > Actually it is (maybe a bit more tricky but possible). Changing this... Trying to keep it simple, I've changed it so that other threads stop immediately if one has already thrown a `TestFormatException`. I also added `startParallel` tests to `TestBadFormat.java` to check for `TestFormatException`s with parallel scenarios. That exposed another issue with `TestFormat`: the list of failures there was static and could be filled by all threads concurrently. To keep things simple I turned it into a thread local field. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2495166808 From dfenacci at openjdk.org Wed Nov 5 16:02:39 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 5 Nov 2025 16:02:39 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel [v3] In-Reply-To: References: Message-ID: On Mon, 3 Nov 2025 08:07:08 GMT, Christian Hagedorn wrote: > Now the only question remaining is which tests would already benefit from using the parallel version. I guess we can investigate that separately. Let me file an RFE for that. > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 456: > >> 454: } >> 455: } else { >> 456: startWithScenarios(!FORCE_SEQUENTIAL_SCENARIOS && parallel); > > Maybe we can already handle `FORCE_SEQUENTIAL_SCENARIOS` directly in `startParallel()`. Then `parallel` really means parallel. You could also add an additional API comment for `startParallel()` that we can force disable it with the corresponding property flag. Good idea. Changed. > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 772: > >> 770: System.out.println(output); >> 771: } >> 772: } > > This will probably also not be sorted by scenario index? Could we also just gather it here and then dump it after the stream? 
Maybe we can put `output` into `Outcome` as well as the exceptions by using a `ConcurrentSkipListMap` map in the parallel case or a normal `TreeMap` in the non-parallel case. The idea was to print the output as soon as one process finishes, so that we can follow the progress a bit better (and if there is an TestFormatException we have it printed up to the exception, although this could be done later as well). Of course then it is not sorted... That said, the output would be cleaner and more readable if we printed it in order as you suggest. Maybe we could even interleave output and exceptions for the same scenario. What do you think? (though I just noticed we throw another `TestRunException` after printing the exceptions) > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 787: > >> 785: outcomes.stream() >> 786: .filter(o -> o.other() != null) >> 787: .forEach(o -> exceptionMap.put(o.scenario(), o.other())); > > You could use a `ConcurrentSkipListMap` in the parallel case instead of a tree map. This would allow us to modify the map in parallel in the stream processing and simplify the code. Moreover, it will be sorted by scenario index which I'm not sure is currently the case? It should be (and I wanted to keep as much as possible as it was) but it's much nicer with `ConcurrentSkipListMap`. And there is no need to return any outcome anymore. > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 927: > >> 925: if (testVMProcess == null) { >> 926: throw new TestFrameworkException("TestVMProcess is null"); >> 927: } > > You can use this utility method instead: > Suggestion: > > TestFramework.check(testVMProcess != null, "TestVMProcess must not be null"); Fixed. 
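The `ConcurrentSkipListMap` suggestion discussed above could look roughly like the following sketch. It only illustrates the data structure choice: `OrderedOutcomeSketch`, `runScenario` and `collectFailures` are hypothetical, and the "every even scenario fails" rule is invented purely to have something to collect.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.stream.IntStream;

public class OrderedOutcomeSketch {
    // Hypothetical scenario run: even indices "fail", odd ones succeed (returns null).
    static Exception runScenario(int index) {
        return index % 2 == 0 ? new RuntimeException("scenario " + index + " failed") : null;
    }

    // Collects failures from a parallel stream. The skip list map is safe for
    // concurrent puts and keeps its keys sorted, so iteration order is by
    // scenario index regardless of which task finished first.
    static Map<Integer, Exception> collectFailures(int scenarioCount) {
        Map<Integer, Exception> failures = new ConcurrentSkipListMap<>();
        IntStream.range(0, scenarioCount).parallel().forEach(i -> {
            Exception e = runScenario(i);
            if (e != null) {
                failures.put(i, e);
            }
        });
        return failures;
    }

    public static void main(String[] args) {
        collectFailures(6).forEach((index, e) ->
                System.out.println("Scenario #" + index + ": " + e.getMessage()));
    }
}
```

The trade-off raised in the thread still applies: reporting from such a map after all tasks finish gives sorted output, whereas printing as each task completes shows progress earlier but in completion order.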
------------- PR Comment: https://git.openjdk.org/jdk/pull/28065#issuecomment-3492041441 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2495170981 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2495168517 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2495168073 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2495168834 From dfenacci at openjdk.org Wed Nov 5 16:36:26 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 5 Nov 2025 16:36:26 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel [v4] In-Reply-To: References: Message-ID: > ## Issue > Today, the only practical ways to run IR Framework scenarios in parallel seems to be: > * spawning threads manually in a single test, or > * letting jtreg treat each scenario as a separate test (the only way to potentially distribute across hosts). > > This makes it a bit cumbersome to use host CPU cores efficiently when running multiple scenarios within the same test. > > ## Change > This change introduces a method `TestFramework::startParallel` to execute multiple scenarios concurrently. The implementation: > * launches one task per scenario and runs them in parallel (by default, the maximum concurrency should match the host's available cores) > * captures each task's `System.out` into a dedicated buffer and flushes it when the task completes to avoid interleaved output between scenarios (Note: only call paths within the `compile.lib.ir_framework` package are modified to per-task output streams. `ProcessTools` methods still write directly to `stdout`, so their output may interleave). > * adds an option `-DForceSequentialScenarios=true` to force all scenarios to be run sequentially.
> > ## Testing > * Tier 1-3+ > * explicit `ir_framework.tests` runs > * added IR-Framework test `TestDForceSequentialScenarios.java` to test forcing sequential testing (checking the output order) and added a parallel run to `TestScenatios.java` (as well as adding `ForceSequentialScenarios` flag to `TestDFlags.java`) > > As reference: a comparison of the execution time between sequential and parallel of all IR-Framework tests using scenarios on our machines (linux x64/aarch64, macosx x64/aarch64, windows x64 with different number of cores, so the results for a single test might not be relevant) gave me an average speedup of 1.9. Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8370315: fix typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28065/files - new: https://git.openjdk.org/jdk/pull/28065/files/4e9cc0a2..7b643833 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28065&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28065&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28065/head:pull/28065 PR: https://git.openjdk.org/jdk/pull/28065 From qamai at openjdk.org Wed Nov 5 17:34:04 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 5 Nov 2025 17:34:04 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v4] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 09:50:47 GMT, Emanuel Peter wrote: >> Note: this looks like a large change, but only about 400-500 lines are VM changes. 2.5k comes from new tests. >> >> Finally: after a long list of refactorings, we can implement the Cost-Model.
The refactorings and this implementation was first PoC'd here: https://github.com/openjdk/jdk/pull/20964 >> >> Main goal: >> - Carefully allow the vectorization of reduction cases that lead to speedups, and prevent those that do not (or may cause regressions). >> - Open up new vectorization opportunities in the future, that introduce expensive vector nodes that are only profitable in some cases but not others. >> >> **Why cost-model?** >> >> Usually, vectorization leads to speedups because we replace multiple scalar operations with a single vector operation. The scalar and vector operation have a very similar cost per instruction, and so going from 4 scalar ops to a single vector op may yield a 4x speedup. This is a bit simplistic, but the general idea. >> >> But: some vector ops are expensive. Sometimes, the vector op can be more expensive than the multiple scalar ops it replaces. This is the case with some reduction ops. Or we may introduce a vector op that does not have any corresponding scalar op (e.g. in the case of shuffle). This prevents simple heuristics that only focus on single operations. >> >> Weighing the total cost of the scalar loop vs the vector loop allows us a more "holistic" approach. There may be expensive vector ops, but other cheaper vector ops may still make it profitable. >> >> **Implementation** >> >> Items: >> - New `VTransform::is_profitable`: checks cost-model and some other cost related checks. >> - `VLoopAnalyzer::cost`: scalar loop cost >> - `VTransformGraph::cost`: vector loop cost >> - Old reduction heuristic with `_num_work_vecs` and `_num_reductions` used to count check for "simple" reductions where the only "work" vector was the reduction itself. Reductions were not considered profitable if they were "simple". I was able to lift those restrictions. >> - Adapted existing tests. >> - Wrote a new comprehensive test, matching the related JMH benchmark, which we use below. 
>> >> **Testing** >> Regular correctness testing, and performance testing. In addition to the JMH micro benchmarks below. >> >> ------------------------------ >> >> **Some History** >> >> I have been bothered by "simple" reductions not vectorizing for a long time. It was also a part of [my JVMLS2025 presentation](https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/). >> > ... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > rename cost methods for Vladimir K src/hotspot/share/opto/vectorization.cpp line 604: > 602: // If needed, we could also use platform specific costs, if the > 603: // default here is not accurate enough. > 604: float VLoopAnalyzer::cost_for_scalar_node(int opcode) const { You need a `BasicType` parameter for this method, some opcodes are used for multiple kinds of operands. src/hotspot/share/opto/vectorization.cpp line 618: > 616: // default here is not accurate enough. > 617: float VLoopAnalyzer::cost_for_vector_node(int opcode, int vlen, BasicType bt) const { > 618: float c = 1; We have `Matcher::vector_op_pre_select_sz_estimate`, could it be used here? The corresponding for scalar is `Matcher::scalar_op_pre_select_sz_estimate` src/hotspot/share/opto/vectorization.cpp line 635: > 633: // Each reduction is composed of multiple instructions, each estimated with a unit cost. > 634: // Linear: shuffle and reduce Recursive: shuffle and reduce > 635: float c = requires_strict_order ? 2 * vlen : 2 * exact_log2(vlen); Can we ask for the cost of the element-wise opcode here, something like `(1 + element_wise_cost)` would be more accurate? src/hotspot/share/opto/vtransform.cpp line 201: > 199: // in_loop: vtn->_idx -> bool > 200: void VTransformGraph::mark_vtnodes_in_loop(VectorSet& in_loop) const { > 201: assert(is_scheduled(), "must already be scheduled"); May I ask if this schedule has already moved unordered reductions like addition out of the loop yet? 
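For readers following the reduction-cost comment, the expression quoted from `vectorization.cpp` line 635 can be worked through numerically. The sketch below is a plain transliteration of that one quoted line into Java so it can be run standalone; `ReductionCostSketch`, `exactLog2` and `reductionCost` are hypothetical names, and nothing else about the HotSpot cost model is assumed.

```java
public class ReductionCostSketch {
    // Log base 2 of a power of two; rejects anything else, like an "exact" log2 would.
    static int exactLog2(int n) {
        if (n <= 0 || Integer.bitCount(n) != 1) {
            throw new IllegalArgumentException("not a power of two: " + n);
        }
        return Integer.numberOfTrailingZeros(n);
    }

    // Mirrors the quoted expression:
    //   requires_strict_order ? 2 * vlen : 2 * exact_log2(vlen)
    // Strict order: one shuffle and one reduce per element (linear).
    // Otherwise: one shuffle and one reduce per halving step (recursive).
    static float reductionCost(int vlen, boolean requiresStrictOrder) {
        return requiresStrictOrder ? 2 * vlen : 2 * exactLog2(vlen);
    }

    public static void main(String[] args) {
        for (int vlen : new int[] {2, 4, 8, 16}) {
            System.out.println("vlen=" + vlen
                    + " strict=" + reductionCost(vlen, true)
                    + " recursive=" + reductionCost(vlen, false));
        }
    }
}
```

With 8 lanes this gives a cost of 16 for a strictly ordered reduction and 6 for a recursive one, each step counted at unit cost. The review suggestion of `(1 + element_wise_cost)` would replace the constant factor 2 with a term that depends on the element-wise operation.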
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2495492772 PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2495488204 PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2495478951 PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2495502105 From dlong at openjdk.org Wed Nov 5 21:23:04 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 5 Nov 2025 21:23:04 GMT Subject: RFR: 8371297: C2: assert triggerred in BoolTest::BoolTest In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 22:18:13 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > Not sure how this one slipped in (https://github.com/openjdk/jdk/pull/28047/), I think it passed my local test and github CI test at that time. > Please check https://bugs.openjdk.org/browse/JDK-8371297 for details. > > Thanks! src/hotspot/share/opto/vtransform.cpp line 1427: > 1425: const BoolTest bt(m); > 1426: tty->print(" test=%s", m == _test._mask ? "" : "unsigned "); > 1427: bt.dump_on(tty); I was wondering why we pass the raw mask around instead of keeping it encapsulated in a BoolTest object. Elsewhere I saw code like this: `cond->get_con() & (BoolTest::unsigned_compare - 1)` which seems to be making fragile assumptions about BoolTest internals. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28141#discussion_r2496215037 From fyang at openjdk.org Thu Nov 6 03:22:33 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 6 Nov 2025 03:22:33 GMT Subject: RFR: 8371385: compiler/escapeAnalysis/TestRematerializeObjects.java fails in case of -XX:-UseUnalignedAccesses Message-ID: <82OW0gJBKDYOCkj3ExvHNebrgRnhIlHa6n_yEaHN1i0=.972f50ae-b39f-484b-b939-6aaa535df8f0@github.com> Hi, Please consider this small change fixing a test failure. Two IR rules failed under -XX:-EliminateAllocations on platforms with -XX:-UseUnalignedAccesses. These are expecting MergeStores to combine and emit StoreL or StoreI. 
But the enablement of MergeStores depends on flag UseUnalignedAccesses [1]. So this simply add that condition to applyIf of the two IR rules. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/memnode.cpp#L3455 ------------- Commit messages: - 8371385: compiler/escapeAnalysis/TestRematerializeObjects.java fails in case of -XX:-UseUnalignedAccesses Changes: https://git.openjdk.org/jdk/pull/28171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28171&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371385 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28171/head:pull/28171 PR: https://git.openjdk.org/jdk/pull/28171 From rcastanedalo at openjdk.org Thu Nov 6 06:06:02 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 6 Nov 2025 06:06:02 GMT Subject: RFR: 8356761: IGV: dump escape analysis information In-Reply-To: References: Message-ID: <8j_40zCPi1joR0SAU9PtcIQGRSBe1eSCYUqDRpyS8Ts=.f6681026-f9f1-44bd-8e62-f68526f45d5d@github.com> On Wed, 5 Nov 2025 14:06:54 GMT, Christian Hagedorn wrote: > I just want to raise a general concern that our model of having different `PrintIdealGraphLevel` values might not fit anymore for all the different concepts (different loop opts, IGVN steps, Superword steps, parsing steps and now EA steps etc.). Maybe the time has come to use a different solution to allow some better filtering for different needs. Right, see [JDK-8320070](https://bugs.openjdk.org/browse/JDK-8320070). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28060#issuecomment-3495228693 From xgong at openjdk.org Thu Nov 6 06:09:03 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 6 Nov 2025 06:09:03 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v5] In-Reply-To: References: Message-ID: <2UIgdIDLBZDDkMD1z3FXlUTQ4GUp6clmM2Qwvdlk_H4=.4ba723ab-3cd5-4c01-9165-f64cb2a24e3c@github.com> On Wed, 5 Nov 2025 00:59:10 GMT, Sandhya Viswanathan wrote: > Looks good to me. Thanks so much for your review! Hi @eme64 , could you please help take an internal testing for this PR again? If there are any other inputs, please let me know. Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3495235713 From chagedorn at openjdk.org Thu Nov 6 06:40:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 6 Nov 2025 06:40:06 GMT Subject: RFR: 8371385: compiler/escapeAnalysis/TestRematerializeObjects.java fails in case of -XX:-UseUnalignedAccesses In-Reply-To: <82OW0gJBKDYOCkj3ExvHNebrgRnhIlHa6n_yEaHN1i0=.972f50ae-b39f-484b-b939-6aaa535df8f0@github.com> References: <82OW0gJBKDYOCkj3ExvHNebrgRnhIlHa6n_yEaHN1i0=.972f50ae-b39f-484b-b939-6aaa535df8f0@github.com> Message-ID: On Thu, 6 Nov 2025 03:11:14 GMT, Fei Yang wrote: > Hi, Please consider this small change fixing a test failure. > > Two IR rules failed under -XX:-EliminateAllocations on platforms with -XX:-UseUnalignedAccesses. > These are expecting MergeStores to combine and emit StoreL or StoreI. But the enablement of MergeStores > depends on flag UseUnalignedAccesses [1]. So this simply add that condition to applyIf of the two IR rules. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/memnode.cpp#L3455 That looks good to me, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28171#pullrequestreview-3426450868 From chagedorn at openjdk.org Thu Nov 6 07:22:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 6 Nov 2025 07:22:12 GMT Subject: RFR: 8370878: C1: Clean up unnecessary ConversionStub constructor [v2] In-Reply-To: References: Message-ID: On Sat, 1 Nov 2025 14:14:37 GMT, Zihao Lin wrote: >> C1: Clean up unnecessary ConversionStub constructor >> Remove class which should not reach. > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > fix arm Yes, sure! We can consider it a trivial removal. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28096#issuecomment-3495449186 From duke at openjdk.org Thu Nov 6 07:22:13 2025 From: duke at openjdk.org (Zihao Lin) Date: Thu, 6 Nov 2025 07:22:13 GMT Subject: Integrated: 8370878: C1: Clean up unnecessary ConversionStub constructor In-Reply-To: References: Message-ID: <9MGSQTQ9j5S9Lmc5Og1qRSeUWs-p3y5XuZXGSmDrXh8=.f7472508-e098-4c30-9148-63968c188a0d@github.com> On Sat, 1 Nov 2025 12:21:35 GMT, Zihao Lin wrote: > C1: Clean up unnecessary ConversionStub constructor > Remove class which should not reach. This pull request has now been integrated. 
Changeset: ac9cf5d5 Author: Zihao Lin Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/ac9cf5d572f7504507117aa15e56c903e1400cf5 Stats: 43 lines in 4 files changed: 0 ins; 39 del; 4 mod 8370878: C1: Clean up unnecessary ConversionStub constructor Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/28096 From qamai at openjdk.org Thu Nov 6 07:27:05 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 6 Nov 2025 07:27:05 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v4] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 17:23:35 GMT, Quan Anh Mai wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> rename cost methods for Vladimir K > > src/hotspot/share/opto/vectorization.cpp line 635: > >> 633: // Each reduction is composed of multiple instructions, each estimated with a unit cost. >> 634: // Linear: shuffle and reduce Recursive: shuffle and reduce >> 635: float c = requires_strict_order ? 2 * vlen : 2 * exact_log2(vlen); > > Can we ask for the cost of the element-wise opcode here, something like `(1 + element_wise_cost)` would be more accurate? To be a little more precise, the strict one should be something like: vlen * (1 + Matcher::vector_op_pre_select_sz_estimate(Op_Extract, bt, vlen)) + (vlen - 1) * (1 + Matcher::scalar_op_pre_select_sz_estimate(opcode, bt))); and the non-strict one would be: float c = Matcher::vector_op_pre_select_sz_estimate(Op_Extract, bt, 2) * 2 + Matcher::scalar_op_pre_select_sz_estimate(opcode) + 3; for (int i = 4; i <= vlen; i *= 2) { c += 2 + Matcher::vector_op_pre_select_sz_estimate(Op_VectorRearrange, bt, i) + Matcher::vector_op_pre_select_sz_estimate(opcode, bt, i); } Maybe refactoring a little bit to make the `Matcher::vector_op_pre_select_sz_estimate` less awkward would be welcomed, too. Currently, it returns the estimated size - 1, which is unsettling. 
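For illustration only, the simple reduction cost quoted above from vectorization.cpp line 635 can be sketched in Java as follows. This is not the HotSpot code (which is C++): `ReductionCostSketch`, `reductionCost` and `exactLog2` are hypothetical names, and the factor of 2 per step (one shuffle plus one reduce, each at unit cost) is taken from the quoted comment.

```java
// Sketch of the quoted formula:
//   float c = requires_strict_order ? 2 * vlen : 2 * exact_log2(vlen);
// All names here are hypothetical and exist only for this example.
final class ReductionCostSketch {

    // Returns log2(x) for an exact power of two, otherwise throws.
    static int exactLog2(int x) {
        if (x <= 0 || (x & (x - 1)) != 0) {
            throw new IllegalArgumentException("not a power of two: " + x);
        }
        return Integer.numberOfTrailingZeros(x);
    }

    // Strict order: linear chain, one shuffle and one reduce per lane.
    // Otherwise: recursive halving, one shuffle and one reduce per level.
    static float reductionCost(int vlen, boolean requiresStrictOrder) {
        return requiresStrictOrder ? 2 * vlen : 2 * exactLog2(vlen);
    }

    public static void main(String[] args) {
        System.out.println(reductionCost(8, true));
        System.out.println(reductionCost(8, false));
    }
}
```

Under these assumptions, vlen = 8 gives a cost of 16 for the strict-order form and 6 for the recursive form. The more detailed proposal above would replace the unit costs with per-opcode estimates.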
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2497769367 From epeter at openjdk.org Thu Nov 6 07:52:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 07:52:03 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v4] In-Reply-To: References: Message-ID: On Thu, 6 Nov 2025 07:23:57 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/vectorization.cpp line 635: >> >>> 633: // Each reduction is composed of multiple instructions, each estimated with a unit cost. >>> 634: // Linear: shuffle and reduce Recursive: shuffle and reduce >>> 635: float c = requires_strict_order ? 2 * vlen : 2 * exact_log2(vlen); >> >> Can we ask for the cost of the element-wise opcode here, something like `(1 + element_wise_cost)` would be more accurate? > > To be a little more precise, the strict one should be something like: > > vlen * (1 + Matcher::vector_op_pre_select_sz_estimate(Op_Extract, bt, vlen)) + (vlen - 1) * (1 + Matcher::scalar_op_pre_select_sz_estimate(opcode, bt))); > > and the non-strict one would be: > > float c = Matcher::vector_op_pre_select_sz_estimate(Op_Extract, bt, 2) * 2 + Matcher::scalar_op_pre_select_sz_estimate(opcode) + 3; > for (int i = 4; i <= vlen; i *= 2) { > c += 2 + Matcher::vector_op_pre_select_sz_estimate(Op_VectorRearrange, bt, i) + Matcher::vector_op_pre_select_sz_estimate(opcode, bt, i); > } > > Maybe refactoring a little bit to make the `Matcher::vector_op_pre_select_sz_estimate` less awkward would be welcomed, too. Currently, it returns the estimated size - 1, which is unsettling. @merykitty Can we do that in a follow-up RFE? For now, I'd like to keep it as simple as possible. Cost-models can become arbitrarily complex. There is a bit of a trade-off between simplicity and accuracy. And we can for sure improve things in the future, this PR just lays the foundation. My goal here is to start as simple as possible, and then add complexity if there is a proven need for it. 
So if you/we can find a benchmark where the cost model is not accurate enough yet, provable by `-XX:AutoVectorizationOverrideProfitability=0/2`, then we should make it more complex. Would that be acceptable for you? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2497861417 From epeter at openjdk.org Thu Nov 6 07:59:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 07:59:06 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v4] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 17:27:43 GMT, Quan Anh Mai wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> rename cost methods for Vladimir K > > src/hotspot/share/opto/vectorization.cpp line 604: > >> 602: // If needed, we could also use platform specific costs, if the >> 603: // default here is not accurate enough. >> 604: float VLoopAnalyzer::cost_for_scalar_node(int opcode) const { > > You need a `BasicType` parameter for this method, some opcodes are used for multiple kinds of operands. Will add it :) > src/hotspot/share/opto/vectorization.cpp line 618: > >> 616: // default here is not accurate enough. >> 617: float VLoopAnalyzer::cost_for_vector_node(int opcode, int vlen, BasicType bt) const { >> 618: float c = 1; > > We have `Matcher::vector_op_pre_select_sz_estimate`, could it be used here? The corresponding for scalar is `Matcher::scalar_op_pre_select_sz_estimate` Same answer as above :) > src/hotspot/share/opto/vtransform.cpp line 201: > >> 199: // in_loop: vtn->_idx -> bool >> 200: void VTransformGraph::mark_vtnodes_in_loop(VectorSet& in_loop) const { >> 201: assert(is_scheduled(), "must already be scheduled"); > > May I ask if this schedule has already moved unordered reductions like addition out of the loop yet? `optimize` happens before `schedule`. But the unordered reduction is still in the `VTransformGraph`, and so it is also scheduled. 
But `mark_vtnodes_in_loop` will find that the unordered reduction is outside the loop :) Does that answer your question? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2497884992 PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2497872764 PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2497881539 From epeter at openjdk.org Thu Nov 6 07:59:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 07:59:08 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v4] In-Reply-To: References: Message-ID: On Thu, 6 Nov 2025 07:49:07 GMT, Emanuel Peter wrote: >> To be a little more precise, the strict one should be something like: >> >> vlen * (1 + Matcher::vector_op_pre_select_sz_estimate(Op_Extract, bt, vlen)) + (vlen - 1) * (1 + Matcher::scalar_op_pre_select_sz_estimate(opcode, bt))); >> >> and the non-strict one would be: >> >> float c = Matcher::vector_op_pre_select_sz_estimate(Op_Extract, bt, 2) * 2 + Matcher::scalar_op_pre_select_sz_estimate(opcode) + 3; >> for (int i = 4; i <= vlen; i *= 2) { >> c += 2 + Matcher::vector_op_pre_select_sz_estimate(Op_VectorRearrange, bt, i) + Matcher::vector_op_pre_select_sz_estimate(opcode, bt, i); >> } >> >> Maybe refactoring a little bit to make the `Matcher::vector_op_pre_select_sz_estimate` less awkward would be welcomed, too. Currently, it returns the estimated size - 1, which is unsettling. > > @merykitty Can we do that in a follow-up RFE? For now, I'd like to keep it as simple as possible. Cost-models can become arbitrarily complex. There is a bit of a trade-off between simplicity and accuracy. And we can for sure improve things in the future, this PR just lays the foundation. > > My goal here is to start as simple as possible, and then add complexity if there is a proven need for it. 
> > So if you/we can find a benchmark where the cost model is not accurate enough yet, provable by `-XX:AutoVectorizationOverrideProfitability=0/2`, then we should make it more complex. > > Would that be acceptable for you? What exactly does `Matcher::vector_op_pre_select_sz_estimate` return? The number of instructions or some kind of throughput estimate? Personally, I don't want to get too stuck to counting instructions, but rather getting a throughput estimate. Counting instructions is an estimate for throughput, but I don't know yet if longterm it is the best. I would like to wait a little more, and start depending on the cost model for more and more cases (extract, pack, shuffle, if-conversion, ...) and then we will run into issues along the way where the cost model is not yet accurate enough. And at that point we can think again what would produce the most accurate results. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2497872332 From epeter at openjdk.org Thu Nov 6 08:06:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 08:06:03 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v4] In-Reply-To: References: Message-ID: On Thu, 6 Nov 2025 07:56:30 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectorization.cpp line 604: >> >>> 602: // If needed, we could also use platform specific costs, if the >>> 603: // default here is not accurate enough. >>> 604: float VLoopAnalyzer::cost_for_scalar_node(int opcode) const { >> >> You need a `BasicType` parameter for this method, some opcodes are used for multiple kinds of operands. > > Will add it :) Well, I actually tried it right now, and it would take a bit of engineering at the call sites. In quite a few cases the BasicType is not immediately available. Is it ok if we ignore it for now, and only add it in once we really need it? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2497906091 From qamai at openjdk.org Thu Nov 6 08:16:04 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 6 Nov 2025 08:16:04 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v4] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 09:50:47 GMT, Emanuel Peter wrote: >> Note: this looks like a large change, but only about 400-500 lines are VM changes. 2.5k comes from new tests. >> >> Finally: after a long list of refactorings, we can implement the Cost-Model. The refactorings and this implementation was first PoC'd here: https://github.com/openjdk/jdk/pull/20964 >> >> Main goal: >> - Carefully allow the vectorization of reduction cases that lead to speedups, and prevent those that do not (or may cause regressions). >> - Open up new vectorization opportunities in the future, that introduce expensive vector nodes that are only profitable in some cases but not others. >> >> **Why cost-model?** >> >> Usually, vectorization leads to speedups because we replace multiple scalar operations with a single vector operation. The scalar and vector operation have a very similar cost per instruction, and so going from 4 scalar ops to a single vector op may yield a 4x speedup. This is a bit simplistic, but the general idea. >> >> But: some vector ops are expensive. Sometimes, the vector op can be more expensive than the multiple scalar ops it replaces. This is the case with some reduction ops. Or we may introduce a vector op that does not have any corresponding scalar op (e.g. in the case of shuffle). This prevents simple heuristics that only focus on single operations. >> >> Weighing the total cost of the scalar loop vs the vector loop allows us a more "holistic" approach. There may be expensive vector ops, but other cheaper vector ops may still make it profitable. 
>> >> **Implementation** >> >> Items: >> - New `VTransform::is_profitable`: checks cost-model and some other cost related checks. >> - `VLoopAnalyzer::cost`: scalar loop cost >> - `VTransformGraph::cost`: vector loop cost >> - Old reduction heuristic with `_num_work_vecs` and `_num_reductions` used to count check for "simple" reductions where the only "work" vector was the reduction itself. Reductions were not considered profitable if they were "simple". I was able to lift those restrictions. >> - Adapted existing tests. >> - Wrote a new comprehensive test, matching the related JMH benchmark, which we use below. >> >> **Testing** >> Regular correctness testing, and performance testing. In addition to the JMH micro benchmarks below. >> >> ------------------------------ >> >> **Some History** >> >> I have been bothered by "simple" reductions not vectorizing for a long time. It was also a part of [my JVMLS2025 presentation](https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/). >> > ... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > rename cost methods for Vladimir K Thanks for your replies. I think leaving my suggestions to future RFEs is reasonable. ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/27803#pullrequestreview-3426803471 From epeter at openjdk.org Thu Nov 6 08:16:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 08:16:06 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v3] In-Reply-To: References: <59Pv7my4ZuQ0zbG-HVjrABTiHsnjguqvXqPAoo3S-ko=.eb97cbd9-672e-4872-ad1f-fb85556f45e0@github.com> Message-ID: <2o_PHZHgds6NB5QfD92O3TMpiwjVgvzwp7zJhnONUso=.2dae12bd-c2b1-4355-9f6a-611143ef9eaa@github.com> On Wed, 5 Nov 2025 15:53:15 GMT, Vladimir Kozlov wrote: >> `VLoopAnalyzer` (`this`) has multiple analysis subcomponents. One of them is the `VLoopBody`, i.e. 
`this->body()` / `this->_body.` And it has access to a `GrowableArray` `body()`, which maps the nodes of the loop. >> >> Maybe `loopBody().nodes()` would sound better here. If you prefer that, I file a separate renaming RFE. > > Yes, would be nice if you move `body().body()` into separate method with comment explaining it. Thanks! FYI, I filed: [JDK-8371391](https://bugs.openjdk.org/browse/JDK-8371391) C2 SuperWord: rename body().body() to something more understandable ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2497934619 From qamai at openjdk.org Thu Nov 6 08:16:07 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 6 Nov 2025 08:16:07 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v4] In-Reply-To: References: Message-ID: <8HmGA-EHgFMDSS1WiVPceuBd42ca9OvngwnW8CwASz8=.4690cf29-a071-4bcd-a6b2-f81a8affd7e1@github.com> On Thu, 6 Nov 2025 07:52:58 GMT, Emanuel Peter wrote: > What exactly does Matcher::vector_op_pre_select_sz_estimate return? The number of instructions or some kind of throughput estimate? I believe it tries to estimate the number of instructions generated by a node. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2497937659 From epeter at openjdk.org Thu Nov 6 08:16:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 08:16:08 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v4] In-Reply-To: <8HmGA-EHgFMDSS1WiVPceuBd42ca9OvngwnW8CwASz8=.4690cf29-a071-4bcd-a6b2-f81a8affd7e1@github.com> References: <8HmGA-EHgFMDSS1WiVPceuBd42ca9OvngwnW8CwASz8=.4690cf29-a071-4bcd-a6b2-f81a8affd7e1@github.com> Message-ID: On Thu, 6 Nov 2025 08:10:43 GMT, Quan Anh Mai wrote: >> What exactly does `Matcher::vector_op_pre_select_sz_estimate` return? The number of instructions or some kind of throughput estimate? >> >> Personally, I don't want to get too stuck to counting instructions, but rather getting a throughput estimate. 
Counting instructions is an estimate for throughput, but I don't know yet if longterm it is the best. >> >> I would like to wait a little more, and start depending on the cost model for more and more cases (extract, pack, shuffle, if-conversion, ...) and then we will run into issues along the way where the cost model is not yet accurate enough. And at that point we can think again what would produce the most accurate results. > >> What exactly does Matcher::vector_op_pre_select_sz_estimate return? The number of instructions or some kind of throughput estimate? > > I believe it tries to estimate the number of instructions generated by a node. I'm filing an RFE now ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2497946170 From epeter at openjdk.org Thu Nov 6 08:23:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 08:23:06 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v4] In-Reply-To: References: <8HmGA-EHgFMDSS1WiVPceuBd42ca9OvngwnW8CwASz8=.4690cf29-a071-4bcd-a6b2-f81a8affd7e1@github.com> Message-ID: On Thu, 6 Nov 2025 08:13:40 GMT, Emanuel Peter wrote: >>> What exactly does Matcher::vector_op_pre_select_sz_estimate return? The number of instructions or some kind of throughput estimate? >> >> I believe it tries to estimate the number of instructions generated by a node. 
> > I'm filing an RFE now [JDK-8371393](https://bugs.openjdk.org/browse/JDK-8371393) C2 SuperWord: improve cost model ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2497958293 From epeter at openjdk.org Thu Nov 6 08:23:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 08:23:05 GMT Subject: RFR: 8340093: C2 SuperWord: implement cost model [v4] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 15:54:47 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> rename cost methods for Vladimir K > > Good. @vnkozlov Thanks for reviewing and the approval! FYI, I filed: [JDK-8371391](https://bugs.openjdk.org/browse/JDK-8371391) C2 SuperWord: rename body().body() to something more understandable @merykitty Thanks a lot for reviewing as well, and the ideas about improving the cost model. There is actually a lot of literature out there about cost models, and various compilers employ various methods. There could be a lot of exciting work in this area, but let's take it step-by-step ;) FYI, I filed: [JDK-8371393](https://bugs.openjdk.org/browse/JDK-8371393) C2 SuperWord: improve cost model ------------- PR Comment: https://git.openjdk.org/jdk/pull/27803#issuecomment-3495786238 From dfenacci at openjdk.org Thu Nov 6 09:41:04 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 6 Nov 2025 09:41:04 GMT Subject: RFR: 8371385: compiler/escapeAnalysis/TestRematerializeObjects.java fails in case of -XX:-UseUnalignedAccesses In-Reply-To: <82OW0gJBKDYOCkj3ExvHNebrgRnhIlHa6n_yEaHN1i0=.972f50ae-b39f-484b-b939-6aaa535df8f0@github.com> References: <82OW0gJBKDYOCkj3ExvHNebrgRnhIlHa6n_yEaHN1i0=.972f50ae-b39f-484b-b939-6aaa535df8f0@github.com> Message-ID: On Thu, 6 Nov 2025 03:11:14 GMT, Fei Yang wrote: > Hi, Please consider this small change fixing a test failure. 
> > Two IR rules failed under -XX:-EliminateAllocations on platforms with -XX:-UseUnalignedAccesses. > These are expecting MergeStores to combine and emit StoreL or StoreI. But the enablement of MergeStores > depends on flag UseUnalignedAccesses [1]. So this simply add that condition to applyIf of the two IR rules. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/memnode.cpp#L3455 Looks good to me too. ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/28171#pullrequestreview-3427187067 From mli at openjdk.org Thu Nov 6 09:55:03 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 6 Nov 2025 09:55:03 GMT Subject: RFR: 8371297: C2: assert triggerred in BoolTest::BoolTest In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 21:20:46 GMT, Dean Long wrote: >> Hi, >> Can you help to review this patch? >> >> Not sure how this one slipped in (https://github.com/openjdk/jdk/pull/28047/), I think it passed my local test and github CI test at that time. >> Please check https://bugs.openjdk.org/browse/JDK-8371297 for details. >> >> Thanks! > > src/hotspot/share/opto/vtransform.cpp line 1427: > >> 1425: const BoolTest bt(m); >> 1426: tty->print(" test=%s", m == _test._mask ? "" : "unsigned "); >> 1427: bt.dump_on(tty); > > I was wondering why we pass the raw mask around instead of keeping it encapsulated in a BoolTest object. Elsewhere I saw code like this: > `cond->get_con() & (BoolTest::unsigned_compare - 1)` > which seems to be making fragile assumptions about BoolTest internals. @dean-long Yes, I have the same feeling that BoolTest is currently used in a fragile way. The reasons could be, BoolTest itself is by design a struct and expose all its status, and `unsigned_compare` is indeed not supported (well) but needed somewhere e.g. in vector intrinsic, and auto-vectorization (after https://github.com/openjdk/jdk/pull/28047). 
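As an illustration of the concern about raw masks (not code from the PR, and in Java rather than HotSpot's C++), the following sketch contrasts masking with `UNSIGNED_COMPARE - 1` against decoding the mask once into a small value type. The constants `LT` and `UNSIGNED_COMPARE` and the type `MaskEncapsulationSketch.Test` are hypothetical; the real BoolTest encoding may differ.

```java
// Hypothetical encoding: low bits hold the comparison kind,
// one higher bit marks an unsigned comparison.
final class MaskEncapsulationSketch {
    static final int LT = 3;                 // assumed value, for illustration
    static final int UNSIGNED_COMPARE = 16;  // assumed value, for illustration

    // Fragile: only correct while UNSIGNED_COMPARE is a power of two
    // that lies above every comparison kind.
    static int stripUnsignedFragile(int raw) {
        return raw & (UNSIGNED_COMPARE - 1);
    }

    // Encapsulated: callers never touch the bit layout directly.
    record Test(int kind, boolean unsigned) {
        static Test decode(int raw) {
            return new Test(raw & ~UNSIGNED_COMPARE, (raw & UNSIGNED_COMPARE) != 0);
        }

        int encode() {
            return unsigned ? (kind | UNSIGNED_COMPARE) : kind;
        }
    }

    public static void main(String[] args) {
        int raw = LT | UNSIGNED_COMPARE;
        System.out.println(stripUnsignedFragile(raw));
        System.out.println(Test.decode(raw));
    }
}
```

With the encapsulated form, a change to the bit layout would only need to touch `decode` and `encode`, whereas every `& (UNSIGNED_COMPARE - 1)` call site depends on the layout staying as it is.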
I think it's worth to do more investigation about the refactoring of BoolTest, file https://bugs.openjdk.org/browse/JDK-8371396 to track it, feel free to take it if you already have a solution or idea. This issue (in fact it's https://github.com/openjdk/jdk/pull/27942) blocks several other prs in my backlog for a while, e.g. https://github.com/openjdk/jdk/pull/25336, https://github.com/openjdk/jdk/pull/25341. It helps to resolve this assert in a quick (although ugly) way. Please kindly let me know how you think about it. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28141#discussion_r2498279741 From chagedorn at openjdk.org Thu Nov 6 10:17:45 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 6 Nov 2025 10:17:45 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> References: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> Message-ID: On Wed, 5 Nov 2025 12:51:15 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. 
>> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... 
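The ordering problem described above can be reproduced with a minimal, self-contained Java sketch. It does not use the Template Framework API: `TokenOrderSketch`, `addName`, `sampleNow` and `sampleLater` are hypothetical stand-ins for a deferred state-changing token, an immediate query and a deferred query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: tokens are deferred actions evaluated in list order.
final class TokenOrderSketch {
    // Stand-in for the renderer state (the set of available names).
    static final List<String> names = new ArrayList<>();

    interface Token {
        void evaluate(List<String> out);
    }

    // Deferred: the name only becomes visible when the token is evaluated.
    static Token addName(String name) {
        return out -> names.add(name);
    }

    // Immediate query: runs while the token list is still being built.
    static String sampleNow() {
        return names.isEmpty() ? "<none>" : names.get(names.size() - 1);
    }

    // Deferred query: runs during evaluation, after all earlier tokens.
    static Token sampleLater() {
        return out -> out.add(names.isEmpty() ? "<none>" : names.get(names.size() - 1));
    }

    static List<String> render(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            t.evaluate(out);
        }
        return out;
    }

    // Mixed style: the query "floats" above the state-changing token.
    static String immediateStyle() {
        names.clear();
        Token add = addName("x");
        String sampled = sampleNow(); // runs before 'add' is evaluated
        render(List.of(add));
        return sampled;
    }

    // All-token style: the query is evaluated in list order.
    static String tokenStyle() {
        names.clear();
        return render(List.of(addName("x"), sampleLater())).get(0);
    }

    public static void main(String[] args) {
        System.out.println(immediateStyle());
        System.out.println(tokenStyle());
    }
}
```

In this model the mixed style samples before the name exists, while the all-token style sees it, because both actions are evaluated in the order in which they appear in the list.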
> > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - for Christian > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Dumping some more comments as I'm refreshing my memory on how the internal classes interact with each other by jumping around the code. Will pick it up again after lunch. test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 32: > 30: > 31: /** > 32: * The {@link CodeFrame} represents a frame (i.e. scope) of code, appending {@link Code} to the {@code 'codeList'} Needed to jump back and forth between different classes to refresh my memory about internals. I was a little bit confused about the difference between `TemplateFrame` and `CodeFrame` in the context where it was used, so I ended up studying the class comments of these classes. IIUC, the latter (this class here) is actually about the generated code. Maybe it helps when adding here "generated"? Suggestion: * The {@link CodeFrame} represents a frame (i.e. scope) of generated code by appending {@link Code} to the {@code 'codeList'} test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 34: > 32: * The {@link CodeFrame} represents a frame (i.e. scope) of code, appending {@link Code} to the {@code 'codeList'} > 33: * as {@link Token}s are rendered, and adding names to the {@link NameSet}s with {@link Template#addStructuralName}/ > 34: * {@link Template#addDataName}. {@link Hook}s can be added to a frame, which allows code to be inserted at that Since we have template and code frames it might help to be more precise here: Suggestion: * {@link Template#addDataName}. {@link Hook}s can be added to a code frame, which allows code to be inserted at that test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 50: > 48: *

> 49: * Note, that {@link CodeFrame}s and {@link TemplateFrame}s often go together, but can also > 50: * diverge. Can you give an example here where/when this is the case? If there are not many variants, it might be worth to list them here. test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 27: > 25: > 26: import java.util.List; > 27: import java.util.function.Function; The IDE reports that this is unused. Suggestion: test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 88: > 86: * > 87: *

> 88: * Instead, the user should create a {@link TemplateToken} from the inner {@link Template}, and Read this through again to refresh my memory and noticed this: "Should" sounds like good advice but below it clearly mentions it's forbidden. Would it make sense to swap the paragraphs? Then it makes it more clear that this is not only advice but the only way to go. test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 117: > 115: static Renderer getCurrent() { > 116: if (renderer == null) { > 117: throw new RendererException("A Template method such as '$', 'fuel', etc. was called outside a template rendering."); Call? Suggestion: throw new RendererException("A Template method such as '$', 'fuel', etc. was called outside a template rendering call."); test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 145: > 143: // Ensure CodeFrame consistency. > 144: if (baseCodeFrame != currentCodeFrame) { > 145: throw new RuntimeException("Internal error: Renderer did not end up at base CodeFrame."); Optionally, something for another RFE (there are more "Internal error message"): You could turn these runtime exceptions into `TemplateFrameworkException`s and add a hint to report a bug. test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 237: > 235: // If the ScopeToken is transparent to Names, then the Template is transparent to names. > 236: ScopeToken st = templateToken.instantiate(); > 237: renderScopeToken(st, () -> {}); Just a suggestion: You seem to use `() -> {}` quite often when calling `renderScopeToken()`. It might be worth overloading it and providing one `renderScopeToken()` version with just a `ScopeToken`.
test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 246: > 244: > 245: private void renderScopeToken(ScopeToken st, Runnable preamble) { > 246: if (!(st instanceof ScopeTokenImpl sti)) { Wasn't aware of that but seems quite convenient: The IDE suggests to use a record pattern: if (!(st instanceof ScopeTokenImpl(List tokens, boolean nestedNamesAreLocal, boolean nestedHashtagsAreLocal, boolean nestedSetFuelCostAreLocal ))) { Then you can directly access the fields below: sti.nestedNamesAreLocal() -> nestedNamesAreLocal etc. test/hotspot/jtreg/compiler/lib/template_framework/ScopeTokenImpl.java line 34: > 32: * > 33: * Note: we want the tokens to be package private, so we create this Impl > 34: * record. Can you extend this comment why it matters? Is it that users just don't need to know/worry about this or is it to actually avoid that someone is trying to write code that relies on this class and possibly prevent future internal changes? test/hotspot/jtreg/compiler/lib/template_framework/ScopeTokenImpl.java line 39: > 37: boolean nestedNamesAreLocal, > 38: boolean nestedHashtagsAreLocal, > 39: boolean nestedSetFuelCostAreLocal) implements ScopeToken, Token {} Could be confusing what "local" means. Does it mean local to the scope or local to the nested scope? IIUC, it means to the nested scope. Maybe "nestedXAreTransparent" (or "nestedXAreNonTransparent" ) is clearer and adheres to the scope transparency definitions. What do you think? test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 30: > 28: > 29: /** > 30: * The {@link TemplateFrame} is the frame for a {@link Template} and its inner unique? Suggestion: * The {@link TemplateFrame} is the frame for a unique {@link Template} and its inner test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 34: > 32: * {@link #id} used to deconflict names using {@link Template#$}. 
It also has a set of hashtag > 33: * replacements, which combine the key-value pairs from the template argument and the > 34: * {@link Template#let} definitions. Inner scopes of a {@link Template} have access to `Inner scopes of a {@link Template} [...]` Should be obvious but since it's a class comment you could add something here that we only mean scopes that are not template themselves. test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 36: > 34: * {@link Template#let} definitions. Inner scopes of a {@link Template} have access to > 35: * the outer scope hashtag replacements, and any hashtag replacement defined inside an > 36: * inner scope is local and disapears once we leave the scope. The {@link #parent} relationship Suggestion: * inner scope is local and disappears once we leave the scope. The {@link #parent} relationship test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 38: > 36: * inner scope is local and disapears once we leave the scope. The {@link #parent} relationship > 37: * provides a trace for the use chain of templates and their inner scopes. The {@link #fuel} > 38: * is reduced over this chain, to give a heuristic on how much time is spent on the code I don't have a better suggestion but the first part mentioning "time" is not very clear since it's only about nesting depth? 
Suggestion: * is reduced over this chain to give a heuristic on how much time is spent on the code ------------- PR Review: https://git.openjdk.org/jdk/pull/27255#pullrequestreview-3426605511 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498309359 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498331002 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498335738 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2497775079 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2497797514 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2497782957 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498027072 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498047072 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498053874 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498147231 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498168918 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498192200 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498197996 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498229138 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498245702 From chagedorn at openjdk.org Thu Nov 6 10:17:46 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 6 Nov 2025 10:17:46 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: References: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> Message-ID: On Thu, 6 Nov 2025 10:09:29 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - for Christian >> - 
Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 50: > >> 48: *

>> 49: * Note, that {@link CodeFrame}s and {@link TemplateFrame}s often go together, but can also >> 50: * diverge. > > Can you give an example here where/when this is the case? If there are not many variants, it might be worth to list them here. It might also help to show a simple visual/sketch example how template frames, code frames and scopes work together. I think it's a little tricky at first to grasp how they interact. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498344312 From epeter at openjdk.org Thu Nov 6 10:34:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 10:34:59 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v15] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). 
One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... 
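The ordering problem described above (an immediate query "floating" above a deferred, state-changing token) can be reproduced in a few lines of plain Java. This is only a sketch under stated assumptions: it does not use the Template Framework API, and `Token`, `addName`, `hasNameNow` and `reportName` are hypothetical names invented for the illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TokenOrderSketch {
    // Rendering state: the names that have been added so far.
    static final Set<String> names = new HashSet<>();

    // A token is deferred work that only runs during token evaluation.
    interface Token { void evaluate(List<String> out); }

    // Deferred side effect: the name is added when the token is evaluated.
    static Token addName(String name) {
        return out -> names.add(name);
    }

    // Immediate query: answers from the state as it is right now.
    static boolean hasNameNow(String name) {
        return names.contains(name);
    }

    // Deferred query: answers from the state at token evaluation time.
    static Token reportName(String name) {
        return out -> out.add("deferred query sees " + name + ": " + names.contains(name));
    }

    static List<String> run() {
        names.clear();
        List<String> out = new ArrayList<>();
        // Phase 1 (lambda execution): tokens are only created, nothing is added yet.
        Token add = addName("x");
        // The immediate query runs before any token is evaluated, so it misses "x".
        out.add("immediate query sees x: " + hasNameNow("x"));
        List<Token> tokens = List.of(add, reportName("x"));
        // Phase 2 (token evaluation): tokens run in list order.
        for (Token t : tokens) {
            t.evaluate(out);
        }
        return out;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

When both the side effect and the query are tokens, they are evaluated in the order in which they appear in the list, which is the consistency the redesign is after.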
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: insert example for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/713d9c1e..7c69a57d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=13-14 Stats: 133 lines in 1 file changed: 130 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Thu Nov 6 10:35:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 10:35:01 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v11] In-Reply-To: References: <6tblejSZk3E4Y1Mc7yiI_pwUQlpPu0aRCFQHeT_9mwY=.1931f70a-01bc-4154-8e0d-0fd0bbdbe8a3@github.com> <12evYYJTBLOaUYCaUaEncCi8H6ge8hSxAFVHqEqeNpE=.8c904ca4-29a5-4a4a-a350-cbd56fc1a0db@github.com> <8-F0JFT2Vp076PJ0x2bgxPgW-vFYzjCRPuPt1qj3WjA=.4ad5b0ab-ae09-4d4f-bdbd-eedde30d077c@github.com> Message-ID: On Wed, 5 Nov 2025 13:30:18 GMT, Christian Hagedorn wrote: >> I'm updating the description in `Hook.java` a little, to mention this. > > Thanks for the example and updating `Hook.java`! > >> Do you think we should talk about it somewhere, maybe add a further example at the very end, that discusses this? > > It might not hurt. The question just naturally occurred to me at that point in the tutorial. But I'm not sure if others feel the same way. If you add an example you could mention that this is advanced or some expert use or something like that. You could also reference to `Hook.java` for more details. I added `generateWithScopes2`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498412555 From qxing at openjdk.org Thu Nov 6 11:00:26 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Thu, 6 Nov 2025 11:00:26 GMT Subject: Integrated: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops In-Reply-To: References: Message-ID: <-3etOnDvd53dlxITOWyUuFHrEzKD0TvHtutCbF7xM_I=.44f114f1-56c2-4a05-a58c-ac30d3d9a6e8@github.com> On Mon, 13 Jan 2025 01:12:20 GMT, Qizheng Xing wrote: > In `PhaseIdealLoop`, `IdealLoopTree::check_safepts` method checks if any call that is guaranteed to have a safepoint dominates the tail of the loop. In the previous implementation, `check_safepts` would stop if it found a local non-call safepoint. At this time, if there was a call before the safepoint in the dom-path, this safepoint would not be eliminated. > > loop-safepoint > > This patch changes the behavior of `check_safepts` to not stop when it finds a non-local safepoint. This makes simple loops with one method call ~3.8% faster (on aarch64).
>
> Benchmark              Mode  Cnt       Score      Error  Units
> LoopSafepoint.loopVar  avgt   15  208296.259 ± 1350.409  ns/op  # baseline
> LoopSafepoint.loopVar  avgt   15  200692.874 ±  616.770  ns/op  # this patch
>
> Testing: tier1-2 on x86_64 and aarch64.
This pull request has now been integrated. 
Changeset: 093e1287 Author: Qizheng Xing Committer: SendaoYan URL: https://git.openjdk.org/jdk/commit/093e128771f3dc01f64a8572de068e9776e38b97 Stats: 347 lines in 3 files changed: 319 ins; 2 del; 26 mod 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops Reviewed-by: epeter, roland ------------- PR: https://git.openjdk.org/jdk/pull/23057 From wenanjian at openjdk.org Thu Nov 6 11:06:24 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 6 Nov 2025 11:06:24 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v17] In-Reply-To: References: Message-ID: <9uwW544WzaCZ1u1LwmNZW4_8mSyxQK9aPV8Uv54NtRA=.4facae4d-76e7-4780-a34c-c7d5ee114503@github.com> > Hi everyone, please help review this patch, which implements the _counterMode_AESCrypt intrinsic with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: fix a jtreg problem ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/5bb019b9..a3bd1ff1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=15-16 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From epeter at openjdk.org Thu Nov 6 11:52:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 11:52:18 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v16] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . 
And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... 
> )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Direct apply of Christian's suggestions Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/7c69a57d..b0246911 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=14-15 Stats: 71 lines in 3 files changed: 23 ins; 1 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Thu Nov 6 12:16:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 12:16:02 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v17] In-Reply-To: References: Message-ID: <3DOY2bRy6X58x78JgywTlMoRc4Ue52WL0FDSwmvfIUE=.e02f080c-cec7-4f7b-9f6d-40fbcc84cecb@github.com> > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. 
It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. 
This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/b0246911..76579fdb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=15-16 Stats: 10 lines in 2 files changed: 1 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Thu Nov 6 12:20:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 12:20:03 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v18] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. 
> > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/76579fdb..94384f4b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=16-17 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Thu Nov 6 12:23:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 12:23:16 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v12] In-Reply-To: References: Message-ID: <8AT8vb3XIVl-qKfysSMfdQy8kkFgKS3A0zG3Qdld9qI=.b20ad88c-0d62-4e86-92f2-010b198e4060@github.com> On Wed, 5 Nov 2025 12:10:16 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply Christian's suggestions directly >> >> Co-authored-by: Christian Hagedorn > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 596: > >> 594: * >> 595: *

>> 596: * The most common use of {@link scope} is in the construction of templates: > > You miss some `#` prefixes for methods for Javadocs. Same for methods below. Fixed a few more cases. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498753778 From epeter at openjdk.org Thu Nov 6 12:31:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 12:31:28 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: References: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> Message-ID: On Thu, 6 Nov 2025 07:32:59 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - for Christian >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 88: > >> 86: * >> 87: *

>> 88: * Instead, the user should create a {@link TemplateToken} from the inner {@link Template}, and > > Read this through again to refresh my memory and noticed this: "Should" sounds like good advice, but below it clearly mentions that it's forbidden. Would it make sense to swap the paragraphs? Then it becomes clearer that this is not only advice but the only way to go. Sounds good, `should` -> `must` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498778002 From epeter at openjdk.org Thu Nov 6 12:40:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 12:40:17 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: References: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> Message-ID: On Thu, 6 Nov 2025 08:41:01 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - for Christian >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 145: > >> 143: // Ensure CodeFrame consistency. >> 144: if (baseCodeFrame != currentCodeFrame) { >> 145: throw new RuntimeException("Internal error: Renderer did not end up at base CodeFrame."); > > Optionally, something for another RFE (there are more "Internal error" messages): You could turn these runtime exceptions into `TemplateFrameworkException`s and add a hint to report a bug. 
Sure, let me file an RFE :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498801254 From epeter at openjdk.org Thu Nov 6 12:43:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 12:43:18 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: References: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> Message-ID: On Thu, 6 Nov 2025 12:37:31 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 145: >> >>> 143: // Ensure CodeFrame consistency. >>> 144: if (baseCodeFrame != currentCodeFrame) { >>> 145: throw new RuntimeException("Internal error: Renderer did not end up at base CodeFrame."); >> >> Optionally, something for another RFE (there are more "Internal error" messages): You could turn these runtime exceptions into `TemplateFrameworkException`s and add a hint to report a bug. 
> > Sure, let me file an RFE :) [JDK-8371407](https://bugs.openjdk.org/browse/JDK-8371407) TemplateFramework: replace internal RuntimeException with custom error ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498808806 From epeter at openjdk.org Thu Nov 6 12:48:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 12:48:18 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: References: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> Message-ID: <562hD-ZptH1PAfsfWtlxEEU9fG0r8vBMEK7fQ8LdUmY=.7803da2b-6bed-41f8-893d-a9b42c595080@github.com> On Thu, 6 Nov 2025 08:47:23 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - for Christian >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 237: > >> 235: // If the ScopeToken is transparent to Names, then the Template is transparent to names. >> 236: ScopeToken st = templateToken.instantiate(); >> 237: renderScopeToken(st, () -> {}); > > Just a suggestion: You seem to use `() -> {}` quite often when calling `renderScopeToken()`. Might be worth to overload it and provide one `renderScopeToken()` version with just a `ScopeToken`. Nice idea :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498823367 From epeter at openjdk.org Thu Nov 6 12:54:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 12:54:41 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v19] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . 
And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... 
> )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/94384f4b..343e9cf1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=17-18 Stats: 43 lines in 3 files changed: 5 ins; 5 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Thu Nov 6 12:54:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 12:54:43 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: References: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> Message-ID: On Thu, 6 Nov 2025 08:49:32 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - for Christian >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 246: > >> 244: >> 245: private void renderScopeToken(ScopeToken st, Runnable preamble) 
{ >> 246: if (!(st instanceof ScopeTokenImpl sti)) { > > Wasn't aware of that but seems quite convenient: The IDE suggests using a record pattern: > > > if (!(st instanceof ScopeTokenImpl(List tokens, boolean nestedNamesAreLocal, > boolean nestedHashtagsAreLocal, boolean nestedSetFuelCostAreLocal > ))) { > > Then you can directly access the fields below: > > sti.nestedNamesAreLocal() -> nestedNamesAreLocal > > etc. Nice idea :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498841860 From epeter at openjdk.org Thu Nov 6 12:59:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 12:59:29 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: References: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> Message-ID: On Thu, 6 Nov 2025 09:14:30 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - for Christian >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > test/hotspot/jtreg/compiler/lib/template_framework/ScopeTokenImpl.java line 34: > >> 32: * >> 33: * Note: we want the tokens to be package private, so we create this Impl >> 34: * record. > > Can you extend this comment to explain why it matters? Is it that users just don't need to know/worry about this, or is it to actually avoid someone writing code that relies on this class, which could hinder future internal changes? Sure, writing this instead: ~ 33 * Note: We want the {@link ScopeToken} to be public, but the internals of the ~ 34 * record should be private. One way to solve this is with a public interface + 35 * that exposes nothing but its name, and a private implementation via a + 36 * record that allows easy destructuring with pattern matching.
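The two ideas in the exchange above (a public marker interface with a hidden record implementation, and destructuring that record with a record pattern) can be sketched roughly as follows. This is a minimal illustration, not the framework's code: the types here are hypothetical stand-ins declared in a single file, the `List<Token>` element type is an assumption, and record patterns in `instanceof` require Java 21 or later.

```java
import java.util.List;

// Hypothetical stand-ins for the framework types; only their shape matters here.
interface Token {}
interface ScopeToken {}

record StringToken(String value) implements Token {}

record ScopeTokenImpl(List<Token> tokens,
                      boolean nestedNamesAreLocal,
                      boolean nestedHashtagsAreLocal,
                      boolean nestedSetFuelCostAreLocal) implements ScopeToken, Token {}

class RecordPatternSketch {
    // Type pattern: bind the whole record, then call accessors on it.
    static String describeWithTypePattern(ScopeToken st) {
        if (!(st instanceof ScopeTokenImpl sti)) {
            throw new IllegalArgumentException("unexpected ScopeToken: " + st);
        }
        return sti.tokens().size() + " tokens, namesLocal=" + sti.nestedNamesAreLocal();
    }

    // Record pattern: destructure the components directly in the instanceof.
    static String describeWithRecordPattern(ScopeToken st) {
        if (!(st instanceof ScopeTokenImpl(List<Token> tokens,
                                           boolean nestedNamesAreLocal,
                                           boolean nestedHashtagsAreLocal,
                                           boolean nestedSetFuelCostAreLocal))) {
            throw new IllegalArgumentException("unexpected ScopeToken: " + st);
        }
        // The bindings are in scope here because the if-branch always throws.
        return tokens.size() + " tokens, namesLocal=" + nestedNamesAreLocal;
    }

    public static void main(String[] args) {
        ScopeToken st = new ScopeTokenImpl(List.of(new StringToken("x")), true, false, false);
        System.out.println(describeWithTypePattern(st));
        System.out.println(describeWithRecordPattern(st));
    }
}
```

Both methods return the same string; the record-pattern form simply avoids the `sti.` accessor calls at the cost of naming every component in the pattern.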
> test/hotspot/jtreg/compiler/lib/template_framework/ScopeTokenImpl.java line 39: > >> 37: boolean nestedNamesAreLocal, >> 38: boolean nestedHashtagsAreLocal, >> 39: boolean nestedSetFuelCostAreLocal) implements ScopeToken, Token {} > > Could be confusing what "local" means. Does it mean local to the scope or local to the nested scope? IIUC, it means to the nested scope. Maybe "nestedXAreTransparent" (or "nestedXAreNonTransparent") is clearer and adheres to the scope transparency definitions. What do you think? What about `isTransparentForNames`, etc? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498857830 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498866582 From epeter at openjdk.org Thu Nov 6 13:09:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 13:09:21 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: References: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> Message-ID: On Thu, 6 Nov 2025 09:26:17 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - for Christian >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 30: > >> 28: >> 29: /** >> 30: * The {@link TemplateFrame} is the frame for a {@link Template} and its inner > > unique? > > Suggestion: > > * The {@link TemplateFrame} is the frame for a unique {@link Template} and its inner I adjusted it a little differently, because it is not per Template but per rendering of a template. The following sentences also back that up. I hope it is clear now.
> test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 34: > >> 32: * {@link #id} used to deconflict names using {@link Template#$}. It also has a set of hashtag >> 33: * replacements, which combine the key-value pairs from the template argument and the >> 34: * {@link Template#let} definitions. Inner scopes of a {@link Template} have access to > > `Inner scopes of a {@link Template} [...]` > > Should be obvious but since it's a class comment you could add something here that we only mean scopes that are not templates themselves. I don't really know how to say it better than I am already. Do you have a suggestion? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498894986 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2498903894 From duke at openjdk.org Thu Nov 6 13:43:33 2025 From: duke at openjdk.org (Zihao Lin) Date: Thu, 6 Nov 2025 13:43:33 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v9] In-Reply-To: References: Message-ID: > This patch removes the slice parameter from LoadNode::make > > I have done more work, which removes the slice parameter from StoreNode::make. > > Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thanks a lot!
Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: remove C2AccessValuePtr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24258/files - new: https://git.openjdk.org/jdk/pull/24258/files/6d122039..e89910c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=07-08 Stats: 58 lines in 8 files changed: 0 ins; 21 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From duke at openjdk.org Thu Nov 6 13:58:53 2025 From: duke at openjdk.org (Zihao Lin) Date: Thu, 6 Nov 2025 13:58:53 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v10] In-Reply-To: References: Message-ID: <1zyQq98OPsZ-2nzYz21X_5v2RgKhWaZrZaJQevDMzo4=.138599b1-4797-42b0-a48a-829a112dfbe7@github.com> > This patch removes the slice parameter from LoadNode::make > > I have done more work, which removes the slice parameter from StoreNode::make. > > Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thanks a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - fix conflict - Merge master - remove C2AccessValuePtr - fix assert - add more assert - rid of access.addr().type() - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - Fix build - ...
and 2 more: https://git.openjdk.org/jdk/compare/c173d416...36e024db ------------- Changes: https://git.openjdk.org/jdk/pull/24258/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=09 Stats: 230 lines in 18 files changed: 33 ins; 55 del; 142 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From epeter at openjdk.org Thu Nov 6 14:16:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 14:16:54 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v20] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. 
For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... 
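The ordering problem described above can be reproduced with a small toy model. The sketch below is an assumption-laden illustration, not the Template Framework API: `addName`, `sampleNow`, `sampleLater` and `emitSampled` are hypothetical names. It contrasts an immediate query that runs while the token list is still being built with a deferred query that is itself a token, which is the direction the PR title suggests.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Toy model only: none of these names are the Template Framework's real API.
class TokenOrderSketch {
    interface Token { void evaluate(Set<String> state, List<String> out); }

    // State that is only mutated while tokens are evaluated.
    private final Set<String> names = new HashSet<>();

    // Deferred side effect: returns a token, the state is unchanged until evaluation.
    Token addName(String name) {
        return (state, out) -> state.add(name);
    }

    // Immediate query: reads the state while the token list is still being built.
    Optional<String> sampleNow() {
        return names.stream().findFirst();
    }

    // Token that just records a value that was computed earlier.
    Token emitSampled(String value) {
        return (state, out) -> out.add("sampled: " + value);
    }

    // Deferred query: samples during token evaluation, in list order.
    Token sampleLater() {
        return (state, out) -> out.add("sampled: " + state.stream().findFirst().orElse("<none>"));
    }

    List<String> render(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            t.evaluate(names, out);
        }
        return out;
    }

    // Mixed style: sampleNow() runs before the addName token is evaluated,
    // so the query "floats" above the state change and sees nothing.
    static List<String> mixed() {
        TokenOrderSketch s = new TokenOrderSketch();
        return s.render(List.of(s.addName("name"),
                                s.emitSampled(s.sampleNow().orElse("<none>"))));
    }

    // All-token style: both steps are deferred, so list order is execution order.
    static List<String> allTokens() {
        TokenOrderSketch s = new TokenOrderSketch();
        return s.render(List.of(s.addName("name"), s.sampleLater()));
    }

    public static void main(String[] args) {
        System.out.println(mixed());
        System.out.println(allTokens());
    }
}
```

In `mixed()` the sample comes back empty even though the `addName` token appears first in the list; in `allTokens()` the sample sees the name because both operations are evaluated in list order.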
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/343e9cf1..6ff6e333 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=18-19 Stats: 76 lines in 4 files changed: 59 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Thu Nov 6 14:53:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 14:53:46 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v21] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). 
> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: rename/refactor for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/6ff6e333..ba5cba18 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=19-20 Stats: 19 lines in 3 files changed: 0 ins; 0 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From bmaillard at openjdk.org Thu Nov 6 15:07:57 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 6 Nov 2025 15:07:57 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v5] In-Reply-To: References: Message-ID: > This PR addresses a missed optimization opportunity in `PhaseIterGVN`. The missed optimization is the simplification of redundant conversion patterns of the shape `ConvX2Y->ConvY2X->ConvX2Y`. > > This optimization pattern is implemented as an ideal optimization on `ConvX2Y` nodes. Because it depends on the input of the input of the node in question, we need to have an appropriate notification mechanism in `PhaseIterGVN::add_users_of_use_to_worklist`. The notification for this pattern was added in [JDK-8359603](https://bugs.openjdk.org/browse/JDK-8359603). > > However, that fix was based on the wrong assumption that in `PhaseIterGVN::add_users_of_use_to_worklist`, argument `n` is already the optimized node. However in some cases this argument is actually the node that is about to get replaced. > > This happens for example in `PhaseIterGVN::transform_old`. If we find that node `k` returned by `Ideal` actually already exists by calling `hash_find_insert(k)`, we call `add_users_to_worklist(k)`.
> As we replace node `k` with `i`, and `i` has a different opcode than `k`, we cannot use the opcode of `k` to detect the redundant conversion pattern. > > ```c++ > ... > // Global Value Numbering > i = hash_find_insert(k); // Check for pre-existing node > if (i && (i != k)) { > // Return the pre-existing node if it isn't dead > NOT_PRODUCT(set_progress();) > add_users_to_worklist(k); > subsume_node(k, i); // Everybody using k now uses i > return i; > } > ... > ``` > > The bug was quite intermittent and only showed up in some cases with `-XX:+StressIGVN`. > > ### Proposed Fix > > We make the detection of the pattern less specific by only looking at the opcode of the user of `n`, and not directly the opcode of `n`. This is consistent with the detection of other patterns in `PhaseIterGVN::add_users_of_use_to_worklist`. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8369646) > - [x] tier1-3, plus some internal testing > - [x] Added a second run for the existing test with `-XX:+StressIGVN` and a fixed stress seed > > Thank you for reviewing!
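To make the difference between the two detection strategies concrete, here is a small toy model. It is a sketch under stated assumptions, not HotSpot code: the `Node` class, the `Op` enum and both `notify*` methods are hypothetical, and the real `add_users_of_use_to_worklist` covers more opcode combinations than the single `ConvI2F->ConvF2I->ConvI2F` chain modeled here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the notification logic; names are illustrative and not HotSpot's.
class NotificationSketch {
    enum Op { OTHER, CONV_I2F, CONV_F2I }

    static final class Node {
        final Op op;
        final List<Node> users = new ArrayList<>();

        Node(Op op, Node... inputs) {
            this.op = op;
            for (Node in : inputs) {
                in.users.add(this); // register this node as a user of each input
            }
        }
    }

    // Restrictive variant: assumes n is already the first conversion of the chain.
    static List<Node> notifyRestrictive(Node n, Node use) {
        List<Node> worklist = new ArrayList<>();
        if (n.op == Op.CONV_I2F && use.op == Op.CONV_F2I) {
            for (Node u : use.users) {
                if (u.op == Op.CONV_I2F) {
                    worklist.add(u);
                }
            }
        }
        return worklist;
    }

    // Relaxed variant: looks only at the user's opcode, so it still fires when
    // n is a node that is about to be replaced by a ConvI2F.
    static List<Node> notifyRelaxed(Node n, Node use) {
        List<Node> worklist = new ArrayList<>();
        if (use.op == Op.CONV_F2I) {
            for (Node u : use.users) {
                if (u.op == Op.CONV_I2F) {
                    worklist.add(u);
                }
            }
        }
        return worklist;
    }

    public static void main(String[] args) {
        Node k = new Node(Op.OTHER);          // about to be subsumed by an existing ConvI2F
        Node use = new Node(Op.CONV_F2I, k);  // user of k
        Node last = new Node(Op.CONV_I2F, use);
        System.out.println(notifyRestrictive(k, use).size());
        System.out.println(notifyRelaxed(k, use).contains(last));
    }
}
```

With `k` carrying an unrelated opcode, the restrictive check pushes nothing, while the relaxed check still re-queues the trailing conversion so that its idealization can run once `k` has been replaced.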
Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: Add comments to clarify how add_users_to_worklist and add_users_of_use_to_worklist ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27900/files - new: https://git.openjdk.org/jdk/pull/27900/files/16842d01..103cc585 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27900&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27900&range=03-04 Stats: 14 lines in 1 file changed: 14 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27900.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27900/head:pull/27900 PR: https://git.openjdk.org/jdk/pull/27900 From bmaillard at openjdk.org Thu Nov 6 15:07:58 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 6 Nov 2025 15:07:58 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v4] In-Reply-To: References: <1UNdzkgCUH6tju9WzaTQaBdeT8Xv9T4TWnk2Jg3SMoA=.6ee10e45-8c73-444a-a9da-ca0c03bdaf79@github.com> <5S3qdxwC7jHLdzDGe74ls5zIgb1K1S5xlgp-jxFkKSI=.2f0b73d7-92e2-45f5-afa2-18ba3dacf932@github.com> Message-ID: On Tue, 4 Nov 2025 08:54:18 GMT, Emanuel Peter wrote: >> Yes, in some cases `add_users_of_use_to_worklist` is called with the node about to be replaced as argument `n`. The point is that we might replace `n` with a node that already has other uses, and we only want to notify the uses for which there is a potential change. >> But this is in no way specific to this one optimization, so I think adding something here would cause more confusion than anything else. Perhaps we should update the description of `add_users_of_use_to_worklist` then? > > Right, this is not specific to this optimization here. Why not add something at the level of `add_users_of_use_to_worklist`.
Sorry for the delay, I ended up adding comments to the definition of both `add_users_to_worklist` and `add_users_of_use_to_worklist`. I think this might help avoid some confusion in the future. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27900#discussion_r2499336288 From epeter at openjdk.org Thu Nov 6 15:29:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 15:29:38 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v22] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. 
For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/ba5cba18..34982b39 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=20-21 Stats: 90 lines in 2 files changed: 40 ins; 0 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Thu Nov 6 15:29:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 15:29:40 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: References: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> Message-ID: On Wed, 5 Nov 2025 13:53:13 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - for Christian >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > I think I exhausted my reviewer fuel for today and will resume tomorrow by calling `setFuel(100%)` again :-) @chhagedorn Thank you very much for all your comments and suggestions! I think I addressed them all, though it's not impossible that I missed one since there are so many ;) As discussed offline: I attempted a first drawing, and you can tell me if you think this one is sufficient, or if we should add a few similar ones for some other scenarios (nested template, multiple insertions, insertion from within insertion).
------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3497793982 From epeter at openjdk.org Thu Nov 6 15:29:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 15:29:41 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: References: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> Message-ID: <-dSR9bqDnGWbCg4CUoIJ7nk5xjOqjF1wG18vKXIu8rc=.f084c527-9560-442d-8fd9-5479b7c17a3d@github.com> On Thu, 6 Nov 2025 10:11:57 GMT, Christian Hagedorn wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 50: >> >>> 48: *

>>> 49: * Note, that {@link CodeFrame}s and {@link TemplateFrame}s often go together, but can also >>> 50: * diverge. >> >> Can you give an example here where/when this is the case? If there are not many variants, it might be worth listing them here. > > It might also help to show a simple visual/sketch example how template frames, code frames and scopes work together. I think it's a little tricky at first to grasp how they interact. I attempted a first sketch, thanks for the discussion offline! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2499420141 From chagedorn at openjdk.org Thu Nov 6 15:36:40 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 6 Nov 2025 15:36:40 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v8] In-Reply-To: References: Message-ID: <-ugEs3lNGFg5q4ufh-MGJaymmDB_v7bIO7IkwGWypTk=.3d28c7eb-69bb-4009-ad5e-107e3840eeb9@github.com> On Wed, 15 Oct 2025 08:57:02 GMT, Benoît Maillard wrote: >> This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. >> >> ### Analysis >> >> This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced >> and added to this PR as a regression test. >> >> The test contains a switch inside a loop, and stressing the loop peeling results in >> a fairly complex graph. The split-if optimization is applied aggressively, and we >> run a verification pass each time progress is made. >> >> We end up with a relatively high number of verification passes, with each pass being >> fairly expensive because of the size of the graph.
>> Each verification pass requires building a new `IdealLoopTree`. This is quite slow >> (which is unfortunately hard to mitigate), and also causes inefficient memory usage >> on the `ciEnv` arena. >> >> The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. >> At every call, we have >> - One allocation on the `ciEnv` arena to store the returned `ciField` >> - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which: >> - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`) >> - Pushes the new symbol to the `_symbols` array >> >> The `ciField` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to >> check if the `BasicType` of a static field is a reference type. >> >> In `ciObjectFactory`, the `_symbols` array ends up containing a large number of duplicates for certain symbols >> (up to several million), which hints at the fact that `ciObjectFactory::get_symbol` should not be called >> repeatedly as it is done here. >> >> The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below: >> >> >> ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412 >> TypeOopPtr::TypeOopPtr type.cpp:3484 >> TypeInstPtr::TypeInstPtr type.cpp:3953 >> TypeInstPtr::make type.cpp:3990 >> TypeInstPtr::add_offset type.cpp:4509 >> AddPNode::bottom_type addnode.cpp:696 >> MemNode::adr_type memnode.cpp:73 >> PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477 >> PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439 >> PhaseIdealLoop::build_lo... > > Benoît Maillard has updated the pull request incrementally with four additional commits since the last revision: > > - Add run without fixed stress seed > - Reorder flags > - Remove unnecessary CompileCommand=dontinline > - Change name Few more nits but otherwise, looks good, thanks for the updates!
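The memory pattern described in the quoted analysis can be illustrated with a toy model. This is only a sketch under assumptions: the class, the `arena` list and both lookup methods are hypothetical stand-ins and do not reproduce HotSpot's `ci*` classes. It compares a lookup that allocates a symbol and a field object on every call with a shortcut that only returns the type the caller actually needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only; the names are illustrative and not HotSpot's ci* API.
class ArenaGrowthSketch {
    enum BasicType { INT, OBJECT }

    record Field(String name, BasicType type) {}

    // Stands in for an arena: everything added here stays alive until the end.
    final List<Object> arena = new ArrayList<>();

    private final Map<Integer, BasicType> typeByOffset =
            Map.of(8, BasicType.INT, 16, BasicType.OBJECT);

    // Allocating lookup: every call adds a fresh symbol and a fresh Field.
    Field getFieldByOffset(int offset) {
        String symbol = "field@" + offset; // a new, duplicate string on each call
        arena.add(symbol);
        Field f = new Field(symbol, typeByOffset.get(offset));
        arena.add(f);
        return f;
    }

    // Shortcut: answers the only question the caller has, without allocating.
    BasicType getFieldTypeByOffset(int offset) {
        return typeByOffset.get(offset);
    }

    // Simulates many verification passes that each ask for the same field type.
    static int arenaSizeAfter(int passes, boolean useShortcut) {
        ArenaGrowthSketch k = new ArenaGrowthSketch();
        for (int i = 0; i < passes; i++) {
            BasicType t = useShortcut ? k.getFieldTypeByOffset(16)
                                      : k.getFieldByOffset(16).type();
            if (t != BasicType.OBJECT) {
                throw new IllegalStateException("unexpected type " + t);
            }
        }
        return k.arena.size();
    }

    public static void main(String[] args) {
        System.out.println(arenaSizeAfter(1000, false));
        System.out.println(arenaSizeAfter(1000, true));
    }
}
```

In this model the allocating path grows the arena by two entries per query, while the shortcut leaves it empty, which mirrors the motivation for a type-only accessor such as the `get_field_type_by_offset` mentioned in the review comments.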
src/hotspot/share/ci/ciInstanceKlass.cpp line 395: > 393: > 394: // ------------------------------------------------------------------ > 395: // ciInstanceKlass::get_non_static_field_by_offset I think you can remove this - we used to add it in the early days but not anymore. Suggestion: src/hotspot/share/ci/ciInstanceKlass.cpp line 399: > 397: for (int i = 0, len = nof_nonstatic_fields(); i < len; i++) { > 398: ciField* field = _nonstatic_fields->at(i); > 399: int field_off = field->offset_in_bytes(); Suggestion: ciField* field = _nonstatic_fields->at(i); int field_off = field->offset_in_bytes(); src/hotspot/share/ci/ciInstanceKlass.cpp line 438: > 436: // ------------------------------------------------------------------ > 437: // ciInstanceKlass::get_field_type_by_offset > 438: // Suggestion: src/hotspot/share/ci/ciInstanceKlass.cpp line 440: > 438: // > 439: // This is essentially a shortcut for: > 440: // get_field_by_offset(field_offset, is_static)->layout_type() Suggestion: // get_field_by_offset(field_offset, is_static)->layout_type() test/hotspot/jtreg/compiler/loopopts/TestVerifyLoopOptimizationsHitsMemLimit.java line 32: > 30: * This is caused by the high number of verification passes triggered > 31: * in PhaseIdealLoop::split_if_with_blocks_post and repetitive memory > 32: * allocations while building the ideal Loop tree in preparation for Suggestion: * allocations while building the ideal loop tree in preparation for test/hotspot/jtreg/compiler/loopopts/TestVerifyLoopOptimizationsHitsMemLimit.java line 49: > 47: * compiler.loopopts.TestVerifyLoopOptimizationsHitsMemLimit > 48: * @run main compiler.loopopts.TestVerifyLoopOptimizationsHitsMemLimit > 49: * Suggestion: ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27731#pullrequestreview-3428809389 PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2499445193 PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2499440534 PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2499447501 PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2499449135 PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2499458403 PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2499454835 From epeter at openjdk.org Thu Nov 6 15:40:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 15:40:48 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v23] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). 
> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: test insert inside insert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/34982b39..34381242 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=21-22 Stats: 34 lines in 1 file changed: 34 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From bmaillard at openjdk.org Thu Nov 6 15:40:59 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 6 Nov 2025 15:40:59 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v9] In-Reply-To: References: Message-ID: > This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. > > ### Analysis > > This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced > and added to this PR as a regression test. > > The test contains a switch inside a loop, and stressing the loop peeling results in > a fairly complex graph. The split-if optimization is applied aggressively, and we > run a verification pass at every progress made. > > We end up with a relatively high number of verification passes, with each pass being > fairly expensive because of the size of the graph. > Each verification pass requires building a new `IdealLoopTree`. 
This is quite slow > (which is unfortunately hard to mitigate), and also causes inefficient memory usage > on the `ciEnv` arena. > > The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. > At every call, we have > - One allocation on the `ciEnv` arena to store the returned `ciField` > - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which: > - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`) > - Pushes the new symbol to the `_symbols` array > > The `ciField` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to > check if the `BasicType` of a static field is a reference type. > > In `ciObjectFactory`, the `_symbols` array ends up containing a large number of duplicates for certain symbols > (up to several million), which hints at the fact that `ciObjectFactory::get_symbol` should not be called > repeatedly as it is done here. > > The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below: > > > ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412 > TypeOopPtr::TypeOopPtr type.cpp:3484 > TypeInstPtr::TypeInstPtr type.cpp:3953 > TypeInstPtr::make type.cpp:3990 > TypeInstPtr::add_offset type.cpp:4509 > AddPNode::bottom_type addnode.cpp:696 > MemNode::adr_type memnode.cpp:73 > PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477 > PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439 > PhaseIdealLoop::build_loop_late_post_work loopnode.cpp:6827 > PhaseIdealLoop::build_loop_late_post loopnode.cpp:67... 
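The allocation pattern analysed above can be illustrated with a small model. This is a hedged sketch, not HotSpot code: the `arena` list, `getFieldByOffset` and `isReferenceAt` are hypothetical Java stand-ins for an arena whose entries are never freed individually, for a lookup that allocates on every call, and for a shortcut that answers the single question being asked without allocating.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of arena growth caused by repeated allocating lookups.
// Nothing here is the real ci* API; all names are illustrative.
class ArenaGrowthSketch {
    static final class Symbol {
        final String name;
        Symbol(String name) { this.name = name; }
    }

    static final class Field {
        final Symbol name;
        final boolean isReference;
        Field(Symbol name, boolean isReference) {
            this.name = name;
            this.isReference = isReference;
        }
    }

    // Stand-in for an arena: entries stay until the whole compilation ends.
    final List<Object> arena = new ArrayList<>();

    // Expensive path: every call allocates a new symbol and a new field.
    Field getFieldByOffset(int offset) {
        Symbol symbol = new Symbol("field@" + offset);
        arena.add(symbol);
        Field field = new Field(symbol, offset % 8 == 0);
        arena.add(field);
        return field;
    }

    // Shortcut: answers only "is it a reference?" and allocates nothing.
    boolean isReferenceAt(int offset) {
        return offset % 8 == 0;
    }

    // Simulates many verification passes asking the same question.
    static int arenaSizeAfter(int passes, boolean useShortcut) {
        ArenaGrowthSketch s = new ArenaGrowthSketch();
        for (int i = 0; i < passes; i++) {
            boolean isRef = useShortcut
                    ? s.isReferenceAt(16)
                    : s.getFieldByOffset(16).isReference;
            if (!isRef) {
                throw new IllegalStateException("unexpected answer");
            }
        }
        return s.arena.size();
    }

    public static void main(String[] args) {
        System.out.println("allocating lookup: " + arenaSizeAfter(1000, false));
        System.out.println("shortcut:          " + arenaSizeAfter(1000, true));
    }
}
```

In the model, the arena grows linearly with the number of passes when the allocating lookup is used, and stays empty with the shortcut.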
Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27731/files - new: https://git.openjdk.org/jdk/pull/27731/files/1f13f874..95bbb29c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27731&range=07-08 Stats: 9 lines in 2 files changed: 0 ins; 6 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27731.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27731/head:pull/27731 PR: https://git.openjdk.org/jdk/pull/27731 From epeter at openjdk.org Thu Nov 6 15:50:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 15:50:52 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: References: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> Message-ID: <-NCDjG92oU0zLXlTtLD5ki2mo72GMU1vKJ5FpDcnrWw=.d02b93c7-a342-4f51-a876-6ac870acc1bc@github.com> On Wed, 5 Nov 2025 13:53:13 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - for Christian >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > I think I exhausted my reviewer fuel for today and will resume tomorrow by calling `setFuel(100%)` again :-) I think it is deserved by now :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3497910261 From epeter at openjdk.org Thu Nov 6 15:50:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 15:50:50 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v24] In-Reply-To: References: Message-ID: > I got some feedback from users of 
the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... 
> )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/34381242..e1d50609 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=22-23 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Thu Nov 6 16:00:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Nov 2025 16:00:09 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply set wrong type, led to wrong constant folding of phi Message-ID: In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed an edge case. The issue is when we also (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds. 
I now limit the modification to cases where the `phi` used to be for scalars, but now is for vectors. In those cases we should not have a constant. For good measure, I also added a corresponding assert. --------- Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. ------------- Commit messages: - add assert - rm debug printing - add test - JDK-8371065 Changes: https://git.openjdk.org/jdk/pull/28113/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371065 Stats: 147 lines in 2 files changed: 143 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28113.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28113/head:pull/28113 PR: https://git.openjdk.org/jdk/pull/28113 From dfenacci at openjdk.org Thu Nov 6 16:13:42 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 6 Nov 2025 16:13:42 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v9] In-Reply-To: References: Message-ID: On Thu, 6 Nov 2025 15:40:59 GMT, Benoît Maillard wrote: >> This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. >> >> ### Analysis >> >> This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced >> and added to this PR as a regression test. 
>> >> The test contains a switch inside a loop, and stressing the loop peeling results in >> a fairly complex graph. The split-if optimization is applied aggressively, and we >> run a verification pass at every progress made. >> >> We end up with a relatively high number of verification passes, with each pass being >> fairly expensive because of the size of the graph. >> Each verification pass requires building a new `IdealLoopTree`. This is quite slow >> (which is unfortunately hard to mitigate), and also causes inefficient memory usage >> on the `ciEnv` arena. >> >> The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. >> At every call, we have >> - One allocation on the `ciEnv` arena to store the returned `ciField` >> - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which: >> - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`) >> - Pushes the new symbol to the `_symbols` array >> >> The `ciField` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to >> check if the `BasicType` of a static field is a reference type. >> >> In `ciObjectFactory`, the `_symbols` array ends up containing a large number of duplicates for certain symbols >> (up to several million), which hints at the fact that `ciObjectFactory::get_symbol` should not be called >> repeatedly as it is done here. >> >> The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below: >> >> >> ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412 >> TypeOopPtr::TypeOopPtr type.cpp:3484 >> TypeInstPtr::TypeInstPtr type.cpp:3953 >> TypeInstPtr::make type.cpp:3990 >> TypeInstPtr::add_offset type.cpp:4509 >> AddPNode::bottom_type addnode.cpp:696 >> MemNode::adr_type memnode.cpp:73 >> PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477 >> PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439 >> PhaseIdealLoop::build_lo... 
> > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Looks good to me too. Thanks @benoitmaillard! src/hotspot/share/ci/ciInstanceKlass.cpp line 435: > 433: > 434: // This is essentially a shortcut for: > 435: // get_field_by_offset(field_offset, is_static)->layout_type() Suggestion: // get_field_by_offset(field_offset, is_static)->layout_type() Did you actually want the indentation? ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/27731#pullrequestreview-3429025686 PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2499625670 From bmaillard at openjdk.org Thu Nov 6 16:27:43 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 6 Nov 2025 16:27:43 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v9] In-Reply-To: References: Message-ID: On Thu, 6 Nov 2025 15:58:24 GMT, Damon Fenacci wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > src/hotspot/share/ci/ciInstanceKlass.cpp line 435: > >> 433: >> 434: // This is essentially a shortcut for: >> 435: // get_field_by_offset(field_offset, is_static)->layout_type() > > Suggestion: > > // get_field_by_offset(field_offset, is_static)->layout_type() > > > Did you actually want the indentation? 
It was on purpose yes, I think it was to avoid making it look like code that was commented out ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27731#discussion_r2499762600 From psandoz at openjdk.org Thu Nov 6 16:28:55 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 6 Nov 2025 16:28:55 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v5] In-Reply-To: References: Message-ID: <-UaCystBHqWgcBLIphPuAb3r7XYDU_B28-vzPZXK06M=.2c5c284b-6ae7-4659-93dd-0789cd9f7c14@github.com> On Fri, 31 Oct 2025 01:40:13 GMT, Xiaohong Gong wrote: > Hi @PaulSandoz , @sviswa7, would you mind taking a look at the changes on jdk.incubator.vector tests part? It would be more helpful if I can get any feedback from you. Thanks a lot in advance! I recommend reverting changes to the smoke test, as I don't think that is sufficient. Since you already have IR tests in place we can follow up with a proper set of unit tests operating over long arrays like we do for boolean and with various mask operations. If you log the follow up issue I might be able to find someone else to write those tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3498193624 From qamai at openjdk.org Thu Nov 6 17:06:50 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 6 Nov 2025 17:06:50 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply set wrong type, led to wrong constant folding of phi In-Reply-To: References: Message-ID: On Mon, 3 Nov 2025 15:20:37 GMT, Emanuel Peter wrote: > In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed an edge case. 
> > The issue is when we also (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds. > > I now limit the modification to cases where the `phi` used to be for scalars, but now is for vectors. In those cases we should not have a constant. For good measure, I also added a corresponding assert. > > --------- > > Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. > I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. LGTM, another approach may be to add a method `Type::local_bottom` that will return the local bottom of a type (e.g. `Type::INT` for an `Int`). ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/28113#pullrequestreview-3429411423 From vlivanov at openjdk.org Thu Nov 6 23:07:02 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 6 Nov 2025 23:07:02 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v4] In-Reply-To: <-9uTmVk3XFV39gQjQp5NQsedrwYRUN2TVIaAOMB1pvA=.9819a01c-e05a-4569-a59c-0f90d3c4c161@github.com> References: <-9uTmVk3XFV39gQjQp5NQsedrwYRUN2TVIaAOMB1pvA=.9819a01c-e05a-4569-a59c-0f90d3c4c161@github.com> Message-ID: On Wed, 5 Nov 2025 13:01:29 GMT, Roland Westrelin wrote: >> In test cases, `mh` is initially not constant so the method handle >> invoke can't be inlined. It is later found to be constant, so it can >> be turned into a direct call by >> `Compile::process_late_inline_calls_no_inline()`. In the meantime, the >> `CallNode` for the mh invoke is cloned (by loop switching). 
In the >> process, only a shallow copy of the `JVMState` for the call is >> made. The initial `CallNode` is the first to be processed by >> `Compile::process_late_inline_calls_no_inline()` and that causes that >> `CallNode` to become dead. The cloned `CallNode` is then >> processed. The `JVMState` for that one references the initial >> `CallNode` in its caller's `JVMState`. Because that node is dead, that >> causes a crash. The fix I propose is to make a deep copy of the >> `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is >> assigned to the node. >> >> The other failure I see with these tests is: >> >> >> # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 >> # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! >> >> >> because even though the `CallNode` is cloned, there's still only one >> late inline recorded. The fix here is to increment >> `_number_of_mh_late_inlines` when the node is cloned. >> >> This was reported by the netty developers. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by vlivanov (Reviewer). src/hotspot/share/opto/compile.hpp line 472: > 470: > 471: int _late_inlines_pos; // Where in the queue should the next late inlining candidate go (emulate depth first inlining) > 472: bool _has_mh_late_inlines; // Any method handle late inlining still pending? The comment is slightly misleading. As it is now, `_has_mh_late_inlines` signals that a late inline candidate has been observed during compilation. 
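The shallow-versus-deep copy problem in the quoted description can be sketched with a toy model. This is only an illustration under simplifying assumptions, not the actual C2 data structures: `CallNode` and `State` below are hypothetical Java classes, where each `State` in a caller chain points back at the node that owns it through a `map` field.

```java
// Toy model: a cloned call whose state chain is copied shallowly still
// references the original call through the shared caller states.
class JvmStateCopySketch {
    static final class CallNode {
        boolean dead;
    }

    static final class State {
        final State caller; // state of the calling method, or null at the root
        final CallNode map; // node this state belongs to

        State(State caller, CallNode map) {
            this.caller = caller;
            this.map = map;
        }

        // Shallow: only the innermost state is copied, callers are shared.
        State shallowCloneFor(CallNode clone) {
            return new State(caller, clone);
        }

        // Deep: the whole caller chain is copied and re-pointed to the clone.
        State deepCloneFor(CallNode clone) {
            State clonedCaller = (caller == null) ? null : caller.deepCloneFor(clone);
            return new State(clonedCaller, clone);
        }

        // True if any state in the chain still points at a dead node.
        boolean referencesDeadNode() {
            for (State s = this; s != null; s = s.caller) {
                if (s.map.dead) {
                    return true;
                }
            }
            return false;
        }
    }

    // Clones a call, kills the original, then inspects the clone's chain.
    static boolean cloneSeesDeadNode(boolean deep) {
        CallNode original = new CallNode();
        State inner = new State(new State(null, original), original);
        CallNode clone = new CallNode();
        State cloned = deep ? inner.deepCloneFor(clone) : inner.shallowCloneFor(clone);
        original.dead = true; // the original call is processed first and dies
        return cloned.referencesDeadNode();
    }

    public static void main(String[] args) {
        System.out.println("shallow copy sees dead node: " + cloneSeesDeadNode(false));
        System.out.println("deep copy sees dead node:    " + cloneSeesDeadNode(true));
    }
}
```

In this model only the shallow copy ends up with a chain that reaches the dead original node, which mirrors the crash scenario described above.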
------------- PR Review: https://git.openjdk.org/jdk/pull/28088#pullrequestreview-3430876037 PR Review Comment: https://git.openjdk.org/jdk/pull/28088#discussion_r2501125577 From dlong at openjdk.org Fri Nov 7 00:04:05 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 7 Nov 2025 00:04:05 GMT Subject: RFR: 8371297: C2: assert triggerred in BoolTest::BoolTest In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 22:18:13 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > Not sure how this one slipped in (https://github.com/openjdk/jdk/pull/28047/), I think it passed my local test and github CI test at that time. > Please check https://bugs.openjdk.org/browse/JDK-8371297 for details. > > Thanks! OK, I'm fine with cleaning this up as a separate issue. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28141#pullrequestreview-3431034140 From xgong at openjdk.org Fri Nov 7 01:30:13 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 7 Nov 2025 01:30:13 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v5] In-Reply-To: <-UaCystBHqWgcBLIphPuAb3r7XYDU_B28-vzPZXK06M=.2c5c284b-6ae7-4659-93dd-0789cd9f7c14@github.com> References: <-UaCystBHqWgcBLIphPuAb3r7XYDU_B28-vzPZXK06M=.2c5c284b-6ae7-4659-93dd-0789cd9f7c14@github.com> Message-ID: On Thu, 6 Nov 2025 16:25:49 GMT, Paul Sandoz wrote: > > Hi @PaulSandoz , @sviswa7, would you mind taking look at the changes on jdk.incubator.vector tests part? It would be more helpful if I can get any feedback from you. Thanks a lot in advance! > > I recommend reverting changes to the smoke test, as i don't think that is sufficient. Since you already have IR tests in place we can follow up with a proper set of unit tests operating over long arrays like we do for boolean and with various mask operations. If you log the follow up issue i might be able to find someone else to write those tests. Sounds reasonable to me @PaulSandoz ! 
I will revert the smoke tests in the next commit. And I've filed another JBS as a follow-up for the unit tests: https://bugs.openjdk.org/browse/JDK-8371446. Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3500073516 From fyang at openjdk.org Fri Nov 7 04:14:04 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 7 Nov 2025 04:14:04 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v17] In-Reply-To: <9uwW544WzaCZ1u1LwmNZW4_8mSyxQK9aPV8Uv54NtRA=.4facae4d-76e7-4780-a34c-c7d5ee114503@github.com> References: <9uwW544WzaCZ1u1LwmNZW4_8mSyxQK9aPV8Uv54NtRA=.4facae4d-76e7-4780-a34c-c7d5ee114503@github.com> Message-ID: On Thu, 6 Nov 2025 11:06:24 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which implements the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > fix a jtreg problem src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2761: > 2759: __ mv(t0, 52); > 2760: __ blt(keylen, t0, L_aes128_loop_next); > 2761: __ beq(keylen, t0, L_aes192_loop_next); I think these branches in the loop could be saved if we do versioning according to keylen. Then we only need to do two branches on entry to choose the right version. And this also applies in the case of loadkeys. 
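The versioning idea in this review comment can be sketched in plain Java. This is a hedged illustration, not the RISC-V stub: it assumes `keylen` is the expanded key length in 32-bit words (44, 52 or 60 for AES-128/192/256, i.e. 10, 12 or 14 rounds), and the two methods below are hypothetical. Each returns the total round count and the number of key-length branches executed, so the two shapes can be compared.

```java
// Compares "branch on keylen inside the loop" with "choose a version once".
class KeylenVersioningSketch {
    // Shape of the reviewed code: up to two keylen branches per block.
    static long[] branchInLoop(int keylen, int blocks) {
        long rounds = 0;
        long branches = 0;
        for (int i = 0; i < blocks; i++) {
            branches++;                 // models: blt keylen, 52
            if (keylen < 52) {
                rounds += 10;
                continue;
            }
            branches++;                 // models: beq keylen, 52
            if (keylen == 52) {
                rounds += 12;
                continue;
            }
            rounds += 14;
        }
        return new long[] { rounds, branches };
    }

    // Versioned shape: at most two keylen branches on entry, none in the loop.
    static long[] versioned(int keylen, int blocks) {
        long branches = 1;
        int roundsPerBlock;
        if (keylen < 52) {
            roundsPerBlock = 10;
        } else {
            branches++;
            roundsPerBlock = (keylen == 52) ? 12 : 14;
        }
        long rounds = 0;
        for (int i = 0; i < blocks; i++) {
            rounds += roundsPerBlock;   // loop body is free of keylen checks
        }
        return new long[] { rounds, branches };
    }

    public static void main(String[] args) {
        long[] a = branchInLoop(60, 100);
        long[] b = versioned(60, 100);
        System.out.println("in-loop:   rounds=" + a[0] + " keylen branches=" + a[1]);
        System.out.println("versioned: rounds=" + b[0] + " keylen branches=" + b[1]);
    }
}
```

Both shapes do the same work per block; only the number of key-length checks differs, which is the saving the comment points at.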
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2501613720 From vlivanov at openjdk.org Fri Nov 7 05:22:13 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 05:22:13 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> <59uQ1nJm-iOr7-kNcauhuBC7wEQLs08e4Gi81SwJ9TU=.ffd0c0f9-abd4-483a-8967-d07dde2bd0a3@github.com> Message-ID: On Tue, 4 Nov 2025 08:30:08 GMT, Emanuel Peter wrote: > why I'm asking for more asserts that at least block some wrong usages Ok. As I said earlier, it doesn't look like `dead->is_dead()` would work here. Any other checks you have in mind? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2501717753 From xgong at openjdk.org Fri Nov 7 05:41:37 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 7 Nov 2025 05:41:37 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v6] In-Reply-To: References: Message-ID: > The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. > > For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. > > These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. 
> > This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. > > It also modifies the Vector API jtreg tests for better testing. Here are the details: > > 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are actually tested. These two APIs were not well tested before, because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by the compiler due to the following IR identity: > > VectorMaskToLong (VectorLongToMask l) => l > > Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. > > 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". > > Performance shows significant improvement on NVIDIA's Grace CPU. > > Here is the performance data with `-XX:UseSVE=2`: > > Benchmark bits inputs Mode Unit Before After Gain > MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 > MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 > MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 > MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 > MaskQueryOperationsBenchmark.testToLongInt 128 2 thrpt ops/ms 101031... 
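The round-trip identity mentioned in the description explains why a test that only checks `toLong(fromLong(l))` can end up exercising neither operation. The sketch below models the two conversions in plain Java over a `boolean[]`; it deliberately does not use `jdk.incubator.vector`, and the helper names are hypothetical, so treat it as an illustration of the semantics rather than the API.

```java
// Plain-Java model of converting between a long bit pattern and lane flags.
class MaskLongSketch {
    // Bit i of `bits` sets lane i; bits at or above `lanes` are ignored.
    static boolean[] fromLong(long bits, int lanes) {
        boolean[] mask = new boolean[lanes];
        for (int i = 0; i < lanes; i++) {
            mask[i] = ((bits >>> i) & 1L) != 0;
        }
        return mask;
    }

    // Lane i contributes bit i of the result (assumes at most 64 lanes).
    static long toLong(boolean[] mask) {
        long bits = 0L;
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        long in = 0b1011L;
        // Round trip: for in-range bits the result equals the input, which is
        // exactly the shape a compiler identity can fold away.
        System.out.println(Long.toBinaryString(toLong(fromLong(in, 4))));
        // Checking against an independent representation exercises each
        // conversion on its own.
        boolean[] lanes = fromLong(in, 4);
        System.out.println(lanes[0] + " " + lanes[1] + " " + lanes[2] + " " + lanes[3]);
    }
}
```

A test that compares `fromLong` against expected lane values, and `toLong` against a mask built another way, avoids depending on the round trip alone.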
Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Revert smoke test changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27481/files - new: https://git.openjdk.org/jdk/pull/27481/files/40c2df04..97ba1748 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27481&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27481&range=04-05 Stats: 181 lines in 32 files changed: 0 ins; 75 del; 106 mod Patch: https://git.openjdk.org/jdk/pull/27481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27481/head:pull/27481 PR: https://git.openjdk.org/jdk/pull/27481 From epeter at openjdk.org Fri Nov 7 06:07:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Nov 2025 06:07:22 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply set wrong type, led to wrong constant folding of phi [v2] In-Reply-To: References: Message-ID: > In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed an edge case. > > The issue is when we also (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds. > > I now limit the modification to cases where the `phi` used to be for scalars, but now is for vectors. In those cases we should not have a constant. For good measure, I also added a corresponding assert. > > --------- > > Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. > I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. 
But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: add diagnostic flag for product build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28113/files - new: https://git.openjdk.org/jdk/pull/28113/files/2bb36dee..4dfd6100 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28113.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28113/head:pull/28113 PR: https://git.openjdk.org/jdk/pull/28113 From epeter at openjdk.org Fri Nov 7 06:07:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Nov 2025 06:07:23 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply set wrong type, led to wrong constant folding of phi [v2] In-Reply-To: References: Message-ID: On Thu, 6 Nov 2025 17:03:02 GMT, Quan Anh Mai wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add diagnostic flag for product build > > LGTM, another approach may be to add a method `Type::local_bottom` that will return the local bottom of a type (e.g. `Type::INT` for an `Int`). @merykitty Thanks for the approval! I just noticed that the attached test failed because of a missing `-XX:+UnlockDiagnosticVMOptions` for product. Yes, there are a few alternatives. I don't really want to extend the type system in this bugfix... I also don't really want to mess with the types of all phis in the loop, because we could also lose information on some nodes that way. For example, the tripcount may have a restricted range, and I don't want to just widen that to `int`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28113#issuecomment-3500897330 From xgong at openjdk.org Fri Nov 7 06:15:05 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 7 Nov 2025 06:15:05 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v5] In-Reply-To: <-UaCystBHqWgcBLIphPuAb3r7XYDU_B28-vzPZXK06M=.2c5c284b-6ae7-4659-93dd-0789cd9f7c14@github.com> References: <-UaCystBHqWgcBLIphPuAb3r7XYDU_B28-vzPZXK06M=.2c5c284b-6ae7-4659-93dd-0789cd9f7c14@github.com> Message-ID: On Thu, 6 Nov 2025 16:25:49 GMT, Paul Sandoz wrote: >> Hi @sviswa7, @jatin-bhateja , could you please help take a look at this PR especially the X86 changes? Thanks so much! >> >> Hi @PaulSandoz , @sviswa7, would you mind taking a look at the changes on jdk.incubator.vector tests part? It would be more helpful if I can get any feedback from you. Thanks a lot in advance! > >> Hi @PaulSandoz , @sviswa7, would you mind taking a look at the changes on jdk.incubator.vector tests part? It would be more helpful if I can get any feedback from you. Thanks a lot in advance! > > I recommend reverting changes to the smoke test, as I don't think that is sufficient. Since you already have IR tests in place we can follow up with a proper set of unit tests operating over long arrays like we do for boolean and with various mask operations. If you log the follow up issue I might be able to find someone else to write those tests. Hi @PaulSandoz , I've reverted the smoke test changes in the latest commit. Would you mind taking another look? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3500915597 From vlivanov at openjdk.org Fri Nov 7 06:38:44 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 06:38:44 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v21] In-Reply-To: References: Message-ID: > This PR introduces C2 support for `Reference.reachabilityFence()`. 
> > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. 
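For readers unfamiliar with the API under discussion, the following is a minimal sketch of the usual `Reference.reachabilityFence()` idiom at the Java level. The class `NativeBufferSketch`, its `handle` field, and the `use` helper are hypothetical names introduced only for illustration; the sketch says nothing about how C2 implements the fence.

```java
import java.lang.ref.Reference;

// Hypothetical wrapper around a resource identified by a raw handle.
final class NativeBufferSketch {
    private final long handle; // stand-in for a native resource handle

    NativeBufferSketch(long handle) {
        this.handle = handle;
    }

    // Reads the handle into a local and keeps working with the local only.
    // Without the fence, 'this' could be treated as unreachable once 'handle'
    // has been loaded, so a cleaner or finalizer might release the resource
    // while 'use' is still running.
    long readHandleSafely() {
        try {
            long h = handle;
            return use(h);
        } finally {
            // Keeps 'this' strongly reachable at least until this point.
            Reference.reachabilityFence(this);
        }
    }

    // Hypothetical operation on the raw handle.
    private static long use(long h) {
        return h + 1;
    }

    public static void main(String[] args) {
        System.out.println(new NativeBufferSketch(41L).readHandleSafely());
    }
}
```

The `try`/`finally` shape follows the pattern recommended in the method's Javadoc: the fence sits after the last use of state derived from the object, which is the span during which interfering safepoints would need the referent kept alive.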
> > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits: - updates - Merge branch 'master' into 8290892.rf - cleanups - Fix merge - Merge branch 'master' into 8290892.rf - cleanup - update - Merge remote-tracking branch 'origin/master' into 8290892.rf - Merge branch 'master' into 8290892.rf - scalarization support - ... and 20 more: https://git.openjdk.org/jdk/compare/87966112...0a850b64 ------------- Changes: https://git.openjdk.org/jdk/pull/25315/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=20 Stats: 1517 lines in 38 files changed: 1455 ins; 20 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From vlivanov at openjdk.org Fri Nov 7 06:38:45 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 06:38:45 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Tue, 4 Nov 2025 08:27:41 GMT, Emanuel Peter wrote: >> The intention is to optimize RF nodes irrespective of whether loop optimizations are performed or not. (It mimics similar logic for expensive nodes.) > > Can you make that explicit with a code comment please? Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2501851765 From vlivanov at openjdk.org Fri Nov 7 06:38:46 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 06:38:46 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> <59uQ1nJm-iOr7-kNcauhuBC7wEQLs08e4Gi81SwJ9TU=.ffd0c0f9-abd4-483a-8967-d07dde2bd0a3@github.com> Message-ID: <3F8YIADlKJItx-Xt5eNTVJM5P7gYs-11FZx1u2JS8Uk=.8ddbf1c9-46da-4455-8366-a7edeaa0930e@github.com> On Fri, 7 Nov 2025 05:19:11 GMT, Vladimir Ivanov wrote: >>> Does VerifyLoopOptimizations catch it? >> >> Probably not, because it is not very strong yet. I will probably soon work on it again, to make sure we have stronger invariants, or at least enforcing our implicit invariants ;) >> >> In your case, you probably are using your new `remove_dead_node` in the way it "should" be used. But since this is a public API, someone will probably come along in the future and use it in unintended ways. That's why I'm asking for more asserts that at least block some wrong usages ;) > >> why I'm asking for more asserts that at least block some wrong usages > > Ok. As I said earlier, it doesn't look like `dead->is_dead()` would work here. Any other checks you have in mind? The method only works for data nodes, so I renamed it to `remove_dead_data_node` and added `!dead->is_CFG()` assert (it was already checked, but down the call tree). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2501851259 From vlivanov at openjdk.org Fri Nov 7 06:38:47 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 06:38:47 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Thu, 30 Oct 2025 15:37:23 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix merge > > src/hotspot/share/opto/reachability.cpp line 102: > >> 100: return true; >> 101: } >> 102: } > > Can you explain in a code comment? Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2501851482 From vlivanov at openjdk.org Fri Nov 7 06:43:08 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 06:43:08 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Thu, 30 Oct 2025 12:49:09 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix merge > > src/hotspot/share/opto/c2_globals.hpp line 86: > >> 84: \ >> 85: product(bool, StressReachabilityFences, false, DIAGNOSTIC, \ >> 86: "Aggressively insert reachability fences for all oop arguments") \ > > It could be nice if you gave some more detail here, what these flags do. I slightly reworded the comments. But it doesn't look like a good place to dive into the details. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2501862667 From vlivanov at openjdk.org Fri Nov 7 07:14:29 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 07:14:29 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: Message-ID: > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. 
If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25315/files - new: https://git.openjdk.org/jdk/pull/25315/files/0a850b64..01d0b175 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=20-21 Stats: 12 lines in 1 file changed: 0 ins; 2 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From vlivanov at openjdk.org Fri Nov 7 07:14:30 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 07:14:30 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Tue, 4 Nov 2025 08:36:21 GMT, Emanuel Peter wrote: >> `is_significant_sfpt()` encodes a white list consisting of cases which can be safely ignored when it comes to reachability tracking. An overlooked case is a missed optimization opportunity. >> >>> But can you be more specific? 
>> >> Are you suggesting to expand the comment or change the name? >> >> Speaking of the name, it's a local definition inside `reachability.cpp`: >> >> // Detect safepoint nodes which are important for reachability tracking purposes. >> static bool is_significant_sfpt(Node* n) { >> >> >> `Significant` term is declarative. Alternatively, an imperative term (like, "ignore" and `ignore_sfpt`) can be used. But I find the current version clearer. > >> is_significant_sfpt() encodes a white list consisting of cases which can be safely ignored when it comes to reachability tracking. An overlooked case is a missed optimization opportunity. > > Sounds good. Can you add a code comment for that, please? > > Ok, I'm fine with keeping the name. But it might make sense to link to where the "significance" term is defined. Because otherwise it is a concept without any clear definition, and hard for the reader to understand. You have to infer the definition from the usage, and that often leads to unclear definitions that shift over time, and eventually the concept becomes incoherent. A clear definition can also help if we have a bug: we can clearly see what we missed and what we might have to do to fix it. Ok, I got rid of "significant" in favor of "interfering" (new name is `is_interfering_sfpt_candidate`). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2501928414 From chagedorn at openjdk.org Fri Nov 7 07:43:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Nov 2025 07:43:07 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply set wrong type, led to wrong constant folding of phi [v2] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 06:07:22 GMT, Emanuel Peter wrote: >> In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed an edge case. >> >> The issue is when we also (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds. >> >> I now limit the modification to cases where the `phi` used to be for scalars, but now is for vectors. In those cases we should not have a constant. For good measure, I also added a corresponding assert. >> >> --------- >> >> Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. >> I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add diagnostic flag for product build Looks good to me, too, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28113#pullrequestreview-3432084288 From chagedorn at openjdk.org Fri Nov 7 07:44:09 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Nov 2025 07:44:09 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v9] In-Reply-To: References: Message-ID: On Thu, 6 Nov 2025 15:40:59 GMT, Benoît Maillard wrote: >> This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. >> >> ### Analysis >> >> This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced >> and added to this PR as a regression test. >> >> The test contains a switch inside a loop, and stressing the loop peeling results in >> a fairly complex graph. The split-if optimization is applied aggressively, and we >> run a verification pass at every progress made. >> >> We end up with a relatively high number of verification passes, with each pass being >> fairly expensive because of the size of the graph. >> Each verification pass requires building a new `IdealLoopTree`. This is quite slow >> (which is unfortunately hard to mitigate), and also causes inefficient memory usage >> on the `ciEnv` arena. >> >> The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. 
>> At every call, we have >> - One allocation on the `ciEnv` arena to store the returned `ciField` >> - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which: >> - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`) >> - Pushes the new symbol to the `_symbols` array >> >> The `ciEnv` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to >> check if the `BasicType` of a static field is a reference type. >> >> In `ciObjectFactory`, the `_symbols` array ends up containing a large number of duplicates for certain symbols >> (up to several millions), which hints at the fact that `ciObjectFactory::get_symbol` should not be called >> repeatedly as it is done here. >> >> The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below: >> >> >> ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412 >> TypeOopPtr::TypeOopPtr type.cpp:3484 >> TypeInstPtr::TypeInstPtr type.cpp:3953 >> TypeInstPtr::make type.cpp:3990 >> TypeInstPtr::add_offset type.cpp:4509 >> AddPNode::bottom_type addnode.cpp:696 >> MemNode::adr_type memnode.cpp:73 >> PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477 >> PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439 >> PhaseIdealLoop::build_lo... > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Update looks good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27731#pullrequestreview-3432087050 From thartmann at openjdk.org Fri Nov 7 07:50:15 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 7 Nov 2025 07:50:15 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 [v3] In-Reply-To: References: <8UVgF0vLaZhY7zvKWPBLXoU9p71SevknlPYC5LPsfCo=.8531cacc-ff26-42be-a517-88ff87628d29@github.com> <0ZFCDTQAFKjZ5XDtUXWsvC4v0lyot7Trdf3wlIxrb-M=.988407f3-6353-447a-98d6-f890b795fd1d@github.com> Message-ID: On Mon, 27 Oct 2025 18:08:40 GMT, Chad Rakoczy wrote: >>> @chadrako, is PR ready for testing now? >> >> Yes > >> @chadrako I think my suggestion was not correct. We should revert back to your first changes for `@requires`. Original code was correct and only `serviceability/jvmti/NMethodRelocation/NMethodRelocationTest.java` missed it. > > Since the tests get run with different GCs anyways I don't think we need to explicitly require the GC that they run with and just have one test config FTR, even with this fix, we still see failures: [JDK-8371121](https://bugs.openjdk.org/browse/JDK-8371121), [JDK-8371046](https://bugs.openjdk.org/browse/JDK-8371046), [JDK-8369150](https://bugs.openjdk.org/browse/JDK-8369150). @chadrako It would be great if you could prioritize fixing these remaining issues, as the failures cause quite some noise in our testing. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3501148078 From chagedorn at openjdk.org Fri Nov 7 07:59:08 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Nov 2025 07:59:08 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v5] In-Reply-To: References: Message-ID: On Thu, 6 Nov 2025 15:07:57 GMT, Benoît Maillard wrote: >> This PR addresses a missed optimization opportunity in `PhaseIterGVN`. 
The missed optimization is the simplification of redundant conversion patterns of the shape `ConvX2Y->ConvY2X->ConvX2Y`. >> >> This optimization pattern is implemented as an ideal optimization on `ConvX2Y` nodes. Because it depends on the input of the input of the node in question, we need to have an appropriate notification mechanism in `PhaseIterGVN::add_users_of_use_to_worklist`. The notification for this pattern was added in [JDK-8359603](https://bugs.openjdk.org/browse/JDK-8359603). >> >> However, that fix was based on the wrong assumption that in `PhaseIterGVN::add_users_of_use_to_worklist`, argument `n` is already the optimized node. However in some cases this argument is actually the node that is about to get replaced. >> >> This happens for example in `PhaseIterGVN::transform_old`. If we find that node `k` returned by `Ideal` actually already exists by calling `hash_find_insert(k)`, we call `add_users_to_worklist(k)`. >> As we replace node `k` with `i`, and `i` has a different opcode than `k`, then we cannot use the opcode of `k` to detect the redundant conversion pattern. >> >> ```c++ >> ... >> // Global Value Numbering >> i = hash_find_insert(k); // Check for pre-existing node >> if (i && (i != k)) { >> // Return the pre-existing node if it isn't dead >> NOT_PRODUCT(set_progress();) >> add_users_to_worklist(k); >> subsume_node(k, i); // Everybody using k now uses i >> return i; >> } >> ... >> ``` >> >> The bug was quite intermittent and only showed up in some cases with `-XX:+StressIGVN`. >> >> ### Proposed Fix >> >> We make the detection of the pattern less specific by only looking at the opcode of the user of `n`, and not directly the opcode of `n`. This is consistent with the detection of other patterns in `PhaseIterGVN::add_users_of_use_to_worklist`. 
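To make the `ConvX2Y->ConvY2X->ConvX2Y` shape concrete, here is a small Java-level sketch of why collapsing such a chain to a single conversion is sound. It assumes, purely for illustration, the instance X = `double` and Y = `long`; the class and method names are hypothetical, and the snippet checks only the language-level equivalence, not which node pairs C2 actually matches or how the notification in `add_users_of_use_to_worklist` works.

```java
// Illustrative only: checks that double -> long -> double -> long yields the
// same value as a single double -> long conversion.
final class ConvRoundTripSketch {
    // Single conversion (one ConvD2L in the assumed instance).
    static long direct(double d) {
        return (long) d;
    }

    // Redundant chain: double -> long -> double -> long.
    static long roundTrip(double d) {
        long first = (long) d;       // truncates, saturates, or maps NaN to 0
        double back = (double) first; // back to double
        return (long) back;           // converts again
    }

    public static void main(String[] args) {
        double[] samples = {
            0.0, 1.5, -2.7, 1e30, -1e30, Double.NaN,
            Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY,
            9007199254740993.0
        };
        for (double d : samples) {
            if (direct(d) != roundTrip(d)) {
                throw new AssertionError("mismatch for " + d);
            }
        }
        System.out.println("all samples agree");
    }
}
```

The middle `long -> double` step can round for large magnitudes, so the inner pair alone is not an identity; it is the full three-step chain that matches the single conversion, which is why the pattern has to look two inputs deep.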
>> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8369646) >> - [x] tier1-3, plus some internal testing >> - [x] Added a second run for the existing test with `-XX:+StressIGVN` and a fixed stress seed >> >> Thank you for reviewing! > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Add comments to clarify how add_users_to_worklist and add_users_of_use_to_worklist src/hotspot/share/opto/phaseX.hpp line 536: > 534: // optimizations have dependencies that extend beyond a node's direct > 535: // inputs, so it is necessary to ensure the appropriate notifications > 536: // are made here. Maybe also add what 'n' is. Could it be named 'parent'? src/hotspot/share/opto/phaseX.hpp line 542: > 540: // affected by changes to 'n', to the worklist. > 541: // This function may be called with a node that is about to be > 542: // replaced as argument 'n'. In this case, 'n' should not be considered By another node? Suggestion: // replaced by another node. In this case, 'n' should not be considered ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27900#discussion_r2502010601 PR Review Comment: https://git.openjdk.org/jdk/pull/27900#discussion_r2502016964 From mli at openjdk.org Fri Nov 7 09:09:03 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 7 Nov 2025 09:09:03 GMT Subject: RFR: 8371297: C2: assert triggerred in BoolTest::BoolTest In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 00:00:52 GMT, Dean Long wrote: > OK, I'm fine with cleaning this up as a separate issue. @dean-long Thank you for reviewing! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28141#issuecomment-3501395652 From fgao at openjdk.org Fri Nov 7 09:45:24 2025 From: fgao at openjdk.org (Fei Gao) Date: Fri, 7 Nov 2025 09:45:24 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v3] In-Reply-To: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: > In C2's loop optimization, for a counted loop, if we have any of these conditions (RCE, unrolling) met, we switch to the > `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running. For example, if a main loop is unrolled to `8x`, the main loop guard tests if the loop has less than `8` iterations and then decide which way to go. > > Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is going to be further multiplied. After the main loop is super-unrolled, the minimum trip guard test will be updated. Assuming one vector can operate `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test if remaining trip is less than `8 * 4 = 32`. > > To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. And, both trip guards of the main loop and the vectorized drain loop jump to the scalar post loop. 
> > The problem here is, if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed. Because the minimum trip guard test of main loop fails, the execution will jump over both the main loop and the vectorized drain loop. For example, in the above case, a loop still has `25` iterations after the pre-loop, we may run `3` rounds of the vectorized drain loop but it's impossible. It would be better if the minimum trip guard test of the main loop does not jump over the vectorized drain loop. > > This patch is to improve it by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change of control flow. > > The whole process is done by the function `insert_post_loop()`. > > We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to vectorized drain loop with mode `InsertVectorizedDrain`: > > 1. The fall-in control flow to the vectorized drain loop comes from a `RegionNode` merging exits ... Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Fixed new test failures after rebasing and refined parts of the code to address review comments - Merge branch 'master' into optimize-atomic-post - Merge branch 'master' into optimize-atomic-post - Clean up comments for consistency and add spacing for readability - Fix some corner case failures and refined part of code - Merge branch 'master' into optimize-atomic-post - Refine ascii art, rename some variables and resolve conflicts - Merge branch 'master' into optimize-atomic-post - Add necessary ASCII art, refactor insert_post_loop() and rename "atomic post loop" with "vectorized drain loop. - Merge branch 'master' into optimize-atomic-post - ... 
and 1 more: https://git.openjdk.org/jdk/compare/eab5644a...e21a830f ------------- Changes: https://git.openjdk.org/jdk/pull/22629/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22629&range=02 Stats: 1598 lines in 8 files changed: 1391 ins; 63 del; 144 mod Patch: https://git.openjdk.org/jdk/pull/22629.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22629/head:pull/22629 PR: https://git.openjdk.org/jdk/pull/22629 From fyang at openjdk.org Fri Nov 7 10:12:48 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 7 Nov 2025 10:12:48 GMT Subject: RFR: 8371385: compiler/escapeAnalysis/TestRematerializeObjects.java fails in case of -XX:-UseUnalignedAccesses In-Reply-To: References: <82OW0gJBKDYOCkj3ExvHNebrgRnhIlHa6n_yEaHN1i0=.972f50ae-b39f-484b-b939-6aaa535df8f0@github.com> Message-ID: On Thu, 6 Nov 2025 06:37:35 GMT, Christian Hagedorn wrote: >> Hi, Please consider this small change fixing a test failure. >> >> Two IR rules failed under -XX:-EliminateAllocations on platforms with -XX:-UseUnalignedAccesses. >> These are expecting MergeStores to combine and emit StoreL or StoreI. But the enablement of MergeStores >> depends on flag UseUnalignedAccesses [1]. So this simply add that condition to applyIf of the two IR rules. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/memnode.cpp#L3455 > > That looks good to me, thanks! @chhagedorn @dafedafe : Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28171#issuecomment-3501641203 From fyang at openjdk.org Fri Nov 7 10:12:50 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 7 Nov 2025 10:12:50 GMT Subject: Integrated: 8371385: compiler/escapeAnalysis/TestRematerializeObjects.java fails in case of -XX:-UseUnalignedAccesses In-Reply-To: <82OW0gJBKDYOCkj3ExvHNebrgRnhIlHa6n_yEaHN1i0=.972f50ae-b39f-484b-b939-6aaa535df8f0@github.com> References: <82OW0gJBKDYOCkj3ExvHNebrgRnhIlHa6n_yEaHN1i0=.972f50ae-b39f-484b-b939-6aaa535df8f0@github.com> Message-ID: On Thu, 6 Nov 2025 03:11:14 GMT, Fei Yang wrote: > Hi, Please consider this small change fixing a test failure. > > Two IR rules failed under -XX:-EliminateAllocations on platforms with -XX:-UseUnalignedAccesses. > These are expecting MergeStores to combine and emit StoreL or StoreI. But the enablement of MergeStores > depends on flag UseUnalignedAccesses [1]. So this simply add that condition to applyIf of the two IR rules. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/memnode.cpp#L3455 This pull request has now been integrated. 
Changeset: 59d23095 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/59d23095789bbb6d4e466bcbeb82089b17d78eae Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod 8371385: compiler/escapeAnalysis/TestRematerializeObjects.java fails in case of -XX:-UseUnalignedAccesses Reviewed-by: chagedorn, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/28171 From dfenacci at openjdk.org Fri Nov 7 10:56:05 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 7 Nov 2025 10:56:05 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v4] In-Reply-To: <3ARFIgZH4AaInkMBuleV6uwHIlrq1s5zvzzMmcaiUtE=.c6b6ee78-cf6e-445c-8781-fa886f57b69b@github.com> References: <3ARFIgZH4AaInkMBuleV6uwHIlrq1s5zvzzMmcaiUtE=.c6b6ee78-cf6e-445c-8781-fa886f57b69b@github.com> Message-ID: On Wed, 5 Nov 2025 12:56:07 GMT, Roland Westrelin wrote: >> AFAIR at some point I was getting the same assert failure `assert(_number_of_mh_late_inlines > 0)` and noticed that we were re-registering method handles for late inlining without incrementing the counter. > > Do you remember what tests that was with? At that time the failing test was `compiler/intrinsics/TestArraysHashCode.java` (not really related). I can't make it trigger the assert anymore now though... 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28088#discussion_r2502781249 From duke at openjdk.org Fri Nov 7 11:04:34 2025 From: duke at openjdk.org (Zihao Lin) Date: Fri, 7 Nov 2025 11:04:34 GMT Subject: RFR: 8369993: Redundant separate 'String' check in 'trust_final_non_static_fields' ciField.cpp Message-ID: Remove redundant separate 'String' check in 'trust_final_non_static_fields' ciField.cpp ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/28191/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28191&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369993 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28191.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28191/head:pull/28191 PR: https://git.openjdk.org/jdk/pull/28191 From epeter at openjdk.org Fri Nov 7 11:13:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Nov 2025 11:13:28 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: <3F8YIADlKJItx-Xt5eNTVJM5P7gYs-11FZx1u2JS8Uk=.8ddbf1c9-46da-4455-8366-a7edeaa0930e@github.com> References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> <59uQ1nJm-iOr7-kNcauhuBC7wEQLs08e4Gi81SwJ9TU=.ffd0c0f9-abd4-483a-8967-d07dde2bd0a3@github.com> <3F8YIADlKJItx-Xt5eNTVJM5P7gYs-11FZx1u2JS8Uk=.8ddbf1c9-46da-4455-8366-a7edeaa0930e@github.com> Message-ID: <_a8_2HKZWXFnFI1ipBa0ahKOLjKP3748iLJmEFfg22k=.480925ea-074c-478c-b384-9b2b791c1ed5@github.com> On Fri, 7 Nov 2025 06:34:34 GMT, Vladimir Ivanov wrote: >>> why I'm asking for more asserts that at least block some wrong usages >> >> Ok. As I said earlier, it doesn't look like `dead->is_dead()` would work here. Any other checks you have in mind? > > The method only works for data nodes, so I renamed it to `remove_dead_data_node` and added `!dead->is_CFG()` assert (it was already checked, but down the call tree). 
Nice, thanks for the update! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502508103 From epeter at openjdk.org Fri Nov 7 11:13:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Nov 2025 11:13:29 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: <9QpCytaix4_SUNjU6eUdrUcXu_PuQ6CpKSUaree9vrY=.fcd87f13-5eca-4117-89d6-be093ee0f10d@github.com> On Mon, 3 Nov 2025 21:24:19 GMT, Vladimir Ivanov wrote: >> Ah no, in all cases I could see it was actually the `rf` itself, right? Why not give it a more specific name? > >> But it seems you are using it from different places. Can you find a better name? > > Both IGVN (`ReachabilityFenceNode::Identity()`) and `PhaseIdealLoop` perform redundant RF elimination, so `is_redundant_rf_helper` is there so they can share the same implementation. I could name it `is_redundant_rf`, but it doesn't look like an improvement to me. It's tailored specifically for those 2 particular use cases and it is not intended to be used outside `reachability.cpp`. It is just strange to see a `helper` without its "master", like it looks now in `ReachabilityFenceNode::Identity`. Right, naming multiple methods the same is generally also confusing. Maybe `PhaseIdealLoop::is_redundant_rf` could be `PhaseIdealLoop::is_redundant_rf_in_loop`? Would that even be an accurate name? But I'll let you decide. If it's too hard to find something better, I can live with `_helper`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502579039 From epeter at openjdk.org Fri Nov 7 11:13:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Nov 2025 11:13:30 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19] In-Reply-To: References: <8GeyG4p4gmMjAPoP4FRNoJ9zL8I42hvmy7Wzo_tFY8k=.189f51d1-4003-4da9-97f1-765a70e45573@github.com> Message-ID: On Fri, 7 Nov 2025 07:09:37 GMT, Vladimir Ivanov wrote: >>> is_significant_sfpt() encodes a white list consisting of cases which can be safely ignored when it comes to reachability tracking. An overlooked case is a missed optimization opportunity. >> >> Sounds good. Can you add a code comment for that, please? >> >> Ok, I'm fine with keeping the name. But it might make sense to link to where the "significance" term is defined. Because otherwise it is a concept without any clear definition, and hard for the reader to understand. You have to infer the definition from the usage, and that often leads to unclear definitions that shift over time, and eventually the concept even is incoherent. A clear definition can also help if we have a bug: we can clearly see what we missed and what we might have to do to fix it. > > Ok, I got rid of "significant" in favor of "interfering" (new name is `is_interfering_sfpt_candidate`). Nice! 
`is_interfering_sfpt_candidate` is definitely a big improvement :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502634482 From epeter at openjdk.org Fri Nov 7 11:13:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Nov 2025 11:13:32 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: Message-ID: <4a6494JZE-PGYqCA1jHtf7_-dQjxAObwhuA6VYnS9Lg=.66b13f4f-3f71-4a4b-8130-2f5b841f5ce7@github.com> On Fri, 7 Nov 2025 10:30:46 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/opto/reachability.cpp line 383: > >> 381: // Linearly traverse CFG upwards starting at n until first merge point. >> 382: // All encountered safepoints are recorded in safepoints list. >> 383: static void linear_traversal(Node* n, Node_Stack& worklist, VectorSet& visited, Node_List& safepoints) { > > The second comment line does not sound accurate, we don't collect ALL, only the candidates. Maybe we can find a better method name, and even remove that comment because of it? > > Given the more useful sub query `is_interfering_sfpt_candidate`, I think we could name this method something like `collect_interfering_sfpt_candidates`. Or is it very important that this is a linear traversal vs some other traversal we could choose from? Hmm, but this here is only a component of the `enumerate_interfering_sfpts` below, which has essentially that name. So maybe it should be `collect_interfering_sfpt_candidates_for_node` here and just `collect_interfering_sfpt_candidates` below? > src/hotspot/share/opto/reachability.cpp line 409: > >> 407: visited.set(referent_ctrl->_idx); // end point >> 408: >> 409: Node_Stack stack(0); > > `ResourceMark`? We call this many times, so not sure if this could explode somehow?
> src/hotspot/share/opto/reachability.cpp line 409: > >> 407: visited.set(referent_ctrl->_idx); // end point >> 408: >> 409: Node_Stack stack(0); > > Also: you now call it `stack` out here but `worklist` inside `linear_traversal`. I would use a consistent name. And why not use a `Unique_Node_List`, to unite the `visited` and `stack` into a single `worklist`? > src/hotspot/share/opto/reachability.cpp line 461: > >> 459: if (!is_redundant_rf(rf, false /*rf_only*/)) { >> 460: Node_List safepoints; >> 461: enumerate_interfering_sfpts(rf, this, safepoints); > > Could this explode if we have a lot of RF in the graph? Do we need a ResourceMark, or reuse the `safepoints` node list? > > Imagine something like this: > > referent > if (flag1) { something } else { something } > ... > if (flag100) { something } else { something } > if (x1) { RF(referent); } > ... > if (x100) { RF(referent); } > > So we would call `enumerate_interfering_sfpts` 100x, and then traverse the graph with about 100-400 nodes each time. You can see how that grows quadratically. Maybe that's fine for runtime, but is it also ok for memory? And what if we find a lot of SafePoints for each RF? Do we end up attaching quadratically many referent edges overall?
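For readers following the `Unique_Node_List` suggestion above, here is a minimal sketch of a worklist that doubles as its own visited set. It is written in Java over a hypothetical `CfgNode` class, not HotSpot's `Node`, and `collectSafepoints` is an illustrative name rather than anything from the patch. It only shows the shape of the traversal (upwards over control inputs, from the fence to the referent's control) and ignores the candidate filtering done by `is_interfering_sfpt_candidate`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class WorklistSketch {
    // Hypothetical stand-in for a C2 control-flow node; not HotSpot's Node class.
    static final class CfgNode {
        final String name;
        final boolean isSafepoint;
        final List<CfgNode> ctrlInputs = new ArrayList<>();

        CfgNode(String name, boolean isSafepoint) {
            this.name = name;
            this.isSafepoint = isSafepoint;
        }

        // Registers control predecessors and returns this node for chaining.
        CfgNode from(CfgNode... preds) {
            ctrlInputs.addAll(List.of(preds));
            return this;
        }
    }

    // Walks control inputs upwards from 'start' and stops at 'end'.
    // The list plus the 'seen' set play the role of a unique worklist:
    // every node is pushed at most once, so the loop visits each node once.
    static List<CfgNode> collectSafepoints(CfgNode start, CfgNode end) {
        List<CfgNode> worklist = new ArrayList<>();
        Set<CfgNode> seen = new HashSet<>();
        worklist.add(start);
        seen.add(start);
        seen.add(end); // the end point is never pushed, so the walk stops there

        List<CfgNode> safepoints = new ArrayList<>();
        for (int i = 0; i < worklist.size(); i++) { // the list grows while iterating
            CfgNode n = worklist.get(i);
            if (n.isSafepoint) {
                safepoints.add(n);
            }
            for (CfgNode in : n.ctrlInputs) { // a region has several, others have one
                if (seen.add(in)) {
                    worklist.add(in);
                }
            }
        }
        return safepoints;
    }
}
```

In this form a single traversal is linear in the number of control nodes between the two end points. Running it once per fence, as in the 100-fence scenario above, still multiplies that cost by the number of fences.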
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502686333 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502689245 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502736689 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502848017 From epeter at openjdk.org Fri Nov 7 11:13:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Nov 2025 11:13:33 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: <4a6494JZE-PGYqCA1jHtf7_-dQjxAObwhuA6VYnS9Lg=.66b13f4f-3f71-4a4b-8130-2f5b841f5ce7@github.com> References: <4a6494JZE-PGYqCA1jHtf7_-dQjxAObwhuA6VYnS9Lg=.66b13f4f-3f71-4a4b-8130-2f5b841f5ce7@github.com> Message-ID: On Fri, 7 Nov 2025 10:43:54 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/reachability.cpp line 409: >> >>> 407: visited.set(referent_ctrl->_idx); // end point >>> 408: >>> 409: Node_Stack stack(0); >> >> Also: you now call it `stack` out here but `worklist` inside `linear_traversal`. I would use a consistent name. > > And why not use a `Unique_Node_List`, to unite the `visited` and `stack` into a single `worklist`? Ah. Right, at first I did not see that you are using a stack, which is not a node list. It also has the idx. In my experience, this usually creates code that is a little harder to read. I prefer using a `Unique_Node_List`, and then just traverse over all ctrl inputs, and add those to the worklist. You have to special case Region, and all other CFG nodes only have ctrl on `in(0)`. It tends to nicely flatten the whole BFS traversal into a small loop. But maybe it does use just a bit more memory than your traversal.
Just an idea, I can probably find a way to wrap my head around this approach here too ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502821592 From epeter at openjdk.org Fri Nov 7 11:13:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Nov 2025 11:13:27 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 07:14:29 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. >> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. 
It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > update First batch of today. I just responded to some of your responses, and dug into random places ;) src/hotspot/share/opto/loopnode.cpp line 5147: > 5145: assert(do_expensive_nodes || do_optimize_reachability_fences, "why are we here?"); > 5146: // Use a change to optimize reachability fence nodes irrespective of > 5147: // whether loop optimizations are performed or not. What do you mean by `Use a change`? src/hotspot/share/opto/reachability.cpp line 88: > 86: // RF is redundant for some referent oop when the referent has another user which keeps it alive across the RF. > 87: // In terms of dominance relation it can be formulated as "a referent has a user which is dominated by the redundant RF". > 88: // Until loop opts are over, only RF nodes are considered as usages (controlled by rf_only flag). Can you say why until after loop opts are over only RF are considered? How does this play with allocation elimination etc? 
What if we run this after loop opts where we still have the allocation, but before the elimination. And then we eventually lose all referents. Could something like that happen? src/hotspot/share/opto/reachability.cpp line 108: > 106: return true; // ignore fences on boxed primitives produced by valueOf methods > 107: } > 108: } Nice, thanks for adding the comment. I'm trying to understand the reason why that is ok. So someone would have set a RF for a boxed primitive. But we don't expect anything to ever be attached to a boxed primitive, and so we can just ignore these RF? Is that the reason? Might be worth writing it in a code comment explicitly. src/hotspot/share/opto/reachability.cpp line 186: > 184: return false; // uncommon traps are exit points > 185: } > 186: return true; Suggestion: // By default, we return a conservative answer, and assume it could interfere. return true; src/hotspot/share/opto/reachability.cpp line 383: > 381: // Linearly traverse CFG upwards starting at n until first merge point. > 382: // All encountered safepoints are recorded in safepoints list. > 383: static void linear_traversal(Node* n, Node_Stack& worklist, VectorSet& visited, Node_List& safepoints) { The second comment line does not sound accurate, we don't collect ALL, only the candidates. Maybe we can find a better method name, and even remove that comment because of it? Given the more useful sub query `is_interfering_sfpt_candidate`, I think we could name this method something like `collect_interfering_sfpt_candidates`. Or is it very important that this is a linear traversal vs some other traversal we could choose from? src/hotspot/share/opto/reachability.cpp line 384: > 382: // All encountered safepoints are recorded in safepoints list. > 383: static void linear_traversal(Node* n, Node_Stack& worklist, VectorSet& visited, Node_List& safepoints) { > 384: for (Node* ctrl = n; ctrl != nullptr; ctrl = ctrl->in(0)) { This "fast-forwarding" looks a bit like an optimization. 
Why not just add all CFG nodes on the worklist, would that not simplify the graph a little? Or did you find a case where this was really important? src/hotspot/share/opto/reachability.cpp line 409: > 407: visited.set(referent_ctrl->_idx); // end point > 408: > 409: Node_Stack stack(0); `ResourceMark`? src/hotspot/share/opto/reachability.cpp line 409: > 407: visited.set(referent_ctrl->_idx); // end point > 408: > 409: Node_Stack stack(0); Also: you now call it `stack` out here but `worklist` inside `linear_traversal`. I would use a consistent name. src/hotspot/share/opto/reachability.cpp line 441: > 439: > 440: assert(OptimizeReachabilityFences, "required"); > 441: assert(C->post_loop_opts_phase(), "required"); You could give the reason in the assert message ;) src/hotspot/share/opto/reachability.cpp line 446: > 444: ResourceMark rm; > 445: Unique_Node_List redundant_rfs; > 446: Node_List worklist; Looks like this `worklist` is really a list of `` pairs. Consider making it a `GrowableArray` with a pair type and also consider giving it a more descriptive name, calling it by whatever we collect in it. Maybe `sfpt_referent_pairs`? Also: why do we need this worklist, and not just attach the referent to the sfpt eagerly? And: can it be that we add the same pair multiple times? Is that intentional? src/hotspot/share/opto/reachability.cpp line 461: > 459: if (!is_redundant_rf(rf, false /*rf_only*/)) { > 460: Node_List safepoints; > 461: enumerate_interfering_sfpts(rf, this, safepoints); Could this explode if we have a lot of RF in the graph? Do we need a ResourceMark, or reuse the `safepoints` node list? Imagine something like this: referent if (flag1) { something } else { something } ... if (flag100) { something } else { something } if (x1) { RF(referent); } ... if (x100) { RF(referent); } So we would call `enumerate_interfering_sfpts` 100x, and then traverse the graph with about 100-400 nodes each time. You can see how that grows quadratically.
Maybe that's fine for runtime, but is it also ok for memory? ------------- PR Review: https://git.openjdk.org/jdk/pull/25315#pullrequestreview-3432711599 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502498455 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502534061 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502617200 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502647790 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502669143 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502796906 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502680412 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502724464 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502837337 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502778969 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502745140 From mdoerr at openjdk.org Fri Nov 7 11:29:05 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 7 Nov 2025 11:29:05 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 07:36:57 GMT, Richard Reingruber wrote: > With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. > > It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. > > The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. > > The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. > Only with incoming stack arguments unaligned offsets can be generated. 
But also then alignment padding is only added if vector registers larger than 64 bit are used. > > So the costs are effectively zero. Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) > > There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. I havn't implemented it as I thought the risk is negligible. > > Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. > > ##### Testing with fastdebug builds on AARCH64 and PPC64: > > hotspot_vector_1 > hotspot_vector_2 > jdk_vector > jdk_vector_sanity > > ##### The change passed our CI testing: > Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. It is covered by the following tests: > > compiler/vectorapi/VectorRearrangeTest.java > jdk/incubator/vector/Byte128VectorLoadStoreTests.java > jdk/incubator/vector/Double256VectorLoadStoreTests.java > jdk/incubator/vector/Float128VectorTests.java > jdk/incubator/vector/Long256VectorLoadStoreTests.java > jdk/incubator/vector/Short128VectorLoadStoreTests.java > jdk/incubator/vector/Vector64ConversionTests.java This looks like a nice improvement! Thanks for doing it! I only have minor comments. 
src/hotspot/cpu/ppc/ppc.ad line 1800: > 1798: int src_offset = ra_->reg2offset(src_lo); > 1799: int dst_offset = ra_->reg2offset(dst_lo); > 1800: DEBUG_ONLY(int algm = MIN2(RegMask::num_registers(ideal_reg()), (int)Matcher::stack_alignment_in_slots()) * VMRegImpl::stack_slot_size); This must always be 16, but ok. (Instructions can't encode other offsets.) You can keep it this way. src/hotspot/cpu/ppc/ppc.ad line 1839: > 1837: } else { > 1838: st->print("%-7s %s, R1_SP, %d \t// vector spill copy", "ADDI", Matcher::regName[src_lo], dst_offset); > 1839: st->print("%-7s %s, [R1_SP] \t// vector spill copy", "STXVD2X", Matcher::regName[src_lo]); The output looks wrong. We write to [R0], not [R1_SP]. Better use one line? src/hotspot/cpu/ppc/ppc.ad line 1865: > 1863: } else { > 1864: st->print("%-7s %s, R1_SP, %d \t// vector spill copy", "ADDI", Matcher::regName[src_lo], src_offset); > 1865: st->print("%-7s %s, [R1_SP] \t// vector spill copy", "LXVD2X", Matcher::regName[dst_lo]); As above. src/hotspot/share/opto/chaitin.hpp line 146: > 144: private: > 145: // Number of registers this live range uses when it colors > 146: uint16_t _num_regs; // byte size of the value divided by 4 Maybe "divided by slot size which is 4"? 
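As a worked illustration of the arithmetic in the quoted `DEBUG_ONLY` line, the following Java sketch mirrors the `MIN2(...) * stack_slot_size` expression. The class and method names are hypothetical, the 4-byte slot size comes from the `chaitin.hpp` comment quoted above, and the 16-byte stack alignment (4 slots) used in the usage note is an assumption that matches the "must always be 16" remark for PPC64. `alignUp` is a generic power-of-two round-up and not necessarily how the register allocator pads the offset.

```java
final class SpillAlignmentSketch {
    // Bytes per stack slot, as stated in the chaitin.hpp comment quoted above.
    static final int STACK_SLOT_SIZE = 4;

    // Alignment of a spill slot in bytes: the size of the value in slots,
    // capped at the stack alignment, converted to bytes.
    static int spillAlignmentInBytes(int numSlots, int stackAlignmentInSlots) {
        return Math.min(numSlots, stackAlignmentInSlots) * STACK_SLOT_SIZE;
    }

    // Rounds 'offset' up to the next multiple of 'alignment'
    // (assumes 'alignment' is a power of two).
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & -alignment;
    }
}
```

With a stack alignment of 4 slots, a 128-bit vector (4 slots) and a 256-bit vector (8 slots) both get a 16-byte alignment, while a 64-bit value (2 slots) gets 8 bytes. An sp offset of 24 would then be padded to 32 for a 128-bit spill.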
------------- PR Review: https://git.openjdk.org/jdk/pull/27969#pullrequestreview-3433146180 PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2502878330 PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2502917723 PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2502920655 PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2502925601 From epeter at openjdk.org Fri Nov 7 11:43:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Nov 2025 11:43:09 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 07:14:29 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. >> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. 
Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > update A bit more before lunch :) src/hotspot/share/opto/reachability.cpp line 437: > 435: // All RFs are replaced with edges from corresponding referents to interfering safepoints. > 436: // Interfering safepoints are safepoint nodes which are reachable from the RF to its referent through CFG. > 437: bool PhaseIdealLoop::eliminate_reachability_fences() { Why not call it `migrate_reachability_fences_to_safepoints`? Because you are not really eliminating them, just shifting the edges, right? src/hotspot/share/opto/reachability.cpp line 474: > 472: } > 473: } > 474: redundant_rfs.push(rf); I think the name `redundant_rfs` is a bit confusing here. 
Because above you may just have checked if `is_redundant_rf` and it may have returned false. So it does not really make sense to call it a "redundant rf". This confused me when I was trying out a simple example in the debugger. ------------- PR Review: https://git.openjdk.org/jdk/pull/25315#pullrequestreview-3433270750 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502982042 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502988125 From epeter at openjdk.org Fri Nov 7 11:43:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Nov 2025 11:43:11 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 11:37:22 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/opto/reachability.cpp line 474: > >> 472: } >> 473: } >> 474: redundant_rfs.push(rf); > > I think the name `redundant_rfs` is a bit confusing here. Because above you may just have checked if `is_redundant_rf` and it may have returned false. So it does not really make sense to call it a "redundant rf". This confused me when I was trying out a simple example in the debugger. Another question: What if `!is_redundant_rf` and we found no sfpt candidates? Is it ok to just eliminate the rf here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2502999947 From chagedorn at openjdk.org Fri Nov 7 12:33:22 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Nov 2025 12:33:22 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v24] In-Reply-To: References: Message-ID: On Thu, 6 Nov 2025 15:50:50 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . 
And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... 
>> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix whitespace Thanks a lot for the attribution! And also for all the updates and additional discussions! I appreciate that a lot. Oof, it became quite challenging with multiple review and update rounds interleaved - might be something to watch out for more next time :-) Update to `Template.java` review: Looks good! Update to `TestTutorial.java` review: Some more small suggestions but otherwise looks good! Update to `TemplateFrame` and `CodeFrame`: Some more minor comments. I have not looked at the new visual example - I thought I'd do that when diving further into the code. test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 42: > 40: *

> 41: * A {@link Template} can have multiple {@link TemplateFrame}s, if there are nested > 42: * scopes. The outermost {@link TemplateFrame} determines the id of the {@link Tmeplate} Suggestion: * scopes. The outermost {@link TemplateFrame} determines the id of the {@link Template} test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 52: > 50: * and queries such as {@link Template#let} definitions. Each {@link TemplateFrame} > 51: * has such a set of hashtag replacements, and implicitly provides access to the > 52: * hashtag replacmeents of the outer {@link TemplateFrame}s, up to the outermost Suggestion: * hashtag replacements of the outer {@link TemplateFrame}s, up to the outermost test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 62: > 60: * on how much time is spent on the code from the template corresponding to the frame, > 61: * and to give a termination criterion to avoid nesting templates too deeply. > 62: * It now more sounds like a "TemplateScope" since we have a "TemplateFrame" per scope and not per template which the latter name somehow suggests. But just wanted to share that thought here. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 1166: > 1164: > 1165: // In this section, we will look at some subtle facts about the behavior of > 1166: // transparent scopes around hook insertion. This inteded for expert users Suggestion: // transparent scopes around hook insertion. This is intended for expert users test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 1168: > 1166: // transparent scopes around hook insertion. This inteded for expert users > 1167: // so feel free to skip it until you extensively use hook insertion. > 1168: // More info can also be found in the javadocs of the Hook class. Suggestion: // More info can also be found in the Javadocs of the Hook class. 
test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 1171: > 1169: > 1170: // Helper method to check that the expected DataNames are available. > 1171: var templateVerify = Template.make("toList", (String toList) -> scope( Awesome, thanks for adding the expert example - that is really helpful to better grasp the interaction of hooks and scopes. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 1220: > 1218: // x4 escapes to the caller out here, and not to the anchor scope. > 1219: "// x4: #x4\n", > 1220: // And v4 escapes to the anchor scope, which is available from hee too. Suggestion: // And v4 escapes to the anchor scope, which is available from here too. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 1241: > 1239: templateVerify.asToken("v1, v4, v5, v2") > 1240: )), > 1241: templateVerify.asToken("v1"), For completeness: Suggestion: // We left the non-transparent anchoring scope which does not let anything escape templateVerify.asToken("v1"), test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 1261: > 1259: templateVerify.asToken("v1, v6, v7") > 1260: )), > 1261: templateVerify.asToken("v1, v6, v7") We could probably also add another `let("x6", 5)` inside the `anchor` and access it here to show that everything escapes the anchor. What do you think? For completeness: Suggestion: let("x6", 42) // escapes the anchor scope )), // We left the transparent anchoring scope which lets the DataNames and // hashtags escape. 
"// x6: #x6\n", templateVerify.asToken("v1, v6, v7") ------------- PR Review: https://git.openjdk.org/jdk/pull/27255#pullrequestreview-3432859378 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2503092934 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2503094493 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2503115113 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2502620560 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2502621715 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2502637540 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2502641117 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2502654333 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2502684113 From luhenry at openjdk.org Fri Nov 7 13:50:03 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 7 Nov 2025 13:50:03 GMT Subject: RFR: 8371297: C2: assert triggerred in BoolTest::BoolTest In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 22:18:13 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > Not sure how this one slipped in (https://github.com/openjdk/jdk/pull/28047/), I think it passed my local test and github CI test at that time. > Please check https://bugs.openjdk.org/browse/JDK-8371297 for details. > > Thanks! Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28141#pullrequestreview-3434041441 From mli at openjdk.org Fri Nov 7 14:04:05 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 7 Nov 2025 14:04:05 GMT Subject: RFR: 8371297: C2: assert triggerred in BoolTest::BoolTest In-Reply-To: References: Message-ID: <3Yqkkf2hUJNWsSTeGM9WC9MQh_ZkMrhyqx6isHDmI2U=.7991ecec-51a0-4564-b9fc-fb1e19299ba5@github.com> On Fri, 7 Nov 2025 13:46:56 GMT, Ludovic Henry wrote: >> Hi, >> Can you help to review this patch? 
>> >> Not sure how this one slipped in (https://github.com/openjdk/jdk/pull/28047/), I think it passed my local test and github CI test at that time. >> Please check https://bugs.openjdk.org/browse/JDK-8371297 for details. >> >> Thanks! > > Marked as reviewed by luhenry (Committer). @luhenry Thank you for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28141#issuecomment-3502730034 From mli at openjdk.org Fri Nov 7 14:04:06 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 7 Nov 2025 14:04:06 GMT Subject: RFR: 8371297: C2: assert triggerred in BoolTest::BoolTest In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 22:18:13 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > Not sure how this one slipped in (https://github.com/openjdk/jdk/pull/28047/), I think it passed my local test and github CI test at that time. > Please check https://bugs.openjdk.org/browse/JDK-8371297 for details. > > Thanks! @eme64 Can you have a look? Thanks! :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28141#issuecomment-3502736083 From epeter at openjdk.org Fri Nov 7 15:07:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Nov 2025 15:07:27 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 11:39:32 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/reachability.cpp line 474: >> >>> 472: } >>> 473: } >>> 474: redundant_rfs.push(rf); >> >> I think the name `redundant_rfs` is a bit confusing here. Because above you may just have checked if `is_redundant_rf` and it may have returned false. So it does not really make sense to call it a "redundant rf". This confused me when I was trying out a simple example in the debugger. > > Another question: > What if `!is_redundant_rf` and we found no sfpt candidates? Is it ok to just eliminate the rf here? I suppose yes, because no GC could have happened since, right? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2504034323 From epeter at openjdk.org Fri Nov 7 15:07:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Nov 2025 15:07:28 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: Message-ID: <1vJqFydubIzTlZ-5PRTXHAZyuqEPHJpPQQ71ayUrlKI=.ba1fefc3-2654-40e8-9451-25b369f4b9db@github.com> On Fri, 7 Nov 2025 15:04:03 GMT, Emanuel Peter wrote: >> Another question: >> What if `!is_redundant_rf` and we found no sfpt candidates? Is it ok to just eliminate the rf here? > > I suppose yes, because no GC could have happened since, right? Could be worth writing that down in a code comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2504035787 From bmaillard at openjdk.org Fri Nov 7 16:22:23 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 7 Nov 2025 16:22:23 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v6] In-Reply-To: References: Message-ID: <1pRKczP7EhfqRnLV2KhyD48bEwDThYfaLNU8FKdTP8A=.9f87dcee-0025-48e4-946f-6affd4e05728@github.com> > This PR addresses a missed optimization opportunity in `PhaseIterGVN`. The missed optimization is the simplification of redundant conversion patterns of the shape `ConvX2Y->ConvY2X->ConvX2Y`. > > This optimization pattern is implemented as an ideal optimization on `ConvX2Y` nodes. Because it depends on the input of the input of the node in question, we need to have an appropriate notification mechanism in `PhaseIterGVN::add_users_of_use_to_worklist`. The notification for this pattern was added in [JDK-8359603](https://bugs.openjdk.org/browse/JDK-8359603). > > However, that fix was based on the wrong assumption that in `PhaseIterGVN::add_users_of_use_to_worklist`, argument `n` is already the optimized node. 
However, in some cases this argument is actually the node that is about to get replaced.
>
> This happens for example in `PhaseIterGVN::transform_old`. If we find that node `k` returned by `Ideal` actually already exists by calling `hash_find_insert(k)`, we call `add_users_to_worklist(k)`.
> As we replace node `k` with `i`, and `i` has a different opcode than `k`, we cannot use the opcode of `k` to detect the redundant conversion pattern.
>
> ```c++
> ...
> // Global Value Numbering
> i = hash_find_insert(k);  // Check for pre-existing node
> if (i && (i != k)) {
>   // Return the pre-existing node if it isn't dead
>   NOT_PRODUCT(set_progress();)
>   add_users_to_worklist(k);
>   subsume_node(k, i);  // Everybody using k now uses i
>   return i;
> }
> ...
> ```
>
> The bug was quite intermittent and only showed up in some cases with `-XX:+StressIGVN`.
>
> ### Proposed Fix
>
> We make the detection of the pattern less specific by only looking at the opcode of the user of `n`, and not directly at the opcode of `n`. This is consistent with the detection of other patterns in `PhaseIterGVN::add_users_of_use_to_worklist`.
>
> ### Testing
> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8369646)
> - [x] tier1-3, plus some internal testing
> - [x] Added a second run for the existing test with `-XX:+StressIGVN` and a fixed stress seed
>
> Thank you for reviewing!
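For illustration only: the redundant conversion pattern described above has a simple source-level analogue. The following is a minimal Java sketch, not C2 code, and the class and method names are hypothetical. It assumes the int/long instance of the pattern and only shows why a chain such as int -> long -> int -> long carries no more information than the first conversion alone.

```java
class RedundantConversionSketch {
    // Chain of three conversions: int -> long -> int -> long.
    // In C2 terms this roughly corresponds to ConvI2L -> ConvL2I -> ConvI2L.
    static long chained(int x) {
        long a = (long) x;  // widen
        int b = (int) a;    // narrow; lossless because 'a' came from an int
        return (long) b;    // widen again
    }

    // The simplified form: a single conversion.
    static long simplified(int x) {
        return (long) x;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (chained(x) != simplified(x)) {
                throw new AssertionError("mismatch for " + x);
            }
        }
        System.out.println("chained and simplified conversions agree");
    }
}
```

Whether the compiler actually performs the simplification depends on the affected node being revisited by IGVN, which is what the notification in `add_users_of_use_to_worklist` is for.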
Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: Refine comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27900/files - new: https://git.openjdk.org/jdk/pull/27900/files/103cc585..c3b6f58f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27900&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27900&range=04-05 Stats: 11 lines in 1 file changed: 2 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/27900.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27900/head:pull/27900 PR: https://git.openjdk.org/jdk/pull/27900 From bmaillard at openjdk.org Fri Nov 7 16:22:26 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 7 Nov 2025 16:22:26 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v5] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 07:48:34 GMT, Christian Hagedorn wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comments to clarify how add_users_to_worklist and add_users_of_use_to_worklist > > src/hotspot/share/opto/phaseX.hpp line 536: > >> 534: // optimizations have dependencies that extend beyond a node's direct >> 535: // inputs, so it is necessary to ensure the appropriate notifications >> 536: // are made here. > > Maybe also add what 'n' is. Could it be named 'parent'? I would keep the name `n` for consistency, `parent` sounds a bit confusing from the point of view of the caller imho (as it is the node we are modifying in the first place). But I described more explicitly what `n` is. > src/hotspot/share/opto/phaseX.hpp line 542: > >> 540: // affected by changes to 'n', to the worklist. >> 541: // This function may be called with a node that is about to be >> 542: // replaced as argument 'n'.
In this case, 'n' should not be considered > > By another node? > Suggestion: > > // replaced by another node. In this case, 'n' should not be considered That's not what I meant, but yes it was confusing. I have changed the comment a bit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27900#discussion_r2504404220 PR Review Comment: https://git.openjdk.org/jdk/pull/27900#discussion_r2504408495 From rrich at openjdk.org Fri Nov 7 16:29:25 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 7 Nov 2025 16:29:25 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v2] In-Reply-To: References: Message-ID: <2Ts5dNdaDuen71ZoTYdKP8UNG44epCiEsIb8DeJpvps=.24618d81-4b98-489a-962b-c04e0d561270@github.com> > With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. > > It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. > > The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. > > The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. > Only with incoming stack arguments unaligned offsets can be generated. But also then alignment padding is only added if vector registers larger than 64 bit are used. > > So the costs are effectively zero. Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) > > There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). 
If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. I havn't implemented it as I thought the risk is negligible. > > Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. > > ##### Testing with fastdebug builds on AARCH64 and PPC64: > > hotspot_vector_1 > hotspot_vector_2 > jdk_vector > jdk_vector_sanity > > ##### The change passed our CI testing: > Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. It is covered by the following tests: > > compiler/vectorapi/VectorRearrangeTest.java > jdk/incubator/vector/Byte128VectorLoadStoreTests.java > jdk/incubator/vector/Double256VectorLoadStoreTests.java > jdk/incubator/vector/Float128VectorTests.java > jdk/incubator/vector/Long256VectorLoadStoreTests.java > jdk/incubator/vector/Short128VectorLoadStoreTests.java > jdk/incubator/vector/Vector64ConversionTests.java Richard Reingruber has updated the pull request incrementally with two additional commits since the last revision: - Enhance comment - Fix OptoAssembly for Power 8 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27969/files - new: https://git.openjdk.org/jdk/pull/27969/files/ef7ba147..7729a448 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27969&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27969&range=00-01 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/27969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27969/head:pull/27969 PR: https://git.openjdk.org/jdk/pull/27969 From mdoerr at openjdk.org Fri Nov 7 16:29:26 2025 From: mdoerr at openjdk.org 
(Martin Doerr) Date: Fri, 7 Nov 2025 16:29:26 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v2] In-Reply-To: <2Ts5dNdaDuen71ZoTYdKP8UNG44epCiEsIb8DeJpvps=.24618d81-4b98-489a-962b-c04e0d561270@github.com> References: <2Ts5dNdaDuen71ZoTYdKP8UNG44epCiEsIb8DeJpvps=.24618d81-4b98-489a-962b-c04e0d561270@github.com> Message-ID: On Fri, 7 Nov 2025 16:25:53 GMT, Richard Reingruber wrote: >> With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. >> >> It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. >> >> The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. >> >> The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. >> Only with incoming stack arguments unaligned offsets can be generated. But also then alignment padding is only added if vector registers larger than 64 bit are used. >> >> So the costs are effectively zero. Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) >> >> There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. I havn't implemented it as I thought the risk is negligible. >> >> Note that the sp offset of the accesses should be aligned rather than the effective address. 
So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. >> >> ##### Testing with fastdebug builds on AARCH64 and PPC64: >> >> hotspot_vector_1 >> hotspot_vector_2 >> jdk_vector >> jdk_vector_sanity >> >> ##### The change passed our CI testing: >> Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. >> Testing was done on the main platforms and also on Linux/PPC64le and AIX. >> >> C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. It is covered by the following tests: >> >> compiler/vectorapi/VectorRearrangeTest.java >> jdk/incubator/vector/Byte128VectorLoadStoreTests.java >> jdk/incubator/vector/Double256VectorLoadStoreTests.java >> jdk/incubator/vector/Float128VectorTests.java >> jdk/incubator/vector/Long256VectorLoadStoreTests.java >> jdk/incubator/vector/Short128VectorLoadStoreTests.java >> jdk/incubator/vector/Vector64ConversionTests.java > > Richard Reingruber has updated the pull request incrementally with two additional commits since the last revision: > > - Enhance comment > - Fix OptoAssembly for Power 8 Thanks! LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27969#pullrequestreview-3435009962 From rrich at openjdk.org Fri Nov 7 16:29:27 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 7 Nov 2025 16:29:27 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 07:36:57 GMT, Richard Reingruber wrote: > With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. > > It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. 
> > The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. > > The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. > Only with incoming stack arguments unaligned offsets can be generated. But also then alignment padding is only added if vector registers larger than 64 bit are used. > > So the costs are effectively zero. Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) > > There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. I havn't implemented it as I thought the risk is negligible. > > Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. > > ##### Testing with fastdebug builds on AARCH64 and PPC64: > > hotspot_vector_1 > hotspot_vector_2 > jdk_vector > jdk_vector_sanity > > ##### The change passed our CI testing: > Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. 
It is covered by the following tests: > > compiler/vectorapi/VectorRearrangeTest.java > jdk/incubator/vector/Byte128VectorLoadStoreTests.java > jdk/incubator/vector/Double256VectorLoadStoreTests.java > jdk/incubator/vector/Float128VectorTests.java > jdk/incubator/vector/Long256VectorLoadStoreTests.java > jdk/incubator/vector/Short128VectorLoadStoreTests.java > jdk/incubator/vector/Vector64ConversionTests.java Thanks Martin for having a look at the PR! I've pushed changes to incorporate your feedback. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27969#issuecomment-3503470449 From rrich at openjdk.org Fri Nov 7 16:29:30 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 7 Nov 2025 16:29:30 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v2] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 11:14:00 GMT, Martin Doerr wrote: >> Richard Reingruber has updated the pull request incrementally with two additional commits since the last revision: >> >> - Enhance comment >> - Fix OptoAssembly for Power 8 > > src/hotspot/cpu/ppc/ppc.ad line 1800: > >> 1798: int src_offset = ra_->reg2offset(src_lo); >> 1799: int dst_offset = ra_->reg2offset(dst_lo); >> 1800: DEBUG_ONLY(int algm = MIN2(RegMask::num_registers(ideal_reg()), (int)Matcher::stack_alignment_in_slots()) * VMRegImpl::stack_slot_size); > > This must always be 16, but ok. (Instructions can't encode other offsets.) You can keep it this way. I don't like hard coded integer literals. I also think that further clean-up is needed. E.g. `ideal_reg() == Op_VecX` should be asserted for vectors. > src/hotspot/cpu/ppc/ppc.ad line 1839: > >> 1837: } else { >> 1838: st->print("%-7s %s, R1_SP, %d \t// vector spill copy", "ADDI", Matcher::regName[src_lo], dst_offset); >> 1839: st->print("%-7s %s, [R1_SP] \t// vector spill copy", "STXVD2X", Matcher::regName[src_lo]); > > The output looks wrong. We write to [R0], not [R1_SP]. Better use one line? Good catch. 
Not easy to test since on Power 8 usage of vectors is disabled. I've fixed it hopefully. > Better use one line? You can't mean one output line with 2 instructions. Sorry don't get it... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2504405180 PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2504411015 From mdoerr at openjdk.org Fri Nov 7 16:29:30 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 7 Nov 2025 16:29:30 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v2] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 16:18:59 GMT, Richard Reingruber wrote: >> src/hotspot/cpu/ppc/ppc.ad line 1800: >> >>> 1798: int src_offset = ra_->reg2offset(src_lo); >>> 1799: int dst_offset = ra_->reg2offset(dst_lo); >>> 1800: DEBUG_ONLY(int algm = MIN2(RegMask::num_registers(ideal_reg()), (int)Matcher::stack_alignment_in_slots()) * VMRegImpl::stack_slot_size); >> >> This must always be 16, but ok. (Instructions can't encode other offsets.) You can keep it this way. > > I don't like hard coded integer literals. > I also think that further clean-up is needed. E.g. `ideal_reg() == Op_VecX` should be asserted for vectors. ideal_reg() == Op_VecX is already in the condition above. >> src/hotspot/cpu/ppc/ppc.ad line 1839: >> >>> 1837: } else { >>> 1838: st->print("%-7s %s, R1_SP, %d \t// vector spill copy", "ADDI", Matcher::regName[src_lo], dst_offset); >>> 1839: st->print("%-7s %s, [R1_SP] \t// vector spill copy", "STXVD2X", Matcher::regName[src_lo]); >> >> The output looks wrong. We write to [R0], not [R1_SP]. Better use one line? > > Good catch. Not easy to test since on Power 8 usage of vectors is disabled. I've fixed it hopefully. >> Better use one line? > > You can't mean one output line with 2 instructions. Sorry don't get it... It's ok. We will probably remove VSX support for Power8 at some point of time. Performance is bad and the code complicated. 
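As an aside, the alignment computation in the quoted `DEBUG_ONLY` line can be sketched in plain Java. This is a hedged illustration, not HotSpot code: the class, the method names, and the 4-byte stack slot size are assumptions made for the example. It shows why a 128-bit vector (4 slots) with a 16-byte stack alignment (4 slots) yields an alignment of 16, and how an sp offset would be rounded up to it.

```java
class SpillAlignmentSketch {
    // Assumption for illustration: one stack slot is 4 bytes wide.
    static final int STACK_SLOT_SIZE = 4;

    // Alignment in bytes for a vector spill slot: the vector size in slots,
    // capped by the stack alignment in slots, converted to bytes.
    static int spillAlignmentInBytes(int vectorSlots, int stackAlignmentInSlots) {
        return Math.min(vectorSlots, stackAlignmentInSlots) * STACK_SLOT_SIZE;
    }

    // Round an sp offset up to the next multiple of a power-of-two alignment.
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    public static void main(String[] args) {
        // 128-bit vector = 4 slots, 16-byte stack alignment = 4 slots.
        int algm = spillAlignmentInBytes(4, 4);
        System.out.println("alignment = " + algm);            // 16
        System.out.println("24 -> " + alignUp(24, algm));     // 32
        System.out.println("32 -> " + alignUp(32, algm));     // 32
    }
}
```

With a wider vector, for example 8 slots, the result stays capped at the stack alignment, which matches the statement that the maximum alignment is StackAlignmentInBytes.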
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2504408810 PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2504438854 From rrich at openjdk.org Fri Nov 7 16:29:30 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 7 Nov 2025 16:29:30 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v2] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 16:19:46 GMT, Martin Doerr wrote: >> I don't like hard coded integer literals. >> I also think that further clean-up is needed. E.g. `ideal_reg() == Op_VecX` should be asserted for vectors. > > ideal_reg() == Op_VecX is already in the condition above. Indeed but what if that ever changes?? That case will be silently unhandled. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2504421992 From dlunden at openjdk.org Fri Nov 7 16:33:12 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 7 Nov 2025 16:33:12 GMT Subject: RFR: 8341039: compiler/cha/TypeProfileFinalMethod.java fails with assertEquals expected: 0 but was: 2 Message-ID: The test `compiler/cha/TypeProfileFinalMethod.java` exercises a specific compilation pattern and easily breaks by setting various VM flags (e.g., `-Xcomp`). ### Changeset - Make the test flagless. - Ensure the test only compiles the intended methods. - Fix problems with compiler directives used in the test (incorrect signatures and some directives getting unintentionally shadowed by other directives). - Force C2 inlining of a method which the test author likely intended to always be inlined (based on source code comments in the test). - Switch argument order in `assertEquals` to make error message correct. Note for reviewers: A more fundamental rewrite of the test is beyond the scope of this changeset. The objective here is simply to ensure the test runs only in contexts intended by the test author. 
### Testing - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18972906513) - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. - Stress testing of the specific test on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. ------------- Commit messages: - Fix issue Changes: https://git.openjdk.org/jdk/pull/28200/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28200&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341039 Stats: 21 lines in 2 files changed: 3 ins; 5 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/28200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28200/head:pull/28200 PR: https://git.openjdk.org/jdk/pull/28200 From mdoerr at openjdk.org Fri Nov 7 16:36:02 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 7 Nov 2025 16:36:02 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v2] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 16:22:32 GMT, Richard Reingruber wrote: >> ideal_reg() == Op_VecX is already in the condition above. > > Indeed but what if that ever changes?? That case will be silently unhandled. We will run into ShouldNotReachHere(); // Unimplemented (line 2000) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2504477806 From psandoz at openjdk.org Fri Nov 7 17:39:07 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 7 Nov 2025 17:39:07 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v6] In-Reply-To: References: Message-ID: <5QJxi0p3-S621zs2Z1EYj4Ns-p8lay2SxapkkBCDAuA=.d0dae4cf-b91e-47f0-8bf1-8b3a3c2172b5@github.com> On Fri, 7 Nov 2025 05:41:37 GMT, Xiaohong Gong wrote: >> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. 
Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. >> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. >> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. 
>>
>> Here is the performance data with `-XX:UseSVE=2`:
>>
>> Benchmark                                    bits  inputs  Mode   Unit    Before       After        Gain
>> MaskQueryOperationsBenchmark.testToLongByte  128   1       thrpt  ops/ms  322151.976   1318576.736  4.09
>> MaskQueryOperationsBenchmark.testToLongByte  128   2       thrpt  ops/ms  322187.144   1315736.931  4.08
>> MaskQueryOperationsBenchmark.testToLongByte  128   3       thrpt  ops/ms  322213.330   1353272.882  4.19
>> MaskQueryOperationsBenchmark.testToLongInt   128   1       thrpt  ops/ms  1009426.292  1339834.833  1.32
>> MaskQueryOperations...
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Revert smoke test changes Marked as reviewed by psandoz (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27481#pullrequestreview-3435399248 From vlivanov at openjdk.org Fri Nov 7 19:14:11 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 19:14:11 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 11:04:36 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/opto/reachability.cpp line 441: > >> 439: >> 440: assert(OptimizeReachabilityFences, "required"); >> 441: assert(C->post_loop_opts_phase(), "required"); > > You could give the reason in the assert message ;) Those are essentially preconditions. I'm not a fan of duplicating information over and over again. Assert messages won't add much beyond what the condition already says unless the message explains why precondition is there. But that would explode message size. I can use `precond` here if you are against generic messages in asserts.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2505075363 From bmaillard at openjdk.org Fri Nov 7 19:27:04 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 7 Nov 2025 19:27:04 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop [v4] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 10:13:32 GMT, Roland Westrelin wrote: >> In: >> >> >> for (int i = 100; i < 1100; i++) { >> v += floatArray[i - 100]; >> Objects.checkIndex(i, longRange); >> } >> >> >> The int counted loop has both an int range check and a long range. The >> int range check is optimized first. Assertion predicates are inserted >> above the loop. One predicates checks that: >> >> >> init - 100 > >> >> The loop is then transformed to enable the optimization of the long >> range check. The loop is short running, so there's no need to create a >> loop nest. The counted loop is mostly left as is but, the loop's >> bounds are changed from: >> >> >> for (int i = 100; i < 1100; i++) { >> >> >> to: >> >> >> for (int i = 0; i < 1000; i++) { >> >> >> The reason for that the long range check transformation expects the >> loop to start at 0. >> >> Pre/main/post loops are created. Template Assertion predicates are >> added above the main loop. The loop is unrolled. Initialized assertion >> predicates are created. The one created from the condition: >> >> >> init - 100 > >> >> checks the value of `i` out of the pre loop which is 1. That check fails. >> >> The root cause of the failure is that when bounds of the counted loop >> are changed, template assertion predicates need to be updated with and >> adjusted init input. >> >> When the bounds of the loop are known, the assertion predicates can be >> updated in place. Otherwise, when the loop is speculated to be short >> running, the assertion predicates are updated when they are cloned. 
> > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into JDK-8366888 > - whitespaces > - review > - Merge branch 'master' into JDK-8366888 > - Update src/hotspot/share/opto/predicates.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/predicates.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - whitespaces > - fix Thanks for fixing this @rwestrel. I was not very familiar with assertion predicates before reviewing this, but the logic seems sound to me now. Nice work. src/hotspot/share/opto/loopnode.cpp line 1196: > 1194: // for (int = 0; i < stop - start; i+= stride) { ... } > 1195: // Template Assertion Predicates added so far were with an init value of start. They need to be updated with the new > 1196: // init value of 0: Not being super familiar with assertion predicates, I was a little bit confused at first. I would maybe add something along the lines of: Suggestion: // init value of 0. We want the OpaqueLoopInit node on the zero in order to be able to replace it when cloning the predicate. But feel free to ignore if you think this is obvious. ------------- Marked as reviewed by bmaillard (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/27250#pullrequestreview-3435756270 PR Review Comment: https://git.openjdk.org/jdk/pull/27250#discussion_r2505007523 From vlivanov at openjdk.org Fri Nov 7 19:36:23 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 19:36:23 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 11:36:07 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/opto/reachability.cpp line 437: > >> 435: // All RFs are replaced with edges from corresponding referents to interfering safepoints. >> 436: // Interfering safepoints are safepoint nodes which are reachable from the RF to its referent through CFG. >> 437: bool PhaseIdealLoop::eliminate_reachability_fences() { > > Why not call it `migrate_reachability_fences_to_safepoints`? Because you are not really eliminating them, just shifting the edges, right? `eliminate_reachability_fences()` means `eliminate_reachability_fence_nodes()`. We do eliminate reachability fences, but replace it with a safepoint-attached representation which is not equivalent (it's a lossy transformation). Remember: all initial reachability invariants hold right after parsing phase (all interfering safepoints have all corresponding referents as part of their debug info), but subsequent optimizations can break them. Once RF nodes are gone, it's not safe to perform the same level of optimizations anymore. So, if we migrate to safepoint-attached representation too early, it can reintroduce the problem we are trying to fix with this PR. I can call it `expand_reachability_fence_nodes()` (akin to macro node expansion) if you find it less confusing. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2505166904 From vlivanov at openjdk.org Fri Nov 7 19:57:12 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 19:57:12 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 10:56:24 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/opto/reachability.cpp line 384: > >> 382: // All encountered safepoints are recorded in safepoints list. >> 383: static void linear_traversal(Node* n, Node_Stack& worklist, VectorSet& visited, Node_List& safepoints) { >> 384: for (Node* ctrl = n; ctrl != nullptr; ctrl = ctrl->in(0)) { > > This "fast-forwarding" looks a bit like an optimization. Why not just add all CFG nodes on the worklist, would that not simplify the graph a little? Or did you find a case where this was really important? `linear_traversal` in `enumerate_interfering_sfpts` is an optimization for CFG traversal. Only Region nodes end up on worklist while the rest can be traversed in a linear fashion. > Why not just add all CFG nodes on the worklist, would that not simplify the graph a little? Then on each iteration you have to check for Region node and handle it specially. The perceived complexity just pops out in a different place. I started with a generic graph traversal version and then found it clearer to separate Region handling and avoid worklist for all the other CFG nodes. 
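The traversal shape discussed in this reply (follow the single control input linearly, and put only merge points on the worklist) can be sketched on a toy CFG in Java. This is a simplified model under stated assumptions, not the HotSpot code: `Node`, `isSafepoint`, and `collectSafepoints` are hypothetical names, every non-merge node has at most one control predecessor, and the walk stops at a designated end node standing in for the referent's control.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of an upward CFG walk; all names are hypothetical, not HotSpot APIs.
final class CfgWalkSketch {
    static final class Node {
        final String name;
        final boolean isSafepoint;
        final List<Node> preds = new ArrayList<>(); // control inputs; more than one only at merge points

        Node(String name, boolean isSafepoint, Node... preds) {
            this.name = name;
            this.isSafepoint = isSafepoint;
            this.preds.addAll(List.of(preds));
        }

        boolean isRegion() {
            return preds.size() > 1;
        }
    }

    // Collects safepoints on control paths from 'start' up to (but excluding) 'end'.
    static List<Node> collectSafepoints(Node start, Node end) {
        Set<Node> visited = new HashSet<>();
        visited.add(end);                       // end point: the walk never goes past it
        List<Node> safepoints = new ArrayList<>();
        Deque<Node> regions = new ArrayDeque<>();
        walkLinear(start, regions, visited, safepoints);
        while (!regions.isEmpty()) {
            Node region = regions.pop();
            for (Node pred : region.preds) {    // only merge points fan out
                walkLinear(pred, regions, visited, safepoints);
            }
        }
        return safepoints;
    }

    // Follows the single control input until a merge point, the end node,
    // or an already visited node is reached.
    private static void walkLinear(Node n, Deque<Node> regions, Set<Node> visited, List<Node> safepoints) {
        Node ctrl = n;
        while (ctrl != null) {
            if (!visited.add(ctrl)) {
                return;                         // already seen, or the end point
            }
            if (ctrl.isSafepoint) {
                safepoints.add(ctrl);
            }
            if (ctrl.isRegion()) {
                regions.push(ctrl);             // handled later by the caller's loop
                return;
            }
            ctrl = ctrl.preds.isEmpty() ? null : ctrl.preds.get(0);
        }
    }
}
```

The trade-off matches the one described in the reply: the worklist stays small because it only ever holds merge points, at the cost of splitting the walk into two pieces.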
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2505250129 From vlivanov at openjdk.org Fri Nov 7 20:02:15 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 20:02:15 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: <4a6494JZE-PGYqCA1jHtf7_-dQjxAObwhuA6VYnS9Lg=.66b13f4f-3f71-4a4b-8130-2f5b841f5ce7@github.com> References: <4a6494JZE-PGYqCA1jHtf7_-dQjxAObwhuA6VYnS9Lg=.66b13f4f-3f71-4a4b-8130-2f5b841f5ce7@github.com> Message-ID: <_1EqjnWT2m_tByqitwUPdR7Db0-gZU1YnHkTchCSqhc=.a05fc6cb-9fc9-406f-9988-299415b90fdd@github.com> On Fri, 7 Nov 2025 11:06:57 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/reachability.cpp line 461: >> >>> 459: if (!is_redundant_rf(rf, false /*rf_only*/)) { >>> 460: Node_List safepoints; >>> 461: enumerate_interfering_sfpts(rf, this, safepoints); >> >> Could this explode if we have a lot of RFs in the graph? Do we need a ResourceMark, or reuse the `safepoints` node list? >> >> Imagine something like this: >> >> referent >> if (flag1) { something } else { something } >> ... >> if (flag100) { something } else { something } >> if (x1) { RF(referent); } >> ... >> if (x100) { RF(referent); } >> >> So we would call `enumerate_interfering_sfpts` 100x, and then traverse the graph with about 100-400 nodes each time. You can see how that grows quadratically. Maybe that's fine for runtime, but is it also ok for memory? > > And what if we find a lot of SafePoints for each RF? Do we end up attaching quadratically many referent edges overall? In the worst case the number of new edges added is `(# of unique referents) * (# of safepoints)`. Multiple reachability fences can share the same referent. 
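The bound quoted above can be illustrated with a small Java model. This is a sketch under assumptions, not the C2 data structures: fences, referents, and safepoints are plain strings, and `attachReferents` is a hypothetical helper that records at most one edge per (safepoint, referent) pair, so many fences on the same referent do not multiply the edge count.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of referent edges attached to safepoints; all names are hypothetical.
final class ReachabilityEdgeBoundSketch {
    static final class Fence {
        final String referent;
        final List<String> interferingSafepoints;

        Fence(String referent, List<String> interferingSafepoints) {
            this.referent = referent;
            this.interferingSafepoints = interferingSafepoints;
        }
    }

    // Maps each safepoint to the set of referents it must keep alive.
    // The set removes duplicates, so repeated (safepoint, referent) pairs add no edge.
    static Map<String, Set<String>> attachReferents(List<Fence> fences) {
        Map<String, Set<String>> edges = new LinkedHashMap<>();
        for (Fence fence : fences) {
            for (String safepoint : fence.interferingSafepoints) {
                edges.computeIfAbsent(safepoint, k -> new LinkedHashSet<>()).add(fence.referent);
            }
        }
        return edges;
    }

    static int countEdges(Map<String, Set<String>> edges) {
        int total = 0;
        for (Set<String> referents : edges.values()) {
            total += referents.size();
        }
        return total;
    }
}
```

With 100 fences that all guard the same referent and interfere with the same three safepoints, this model records three edges, which is `1 * 3` in terms of the formula above.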
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2505270350 From chagedorn at openjdk.org Fri Nov 7 20:16:05 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Nov 2025 20:16:05 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v6] In-Reply-To: <1pRKczP7EhfqRnLV2KhyD48bEwDThYfaLNU8FKdTP8A=.9f87dcee-0025-48e4-946f-6affd4e05728@github.com> References: <1pRKczP7EhfqRnLV2KhyD48bEwDThYfaLNU8FKdTP8A=.9f87dcee-0025-48e4-946f-6affd4e05728@github.com> Message-ID: On Fri, 7 Nov 2025 16:22:23 GMT, Benoît Maillard wrote: >> This PR addresses a missed optimization opportunity in `PhaseIterGVN`. The missed optimization is the simplification of redundant conversion patterns of the shape `ConvX2Y->ConvY2X->ConvX2Y`. >> >> This optimization pattern is implemented as an ideal optimization on `ConvX2Y` nodes. Because it depends on the input of the input of the node in question, we need to have an appropriate notification mechanism in `PhaseIterGVN::add_users_of_use_to_worklist`. The notification for this pattern was added in [JDK-8359603](https://bugs.openjdk.org/browse/JDK-8359603). >> >> However, that fix was based on the wrong assumption that in `PhaseIterGVN::add_users_of_use_to_worklist`, argument `n` is already the optimized node. However, in some cases this argument is actually the node that is about to get replaced. >> >> This happens for example in `PhaseIterGVN::transform_old`. If we find that node `k` returned by `Ideal` actually already exists by calling `hash_find_insert(k)`, we call `add_users_to_worklist(k)`. >> As we replace node `k` with `i`, and `i` has a different opcode than `k`, we cannot use the opcode of `k` to detect the redundant conversion pattern. >> >> ```c++ >> ... 
>> // Global Value Numbering >> i = hash_find_insert(k); // Check for pre-existing node >> if (i && (i != k)) { >> // Return the pre-existing node if it isn't dead >> NOT_PRODUCT(set_progress();) >> add_users_to_worklist(k); >> subsume_node(k, i); // Everybody using k now uses i >> return i; >> } >> ... >> ``` >> >> The bug was quite intermittent and only showed up in some cases with `-XX:+StressIGVN`. >> >> ### Proposed Fix >> >> We make the detection of the pattern less specific by only looking at the opcode of the user of `n`, and not directly the opcode of `n`. This is consistent with the detection of other patterns in `PhaseIterGVN::add_users_of_use_to_worklist`. >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8369646) >> - [x] tier1-3, plus some internal testing >> - [x] Added a second run for the existing test with `-XX:+StressIGVN` and a fixed stress seed >> >> Thank you for reviewing! > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Refine comments Looks good, thanks for the updated comments! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27900#pullrequestreview-3436151660 From kvn at openjdk.org Fri Nov 7 22:49:04 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Nov 2025 22:49:04 GMT Subject: RFR: 8280469: C2: CHA support for interface calls when inlining through method handle linker [v2] In-Reply-To: <2GnXbYUICH6o4udyZQEqlCL6-jz9-CzSnUrZmkSbP4E=.a1d35eb5-5a62-4aff-9544-e1e0716761db@github.com> References: <2GnXbYUICH6o4udyZQEqlCL6-jz9-CzSnUrZmkSbP4E=.a1d35eb5-5a62-4aff-9544-e1e0716761db@github.com> Message-ID: On Mon, 3 Nov 2025 18:38:13 GMT, Vladimir Ivanov wrote: >> Expand the optimization for interface calls introduced by [JDK-6986483](https://bugs.openjdk.org/browse/JDK-6986483) to calls through `MethodHandle.linkToInterface`. 
>> >> The implementation is straightforward except the fact that symbolic information is lost during `MemberName` resolution. The fix uses declaring class instead, but it's more conservative than what is done for invokeinterface case. >> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > naming Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28094#pullrequestreview-3436753929 From kvn at openjdk.org Fri Nov 7 22:49:07 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Nov 2025 22:49:07 GMT Subject: RFR: 8280469: C2: CHA support for interface calls when inlining through method handle linker [v2] In-Reply-To: References: <2GnXbYUICH6o4udyZQEqlCL6-jz9-CzSnUrZmkSbP4E=.a1d35eb5-5a62-4aff-9544-e1e0716761db@github.com> Message-ID: On Tue, 4 Nov 2025 19:34:54 GMT, Vladimir Ivanov wrote: >> May it should be in @requires ? > >> Should you check that C2 is enabled? > > The test has `@requires !vm.graal.enabled`. Do you prefer to have it spelled as `@requires vm.compiler2.enabled` instead? > >> May it should be in @requires? > > Original test cases apply to both C1 and C2. I could introduce a separate test for MH invoker cases, but IMO keeping relevant test logic co-located is preferred compared to avoiding a configuration check at runtime. ok ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28094#discussion_r2505773293 From kvn at openjdk.org Fri Nov 7 22:54:01 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Nov 2025 22:54:01 GMT Subject: RFR: 8280469: C2: CHA support for interface calls when inlining through method handle linker [v2] In-Reply-To: References: <2GnXbYUICH6o4udyZQEqlCL6-jz9-CzSnUrZmkSbP4E=.a1d35eb5-5a62-4aff-9544-e1e0716761db@github.com> Message-ID: On Fri, 7 Nov 2025 22:45:53 GMT, Vladimir Kozlov wrote: >>> Should you check that C2 is enabled? 
>> >> The test has `@requires !vm.graal.enabled`. Do you prefer to have it spelled as `@requires vm.compiler2.enabled` instead? >> >>> May it should be in @requires? >> >> Original test cases apply to both C1 and C2. I could introduce a separate test for MH invoker cases, but IMO keeping relevant test logic co-located is preferred compared to avoiding a configuration check at runtime. > > ok We can have VM build without C2 or only C1 is enabled (TieredStopAtLevel flag). But based on your response the test should work even when only C1 is used. So it is fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28094#discussion_r2505779693 From vlivanov at openjdk.org Fri Nov 7 23:20:29 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 23:20:29 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v23] In-Reply-To: References: Message-ID: > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. 
In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." 
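For readers less familiar with the API under discussion, here is a minimal usage sketch of `Reference.reachabilityFence()`, following the try/finally pattern from its Javadoc. The `Resource` class, its `handle` field, and the static `TABLE` are hypothetical stand-ins for an object whose cleanup could otherwise run while one of its methods is still using state derived from it. The sketch says nothing about how C2 compiles the call.

```java
import java.lang.ref.Reference;

// Usage sketch only; Resource, handle, and TABLE are hypothetical.
final class ReachabilityFenceUsageSketch {
    // Stand-in for external state that a cleaner or finalizer might release.
    static final int[] TABLE = {10, 20, 30};

    static final class Resource {
        private final int handle = 1;

        int read() {
            try {
                // After 'handle' is loaded, 'this' is no longer needed by the code below,
                // so without the fence the object could become unreachable mid-call.
                return lookup(handle);
            } finally {
                // Keeps 'this' strongly reachable until this point.
                Reference.reachabilityFence(this);
            }
        }
    }

    static int lookup(int handle) {
        return TABLE[handle];
    }

    public static void main(String[] args) {
        System.out.println(new Resource().read());
    }
}
```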
> > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: updates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25315/files - new: https://git.openjdk.org/jdk/pull/25315/files/01d0b175..3fb6e205 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=21-22 Stats: 87 lines in 8 files changed: 23 ins; 5 del; 59 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From vlivanov at openjdk.org Fri Nov 7 23:20:31 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 23:20:31 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 09:59:29 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/opto/loopnode.cpp line 5147: > >> 5145: assert(do_expensive_nodes || do_optimize_reachability_fences, "why are we here?"); >> 5146: // Use a change to optimize reachability fence nodes irrespective of >> 5147: // whether loop optimizations are performed or not. > > What do you mean by `Use a change`? Fixed. > src/hotspot/share/opto/reachability.cpp line 186: > >> 184: return false; // uncommon traps are exit points >> 185: } >> 186: return true; > > Suggestion: > > // By default, we return a conservative answer, and assume it could interfere. > return true; I updated the comment. I wouldn't say it produces a conservative answer, since the query is for "interfering safepoint candidate" and RF-agnostic. 
It becomes accurate when applied during CFG traversal. > src/hotspot/share/opto/reachability.cpp line 446: > >> 444: ResourceMark rm; >> 445: Unique_Node_List redundant_rfs; >> 446: Node_List worklist; > > Looks like this `worklist` is really a list of `` pairs. > > Consider making it a `GrowableArray` with a pair type and also consider giving it a more descriptive name, calling it by whatever we collect in it. Maybe `sfpt_referent_pairs`? > > Also: why do we need this worklist, and not just attach the referent to the sfpt eagerly? And: can it be that we add the same pair multiple times? Is that intentional? Sounds good, I applied your suggestion. > Also: why do we need this worklist, and not just attach the referent to the sfpt eagerly? There's an assert to ensure there are no non-debug edges on discovered safepoints. It'd be harder to ensure the invariant if safepoints are modified during the analysis. > And: can it be that we add the same pair multiple times? Is that intentional? Good catch. I added the missing check guarding `add_req` call. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2505816538 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2505815407 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2505815783 From vlivanov at openjdk.org Fri Nov 7 23:20:33 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 23:20:33 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: <4a6494JZE-PGYqCA1jHtf7_-dQjxAObwhuA6VYnS9Lg=.66b13f4f-3f71-4a4b-8130-2f5b841f5ce7@github.com> References: <4a6494JZE-PGYqCA1jHtf7_-dQjxAObwhuA6VYnS9Lg=.66b13f4f-3f71-4a4b-8130-2f5b841f5ce7@github.com> Message-ID: On Fri, 7 Nov 2025 10:34:18 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/reachability.cpp line 383: >> >>> 381: // Linearly traverse CFG upwards starting at n until first merge point. 
>>> 382: // All encountered safepoints are recorded in safepoints list. >>> 383: static void linear_traversal(Node* n, Node_Stack& worklist, VectorSet& visited, Node_List& safepoints) { >> >> The second comment line does not sound accurate; we don't collect ALL, only the candidates. Maybe we can find a better method name, and even remove that comment because of it? >> >> Given the more useful sub query `is_interfering_sfpt_candidate`, I think we could name this method something like `collect_interfering_sfpt_candidates`. Or is it very important that this is a linear traversal vs some other traversal we could choose from? > > Hmm, but this here is only a component of the `enumerate_interfering_sfpts` below, which has essentially that name. > > So maybe it should be `collect_interfering_sfpt_candidates_for_node` here and just `collect_interfering_sfpt_candidates` below? I updated the comment and did some renaming. `enumerate_interfering_sfpts_linear_traversal` computes the set of interfering safepoints for some RF node, not just candidates. It's only `is_interfering_sfpt_candidate()` which does some filtering ignoring the context (the RF node a safepoint can interfere with). >> src/hotspot/share/opto/reachability.cpp line 409: >> >>> 407: visited.set(referent_ctrl->_idx); // end point >>> 408: >>> 409: Node_Stack stack(0); >> >> `ResourceMark`? > > We call this many times, so not sure if this could explode somehow? It's hard to place a nested ResourceMark because there are dynamically reallocated data structures with different life cycles. Instead, I moved temporary data structure allocations up in the call chain and made them shared across all RF nodes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2505815482 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2505815512 From vlivanov at openjdk.org Fri Nov 7 23:20:34 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 23:20:34 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: <4a6494JZE-PGYqCA1jHtf7_-dQjxAObwhuA6VYnS9Lg=.66b13f4f-3f71-4a4b-8130-2f5b841f5ce7@github.com> Message-ID: On Fri, 7 Nov 2025 11:01:24 GMT, Emanuel Peter wrote: >> And why not use a `Unique_Node_List`, to unite the `visited` and `stack` into a single `worklist`? > > Ah. Right, at first I did not see that you are using a stack, which is not a node list. It also has the idx. > > In my experience, this usually creates code that is a little harder to read. I prefer using a `Unique_Node_List`, and then just traverse over all ctrl inputs, and add those to the worklist. You have to special case Region, and all other CFG nodes only have ctrl on `in(0)`. It tends to nicely flatten the whole BFS traversal into a small loop. But maybe it does use just a bit more memory than your traversal. > > Just an idea, I can probably find a way to wrap my head around this approach here too ;) Unified naming. > In my experience, this usually creates code that is a little harder to read. Well, in my experience graph traversal implementation in C2 is way too verbose most of the time. I'd prefer standard utility methods to traverse relevant parts of the graph, especially since we can use lambdas now. It would make it much easier to reason about it at use sites while making it more beneficial to invest into microoptimizations for different types of traversals. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2505815641 From vlivanov at openjdk.org Fri Nov 7 23:20:35 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Nov 2025 23:20:35 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: <1vJqFydubIzTlZ-5PRTXHAZyuqEPHJpPQQ71ayUrlKI=.ba1fefc3-2654-40e8-9451-25b369f4b9db@github.com> References: <1vJqFydubIzTlZ-5PRTXHAZyuqEPHJpPQQ71ayUrlKI=.ba1fefc3-2654-40e8-9451-25b369f4b9db@github.com> Message-ID: On Fri, 7 Nov 2025 15:04:23 GMT, Emanuel Peter wrote: >> I suppose yes, because no GC could have happened since, right? > > Could be worth writing that down in a code comment. > So it does not really make sense to call it a "redundant rf". After migration takes place, original RF node does become redundant (but not necessarily in a way detected by `is_redundant_rf`). Renamed it to `for_removal`. Hope it makes it clearer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2505815955 From wenanjian at openjdk.org Sat Nov 8 00:35:27 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Sat, 8 Nov 2025 00:35:27 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v18] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. 
Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: save branch jump and add some comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/a3bd1ff1..ec66035b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=16-17 Stats: 105 lines in 1 file changed: 38 ins; 40 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From wenanjian at openjdk.org Sat Nov 8 00:35:28 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Sat, 8 Nov 2025 00:35:28 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v15] In-Reply-To: References: Message-ID: On Mon, 3 Nov 2025 03:07:01 GMT, Fei Yang wrote: >> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> modify some var names > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2623: > >> 2621: __ rev8(tmp1, tmp1); >> 2622: __ sd(tmp1, Address(counter)); >> 2623: } > > Can you add some code comment and maybe assertions about the input registers? Like: > > // Big-endian 128-bit + 64-bit -> 128-bit addition. 
> void be_inc_counter_128(Register counter, Register tmp1, Register tmp2) { > assert_different_registers(counter, tmp1, tmp2, t0); > __ ld(tmp1, Address(counter, 8)); // Load 128-bits from counter > __ ld(tmp2, Address(counter)); > __ rev8(tmp1, tmp1); // Convert big-endian to little-endian > __ rev8(tmp2, tmp2); > __ addi(tmp1, tmp1, 1); > __ seqz(t0, tmp1); // Check for result overflow > __ add(tmp2, tmp2, t0); // Add 1 if overflow otherwise 0 > __ rev8(tmp1, tmp1); // Convert little-endian to big-endian > __ rev8(tmp2, tmp2); > __ sd(tmp1, Address(counter, 8)); // Store 128-bits to counter > __ sd(tmp2, Address(counter)); > } > > > PS: My local test show that this test "com/sun/crypto/provider/Cipher/AEAD/AEADBufferTest.java" is failing with this change. We need to resolve that. thanks, I have fixed it ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2505912451 From wenanjian at openjdk.org Sat Nov 8 00:38:08 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Sat, 8 Nov 2025 00:38:08 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v17] In-Reply-To: References: <9uwW544WzaCZ1u1LwmNZW4_8mSyxQK9aPV8Uv54NtRA=.4facae4d-76e7-4780-a34c-c7d5ee114503@github.com> Message-ID: <5anryOMDjUCg1uSq5CID97nO8_unrd2P2LLAN9VZrUY=.5c174f05-af78-467f-a63c-5fd5620b88f9@github.com> On Fri, 7 Nov 2025 04:09:04 GMT, Fei Yang wrote: >> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> fix a jtreg problem > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2761: > >> 2759: __ mv(t0, 52); >> 2760: __ blt(keylen, t0, L_aes128_loop_next); >> 2761: __ beq(keylen, t0, L_aes192_loop_next); > > I think these branches in the loop could be saved if we do versioning according to keylen. Then we only need to do two branches on entry to choose the right version. And this also applies in the case of loadkeys. 
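The `be_inc_counter_128` helper suggested earlier in this review can be modeled in Java to make the carry handling easier to follow. This is an illustrative sketch, not the stub code: `increment128` is a hypothetical name, and `ByteBuffer` in big-endian order stands in for the `ld`/`rev8`/`sd` sequence on a little-endian machine.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Java model of a 128-bit big-endian counter increment; names are hypothetical.
final class BigEndianCounterSketch {
    // Adds one to the 16-byte big-endian counter in place, wrapping around on overflow.
    static void increment128(byte[] counter) {
        if (counter.length != 16) {
            throw new IllegalArgumentException("counter must be 16 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(counter).order(ByteOrder.BIG_ENDIAN);
        long lo = buf.getLong(8);   // least significant 64 bits live at offset 8
        long hi = buf.getLong(0);   // most significant 64 bits live at offset 0
        lo += 1;
        if (lo == 0) {              // low half wrapped: same role as seqz + add in the stub
            hi += 1;
        }
        buf.putLong(8, lo);
        buf.putLong(0, hi);
    }
}
```

The only subtle step is the carry: the low half wraps to zero exactly when it was all ones before the increment, so testing the result for zero is enough to decide whether the high half needs to be bumped.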
Good idea. I have changed it, and it passes the tests in _test/hotspot/jtreg/compiler/codegen/aes/_ and _test/jdk/com/sun/crypto/_ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2505914298 From duke at openjdk.org Sat Nov 8 00:38:16 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Sat, 8 Nov 2025 00:38:16 GMT Subject: RFR: 8369147: Various issues with new tests added by JDK-8316694 [v3] In-Reply-To: References: <8UVgF0vLaZhY7zvKWPBLXoU9p71SevknlPYC5LPsfCo=.8531cacc-ff26-42be-a517-88ff87628d29@github.com> <0ZFCDTQAFKjZ5XDtUXWsvC4v0lyot7Trdf3wlIxrb-M=.988407f3-6353-447a-98d6-f890b795fd1d@github.com> Message-ID: On Mon, 27 Oct 2025 18:08:40 GMT, Chad Rakoczy wrote: >>> @chadrako, is PR ready for testing now? >> >> Yes > >> @chadrako I think my suggestion was not correct. We should revert back to your first changes for `@requires`. Original code was correct and only `serviceability/jvmti/NMethodRelocation/NMethodRelocationTest.java` missed it. > > Since the tests get run with different GCs anyways I don't think we need to explicitly require the GC that they run with and just have one test config > FTR, even with this fix, we still see failures: [JDK-8371121](https://bugs.openjdk.org/browse/JDK-8371121), [JDK-8371046](https://bugs.openjdk.org/browse/JDK-8371046), [JDK-8369150](https://bugs.openjdk.org/browse/JDK-8369150). @chadrako It would be great if you could prioritize fixing these remaining issues, as the failures cause quite some noise in our testing. Thanks! @TobiHartmann I apologize for the noise and appreciate you bringing these to my attention. I am prioritizing the fixes and will keep the JBS issues updated with my progress. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27659#issuecomment-3505482325 From vlivanov at openjdk.org Sat Nov 8 00:39:15 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 8 Nov 2025 00:39:15 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 10:06:17 GMT, Emanuel Peter wrote: > Can you say why until after loop opts are over only RF are considered? It is intended to ensure a RF is superseded by another RF. We can't guarantee that non-RF users won't go away any time before loop opts are over, so there won't be anything keeping the referent alive. But now I don't see why a similar problem doesn't affect RFs as well. While RFs don't go away on their own, they can be pruned as part of dead code elimination. I'll rethink how RF redundancy is defined. > How does this play with allocation elimination etc? This method works on RF nodes and, hence, it is applicable until RF nodes are eliminated after loop opts are over. RF nodes are ignored during scalarization attempts, so are reachability edges. When an allocation is scalarized, either all RFs having it as a referent or all of it's reachability edges are removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2505914924 From wenanjian at openjdk.org Sat Nov 8 00:46:42 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Sat, 8 Nov 2025 00:46:42 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v19] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. 
Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: delete the zvbb assert and some assembler support because no use ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/ec66035b..e415fbd0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=17-18 Stats: 11 lines in 3 files changed: 0 ins; 9 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From jbhateja at openjdk.org Sat Nov 8 02:22:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 8 Nov 2025 02:22:17 GMT Subject: RFR: 8370691: Add new HalffloatVector type and enable intrinsification of float16 vector operations Message-ID: Add new HalffloatVector type and corresponding concrete vector classes in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. - Add necessary inline expander support. - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. - Use existing Float16 vector IR and backend support. - Extended the existing VectorAPI JTREG test suite for the newly added HalffloatVector operations. The idea here is to first be at par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). The following are the performance numbers for some of the selected HalffloatVector benchmarking kernels compared to equivalent Float16OperationsBenchmark kernels. [inline image not preserved in the archive] Initial RFP[1] was floated on the panama-dev mailing list. Kindly review the draft PR and share your feedback. 
Best Regards, Jatin [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html ------------- Commit messages: - Some cleanups - Fix some JTREG failures - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Revamped JTreg test generation and bug fixes - Cleanups - Removing redundant warmup constraint - Adding a HalffloatVectorBenchmark having benchmarking kernel parity with Float16OperationsBenchmark - Adding IR Framework test - Fix JTREG failures - Build failure fixes - ... and 1 more: https://git.openjdk.org/jdk/compare/e34a8318...c60d533c Changes: https://git.openjdk.org/jdk/pull/28002/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28002&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370691 Stats: 66541 lines in 134 files changed: 54467 ins; 460 del; 11614 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From vlivanov at openjdk.org Sat Nov 8 02:53:03 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 8 Nov 2025 02:53:03 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation [v3] In-Reply-To: References: Message-ID: <-N7FjZgj5rBDPTJFwoosPSW-qWtiQzLm53BX_M5xCZs=.ca2f92fe-f35d-47c1-8778-c4a103d4ccb0@github.com> On Thu, 30 Oct 2025 18:23:46 GMT, Hannes Greule wrote: >> The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. >> >> Please let me know what you think. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > review Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27886#pullrequestreview-3437151709 From hgreule at openjdk.org Sat Nov 8 02:53:04 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Sat, 8 Nov 2025 02:53:04 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: <7u2xtXRTNr7N0wlHDhY9oOQvMobaPHHxxNz4mfYsork=.f8da7cf8-9fa8-4dea-815f-9f9301d4d451@github.com> References: <7u2xtXRTNr7N0wlHDhY9oOQvMobaPHHxxNz4mfYsork=.f8da7cf8-9fa8-4dea-815f-9f9301d4d451@github.com> Message-ID: On Mon, 27 Oct 2025 18:39:07 GMT, Vladimir Ivanov wrote: >>> If we want to to keep expanded shape while being able to compute its type as if it were the original node, then a new flavor of Cast node may help. The one which keeps the node type and its inputs and can run Value() as if it were the original node. >> >> This is what we'd like to achieve, yes. This PR is basically just a simple workaround. So I guess it comes down to: >> Do we want to have a simple workaround for common cases? And if so, >> 1. Do we want to use this delay mechanism, or >> 2. Do we want to use Cast nodes >> >> I assume that the proper solution in form of a Cast-like node requires some more effort, and I'm not sure if anyone has the resources to work on that in the near future. >> >> >>> What I don't know: how does that interact with other IGVN optimizations, especially those that want to pattern match specific nodes? Inserting such special cast nodes could interrupt `Ideal` optimizations, current pattern matching would not know how to deal with it. Probably it is not a big issue, but I'm not sure. >> >> This isn't much different from methods like `uncast` I think. New methods like `get_in_of_type(index, opcode)` could help in such cases (check for the different ins of the cast), and maybe be even useful for other code in general. > >> This PR is basically just a simple workaround. 
> > I'm not against the proposed solution, just want to be sure we know its limitations and have a proper tool to avoid such bugs in the future. @iwanowww I hope I properly addressed your suggestions, could you take another look? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3493125429 From vlivanov at openjdk.org Sat Nov 8 03:24:09 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 8 Nov 2025 03:24:09 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v6] In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 06:20:34 GMT, Xiaohong Gong wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: >> >> - Add more comments for IRs and added method >> - Merge branch 'jdk:master' into JDK-8351623-sve >> - Merge 'jdk:master' into JDK-8351623-sve >> - Address review comments >> - Refine IR pattern and clean backend rules >> - Fix indentation issue and move the helper matcher method to header files >> - Merge branch jdk:master into JDK-8351623-sve >> - 8351623: VectorAPI: Add SVE implementation of subword gather load operation > > Hi @iwanowww , @PaulSandoz , and @eme64 : > > I've recently completed a prototype that moves the implementation into the Java API level: > [Refactor subword gather API in Java](https://github.com/XiaohongGong/jdk/pull/8). > > Do you think it would be a good time to open a draft PR for easier review? > > Below is a brief summary of the changes compared with the previous version. > > **Main idea** > > - Invoke VectorSupport.loadWithMap() multiple times in Java when needed, where each call handles a single vector gather load. > - In the compiler, the gathered result is represented as an int vector and then cast to the original subword vector species. Cross-lane shifting aligns the elements correctly.
> - The partial results are merged in Java using the Vector.or() API. > > **Advantages** > > - No need to pass all vector indices to HotSpot. > - The design is platform agnostic. > > **Limitations** > > - The Java implementation is less clean to accommodate compiler optimizations. > - Compiler changes remain nontrivial due to required vector/mask casting, resizing, and slicing. > - Additional IR ideal and match rules are needed for optimal SVE code generation. > - The API's performance will **degrade significantly** (about 30% ~ 50%) on platforms that **do not** support compiler intrinsification. Since a single previous API call is now split into multiple calls that cannot be intrinsified, the overhead of generating multiple vector objects in pure Java can be substantial. Does this impact matter? > > I plan to rebase and update the compiler-change PR using the same node and match rules as well, so we can clearly compare both approaches. > > Any thoughts or feedback would be much appreciated. Thanks so much! > > Best Regards, > Xiaohong Nice work, @XiaohongGong! I haven't closely looked at the patch yet, but I very much like the general direction. I don't consider performance regression in default Java implementation a big deal. In the future, we can rethink how default implementations are handled for operations which lack hardware/VM intrinsic support. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3505712211 From rrich at openjdk.org Sat Nov 8 05:26:01 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Sat, 8 Nov 2025 05:26:01 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v2] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 16:33:10 GMT, Martin Doerr wrote: >> Indeed but what if that ever changes?? That case will be silently unhandled. > > We will run into ShouldNotReachHere(); // Unimplemented (line 2000) I see thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2506269789 From qamai at openjdk.org Sat Nov 8 11:44:01 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 8 Nov 2025 11:44:01 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply set wrong type, led to wrong constant folding of phi [v2] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 06:07:22 GMT, Emanuel Peter wrote: >> In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed an edge case. >> >> The issue is when we also (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds. >> >> I now limit the modification to cases where the `phi` used to be for scalars, but now is for vectors. In those cases we should not have a constant. For good measure, I also added a corresponding assert. >> >> --------- >> >> Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. >> I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add diagnostic flag for product build Tbh I still feel a little uneasy with this, what if in the future we try to vectorize to a `long` sometimes, too? Is there anything stopping us from creating a new `Phi` for the `VTransformLoopPhiNode` instead?
------------- PR Comment: https://git.openjdk.org/jdk/pull/28113#issuecomment-3506472710 From qamai at openjdk.org Sat Nov 8 14:08:04 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 8 Nov 2025 14:08:04 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation [v3] In-Reply-To: References: Message-ID: <-a0c1_ZbQ9m_eDerJCfPq6I-rhlcvcYXkeVKcX6V30g=.d448c7d6-697a-422f-a65b-d6ec97709f69@github.com> On Thu, 30 Oct 2025 18:23:46 GMT, Hannes Greule wrote: >> The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. >> >> Please let me know what you think. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > review Since this is a transformation ordering issue, I think it is best to serialize the transformations. My proposal is to have a separate `PhaseDivConstant` that runs after some rounds of `IterGVN` and before the loop opts. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3506577422 From epeter at openjdk.org Sat Nov 8 15:04:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 8 Nov 2025 15:04:05 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply set wrong type, led to wrong constant folding of phi [v2] In-Reply-To: References: Message-ID: On Sat, 8 Nov 2025 11:41:15 GMT, Quan Anh Mai wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add diagnostic flag for product build > > Tbh I still feel a little uneasy with this, what if in the future we try to vectorize to a `long` sometimes, too? Is there anything stopping us from creating a new `Phi` for the `VTransformLoopPhiNode` instead? @merykitty Would you prefer if I took the union of the entry edge and backedge type? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28113#issuecomment-3506616584 From qamai at openjdk.org Sat Nov 8 16:02:02 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 8 Nov 2025 16:02:02 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply set wrong type, led to wrong constant folding of phi [v2] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 06:07:22 GMT, Emanuel Peter wrote: >> In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed an edge case. >> >> The issue is when we also (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds. >> >> I now limit the modification to cases where the `phi` used to be for scalars, but now is for vectors. In those cases we should not have a constant. For good measure, I also added a corresponding assert. >> >> --------- >> >> Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. >> I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add diagnostic flag for product build That may be preferable. Or we can track the type in `VTransformLoopPhiNode` and change it when we decide to do the transformation, at the same time as other nodes in the loop?
I see that `VTransformLoopPhiNode::apply` returns a `make_scalar`, which seems confusing if it can be a vector, too. Or we can have `VTransformScalarLoopPhi` and `VTransformVectorLoopPhi` as separate classes, but it seems like it will result in some unnecessary duplication. These are just suggestions, and my expertise in the superword vectorizer is definitely lacking; please make the decision that you think is best. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28113#issuecomment-3506668036 From wenanjian at openjdk.org Sat Nov 8 16:13:21 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Sat, 8 Nov 2025 16:13:21 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v20] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which implements the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: clean code and optimize big endian increase ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/e415fbd0..1f5ba9cd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=18-19 Stats: 50 lines in 1 file changed: 16 ins; 1 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From duke at openjdk.org Sun Nov 9 09:34:52 2025 From: duke at openjdk.org (Tobias Hotz) Date: Sun, 9 Nov 2025 09:34:52 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v10] In-Reply-To: References: Message-ID: > This PR improves the value of integer division nodes. > Currently, we only emit a good type if either input is constant.
But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. > We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. > This also cleans up and unifies the code paths for DivINode and DivLNode. > I've added some tests to validate the optimization. Without the changes, some of these tests fail. Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: Move Test to compiler.igvn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26143/files - new: https://git.openjdk.org/jdk/pull/26143/files/45a91bd0..89e60231 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=08-09 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26143.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26143/head:pull/26143 PR: https://git.openjdk.org/jdk/pull/26143 From xgong at openjdk.org Mon Nov 10 02:22:16 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 10 Nov 2025 02:22:16 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v6] In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 06:20:34 GMT, Xiaohong Gong wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains eight commits: >> >> - Add more comments for IRs and added method >> - Merge branch 'jdk:master' into JDK-8351623-sve >> - Merge 'jdk:master' into JDK-8351623-sve >> - Address review comments >> - Refine IR pattern and clean backend rules >> - Fix indentation issue and move the helper matcher method to header files >> - Merge branch jdk:master into JDK-8351623-sve >> - 8351623: VectorAPI: Add SVE implementation of subword gather load operation > > Hi @iwanowww , @PaulSandoz , and @eme64 : > > I've recently completed a prototype that moves the implementation into the Java API level: > [Refactor subword gather API in Java](https://github.com/XiaohongGong/jdk/pull/8). > > Do you think it would be a good time to open a draft PR for easier review? > > Below is a brief summary of the changes compared with the previous version. > > **Main idea** > > - Invoke VectorSupport.loadWithMap() multiple times in Java when needed, where each call handles a single vector gather load. > - In the compiler, the gathered result is represented as an int vector and then cast to the original subword vector species. Cross-lane shifting aligns the elements correctly. > - The partial results are merged in Java using the Vector.or() API. > > **Advantages** > > - No need to pass all vector indices to HotSpot. > - The design is platform agnostic. > > **Limitations** > > - The Java implementation is less clean to accommodate compiler optimizations. > - Compiler changes remain nontrivial due to required vector/mask casting, resizing, and slicing. > - Additional IR ideal and match rules are needed for optimal SVE code generation. > - The API's performance will **degrade significantly** (about 30% ~ 50%) on platforms that **do not** support compiler intrinsification. Since a single previous API call is now split into multiple calls that cannot be intrinsified, the overhead of generating multiple vector objects in pure Java can be substantial. Does this impact matter?
> > I plan to rebase and update the compiler-change PR using the same node and match rules as well, so we can clearly compare both approaches. > > Any thoughts or feedback would be much appreciated. Thanks so much! > > Best Regards, > Xiaohong > Nice work, @XiaohongGong! I haven't closely looked at the patch yet, but I very much like the general direction. I don't consider performance regression in default Java implementation a big deal. In the future, we can rethink how default implementations are handled for operations which lack hardware/VM intrinsic support. Thank you very much for your input so far; it's been extremely helpful. I have an additional concern regarding the `slice` operation for both masks and vectors. While `Vector.slice` exists and works well for vector merging, there's currently no equivalent operation for vector masks when using the masked gather API and splitting is required. Both adding such an API and not adding it come with trade-offs: 1) Adding a slice API for masks: This would likely make the compiler code cleaner. However, it would also increase patch complexity and could present performance issues on certain architectures such as SVE. Optimizing codegen for this path might require significant additional compiler work. 2) Not adding a slice API for masks: As @PaulSandoz suggested, we could move mask slicing to the compiler by passing an `origin` to the intrinsic, and similarly move vector slice operations to the compiler. This approach, however, introduces another issue: `VectorSlice` with a constant index is not universally supported across all architectures and vector species; see the X86 limitation ([VectorSlice details](https://github.com/openjdk/jdk/pull/24104/files#diff-d6a3624f0f0af65a98a47378a5c146eed5016ca09b4de1acd0a3acc823242e82R1726)). While it can be implemented with rearrange/blend as alternatives, they would complicate the compiler code.
If left unsupported, gather API intrinsification would fail and fall back to Java implementation, causing significant performance regression I guess. I am unsure of the best approach for implementing the `slice` operations cleanly and efficiently, so I would greatly appreciate any additional feedback or suggestions on this topic. Thank you again for your help! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3509156972 From vlivanov at openjdk.org Mon Nov 10 02:27:15 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 10 Nov 2025 02:27:15 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v24] In-Reply-To: References: Message-ID: > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. 
Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Revise RF redunancy & auto-boxed primitives handling Cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25315/files - new: https://git.openjdk.org/jdk/pull/25315/files/3fb6e205..842c3d61 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=22-23 Stats: 203 lines in 6 files changed: 40 ins; 134 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From vlivanov at openjdk.org Mon Nov 10 02:40:08 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 10 Nov 2025 02:40:08 
GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: Message-ID: <81c8AWB-Z0_xjM-sjnCYwI-gmNPOYH2MnapbYTPctiM=.c75465ef-e819-4d52-88d3-3a4c398957ab@github.com> On Sat, 8 Nov 2025 00:36:12 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/opto/reachability.cpp line 88: >> >>> 86: // RF is redundant for some referent oop when the referent has another user which keeps it alive across the RF. >>> 87: // In terms of dominance relation it can be formulated as "a referent has a user which is dominated by the redundant RF". >>> 88: // Until loop opts are over, only RF nodes are considered as usages (controlled by rf_only flag). >> >> Can you say why until after loop opts are over only RF are considered? >> >> How does this play with allocation elimination etc? What if we run this after loop opts where we still have the allocation, but before the elimination. And then we eventually lose all referents. Could something like that happen? > >> Can you say why until after loop opts are over only RF are considered? > > It is intended to ensure a RF is superseded by another RF. We can't guarantee that non-RF users won't go away any time before loop opts are over, so there won't be anything keeping the referent alive. > > But now I don't see why a similar problem doesn't affect RFs as well. While RFs don't go away on their own, they can be pruned as part of dead code elimination. > > I'll rethink how RF redundancy is defined. > >> How does this play with allocation elimination etc? > > This method works on RF nodes and, hence, it is applicable until RF nodes are eliminated after loop opts are over. > > RF nodes are ignored during scalarization attempts, so are reachability edges. When an allocation is scalarized, either all RFs having it as a referent or all of it's reachability edges are removed. I decided to play it safe and got rid of dominance-based redundancy checks until expansion takes place. 
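For readers less familiar with the API this thread is about, the following is a minimal sketch of how `Reference.reachabilityFence()` is commonly used in Java source. The `NativeResource` class and the `useHandle` method are hypothetical names made up for illustration; nothing here reflects the C2 implementation or the redundancy analysis discussed above.

```java
import java.lang.ref.Reference;

public class ReachabilityFenceSketch {
    // Hypothetical owner of some external handle (illustration only).
    static final class NativeResource {
        private final long handle;
        NativeResource(long handle) { this.handle = handle; }
        long handle() { return handle; }
    }

    // Works on the raw handle while keeping the owning object strongly
    // reachable until the finally block has run.
    static long useHandle(NativeResource res) {
        try {
            long h = res.handle();
            return h * 2; // stand-in for real work on the raw handle
        } finally {
            Reference.reachabilityFence(res);
        }
    }

    public static void main(String[] args) {
        System.out.println(useHandle(new NativeResource(21)));
    }
}
```

The try/finally shape follows the pattern recommended in the `Reference.reachabilityFence` Javadoc; the fence itself has no visible effect other than keeping `res` reachable up to that point.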
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2508552253 From vlivanov at openjdk.org Mon Nov 10 02:40:11 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 10 Nov 2025 02:40:11 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 10:21:50 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/opto/reachability.cpp line 108: > >> 106: return true; // ignore fences on boxed primitives produced by valueOf methods >> 107: } >> 108: } > > Nice, thanks for adding the comment. I'm trying to understand the reason why that is ok. > > So someone would have set a RF for a boxed primitive. But we don't expect anything to ever be attached to a boxed primitive, and so we can just ignore these RF? Is that the reason? Might be worth writing it in a code comment explicitly. I stumbled upon it while inspecting IR with `-XX:+StressReachabilityFences` when RF is added for an auto-boxed argument. (`is_boxing_method()` matches `valueOf` factory, not an explicitly allocated box). Thinking more about it, there's a way to observe the absence of RF on an auto-boxed instance when (1) the value is out of range for internal caches; and (2) there's a java.lang.ref.Reference instance registered for it. So, I moved the logic under `StressReachabilityFences` instead.
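As background for the "out of range for internal caches" remark, here is a small sketch of the box cache behaviour at the Java level. The class and method names are hypothetical. Only the cached range from -128 to 127 is guaranteed by the language specification; whether larger values share a box depends on the JVM and its configuration, so the sketch makes no claim about them.

```java
public class BoxingCacheSketch {
    // Auto-boxing an int compiles to Integer.valueOf(int).
    static boolean sameBox(int v) {
        Integer a = v;
        Integer b = v;
        return a == b; // identity comparison, not equals()
    }

    public static void main(String[] args) {
        // Values in [-128, 127] are required to come from the shared cache.
        System.out.println("100 shares a box: " + sameBox(100));
        // Outside that range the result is implementation dependent.
        System.out.println("1000 shares a box: " + sameBox(1000));
    }
}
```

A box obtained for an uncached value can be a fresh object with its own identity, which is the case where a registered `java.lang.ref.Reference` could make a missing fence observable.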
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2508550244 From wenanjian at openjdk.org Mon Nov 10 03:09:41 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Mon, 10 Nov 2025 03:09:41 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v21] In-Reply-To: References: Message-ID: <146aFq2a3ekrh2dF9Ze9UxxUIpK9n0hR_Wj2Ze0fLBs=.112e1b28-677a-4456-a945-ac230aaf2370@github.com> > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: clean comments and format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/1f5ba9cd..a379a39f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=19-20 Stats: 8 lines in 1 file changed: 1 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From wenanjian at openjdk.org Mon Nov 10 05:59:02 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Mon, 10 Nov 2025 05:59:02 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v22] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. 
Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: modify parm to unsigned as aarch64 and x86 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/a379a39f..09a31b7d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=20-21 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From jkarthikeyan at openjdk.org Mon Nov 10 06:19:20 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 10 Nov 2025 06:19:20 GMT Subject: Integrated: 8350468: x86: Improve implementation of vectorized numberOfLeadingZeros for int and long In-Reply-To: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com> References: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com> Message-ID: On Mon, 4 Aug 2025 02:20:31 GMT, Jasmine Karthikeyan wrote: > Hi all, > This is a patch that optimizes the x86 backend implementation of `CountLeadingZerosV` for int and long. In the review of [JDK-8349637)](https://bugs.openjdk.org/browse/JDK-8349637) an [optimized algorithm]( https://github.com/openjdk/jdk/pull/23579#issuecomment-2661332497) was proposed by @rgiulietti, which this PR implements. For integer operands, the optimized algorithm reduces the number of vector instructions from 19 to 13. The same algorithm does not work for long operands, however, since avx2 lacks a vectorized long->double conversion instruction. Instead, I found an optimized algorithm to reuse the code for int and compute the leading zeros for long with only 4 additional instructions. 
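To illustrate the general idea of reusing an int leading-zero count for long operands, here is a scalar Java sketch. It is not the instruction sequence from the PR, which operates on x86 vectors; the class and method names are hypothetical, and the only assumption is the standard identity that the count for a 64-bit value comes from its upper half unless that half is zero.

```java
public class LeadingZerosSketch {
    // Computes the leading zero count of a long from two int counts.
    static int nlzLongViaInt(long x) {
        int hi = (int) (x >>> 32); // upper 32 bits
        int lo = (int) x;          // lower 32 bits
        int nlzHi = Integer.numberOfLeadingZeros(hi);
        int nlzLo = Integer.numberOfLeadingZeros(lo);
        // If the upper half is all zeros, keep counting in the lower half.
        return nlzHi == 32 ? 32 + nlzLo : nlzHi;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, 0xFFFFFFFFL, 0x100000000L, Long.MAX_VALUE};
        for (long s : samples) {
            System.out.println(Long.toHexString(s) + " -> " + nlzLongViaInt(s)
                + " (reference: " + Long.numberOfLeadingZeros(s) + ")");
        }
    }
}
```

The select between the two halves is the only extra work on top of the int computation, which is in the same spirit as the small number of additional instructions mentioned above.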
I added a benchmark and on my Zen 3 machine I get these results:
>
>                                       Baseline                      Patch
> Benchmark              Mode  Cnt    Score   Error  Units      Score   Error  Units  Improvement
> LeadingZeros.testInt   avgt   15   91.097 ± 3.276  ns/op     68.665 ± 1.740  ns/op  (+ 28.1%)
> LeadingZeros.testLong  avgt   15  342.545 ± 4.470  ns/op    228.668 ± 5.994  ns/op  (+ 39.9%)
>
> I've updated the unit tests to more thoroughly test longs and they pass on my machine. Thoughts and reviews would be appreciated! This pull request has now been integrated. Changeset: f77a5117 Author: Jasmine Karthikeyan URL: https://git.openjdk.org/jdk/commit/f77a5117db2d01a935762e948aef2d0ade3512a3 Stats: 225 lines in 3 files changed: 160 ins; 17 del; 48 mod 8350468: x86: Improve implementation of vectorized numberOfLeadingZeros for int and long Co-authored-by: Raffaello Giulietti Reviewed-by: sviswanathan, qamai, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/26610 From jkarthikeyan at openjdk.org Mon Nov 10 06:19:18 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 10 Nov 2025 06:19:18 GMT Subject: RFR: 8350468: x86: Improve implementation of vectorized numberOfLeadingZeros for int and long [v2] In-Reply-To: References: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com> Message-ID: On Wed, 5 Nov 2025 01:28:31 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a patch that optimizes the x86 backend implementation of `CountLeadingZerosV` for int and long. In the review of [JDK-8349637](https://bugs.openjdk.org/browse/JDK-8349637) an [optimized algorithm](https://github.com/openjdk/jdk/pull/23579#issuecomment-2661332497) was proposed by @rgiulietti, which this PR implements. For integer operands, the optimized algorithm reduces the number of vector instructions from 19 to 13. The same algorithm does not work for long operands, however, since avx2 lacks a vectorized long->double conversion instruction.
Instead, I found an optimized algorithm to reuse the code for int and compute the leading zeros for long with only 4 additional instructions. I added a benchmark and on my Zen 3 machine I get these results: >> >> Baseline Patch >> Benchmark Mode Cnt Score Error Units Score Error Units Improvement >> LeadingZeros.testInt avgt 15 91.097 ± 3.276 ns/op 68.665 ± 1.740 ns/op (+ 28.1%) >> LeadingZeros.testLong avgt 15 342.545 ± 4.470 ns/op 228.668 ± 5.994 ns/op (+ 39.9%) >> >> I've updated the unit tests to more thoroughly test longs and they pass on my machine. Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Fix typo in comment > - Merge branch 'master' into optimize-leading-zero > - Optimize numberOfLeadingZeros Thanks for the reviews! I'll integrate it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26610#issuecomment-3509613496 From dfenacci at openjdk.org Mon Nov 10 07:35:02 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 10 Nov 2025 07:35:02 GMT Subject: RFR: 8341039: compiler/cha/TypeProfileFinalMethod.java fails with assertEquals expected: 0 but was: 2 In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 16:25:59 GMT, Daniel Lundén wrote: > The test `compiler/cha/TypeProfileFinalMethod.java` exercises a specific compilation pattern and easily breaks by setting various VM flags (e.g., `-Xcomp`). > > ### Changeset > > - Make the test flagless. > - Ensure the test only compiles the intended methods. > - Fix problems with compiler directives used in the test (incorrect signatures and some directives getting unintentionally shadowed by other directives).
> - Force C2 inlining of a method which the test author likely intended to always be inlined (based on source code comments in the test). > - Switch argument order in `assertEquals` to make error message correct. > > Note for reviewers: A more fundamental rewrite of the test is beyond the scope of this changeset. The objective here is simply to ensure the test runs only in contexts intended by the test author. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18972906513) > - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Stress testing of the specific test on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. Thanks for this "refactoring" @dlunde. LGTM (just 1 question) test/hotspot/jtreg/compiler/cha/TypeProfileFinalMethod.java line 28: > 26: * @test > 27: * @summary test c1 to record type profile with CHA optimization > 28: * @requires vm.flavor == "server" & vm.flagless I guess this change is part of the "Make the test flagless" part (and `TieredStopAtLevel` filters seem anyway a bit odd) but did you figure out why this was added first? Was it possibly just a mistake (since we create a new process anyway)? ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/28200#pullrequestreview-3441273573 PR Review Comment: https://git.openjdk.org/jdk/pull/28200#discussion_r2509065656 From 617263736 at qq.com Mon Nov 10 08:12:30 2025 From: 617263736 at qq.com (=?utf-8?B?5pif5pm0?=) Date: Mon, 10 Nov 2025 16:12:30 +0800 Subject: Question: Could hardware problems cause JIT deoptimization? Message-ID: Hi all, I would like to ask whether hardware problems could lead to JIT deoptimization behavior in hotspot. We have deployed the same application (Gravitee Gateway 3.5.16) on multiple servers with the same software configuration. 
However, when capturing flame graphs, we observed significantly different levels of JIT deoptimization across machines. On one server, the percentage of deoptimized frames was as high as 40%. So my question is: is the high deoptimization rate caused by hardware problems? Thanks for your insights, Hang Ren From hgreule at openjdk.org Mon Nov 10 08:23:05 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 10 Nov 2025 08:23:05 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation [v3] In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 18:23:46 GMT, Hannes Greule wrote: >> The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. >> >> Please let me know what you think. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > review That would work too, and would lower the need for a Cast-like node (although another phase is less flexible).
I don't have a strong opinion here, but others might have :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3510090758 From bmaillard at openjdk.org Mon Nov 10 08:42:13 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 10 Nov 2025 08:42:13 GMT Subject: RFR: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive [v6] In-Reply-To: References: <1pRKczP7EhfqRnLV2KhyD48bEwDThYfaLNU8FKdTP8A=.9f87dcee-0025-48e4-946f-6affd4e05728@github.com> Message-ID: <67b7RNOMAc2vsDPNQI5Ci4szNlkVr6ReEqvJaxl_JtM=.064f195d-b55a-4266-bc7a-2528f2f8b129@github.com> On Fri, 7 Nov 2025 20:13:07 GMT, Christian Hagedorn wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Refine comments > > Looks good, thanks for the updated comments! Thank you for the reviews @chhagedorn @eme64! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27900#issuecomment-3510182839 From bmaillard at openjdk.org Mon Nov 10 08:42:14 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 10 Nov 2025 08:42:14 GMT Subject: Integrated: 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive In-Reply-To: References: Message-ID: On Mon, 20 Oct 2025 14:08:59 GMT, Benoît Maillard wrote: > This PR addresses a missed optimization opportunity in `PhaseIterGVN`. The missed optimization is the simplification of redundant conversion patterns of the shape `ConvX2Y->ConvY2X->ConvX2Y`. > > This optimization pattern is implemented as an ideal optimization on `ConvX2Y` nodes. Because it depends on the input of the input of the node in question, we need to have an appropriate notification mechanism in `PhaseIterGVN::add_users_of_use_to_worklist`. The notification for this pattern was added in [JDK-8359603](https://bugs.openjdk.org/browse/JDK-8359603).
> > However, that fix was based on the wrong assumption that in `PhaseIterGVN::add_users_of_use_to_worklist`, argument `n` is already the optimized node. In some cases, this argument is actually the node that is about to get replaced. > > This happens for example in `PhaseIterGVN::transform_old`. If we find that node `k` returned by `Ideal` actually already exists by calling `hash_find_insert(k)`, we call `add_users_to_worklist(k)`. > As we replace node `k` with `i`, and `i` has a different opcode than `k`, we cannot use the opcode of `k` to detect the redundant conversion pattern. > > ```c++ > ... > // Global Value Numbering > i = hash_find_insert(k); // Check for pre-existing node > if (i && (i != k)) { > // Return the pre-existing node if it isn't dead > NOT_PRODUCT(set_progress();) > add_users_to_worklist(k); > subsume_node(k, i); // Everybody using k now uses i > return i; > } > ... > ``` > > The bug was quite intermittent and only showed up in some cases with `-XX:+StressIGVN`. > > ### Proposed Fix > > We make the detection of the pattern less specific by only looking at the opcode of the user of `n`, and not directly the opcode of `n`. This is consistent with the detection of other patterns in `PhaseIterGVN::add_users_of_use_to_worklist`. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8369646) > - [x] tier1-3, plus some internal testing > - [x] Added a second run for the existing test with `-XX:+StressIGVN` and a fixed stress seed > > Thank you for reviewing! This pull request has now been integrated.
Changeset: 5e8bf7a2 Author: Benoît Maillard URL: https://git.openjdk.org/jdk/commit/5e8bf7a283f75464dbd906454c852e4d1db497dc Stats: 33 lines in 3 files changed: 26 ins; 0 del; 7 mod 8369646: Detection of redundant conversion patterns in add_users_of_use_to_worklist is too restrictive Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/27900 From bmaillard at openjdk.org Mon Nov 10 08:44:14 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 10 Nov 2025 08:44:14 GMT Subject: RFR: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code [v9] In-Reply-To: References: Message-ID: On Thu, 6 Nov 2025 16:10:22 GMT, Damon Fenacci wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > Looks good to me too. Thanks @benoitmaillard! Thank you for reviewing @dafedafe @chhagedorn! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27731#issuecomment-3510188466 From bmaillard at openjdk.org Mon Nov 10 08:44:16 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 10 Nov 2025 08:44:16 GMT Subject: Integrated: 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code In-Reply-To: References: Message-ID: On Thu, 9 Oct 2025 14:48:37 GMT, Benoît Maillard wrote: > This PR prevents the C2 compiler from hitting memory limits during compilation when using `-XX:+StressLoopPeeling` and `-XX:+VerifyLoopOptimizations` in certain edge cases. The fix addresses an issue where the `ciEnv` arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations. > > ### Analysis > > This issue was initially detected with the fuzzer.
The original test from the fuzzer was reduced > and added to this PR as a regression test. > > The test contains a switch inside a loop, and stressing the loop peeling results in > a fairly complex graph. The split-if optimization is applied aggressively, and we > run a verification pass each time progress is made. > > We end up with a relatively high number of verification passes, with each pass being > fairly expensive because of the size of the graph. > Each verification pass requires building a new `IdealLoopTree`. This is quite slow > (which is unfortunately hard to mitigate), and also causes inefficient memory usage > on the `ciEnv` arena. > > The inefficient usages are caused by the `ciInstanceKlass::get_field_by_offset` method. > At every call, we have > - One allocation on the `ciEnv` arena to store the returned `ciField` > - The constructor of `ciField` results in a call to `ciObjectFactory::get_symbol`, which: > - Allocates a new `ciSymbol` on the `ciEnv` arena at every call (when not found in `vmSymbols`) > - Pushes the new symbol to the `_symbols` array > > The `ciField` objects returned by `ciInstanceKlass::get_field_by_offset` are only used once, to > check if the `BasicType` of a static field is a reference type. > > In `ciObjectFactory`, the `_symbols` array ends up containing a large number of duplicates for certain symbols > (up to several million), which hints at the fact that `ciObjectFactory::get_symbol` should not be called > repeatedly as it is done here.
> > The stack trace of how we get to the `ciInstanceKlass::get_field_by_offset` is shown below: > > > ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412 > TypeOopPtr::TypeOopPtr type.cpp:3484 > TypeInstPtr::TypeInstPtr type.cpp:3953 > TypeInstPtr::make type.cpp:3990 > TypeInstPtr::add_offset type.cpp:4509 > AddPNode::bottom_type addnode.cpp:696 > MemNode::adr_type memnode.cpp:73 > PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477 > PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439 > PhaseIdealLoop::build_loop_late_post_work loopnode.cpp:6827 > PhaseIdealLoop::build_loop_late_post loopnode.cpp:67... This pull request has now been integrated. Changeset: 0c1b7267 Author: Benoît Maillard URL: https://git.openjdk.org/jdk/commit/0c1b7267e374192f30322a45a1a34f734565cc15 Stats: 152 lines in 4 files changed: 138 ins; 8 del; 6 mod 8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code Reviewed-by: chagedorn, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/27731 From epeter at openjdk.org Mon Nov 10 09:19:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Nov 2025 09:19:05 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply set wrong type, led to wrong constant folding of phi [v2] In-Reply-To: References: Message-ID: On Sat, 8 Nov 2025 15:59:50 GMT, Quan Anh Mai wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add diagnostic flag for product build > > That may be more preferable. Or we can track the type in `VTransformLoopPhiNode` and change it when we decide to do the transformation, at the same time as other nodes in the loop? I see that `VTransformLoopPhiNode::apply` returns a `make_scalar`, which seems confusing if it can be a vector, too. Or we can have `VTransformScalarLoopPhi` and `VTransformVectorLoopPhi` as separate classes, but it seems like it will result in some unnecessary duplication.
> > These are just suggestions, and my expertise in the superword vectorizer is definitely lacking, please make the decision that you think is best. @merykitty Yeah, the modeling is not yet perfect in `VTransform`. We can keep perfecting it as the need arises. I'll do some experiments now, and note some down for later :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28113#issuecomment-3510363872 From mhaessig at openjdk.org Mon Nov 10 09:51:08 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 10 Nov 2025 09:51:08 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v10] In-Reply-To: References: Message-ID: On Sun, 9 Nov 2025 09:34:52 GMT, Tobias Hotz wrote: >> This PR improves the value of integer division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Move Test to compiler.igvn Thank you for all your work, @ichttt. Looks good to me now. ------------- Marked as reviewed by mhaessig (Committer).
PR Review: https://git.openjdk.org/jdk/pull/26143#pullrequestreview-3442030376 From aph at openjdk.org Mon Nov 10 10:58:02 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 10 Nov 2025 10:58:02 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue In-Reply-To: References: Message-ID: On Sat, 1 Nov 2025 14:50:27 GMT, Zihao Lin wrote: > If nodes both are constant, support constant folding. src/hotspot/share/opto/mulnode.cpp line 622: > 620: const TypeLong *longType1 = t1->is_long(); > 621: const TypeLong *longType2 = t2->is_long(); > 622: if(longType1 && longType2 && longType1->is_con() && longType2->is_con()){ Suggestion: if(longType1 != nullptr && longType2 != nullptr && longType1->is_con() && longType2->is_con()){ I know, it seems a bit fussy, but that's the way we do it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28097#discussion_r2509939997 From aph-open at littlepinkcloud.com Mon Nov 10 12:00:32 2025 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Mon, 10 Nov 2025 12:00:32 +0000 Subject: Question: Could hardware problems cause JIT deoptimization? In-Reply-To: References: Message-ID: <94c49f50-add6-42be-b3a6-49a11720346f@littlepinkcloud.com> On 10/11/2025 08:12, 星晴 wrote: > We have deployed the same application (Gravitee Gateway 3.5.16) on multiple servers with the same software configuration. However, when capturing flame graphs, we observed significantly different levels of JIT deoptimization across machines. On one server, the percentage of deoptimized frames was as high as 40%. > So my question is: is the high deoptimization rate caused by hardware problems? Almost by definition, hardware problems can cause anything. But some profiling can cause deoptimization when it rewrites bytecodes. For example, async-profiler can do it if you use method tracing. So I'd investigate this possibility first.
-- Andrew Haley (he/him) Java Platform Lead Engineer https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dlunden at openjdk.org Mon Nov 10 12:28:11 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 10 Nov 2025 12:28:11 GMT Subject: RFR: 8341039: compiler/cha/TypeProfileFinalMethod.java fails with assertEquals expected: 0 but was: 2 In-Reply-To: References: Message-ID: <0OuXo37mgXAp-0dsFSNqroKE3_4lpGSviktpBqcxY-8=.f22a5f10-f70d-4305-9780-c8885221aaf9@github.com> On Mon, 10 Nov 2025 07:25:57 GMT, Damon Fenacci wrote: >> The test `compiler/cha/TypeProfileFinalMethod.java` exercises a specific compilation pattern and easily breaks by setting various VM flags (e.g., `-Xcomp`). >> >> ### Changeset >> >> - Make the test flagless. >> - Ensure the test only compiles the intended methods. >> - Fix problems with compiler directives used in the test (incorrect signatures and some directives getting unintentionally shadowed by other directives). >> - Force C2 inlining of a method which the test author likely intended to always be inlined (based on source code comments in the test). >> - Switch argument order in `assertEquals` to make error message correct. >> >> Note for reviewers: A more fundamental rewrite of the test is beyond the scope of this changeset. The objective here is simply to ensure the test runs only in contexts intended by the test author. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18972906513) >> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Stress testing of the specific test on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. 
> > test/hotspot/jtreg/compiler/cha/TypeProfileFinalMethod.java line 28: > >> 26: * @test >> 27: * @summary test c1 to record type profile with CHA optimization >> 28: * @requires vm.flavor == "server" & vm.flagless > > I guess this change is part of the "Make the test flagless" part (and `TieredStopAtLevel` filters seem anyway a bit odd) but did you figure out why this was added first? Was it possibly just a mistake (since we create a new process anyway)? Thanks for the review @dafedafe! > I guess this change is part of the "Make the test flagless" part (and TieredStopAtLevel filters seem anyway a bit odd) but did you figure out why this was added first? I would guess the conditions for `TieredStopAtLevel` were added to ensure this particular flag could not break the test. However, there are many other flags that can also break the test. Hence, we need flagless (which subsumes the `TieredStopAtLevel` conditions). > Was it possibly just a mistake (since we create a new process anyway)? The `createTestJavaProcessBuilder` method adds the default jvm options from jtreg, test.vm.opts and test.java.opts (see the source code comment for `createTestJavaProcessBuilder`). So no, not a mistake, but also not a complete safeguard. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28200#discussion_r2510348285 From epeter at openjdk.org Mon Nov 10 13:07:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Nov 2025 13:07:40 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply set wrong type, led to wrong constant folding of phi [v3] In-Reply-To: References: Message-ID: > In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed an edge case.
> > The issue is when we also (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds. > > I now limit the modification to cases where the `phi` used to be for scalars, but now is for vectors. In those cases we should not have a constant. For good measure, I also added a corresponding assert. > > --------- > > Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. > I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - second reproducer - move fix to apply_backedge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28113/files - new: https://git.openjdk.org/jdk/pull/28113/files/4dfd6100..5e6b99b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=01-02 Stats: 73 lines in 2 files changed: 56 ins; 10 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/28113.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28113/head:pull/28113 PR: https://git.openjdk.org/jdk/pull/28113 From epeter at openjdk.org Mon Nov 10 14:03:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Nov 2025 14:03:39 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply set wrong type, led to wrong constant folding of phi [v2] In-Reply-To: References: Message-ID: On Sat, 8 Nov 2025 15:59:50 GMT, Quan Anh Mai wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add diagnostic flag for product build > > That may be
more preferable. Or we can track the type in `VTransformLoopPhiNode` and change it when we decide to do the transformation, at the same time as other nodes in the loop? I see that `VTransformLoopPhiNode::apply` returns a `make_scalar`, which seems confusing if it can be a vector, too. Or we can have `VTransformScalarLoopPhi` and `VTransformVectorLoopPhi` as separate classes, but it seems like it will result in some unnecessary duplication. > > These are just suggestions, and my expertise in the superword vectorizer is definitely lacking, please make the decision that you think is best. @merykitty In the meantime, we have had another fuzzer failure detected. Interestingly, it fails also because of the same refactoring, but for a slightly different reason. It is not about constant folding, but rather about dead nodes that are still attached to the phi. With `StressIGVN` we can get the old scalar reduction nodes to check their input, and find the new vectorized phi type. That leads to a scalar/vector type mismatch. I'll now try to implement a `VTransformVectorLoopPhi`, which creates a new phi node, so we are sure no old nodes are attached to it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28113#issuecomment-3511839802 From fgao at openjdk.org Mon Nov 10 15:25:21 2025 From: fgao at openjdk.org (Fei Gao) Date: Mon, 10 Nov 2025 15:25:21 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v2] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Mon, 8 Sep 2025 10:16:16 GMT, Emanuel Peter wrote: > Could you please add a general comment about what this does at the top? Done > The name is a bit funny with goo, but that's not your fault. 
If you have a better name feel free to rename ;) Renaming `clone_up_backedge_goo` would affect some unrelated files, so I'd prefer to handle the renaming in a separate patch :) > src/hotspot/share/opto/loopnode.hpp line 1434: > >> 1432: Node* get_vectorized_drain_input(Node* main_backedge_ctrl, VectorSet& visited, >> 1433: Node_Stack& clones, Node* main_merge_region, >> 1434: Node* main_phi); > > We don't just do this for the trip-counter though, right? Because the `main_incr` suggests that a bit here. Could you rephrase to make it more accurate? Do you think that could be worth it? It is also nice to have the analogy to the trip-counter, so I like that in the example ASCII art. Yes, it applies to all values that increase as the loop iterates. I'm afraid I forgot to rename `main_incr` to a more general name after refactoring the code here. I'll update it in the next commit. How about renaming it to `main_out`? > src/hotspot/share/opto/loopopts.cpp line 2466: > >> 2464: // Find the phi node merging the data from pre-loop and vector main-loop. >> 2465: Node_List visit_list; >> 2466: Node_List phi_list; > > You are doing this in a loop. And you set no `ResourceMark`. I'm afraid this could end up allocating a lot of memory. What do you think? The `old_new` map grows within the live ranges of `phi_list` and `visit_list`, so we can't use `ResourceMark` here. In the new commit, I've moved the declarations of these two variables outside the loop and clear them at the start of each new iteration. Does that make sense? > src/hotspot/share/opto/loopopts.cpp line 2514: > >> 2512: assert(!has_ctrl(outn) || !has_ctrl(curr) || is_dominator(get_ctrl(curr), get_ctrl(outn)), >> 2513: "Only these nodes controlled by loop exit edge need to be cloned"); >> 2514: visit_list.push(outn); > > Might we visit nodes more than once? Or is that already prevented?
Yes, we did a check while visiting the node in the list: Node* curr = visit_list.at(0); visit_list.remove(0); Node* newcurr = old_new[curr->_idx]; if (newcurr != nullptr) { continue; } newcurr = curr->clone(); ... old_new.map(curr->_idx, newcurr); > Do we need to do both fix_data_uses and handle_data_uses_for_vectorized_drain? Ah, they do it one for the old and one for the new loop? > > It is kinda funny that we do a loop here for the old loop, but then do the loop inside fix_data_uses for the other loop - did I understand this right? `fix_data_uses` fixes data uses for all nodes in the loop body. The loop here handles nodes collected in `extra_data_nodes`, which may belong to the outer loop or represent special cases. > We can also do that in a separate RFE first maybe? Because now with the large switch case here things are harder to read and get an overview quickly. What do you think? Sounds great. :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2510974372 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2510975101 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2510972400 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2503311208 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2510985823 From fgao at openjdk.org Mon Nov 10 15:25:23 2025 From: fgao at openjdk.org (Fei Gao) Date: Mon, 10 Nov 2025 15:25:23 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v2] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Mon, 8 Sep 2025 09:55:07 GMT, Emanuel Peter wrote: >> So you could actually make the type more precise than `Node*` :) > > Or do we have to somehow support `long` loops too here? then we could just make it an `AddNode*`. 
I would keep it as `Node*` and rename it to `new_trip_cnt`, since it is a `PhiNode` when creating a `vectorized drain` loop, but an `AddNode` when creating a `post` loop. Does that make sense? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2510983987 From fgao at openjdk.org Mon Nov 10 15:25:24 2025 From: fgao at openjdk.org (Fei Gao) Date: Mon, 10 Nov 2025 15:25:24 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v2] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Mon, 8 Sep 2025 10:41:58 GMT, Emanuel Peter wrote: >> Ah, are we only removing nodes? > > Oh, you have another implicit zero check here. > Ah, are we only removing nodes? Yes, just removing nodes here. >> Maybe you can construct some graph where this really visits a lot of nodes, then this could blow up quadratically. > > `pop` is more efficient, because it just takes it from the end. But then you'd get a DFS and not BFS. Yes, we need BFS here; that's why I used `remove`. I haven't yet figured out a more efficient way to handle some of the corner cases. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2510964595 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2503271161 From fgao at openjdk.org Mon Nov 10 15:25:27 2025 From: fgao at openjdk.org (Fei Gao) Date: Mon, 10 Nov 2025 15:25:27 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v2] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Mon, 8 Sep 2025 10:54:46 GMT, Emanuel Peter wrote: > Can you quickly say what this loop does with each phi?
For each Phi node, referred to as `main_merge_phi`, we create a corresponding `drain_merge_phi` as one of its new data uses, as shown below: main_merge_phi = Phi (pre_out, main_out) drain_merge_phi = Phi (drain_out, main_merge_phi) >> test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java line 86: >> >>> 84: "multiversion_delayed_slow", "= 0", // The second loop's multiversion_if was also not used, so it is constant folded after loop opts. >>> 85: "multiversion", ">= 5", // nothing unexpected >>> 86: "multiversion", "<= 7", // nothing unexpected >> >> Can you please also add a lower bound for >> `"post .* multiversion_fast", ">= 3",` >> That should be correct, right? >> >> Ah ok, now we also vectorize the smaller (first) loop. But we still fully unroll the main-loop, because its stride becomes too large compared to the SIZE, right? But the post-vectorized loop is still reachable. Correct? >> >> >> I'm a little bit unsure where the `On platforms (> 32 bytes)` is coming from. Does this IR rule fail with a smaller `MaxVectorSize=32`? >> >> I'm wondering if it would make sense to have a few extra IR tests, with various constant SIZEs, and see which ones constant fold which loops, and if that happens as expected. I think that would be worth it. >> >> You could even automate this to some degree with the template framework. We could also make this a follow-up RFE. > > I'm also wondering if it would not be nicer to have a different tag for the vectorized drain loop, instead of `post`. Could we call it `vector_drain` maybe? That would make it easier to spot it correctly and to write more expressive IR rules. > Can you please also add a lower bound for > "post .* multiversion_fast", ">= 3", > That should be correct, right? Updated. > Ah ok, now we also vectorize the smaller (first) loop. But we still fully unroll the main-loop, because its stride becomes too large compared to the SIZE, right? But the post-vectorized loop is still reachable. 
Correct?

> I'm a little bit unsure where the On platforms (> 32 bytes) is coming from. Does this IR rule fail with a smaller MaxVectorSize=32?

Yes, the original IR rule fails on a `32-byte` machine. I suppose we don't always fully unroll the main loop. Taking the `20-iteration` short loop as an example, one `32-byte` vector operation can handle 8 iterations. Based on the unrolling policy, the `main` loop might be unrolled only once, allowing it to process 16 iterations per round. The `pre-loop` would probably handle the first 4 iterations. In that case, the `vectorized drain` loop becomes redundant. I'm surprised that GVN and loop optimization can recognize this redundancy and eliminate it.

> I'm wondering if it would make sense to have a few extra IR tests, with various constant SIZEs, and see which ones constant fold which loops, and if that happens as expected. I think that would be worth it.
> You could even automate this to some degree with the template framework. We could also make this a follow-up RFE.

> I'm also wondering if it would not be nicer to have a different tag for the vectorized drain loop, instead of post. Could we call it vector_drain maybe? That would make it easier to spot it correctly and to write more expressive IR rules.

That sounds good. I'll keep that in mind and provide a more precise test framework for the vectorized drain loop in the follow-up RFE.

>> test/hotspot/jtreg/compiler/loopopts/superword/TestVectorizedDrainLoop.java line 31:
>>
>>> 29: * generated by fuzzer.
>>> 30: *
>>> 31: * @run main/othervm -Xint compiler.loopopts.superword.TestVectorizedDrainLoop
>>
>> What is the interpreter run good for? Why not just have a run without any flags instead?
>
> Ah, you have exact constant results that you compare with. Could be good to state this here as a comment, so that nobody removes this in the future. You are just making sure that the interpreter would have produced the same results.
>
> Still: why not add a run without any flags?
Added a comment in the short summary part for the interpreter run. Also added a run without any flags.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2510788293
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2510991242
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2502434896

From bmaillard at openjdk.org Mon Nov 10 15:33:21 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Mon, 10 Nov 2025 15:33:21 GMT
Subject: RFR: 8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL
Message-ID:

This PR addresses a missed optimization in `PhaseIterGVN` due to a missing notification to indirect users within `PhaseIterGVN::add_users_of_use_to_worklist`.

The affected optimization is the transformation of `(x & mask) >> shift` into `(x >> shift) & (mask >> shift)`, where `mask` is a constant, for `URShiftL` and `URShiftI` nodes. This transformation is handled in `URShiftLNode::Ideal` and `URShiftINode::Ideal`. [JDK-8361700](https://bugs.openjdk.org/browse/JDK-8361700) addressed the analog case for `RShiftL` and `RShiftI`, but lacked the notification for unsigned shifting.

This PR builds on top of [JDK-8361700](https://bugs.openjdk.org/browse/JDK-8361700) and adds the following changes:
- Fix the notification mechanism in `add_users_of_use_to_worklist`
- Add the `URShiftL` in `TestMaskAndRShiftReorder.java`
- Drive-by changes: simplify the `RShiftL` test case slightly, and add the missing analog case for `RShiftI`

I tried to reproduce the missing optimization for the `URShiftI` without success. There must be some subtle difference with the `long` case that causes the optimization to be triggered in this specific setup. I still added the case to the fix in `add_users_of_use_to_worklist`, as there are likely cases where the notification is missing (but I was just not able to find one).
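
As a minimal sketch of the identity behind this transformation (not the IR test from the PR; the class and method names below are made up for illustration), the following plain Java program checks that masking and then shifting gives the same result as shifting and then masking with the shifted constant. It uses `>>>` because the nodes in question are unsigned shifts, and it only demonstrates the arithmetic, not how C2 applies or schedules the rewrite:

```java
import java.util.Random;

public class MaskShiftReorderSketch {
    // Original shape: apply the constant mask first, then shift unsigned.
    static long maskThenShift(long x, long mask, int shift) {
        return (x & mask) >>> shift;
    }

    // Reordered shape: shift first, then apply the shifted mask.
    static long shiftThenMask(long x, long mask, int shift) {
        return (x >>> shift) & (mask >>> shift);
    }

    public static void main(String[] args) {
        long mask = 0x0000_FFFF_0000_0000L; // arbitrary constant, chosen for illustration
        Random random = new Random(42);
        for (int i = 0; i < 10_000; i++) {
            long x = random.nextLong();
            int shift = random.nextInt(64); // long shifts only use the low 6 bits
            if (maskThenShift(x, mask, shift) != shiftThenMask(x, mask, shift)) {
                throw new AssertionError("mismatch for x=" + x + ", shift=" + shift);
            }
        }
        System.out.println("both shapes agree on all sampled inputs");
    }
}
```

When `mask` and `shift` are both constants, `mask >>> shift` is itself a constant, which is presumably what makes the reordered shape attractive for further folding.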
### Testing
- [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371534)
- [x] tier1-4, plus some internal testing

Thank you for reviewing!

-------------

Commit messages:
- Format
- Add missing case in add_users_of_use_to_worklist
- Update test and add testURShiftL case

Changes: https://git.openjdk.org/jdk/pull/28218/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28218&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8371534
Stats: 35 lines in 2 files changed: 27 ins; 1 del; 7 mod
Patch: https://git.openjdk.org/jdk/pull/28218.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/28218/head:pull/28218

PR: https://git.openjdk.org/jdk/pull/28218

From thartmann at openjdk.org Mon Nov 10 15:37:46 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 10 Nov 2025 15:37:46 GMT
Subject: RFR: 8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL
In-Reply-To:
References:
Message-ID:

On Mon, 10 Nov 2025 15:23:25 GMT, Benoît Maillard wrote:

> This PR addresses a missed optimization in `PhaseIterGVN` due to a missing notification to indirect users within `PhaseIterGVN::add_users_of_use_to_worklist`.
>
> The affected optimization is the transformation of `(x & mask) >> shift` into `(x >> shift) & (mask >> shift)`, where `mask` is a constant, for `URShiftL` and `URShiftI` nodes. This transformation is handled in `URShiftLNode::Ideal` and `URShiftINode::Ideal`. [JDK-8361700](https://bugs.openjdk.org/browse/JDK-8361700) addressed the analog case for `RShiftL` and `RShiftI`, but lacked the notification for unsigned shifting.
>
> This PR builds on top of [JDK-8361700](https://bugs.openjdk.org/browse/JDK-8361700) and adds the following changes:
> - Fix the notification mechanism in `add_users_of_use_to_worklist`
> - Add the `URShiftL` in `TestMaskAndRShiftReorder.java`
> - Drive-by changes: simplify the `RShiftL` test case slightly, and add the missing analog case for `RShiftI`
>
> I tried to reproduce the missing optimization for the `URShiftI` without success. There must be some subtle difference with the `long` case that causes the optimization to be triggered in this specific setup. I still added the case to the fix in `add_users_of_use_to_worklist`, as there are likely cases where the notification is missing (but I was just not able to find one).
>
> ### Testing
> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371534)
> - [x] tier1-4, plus some internal testing
>
> Thank you for reviewing!

That looks good to me. Thanks for quickly jumping on this!

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/28218#pullrequestreview-3443889981

From epeter at openjdk.org Mon Nov 10 15:43:41 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 10 Nov 2025 15:43:41 GMT
Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply set wrong type, led to wrong constant folding of phi [v4]
In-Reply-To:
References:
Message-ID:

> In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed an edge case.
>
> The issue is when we also (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds.
> > I now limit the modification to cases where the `phi` used to be for scalars, but now is for vectors. In those cases we should not have a constant. For good measure, I also added a corresponding assert. > > --------- > > Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. > I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: wip fix, still broken ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28113/files - new: https://git.openjdk.org/jdk/pull/28113/files/5e6b99b2..20c32b31 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=02-03 Stats: 93 lines in 3 files changed: 69 ins; 0 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/28113.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28113/head:pull/28113 PR: https://git.openjdk.org/jdk/pull/28113 From snatarajan at openjdk.org Mon Nov 10 15:50:11 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 10 Nov 2025 15:50:11 GMT Subject: RFR: 8349835: C2: simplify IGV property printing [v4] In-Reply-To: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: > The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). > > ### Fix > Implemented the suggested refactoring. 
>
> ### Testing
> GitHub Actions, Tier 1-3

Saranya Natarajan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:

- Merge branch 'master' into JDK-8349835
- addressing review comments#2
- fixing test failure
- addressing review comments
- changing int to bool in a struct
- fix to failing test
- initial fix

-------------

Changes: https://git.openjdk.org/jdk/pull/26902/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=03
Stats: 205 lines in 2 files changed: 89 ins; 109 del; 7 mod
Patch: https://git.openjdk.org/jdk/pull/26902.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26902/head:pull/26902

PR: https://git.openjdk.org/jdk/pull/26902

From epeter at openjdk.org Mon Nov 10 15:56:34 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 10 Nov 2025 15:56:34 GMT
Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply set wrong type, led to wrong constant folding of phi [v5]
In-Reply-To:
References:
Message-ID:

> In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed an edge case.
>
> The issue is when we also (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds.
>
> I now limit the modification to cases where the `phi` used to be for scalars, but now is for vectors. In those cases we should not have a constant. For good measure, I also added a corresponding assert.
>
> ---------
>
> Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes.
> I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up.

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

refine fix

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/28113/files
- new: https://git.openjdk.org/jdk/pull/28113/files/20c32b31..d2fac68d

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=04
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=03-04

Stats: 15 lines in 1 file changed: 1 ins; 11 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/28113.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/28113/head:pull/28113

PR: https://git.openjdk.org/jdk/pull/28113

From epeter at openjdk.org Mon Nov 10 15:59:41 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 10 Nov 2025 15:59:41 GMT
Subject: Integrated: 8340093: C2 SuperWord: implement cost model
In-Reply-To:
References:
Message-ID:

On Tue, 14 Oct 2025 16:10:22 GMT, Emanuel Peter wrote:

> Note: this looks like a large change, but only about 400-500 lines are VM changes. 2.5k comes from new tests.
>
> Finally: after a long list of refactorings, we can implement the Cost-Model. The refactorings and this implementation were first PoC'd here: https://github.com/openjdk/jdk/pull/20964
>
> Main goal:
> - Carefully allow the vectorization of reduction cases that lead to speedups, and prevent those that do not (or may cause regressions).
> - Open up new vectorization opportunities in the future, that introduce expensive vector nodes that are only profitable in some cases but not others.
>
> **Why cost-model?**
>
> Usually, vectorization leads to speedups because we replace multiple scalar operations with a single vector operation.
The scalar and vector operations have a very similar cost per instruction, and so going from 4 scalar ops to a single vector op may yield a 4x speedup. This is a bit simplistic, but it is the general idea.
>
> But: some vector ops are expensive. Sometimes, the vector op can be more expensive than the multiple scalar ops it replaces. This is the case with some reduction ops. Or we may introduce a vector op that does not have any corresponding scalar op (e.g. in the case of shuffle). This prevents simple heuristics that only focus on single operations.
>
> Weighing the total cost of the scalar loop vs the vector loop allows us a more "holistic" approach. There may be expensive vector ops, but other cheaper vector ops may still make it profitable.
>
> **Implementation**
>
> Items:
> - New `VTransform::is_profitable`: checks cost-model and some other cost related checks.
> - `VLoopAnalyzer::cost`: scalar loop cost
> - `VTransformGraph::cost`: vector loop cost
> - Old reduction heuristic with `_num_work_vecs` and `_num_reductions` was used to check for "simple" reductions where the only "work" vector was the reduction itself. Reductions were not considered profitable if they were "simple". I was able to lift those restrictions.
> - Adapted existing tests.
> - Wrote a new comprehensive test, matching the related JMH benchmark, which we use below.
>
> **Testing**
> Regular correctness testing, and performance testing. In addition to the JMH micro benchmarks below.
>
> ------------------------------
>
> **Some History**
>
> I have been bothered by "simple" reductions not vectorizing for a long time. It was also a part of [my JVMLS2025 presentation](https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/).
>
> During JDK9, reductions were first vectorized, but then restricted for...

This pull request has now been integrated.
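
To make the idea of weighing whole loops from the quoted description concrete, here is a toy sketch in Java. It is not the HotSpot implementation: the `Op` record, the cost numbers, and the `isProfitable` helper are all hypothetical, chosen only to show how an expensive vector reduction can be unprofitable on its own yet acceptable once other, cheaper vector ops share the loop:

```java
import java.util.List;

public class CostModelSketch {
    // A loop-body operation with an assumed, purely illustrative cost.
    record Op(String name, int cost) {}

    // Total cost of one iteration of a loop body.
    static int cost(List<Op> body) {
        return body.stream().mapToInt(Op::cost).sum();
    }

    // One vector iteration replaces `lanes` scalar iterations, so compare
    // the vector body against `lanes` copies of the scalar body.
    static boolean isProfitable(List<Op> scalarBody, List<Op> vectorBody, int lanes) {
        return cost(vectorBody) < lanes * cost(scalarBody);
    }

    public static void main(String[] args) {
        int lanes = 4;

        // "Simple" reduction: the reduction is the only work in the loop.
        List<Op> simpleScalar = List.of(new Op("load", 1), new Op("add", 1));
        List<Op> simpleVector = List.of(new Op("vload", 1), new Op("vreduce", 8));

        // Reduction with extra element-wise work that vectorizes cheaply.
        List<Op> richScalar = List.of(new Op("load", 1), new Op("load", 1),
                                      new Op("mul", 1), new Op("add", 1));
        List<Op> richVector = List.of(new Op("vload", 1), new Op("vload", 1),
                                      new Op("vmul", 1), new Op("vreduce", 8));

        System.out.println("simple reduction profitable: "
                + isProfitable(simpleScalar, simpleVector, lanes));
        System.out.println("rich reduction profitable:   "
                + isProfitable(richScalar, richVector, lanes));
    }
}
```

With these made-up numbers the simple reduction loses (9 against 8) while the richer loop wins (11 against 16), which is the kind of distinction a single-operation heuristic cannot express.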
Changeset: 72989e0f
Author: Emanuel Peter
URL: https://git.openjdk.org/jdk/commit/72989e0fac7dae1bfec40e3017ba89aa201cc8ee
Stats: 2949 lines in 13 files changed: 2855 ins; 65 del; 29 mod

8340093: C2 SuperWord: implement cost model

Reviewed-by: kvn, qamai

-------------

PR: https://git.openjdk.org/jdk/pull/27803

From epeter at openjdk.org Mon Nov 10 15:59:39 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 10 Nov 2025 15:59:39 GMT
Subject: RFR: 8340093: C2 SuperWord: implement cost model [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 6 Nov 2025 08:12:00 GMT, Quan Anh Mai wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> rename cost methods for Vladimir K
>
> Thanks for your replies. I think leaving my suggestions to future RFEs is reasonable.

@merykitty @vnkozlov Thank you very much for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27803#issuecomment-3512554916

From mhaessig at openjdk.org Mon Nov 10 16:00:31 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 10 Nov 2025 16:00:31 GMT
Subject: RFR: 8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL
In-Reply-To:
References:
Message-ID:

On Mon, 10 Nov 2025 15:23:25 GMT, Benoît Maillard wrote:

> This PR addresses a missed optimization in `PhaseIterGVN` due to a missing notification to indirect users within `PhaseIterGVN::add_users_of_use_to_worklist`.
>
> The affected optimization is the transformation of `(x & mask) >> shift` into `(x >> shift) & (mask >> shift)`, where `mask` is a constant, for `URShiftL` and `URShiftI` nodes. This transformation is handled in `URShiftLNode::Ideal` and `URShiftINode::Ideal`. [JDK-8361700](https://bugs.openjdk.org/browse/JDK-8361700) addressed the analog case for `RShiftL` and `RShiftI`, but lacked the notification for unsigned shifting.
>
> This PR builds on top of [JDK-8361700](https://bugs.openjdk.org/browse/JDK-8361700) and adds the following changes:
> - Fix the notification mechanism in `add_users_of_use_to_worklist`
> - Add the `URShiftL` in `TestMaskAndRShiftReorder.java`
> - Drive-by changes: simplify the `RShiftL` test case slightly, and add the missing analog case for `RShiftI`
>
> I tried to reproduce the missing optimization for the `URShiftI` without success. There must be some subtle difference with the `long` case that causes the optimization to be triggered in this specific setup. I still added the case to the fix in `add_users_of_use_to_worklist`, as there are likely cases where the notification is missing (but I was just not able to find one).
>
> ### Testing
> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371534)
> - [x] tier1-4, plus some internal testing
>
> Thank you for reviewing!

Thank you for fixing this, @benoitmaillard! This looks good to me.

-------------

Marked as reviewed by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/28218#pullrequestreview-3443993751

From epeter at openjdk.org Mon Nov 10 16:07:24 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 10 Nov 2025 16:07:24 GMT
Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply set wrong type, led to wrong constant folding of phi [v6]
In-Reply-To:
References:
Message-ID:

> In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed an edge case.
>
> The issue is when we also (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds.
> > I now limit the modification to cases where the `phi` used to be for scalars, but now is for vectors. In those cases we should not have a constant. For good measure, I also added a corresponding assert. > > --------- > > Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. > I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix assert code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28113/files - new: https://git.openjdk.org/jdk/pull/28113/files/d2fac68d..77f03563 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=04-05 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28113.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28113/head:pull/28113 PR: https://git.openjdk.org/jdk/pull/28113 From fgao at openjdk.org Mon Nov 10 16:09:54 2025 From: fgao at openjdk.org (Fei Gao) Date: Mon, 10 Nov 2025 16:09:54 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: <_gdObRdkYkS7d3fQQ6bcms709TpeM1IQtuPJtI0fcyE=.073d0496-8dfc-48a9-aee8-b64b408a6e62@github.com> References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <_gdObRdkYkS7d3fQQ6bcms709TpeM1IQtuPJtI0fcyE=.073d0496-8dfc-48a9-aee8-b64b408a6e62@github.com> Message-ID: On Fri, 17 Oct 2025 13:05:47 GMT, Emanuel Peter wrote: >>> BTW: I just integrated https://github.com/openjdk/jdk/pull/24278 which may have silent merge conflicts, so it would be good if you merged and tested again. 
>>
>> Hi @eme64, I've rebased the patch onto the latest JDK, and all tier1 to tier3 tests have passed on my local AArch64 and x86 machines.
>>
>>> It would be good if you re-ran the benchmarks. It seems the last ones you did in December of 2024.
>> We should see that we have various benchmarks, both for array and MemorySegment.
>> You could look at the array benchmarks from here: https://github.com/openjdk/jdk/pull/22070
>>
>> I also re-verified the benchmark from [PR #22070](https://github.com/openjdk/jdk/pull/22070) on 128-bit, 256-bit, and 512-bit vector machines. The results show no significant regressions and performance changes are consistent with the previous round described in [perf results](https://bugs.openjdk.org/browse/JDK-8307084?focusedId=14729524&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14729524).
>>
>>> Once you do that I could also run some internal testing, if you like :)
>>
>> I'd really appreciate it if you could run some internal testing at a time you think is suitable.
>> Thanks :)
>
> @fg1417 Are you still working on this?

Hi @eme64, many thanks for your review. It's really comprehensive and insightful. I've given a thumbs-up to all the comments that have been resolved in this commit.

> I have one concern: We now have changed the branches. There is now a long sequence of branches if we have very few iterations, so that we only go through pre and post loop. It would be interesting to see what the performance difference is between master and patch.

Regarding this concern, I re-ran the microbenchmarks (now merged with the existing `VectorThroughputForIterationCount.java`), named as `bench03*_drain_memoryBound`, and collected data across different platforms, including `128-bit` and `256-bit` `AArch64` machines as well as a `512-bit` `x86` machine.

To summarize, I observe a minor performance regression for small-iteration loops on the `128-bit` and `256-bit` `AArch64` platforms.
For larger-iteration loops, there is either a performance improvement or no noticeable change. The performance data on the `512-bit x86` machine shows a similar trend, though the regression is more significant. **The test range of `ITERATION_COUNT` is `0?300`. For larger `ITERATION_COUNT` values, there is either a performance improvement or no noticeable change, so those results are omitted. The following data only shows cases with regressions.** (FIXED_OFFSET) (RANDOMIZE_OFFSETS) (REPETITIONS) (seed) Mode Cnt 0 TRUE 1024 42 avgt 3 `Diff = (patch - master) / master` On `128-bit aarch64` platform: Benchmark (ITERATION_COUNT) Units Diff bench031B_drain_memoryBound 1 ns/op 15.15% bench031B_drain_memoryBound 2 ns/op 10.89% bench031B_drain_memoryBound 3 ns/op 9.27% bench031B_drain_memoryBound 4 ns/op 7.39% bench031B_drain_memoryBound 5 ns/op 5.86% bench031B_drain_memoryBound 6 ns/op 5.31% bench031B_drain_memoryBound 7 ns/op 4.39% bench031B_drain_memoryBound 8 ns/op 4.27% bench031B_drain_memoryBound 9 ns/op 3.60% bench031B_drain_memoryBound 10 ns/op 3.11% bench031B_drain_memoryBound 11 ns/op 2.97% bench031B_drain_memoryBound 12 ns/op 3.19% bench031B_drain_memoryBound 13 ns/op 2.90% bench031B_drain_memoryBound 14 ns/op 2.68% bench031B_drain_memoryBound 15 ns/op 2.37% bench031B_drain_memoryBound 16 ns/op 2.44% bench031B_drain_memoryBound 17 ns/op 2.11% bench031B_drain_memoryBound 18 ns/op 1.57% bench031B_drain_memoryBound 19 ns/op 1.32% bench031B_drain_memoryBound 20 ns/op 1.31% bench031B_drain_memoryBound 21 ns/op 1.32% bench031B_drain_memoryBound 22 ns/op 1.22% bench031B_drain_memoryBound 23 ns/op 0.88% bench031B_drain_memoryBound 24 ns/op 0.98% bench031B_drain_memoryBound 25 ns/op 1.14% bench031B_drain_memoryBound 26 ns/op 0.93% bench031B_drain_memoryBound 27 ns/op 0.84% bench031B_drain_memoryBound 28 ns/op 0.87% bench031B_drain_memoryBound 29 ns/op 0.96% bench031B_drain_memoryBound 30 ns/op 0.82% bench032S_drain_memoryBound 1 ns/op 15.17% bench032S_drain_memoryBound 2 
ns/op 5.01% bench032S_drain_memoryBound 3 ns/op 8.95% bench032S_drain_memoryBound 4 ns/op 7.77% bench032S_drain_memoryBound 5 ns/op 0.52% bench032S_drain_memoryBound 6 ns/op -0.67% bench032S_drain_memoryBound 7 ns/op 4.05% bench032S_drain_memoryBound 8 ns/op 3.67% bench032S_drain_memoryBound 9 ns/op -2.89% bench032S_drain_memoryBound 10 ns/op 2.04% bench032S_drain_memoryBound 11 ns/op -4.50% bench032S_drain_memoryBound 12 ns/op -3.11% bench032S_drain_memoryBound 13 ns/op 1.43% bench032S_drain_memoryBound 14 ns/op -4.16% bench032S_drain_memoryBound 15 ns/op -3.80% bench034I_drain_memoryBound 1 ns/op 15.15% bench034I_drain_memoryBound 2 ns/op 10.52% bench034I_drain_memoryBound 3 ns/op 9.04% bench034I_drain_memoryBound 4 ns/op 7.94% bench034I_drain_memoryBound 5 ns/op 6.78% bench034I_drain_memoryBound 6 ns/op 4.12% bench034I_drain_memoryBound 7 ns/op 3.82% bench035L_drain_memoryBound 1 ns/op 12.50% bench035L_drain_memoryBound 2 ns/op 10.57% bench035L_drain_memoryBound 3 ns/op 9.11% bench035L_drain_memoryBound 4 ns/op 7.50% bench035L_drain_memoryBound 5 ns/op 7.02% on `256-bit` aarch64 platform: Benchmark (ITERATION_COUNT) Units diff bench031B_drain_memoryBound 1 ns/op 14.01% bench031B_drain_memoryBound 2 ns/op 11.00% bench031B_drain_memoryBound 3 ns/op 12.57% bench031B_drain_memoryBound 4 ns/op 8.25% bench031B_drain_memoryBound 5 ns/op 9.71% bench031B_drain_memoryBound 6 ns/op 7.00% bench031B_drain_memoryBound 7 ns/op 4.09% bench031B_drain_memoryBound 8 ns/op 6.48% bench031B_drain_memoryBound 9 ns/op 4.30% bench031B_drain_memoryBound 10 ns/op 5.28% bench031B_drain_memoryBound 11 ns/op 4.58% bench031B_drain_memoryBound 12 ns/op 3.84% bench031B_drain_memoryBound 13 ns/op 3.51% bench031B_drain_memoryBound 14 ns/op 3.49% bench031B_drain_memoryBound 15 ns/op 3.21% bench031B_drain_memoryBound 16 ns/op 2.97% bench031B_drain_memoryBound 17 ns/op 2.04% bench031B_drain_memoryBound 18 ns/op 1.75% bench031B_drain_memoryBound 19 ns/op 0.83% bench031B_drain_memoryBound 20 ns/op 
0.92% bench031B_drain_memoryBound 21 ns/op 1.67% bench031B_drain_memoryBound 22 ns/op 0.33% bench031B_drain_memoryBound 23 ns/op 1.02% bench032S_drain_memoryBound 1 ns/op 12.33% bench032S_drain_memoryBound 2 ns/op 8.75% bench032S_drain_memoryBound 3 ns/op 8.75% bench032S_drain_memoryBound 4 ns/op 7.40% bench032S_drain_memoryBound 5 ns/op 6.90% bench032S_drain_memoryBound 6 ns/op 5.33% bench032S_drain_memoryBound 7 ns/op 7.30% bench032S_drain_memoryBound 8 ns/op 3.44% bench032S_drain_memoryBound 9 ns/op 0.59% bench032S_drain_memoryBound 10 ns/op 1.81% bench032S_drain_memoryBound 11 ns/op 0.94% bench032S_drain_memoryBound 12 ns/op 0.80% bench032S_drain_memoryBound 13 ns/op 0.08% bench032S_drain_memoryBound 14 ns/op 1.01% bench032S_drain_memoryBound 15 ns/op 0.55% bench032S_drain_memoryBound 16 ns/op 0.14% bench032S_drain_memoryBound 17 ns/op 0.41% bench032S_drain_memoryBound 18 ns/op 0.22% bench032S_drain_memoryBound 19 ns/op 0.44% bench034I_drain_memoryBound 1 ns/op 15.41% bench034I_drain_memoryBound 2 ns/op 14.37% bench034I_drain_memoryBound 3 ns/op 10.95% bench034I_drain_memoryBound 4 ns/op 9.54% bench034I_drain_memoryBound 5 ns/op 6.94% bench034I_drain_memoryBound 6 ns/op 7.16% bench034I_drain_memoryBound 7 ns/op 5.35% bench034I_drain_memoryBound 8 ns/op 5.13% bench034I_drain_memoryBound 9 ns/op 5.42% bench034I_drain_memoryBound 10 ns/op 4.20% bench034I_drain_memoryBound 11 ns/op 3.83% bench035L_drain_memoryBound 1 ns/op 12.94% bench035L_drain_memoryBound 2 ns/op 11.69% bench035L_drain_memoryBound 3 ns/op 8.99% bench035L_drain_memoryBound 4 ns/op 8.67% bench035L_drain_memoryBound 5 ns/op 6.93% On the `512-bit x86` machine, for the `byte` type, the regression is quite noticeable. A graph might illustrate this more clearly. 
bench031B_drain_memoryBound on 512 x86 For the other data types: Benchmark (ITERATION_COUNT) Units diff bench032S_drain_memoryBound 1 ns/op 5.56% bench032S_drain_memoryBound 2 ns/op 4.30% bench032S_drain_memoryBound 3 ns/op 15.05% bench032S_drain_memoryBound 4 ns/op 10.83% bench032S_drain_memoryBound 5 ns/op 11.13% bench032S_drain_memoryBound 6 ns/op 2.27% bench032S_drain_memoryBound 7 ns/op 11.13% bench032S_drain_memoryBound 8 ns/op 1.29% bench032S_drain_memoryBound 9 ns/op 12.30% bench032S_drain_memoryBound 10 ns/op -2.16% bench032S_drain_memoryBound 11 ns/op 11.14% bench032S_drain_memoryBound 12 ns/op 4.56% bench032S_drain_memoryBound 13 ns/op 10.08% bench032S_drain_memoryBound 14 ns/op -0.14% bench032S_drain_memoryBound 15 ns/op 10.33% bench032S_drain_memoryBound 16 ns/op 0.68% bench032S_drain_memoryBound 17 ns/op 5.01% bench032S_drain_memoryBound 18 ns/op -0.12% bench032S_drain_memoryBound 19 ns/op 1.54% bench032S_drain_memoryBound 20 ns/op 0.38% bench032S_drain_memoryBound 21 ns/op 0.65% bench032S_drain_memoryBound 22 ns/op 4.38% bench032S_drain_memoryBound 23 ns/op 2.54% bench032S_drain_memoryBound 24 ns/op -0.46% bench032S_drain_memoryBound 25 ns/op 0.33% bench032S_drain_memoryBound 26 ns/op 1.06% bench032S_drain_memoryBound 27 ns/op 4.41% bench032S_drain_memoryBound 28 ns/op 0.34% bench032S_drain_memoryBound 29 ns/op 1.35% bench032S_drain_memoryBound 30 ns/op 0.58% bench032S_drain_memoryBound 31 ns/op 3.00% bench032S_drain_memoryBound 32 ns/op -2.67% bench032S_drain_memoryBound 33 ns/op 3.62% bench032S_drain_memoryBound 34 ns/op 3.35% bench032S_drain_memoryBound 35 ns/op 1.01% bench032S_drain_memoryBound 36 ns/op -1.65% bench032S_drain_memoryBound 37 ns/op -1.65% bench032S_drain_memoryBound 38 ns/op 2.91% bench032S_drain_memoryBound 39 ns/op 3.44% bench032S_drain_memoryBound 40 ns/op 1.38% bench032S_drain_memoryBound 41 ns/op -0.18% bench032S_drain_memoryBound 42 ns/op 1.58% bench032S_drain_memoryBound 43 ns/op 2.05% bench032S_drain_memoryBound 44 ns/op 
3.22% bench032S_drain_memoryBound 45 ns/op -1.45% bench032S_drain_memoryBound 46 ns/op 0.81% bench032S_drain_memoryBound 47 ns/op 0.67% bench032S_drain_memoryBound 48 ns/op 0.26% bench032S_drain_memoryBound 49 ns/op 2.81% bench032S_drain_memoryBound 50 ns/op -1.97% bench032S_drain_memoryBound 51 ns/op 3.71% bench032S_drain_memoryBound 52 ns/op 2.98% bench032S_drain_memoryBound 53 ns/op -0.54% bench032S_drain_memoryBound 55 ns/op 8.44% bench034I_drain_memoryBound 1 ns/op 10.82% bench034I_drain_memoryBound 2 ns/op 12.22% bench034I_drain_memoryBound 3 ns/op 6.62% bench034I_drain_memoryBound 4 ns/op 11.52% bench034I_drain_memoryBound 5 ns/op 7.84% bench034I_drain_memoryBound 6 ns/op 9.48% bench034I_drain_memoryBound 7 ns/op 7.41% bench034I_drain_memoryBound 8 ns/op 2.55% bench034I_drain_memoryBound 9 ns/op 4.28% bench034I_drain_memoryBound 10 ns/op 6.15% bench034I_drain_memoryBound 11 ns/op 5.07% bench034I_drain_memoryBound 12 ns/op 6.84% bench034I_drain_memoryBound 13 ns/op 3.45% bench034I_drain_memoryBound 14 ns/op 4.99% bench034I_drain_memoryBound 15 ns/op 4.34% bench034I_drain_memoryBound 16 ns/op 7.29% bench034I_drain_memoryBound 17 ns/op 4.74% bench034I_drain_memoryBound 18 ns/op 2.25% bench034I_drain_memoryBound 19 ns/op 6.39% bench034I_drain_memoryBound 20 ns/op 2.52% bench034I_drain_memoryBound 21 ns/op 3.82% bench034I_drain_memoryBound 22 ns/op -0.49% bench034I_drain_memoryBound 23 ns/op 4.22% bench034I_drain_memoryBound 24 ns/op 3.17% bench034I_drain_memoryBound 25 ns/op 2.89% bench034I_drain_memoryBound 26 ns/op 2.05% bench034I_drain_memoryBound 27 ns/op 3.43% bench035L_drain_memoryBound 1 ns/op 7.70% bench035L_drain_memoryBound 2 ns/op 8.36% bench035L_drain_memoryBound 3 ns/op 5.62% bench035L_drain_memoryBound 4 ns/op 0.02% bench035L_drain_memoryBound 5 ns/op 5.58% bench035L_drain_memoryBound 6 ns/op 13.26% bench035L_drain_memoryBound 7 ns/op 6.33% bench035L_drain_memoryBound 8 ns/op 4.58% bench035L_drain_memoryBound 9 ns/op 8.82% 
bench035L_drain_memoryBound 10 ns/op 2.15% bench035L_drain_memoryBound 11 ns/op 6.71% bench035L_drain_memoryBound 12 ns/op 15.44% bench035L_drain_memoryBound 13 ns/op -1.53% The marginal performance regressions on the `AArch64` machines and most data types on the `x86` machine are relatively predictable and acceptable. However, the fluctuations observed on the `x86` machine for the `byte` case are somewhat unusual. What do you think? > It would also be interesting to see a case where the SIZE of the array is not constant, and so the branches become impossible to predict, and there are a lot of branch misses. What do you think? Regarding this case, I also ran a set of microbenchmarks named `bench03*_drain_dynamic`, which are included in `VectorThroughputForIterationCount.java`. Do these benchmarks make sense to you in the context of this issue? If so, there's no noticeable performance regression on either `x86` or `AArch64` platforms, only some performance improvements. Taking the `256-bit AArch64` platform as an example, here are the results: `Units: ns/op` 256byte 256 short 256int 256long ------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3512609154 From epeter at openjdk.org Mon Nov 10 16:25:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Nov 2025 16:25:19 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply set wrong type, led to wrong constant folding of phi [v7] In-Reply-To: References: Message-ID: > In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed an edge case. > > The issue is when we also (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. 
And then the phi wrongly constant folds. > > I now limit the modification to cases where the `phi` used to be for scalars, but now is for vectors. In those cases we should not have a constant. For good measure, I also added a corresponding assert. > > --------- > > Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. > I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - manual merge with cost model change - fix assert code - refine fix - wip fix, still broken - second reproducer - move fix to apply_backedge - add diagnostic flag for product build - add assert - rm debug printing - add test - ... and 1 more: https://git.openjdk.org/jdk/compare/72989e0f...dfc7c87e ------------- Changes: https://git.openjdk.org/jdk/pull/28113/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=06 Stats: 286 lines in 4 files changed: 256 ins; 6 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/28113.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28113/head:pull/28113 PR: https://git.openjdk.org/jdk/pull/28113 From kxu at openjdk.org Mon Nov 10 16:33:12 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 10 Nov 2025 16:33:12 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v3] In-Reply-To: References: Message-ID: On Thu, 23 Oct 2025 08:02:47 GMT, Christian Hagedorn wrote: >> @chhagedorn Sorry this took longer than expected. I left a few replies under some of your specific comments. All other issues were addressed. Thank you! > > No worries, thanks @tabjy for addressing my suggestions and comments! 
I won't be able to continue this week but will have another look next week. > @chhagedorn: Have you thought about some ways to test this? One idea could be to do some runs with some custom logging in place when a counted loop was successfully created and then compare the output to a baseline without your patch and the same logging in place. That's a very good suggestion. I tried logging the method id where a counted loop is detected and converted, and ran `tier2_ctw` with it. However, the results are not stable between multiple runs (every time I get more or fewer counted loops detected, even with the old code). There was some non-determinism either in the HS or the JCL code. I couldn't identify its source. Alternatively, I cherry-picked the old counted loop implementation back in with minimal changes and asserted that the old and new implementations always produce the same result. This is done on another branch [0] to avoid complicating reviews. All of the `tier1` tests pass on GHA [1] and `tier2_ctw` passes locally as well (I pinky swear!). https://github.com/tabjy/jdk/blob/8c1a8c02574af9e2a7b073fc729b3474f187d361/src/hotspot/share/opto/loopnode.cpp#L3010-L3016 So yes, I'm quite confident there are no regressions. [0] https://github.com/tabjy/jdk/compare/counted-loop-refactor...tabjy:jdk:counted-loop-refactor-log [1] https://github.com/tabjy/jdk/actions/runs/19149109113 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3512700057 From bmaillard at openjdk.org Mon Nov 10 16:38:17 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 10 Nov 2025 16:38:17 GMT Subject: RFR: 8369993: Redundant separate 'String' check in 'trust_final_non_static_fields' ciField.cpp In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 10:57:53 GMT, Zihao Lin wrote: > Remove redundant separate 'String' check in 'trust_final_non_static_fields' ciField.cpp Thanks for making this cleanup @linzihao1999, this looks rather reasonable and trivial. 
I have submitted testing and will come back with the results. ------------- PR Review: https://git.openjdk.org/jdk/pull/28191#pullrequestreview-3444150589 From vladimir.kozlov at oracle.com Mon Nov 10 17:08:22 2025 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 10 Nov 2025 09:08:22 -0800 Subject: Question: Could hardware problems cause JIT deoptimization? In-Reply-To: <94c49f50-add6-42be-b3a6-49a11720346f@littlepinkcloud.com> References: <94c49f50-add6-42be-b3a6-49a11720346f@littlepinkcloud.com> Message-ID: You can use -Xlog:deoptimization=debug flag to see reasons for deoptimization. Is hardware configuration the same? Or that server has more cores? Which could explain more deoptimized frames. Regards, Vladimir K On 11/10/25 4:00 AM, Andrew Haley wrote: > On 10/11/2025 08:12, ?? wrote: >> We have deployed the same application (Gravitee Gateway 3.5.16) on >> multiple servers with the same software configuration. However, when >> capturing flame graphs, we observed significantly different levels of >> JIT deoptimization across machines. On one server, the percentage of >> deoptimized frames was as high as 40%. > >> So my question is: the high deoptimization rate is caused by hardware >> problems? > > Almost by definition, hardware problems can cause anything. But some > profiling can cause deoptimization when it rewrites bytecodes. For > example, async-profiler can do it if you use method tracing. So I'd > investigate this possibility first. > From psandoz at openjdk.org Mon Nov 10 23:42:10 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 10 Nov 2025 23:42:10 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v6] In-Reply-To: References: Message-ID: On Mon, 10 Nov 2025 02:19:12 GMT, Xiaohong Gong wrote: > and similarly move vector slice operations to the compiler Yes, you have to slice the mask, whether it be represented as a mask/predicate register or as a vector. 
There's no way around that and we have to deal with the current limitations in hardware. As a further compromise we can in Java convert the mask to a vector and rearrange it, then pass the vector representation of the mask to the scatter/gather intrinsic. Then the intrinsic can if it chooses convert it back to a mask/predicate register if that is the best form. IIUC we have agreed for non-masked subword scatter/gather to compose by parts using the intrinsic. That seems good, and it looks like we can do the same for masked subword scatter/gather, as above, but it may not be the most efficient for the platform. Do you have any use cases for mask subword scatter/gather? Given the lack of underlying hardware support it seems focusing on getting the non-masked version working well, and the masked version working ok is a pragmatic way forward. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3514360424 From psandoz at openjdk.org Tue Nov 11 00:02:04 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 11 Nov 2025 00:02:04 GMT Subject: RFR: 8370691: Add new HalffloatVector type and enable intrinsification of float16 vector operations In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 15:19:48 GMT, Jatin Bhateja wrote: > Add new HalffloatVector type and corresponding concrete vector classes in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support. > - Extended the existing VectorAPI JTREG test suite for the newly added HalffloatVector operations. > > The idea here is to first be at par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). 
> > The following are the performance numbers for some of the selected HalffloatVector benchmarking kernels compared to equivalent Float16OperationsBenchmark kernels. > > {A2BA2D85-085A-489F-8DDD-0FCFB5986EA5} > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback. > > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Some quick comments. We should be consistent in the naming, and rename `Halffloat*` to `Float16*`. When you generate the fallback code for unary/binary etc., can you push the carrier type and conversions into the uOp/bOp implementations so you don't have to explicitly operate on the carrier type and do the conversions as you do now, e.g.: `v0.uOp(m, (i, a) -> float16ToShortBits(Float16.valueOf(-(shortBitsToFloat16(($type$)a).floatValue()))));` The transition of intrinsic arguments from `vsp.elementType()` to `vsp.carrierType(), vsp.operType()` is a little unfortunate. Is this because HotSpot cannot directly refer to the `Float16` class from the incubating module? Requiring two arguments means they can get out of sync. Previously the class provided all the information needed, now arguably the type does. ------------- PR Review: https://git.openjdk.org/jdk/pull/28002#pullrequestreview-3445662107 From darcy at openjdk.org Tue Nov 11 01:02:05 2025 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 11 Nov 2025 01:02:05 GMT Subject: RFR: 8370691: Add new HalffloatVector type and enable intrinsification of float16 vector operations In-Reply-To: References: Message-ID: On Mon, 10 Nov 2025 23:58:57 GMT, Paul Sandoz wrote: > Some quick comments. > > We should be consistent in the naming, and rename `Halffloat*` to `Float16*`. > I concur, especially since there are multiple 16-bit floating-point formats in use including the IEEE 754 float16 as well as bfloat16. 
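To illustrate the point about multiple 16-bit floating-point formats, here is a minimal sketch (not part of the PR) that decodes the same 16-bit pattern under both layouts. The class and method names are hypothetical; the bit layouts assumed are 1 sign / 5 exponent / 10 fraction bits for IEEE 754 binary16 and 1 / 8 / 7 for bfloat16, where bfloat16 is simply the upper half of a 32-bit float.

```java
public class SixteenBitFormats {
    // Hypothetical helper: decode a 16-bit pattern as IEEE 754 binary16
    // (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits).
    static float decodeFloat16(int bits) {
        int sign = (bits >>> 15) & 0x1;
        int exponent = (bits >>> 10) & 0x1F;
        int fraction = bits & 0x3FF;
        float magnitude;
        if (exponent == 0x1F) {
            // All-ones exponent: infinity or NaN.
            magnitude = (fraction == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else if (exponent == 0) {
            // Zero or subnormal: fraction * 2^-10 * 2^-14.
            magnitude = Math.scalb((float) fraction, -24);
        } else {
            // Normal number: (1 + fraction / 2^10) * 2^(exponent - 15).
            magnitude = Math.scalb(1.0f + fraction / 1024.0f, exponent - 15);
        }
        return sign == 0 ? magnitude : -magnitude;
    }

    // Hypothetical helper: decode a 16-bit pattern as bfloat16, i.e. the
    // upper 16 bits of an IEEE 754 binary32 value.
    static float decodeBFloat16(int bits) {
        return Float.intBitsToFloat((bits & 0xFFFF) << 16);
    }

    public static void main(String[] args) {
        int pattern = 0x3C00;
        System.out.println("as float16:  " + decodeFloat16(pattern));
        System.out.println("as bfloat16: " + decodeBFloat16(pattern));
    }
}
```

The same bits read as 1.0 under binary16 and as 2^-7 under bfloat16, which is one reason a name that pins down the format may be less ambiguous than a generic "half float".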
------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3514526479 From duke at openjdk.org Tue Nov 11 06:12:28 2025 From: duke at openjdk.org (Zihao Lin) Date: Tue, 11 Nov 2025 06:12:28 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v11] In-Reply-To: References: Message-ID: > This patch removes the slice parameter from LoadNode::make > > I have done more work, which removes the slice parameter from StoreNode::make. > > Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thanks a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merge branch 'openjdk:master' into 8344116 - fix conflict - Merge master - remove C2AccessValuePtr - fix assert - add more assert - rid of access.addr().type() - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - ... and 3 more: https://git.openjdk.org/jdk/compare/76a1109d...42b17827 ------------- Changes: https://git.openjdk.org/jdk/pull/24258/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=10 Stats: 230 lines in 18 files changed: 33 ins; 55 del; 142 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From 617263736 at qq.com Tue Nov 11 06:46:54 2025 From: 617263736 at qq.com (星晴) Date: Tue, 11 Nov 2025 14:46:54 +0800 Subject: Could JIT deoptimization be caused by hardware issue? Message-ID: Hi all, I have a question about JIT deoptimization behavior in HotSpot. We deployed a Java application on a virtual machine (4 vCPUs, 8 GB memory). When analyzing a flame graph, we found that JIT deoptimizations account for about 40% of the CPU samples. 
After live migrating this VM to another physical host, the high deoptimization ratio remained unchanged. Does this indicate that the deoptimization behavior is not related to the underlying hardware? Thanks for your insights. Hang Ren. From rsunderbabu at openjdk.org Tue Nov 11 07:19:02 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Tue, 11 Nov 2025 07:19:02 GMT Subject: RFR: 8293484: AArch64: TestUseSHA512IntrinsicsOptionOnSupportedCPU.java fails on CPU with SHA512 feature support In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 03:33:15 GMT, Hao Sun wrote: >> We have a host of tests under test/hotspot/jtreg/compiler/intrinsics/sha which check whether the SHA intrinsics flags' enable/disable setting is in sync with CPU support in the underlying platform. There might be situations where the intrinsics might not be enabled despite the hardware supporting the relevant instructions. For example, there might be reliability issues or performance issues. In such situations, the tests will fail. >> >> Till now, the approach has been to exclude the platforms where the support is yet to be provided and remove the exclusion afterwards. This necessitates additional work on the test front. >> >> A more compact design would be to make the predicate probes rely on intrinsics availability in the platform as opposed to hardware support availability. The migration to intrinsics availability would especially help update releases where feature backport might not be complete. >> >> PS: This fix can/should be propagated to other such tests as well. Once this PR gets approval, I will work on similar tests. > > Hi, I suppose the failure may occur if we run this test case on a CPU **with** the SHA512 feature, but **disabling** SHA512Intrinsics. > > As **@requires vm.flagless** is set in this jtreg case, if we specify `-XX:-UseSHA512Intrinsics`, this test case is not actually run. 
Here is the log from my machine. > > > $ make test TEST=test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java JTREG="VM_OPTIONS=-XX:-UseSHA512Intrinsics" > Building target 'test' in configuration '/tmp/local-build-fastdebug' > Running tests using JTREG control variable 'VM_OPTIONS=-XX:-UseSHA512Intrinsics' > Test selection 'test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java', will run: > * jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java > Clean up dirs for jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java > > Running test 'jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java' > Test results: no tests selected > Report written to /tmp/local-build-fastdebug/test-results/jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java/html/report.html > Results written to /tmp/local-build-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java > Finished running test 'jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java' > Test report is stored in /tmp/local-build-fastdebug/test-results/jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java > 0 0 0 0 0 > ============================== > TEST SUCCESS > > > If so, I don't think it's a bug. > Is there anything I misunderstood? @shqking `-XX:-UseSHA512Intrinsics` is not the only case of disabling SHA512 intrinsics. 
Please have a look at your comment https://bugs.openjdk.org/browse/JDK-8293484?focusedId=14532743&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14532743 The intrinsic was disabled due to a lack of test hardware. Please refer to @snazarkin's comment too. https://bugs.openjdk.org/browse/JDK-8293484?focusedId=14522406&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14522406 I have come across instances where the support was temporarily disabled due to performance or reliability issues. Relying on the CPU feature is not a bug per se. But it makes test code maintenance a tad bit difficult. I have discussed this in detail in the PR description. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28053#issuecomment-3515301594 From epeter at openjdk.org Tue Nov 11 07:25:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 07:25:49 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply set wrong type, led to wrong constant folding of phi [v8] In-Reply-To: References: Message-ID: > In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed an edge case. > > The issue is when we also (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds. > > I now limit the modification to cases where the `phi` used to be for scalars, but now is for vectors. In those cases we should not have a constant. For good measure, I also added a corresponding assert. > > --------- > > Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. 
> I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - cleanup - refine fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28113/files - new: https://git.openjdk.org/jdk/pull/28113/files/dfc7c87e..5f720e72 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=06-07 Stats: 14 lines in 2 files changed: 4 ins; 6 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28113.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28113/head:pull/28113 PR: https://git.openjdk.org/jdk/pull/28113 From epeter at openjdk.org Tue Nov 11 07:32:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 07:32:07 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply setting type leads to assert/wrong result [v2] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 07:40:09 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add diagnostic flag for product build > > Looks good to me, too, thanks! @chhagedorn @merykitty Given the second reproducer, I now modified the fix substantially. Local tests pass, now sending it to thorough testing. 
I'll need both of you to re-review ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28113#issuecomment-3515336915 From epeter at openjdk.org Tue Nov 11 07:47:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 07:47:24 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply setting type leads to assert/wrong result [v9] In-Reply-To: References: Message-ID: > In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed some edge cases that the fuzzer now found. > > - The first issue: when we (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds. That leads to wrong results. > - The second issue: a phi that was scalar and we turned into vector still had some dead old scalar reduction nodes attached. They would of course eventually die during IGVN. But with `StressIGVN` just picking the right bad order, it could happen that an `AddI` attached to the `phi` would try to figure out its `Value` type, and try to combine the vector type of the `phi` with the other input, leading to a type error. > > With only the first issue at first, I tried to improve the way we modify the type from scalar to vector. But with the second issue, it became clear that we should just create a new phi node when we move from scalar to vector phi. Hence, I split the `LoopPhi` into a `PhiScalar` and a `PhiVector`, and give them separate implementations. > > --------- > > Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. 
> I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. > > Later, the fuzzer found the second reproducer on mainline, which was much easier to reduce. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: add override ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28113/files - new: https://git.openjdk.org/jdk/pull/28113/files/5f720e72..3b65d8b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=07-08 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28113.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28113/head:pull/28113 PR: https://git.openjdk.org/jdk/pull/28113 From epeter at openjdk.org Tue Nov 11 07:47:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 07:47:24 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply setting type leads to assert/wrong result [v9] In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 07:44:42 GMT, Emanuel Peter wrote: >> In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed some edge cases that the fuzzer now found. >> >> - The first issue: when we (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds. That leads to wrong results. 
>> - The second issue: a phi that was scalar and we turned into vector still had some dead old scalar reduction nodes attached. They would of course eventually die during IGVN. But with `StressIGVN` just picking the right bad order, it could happen that an `AddI` attached to the `phi` would try to figure out its `Value` type, and try to combine the vector type of the `phi` with the other input, leading to a type error. >> >> With only the first issue at first, I tried to improve the way we modify the type from scalar to vector. But with the second issue, it became clear that we should just create a new phi node when we move from scalar to vector phi. Hence, I split the `LoopPhi` into a `PhiScalar` and a `PhiVector`, and give them separate implementations. >> >> --------- >> >> Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. >> I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. >> >> Later, the fuzzer found the second reproducer on mainline, which was much easier to reduce. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add override src/hotspot/share/opto/vtransform.cpp line 973: > 971: _node->as_Type()->set_type(t); > 972: phase->igvn().set_type(_node, t); > 973: Note: moved to the `PhiVector`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28113#discussion_r2513135852 From epeter at openjdk.org Tue Nov 11 07:47:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 07:47:27 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply setting type leads to assert/wrong result [v8] In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 07:25:49 GMT, Emanuel Peter wrote: >> In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed some edge cases that the fuzzer now found. >> >> - The first issue: when we (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds. That leads to wrong results. >> - The second issue: a phi that was scalar and we turned into vector still had some dead old scalar reduction nodes attached. They would of course eventually die during IGVN. But with `StressIGVN` just picking the right bad order, it could happen that an `AddI` attached to the `phi` would try to figure out its `Value` type, and try to combine the vector type of the `phi` with the other input, leading to a type error. >> >> With only the first issue at first, I tried to improve the way we modify the type from scalar to vector. But with the second issue, it became clear that we should just create a new phi node when we move from scalar to vector phi. Hence, I split the `LoopPhi` into a `PhiScalar` and a `PhiVector`, and give them separate implementations. >> >> --------- >> >> Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. 
>> I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. >> >> Later, the fuzzer found the second reproducer on mainline, which was much easier to reduce. > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - cleanup > - refine fix src/hotspot/share/opto/vtransform.cpp line 1353: > 1351: phi_vector->init_req(0, phi_scalar->in_req(0)); > 1352: phi_vector->init_req(1, vtn_identity_vector); > 1353: // Note: backedge comes later Note: rather than reusing the old phi, we create a new one. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28113#discussion_r2513137867 From epeter at openjdk.org Tue Nov 11 09:50:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 09:50:48 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply setting type leads to assert/wrong result [v10] In-Reply-To: References: Message-ID: > In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed some edge cases that the fuzzer now found. > > - The first issue: when we (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds. That leads to wrong results. > - The second issue: a phi that was scalar and we turned into vector still had some dead old scalar reduction nodes attached. They would of course eventually die during IGVN. 
But with `StressIGVN` just picking the right bad order, it could happen that an `AddI` attached to the `phi` would try to figure out its `Value` type, and try to combine the vector type of the `phi` with the other input, leading to a type error. > > With only the first issue at first, I tried to improve the way we modify the type from scalar to vector. But with the second issue, it became clear that we should just create a new phi node when we move from scalar to vector phi. Hence, I split the `LoopPhi` into a `PhiScalar` and a `PhiVector`, and give them separate implementations. > > --------- > > Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. > I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. > > Later, the fuzzer found the second reproducer on mainline, which was much easier to reduce. 
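For readers less familiar with the terminology in the description above, the following is a minimal sketch of the general loop shape involved, assuming a plain integer sum. It is not one of the reproducers from the PR, and the class and method names are hypothetical. The accumulator `sum` is the value carried around the loop (the loop phi); because integer addition can be reordered, a vectorizer may keep a vector accumulator inside the loop and only reduce it to a scalar after the loop.

```java
public class ReductionSketch {
    // Hypothetical example: a simple int add-reduction. The variable `sum`
    // is the loop-carried value that a compiler represents as a phi.
    static int sum(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] values = new int[1024];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
        System.out.println(sum(values));
    }
}
```

Whether a given JIT actually vectorizes such a loop depends on the platform, the flags, and the compiler version.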
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: add -XX:+UnlockDiagnosticVMOptions flag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28113/files - new: https://git.openjdk.org/jdk/pull/28113/files/3b65d8b5..ea1f8e81 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28113&range=08-09 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28113.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28113/head:pull/28113 PR: https://git.openjdk.org/jdk/pull/28113 From epeter at openjdk.org Tue Nov 11 10:05:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 10:05:19 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: <4a6494JZE-PGYqCA1jHtf7_-dQjxAObwhuA6VYnS9Lg=.66b13f4f-3f71-4a4b-8130-2f5b841f5ce7@github.com> Message-ID: On Fri, 7 Nov 2025 23:15:51 GMT, Vladimir Ivanov wrote: >> We call this many times, so not sure if this could explode somehow? > > It's hard to place a nested ResourceMark because there are dynamically reallocated data structures with different life cycles. Instead, I moved temporary data structure allocations up in the call chain and made them shared across all RF nodes. Yeah, neither is very nice. 
But that sounds reasonable :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2513593891 From epeter at openjdk.org Tue Nov 11 10:23:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 10:23:39 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: <_1EqjnWT2m_tByqitwUPdR7Db0-gZU1YnHkTchCSqhc=.a05fc6cb-9fc9-406f-9988-299415b90fdd@github.com> References: <4a6494JZE-PGYqCA1jHtf7_-dQjxAObwhuA6VYnS9Lg=.66b13f4f-3f71-4a4b-8130-2f5b841f5ce7@github.com> <_1EqjnWT2m_tByqitwUPdR7Db0-gZU1YnHkTchCSqhc=.a05fc6cb-9fc9-406f-9988-299415b90fdd@github.com> Message-ID: On Fri, 7 Nov 2025 19:59:18 GMT, Vladimir Ivanov wrote: >> And what if we find a lot of SafePoints for each RF? Do we end up attaching quadratically many referent edges over all? > > In the worst case the number of new edges added is `(# of unique referents) * (# of safepoints)`. Multiple reachability fences can share the same referent. Right. I suppose that could explode in an extreme edge case, but maybe we don't worry about that for now. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2513656148 From epeter at openjdk.org Tue Nov 11 10:23:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 10:23:36 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v22] In-Reply-To: References: <4a6494JZE-PGYqCA1jHtf7_-dQjxAObwhuA6VYnS9Lg=.66b13f4f-3f71-4a4b-8130-2f5b841f5ce7@github.com> Message-ID: <4q-M887XOOyHFl3vaysoK3jweaojR42YU4lFBy6t-Jg=.2c067427-058a-4b35-b417-bdd1f26c9fe4@github.com> On Fri, 7 Nov 2025 23:15:58 GMT, Vladimir Ivanov wrote: >> Ah. Right, at first I did not see that you are using a stack, which id not a node list. It also has the idx. >> >> In my experience, this usually creates code that is a little harder to read. 
I prefer using a `Unique_Node_List`, and then just traverse over all ctrl inputs, and add those to the worklist. You have to special case Region, and all other CFG nodes only have ctrl on `in(0)`. It tends to nicely flatten the whole BFS traversal into a small loop. But maybe it does use just a bit more memory than your traversal. >> >> Just an idea, I can probably find a way to wrap my head around this approach here too ;) > > Unified naming. > >> In my experience, this usually creates code that is a little harder to read. > > Well, in my experience graph traversal implementation in C2 is way too verbose most of the time. I'd prefer a standard utility methods to traverse relevant parts of the graph, especially since we can use lambdas now. It would make it much easier to reason about it at use sites while making it more beneficial to invest into microoptimizations for different types of traversals. We already have something for node uses with `Node::visit_uses`. You could have a similar one for visiting inputs, and then limit to `CFG` nodes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2513651655 From mli at openjdk.org Tue Nov 11 11:32:42 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Nov 2025 11:32:42 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) Message-ID: Hi, Can you help to review this patch? This patch enables the vectorization of statement like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop, where op_x's size is different from res_f_d_x's. To assist with code review, this pr contains only the shared code change, is splitted from https://github.com/openjdk/jdk/pull/28230, which enable & implement the riscv part. The similar optimization could be extended to other platforms. ## Some background Previously, it's https://github.com/openjdk/jdk/pull/25336, which was blocked by unsigned comparison issue. 
The issue was recently resolved by https://github.com/openjdk/jdk/pull/27942, so I'm re-start working on this optimization. This pr only relaxes one of the constraints in https://github.com/openjdk/jdk/pull/25336, i.e. transform CMoveF/D to vector operations no matter what's the size of comparison's operator, but remove the optimization of transform CMoveI/L to vector operations which I think need more investigation. # Test ## Jtreg in progress... ## Performance check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. Thanks ------------- Commit messages: - comments - simplify - fix code path change in VectorNode::implemented - fix JDK-8371297: assert in BoolTest - revert supports_transform_cmove_to_vectorblend for all cpus - disable Op_CMoveI/Op_CMoveL in VectorNode::opcode - disable riscv - Merge branch 'master' into vectorize-CMove-Bool - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ... and 17 more: https://git.openjdk.org/jdk/compare/667744c3...56b6e029 Changes: https://git.openjdk.org/jdk/pull/28231/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28231&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357554 Stats: 44 lines in 10 files changed: 40 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28231.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28231/head:pull/28231 PR: https://git.openjdk.org/jdk/pull/28231 From mli at openjdk.org Tue Nov 11 11:32:42 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Nov 2025 11:32:42 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 11:24:12 GMT, Hamlin Li wrote: > Hi, > > Can you help to review this patch? > > This patch enables the vectorization of statement like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop, where op_x's size is different from res_f_d_x's. 
> > To assist with code review, this pr contains only the shared code change, is splitted from https://github.com/openjdk/jdk/pull/28230, which enable & implement the riscv part. The similar optimization could be extended to other platforms. > > ## Some background > > Previously, it's https://github.com/openjdk/jdk/pull/25336, which was blocked by unsigned comparison issue. The issue was recently resolved by https://github.com/openjdk/jdk/pull/27942, so I'm re-start working on this optimization. > > This pr only relaxes one of the constraints in https://github.com/openjdk/jdk/pull/25336, i.e. transform CMoveF/D to vector operations no matter what's the size of comparison's operator, but remove the optimization of transform CMoveI/L to vector operations which I think need more investigation. > > # Test > ## Jtreg > > in progress... > > ## Performance > > check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. > > Thanks @eme64 Could you have a look? :) Not sure who else can help to review it, feel free to help have a look if you're available. :) Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28231#issuecomment-3516425353 From epeter at openjdk.org Tue Nov 11 11:58:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 11:58:20 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v24] In-Reply-To: References: Message-ID: On Mon, 10 Nov 2025 02:27:15 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. >> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. 
>> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." 
>> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Revise RF redunancy & auto-boxed primitives handling > Cleanups Nice improvements. I only looked over the recent changes. I'll try to have a look at the whole change later on. src/hotspot/share/opto/parse1.cpp line 1250: > 1248: Node* loc = local(idx); > 1249: if (loc->bottom_type()->isa_oopptr() != nullptr && > 1250: !is_auto_boxed_primitive(loc)) { // ignore auto-boxed primitives I wonder if randomizing this would shake out more interesting patterns? src/hotspot/share/opto/reachability.cpp line 150: > 148: return false; // not a real safepoint > 149: } else if (sfpt->is_CallStaticJava() && sfpt->as_CallStaticJava()->is_uncommon_trap()) { > 150: return false; // uncommon traps are exit points Can we even hit this situation with a traversal from below? Just curious ;) src/hotspot/share/opto/reachability.cpp line 208: > 206: //---------------------------- Phase 1 --------------------------------- > 207: // Optimization pass over reachability fences during loop opts. > 208: // Eliminate redundant RFs and move RFs with loop-invariant referent out of the loop. You removed the `find_redundant_rfs` case. Is the comment still accurate? src/hotspot/share/opto/reachability.cpp line 219: > 217: for (int i = 0; i < C->reachability_fences_count(); i++) { > 218: ReachabilityFenceNode* rf = C->reachability_fence(i); > 219: assert(!rf->is_redundant(igvn()), "required"); Why can we assume this? Is this guaranteed by IGVN? src/hotspot/share/opto/reachability.cpp line 220: > 218: ReachabilityFenceNode* rf = C->reachability_fence(i); > 219: assert(!rf->is_redundant(igvn()), "required"); > 220: // Move RFs out of counted loops when possible. Is this limited to counted loops? 
Ah `unique_loop_exit_or_null` restricts it. That seems fine, I'm just worried that we may at some point allow non-counted loops, and then the comment will be incorrect. src/hotspot/share/opto/reachability.cpp line 228: > 226: for (IdealLoopTree* outer_loop = lpt->_parent; > 227: outer_loop->is_invariant(referent) && outer_loop->unique_loop_exit_or_null() != nullptr; > 228: outer_loop = outer_loop->_parent) { Out of curiosity: is it always desirable to move out as far as possible? Or are there downsides? src/hotspot/share/opto/reachability.cpp line 335: > 333: // Phase 2: migrate reachability info to safepoints. > 334: // All RFs are replaced with edges from corresponding referents to interfering safepoints. > 335: // Interfering safepoints are safepoint nodes which are reachable from the RF to its referent through CFG. Seems you don't do it for ALL any more, you drop those that `dominates_another_rf`. You should probably adapt the comment here. ------------- PR Review: https://git.openjdk.org/jdk/pull/25315#pullrequestreview-3447403459 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2513717114 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2513740164 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2513873982 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2513809786 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2513822750 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2513854603 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2513952780 From epeter at openjdk.org Tue Nov 11 12:07:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 12:07:23 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v24] In-Reply-To: References: Message-ID: On Mon, 10 Nov 2025 02:27:15 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. 
>> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. 
>> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Revise RF redunancy & auto-boxed primitives handling > Cleanups You have a few tests already, but I'd love to see some IR tests. You could even check for the presence of `ReachabilityFenceNode` during some phase and then see if it goes away. Nice would be if we could even track if a SafePoint has a RF edge attached, but not sure how easy that is. It would allow us not only to check for correctness, and hoping that we would catch incorrect cases with a crash/wrong result. But it would allow us to verify the graph, including the optimizations. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3516564484 From epeter at openjdk.org Tue Nov 11 12:11:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 12:11:06 GMT Subject: RFR: 8370691: Add new HalffloatVector type and enable intrinsification of float16 vector operations In-Reply-To: References: Message-ID: <_ryF0SNpSLahH4HkGqSnGKc_6d9P1fWrKYTS0jRPvtk=.ff2143aa-d3a5-4776-bdd0-95646dfd35e9@github.com> On Mon, 27 Oct 2025 15:19:48 GMT, Jatin Bhateja wrote: > Add new HalffloatVector type and corresponding concrete vector classes in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. 
> - Use existing Float16 vector IR and backend support. > - Extended the existing VectorAPI JTREG test suite for the newly added HalffloatVector operations. > > The idea here is to first be at par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). > > The following are the performance numbers for some of the selected HalffloatVector benchmarking kernels compared to equivalent Float16OperationsBenchmark kernels. > > [image of performance numbers not preserved in the archive] > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback. > > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html We already have a lot of things in the codebase now from previous issues that use `HF` everywhere, for example some node names, and the type. Should we maybe rename all of them to `F16`, or something else? Open question, not sure of the answer yet. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3516579087 From qamai at openjdk.org Tue Nov 11 13:09:33 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 11 Nov 2025 13:09:33 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply setting type leads to assert/wrong result [v10] In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 09:50:48 GMT, Emanuel Peter wrote: >> In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed some edge cases that the fuzzer now found. >> >> - The first issue: when we (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds. 
That leads to wrong results. >> - The second issue: a phi that was scalar and we turned into vector still had some dead old scalar reduction nodes attached. They would of course eventually die during IGVN. But with `StressIGVN` just picking the right bad order, it could happen that an `AddI` attached to the `phi` would try to figure out its `Value` type, and try to combine the vector type of the `phi` with the other input, leading to a type error. >> >> With only the first issue at first, I tried to improve the way we modify the type from scalar to vector. But with the second issue, it became clear that we should just create a new phi node when we move from scalar to vector phi. Hence, I split the `LoopPhi` into a `PhiScalar` and a `PhiVector`, and give them separate implementations. >> >> --------- >> >> Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. >> I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. >> >> Later, the fuzzer found the second reproducer on mainline, which was much easier to reduce. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add -XX:+UnlockDiagnosticVMOptions flag Thanks a lot for your fix ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/28113#pullrequestreview-3447988295 From galder at openjdk.org Tue Nov 11 13:48:04 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 11 Nov 2025 13:48:04 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 11:24:12 GMT, Hamlin Li wrote: > Hi, > > Can you help to review this patch? 
> > This patch enables the vectorization of statement like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop, where op_x's size is different from res_f_d_x's. > > To assist with code review, this pr contains only the shared code change, is splitted from https://github.com/openjdk/jdk/pull/28230, which enable & implement the riscv part. The similar optimization could be extended to other platforms. > > ## Some background > > Previously, it's https://github.com/openjdk/jdk/pull/25336, which was blocked by unsigned comparison issue. The issue was recently resolved by https://github.com/openjdk/jdk/pull/27942, so I'm re-start working on this optimization. > > This pr only relaxes one of the constraints in https://github.com/openjdk/jdk/pull/25336, i.e. transform CMoveF/D to vector operations no matter what's the size of comparison's operator, but remove the optimization of transform CMoveI/L to vector operations which I think need more investigation. > > # Test > ## Jtreg > > in progress... > > ## Performance > > check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. > > Thanks Sounds like this PR should include some IR tests? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28231#issuecomment-3517005263 From epeter at openjdk.org Tue Nov 11 13:52:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 13:52:20 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v24] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 12:00:35 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix whitespace > > test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 62: > >> 60: * on how much time is spent on the code from the template corresponding to the frame, >> 61: * and to give a termination criterion to avoid nesting templates too deeply. 
>> 62: * > > It now more sounds like a "TemplateScope" since we have a "TemplateFrame" per scope and not per template which the latter name somehow suggests. But just wanted to share that thought here. I suppose we could change `TemplateFrame` -> `TemplateScope`, and also `CodeFrame` -> `CodeScope`. But I think it is also ok to keep the "frame" name, which models the "scope" concepts. Sometimes, there are also multiple frames for a scope, for example when we do anchor and insert. So it's not quite a 1:1, but they are closely related. And if a scope is transparent, we sometimes don't even insert a frame. That's why I'm a bit hesitant to do the renaming. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2514305038 From epeter at openjdk.org Tue Nov 11 13:56:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 13:56:12 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 11:24:12 GMT, Hamlin Li wrote: > Hi, > > Can you help to review this patch? > > This patch enables the vectorization of statement like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop, where op_x's size is different from res_f_d_x's. > > To assist with code review, this pr contains only the shared code change, is splitted from https://github.com/openjdk/jdk/pull/28230, which enable & implement the riscv part. The similar optimization could be extended to other platforms. > > ## Some background > > Previously, it's https://github.com/openjdk/jdk/pull/25336, which was blocked by unsigned comparison issue. The issue was recently resolved by https://github.com/openjdk/jdk/pull/27942, so I'm re-start working on this optimization. > > This pr only relaxes one of the constraints in https://github.com/openjdk/jdk/pull/25336, i.e. 
transform CMoveF/D to vector operations no matter what's the size of comparison's operator, but remove the optimization of transform CMoveI/L to vector operations which I think need more investigation. > > # Test > ## Jtreg > > in progress... > > ## Performance > > check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. > > Thanks @Hamlin-Li Thanks for your continued effort on CMove! Just a first initial comment. And yes, you'll need some IR tests. Would also be nice if we could get some aarch64 or x64 implementation, so we can test it. Maybe we can collaborate on this PR to make it work together :) src/hotspot/share/opto/superword.cpp line 2339: > 2337: } else if (VectorNode::is_different_use_def_size_supported()) { > 2338: return use->is_CMove() && def->is_Bool(); > 2339: } This looks a little tangled, and harder to extend. Can we make it linear like this? Suggestion: // Input size of use equals output size of def if (type2aelembytes(use_bt) == type2aelembytes(def_bt)) { return true; } // Allow CMove to have different type for comparison and moving. if (VectorNode::is_different_use_def_size_supported() && use->is_CMove() && def->is_Bool()) { return true; } Because what if `is_different_use_def_size_supported` is true, but we don't have a CMove case, and then we would be able to go on with yet something else below later on? 
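The linear shape proposed in the suggestion above can be illustrated with a small standalone sketch. This is only an approximation under stated assumptions: `BasicType`, `NodeInfo`, `elem_bytes`, `compatible_use_def` and `check_examples` are hypothetical stand-ins invented for this example, not the real HotSpot `Node`, `type2aelembytes` or `is_velt_basic_type_compatible_use_def`, and the second condition is assumed to be meant as `is_different_use_def_size_supported() && use->is_CMove() && def->is_Bool()`.

```cpp
#include <cassert>

// Hypothetical stand-in for HotSpot's BasicType; only what the sketch needs.
enum class BasicType { Byte, Short, Int, Float, Long, Double };

// Element size in bytes, playing the role that type2aelembytes() has in the thread.
constexpr int elem_bytes(BasicType bt) {
  switch (bt) {
    case BasicType::Byte:   return 1;
    case BasicType::Short:  return 2;
    case BasicType::Int:
    case BasicType::Float:  return 4;
    case BasicType::Long:
    case BasicType::Double: return 8;
  }
  return 0; // not reached for valid enumerators
}

// Hypothetical summary of a node: its vector element type and its kind.
struct NodeInfo {
  BasicType velt_type;
  bool is_cmove;
  bool is_bool;
};

// Each rule is an independent early return, so further special cases can be
// appended later without nesting them under an earlier condition.
bool compatible_use_def(const NodeInfo& use, const NodeInfo& def,
                        bool different_use_def_size_supported) {
  // Input size of use equals output size of def.
  if (elem_bytes(use.velt_type) == elem_bytes(def.velt_type)) {
    return true;
  }
  // Allow CMove to have a different type for comparison and moving.
  if (different_use_def_size_supported && use.is_cmove && def.is_bool) {
    return true;
  }
  // Additional platform-specific rules would go here.
  return false;
}

// A few illustrative cases with made-up nodes.
void check_examples() {
  const NodeInfo cmove_d{BasicType::Double, true, false};
  const NodeInfo bool_i{BasicType::Int, false, true};
  const NodeInfo add_f{BasicType::Float, false, false};
  const NodeInfo load_i{BasicType::Int, false, false};
  assert(compatible_use_def(add_f, load_i, false));     // same element size
  assert(compatible_use_def(cmove_d, bool_i, true));    // CMove/Bool exception
  assert(!compatible_use_def(cmove_d, bool_i, false));  // platform does not opt in
  assert(!compatible_use_def(cmove_d, load_i, true));   // def is not a Bool
}
```

With this layout, a platform that opts in but hits a non-CMove use simply falls through to whatever rules follow, which seems to be the extensibility concern raised in the review comment.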
------------- PR Review: https://git.openjdk.org/jdk/pull/28231#pullrequestreview-3448170017 PR Review Comment: https://git.openjdk.org/jdk/pull/28231#discussion_r2514311100 From aph at openjdk.org Tue Nov 11 14:01:09 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 11 Nov 2025 14:01:09 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: <8EDw6V0e_j-QF8wCt_UnRYB4IZQpK2S2WtG3NbG1jHE=.01df100c-e733-4f32-84ef-f5889643d940@github.com> On Wed, 16 Jul 2025 14:41:58 GMT, Andrew Haley wrote: >> AtomicLong.CompareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg: >> >> ;; cmpxchg { >> 0x0000e708d144cf60: mov x8, x2 >> 0x0000e708d144cf64: casal x8, x3, [x0] >> 0x0000e708d144cf68: cmp x8, x2 >> ;; 0x1F1F1F1F1F1F1F1F >> 0x0000e708d144cf6c: mov x8, #0x1f1f1f1f1f1f1f1f >> ;; } cmpxchg >> 0x0000e708d144cf70: cset x8, ne // ne = any >> 0x0000e708d144cf74: dmb ish >> >> >> According to the Oracle Java Specification, AtomicLong.CompareAndSet [1] has the same memory effects as specified by VarHandle.compareAndSet which has the following effects: [2] >> >>> Atomically sets the value of a variable to the >>> newValue with the memory semantics of setVolatile if >>> the variable's current value, referred to as the witness >>> value, == the expectedValue, as accessed with the memory >>> semantics of getVolatile. >> >> >> >> Hence the release on the store due to setVolatile only occurs if the compare is successful. Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen. >> >> Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant) >> >> This is also reflected by C2 not having a dmb for the same respective method. 
>> >> [1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) >> [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...) > > I think we still need a DMB after non-LSE CMPXCHG, which gets failures without this DMB: > > > AArch64 MP > > { > 0:X0=x; 0:X2=y; > 1:X0=y; 1:X4=x; > } > P0 | P1 ; > LDAR W1,[X0] | MOV W2,#1 ; > | L0: ; > LDR W3,[X2] | LDAXR W1,[X0] ; > | STLXR W8,W2,[X0] ; > | CBNZ W8,L0; > | DMB ISH; > | MOV W3,#1 ; > | STR W3,[X4] ; > exists (0:X1=1 /\ 0:X3=0 /\ 1:X1=0) > Hi @theRealAph, I've pushed changes for this PR to a new branch [master...ruben-arm:jdk:pr-8360654](https://github.com/openjdk/jdk/compare/master...ruben-arm:jdk:pr-8360654) as Samuel is currently not available. Once he is back, he can update this PR's branch. In the meanwhile, I'm planning to run more of the `jcstress` testing. I'd appreciate your feedback on the version in the new branch. I can't add comments and suggestions because it's not a PR. If you make it a draft PR I'll comment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26000#issuecomment-3517060904 From mli at openjdk.org Tue Nov 11 14:10:47 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Nov 2025 14:10:47 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) [v2] In-Reply-To: References: Message-ID: <6OXsPbl9UiDbUtHy2sj1vdVcx1__gqJKRzPvYTGnjss=.9f044b9d-fe2b-423a-b470-e79bc94a98db@github.com> > Hi, > > Can you help to review this patch? > > This patch enables the vectorization of statement like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop, where op_x's size is different from res_f_d_x's. > > To assist with code review, this pr contains only the shared code change, is splitted from https://github.com/openjdk/jdk/pull/28230, which enable & implement the riscv part. 
The similar optimization could be extended to other platforms. > > ## Some background > > Previously, it's https://github.com/openjdk/jdk/pull/25336, which was blocked by unsigned comparison issue. The issue was recently resolved by https://github.com/openjdk/jdk/pull/27942, so I'm re-start working on this optimization. > > This pr only relaxes one of the constraints in https://github.com/openjdk/jdk/pull/25336, i.e. transform CMoveF/D to vector operations no matter what's the size of comparison's operator, but remove the optimization of transform CMoveI/L to vector operations which I think need more investigation. > > # Test > ## Jtreg > > in progress... > > ## Performance > > check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. > > Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/superword.cpp refactor `is_velt_basic_type_compatible_use_def` Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28231/files - new: https://git.openjdk.org/jdk/pull/28231/files/56b6e029..cfbe0a65 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28231&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28231&range=00-01 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28231.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28231/head:pull/28231 PR: https://git.openjdk.org/jdk/pull/28231 From mli at openjdk.org Tue Nov 11 14:10:48 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Nov 2025 14:10:48 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 13:45:44 GMT, Galder Zamarreño wrote: >> Hi, >> >> Can you help to review this patch? >> >> This patch enables the vectorization of statement like `op_1 bop op_2 ? 
res_f_d_1 : res_f_d_2` in a loop, where op_x's size is different from res_f_d_x's. >> >> To assist with code review, this pr contains only the shared code change, is splitted from https://github.com/openjdk/jdk/pull/28230, which enable & implement the riscv part. The similar optimization could be extended to other platforms. >> >> ## Some background >> >> Previously, it's https://github.com/openjdk/jdk/pull/25336, which was blocked by unsigned comparison issue. The issue was recently resolved by https://github.com/openjdk/jdk/pull/27942, so I'm re-start working on this optimization. >> >> This pr only relaxes one of the constraints in https://github.com/openjdk/jdk/pull/25336, i.e. transform CMoveF/D to vector operations no matter what's the size of comparison's operator, but remove the optimization of transform CMoveI/L to vector operations which I think need more investigation. >> >> # Test >> ## Jtreg >> >> in progress... >> >> ## Performance >> >> check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. >> >> Thanks > > Sounds like this PR should include some IR tests? @galderz @eme64 As this pr does not change any behaviour (it's splitted from https://github.com/openjdk/jdk/pull/28230, as suggested in previous review, check https://github.com/openjdk/jdk/pull/25341#issuecomment-2902440231 please), so the tests (jtreg & jmh) are put in https://github.com/openjdk/jdk/pull/28230. Or should I just close this one and use https://github.com/openjdk/jdk/pull/28230 instead? @galderz @eme64 BTW, there is an assert fix in this pr, which is also in another specific pr: https://github.com/openjdk/jdk/pull/28141. Please let me know if I should do it in this pr or not. Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28231#issuecomment-3517063624 PR Comment: https://git.openjdk.org/jdk/pull/28231#issuecomment-3517084408 From mli at openjdk.org Tue Nov 11 14:10:48 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Nov 2025 14:10:48 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 13:59:16 GMT, Hamlin Li wrote: >> Sounds like this PR should include some IR tests? > > @galderz @eme64 > > As this pr does not change any behaviour (it's splitted from https://github.com/openjdk/jdk/pull/28230, as suggested in previous review, check https://github.com/openjdk/jdk/pull/25341#issuecomment-2902440231 please), so the tests (jtreg & jmh) are put in https://github.com/openjdk/jdk/pull/28230. > > Or should I just close this one and use https://github.com/openjdk/jdk/pull/28230 instead? > @Hamlin-Li Thanks for your continued effort on CMove! > > Just a first initial comment. And yes, you'll need some IR tests. Would also be nice if we could get some aarch64 or x64 implementation, so we can test it. Maybe we can collaborate on this PR to make it work together :) Sure, I'm happy to have you involved in this one, are you interested in enabling and implementing aarch64 or x64 part? Should I close this one and use https://github.com/openjdk/jdk/pull/28230 instead? Please kindly let know! 
:) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28231#issuecomment-3517076250 From mli at openjdk.org Tue Nov 11 14:10:51 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Nov 2025 14:10:51 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) [v2] In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 13:51:01 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/superword.cpp >> >> refactor `is_velt_basic_type_compatible_use_def` >> >> Co-authored-by: Emanuel Peter > > src/hotspot/share/opto/superword.cpp line 2339: > >> 2337: } else if (VectorNode::is_different_use_def_size_supported()) { >> 2338: return use->is_CMove() && def->is_Bool(); >> 2339: } > > This looks a little tangled, and harder to extend. Can we make it linear like this? > > Suggestion: > > // Input size of use equals output size of def > if (type2aelembytes(use_bt) == type2aelembytes(def_bt)) { > return true; > } > > // Allow CMove to have different type for comparision and moving. > if (VectorNode::is_different_use_def_size_supported() && return use->is_CMove() && def->is_Bool()) { > return true; > } > > Because what if `is_different_use_def_size_supported` is true, but we don't have a CMove case, and then we would be able to go on with yet something else below later on? Sure, I just committed a change as you suggested via github. :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28231#discussion_r2514371052 From epeter at openjdk.org Tue Nov 11 14:22:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 14:22:50 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 14:04:26 GMT, Hamlin Li wrote: >> Sounds like this PR should include some IR tests? 
> > @galderz @eme64 BTW, there is an assert fix in this pr, which is also in another specific pr: https://github.com/openjdk/jdk/pull/28141. Please let me know if I should do it in this pr or not. Thanks! @Hamlin-Li We can also just go with a riscv impl for now, and then we can do x64 and aarch64 separately. Does this patch not affect the IR rules of the tests we have already in the code base? With an improvement, there is usually a chance to add IR rules to existing tests, or add new tests with new IR rules. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28231#issuecomment-3517143095 From epeter at openjdk.org Tue Nov 11 14:51:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 14:51:06 GMT Subject: RFR: 8371297: C2: assert triggerred in BoolTest::BoolTest In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 22:18:13 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > Not sure how this one slipped in (https://github.com/openjdk/jdk/pull/28047/), I think it passed my local test and github CI test at that time. > Please check https://bugs.openjdk.org/browse/JDK-8371297 for details. > > Thanks! Looks hacky but reasonable for now. Though maybe it would be nicer if we had some printing method that works directly on the mask? Or some other way of passing around the mask. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28141#pullrequestreview-3448435013 From epeter at openjdk.org Tue Nov 11 14:51:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 14:51:07 GMT Subject: RFR: 8371297: C2: assert triggerred in BoolTest::BoolTest In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 14:01:44 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Not sure how this one slipped in (https://github.com/openjdk/jdk/pull/28047/), I think it passed my local test and github CI test at that time.
>> Please check https://bugs.openjdk.org/browse/JDK-8371297 for details. >> >> Thanks! > > @eme64 Can you have a look? Thanks! :) @Hamlin-Li You have a PR title mismatch. And you should always give a quick description about what went wrong in your PR description ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28141#issuecomment-3517274431 From mli at openjdk.org Tue Nov 11 14:56:47 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Nov 2025 14:56:47 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) [v3] In-Reply-To: References: Message-ID: > Hi, > > Can you help to review this patch? > > This patch enables the vectorization of statement like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop, where op_x's size is different from res_f_d_x's. > > To assist with code review, this pr contains only the shared code change, is splitted from https://github.com/openjdk/jdk/pull/28230, which enable & implement the riscv part. The similar optimization could be extended to other platforms. > > ## Some background > > Previously, it's https://github.com/openjdk/jdk/pull/25336, which was blocked by unsigned comparison issue. The issue was recently resolved by https://github.com/openjdk/jdk/pull/27942, so I'm re-start working on this optimization. > > This pr only relaxes one of the constraints in https://github.com/openjdk/jdk/pull/25336, i.e. transform CMoveF/D to vector operations no matter what's the size of comparison's operator, but remove the optimization of transform CMoveI/L to vector operations which I think need more investigation. > > # Test > ## Jtreg > > in progress... > > ## Performance > > check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. 
> > Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28231/files - new: https://git.openjdk.org/jdk/pull/28231/files/cfbe0a65..a89d26c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28231&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28231&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28231.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28231/head:pull/28231 PR: https://git.openjdk.org/jdk/pull/28231 From epeter at openjdk.org Tue Nov 11 14:56:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 14:56:48 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) [v2] In-Reply-To: <6OXsPbl9UiDbUtHy2sj1vdVcx1__gqJKRzPvYTGnjss=.9f044b9d-fe2b-423a-b470-e79bc94a98db@github.com> References: <6OXsPbl9UiDbUtHy2sj1vdVcx1__gqJKRzPvYTGnjss=.9f044b9d-fe2b-423a-b470-e79bc94a98db@github.com> Message-ID: <7WErDE_EcCZR2diFdYk2wJV96v6hk7cp-AuWU0waxBU=.183ed534-1e8d-438c-b677-d8128b256a76@github.com> On Tue, 11 Nov 2025 14:10:47 GMT, Hamlin Li wrote: >> Hi, >> >> Can you help to review this patch? >> >> This patch enables the vectorization of statement like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop, where op_x's size is different from res_f_d_x's. >> >> To assist with code review, this pr contains only the shared code change, is splitted from https://github.com/openjdk/jdk/pull/28230, which enable & implement the riscv part. The similar optimization could be extended to other platforms. >> >> ## Some background >> >> Previously, it's https://github.com/openjdk/jdk/pull/25336, which was blocked by unsigned comparison issue. The issue was recently resolved by https://github.com/openjdk/jdk/pull/27942, so I'm re-start working on this optimization. 
>> >> This pr only relaxes one of the constraints in https://github.com/openjdk/jdk/pull/25336, i.e. transform CMoveF/D to vector operations no matter what's the size of comparison's operator, but remove the optimization of transform CMoveI/L to vector operations which I think need more investigation. >> >> # Test >> ## Jtreg >> >> in progress... >> >> ## Performance >> >> check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. >> >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/superword.cpp > > refactor `is_velt_basic_type_compatible_use_def` > > Co-authored-by: Emanuel Peter src/hotspot/cpu/aarch64/matcher_aarch64.hpp line 208: > 206: } > 207: > 208: static bool supports_vector_different_use_def_size() { This sounds extremely vague. Is this supposed to only be about `CMove`? Because we already have all sorts of instructions that allow different use and def types, such as conversion vectors. Those are already in use on `aarch64` and `x64`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28231#discussion_r2514537193 From epeter at openjdk.org Tue Nov 11 14:56:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 14:56:51 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) [v3] In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 14:07:10 GMT, Hamlin Li wrote: >> src/hotspot/share/opto/superword.cpp line 2339: >> >>> 2337: } else if (VectorNode::is_different_use_def_size_supported()) { >>> 2338: return use->is_CMove() && def->is_Bool(); >>> 2339: } >> >> This looks a little tangled, and harder to extend. Can we make it linear like this? 
>> >> Suggestion: >> >> // Input size of use equals output size of def >> if (type2aelembytes(use_bt) == type2aelembytes(def_bt)) { >> return true; >> } >> >> // Allow CMove to have different type for comparision and moving. >> if (VectorNode::is_different_use_def_size_supported() && return use->is_CMove() && def->is_Bool()) { >> return true; >> } >> >> Because what if `is_different_use_def_size_supported` is true, but we don't have a CMove case, and then we would be able to go on with yet something else below later on? > > Sure, I just committed a change as you suggested via github. :) Except that I typed in haste, and now you have a syntax error, sorry about that :/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28231#discussion_r2514529522 From epeter at openjdk.org Tue Nov 11 14:56:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 14:56:52 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) [v3] In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 14:53:15 GMT, Hamlin Li wrote: >> Hi, >> >> Can you help to review this patch? >> >> This patch enables the vectorization of statement like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop, where op_x's size is different from res_f_d_x's. >> >> To assist with code review, this pr contains only the shared code change, is splitted from https://github.com/openjdk/jdk/pull/28230, which enable & implement the riscv part. The similar optimization could be extended to other platforms. >> >> ## Some background >> >> Previously, it's https://github.com/openjdk/jdk/pull/25336, which was blocked by unsigned comparison issue. The issue was recently resolved by https://github.com/openjdk/jdk/pull/27942, so I'm re-start working on this optimization. >> >> This pr only relaxes one of the constraints in https://github.com/openjdk/jdk/pull/25336, i.e. 
transform CMoveF/D to vector operations no matter what's the size of comparison's operator, but remove the optimization of transform CMoveI/L to vector operations which I think need more investigation. >> >> # Test >> ## Jtreg >> >> in progress... >> >> ## Performance >> >> check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. >> >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix typo src/hotspot/share/opto/vectornode.cpp line 411: > 409: bool VectorNode::is_different_use_def_size_supported() { > 410: return Matcher::supports_vector_different_use_def_size(); > 411: } Is this only a forwarding? What's the point of this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28231#discussion_r2514539447

From bmaillard at openjdk.org Tue Nov 11 14:57:24 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Tue, 11 Nov 2025 14:57:24 GMT Subject: RFR: 8371558: C2: Missing optimization opportunity in AbsNode::Ideal Message-ID: <-IanFerV1zNzvyd4OrMHEHlki-rVKXxrZSSV4kFRW-Y=.2f71a1c1-d65c-4871-b11c-345a1c754600@github.com>

This PR addresses another missed optimization in `PhaseIterGVN` due to a missing notification to indirect users within `PhaseIterGVN::add_users_of_use_to_worklist`. The affected optimization is the transformation of `abs(0-x)` into `abs(x)`. This transformation is implemented in `AbsNode::Ideal`. The bug was found by the fuzzer. At some point during IGVN, we have the following setup:

    Phi  ...
      \  /
      SubI
       |
      AbsI

The `Phi` node gets folded into a `ConI`, and we call `replace_node(phi, con)`, which ends up calling `add_users_to_worklist(phi)`, and `add_users_of_use_to_worklist(phi, sub, ...)`. However the case for this specific notification was missing there, and the `AbsI` node is never notified (not added to the worklist).
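As a rough illustration of the `abs(0-x)` to `abs(x)` rewrite mentioned above, the following Java sketch checks that both forms agree on a few boundary inputs. It is not taken from the PR or from `TestMissingOptAbsZeroMinusX.java`; the class and method names are hypothetical, and it only demonstrates the source-level identity, not the IGVN notification issue itself.

```java
// Hypothetical sketch (names invented for illustration): shows that
// abs(0 - x) and abs(x) agree, which is the identity the rewrite relies on.
class AbsZeroMinusX {
    // Shape before the rewrite: a subtraction from zero feeding abs.
    static int absOfNegated(int x) {
        return Math.abs(0 - x);
    }

    // Shape after the rewrite: abs applied to x directly.
    static int absDirect(int x) {
        return Math.abs(x);
    }

    // Same pair for double, including signed zeros and NaN.
    static double absOfNegated(double x) {
        return Math.abs(0.0 - x);
    }

    static double absDirect(double x) {
        return Math.abs(x);
    }

    public static void main(String[] args) {
        int[] ints = {0, 1, -1, 42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : ints) {
            if (absOfNegated(x) != absDirect(x)) {
                throw new AssertionError("int mismatch for " + x);
            }
        }
        double[] doubles = {0.0, -0.0, 1.5, -1.5, Double.NaN,
                            Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY};
        for (double x : doubles) {
            // Double.compare treats NaN as equal to itself and
            // distinguishes +0.0 from -0.0, so it is a strict check here.
            if (Double.compare(absOfNegated(x), absDirect(x)) != 0) {
                throw new AssertionError("double mismatch for " + x);
            }
        }
        System.out.println("abs(0 - x) matched abs(x) for all sampled inputs");
    }
}
```

Note that for `int` the identity also holds at `Integer.MIN_VALUE`, since both the subtraction and `Math.abs` wrap around to the same value.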
This PR brings the following changes: - Detect the optimization pattern in `add_users_of_use_to_worklist` for `AbsI`, `AbsL`, `AbsF` and `AbsD` - Add new test `TestMissingOptAbsZeroMinusX.java`, initially obtained from the fuzzer and then heavily reduced, both with the usual tools and manually. In addition to `AbsI`, I have also added test cases for `AbsF` and `AbsD`, but was not able to reproduce for `AbsL` despite my best efforts. ### Testing - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371534) - [ ] tier1-4, plus some internal testing Thank you for reviewing! ------------- Commit messages: - Add cases for the different types and cleanup - Add reduced test - Add notification in PhaseIterGVN::add_users_of_use_to_worklist Changes: https://git.openjdk.org/jdk/pull/28237/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28237&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371558 Stats: 95 lines in 2 files changed: 95 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28237.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28237/head:pull/28237 PR: https://git.openjdk.org/jdk/pull/28237 From mli at openjdk.org Tue Nov 11 14:59:24 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Nov 2025 14:59:24 GMT Subject: RFR: 8371297: C2: assert triggered in BoolTest::BoolTest In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 14:53:37 GMT, Hamlin Li wrote: >> Looks hacky but reasonable for now. Though maybe it would be nicer if we had some printing method that works directly on the mask? Or some other way of passing around the mask. > >> Looks hacky but reasonable for now. Though maybe it would be nicer if we had some printing method that works directly on the mask? Or some other way of passing around the mask. > > Yes, you're right. I created https://bugs.openjdk.org/browse/JDK-8371396 to track it. > @Hamlin-Li You have a PR title mismatch. 
And you should always give a quick description about what went wrong in your PR description ;) Thank you for reminding and reviewing! I'll pay attention later. :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28141#issuecomment-3517297445 From mli at openjdk.org Tue Nov 11 14:59:26 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Nov 2025 14:59:26 GMT Subject: Integrated: 8371297: C2: assert triggered in BoolTest::BoolTest In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 22:18:13 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > Not sure how this one slipped in (https://github.com/openjdk/jdk/pull/28047/), I think it passed my local test and github CI test at that time. > Please check https://bugs.openjdk.org/browse/JDK-8371297 for details. > > Thanks! This pull request has now been integrated. Changeset: 405d5f7a Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/405d5f7a6892426d69409c3975d0c808304b8438 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod 8371297: C2: assert triggered in BoolTest::BoolTest Reviewed-by: dlong, luhenry, epeter ------------- PR: https://git.openjdk.org/jdk/pull/28141 From mli at openjdk.org Tue Nov 11 14:59:23 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Nov 2025 14:59:23 GMT Subject: RFR: 8371297: C2: assert triggered in BoolTest::BoolTest In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 14:47:40 GMT, Emanuel Peter wrote: > Looks hacky but reasonable for now. Though maybe it would be nicer if we had some printing method that works directly on the mask? Or some other way of passing around the mask. Yes, you're right. I created https://bugs.openjdk.org/browse/JDK-8371396 to track it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28141#issuecomment-3517293552 From mli at openjdk.org Tue Nov 11 15:05:00 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Nov 2025 15:05:00 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) [v4] In-Reply-To: References: Message-ID: > Hi, > > Can you help to review this patch? > > This patch enables the vectorization of statement like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop, where op_x's size is different from res_f_d_x's. > > To assist with code review, this pr contains only the shared code change, is splitted from https://github.com/openjdk/jdk/pull/28230, which enable & implement the riscv part. The similar optimization could be extended to other platforms. > > ## Some background > > Previously, it's https://github.com/openjdk/jdk/pull/25336, which was blocked by unsigned comparison issue. The issue was recently resolved by https://github.com/openjdk/jdk/pull/27942, so I'm re-start working on this optimization. > > This pr only relaxes one of the constraints in https://github.com/openjdk/jdk/pull/25336, i.e. transform CMoveF/D to vector operations no matter what's the size of comparison's operator, but remove the optimization of transform CMoveI/L to vector operations which I think need more investigation. > > # Test > ## Jtreg > > in progress... > > ## Performance > > check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. > > Thanks Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 32 additional commits since the last revision: - Merge branch 'master' into vectorize-CMove-Bool - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - fix typo - Update src/hotspot/share/opto/superword.cpp refactor `is_velt_basic_type_compatible_use_def` Co-authored-by: Emanuel Peter - comments - simplify - fix code path change in VectorNode::implemented - fix JDK-8371297: assert in BoolTest - revert supports_transform_cmove_to_vectorblend for all cpus - ... and 22 more: https://git.openjdk.org/jdk/compare/07be77ed...8e84017f ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28231/files - new: https://git.openjdk.org/jdk/pull/28231/files/a89d26c4..8e84017f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28231&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28231&range=02-03 Stats: 173302 lines in 1205 files changed: 118510 ins; 25624 del; 29168 mod Patch: https://git.openjdk.org/jdk/pull/28231.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28231/head:pull/28231 PR: https://git.openjdk.org/jdk/pull/28231 From epeter at openjdk.org Tue Nov 11 15:05:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 15:05:02 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) [v3] In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 14:56:47 GMT, Hamlin Li wrote: >> Hi, >> >> Can you help to review this patch? >> >> This patch enables the vectorization of statement like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop, where op_x's size is different from res_f_d_x's. >> >> To assist with code review, this pr contains only the shared code change, is splitted from https://github.com/openjdk/jdk/pull/28230, which enable & implement the riscv part. The similar optimization could be extended to other platforms. 
>> >> ## Some background >> >> Previously, it's https://github.com/openjdk/jdk/pull/25336, which was blocked by unsigned comparison issue. The issue was recently resolved by https://github.com/openjdk/jdk/pull/27942, so I'm re-start working on this optimization. >> >> This pr only relaxes one of the constraints in https://github.com/openjdk/jdk/pull/25336, i.e. transform CMoveF/D to vector operations no matter what's the size of comparison's operator, but remove the optimization of transform CMoveI/L to vector operations which I think need more investigation. >> >> # Test >> ## Jtreg >> >> in progress... >> >> ## Performance >> >> check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. >> >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix typo This seems like an empty refactor, and it's not clear what it solves. It also does not seem riscv specific. It would probably be better if you actually did this together with the patch that actually ensures vectorization for riscv, including IR tests and all. That's probably what you plan to do with https://github.com/openjdk/jdk/pull/28230, right? It is difficult to review the code here, without seeing how it all goes together. src/hotspot/share/opto/vectornode.hpp line 115: > 113: // Return true if every bit in this vector is 1, e.g. based on the comparison > 114: // result of 2 floats, set a double result. > 115: static bool is_different_use_def_size_supported(); I'm a bit confused about your description here. It sounds like this method is looking at a specific vector, and returns results based on that. But that's not what's happening here, is it? 
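For readers following the thread, the "different use and def size" case can be pictured with a loop of roughly the following shape: the comparison operands are `float` (4 bytes per element) while the selected values are `double` (8 bytes per element). This is only a hypothetical sketch with invented names, not code from the PR or its tests, and whether C2 actually produces a `CMoveD` here and vectorizes it depends on the platform and flags.

```java
// Hypothetical sketch (names invented for illustration): a select whose
// condition compares floats but whose result type is double.
class CMoveDifferentSizes {
    // For each index, pick x[i] when a[i] > b[i], otherwise y[i].
    static void select(float[] a, float[] b, double[] x, double[] y, double[] r) {
        for (int i = 0; i < r.length; i++) {
            r[i] = (a[i] > b[i]) ? x[i] : y[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1.0f, 2.0f, 3.0f, 4.0f};
        float[] b = {4.0f, 3.0f, 2.0f, 1.0f};
        double[] x = {10.0, 20.0, 30.0, 40.0};
        double[] y = {-1.0, -2.0, -3.0, -4.0};
        double[] r = new double[4];
        select(a, b, x, y, r);
        // a[i] > b[i] is false for the first two elements and true for the last two.
        System.out.println(java.util.Arrays.toString(r));
    }
}
```

In a vectorized form of such a loop, the mask produced by the 4-byte comparison would have to drive a blend over 8-byte lanes, which is presumably where the question about converting the mask between the comparison and the blend comes from.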
------------- PR Review: https://git.openjdk.org/jdk/pull/28231#pullrequestreview-3448485583 PR Review Comment: https://git.openjdk.org/jdk/pull/28231#discussion_r2514550009 From epeter at openjdk.org Tue Nov 11 15:05:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 15:05:03 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 14:04:26 GMT, Hamlin Li wrote: >> Sounds like this PR should include some IR tests? > > @galderz @eme64 BTW, there is an assert fix in this pr, which is also in another specific pr: https://github.com/openjdk/jdk/pull/28141. Please let me know if I should do it in this pr or not. Thanks! @Hamlin-Li Can you describe your general approach with https://github.com/openjdk/jdk/pull/28230? How exactly will you deal with the type size change? Will you have a conversion of the mask, between the comparison and the blend? It may be good if you describe it in a bit of detail, so that we can allow `aarch64` and `x64` specialists to look at it, and see if the basic design is platform independent enough ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28231#issuecomment-3517321804 From epeter at openjdk.org Tue Nov 11 15:08:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 15:08:04 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 14:04:26 GMT, Hamlin Li wrote: >> Sounds like this PR should include some IR tests? > > @galderz @eme64 BTW, there is an assert fix in this pr, which is also in another specific pr: https://github.com/openjdk/jdk/pull/28141. Please let me know if I should do it in this pr or not. Thanks! @Hamlin-Li At a quick glance, https://github.com/openjdk/jdk/pull/28230 also has some scalar backend implementations of CMove. 
I think you could just integrate those separately first, and only then do the vectorization. Additionally: it may be easier to first ensure that the Vector API tests work for riscv backend vector instructions. And then we can work on Auto Vectorization once the all the backend instructions are already in place and tested via the Vector API. That would be a way I usually see aarch64 and x64 engineers split up the work. Also makes it easier to get specialists for the area to review the code. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28231#issuecomment-3517342327 From mli at openjdk.org Tue Nov 11 15:13:21 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Nov 2025 15:13:21 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: References: Message-ID: <2IJ9K-jgE91bwgX6DqyKmXtslw7ehJvFEZ_DT4oFG0I=.d24ea4fb-a400-4e72-bd61-6828c742bc8b@github.com> On Tue, 11 Nov 2025 14:04:26 GMT, Hamlin Li wrote: >> Sounds like this PR should include some IR tests? > > @galderz @eme64 BTW, there is an assert fix in this pr, which is also in another specific pr: https://github.com/openjdk/jdk/pull/28141. Please let me know if I should do it in this pr or not. Thanks! > @Hamlin-Li At a quick glance, #28230 also has some scalar backend implementations of CMove. I think you could just integrate those separately first, and only then do the vectorization. > > Additionally: it may be easier to first ensure that the Vector API tests work for riscv backend vector instructions. And then we can work on Auto Vectorization once the all the backend instructions are already in place and tested via the Vector API. > > That would be a way I usually see aarch64 and x64 engineers split up the work. Also makes it easier to get specialists for the area to review the code. > > What do you think? @eme64 Thank you for the suggestion. 
I'll do some investigation on vector API related things, and get back to this one later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28231#issuecomment-3517364043 From epeter at openjdk.org Tue Nov 11 15:14:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 15:14:27 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v6] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 05:41:37 GMT, Xiaohong Gong wrote: >> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. >> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. 
Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. >> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. >> >> Here is the performance data with `-XX:UseSVE=2`:
>>
>> Benchmark                                    bits  inputs  Mode   Unit    Before       After        Gain
>> MaskQueryOperationsBenchmark.testToLongByte  128   1       thrpt  ops/ms  322151.976   1318576.736  4.09
>> MaskQueryOperationsBenchmark.testToLongByte  128   2       thrpt  ops/ms  322187.144   1315736.931  4.08
>> MaskQueryOperationsBenchmark.testToLongByte  128   3       thrpt  ops/ms  322213.330   1353272.882  4.19
>> MaskQueryOperationsBenchmark.testToLongInt   128   1       thrpt  ops/ms  1009426.292  1339834.833  1.32
>> MaskQueryOperations...
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Revert smoke test changes Looks reasonable to me now. Thanks for all the updates! I'll run some internal testing before approving :) ------------- PR Review: https://git.openjdk.org/jdk/pull/27481#pullrequestreview-3448543009 From epeter at openjdk.org Tue Nov 11 15:22:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 15:22:20 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <_gdObRdkYkS7d3fQQ6bcms709TpeM1IQtuPJtI0fcyE=.073d0496-8dfc-48a9-aee8-b64b408a6e62@github.com> Message-ID: On Mon, 10 Nov 2025 16:07:35 GMT, Fei Gao wrote: >> @fg1417 Are you still working on this?
>
> Hi @eme64, many thanks for your review. It's really comprehensive and insightful. I've given a thumbs-up to all the comments that have been resolved in this commit.
>
>> I have one concern: We now have changed the branches. There is now a long sequence of branches if we have very few iterations, so that we only go through pre and post loop. It would be interesting to see what the performance difference is between master and patch.
>
> Regarding this concern, I re-ran the microbenchmarks (now merged with the existing `VectorThroughputForIterationCount.java`), named as `bench03*_drain_memoryBound`, and collected data across different platforms, including `128-bit` and `256-bit` `AArch64` machines as well as a `512-bit` `x86` machine.
>
> To summarize, I observe a minor performance regression for small-iteration loops on the `128-bit` and `256-bit` `AArch64` platforms. For larger-iteration loops, there is either a performance improvement or no noticeable change. The performance data on the `512-bit x86` machine shows a similar trend, though the regression is more significant.
>
> **The test range of `ITERATION_COUNT` is `0-300`. For larger `ITERATION_COUNT` values, there is either a performance improvement or no noticeable change, so those results are omitted.
The following data only shows cases with regressions.**
>
> (FIXED_OFFSET)  (RANDOMIZE_OFFSETS)  (REPETITIONS)  (seed)  Mode  Cnt
> 0               TRUE                 1024           42      avgt  3
>
> `Diff = (patch - master) / master`
>
> On `128-bit aarch64` platform:
>
> Benchmark                    (ITERATION_COUNT)  Units  Diff
> bench031B_drain_memoryBound  1                  ns/op  15.15%
> bench031B_drain_memoryBound  2                  ns/op  10.89%
> bench031B_drain_memoryBound  3                  ns/op  9.27%
> bench031B_drain_memoryBound  4                  ns/op  7.39%
> bench031B_drain_memoryBound  5                  ns/op  5.86%
> bench031B_drain_memoryBound  6                  ns/op  5.31%
> bench031B_drain_memoryBound  7                  ns/op  4.39%
> bench031B_drain_memoryBound  8                  ns/op  4.27%
> bench031B_drain_memoryBound  9                  ns/op  3.60%
> bench031B_drain_memoryBound  10                 ns/op  3.11%
> bench031B_drain_memoryBound  11                 ns/op  2.97%
> bench031B_drain_memoryBound  12                 ns/op  3.19%
> bench031B_drain_memoryBound  13                 ns/op  2.90%
> bench031B_drain_memoryBound  14                 ns/op  2.68%
> bench031B_drain_memoryBound  15                 ns/op  2.37%
> bench031B_drain_memoryBound  16                 ns/op  2.44%
> bench031B_drain_memoryBound  17                 ns/op  2.11%
> bench031B_drain_memoryBound  18                 ns...

@fg1417 Thanks for benchmarking for my concern. Your plot from above probably shows exactly what I was expecting:

[image]

Seeing your results, I also lean to the side that the results are acceptable: very minor losses, but a clear win in the middle.

I'll have a look at our smaller conversations now.

FYI: I'm generally really impressed how clean the results on your plots are :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3517416813

From dbriemann at openjdk.org Tue Nov 11 15:38:37 2025
From: dbriemann at openjdk.org (David Briemann)
Date: Tue, 11 Nov 2025 15:38:37 GMT
Subject: RFR: 8371642: TestNumberOfContinuousZeros.java fails on PPC64
Message-ID:

Skips IR match rules for COUNT_LEADING_ZEROS_VL on PPC. Nodes are not implemented there.
------------- Commit messages: - 8371642: TestNumberOfContinuousZeros.java fails on PPC64 Changes: https://git.openjdk.org/jdk/pull/28239/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28239&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371642 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28239.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28239/head:pull/28239 PR: https://git.openjdk.org/jdk/pull/28239 From epeter at openjdk.org Tue Nov 11 16:02:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 16:02:10 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v3] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Fri, 7 Nov 2025 09:45:24 GMT, Fei Gao wrote: >> In C2's loop optimization, for a counted loop, if we have any of these conditions (RCE, unrolling) met, we switch to the >> `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running. For example, if a main loop is unrolled to `8x`, the main loop guard tests if the loop has less than `8` iterations and then decide which way to go. >> >> Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is going to be further multiplied. After the main loop is super-unrolled, the minimum trip guard test will be updated. Assuming one vector can operate `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test if remaining trip is less than `8 * 4 = 32`. 
>> >> To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. And, both trip guards of the main loop and the vectorized drain loop jump to the scalar post loop. >> >> The problem here is, if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed. Because the minimum trip guard test of main loop fails, the execution will jump over both the main loop and the vectorized drain loop. For example, in the above case, a loop still has `25` iterations after the pre-loop, we may run `3` rounds of the vectorized drain loop but it's impossible. It would be better if the minimum trip guard test of the main loop does not jump over the vectorized drain loop. >> >> This patch is to improve it by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change of control flow. >> >> The whole process is done by the function `insert_post_loop()`. >> >> We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to vectorized drain loop with mode `InsertVectorizedDrain`: >> >> 1. The fall-in control flow to the vectorized drain loop comes fr... > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 11 commits: > > - Fixed new test failures after rebasing and refined parts of the code to address review comments > - Merge branch 'master' into optimize-atomic-post > - Merge branch 'master' into optimize-atomic-post > - Clean up comments for consistency and add spacing for readability > - Fix some corner case failures and refined part of code > - Merge branch 'master' into optimize-atomic-post > - Refine ascii art, rename some variables and resolve conflicts > - Merge branch 'master' into optimize-atomic-post > - Add necessary ASCII art, refactor insert_post_loop() and rename > "atomic post loop" with "vectorized drain loop. > - Merge branch 'master' into optimize-atomic-post > - ... and 1 more: https://git.openjdk.org/jdk/compare/eab5644a...e21a830f A few more comments / responses. Thanks again for all the updates. Next, I'll have to go over the whole code again :) test/hotspot/jtreg/compiler/loopopts/superword/TestVectorizedDrainLoop.java line 85: > 83: } > 84: return sum; > 85: } Since recently, this now also auto vectorizes. Maybe this method should not be compiled, if it is part of verification? test/micro/org/openjdk/bench/vm/compiler/VectorThroughputForIterationCount.java line 225: > 223: for (int i = startIndex; i < startIndex + length; i++) { > 224: c[i] = a[i] + b[i]; > 225: } You could forceinline them, just for good measure. Up to you. 
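
For readers following the thread, the guard arithmetic from the quoted PR description (one vector covers `8` iterations, the super-unrolling count is `4`, and `25` iterations remain after the pre-loop) can be sketched as a small standalone Java model. This is only an illustration under simplifying assumptions: `DrainLoopGuards` and `split` are hypothetical names, not HotSpot code, and the model ignores the pre-loop, alignment, and the real iv/limit comparisons that C2 emits.

```java
public class DrainLoopGuards {
    /**
     * Splits the trip count remaining after the pre-loop into
     * {mainIterations, drainIterations, scalarPostIterations}.
     * Hypothetical model: 'patched' selects whether a failing main-loop
     * guard may still fall into the vectorized drain loop.
     */
    static int[] split(int remaining, int vectorLen, int superUnroll, boolean patched) {
        int mainStride = vectorLen * superUnroll; // e.g. 8 * 4 = 32
        int main = 0;
        int drain = 0;

        // Minimum trip guard of the super-unrolled main loop.
        boolean mainGuardPassed = remaining >= mainStride;
        if (mainGuardPassed) {
            main = (remaining / mainStride) * mainStride;
            remaining -= main;
        }

        // Without the change, a failing main guard jumps straight to the
        // scalar post loop, so the drain loop is only reachable via the main loop.
        boolean drainReachable = mainGuardPassed || patched;
        if (drainReachable && remaining >= vectorLen) {
            drain = (remaining / vectorLen) * vectorLen;
            remaining -= drain;
        }

        // Whatever is left runs in the scalar post loop.
        return new int[] { main, drain, remaining };
    }

    public static void main(String[] args) {
        for (boolean patched : new boolean[] { false, true }) {
            int[] r = split(25, 8, 4, patched);
            System.out.printf("patched=%b main=%d drain=%d scalarPost=%d%n",
                              patched, r[0], r[1], r[2]);
        }
    }
}
```

With these inputs the model reports `main=0 drain=0 scalarPost=25` without the change and `main=0 drain=24 scalarPost=1` with it, which corresponds to the "3 rounds of the vectorized drain loop" mentioned in the description.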
-------------

PR Review: https://git.openjdk.org/jdk/pull/22629#pullrequestreview-3448620070
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2514659287
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2514675792

From epeter at openjdk.org Tue Nov 11 16:02:12 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 11 Nov 2025 16:02:12 GMT
Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v2]
In-Reply-To:
References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com>
Message-ID:

On Mon, 10 Nov 2025 15:18:42 GMT, Fei Gao wrote:

>> src/hotspot/share/opto/loopnode.hpp line 1434:
>>
>>> 1432: Node* get_vectorized_drain_input(Node* main_backedge_ctrl, VectorSet& visited,
>>> 1433: Node_Stack& clones, Node* main_merge_region,
>>> 1434: Node* main_phi);
>>
>> We don't just do this for the trip-counter though, right? Because the `main_incr` suggests that a bit here. Could you rephrase to make it more accurate? Do you think that could be worth it? It is also nice to have the analogy to the trip-counter, so I like that in the example ASCII art.
>
> Yes, it applies to all values that increase as the loop iterates. I'm afraid I forgot to rename `main_incr` to a more general name after refactoring the code here. I'll update it in the next commit. How about renaming it to `main_out`?

Ah, I just had an idea: we are talking only about the `iv` (trip counter), right? You could put `iv` in the name, for example `iv_after_main` or `main_out_iv`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2514752399 From epeter at openjdk.org Tue Nov 11 16:02:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 16:02:14 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v2] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Mon, 10 Nov 2025 14:29:49 GMT, Fei Gao wrote: >> Can you quickly say what this loop does with each phi? > >> Can you quickly say what this loop does with each phi? > > For each Phi node, referred to as `main_merge_phi`, we create a corresponding `drain_merge_phi` as one of its new data uses, as shown below: > > main_merge_phi = Phi (pre_out, main_out) > drain_merge_phi = Phi (drain_out, main_merge_phi) Thanks for the explanation. You could add that as a code comment, if you did not already do that ;) >> Ah, you have exact constant results that you compare with. Could be good to state this here as a comment, so that nobody removes this in the future. You are just making sure that the interpreter would have produced the same results. >> >> Still: why not add a run without any flags? > > Added a comment in the short summary part for interpreter run. Also added a run without any flags. Ah, I see. Right, these are all fuzzer tests. I suppose it's fine. But just that you know: internally we also run many tests with combinations of `-Xbatch -XX:-TieredCompilation -Xcomp`, only C1 etc. How long does this test take to complete with all the runs? If you are doing `-Xcomp`, I would restrict compilation to the class. 
Otherwise you essentially spend most time C2 compiling during start up of the VM, and compile a lot of JDK classes, which is really not necessary ;)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2514777555
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2514670112

From epeter at openjdk.org Tue Nov 11 16:02:15 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 11 Nov 2025 16:02:15 GMT
Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v3]
In-Reply-To:
References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com>
Message-ID:

On Tue, 11 Nov 2025 15:32:04 GMT, Emanuel Peter wrote:

>> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits:
>>
>> - Fixed new test failures after rebasing and refined parts of the code to address review comments
>> - Merge branch 'master' into optimize-atomic-post
>> - Merge branch 'master' into optimize-atomic-post
>> - Clean up comments for consistency and add spacing for readability
>> - Fix some corner case failures and refined part of code
>> - Merge branch 'master' into optimize-atomic-post
>> - Refine ascii art, rename some variables and resolve conflicts
>> - Merge branch 'master' into optimize-atomic-post
>> - Add necessary ASCII art, refactor insert_post_loop() and rename
>> "atomic post loop" with "vectorized drain loop.
>> - Merge branch 'master' into optimize-atomic-post
>> - ... and 1 more: https://git.openjdk.org/jdk/compare/eab5644a...e21a830f
>
> test/micro/org/openjdk/bench/vm/compiler/VectorThroughputForIterationCount.java line 225:
>
>> 223: for (int i = startIndex; i < startIndex + length; i++) {
>> 224: c[i] = a[i] + b[i];
>> 225: }
>
> You could forceinline them, just for good measure. Up to you.

Wait, you are doing some kind of special warmup above. Why?
Do you maybe NOT want the methods to inline? Any other reason for the warmup? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2514682741 From epeter at openjdk.org Tue Nov 11 16:16:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 16:16:46 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v25] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. 
For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... 
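
The ordering problem described above can be reproduced in miniature without the Template Framework. The sketch below is a hypothetical stand-in, not framework code: the names `TokenOrderSketch`, `addName`, `hasNameNow` and `queryName` are invented for illustration. One operation is deferred as a token, one query answers immediately during "lambda execution", and one query is itself deferred as a token.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TokenOrderSketch {
    // Hypothetical stand-in for the renderer's name state.
    static final Set<String> names = new HashSet<>();

    // A token is work that is recorded now and evaluated later, in list order.
    interface Token {
        void evaluate(List<String> log);
    }

    // Deferred side effect: the name only becomes visible at token evaluation.
    static Token addName(String name) {
        return log -> {
            names.add(name);
            log.add("add " + name);
        };
    }

    // Immediate query: answered while the token list is still being built.
    static boolean hasNameNow(String name) {
        return names.contains(name);
    }

    // Deferred query: answered at token evaluation, after earlier tokens ran.
    static Token queryName(String name) {
        return log -> log.add("deferred query " + name + " -> " + names.contains(name));
    }

    static List<String> run() {
        names.clear();
        List<String> log = new ArrayList<>();

        // "Lambda execution": tokens are created, nothing is applied yet.
        Token add = addName("x");
        boolean seenImmediately = hasNameNow("x"); // "floats" above the add token
        Token query = queryName("x");
        log.add("immediate query x -> " + seenImmediately);

        // "Token evaluation": side effects and deferred queries run in order.
        for (Token token : List.of(add, query)) {
            token.evaluate(log);
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this model the immediate query reports `false` even though it appears after `addName` in program order, while the deferred query reports `true`. That is the same inconsistency the description attributes to mixing tokens with immediate-return queries.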
Emanuel Peter has updated the pull request incrementally with eight additional commits since the last revision: - fix whitespace - Update test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/e1d50609..583b6870 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=23-24 Stats: 10 lines in 2 files changed: 5 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Tue Nov 11 16:16:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 16:16:47 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v14] In-Reply-To: References: <5qx7k9BmYPEtyrpXBk3UhHZiXm2mnQ51Y30m8kdVYkg=.f1b6272c-11e1-4d5a-9a5a-7e831a260bca@github.com> Message-ID: On Wed, 5 Nov 2025 13:53:13 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request 
incrementally with two additional commits since the last revision: >> >> - for Christian >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > I think I exhausted my reviewer fuel for today and will resume tomorrow by calling `setFuel(100%)` again :-) @chhagedorn Ok, I applied all your small changes now. Had to do it 1-1 over GitHub, some of them just did not show up in the code view... thanks to the "new experience" ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3517657242 From epeter at openjdk.org Tue Nov 11 16:16:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 16:16:48 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v24] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 10:22:18 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix whitespace > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 1166: > >> 1164: >> 1165: // In this section, we will look at some subtle facts about the behavior of >> 1166: // transparent scopes around hook insertion. This inteded for expert users > > Suggestion: > > // transparent scopes around hook insertion. This is intended for expert users Somehow, this is not showing on the "new experience". Now I'll just have to integrate everything individually :rofl: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2514802137 From epeter at openjdk.org Tue Nov 11 16:21:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 16:21:00 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v26] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . 
And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... 
> )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: add missing comma from suggestion application ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/583b6870..dfc25f59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=24-25 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From psandoz at openjdk.org Tue Nov 11 16:34:04 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 11 Nov 2025 16:34:04 GMT Subject: RFR: 8370691: Add new HalffloatVector type and enable intrinsification of float16 vector operations In-Reply-To: <_ryF0SNpSLahH4HkGqSnGKc_6d9P1fWrKYTS0jRPvtk=.ff2143aa-d3a5-4776-bdd0-95646dfd35e9@github.com> References: <_ryF0SNpSLahH4HkGqSnGKc_6d9P1fWrKYTS0jRPvtk=.ff2143aa-d3a5-4776-bdd0-95646dfd35e9@github.com> Message-ID: On Tue, 11 Nov 2025 12:08:42 GMT, Emanuel Peter wrote: > We already have a lot of things in the codebase now from previous issues that use `HF` everywhere, for example some node names, and the type. Should we maybe rename all of them to `F16`, or something else? Open question, not sure of the answer yet. 
I was only referring to the Java code, esp. the new public classes so they align with the `Float16` element type. I do think it worthwhile to align so we are consistent across the platform. Revisiting the names in HotSpot, and their internal connection in Java, could be done in a separate PR? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3517758143 From epeter at openjdk.org Tue Nov 11 16:34:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Nov 2025 16:34:05 GMT Subject: RFR: 8370691: Add new HalffloatVector type and enable intrinsification of float16 vector operations In-Reply-To: References: <_ryF0SNpSLahH4HkGqSnGKc_6d9P1fWrKYTS0jRPvtk=.ff2143aa-d3a5-4776-bdd0-95646dfd35e9@github.com> Message-ID: On Tue, 11 Nov 2025 16:28:54 GMT, Paul Sandoz wrote: > Revisiting the names in HotSpot, and their internal connection in Java, could be done in a separate PR? Yes, exactly. Maybe even in a quick renaming PR before this issue. Would be quickly reviewed, and would allow us to see complete consistency going forward with this PR here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3517766354 From bmaillard at openjdk.org Tue Nov 11 16:36:16 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Tue, 11 Nov 2025 16:36:16 GMT Subject: RFR: 8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL In-Reply-To: References: Message-ID: <-XUUHhtsOIo1jmiGiGjex9u1-fLXhsTKhK-QHT0HeeE=.688ce64c-09f9-4d4a-ab35-cbb2a5f96f81@github.com> On Mon, 10 Nov 2025 15:35:29 GMT, Tobias Hartmann wrote: >> This PR addresses a missed optimization in `PhaseIterGVN` due to a missing notification to indirect users within `PhaseIterGVN::add_users_of_use_to_worklist`. >> >> The affected optimization is the transformation of `(x & mask) >> shift` into `(x >> shift) & (mask >> shift)`, where `mask` is a constant, for `URShiftL` and `URShiftI` nodes. 
This transformation is handled in `URShiftLNode::Ideal` and `URShiftINode::Ideal`. [JDK-8361700](https://bugs.openjdk.org/browse/JDK-8361700) addressed the analog case for `RShiftL` and `RShiftI`, but lacked the notification for unsigned shifting.
>>
>> This PR builds on top of [JDK-8361700](https://bugs.openjdk.org/browse/JDK-8361700) and adds the following changes:
>> - Fix the notification mechanism in `add_users_of_use_to_worklist`
>> - Add the `URShiftL` in `TestMaskAndRShiftReorder.java`
>> - Drive-by changes: simplify the `RShiftL` test case slightly, and add the missing analog case for `RShiftI`
>>
>>
>> I tried to reproduce the missing optimization for the `URShiftI` without success. There must be some subtle difference with the `long` case that causes the optimization to be triggered in this specific setup. I still added the case to the fix in `add_users_of_use_to_worklist`, as there are likely cases where the notification is missing (but I was just not able to find one).
>>
>> ### Testing
>> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371534)
>> - [x] tier1-4, plus some internal testing
>>
>> Thank you for reviewing!
>
> That looks good to me. Thanks for quickly jumping on this!

Thank you for the reviews @TobiHartmann @mhaessig!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28218#issuecomment-3517770121

From bmaillard at openjdk.org Tue Nov 11 16:36:17 2025
From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard)
Date: Tue, 11 Nov 2025 16:36:17 GMT
Subject: Integrated: 8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL
In-Reply-To:
References:
Message-ID: <1Lsh7C-q8uUURM8cBDf9Pk5ZfdZW4ODlRVSZbRqyaLs=.2973b84a-0306-40a4-a4a8-cb740216d875@github.com>

On Mon, 10 Nov 2025 15:23:25 GMT, Benoît Maillard wrote:

> This PR addresses a missed optimization in `PhaseIterGVN` due to a missing notification to indirect users within `PhaseIterGVN::add_users_of_use_to_worklist`.
> > The affected optimization is the transformation of `(x & mask) >> shift` into `(x >> shift) & (mask >> shift)`, where `mask` is a constant, for `URShiftL` and `URShiftI` nodes. This transformation is handled in `URShiftLNode::Ideal` and `URShiftINode::Ideal`. [JDK-8361700](https://bugs.openjdk.org/browse/JDK-8361700) addressed the analog case for `RShiftL` and `RShiftI`, but lacked the notification for unsigned shifting. > > This PR builds on top of [JDK-8361700](https://bugs.openjdk.org/browse/JDK-8361700) and adds the following changes: > - Fix the notification mechanism in `add_users_of_use_to_worklist` > - Add the `URShiftL` in `TestMaskAndRShiftReorder.java` > - Drive-by changes: simplify the `RShiftL` test case slightly, and add the missing analog case for `RShiftI` > > > I tried to reproduce the missing optimization for the `URShiftI` without success. There must be some subtle difference with the `long` case that causes the optimization to be triggered in this specific setup. I still added the case to the fix in `add_users_of_use_to_worklist`, as there are likely cases where the notification is missing (but I was just not able to find one). > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371534) > - [x] tier1-4, plus some internal testing > > Thank you for reviewing! This pull request has now been integrated. 
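
As a quick sanity check of the identity behind this transformation, the following standalone Java sketch compares both shapes for `long` values, using the unsigned shift `>>>` (which C2 represents as `URShiftL`) and, alongside it, the signed shift `>>`. The class and method names are made up for illustration; this is not the jtreg test from the PR, and it says nothing about when IGVN actually applies the rewrite.

```java
import java.util.Random;

public class MaskShiftReorder {
    // Mask first, then shift: the shape before the transformation.
    static long maskThenShift(long x, long mask, int shift) {
        return (x & mask) >>> shift;
    }

    // Shift first, then apply the shifted mask: the shape after the transformation.
    static long shiftThenMask(long x, long mask, int shift) {
        return (x >>> shift) & (mask >>> shift);
    }

    // Counts random inputs for which the two shapes disagree.
    static int countMismatches(long seed, int samples) {
        Random random = new Random(seed);
        int mismatches = 0;
        for (int i = 0; i < samples; i++) {
            long x = random.nextLong();
            long mask = random.nextLong();
            int shift = random.nextInt(64);
            // Unsigned right shift.
            if (maskThenShift(x, mask, shift) != shiftThenMask(x, mask, shift)) {
                mismatches++;
            }
            // Signed right shift, checked the same way.
            if (((x & mask) >> shift) != ((x >> shift) & (mask >> shift))) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(42L, 1_000_000));
    }
}
```

Both shapes agree because a right shift moves every bit position independently, so it distributes over a bitwise AND. When `mask` is a constant, as in the PR description, `mask >>> shift` can be folded at compile time.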

Changeset: f5eacbeb
Author: Benoît Maillard
URL: https://git.openjdk.org/jdk/commit/f5eacbeb5fc58c1bd844d709fe92621ce3689d78
Stats: 35 lines in 2 files changed: 27 ins; 1 del; 7 mod

8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL

Reviewed-by: thartmann, mhaessig

-------------

PR: https://git.openjdk.org/jdk/pull/28218

From mdoerr at openjdk.org Tue Nov 11 17:22:04 2025
From: mdoerr at openjdk.org (Martin Doerr)
Date: Tue, 11 Nov 2025 17:22:04 GMT
Subject: RFR: 8371642: TestNumberOfContinuousZeros.java fails on PPC64
In-Reply-To:
References:
Message-ID:

On Tue, 11 Nov 2025 15:32:03 GMT, David Briemann wrote:

> Skips IR match rules for COUNT_LEADING_ZEROS_VL on PPC. Nodes are not implemented there.

Ok. The node which is not implemented is `VectorCastL2X` which is why these loops are not vectorized on PPC64.
LGTM. Thanks for adapting the test!

-------------

Marked as reviewed by mdoerr (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/28239#pullrequestreview-3449103569

From duke at openjdk.org Tue Nov 11 22:12:16 2025
From: duke at openjdk.org (duke)
Date: Tue, 11 Nov 2025 22:12:16 GMT
Subject: Withdrawn: 8344345: test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces
In-Reply-To:
References:
Message-ID: <9tejAhPJgvO0llv2tlvWdLh4-jVeqX18CDpDrNcvZYI=.0dc52df4-4931-46ff-9b78-b8e714701243@github.com>

On Tue, 2 Sep 2025 18:22:45 GMT, Kirill Shirokov wrote:

> This PR addresses the trailing whitespaces for a .py test.
>
> They were introduced in commit 916694f2c1e7fc8d6a88e7026bc2d29ba2923849 and not detected by jcheck, since checking *.py for whitespaces is not enabled in .jcheck/conf.
>
> So, a separate question is: do you think that a pattern for Python files should be added to [checks "whitespace"] section of .jcheck/conf?

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/27058

From duke at openjdk.org Wed Nov 12 01:06:28 2025
From: duke at openjdk.org (Chad Rakoczy)
Date: Wed, 12 Nov 2025 01:06:28 GMT
Subject: RFR: 8371121: compiler/whitebox/DeoptimizeRelocatedNMethod.java fails with C1
Message-ID:

[JDK-8371121](https://bugs.openjdk.org/browse/JDK-8371121)

This update aims to improve the test's stability. A previous failure occurred because the method wasn't compiled at the time of the check. I believe this could have occurred due to a deoptimization but I have not been able to reproduce. Previously, the test ensured compilation by repeatedly invoking the function. Instead, we now use Whitebox to add the method directly to the compile queue and wait for it to finish compiling. This approach should eliminate issues caused by deoptimization from function calls.

-------------

Commit messages:
 - Make DeoptimizeRelocatedNMethod more stable

Changes: https://git.openjdk.org/jdk/pull/28246/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28246&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8371121
Stats: 15 lines in 1 file changed: 5 ins; 7 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/28246.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/28246/head:pull/28246

PR: https://git.openjdk.org/jdk/pull/28246

From liach at openjdk.org Wed Nov 12 01:07:07 2025
From: liach at openjdk.org (Chen Liang)
Date: Wed, 12 Nov 2025 01:07:07 GMT
Subject: RFR: 8369993: Redundant separate 'String' check in 'trust_final_non_static_fields' ciField.cpp
In-Reply-To:
References:
Message-ID:

On Fri, 7 Nov 2025 10:57:53 GMT, Zihao Lin wrote:

> Remove redundant separate 'String' check in 'trust_final_non_static_fields' ciField.cpp

@linzihao1999 I noted there is another more comprehensive issue for this function: https://bugs.openjdk.org/browse/JDK-8368961
There are a total of 3 redundant checks in this function that can be removed.
If you want to update this patch, feel free to update the issue for the PR, and include the cleanup for all 3 redundant checks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28191#issuecomment-3519393814 From xgong at openjdk.org Wed Nov 12 01:34:10 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 12 Nov 2025 01:34:10 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v6] In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 15:11:11 GMT, Emanuel Peter wrote: > Looks reasonable to me now. Thanks for all the updates! > > I'll run some internal testing before approving :) Sounds good! Thanks so much for your testing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3519460562 From xgong at openjdk.org Wed Nov 12 01:50:11 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 12 Nov 2025 01:50:11 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v6] In-Reply-To: References: Message-ID: On Mon, 10 Nov 2025 23:38:55 GMT, Paul Sandoz wrote: > > and similarly move vector slice operations to the compiler > > Yes, you have to slice the mask, whether it be represented as a mask/predicate register or as a vector. There's no way around that and we have to deal with the current limitations in hardware. As a further compromise we can in Java convert the mask to a vector and rearrange it, then pass the vector representation of the mask to the scatter/gather intrinsic. Then the intrinsic can if it chooses convert it back to a mask/predicate register if that is the best form. Yes, converting mask to vector will be the way to resolve. Do you think it's better that defining a private VectorMask function for the slice operation? The function could be implemented with corresponding vector slice APIs. Although this function is not friendly to SVE performance, it wins on unifying the implementation. 
> > IIUC we have agreed for non-masked subword scatter/gather to compose by parts using the intrinsic. That seems good, and it looks like we can do the same for masked subword scatter/gather, as above, but it may not be the most efficient for the platform. > > Do you have any use cases for mask subword scatter/gather? Given the lack of underlying hardware support it seems focusing on getting the non-masked version working well, and the masked version working ok is a pragmatic way forward. Currently, I do not have specific use cases for masked subword gather or scatter operations. However, I would like to ensure support for these APIs on SVE in case they become relevant for future Java workloads. However, compared to having no intrinsic support at all, using intrinsified APIs, even if not fully optimized, can still significantly improve performance, right? BTW, I agree that focusing only on the non-masked version would certainly simplify the implementation a lot. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3519515914 From fyang at openjdk.org Wed Nov 12 03:19:10 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 12 Nov 2025 03:19:10 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v22] In-Reply-To: References: Message-ID: <0CXkTkQmXYZJonbDiEVjvxgvwvRhk7cb8Wf7aBvngn8=.9a88547c-87ec-4b58-bc1a-ce04b0109439@github.com> On Mon, 10 Nov 2025 05:59:02 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify parm to unsigned as aarch64 and x86 Thanks for the update. Overall LGTM. I am running some tests with this change.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2778: > 2776: __ beqz(len, L_exit); > 2777: __ j(L_next); > 2778: Can you add code comment about what this `L_main_loop` loop does? Like: `// Encrypt the blocks of data one by one until there is less than a full block remaining.` And it's not that easy for me to find where the `L_main_loop` is. Maybe we can put the code of inner loop in a pair of braces to make it explicit. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2787: > 2785: > 2786: __ vse32_v(v16, saved_encrypted_ctr); > 2787: __ mv(used, 0); Can you move this update of `used` immediately before the `bltu` at L2794? src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2794: > 2792: be_store_counter_128(counter_hi, counter_lo, counter); > 2793: > 2794: __ bltu(len, block_size, L_encrypt_next); It would be helpful if we add some extra code comment about what this check is for. Like: `// Do we have a remaining full block?`. ------------- PR Review: https://git.openjdk.org/jdk/pull/25281#pullrequestreview-3451127070 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2516561710 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2516544908 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2516547016 From jbhateja at openjdk.org Wed Nov 12 03:53:39 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 12 Nov 2025 03:53:39 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v9] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. 
> > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 12 commits: - Moving demotion candidate marking to AD file, review comments resolutions - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 - Removing redundant interferecne check from biasing - Review comments resolutions - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 - Limiting register biasing to NDD specific demotable instructions - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 - Fix jtreg, one less spill - Updating as per reivew suggestions - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 - ... and 2 more: https://git.openjdk.org/jdk/compare/8531fa14...038eebdb ------------- Changes: https://git.openjdk.org/jdk/pull/26283/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=08 Stats: 208 lines in 12 files changed: 131 ins; 8 del; 69 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From jbhateja at openjdk.org Wed Nov 12 04:00:46 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 12 Nov 2025 04:00:46 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v10] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. > > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. 
> > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Minor cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26283/files - new: https://git.openjdk.org/jdk/pull/26283/files/038eebdb..6c359e87 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=08-09 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From jbhateja at openjdk.org Wed Nov 12 04:03:38 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 12 Nov 2025 04:03:38 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v8] In-Reply-To: <79AScufLeBvh-9BFYBHAktT8fIQlOkuIAWrB5HrfrkM=.87162308-51ee-4369-8985-302105e77622@github.com> References: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> <79AScufLeBvh-9BFYBHAktT8fIQlOkuIAWrB5HrfrkM=.87162308-51ee-4369-8985-302105e77622@github.com> Message-ID: On Wed, 29 Oct 2025 22:44:28 GMT, Dean Long wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Removing redundant interferecne check from biasing > > src/hotspot/cpu/x86/x86_64.ad line 498: > >> 496: case xorL_rReg_im1_ndd_rule: >> 497: case xorL_rReg_ndd_rule: >> 498: case xorL_rReg_rReg_mem_ndd_rule: > > Having a list that needs adjusting as new rules are added seems fragile. Is there a way to detect that a rule is missing here? Is there an alternative way of implementing this? Hi @iwanowww , @dean-long , I have moved the demotion candidate marking to the AD file. This will help in better correlation with selection patterns. 
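For readers following the code-size argument in this thread, the prefix arithmetic can be sketched as below. This is only an illustrative calculation, not code from the patch: the class and method names are hypothetical, and the only inputs are the numbers quoted in the PR description (4-byte EVEX prefix, 1-byte REX prefix, 2-byte REX2 prefix, and the 136-byte and 120-byte method sizes).

```java
// Hypothetical helper, not part of HotSpot: it only redoes the arithmetic
// quoted in the PR description.
final class NddPrefixSavings {
    // Prefix sizes in bytes, as stated in the PR description.
    static final int EVEX_PREFIX_BYTES = 4;
    static final int REX_PREFIX_BYTES = 1;
    static final int REX2_PREFIX_BYTES = 2;

    // Bytes saved when one NDD instruction is demoted to the given prefix.
    static int bytesSavedPerInstruction(int demotedPrefixBytes) {
        return EVEX_PREFIX_BYTES - demotedPrefixBytes;
    }

    // Size reduction expressed as a percentage of the baseline size.
    static double reductionPercent(int baselineBytes, int optimizedBytes) {
        return 100.0 * (baselineBytes - optimizedBytes) / baselineBytes;
    }

    public static void main(String[] args) {
        System.out.println("saved with REX:  " + bytesSavedPerInstruction(REX_PREFIX_BYTES));
        System.out.println("saved with REX2: " + bytesSavedPerInstruction(REX2_PREFIX_BYTES));
        System.out.printf("136 -> 120 bytes: %.1f%% of baseline%n", reductionPercent(136, 120));
    }
}
```

Under these assumptions a demotion saves 3 bytes (REX) or 2 bytes (REX2) per instruction. Note that the 16 bytes saved in the quoted micro are about 11.8% of the 136-byte baseline, or about 13.3% when measured against the 120-byte result, so the quoted "around 13%" appears to use the latter denominator.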
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2516655029 From jbhateja at openjdk.org Wed Nov 12 04:16:06 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 12 Nov 2025 04:16:06 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v6] In-Reply-To: References: Message-ID: On Tue, 21 Oct 2025 12:17:46 GMT, Jatin Bhateja wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 >> - Limiting register biasing to NDD specific demotable instructions >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 >> - Fix jtreg, one less spill >> - Updating as per reivew suggestions >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 >> - Some refactoring >> - 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions > > Current scheme of validation is manual:- > > 1) Revert https://github.com/openjdk/jdk/pull/27320, since SDE 9.58 does not support APX_NCI_NDD_NF flag yet. > 2) Static register allocation ordering change in x86_64.ad to always prefer EGPR R16-R31 during allocation. > 3) Register allocation biasing facilitates demotion, which happens in the assembler layer. > 4) Added debug messages in demotable assembler routines. > 5) Inspected the assembler encoding in Intel xed64 > 6) Ran the following tests with -XX:-UseSuperWord to exercise various NDD demotable instructions with Intel SDE 9.58. 
> - test/hotspot/jtreg/compiler/c2/cr6340864/TestIntVect.java > - test/hotspot/jtreg/compiler/c2/cr6340864/TestLongVect.java > > **By limiting the scope of the fix to NDD-specific instructions, we have now mitigated any unwanted performance side effects on other backends OR non-APX x86 backends.** > > We do have existing tests in place for functional correctness of NDD assembler instructions https://github.com/openjdk/jdk/blob/master/test/hotspot/gtest/x86/x86-asmtest.py > Thanks for working on this @jatin-bhateja! I think the code changes themselves look sound, but I would like a bit more information about the performance and code size improvements. I'm also running some additional testing and benchmarking, and will let you know when I have the results. > > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > Can you elaborate on how you measured this improvement? > Hi @dlunde , improvements are gauged by inspecting the JIT code size. Every NDD instruction expects a 4-byte extended EVEX prefix. By demoting its to REX/REX2 prefix, we save 2-3 bytes per instruction. For example, consider the following micro kernel, with this patch, almost every NDD instruction gets the benefit of register biasing, and thus the assembler layer demotes these REX/REX2 prefixed instructions. 
Kernel:-
--------
public static long micro(long arg1, long [] arg2, long arg3, long arg4, int ctr) {
    long t1 = arg1 + arg2[ctr] + arg3 + arg4;
    long t2 = arg1 * arg2[ctr] * arg3 * arg4;
    long t3 = arg1 ^ arg2[ctr] ^ arg3 ^ arg4;
    long t4 = arg1 | arg2[ctr] | arg3 | arg4;
    long t5 = arg1 & arg2[ctr] & arg3 & arg4;
    return t1 + t2 + t3 + t4 + t5;
}

OptoAssembly with patch:-
-----------------
028 eandq R11, RSI, R10 # long ndd
02e eimulq R9, RSI, R10 # long ndd
034 eandq R11, R11, RCX # long ndd
037 eimulq R9, R9, RCX # long ndd
03b eandq R11, R11, R8 # long ndd
03e eimulq R9, R9, R8 # long ndd
042 eaddq RBX, RSI, R10 # long ndd
048 exorq RDI, RSI, R10 # long ndd
04e eaddq RBX, RBX, RCX # long ndd
051 exorq RDI, RDI, RCX # long ndd
054 eaddq RBX, RBX, R8 # long ndd
057 exorq RDI, RDI, R8 # long ndd
05a eaddq RBX, RBX, R9 # long ndd
05d eorq RSI, RSI, R10 # long ndd
060 eaddq RDI, RDI, RBX # long ndd
063 eorq RSI, RSI, RCX # long ndd
066 eorq RSI, RSI, R8 # long ndd
069 eaddq RSI, RSI, RDI # long ndd

Disassembly of JIT code:-
---------------------------
EMR>xed64 -64 -d 4803d94833f94903d84933f84903d9
4803D94833F94903D84933F84903D9
ICLASS: ADD CATEGORY: BINARY EXTENSION: BASE IFORM: ADD_GPRv_GPRv_03 ISA_SET: I86 ATTRIBUTES: SCALABLE
SHORT: add rbx, rcx
4833F94903D84933F84903D9
ICLASS: XOR CATEGORY: LOGICAL EXTENSION: BASE IFORM: XOR_GPRv_GPRv_33 ISA_SET: I86 ATTRIBUTES: SCALABLE
SHORT: xor rdi, rcx
4903D84933F84903D9
ICLASS: ADD CATEGORY: BINARY EXTENSION: BASE IFORM: ADD_GPRv_GPRv_03 ISA_SET: I86 ATTRIBUTES: SCALABLE
SHORT: add rbx, r8
4933F84903D9
ICLASS: XOR CATEGORY: LOGICAL EXTENSION: BASE IFORM: XOR_GPRv_GPRv_33 ISA_SET: I86 ATTRIBUTES: SCALABLE
SHORT: xor rdi, r8
4903D9
ICLASS: ADD CATEGORY: BINARY EXTENSION: BASE IFORM: ADD_GPRv_GPRv_03 ISA_SET: I86 ATTRIBUTES: SCALABLE
SHORT: add rbx, r9

> > Thorough validations are underway using the latest Intel Software Development Emulator version 9.58.
> > Great, can you elaborate more on this? What types of validations?
> The current validation scheme is mostly manual: running some tests under Intel SDE, inspecting OptoAssembly, disassembling the JIT code, and adding debug messages [in the assembler](https://github.com/jatin-bhateja/external_staging/blob/main/Backup/reg_alloc_ndd_demotion_validation.diff). I have listed the validation configuration [above](https://github.com/openjdk/jdk/pull/26283#issuecomment-3426307551) > Also, here is a patch with some simple style and wording fixes: [dlunde at d2b5118](https://github.com/dlunde/jdk/commit/d2b511804c757c89c5662028ea9e4a9dff43b641). I know you just moved some of the affected code around, but we might as well fix a few style issues while we are at it. Thanks! I have modified some code, so these anomalies are taken care of. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3519857868 From duke at openjdk.org Wed Nov 12 05:04:01 2025 From: duke at openjdk.org (Zihao Lin) Date: Wed, 12 Nov 2025 05:04:01 GMT Subject: RFR: 8368961: Remove redundant checks in ciField.cpp In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 01:04:45 GMT, Chen Liang wrote: > @linzihao1999 I noted there is another more comprehensive issue for this function: https://bugs.openjdk.org/browse/JDK-8368961 > > There are a total of 3 redundant checks in this function that can be removed. If you want to update this patch, feel free to update the issue for the PR, and include the cleanup for all 3 redundant checks. Sure, I will update this change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28191#issuecomment-3519977435 From duke at openjdk.org Wed Nov 12 05:12:34 2025 From: duke at openjdk.org (Zihao Lin) Date: Wed, 12 Nov 2025 05:12:34 GMT Subject: RFR: 8368961: Remove redundant checks in ciField.cpp [v2] In-Reply-To: References: Message-ID: > Remove redundant check in 'trust_final_non_static_fields' ciField.cpp > > Remove: > 1. java_lang_System check > 2. is_box_klass check > 3.
java_lang_String check Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: remove ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28191/files - new: https://git.openjdk.org/jdk/pull/28191/files/215e71a2..36f8bce1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28191&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28191&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28191.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28191/head:pull/28191 PR: https://git.openjdk.org/jdk/pull/28191 From duke at openjdk.org Wed Nov 12 05:19:56 2025 From: duke at openjdk.org (Zihao Lin) Date: Wed, 12 Nov 2025 05:19:56 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v2] In-Reply-To: References: Message-ID: > If nodes both are constant, support constant folding. Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Andrew Haley ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28097/files - new: https://git.openjdk.org/jdk/pull/28097/files/f2fae20c..924c1555 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28097.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28097/head:pull/28097 PR: https://git.openjdk.org/jdk/pull/28097 From duke at openjdk.org Wed Nov 12 05:19:58 2025 From: duke at openjdk.org (Zihao Lin) Date: Wed, 12 Nov 2025 05:19:58 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v2] In-Reply-To: References: Message-ID: On Mon, 10 Nov 2025 10:55:23 GMT, Andrew Haley wrote: >> Zihao Lin has updated the pull request incrementally with one additional commit since the last 
revision: >> >> Update src/hotspot/share/opto/mulnode.cpp >> >> Co-authored-by: Andrew Haley > > src/hotspot/share/opto/mulnode.cpp line 622: > >> 620: const TypeLong *longType1 = t1->is_long(); >> 621: const TypeLong *longType2 = t2->is_long(); >> 622: if(longType1 && longType2 && longType1->is_con() && longType2->is_con()){ > > Suggestion: > > if(longType1 != nullptr && longType2 != nullptr && longType1->is_con() && longType2->is_con()){ > > I know, it seems a bit fussy, but that's the way we do it. Got it, thank you ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28097#discussion_r2516830796 From wenanjian at openjdk.org Wed Nov 12 06:04:45 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 12 Nov 2025 06:04:45 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v23] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. 
Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: add more comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/09a31b7d..051ce4e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=21-22 Stats: 10 lines in 1 file changed: 7 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From wenanjian at openjdk.org Wed Nov 12 06:04:47 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 12 Nov 2025 06:04:47 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v22] In-Reply-To: <0CXkTkQmXYZJonbDiEVjvxgvwvRhk7cb8Wf7aBvngn8=.9a88547c-87ec-4b58-bc1a-ce04b0109439@github.com> References: <0CXkTkQmXYZJonbDiEVjvxgvwvRhk7cb8Wf7aBvngn8=.9a88547c-87ec-4b58-bc1a-ce04b0109439@github.com> Message-ID: <14odeM9JBG0jP3p0t9zoz1IP5jhbA1DqSFe6EQtnqOU=.d54d6819-c20e-4cb7-805a-88128a126ed5@github.com> On Wed, 12 Nov 2025 03:00:53 GMT, Fei Yang wrote: >> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> modify parm to unsigned as aarch64 and x86 > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2787: > >> 2785: >> 2786: __ vse32_v(v16, saved_encrypted_ctr); >> 2787: __ mv(used, 0); > > Can you move this update of `used` immediately before the `bltu` at L2794? sure, good idea, I have changed it. 
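As background for the review comments on the counter-mode stub, the control flow being discussed (a main loop that handles one full block at a time, followed by a partial tail that consumes only some bytes of the encrypted counter) can be modelled in plain Java. This is a rough sketch under stated assumptions, not the RISC-V stub: `CtrSketch` and its methods are hypothetical names, the variable names `len` and `used` merely echo the review, and unlike the intrinsic this model does not carry `used` or the saved encrypted counter across calls.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical single-shot model of AES-CTR built from raw AES block encryption.
final class CtrSketch {
    static final int BLOCK_SIZE = 16;

    // Treat the 16-byte counter as one big-endian integer and add 1 with carry.
    static void incrementCounter(byte[] counter) {
        for (int i = counter.length - 1; i >= 0; i--) {
            if (++counter[i] != 0) {
                return; // no carry into the next byte
            }
        }
    }

    static byte[] ctrCrypt(byte[] key, byte[] initialCounter, byte[] input) {
        try {
            Cipher ecb = Cipher.getInstance("AES/ECB/NoPadding");
            ecb.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
            byte[] counter = initialCounter.clone();
            byte[] output = new byte[input.length];
            int offset = 0;
            int len = input.length;

            // Main loop: encrypt the data one block at a time
            // until less than a full block remains.
            while (len >= BLOCK_SIZE) {
                byte[] encryptedCounter = ecb.doFinal(counter);
                incrementCounter(counter);
                for (int i = 0; i < BLOCK_SIZE; i++) {
                    output[offset + i] = (byte) (input[offset + i] ^ encryptedCounter[i]);
                }
                offset += BLOCK_SIZE;
                len -= BLOCK_SIZE;
            }

            // Tail: XOR the remaining bytes with the start of one more
            // encrypted counter; 'used' counts the keystream bytes consumed.
            if (len > 0) {
                byte[] encryptedCounter = ecb.doFinal(counter);
                incrementCounter(counter);
                for (int used = 0; used < len; used++) {
                    output[offset + used] = (byte) (input[offset + used] ^ encryptedCounter[used]);
                }
            }
            return output;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Cross-check against the JDK's own AES/CTR/NoPadding transformation.
    static boolean matchesJdkCtr(byte[] key, byte[] initialCounter, byte[] input) {
        try {
            Cipher ctr = Cipher.getInstance("AES/CTR/NoPadding");
            ctr.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                     new IvParameterSpec(initialCounter));
            return Arrays.equals(ctr.doFinal(input), ctrCrypt(key, initialCounter, input));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because CTR mode only XORs data with a keystream, calling `ctrCrypt` twice with the same key and counter returns the original input, which makes the sketch easy to sanity-check alongside the comparison in `matchesJdkCtr`.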
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2516976692 From chagedorn at openjdk.org Wed Nov 12 06:39:05 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 12 Nov 2025 06:39:05 GMT Subject: RFR: 8371558: C2: Missing optimization opportunity in AbsNode::Ideal In-Reply-To: <-IanFerV1zNzvyd4OrMHEHlki-rVKXxrZSSV4kFRW-Y=.2f71a1c1-d65c-4871-b11c-345a1c754600@github.com> References: <-IanFerV1zNzvyd4OrMHEHlki-rVKXxrZSSV4kFRW-Y=.2f71a1c1-d65c-4871-b11c-345a1c754600@github.com> Message-ID: <8yK-mgs2IYDhJkkaZpka-5fiZNvF0YgbdRA0mCzxH0Y=.a7759179-6585-4739-9fd1-03ad92b4633c@github.com> On Tue, 11 Nov 2025 14:42:43 GMT, Benoît Maillard wrote: > This PR addresses another missed optimization in `PhaseIterGVN` due to a missing notification to indirect users within `PhaseIterGVN::add_users_of_use_to_worklist`. > > The affected optimization is the transformation of `abs(0-x)` into `abs(x)`. This transformation is implemented in `AbsNode::Ideal`. > > The bug was found by the fuzzer. At some point during IGVN, we have the following setup: > >
>   Phi  ...
>     \  /
>     SubI
>      |
>     AbsI
> > > The `Phi` node gets folded into a `ConI`, and we call `replace_node(phi, zero)`, which ends up calling `add_users_to_worklist(phi)`, and `add_users_of_use_to_worklist(phi, zero, ...)`. However the case for this specific notification was missing there, and the `AbsI` node is never notified (not added to the worklist). > > This PR brings the following changes: > - Detect the optimization pattern in `add_users_of_use_to_worklist` for `AbsI`, `AbsL`, `AbsF` and `AbsD` > - Add new test `TestMissingOptAbsZeroMinusX.java`, initially obtained from the fuzzer and then heavily reduced, both with the usual tools and manually. In addition to `AbsI`, I have also added test cases for `AbsF` and `AbsD`, but was not able to reproduce for `AbsL` despite my best efforts.
> > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371534) > - [ ] tier1-4, plus some internal testing > > Thank you for reviewing! Looks good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28237#pullrequestreview-3451836082 From epeter at openjdk.org Wed Nov 12 06:45:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Nov 2025 06:45:06 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v6] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 05:41:37 GMT, Xiaohong Gong wrote: >> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. >> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. 
These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. >> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. >> >> Here is the performance data with `-XX:UseSVE=2`: >> >> Benchmark bits inputs Mode Unit Before After Gain >> MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 >> MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 >> MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 >> MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 >> MaskQueryOperations... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Revert smoke test changes Internal tests pass (just sanity testing, did not run it on SVE). Code looks reasonable. @XiaohongGong Thanks for all the updates and bearing with all the review comments
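The IR identity quoted in this thread, `VectorMaskToLong (VectorLongToMask l) => l`, can be illustrated with a scalar model. The sketch below is an assumption-laden stand-in, not the Vector API: `MaskLongModel` is a hypothetical class that represents a mask as a `boolean[]` with at most 64 lanes, where lane `i` corresponds to bit `i` of the long.

```java
// Hypothetical scalar model of a lane mask; it does not use jdk.incubator.vector.
final class MaskLongModel {
    // Lane i of the mask is set when bit i of 'bits' is set (lanes <= 64).
    static boolean[] fromLong(long bits, int lanes) {
        boolean[] mask = new boolean[lanes];
        for (int i = 0; i < lanes; i++) {
            mask[i] = ((bits >>> i) & 1L) != 0;
        }
        return mask;
    }

    // Pack the lanes back into the low bits of a long.
    static long toLong(boolean[] mask) {
        long bits = 0L;
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        long input = 0b1011L;
        // The round trip returns the low 'lanes' bits of the input unchanged.
        System.out.println(toLong(fromLong(input, 4)) == input);
    }
}
```

In this model the round trip reproduces the low `lanes` bits of the input, which is why a test that only checks `toLong(fromLong(l))` can be folded away by the compiler without either conversion being exercised. A test that wants to cover both operations presumably has to observe the mask lanes in between, or test each direction separately.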
------------- PR Review: https://git.openjdk.org/jdk/pull/27481#pullrequestreview-3451855569 From thartmann at openjdk.org Wed Nov 12 06:49:02 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 12 Nov 2025 06:49:02 GMT Subject: RFR: 8371558: C2: Missing optimization opportunity in AbsNode::Ideal In-Reply-To: <-IanFerV1zNzvyd4OrMHEHlki-rVKXxrZSSV4kFRW-Y=.2f71a1c1-d65c-4871-b11c-345a1c754600@github.com> References: <-IanFerV1zNzvyd4OrMHEHlki-rVKXxrZSSV4kFRW-Y=.2f71a1c1-d65c-4871-b11c-345a1c754600@github.com> Message-ID: On Tue, 11 Nov 2025 14:42:43 GMT, Benoît Maillard wrote: > This PR addresses another missed optimization in `PhaseIterGVN` due to a missing notification to indirect users within `PhaseIterGVN::add_users_of_use_to_worklist`. > > The affected optimization is the transformation of `abs(0-x)` into `abs(x)`. This transformation is implemented in `AbsNode::Ideal`. > > The bug was found by the fuzzer. At some point during IGVN, we have the following setup: > >
>   Phi  ...
>     \  /
>     SubI
>      |
>     AbsI
> > > The `Phi` node gets folded into a `ConI`, and we call `replace_node(phi, zero)`, which ends up calling `add_users_to_worklist(phi)`, and `add_users_of_use_to_worklist(phi, zero, ...)`. However the case for this specific notification was missing there, and the `AbsI` node is never notified (not added to the worklist). > > This PR brings the following changes: > - Detect the optimization pattern in `add_users_of_use_to_worklist` for `AbsI`, `AbsL`, `AbsF` and `AbsD` > - Add new test `TestMissingOptAbsZeroMinusX.java`, initially obtained from the fuzzer and then heavily reduced, both with the usual tools and manually. In addition to `AbsI`, I have also added test cases for `AbsF` and `AbsD`, but was not able to reproduce for `AbsL` despite my best efforts. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371534) > - [ ] tier1-4, plus some internal testing > > Thank you for reviewing!
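The rewrite discussed in the quoted description, `abs(0-x)` into `abs(x)`, can be checked at the Java level for integral types. The following is a small sketch with a hypothetical class name; it covers only `int` and `long` (the PR also handles `AbsF` and `AbsD`, which are not modelled here) and says nothing about how C2 implements the transformation.

```java
// Hypothetical check that abs(0 - x) and abs(x) agree for int and long,
// including the overflow corner cases at MIN_VALUE.
final class AbsOfNegation {
    static boolean agreesForInt(int x) {
        // 0 - Integer.MIN_VALUE overflows back to MIN_VALUE, and
        // Math.abs(Integer.MIN_VALUE) is MIN_VALUE, so both sides still match.
        return Math.abs(0 - x) == Math.abs(x);
    }

    static boolean agreesForLong(long x) {
        return Math.abs(0L - x) == Math.abs(x);
    }

    public static void main(String[] args) {
        int[] intSamples = {0, 1, -1, 42, -42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : intSamples) {
            System.out.println(x + " -> " + agreesForInt(x));
        }
        long[] longSamples = {0L, 7L, -7L, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long x : longSamples) {
            System.out.println(x + " -> " + agreesForLong(x));
        }
    }
}
```

The interesting inputs are the `MIN_VALUE` cases: two's-complement negation leaves them unchanged, and `Math.abs` returns them unchanged as well, so the equivalence holds without any special casing.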
Looks good to me otherwise. test/hotspot/jtreg/compiler/c2/TestMissingOptAbsZeroMinusX.java line 30: > 28: * This test ensures that updates to the Sub node's inputs propagate as > 29: * expected and that the optimization is not missed. > 30: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:-TieredCompilation -Xbatch -Xcomp Suggestion: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:-TieredCompilation -Xcomp `-Xcomp` implies `-Xbatch` ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28237#pullrequestreview-3451866143 PR Review Comment: https://git.openjdk.org/jdk/pull/28237#discussion_r2517098082 From epeter at openjdk.org Wed Nov 12 06:56:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Nov 2025 06:56:08 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v6] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 05:41:37 GMT, Xiaohong Gong wrote: >> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support naive predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. >> >> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. >> >> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. >> >> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently.
By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. >> >> It also modifies the Vector API jtreg tests for well testing. Here is the details: >> >> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are tested actually. These two APIs are not well tested before. Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: >> >> VectorMaskToLong (VectorLongToMask l) => l >> >> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. >> >> 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". >> >> Performance shows significant improvement on NVIDIA's Grace CPU. >> >> Here is the performance data with `-XX:UseSVE=2`: >> >> Benchmark bits inputs Mode Unit Before After Gain >> MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 >> MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 >> MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 >> MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 >> MaskQueryOperations... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Revert smoke test changes Marked as reviewed by epeter (Reviewer). 
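The round trip behind the IR identity `VectorMaskToLong (VectorLongToMask l) => l` mentioned above can be modelled in plain Java. The sketch below is an assumption-laden model, not the `jdk.incubator.vector` API: masks are represented as `boolean[]`, the class and method names are hypothetical, and the identity only holds for inputs whose set bits fit within the lane count.

```java
// Hypothetical plain-Java model; does not use jdk.incubator.vector.
public class MaskLongRoundTripSketch {
    // Model of fromLong: lane i is set when bit i of `bits` is set.
    // Bits at or above laneCount are dropped (laneCount is assumed <= 64).
    static boolean[] fromLong(int laneCount, long bits) {
        boolean[] lanes = new boolean[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = ((bits >>> i) & 1L) != 0;
        }
        return lanes;
    }

    // Model of toLong: bit i of the result is set when lane i is set.
    static long toLong(boolean[] lanes) {
        long bits = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        long input = 0xA5A5L; // fits in 16 lanes
        long roundTrip = toLong(fromLong(16, input));
        if (roundTrip != input) {
            throw new AssertionError("round trip changed the value");
        }
        System.out.println("round trip preserved the mask bits");
    }
}
```

Because the composition is an identity on in-range inputs, a test that only calls `toLong` on the result of `fromLong` gives the compiler room to fold both operations away, which matches the testing gap described in the PR text.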
------------- PR Review: https://git.openjdk.org/jdk/pull/27481#pullrequestreview-3451885644 From chagedorn at openjdk.org Wed Nov 12 06:56:09 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 12 Nov 2025 06:56:09 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply setting type leads to assert/wrong result [v10] In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 09:50:48 GMT, Emanuel Peter wrote: >> In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed some edge cases that the fuzzer now found. >> >> - The first issue: when we (uselessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds. That leads to wrong results. >> - The second issue: a phi that was scalar and we turned into vector still had some dead old scalar reduction nodes attached. They would of course eventually die during IGVN. But with `StressIGVN` just picking the right bad order, it could happen that an `AddI` attached to the `phi` would try to figure out its `Value` type, and try to combine the vector type of the `phi` with the other input, leading to a type error. >> >> With only the first issue at first, I tried to improve the way we modify the type from scalar to vector. But with the second issue, it became clear that we should just create a new phi node when we move from scalar to vector phi. Hence, I split the `LoopPhi` into a `PhiScalar` and a `PhiVector`, and give them separate implementations. >> >> --------- >> >> Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. 
>> I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. >> >> Later, the fuzzer found the second reproducer on mainline, which was much easier to reduce. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add -XX:+UnlockDiagnosticVMOptions flag Update looks good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28113#pullrequestreview-3451884974 From epeter at openjdk.org Wed Nov 12 06:59:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Nov 2025 06:59:05 GMT Subject: RFR: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply setting type leads to assert/wrong result [v2] In-Reply-To: References: Message-ID: <9RCkvF9P3cxDK0jWcWW-cZvikVEkmW1i450VvEOh_mM=.1d47caab-acef-48f0-821f-2e8d8f9a765d@github.com> On Sat, 8 Nov 2025 15:59:50 GMT, Quan Anh Mai wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add diagnostic flag for product build > > That may be more preferable. Or we can track the type in `VTransformLoopPhiNode` and change it when we decide to do the transformation, at the same time as other nodes in the loop? I see that `VTransformLoopPhiNode::apply` returns a `make_scalar`, which seems confusing if it can be a vector, too. Or we can have `VTransformScalarLoopPhi` and `VTransformVectorLoopPhi` as separate classes, but it seems like it will result in some unnecessary duplication. > > These are just suggestions, and my expertise in the superword vectorizer is definitely lacking, please make the decision that you think is best. @merykitty @chhagedorn Thanks for the reviews and suggestions! 
@rwestrel Thanks again for the time spent on the initial reproducer! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28113#issuecomment-3520331722 From epeter at openjdk.org Wed Nov 12 07:13:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Nov 2025 07:13:23 GMT Subject: Integrated: 8371065: C2 SuperWord: VTransformLoopPhiNode::apply setting type leads to assert/wrong result In-Reply-To: References: Message-ID: <9aG3DLmxydK5yXzavHHqwY2hM7twETKTMM83dy9UUGw=.2a915ba2-d6a6-4be2-8ed4-9428444eab37@github.com> On Mon, 3 Nov 2025 15:20:37 GMT, Emanuel Peter wrote: > In `VTransformLoopPhiNode::apply`, we may have to modify the type of the phi node, because it may have been turned from a scalar phi to a vector phi by `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`. This logic was refactored in https://github.com/openjdk/jdk/pull/27704, and I missed some edge cases that the fuzzer now found. > > - The first issue: when we (uslessly) set the type of phis that stay scalar: the `in1` type can be a constant, and then we set the `phi` type to be constant. And then the phi wrongly constant folds. That leads to wrong results. > - The second issue: a phi that was scalar and we turned into vector still had some dead old scalar reduction nodes attached. They would of course eventually die during IGVN. But with `StressIGVN` just picking the right bad order, it could happen that an `AddI` attached to the `phi` would try to figure out its `Value` type, and try to combine the vector type of the `phi` with the other input, leading to a type error. > > With only the first issue at first, I tried to improve the way we modify the type from scalar to vector. But with the second issue, it became clear that we should just create a new phi node when we move from scalar to vector phi. Hence, I split the `LoopPhi` into a `PhiScalar` and a `PhiVector`, and give them separate implementations. 
> > --------- > > Thanks @rwestrel for filing this issue and spending a lot of time reproducing it without his changes. > I tried to find a simpler reproducer, but it was difficult: We need a constant on the lhs of the phi in the main-loop. But this requires us to constant-fold the pre-loop phi, and somehow magically not constant fold the phi of the main-loop. That is quite tricky, and I gave up. > > Later, the fuzzer found the second reproducer on mainline, which was much easier to reduce. This pull request has now been integrated. Changeset: 6df78c45 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/6df78c4585fc5a71ceafa6f4b1dc0fe68db2657c Stats: 290 lines in 4 files changed: 257 ins; 7 del; 26 mod 8371065: C2 SuperWord: VTransformLoopPhiNode::apply setting type leads to assert/wrong result Co-authored-by: Roland Westrelin Reviewed-by: qamai, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/28113 From xgong at openjdk.org Wed Nov 12 07:41:12 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 12 Nov 2025 07:41:12 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v6] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 06:42:15 GMT, Emanuel Peter wrote: > Internal tests pass (just sanity testing, did not run it on SVE). Code looks reasonable. > > @XiaohongGong Thanks for all the updates and bearing with all the review comments ? Thanks for all your comments and testing. I also tested it with kinds of SVE environments locally. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3520467888 From chagedorn at openjdk.org Wed Nov 12 07:43:05 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 12 Nov 2025 07:43:05 GMT Subject: RFR: 8349835: C2: simplify IGV property printing [v2] In-Reply-To: <9e1r4VDSzP6VL3GMf8JQSDUcvwzjzy5XGKOFURXpGhk=.ce419221-79b3-44f6-b944-093b2d244f10@github.com> References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> <5qtZxVebyVn6WML3Q4508dXPwxkw-CWhD_pE6UaNfF8=.76830409-b57d-410f-a30b-c7d01b62df7f@github.com> <9e1r4VDSzP6VL3GMf8JQSDUcvwzjzy5XGKOFURXpGhk=.ce419221-79b3-44f6-b944-093b2d244f10@github.com> Message-ID: On Mon, 27 Oct 2025 22:07:21 GMT, Saranya Natarajan wrote: >> src/hotspot/share/opto/idealGraphPrinter.hpp line 172: >> >>> 170: }; >>> 171: >>> 172: class PrintProperties >> >> Do you really need it in the header file? You could also just move it to the source file directly where we use the class. > > My reasoning is to keep the interface and implementation separate. I have kept it this way. Will that be okay? I'm not sure I understand the benefit of having it separately when the only user is in the source file and it's tightly coupled to the implementation of the `IdealGraphPrinter` class. This will expose it to other files while it's not needed. Or is it just for readability? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2517247669 From bmaillard at openjdk.org Wed Nov 12 07:49:42 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 12 Nov 2025 07:49:42 GMT Subject: RFR: 8371558: C2: Missing optimization opportunity in AbsNode::Ideal [v2] In-Reply-To: <-IanFerV1zNzvyd4OrMHEHlki-rVKXxrZSSV4kFRW-Y=.2f71a1c1-d65c-4871-b11c-345a1c754600@github.com> References: <-IanFerV1zNzvyd4OrMHEHlki-rVKXxrZSSV4kFRW-Y=.2f71a1c1-d65c-4871-b11c-345a1c754600@github.com> Message-ID: > This PR addresses another missed optimization in `PhaseIterGVN` due to a missing notification to indirect users within `PhaseIterGVN::add_users_of_use_to_worklist`. > > The affected optimization is the transformation of `abs(0-x)` into `abs(x)`. This transformation is implemented in `AbsNode::Ideal`. > > The bug was found by the fuzzer. At some point during IGVN, we have the following setup: > > > Phi ... > \ / > SubI > | > AbsI > > > The `Phi` node gets folded into a `ConI`, and we call `replace_node(phi, zero)`, which ends up calling `add_users_to_worklist(phi)`, and `add_users_of_use_to_worklist(phi, zero, ...)`. However the case for this specific notification was missing there, and the `AbsI` node is never notified (not added to the worklist). > > This PR brings the following changes: > - Detect the optimization pattern in `add_users_of_use_to_worklist` for `AbsI`, `AbsL`, `AbsF` and `AbsD` > - Add new test `TestMissingOptAbsZeroMinusX.java`, initially obtained from the fuzzer and then heavily reduced, both with the usual tools and manually. In addition to `AbsI`, I have also added test cases for `AbsF` and `AbsD`, but was not able to reproduce for `AbsL` despite my best efforts. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371534) > - [ ] tier1-4, plus some internal testing > > Thank you for reviewing! 
Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/c2/TestMissingOptAbsZeroMinusX.java Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28237/files - new: https://git.openjdk.org/jdk/pull/28237/files/9df620da..83077987 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28237&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28237&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28237.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28237/head:pull/28237 PR: https://git.openjdk.org/jdk/pull/28237 From bmaillard at openjdk.org Wed Nov 12 07:49:44 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 12 Nov 2025 07:49:44 GMT Subject: RFR: 8371558: C2: Missing optimization opportunity in AbsNode::Ideal [v2] In-Reply-To: References: <-IanFerV1zNzvyd4OrMHEHlki-rVKXxrZSSV4kFRW-Y=.2f71a1c1-d65c-4871-b11c-345a1c754600@github.com> Message-ID: <6asLfXY0gXNY4q9JJ1mpBbUSj2qcf0YqpMQiGrjqbsQ=.e7909125-e890-4543-839d-788c96e3fc0d@github.com> On Wed, 12 Nov 2025 06:45:45 GMT, Tobias Hartmann wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/c2/TestMissingOptAbsZeroMinusX.java >> >> Co-authored-by: Tobias Hartmann > > test/hotspot/jtreg/compiler/c2/TestMissingOptAbsZeroMinusX.java line 30: > >> 28: * This test ensures that updates to the Sub node's inputs propagate as >> 29: * expected and that the optimization is not missed. >> 30: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:-TieredCompilation -Xbatch -Xcomp > > Suggestion: > > * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:-TieredCompilation -Xcomp > > > `-Xcomp` implies `-Xbatch` I always forget about this one, thanks for pointing it out. 
Updated my notes as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28237#discussion_r2517260055 From aseoane at openjdk.org Wed Nov 12 07:59:04 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 12 Nov 2025 07:59:04 GMT Subject: RFR: 8368961: Remove redundant checks in ciField.cpp [v2] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 05:12:34 GMT, Zihao Lin wrote: >> Remove redundant check in 'trust_final_non_static_fields' ciField.cpp >> >> Remove: >> 1. java_lang_System check >> 2. is_box_klass check >> 3. java_lang_String check > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > remove I just saw this (I had assigned myself [JDK-8368961](https://bugs.openjdk.org/browse/JDK-8368961), but hadn't started on it yet). It appears that there were two "similar" issues... I'll unassign myself and run some testing on this final version of the PR, although it looks trivial and correct. I'll come back with results! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28191#issuecomment-3520526595 From jbhateja at openjdk.org Wed Nov 12 08:03:04 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 12 Nov 2025 08:03:04 GMT Subject: RFR: 8370691: Add new HalffloatVector type and enable intrinsification of float16 vector operations In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 00:59:25 GMT, Joe Darcy wrote: > > Some quick comments. > > We should be consistent in the naming, and rename `Halfloat*` to `Float16*`. > > I concur, especially since there are multiple 16-bit floating-point formats in use including the IEEE 754 float16 as well as bfloat16. There are nomenclature issues that I am facing. 
Currently, all the Float16 concrete classes use the Halffloat prefix i.e., Halffloat64Vector, Halffloat128Vector; converting these to Float16 looks a little confusing, i.e., Float1664Vector, Float16128Vector, etc Kindly suggest a better name to represent these classes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3520534564 From jbhateja at openjdk.org Wed Nov 12 08:03:02 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 12 Nov 2025 08:03:02 GMT Subject: RFR: 8370691: Add new HalffloatVector type and enable intrinsification of float16 vector operations In-Reply-To: References: <_ryF0SNpSLahH4HkGqSnGKc_6d9P1fWrKYTS0jRPvtk=.ff2143aa-d3a5-4776-bdd0-95646dfd35e9@github.com> Message-ID: On Tue, 11 Nov 2025 16:28:54 GMT, Paul Sandoz wrote: >> We already have a lot of things in the codebase now from previous issues that use `HF` everywhere, for example some node names, and the type. Should we maybe rename all of them to `F16`, or something else? Open question, not sure of the answer yet. > >> We already have a lot of things in the codebase now from previous issues that use `HF` everywhere, for example some node names, and the type. Should we maybe rename all of them to `F16`, or something else? Open question, not sure of the answer yet. > > I was only referring to the Java code, esp. the new public classes so they align with the `Float16` element type. I do think it worthwhile to align so we are consistent across the platform. Revisiting the names in HotSpot, and their internal connection in Java, could be done in a separate PR? Hi @PaulSandoz , Thanks for your comments. Please find below my responses. 
> When you generate the fallback code for unary/binary etc can you push the carrier type and conversations into the uOp/bOp implementations so you don't have to explicitly operate on the carrier type and do the conversions as you do now e.g.,: > > ``` > v0.uOp(m, (i, a) -> float16ToShortBits(Float16.valueOf(-(shortBitsToFloat16(($type$)a).floatValue())))); > ``` Currently, uOp and uOpTemplates are part of the scaffolding logic and are sacrosanct; they are shared by various abstracted vector classes, and their semantics are defined by the lambda expression. I agree that explicit conversion in lambdas looks verbose, but moving them to uOpTemplate may fracture the lambda expression such that part of its semantics, i.e,. conversions, will seep into uOpTemplate, while what will appear at the surface will be the expression operating over primitive float values; this may become very confusing. > > The transition of intrinsic arguments from `vsp.elementType()` to `vsp.carrierType(), vsp.operType()` is a little unfortunate. Is this because HotSpot cannot directly refer to the `Float16` class from the incubating module? Yes, the idea here was to clearly differentiate b/w elemType and carrierType and avoid passing Float16.class as an argument to intrinsic entry points. Unlike the VectorSupport class, Float16 is part of the incubating module and cannot be directly exposed to VM, i.e., we cannot create a vmSymbol for it during initialization. This would have made all the lane type checks in-line expand name-based rather than efficient symbol lookup. > Requiring two arguments means they can get out of sync. Previously the class provided all the information needed, now > arguably the type does. Yes, from the compiler standpoint point all we care about is the carrier type, which determines the vector lane size. This is augmented with operation kind (PRIM / FP16) to differentiate a short vector lane from a float16 vector lane. 
Apart from this, we need to pass the VectorBox type to wrap the vector IR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3520530639 From epeter at openjdk.org Wed Nov 12 08:33:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Nov 2025 08:33:29 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v3] In-Reply-To: References: Message-ID: On Thu, 2 Oct 2025 09:08:06 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widen >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if of the type both inputs >> of the `Add`s are non constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. 
>> >> - if the Cast is pinned and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: > > - review > - infinite loop in gvn fix > - renaming @rwestrel Sorry I dropped the review on this one for a long time :/ I left quite a few comments. But on the whole I'm really happy with the direction you are taking. It's getting much clearer. I would still see some more clear explanations/comments. That way, we can make our previously implicit assumptions even more explicit :) src/hotspot/share/opto/castnode.cpp line 47: > 45: Node* ConstraintCastNode::Identity(PhaseGVN* phase) { > 46: if (!_dependency.narrows_type()) { > 47: return this; Can you please add a code comment? I don't understand it right away :/ src/hotspot/share/opto/castnode.cpp line 153: > 151: if (!_dependency.narrows_type()) { > 152: return nullptr; > 153: } Interesting, we already check that at at least some of the use sites. If it turns out we already do it at all use sites, why not just assert? (maybe not possible or desirable, just an idea) A comment here would also be great. 
src/hotspot/share/opto/castnode.cpp line 277: > 275: > 276: CastIINode* CastIINode::pin_array_access_node() const { > 277: assert(depends_only_on_test(), "already pinned"); Would this not be more readable? Suggestion: assert(is_dependency_floating(), "already pinned"); src/hotspot/share/opto/castnode.cpp line 588: > 586: > 587: // If both inputs are not constant then, with the Cast pushed through the Add/Sub, the cast gets less precised types, > 588: // and the resulting Add/Sub's type is wider than that of the Cast before pushing. I find this long sentence a bit complicated to read. Can you reformulate and maybe break it into smaller sentences? It would also be good to explicitly say why that may require changing the dependency constraint. src/hotspot/share/opto/castnode.cpp line 615: > 613: // Widening the type of the Cast (to allow some commoning) causes the Cast to change how it can be optimized (if > 614: // type of its input is narrower than the Cast's type, we can't remove it to not loose the dependency). > 615: return make_with(in(1), wide_t, _dependency.widen_type_dependency()); Suggestion: return make_with(in(1), wide_t, _dependency.with_non_narrowing()); This may be clearer here, since non-narrowing prevents folding the cast away if the input is narrower. I like the code comment you already have though :) src/hotspot/share/opto/castnode.cpp line 625: > 623: if (!phase->C->post_loop_opts_phase()) { > 624: return this_type; > 625: } Honestly, I would prefer to see this "delay to post loop opts" to be done outside of `widen_type`. It would just make more sense there. What do you think? src/hotspot/share/opto/castnode.hpp line 46: > 44: // 1- and 2- are not always applied depending on what constraint are applied to the Cast: there are cases where 1- > 45: // and 2- apply, where neither 1- nor 2- apply and where one or the other apply. This class abstract away these > 46: // details. Can you spell it out a little more? 
Right now it feels a little bit like an "exercise for the reader". For each optimization, what is required of the constraints? I think that would help the reader. Equally: you could name why those constraints are required in the first place. Or is there some other place we could link to that already has those explanations? src/hotspot/share/opto/castnode.hpp line 53: > 51: _narrows_type(narrows_type), > 52: _desc(desc) { > 53: } Could you make the constructor private, and only expose the 4 static fields? That way, nobody comes to the strange idea to construct one of these themselves ;) src/hotspot/share/opto/castnode.hpp line 62: > 60: bool narrows_type() const { > 61: return _narrows_type; > 62: } Nits about naming: I would prefer `is_` for boolean queries. Otherwise, if I look at the names `floating` and `pinned_dependency`, I don't immediately know which one converts to a floating/non-floating, and which one is a boolean query. Maybe `pinned_dependency` should be renamed to `with_pinned_dependency`. src/hotspot/share/opto/castnode.hpp line 65: > 63: void dump_on(outputStream *st) const { > 64: st->print("%s", _desc); > 65: } Suggestion: bool narrows_type() const { return _narrows_type; } void dump_on(outputStream *st) const { st->print("%s", _desc); } Newline for consistency with surrounding code. src/hotspot/share/opto/castnode.hpp line 92: > 90: const bool _floating; // Does this Cast depends on its control input or is it pinned? > 91: const bool _narrows_type; // Does this Cast narrows the type i.e. if input type is narrower can it be removed? > 92: const char* _desc; I thought the hotspot convention was to usually put the fields first, at the top of the class? 
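The "private constructor plus a fixed set of static instances" idea from the review comments above might look roughly like the following. This is a Java model of a C++ class, written under several assumptions: the class name, the constant names, and the `withPinned`/`withNonNarrowing` helpers are hypothetical and only loosely follow the names discussed in the thread; the actual `DependencyType` in `castnode.hpp` may differ.

```java
// Hypothetical Java model of the C++ DependencyType discussed in the review.
public final class DependencyTypeSketch {
    // The only four instances; the private constructor prevents any others.
    static final DependencyTypeSketch FLOATING_NARROWING =
            new DependencyTypeSketch(true, true, "floating narrowing");
    static final DependencyTypeSketch FLOATING_NON_NARROWING =
            new DependencyTypeSketch(true, false, "floating non-narrowing");
    static final DependencyTypeSketch NON_FLOATING_NARROWING =
            new DependencyTypeSketch(false, true, "non-floating narrowing");
    static final DependencyTypeSketch NON_FLOATING_NON_NARROWING =
            new DependencyTypeSketch(false, false, "non-floating non-narrowing");

    private final boolean floating;    // may the cast float, or is it pinned?
    private final boolean narrowsType; // may the cast go away once it no longer narrows its input?
    private final String desc;

    private DependencyTypeSketch(boolean floating, boolean narrowsType, String desc) {
        this.floating = floating;
        this.narrowsType = narrowsType;
        this.desc = desc;
    }

    // Boolean queries use an is- prefix, as suggested in the review.
    boolean isFloating() { return floating; }
    boolean isNarrowing() { return narrowsType; }

    // Conversions use a with- prefix so they are not mistaken for queries.
    DependencyTypeSketch withPinned() {
        return narrowsType ? NON_FLOATING_NARROWING : NON_FLOATING_NON_NARROWING;
    }

    DependencyTypeSketch withNonNarrowing() {
        return floating ? FLOATING_NON_NARROWING : NON_FLOATING_NON_NARROWING;
    }

    @Override
    public String toString() { return desc; }
}
```

With only four reachable instances, identity comparison is enough to tell the variants apart, and the naming split between `is...` and `with...` makes it clear at the call site which calls are queries and which produce a different variant.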
src/hotspot/share/opto/castnode.hpp line 104: > 102: // NonFloatingNarrowingDependency is used when an array access is no longer dependent on a single range check (range > 103: // check smearing for instance) > 104: // FloatingNonNarrowingDependency is used after loop opts when Cast nodes' types are widen so Casts that only differ Suggestion: // FloatingNonNarrowingDependency is used after loop opts when Cast nodes' types are widened so Casts that only differ src/hotspot/share/opto/castnode.hpp line 110: > 108: static const DependencyType FloatingNonNarrowingDependency; > 109: static const DependencyType NonFloatingNarrowingDependency; > 110: static const DependencyType NonFloatingNonNarrowingDependency; Why not put the example at each definition? Would prevent repeating the names :) It would be good if we could have this section earlier up, so the code comments of the `DependencyType` class and this form a unit. At least link them. `NonFloatingNonNarrowingDependency` example: can you spell out the why? What could go wrong otherwise? Would the node float back into the loop maybe? What's wrong with that? `NonFloatingNarrowingDependency` more detail would be helpful. I would like to know why non floating, and why narrowing? Because that's what these examples are for, right? `FloatingNonNarrowingDependency` ah, maybe that answers one of my questions further up somewhere. If we don't have narrowing, then we should not fold away the cast because of the type, right? I think if we spell out which optimizations require which constraints, that could help a lot here. src/hotspot/share/opto/castnode.hpp line 122: > 120: ShouldNotReachHere(); > 121: return nullptr; > 122: } This always smells like a messed up class hierarchy, when I see default methods with "not implemented". But maybe we can't do much better, and I've done similar things recently ? . A short code comment could be helpful though. 
Suggestion: virtual ConstraintCastNode* make_with(Node* parent, const TypeInteger* type, const DependencyType& dependency) const { ShouldNotReachHere(); // Only implemented for CastII and CastLL return nullptr; } src/hotspot/share/opto/castnode.hpp line 146: > 144: virtual uint ideal_reg() const = 0; > 145: bool carry_dependency() const { return !_dependency.cmp(FloatingNarrowingDependency); } > 146: virtual bool depends_only_on_test() const { return _dependency.floating(); } Why not rename it to `is_dependency_floating`? That may be more helpful at the use site. test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java line 95: > 93: j += Objects.checkIndex(i - 1, length); > 94: return j; > 95: } Why not add an additional IR rule that checks that there are more casts before they get commoned? Just for completenes ;) ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-3451986831 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517197209 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517271796 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517301300 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517315011 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517336133 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517344615 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517236142 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517203781 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517366170 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517205971 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517200829 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517251068 PR Review Comment: 
https://git.openjdk.org/jdk/pull/24575#discussion_r2517260839 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517355725 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517299467 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517370224 From epeter at openjdk.org Wed Nov 12 08:33:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Nov 2025 08:33:29 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v3] In-Reply-To: References: Message-ID: <2RJF9zYoCEnq2riltw2AoWpBYa7T2F7eXEQRTIQJT_w=.f9001c12-2fe9-4432-9aba-d4f0eb59e5dd@github.com> On Wed, 12 Nov 2025 07:24:01 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: >> >> - review >> - infinite loop in gvn fix >> - renaming > > src/hotspot/share/opto/castnode.cpp line 47: > >> 45: Node* ConstraintCastNode::Identity(PhaseGVN* phase) { >> 46: if (!_dependency.narrows_type()) { >> 47: return this; > > Can you please add a code comment? I don't understand it right away :/ Maybe I'm slowly starting to understand... but a code comment would still help a lot here. We are trying to find a dominating cast that has the same or narrower type, and replace with that one. We are only allowed to do that if we have a narrowing cast, because ... > src/hotspot/share/opto/castnode.cpp line 277: > >> 275: >> 276: CastIINode* CastIINode::pin_array_access_node() const { >> 277: assert(depends_only_on_test(), "already pinned"); > > Would this not be more readable? > > Suggestion: > > assert(is_dependency_floating(), "already pinned"); Because it seems we are talking about floating vs pinned here. Adding yet another concept of "depending only on test" would require further explanation / definition. 
> src/hotspot/share/opto/castnode.cpp line 588: > >> 586: >> 587: // If both inputs are not constant then, with the Cast pushed through the Add/Sub, the cast gets less precised types, >> 588: // and the resulting Add/Sub's type is wider than that of the Cast before pushing. > > I find this long sentence a bit complicated to read. Can you reformulate and maybe break it into smaller sentences? > It would also be good to explicitly say why that may require changing the dependency constraint. I wonder if you renamed `widen_type_dependency` to `with_non_narrowing`, and explained that this now prevents folding away the cast if input types are narrower, etc... that would maybe be more straight forward? I suppose your approach was to just "notify" the dependency that we have widened the type, and then the dependency manages what the implications are. But I find that approach a bit less straight forward, because we are not talking about widening the exact same cast, but a cast that has been pushed through an add/sub. Maybe you can manage to make a coherent argument though, up to you. > src/hotspot/share/opto/castnode.cpp line 625: > >> 623: if (!phase->C->post_loop_opts_phase()) { >> 624: return this_type; >> 625: } > > Honestly, I would prefer to see this "delay to post loop opts" to be done outside of `widen_type`. It would just make more sense there. What do you think? But maybe that is a refactoring for a separate RFE, and then not really worth it. > src/hotspot/share/opto/castnode.hpp line 53: > >> 51: _narrows_type(narrows_type), >> 52: _desc(desc) { >> 53: } > > Could you make the constructor private, and only expose the 4 static fields? That way, nobody comes to the strange idea to construct one of these themselves ;) That would probably require moving the 4 static fields into this class here. Example: `ConstraintCastNode::DependencyType::FloatingNarrowing` Just an idea. Maybe you have a different solution. But a private constructor would be great for sure. 
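The private-constructor idea from the last comment can be sketched as follows. This is a minimal, standalone illustration and not the HotSpot code: the class is a simplified stand-in, only `FloatingNarrowing` is named in the review, and the other three instance names as well as `is_floating()` are assumptions made for the example.

```cpp
#include <type_traits>

// Hypothetical, simplified stand-in for the dependency type discussed above.
class DependencyType {
 public:
  // The only instances that can exist; names other than FloatingNarrowing
  // are assumed for illustration.
  static const DependencyType FloatingNarrowing;
  static const DependencyType FloatingNonNarrowing;
  static const DependencyType NonFloatingNarrowing;
  static const DependencyType NonFloatingNonNarrowing;

  bool is_floating() const { return _floating; }
  bool narrows_type() const { return _narrows_type; }
  const char* desc() const { return _desc; }

  // Comparing addresses is sufficient because no further instances
  // (and no copies) can be created.
  bool operator==(const DependencyType& other) const { return this == &other; }

  DependencyType(const DependencyType&) = delete;
  DependencyType& operator=(const DependencyType&) = delete;

 private:
  // Private: only the static members defined below may use it.
  DependencyType(bool floating, bool narrows_type, const char* desc)
      : _floating(floating), _narrows_type(narrows_type), _desc(desc) {}

  const bool _floating;
  const bool _narrows_type;
  const char* const _desc;
};

// Static member definitions are in class scope, so they may call the private constructor.
const DependencyType DependencyType::FloatingNarrowing(true, true, "floating narrowing");
const DependencyType DependencyType::FloatingNonNarrowing(true, false, "floating non-narrowing");
const DependencyType DependencyType::NonFloatingNarrowing(false, true, "non-floating narrowing");
const DependencyType DependencyType::NonFloatingNonNarrowing(false, false, "non-floating non-narrowing");

// Outside code cannot construct a new DependencyType.
static_assert(!std::is_constructible<DependencyType, bool, bool, const char*>::value,
              "constructor must not be accessible from outside the class");
```

With this shape, use sites would refer to something like `DependencyType::FloatingNarrowing` and could not accidentally build a fifth combination.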
> src/hotspot/share/opto/castnode.hpp line 146: > >> 144: virtual uint ideal_reg() const = 0; >> 145: bool carry_dependency() const { return !_dependency.cmp(FloatingNarrowingDependency); } >> 146: virtual bool depends_only_on_test() const { return _dependency.floating(); } > > Why not rename it to `is_dependency_floating`? That may be more helpful at the use site. Otherwise you have to give an explanation/code comment about the concept "depending on test", and define it in terms of floating / non-floating. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517268181 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517304372 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517331973 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517345703 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517217941 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517358981 From epeter at openjdk.org Wed Nov 12 08:33:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Nov 2025 08:33:31 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v3] In-Reply-To: <2RJF9zYoCEnq2riltw2AoWpBYa7T2F7eXEQRTIQJT_w=.f9001c12-2fe9-4432-9aba-d4f0eb59e5dd@github.com> References: <2RJF9zYoCEnq2riltw2AoWpBYa7T2F7eXEQRTIQJT_w=.f9001c12-2fe9-4432-9aba-d4f0eb59e5dd@github.com> Message-ID: On Wed, 12 Nov 2025 08:19:21 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/castnode.cpp line 625: >> >>> 623: if (!phase->C->post_loop_opts_phase()) { >>> 624: return this_type; >>> 625: } >> >> Honestly, I would prefer to see this "delay to post loop opts" to be done outside of `widen_type`. It would just make more sense there. What do you think? > > But maybe that is a refactoring for a separate RFE, and then not really worth it. 
But conceptually, we want to say: if we are in post loop opts, then widen the types. Now it looks like we want to widen always ... but then we check for post loop opts inside the method and bail out anyway. Not very transparent. Another idea: rename the method to `widen_type_in_post_loop_opts`. Totally up to you though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517350982 From duke at openjdk.org Wed Nov 12 08:54:23 2025 From: duke at openjdk.org (Chiranmoy Bhattacharya) Date: Wed, 12 Nov 2025 08:54:23 GMT Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v6] In-Reply-To: References: Message-ID: <5AcbdiDnDvOw9fnzQ0Ywmd5yEQ3q-1sjEIt590FPL2c=.152c45d1-798c-4411-84a5-1d6765484a05@github.com> On Wed, 12 Nov 2025 07:38:17 GMT, Xiaohong Gong wrote: > Internal tests pass (just sanity testing, did not run it on SVE). Code looks reasonable. > > @XiaohongGong Thanks for all the updates and bearing with all the review comments. Tested the patch on AWS Graviton4 with the benchmarks provided, and the results match the reported numbers. 
With `VM options: -XX:UseSVE=2 --add-modules=jdk.incubator.vector`

Benchmark                                     bits  inputs  Mode   Unit   Before         After           Gain
MaskQueryOperationsBenchmark.testToLongByte   128   1       thrpt  ops/s  269101754.957  1154781149.715  4.29
MaskQueryOperationsBenchmark.testToLongByte   128   2       thrpt  ops/s  269106841.271  1020391639.317  3.79
MaskQueryOperationsBenchmark.testToLongByte   128   3       thrpt  ops/s  269108088.073  1178242624.232  4.37
MaskQueryOperationsBenchmark.testToLongInt    128   1       thrpt  ops/s  833720082.241  1183112162.420  1.41
MaskQueryOperationsBenchmark.testToLongInt    128   2       thrpt  ops/s  851866517.512  905381882.385   1.06
MaskQueryOperationsBenchmark.testToLongInt    128   3       thrpt  ops/s  841908544.850  1010800908.258  1.20
MaskQueryOperationsBenchmark.testToLongLong   128   1       thrpt  ops/s  752714074.556  1116755995.074  1.48
MaskQueryOperationsBenchmark.testToLongLong   128   2       thrpt  ops/s  733777062.242  1117923992.880  1.52
MaskQueryOperationsBenchmark.testToLongLong   128   3       thrpt  ops/s  755390508.217  1125159886.042  1.48
MaskQueryOperationsBenchmark.testToLongShort  128   1       thrpt  ops/s  915079922.329  1183247213.309  1.29
MaskQueryOperationsBenchmark.testToLongShort  128   2       thrpt  ops/s  898902990.501  1157778493.700  1.28
MaskQueryOperationsBenchmark.testToLongShort  128   3       thrpt  ops/s  913979902.412  1183483647.121  1.29

With `VM options: -XX:UseSVE=1 --add-modules=jdk.incubator.vector`

Benchmark                                     bits  inputs  Mode   Unit   Before         After          Gain
MaskQueryOperationsBenchmark.testToLongByte   128   1       thrpt  ops/s  578862813.032  674722742.273  1.16
MaskQueryOperationsBenchmark.testToLongByte   128   2       thrpt  ops/s  577292103.016  671339970.996  1.16
MaskQueryOperationsBenchmark.testToLongByte   128   3       thrpt  ops/s  576827529.288  673882123.264  1.16
MaskQueryOperationsBenchmark.testToLongInt    128   1       thrpt  ops/s  792212973.997  957781054.650  1.20
MaskQueryOperationsBenchmark.testToLongInt    128   2       thrpt  ops/s  790683237.790  965247861.666  1.22
MaskQueryOperationsBenchmark.testToLongInt    128   3       thrpt  ops/s  794710366.832  981858552.787  1.23
MaskQueryOperationsBenchmark.testToLongLong   128   1       thrpt  ops/s  738425667.560  994493069.759  1.34
MaskQueryOperationsBenchmark.testToLongLong   128   2       thrpt  ops/s  736805923.837  979981983.578  1.33
MaskQueryOperationsBenchmark.testToLongLong   128   3       thrpt  ops/s  740591712.584  972150308.391  1.31
MaskQueryOperationsBenchmark.testToLongShort  128   1       thrpt  ops/s  784464050.733  994221594.464  1.26
MaskQueryOperationsBenchmark.testToLongShort  128   2       thrpt  ops/s  789528903.130  994094688.740  1.25
MaskQueryOperationsBenchmark.testToLongShort  128   3       thrpt  ops/s  779944943.316  979813192.314  1.25

------------- PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3520532925 From aseoane at openjdk.org Wed Nov 12 08:57:13 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 12 Nov 2025 08:57:13 GMT Subject: RFR: 8369002: Extract the loop->is_member(get_loop(get_ctrl(node))) pattern in a new function Message-ID: This PR adds a "shorthand" for the common `loop->is_member(get_loop(get_ctrl(node)))` pattern in loop optimizations. In PhaseIdealLoop, there is already an `is_member` function that checks if a node is a (nested) member of an IdealLoopTree. In a similar fashion, this changeset adds a `ctrl_is_member` that aims to simplify the common pattern of: Node* node_ctrl = get_ctrl(node); if (loop->is_member(get_loop(node_ctrl))) { ... } This hopes to provide a bit more readability and code conciseness in such a common operation. 
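The shorthand described above can be sketched outside HotSpot as follows. Everything here is a hypothetical mock written for illustration: `Node`, `IdealLoopTree`, and `LoopPhase` are simplified stand-ins for the real C2 classes, and the map-based `get_ctrl`/`get_loop` only mimic the lookups that `PhaseIdealLoop` performs. The helper is declared with a `bool` return type.

```cpp
#include <unordered_map>

// Mock node: only its identity (address) matters in this sketch.
struct Node {};

// Mock loop tree entry with a link to the enclosing loop.
struct IdealLoopTree {
  const IdealLoopTree* parent;  // enclosing loop, or nullptr for an outermost loop

  // Is 'l' this loop or a loop nested (at any depth) inside it?
  bool is_member(const IdealLoopTree* l) const {
    for (; l != nullptr; l = l->parent) {
      if (l == this) return true;
    }
    return false;
  }
};

// Mock of the loop phase: maps data nodes to control nodes, and control nodes to loops.
class LoopPhase {
  std::unordered_map<const Node*, const Node*> _ctrl;
  std::unordered_map<const Node*, const IdealLoopTree*> _loop;

 public:
  void set_ctrl(const Node* n, const Node* ctrl) { _ctrl[n] = ctrl; }
  void set_loop(const Node* ctrl, const IdealLoopTree* loop) { _loop[ctrl] = loop; }

  const Node* get_ctrl(const Node* n) const { return _ctrl.at(n); }
  const IdealLoopTree* get_loop(const Node* ctrl) const { return _loop.at(ctrl); }

  // Is the control of 'n' a (nested) member of 'loop'?
  // Note that get_loop() receives the control node, not 'n' itself.
  bool ctrl_is_member(const IdealLoopTree* loop, const Node* n) const {
    return loop->is_member(get_loop(get_ctrl(n)));
  }
};
```

A call site that previously spelled out the three nested calls would then read `phase.ctrl_is_member(loop, n)`.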
**Testing:** passes tiers 1-3 ------------- Commit messages: - Small fix - Intial commit Changes: https://git.openjdk.org/jdk/pull/28259/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28259&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369002 Stats: 39 lines in 6 files changed: 7 ins; 7 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/28259.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28259/head:pull/28259 PR: https://git.openjdk.org/jdk/pull/28259 From rcastanedalo at openjdk.org Wed Nov 12 09:18:06 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 12 Nov 2025 09:18:06 GMT Subject: RFR: 8341039: compiler/cha/TypeProfileFinalMethod.java fails with assertEquals expected: 0 but was: 2 In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 16:25:59 GMT, Daniel Lundén wrote: > The test `compiler/cha/TypeProfileFinalMethod.java` exercises a specific compilation pattern and easily breaks by setting various VM flags (e.g., `-Xcomp`). > > ### Changeset > > - Make the test flagless. > - Ensure the test only compiles the intended methods. > - Fix problems with compiler directives used in the test (incorrect signatures and some directives getting unintentionally shadowed by other directives). > - Force C2 inlining of a method which the test author likely intended to always be inlined (based on source code comments in the test). > - Switch argument order in `assertEquals` to make error message correct. > > Note for reviewers: A more fundamental rewrite of the test is beyond the scope of this changeset. The objective here is simply to ensure the test runs only in contexts intended by the test author. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18972906513) > - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. 
> - Stress testing of the specific test on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. Thanks for resurrecting this test, Daniel! The changes look good, I just have a minor suggestion (re-adding a useful comment that was removed in the changeset). test/hotspot/jtreg/compiler/cha/TypeProfileFinalMethod.java line 104: > 102: // Make sure to inline test1 in C2 compilation > 103: "c2: { inline:[\"+" + Launcher.class.getName()+"::test1\"]," + > 104: " PrintInlining:true }" + Suggestion: // Print the inline tree for checking " PrintInlining:true }" + ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28200#pullrequestreview-3452390259 PR Review Comment: https://git.openjdk.org/jdk/pull/28200#discussion_r2517517847 From rcastanedalo at openjdk.org Wed Nov 12 09:27:03 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 12 Nov 2025 09:27:03 GMT Subject: RFR: 8371558: C2: Missing optimization opportunity in AbsNode::Ideal [v2] In-Reply-To: References: <-IanFerV1zNzvyd4OrMHEHlki-rVKXxrZSSV4kFRW-Y=.2f71a1c1-d65c-4871-b11c-345a1c754600@github.com> Message-ID: On Wed, 12 Nov 2025 07:49:42 GMT, Benoît Maillard wrote: >> This PR addresses another missed optimization in `PhaseIterGVN` due to a missing notification to indirect users within `PhaseIterGVN::add_users_of_use_to_worklist`. >> >> The affected optimization is the transformation of `abs(0-x)` into `abs(x)`. This transformation is implemented in `AbsNode::Ideal`. >> >> The bug was found by the fuzzer. At some point during IGVN, we have the following setup: >> >> >> Phi ... >> \ / >> SubI >> | >> AbsI >> >> >> The `Phi` node gets folded into a `ConI`, and we call `replace_node(phi, zero)`, which ends up calling `add_users_to_worklist(phi)`, and `add_users_of_use_to_worklist(phi, zero, ...)`. 
However, the case for this specific notification was missing there, and the `AbsI` node is never notified (not added to the worklist). >> >> This PR brings the following changes: >> - Detect the optimization pattern in `add_users_of_use_to_worklist` for `AbsI`, `AbsL`, `AbsF` and `AbsD` >> - Add new test `TestMissingOptAbsZeroMinusX.java`, initially obtained from the fuzzer and then heavily reduced, both with the usual tools and manually. In addition to `AbsI`, I have also added test cases for `AbsF` and `AbsD`, but was not able to reproduce for `AbsL` despite my best efforts. >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371534) >> - [ ] tier1-4, plus some internal testing >> >> Thank you for reviewing! > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/c2/TestMissingOptAbsZeroMinusX.java > > Co-authored-by: Tobias Hartmann Looks good! At some point, we will have to think about some kind of system to enforce notification of indirect users "by construction", or at least make it possible to somehow detect when such notification may be needed. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28237#pullrequestreview-3452442885 From dlunden at openjdk.org Wed Nov 12 09:33:36 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 12 Nov 2025 09:33:36 GMT Subject: RFR: 8341039: compiler/cha/TypeProfileFinalMethod.java fails with assertEquals expected: 0 but was: 2 [v2] In-Reply-To: References: Message-ID: > The test `compiler/cha/TypeProfileFinalMethod.java` exercises a specific compilation pattern and easily breaks by setting various VM flags (e.g., `-Xcomp`). > > ### Changeset > > - Make the test flagless. > - Ensure the test only compiles the intended methods. 
> - Fix problems with compiler directives used in the test (incorrect signatures and some directives getting unintentionally shadowed by other directives). > - Force C2 inlining of a method which the test author likely intended to always be inlined (based on source code comments in the test). > - Switch argument order in `assertEquals` to make error message correct. > > Note for reviewers: A more fundamental rewrite of the test is beyond the scope of this changeset. The objective here is simply to ensure the test runs only in contexts intended by the test author. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18972906513) > - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Stress testing of the specific test on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/cha/TypeProfileFinalMethod.java Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28200/files - new: https://git.openjdk.org/jdk/pull/28200/files/da0d3140..f92667fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28200&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28200&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28200/head:pull/28200 PR: https://git.openjdk.org/jdk/pull/28200 From dlunden at openjdk.org Wed Nov 12 09:33:38 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 12 Nov 2025 09:33:38 GMT Subject: RFR: 8341039: compiler/cha/TypeProfileFinalMethod.java fails with assertEquals expected: 0 but was: 2 [v2] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 
09:14:21 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/cha/TypeProfileFinalMethod.java >> >> Co-authored-by: Roberto Castañeda Lozano > > test/hotspot/jtreg/compiler/cha/TypeProfileFinalMethod.java line 104: > >> 102: // Make sure to inline test1 in C2 compilation >> 103: "c2: { inline:[\"+" + Launcher.class.getName()+"::test1\"]," + >> 104: " PrintInlining:true }" + > > Suggestion: > > // Print the inline tree for checking > " PrintInlining:true }" + Thanks! That one got lost in translation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28200#discussion_r2517569502 From bmaillard at openjdk.org Wed Nov 12 09:40:05 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 12 Nov 2025 09:40:05 GMT Subject: RFR: 8369002: Extract the loop->is_member(get_loop(get_ctrl(node))) pattern in a new function In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 08:49:49 GMT, Anton Seoane Ampudia wrote: > This PR adds a "shorthand" for the common `loop->is_member(get_loop(get_ctrl(node)))` pattern in loop optimizations. > > In PhaseIdealLoop, there is already an `is_member` function that checks if a node is a (nested) member of an IdealLoopTree. In a similar fashion, this changeset adds a `ctrl_is_member` that aims to simplify the common pattern of: > > Node* node_ctrl = get_ctrl(node); > if (loop->is_member(get_loop(node))) { ... } > > > This hopes to provide a bit more readability and code conciseness in such a common operation. > > **Testing:** passes tiers 1-3 Looks good to me, thanks for making the change @anton-seoane! I would just change the return type, see my comments. src/hotspot/share/opto/loopnode.hpp line 1389: > 1387: > 1388: // Is 'n' a (nested) member of 'loop'? 
> 1389: int is_member( const IdealLoopTree *loop, Node *n ) const { Let's change this one as well while we're at it Suggestion: bool is_member( const IdealLoopTree *loop, Node *n ) const { src/hotspot/share/opto/loopnode.hpp line 1394: > 1392: > 1393: // is the control for 'n' a (nested) member of 'loop'? > 1394: int ctrl_is_member(const IdealLoopTree *loop, Node *n) { We should take advantage of the opportunity to make the return type consistent with the other variation (`bool is_member(const IdealLoopTree *l)`) Suggestion: bool ctrl_is_member(const IdealLoopTree *loop, Node *n) { ------------- PR Review: https://git.openjdk.org/jdk/pull/28259#pullrequestreview-3452473070 PR Review Comment: https://git.openjdk.org/jdk/pull/28259#discussion_r2517579412 PR Review Comment: https://git.openjdk.org/jdk/pull/28259#discussion_r2517578812 From rcastanedalo at openjdk.org Wed Nov 12 09:42:05 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 12 Nov 2025 09:42:05 GMT Subject: RFR: 8341039: compiler/cha/TypeProfileFinalMethod.java fails with assertEquals expected: 0 but was: 2 [v2] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 09:33:36 GMT, Daniel Lundén wrote: >> The test `compiler/cha/TypeProfileFinalMethod.java` exercises a specific compilation pattern and easily breaks by setting various VM flags (e.g., `-Xcomp`). >> >> ### Changeset >> >> - Make the test flagless. >> - Ensure the test only compiles the intended methods. >> - Fix problems with compiler directives used in the test (incorrect signatures and some directives getting unintentionally shadowed by other directives). >> - Force C2 inlining of a method which the test author likely intended to always be inlined (based on source code comments in the test). >> - Switch argument order in `assertEquals` to make error message correct. >> >> Note for reviewers: A more fundamental rewrite of the test is beyond the scope of this changeset. 
The objective here is simply to ensure the test runs only in contexts intended by the test author. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18972906513) >> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Stress testing of the specific test on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/cha/TypeProfileFinalMethod.java > > Co-authored-by: Roberto Castañeda Lozano Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28200#pullrequestreview-3452504583 From qamai at openjdk.org Wed Nov 12 10:37:08 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 12 Nov 2025 10:37:08 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v6] In-Reply-To: References: Message-ID: On Fri, 3 Oct 2025 16:05:52 GMT, Quan Anh Mai wrote: >> Hi, >> >> The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. `ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`. This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. 
>> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix test options This PR still needs another review, please. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25284#issuecomment-3521238553 From qamai at openjdk.org Wed Nov 12 10:37:11 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 12 Nov 2025 10:37:11 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v5] In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 10:09:07 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge branch 'master' into andorxor > - Add assertion for the helper in CTPComparator > > Co-authored-by: Emanuel Peter > - remove std::hash > - remove unordered_map, add some comments for all_instances_size > - Emanuel's reviews > - Improve Value inferences of And, Or, Xor and implement gtest for general Value inferences May I have a second review, please? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27618#issuecomment-3521239764 From duke at openjdk.org Wed Nov 12 11:15:03 2025 From: duke at openjdk.org (Samuel Chee) Date: Wed, 12 Nov 2025 11:15:03 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v5] In-Reply-To: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: > AtomicLong.CompareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg: > > ;; cmpxchg { > 0x0000e708d144cf60: mov x8, x2 > 0x0000e708d144cf64: casal x8, x3, [x0] > 0x0000e708d144cf68: cmp x8, x2 > ;; 0x1F1F1F1F1F1F1F1F > 0x0000e708d144cf6c: mov x8, #0x1f1f1f1f1f1f1f1f > ;; } cmpxchg > 0x0000e708d144cf70: cset x8, ne // ne = any > 0x0000e708d144cf74: dmb ish > > > According to the Oracle Java Specification, AtomicLong.CompareAndSet [1] has the same memory effects as specified by VarHandle.compareAndSet which has the following effects: [2] > >> Atomically sets the value of a variable to the >> newValue with the memory semantics of setVolatile if >> the variable's current value, referred to as the witness >> value, == the expectedValue, as accessed with the memory >> semantics of getVolatile. > > > > Hence the release on the store due to setVolatile only occurs if the compare is successful. Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen. > > Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant) > > This is also reflected by C2 not having a dmb for the same respective method. 
> > [1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) > [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...) Samuel Chee has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Address review comments. Refine. - Merge from the main branch - Add cmpxchg_barrier helper Change-Id: I17acf999140f0c1decb256de8291361c568a4ff8 - Add comment Signed-off-by: Samuel Chee Change-Id: I9793ed6ffdff6c044552d069af23620d178f2284 - Add back in dmb membar for non-LSE Change-Id: Ie64565420a1758d3191eaebed82c80584ce54ef6 - 8360654: AArch64: Remove redundant dmb from C1 compareAndSet Change-Id: I79a0079fc2d3d90eeb671b6ed73d963968d4fa53 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26000/files - new: https://git.openjdk.org/jdk/pull/26000/files/092c92e9..135123cb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26000&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26000&range=03-04 Stats: 471597 lines in 7283 files changed: 319371 ins; 101734 del; 50492 mod Patch: https://git.openjdk.org/jdk/pull/26000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26000/head:pull/26000 PR: https://git.openjdk.org/jdk/pull/26000 From qamai at openjdk.org Wed Nov 12 11:16:51 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 12 Nov 2025 11:16:51 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v6] In-Reply-To: References: Message-ID: > Hi, > > This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. 
A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Merge branch 'master' into andorxor - Merge branch 'master' into andorxor - Add assertion for the helper in CTPComparator Co-authored-by: Emanuel Peter - remove std::hash - remove unordered_map, add some comments for all_instances_size - Emanuel's reviews - Improve Value inferences of And, Or, Xor and implement gtest for general Value inferences ------------- Changes: https://git.openjdk.org/jdk/pull/27618/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27618&range=05 Stats: 964 lines in 9 files changed: 630 ins; 313 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/27618.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27618/head:pull/27618 PR: https://git.openjdk.org/jdk/pull/27618 From qamai at openjdk.org Wed Nov 12 11:18:42 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 12 Nov 2025 11:18:42 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v7] In-Reply-To: References: Message-ID: > Hi, > > The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. `ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. 
In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`. This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - Merge branch 'master' into misplacedcastll - fix test options - Merge branch 'master' into misplacedcastll - fix comment - fix comment - fix - fix issues - misplaced CastLL ------------- Changes: https://git.openjdk.org/jdk/pull/25284/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25284&range=06 Stats: 54 lines in 5 files changed: 8 ins; 14 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/25284.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25284/head:pull/25284 PR: https://git.openjdk.org/jdk/pull/25284 From duke at openjdk.org Wed Nov 12 11:30:14 2025 From: duke at openjdk.org (Ruben) Date: Wed, 12 Nov 2025 11:30:14 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v5] In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Wed, 12 Nov 2025 11:15:03 GMT, Samuel Chee wrote: >> AtomicLong.CompareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg: >> >> ;; cmpxchg { >> 0x0000e708d144cf60: mov x8, x2 >> 0x0000e708d144cf64: casal x8, x3, [x0] >> 0x0000e708d144cf68: cmp x8, x2 >> ;; 0x1F1F1F1F1F1F1F1F >> 0x0000e708d144cf6c: mov x8, #0x1f1f1f1f1f1f1f1f >> ;; } cmpxchg >> 0x0000e708d144cf70: cset x8, ne // ne = any >> 0x0000e708d144cf74: dmb ish >> >> >> According to the Oracle Java Specification, AtomicLong.CompareAndSet [1] has the same memory effects as specified by VarHandle.compareAndSet which has the following 
effects: [2] >> >>> Atomically sets the value of a variable to the >>> newValue with the memory semantics of setVolatile if >>> the variable's current value, referred to as the witness >>> value, == the expectedValue, as accessed with the memory >>> semantics of getVolatile. >> >> >> >> Hence the release on the store due to setVolatile only occurs if the compare is successful. Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen. >> >> Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant) >> >> This is also reflected by C2 not having a dmb for the same respective method. >> >> [1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) >> [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...) > > Samuel Chee has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Address review comments. Refine. 
> - Merge from the main branch > - Add cmpxchg_barrier helper > > Change-Id: I17acf999140f0c1decb256de8291361c568a4ff8 > - Add comment > > Signed-off-by: Samuel Chee > Change-Id: I9793ed6ffdff6c044552d069af23620d178f2284 > - Add back in dmb membar for non-LSE > > Change-Id: Ie64565420a1758d3191eaebed82c80584ce54ef6 > - 8360654: AArch64: Remove redundant dmb from C1 compareAndSet > > Change-Id: I79a0079fc2d3d90eeb671b6ed73d963968d4fa53 This PR has been updated to include the latest changes (and the same for the https://github.com/openjdk/jdk/pull/26748) I've run `java -jar jcstress.jar` (revision 1d143cbd430f4cca63a8f0c8c1fad3aabc065421) for this with `+UseLSE` and `-UseLSE` with these outcomes respectively: - ``` Failed tests: No matches. Error tests: No matches. All remaining tests: 4945 matching test results. ``` - ``` Failed tests: No matches. Error tests: No matches. All remaining tests: 4955 matching test results. ``` ------------- PR Comment: https://git.openjdk.org/jdk/pull/26000#issuecomment-3521458416 From rcastanedalo at openjdk.org Wed Nov 12 11:51:43 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 12 Nov 2025 11:51:43 GMT Subject: RFR: 8356761: IGV: dump escape analysis information In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 13:49:44 GMT, Anton Seoane Ampudia wrote: > This PR introduces new IGV dumps, property fields and filters related to escape analysis information. > > The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations.
> > With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: > > - Node escape "level". > - Scalar replaceability. > - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). > > This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. > > Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. > > **Testing:** passes tiers 1-3, manual testing in IGV Thanks for this work, Antón! This is going to help immensely in understanding, debugging, and improving C2's escape analysis. Here are a few higher-level comments and suggestions: - In the absence of a proper hierarchical structure, the IGV filters are roughly sorted by generality. Transformation-specific filters like those proposed by this changeset should be placed further down. I suggest moving them between "Hide exception blocks" and "Color live ranges by allocation". - The "Color by escape analysis state" filter is very useful, but the current version has a couple of issues. 1. The coloring function is not "total" (i.e. it does not color all nodes involved in EA). In particular, nodes with `escape=global_escape` and `replaceable=true` (which seems contradictory; I assume this is a transient state) are not colored. I guess they should be colored in red, but please check. Here is an example (see 98 Return): [image: no-color] 2.
The order in which colors are applied leads to wrong coloring of allocation nodes in the intermediate EA phases. See e.g. 205 Allocate in the following example (which should be green but is colored in red): [image: wrong-coloring] I suggest applying colors based on persistent node attributes ("is_non_escaping", "does_not_escape_thread", etc.) first, and then applying colors based on CG-derived attributes ("escape", "replaceable", etc.). - Please name node attributes as closely as possible to the C2 source code names, for traceability, e.g. "escape_state" instead of "escape", "scalar_replaceable" instead of "replaceable", etc. - The "Show connection graph nodes" filter looks promising, but I would suggest renaming it to "Show connection graph info" (for consistency with "Show custom node info") and extending it with the `PointsToNode::_pidx` corresponding to each relevant Ideal node. Actually printing the equivalent line of `PointsToNode::dump_header(false)` would be ideal, in my opinion. - It could also be useful to add yet another filter (named "Show connection graph nodes only" or similar) to show only connection graph nodes. This filter would be as simple as `remove(not(hasProperty("ea_node")));`. Here is an example, note how the EA information becomes much denser: [image: cg-only] - Finally, please define all the new phases in `test/hotspot/jtreg/compiler/lib/ir_framework/CompilePhase.java`; we try to keep this file in sync with `phasetype.hpp` (manually).
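As a rough illustration of the "connection graph nodes only" filter suggested above: it boils down to keeping every node that carries the `ea_node` property and dropping the rest. The Java sketch below models just that selection. The `DumpedNode` class, the `keepWithProperty` helper and the sample graph are hypothetical stand-ins for illustration, not IGV or HotSpot code; only the `ea_node` property name is taken from the review.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ConnectionGraphOnlyFilter {
    // Hypothetical stand-in for a dumped graph node: an id plus its properties.
    static final class DumpedNode {
        final int id;
        final Map<String, String> properties;

        DumpedNode(int id, Map<String, String> properties) {
            this.id = id;
            this.properties = properties;
        }
    }

    // Keep only the nodes that carry the given property. This has the same
    // effect as removing every node that does not have the property.
    static List<DumpedNode> keepWithProperty(List<DumpedNode> nodes, String name) {
        return nodes.stream()
                .filter(n -> n.properties.containsKey(name))
                .collect(Collectors.toList());
    }

    // Made-up sample graph: node 1 has no escape analysis information.
    static List<DumpedNode> sampleGraph() {
        return List.of(
                new DumpedNode(1, Map.of("name", "Return")),
                new DumpedNode(2, Map.of("name", "Phi", "ea_node", "localvar")),
                new DumpedNode(3, Map.of("name", "AddP", "ea_node", "field")));
    }

    public static void main(String[] args) {
        for (DumpedNode n : keepWithProperty(sampleGraph(), "ea_node")) {
            System.out.println(n.id + " " + n.properties.get("name"));
        }
    }
}
```

In the sample graph only the two nodes with an `ea_node` property survive, which is the "denser" view the suggestion is after.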
src/hotspot/share/opto/escape.cpp line 117: > 115: } > 116: ConnectionGraph* congraph = new(C->comp_arena()) ConnectionGraph(C, igvn, invocation); > 117: NOT_PRODUCT(if (C->igv_printer()) C->igv_printer()->set_congraph(congraph);) Suggestion: NOT_PRODUCT(if (C->igv_printer() != nullptr) C->igv_printer()->set_congraph(congraph);) src/hotspot/share/opto/escape.cpp line 123: > 121: C->set_congraph(congraph); > 122: } > 123: NOT_PRODUCT(if (C->igv_printer()) C->igv_printer()->set_congraph(nullptr);) Suggestion: NOT_PRODUCT(if (C->igv_printer() != nullptr) C->igv_printer()->set_congraph(nullptr);) src/hotspot/share/opto/escape.cpp line 323: > 321: } > 322: } > 323: _compile->print_method(PHASE_EA_ADJUST_SCALAR_REPLACEABLE_ITER, 6); Suggestion: _compile->print_method(PHASE_EA_ADJUST_SCALAR_REPLACEABLE_ITER, 6, n); src/hotspot/share/opto/escape.cpp line 1320: > 1318: } > 1319: > 1320: _compile->print_method(PHASE_EA_BEFORE_PHI_REDUCTION, 5); Suggestion: _compile->print_method(PHASE_EA_BEFORE_PHI_REDUCTION, 5, ophi); src/hotspot/share/opto/escape.cpp line 1327: > 1325: for (uint i = 0; i < castpps.size(); i++) { > 1326: reduce_phi_on_castpp_field_load(castpps.at(i), alloc_worklist); > 1327: _compile->print_method(PHASE_EA_AFTER_PHI_CASTPP_REDUCTION, 6); Suggestion: _compile->print_method(PHASE_EA_AFTER_PHI_CASTPP_REDUCTION, 6, castpps.at(i)); src/hotspot/share/opto/escape.cpp line 1339: > 1337: } > 1338: > 1339: _compile->print_method(PHASE_EA_AFTER_PHI_ADDPP_CMP_REDUCTION, 6); Suggestion: _compile->print_method(PHASE_EA_AFTER_PHI_ADDPP_CMP_REDUCTION, 6, use); Also, please consider defining a separate phase for each type (AddP/Cmp). 
src/hotspot/share/opto/escape.cpp line 2576: > 2574: } > 2575: if (!verify) { > 2576: _compile->print_method(PHASE_EA_CONNECTION_GRAPH_PROPAGATE_ITER, 6); Suggestion: _compile->print_method(PHASE_EA_CONNECTION_GRAPH_PROPAGATE_ITER, 6, e->ideal_node()); src/hotspot/share/opto/escape.cpp line 3153: > 3151: revisit_reducible_phi_status(jobj, reducible_merges); > 3152: found_nsr_alloc = true; > 3153: _compile->print_method(PHASE_EA_PROPAGATE_NSR_ITER, 5); Why dumping here and not for the `use->is_LocalVar()` case? Anyway, maybe it would be more useful to dump one or two levels up instead, e.g. for each `jobj`. src/hotspot/share/opto/escape.cpp line 4749: > 4747: uint new_index_end = (uint) _compile->num_alias_types(); > 4748: > 4749: _compile->print_method(PHASE_EA_AFTER_SPLIT_UNIQUE_TYPES_1, 5); I guess there is no `PHASE_EA_AFTER_SPLIT_UNIQUE_TYPES_2` because Phase 2 does not change the state of the Ideal graph or the connection graph and hence the changes are not observable within IGV, right? src/hotspot/share/opto/escape.cpp line 5173: > 5171: } > 5172: } > 5173: Spurious change? src/hotspot/share/opto/idealGraphPrinter.cpp line 653: > 651: if (alloc->does_not_escape_thread()) { > 652: print_prop("does_not_escape_thread", "true"); > 653: } Please fix indentation. src/hotspot/share/opto/idealGraphPrinter.cpp line 761: > 759: if (_congraph && node->_idx < _congraph->nodes_size()) { > 760: PointsToNode* ptn = _congraph->ptnode_adr(node->_idx); > 761: if (ptn) { Suggestion: if (ptn != nullptr) { src/hotspot/share/opto/idealGraphPrinter.cpp line 765: > 763: ptn->is_LocalVar() ? "localvar" : > 764: ptn->is_Field() ? "field" : > 765: ""); Consider using `node_type_names` from `escape.cpp` instead. src/hotspot/share/opto/idealGraphPrinter.cpp line 769: > 767: ptn->escape_state() == PointsToNode::EscapeState::ArgEscape ? "arg_escape" : > 768: ptn->escape_state() == PointsToNode::EscapeState::GlobalEscape ? 
"global_escape" : > 769: ""); Consider using `esc_names` from `escape.cpp` instead. src/hotspot/share/opto/idealGraphPrinter.cpp line 770: > 768: ptn->escape_state() == PointsToNode::EscapeState::GlobalEscape ? "global_escape" : > 769: ""); > 770: print_prop("replaceable", ptn->scalar_replaceable() ? "true" : ""); I suggest to name the properties as closely as possible to C2's source code as possible, for traceability (don't forget to update the corresponding IGV filters): Suggestion: print_prop("scalar_replaceable", ptn->scalar_replaceable() ? "true" : ""); src/utils/IdealGraphVisualizer/ServerCompiler/src/main/resources/com/sun/hotspot/igv/servercompiler/filters/colorEscapeAnalysis.filter line 1: > 1: // Color allocation nodes to indicate the result of scape analysis. Please update the comment, the filter colors other nodes than allocation nodes. src/utils/IdealGraphVisualizer/ServerCompiler/src/main/resources/com/sun/hotspot/igv/servercompiler/filters/showConnectionNodes.filter line 8: > 6: // Merge a possibly existing extra label, bottom type, and phase type into a > 7: // new, single extra label. For memory nodes, add an extra label with the memory > 8: // slice, extracted from the dump_spec field. Please update this leftover comment. src/utils/IdealGraphVisualizer/ServerCompiler/src/main/resources/com/sun/hotspot/igv/servercompiler/filters/showConnectionNodes.filter line 20: > 18: "extra_label", > 19: function(propertyValues) {return mergeAndAppendTypeInfo(propertyValues[0], propertyValues[1]);}); > 20: Unnecessary line. src/utils/IdealGraphVisualizer/ServerCompiler/src/main/resources/com/sun/hotspot/igv/servercompiler/layer.xml line 26: > 24: > 25: > 26: Note that `stringvalue` here should refer to the name of the preceding filter (`"Show types"` in this case). src/utils/IdealGraphVisualizer/ServerCompiler/src/main/resources/com/sun/hotspot/igv/servercompiler/layer.xml line 94: > 92: > 93: > 94: Same as above. 
------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28060#pullrequestreview-3452851763 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517848300 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517849185 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517850276 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517851664 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517853082 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517858612 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517860126 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517867376 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517881453 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517883463 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517884976 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517886131 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517952967 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517954679 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517889340 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517891149 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517892986 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517894537 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517898489 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2517899100 From aseoane at openjdk.org Wed Nov 12 12:27:13 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 12 Nov 2025 12:27:13 GMT Subject: RFR: 8356761: IGV: dump escape analysis information In-Reply-To: References: Message-ID: 
On Wed, 12 Nov 2025 11:01:30 GMT, Roberto Castañeda Lozano wrote: >> This PR introduces new IGV dumps, property fields and filters related to escape analysis information. >> >> The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. >> >> With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: >> >> - Node escape "level". >> - Scalar replaceability. >> - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). >> >> This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. >> >> Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. >> >> **Testing:** passes tiers 1-3, manual testing in IGV > > src/hotspot/share/opto/escape.cpp line 3153: > >> 3151: revisit_reducible_phi_status(jobj, reducible_merges); >> 3152: found_nsr_alloc = true; >> 3153: _compile->print_method(PHASE_EA_PROPAGATE_NSR_ITER, 5); > > Why dumping here and not for the `use->is_LocalVar()` case? Anyway, maybe it would be more useful to dump one or two levels up instead, e.g. for each `jobj`. Absolutely.
I guess at some point of trying different places I forgot to move it back where it should be > src/hotspot/share/opto/escape.cpp line 4749: > >> 4747: uint new_index_end = (uint) _compile->num_alias_types(); >> 4748: >> 4749: _compile->print_method(PHASE_EA_AFTER_SPLIT_UNIQUE_TYPES_1, 5); > > I guess there is no `PHASE_EA_AFTER_SPLIT_UNIQUE_TYPES_2` because Phase 2 does not change the state of the Ideal graph or the connection graph and hence the changes are not observable within IGV, right? Correct > src/hotspot/share/opto/escape.cpp line 5173: > >> 5171: } >> 5172: } >> 5173: > > Spurious change? Yes, I had a function there and when removing it I got rid of the line apparently. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2518098779 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2518099327 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2518106520 From chagedorn at openjdk.org Wed Nov 12 12:37:30 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 12 Nov 2025 12:37:30 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v26] In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 16:21:00 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. 
>> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add missing comma from suggestion application Thanks for the update, Emanuel! These look good. I will now have a look at the rest of your code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3521716320 From fgao at openjdk.org Wed Nov 12 12:48:29 2025 From: fgao at openjdk.org (Fei Gao) Date: Wed, 12 Nov 2025 12:48:29 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v2] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: <7hkDb2idn7M95bcvxh_8Poj9wbtrghPaUKqBob-Fqls=.d6310c65-568c-4087-97a9-ea3fb246cd0f@github.com> On Tue, 11 Nov 2025 15:30:14 GMT, Emanuel Peter wrote: > But just that you know: internally we also run many tests with combinations of -Xbatch -XX:-TieredCompilation -Xcomp, only C1 etc. Good to know! I'll cut down some runs in the next commit to speed up testing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2517948839 From fgao at openjdk.org Wed Nov 12 12:48:35 2025 From: fgao at openjdk.org (Fei Gao) Date: Wed, 12 Nov 2025 12:48:35 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v3] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Tue, 11 Nov 2025 15:27:38 GMT, Emanuel Peter wrote: >> Fei Gao has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 11 commits: >> >> - Fixed new test failures after rebasing and refined parts of the code to address review comments >> - Merge branch 'master' into optimize-atomic-post >> - Merge branch 'master' into optimize-atomic-post >> - Clean up comments for consistency and add spacing for readability >> - Fix some corner case failures and refined part of code >> - Merge branch 'master' into optimize-atomic-post >> - Refine ascii art, rename some variables and resolve conflicts >> - Merge branch 'master' into optimize-atomic-post >> - Add necessary ASCII art, refactor insert_post_loop() and rename >> "atomic post loop" with "vectorized drain loop. >> - Merge branch 'master' into optimize-atomic-post >> - ... and 1 more: https://git.openjdk.org/jdk/compare/eab5644a...e21a830f > > test/hotspot/jtreg/compiler/loopopts/superword/TestVectorizedDrainLoop.java line 85: > >> 83: } >> 84: return sum; >> 85: } > > Since recently, this now also auto vectorizes. Maybe this method should not be compiled, if it is part of verification? That makes sense. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2517948392 From fgao at openjdk.org Wed Nov 12 12:48:38 2025 From: fgao at openjdk.org (Fei Gao) Date: Wed, 12 Nov 2025 12:48:38 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v3] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Tue, 11 Nov 2025 15:33:57 GMT, Emanuel Peter wrote: >> test/micro/org/openjdk/bench/vm/compiler/VectorThroughputForIterationCount.java line 225: >> >>> 223: for (int i = startIndex; i < startIndex + length; i++) { >>> 224: c[i] = a[i] + b[i]; >>> 225: } >> >> You could forceinline them, just for good measure. Up to you. > > Wait, you are doing some kind of special warmup above. Why? Do you maybe NOT want the methods to inline? Any other reason for the warmup? 
If I understand correctly, when `ITERATION_COUNT` is set to a fixed value, all loop optimizations will know the loop iteration count from profiling. Without a special warm-up phase, the main loop is unlikely to be auto-vectorized for these small iteration counts, because [policy_unroll()](https://github.com/openjdk/jdk/blob/400a83da893f5fc285a175b63a266de21e93683c/src/hotspot/share/opto/loopTransform.cpp#L960) in C2 always attempts to generate code that is optimal for the current trip count based on profiling information. It may decide not to auto-vectorize, or even remove the loop entirely and keep only some scalar nodes. As a result, we can't observe the potential effects of this patch. The special warm-up phase would instead trigger auto-vectorization and full unrolling. I suppose this patch takes effect in scenarios where certain Java loops have already been compiled with auto-vectorization and unrolling, and are later used to process data with smaller array sizes. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2518155225 From thartmann at openjdk.org Wed Nov 12 13:52:07 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 12 Nov 2025 13:52:07 GMT Subject: RFR: 8371558: C2: Missing optimization opportunity in AbsNode::Ideal [v2] In-Reply-To: References: <-IanFerV1zNzvyd4OrMHEHlki-rVKXxrZSSV4kFRW-Y=.2f71a1c1-d65c-4871-b11c-345a1c754600@github.com> Message-ID: On Wed, 12 Nov 2025 07:49:42 GMT, Benoît Maillard wrote: >> This PR addresses another missed optimization in `PhaseIterGVN` due to a missing notification to indirect users within `PhaseIterGVN::add_users_of_use_to_worklist`. >> >> The affected optimization is the transformation of `abs(0-x)` into `abs(x)`. This transformation is implemented in `AbsNode::Ideal`. >> >> The bug was found by the fuzzer. At some point during IGVN, we have the following setup:
>>
>>   Phi   ...
>>     \   /
>>      SubI
>>       |
>>      AbsI
>>
>> The `Phi` node gets folded into a `ConI`, and we call `replace_node(phi, zero)`, which ends up calling `add_users_to_worklist(phi)`, and `add_users_of_use_to_worklist(phi, zero, ...)`. However the case for this specific notification was missing there, and the `AbsI` node is never notified (not added to the worklist). >> >> This PR brings the following changes: >> - Detect the optimization pattern in `add_users_of_use_to_worklist` for `AbsI`, `AbsL`, `AbsF` and `AbsD` >> - Add new test `TestMissingOptAbsZeroMinusX.java`, initially obtained from the fuzzer and then heavily reduced, both with the usual tools and manually. In addition to `AbsI`, I have also added test cases for `AbsF` and `AbsD`, but was not able to reproduce for `AbsL` despite my best efforts. >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371534) >> - [x] tier1-4, plus some internal testing >> >> Thank you for reviewing! > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/c2/TestMissingOptAbsZeroMinusX.java > > Co-authored-by: Tobias Hartmann Marked as reviewed by thartmann (Reviewer).
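For readers who want to convince themselves that the `abs(0-x)` to `abs(x)` rewrite mentioned in the quoted description is value-preserving, here is a small Java sketch that checks the identity on edge cases at the language level. It is only an illustration of the arithmetic, not the regression test from the PR; the class and method names are made up for this example.

```java
class AbsOfZeroMinusX {
    // For int, abs(0 - x) and abs(x) agree for every input, including
    // Integer.MIN_VALUE, where both sides wrap around to Integer.MIN_VALUE.
    static boolean sameInt(int x) {
        return Math.abs(0 - x) == Math.abs(x);
    }

    // Same reasoning for long.
    static boolean sameLong(long x) {
        return Math.abs(0L - x) == Math.abs(x);
    }

    // Compare bit patterns so that NaN and signed zeros are compared exactly.
    static boolean sameFloat(float x) {
        return Float.floatToIntBits(Math.abs(0.0f - x))
                == Float.floatToIntBits(Math.abs(x));
    }

    public static void main(String[] args) {
        int[] ints = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : ints) {
            System.out.println("int " + x + ": " + sameInt(x));
        }
        float[] floats = {0.0f, -0.0f, 1.5f, -1.5f, Float.NaN,
                Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY, Float.MIN_VALUE};
        for (float x : floats) {
            System.out.println("float " + x + ": " + sameFloat(x));
        }
    }
}
```

The float check compares bit patterns rather than using `==`, because `NaN == NaN` is false and `0.0f == -0.0f` is true, either of which would hide a real difference.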
------------- PR Review: https://git.openjdk.org/jdk/pull/28237#pullrequestreview-3453552960 From aph at openjdk.org Wed Nov 12 14:10:13 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Nov 2025 14:10:13 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v5] In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Wed, 12 Nov 2025 11:15:03 GMT, Samuel Chee wrote: >> AtomicLong.CompareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg: >> >> ;; cmpxchg { >> 0x0000e708d144cf60: mov x8, x2 >> 0x0000e708d144cf64: casal x8, x3, [x0] >> 0x0000e708d144cf68: cmp x8, x2 >> ;; 0x1F1F1F1F1F1F1F1F >> 0x0000e708d144cf6c: mov x8, #0x1f1f1f1f1f1f1f1f >> ;; } cmpxchg >> 0x0000e708d144cf70: cset x8, ne // ne = any >> 0x0000e708d144cf74: dmb ish >> >> >> According to the Oracle Java Specification, AtomicLong.CompareAndSet [1] has the same memory effects as specified by VarHandle.compareAndSet which has the following effects: [2] >> >>> Atomically sets the value of a variable to the >>> newValue with the memory semantics of setVolatile if >>> the variable's current value, referred to as the witness >>> value, == the expectedValue, as accessed with the memory >>> semantics of getVolatile. >> >> >> >> Hence the release on the store due to setVolatile only occurs if the compare is successful. Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen. >> >> Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant) >> >> This is also reflected by C2 not having a dmb for the same respective method. 
>> >> [1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) >> [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...) > > Samuel Chee has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Address review comments. Refine. > - Merge from the main branch > - Add cmpxchg_barrier helper > > Change-Id: I17acf999140f0c1decb256de8291361c568a4ff8 > - Add comment > > Signed-off-by: Samuel Chee > Change-Id: I9793ed6ffdff6c044552d069af23620d178f2284 > - Add back in dmb membar for non-LSE > > Change-Id: Ie64565420a1758d3191eaebed82c80584ce54ef6 > - 8360654: AArch64: Remove redundant dmb from C1 compareAndSet > > Change-Id: I79a0079fc2d3d90eeb671b6ed73d963968d4fa53 src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 3471: > 3469: bool weak, > 3470: Register result) { > 3471: cmpxchg(addr, expected, new_val, size, acquire, release, weak, result, false); Suggestion: cmpxchg(addr, expected, new_val, size, acquire, release, weak, result, /*with_barrier*/false); Reason: avoid naked booleans at call sites. Please do this everywhere. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26000#discussion_r2518449445 From aseoane at openjdk.org Wed Nov 12 14:17:45 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 12 Nov 2025 14:17:45 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v2] In-Reply-To: References: Message-ID: <_UFCSDlxzgDa8H-hCh6lze3WPepXNLK-g0dHZl4RU4U=.ea21921d-6e95-4564-910b-be148185c095@github.com> > This PR introduces new IGV dumps, property fields and filters related to escape analysis information. 
> > The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. > > With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: > > - Node escape "level". > - Scalar replaceability. > - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). > > This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. > > Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status.
> > **Testing:** passes tiers 1-3, manual testing in IGV Anton Seoane Ampudia has updated the pull request incrementally with eight additional commits since the last revision: - Review comments: fix coloring - Review comments: general changes - Review comments: minor IGV changes - Review comment: update filter comment - Review comments: restore removed line - Review comments: small changes in idealGraphPrinter.cpp - Review comments: add node to dumps, split phase, general readjustments - Review comments: explicit null check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28060/files - new: https://git.openjdk.org/jdk/pull/28060/files/872b1b48..b7867f12 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28060&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28060&range=00-01 Stats: 140 lines in 10 files changed: 74 ins; 37 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/28060.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28060/head:pull/28060 PR: https://git.openjdk.org/jdk/pull/28060 From aseoane at openjdk.org Wed Nov 12 14:17:45 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 12 Nov 2025 14:17:45 GMT Subject: RFR: 8356761: IGV: dump escape analysis information In-Reply-To: <8j_40zCPi1joR0SAU9PtcIQGRSBe1eSCYUqDRpyS8Ts=.f6681026-f9f1-44bd-8e62-f68526f45d5d@github.com> References: <8j_40zCPi1joR0SAU9PtcIQGRSBe1eSCYUqDRpyS8Ts=.f6681026-f9f1-44bd-8e62-f68526f45d5d@github.com> Message-ID: On Thu, 6 Nov 2025 06:02:59 GMT, Roberto Castañeda Lozano wrote: >> Nice improvement! I have not reviewed this PR, yet, but I just want to raise a general concern that our model of having different `PrintIdealGraphLevel` values might not fit anymore for all the different concepts (different loop opts, IGVN steps, Superword steps, parsing steps and now EA steps etc.). Maybe the time has come to use a different solution to allow some better filtering for different needs.
It should not per se block this PR but could be a justification to actually tackle this. > >> I just want to raise a general concern that our model of having different `PrintIdealGraphLevel` values might not fit anymore for all the different concepts (different loop opts, IGVN steps, Superword steps, parsing steps and now EA steps etc.). Maybe the time has come to use a different solution to allow some better filtering for different needs. > > Right, see [JDK-8320070](https://bugs.openjdk.org/browse/JDK-8320070). Thanks for your detailed review @robcasloz! I have addressed your comments now. I decided to add a new filter for "only" viewing the CG nodes and adjusted the connection graph info to use the extended header as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28060#issuecomment-3522156729 From rcastanedalo at openjdk.org Wed Nov 12 14:18:18 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 12 Nov 2025 14:18:18 GMT Subject: RFR: 8369002: Extract the loop->is_member(get_loop(get_ctrl(node))) pattern in a new function In-Reply-To: References: Message-ID: <0R-3haLON4hjdKTLXX-7vgos0WMStq0vbFx8P2rEzQU=.9165316f-153f-46d3-a752-540491b75f39@github.com> On Wed, 12 Nov 2025 08:49:49 GMT, Anton Seoane Ampudia wrote: > This PR adds a "shorthand" for the common `loop->is_member(get_loop(get_ctrl(node)))` pattern in loop optimizations. > > In PhaseIdealLoop, there is already an `is_member` function that checks if a node is a (nested) member of an IdealLoopTree. In a similar fashion, this changeset adds a `ctrl_is_member` that aims to simplify the common pattern of: > > Node* node_ctrl = get_ctrl(node); > if (loop->is_member(get_loop(node_ctrl))) { ... } > > > This hopes to provide a bit more readability and code conciseness in such a common operation. > > **Testing:** passes tiers 1-3 Changes requested by rcastanedalo (Reviewer). 
src/hotspot/share/opto/loopnode.hpp line 1394: > 1392: > 1393: // is the control for 'n' a (nested) member of 'loop'? > 1394: int ctrl_is_member(const IdealLoopTree *loop, Node *n) { On top of @benoitmaillard's suggestion: Suggestion: bool ctrl_is_member(const IdealLoopTree* loop, const Node* n) { Adding the `const` qualifier requires propagating constness to `PhaseIdealLoop::is_member` and then transitively to `PhaseIdealLoop::get_loop()`, but I think it is worth doing it in this RFE if we are going to touch `PhaseIdealLoop::is_member` anyway. src/hotspot/share/opto/loopnode.hpp line 1396: > 1394: int ctrl_is_member(const IdealLoopTree *loop, Node *n) { > 1395: Node* n_ctrl = get_ctrl(n); > 1396: return loop->is_member(get_loop(n_ctrl)); Suggestion: return is_member(loop, get_ctrl(n)); ------------- PR Review: https://git.openjdk.org/jdk/pull/28259#pullrequestreview-3453651928 PR Review Comment: https://git.openjdk.org/jdk/pull/28259#discussion_r2518477552 PR Review Comment: https://git.openjdk.org/jdk/pull/28259#discussion_r2518462925 From aph at openjdk.org Wed Nov 12 14:19:35 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Nov 2025 14:19:35 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v5] In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Wed, 12 Nov 2025 11:15:03 GMT, Samuel Chee wrote: >> AtomicLong.CompareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg: >> >> ;; cmpxchg { >> 0x0000e708d144cf60: mov x8, x2 >> 0x0000e708d144cf64: casal x8, x3, [x0] >> 0x0000e708d144cf68: cmp x8, x2 >> ;; 0x1F1F1F1F1F1F1F1F >> 0x0000e708d144cf6c: mov x8, #0x1f1f1f1f1f1f1f1f >> ;; } cmpxchg >> 0x0000e708d144cf70: cset x8, ne // ne = any >> 0x0000e708d144cf74: dmb ish >> >> >> According to the Oracle Java Specification, AtomicLong.CompareAndSet [1] has the same memory effects as 
specified by VarHandle.compareAndSet which has the following effects: [2] >> >>> Atomically sets the value of a variable to the >>> newValue with the memory semantics of setVolatile if >>> the variable's current value, referred to as the witness >>> value, == the expectedValue, as accessed with the memory >>> semantics of getVolatile. >> >> >> >> Hence the release on the store due to setVolatile only occurs if the compare is successful. Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen. >> >> Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant) >> >> This is also reflected by C2 not having a dmb for the same respective method. >> >> [1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) >> [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...) > > Samuel Chee has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Address review comments. Refine. > - Merge from the main branch > - Add cmpxchg_barrier helper > > Change-Id: I17acf999140f0c1decb256de8291361c568a4ff8 > - Add comment > > Signed-off-by: Samuel Chee > Change-Id: I9793ed6ffdff6c044552d069af23620d178f2284 > - Add back in dmb membar for non-LSE > > Change-Id: Ie64565420a1758d3191eaebed82c80584ce54ef6 > - 8360654: AArch64: Remove redundant dmb from C1 compareAndSet > > Change-Id: I79a0079fc2d3d90eeb671b6ed73d963968d4fa53 src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 3465: > 3463: } > 3464: > 3465: void MacroAssembler::cmpxchg(Register addr, Register expected, Why do we need all of these non-barrier versions? 
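As a reading aid for the compareAndSet discussion above, here is a minimal Java sketch of the API-level behaviour the PR description relies on: the store happens only when the witness value equals the expected value, so a failed compareAndSet leaves the variable untouched. It shows single-threaded, observable results only and says nothing about the AArch64 instructions C1 emits. The class name `CasDemo` and the helper `run()` are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasDemo {
    // Hypothetical helper: returns
    // { failed CAS result, value after it, successful CAS result, value after it },
    // with booleans encoded as 0/1.
    static long[] run() {
        AtomicLong value = new AtomicLong(1L);

        // Expected value 2 does not match the witness value 1: no store is performed.
        boolean first = value.compareAndSet(2L, 3L);
        long afterFirst = value.get();

        // Expected value 1 matches the witness value: the store is performed.
        boolean second = value.compareAndSet(1L, 3L);
        long afterSecond = value.get();

        return new long[] { first ? 1 : 0, afterFirst, second ? 1 : 0, afterSecond };
    }

    public static void main(String[] args) {
        long[] r = run();
        System.out.println("failed CAS:     result=" + r[0] + " value=" + r[1]);
        System.out.println("successful CAS: result=" + r[2] + " value=" + r[3]);
    }
}
```

Only the second call writes to the variable, which is the case the quoted VarHandle text attaches setVolatile semantics to.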
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26000#discussion_r2518484255 From duke at openjdk.org Wed Nov 12 14:19:45 2025 From: duke at openjdk.org (Max Verevkin) Date: Wed, 12 Nov 2025 14:19:45 GMT Subject: RFR: 8366076: arm32: Fix register allocation for vector instructions Message-ID: <17J8mScwi2eBCPmmmTJd0ittihe0BfqZYuPgC638L8Q=.6e87a120-905f-4a30-a6f0-7e80fd613144@github.com> Arm32 has 32 double-precision floating point registers, the first 16 of which coincide with the 32 single-precision floating point registers. Some vector-operation nodes were implemented in terms of scalar instructions, which only really works for the first 16 doubles. This commit addresses that. ------------- Commit messages: - 8366076: arm32: Fix register allocation for vector instructions Changes: https://git.openjdk.org/jdk/pull/27071/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27071&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8366076 Stats: 31 lines in 2 files changed: 25 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/27071.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27071/head:pull/27071 PR: https://git.openjdk.org/jdk/pull/27071 From kxu at openjdk.org Wed Nov 12 14:19:36 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 12 Nov 2025 14:19:36 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v17] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 12:37:10 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> mark LoopExitTest::is_valid_with_bt() const > > src/hotspot/share/opto/loopnode.cpp line 1638: > >> 1636: #ifdef ASSERT >> 1637: void PhaseIdealLoop::check_counted_loop_shape(IdealLoopTree* loop, Node* x, BasicType bt) { >> 1638: Node* back_control = loop_exit_control(x, loop); > > How far away are we from just using `LoopStructure` and then `LoopStructure::is_valid()` instead? 
Not much, however, `is_valid()` is checking more conditions that are beyond the scope of a proper counted loop shape. I think it's a better idea to keep these `assert`s as-is for easier debugging. > src/hotspot/share/opto/loopnode.cpp line 1764: > >> 1762: // Get merge point >> 1763: _xphi = incr->in(1); >> 1764: _node = incr->in(2); > > Should we name `_node` simply `_stride_con`? Renamed to `_stride_node` to avoid confusion with `jlong stride_con()` > src/hotspot/share/opto/loopnode.cpp line 2500: > >> 2498: PhaseIterGVN* igvn = &_phase->igvn(); >> 2499: Node* init_control = _head->in(LoopNode::EntryControl); >> 2500: const jlong stride_con = _structure.stride().compute_non_zero_stride_con(_structure.exit_test().mask(), _iv_bt); > > I've noticed that you use this pattern a few times. How about having a `LoopStructure::stride_con()` method instead? Added `LoopStructure::stride_con()` > src/hotspot/share/opto/loopnode.cpp line 2505: > >> 2503: Node* cmp_limit = CmpNode::make(_structure.exit_test().limit(), igvn->integercon((stride_con > 0 >> 2504: ? max_signed_integer(_iv_bt) >> 2505: : min_signed_integer(_iv_bt)) > > It might be easier to read when we extract the `integercon()` call to a separate variable in a line above. Extracted as follows: jlong adjusted_stride_con = (stride_con > 0 ? max_signed_integer(_iv_bt) : min_signed_integer(_iv_bt)) - _structure.final_limit_correction(); Node* cmp_limit = CmpNode::make(_structure.limit(), igvn->integercon(adjusted_stride_con, _iv_bt), _iv_bt); > src/hotspot/share/opto/loopnode.hpp line 1338: > >> 1336: _back_control(back_control), >> 1337: _loop(loop), >> 1338: _phase(phase) {} > > Maybe also add an assert here that `back_control` is non-null. I disagree: `back_control` is not necessarily always non-null. In fact, `loop_exit_control()` could return null even if `head` and `loop` are non-null. This is also why the original code explicitly checks this as well. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2518474056 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2518468727 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2518481511 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2518485338 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2518462909 From chagedorn at openjdk.org Wed Nov 12 14:31:33 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 12 Nov 2025 14:31:33 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v26] In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 16:21:00 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). 
One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add missing comma from suggestion application It was difficult to grasp the implementation in its entirety. I focused more on the descriptions and how things are connected. I added some last comments but then it looks good from my side! Thanks for your patience and again: Great work! :-) test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 32: > 30: > 31: /** > 32: * The {@link CodeFrame} represents a frame (i.e. 
scope) of generated code by appending {@link Code} to the {@code 'codeList'} Suggestion: * The {@link CodeFrame} represents a frame (i.e. scope) of generated code by appending {@link Code} to the {@link #codeList} test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 45: > 43: * the execution jumps from the current (caller) {@link CodeFrame} scope to the scope of the > 44: * {@link Hook#anchor}. This ensures that the {@link Name}s of the anchor scope are accessed, > 45: * and not of the ones from the caller scope. Once the {@link Hook#insert}ion is complete, we Suggestion: * {@link Hook#anchor}. This ensures that the {@link Name}s of the anchor scope are accessed, * and not the ones from the caller scope. Once the {@link Hook#insert}ion is complete, we test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 60: > 58: * Below, we look at a few examples, and show the use of CodeFrames (c) and TemplateFrames (t). > 59: * > 60: * Example1: anchoring and insertion in the same Template I think this example is already powerful enough to get a feeling of what is going on - thanks for adding it! From my side, no more examples are required for now but feel free to add more examples as you see fit. If we stick with one example: Suggestion: * Below, we look at an example, and show the use of CodeFrames (c) and TemplateFrames (t). test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 101: > 99: * t1 c1 t2 c2b ... t3 c3 <-- TemplateFrame nesting ---t4 c4 > 100: * t1 c1 t2 c2b ... t3 c3 with hashtag t4 c4 > 101: * t1 c1 t2 c2b ... t3 c3 and setFuelCost t4 c4 Probably clear but for completness for explanation below: Suggestion: * t1 c1 t2 c2b ... t3 c3 with hashtag t4 c4 // t: Concerns Template Frame * t1 c1 t2 c2b ... t3 c3 and setFuelCost t4 c4 // c: Concerns Code Frame test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 111: > 109: * t1 c1 t2 c2b ... t3 c3 t4 c4 > 110: * t1 c1 t2 c2b ... 
t3 c3 t4 c4 addDataName(...) -> c: names escape to the caller scope because > 111: * t1 c1 t2 c2b ... t3 c3 t4 c4 insertion scope is transparent The suggestions below include: - Since you gave the scopes proper names in the example, let's stress this here. - Use numbers `c3` etc.? - For `let()`: hashtag "definition"? - Maybe add a small extra comment for `dataNames()` code frame nesting? Suggestion: * t1 c1 t2 c2b ... t3 c3 t4 c4 "use hashtag #x" -> t: hashtag queried in Insertion (t4) and Caller Scope (t3) * t1 c1 t2 c2b ... t3 c3 t4 c4 c: code added to Anchoring Scope (c2a) * t1 c1 t2 c2b ... t3 c3 t4 c4 * t1 c1 t2 c2b ... t3 c3 t4 c4 let("x", 42) -> t: hashtag definition escapes to Caller Scope (t3) because * t1 c1 t2 c2b ... t3 c3 t4 c4 Insertion Scope is transparent * t1 c1 t2 c2b ... t3 c3 t4 c4 * t1 c1 t2 c2b ... t3 c3 t4 c4 dataNames(...)...sample() -> c: sample from Insertion (c4) and Anchoring Scope (c2a) * t1 c1 t2 c2b ... t3 c3 t4 c4 (CodeFrame nesting: c2a -> c4) * t1 c1 t2 c2b ... t3 c3 t4 c4 addDataName(...) -> c: names escape to the Caller Scope (c3) because * t1 c1 t2 c2b ... t3 c3 t4 c4 Insertion Scope is transparent test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 119: > 117: > 118: // Wrap the FilteredSet as a Predicate. 
> 119: private static record DataNamePredicate(FilteredSet fs) implements NameSet.Predicate { IDE reports that `static` is redundant for inner records: Suggestion: private record DataNamePredicate(FilteredSet fs) implements NameSet.Predicate { test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 205: > 203: */ > 204: public Token sample(Function function) { > 205: return new NameSampleToken(predicate(), null, null, function); Suggestion: return new NameSampleToken<>(predicate(), null, null, function); test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 329: > 327: */ > 328: public Token toList(Function, ScopeToken> function) { > 329: return new NamesToListToken(predicate(), function); Suggestion: return new NamesToListToken<>(predicate(), function); test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 337: > 335: * > 336: * @param function The {@link Function} that is called to create the inner {@link ScopeToken}s > 337: * for each of the {@link DataName}s in the filtereds set. Suggestion: * for each of the {@link DataName}s in the filtered set. test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 343: > 341: */ > 342: public Token forEach(Function function) { > 343: return new NameForEachToken(predicate(), null, null, function); Suggestion: return new NameForEachToken<>(predicate(), null, null, function); test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 367: > 365: */ > 366: public Token forEach(String name, String type, Function function) { > 367: return new NameForEachToken(predicate(), name, type, function); Suggestion: return new NameForEachToken<>(predicate(), name, type, function); test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 287: > 285: > 286: // Tear down CodeFrame nesting. If no nesting happened, the code is already > 287: // in the currendCodeFrame. Suggestion: // in the currentCodeFrame. 
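The two recurring suggestions in the DataName.java comments above (dropping `static` on a nested record, and writing `<>` instead of a raw constructor call) both rest on standard Java language rules. A minimal sketch, assuming Java 16 or later; `RecordDiamondDemo`, `LengthAtLeast`, `SampleToken` and `make()` are hypothetical names and are not the Template Framework's own types.

```java
import java.util.function.Predicate;

public class RecordDiamondDemo {
    // A nested record is implicitly static, so writing 'static' here would be redundant.
    private record LengthAtLeast(int min) implements Predicate<String> {
        @Override
        public boolean test(String s) {
            return s.length() >= min;
        }
    }

    // Hypothetical generic token type, standing in for a generic class
    // that takes a predicate as a constructor argument.
    record SampleToken<T>(Predicate<T> predicate, String name) {}

    static SampleToken<String> make() {
        // LengthAtLeast can be instantiated from this static method without an
        // enclosing instance, which only compiles because the record is static.
        // The diamond '<>' lets the compiler infer T = String instead of
        // falling back to a raw type with an unchecked warning.
        return new SampleToken<>(new LengthAtLeast(3), "demo");
    }

    public static void main(String[] args) {
        SampleToken<String> token = make();
        System.out.println(token.name() + " accepts \"abcd\": " + token.predicate().test("abcd"));
        System.out.println(token.name() + " accepts \"ab\": " + token.predicate().test("ab"));
    }
}
```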
test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 346: > 344: currentCodeFrame.addName(name); > 345: } > 346: case ScopeToken st -> { For this and below, can you use full names (i.e. `scopeToken` etc.)? This makes it a little easier to read instead of the abbreviations. test/hotspot/jtreg/compiler/lib/template_framework/ScopeToken.java line 27: > 25: > 26: import java.util.List; > 27: Unused: Suggestion: test/hotspot/jtreg/compiler/lib/template_framework/ScopeTokenImpl.java line 32: > 30: * hashtag replacements and {@link Template#setFuelCost} are local, or escape to > 31: * outer scopes. > 32: * Suggestion: * *

test/hotspot/jtreg/compiler/lib/template_framework/ScopeTokenImpl.java line 34: > 32: * > 33: * Note: We want the {@link ScopeToken} to be public, but the internals of the > 34: * record should be private. One way too solve this is with a public interface Suggestion: * record should be private. One way to solve this is with a public interface test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 94: > 92: > 93: // Wrap the FilteredSet as a Predicate. > 94: private static record StructuralNamePredicate(FilteredSet fs) implements NameSet.Predicate { Suggestion: private record StructuralNamePredicate(FilteredSet fs) implements NameSet.Predicate { test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 178: > 176: */ > 177: public Token sample(Function function) { > 178: return new NameSampleToken(predicate(), null, null, function); Suggestion: return new NameSampleToken<>(predicate(), null, null, function); test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 248: > 246: */ > 247: public Token toList(Function, ScopeToken> function) { > 248: return new NamesToListToken(predicate(), function); Suggestion: return new NamesToListToken<>(predicate(), function); test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 262: > 260: */ > 261: public Token forEach(Function function) { > 262: return new NameForEachToken(predicate(), null, null, function); Suggestion: return new NameForEachToken<>(predicate(), null, null, function); test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 286: > 284: */ > 285: public Token forEach(String name, String type, Function function) { > 286: return new NameForEachToken(predicate(), name, type, function); Suggestion: return new NameForEachToken<>(predicate(), name, type, function); test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 289: > 287: } > 288: } > 289: } Feels like we have a lot of duplication in 
`DataName` and `StructuralName`. But I'm not sure if/how we could share it somehow. Anyhow, not something for today. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 774: > 772: * > 773: *

> 774: * In some cases, it can be helpful to have different {@link setFuelCost} within Suggestion: * In some cases, it can be helpful to have different {@link #setFuelCost} within test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 791: > 789: * // depth, and recursive template uses should not get > 790: * // as much fuel as in CODE1. > 791: * ) Suggestion: * ), test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 883: > 881: * "System.out.println(\"Use a and b as capture variables:\"" + a + " and " + b + ");\n" > 882: * )) > 883: * ))); Suggestion: * )); test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 55: > 53: * of the current {@link Template}. Inner scopes of a {@link Template} have access to > 54: * the outer scope hashtag replacements, and any hashtag replacement defined inside an > 55: * inner scope is local and disappears once we leave the scope. We could explicitly mention here that we do not mean inner scopes that are templates themselves? Suggestion: * of the current {@link Template}. Inner scopes of a {@link Template}, that are not * templates themselves, have access to the outer scope hashtag replacements, and any * hashtag replacement defined inside an inner scope is local and disappears once we * leave the scope. test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 60: > 58: * The {@link #parent} relationship provides a trace for the use chain of templates and > 59: * their inner scopes. The {@link #fuel} is reduced over this chain to give a heuristic > 60: * on how much time is spent on the code from the template corresponding to the frame, "time" sounds misleading. What about: [...] on how many times we already nested the template corresponding to the frame recursively [...]? 
test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 171: > 169: if (this.isTransparentForFuel) { > 170: this.parent.setFuelCost(fuelCost); > 171: } `this` can be omitted: Suggestion: if (isTransparentForFuel) { parent.setFuelCost(fuelCost); } ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27255#pullrequestreview-3453229086 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518192324 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518272771 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518264127 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518237849 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518227687 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518469550 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518471976 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518479356 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518482072 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518482740 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518485286 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518413436 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518427367 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518430513 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518446122 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518445484 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518488456 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518489768 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518494322 PR Review Comment: 
https://git.openjdk.org/jdk/pull/27255#discussion_r2518494855 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518495329 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518506571 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518150213 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518149435 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518151125 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518174448 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518186319 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518154504 From kxu at openjdk.org Wed Nov 12 14:31:44 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 12 Nov 2025 14:31:44 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v21] In-Reply-To: References: Message-ID: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). 
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: add missed minor changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24458/files - new: https://git.openjdk.org/jdk/pull/24458/files/de71e7c8..78ec0a7d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=19-20 Stats: 62 lines in 3 files changed: 26 ins; 27 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From kxu at openjdk.org Wed Nov 12 14:31:47 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 12 Nov 2025 14:31:47 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v17] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 13:43:55 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> mark LoopExitTest::is_valid_with_bt() const > > src/hotspot/share/opto/loopnode.hpp line 277: > >> 275: >> 276: // Match increment with optional truncation >> 277: class TruncatedIncrement { > > You could move this code down to the other loop structure classes. Moved to `CountedLoopConverter` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2518519238 From aseoane at openjdk.org Wed Nov 12 14:34:38 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 12 Nov 2025 14:34:38 GMT Subject: RFR: 8368961: Remove redundant checks in ciField.cpp [v2] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 05:12:34 GMT, Zihao Lin wrote: >> Remove redundant check in 'trust_final_non_static_fields' ciField.cpp >> >> Remove: >> 1. java_lang_System check >> 2. is_box_klass check >> 3. 
java_lang_String check > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > remove Looks good to me ------------- Marked as reviewed by aseoane (Author). PR Review: https://git.openjdk.org/jdk/pull/28191#pullrequestreview-3453764942 From roland at openjdk.org Wed Nov 12 14:41:42 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 12 Nov 2025 14:41:42 GMT Subject: RFR: 8280469: C2: CHA support for interface calls when inlining through method handle linker [v2] In-Reply-To: <2GnXbYUICH6o4udyZQEqlCL6-jz9-CzSnUrZmkSbP4E=.a1d35eb5-5a62-4aff-9544-e1e0716761db@github.com> References: <2GnXbYUICH6o4udyZQEqlCL6-jz9-CzSnUrZmkSbP4E=.a1d35eb5-5a62-4aff-9544-e1e0716761db@github.com> Message-ID: <8FMOO0ROYecei8GLKeTi8Y8o48-pIq9d3UmPcf_g1WQ=.8d57cc51-b5d9-4872-a73a-555e5bcbfb37@github.com> On Mon, 3 Nov 2025 18:38:13 GMT, Vladimir Ivanov wrote: >> Expand the optimization for interface calls introduced by [JDK-6986483](https://bugs.openjdk.org/browse/JDK-6986483) to calls through `MethodHandle.linkToInterface`. >> >> The implementation is straightforward except the fact that symbolic information is lost during `MemberName` resolution. The fix uses declaring class instead, but it's more conservative than what is done for invokeinterface case. >> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > naming Looks reasonable to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28094#pullrequestreview-3453800497 From dlunden at openjdk.org Wed Nov 12 14:44:47 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 12 Nov 2025 14:44:47 GMT Subject: RFR: 8341039: compiler/cha/TypeProfileFinalMethod.java fails with assertEquals expected: 0 but was: 2 [v2] In-Reply-To: References: Message-ID: On Mon, 10 Nov 2025 07:32:44 GMT, Damon Fenacci wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/cha/TypeProfileFinalMethod.java >> >> Co-authored-by: Roberto Castañeda Lozano > > Thanks for this "refactoring" @dlunde. LGTM (just 1 question) Thanks for the reviews @dafedafe and @robcasloz! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28200#issuecomment-3522296912 From dlunden at openjdk.org Wed Nov 12 14:48:23 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 12 Nov 2025 14:48:23 GMT Subject: Integrated: 8341039: compiler/cha/TypeProfileFinalMethod.java fails with assertEquals expected: 0 but was: 2 In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 16:25:59 GMT, Daniel Lundén wrote: > The test `compiler/cha/TypeProfileFinalMethod.java` exercises a specific compilation pattern and easily breaks by setting various VM flags (e.g., `-Xcomp`). > > ### Changeset > > - Make the test flagless. > - Ensure the test only compiles the intended methods. > - Fix problems with compiler directives used in the test (incorrect signatures and some directives getting unintentionally shadowed by other directives). > - Force C2 inlining of a method which the test author likely intended to always be inlined (based on source code comments in the test). > - Switch argument order in `assertEquals` to make error message correct. > > Note for reviewers: A more fundamental rewrite of the test is beyond the scope of this changeset. 
The objective here is simply to ensure the test runs only in contexts intended by the test author. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/18972906513) > - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Stress testing of the specific test on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. This pull request has now been integrated. Changeset: 56a27d11 Author: Daniel Lundén URL: https://git.openjdk.org/jdk/commit/56a27d11971d935e8b28ac9d701cf9890014a949 Stats: 21 lines in 2 files changed: 3 ins; 4 del; 14 mod 8341039: compiler/cha/TypeProfileFinalMethod.java fails with assertEquals expected: 0 but was: 2 Reviewed-by: rcastanedalo, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/28200 From epeter at openjdk.org Wed Nov 12 15:15:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Nov 2025 15:15:59 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v27] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. 
> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... 
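
The ordering problem described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the Template Framework API: `State`, `Token`, `immediateQuery` and `deferredQuery` are hypothetical names introduced only for this illustration. It shows how a query that runs during lambda execution misses a name whose addition is deferred to token evaluation, and how turning the query into a token as well restores the expected order (the direction the PR title suggests).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model only: NOT the real Template Framework. All names are hypothetical.
final class TokenOrderSketch {
    /** Render-time state: the names that are currently available. */
    static final class State {
        final List<String> names = new ArrayList<>();
    }

    /** A token is a deferred action, run in list order during "token evaluation". */
    interface Token extends Consumer<State> {}

    /** Samples the first available name, or a marker if none has been added yet. */
    static String sample(State state) {
        return state.names.isEmpty() ? "<none>" : state.names.get(0);
    }

    /** Mixed model: adding a name is deferred, but the query answers immediately. */
    static String immediateQuery() {
        State state = new State();
        List<Token> tokens = new ArrayList<>();
        tokens.add(s -> s.names.add("x"));   // deferred side effect (token)
        String sampled = sample(state);      // immediate query: "floats" above the token
        for (Token token : tokens) {
            token.accept(state);             // the name is only added here
        }
        return sampled;
    }

    /** All-in on tokens: the query is deferred too, so it runs after the add. */
    static String deferredQuery() {
        State state = new State();
        List<Token> tokens = new ArrayList<>();
        String[] sampled = new String[1];
        tokens.add(s -> s.names.add("x"));        // deferred side effect (token)
        tokens.add(s -> sampled[0] = sample(s));  // deferred query (token)
        for (Token token : tokens) {
            token.accept(state);
        }
        return sampled[0];
    }

    public static void main(String[] args) {
        System.out.println("immediate query sees: " + immediateQuery());
        System.out.println("deferred query sees:  " + deferredQuery());
    }
}
```

In this model the immediate query cannot see the name because the list of tokens has not been evaluated yet, which mirrors the `addDataName` / `sample` confusion in the example above.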
Emanuel Peter has updated the pull request incrementally with three additional commits since the last revision: - Update test/hotspot/jtreg/compiler/lib/template_framework/ScopeToken.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/lib/template_framework/Template.java Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/dfc25f59..f8a2ef29 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=25-26 Stats: 39 lines in 8 files changed: 4 ins; 7 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Wed Nov 12 15:22:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Nov 2025 15:22:32 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v26] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 12:46:18 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add missing comma from suggestion application > > test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 55: > >> 53: * of the current {@link Template}. Inner scopes of a {@link Template} have access to >> 54: * the outer scope hashtag replacements, and any hashtag replacement defined inside an >> 55: * inner scope is local and disappears once we leave the scope. > > We could explicitly mention here that we do not mean inner scopes that are templates themselves? > Suggestion: > > * of the current {@link Template}. 
Inner scopes of a {@link Template}, that are not > * templates themselves, have access to the outer scope hashtag replacements, and any > * hashtag replacement defined inside an inner scope is local and disappears once we > * leave the scope. I'm not sure this is better: `Inner scopes of a {@link Template}, that are not templates themselves,` Because they are only scopes of this template if they are not scopes of another template? I now wrote this, and hope it is a bit more helpful: 52 * hashtag replacements of the outer {@link TemplateFrame}s, up to the outermost ~ 53 * of the current {@link Template}. If a hashtag replacement is added in a scope, ~ 54 * we have to traverse to outer scopes until we find one that is not transparent ~ 55 * for hashtags (at most it is the frame of the Template), and insert it there. + 56 * The hashtag replacement is local to that frame, and accessible for any frames nested + 57 * inside it, but not inside other Templates. The hashtag replacement disappears once + 58 * the corresponding scope is exited, i.e. the frame removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518731271 From epeter at openjdk.org Wed Nov 12 15:30:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Nov 2025 15:30:07 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v26] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 12:49:52 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add missing comma from suggestion application > > test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 60: > >> 58: * The {@link #parent} relationship provides a trace for the use chain of templates and >> 59: * their inner scopes. 
The {@link #fuel} is reduced over this chain to give a heuristic >> 60: * on how much time is spent on the code from the template corresponding to the frame, > > "time" sounds misleading. What about: [...] on how many times we already nested the template corresponding to the frame recursively [...]? Hmm, well the idea is really to limit runtime once we run the generated code... But that correlates to the nesting depth and the iteration count of loops etc. I now wrote this: 61 * The {@link #parent} relationship provides a trace for the use chain of templates and 62 * their inner scopes. The {@link #fuel} is reduced over this chain to give a heuristic ~ 63 * on how deeply nested the code is at a given point, correlating to the runtime that ~ 64 * would be spent if the code was executed. The idea is that once the fuel is depleted, + 65 * we do not want to nest more deeply, so that there is a reasonable chance that the + 66 * execution of the generated code can terminate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518758484 From epeter at openjdk.org Wed Nov 12 15:30:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Nov 2025 15:30:03 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v28] In-Reply-To: References: Message-ID: <3w4DZJaIcvT8PjDDGmcsUfNFuRGdepQd3Sc6HA4dL-M=.38761604-094b-43ca-b111-1cb4e70e6538@github.com> > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. 
> > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. 
But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: better documentation, inspired by Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/f8a2ef29..e428272d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=26-27 Stats: 10 lines in 1 file changed: 5 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Wed Nov 12 15:42:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Nov 2025 15:42:55 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. 
> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: inflate abreviations to full names ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/e428272d..32488900 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=27-28 Stats: 37 lines in 1 file changed: 0 ins; 0 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Wed Nov 12 15:42:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Nov 2025 15:42:59 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v26] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 12:35:01 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add missing comma from suggestion application > > Thanks for the update Emanuel! These look good. I will now have a look at the rest of your code ? @chhagedorn Thanks for spending all the time on this, and the many great suggestions :) I applied most of them directly, and modified some of the documentation based on your suggestion, but trying to make things even more precise. Do you want to have another quick look here, or should we ask @robcasloz to have a look next? > test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 346: > >> 344: currentCodeFrame.addName(name); >> 345: } >> 346: case ScopeToken st -> { > > For this and below, can you use full names (i.e. `scopeToken` etc.)? This makes it a little easier to read instead of the abbreviations. 
Ok, enabled full verbosity mode ;) > test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 289: > >> 287: } >> 288: } >> 289: } > > Feels like we have a lot of duplication in `DataName` and `StructuralName`. But I'm not sure if/how we could share it somehow. Anyhow, not something for today. Maybe it could be somehow smartly collapsed. Maybe using some smart way with Generics. But I could not find a good solution yet. Right, it could be tried in a later RFE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3522585069 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518796764 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2518805319 From dlunden at openjdk.org Wed Nov 12 15:59:56 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 12 Nov 2025 15:59:56 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v10] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 04:00:46 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. 
This patch biases the allocation of an NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union-find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 12% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Minor cleanup > Hi @dlunde , improvements are gauged by inspecting the JIT code size. Every NDD instruction expects a 4-byte extended EVEX prefix. By demoting it to a REX/REX2 prefix, we save 2-3 bytes per instruction. For example, consider the following micro kernel: with this patch, almost every NDD instruction gets the benefit of register biasing, and thus the assembler layer demotes these to REX/REX2-prefixed instructions. I am convinced your patch provides improvements in many cases. What I'm worried about is regressions. Do I understand you correctly: the patch provides, in theory, strict code size improvements without any other disadvantages? My performance testing (DaCapo 23, SPECjbb 2005, and SPECjvm 2008) indicates no regressions, so that's good. 
> I have shared the details on validation configuration above Ah, sorry, I missed that. Looks reasonable. Great that you moved much of the logic to the AD files. Looks much cleaner. Finally, it looks like you only partially applied my suggested changes: https://github.com/dlunde/jdk/commit/d2b511804c757c89c5662028ea9e4a9dff43b641. Please consider also applying the rest (or let me know if you disagree with them). I'll rerun testing for sanity when you have applied the final changes! src/hotspot/cpu/x86/x86.ad line 2646: > 2644: > 2645: // Returns true for MachNode corresponding to Intel APX NDD selection patterns which > 2646: // can be demoted to REX/REX2 encodings, for commutative operations with register Suggestion: // can be demoted to REX/REX2 encodings. For commutative operations with register ------------- Changes requested by dlunden (Committer). PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3454188127 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2518869268 From rcastanedalo at openjdk.org Wed Nov 12 16:04:49 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 12 Nov 2025 16:04:49 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v2] In-Reply-To: <_UFCSDlxzgDa8H-hCh6lze3WPepXNLK-g0dHZl4RU4U=.ea21921d-6e95-4564-910b-be148185c095@github.com> References: <_UFCSDlxzgDa8H-hCh6lze3WPepXNLK-g0dHZl4RU4U=.ea21921d-6e95-4564-910b-be148185c095@github.com> Message-ID: On Wed, 12 Nov 2025 14:17:45 GMT, Anton Seoane Ampudia wrote: >> This PR introduces new IGV dumps, property fields and filters related to escape analysis information. >> >> The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. 
Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. >> >> With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: >> >> - Node escape "level". >> - Scalar replaceability. >> - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). >> >> This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. >> >> Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. >> >> **Testing:** passes tiers 1-3, manual testing in IGV > > Anton Seoane Ampudia has updated the pull request incrementally with eight additional commits since the last revision: > > - Review comments: fix coloring > - Review comments: general changes > - Review comments: minor IGV changes > - Review comment: update filter comment > - Review comments: restore removed line > - Review comments: small changes in idealGraphPrinter.cpp > - Review comments: add node to dumps, split phase, general readjustments > - Review comments: explicit null check Thank you for addressing my suggestions, looks good now, modulo a couple of minor nits! 
src/hotspot/share/opto/escape.cpp line 1335: > 1333: if (use->is_AddP()) { > 1334: reduce_phi_on_field_access(use, alloc_worklist); > 1335: _compile->print_method(PHASE_EA_AFTER_PHI_ADDPP_REDUCTION, 6, use); Suggestion: _compile->print_method(PHASE_EA_AFTER_PHI_ADDP_REDUCTION, 6, use); src/hotspot/share/opto/escape.cpp line 1340: > 1338: _compile->print_method(PHASE_EA_AFTER_PHI_CMP_REDUCTION, 6, use); > 1339: } > 1340: Spurious new line? src/hotspot/share/opto/phasetype.hpp line 68: > 66: flags(EA_BEFORE_PHI_REDUCTION, "EA: 5. Before Phi Reduction") \ > 67: flags(EA_AFTER_PHI_CASTPP_REDUCTION, "EA: 5. Phi -> CastPP Reduction") \ > 68: flags(EA_AFTER_PHI_ADDPP_REDUCTION, "EA: 5. Phi -> AddPP Reduction") \ Suggestion: flags(EA_AFTER_PHI_ADDP_REDUCTION, "EA: 5. Phi -> AddP Reduction") \ src/utils/IdealGraphVisualizer/ServerCompiler/src/main/resources/com/sun/hotspot/igv/servercompiler/filters/showConnectionGraphNodesOnly.filter line 2: > 1: // This filter shows only the nodes that are present in the escape analysis > 2: // connection graph. This can be used to visualize the connection graph inside Suggestion: // connection graph. This can be used to approximate the connection graph inside test/hotspot/jtreg/compiler/lib/ir_framework/CompilePhase.java line 78: > 76: EA_BEFORE_PHI_REDUCTION( "EA: 5. Before Phi Reduction"), > 77: EA_AFTER_PHI_CASTPP_REDUCTION( "EA: 5. Phi -> CastPP Reduction"), > 78: EA_AFTER_PHI_ADDPP_REDUCTION( "EA: 5. Phi -> AddPP Reduction"), Suggestion: EA_AFTER_PHI_ADDP_REDUCTION( "EA: 5. Phi -> AddP Reduction"), ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28060#pullrequestreview-3454178920 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2518886192 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2518884324 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2518862552 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2518873582 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2518864675 From dskantz at openjdk.org Wed Nov 12 16:06:24 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Wed, 12 Nov 2025 16:06:24 GMT Subject: RFR: 8371628: C2: add a test case for the arraycopy changes in JDK-8297933 Message-ID: This PR adds a test for the arraycopy bug that caused a crash in `ArrayCopyNode::prepare_array_copy` and was fixed in JDK-8297933. A crash with this signature was previously reported on `compiler/c1/TestArrayCopy.java` but this test does not reproduce the issue (at least not reliably). Testing: T1-3. Extra testing: the added test reliably fails if the arraycopy changes from JDK-8297933 are backed out. ------------- Commit messages: - bugid - CR; annotate test. - WIP add test Changes: https://git.openjdk.org/jdk/pull/28269/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28269&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371628 Stats: 14 lines in 1 file changed: 13 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28269.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28269/head:pull/28269 PR: https://git.openjdk.org/jdk/pull/28269 From aseoane at openjdk.org Wed Nov 12 16:23:27 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 12 Nov 2025 16:23:27 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v3] In-Reply-To: References: Message-ID: > This PR introduces new IGV dumps, property fields and filters related to escape analysis information. 
> > The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. > > With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: > > - Node escape "level". > - Scalar replaceability. > - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). > > This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. > > Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. 
> > **Testing:** passes tiers 1-3, manual testing in IGV Anton Seoane Ampudia has updated the pull request incrementally with three additional commits since the last revision: - Review comments: remove spurious new line - Review comments: update header Co-authored-by: Roberto Castañeda Lozano - Review comments: AddP misspell Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28060/files - new: https://git.openjdk.org/jdk/pull/28060/files/b7867f12..cc56fb7e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28060&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28060&range=01-02 Stats: 5 lines in 4 files changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28060.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28060/head:pull/28060 PR: https://git.openjdk.org/jdk/pull/28060 From aseoane at openjdk.org Wed Nov 12 16:25:29 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 12 Nov 2025 16:25:29 GMT Subject: RFR: 8369002: Extract the loop->is_member(get_loop(get_ctrl(node))) pattern in a new function [v2] In-Reply-To: References: Message-ID: > This PR adds a "shorthand" for the common `loop->is_member(get_loop(get_ctrl(node)))` pattern in loop optimizations. > > In PhaseIdealLoop, there is already an `is_member` function that checks if a node is a (nested) member of an IdealLoopTree. In a similar fashion, this changeset adds a `ctrl_is_member` that aims to simplify the common pattern of: > > Node* node_ctrl = get_ctrl(node); > if (loop->is_member(get_loop(node_ctrl))) { ... } > > > This hopes to provide a bit more readability and code conciseness in such a common operation. 
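
The shape of the `ctrl_is_member` shorthand can be sketched with a small model. The real helper is C++ inside HotSpot's `PhaseIdealLoop`; the Java below is only a hedged illustration, and `Loop`, `ctrlOf`, `loopOf`, `getCtrl`, `getLoop` and `ctrlIsMember` are hypothetical stand-ins rather than HotSpot code. Nodes are modelled as plain strings, and loop membership is assumed to include nested loops, as the description above states for `is_member`.

```java
import java.util.HashMap;
import java.util.Map;

// Toy Java model of the C2 pattern; the actual code is C++ and differs in detail.
final class CtrlIsMemberSketch {
    /** A loop in a loop tree; parent is null for an outermost loop. */
    static final class Loop {
        final Loop parent;

        Loop(Loop parent) {
            this.parent = parent;
        }

        /** True if other is this loop or is nested (at any depth) inside it. */
        boolean isMember(Loop other) {
            for (Loop current = other; current != null; current = current.parent) {
                if (current == this) {
                    return true;
                }
            }
            return false;
        }
    }

    final Map<String, String> ctrlOf = new HashMap<>(); // node -> its control node
    final Map<String, Loop> loopOf = new HashMap<>();   // control node -> innermost loop

    String getCtrl(String node) {
        return ctrlOf.get(node);
    }

    Loop getLoop(String ctrl) {
        return loopOf.get(ctrl);
    }

    /** The shorthand: one call instead of loop.isMember(getLoop(getCtrl(node))). */
    boolean ctrlIsMember(Loop loop, String node) {
        return loop.isMember(getLoop(getCtrl(node)));
    }

    public static void main(String[] args) {
        CtrlIsMemberSketch graph = new CtrlIsMemberSketch();
        Loop outer = new Loop(null);
        Loop inner = new Loop(outer);
        graph.ctrlOf.put("n", "c");
        graph.loopOf.put("c", inner);
        // "n" is controlled from the inner loop, which is nested in the outer loop.
        System.out.println(graph.ctrlIsMember(outer, "n"));
    }
}
```

The design point is only that call sites name the question they ask (is the control of this node in the loop?) instead of spelling out the three-step lookup each time.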
> > **Testing:** passes tiers 1-3 Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: Review comments: update types ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28259/files - new: https://git.openjdk.org/jdk/pull/28259/files/bfc41929..8005ac9b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28259&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28259&range=00-01 Stats: 5 lines in 1 file changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28259.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28259/head:pull/28259 PR: https://git.openjdk.org/jdk/pull/28259 From aseoane at openjdk.org Wed Nov 12 16:25:30 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 12 Nov 2025 16:25:30 GMT Subject: RFR: 8369002: Extract the loop->is_member(get_loop(get_ctrl(node))) pattern in a new function [v2] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 09:37:01 GMT, Benoît Maillard wrote: >> Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments: update types > > Looks good to me, thanks for making the change @anton-seoane! I would just change the return type, see my comments. Thanks for your comments @benoitmaillard @robcasloz. I have now updated the code accordingly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28259#issuecomment-3522770272 From haosun at openjdk.org Wed Nov 12 16:26:46 2025 From: haosun at openjdk.org (Hao Sun) Date: Wed, 12 Nov 2025 16:26:46 GMT Subject: RFR: 8293484: AArch64: TestUseSHA512IntrinsicsOptionOnSupportedCPU.java fails on CPU with SHA512 feature support In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 07:16:18 GMT, Ramkumar Sunderbabu wrote: >> Hi, I suppose the failure may occur if we run this test case on CPU **with** SHA512 feature, but **disabling** SHA512Intrinsics. 
>> >> As **@requires vm.flagless** is set in this jtreg case, if we specify `-XX:-UseSHA512Intrinsics`, this test case is not tested actually. Here shows the log in my machine. >> >> >> $ make test TEST=test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java JTREG="VM_OPTIONS=-XX:-UseSHA512Intrinsics" >> Building target 'test' in configuration '/tmp/local-build-fastdebug' >> Running tests using JTREG control variable 'VM_OPTIONS=-XX:-UseSHA512Intrinsics' >> Test selection 'test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java', will run: >> * jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java >> Clean up dirs for jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java >> >> Running test 'jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java' >> Test results: no tests selected >> Report written to /tmp/local-build-fastdebug/test-results/jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java/html/report.html >> Results written to /tmp/local-build-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java >> Finished running test 'jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java' >> Test report is stored in /tmp/local-build-fastdebug/test-results/jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR SKIP >> jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java >> 0 0 0 0 0 >> ============================== >> TEST SUCCESS >> >> >> If so, I don't think it's a bug. 
>> Is there anything I misunderstood? > > @shqking -XX:-UseSHA512Intrinsics is not the only case of disabling SHA512 intrinsics. Please have a look at your comment https://bugs.openjdk.org/browse/JDK-8293484?focusedId=14532743&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14532743 > Intrinsics were disabled due to lack of test hardware. Please refer to @snazarkin's comment too. https://bugs.openjdk.org/browse/JDK-8293484?focusedId=14522406&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14522406 > > I have come across instances where the support was temporarily disabled due to performance or reliability issues. > Relying on the CPU feature is not a bug per se. But it makes test code maintenance a tad bit difficult. > I have discussed this in detail in the PR description. Hi @rsunderbabu, thanks for your helpful information. Another question: can we remove the requirement to run the test case `TestUseSHA512IntrinsicsOptionOnSupportedCPU.java`? I.e., can we remove the following `requires` statement? * @requires os.arch!="x86" & os.arch!="i386" ------------- PR Comment: https://git.openjdk.org/jdk/pull/28053#issuecomment-3522781500 From rcastanedalo at openjdk.org Wed Nov 12 17:00:55 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 12 Nov 2025 17:00:55 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v3] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 16:23:27 GMT, Anton Seoane Ampudia wrote: >> This PR introduces new IGV dumps, property fields and filters related to escape analysis information. >> >> The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis.
Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. >> >> With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: >> >> - Node escape "level". >> - Scalar replaceability. >> - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). >> >> This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. >> >> Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. >> >> **Testing:** passes tiers 1-3, manual testing in IGV > > Anton Seoane Ampudia has updated the pull request incrementally with three additional commits since the last revision: > > - Review comments: remove spurious new line > - Review comments: update header > > Co-authored-by: Roberto Castañeda Lozano > - Review comments: AddP misspell > > Co-authored-by: Roberto Castañeda Lozano Marked as reviewed by rcastanedalo (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/28060#pullrequestreview-3454503670 From rcastanedalo at openjdk.org Wed Nov 12 17:06:25 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 12 Nov 2025 17:06:25 GMT Subject: RFR: 8369002: Extract the loop->is_member(get_loop(get_ctrl(node))) pattern in a new function [v2] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 16:25:29 GMT, Anton Seoane Ampudia wrote: >> This PR adds a "shorthand" for the common `loop->is_member(get_loop(get_ctrl(node)))` pattern in loop optimizations. >> >> In PhaseIdealLoop, there is already an `is_member` function that checks if a node is a (nested) member of an IdealLoopTree. In a similar fashion, this changeset adds a `ctrl_is_member` that aims to simplify the common pattern of: >> >> Node* node_ctrl = get_ctrl(node); >> if (loop->is_member(get_loop(node_ctrl))) { ... } >> >> >> The aim is to make such a common operation a bit more readable and concise. >> >> **Testing:** passes tiers 1-3 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Review comments: update types Marked as reviewed by rcastanedalo (Reviewer). src/hotspot/share/opto/loopnode.hpp line 1379: > 1377: > 1378: // Return a post-walked LoopNode > 1379: IdealLoopTree *get_loop(const Node* n) const { Nit suggestion (where is my "Add suggestion" button?): `IdealLoopTree *get_loop` -> `IdealLoopTree* get_loop`.
------------- PR Review: https://git.openjdk.org/jdk/pull/28259#pullrequestreview-3454530407 PR Review Comment: https://git.openjdk.org/jdk/pull/28259#discussion_r2519114244 From rcastanedalo at openjdk.org Wed Nov 12 17:19:52 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 12 Nov 2025 17:19:52 GMT Subject: RFR: 8371628: C2: add a test case for the arraycopy changes in JDK-8297933 In-Reply-To: References: Message-ID: <0qAdsesPmpa-mFfsR0WOv2bmLJk9NWjLeTdbd-NvhkA=.38e3a5e7-ecfd-41d1-ac1f-023d9942a451@github.com> On Wed, 12 Nov 2025 15:58:00 GMT, Daniel Skantz wrote: > This PR adds a test for the arraycopy bug that caused a crash in `ArrayCopyNode::prepare_array_copy` and was fixed in JDK-8297933. A crash with this signature was previously reported on `compiler/c1/TestArrayCopy.java` but this test does not reproduce the issue (at least not reliably). > > Testing: T1-3. Extra testing: the added test reliably fails if the arraycopy changes from JDK-8297933 are backed out. Looks good! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28269#pullrequestreview-3454599468 From jbhateja at openjdk.org Wed Nov 12 18:04:42 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 12 Nov 2025 18:04:42 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v11] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. 
> > With the Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range extends beyond the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of an NDD definition to the first source operand, or to the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 12% reduction in code size footprint. > > **Micro:** [image not included in the archive] > > **Baseline:** [image not included in the archive] > > **With opt:** [image not included in the archive] > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback.
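The difference between first-fit colour selection and a biased selection can be illustrated with a small model. This is only a sketch under simplifying assumptions, not the C2 allocator: registers are plain integers, `taken` holds the colours already assigned to interfering neighbours, and `firstFit`, `biased` and `reductionPercent` are hypothetical helper names. The last helper redoes the code size arithmetic for the quoted micro: taking the 136 and 120 byte figures at face value, the saving is about 11.8% of the baseline (or about 13.3% when measured against the optimized size).

```java
import java.util.BitSet;

// Toy model only: hypothetical names, not the C2 register allocator.
final class BiasedColourSketch {
    /** First-fit: lowest register number not used by an interfering neighbour, or -1. */
    static int firstFit(BitSet taken, int numRegs) {
        int r = taken.nextClearBit(0);
        return r < numRegs ? r : -1; // -1 models "no colour available"
    }

    /** Biased: try the hinted registers in order, then fall back to first-fit. */
    static int biased(BitSet taken, int numRegs, int... hints) {
        for (int h : hints) {
            if (h >= 0 && h < numRegs && !taken.get(h)) {
                return h; // the hint does not clash with any neighbour, so honour it
            }
        }
        return firstFit(taken, numRegs); // a hint is never a hard constraint
    }

    /** Relative code size saving, in percent of the 'before' size. */
    static double reductionPercent(int before, int after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        BitSet taken = new BitSet();
        taken.set(0);
        taken.set(1);      // r0 and r1 are held by interfering live ranges
        int firstSrc = 5;  // the first source operand already lives in r5
        System.out.println(firstFit(taken, 16));         // 2: destination differs from the source
        System.out.println(biased(taken, 16, firstSrc)); // 5: destination == first source
        System.out.println(reductionPercent(136, 120));
    }
}
```

When the biased pick lands on a source register, the instruction has the destructive two-address shape again, which is the situation in which the assembler can choose the shorter encoding. When no hint is usable, the result is the same as without biasing.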
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26283/files - new: https://git.openjdk.org/jdk/pull/26283/files/6c359e87..ef51f875 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=09-10 Stats: 7 lines in 2 files changed: 1 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From jbhateja at openjdk.org Wed Nov 12 18:11:38 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 12 Nov 2025 18:11:38 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v10] In-Reply-To: References: Message-ID: <3b4xZu4mtKibXeCT6tF13txhub_NSoxjsKBkBPXI14o=.71a1c084-4ba9-4ad4-8930-4197b63b2634@github.com> On Wed, 12 Nov 2025 15:57:25 GMT, Daniel Lundén wrote: > I am convinced your patch provides improvements in many cases. What I'm worried about is regressions. Do I understand you correctly: the patch provides, in theory, strict code size improvements without any other disadvantages? My performance testing (DaCapo 23, SPECjbb 2005, and SPECjvm 2008) indicates no regressions, so that's good. > Yes, the patch biases the def operands towards non-interfering first or second operand LRGs, and should facilitate NDD demotion. Glad to know that no side effects were seen with the benchmarks! Suggestions incorporated.
Best Regards ------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3523246859 From shade at openjdk.org Wed Nov 12 18:25:32 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Nov 2025 18:25:32 GMT Subject: RFR: 8371628: C2: add a test case for the arraycopy changes in JDK-8297933 In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 15:58:00 GMT, Daniel Skantz wrote: > This PR adds a test for the arraycopy bug that caused a crash in `ArrayCopyNode::prepare_array_copy` and was fixed in JDK-8297933. A crash with this signature was previously reported on `compiler/c1/TestArrayCopy.java` but this test does not reproduce the issue (at least not reliably). > > Testing: T1-3. Extra testing: the added test reliably fails if the arraycopy changes from JDK-8297933 are backed out. Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28269#pullrequestreview-3454859964 From psandoz at openjdk.org Wed Nov 12 19:51:18 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 12 Nov 2025 19:51:18 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 07:59:38 GMT, Jatin Bhateja wrote: > > > Some quick comments. > > > We should be consistent in the naming, and rename `Halfloat*` to `Float16*`. > > > > > > I concur, especially since there are multiple 16-bit floating-point formats in use including the IEEE 754 float16 as well as bfloat16. > > There are nomenclature issues that I am facing. Currently, all the Float16 concrete classes use the Halffloat prefix i.e., Halffloat64Vector, Halffloat128Vector; converting these to Float16 looks a little confusing, i.e., Float1664Vector, Float16128Vector, etc Kindly suggest a better name to represent these classes. Maybe we move the shape to the end e.g., `Float16Vector128`, `IntVector128`, `IntVectorMax`? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3523631727 From psandoz at openjdk.org Wed Nov 12 20:13:49 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 12 Nov 2025 20:13:49 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 19:48:54 GMT, Paul Sandoz wrote: >>> > Some quick comments. >>> > We should be consistent in the naming, and rename `Halfloat*` to `Float16*`. >>> >>> I concur, especially since there are multiple 16-bit floating-point formats in use including the IEEE 754 float16 as well as bfloat16. >> >> There are nomenclature issues that I am facing. Currently, all the Float16 concrete classes use the Halffloat prefix i.e., Halffloat64Vector, Halffloat128Vector; converting these to Float16 looks a little confusing, i.e., Float1664Vector, Float16128Vector, etc Kindly suggest a better name to represent these classes. > >> > > Some quick comments. >> > > We should be consistent in the naming, and rename `Halfloat*` to `Float16*`. >> > >> > >> > I concur, especially since there are multiple 16-bit floating-point formats in use including the IEEE 754 float16 as well as bfloat16. >> >> There are nomenclature issues that I am facing. Currently, all the Float16 concrete classes use the Halffloat prefix i.e., Halffloat64Vector, Halffloat128Vector; converting these to Float16 looks a little confusing, i.e., Float1664Vector, Float16128Vector, etc Kindly suggest a better name to represent these classes. > > Maybe we move the shape to the end e.g., `Float16Vector128`, `IntVector128`, `IntVectorMax`? > Hi @PaulSandoz , Thanks for your comments. Please find below my responses. 
> > > When you generate the fallback code for unary/binary etc can you push the carrier type and conversions into the uOp/bOp implementations so you don't have to explicitly operate on the carrier type and do the conversions as you do now e.g.,: > > ``` > > v0.uOp(m, (i, a) -> float16ToShortBits(Float16.valueOf(-(shortBitsToFloat16(($type$)a).floatValue())))); > > ``` > > Currently, uOp and uOpTemplates are part of the scaffolding logic and are sacrosanct; they are shared by various abstracted vector classes, and their semantics are defined by the lambda expression. I agree that explicit conversion in lambdas looks verbose, but moving them to uOpTemplate may fracture the lambda expression such that part of its semantics, i.e., conversions, will seep into uOpTemplate, while what will appear at the surface will be the expression operating over primitive float values; this may become very confusing. Since the uOpTemplate etc are per element vector type it seems straightforward to adjust the template to perform the conversion before and after the function application, or add a default method to FUnOp etc that operates on the carrier value and performs the conversions and the template calls that default method. Later we will eventually be able to declare Float16![] and it should all collapse away. > > > Requiring two arguments means they can get out of sync. Previously the class provided all the information needed, now > > arguably the type does. > > Yes, from the compiler standpoint all we care about is the carrier type, which determines the vector lane size. This is augmented with operation kind (PRIM / FP16) to differentiate a short vector lane from a float16 vector lane. Apart from this, we need to pass the VectorBox type to wrap the vector IR. The basic type codes are declared and shared across Java and HotSpot - they are used in `LaneType`. Can we pass a single argument that is the basic type instead of two arguments?
HotSpot should know from the basic type what the carrier class is and also what the operation type is, without being explicitly told, since presumably it knew the inverse - the basic type from the element class. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3523722566 From psandoz at openjdk.org Wed Nov 12 20:25:28 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 12 Nov 2025 20:25:28 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v6] In-Reply-To: References: Message-ID: <5BMjEAdM54Ms7L52lrFTnKnOIjnw2Q_e5sTW0VoQlYI=.cfdf9294-6f23-4279-b1ea-deeb4c9d7e35@github.com> On Wed, 12 Nov 2025 01:47:31 GMT, Xiaohong Gong wrote: > Yes, converting mask to vector will be the way to resolve. Do you think it's better that defining a private VectorMask function for the slice operation? The function could be implemented with corresponding vector slice APIs. Although this function is not friendly to SVE performance, it wins on unifying the implementation. If it helps just add a utility method that does the slice/rearrange mask<->vector conversion, but given your use case I expect it only to be used in one location, so perhaps keep it close to there. It may be that you don't need full slice functionality, since you only care about a part of the mask elements that was rearranged to the start of the vector and therefore don't need to zero out the remaining parts that are not relevant. (The same happens for conversion by parts.) Since we don't yet have any slice intrinsic I think that would be OK and we could revisit later. Ideally we should be able to optimize rearrange of vectors using constant shuffles with recognizable patterns.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3523758766 From xgong at openjdk.org Thu Nov 13 01:36:14 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 13 Nov 2025 01:36:14 GMT Subject: Integrated: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE In-Reply-To: References: Message-ID: On Thu, 25 Sep 2025 03:08:47 GMT, Xiaohong Gong wrote: > The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support native predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures. > > For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen. > > These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures. > > This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations. > > It also modifies the Vector API jtreg tests for better test coverage. Here are the details: > > 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are actually tested. These two APIs were not well tested before.
Because in the original test, the C2 IRs for `fromLong` and `toLong` are optimized out completely by compiler due to following IR identity: > > VectorMaskToLong (VectorLongToMask l) => l > > Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2. > > 2) Refine existing IR tests to verify the expected IR patterns after this patch. Also changed to use the exact required cpu feature on AArch64 for these ops. `fromLong` requires "svebitperm" instead of "sve2". > > Performance shows significant improvement on NVIDIA's Grace CPU. > > Here is the performance data with `-XX:UseSVE=2`: > > Benchmark bits inputs Mode Unit Before After Gain > MaskQueryOperationsBenchmark.testToLongByte 128 1 thrpt ops/ms 322151.976 1318576.736 4.09 > MaskQueryOperationsBenchmark.testToLongByte 128 2 thrpt ops/ms 322187.144 1315736.931 4.08 > MaskQueryOperationsBenchmark.testToLongByte 128 3 thrpt ops/ms 322213.330 1353272.882 4.19 > MaskQueryOperationsBenchmark.testToLongInt 128 1 thrpt ops/ms 1009426.292 1339834.833 1.32 > MaskQueryOperationsBenchmark.testToLongInt 128 2 thrpt ops/ms 101031... This pull request has now been integrated. Changeset: 676e6fd8 Author: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/676e6fd8d5152f4e0d14ae59ddd7aa0a7127ea58 Stats: 543 lines in 16 files changed: 294 ins; 80 del; 169 mod 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE Reviewed-by: epeter, psandoz, haosun, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/27481 From vlivanov at openjdk.org Thu Nov 13 01:53:29 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 13 Nov 2025 01:53:29 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v25] In-Reply-To: References: Message-ID: > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. 
So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." 
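As background for the description above, this is roughly how `Reference.reachabilityFence()` tends to be used on the Java side. It is a minimal sketch under stated assumptions: `FencedResource` and `State` are hypothetical names, and the int array merely stands in for a native resource that a `Cleaner` would release once the owning object becomes unreachable.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical example class, not taken from the JDK or from the PR.
final class FencedResource {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Cleanup state; it must not hold a reference back to the owner. */
    private static final class State implements Runnable {
        final AtomicBoolean released = new AtomicBoolean();
        final int[] buffer = {1, 2, 3, 4}; // stands in for a native resource

        @Override
        public void run() {
            released.set(true); // a real implementation would free the resource here
        }
    }

    private final State state = new State();
    private final Cleaner.Cleanable cleanable = CLEANER.register(this, state);

    int sum() {
        State s = state; // after this load the method no longer needs 'this'
        try {
            if (s.released.get()) {
                throw new IllegalStateException("already released");
            }
            int total = 0;
            for (int v : s.buffer) {
                total += v;
            }
            return total;
        } finally {
            // Keep 'this' strongly reachable up to this point, so the cleaner
            // cannot release the state while the loop above is still using it.
            Reference.reachabilityFence(this);
        }
    }

    void close() { cleanable.clean(); }

    boolean isReleased() { return state.released.get(); }

    public static void main(String[] args) {
        FencedResource r = new FencedResource();
        System.out.println(r.sum());        // 10
        r.close();
        System.out.println(r.isReleased()); // true
    }
}
```

Without the fence, a compiler is free to treat `this` as dead once `state` has been loaded, which is the kind of situation the liveness information discussed in this PR is meant to cover.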
> > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: - Merge branch 'master' into 8290892.rf - Revise RF redunancy & auto-boxed primitives handling Cleanups - updates - update - updates - Merge branch 'master' into 8290892.rf - cleanups - Fix merge - Merge branch 'master' into 8290892.rf - cleanup - ... and 24 more: https://git.openjdk.org/jdk/compare/676e6fd8...518c5702 ------------- Changes: https://git.openjdk.org/jdk/pull/25315/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=24 Stats: 1439 lines in 38 files changed: 1377 ins; 20 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From xgong at openjdk.org Thu Nov 13 02:09:03 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 13 Nov 2025 02:09:03 GMT Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v6] In-Reply-To: <5BMjEAdM54Ms7L52lrFTnKnOIjnw2Q_e5sTW0VoQlYI=.cfdf9294-6f23-4279-b1ea-deeb4c9d7e35@github.com> References: <5BMjEAdM54Ms7L52lrFTnKnOIjnw2Q_e5sTW0VoQlYI=.cfdf9294-6f23-4279-b1ea-deeb4c9d7e35@github.com> Message-ID: On Wed, 12 Nov 2025 20:22:45 GMT, Paul Sandoz wrote: > > Yes, converting mask to vector will be the way to resolve. Do you think it's better that defining a private VectorMask function for the slice operation? The function could be implemented with corresponding vector slice APIs. Although this function is not friendly to SVE performance, it wins on unifying the implementation. 
> > If it helps just add a utility method that does the slice/rearrange mask<->vector conversion, but given your use case I expect it only to be used in one location, so perhaps keep it close to there. It may be that you don't need full slice functionality, since you only care about a part of the mask elements that was rearranged to the start of the vector and therefore don't need to zero out the remaining parts that are not relevant. (The same happens for conversion by parts.) Since we don't yet have any slice intrinsic I think that would be OK and we could revisit later. Ideally we should be able to optimize rearrange of vectors using constant shuffles with recognizable patterns. Makes sense to me. Thanks for all your inputs! I will create a PR for the Java-level refactor and x86 modifications first. We can have more discussion then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3524804695 From vlivanov at openjdk.org Thu Nov 13 02:19:13 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 13 Nov 2025 02:19:13 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v24] In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 11:22:15 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Revise RF redunancy & auto-boxed primitives handling >> Cleanups > > src/hotspot/share/opto/reachability.cpp line 228: > >> 226: for (IdealLoopTree* outer_loop = lpt->_parent; >> 227: outer_loop->is_invariant(referent) && outer_loop->unique_loop_exit_or_null() != nullptr; >> 228: outer_loop = outer_loop->_parent) { > > Out of curiosity: is it always desirable to move out as far as possible? Or are there downsides? I'm not aware of any downsides. A use inside a loop keeps a value alive for the duration of the outermost loop where the value is loop invariant. So, the transformation is equivalent.
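The outward walk in the quoted loop header can be modelled in a few lines. This is a toy sketch, not `reachability.cpp`: `Loop`, `isInvariant`, `uniqueExit` and `outermostHoistable` are hypothetical names, invariance is approximated as "the value is not defined inside the loop", and the walk starts at the given loop and stops at a `null` parent rather than at a root loop tree.

```java
import java.util.Set;

// Toy model only: hypothetical names, not HotSpot code.
final class HoistSketch {
    static final class Loop {
        final Loop parent;               // enclosing loop, or null for the outermost one
        final boolean uniqueExit;        // models unique_loop_exit_or_null() != nullptr
        final Set<String> definedInside; // values defined in this loop, nested loops included

        Loop(Loop parent, boolean uniqueExit, Set<String> definedInside) {
            this.parent = parent;
            this.uniqueExit = uniqueExit;
            this.definedInside = definedInside;
        }

        boolean isInvariant(String value) {
            return !definedInside.contains(value);
        }
    }

    /**
     * Walks outwards from 'start' while the referent stays loop invariant and
     * the loop has a single exit. Returns the outermost loop that the fence
     * can be moved out of, or null if it cannot leave 'start' at all.
     */
    static Loop outermostHoistable(Loop start, String referent) {
        Loop result = null;
        for (Loop l = start; l != null && l.isInvariant(referent) && l.uniqueExit; l = l.parent) {
            result = l;
        }
        return result;
    }

    public static void main(String[] args) {
        Loop outer = new Loop(null, true, Set.of("x"));  // x is defined in the outer loop body
        Loop middle = new Loop(outer, true, Set.of());
        Loop inner = new Loop(middle, true, Set.of());
        System.out.println(outermostHoistable(inner, "x") == middle); // true: x varies in outer
        System.out.println(outermostHoistable(inner, "y") == outer);  // true: y is defined outside
    }
}
```

In this model the walk stops at the first loop in which the referent is no longer invariant (or which lacks a single exit), so the fence ends up just outside the outermost loop for which both conditions hold.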
Alternatively, if only a single step is made each time, then on the next iteration the conditions will be met again. It's more optimal to do it all at once. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2520888266 From vlivanov at openjdk.org Thu Nov 13 02:52:14 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 13 Nov 2025 02:52:14 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v24] In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 10:40:22 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Revise RF redunancy & auto-boxed primitives handling >> Cleanups > > src/hotspot/share/opto/parse1.cpp line 1250: > >> 1248: Node* loc = local(idx); >> 1249: if (loc->bottom_type()->isa_oopptr() != nullptr && >> 1250: !is_auto_boxed_primitive(loc)) { // ignore auto-boxed primitives > > I wonder if randomizing this would shake out more interesting patterns? It could be the case when domination-based redundancy analysis was in place. In the current version, I don't see any benefit in test coverage. > src/hotspot/share/opto/reachability.cpp line 150: > >> 148: return false; // not a real safepoint >> 149: } else if (sfpt->is_CallStaticJava() && sfpt->as_CallStaticJava()->is_uncommon_trap()) { >> 150: return false; // uncommon traps are exit points > > Can we even hit this situation with a traversal from below? Just curious ;) With the upwards traversal of CFG, uncommon traps shouldn't be enumerated. But it's clearer to still keep the check in `is_interfering_sfpt_candidate` since it's agnostic of the details about traversal. I added an assert in the caller. > src/hotspot/share/opto/reachability.cpp line 208: > >> 206: //---------------------------- Phase 1 --------------------------------- >> 207: // Optimization pass over reachability fences during loop opts. 
>> 208: // Eliminate redundant RFs and move RFs with loop-invariant referent out of the loop. > > You removed the `find_redundant_rfs` case. Is the comment still accurate? Ok, adjusted. > src/hotspot/share/opto/reachability.cpp line 219: > >> 217: for (int i = 0; i < C->reachability_fences_count(); i++) { >> 218: ReachabilityFenceNode* rf = C->reachability_fence(i); >> 219: assert(!rf->is_redundant(igvn()), "required"); > > Why can we assume this? Is this guaranteed by IGVN? Yes, the assumption is that last IGVN pass removed all redundant RFs and the graph hasn't changed for new cases to occur. > src/hotspot/share/opto/reachability.cpp line 220: > >> 218: ReachabilityFenceNode* rf = C->reachability_fence(i); >> 219: assert(!rf->is_redundant(igvn()), "required"); >> 220: // Move RFs out of counted loops when possible. > > Is this limited to counted loops? Ah `unique_loop_exit_or_null` restricts it. > That seems fine, I'm just worried that we may at some point allow non-counted loops, and then the comment will be incorrect. The code assumes a single exit point, so if non-counted loops are supported, then the code have to be adjusted anyway. > src/hotspot/share/opto/reachability.cpp line 335: > >> 333: // Phase 2: migrate reachability info to safepoints. >> 334: // All RFs are replaced with edges from corresponding referents to interfering safepoints. >> 335: // Interfering safepoints are safepoint nodes which are reachable from the RF to its referent through CFG. > > Seems you don't do it for ALL any more, you drop those that `dominates_another_rf`. You should probably adapt the comment here. The traversal enumerates all encountered safepoints which pass `is_interfering_sfpt_candidate`. It's the caller which avoids the traversals for redundant nodes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2520998205 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2521005809 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2521013812 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2521012642 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2521013291 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2521014079 From fyang at openjdk.org Thu Nov 13 03:00:38 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Nov 2025 03:00:38 GMT Subject: RFR: 8371753: compiler/c2/cr7200264/TestIntVect.java fails IR verification Message-ID: Hi, please consider this test-only change fixing an IR test failure. This test fails after https://bugs.openjdk.org/browse/JDK-8340093 which enabled IR matching for three vector nodes. That relies on support for vector operations and will fail on platforms without that. This adds the necessary conditions for applying this matching rule. This enables more IR matching in this test for RISC-V vector as well. ------------- Commit messages: - 8371753: compiler/c2/cr7200264/TestIntVect.java fails IR verification Changes: https://git.openjdk.org/jdk/pull/28279/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28279&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371753 Stats: 19 lines in 1 file changed: 1 ins; 0 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/28279.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28279/head:pull/28279 PR: https://git.openjdk.org/jdk/pull/28279 From vlivanov at openjdk.org Thu Nov 13 03:05:14 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 13 Nov 2025 03:05:14 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v26] In-Reply-To: References: Message-ID: > This PR introduces C2 support for `Reference.reachabilityFence()`. 
> > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. 
> > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25315/files - new: https://git.openjdk.org/jdk/pull/25315/files/518c5702..6bea1285 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=24-25 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From vlivanov at openjdk.org Thu Nov 13 03:05:14 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 13 Nov 2025 03:05:14 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v24] In-Reply-To: References: Message-ID: <_bcSw73-0z5kaEyGsSz9Kox24TnFg7KxrakwEoEaf0k=.ab80c231-1afd-4cc9-8982-0db0d0311a43@github.com> On Tue, 11 Nov 2025 12:04:54 GMT, Emanuel Peter wrote: > You have a few tests already, but I'd love to see some IR tests. You could even check for the presence of ReachabilityFenceNode during some phase and then see if it goes away. Nice would be if we could even track if a SafePoint has a RF edge attached, but not sure how easy that is. > It would allow us not only to check for correctness, and hoping that we would catch incorrect cases with a crash/wrong result. But it would allow us to verify the graph, including the optimizations. 
The main complications with IR tests I see are: (1) very few cases where RF node is missing are known and all of them have already have a dedicated regression test; (2) the invariants RF imposes on the graph are non-local and it's hard to check them by inspecting IR. There's the transformation for loop-invariant referent I could try to add an IR unit test for, but I don't know how suitable IR test framework is for such scenario. Overall, I'd prefer to leave it as is for now and explore opportunities for IR tests as part of general effort to improve RF test coverage. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3524974064 From vlivanov at openjdk.org Thu Nov 13 03:19:04 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 13 Nov 2025 03:19:04 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v4] In-Reply-To: References: Message-ID: On Sat, 24 May 2025 08:43:49 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/macroArrayCopy.cpp line 209: >> >>> 207: int inline_limit = ArrayOperationPartialInlineSize / type2aelembytes(type); >>> 208: >>> 209: const TypeLong* length_type = _igvn.type(length)->isa_long(); >> >> Any particular benefit in eagerly pruning the block? It duplicates post-expansion GVN checks of the branch condition. (If it were normal parsing with prompt GVN analysis, you could detect the branch is dead right after `generate_guard` call.) >> >> Alternatively, the checks are equivalent to checking that join of `length_type` with `[0...inline_limit]` is not empty. But I prefer to let GVN handle it. > > I think it is a trivial check and it is much more efficient than creating a bunch of nodes and removing them later. 
Ok, then please, turn it into: const TypeLong* inline_range = TypeLong::make(0, inline_limit); if (length_type->join(inline_range) == Type::TOP) { return; } Or even encapsulate the check in `ArrayCopyNode::get_partial_inline_vector_lane_count()` and guard the subsequent checks with `lane_count > 0`. Otherwise, the patch looks good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25284#discussion_r2521115265 From wenanjian at openjdk.org Thu Nov 13 03:43:45 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 13 Nov 2025 03:43:45 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v24] In-Reply-To: References: Message-ID: <7trn_r_LGGNYjssRYxhRFXnfB-FxWtlVxMeVJ-RIrKs=.4725e5cc-facd-4582-9e20-7a05aad2a859@github.com> > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: modify format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/051ce4e3..972b5ba4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=22-23 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From thartmann at openjdk.org Thu Nov 13 06:06:02 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 13 Nov 2025 06:06:02 GMT Subject: RFR: 8371121: compiler/whitebox/DeoptimizeRelocatedNMethod.java fails with C1 In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 00:59:54 GMT, Chad Rakoczy wrote: > [JDK-8371121](https://bugs.openjdk.org/browse/JDK-8371121) > > This update aims to 
improve the test's stability. A previous failure occurred because the method wasn't compiled at the time of the check. I believe this could have occurred due to a deoptimization but I have not been able to reproduce. Previously, the test ensured compilation by repeatedly invoking the function. Instead, we now use Whitebox to add the method directly to the compile queue and wait for it to finish compiling. This approach should eliminate issues caused by deoptimization from function calls. That looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28246#pullrequestreview-3457662099 From fyang at openjdk.org Thu Nov 13 06:25:06 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Nov 2025 06:25:06 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v24] In-Reply-To: <7trn_r_LGGNYjssRYxhRFXnfB-FxWtlVxMeVJ-RIrKs=.4725e5cc-facd-4582-9e20-7a05aad2a859@github.com> References: <7trn_r_LGGNYjssRYxhRFXnfB-FxWtlVxMeVJ-RIrKs=.4725e5cc-facd-4582-9e20-7a05aad2a859@github.com> Message-ID: On Thu, 13 Nov 2025 03:43:45 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify format My local `tier1-tier3` and `com/sun/crypto` tests with this intrinsic enabled are good. You still need another review. ------------- Marked as reviewed by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25281#pullrequestreview-3457739343 From wenanjian at openjdk.org Thu Nov 13 06:36:36 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 13 Nov 2025 06:36:36 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v25] In-Reply-To: References: Message-ID: <18f4fTiV1QwLTsC9nF0hIqSZyBPdlquJPBjqPG1lC_0=.e075c623-782b-4b77-8e59-391cfac6467f@github.com> > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: - Merge branch 'openjdk:master' into aes_ctr - modify format - add more comments - modify parm to unsigned as aarch64 and x86 - clean comments and format - clean code and optimize big endian increase - delete the zvbb assert and some assembler support because no use - save branch jump and add some comments - fix a jtreg problem - change some instruction's sequence to make it more hardware-friendly - ... and 16 more: https://git.openjdk.org/jdk/compare/279f39f1...f1e3dd04 ------------- Changes: https://git.openjdk.org/jdk/pull/25281/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=24 Stats: 231 lines in 2 files changed: 224 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From qamai at openjdk.org Thu Nov 13 06:51:38 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 13 Nov 2025 06:51:38 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v8] In-Reply-To: References: Message-ID: > Hi, > > The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. 
This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. `ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`. This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: change the early return condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25284/files - new: https://git.openjdk.org/jdk/pull/25284/files/81eb2d12..dc8f3443 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25284&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25284&range=06-07 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25284.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25284/head:pull/25284 PR: https://git.openjdk.org/jdk/pull/25284 From qamai at openjdk.org Thu Nov 13 06:51:39 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 13 Nov 2025 06:51:39 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v4] In-Reply-To: References: Message-ID: <8GtaxNhsJy-4oM0Hgfajwde6byxoe50QiXWu0PIlUCg=.c0d20d8e-8fd5-4455-b84c-5c08f45f7c9a@github.com> On Thu, 13 Nov 2025 03:15:59 GMT, Vladimir Ivanov wrote: >> I think it is a trivial check and it is much more efficient than creating a bunch of nodes and removing them later. 
> > Ok, then please, turn it into: > > const TypeLong* inline_range = TypeLong::make(0, inline_limit); > if (length_type->join(inline_range) == Type::TOP) { > return; > } > > Or even encapsulate the check in `ArrayCopyNode::get_partial_inline_vector_lane_count()` and guard the subsequent checks with `lane_count > 0`. > > Otherwise, the patch looks good. Thanks a lot, done! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25284#discussion_r2521844335 From wenanjian at openjdk.org Thu Nov 13 07:12:38 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 13 Nov 2025 07:12:38 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: modify stub_id name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/f1e3dd04..c1e29200 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=24-25 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From chagedorn at openjdk.org Thu Nov 13 07:54:30 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Nov 2025 07:54:30 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 15:42:55 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . 
And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... 
>> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > inflate abreviations to full names Thanks for the updates, some last nits :-) test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 99: > 97: * t1 c1 t2 c2b ... t3 c3 <-- TemplateFrame nesting ---t4 c4 > 98: * t1 c1 t2 c2b ... t3 c3 with hashtag t4 c4 // t: Concerns Template Frame > 99: * t1 c1 t2 c2b ... t3 c3 and setFuelCost t4 c4 // c: Concerns Code Frame My mistake :-) Suggestion: * t1 c1 t2 c2b ... t3 c3 with hashtag t4 c4 // t: Concerns TemplateFrame * t1 c1 t2 c2b ... t3 c3 and setFuelCost t4 c4 // c: Concerns CodeFrame test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 53: > 51: * has such a set of hashtag replacements, and implicitly provides access to the > 52: * hashtag replacements of the outer {@link TemplateFrame}s, up to the outermost > 53: * of the current {@link Template}. If a hashtag replacemnt is added in a scope, Suggestion: * of the current {@link Template}. If a hashtag replacement is added in a scope, test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 54: > 52: * hashtag replacements of the outer {@link TemplateFrame}s, up to the outermost > 53: * of the current {@link Template}. 
If a hashtag replacemnt is added in a scope, > 54: * we have to find traverse to outer scopes until we find one that is not transparent Suggestion: * we have to traverse to outer scopes until we find one that is not transparent test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 65: > 63: * on how deeply nested the code is at a given point, correlating to the runtime that > 64: * would be spent if the code was executed. The idea is that once the fuel is depleated, > 65: * we do not want to nest more deaply, so that there is a reasonable chance that the Suggestion: * we do not want to nest more deeply, so that there is a reasonable chance that the ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27255#pullrequestreview-3458081151 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2522077796 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2522066438 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2522067532 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2522060969 From chagedorn at openjdk.org Thu Nov 13 07:54:33 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Nov 2025 07:54:33 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v26] In-Reply-To: References: Message-ID: <7q80yuwB7Z7KtydSkKl-eip8Iuj3YJBoHS4cHzdAW0g=.4c312932-807f-4fe1-abf2-4c37054c91a3@github.com> On Wed, 12 Nov 2025 15:37:21 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 289: >> >>> 287: } >>> 288: } >>> 289: } >> >> Feels like we have a lot of duplication in `DataName` and `StructuralName`. But I'm not sure if/how we could share it somehow. Anyhow, not something for today. > > Maybe it could be somehow smartly collapsed. Maybe using some smart way with Generics. 
But I could not find a good solution yet. Right, it could be tried in a later RFE. Me neither, it's just a feeling that it is possible, maybe with Generics or some reusable common class. Anyway, can be tackled later. >> test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 55: >> >>> 53: * of the current {@link Template}. Inner scopes of a {@link Template} have access to >>> 54: * the outer scope hashtag replacements, and any hashtag replacement defined inside an >>> 55: * inner scope is local and disappears once we leave the scope. >> >> We could explicitly mention here that we do not mean inner scopes that are templates themselves? >> Suggestion: >> >> * of the current {@link Template}. Inner scopes of a {@link Template}, that are not >> * templates themselves, have access to the outer scope hashtag replacements, and any >> * hashtag replacement defined inside an inner scope is local and disappears once we >> * leave the scope. > > I'm not sure this is better: > `Inner scopes of a {@link Template}, that are not templates themselves,` > Because they are only scopes of this template if they are not scopes of another template ? > > I now wrote this, and hope it is a bit more helpful: > > 52 * hashtag replacements of the outer {@link TemplateFrame}s, up to the outermost > ~ 53 * of the current {@link Template}. If a hashtag replacemnt is added in a scope, > ~ 54 * we have to find traverse to outer scopes until we find one that is not transparent > ~ 55 * for hashtags (at most it is the frame of the Template), and insert it there. > + 56 * The hashtag replacent is local to that frame, and accessible for any frames nested > + 57 * inside it, but not inside other Templates. The hashtag replacement disappears once > + 58 * the corresponding scope is exited, i.e. the frame removed. That's better, good! 
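The scoping rule described in the revised Javadoc above (walk outward to the nearest frame that is not transparent for hashtags, insert there, and make the replacement visible to frames nested inside it) can be sketched as follows. This is only an illustration under stated assumptions: `ScopeFrame`, `addHashtag`, and `lookup` are hypothetical names, not the Template Framework's real classes or methods, and the sketch models the frames of a single template only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a frame; NOT the real TemplateFrame class.
final class ScopeFrame {
    private final ScopeFrame parent;               // null for the template's own frame
    private final boolean transparentForHashtags;  // true: additions escape to the outer frame
    private final Map<String, String> replacements = new HashMap<>();

    ScopeFrame(ScopeFrame parent, boolean transparentForHashtags) {
        this.parent = parent;
        this.transparentForHashtags = transparentForHashtags;
    }

    // Insert into the nearest enclosing frame that is not transparent.
    // If every frame is transparent, fall back to the outermost frame.
    void addHashtag(String key, String value) {
        ScopeFrame f = this;
        while (f.transparentForHashtags && f.parent != null) {
            f = f.parent;
        }
        f.replacements.put(key, value);
    }

    // Search this frame and then its outer frames; returns null if absent.
    String lookup(String key) {
        for (ScopeFrame f = this; f != null; f = f.parent) {
            String value = f.replacements.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

In this model, a replacement added in a non-transparent inner frame is invisible to its parent and to sibling frames, while one added in a transparent frame lands in the enclosing frame and is therefore visible to siblings as well.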
>> test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 60: >> >>> 58: * The {@link #parent} relationship provides a trace for the use chain of templates and >>> 59: * their inner scopes. The {@link #fuel} is reduced over this chain to give a heuristic >>> 60: * on how much time is spent on the code from the template corresponding to the frame, >> >> "time" sounds misleading. What about: [...] on how many times we already nested the template corresponding to the frame recursively [...]? > > Hmm, well the idea is really to limit runtime once we run the generated code... But that correlates to the nesting depth and the iteration count of loops etc. > > I now wrote this: > > 61 * The {@link #parent} relationship provides a trace for the use chain of templates and > 62 * their inner scopes. The {@link #fuel} is reduced over this chain to give a heuristic > ~ 63 * on how deeply nested the code is at a given point, correlating to the runtime that > ~ 64 * would be spent if the code was executed. The idea is that once the fuel is depleated, > + 65 * we do not want to nest more deaply, so that there is a reasonable chance that the > + 66 * execution of the generated code can terminate. Great, that's much clearer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2522099442 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2522070276 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2522070468 From chagedorn at openjdk.org Thu Nov 13 08:13:16 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Nov 2025 08:13:16 GMT Subject: RFR: 8371753: compiler/c2/cr7200264/TestIntVect.java fails IR verification In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 02:48:01 GMT, Fei Yang wrote: > Hi, please consider this test-only change fixing an IR test failure.
> > This test fails after https://bugs.openjdk.org/browse/JDK-8340093 which enabled IR matching for three vector nodes. > That relies on support for vector operations and will fail on platforms without that. This adds the necessary conditions > for applying this matching rule. This enables more IR matching in this test for RISC-V vector as well. > > Verified with fastdebug build on aarch64, x86_64 and riscv64 platforms. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28279#pullrequestreview-3458240133 From chagedorn at openjdk.org Thu Nov 13 08:45:38 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Nov 2025 08:45:38 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v3] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 16:23:27 GMT, Anton Seoane Ampudia wrote: >> This PR introduces new IGV dumps, property fields and filters related to escape analysis information. >> >> The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. >> >> With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: >> >> - Node escape "level". >> - Scalar replaceability. >> - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). >> >> This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped.
After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. >> >> Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. >> >> **Testing:** passes tiers 1-3, manual testing in IGV > > Anton Seoane Ampudia has updated the pull request incrementally with three additional commits since the last revision: > > - Review comments: remove spurious new line > - Review comments: update header > > Co-authored-by: Roberto Castañeda Lozano > - Review comments: AddP misspell > > Co-authored-by: Roberto Castañeda Lozano Nice work! That could indeed be helpful when studying/debugging EA related issues. I have some minor code comments and two general comments but otherwise, looks good to me, too. Two things I've noticed: - When I tried to enable the "Color by escape analysis state" filter, I was confused because colors were not changing. I then found out that I need to disable the "Color by category" filter first. My guess was that it's because the new filter appears above the "Color by category" filter. And indeed: Creating the same filter further down and then enabling both will result in the later one overriding the earlier one. Additional benefit: The non-colored nodes by "Color by escape analysis state" are still colored by the "Color by category" which could be helpful (if not, one can still disable the category coloring). - All the new filters appear at the top. But I would not consider them the most important one and thus, I would suggest to move them all further down in the list.
src/hotspot/share/opto/escape.cpp line 2515: > 2513: bool ConnectionGraph::find_non_escaped_objects(GrowableArray& ptnodes_worklist, > 2514: GrowableArray& non_escaped_allocs_worklist, > 2515: bool verify) { `verify` suggests to actually do some verification. But it seems like it's only a toggle for dumping a graph. Could we rename it to `dump_for_igv` or something like that? src/hotspot/share/opto/idealGraphPrinter.cpp line 759: > 757: } > 758: > 759: if (_congraph && node->_idx < _congraph->nodes_size()) { We should explicitly check for null: Suggestion: if (_congraph != nullptr && node->_idx < _congraph->nodes_size()) { src/hotspot/share/opto/phasetype.hpp line 68: > 66: flags(EA_BEFORE_PHI_REDUCTION, "EA: 5. Before Phi Reduction") \ > 67: flags(EA_AFTER_PHI_CASTPP_REDUCTION, "EA: 5. Phi -> CastPP Reduction") \ > 68: flags(EA_AFTER_PHI_ADDP_REDUCTION, "EA: 5. Phi -> AddP Reduction") \ Suggestion: flags(EA_AFTER_PHI_ADDP_REDUCTION, "EA: 5. Phi -> AddP Reduction") \ test/hotspot/jtreg/compiler/lib/ir_framework/CompilePhase.java line 78: > 76: EA_BEFORE_PHI_REDUCTION( "EA: 5. Before Phi Reduction"), > 77: EA_AFTER_PHI_CASTPP_REDUCTION( "EA: 5. Phi -> CastPP Reduction"), > 78: EA_AFTER_PHI_ADDP_REDUCTION( "EA: 5. Phi -> AddP Reduction"), Suggestion: EA_AFTER_PHI_ADDP_REDUCTION( "EA: 5. 
Phi -> AddP Reduction"), ------------- PR Review: https://git.openjdk.org/jdk/pull/28060#pullrequestreview-3458280695 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2522252769 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2522234139 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2522224752 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2522226181 From chagedorn at openjdk.org Thu Nov 13 08:57:37 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Nov 2025 08:57:37 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v21] In-Reply-To: References: Message-ID: <_hYdO3bmm-WHm4DDIQJGTS1fFhySq8EZTa2aQBk5D0o=.7f6cdf22-ed57-4888-a2bf-15b404708601@github.com> On Wed, 12 Nov 2025 14:31:44 GMT, Kangcheng Xu wrote: >> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. >> >> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. >> >> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > add missed minor changes Thanks for the (ongoing?) updates! Let me know, when it's ready to be reviewed again ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3526550324 From bmaillard at openjdk.org Thu Nov 13 09:25:12 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 13 Nov 2025 09:25:12 GMT Subject: RFR: 8371558: C2: Missing optimization opportunity in AbsNode::Ideal [v2] In-Reply-To: <8yK-mgs2IYDhJkkaZpka-5fiZNvF0YgbdRA0mCzxH0Y=.a7759179-6585-4739-9fd1-03ad92b4633c@github.com> References: <-IanFerV1zNzvyd4OrMHEHlki-rVKXxrZSSV4kFRW-Y=.2f71a1c1-d65c-4871-b11c-345a1c754600@github.com> <8yK-mgs2IYDhJkkaZpka-5fiZNvF0YgbdRA0mCzxH0Y=.a7759179-6585-4739-9fd1-03ad92b4633c@github.com> Message-ID: On Wed, 12 Nov 2025 06:36:44 GMT, Christian Hagedorn wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/c2/TestMissingOptAbsZeroMinusX.java >> >> Co-authored-by: Tobias Hartmann > > Looks good, thanks! Thank you for the reviews! @chhagedorn @TobiHartmann @robcasloz > At some point, we will have to think about some kind of system to enforce notification of indirect users "by construction", or at least make it possible to somehow detect when such notification may be needed. I agree, this seems quite difficult to sustain over time. But finding a good solution is probably quite hard, as there are different locations in the code where these notifications can happen (`add_users_of_use_to_worklist` is only one of them). In the meantime at least we get the opportunity to add more tests (most work in these PRs goes into reproducing these missing optimizations reliably and reducing the tests to the maximum), and this will still be super useful if we ever change the notification mechanism.
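[Editor's sketch] To make the "notification of indirect users" problem in this thread concrete, here is a toy worklist model in Java. It is not HotSpot code: the classes `IndirectUserNotification` and `Node`, the `Op` enum, and the methods `replaceNode`, `idealize` and `run` are all hypothetical names invented for illustration, and the only rewrite modelled is the `abs(0-x)` to `abs(x)` pattern from JDK-8371558. The assumption is that the `Abs` node was already visited before its grandchild `Phi` folded to zero, so only a worklist notification can bring it back.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of an IGVN-style worklist. Hypothetical, NOT HotSpot code. */
class IndirectUserNotification {
    enum Op { CON, PARAM, PHI, SUB, ABS }

    static final class Node {
        final Op op;
        final int value;                              // only meaningful for CON
        final List<Node> inputs = new ArrayList<>();
        final List<Node> users = new ArrayList<>();

        Node(Op op, int value, Node... in) {
            this.op = op;
            this.value = value;
            for (Node n : in) {
                inputs.add(n);
                n.users.add(this);
            }
        }
    }

    /** Replaces oldNode by newNode in all users and enqueues those users. */
    static void replaceNode(Node oldNode, Node newNode,
                            Deque<Node> worklist, boolean notifyIndirect) {
        for (Node user : new ArrayList<>(oldNode.users)) {
            user.inputs.replaceAll(in -> in == oldNode ? newNode : in);
            newNode.users.add(user);
            worklist.add(user);                       // direct user: always notified
            if (notifyIndirect && user.op == Op.SUB) {
                // Indirect users: an Abs above the Sub may now match abs(0 - x).
                for (Node userOfUser : user.users) {
                    if (userOfUser.op == Op.ABS) {
                        worklist.add(userOfUser);
                    }
                }
            }
        }
        oldNode.users.clear();
    }

    /** Rewrites abs(0 - x) into abs(x); returns true if the node changed. */
    static boolean idealize(Node n) {
        if (n.op != Op.ABS) {
            return false;
        }
        Node in = n.inputs.get(0);
        if (in.op == Op.SUB
                && in.inputs.get(0).op == Op.CON
                && in.inputs.get(0).value == 0) {
            Node x = in.inputs.get(1);
            in.users.remove(n);
            n.inputs.set(0, x);
            x.users.add(n);
            return true;
        }
        return false;
    }

    /** Builds Abs(Sub(Phi, x)), folds Phi to 0, and reports whether Abs became Abs(x). */
    static boolean run(boolean notifyIndirect) {
        Node x = new Node(Op.PARAM, 0);
        Node phi = new Node(Op.PHI, 0);
        Node sub = new Node(Op.SUB, 0, phi, x);
        Node abs = new Node(Op.ABS, 0, sub);
        Node zero = new Node(Op.CON, 0);
        Deque<Node> worklist = new ArrayDeque<>();    // abs is assumed already processed
        replaceNode(phi, zero, worklist, notifyIndirect);
        while (!worklist.isEmpty()) {
            idealize(worklist.poll());
        }
        return abs.inputs.get(0) == x;
    }

    public static void main(String[] args) {
        System.out.println("direct users only:   " + run(false));
        System.out.println("with indirect users: " + run(true));
    }
}
```

In this toy, notifying only direct users leaves `Abs(Sub(0, x))` untouched, while the extra notification lets the rewrite fire. The real mechanism has many more patterns and notification sites, which is the maintenance concern raised above.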
------------- PR Comment: https://git.openjdk.org/jdk/pull/28237#issuecomment-3526698145 From bmaillard at openjdk.org Thu Nov 13 09:27:57 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 13 Nov 2025 09:27:57 GMT Subject: RFR: 8368961: Remove redundant checks in ciField.cpp [v2] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 05:12:34 GMT, Zihao Lin wrote: >> Remove redundant check in 'trust_final_non_static_fields' ciField.cpp >> >> Remove: >> 1. java_lang_System check >> 2. is_box_klass check >> 3. java_lang_String check > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > remove Marked as reviewed by bmaillard (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28191#pullrequestreview-3458650827 From bmaillard at openjdk.org Thu Nov 13 09:28:09 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 13 Nov 2025 09:28:09 GMT Subject: Integrated: 8371558: C2: Missing optimization opportunity in AbsNode::Ideal In-Reply-To: <-IanFerV1zNzvyd4OrMHEHlki-rVKXxrZSSV4kFRW-Y=.2f71a1c1-d65c-4871-b11c-345a1c754600@github.com> References: <-IanFerV1zNzvyd4OrMHEHlki-rVKXxrZSSV4kFRW-Y=.2f71a1c1-d65c-4871-b11c-345a1c754600@github.com> Message-ID: On Tue, 11 Nov 2025 14:42:43 GMT, Benoît Maillard wrote: > This PR addresses another missed optimization in `PhaseIterGVN` due to a missing notification to indirect users within `PhaseIterGVN::add_users_of_use_to_worklist`. > > The affected optimization is the transformation of `abs(0-x)` into `abs(x)`. This transformation is implemented in `AbsNode::Ideal`. > > The bug was found by the fuzzer. At some point during IGVN, we have the following setup: > > > Phi ... > \ / > SubI > | > AbsI > > > The `Phi` node gets folded into a `ConI`, and we call `replace_node(phi, zero)`, which ends up calling `add_users_to_worklist(phi)`, and `add_users_of_use_to_worklist(phi, zero, ...)`.
However, the case for this specific notification was missing there, and the `AbsI` node is never notified (not added to the worklist). > > This PR brings the following changes: > - Detect the optimization pattern in `add_users_of_use_to_worklist` for `AbsI`, `AbsL`, `AbsF` and `AbsD` > - Add new test `TestMissingOptAbsZeroMinusX.java`, initially obtained from the fuzzer and then heavily reduced, both with the usual tools and manually. In addition to `AbsI`, I have also added test cases for `AbsF` and `AbsD`, but was not able to reproduce for `AbsL` despite my best efforts. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371534) > - [x] tier1-4, plus some internal testing > > Thank you for reviewing! This pull request has now been integrated. Changeset: 9d6a61fd Author: Benoît Maillard URL: https://git.openjdk.org/jdk/commit/9d6a61fda6f43577ee8f19483e5b47100ff8eec0 Stats: 95 lines in 2 files changed: 95 ins; 0 del; 0 mod 8371558: C2: Missing optimization opportunity in AbsNode::Ideal Reviewed-by: thartmann, rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/28237 From duke at openjdk.org Thu Nov 13 09:31:02 2025 From: duke at openjdk.org (Zihao Lin) Date: Thu, 13 Nov 2025 09:31:02 GMT Subject: RFR: 8368961: Remove redundant checks in ciField.cpp [v2] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 05:12:34 GMT, Zihao Lin wrote: >> Remove redundant check in 'trust_final_non_static_fields' ciField.cpp >> >> Remove: >> 1. java_lang_System check >> 2. is_box_klass check >> 3. java_lang_String check > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > remove Thank you everyone.
/integrate ------------- PR Comment: https://git.openjdk.org/jdk/pull/28191#issuecomment-3526727458 From jbhateja at openjdk.org Thu Nov 13 09:31:03 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 13 Nov 2025 09:31:03 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer In-Reply-To: References: Message-ID: <8hStIcvp252Ik7raxZL5BvFKKkXTflorjyOD9Cyakvc=.c5d1b302-5c49-46b1-91ba-2feda2e6a746@github.com> On Wed, 12 Nov 2025 20:11:06 GMT, Paul Sandoz wrote: > The basic type codes are declared and shared across Java and HotSpot - it's used in `LaneType`. Can we pass a single argument that is the basic type instead of two arguments. HotSpot should know from the basic type what the carrier class and also what the operation type without it being explicitly told, since presumably it knew the inverse - the basic type from the element class. Hi @PaulSandoz, T_HALFFLOAT used in LaneType is mainly used for differentiation of various cache keys used by conversion operation lookups. In principle, we can extend VM to acknowledge this new custom basic type on the lines of T_METADATA / T_ADDRESS; its scope for now will be restricted to VectorSupport. We can gradually expose this to C2 type, such that TypeVect for all Float16 VectorIR uses T_HALFFLOAT as its basic type; currently, we use T_SHORT as the lane type. 
Let me know if this looks reasonable ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3526715585 From jbhateja at openjdk.org Thu Nov 13 09:31:04 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 13 Nov 2025 09:31:04 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer In-Reply-To: References: Message-ID: <15AReOBUAseO-BiCWHW7N-OSOcknDc0Box3c90cXRZU=.5d7341db-94ea-4cdf-b3cd-fabe414dd88d@github.com> On Wed, 12 Nov 2025 19:48:54 GMT, Paul Sandoz wrote: > > > > Some quick comments. > > > > We should be consistent in the naming, and rename `Halfloat*` to `Float16*`. > > > > > > > > > I concur, especially since there are multiple 16-bit floating-point formats in use including the IEEE 754 float16 as well as bfloat16. > > > > > > There are nomenclature issues that I am facing. Currently, all the Float16 concrete classes use the Halffloat prefix i.e., Halffloat64Vector, Halffloat128Vector; converting these to Float16 looks a little confusing, i.e., Float1664Vector, Float16128Vector, etc Kindly suggest a better name to represent these classes. > > Maybe we move the shape to the end e.g., `Float16Vector128`, `IntVector128`, `IntVectorMax`? This looks good, since all these are concrete vector classes not exposed to users. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3526723445 From aseoane at openjdk.org Thu Nov 13 09:36:03 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Thu, 13 Nov 2025 09:36:03 GMT Subject: RFR: 8368961: Remove redundant checks in ciField.cpp In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 05:01:03 GMT, Zihao Lin wrote: >> @linzihao1999 I noted there is another more comprehensive issue for this function: https://bugs.openjdk.org/browse/JDK-8368961 >> >> There are a total of 3 redundant checks in this function that can be removed. 
If you want to update this patch, feel free to update the issue for the PR, and include the cleanup for all 3 redundant checks. > >> @linzihao1999 I noted there is another more comprehensive issue for this function: https://bugs.openjdk.org/browse/JDK-8368961 >> >> There are a total of 3 redundant checks in this function that can be removed. If you want to update this patch, feel free to update the issue for the PR, and include the cleanup for all 3 redundant checks. > > Sure, I will update this change. @linzihao1999 please note that you will need a "capital R" Reviewer to approve your changes before being able to integrate (you can see the required progress in the body of your PR). ------------- PR Comment: https://git.openjdk.org/jdk/pull/28191#issuecomment-3526752066 From duke at openjdk.org Thu Nov 13 09:39:17 2025 From: duke at openjdk.org (Zihao Lin) Date: Thu, 13 Nov 2025 09:39:17 GMT Subject: RFR: 8368961: Remove redundant checks in ciField.cpp [v2] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 05:12:34 GMT, Zihao Lin wrote: >> Remove redundant check in 'trust_final_non_static_fields' ciField.cpp >> >> Remove: >> 1. java_lang_System check >> 2. is_box_klass check >> 3. java_lang_String check > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > remove Thank you, everyone! @liach Can you help to sponsor this change? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28191#issuecomment-3526769335 From aseoane at openjdk.org Thu Nov 13 09:47:44 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Thu, 13 Nov 2025 09:47:44 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v4] In-Reply-To: References: Message-ID: > This PR introduces new IGV dumps, property fields and filters related to escape analysis information. > > The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g.
`split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. > > With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: > > - Node escape "level". > - Scalar replaceability. > - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). > > This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. > > Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status.
> > **Testing:** passes tiers 1-3, manual testing in IGV Anton Seoane Ampudia has updated the pull request incrementally with two additional commits since the last revision: - Review comments: explicit null check Co-authored-by: Christian Hagedorn - Review comments: whitespace missing Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28060/files - new: https://git.openjdk.org/jdk/pull/28060/files/cc56fb7e..54a3e2d2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28060&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28060&range=02-03 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28060.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28060/head:pull/28060 PR: https://git.openjdk.org/jdk/pull/28060 From aseoane at openjdk.org Thu Nov 13 09:47:44 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Thu, 13 Nov 2025 09:47:44 GMT Subject: RFR: 8356761: IGV: dump escape analysis information In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 14:06:54 GMT, Christian Hagedorn wrote: >> This PR introduces new IGV dumps, property fields and filters related to escape analysis information. >> >> The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. >> >> With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: >> >> - Node escape "level". >> - Scalar replaceability.
>> - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). >> >> This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. >> >> Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. >> >> **Testing:** passes tiers 1-3, manual testing in IGV > > Nice improvement! I have not reviewed this PR, yet, but I just want to raise a general concern that our model of having different `PrintIdealGraphLevel` values might not fit anymore for all the different concepts (different loop opts, IGVN steps, Superword steps, parsing steps and now EA steps etc.). Maybe the time has come to use a different solution to allow some better filtering for different needs. It should not per se block this PR but could be a justification to actually tackle this. Thanks for your comments @chhagedorn! I believe I already moved the filters down, so it might be that you have some "non cleaned up" IGV state from a previous build where the ordering was wrong. I have tried cleaning and rebuilding and the filters show down in the list, so your IGV may be carrying some old filter list. Could you please clean and rebuild IGV to confirm if that is the case? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28060#issuecomment-3526802721 From rcastanedalo at openjdk.org Thu Nov 13 09:49:46 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 13 Nov 2025 09:49:46 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 15:42:55 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token.
For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > inflate abreviations to full names Hi Emanuel, thanks for improving the design of the template framework, the enforcement of "everything is a token" and the introduction of explicit scope constraints seem like a step in the right direction. Before I go on with the review, I would like to ask two high-level questions (apologies if these are already discussed, it is hard to browse through a PR history): - The tutorial and the Template documentation remark that we would ideally have used string templates rather than hashtag replacements. Is this still true after the introduction of explicit scoping constraints, i.e. 
could we still simply use string templates and still enforce the user-provided scoping rules if the feature was available? - If I got the comments in the tutorial right, it seems that the user has good control over the "transparency level" of scopes, while the transparency rules for templates are hardcoded (hashtag replacements never escape, DataNames always escape, etc.). This felt a bit surprising, would it be feasible to just let the outermost scope in a template determine the template's transparency level? ------------- PR Review: https://git.openjdk.org/jdk/pull/27255#pullrequestreview-3458766576 From chagedorn at openjdk.org Thu Nov 13 10:12:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Nov 2025 10:12:12 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v4] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 09:47:44 GMT, Anton Seoane Ampudia wrote: >> This PR introduces new IGV dumps, property fields and filters related to escape analysis information. >> >> The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. >> >> With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: >> >> - Node escape "level". >> - Scalar replaceability. >> - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)).
>> >> This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. >> >> Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. >> >> **Testing:** passes tiers 1-3, manual testing in IGV > > Anton Seoane Ampudia has updated the pull request incrementally with two additional commits since the last revision: > > - Review comments: explicit null check > > Co-authored-by: Christian Hagedorn > - Review comments: whitespace missing > > Co-authored-by: Christian Hagedorn Thanks for the hint, that does solve the problem! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28060#pullrequestreview-3458898546 From chagedorn at openjdk.org Thu Nov 13 10:12:17 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Nov 2025 10:12:17 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v3] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 08:17:16 GMT, Christian Hagedorn wrote: >> Anton Seoane Ampudia has updated the pull request incrementally with three additional commits since the last revision: >> >> - Review comments: remove spurious new line >> - Review comments: update header >> >> Co-authored-by: Roberto Castañeda Lozano >> - Review comments: AddP misspell >> >> Co-authored-by: Roberto Castañeda Lozano > > src/hotspot/share/opto/phasetype.hpp line 68: > >> 66: flags(EA_BEFORE_PHI_REDUCTION, "EA: 5. Before Phi Reduction") \ >> 67: flags(EA_AFTER_PHI_CASTPP_REDUCTION, "EA: 5. Phi -> CastPP Reduction") \ >> 68: flags(EA_AFTER_PHI_ADDP_REDUCTION, "EA: 5.
Phi -> AddP Reduction") \ > > Suggestion: > > flags(EA_AFTER_PHI_ADDP_REDUCTION, "EA: 5. Phi -> AddP Reduction") \ Slipped through your update ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2522740115 From aseoane at openjdk.org Thu Nov 13 10:21:06 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Thu, 13 Nov 2025 10:21:06 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v5] In-Reply-To: References: Message-ID: > This PR introduces new IGV dumps, property fields and filters related to escape analysis information. > > The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. > > With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: > > - Node escape "level". > - Scalar replaceability. > - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). > > This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. > > Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status.
> > **Testing:** passes tiers 1-3, manual testing in IGV Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: Review comments: whitespace fix Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28060/files - new: https://git.openjdk.org/jdk/pull/28060/files/54a3e2d2..0cde9c91 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28060&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28060&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28060.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28060/head:pull/28060 PR: https://git.openjdk.org/jdk/pull/28060 From aseoane at openjdk.org Thu Nov 13 10:21:09 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Thu, 13 Nov 2025 10:21:09 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v3] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 08:22:42 GMT, Christian Hagedorn wrote: >> Anton Seoane Ampudia has updated the pull request incrementally with three additional commits since the last revision: >> >> - Review comments: remove spurious new line >> - Review comments: update header >> >> Co-authored-by: Roberto Castañeda Lozano >> - Review comments: AddP misspell >> >> Co-authored-by: Roberto Castañeda Lozano > > src/hotspot/share/opto/escape.cpp line 2515: > >> 2513: bool ConnectionGraph::find_non_escaped_objects(GrowableArray& ptnodes_worklist, >> 2514: GrowableArray& non_escaped_allocs_worklist, >> 2515: bool verify) { > > `verify` suggests to actually do some verification. But it seems like it's only a toggle for dumping a graph. Could we rename it to `dump_for_igv` or something like that? `find_non_escaped_objects` runs twice, with the second time as verification phase only. I thought adding it this way would be explicit at the caller site about if we are doing verification.
Changing it to `dump_to_igv` is trivial though, so let me know if you feel it's still clearer that way and I'll change it quickly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2522764620 From aseoane at openjdk.org Thu Nov 13 10:21:10 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Thu, 13 Nov 2025 10:21:10 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v3] In-Reply-To: References: Message-ID: <2dyHDX8ZIEVYesg_em07KXJPfKXwDOARMk1XKEbnXVg=.30a899df-ff25-4ee8-82ab-3af1c1762656@github.com> On Thu, 13 Nov 2025 10:09:26 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/phasetype.hpp line 68: >> >>> 66: flags(EA_BEFORE_PHI_REDUCTION, "EA: 5. Before Phi Reduction") \ >>> 67: flags(EA_AFTER_PHI_CASTPP_REDUCTION, "EA: 5. Phi -> CastPP Reduction") \ >>> 68: flags(EA_AFTER_PHI_ADDP_REDUCTION, "EA: 5. Phi -> AddP Reduction") \ >> >> Suggestion: >> >> flags(EA_AFTER_PHI_ADDP_REDUCTION, "EA: 5. Phi -> AddP Reduction") \ > > Slipped through your update Oh right. Thanks for the heads up! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2522776781 From aseoane at openjdk.org Thu Nov 13 10:36:25 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Thu, 13 Nov 2025 10:36:25 GMT Subject: RFR: 8369002: Extract the loop->is_member(get_loop(get_ctrl(node))) pattern in a new function [v3] In-Reply-To: References: Message-ID: <5WxrAw8WdUVcuKyNXY1YMED3vmFneSA4jzh5T62FedU=.2b30ee68-d76b-4352-b535-5c9e9ae45b82@github.com> > This PR adds a "shorthand" for the common `loop->is_member(get_loop(get_ctrl(node)))` pattern in loop optimizations. > > In PhaseIdealLoop, there is already an `is_member` function that checks if a node is a (nested) member of an IdealLoopTree. In a similar fashion, this changeset adds a `ctrl_is_member` that aims to simplify the common pattern of: > > Node* node_ctrl = get_ctrl(node); > if (loop->is_member(get_loop(node_ctrl))) { ...
} > > > This hopes to provide a bit more readability and code conciseness in such a common operation. > > **Testing:** passes tiers 1-3 Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: Review comments: nit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28259/files - new: https://git.openjdk.org/jdk/pull/28259/files/8005ac9b..c343e2c5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28259&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28259&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28259.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28259/head:pull/28259 PR: https://git.openjdk.org/jdk/pull/28259 From aseoane at openjdk.org Thu Nov 13 10:36:28 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Thu, 13 Nov 2025 10:36:28 GMT Subject: RFR: 8369002: Extract the loop->is_member(get_loop(get_ctrl(node))) pattern in a new function [v2] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 17:03:08 GMT, Roberto Castañeda Lozano wrote: >> Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments: update types > > src/hotspot/share/opto/loopnode.hpp line 1379: > >> 1377: >> 1378: // Return a post-walked LoopNode >> 1379: IdealLoopTree *get_loop(const Node* n) const { > > Nit suggestion (where is my "Add suggestion" button?): `IdealLoopTree *get_loop` -> `IdealLoopTree* get_loop`. Oh, absolutely!
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28259#discussion_r2522841575 From mli at openjdk.org Thu Nov 13 10:51:29 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 13 Nov 2025 10:51:29 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: <0RG4QpxVEu5xunn5J8AQdiWcY1jaJLtICuS_tDm2TjQ=.b51bd8d8-7aa5-40c2-aeb1-af62c41adbc5@github.com> References: <0RG4QpxVEu5xunn5J8AQdiWcY1jaJLtICuS_tDm2TjQ=.b51bd8d8-7aa5-40c2-aeb1-af62c41adbc5@github.com> Message-ID: On Thu, 13 Nov 2025 10:35:17 GMT, Anjian Wen wrote: > This patch needs more reviewers. @robehn @Hamlin-Li Do you want to have a look? Thanks for your work! I can have a look next week if still needed, currently working on several other tasks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25281#issuecomment-3527177194 From wenanjian at openjdk.org Thu Nov 13 10:38:04 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 13 Nov 2025 10:38:04 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: Message-ID: <0RG4QpxVEu5xunn5J8AQdiWcY1jaJLtICuS_tDm2TjQ=.b51bd8d8-7aa5-40c2-aeb1-af62c41adbc5@github.com> On Thu, 13 Nov 2025 07:12:38 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch, which implements the _counterMode_AESCrypt with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify stub_id name This patch needs more reviewers. @robehn @Hamlin-Li Do you want to have a look?
------------- PR Comment: https://git.openjdk.org/jdk/pull/25281#issuecomment-3527105369 From shade at openjdk.org Thu Nov 13 10:56:34 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Nov 2025 10:56:34 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes Message-ID: I started this as investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test. At this point I believe we found that PhaseCCP does not reach the fix point for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to `LoadN` in question. Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. Testing shows there are no ill effects on correctness doing this. But I would appreciate someone more savvy in this code to sanity check all of this. Additional testing: - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails - [x] Linux x86_64 server fastdebug, `all` tests pass - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) 
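The revisit idea described above can be illustrated with a toy model. This is only a sketch under stated assumptions and not the HotSpot code: the class `CcpRevisitSketch`, the `Transfer` interface (a stand-in for `Node::Value()`), the integer "types" and the three-node graph are all hypothetical. Node 2 plays the role of a load that looks "deeply" into the graph: its direct input is node 1, but its transfer function reads the type of node 0, so a change to node 0 does not put it back on the worklist unless it is recorded for revisiting.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class CcpRevisitSketch {
    /** Hypothetical stand-in for Node::Value(): derives a node's type from the type table. */
    interface Transfer { int value(int[] types); }

    private final Transfer[] transfer;
    private final int[][] users;
    private final int[] types;                       // 0 is the optimistic starting type
    private final Deque<Integer> worklist = new ArrayDeque<>();

    CcpRevisitSketch(Transfer[] transfer, int[][] users) {
        this.transfer = transfer;
        this.users = users;
        this.types = new int[transfer.length];
    }

    /** Computes the new type once; on change, records it and enqueues the direct users. */
    private boolean update(int node) {
        int newType = transfer[node].value(types);
        if (newType == types[node]) {
            return false;
        }
        types[node] = newType;
        for (int user : users[node]) {
            worklist.add(user);
        }
        return true;
    }

    /** Drains the worklist, then revisits the recorded nodes, until nothing changes. */
    int[] solve(int[] initialOrder, int[] revisitNodes) {
        for (int node : initialOrder) {
            worklist.add(node);
        }
        boolean changed;
        do {
            while (!worklist.isEmpty()) {
                update(worklist.poll());
            }
            changed = false;
            for (int node : revisitNodes) {
                changed |= update(node);
            }
        } while (changed);
        return types;
    }

    static int[] demo(boolean revisit) {
        Transfer[] transfer = {
            t -> 5,         // node 0: settles at 5
            t -> 0,         // node 1: uses node 0, but its own type never changes
            t -> t[0] + 10  // node 2: direct input is node 1, yet it reads node 0's type
        };
        int[][] users = { {1}, {2}, {} };
        int[] order = {2, 1, 0};                     // node 2 is visited before node 0 settles
        int[] revisitNodes = revisit ? new int[] {2} : new int[0];
        return new CcpRevisitSketch(transfer, users).solve(order, revisitNodes);
    }
}
```

In this toy graph, `demo(false)` leaves node 2 with the stale type 10, while `demo(true)` reaches 15. Routing both the worklist loop and the revisit loop through the single `update` helper means each visit evaluates the transfer function only once.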
------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/28288/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28288&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371581 Stats: 71 lines in 2 files changed: 43 ins; 0 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/28288.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28288/head:pull/28288 PR: https://git.openjdk.org/jdk/pull/28288 From wenanjian at openjdk.org Thu Nov 13 11:04:27 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 13 Nov 2025 11:04:27 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: <0RG4QpxVEu5xunn5J8AQdiWcY1jaJLtICuS_tDm2TjQ=.b51bd8d8-7aa5-40c2-aeb1-af62c41adbc5@github.com> Message-ID: On Thu, 13 Nov 2025 10:48:24 GMT, Hamlin Li wrote: > > This patch Need more reviewers, @robehn @Hamlin-Li Do you want to have a look? > > Thanks for your work! I can have a look next week if still needed, currently working on several other tasks. Thanks! sure, next week is good? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25281#issuecomment-3527243424 From rcastanedalo at openjdk.org Thu Nov 13 11:18:18 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 13 Nov 2025 11:18:18 GMT Subject: RFR: 8369002: Extract the loop->is_member(get_loop(get_ctrl(node))) pattern in a new function [v3] In-Reply-To: <5WxrAw8WdUVcuKyNXY1YMED3vmFneSA4jzh5T62FedU=.2b30ee68-d76b-4352-b535-5c9e9ae45b82@github.com> References: <5WxrAw8WdUVcuKyNXY1YMED3vmFneSA4jzh5T62FedU=.2b30ee68-d76b-4352-b535-5c9e9ae45b82@github.com> Message-ID: On Thu, 13 Nov 2025 10:36:25 GMT, Anton Seoane Ampudia wrote: >> This PR adds a "shorthand" for the common `loop->is_member(get_loop(get_ctrl(node)))` pattern in loop optimizations. >> >> In PhaseIdealLoop, there is already an `is_member` function that checks if a node is a (nested) member of an IdealLoopTree. 
In a similar fashion, this changeset adds a `ctrl_is_member` that aims to simplify the common pattern of: >> >> Node* node_ctrl = get_ctrl(node); >> if (loop->is_member(get_loop(node_ctrl))) { ... } >> >> >> This hopes to provide a bit more readability and code conciseness in such a common operation. >> >> **Testing:** passes tiers 1-3 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Review comments: nit Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28259#pullrequestreview-3459276890 From rcastanedalo at openjdk.org Thu Nov 13 11:22:23 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 13 Nov 2025 11:22:23 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v5] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 10:21:06 GMT, Anton Seoane Ampudia wrote: >> This PR introduces new IGV dumps, property fields and filters related to escape analysis information. >> >> The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. >> >> With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: >> >> - Node escape "level". >> - Scalar replaceability. >> - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)).
>> >> This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. >> >> Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. >> >> **Testing:** passes tiers 1-3, manual testing in IGV > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Review comments: whitespace fix > > Co-authored-by: Christian Hagedorn Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28060#pullrequestreview-3459296437 From qamai at openjdk.org Thu Nov 13 11:33:25 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 13 Nov 2025 11:33:25 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes In-Reply-To: References: Message-ID: <66v3410UHKmmOwI6p8tjeC-BVGl2edIrF5PJrToI7nE=.f76efd0b-e4d2-4fcc-9e05-af4a857ebaf7@github.com> On Thu, 13 Nov 2025 10:49:14 GMT, Aleksey Shipilev wrote: > I started this as investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test. > > At this point I believe we found that PhaseCCP does not reach the fix point for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to `LoadN` in question. 
Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. > > It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. > > Testing shows there are no ill effects on correctness doing this. But I would appreciate someone more savvy in this code to sanity check all of this. > > Additional testing: > - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails > - [x] Linux x86_64 server fastdebug, `all` tests pass > - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) LGTM. One minor concern is that for nodes in `worklist_revisit`, if the type changes, then we will perform `n->Value(this)` twice before updating the type table. Do you think it is preferable to duplicate this snippet instead: const Type* new_type = n->Value(this); if (new_type != type(n)) { DEBUG_ONLY(verify_type(n, new_type, type(n));) dump_type_and_node(n, new_type); set_type(n, new_type); push_child_nodes_to_worklist(worklist, n); } if (KillPathsReachableByDeadTypeNode && n->is_Type() && new_type == Type::TOP) { // Keep track of Type nodes to kill CFG paths that use Type // nodes that become dead. _maybe_top_type_nodes.push(n); } ------------- Marked as reviewed by qamai (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/28288#pullrequestreview-3459340457 From chagedorn at openjdk.org Thu Nov 13 11:54:20 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Nov 2025 11:54:20 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v3] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 10:14:49 GMT, Anton Seoane Ampudia wrote: >> src/hotspot/share/opto/escape.cpp line 2515: >> >>> 2513: bool ConnectionGraph::find_non_escaped_objects(GrowableArray& ptnodes_worklist, >>> 2514: GrowableArray& non_escaped_allocs_worklist, >>> 2515: bool verify) { >> >> `verify` suggests to actually do some verification. But it seems like it's only a toggle for dumping a graph. Could we rename it to `dump_for_igv` or something like that? > > `find_non_escaped_objects` runs twice, with the second time as verification phase only. I thought adding it this way would be explicit at the caller site about if we are doing verification. Changing it to `dump_to_igv` is trivial though, so let me know if you feel it's still clearer that way and I'll change it quickly. I see, I looked at it from the callee-side which makes it suggest to perform some verification which it does not. Maybe @robcasloz can break the tie here :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2523167925 From rcastanedalo at openjdk.org Thu Nov 13 12:05:18 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 13 Nov 2025 12:05:18 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v3] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 11:50:58 GMT, Christian Hagedorn wrote: >> `find_non_escaped_objects` runs twice, with the second time as verification phase only. I thought adding it this way would be explicit at the caller site about if we are doing verification. 
Changing it to `dump_to_igv` is trivial though, so let me know if you feel it's still clearer that way and I'll change it quickly. > > I see, I looked at it from the callee-side which makes it suggest to perform some verification which it does not. Maybe @robcasloz can break the tie here :-) No strong opinions here, but if you ask me anyway I tend to agree with Christian that it would be a bit clearer to name the predicate for the effect it has. May I suggest to call it `print_method` in that case? (since `Compile::print_method` does more than IGV dumping). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2523200269 From qamai at openjdk.org Thu Nov 13 12:05:20 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 13 Nov 2025 12:05:20 GMT Subject: RFR: 8371789: C2: More explicit dump results for TypePtr Message-ID: Hi, This patch tries to clear up the dumped information of `TypePtr` and its subclasses. It makes it immediately clear the states of the `Type` object without us having to look into the implementation of `dump2`, for example, to know that the absence of `:NotNull` implies that it is a `BotPTR`. Please take a look and kindly review, thanks a lot. 
------------- Commit messages: - whitespace - fix IR framework - clean up printing of TypePtr and subclasses Changes: https://git.openjdk.org/jdk/pull/28292/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28292&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371789 Stats: 232 lines in 3 files changed: 64 ins; 121 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/28292.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28292/head:pull/28292 PR: https://git.openjdk.org/jdk/pull/28292 From epeter at openjdk.org Thu Nov 13 12:12:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Nov 2025 12:12:49 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v10] In-Reply-To: References: Message-ID: On Sun, 9 Nov 2025 09:34:52 GMT, Tobias Hotz wrote: >> This PR improves the value of integer division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Move Test to compiler.igvn Very nice work @ichttt ! I like all your comments, and thanks for all the test cases, including the randomized ones!
I just have a few minor suggestions :) src/hotspot/share/opto/divnode.cpp line 566: > 564: } else { > 565: new_hi = min_val; > 566: } Suggestion: if (!i1->is_con()) { // a) non-constant dividend: i1 could be min_val + 1. // -> i1 / i2 = (min_val + 1) / -1 = max_val is possible. new_hi = max_val; assert((min_val + 1) / -1 == new_hi, "new_hi should be max_val"); } else if (i2_lo != i2_hi) { // b) i1 is constant min_val, i2 is non-constant. // if i2 = -1 -> i1 / i2 = min_val / -1 = min_val // if i2 < -1 -> i1 / i2 <= min_val / -2 = (max_val / 2) + 1 new_hi = (max_val / 2) + 1; assert(min_val / -2 == new_hi, "new_hi should be (max_val / 2) + 1)"); } else { // c) i1 is constant min_val, i2 is constant -1. // -> i1 / i2 = min_val / -1 = min_val new_hi = min_val; } Reading this, I felt like I had to reconstruct a lot in my head. We could help the reader a little to close the gap. Feel free to reformulate, and even to remove your list above in favour of the inlined comments. test/hotspot/jtreg/compiler/igvn/IntegerDivValueTests.java line 66: > 64: // All constants available during parsing > 65: return getIntConstant(Integer.MIN_VALUE) / getIntConstant(-1); > 66: } Why not add an IR rule that the div is still present after parsing? It seems you have already had the possible issue that javac optimized the div away, right? So this would ensure the optimization really does happen in C2, and that you are checking for the right kinds of nodes.
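The "four corners" reasoning and the `MIN_VALUE / -1` special case discussed above can be sketched in plain Java. This is a minimal illustration under stated assumptions and not the `DivINode::Value()` code from the patch: the class `DivRangeSketch` and its methods are hypothetical, the divisor range is split at zero into a negative and a positive part, and the overflow case conservatively returns the full `int` range where the patch computes a tighter bound.

```java
final class DivRangeSketch {
    /**
     * Returns {lo, hi} such that a / b lies in [lo, hi] for every a in [aLo, aHi]
     * and every non-zero b in [bLo, bHi]. A zero divisor is ignored because the
     * division throws instead of producing a value.
     */
    static int[] divRange(int aLo, int aHi, int bLo, int bHi) {
        int[] full = {Integer.MIN_VALUE, Integer.MAX_VALUE};
        // MIN_VALUE / -1 overflows and wraps around to MIN_VALUE. This sketch gives up
        // in that case instead of deriving a precise bound.
        if (aLo == Integer.MIN_VALUE && bLo <= -1 && bHi >= -1) {
            return full;
        }
        long lo = Long.MAX_VALUE;
        long hi = Long.MIN_VALUE;
        if (bLo <= -1) {                              // negative part of the divisor range
            long[] r = corners(aLo, aHi, bLo, Math.min(bHi, -1));
            lo = Math.min(lo, r[0]);
            hi = Math.max(hi, r[1]);
        }
        if (bHi >= 1) {                               // positive part of the divisor range
            long[] r = corners(aLo, aHi, Math.max(bLo, 1), bHi);
            lo = Math.min(lo, r[0]);
            hi = Math.max(hi, r[1]);
        }
        if (lo > hi) {
            return full;                              // the divisor can only be zero
        }
        return new int[] {(int) lo, (int) hi};        // fits: the overflow case was excluded
    }

    /** Min and max of the four corner quotients; bLo and bHi must have the same sign. */
    private static long[] corners(long aLo, long aHi, long bLo, long bHi) {
        long c1 = aLo / bLo;
        long c2 = aLo / bHi;
        long c3 = aHi / bLo;
        long c4 = aHi / bHi;
        return new long[] {
            Math.min(Math.min(c1, c2), Math.min(c3, c4)),
            Math.max(Math.max(c1, c2), Math.max(c3, c4))
        };
    }
}
```

The corners are sufficient because, for a divisor range that does not contain zero, truncating division is monotone in the dividend for a fixed divisor and monotone in the divisor for a fixed dividend, so the extrema over the rectangle of inputs sit at its corners. For example, `divRange(10, 20, 2, 5)` yields the range [2, 10].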
test/hotspot/jtreg/compiler/igvn/IntegerDivValueTests.java line 70: > 68: @Test > 69: @IR(failOn = {IRNode.DIV_I, IRNode.DIV_L, IRNode.URSHIFT_I, IRNode.URSHIFT_L, IRNode.RSHIFT_I, IRNode.RSHIFT_L, IRNode.MUL_I, IRNode.MUL_L, IRNode.ADD_I, IRNode.ADD_L, IRNode.SUB_I, IRNode.SUB_L, IRNode.AND_I, IRNode.AND_L}) > 70: Suggestion: test/hotspot/jtreg/compiler/igvn/IntegerDivValueTests.java line 286: > 284: // transform_long_divide splits up the division into multiple other nodes, such as MulHiLNode, which does not have a good Value() implemantion. > 285: // When JDK-8366815 is fixed, these rules should be reenabled > 286: // Alternatively, a better MulHiLNode::Value() implemantion should also lead to constant folding Could you have some temporary IR rule that now passes, but fails once `JDK-8366815` is fixed? Otherwise, I'm afraid we will miss these comments here, and they will never be cleaned up. ------------- PR Review: https://git.openjdk.org/jdk/pull/26143#pullrequestreview-3459442625 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2523188178 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2523208681 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2523209322 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2523214237 From epeter at openjdk.org Thu Nov 13 12:12:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Nov 2025 12:12:51 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v10] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 12:05:31 GMT, Emanuel Peter wrote: >> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: >> >> Move Test to compiler.igvn > > test/hotspot/jtreg/compiler/igvn/IntegerDivValueTests.java line 66: > >> 64: // All constants available during parsing >> 65: return getIntConstant(Integer.MIN_VALUE) / getIntConstant(-1); >> 66: } > > Why not add 
an IR rule that the div is still present after parsing? It seems you have already had the possible issue that javac optimized the div away, right? So this would ensure the optimization really does happen in C2, and that you are checking for the right kinds of nodes. Consider doing the same in other places in this file ;) > test/hotspot/jtreg/compiler/igvn/IntegerDivValueTests.java line 286: > >> 284: // transform_long_divide splits up the division into multiple other nodes, such as MulHiLNode, which does not have a good Value() implemantion. >> 285: // When JDK-8366815 is fixed, these rules should be reenabled >> 286: // Alternatively, a better MulHiLNode::Value() implemantion should also lead to constant folding > > Could you have some temporary IR rule that now passes, but fails once `JDK-8366815` is fixed? Otherwise, I'm afraid we will miss these comments here, and they will never be cleaned up. Same elsewhere in this file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2523210177 PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2523214882 From aseoane at openjdk.org Thu Nov 13 12:21:12 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Thu, 13 Nov 2025 12:21:12 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v6] In-Reply-To: References: Message-ID: > This PR introduces new IGV dumps, property fields and filters related to escape analysis information. > > The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. 
> > With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: > > - Node escape ?level?. > - Scalar replaceability. > - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). > > This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. > > Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. > > **Testing:** passes tiers 1-3, manual testing in IGV Anton Seoane Ampudia has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8356761' of github.com:anton-seoane/jdk into JDK-8356761 - Review comments: rename `verify` to more explicit `print_method` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28060/files - new: https://git.openjdk.org/jdk/pull/28060/files/0cde9c91..64bacdc2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28060&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28060&range=04-05 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28060.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28060/head:pull/28060 PR: https://git.openjdk.org/jdk/pull/28060 From aseoane at openjdk.org Thu Nov 13 12:21:14 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Thu, 13 Nov 2025 12:21:14 GMT Subject: RFR: 8356761: IGV: dump escape analysis information 
[v3] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 12:02:27 GMT, Roberto Castañeda Lozano wrote: >> I see, I looked at it from the callee-side which makes it suggest to perform some verification which it does not. Maybe @robcasloz can break the tie here :-) > > No strong opinions here, but if you ask me anyway I tend to agree with Christian that it would be a bit clearer to name the predicate for the effect it has. May I suggest to call it `print_method` in that case? (since `Compile::print_method` does more than IGV dumping). Makes sense. Changed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2523237143 From rcastanedalo at openjdk.org Thu Nov 13 12:28:57 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 13 Nov 2025 12:28:57 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v6] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 12:21:12 GMT, Anton Seoane Ampudia wrote: >> This PR introduces new IGV dumps, property fields and filters related to escape analysis information. >> >> The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. >> >> With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: >> >> - Node escape "level". >> - Scalar replaceability. >> - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)).
>> >> This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. >> >> Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. >> >> **Testing:** passes tiers 1-3, manual testing in IGV > > Anton Seoane Ampudia has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8356761' of github.com:anton-seoane/jdk into JDK-8356761 > - Review comments: rename `verify` to more explicit `print_method` Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28060#pullrequestreview-3459542329 From aseoane at openjdk.org Thu Nov 13 12:28:58 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Thu, 13 Nov 2025 12:28:58 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v6] In-Reply-To: References: Message-ID: <9BLlyiv2Cirdyh_CLwLvAhNu4lT-7sT_SaKjqMfEZ2g=.3794d3cf-a603-4515-9b79-2e6f08a8c30b@github.com> On Thu, 13 Nov 2025 12:21:12 GMT, Anton Seoane Ampudia wrote: >> This PR introduces new IGV dumps, property fields and filters related to escape analysis information. >> >> The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. 
>> >> With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: >> >> - Node escape ?level?. >> - Scalar replaceability. >> - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). >> >> This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. >> >> Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. >> >> **Testing:** passes tiers 1-3, manual testing in IGV > > Anton Seoane Ampudia has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8356761' of github.com:anton-seoane/jdk into JDK-8356761 > - Review comments: rename `verify` to more explicit `print_method` @dlunde took a quick look and "offline reviewed" the changes, suggesting that I change the `if (C->igv_printer() != nullptr)` checks to use `should_print_igv()`. This will make things more consistent with similar cases such as [JDK-8370569](https://bugs.openjdk.org/browse/JDK-8370569). 
Just wanted to give a heads-up as I will be adding some extra changes now ------------- PR Comment: https://git.openjdk.org/jdk/pull/28060#issuecomment-3527575625 From chagedorn at openjdk.org Thu Nov 13 12:48:02 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Nov 2025 12:48:02 GMT Subject: RFR: 8371789: C2: More explicit dump results for TypePtr In-Reply-To: References: Message-ID: <79SI3TPjmPgPVt7pavCYb1XtG_bYVhIYGFMNEIvA5rg=.b1f32ffc-6380-4745-bd8f-b79c5b824303@github.com> On Thu, 13 Nov 2025 11:50:46 GMT, Quan Anh Mai wrote: > Hi, > > This patch tries to clear up the dumped information of `TypePtr` and its subclasses. It makes it immediately clear the states of the `Type` object without us having to look into the implementation of `dump2`, for example, to know that the absence of `:NotNull` implies that it is a `BotPTR`. > > Please take a look and kindly review, thanks a lot. That looks like a nice readability improvement! Can you show some before vs. after output to summarize your changes? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28292#issuecomment-3527649423 From shade at openjdk.org Thu Nov 13 12:48:35 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Nov 2025 12:48:35 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 10:49:14 GMT, Aleksey Shipilev wrote: > I started this as investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test. > > At this point I believe we found that PhaseCCP does not reach the fix point for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. 
There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to `LoadN` in question. Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. > > It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. > > Testing shows there are no ill effects on correctness doing this. But I would appreciate someone more savvy in this code to sanity check all of this. > > Additional testing: > - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails > - [x] Linux x86_64 server fastdebug, `all` tests pass > - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) > If the type changes, then we will perform `n->Value(this)` twice before updating the type table. Yes, we do. Is your concern correctness or performance? I thought `Value()` is idempotent, so it is correct. For some nodes, computing `Value()` might take a while. I think we can make a helper method that commons the loops, let me try. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28288#issuecomment-3527652908 From chagedorn at openjdk.org Thu Nov 13 12:53:33 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Nov 2025 12:53:33 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v10] In-Reply-To: References: Message-ID: On Sun, 9 Nov 2025 09:34:52 GMT, Tobias Hotz wrote: >> This PR improves the value of interger division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. 
This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Move Test to compiler.igvn test/hotspot/jtreg/compiler/igvn/IntegerDivValueTests.java line 77: > 75: > 76: @Test > 77: @IR(failOn = {IRNode.DIV_I, IRNode.DIV_L, IRNode.URSHIFT_I, IRNode.URSHIFT_L, IRNode.RSHIFT_I, IRNode.RSHIFT_L, IRNode.MUL_I, IRNode.MUL_L, IRNode.ADD_I, IRNode.ADD_L, IRNode.SUB_I, IRNode.SUB_L, IRNode.AND_I, IRNode.AND_L}) Drive-by comment: Since you only care about the absence of these nodes, you can also use the generic `DIV`, `MUL`, `URSHIFT` etc. `IRNode` definitions (should exist for all your cases) to make it easier to read. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2523347465 From qamai at openjdk.org Thu Nov 13 12:57:19 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 13 Nov 2025 12:57:19 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 10:49:14 GMT, Aleksey Shipilev wrote: > I started this as investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test.
> > At this point I believe we found that PhaseCCP does not reach the fix point for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to `LoadN` in question. Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. > > It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. > > Testing shows there are no ill effects on correctness doing this. But I would appreciate someone more savvy in this code to sanity check all of this. > > Additional testing: > - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails > - [x] Linux x86_64 server fastdebug, `all` tests pass > - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) The concern is only about performance. I thought that if a node tries to look deep into the graph, then doing `Value` twice would be expensive. It might not be too concerning since the cases are not really common, so if refactoring hurts readability then staying with the current version would be fine. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28288#issuecomment-3527678466 From epeter at openjdk.org Thu Nov 13 13:04:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Nov 2025 13:04:02 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes In-Reply-To: References: Message-ID: <0OTQGgVWIugG7uVN8afIueHEiu_3yyGkSUCSsw4P0W8=.fd43b696-34f3-4013-a863-6b85b71ce7a1@github.com> On Thu, 13 Nov 2025 10:49:14 GMT, Aleksey Shipilev wrote: > I started this as investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test. > > At this point I believe we found that PhaseCCP does not reach the fix point for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to `LoadN` in question. Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. > > It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. > > Testing shows there are no ill effects on correctness doing this. But I would appreciate someone more savvy in this code to sanity check all of this. > > Additional testing: > - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails > - [x] Linux x86_64 server fastdebug, `all` tests pass > - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) 
@shipilev Nice work, looks like it was a big struggle with the reproducer. The randomness with the seed surely makes it more intermittent. src/hotspot/share/opto/phaseX.cpp line 2850: > 2848: // Add nodes here if particular *Node::Value is doing deep graph traversals > 2849: // not handled by push_more_uses. > 2850: bool PhaseCCP::needs_revisit(Node *n) const { Suggestion: bool PhaseCCP::needs_revisit(Node* n) const { src/hotspot/share/opto/phaseX.cpp line 2856: > 2854: } > 2855: // CmpPNode performs deep traversals if it compares oopptr. CmpP is not notified for changes far away. > 2856: if (n->Opcode() == Op_CmpP) { The verification restricts it to `n->Opcode() == Op_CmpP && type(n->in(1))->isa_oopptr() && type(n->in(2))->isa_oopptr()`. How big is the difference here? Might this have a performance impact? src/hotspot/share/opto/phaseX.cpp line 2875: > 2873: // We should either make sure that these nodes are properly added back to the CCP worklist > 2874: // in PhaseCCP::push_child_nodes_to_worklist() to update their type in the same round, > 2875: // or that they are added in PhaseCCP::maybe_needs_revisit() so that analysis revisits Suggestion: // or that they are added in PhaseCCP::needs_revisit() so that analysis revisits ------------- PR Review: https://git.openjdk.org/jdk/pull/28288#pullrequestreview-3459633236 PR Review Comment: https://git.openjdk.org/jdk/pull/28288#discussion_r2523348552 PR Review Comment: https://git.openjdk.org/jdk/pull/28288#discussion_r2523341454 PR Review Comment: https://git.openjdk.org/jdk/pull/28288#discussion_r2523343785 From aseoane at openjdk.org Thu Nov 13 13:08:55 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Thu, 13 Nov 2025 13:08:55 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v7] In-Reply-To: References: Message-ID: <5suodwTPhZZL7ePO1xcs66CmiGb8Vbpg4mxSR3b0zAw=.9582b117-3949-4796-ad5b-35df6af62fda@github.com> > This PR introduces new IGV dumps, property fields and filters related to 
escape analysis information. > > The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. > > With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: > > - Node escape "level". > - Scalar replaceability. > - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). > > This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. > > Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status.
> > **Testing:** passes tiers 1-3, manual testing in IGV Anton Seoane Ampudia has updated the pull request incrementally with two additional commits since the last revision: - Remove build files - Review comments: use `should_print_igv` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28060/files - new: https://git.openjdk.org/jdk/pull/28060/files/64bacdc2..311c7ff2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28060&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28060&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28060.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28060/head:pull/28060 PR: https://git.openjdk.org/jdk/pull/28060 From rcastanedalo at openjdk.org Thu Nov 13 13:08:58 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 13 Nov 2025 13:08:58 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v7] In-Reply-To: <5suodwTPhZZL7ePO1xcs66CmiGb8Vbpg4mxSR3b0zAw=.9582b117-3949-4796-ad5b-35df6af62fda@github.com> References: <5suodwTPhZZL7ePO1xcs66CmiGb8Vbpg4mxSR3b0zAw=.9582b117-3949-4796-ad5b-35df6af62fda@github.com> Message-ID: On Thu, 13 Nov 2025 13:04:49 GMT, Anton Seoane Ampudia wrote: >> This PR introduces new IGV dumps, property fields and filters related to escape analysis information. >> >> The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. >> >> With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping).
The dumps include several property fields, such as: >> >> - Node escape "level". >> - Scalar replaceability. >> - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). >> >> This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. >> >> Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. >> >> **Testing:** passes tiers 1-3, manual testing in IGV > > Anton Seoane Ampudia has updated the pull request incrementally with two additional commits since the last revision: > > - Remove build files > - Review comments: use `should_print_igv` Changes requested by rcastanedalo (Reviewer). make/hs_err_pid70657.log line 1: > 1: # Please remove. make/replay_pid70657.log line 1: > 1: version 2 Remove.
------------- PR Review: https://git.openjdk.org/jdk/pull/28060#pullrequestreview-3459688355 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2523384072 PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2523384498 From aseoane at openjdk.org Thu Nov 13 13:08:59 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Thu, 13 Nov 2025 13:08:59 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v7] In-Reply-To: References: <5suodwTPhZZL7ePO1xcs66CmiGb8Vbpg4mxSR3b0zAw=.9582b117-3949-4796-ad5b-35df6af62fda@github.com> Message-ID: On Thu, 13 Nov 2025 13:00:48 GMT, Roberto Castañeda Lozano wrote: >> Anton Seoane Ampudia has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove build files >> - Review comments: use `should_print_igv` > > make/hs_err_pid70657.log line 1: > >> 1: # > > Please remove. My bad. Realized as it was pushing. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28060#discussion_r2523396317 From shade at openjdk.org Thu Nov 13 13:24:37 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Nov 2025 13:24:37 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes [v2] In-Reply-To: References: Message-ID: > I started this as investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test. > > At this point I believe we found that PhaseCCP does not reach the fix point for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to `LoadN` in question.
Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. > > It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. > > Testing shows there are no ill effects on correctness doing this. But I would appreciate someone more savvy in this code to sanity check all of this. > > Additional testing: > - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails > - [x] Linux x86_64 server fastdebug, `all` tests pass > - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: - More restrictive CmpP check - Tighten up comments and signatures - Do Value() once ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28288/files - new: https://git.openjdk.org/jdk/pull/28288/files/bad9e384..adcfa567 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28288&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28288&range=00-01 Stats: 74 lines in 2 files changed: 33 ins; 32 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/28288.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28288/head:pull/28288 PR: https://git.openjdk.org/jdk/pull/28288 From shade at openjdk.org Thu Nov 13 13:24:39 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Nov 2025 13:24:39 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 12:53:53 GMT, Quan Anh Mai wrote: > I thought that if a node tries to look deep into the graph, then doing `Value` twice would be expensive. 
It might not be too concerning since the cases are not really common, so if refactoring hurts readability then staying with the current version would be fine. Right. Arguably, splitting the loops and introducing a helper method is more readable, see new commits. Testing it now... ------------- PR Comment: https://git.openjdk.org/jdk/pull/28288#issuecomment-3527793650 From shade at openjdk.org Thu Nov 13 13:24:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Nov 2025 13:24:42 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes [v2] In-Reply-To: <0OTQGgVWIugG7uVN8afIueHEiu_3yyGkSUCSsw4P0W8=.fd43b696-34f3-4013-a863-6b85b71ce7a1@github.com> References: <0OTQGgVWIugG7uVN8afIueHEiu_3yyGkSUCSsw4P0W8=.fd43b696-34f3-4013-a863-6b85b71ce7a1@github.com> Message-ID: On Thu, 13 Nov 2025 12:49:47 GMT, Emanuel Peter wrote: >> Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: >> >> - More restrictive CmpP check >> - Tighten up comments and signatures >> - Do Value() once > > src/hotspot/share/opto/phaseX.cpp line 2850: > >> 2848: // Add nodes here if particular *Node::Value is doing deep graph traversals >> 2849: // not handled by push_more_uses. >> 2850: bool PhaseCCP::needs_revisit(Node *n) const { > > Suggestion: > > bool PhaseCCP::needs_revisit(Node* n) const { Done. > src/hotspot/share/opto/phaseX.cpp line 2856: > >> 2854: } >> 2855: // CmpPNode performs deep traversals if it compares oopptr. CmpP is not notified for changes far away. >> 2856: if (n->Opcode() == Op_CmpP) { > > The verification restricts it to `n->Opcode() == Op_CmpP && type(n->in(1))->isa_oopptr() && type(n->in(2))->isa_oopptr()`. How big is the difference here? Might this have a performance impact? Honestly, no idea. I just wanted to have a conservative check, e.g. "We know `CmpP` does something fishy? We are going to revisit it." But it will make sense to keep C2 compilation fast. 
Let me try to add the oopptr checks and see if anything shows up in CTW. > src/hotspot/share/opto/phaseX.cpp line 2875: > >> 2873: // We should either make sure that these nodes are properly added back to the CCP worklist >> 2874: // in PhaseCCP::push_child_nodes_to_worklist() to update their type in the same round, >> 2875: // or that they are added in PhaseCCP::maybe_needs_revisit() so that analysis revisits > > Suggestion: > > // or that they are added in PhaseCCP::needs_revisit() so that analysis revisits Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28288#discussion_r2523433941 PR Review Comment: https://git.openjdk.org/jdk/pull/28288#discussion_r2523442427 PR Review Comment: https://git.openjdk.org/jdk/pull/28288#discussion_r2523434127 From aseoane at openjdk.org Thu Nov 13 13:26:54 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Thu, 13 Nov 2025 13:26:54 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v8] In-Reply-To: References: Message-ID: > This PR introduces new IGV dumps, property fields and filters related to escape analysis information. > > The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. > > With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: > > - Node escape "level". > - Scalar replaceability. > - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)).
> > This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. > > Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. > > **Testing:** passes tiers 1-3, manual testing in IGV Anton Seoane Ampudia has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains five new commits since the last revision: - Review comments: use `should_print_igv` - Merge branch 'JDK-8356761' of github.com:anton-seoane/jdk into JDK-8356761 - Review comments: whitespace fix Co-authored-by: Christian Hagedorn - Review comments: rename `verify` to more explicit `print_method` - Review comments: explicit null check Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28060/files - new: https://git.openjdk.org/jdk/pull/28060/files/311c7ff2..371f4e90 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28060&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28060&range=06-07 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28060.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28060/head:pull/28060 PR: https://git.openjdk.org/jdk/pull/28060 From bmaillard at openjdk.org Thu Nov 13 13:28:16 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 13 Nov 2025 13:28:16 GMT Subject: RFR: 8371674: C2 fails with Missed optimization opportunity in PhaseIterGVN for MoveL2D Message-ID: This PR addresses yet another missed optimization
in `PhaseIterGVN`. The way this optimization is triggered is a bit different this time though, and the notification is missing in `Node::has_special_unique_user`.

## Analysis

The affected optimization is the transformation of `MoveX2Y (LoadX mem)` into `LoadY mem`. This is implemented in `MoveNode::Ideal`. The optimization is as follows:

```c++
  // Fold reinterpret cast into memory operation:
  //    MoveX2Y (LoadX mem) => LoadY mem
  LoadNode* ld = in(1)->isa_Load();
  if (ld != nullptr && (ld->outcnt() == 1)) { // replace only
    const Type* rt = bottom_type();
    if (ld->has_reinterpret_variant(rt)) {
      if (phase->C->post_loop_opts_phase()) {
        return ld->convert_to_reinterpret_load(*phase, rt);
      } else {
        // attempt the transformation once loop opts are over
        phase->C->record_for_post_loop_opts_igvn(this);
      }
    }
  }
```

The optimization is triggered only if the input is a `LoadNode` and the `MoveNode` is its only user. This is a relatively unusual pattern. The bug was found by the fuzzer.

At some point during IGVN, we have the following subgraph:

    CountedLoop   LoadL
           \     /    \
             Phi     MoveL2D

In `RegionNode::Ideal`, we end up calling `set_req_X` on the `Phi` node to delete the edge from the `Phi` node to `LoadL`. As a result, the `LoadL` node only has one user left, and the `MoveNode::Ideal` gets triggered at the next verification pass.

## Proposed Solution

The solution is to add this particular case to `Node::has_special_unique_user`, which gets called by `Node::set_req_X`.

## Summary of changes

This PR brings the following changes:
- Detect the optimization pattern in `Node::has_special_unique_user`
- Add new test `TestMissingOptMoveX2YLoadX.java`, initially obtained from the fuzzer and then heavily reduced, both with the usual tools and manually. I tried to get a reproducer for each of the `Move` nodes, but I was only able to get one for `MoveL2D`

### Testing
- [x] [GitHub Actions](TODO) TODO
- [x] tier1-3, plus some internal testing (TODO)

Thank you for reviewing!
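The notification gap described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not HotSpot code: `ToyNode`, `special_unique_user` and `remove_use` are hypothetical names, opcodes are plain strings, and the worklist is a `std::set`. It shows the idea that when an edge removal leaves a load with exactly one remaining user, and that user is a `Move*` node, the user should be queued so its fold can be attempted.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a graph node: an opcode name plus its users.
struct ToyNode {
  std::string op;
  std::vector<ToyNode*> uses;
};

// Returns the single remaining user if `def` is a Load* whose only user is a
// Move* node, i.e. the shape in which a MoveX2Y(LoadX) fold becomes possible.
ToyNode* special_unique_user(const ToyNode& def) {
  if (def.uses.size() != 1) {
    return nullptr;
  }
  ToyNode* user = def.uses.front();
  const bool def_is_load = def.op.rfind("Load", 0) == 0;     // prefix check
  const bool user_is_move = user->op.rfind("Move", 0) == 0;  // prefix check
  return (def_is_load && user_is_move) ? user : nullptr;
}

// Removes the edge def -> user and, if that leaves a "special" unique user,
// pushes that user to the worklist so it gets another chance to be idealized.
void remove_use(ToyNode& def, ToyNode* user, std::set<ToyNode*>& worklist) {
  assert(user != nullptr);
  auto it = std::find(def.uses.begin(), def.uses.end(), user);
  if (it == def.uses.end()) {
    return;  // no such edge, nothing to do
  }
  def.uses.erase(it);
  if (ToyNode* unique = special_unique_user(def)) {
    worklist.insert(unique);
  }
}
```

With a `LoadL` used by both a `Phi` and a `MoveL2D`, calling `remove_use(load, &phi, worklist)` leaves the `MoveL2D` node on the worklist; without the check in `special_unique_user`, nothing would be queued and the fold would only be noticed by a later verification pass.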
------------- Commit messages: - Bring back array declaration - Add new reduced fuzzer test - Add notification in Node::has_special_unique_user Changes: https://git.openjdk.org/jdk/pull/28290/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28290&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371674 Stats: 64 lines in 2 files changed: 62 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28290.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28290/head:pull/28290 PR: https://git.openjdk.org/jdk/pull/28290 From rcastanedalo at openjdk.org Thu Nov 13 13:29:35 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 13 Nov 2025 13:29:35 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v8] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 13:26:54 GMT, Anton Seoane Ampudia wrote: >> This PR introduces new IGV dumps, property fields and filters related to escape analysis information. >> >> The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. >> >> With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as: >> >> - Node escape "level". >> - Scalar replaceability. >> - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). >> >> This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped.
After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. >> >> Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. >> >> **Testing:** passes tiers 1-3, manual testing in IGV > > Anton Seoane Ampudia has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains five new commits since the last revision: > > - Review comments: use `should_print_igv` > - Merge branch 'JDK-8356761' of github.com:anton-seoane/jdk into JDK-8356761 > - Review comments: whitespace fix > > Co-authored-by: Christian Hagedorn > - Review comments: rename `verify` to more explicit `print_method` > - Review comments: explicit null check > > Co-authored-by: Christian Hagedorn Thanks for the useful feature Antón! Please run some additional testing before integration. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28060#pullrequestreview-3459789551 From chagedorn at openjdk.org Thu Nov 13 13:31:41 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Nov 2025 13:31:41 GMT Subject: RFR: 8371121: compiler/whitebox/DeoptimizeRelocatedNMethod.java fails with C1 In-Reply-To: References: Message-ID: <274pwOR73-km5KxWsUktqBXacEMQ-9XWg9EewPMuN3E=.90c5229c-8f9f-433e-b28e-89ad713f8250@github.com> On Wed, 12 Nov 2025 00:59:54 GMT, Chad Rakoczy wrote: > [JDK-8371121](https://bugs.openjdk.org/browse/JDK-8371121) > > This update aims to improve the test's stability. A previous failure occurred because the method wasn't compiled at the time of the check. I believe this could have occurred due to a deoptimization but I have not been able to reproduce.
Previously, the test ensured compilation by repeatedly invoking the function. Instead, we now use Whitebox to add the method directly to the compile queue and wait for it to finish compiling. This approach should eliminate issues caused by deoptimization from function calls. test/hotspot/jtreg/compiler/whitebox/DeoptimizeRelocatedNMethod.java line 64: > 62: while (WHITE_BOX.isMethodQueuedForCompilation(method)) { > 63: Thread.onSpinWait(); > 64: } Thanks for tackling this. I have a concern that this can still fail when we remove the task from the queue to compile it, i.e. `isMethodQueuedForCompilation()` returns false, but we have not finished the compilation in the background and already check if it's compiled below. Can't we just re-add `-Xbatch` which makes `enqueueMethodForCompilation()` blocking until the method is compiled? Then you can remove the loop. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28246#discussion_r2523472797 From chagedorn at openjdk.org Thu Nov 13 13:34:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Nov 2025 13:34:13 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v8] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 13:26:54 GMT, Anton Seoane Ampudia wrote: >> This PR introduces new IGV dumps, property fields and filters related to escape analysis information. >> >> The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations. >> >> With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). 
The dumps include several property fields, such as: >> >> - Node escape "level". >> - Scalar replaceability. >> - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)). >> >> This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. >> >> Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. >> >> **Testing:** passes tiers 1-3, manual testing in IGV > > Anton Seoane Ampudia has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains five new commits since the last revision: > > - Review comments: use `should_print_igv` > - Merge branch 'JDK-8356761' of github.com:anton-seoane/jdk into JDK-8356761 > - Review comments: whitespace fix > > Co-authored-by: Christian Hagedorn > - Review comments: rename `verify` to more explicit `print_method` > - Review comments: explicit null check > > Co-authored-by: Christian Hagedorn Looks good, thanks for the updates! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28060#pullrequestreview-3459811199 From bmaillard at openjdk.org Thu Nov 13 13:35:51 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 13 Nov 2025 13:35:51 GMT Subject: RFR: 8369002: Extract the loop->is_member(get_loop(get_ctrl(node))) pattern in a new function [v3] In-Reply-To: <5WxrAw8WdUVcuKyNXY1YMED3vmFneSA4jzh5T62FedU=.2b30ee68-d76b-4352-b535-5c9e9ae45b82@github.com> References: <5WxrAw8WdUVcuKyNXY1YMED3vmFneSA4jzh5T62FedU=.2b30ee68-d76b-4352-b535-5c9e9ae45b82@github.com> Message-ID: On Thu, 13 Nov 2025 10:36:25 GMT, Anton Seoane Ampudia wrote: >> This PR adds a "shorthand" for the common `loop->is_member(get_loop(get_ctrl(node)))` pattern in loop optimizations. >> >> In PhaseIdealLoop, there is already an `is_member` function that checks if a node is a (nested) member of an IdealLoopTree. In a similar fashion, this changeset adds a `ctrl_is_member` that aims to simplify the common pattern of: >> >> Node* node_ctrl = get_ctrl(node); >> if (loop->is_member(get_loop(node_ctrl))) { ... } >> >> >> This hopes to provide a bit more readability and code conciseness in such a common operation. >> >> **Testing:** passes tiers 1-3 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Review comments: nit Marked as reviewed by bmaillard (Committer).
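For readers unfamiliar with the pattern quoted above, here is a minimal stand-alone sketch of what such a shorthand could look like. It is not the code from the PR: `ToyLoop`, `ToyNode` and `ToyLoopInfo` are hypothetical types, the maps stand in for the control and loop tables that `PhaseIdealLoop` maintains, and the `ctrl_is_member` signature shown here is an assumption.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical loop-tree entry: each loop only knows its enclosing loop.
struct ToyLoop {
  const ToyLoop* parent;  // nullptr for the outermost loop

  // True if `other` is this loop or is nested anywhere inside it.
  bool is_member(const ToyLoop* other) const {
    while (other != nullptr && other != this) {
      other = other->parent;
    }
    return other == this;
  }
};

struct ToyNode {
  int id;
};

class ToyLoopInfo {
 public:
  void set_ctrl(const ToyNode* n, const ToyNode* ctrl) { _ctrl[n] = ctrl; }
  void set_loop(const ToyNode* ctrl, const ToyLoop* loop) { _loop[ctrl] = loop; }

  const ToyNode* get_ctrl(const ToyNode* n) const { return _ctrl.at(n); }
  const ToyLoop* get_loop(const ToyNode* ctrl) const { return _loop.at(ctrl); }

  // The shorthand: is the control of `n` inside `loop` (or a nested loop)?
  bool ctrl_is_member(const ToyLoop* loop, const ToyNode* n) const {
    assert(loop != nullptr);
    return loop->is_member(get_loop(get_ctrl(n)));
  }

 private:
  std::unordered_map<const ToyNode*, const ToyNode*> _ctrl;  // node -> control
  std::unordered_map<const ToyNode*, const ToyLoop*> _loop;  // control -> loop
};
```

The benefit is mostly at call sites: a check such as `info.ctrl_is_member(loop, n)` replaces the two-step lookup and removes the chance of passing the data node itself to `get_loop` by mistake.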
------------- PR Review: https://git.openjdk.org/jdk/pull/28259#pullrequestreview-3459822128 From shade at openjdk.org Thu Nov 13 13:40:12 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Nov 2025 13:40:12 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes [v3] In-Reply-To: References: Message-ID: <7miya1xPfnEhuOs0Shzj9mcQ_ceuhttk40UbZTjnU7I=.43edf995-d866-4582-8a66-8607af2dd9df@github.com> > I started this as investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test. > > At this point I believe we found that PhaseCCP does not reach the fix point for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to `LoadN` in question. Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. > > It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. > > Testing shows there are no ill effects on correctness doing this. But I would appreciate someone more savvy in this code to sanity check all of this. > > Additional testing: > - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails > - [x] Linux x86_64 server fastdebug, `all` tests pass > - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) 
Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: More comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28288/files - new: https://git.openjdk.org/jdk/pull/28288/files/adcfa567..82cd8dae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28288&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28288&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28288.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28288/head:pull/28288 PR: https://git.openjdk.org/jdk/pull/28288 From qamai at openjdk.org Thu Nov 13 13:40:14 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 13 Nov 2025 13:40:14 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes [v2] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 13:24:37 GMT, Aleksey Shipilev wrote: >> I started this as investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test. >> >> At this point I believe we found that PhaseCCP does not reach the fix point for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to `LoadN` in question. Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. >> >> It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. 
This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. >> >> Testing shows there are no ill effects on correctness doing this. But I would appreciate someone more savvy in this code to sanity check all of this. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails >> - [x] Linux x86_64 server fastdebug, `all` tests pass >> - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) > > Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: > > - More restrictive CmpP check > - Tighten up comments and signatures > - Do Value() once Thanks, LGTM. Marked as reviewed by qamai (Committer). src/hotspot/share/opto/phaseX.cpp line 2799: > 2797: // This is the meat of CCP: pull from worklist; compute new value; push changes out. > 2798: > 2799: // Do the first round. It's worth noting that because we start with everything being `Type::TOP`, this round will visit all alive nodes in the graph. 
------------- PR Review: https://git.openjdk.org/jdk/pull/28288#pullrequestreview-3459814499 PR Review: https://git.openjdk.org/jdk/pull/28288#pullrequestreview-3459821035 PR Review Comment: https://git.openjdk.org/jdk/pull/28288#discussion_r2523484358 From shade at openjdk.org Thu Nov 13 13:40:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Nov 2025 13:40:16 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes [v2] In-Reply-To: References: Message-ID: <6KMCmhqI4qskN-ul2K83wQd_WVSTlet0LDwyei0g8Vk=.93badfcb-5afa-4f17-8fe3-33bcf2cea6ab@github.com> On Thu, 13 Nov 2025 13:31:16 GMT, Quan Anh Mai wrote: >> Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: >> >> - More restrictive CmpP check >> - Tighten up comments and signatures >> - Do Value() once > > src/hotspot/share/opto/phaseX.cpp line 2799: > >> 2797: // This is the meat of CCP: pull from worklist; compute new value; push changes out. >> 2798: >> 2799: // Do the first round. > > It's worth noting that because we start with everything being `Type::TOP`, this round will visit all alive nodes in the graph. Right, blurbed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28288#discussion_r2523498119 From shade at openjdk.org Thu Nov 13 14:03:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Nov 2025 14:03:22 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes [v3] In-Reply-To: <7miya1xPfnEhuOs0Shzj9mcQ_ceuhttk40UbZTjnU7I=.43edf995-d866-4582-8a66-8607af2dd9df@github.com> References: <7miya1xPfnEhuOs0Shzj9mcQ_ceuhttk40UbZTjnU7I=.43edf995-d866-4582-8a66-8607af2dd9df@github.com> Message-ID: On Thu, 13 Nov 2025 13:40:12 GMT, Aleksey Shipilev wrote: >> I started this as investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). 
The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test. >> >> At this point I believe we found that PhaseCCP does not reach the fix point for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to `LoadN` in question. Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. >> >> It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. >> >> Testing shows there are no ill effects on correctness doing this. But I would appreciate someone more savvy in this code to sanity check all of this. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails >> - [x] Linux x86_64 server fastdebug, `all` tests pass >> - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > More comments `hotspot_compiler` passes on current revision. CTW reproducer is still fixed. I am re-running larger CTW tests now. 
Feel free to start testing in your environments as well :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28288#issuecomment-3527966688 From fjiang at openjdk.org Thu Nov 13 14:28:05 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 13 Nov 2025 14:28:05 GMT Subject: RFR: 8371753: compiler/c2/cr7200264/TestIntVect.java fails IR verification In-Reply-To: References: Message-ID: <3BpQ-r1wlW3pdgH0cBszaEMaCDrpjD9LYiV7A25_wG4=.cac46c79-918d-4c48-924b-b0d4e73eb053@github.com> On Thu, 13 Nov 2025 02:48:01 GMT, Fei Yang wrote: > Hi, please consider this test-only change fixing an IR test failure. > > This test fails after https://bugs.openjdk.org/browse/JDK-8340093 which enabled IR matching for three vector nodes. > That relies on support for vector operations and will fail on platforms without that. This adds the necessary conditions > for applying this matching rule. This enables more IR matching in this test for RISC-V vector as well. > > Verified with fastdebug build on aarch64, x86_64 and riscv64 platforms. Thanks! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/28279#pullrequestreview-3460062251 From shade at openjdk.org Thu Nov 13 14:35:34 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Nov 2025 14:35:34 GMT Subject: RFR: 8371789: C2: More explicit dump results for TypePtr In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 11:50:46 GMT, Quan Anh Mai wrote: > Hi, > > This patch tries to clear up the dumped information of `TypePtr` and its subclasses. It makes it immediately clear the states of the `Type` object without us having to look into the implementation of `dump2`, for example, to know that the absence of `:NotNull` implies that it is a `BotPTR`. > > Please take a look and kindly review, thanks a lot. Oh, this nuisance tripped me _hard_ when I was looking at [JDK-8371581](https://bugs.openjdk.org/browse/JDK-8371581). 
Wanted to do something like this as the follow-up :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28292#issuecomment-3528104650 From shade at openjdk.org Thu Nov 13 14:38:36 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Nov 2025 14:38:36 GMT Subject: RFR: 8371804: C2: Tighten up LoadNode::Value comments after JDK-8346184 Message-ID: This confused me quite a bit in [JDK-8371581](https://bugs.openjdk.org/browse/JDK-8371581) investigations. With [JDK-8346184](https://bugs.openjdk.org/browse/JDK-8346184), we have moved the block in `LoadNode::Value` that produced bottom values for the block that "If we are loading from a freshly-allocated object, produce a zero, if the load is provably beyond the header of the object." This comment is misleading, and really relates to the old place, which actually returns zeroes. It would be better to clean this up to avoid further confusion. There should be no semantic change, only the cleanup. Additional testing: - [ ] GHA ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/28296/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28296&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371804 Stats: 15 lines in 1 file changed: 3 ins; 4 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/28296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28296/head:pull/28296 PR: https://git.openjdk.org/jdk/pull/28296 From dfenacci at openjdk.org Thu Nov 13 15:07:32 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 13 Nov 2025 15:07:32 GMT Subject: RFR: 8371419: IGV: Add view to visualise dominator tree and dominator information Message-ID: <6Vojoez34k5CHSLTQ-sSxRERAHEraT-OV9epmtS1s2E=.462bea29-750b-455e-a20e-4a223a601374@github.com> This change introduces a dominator tree view in IGV's CFG panel, enabling users to toggle between the control flow graph and the dominator tree.
This makes dominator relationships easier to inspect than the current stdout-based output (`-XX:+PrintDominators`). ## Motivation * Today, dominator information is difficult to access (e.g. via `-XX:+PrintDominators`, which is hard to read and correlate with the graph). * IGV already computes dominators for some phases but does not visualize them. * Comparing dominator trees across graphs/phases was not supported. ## What's New 1. Toggle in the CFG view (toolbar button) to switch between: * Control Flow Graph (CFG) * Dominator Tree 2. Dominator edge coloring to indicate provenance: * Blue: dominator info provided by C2 (from GCM phase onward for now, a follow-up RFE will handle loop optimization dominator information) * Red: dominator info computed by IGV (pre-GCM) 3. Graph comparison enhancements: * Compare dominator trees between graphs (new) * Compare CFG differences between graphs (previously missing) 4. Node annotations: * `idom`: immediate dominator * `dom_depth`: dominator depth * `block`: numeric block ID for all nodes in a block The resulting main view looks like this: [screenshot] ## Testing * Tier 1-3 * Manual testing in IGV ------------- Commit messages: - JDK-8371419: Update copyright year to 2025 - JDK-8371419: undo unnecessary variable definition - JDK-8371419: remove white line - JDK-8371419: remove processBlockDominatorConnection - JDK-8371419: add dominator info in scheduler - JDK-8371419: remove superfluous toString - JDK-8371419: IGV: Add view to visualise dominator tree and dominator information Changes: https://git.openjdk.org/jdk/pull/28293/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28293&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371419 Stats: 296 lines in 15 files changed: 275 ins; 3 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/28293.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28293/head:pull/28293 PR: https://git.openjdk.org/jdk/pull/28293 From kxu at openjdk.org Thu Nov
13 15:07:45 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 13 Nov 2025 15:07:45 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v21] In-Reply-To: <_hYdO3bmm-WHm4DDIQJGTS1fFhySq8EZTa2aQBk5D0o=.7f6cdf22-ed57-4888-a2bf-15b404708601@github.com> References: <_hYdO3bmm-WHm4DDIQJGTS1fFhySq8EZTa2aQBk5D0o=.7f6cdf22-ed57-4888-a2bf-15b404708601@github.com> Message-ID: On Thu, 13 Nov 2025 08:54:26 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> add missed minor changes > > Thanks for the (ongoing?) updates! Let me know when it's ready to be reviewed again. @chhagedorn It is ready for review. Please see https://github.com/openjdk/jdk/pull/24458#discussion_r2518462909 and https://github.com/openjdk/jdk/pull/24458#discussion_r2518474056. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3528224222 From qamai at openjdk.org Thu Nov 13 15:33:34 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 13 Nov 2025 15:33:34 GMT Subject: RFR: 8371789: C2: More explicit dump results for TypePtr In-Reply-To: <79SI3TPjmPgPVt7pavCYb1XtG_bYVhIYGFMNEIvA5rg=.b1f32ffc-6380-4745-bd8f-b79c5b824303@github.com> References: <79SI3TPjmPgPVt7pavCYb1XtG_bYVhIYGFMNEIvA5rg=.b1f32ffc-6380-4745-bd8f-b79c5b824303@github.com> Message-ID: On Thu, 13 Nov 2025 12:45:23 GMT, Christian Hagedorn wrote: >> Hi, >> >> This patch tries to clear up the dumped information of `TypePtr` and its subclasses. It makes it immediately clear the states of the `Type` object without us having to look into the implementation of `dump2`, for example, to know that the absence of `:NotNull` implies that it is a `BotPTR`. >> >> Please take a look and kindly review, thanks a lot. > > That looks like a nice readability improvement! Can you show some before vs. after output to summarize your changes? 
@chhagedorn Yes, for example: A byte array: Before: byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact * After: aryptr:byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact,iid=bot A `j.l.Object`: Before: narrowoop: java/lang/Object * After: narrowoop: instptr:java/lang/Object:BotPTR+0,iid=bot A pointer to the klass of `Object[]`: Before: precise [java/lang/Object: 0x00007011e800b840 * (java/lang/Cloneable,java/io/Serializable): :Constant:exact * After: aryklassptr:[instklassptr:java/lang/Object:NotNull+0 (java/lang/Cloneable,java/io/Serializable):Constant+0 ------------- PR Comment: https://git.openjdk.org/jdk/pull/28292#issuecomment-3528364010 From epeter at openjdk.org Thu Nov 13 15:46:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Nov 2025 15:46:55 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v2] In-Reply-To: References: Message-ID: On Thu, 2 Oct 2025 09:29:34 GMT, Roland Westrelin wrote: >> What if we just relax the assert? I failed to figure out what this assert is protecting us from by looking at the code. So what happens in a product build or when this assert is commented out? > >> What if we just relax the assert? I failed to figure out what this assert is protecting us from by looking at the code. So what happens in a product build or when this assert is commented out? > > For this particular test case, nothing. The assert is right before the cast nodes are removed, anyway. Once they are removed, the `AddP` in the chain all have the same base input. > The risk, I think, is if some code that transforms a chain of `AddP`s (some time before the assert) wrongly assume they all have the same base. It's also easier to write such a transformation if it's an invariant that a chain of `AddP`s have the same base (it's one less thing to worry about). @rwestrel I'm stepping through the reproducer myself now, and visualizing what happens. 
After Parsing, before Optimize: [image] After first loopopts, before its IGVN. We see that there was some duplication of the loop. Somehow, this introduces a second `CastPP`, and we get an orange and a yellow address subgraph. [image] Now, lots of interesting things happened in the loopopts cleanup (IGVN). 1. The yellow graph pushes the `488 Phi` through the `AddP`s. [image] 2. Then `371 Phi` is pushed through its `AddP`s. [image] This is an interesting state, because now we have a new `527 Phi` that merges the base (just the two `CastPP`), and the other new `528 Phi` merges the addresses. TODO continue ------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3528421673 From epeter at openjdk.org Thu Nov 13 16:18:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Nov 2025 16:18:31 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v2] In-Reply-To: References: Message-ID: On Thu, 2 Oct 2025 09:29:34 GMT, Roland Westrelin wrote: >> What if we just relax the assert? I failed to figure out what this assert is protecting us from by looking at the code. So what happens in a product build or when this assert is commented out? > >> What if we just relax the assert? I failed to figure out what this assert is protecting us from by looking at the code. So what happens in a product build or when this assert is commented out? > > For this particular test case, nothing. The assert is right before the cast nodes are removed, anyway. Once they are removed, the `AddP` in the chain all have the same base input. > The risk, I think, is if some code that transforms a chain of `AddP`s (some time before the assert) wrongly assume they all have the same base. It's also easier to write such a transformation if it's an invariant that a chain of `AddP`s have the same base (it's one less thing to worry about). 
@rwestrel > Then Phi#514 which has 2 CastPPs as input with identical inputs is transformed into another CastPP at the Phi control with the data control of the CastPP as input. PhiNode::unique_input() with uncast = true is where that happens. That's where things go wrong I think. Right, this is where we go from 2->3 `CastPP`. Every additional `CastPP` with the same input seems to be a liability. > The 2 CastPPs have the same data input but not same control and igvn can't common them. Do you know why we insert a new `CastPP` there, and why it is put not at the ctrl of the CastPP, but of the phi? I suppose the ctrl of the phi is correct, but we do lose information there, and that later prevents the `CastPP`s from being commoned. > The fix I propose is to delay the call to PhiNode::unique_input() with uncast = true if the Phi's inputs are cast nodes and have yet to be processed by igvn. This causes identical CastPPs to common, and only then is the Phi, which now has 2 identical inputs, transformed to that input (rather than have a new CastPP be created at a different control). Ok, so in this case, we prevent the phi from looking through the two CastPP and creating a new third one, because that would create a third CastPP with a new ctrl, that we cannot fold away later. Rather, we hope that the two CastPP get commoned first. Ok, it is starting to make sense to me now. @rwestrel Does what I describe match what you tried to explain so far? @rwestrel Do you think we could have the same assert also after every IGVN? That a chain of `AddP` all have the same base? I think that would be a nice addition to this fix here, and would strengthen the invariant. This could be part of `VerifyIterativeGVN`. 
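The invariant suggested above (every `AddP` in a chain shares one base) can be stated as a small walk over the address inputs. The following is a toy model only, assuming that a chained `AddP` hangs off the address input of the next one; the `Node` and `AddP` classes and the node names are hypothetical and are not the HotSpot data structures or the proposed `VerifyIterativeGVN` check.

```java
public class AddPChainSketch {
    // Toy stand-in for a C2 node; only object identity matters here.
    static class Node {
        final String name;

        Node(String name) {
            this.name = name;
        }
    }

    // Toy AddP (assumption): a base pointer, an address input that may itself be an AddP, and an offset.
    static final class AddP extends Node {
        final Node base;
        final Node address;
        final long offset;

        AddP(String name, Node base, Node address, long offset) {
            super(name);
            this.base = base;
            this.address = address;
            this.offset = offset;
        }
    }

    // Walk down the address inputs and require every AddP in the chain to share top's base.
    static boolean chainHasUniformBase(AddP top) {
        Node n = top.address;
        while (n instanceof AddP) {
            AddP inner = (AddP) n;
            if (inner.base != top.base) {
                return false;
            }
            n = inner.address;
        }
        return true;
    }

    // Builds a two-element chain whose outer AddP uses either the same cast or a second one as base.
    static boolean demo(boolean sameCast) {
        Node castA = new Node("CastPP#1");
        Node castB = new Node("CastPP#2");
        AddP inner = new AddP("AddP#inner", castA, castA, 16);
        AddP outer = new AddP("AddP#outer", sameCast ? castA : castB, inner, 8);
        return chainHasUniformBase(outer);
    }

    public static void main(String[] args) {
        System.out.println("same CastPP as base:      " + demo(true));
        System.out.println("different CastPP as base: " + demo(false));
    }
}
```

The second case mirrors the situation discussed in this thread: two `CastPP` nodes with the same data input but different identity make the chain fail the check until they are commoned.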
------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3528554194 PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3528567651 From roland at openjdk.org Thu Nov 13 16:20:26 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 13 Nov 2025 16:20:26 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v5] In-Reply-To: References: Message-ID: > In test cases, `mh` is initially not constant so the method handle > invoke can't be inlined. It is later found to be constant, so it can > be turned into a direct call by > `Compile::process_late_inline_calls_no_inline()`. In the meantime, the > `CallNode` for the mh invoke is cloned (by loop switching). In the > process, only a shallow copy of the `JVMState` for the call is > made. The initial `CallNode` is the first to be processed by > `Compile::process_late_inline_calls_no_inline()` and that causes that > `CallNode` to become dead. The cloned `CallNode` is then > processed. The `JVMState` for that one references the initial > `CallNode` in its caller's `JVMState`. Because that node is dead, that > causes a crash. The fix I propose is to make a deep copy of the > `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is > assigned to the node. > > The other failure I see with these tests is: > > > # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 > # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! > > > because even though the `CallNode` is cloned, there's still only one > late inline recorded. The fix here is to increment > `_number_of_mh_late_inlines` when the node is cloned. > > This was reported by the netty developers. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - review - Merge branch 'master' into JDK-8370939 - review - more - more - more - more - test - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28088/files - new: https://git.openjdk.org/jdk/pull/28088/files/7f796587..2cc796b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=03-04 Stats: 235209 lines in 1766 files changed: 151486 ins; 50188 del; 33535 mod Patch: https://git.openjdk.org/jdk/pull/28088.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28088/head:pull/28088 PR: https://git.openjdk.org/jdk/pull/28088 From roland at openjdk.org Thu Nov 13 16:20:30 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 13 Nov 2025 16:20:30 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v4] In-Reply-To: References: <-9uTmVk3XFV39gQjQp5NQsedrwYRUN2TVIaAOMB1pvA=.9819a01c-e05a-4569-a59c-0f90d3c4c161@github.com> Message-ID: On Thu, 6 Nov 2025 23:04:11 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/compile.hpp line 472: > >> 470: >> 471: int _late_inlines_pos; // Where in the queue should the next late inlining candidate go (emulate depth first inlining) >> 472: bool _has_mh_late_inlines; // Any method handle late inlining still pending? > > The comment is slightly misleading. As it is now, `_has_mh_late_inlines` signals that a late inline candidate has been observed during compilation. Right. Does the updated comment look better to you? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28088#discussion_r2524085242 From epeter at openjdk.org Thu Nov 13 16:24:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Nov 2025 16:24:48 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v2] In-Reply-To: References: Message-ID: On Thu, 2 Oct 2025 09:29:34 GMT, Roland Westrelin wrote: >> What if we just relax the assert? I failed to figure out what this assert is protecting us from by looking at the code. So what happens in a product build or when this assert is commented out? > >> What if we just relax the assert? I failed to figure out what this assert is protecting us from by looking at the code. So what happens in a product build or when this assert is commented out? > > For this particular test case, nothing. The assert is right before the cast nodes are removed, anyway. Once they are removed, the `AddP` in the chain all have the same base input. > The risk, I think, is if some code that transforms a chain of `AddP`s (some time before the assert) wrongly assume they all have the same base. It's also easier to write such a transformation if it's an invariant that a chain of `AddP`s have the same base (it's one less thing to worry about). @rwestrel I'm also not sure if the fix is complete, but as you say, it's not clear that it is not complete. I'm trying to think where it could fail. What if the CastPP above the phi are not yet on the worklist, because their inputs are only later going to change? But that would require CastPP to have different inputs/ctrl in the first place, and that probably cannot happen from unrolling the loop, or other similar operations, such as pre/main/post? Another thought: if it is so important that we common the just duplicated `CastPP` first, then maybe we really should take care not to duplicate them in the first place. Is that a feasible alternative approach? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3528601902 From bmaillard at openjdk.org Thu Nov 13 16:25:54 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 13 Nov 2025 16:25:54 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure Message-ID: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and to identify similar or related failures during bug triaging. In summary, this PR brings the following changes: - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. ### Example outputs #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL](https://bugs.openjdk.org/browse/JDK-8371534) Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message.

Before Missed Ideal optimization (can_reshape=false): The node was replaced by Ideal. Old node: dist dump --------------------------------------------- 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) The result after Ideal: dist dump --------------------------------------------- 1 337 ConL === 0 [[ 338 ]] #long:-9 1 336 URShiftL === _ 298 22 [[ 338 ]] 0 338 AndL === _ 336 337 [[ ]] Missed Ideal optimization (can_reshape=true): The node was replaced by Ideal. Old node: dist dump --------------------------------------------- 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) The result after Ideal: dist dump --------------------------------------------- 1 337 ConL === 0 [[ 338 340 ]] #long:-9 1 336 URShiftL === _ 298 22 [[ 338 340 ]] !orig=[339] 0 340 AndL === _ 336 337 [[ ]] # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/bmaillar/src/jdk/main/open/src/hotspot/share/opto/phaseX.cpp:1105), pid=1949581, tid=1949599 # assert(!failure) failed: Missed optimization opportunity in PhaseIterGVN # # JRE version: Java(TM) SE Runtime Environment (26.0) (fastdebug build 26-internal-bmaillar.open) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 26-internal-bmaillar.open, compiled mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x18897f9] PhaseIterGVN::verify_optimize()+0xd19 #
After Missed Ideal optimization (can_reshape=false): The node was replaced by Ideal. Old node: dist dump --------------------------------------------- 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) The result after Ideal: dist dump --------------------------------------------- 1 337 ConL === 0 [[ 338 ]] #long:-9 1 336 URShiftL === _ 298 22 [[ 338 ]] 0 338 AndL === _ 336 337 [[ ]] # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/bmaillar/src/jdk/JDK-8371536/open/src/hotspot/share/opto/phaseX.cpp:1096), pid=1950252, tid=1950270 # assert(!failure) failed: Missed Ideal optimization opportunity in PhaseIterGVN for URShiftL # # JRE version: Java(TM) SE Runtime Environment (26.0) (fastdebug build 26-internal-bmaillar.open) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 26-internal-bmaillar.open, compiled mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x1889a12] PhaseIterGVN::verify_optimize()+0x8b2 #
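The dumps above show a `URShiftL` of an `AndL` being replaced by an `AndL` of a `URShiftL` with a new constant mask. The arithmetic behind such a reordering can be checked in plain Java. This is a sketch of the identity only, not of the C2 transformation: the method names are made up, and the original mask (node 21) is not visible in the dump, so the mask values below are assumptions chosen to yield the `-9` constant that does appear.

```java
public class MaskShiftReorderSketch {
    // Shape of the old node in the dump: mask first, then unsigned shift.
    static long maskThenShift(long x, long mask, int shift) {
        return (x & mask) >>> shift;
    }

    // Reordered shape: shift first, then apply the logically shifted mask.
    static long shiftThenMask(long x, long mask, int shift) {
        return (x >>> shift) & (mask >>> shift);
    }

    // Variant with an arithmetically shifted mask. The extra high bits it may set are
    // harmless, because x >>> shift already has zeros in those positions.
    static long shiftThenSignExtendedMask(long x, long mask, int shift) {
        return (x >>> shift) & (mask >> shift);
    }

    // Checks that all three forms agree on a handful of inputs.
    static boolean allAgree(long mask, int shift) {
        long[] samples = {0L, 1L, -1L, 42L, -42L, Long.MIN_VALUE, Long.MAX_VALUE, 0x0123456789ABCDEFL};
        for (long x : samples) {
            long expected = maskThenShift(x, mask, shift);
            if (expected != shiftThenMask(x, mask, shift)
                    || expected != shiftThenSignExtendedMask(x, mask, shift)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // -18 >> 1 == -9, matching the ConL in the dump; -18 itself is an assumed mask.
        System.out.println("mask -18, shift 1: " + allAgree(-18L, 1));
        System.out.println("mask 0xFF00, shift 8: " + allAgree(0xFF00L, 8));
    }
}
```

Since all three forms are equivalent for shift amounts in the range 0 to 63, whichever form the optimizer canonicalizes to is a matter of which shape later optimizations prefer, not of correctness.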
#### [JDK-8371558: C2: Missing optimization opportunity in AbsNode::Ideal](https://bugs.openjdk.org/browse/JDK-8371558) Before, we get a confusing `Need to remove from hash before changing edges` assert. After, we get the actual cause of the failure as well as the name of the node in the assert itself.
Before Need to remove from hash before changing edges 435 SubI === _ 101 103 [[ 433 434 106 ]] !orig=[104] !jvms: TestMissingOptAbsZeroMinusX::testAbsI @ bci:16 (line 53) 434 AbsI === _ 435 [[ 433 ]] !orig=105 !jvms: TestMissingOptAbsZeroMinusX::testAbsI @ bci:20 (line 54) Set at i = 1 103 LoadI === _ 7 102 [[ 503 435 105 ]] @java/lang/Class (java/io/Serializable,java/lang/constant/Constable,java/lang/reflect/AnnotatedElement,java/lang/invoke/TypeDescriptor,java/lang/reflect/GenericDeclaration,java/lang/reflect/Type,java/lang/invoke/TypeDescriptor$OfField):exact+120 *, name=a, idx=4; #int !jvms: TestMissingOptAbsZeroMinusX::testAbsI @ bci:13 (line 53) # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/bmaillar/src/jdk/main/open/src/hotspot/share/opto/phaseX.cpp:3357), pid=1956852, tid=1956870 # assert(false) failed: Need to remove from hash before changing edges # # JRE version: Java(TM) SE Runtime Environment (26.0) (fastdebug build 26-internal-bmaillar.open) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 26-internal-bmaillar.open, compiled mode, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x1881b85] Node::set_req_X(unsigned int, Node*, PhaseIterGVN*)+0x1e5 #
After Missed Ideal optimization (can_reshape=false): The node was reshaped by Ideal. The result after Ideal: dist dump --------------------------------------------- 1 103 LoadI === _ 7 102 [[ 503 435 105 434 ]] @java/lang/Class (java/io/Serializable,java/lang/constant/Constable,java/lang/reflect/AnnotatedElement,java/lang/invoke/TypeDescriptor,java/lang/reflect/GenericDeclaration,java/lang/reflect/Type,java/lang/invoke/TypeDescriptor$OfField):exact+120 *, name=a, idx=4; #int !jvms: TestMissingOptAbsZeroMinusX::testAbsI @ bci:13 (line 53) 0 434 AbsI === _ 103 [[ 433 ]] !orig=105 !jvms: TestMissingOptAbsZeroMinusX::testAbsI @ bci:20 (line 54) # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/bmaillar/src/jdk/JDK-8371536/open/src/hotspot/share/opto/phaseX.cpp:1096), pid=1958068, tid=1958086 # assert(!failure) failed: Missed Ideal optimization opportunity in PhaseIterGVN for AbsI # # JRE version: Java(TM) SE Runtime Environment (26.0) (fastdebug build 26-internal-bmaillar.open) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 26-internal-bmaillar.open, compiled mode, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x1889a12] PhaseIterGVN::verify_optimize()+0x8b2 #
### Testing - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371536) - [x] tier1-4, plus some internal testing - [x] Manual testing on recent missed optimizations reproducers, including [JDK-8371558](https://bugs.openjdk.org/browse/JDK-8371558), [JDK-8371534](https://bugs.openjdk.org/browse/JDK-8371534) and [JDK-8371674](https://bugs.openjdk.org/browse/JDK-8371674) Thank you for reviewing! ------------- Commit messages: - Add comment for _table.hash_delete(n) - Change assert to print only the cause and the node name - Assert at first failure - Remove node from hash table before calling Ideal in verification Changes: https://git.openjdk.org/jdk/pull/28295/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371536 Stats: 29 lines in 1 file changed: 19 ins; 6 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28295/head:pull/28295 PR: https://git.openjdk.org/jdk/pull/28295 From roland at openjdk.org Thu Nov 13 16:33:28 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 13 Nov 2025 16:33:28 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop [v5] In-Reply-To: References: Message-ID: > In: > > > for (int i = 100; i < 1100; i++) { > v += floatArray[i - 100]; > Objects.checkIndex(i, longRange); > } > > > The int counted loop has both an int range check and a long range check. The > int range check is optimized first. Assertion predicates are inserted > above the loop. One predicate checks that: > > > init - 100 > > The loop is then transformed to enable the optimization of the long > range check. The loop is short running, so there's no need to create a > loop nest.
The counted loop is mostly left as is, but the loop's > bounds are changed from: > > > for (int i = 100; i < 1100; i++) { > > > to: > > > for (int i = 0; i < 1000; i++) { > > > The reason is that the long range check transformation expects the > loop to start at 0. > > Pre/main/post loops are created. Template Assertion predicates are > added above the main loop. The loop is unrolled. Initialized assertion > predicates are created. The one created from the condition: > > > init - 100 > > checks the value of `i` out of the pre loop which is 1. That check fails. > > The root cause of the failure is that when the bounds of the counted loop > are changed, template assertion predicates need to be updated with an > adjusted init input. > > When the bounds of the loop are known, the assertion predicates can be > updated in place. Otherwise, when the loop is speculated to be short > running, the assertion predicates are updated when they are cloned. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - review - Merge branch 'master' into JDK-8366888 - Merge branch 'master' into JDK-8366888 - whitespaces - review - Merge branch 'master' into JDK-8366888 - Update src/hotspot/share/opto/predicates.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/predicates.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - ...
and 2 more: https://git.openjdk.org/jdk/compare/b8119926...b0d7aab1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27250/files - new: https://git.openjdk.org/jdk/pull/27250/files/ce97c772..b0d7aab1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27250&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27250&range=03-04 Stats: 235208 lines in 1766 files changed: 151485 ins; 50188 del; 33535 mod Patch: https://git.openjdk.org/jdk/pull/27250.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27250/head:pull/27250 PR: https://git.openjdk.org/jdk/pull/27250 From roland at openjdk.org Thu Nov 13 16:33:34 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 13 Nov 2025 16:33:34 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop [v4] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 18:52:42 GMT, Benoît Maillard wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8366888 >> - whitespaces >> - review >> - Merge branch 'master' into JDK-8366888 >> - Update src/hotspot/share/opto/predicates.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/predicates.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - whitespaces >> - fix > > src/hotspot/share/opto/loopnode.cpp line 1196: > >> 1194: // for (int = 0; i < stop - start; i+= stride) { ... } >> 1195: // Template Assertion Predicates added so far were with an init value of start.
They need to be updated with the new >> 1196: // init value of 0: > > Not being super familiar with assertion predicates, I was a little bit confused at first. I would maybe add something along the lines of: > > Suggestion: > > // init value of 0. We want the OpaqueLoopInit node on the zero in order to be able to replace it when cloning the predicate. > > > But feel free to ignore if you think this is obvious. Thanks for having a look at this. Does the updated comment look good to you? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27250#discussion_r2524131208 From mdoerr at openjdk.org Thu Nov 13 16:56:36 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 13 Nov 2025 16:56:36 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation Message-ID: This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. 
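A minimal sketch of the idea in the AES description above (skipping the inverse round key computation when a cipher is only used for encryption) might look as follows. This is not the code from the PR: the class `KeyScheduleSketch`, its fields, and the toy `expand`/`invert` helpers are hypothetical stand-ins, and the "expansion" here is deliberately not the real AES key schedule.

```java
/**
 * Hypothetical illustration only: names and logic are invented for this
 * sketch and are not taken from the JDK sources or from the PR.
 */
public class KeyScheduleSketch {
    final int[] roundKeys;     // encryption round keys, always computed
    final int[] invRoundKeys;  // inverse round keys, null when only encrypting

    KeyScheduleSketch(byte[] key, boolean decrypting) {
        roundKeys = expand(key);
        // The saving: the inverse schedule is derived only if decryption is requested.
        invRoundKeys = decrypting ? invert(roundKeys) : null;
    }

    /** Toy stand-in for key expansion; NOT the real AES key schedule. */
    private static int[] expand(byte[] key) {
        int[] w = new int[key.length];
        for (int i = 0; i < key.length; i++) {
            w[i] = (key[i] & 0xff) ^ (31 * i);
        }
        return w;
    }

    /** Toy stand-in for the inverse schedule: the round keys in reverse order. */
    private static int[] invert(int[] enc) {
        int[] dec = new int[enc.length];
        for (int i = 0; i < enc.length; i++) {
            dec[i] = enc[enc.length - 1 - i];
        }
        return dec;
    }

    public static void main(String[] args) {
        KeyScheduleSketch enc = new KeyScheduleSketch(new byte[16], false);
        KeyScheduleSketch dec = new KeyScheduleSketch(new byte[16], true);
        System.out.println(enc.invRoundKeys == null); // prints true
        System.out.println(dec.invRoundKeys.length);  // prints 16
    }
}
```

Whether the actual change decides this at initialization time, as assumed here, or in some other way is not stated in the mail; the sketch only shows why re-initializing an encrypt-only cipher can get cheaper.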
------------- Commit messages: - 8371820: Further AES performance improvements for key schedule generation Changes: https://git.openjdk.org/jdk/pull/28299/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28299&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371820 Stats: 33 lines in 2 files changed: 12 ins; 8 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/28299.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28299/head:pull/28299 PR: https://git.openjdk.org/jdk/pull/28299 From mdoerr at openjdk.org Thu Nov 13 16:56:37 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 13 Nov 2025 16:56:37 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 16:48:28 GMT, Martin Doerr wrote: > This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. > > The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. @smemery: I've seen your recent improvements and performance measurements. It would be great if you could take a look at this proposal and check the performance results in your environment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28299#issuecomment-3528746436 From shade at openjdk.org Thu Nov 13 19:04:56 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Nov 2025 19:04:56 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes [v4] In-Reply-To: References: Message-ID: > I started this as investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test. 
> > At this point I believe we found that PhaseCCP does not reach the fix point for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to `LoadN` in question. Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. > > It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. > > Testing shows there are no ill effects on correctness doing this. But I would appreciate someone more savvy in this code to sanity check all of this. > > Additional testing: > - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails > - [x] Linux x86_64 server fastdebug, `all` tests pass > - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Merge branch 'master' into JDK-8371581-ccp-spooky-nodes - More comments - More restrictive CmpP check - Tighten up comments and signatures - Do Value() once - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28288/files - new: https://git.openjdk.org/jdk/pull/28288/files/82cd8dae..748b32eb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28288&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28288&range=02-03 Stats: 171409 lines in 1004 files changed: 116469 ins; 26310 del; 28630 mod Patch: https://git.openjdk.org/jdk/pull/28288.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28288/head:pull/28288 PR: https://git.openjdk.org/jdk/pull/28288 From vpaprotski at openjdk.org Thu Nov 13 19:40:08 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Thu, 13 Nov 2025 19:40:08 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 16:38:49 GMT, Volodymyr Paprotski wrote: > - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline > - `SignatureBench.MLDSA` is 1.2x-2.2x faster > - Note: there are no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) > - AVX512 intrinsic improvements are 1.24x-1.5x faster than the current version > - `SignatureBench.MLDSA` is up to 5% faster, never slower > > Note on intrinsic: > - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill.
> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 > > Tests and benchmarks: > - Added a fuzz test to ensure Java and intrinsic produces exactly same result > - Added benchmark to measure the performance of intrinsic itself > > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" @ferakocz @ascarpino when you can spare some time, would appreciate a review (would like to get this into 26 if possible..) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3529414018 From psandoz at openjdk.org Thu Nov 13 19:51:03 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 13 Nov 2025 19:51:03 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer In-Reply-To: <8hStIcvp252Ik7raxZL5BvFKKkXTflorjyOD9Cyakvc=.c5d1b302-5c49-46b1-91ba-2feda2e6a746@github.com> References: <8hStIcvp252Ik7raxZL5BvFKKkXTflorjyOD9Cyakvc=.c5d1b302-5c49-46b1-91ba-2feda2e6a746@github.com> Message-ID: On Thu, 13 Nov 2025 09:25:34 GMT, Jatin Bhateja wrote: > > The basic type codes are declared and shared across Java and HotSpot - it's used in `LaneType`. Can we pass a single argument that is the basic type instead of two arguments. 
HotSpot should know from the basic type what the carrier class and the operation type are without being explicitly told, since presumably it knew the inverse - the basic type from the element class. > > Hi @PaulSandoz, T_HALFFLOAT used in LaneType is mainly used for differentiation of various cache keys used by conversion operation lookups. In principle, we can extend the VM to acknowledge this new custom basic type along the lines of T_METADATA / T_ADDRESS; its scope for now will be restricted to VectorSupport. We can gradually expose this to C2 type, such that TypeVect for all Float16 VectorIR uses T_HALFFLOAT as its basic type; currently, we use T_SHORT as the lane type. Let me know if this looks reasonable. I am proposing something simpler, really as a temporary step until `Float16` becomes part of the `java.base` module. IIUC from the basic type we can reliably determine what the two arguments we are currently passing are, e.g., T_HALFFLOAT = { short.class, VECTOR_TYPE_FP16 }. So we don't need to pass two arguments, we can just pass one, and the intrinsic can look up the class and operation type kind. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3529452461 From kvn at openjdk.org Thu Nov 13 20:18:10 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 13 Nov 2025 20:18:10 GMT Subject: RFR: 8356761: IGV: dump escape analysis information [v6] In-Reply-To: <9BLlyiv2Cirdyh_CLwLvAhNu4lT-7sT_SaKjqMfEZ2g=.3794d3cf-a603-4515-9b79-2e6f08a8c30b@github.com> References: <9BLlyiv2Cirdyh_CLwLvAhNu4lT-7sT_SaKjqMfEZ2g=.3794d3cf-a603-4515-9b79-2e6f08a8c30b@github.com> Message-ID: On Thu, 13 Nov 2025 12:25:22 GMT, Anton Seoane Ampudia wrote: >> Anton Seoane Ampudia has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available.
> > @dlunde took a quick look and "offline reviewed" the changes, suggesting that I change the `if (C->igv_printer() != nullptr)` checks to use `should_print_igv()`. This will make things more consistent with similar cases such as [JDK-8370569](https://bugs.openjdk.org/browse/JDK-8370569). Just wanted to give a heads-up as I will be adding some extra changes now. Very nice feature. Thank you for doing it @anton-seoane. Maybe we should consider dumping the Connection graph too in the future so we can debug it during its transformations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28060#issuecomment-3529551290 From duke at openjdk.org Thu Nov 13 21:17:08 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 13 Nov 2025 21:17:08 GMT Subject: RFR: 8371121: compiler/whitebox/DeoptimizeRelocatedNMethod.java fails with C1 [v2] In-Reply-To: References: Message-ID: > [JDK-8371121](https://bugs.openjdk.org/browse/JDK-8371121) > > This update aims to improve the test's stability. A previous failure occurred because the method wasn't compiled at the time of the check. I believe this could have occurred due to a deoptimization but I have not been able to reproduce it. Previously, the test ensured compilation by repeatedly invoking the function. Instead, we now use Whitebox to add the method directly to the compile queue and wait for it to finish compiling. This approach should eliminate issues caused by deoptimization from function calls.
Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: - Fix comment - Block on comp instead ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28246/files - new: https://git.openjdk.org/jdk/pull/28246/files/8d761d15..0cba5fc7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28246&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28246&range=00-01 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28246.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28246/head:pull/28246 PR: https://git.openjdk.org/jdk/pull/28246 From vlivanov at openjdk.org Thu Nov 13 23:06:04 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 13 Nov 2025 23:06:04 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v8] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 06:51:38 GMT, Quan Anh Mai wrote: >> Hi, >> >> The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. `ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`. This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > change the early return condition Looks good. 
Testing results (hs-tier1 - hs-tier4) are clean. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25284#pullrequestreview-3462060162 From fyang at openjdk.org Fri Nov 14 01:13:20 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Nov 2025 01:13:20 GMT Subject: Integrated: 8371753: compiler/c2/cr7200264/TestIntVect.java fails IR verification In-Reply-To: References: Message-ID: <4xlljM7qgqX5N3P845JXwXKGElOfWSb2ldOHXlmXiXw=.ef419ac9-8032-4695-a260-7eb274291416@github.com> On Thu, 13 Nov 2025 02:48:01 GMT, Fei Yang wrote: > Hi, please consider this test-only change fixing an IR test failure. > > This test fails after https://bugs.openjdk.org/browse/JDK-8340093 which enabled IR matching for three vector nodes. > That relies on support for vector operations and will fail on platforms without that. This adds the necessary conditions > for applying this matching rule. This enables more IR matching in this test for RISC-V vector as well. > > Verified with fastdebug build on aarch64, x86_64 and riscv64 platforms. This pull request has now been integrated. Changeset: eaddefb4 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/eaddefb475c6431821c2d62baf550ba2c5f357bf Stats: 19 lines in 1 file changed: 1 ins; 0 del; 18 mod 8371753: compiler/c2/cr7200264/TestIntVect.java fails IR verification Reviewed-by: chagedorn, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/28279 From fyang at openjdk.org Fri Nov 14 01:13:20 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Nov 2025 01:13:20 GMT Subject: RFR: 8371753: compiler/c2/cr7200264/TestIntVect.java fails IR verification In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 08:10:19 GMT, Christian Hagedorn wrote: >> Hi, please consider this test-only change fixing an IR test failure. >> >> This test fails after https://bugs.openjdk.org/browse/JDK-8340093 which enabled IR matching for three vector nodes. 
>> That relies on support for vector operations and will fail on platforms without that. This adds the necessary conditions >> for applying this matching rule. This enables more IR matching in this test for RISC-V vector as well. >> >> Verified with fastdebug build on aarch64, x86_64 and riscv64 platforms. > > Looks good! @chhagedorn @feilongjiang : Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28279#issuecomment-3530382725 From duke at openjdk.org Fri Nov 14 01:26:03 2025 From: duke at openjdk.org (erifan) Date: Fri, 14 Nov 2025 01:26:03 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns Message-ID: `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code; otherwise, code will be generated to extend or narrow the mask. This IR node is not free, whether or not it generates code, because it may block some optimizations. For example: 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevents the following optimization: `(VectorStoreMask (VectorLoadMask x)) => (x)` 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast.
The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first node that is not a `VectorMaskCastNode`. The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. Current optimizations related to `VectorMaskCastNode` include: 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. This PR does the following optimizations: 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast ... (VectorMaskCast x))) => (x)`. This is correct as long as the types of the head and tail `VectorMaskCastNode` are consistent. 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vector, it remains unchanged as long as the vector length remains the same, and this is guaranteed at the API level. I conducted some simple research on different mask generation methods and mask operations, and obtained the following table, which includes some potential optimization opportunities that may use this `uncast_mask` function.
mask_gen/op | toLong | anyTrue | allTrue | trueCount | firstTrue | lastTrue | and | or | xor | andNot | not | laneIsSet
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
compare | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | TBI | N/A
maskAll | TBI | TBI | TBI | TBI | TBI | TBI | TBI | TBI | TBI | TBI | TBI | TBI
fromLong | TBI | TBI | N/A | TBI | TBI | TBI | N/A | N/A | N/A | N/A | TBI | TBI

`TBI` indicates that there may be potential optimizations here that require further investigation.

Benchmarks:

On an Nvidia Grace machine with 128-bit SVE2:

Benchmark | Unit | Before | Error | After | Error | Uplift
-- | -- | -- | -- | -- | -- | --
microMaskLoadCastStoreByte64 | ops/us | 59.23 | 0.21 | 148.12 | 0.07 | 2.50
microMaskLoadCastStoreDouble128 | ops/us | 2.43 | 0.00 | 38.31 | 0.01 | 15.73
microMaskLoadCastStoreFloat128 | ops/us | 6.19 | 0.00 | 75.67 | 0.11 | 12.22
microMaskLoadCastStoreInt128 | ops/us | 6.19 | 0.00 | 75.67 | 0.03 | 12.22
microMaskLoadCastStoreLong128 | ops/us | 2.43 | 0.00 | 38.32 | 0.01 | 15.74
microMaskLoadCastStoreShort64 | ops/us | 28.89 | 0.02 | 75.60 | 0.09 | 2.62

On an Nvidia Grace machine with 128-bit NEON:

Benchmark | Unit | Before | Error | After | Error | Uplift
-- | -- | -- | -- | -- | -- | --
microMaskLoadCastStoreByte64 | ops/us | 75.75 | 0.19 | 149.74 | 0.08 | 1.98
microMaskLoadCastStoreDouble128 | ops/us | 8.71 | 0.03 | 38.71 | 0.05 | 4.44
microMaskLoadCastStoreFloat128 | ops/us | 24.05 | 0.03 | 76.49 | 0.05 | 3.18
microMaskLoadCastStoreInt128 | ops/us | 24.06 | 0.02 | 76.51 | 0.05 | 3.18
microMaskLoadCastStoreLong128 | ops/us | 8.72 | 0.01 | 38.71 | 0.02 | 4.44
microMaskLoadCastStoreShort64 | ops/us | 24.64 | 0.01 | 76.43 | 0.06 | 3.10

On an AMD EPYC 9124 16-Core Processor with AVX3:

Benchmark | Unit | Before | Error | After | Error | Uplift
-- | -- | -- | -- | -- | -- | --
microMaskLoadCastStoreByte64 | ops/us | 82.13 | 0.31 | 115.14 | 0.08 | 1.40
microMaskLoadCastStoreDouble128 | ops/us | 0.32 | 0.00 | 0.32 | 0.00 | 1.01
microMaskLoadCastStoreFloat128 | ops/us | 42.18 | 0.05 | 57.56 | 0.07 | 1.36
microMaskLoadCastStoreInt128 | ops/us | 42.19 | 0.01 | 57.53 | 0.08 | 1.36
microMaskLoadCastStoreLong128 | ops/us | 0.30 | 0.01 | 0.32 | 0.00 | 1.05
microMaskLoadCastStoreShort64 | ops/us | 42.18 | 0.05 | 57.59 | 0.01 | 1.37

On an AMD EPYC 9124 16-Core Processor with AVX2:

Benchmark | Unit | Before | Error | After | Error | Uplift
-- | -- | -- | -- | -- | -- | --
microMaskLoadCastStoreByte64 | ops/us | 73.53 | 0.20 | 114.98 | 0.03 | 1.56
microMaskLoadCastStoreDouble128 | ops/us | 0.29 | 0.01 | 0.30 | 0.00 | 1.00
microMaskLoadCastStoreFloat128 | ops/us | 30.78 | 0.14 | 57.50 | 0.01 | 1.87
microMaskLoadCastStoreInt128 | ops/us | 30.65 | 0.26 | 57.50 | 0.01 | 1.88
microMaskLoadCastStoreLong128 | ops/us | 0.30 | 0.00 | 0.30 | 0.00 | 0.99
microMaskLoadCastStoreShort64 | ops/us | 24.92 | 0.00 | 57.49 | 0.01 | 2.31

On an AMD EPYC 9124 16-Core Processor with AVX1:

Benchmark | Unit | Before | Error | After | Error | Uplift
-- | -- | -- | -- | -- | -- | --
microMaskLoadCastStoreByte64 | ops/us | 79.68 | 0.01 | 248.49 | 0.91 | 3.12
microMaskLoadCastStoreDouble128 | ops/us | 0.28 | 0.00 | 0.28 | 0.00 | 1.00
microMaskLoadCastStoreFloat128 | ops/us | 31.11 | 0.04 | 95.48 | 2.27 | 3.07
microMaskLoadCastStoreInt128 | ops/us | 31.10 | 0.03 | 99.94 | 1.87 | 3.21
microMaskLoadCastStoreLong128 | ops/us | 0.28 | 0.00 | 0.28 | 0.00 | 0.99
microMaskLoadCastStoreShort64 | ops/us | 31.11 | 0.02 | 94.97 | 2.30 | 3.05

This PR was tested on 128-bit, 256-bit, and 512-bit (QEMU) aarch64 environments, and two 512-bit x64 machines. With various configurations, including sve2, sve1, neon, avx3, avx2, avx1, sse4 and sse3, all tests passed.
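To make the `uncast_mask` idea from this description concrete, here is a small toy model of the pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. It is only a sketch under stated assumptions: the real change lives in C2's C++ sources, and the `Node` class, `uncastMask`, and `idealStoreMask` below are hypothetical simplifications (single-input nodes, a plain lane count) rather than HotSpot code.

```java
public class UncastMaskSketch {
    /** Hypothetical, simplified stand-in for a C2 IR node (not a HotSpot class). */
    static class Node {
        final String op;  // e.g. "VectorLoadMask", "VectorMaskCast"
        final Node in;    // single input, null for leaves
        final int lanes;  // vector length in lanes

        Node(String op, Node in, int lanes) {
            this.op = op;
            this.in = in;
            this.lanes = lanes;
        }
    }

    /** Skips a chain of consecutive VectorMaskCast nodes; returns the first non-cast node. */
    static Node uncastMask(Node n) {
        while (n != null && n.op.equals("VectorMaskCast")) {
            n = n.in;
        }
        return n;
    }

    /**
     * Models (VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => x.
     * The rewrite is applied only when the lane count is unchanged;
     * otherwise the original node is returned untouched.
     */
    static Node idealStoreMask(Node store) {
        if (!store.op.equals("VectorStoreMask")) {
            return store;
        }
        Node src = uncastMask(store.in);
        if (src != null && src.op.equals("VectorLoadMask") && src.lanes == store.lanes) {
            return src.in;
        }
        return store;
    }

    public static void main(String[] args) {
        Node x = new Node("BoolVector", null, 4);
        Node load = new Node("VectorLoadMask", x, 4);
        Node cast1 = new Node("VectorMaskCast", load, 4);
        Node cast2 = new Node("VectorMaskCast", cast1, 4);
        Node store = new Node("VectorStoreMask", cast2, 4);
        System.out.println(idealStoreMask(store) == x); // prints true
    }
}
```

The lane-count check mirrors the condition stated in the description (the boolean vector is unchanged as long as the vector length stays the same); any further conditions the real implementation may need are not modeled here.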
------------- Commit messages: - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns Changes: https://git.openjdk.org/jdk/pull/28313/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370863 Stats: 582 lines in 7 files changed: 498 ins; 0 del; 84 mod Patch: https://git.openjdk.org/jdk/pull/28313.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28313/head:pull/28313 PR: https://git.openjdk.org/jdk/pull/28313 From dlong at openjdk.org Fri Nov 14 02:37:03 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 14 Nov 2025 02:37:03 GMT Subject: RFR: 8366076: arm32: Fix register allocation for vector instructions In-Reply-To: <17J8mScwi2eBCPmmmTJd0ittihe0BfqZYuPgC638L8Q=.6e87a120-905f-4a30-a6f0-7e80fd613144@github.com> References: <17J8mScwi2eBCPmmmTJd0ittihe0BfqZYuPgC638L8Q=.6e87a120-905f-4a30-a6f0-7e80fd613144@github.com> Message-ID: <11z84H0pSO4eduTEEVcUelci_1MxZMimuwouswlt8W0=.a0d59c62-092c-4620-b4c2-c2ff62423c4e@github.com> On Wed, 3 Sep 2025 12:55:40 GMT, Max Verevkin wrote: > Arm32 has 32 double-precision floating point registers, the first 16 of which coincide with the 32 single-precision floating point registers. Some vector-operation nodes were implemented in terms of scalar instructions, which only really works for the first 16 doubles. This commit addresses that. src/hotspot/cpu/arm/arm_32.ad line 330: > 328: R_S16,R_S17,R_S18,R_S19, R_S20,R_S21,R_S22,R_S23, > 329: R_S24,R_S25,R_S26,R_S27, R_S28,R_S29,R_S30,R_S31); > 330: Isn't this the same as dflt_low_reg? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27071#discussion_r2525562386 From kvn at openjdk.org Fri Nov 14 03:15:10 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 14 Nov 2025 03:15:10 GMT Subject: RFR: 8371804: C2: Tighten up LoadNode::Value comments after JDK-8346184 In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 14:30:10 GMT, Aleksey Shipilev wrote: > This confused me quite a bit in [JDK-8371581](https://bugs.openjdk.org/browse/JDK-8371581) investigations. > > With [JDK-8346184](https://bugs.openjdk.org/browse/JDK-8346184), we have moved the block in `LoadNode::Value` that produced bottom values for the block that "If we are loading from a freshly-allocated object, produce a zero, if the load is provably beyond the header of the object." This comment is misleading, and really relates to the old place, which actually returns zeroes. > > It would be better to clean this up to avoid further confusion. There should be no semantic change, only the cleanup. > > Additional testing: > - [ ] GHA Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28296#pullrequestreview-3462580358 From vlivanov at openjdk.org Fri Nov 14 03:55:10 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 14 Nov 2025 03:55:10 GMT Subject: RFR: 8371804: C2: Tighten up LoadNode::Value comments after JDK-8346184 In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 14:30:10 GMT, Aleksey Shipilev wrote: > This confused me quite a bit in [JDK-8371581](https://bugs.openjdk.org/browse/JDK-8371581) investigations. > > With [JDK-8346184](https://bugs.openjdk.org/browse/JDK-8346184), we have moved the block in `LoadNode::Value` that produced bottom values for the block that "If we are loading from a freshly-allocated object, produce a zero, if the load is provably beyond the header of the object." This comment is misleading, and really relates to the old place, which actually returns zeroes. 
> > It would be better to clean this up to avoid further confusion. There should be no semantic change, only the cleanup. > > Additional testing: > - [ ] GHA Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28296#pullrequestreview-3462663637 From duke at openjdk.org Fri Nov 14 04:57:08 2025 From: duke at openjdk.org (Harshit470250) Date: Fri, 14 Nov 2025 04:57:08 GMT Subject: RFR: 8370920: [s390] C2: add instruction size in s390.ad file [v3] In-Reply-To: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> References: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> Message-ID: > This pr adds the size of the match rule nodes. > > There were a lot of nodes for which the size was variable, for those node I have taken the maximum possible size. Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge remote-tracking branch 'origin/master' - remove whitespace - Resolved a bug - remove TODO comments - final size added - Final sizes - oop_decoder and load_const_optimized - error fix and added more sizes - ... 
and 3 more: https://git.openjdk.org/jdk/compare/12e6eb88...91ea135e ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28054/files - new: https://git.openjdk.org/jdk/pull/28054/files/4e02e366..91ea135e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=01-02 Stats: 180565 lines in 1256 files changed: 122678 ins; 28278 del; 29609 mod Patch: https://git.openjdk.org/jdk/pull/28054.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28054/head:pull/28054 PR: https://git.openjdk.org/jdk/pull/28054 From aseoane at openjdk.org Fri Nov 14 07:09:29 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Fri, 14 Nov 2025 07:09:29 GMT Subject: RFR: 8356761: IGV: dump escape analysis information In-Reply-To: <8j_40zCPi1joR0SAU9PtcIQGRSBe1eSCYUqDRpyS8Ts=.f6681026-f9f1-44bd-8e62-f68526f45d5d@github.com> References: <8j_40zCPi1joR0SAU9PtcIQGRSBe1eSCYUqDRpyS8Ts=.f6681026-f9f1-44bd-8e62-f68526f45d5d@github.com> Message-ID: On Thu, 6 Nov 2025 06:02:59 GMT, Roberto Casta?eda Lozano wrote: >> Nice improvement! I have not reviewed this PR, yet, but I just want to raise a general concern that our model of having different `PrintIdealGraphLevel` values might not fit anymore for all the different concepts (different loop opts, IGVN steps, Superword steps, parsing steps and now EA steps etc.). Maybe the time has come to use a different solution to allow some better filtering for different needs. It should not per se block this PR but could be a justification to actually tackle this. > >> I just want to raise a general concern that our model of having different `PrintIdealGraphLevel` values might not fit anymore for all the different concepts (different loop opts, IGVN steps, Superword steps, parsing steps and now EA steps etc.). Maybe the time has come to use a different solution to allow some better filtering for different needs. 
>
> Right, see [JDK-8320070](https://bugs.openjdk.org/browse/JDK-8320070).

Thanks all for the comments and reviews (and especially to @robcasloz for his lengthy review)!

\integrate

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28060#issuecomment-3531215390

From aseoane at openjdk.org  Fri Nov 14 07:11:16 2025
From: aseoane at openjdk.org (Anton Seoane Ampudia)
Date: Fri, 14 Nov 2025 07:11:16 GMT
Subject: RFR: 8369002: Extract the loop->is_member(get_loop(get_ctrl(node))) pattern in a new function [v3]
In-Reply-To:
References: <5WxrAw8WdUVcuKyNXY1YMED3vmFneSA4jzh5T62FedU=.2b30ee68-d76b-4352-b535-5c9e9ae45b82@github.com>
Message-ID:

On Thu, 13 Nov 2025 13:32:59 GMT, Benoît Maillard wrote:

>> Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Review comments: nit
>
> Marked as reviewed by bmaillard (Committer).

Thanks @benoitmaillard @robcasloz for your reviews!

\integrate

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28259#issuecomment-3531219699

From dskantz at openjdk.org  Fri Nov 14 07:11:29 2025
From: dskantz at openjdk.org (Daniel Skantz)
Date: Fri, 14 Nov 2025 07:11:29 GMT
Subject: RFR: 8371628: C2: add a test case for the arraycopy changes in JDK-8297933
In-Reply-To:
References:
Message-ID:

On Wed, 12 Nov 2025 15:58:00 GMT, Daniel Skantz wrote:

> This PR adds a test for the arraycopy bug that caused a crash in `ArrayCopyNode::prepare_array_copy` and was fixed in JDK-8297933. A crash with this signature was previously reported on `compiler/c1/TestArrayCopy.java` but this test does not reproduce the issue (at least not reliably).
>
> Testing: T1-3. Extra testing: the added test reliably fails if the arraycopy changes from JDK-8297933 are backed out.

Thanks for reviewing!
------------- PR Comment: https://git.openjdk.org/jdk/pull/28269#issuecomment-3531220668 From dskantz at openjdk.org Fri Nov 14 07:11:30 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Fri, 14 Nov 2025 07:11:30 GMT Subject: Integrated: 8371628: C2: add a test case for the arraycopy changes in JDK-8297933 In-Reply-To: References: Message-ID: <40Gk13KAqXZB_T7ZD3NgBw4dLoKL386M6i0FUQRGqR4=.8b391c35-810d-44d2-a74a-355c61df0c31@github.com> On Wed, 12 Nov 2025 15:58:00 GMT, Daniel Skantz wrote: > This PR adds a test for the arraycopy bug that caused a crash in `ArrayCopyNode::prepare_array_copy` and was fixed in JDK-8297933. A crash with this signature was previously reported on `compiler/c1/TestArrayCopy.java` but this test does not reproduce the issue (at least not reliably). > > Testing: T1-3. Extra testing: the added test reliably fails if the arraycopy changes from JDK-8297933 are backed out. This pull request has now been integrated. Changeset: 1baf5164 Author: Daniel Skantz URL: https://git.openjdk.org/jdk/commit/1baf5164d6a9077e0c440b7b78be6424a052f8a9 Stats: 14 lines in 1 file changed: 13 ins; 0 del; 1 mod 8371628: C2: add a test case for the arraycopy changes in JDK-8297933 Reviewed-by: rcastanedalo, shade ------------- PR: https://git.openjdk.org/jdk/pull/28269 From chagedorn at openjdk.org Fri Nov 14 07:13:36 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 14 Nov 2025 07:13:36 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure In-Reply-To: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: On Thu, 13 Nov 2025 14:25:21 GMT, Beno?t Maillard wrote: > This PR introduces changes in the detection of missing IGVN optimizations. 
> As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and identifying similar or related failures during bug triaging.
>
> In summary, this PR brings the following changes:
> - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger.
> - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question.
> - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging.
>
> ### Example outputs
> #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL](https://bugs.openjdk.org/browse/JDK-8371534)
> Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message.
>
> Before > > > Missed Ideal optimization (can_reshape=false): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) > 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) > The result after Ideal: > dist dump > --------------------------------------------- > 1 337 ConL === 0 [[ 338 ]] #long:-9 > 1 336 URShiftL === _ 298 22 [[ 338 ]] > 0 338 AndL === _ 336 337 [[ ]] > > > Missed Ideal optimization (can_reshape=true): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298... Good idea, looks good to me! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28295#pullrequestreview-3463156604 From rsunderbabu at openjdk.org Fri Nov 14 07:16:02 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Fri, 14 Nov 2025 07:16:02 GMT Subject: RFR: 8293484: AArch64: TestUseSHA512IntrinsicsOptionOnSupportedCPU.java fails on CPU with SHA512 feature support [v2] In-Reply-To: References: Message-ID: > We have a host of tests under test/hotspot/jtreg/compiler/intrinsics/sha which checks if the SHA intrinsics flags' enable/disable setting is in sync with CPU support in the underlying platform. There might be situations where the intrinsics might not be enabled despite the hardware supporting the relevant instructions. For example, there might be reliability issues or performance issues. In such situations, the tests will fail. 
>
> Till now, the approach has been to exclude the platforms where the support is yet to be provided and remove the exclusion afterwards. This necessitates additional work on the test front.
>
> A more compact design would be to make the predicate probes rely on intrinsics availability in the platform as opposed to hardware support availability. The migration to intrinsics availability would especially help update releases where feature backports might not be complete.
>
> PS: This fix can/should be propagated to other such tests as well. Once this PR gets approval, I will work on similar tests.

Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision:

  removing requires condition

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28053/files
  - new: https://git.openjdk.org/jdk/pull/28053/files/7d0ac48e..c9cd82c2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28053&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28053&range=00-01

  Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/28053.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28053/head:pull/28053

PR: https://git.openjdk.org/jdk/pull/28053

From duke at openjdk.org  Fri Nov 14 07:19:06 2025
From: duke at openjdk.org (Shawn M Emery)
Date: Fri, 14 Nov 2025 07:19:06 GMT
Subject: RFR: 8371820: Further AES performance improvements for key schedule generation
In-Reply-To:
References:
Message-ID: <7ilzxthduL8v18I-SAyihrSyNxkep_mEYkxRBL3lHAY=.41ce1a68-928d-4fb3-a312-f2b8e4b907fd@github.com>

On Thu, 13 Nov 2025 16:49:34 GMT, Martin Doerr wrote:

>> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption.
>>
>> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64.
>
> @smemery: I've seen your recent improvements and performance measurements. It would be great if you could take a look at this proposal and check the performance results in your environment.

@TheRealMDoerr: I've run your update of the init key schedule w/intrinsics logic and obtained the following results for AESReinit:

x86_64: 19.51% improvement
arm64: 3.11% improvement

Changes in performance for the other AES-related benchmarks (AES[Decrypt].testBaseline and AESBench) had the expected nominal changes. AES regression tests (Cipher/AES and hotspot/*/aes) have passed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28299#issuecomment-3531238034

From duke at openjdk.org  Fri Nov 14 07:21:28 2025
From: duke at openjdk.org (duke)
Date: Fri, 14 Nov 2025 07:21:28 GMT
Subject: RFR: 8356761: IGV: dump escape analysis information [v8]
In-Reply-To:
References:
Message-ID:

On Thu, 13 Nov 2025 13:26:54 GMT, Anton Seoane Ampudia wrote:

>> This PR introduces new IGV dumps, property fields and filters related to escape analysis information.
>>
>> The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations.
>>
>> With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as:
>>
>> - Node escape "level".
>> - Scalar replaceability.
>> - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)).
>> >> This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped. >> >> Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status. >> >> **Testing:** passes tiers 1-3, manual testing in IGV > > Anton Seoane Ampudia has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains five new commits since the last revision: > > - Review comments: use `should_print_igv` > - Merge branch 'JDK-8356761' of github.com:anton-seoane/jdk into JDK-8356761 > - Review comments: whitespace fix > > Co-authored-by: Christian Hagedorn > - Review comments: rename `verify` to more explicit `print_method` > - Review comments: explicit null check > > Co-authored-by: Christian Hagedorn @anton-seoane Your change (at version 371f4e901f5f35e01d99785adb500cbccab044b7) is now ready to be sponsored by a Committer. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28060#issuecomment-3531251595

From duke at openjdk.org  Fri Nov 14 07:23:05 2025
From: duke at openjdk.org (duke)
Date: Fri, 14 Nov 2025 07:23:05 GMT
Subject: RFR: 8369002: Extract the loop->is_member(get_loop(get_ctrl(node))) pattern in a new function [v3]
In-Reply-To: <5WxrAw8WdUVcuKyNXY1YMED3vmFneSA4jzh5T62FedU=.2b30ee68-d76b-4352-b535-5c9e9ae45b82@github.com>
References: <5WxrAw8WdUVcuKyNXY1YMED3vmFneSA4jzh5T62FedU=.2b30ee68-d76b-4352-b535-5c9e9ae45b82@github.com>
Message-ID:

On Thu, 13 Nov 2025 10:36:25 GMT, Anton Seoane Ampudia wrote:

>> This PR adds a "shorthand" for the common `loop->is_member(get_loop(get_ctrl(node)))` pattern in loop optimizations.
>>
>> In PhaseIdealLoop, there is already an `is_member` function that checks if a node is a (nested) member of an IdealLoopTree. In a similar fashion, this changeset adds a `ctrl_is_member` that aims to simplify the common pattern of:
>>
>>     Node* node_ctrl = get_ctrl(node);
>>     if (loop->is_member(get_loop(node_ctrl))) { ... }
>>
>> This should make such a common operation a bit more readable and concise.
>>
>> **Testing:** passes tiers 1-3
>
> Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review comments: nit

@anton-seoane
Your change (at version c343e2c5c9aee09320ce12f64cd3ee06b1af9970) is now ready to be sponsored by a Committer.
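
For readers without the HotSpot sources at hand, the `ctrl_is_member` shorthand from the quoted description can be sketched roughly as follows. The types are toy stand-ins for `Node`, `IdealLoopTree` and `PhaseIdealLoop`, and the map-based `get_ctrl`/`get_loop` are assumptions made only to keep the example self-contained; the real signature in `loopnode.hpp` may differ.

```cpp
#include <cassert>
#include <unordered_map>

// Toy stand-ins for the C2 types; the real classes are far richer.
struct Node {};

struct IdealLoopTree {
  const IdealLoopTree* parent = nullptr;

  // True if 'l' is this loop or is nested (at any depth) inside it.
  bool is_member(const IdealLoopTree* l) const {
    for (; l != nullptr; l = l->parent) {
      if (l == this) return true;
    }
    return false;
  }
};

struct PhaseIdealLoop {
  // Assumed bookkeeping, for illustration only.
  std::unordered_map<const Node*, Node*> ctrl_of;                 // data node -> control node
  std::unordered_map<const Node*, const IdealLoopTree*> loop_of;  // control node -> innermost loop

  Node* get_ctrl(const Node* n) const { return ctrl_of.at(n); }
  const IdealLoopTree* get_loop(const Node* ctrl) const { return loop_of.at(ctrl); }

  // The shorthand: is the control of 'n' inside 'loop' (possibly nested)?
  bool ctrl_is_member(const IdealLoopTree* loop, const Node* n) const {
    assert(loop != nullptr && n != nullptr);
    return loop->is_member(get_loop(get_ctrl(n)));
  }
};
```

With this helper a call site shrinks from the two-step form to `if (ctrl_is_member(loop, node)) { ... }`, which also removes the chance of passing the node itself instead of its control to `get_loop`.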
------------- PR Comment: https://git.openjdk.org/jdk/pull/28259#issuecomment-3531260634 From epeter at openjdk.org Fri Nov 14 07:24:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 07:24:07 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: On Fri, 14 Nov 2025 07:18:02 GMT, Emanuel Peter wrote: >> This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failure quickly and identifying similar or related failures during bug triaging. >> >> In summary, this PR brings the following changes: >> - Assert at the first verification failure in `verify_Optimize` instead of attemtping to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. >> - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. >> - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. >> >> ### Example outputs >> #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) >> Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. 
We also get the name of the node in the assert message. >>
>> Before >> >> >> Missed Ideal optimization (can_reshape=false): >> The node was replaced by Ideal. >> Old node: >> dist dump >> --------------------------------------------- >> 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 >> 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) >> 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) >> The result after Ideal: >> dist dump >> --------------------------------------------- >> 1 337 ConL === 0 [[ 338 ]] #long:-9 >> 1 336 URShiftL === _ 298 22 [[ 338 ]] >> 0 338 AndL === _ 336 337 [[ ]] >> >> >> Missed Ideal optimization (can_reshape=true): >> The node was replaced by Ideal. >> Old node: >> dist dump >> --------------------------------------------- >> 1 22 ConI === 0 [[ 70 81 70 290 81 76... > > src/hotspot/share/opto/phaseX.cpp line 1101: > >> 1099: bool failure = verify_Identity_for(n); >> 1100: assert(!failure, "Missed Identity optimization opportunity in PhaseIterGVN for %s", n->Name()); >> 1101: } > > The alternative would be to directly assert in the verify methods, but I suppose that would be a bigger code change. > > Hmm, I did see some cases in the verify methods that are maybe not directly "missed optimization opportunity" but some other kind of issue. Maybe we should assert directly for those, rather than returning and ending up at this assert. 
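
The "stop at the first failure and name the reason and the node" behavior under review can be illustrated with a small sketch. Everything below is a toy model: `ToyNode`, `Check`, `first_failure` and `failure_message` are hypothetical names, the order of the three checks is arbitrary, and only the message wording is borrowed from the assert quoted above.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Illustrative stand-ins only; none of these names come from phaseX.cpp.
enum class Check { Ideal, Identity, Value };

struct ToyNode {
  std::string name;
  bool missed_ideal = false;
  bool missed_identity = false;
  bool missed_value = false;
};

struct Failure {
  Check reason;
  std::string node_name;
};

const char* to_string(Check c) {
  switch (c) {
    case Check::Ideal:    return "Ideal";
    case Check::Identity: return "Identity";
    case Check::Value:    return "Value";
  }
  return "unknown";
}

// Walks the nodes in order and returns as soon as one verification fails,
// instead of collecting every failure in the graph.
std::optional<Failure> first_failure(const std::vector<ToyNode>& nodes) {
  for (const ToyNode& n : nodes) {
    if (n.missed_ideal)    return Failure{Check::Ideal, n.name};
    if (n.missed_identity) return Failure{Check::Identity, n.name};
    if (n.missed_value)    return Failure{Check::Value, n.name};
  }
  return std::nullopt;
}

// Puts the reason and the node name into the message itself, so that
// related failures can be grouped by their assert text.
std::string failure_message(const Failure& f) {
  return std::string("Missed ") + to_string(f.reason) +
         " optimization opportunity in PhaseIterGVN for " + f.node_name;
}
```

In this model a follow-on failure that is only a consequence of the first one is never reported, which mirrors the motivation given in the PR description.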
Look at: `Ideal optimization did not make progress but created new unused nodes.` And `Ideal optimization did not make progress but node hash changed.` That's all I could find now, but you should double check ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2526097636 From epeter at openjdk.org Fri Nov 14 07:24:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 07:24:05 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure In-Reply-To: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: On Thu, 13 Nov 2025 14:25:21 GMT, Beno?t Maillard wrote: > This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failure quickly and identifying similar or related failures during bug triaging. > > In summary, this PR brings the following changes: > - Assert at the first verification failure in `verify_Optimize` instead of attemtping to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. > - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. > - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. 
> > ### Example outputs > #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) > Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >
> Before > > > Missed Ideal optimization (can_reshape=false): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) > 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) > The result after Ideal: > dist dump > --------------------------------------------- > 1 337 ConL === 0 [[ 338 ]] #long:-9 > 1 336 URShiftL === _ 298 22 [[ 338 ]] > 0 338 AndL === _ 336 337 [[ ]] > > > Missed Ideal optimization (can_reshape=true): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298... @benoitmaillard Thanks for working on this, it will be really helpful for triaging :) src/hotspot/share/opto/phaseX.cpp line 1101: > 1099: bool failure = verify_Identity_for(n); > 1100: assert(!failure, "Missed Identity optimization opportunity in PhaseIterGVN for %s", n->Name()); > 1101: } The alternative would be to directly assert in the verify methods, but I suppose that would be a bigger code change. Hmm, I did see some cases in the verify methods that are maybe not directly "missed optimization opportunity" but some other kind of issue. Maybe we should assert directly for those, rather than returning and ending up at this assert. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28295#pullrequestreview-3463193618
PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2526093232

From aseoane at openjdk.org  Fri Nov 14 07:29:00 2025
From: aseoane at openjdk.org (Anton Seoane Ampudia)
Date: Fri, 14 Nov 2025 07:29:00 GMT
Subject: Integrated: 8356761: IGV: dump escape analysis information
In-Reply-To:
References:
Message-ID:

On Thu, 30 Oct 2025 13:49:44 GMT, Anton Seoane Ampudia wrote:

> This PR introduces new IGV dumps, property fields and filters related to escape analysis information.
>
> The C2 escape analysis algorithm is carried out in six primary steps, of which many have interesting sub-steps (e.g. `split_unique_types`) or present an iterative nature where access to intermediate results can aid debugging and analysis. Additionally, escape analysis relies on an "intermediate structure" called the _connection graph_, which is also particularly valuable for deeper investigations.
>
> With this changeset, escape analysis information is now dumped at key points throughout the algorithm, with a degree of granularity (from only the basic steps to in-detail iterative dumping). The dumps include several property fields, such as:
>
> - Node escape "level".
> - Scalar replaceability.
> - Node type within the connection graph (per [C2 Escape Analysis connection graph](https://wiki.openjdk.org/display/HotSpot/EscapeAnalysis)).
>
> This is achieved by passing the `ConnectionGraph` in use to the `IdealGraphPrinter` during escape analysis, so that these properties can be dumped. After escape analysis, remaining interesting information that is left until macro elimination (and consequent elimination of non-escaping, replaceable allocations) is also dumped.
>
> Additionally, two filters are provided: one for displaying the connection node type in the IGV node box, and another one for color-scaling nodes based on their escaping/scalar status.
>
> **Testing:** passes tiers 1-3, manual testing in IGV

This pull request has now been integrated.

Changeset: 0829c6ac
Author:    Anton Seoane Ampudia
Committer: Roberto Castañeda Lozano
URL:       https://git.openjdk.org/jdk/commit/0829c6acde496833300efb38b4b900bf94b99dc0
Stats:     205 lines in 10 files changed: 200 ins; 1 del; 4 mod

8356761: IGV: dump escape analysis information

Reviewed-by: rcastanedalo, chagedorn

-------------

PR: https://git.openjdk.org/jdk/pull/28060

From aseoane at openjdk.org  Fri Nov 14 07:29:32 2025
From: aseoane at openjdk.org (Anton Seoane Ampudia)
Date: Fri, 14 Nov 2025 07:29:32 GMT
Subject: Integrated: 8369002: Extract the loop->is_member(get_loop(get_ctrl(node))) pattern in a new function
In-Reply-To:
References:
Message-ID:

On Wed, 12 Nov 2025 08:49:49 GMT, Anton Seoane Ampudia wrote:

> This PR adds a "shorthand" for the common `loop->is_member(get_loop(get_ctrl(node)))` pattern in loop optimizations.
>
> In PhaseIdealLoop, there is already an `is_member` function that checks if a node is a (nested) member of an IdealLoopTree. In a similar fashion, this changeset adds a `ctrl_is_member` that aims to simplify the common pattern of:
>
>     Node* node_ctrl = get_ctrl(node);
>     if (loop->is_member(get_loop(node_ctrl))) { ... }
>
> This should make such a common operation a bit more readable and concise.
>
> **Testing:** passes tiers 1-3

This pull request has now been integrated.
Changeset: f4305923
Author:    Anton Seoane Ampudia
Committer: Roberto Castañeda Lozano
URL:       https://git.openjdk.org/jdk/commit/f4305923fb6203089fd13cf3387c81e127ae5fe2
Stats:     40 lines in 6 files changed: 6 ins; 7 del; 27 mod

8369002: Extract the loop->is_member(get_loop(get_ctrl(node))) pattern in a new function

Reviewed-by: bmaillard, rcastanedalo

-------------

PR: https://git.openjdk.org/jdk/pull/28259

From duke at openjdk.org  Fri Nov 14 07:31:13 2025
From: duke at openjdk.org (Shawn M Emery)
Date: Fri, 14 Nov 2025 07:31:13 GMT
Subject: RFR: 8371820: Further AES performance improvements for key schedule generation
In-Reply-To:
References:
Message-ID: <6lsTW4mcptKcVAuFHu3h39LMajICZZVDhHwrkxM6Rl8=.787cc24a-ae13-49ed-bc56-9c71ad8659b0@github.com>

On Thu, 13 Nov 2025 16:48:28 GMT, Martin Doerr wrote:

> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption.
>
> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64.

src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 61:

> 59:     // used for everything else.
> 60:     private int[] sessionKe = null; // key for encryption
> 61:     private int[] sessionKd = null; // preprocessed key for decryption

We really don't need sessionKd, since it's just assigned to K, but I'm fine leaving it as is.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28299#discussion_r2526136351

From shade at openjdk.org  Fri Nov 14 07:35:06 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 14 Nov 2025 07:35:06 GMT
Subject: RFR: 8371709: Add CTW to hotspot_compiler testing
In-Reply-To:
References:
Message-ID:

On Wed, 12 Nov 2025 14:59:41 GMT, Aleksey Shipilev wrote:

> CTW tests are for compiler testing, so it makes sense to run them as part of hotspot_compiler group.
> There are no external dependencies for CTW that processes the JDK's own modules, so we can add that.

Any comments? I think this is the right thing to do, given we catch fire in CTW testing every so often. @TobiHartmann, @eme64?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28268#issuecomment-3531316284

From duke at openjdk.org  Fri Nov 14 07:35:09 2025
From: duke at openjdk.org (Shawn M Emery)
Date: Fri, 14 Nov 2025 07:35:09 GMT
Subject: RFR: 8371820: Further AES performance improvements for key schedule generation
In-Reply-To:
References:
Message-ID:

On Thu, 13 Nov 2025 16:48:28 GMT, Martin Doerr wrote:

> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption.
>
> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64.

src/hotspot/share/opto/library_call.cpp line 7483:

> 7481:   // However, ppc64 vncipher processes MixColumns and requires the same round keys with encryption.
> 7482:   // The ppc64 and riscv64 stubs of encryption and decryption use the same round keys (sessionK[0]).
> 7483:   Node* objSessionK = load_field_from_object(aescrypt_object, "sessionK", "[[I");

Good catch, as I didn't see that the intrinsics weren't using the second array element (inverse key schedule) for these platforms all this time!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28299#discussion_r2526153490

From duke at openjdk.org  Fri Nov 14 07:39:05 2025
From: duke at openjdk.org (Shawn M Emery)
Date: Fri, 14 Nov 2025 07:39:05 GMT
Subject: RFR: 8371820: Further AES performance improvements for key schedule generation
In-Reply-To:
References:
Message-ID:

On Thu, 13 Nov 2025 16:48:28 GMT, Martin Doerr wrote:

> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption.
We can save the `genInvRoundKeys` computation when we only do encryption. > > The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 941: > 939: if (decrypting) { > 940: if (sessionKd == null) { > 941: sessionKd = genInvRoundKeys(sessionKe, rounds); Good catch, as this is more efficient given that the inverse key schedule is dependent upon the (encryption) key schedule in the code's current state. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28299#discussion_r2526173217 From duke at openjdk.org Fri Nov 14 07:46:31 2025 From: duke at openjdk.org (Shawn M Emery) Date: Fri, 14 Nov 2025 07:46:31 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 16:48:28 GMT, Martin Doerr wrote: > This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. > > The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. Good catch in eliminating the unnecessary construction of both key schedules on the PPC64, S390, and RISCV64 architectures. src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 59: > 57: // Following attribute is specific to Intrinsics where the unprocessed > 58: // key is used for PPC64, S390, and RISCV64 architectures, whereas K is > 59: // used for everything else. I would change this to: // Following attributes (sessionKe and K) are specific to Intrinsics, where sessionKe // is the unprocessed key that is used for PPC64, S390, and RISCV64 architectures, // whereas K is used for everything else. 
------------- PR Review: https://git.openjdk.org/jdk/pull/28299#pullrequestreview-3463343453 PR Review Comment: https://git.openjdk.org/jdk/pull/28299#discussion_r2526196244 From shade at openjdk.org Fri Nov 14 08:13:18 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 14 Nov 2025 08:13:18 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes [v4] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 19:04:56 GMT, Aleksey Shipilev wrote: >> I started this as investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test. >> >> At this point I believe we found that PhaseCCP does not reach the fix point for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to `LoadN` in question. Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. >> >> It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. >> >> Testing shows there are no ill effects on correctness doing this. But I would appreciate someone more savvy in this code to sanity check all of this. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails >> - [x] Linux x86_64 server fastdebug, `all` tests pass >> - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) 
> > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into JDK-8371581-ccp-spooky-nodes > - More comments > - More restrictive CmpP check > - Tighten up comments and signatures > - Do Value() once > - Fix Linux x86_64 server fastdebug, Maven Central CTW still passes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28288#issuecomment-3531486193 From epeter at openjdk.org Fri Nov 14 08:46:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 08:46:17 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 09:46:17 GMT, Roberto Casta?eda Lozano wrote: > Hi Emanuel, thanks for improving the design of the template framework, the enforcement of "everything is a token" and the introduction of explicit scope constraints seem like a step in the right direction. Before I go on with the review, I would like to ask two high-level questions (apologies if these are already discussed, it is hard to browse through a PR history): > > * The tutorial and the Template documentation remark that we would ideally have used string templates rather than hashtag replacements. Is this still true after the introduction of explicit scoping constraints, i.e. could we still simply use string templates and still enforce the user-provided scoping rules if the feature was available? Yes, I think so. We would probably get rid of `let` and just use local variables in lambdas, and then format them directly into strings. Scopes would be colocated with lambdas, so that local variables could be local to the scopes. I'm less sure about letting local variables (instead of hashtags) escape lambdas .. 
that's not really possible. But maybe there would be work-arounds. > * If I got the comments in the tutorial right, it seems that the user has good control over the "transparency level" of scopes, while the transparency rules for templates are hardcoded (hashtag replacements never escape, DataNames always escape, etc.). This felt a bit surprising, would it be feasible to just let the outermost scope in a template determine the template's transparency level? The understanding seems to maybe be incomplete: > transparency rules for templates are hardcoded It is only "hardcoded" to never let hashtags and setFuelCost escape, it just implicitly downgrades a scope on those two "dimensions". But the user still has control over the name "dimension": It still needs to be possible to decide if `DataName`s and `StructuralName`s escape the Template scope, otherwise they cannot escape from a `Hook.insert`. Maybe it would be nice to force the user to use only scopes that exactly match the semantics of the implementation (only allow control if names are transparent or not). So maybe one could only use `scope` (completely non-transparent) or `scopeWithNameTransparency` (only transparent to names), and that would somehow be enforced by the Java types/interfaces of the framework. Or maybe we just throw an Exception if the wrong one is used. But it would require us to define this extra `scopeWithNameTransparency`. Do you think that is worth it? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3531617479 From jbhateja at openjdk.org Fri Nov 14 09:35:08 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 14 Nov 2025 09:35:08 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v11] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 18:04:42 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With the Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion.
>> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Hi @dean-long , @iwanowww , @dlunde , I have addressed your comments. Kindly verify. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3531807253 From shade at openjdk.org Fri Nov 14 10:01:26 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 14 Nov 2025 10:01:26 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes [v4] In-Reply-To: References: <0OTQGgVWIugG7uVN8afIueHEiu_3yyGkSUCSsw4P0W8=.fd43b696-34f3-4013-a863-6b85b71ce7a1@github.com> Message-ID: On Thu, 13 Nov 2025 13:19:32 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/opto/phaseX.cpp line 2856: >> >>> 2854: } >>> 2855: // CmpPNode performs deep traversals if it compares oopptr. CmpP is not notified for changes far away. >>> 2856: if (n->Opcode() == Op_CmpP) { >> >> The verification restricts it to `n->Opcode() == Op_CmpP && type(n->in(1))->isa_oopptr() && type(n->in(2))->isa_oopptr()`. How big is the difference here? Might this have a performance impact? > > Honestly, no idea. I just wanted to have a conservative check, e.g. "We know `CmpP` does something fishy? We are going to revisit it." But it will make sense to keep C2 compilation fast. Let me try to add the oopptr checks and see if anything shows up in CTW. 
Seems to work fine on large CTW corpus run, so I am leaving this additional check in. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28288#discussion_r2526824520 From thartmann at openjdk.org Fri Nov 14 10:08:43 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 14 Nov 2025 10:08:43 GMT Subject: RFR: 8371709: Add CTW to hotspot_compiler testing In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 14:59:41 GMT, Aleksey Shipilev wrote: > CTW tests are for compiler testing, so it makes sense to run them as part of hotspot_compiler group. There are no external dependencies for CTW that processes JDK-s own modules, so we can add that. Looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28268#pullrequestreview-3464115756 From epeter at openjdk.org Fri Nov 14 10:25:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 10:25:45 GMT Subject: RFR: 8371709: Add CTW to hotspot_compiler testing In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 14:59:41 GMT, Aleksey Shipilev wrote: > CTW tests are for compiler testing, so it makes sense to run them as part of hotspot_compiler group. There are no external dependencies for CTW that processes JDK-s own modules, so we can add that. Seems reasonable to me too. @TobiHartmann Just launched some internal tests, so please hold off with integration until we are sure those passed ;) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28268#pullrequestreview-3464208653 From haosun at openjdk.org Fri Nov 14 11:08:10 2025 From: haosun at openjdk.org (Hao Sun) Date: Fri, 14 Nov 2025 11:08:10 GMT Subject: RFR: 8293484: AArch64: TestUseSHA512IntrinsicsOptionOnSupportedCPU.java fails on CPU with SHA512 feature support [v2] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 07:16:02 GMT, Ramkumar Sunderbabu wrote: >> We have a host of tests under test/hotspot/jtreg/compiler/intrinsics/sha which checks if the SHA intrinsics flags' enable/disable setting is in sync with CPU support in the underlying platform. There might be situations where the intrinsics might not be enabled despite the hardware supporting the relevant instructions. For example, there might be reliability issues or performance issues. In such situations, the tests will fail. >> >> Till now, the approach has been to exclude the platforms where the support is yet to be provided and remove the exclusion after. This necessitates additional work on the test front. >> >> A more compact design would be make predicate probes to rely on intrinsics availability in the platform as opposed to hardware support availability. The migration to intrinsics availability would especially help update releases where feature backport might not be complete. >> >> PS: This fix can/should be propagated to other such tests as well. Once this PR gets approval, I will work on similar tests. > > Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: > > removing requires condition Yes, I think the requires condition can be removed safely because we have a stronger check now. Thanks for your update. ------------- Marked as reviewed by haosun (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/28053#pullrequestreview-3464415982 From shade at openjdk.org Fri Nov 14 12:10:00 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 14 Nov 2025 12:10:00 GMT Subject: Integrated: 8371709: Add CTW to hotspot_compiler testing In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 14:59:41 GMT, Aleksey Shipilev wrote: > CTW tests are for compiler testing, so it makes sense to run them as part of hotspot_compiler group. There are no external dependencies for CTW that processes JDK-s own modules, so we can add that. This pull request has now been integrated. Changeset: ff851de8 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/ff851de852673740542d922d1ee15a6c92b80473 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8371709: Add CTW to hotspot_compiler testing Reviewed-by: thartmann, epeter ------------- PR: https://git.openjdk.org/jdk/pull/28268 From thartmann at openjdk.org Fri Nov 14 12:09:58 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 14 Nov 2025 12:09:58 GMT Subject: RFR: 8371709: Add CTW to hotspot_compiler testing In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 14:59:41 GMT, Aleksey Shipilev wrote: > CTW tests are for compiler testing, so it makes sense to run them as part of hotspot_compiler group. There are no external dependencies for CTW that processes JDK-s own modules, so we can add that. All green. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28268#issuecomment-3532420188 From shade at openjdk.org Fri Nov 14 12:09:59 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 14 Nov 2025 12:09:59 GMT Subject: RFR: 8371709: Add CTW to hotspot_compiler testing In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 14:59:41 GMT, Aleksey Shipilev wrote: > CTW tests are for compiler testing, so it makes sense to run them as part of hotspot_compiler group. 
There are no external dependencies for CTW that processes JDK-s own modules, so we can add that. Thank you both! Here goes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28268#issuecomment-3532425960 From mdoerr at openjdk.org Fri Nov 14 12:13:25 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 14 Nov 2025 12:13:25 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v2] In-Reply-To: References: Message-ID: > This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. > > The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Improve comment and minor cleanup. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28299/files - new: https://git.openjdk.org/jdk/pull/28299/files/477a3dda..b03e6b43 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28299&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28299&range=00-01 Stats: 8 lines in 2 files changed: 2 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28299.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28299/head:pull/28299 PR: https://git.openjdk.org/jdk/pull/28299 From mdoerr at openjdk.org Fri Nov 14 12:13:27 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 14 Nov 2025 12:13:27 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v2] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 07:41:05 GMT, Shawn M Emery wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve comment and minor cleanup. 
> > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 59: > >> 57: // Following attribute is specific to Intrinsics where the unprocessed >> 58: // key is used for PPC64, S390, and RISCV64 architectures, whereas K is >> 59: // used for everything else. > > I would change this to: > // Following attributes (sessionKe and K) are specific to Intrinsics, where sessionKe > // is the unprocessed key that is used for PPC64, S390, and RISCV64 architectures, > // whereas K is used for everything else. Updated. I have also cleaned up the hotspot part a bit. > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 61: > >> 59: // used for everything else. >> 60: private int[] sessionKe = null; // key for encryption >> 61: private int[] sessionKd = null; // preprocessed key for decryption > > We really don't need sessionKd, since it's just assigned to K, but I'm fine leaving it as is. Currently, `sessionKd` is needed if we switch between encryption and decryption while using the same key. We could easier remove `K` and pass the information to `LibraryCallKit::get_key_start_from_aescrypt_object` if we are doing encryption or decryption. I can change that if you want, but I'm not sure if it's worth the effort. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28299#discussion_r2527275801 PR Review Comment: https://git.openjdk.org/jdk/pull/28299#discussion_r2527271643 From bmaillard at openjdk.org Fri Nov 14 13:10:09 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 14 Nov 2025 13:10:09 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop [v5] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 16:33:28 GMT, Roland Westrelin wrote: >> In: >> >> >> for (int i = 100; i < 1100; i++) { >> v += floatArray[i - 100]; >> Objects.checkIndex(i, longRange); >> } >> >> >> The int counted loop has both an int range check and a long range. 
The >> int range check is optimized first. Assertion predicates are inserted >> above the loop. One predicate checks that: >> >> >> init - 100 > >> >> The loop is then transformed to enable the optimization of the long >> range check. The loop is short running, so there's no need to create a >> loop nest. The counted loop is mostly left as is, but the loop's >> bounds are changed from: >> >> >> for (int i = 100; i < 1100; i++) { >> >> >> to: >> >> >> for (int i = 0; i < 1000; i++) { >> >> >> The reason is that the long range check transformation expects the >> loop to start at 0. >> >> Pre/main/post loops are created. Template Assertion predicates are >> added above the main loop. The loop is unrolled. Initialized assertion >> predicates are created. The one created from the condition: >> >> >> init - 100 > >> >> checks the value of `i` out of the pre loop which is 1. That check fails. >> >> The root cause of the failure is that when bounds of the counted loop >> are changed, template assertion predicates need to be updated with an >> adjusted init input. >> >> When the bounds of the loop are known, the assertion predicates can be >> updated in place. Otherwise, when the loop is speculated to be short >> running, the assertion predicates are updated when they are cloned. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 12 additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8366888 > - Merge branch 'master' into JDK-8366888 > - whitespaces > - review > - Merge branch 'master' into JDK-8366888 > - Update src/hotspot/share/opto/predicates.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/predicates.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - ... and 2 more: https://git.openjdk.org/jdk/compare/9a4bc181...b0d7aab1 Marked as reviewed by bmaillard (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27250#pullrequestreview-3464856654 From bmaillard at openjdk.org Fri Nov 14 13:10:12 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 14 Nov 2025 13:10:12 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop [v4] In-Reply-To: References: Message-ID: <4FVGOla8dvi1C1P-_4Ql6ovVPEd3P5WWg7fogCvglXI=.6a72b3de-63f7-48b5-8a9b-e340814bc5ad@github.com> On Thu, 13 Nov 2025 16:28:43 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopnode.cpp line 1196: >> >>> 1194: // for (int = 0; i < stop - start; i+= stride) { ... } >>> 1195: // Template Assertion Predicates added so far were with an init value of start. They need to be updated with the new >>> 1196: // init value of 0: >> >> Not being super familiar with assertion predicates, I was a little bit confused at first. I would maybe add something along the lines of: >> >> Suggestion: >> >> // init value of 0. We want the OpaqueLoopInit node on the zero in order to be able to replace it when cloning the predicate. >> >> >> But feel free to ignore if you think this is obvious. > > Thanks for having a look at this. Does the updated comment look good to you? Yes, this works for me, thanks! 
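The bound change described in the quoted PR description (a loop over `[100, 1100)` rewritten to start at 0) can be pictured at the source level with a small sketch. This is only an analogy in plain Java, not C2 code: `LoopRebase`, `sumOriginal`, and `sumRebased` are hypothetical names, and the array is assumed to hold at least 1000 elements. The point it illustrates is that after rebasing, the old induction value is `j + 100`, so anything phrased in terms of the old init value (100) has to be restated for the new init value (0).

```java
/** Hypothetical source-level picture of rebasing a counted loop to start at 0. */
final class LoopRebase {
    /** Original shape: i runs over [100, 1100). */
    static float sumOriginal(float[] a) {
        float v = 0;
        for (int i = 100; i < 1100; i++) {
            v += a[i - 100];
        }
        return v;
    }

    /** Rebased shape: j runs over [0, 1000), and the old i is recovered as j + 100. */
    static float sumRebased(float[] a) {
        float v = 0;
        for (int j = 0; j < 1000; j++) {
            int i = j + 100;      // old induction variable expressed via the new one
            v += a[i - 100];      // same element, same order as in sumOriginal
        }
        return v;
    }
}
```

Both methods visit the same elements in the same order, so they return the same value. A check written against the old start value would be wrong if it were evaluated against the new loop without the `+ 100` adjustment, which is the kind of mismatch the PR description attributes to the stale assertion predicates.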
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27250#discussion_r2527437488 From aseoane at openjdk.org Fri Nov 14 13:19:10 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Fri, 14 Nov 2025 13:19:10 GMT Subject: RFR: 8356761: IGV: dump escape analysis information In-Reply-To: References: <8j_40zCPi1joR0SAU9PtcIQGRSBe1eSCYUqDRpyS8Ts=.f6681026-f9f1-44bd-8e62-f68526f45d5d@github.com> Message-ID: On Fri, 14 Nov 2025 07:06:18 GMT, Anton Seoane Ampudia wrote: >>> I just want to raise a general concern that our model of having different `PrintIdealGraphLevel` values might not fit anymore for all the different concepts (different loop opts, IGVN steps, Superword steps, parsing steps and now EA steps etc.). Maybe the time has come to use a different solution to allow some better filtering for different needs. >> >> Right, see [JDK-8320070](https://bugs.openjdk.org/browse/JDK-8320070). > > Thanks all for the comments and reviews (and especially to @robcasloz for his lengthy review)! > > \integrate > Very nice feature. Thank you for doing it @anton-seoane. > > May be we should consider dumping Connection graph too in a future so we can debug it during its transformations. @vnkozlov I have filed [JDK-8371904](https://bugs.openjdk.org/browse/JDK-8371904) to track this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28060#issuecomment-3532694836 From rcastanedalo at openjdk.org Fri Nov 14 13:20:12 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 14 Nov 2025 13:20:12 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 08:43:14 GMT, Emanuel Peter wrote: > We would probably get rid of `let` and just use local variables in lambdas, and then format them directly into strings. But wouldn't this take us out of the "everything is a token" design again? 
What I am getting at is that the comments in the tutorial and the documentation suggesting that string templates could be used to provide the same functionality could confuse the user (they at least did challenge my mental model of the framework), and maybe it would be worth removing them, or at least adding some nuance to them. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3532709646 From epeter at openjdk.org Fri Nov 14 13:45:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 13:45:28 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: References: Message-ID: <0sNFPS5IDEuMwyosF0_qS0Z3Y-W6cZ5DUWv9hQng-wQ=.ad37188d-1476-47e8-abf9-0ed21678b0f0@github.com> On Fri, 14 Nov 2025 13:17:11 GMT, Roberto Casta?eda Lozano wrote: >>> Hi Emanuel, thanks for improving the design of the template framework, the enforcement of "everything is a token" and the introduction of explicit scope constraints seem like a step in the right direction. Before I go on with the review, I would like to ask two high-level questions (apologies if these are already discussed, it is hard to browse through a PR history): >>> >>> * The tutorial and the Template documentation remark that we would ideally have used string templates rather than hashtag replacements. Is this still true after the introduction of explicit scoping constraints, i.e. could we still simply use string templates and still enforce the user-provided scoping rules if the feature was available? >> >> Yes, I think so. We would probably get rid of `let` and just use local variables in lambdas, and then format them directly into strings. Scopes would be colocated with lambdas, so that local variables could be local to the scopes. I'm less sure about letting local variables (instead of hashtags) escape lambdas .. that's not really possible. But maybe there would be work-arounds. 
>> >>> * If I got the comments in the tutorial right, it seems that the user has good control over the "transparency level" of scopes, while the transparency rules for templates are hardcoded (hashtag replacements never escape, DataNames always escape, etc.). This felt a bit surprising, would it be feasible to just let the outermost scope in a template determine the template's transparency level? >> >> The understanding seems to maybe be incomplete: >> >>> transparency rules for templates are hardcoded >> >> It is only "hardcoded" to never let hashtags and setFuelCost escape, it just implicitly downgrades a scope on those two "dimensions". >> >> But the user still has control over the name "dimension": >> It still needs to be possible to decide if `DataName`s and `StructuralName`s escape the Template scope, otherwise they cannot escape from a `Hook.insert`. >> >> Maybe it would be nice to force the user to use only scopes that exactly match the semantics of the implementation (only allow control if names are transparent or not). So maybe one could only use `scope` (completely non-transparent) or `scopeWithNameTransparency` (only transparent to names), and that would somehow be enforced by the Java types/interfaces of the framework. Or maybe we just throw an Exception if the wrong one is used. But it would require us to define this extra `scopeWithNameTransparency`. Do you think... > >> We would probably get rid of `let` and just use local variables in lambdas, and then format them directly into strings. > > But wouldn't this take us out of the "everything is a token" design again? What I am getting at is that the comments in the tutorial and the documentation suggesting that string templates could be used to provide the same functionality could confuse the user (they at least did challenge my mental model of the framework), and maybe it would be worth removing them, or at least adding some nuance to them. 
@robcasloz I suppose we could just remove them, they are just a bit of background. > But wouldn't this take us out of the "everything is a token" design again? I don't think so. It just avoids having to do the detour via `let` and hashtags, and would allow the values to go directly into the string. The formatted strings would then be the tokens. So you would end up with fewer tokens: instead of a `let` and the string with the hashtag, you would just have a templated string that injects the variable into it. But of course, the Template Framework provides more than just string formatting: it also has the `fuel` concept, and the `DataName/StructuralName` concept. And for those, we would still need the explicit scoping. Especially for names - those are fundamentally related to scopes, just because Java has "names in scopes". > at least adding some nuance to them. Ok, well let me try that... So I looked for mentions of "string template", and I only found the passage below from `Template.java`. Let me know if there are more that you were thinking about. 159 * Ideally, we would have used string templates to inject these Template 160 * arguments into the strings. But since string templates are not (yet) available, the Templates provide 161 * hashtag replacements in the {@link String}s: the Template argument names are captured, and 162 * the argument values automatically replace any {@code "#name"} in the {@link String}s. See the different overloads 163 * of {@link #make} for examples. Additional hashtag replacements can be defined with {@link #let}. 164 * I think that's still largely accurate now. We need hashtag replacements because we don't have string templating. But that does not mean we would get rid of the Template Framework once we have string templating. Though I suppose we might be able to get rid of hashtag replacements at that point. @robcasloz What do you think: does it still need more nuance? Should I just delete it? Or keep as is? 
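To make the quoted javadoc passage about `"#name"` replacements concrete, here is a toy sketch of the general idea of hashtag substitution. It is not the Template Framework implementation and does not use its API: `HashtagDemo` and `render` are hypothetical names, and leaving unknown tags untouched is an assumption made for the example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy: replace "#name" occurrences with values from a map. */
final class HashtagDemo {
    // A hashtag is '#' followed by one or more word characters.
    private static final Pattern TAG = Pattern.compile("#(\\w+)");

    static String render(String text, Map<String, String> values) {
        return TAG.matcher(text).replaceAll(m -> {
            String value = values.get(m.group(1));
            // Unknown tags are left as-is (an assumption of this sketch).
            return Matcher.quoteReplacement(value != null ? value : m.group());
        });
    }
}
```

For example, `HashtagDemo.render("int #x = #y + 1;", Map.of("x", "a", "y", "7"))` yields `int a = 7 + 1;`. With language-level string templates the values would be injected directly, which is the detour via `let` and hashtags that the message above says could be avoided.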
------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3532872057 From duke at openjdk.org Fri Nov 14 13:48:11 2025 From: duke at openjdk.org (Vishal Chand) Date: Fri, 14 Nov 2025 13:48:11 GMT Subject: RFR: 8371881: C2: Fix potential SEGV in VTransformReductionVectorNode tracing Message-ID: This PR fixes a potential SEGV and removes dead code: - **Fix**: Prevent potential SEGV in VTransformReductionVectorNode - [from @shipilev] This fixes a crash in diagnostic code when isa_ReductionVector() unexpectedly returns nullptr. While this indicates the graph is already corrupted, the additional crash in `TRACE_OPTIMIZE` makes debugging harder. The fix adds defensive null checking to prevent the diagnostic crash and improve error handling. - **Cleanup**: Remove unused first_red variable ------------- Commit messages: - 8371881: C2: Fix potential SEGV in VTransformReductionVectorNode tracing Changes: https://git.openjdk.org/jdk/pull/28323/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28323&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371881 Stats: 6 lines in 1 file changed: 4 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28323.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28323/head:pull/28323 PR: https://git.openjdk.org/jdk/pull/28323 From rcastanedalo at openjdk.org Fri Nov 14 14:00:21 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 14 Nov 2025 14:00:21 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: <0sNFPS5IDEuMwyosF0_qS0Z3Y-W6cZ5DUWv9hQng-wQ=.ad37188d-1476-47e8-abf9-0ed21678b0f0@github.com> References: <0sNFPS5IDEuMwyosF0_qS0Z3Y-W6cZ5DUWv9hQng-wQ=.ad37188d-1476-47e8-abf9-0ed21678b0f0@github.com> Message-ID: <0a6xhV0OUOj5h6xptVwqM_ji-Hx9pHwg4YQ_ypsWbg0=.d5959e03-ea9e-4cbd-b776-ad06303cd0e7@github.com> On Fri, 14 Nov 2025 13:42:25 GMT, Emanuel Peter wrote: > So I looked for
mentions of "string template", and I only found the passage below from `Template.java`. Let me know if there are more that you were thinking about. There are a few mentions in TestTutorial.java. > I think that's still largely accurate now. We need hashtag replacements because we don't have string templating. But that does not mean we would get rid of the Template Framework once we have string templating. Though I suppose we might be able to get rid of hashtag replacements at that point. > > @robcasloz What do you think: does it still need more nuance? Should I just delete it? Or keep as is? Thanks for the explanation. I'm still failing to see how one would combine string templates and explicit scoping, but it is good enough for me at this point if you have a clear plan of how it could be done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3532929124 From epeter at openjdk.org Fri Nov 14 14:21:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 14:21:35 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: <0a6xhV0OUOj5h6xptVwqM_ji-Hx9pHwg4YQ_ypsWbg0=.d5959e03-ea9e-4cbd-b776-ad06303cd0e7@github.com> References: <0sNFPS5IDEuMwyosF0_qS0Z3Y-W6cZ5DUWv9hQng-wQ=.ad37188d-1476-47e8-abf9-0ed21678b0f0@github.com> <0a6xhV0OUOj5h6xptVwqM_ji-Hx9pHwg4YQ_ypsWbg0=.d5959e03-ea9e-4cbd-b776-ad06303cd0e7@github.com> Message-ID: On Fri, 14 Nov 2025 13:57:08 GMT, Roberto Castañeda Lozano wrote: >> @robcasloz I suppose we could just remove them, they are just a bit of background. >> >>> But wouldn't this take us out of the "everything is a token" design again? >> >> I don't think so. It just avoids having to do the detour via `let` and hashtags, and would allow the values to go directly into the string. The formatted strings would then be the tokens.
So you would end up with fewer tokens: instead of a `let` and the string with the hashtag, you would just have a templated string that injects the variable into it. >> >> But of course, the Template Framework provides more than just string formatting: it also has the `fuel` concept, and the `DataName/StructuralName` concept. And for those, we would still need the explicit scoping. Especially for names - those are fundamentally related to scopes, just because Java has "names in scopes". >> >>> at least adding some nuance to them. >> >> Ok, well let me try that... >> >> So I looked for mentions of "string template", and I only found the passage below from `Template.java`. Let me know if there are more that you were thinking about. >> >> >> 159 * Ideally, we would have used string templates to inject these Template >> 160 * arguments into the strings. But since string templates are not (yet) available, the Templates provide >> 161 * hashtag replacements in the {@link String}s: the Template argument names are captured, and >> 162 * the argument values automatically replace any {@code "#name"} in the {@link String}s. See the different overloads >> 163 * of {@link #make} for examples. Additional hashtag replacements can be defined with {@link #let}. >> 164 * >> >> >> I think that's still largely accurate now. We need hashtag replacements because we don't have string templating. But that does not mean we would get rid of the Template Framework once we have string templating. Though I suppose we might be able to get rid of hashtag replacements at that point. >> >> @robcasloz What do you think: does it still need more nuance? Should I just delete it? Or keep as is? > >> So I looked for mentions of "string template", and I only found the passage below from `Template.java`. Let me know if there are more that you were thinking about. > > There are a few mentions at TestTutorial.java. > >> I think that's still largely accurate now. 
We need hashtag replacements because we don't have string templating. But that does not mean we would get rid of the Template Framework once we have string templating. Though I suppose we might be able to get rid of hashtag replacements at that point. >> >> @robcasloz What do you think: does it still need more nuance? Should I just delete it? Or keep as is? > > Thanks for the explanation. I'm still failing to see how one would combine string templates and explicit scoping, but it is good enough for me at this point if you have a clear plan of how it could be done. @robcasloz > Thanks for the explanation. I'm still failing to see how one would combine string templates and explicit scoping, but it is good enough for me at this point if you have a clear plan of how it could be done. Right, it is all quite hypothetical anyway, and I don't think it is quite relevant to this PR. But let me still try to explain, with a quick example, what it might look like with string templates: var template = Template.make((String arg1, Integer arg2) -> scope( // Note: no hashtag definition of arg1 and arg2 any more. // Note: arg1 and arg2 are essentially colocated with the scope of the Template above. // Now let's use arg1 and arg2 in some templated string: f"testing {arg1} testing {arg2 + 1} testing\n", // Above, we could inject variables and even computations into the string, and that produces our first token. // Now let's define some DataName, and then sample: addDataName("x", ...), dataNames(...)....sample((DataName dn) -> scope( // Note: dn is colocated with the scope of "sample". "testing {dn.name()} testing {dn.type()} testing\n" )) )); One concern would probably be that we need a way to define local variables in a scope, so that we can define one value (e.g. randomly generated), and use that exact value multiple times.
Maybe that would then still require some modified version of `let`: let(RANDOM.next(), (Integer x) -> scope( "testing {x} testing {x} testing", )) Again: the `scope` would not directly interact with the string templates, but the Template Framework would ensure that they are colocated: local variables are basically only created as lambda arguments that are live for the "scope" of the lambda, which by design of the Template Framework, is colocated with the `scope`s. I hope that makes some sense and illustrates the alternative we would have with string templates - once they would be available ;) > There are a few mentions at TestTutorial.java. Right, good catch. I can find these cases where string templates are mentioned: 159 // It would have been optimal to use Java String Templates to format 160 // argument values into Strings. However, since these are not (yet) 161 // available, the Template Framework provides two alternative ways of 162 // formatting Strings: 163 // 1) By appending to the comma-separated list of Tokens passed to scope(). 164 // Appending as a Token works whenever one has a reference to the Object 165 // in Java code. But often, this is rather cumbersome and looks awkward, 166 // given all the additional quotes and commands required. Hence, it 167 // is encouraged to only use this method when necessary. 168 // 2) By hashtag replacements inside a single string. One can either 169 // use "#arg" directly, or use brackets "#{arg}". When possible, one 170 // should prefer avoiding the brackets, as they create additional 171 // noise. However, there are cases where they are useful, for 172 // example "#TYPE_CON" would be parsed as a hashtag replacement 173 // for the hashtag name "TYPE_CON", whereas "#{TYPE}_CON" is 174 // parsed as hashtag name "TYPE", followed by literal string "_CON". 
175 // See also: generateWithHashtagAndDollarReplacements2 176 // There are two ways to define the value of a hashtag replacement: 177 // a) Capturing Template arguments as Strings. 178 // b) Using a "let" definition (see examples further down). 179 // Which one should be preferred is a code style question. Generally, we 180 // prefer the use of hashtag replacements because that allows easy use of 181 // multiline strings (i.e. text blocks). And 207 // Example with hashtag replacements (arguments and let), and $-name renamings. 208 // Note: hashtag replacements are a workaround for the missing string templates. 209 // If we had string templates, we could just capture the typed lambda 210 // arguments, and use them directly in the String via string templating. This still sounds fine to me. But it is my own writing, so of course it makes sense to me. Let me know if something specific about it is unclear / misleading for you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3532997671 PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3533012992 From rcastanedalo at openjdk.org Fri Nov 14 14:21:37 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 14 Nov 2025 14:21:37 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 08:43:14 GMT, Emanuel Peter wrote: > It is only "hardcoded" to never let hashtags and setFuelCost escape, it just implicitly downgrades a scope on those two "dimensions". Is this a design choice or a constraint of the current implementation? I could imagine situations in which it could be useful to let a hashtag escape across Template boundaries, no? 
Something like: var innerTemplate = Template.make(() -> transparentScope(let("foo", "42"))); var outerTemplate = Template.make(() -> scope( innerTemplate.asToken(), "// value of foo: #foo" )); outerTemplate.render(); ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3533008770 From epeter at openjdk.org Fri Nov 14 14:29:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 14:29:16 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 14:16:06 GMT, Roberto Castañeda Lozano wrote: > > It is only "hardcoded" to never let hashtags and setFuelCost escape, it just implicitly downgrades a scope on those two "dimensions". > > Is this a design choice or a constraint of the current implementation? I could imagine situations in which it could be useful to let a hashtag escape across Template boundaries, no? Something like: > > ``` > var innerTemplate = Template.make(() -> transparentScope(let("foo", "42"))); > var outerTemplate = Template.make(() -> scope( > innerTemplate.asToken(), > "// value of foo: #foo" > )); > outerTemplate.render(); > ``` I think this would lead to issues once you use a template recursively. What would you do if `foo` was already defined, and now you call `innerTemplate`? So I suppose it is a choice, yes. But I don't think the alternatives would be better. - You could just throw an exception at the second definition. But then you would need a way to check for existence of hashtag names ... not great. - You could just hide outer definitions... basically they would work like scoped values: you can bind and re-bind them. But that brings its own complexity that I don't want to push on the users if it's not absolutely necessary. If you really do need access to something from an outer template, you should just pass it via template argument. That makes the flow explicit.
That's my opinion. Or do you already see some case where something like a "scoped value" would be really really useful? I suppose we could still add that in the future. Another thought: hooks are a bit like "scoped value" ... except that they carry no "value" ;) What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3533046093 From rcastanedalo at openjdk.org Fri Nov 14 14:39:57 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 14 Nov 2025 14:39:57 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: References: Message-ID: <1gI7e9aF37hlcKkf0VX3WECh2gYNrxrjHkUJXvuhc68=.157a74bc-6f77-48a2-a3f4-f6614d821ce1@github.com> On Fri, 14 Nov 2025 14:16:06 GMT, Roberto Castañeda Lozano wrote: >>> Hi Emanuel, thanks for improving the design of the template framework, the enforcement of "everything is a token" and the introduction of explicit scope constraints seem like a step in the right direction. Before I go on with the review, I would like to ask two high-level questions (apologies if these are already discussed, it is hard to browse through a PR history): >>> >>> * The tutorial and the Template documentation remark that we would ideally have used string templates rather than hashtag replacements. Is this still true after the introduction of explicit scoping constraints, i.e. could we still simply use string templates and still enforce the user-provided scoping rules if the feature was available? >> >> Yes, I think so. We would probably get rid of `let` and just use local variables in lambdas, and then format them directly into strings. Scopes would be colocated with lambdas, so that local variables could be local to the scopes. I'm less sure about letting local variables (instead of hashtags) escape lambdas .. that's not really possible. But maybe there would be work-arounds.
>> >>> * If I got the comments in the tutorial right, it seems that the user has good control over the "transparency level" of scopes, while the transparency rules for templates are hardcoded (hashtag replacements never escape, DataNames always escape, etc.). This felt a bit surprising, would it be feasible to just let the outermost scope in a template determine the template's transparency level? >> >> The understanding seems to maybe be incomplete: >> >>> transparency rules for templates are hardcoded >> >> It is only "hardcoded" to never let hashtags and setFuelCost escape, it just implicitly downgrades a scope on those two "dimensions". >> >> But the user still has control over the name "dimension": >> It still needs to be possible to decide if `DataName`s and `StructuralName`s escape the Template scope, otherwise they cannot escape from a `Hook.insert`. >> >> Maybe it would be nice to force the user to use only scopes that exactly match the semantics of the implementation (only allow control if names are transparent or not). So maybe one could only use `scope` (completely non-transparent) or `scopeWithNameTransparency` (only transparent to names), and that would somehow be enforced by the Java types/interfaces of the framework. Or maybe we just throw an Exception if the wrong one is used. But it would require us to define this extra `scopeWithNameTransparency`. Do you think... > >> It is only "hardcoded" to never let hashtags and setFuelCost escape, it just implicitly downgrades a scope on those two "dimensions". > > Is this a design choice or a constraint of the current implementation? I could imagine situations in which it could be useful to let a hashtag escape across Template boundaries, no? 
Something like: > > > var innerTemplate = Template.make(() -> transparentScope(let("foo", "42"))); > var outerTemplate = Template.make(() -> scope( > innerTemplate.asToken(), > "// value of foo: #foo" > )); > outerTemplate.render(); > @robcasloz > > > Thanks for the explanation. I'm still failing to see how one would combine string templates and explicit scoping, but it is good enough for me at this point if you have a clear plan of how it could be done. > > Right, it is all quite hypothetical anyway, and I don't think it is quite relevant to this PR. > > But let me still try to explain, with a quick example, what it might look like with string templates: > > ``` > var template = Template.make((String arg1, Integer arg2) -> scope( > // Note: no hashtag definition of arg1 and arg2 any more. > // Note: arg1 and arg2 are essentially colocated with the scope of the Template above. > // Now let's use arg1 and arg2 in some templated string: > f"testing {arg1} testing {arg2 + 1} testing\n", > // Above, we could inject variables and even computations into the string, and that produces our first token. > // Now let's define some DataName, and then sample: > addDataName("x", ...), > dataNames(...)....sample((DataName dn) -> scope( > // Note: dn is colocated with the scope of "sample". > "testing {dn.name()} testing {dn.type()} testing\n" > )) > )); > ``` > > One concern would probably be that we need a way to define local variables in a scope, so that we can define one value (e.g. randomly generated), and use that exact value multiple times.
Maybe that would then still require some modified version of `let`: > > ``` > let(RANDOM.next(), (Integer x) -> scope( > "testing {x} testing {x} testing", > )) > ``` > > Again: the `scope` would not directly interact with the string templates, but the Template Framework would ensure that they are colocated: local variables are basically only created as lambda arguments that are live for the "scope" of the lambda, which by design of the Template Framework, is colocated with the `scope`s. > > I hope that makes some sense and illustrates the alternative we would have with string templates - once they would be available ;) OK, thanks for elaborating! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3533085266 From rcastanedalo at openjdk.org Fri Nov 14 14:44:54 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 14 Nov 2025 14:44:54 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: References: Message-ID: <8hzrSIP5a5sdMvLuTH_my204dNYT6sHOJDfOeqA8qI0=.d0d9a056-c6ac-49a4-9d44-0afbeaa473dc@github.com> On Fri, 14 Nov 2025 14:25:45 GMT, Emanuel Peter wrote: > So I suppose it is a choice, yes. But I don't think the alternatives would be better. > > * You could just throw an exception at the second definition. But then you would need a way to check for existence of hashtag names ... not great. > * You could just hide outer definitions... basically they would work like scoped values: you can bind and re-bind them. But that brings its own complexity that I don't want to push on the users if it's not absolutely necessary. > > If you really do need access to something from an outer template, you should just pass it via template argument. That makes the flow explicit. That's my opinion. Sounds OK to me, I think it would be worth capturing this rationale and advisory somewhere in the documentation. 
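The hashtag-locality rationale discussed above can be illustrated with a small, self-contained sketch. This is a toy model only, not the Template Framework's implementation: the class `HashtagLocalityDemo` and the methods `replaceHashtags` and `nested` are hypothetical names made up for illustration. It assumes that each template invocation owns a fresh map of hashtag replacements, and that a value reaches a nested invocation only through an explicit argument, so a recursive template can define the same hashtag at every level without conflict.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model (hypothetical, NOT the Template Framework): hashtag replacements
// are kept in a map that is local to one "template" invocation.
public class HashtagLocalityDemo {
    // Matches "#name" or "#{name}"; the name may contain letters, digits and '_'.
    private static final Pattern HASHTAG = Pattern.compile("#(?:\\{(\\w+)\\}|(\\w+))");

    // Replace every hashtag in 'text' using only the given local definitions.
    public static String replaceHashtags(String text, Map<String, String> local) {
        Matcher m = HASHTAG.matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1) : m.group(2);
            String value = local.get(name);
            if (value == null) {
                throw new IllegalStateException("undefined hashtag: #" + name);
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // A recursive "template": 'depth' is passed as an explicit argument, and
    // each invocation binds its own "#depth" in a fresh local map.
    public static String nested(int depth) {
        Map<String, String> local = new HashMap<>();
        local.put("depth", Integer.toString(depth));
        String inner = depth > 0 ? nested(depth - 1) : "";
        return replaceHashtags("begin #depth\n", local)
             + inner
             + replaceHashtags("end #{depth}\n", local);
    }

    public static void main(String[] args) {
        System.out.print(nested(2));
    }
}
```

In this model, `nested(2)` produces the `begin`/`end` lines for depths 2, 1 and 0 in properly nested order, because the inner invocations never see or overwrite the outer `#depth`. The bracket form also behaves as the tutorial comment describes: with only `TYPE` defined, `"#{TYPE}_CON"` resolves `TYPE` and keeps the literal `_CON`, whereas `"#TYPE_CON"` is looked up as the single name `TYPE_CON`.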
------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3533101399 From epeter at openjdk.org Fri Nov 14 14:58:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 14:58:36 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: <8hzrSIP5a5sdMvLuTH_my204dNYT6sHOJDfOeqA8qI0=.d0d9a056-c6ac-49a4-9d44-0afbeaa473dc@github.com> References: <8hzrSIP5a5sdMvLuTH_my204dNYT6sHOJDfOeqA8qI0=.d0d9a056-c6ac-49a4-9d44-0afbeaa473dc@github.com> Message-ID: <6saHdEPSO2x41tozEyZiHvHM1yeiwt1XO6zt3r3to9I=.6775a5d1-2c92-40d0-9903-b962192a185e@github.com> On Fri, 14 Nov 2025 14:40:49 GMT, Roberto Castañeda Lozano wrote: > > So I suppose it is a choice, yes. But I don't think the alternatives would be better. > > > > * You could just throw an exception at the second definition. But then you would need a way to check for existence of hashtag names ... not great. > > * You could just hide outer definitions... basically they would work like scoped values: you can bind and re-bind them. But that brings its own complexity that I don't want to push on the users if it's not absolutely necessary. > > > > If you really do need access to something from an outer template, you should just pass it via template argument. That makes the flow explicit. That's my opinion. > > Sounds OK to me, I think it would be worth capturing this rationale and advisory somewhere in the documentation. Where do you think it should go? Some ideas: - At the `let` definition, I could say that the hashtags won't go into a nested template, and if one wants to pass something to a nested template it should be done with template arguments. And in the tutorial I could add a note somewhere. That's for the advice. - About the rationale for the choice of having hashtags be local to a Template ... I don't know where to put that really.
It does not belong into the API documentation, and also not really into the tutorial so much. I'll make some code change suggestions, and then you can give me feedback on if that's sufficient for you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3533159147 From epeter at openjdk.org Fri Nov 14 15:09:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 15:09:33 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 15:42:55 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. 
For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > inflate abreviations to full names I could append to this section: 159 * Ideally, we would have used string templates to inject these Template 160 * arguments into the strings. But since string templates are not (yet) available, the Templates provide 161 * hashtag replacements in the {@link String}s: the Template argument names are captured, and 162 * the argument values automatically replace any {@code "#name"} in the {@link String}s. See the different overloads 163 * of {@link #make} for examples. Additional hashtag replacements can be defined with {@link #let}. Proposal: We have decided to keep hashtag replacements constrained to the scope of one Template. 
They do not escape to outer or inner Template uses. If one needs to pass values to inner Templates, this can be done with Template arguments. Keeping hashtag replacements local to Templates has the benefit that there is no conflict in recursive templates, where outer and inner Templates define the same hashtag replacement. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 857: > 855: * for anything that follows it, until the end of the next outer scope > 856: * that is non-transparent for hashtag replacements. Additionally, hashtag > 857: * replacements are limited to the template they were defined in. Suggestion: * Note that a {@code let} definition makes the hashtag replacement available * for anything that follows it, until the end of the next outer scope * that is non-transparent for hashtag replacements. Additionally, hashtag * replacements are limited to the template they were defined in. * If you want to pass values from an outer to an inner template, this cannot * be done with hashtags directly; instead, one has to pass the values via * template arguments, and use {@code let} in each template that requires * the hashtag replacement. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3533195851 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2527814913 From epeter at openjdk.org Fri Nov 14 15:14:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 15:14:50 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 15:42:55 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design.
It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. 
This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > inflate abreviations to full names test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 216: > 214: // are only there to facilitate string templating within the limited > 215: // scope of a template. You may consider it like a "local variable" > 216: // for code generation purposes only. Here I would append: If you need to pass some value to a nested Template, consider using a Template argument, and capturing that Template argument. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2527853248 From epeter at openjdk.org Fri Nov 14 15:22:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 15:22:18 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v30] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. 
> > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: document hashtag locality for Roberto ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/32488900..bba71529 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=28-29 Stats: 10 lines in 2 files changed: 10 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Fri Nov 14 15:22:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 15:22:20 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: <8hzrSIP5a5sdMvLuTH_my204dNYT6sHOJDfOeqA8qI0=.d0d9a056-c6ac-49a4-9d44-0afbeaa473dc@github.com> References: <8hzrSIP5a5sdMvLuTH_my204dNYT6sHOJDfOeqA8qI0=.d0d9a056-c6ac-49a4-9d44-0afbeaa473dc@github.com> Message-ID: On Fri, 14 Nov 2025 14:40:49 GMT, Roberto Castañeda Lozano wrote: >>> > It is only "hardcoded" to never let hashtags and setFuelCost escape, it just implicitly downgrades a scope on those two "dimensions". >>> >>> Is this a design choice or a constraint of the current implementation? I could imagine situations in which it could be useful to let a hashtag escape across Template boundaries, no? Something like: >>> >>> ``` >>> var innerTemplate = Template.make(() -> transparentScope(let("foo", "42"))); >>> var outerTemplate = Template.make(() -> scope( >>> innerTemplate.asToken(), >>> "// value of foo: #foo" >>> )); >>> outerTemplate.render(); >>> ``` >> >> I think this would lead to issues once you use a template recursively. What would you do if `foo` was already defined, and now you call `innerTemplate`? >> >> So I suppose it is a choice, yes.
But I don't think the alternatives would be better. >> - You could just throw an exception at the second definition. But then you would need a way to check for existence of hashtag names ... not great. >> - You could just hide outer definitions... basically they would work like scoped values: you can bind and re-bind them. But that brings its own complexity that I don't want to push on the users if it's not absolutely necessary. >> >> If you really do need access to something from an outer template, you should just pass it via template argument. That makes the flow explicit. That's my opinion. >> >> Or do you already see some case where something like a "scoped value" would be really really useful? I suppose we could still add that in the future. Another thought: hooks are a bit like "scoped value" ... except that they carry no "value" ;) >> >> What do you think? > >> So I suppose it is a choice, yes. But I don't think the alternatives would be better. >> >> * You could just throw an exception at the second definition. But then you would need a way to check for existence of hashtag names ... not great. >> * You could just hide outer definitions... basically they would work like scoped values: you can bind and re-bind them. But that brings its own complexity that I don't want to push on the users if it's not absolutely necessary. >> >> If you really do need access to something from an outer template, you should just pass it via template argument. That makes the flow explicit. That's my opinion. > > Sounds OK to me, I think it would be worth capturing this rationale and advisory somewhere in the documentation. 
@robcasloz Alright, I added some extra documentation of hashtag locality with [bba7152](https://github.com/openjdk/jdk/pull/27255/commits/bba71529ac081b0f53cd5106f57fb541bd3e0ead) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3533254286 From chagedorn at openjdk.org Fri Nov 14 15:29:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 14 Nov 2025 15:29:15 GMT Subject: RFR: 8371789: C2: More explicit dump results for TypePtr In-Reply-To: References: <79SI3TPjmPgPVt7pavCYb1XtG_bYVhIYGFMNEIvA5rg=.b1f32ffc-6380-4745-bd8f-b79c5b824303@github.com> Message-ID: <2A0Q18o4WdL-lqXTqfge7YVp7LfPy8HzFkvNW4mNnpU=.ccdb6ed6-cda4-42e7-9e0e-dc2aac52ae32@github.com> On Thu, 13 Nov 2025 15:30:47 GMT, Quan Anh Mai wrote: >> That looks like a nice readability improvement! Can you show some before vs. after output to summarize your changes? > > @chhagedorn Yes, for example: > > A byte array: > > Before: > byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact * > After: > aryptr:byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):BotPTR:exact,iid=bot > > A `j.l.Object`: > > Before: > narrowoop: java/lang/Object * > After: > narrowoop: instptr:java/lang/Object:BotPTR+0,iid=bot > > A pointer to the klass of `Object[]`: > > Before: > precise [java/lang/Object: 0x00007011e800b840 * (java/lang/Cloneable,java/io/Serializable): :Constant:exact * > After: > aryklassptr:[instklassptr:java/lang/Object:NotNull+0 (java/lang/Cloneable,java/io/Serializable):Constant+0 Thanks @merykitty for the examples! I will have a closer look at your PR next week. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28292#issuecomment-3533279562 From epeter at openjdk.org Fri Nov 14 15:32:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 15:32:56 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v24] In-Reply-To: <_bcSw73-0z5kaEyGsSz9Kox24TnFg7KxrakwEoEaf0k=.ab80c231-1afd-4cc9-8982-0db0d0311a43@github.com> References: <_bcSw73-0z5kaEyGsSz9Kox24TnFg7KxrakwEoEaf0k=.ab80c231-1afd-4cc9-8982-0db0d0311a43@github.com> Message-ID: On Thu, 13 Nov 2025 02:57:59 GMT, Vladimir Ivanov wrote: >> You have a few tests already, but I'd love to see some IR tests. You could even check for the presence of `ReachabilityFenceNode` during some phase and then see if it goes away. Nice would be if we could even track if a SafePoint has a RF edge attached, but not sure how easy that is. >> >> It would allow us not only to check for correctness, and hoping that we would catch incorrect cases with a crash/wrong result. But it would allow us to verify the graph, including the optimizations. >> >> What do you think? > >> You have a few tests already, but I'd love to see some IR tests. You could even check for the presence of ReachabilityFenceNode during some phase and then see if it goes away. Nice would be if we could even track if a SafePoint has a RF edge attached, but not sure how easy that is. >> It would allow us not only to check for correctness, and hoping that we would catch incorrect cases with a crash/wrong result. But it would allow us to verify the graph, including the optimizations. > > The main complications with IR tests I see are: > (1) very few cases where RF node is missing are known and all of them have already have a dedicated regression test; > (2) the invariants RF imposes on the graph are non-local and it's hard to check them by inspecting IR. 
> > There's the transformation for loop-invariant referent I could try to add an IR unit test for, but I don't know how suitable IR test framework is for such scenario. > > Overall, I'd prefer to leave it as is for now and explore opportunities for IR tests as part of general effort to improve RF test coverage. @iwanowww I suppose adding more IR tests now would ensure that the implementation you are now adding does exactly what we expect, and not something different that just happens to behave correct. Examples would also help for the review. We have quite a machinery here, and understanding how the steps come together would be great. So it would be nice if you could annotate the tests we already do have: what do you expect happens to the RF's, the reference, the SafePoints, and the other relevant nodes? Maybe we can then turn some of those annotations into IR nodes right away, and some are probably too difficult. But I do see that the current regex matching on single IR nodes is a bit tricky here. It only allows you to detect the presence of a `ReachabilityFence` node, but not where it is and what it's connected to. This could be a motivation for extending the IR framework to allow some kind of graph matching. Though I do see that this could get quite complex. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3533295028 From shade at openjdk.org Fri Nov 14 15:33:57 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 14 Nov 2025 15:33:57 GMT Subject: RFR: 8371804: C2: Tighten up LoadNode::Value comments after JDK-8346184 In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 14:30:10 GMT, Aleksey Shipilev wrote: > This confused me quite a bit in [JDK-8371581](https://bugs.openjdk.org/browse/JDK-8371581) investigations. 
> > With [JDK-8346184](https://bugs.openjdk.org/browse/JDK-8346184), we have moved the block in `LoadNode::Value` that produced bottom values for the block that "If we are loading from a freshly-allocated object, produce a zero, if the load is provably beyond the header of the object." This comment is misleading, and really relates to the old place, which actually returns zeroes. > > It would be better to clean this up to avoid further confusion. There should be no semantic change, only the cleanup. > > Additional testing: > - [x] GHA > - [x] Linux AArch64 server fastdebug, `tier1` Thank you both! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28296#issuecomment-3533303446 From shade at openjdk.org Fri Nov 14 15:33:58 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 14 Nov 2025 15:33:58 GMT Subject: Integrated: 8371804: C2: Tighten up LoadNode::Value comments after JDK-8346184 In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 14:30:10 GMT, Aleksey Shipilev wrote: > This confused me quite a bit in [JDK-8371581](https://bugs.openjdk.org/browse/JDK-8371581) investigations. > > With [JDK-8346184](https://bugs.openjdk.org/browse/JDK-8346184), we have moved the block in `LoadNode::Value` that produced bottom values for the block that "If we are loading from a freshly-allocated object, produce a zero, if the load is provably beyond the header of the object." This comment is misleading, and really relates to the old place, which actually returns zeroes. > > It would be better to clean this up to avoid further confusion. There should be no semantic change, only the cleanup. > > Additional testing: > - [x] GHA > - [x] Linux AArch64 server fastdebug, `tier1` This pull request has now been integrated. 
Changeset: 10f262a6 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/10f262a6ad9a6e89cd79409c5e1a3f7efda76928 Stats: 15 lines in 1 file changed: 3 ins; 4 del; 8 mod 8371804: C2: Tighten up LoadNode::Value comments after JDK-8346184 Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/28296 From shade at openjdk.org Fri Nov 14 15:33:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 14 Nov 2025 15:33:45 GMT Subject: RFR: 8371881: C2: Fix potential SEGV in VTransformReductionVectorNode tracing In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 13:24:14 GMT, Vishal Chand wrote: > This PR fixes a potential SEGV and removes dead code: > ? **Fix**: Prevent potential SEGV in VTransformReductionVectorNode - [from @shipilev] This fixes a crash in diagnostic code when isa_ReductionVector() unexpectedly returns nullptr. While this indicates the graph is already corrupted, the additional crash in `TRACE_OPTIMIZE` makes debugging harder. The fix adds defensive null checking to prevent the diagnostic crash and improve error handling. > > ? **Cleanup**: Remove unused first_red variable Looks fine, but @eme64 should take a look as well. src/hotspot/share/opto/vtransform.cpp line 1269: > 1267: current_red->print(); > 1268: } else { > 1269: tty->print(" nullptr"); Suggestion: tty->print("nullptr"); ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28323#pullrequestreview-3465474733 PR Review Comment: https://git.openjdk.org/jdk/pull/28323#discussion_r2527913198 From epeter at openjdk.org Fri Nov 14 15:45:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 15:45:12 GMT Subject: RFR: 8371881: C2: Fix potential SEGV in VTransformReductionVectorNode tracing In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 13:24:14 GMT, Vishal Chand wrote: > This PR fixes a potential SEGV and removes dead code: > ? 
**Fix**: Prevent potential SEGV in VTransformReductionVectorNode - [from @shipilev] This fixes a crash in diagnostic code when isa_ReductionVector() unexpectedly returns nullptr. While this indicates the graph is already corrupted, the additional crash in `TRACE_OPTIMIZE` makes debugging harder. The fix adds defensive null checking to prevent the diagnostic crash and improve error handling. > > ? **Cleanup**: Remove unused first_red variable @vish-chan @shipilev thanks for working on this! Looks reasonable to me :) ------------- PR Review: https://git.openjdk.org/jdk/pull/28323#pullrequestreview-3465517156 From epeter at openjdk.org Fri Nov 14 15:57:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 15:57:46 GMT Subject: RFR: 8371642: TestNumberOfContinuousZeros.java fails on PPC64 In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 15:32:03 GMT, David Briemann wrote: > Skips IR match rules for COUNT_LEADING_ZEROS_VL on PPC. VectorCastL2X is not implemented on Power for performance reasons. Looks reasonable to me :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28239#pullrequestreview-3465572125 From duke at openjdk.org Fri Nov 14 15:59:32 2025 From: duke at openjdk.org (Vishal Chand) Date: Fri, 14 Nov 2025 15:59:32 GMT Subject: RFR: 8371881: C2: Fix potential SEGV in VTransformReductionVectorNode tracing [v2] In-Reply-To: References: Message-ID: > This PR fixes a potential SEGV and removes dead code: > ? **Fix**: Prevent potential SEGV in VTransformReductionVectorNode - [from @shipilev] This fixes a crash in diagnostic code when isa_ReductionVector() unexpectedly returns nullptr. While this indicates the graph is already corrupted, the additional crash in `TRACE_OPTIMIZE` makes debugging harder. The fix adds defensive null checking to prevent the diagnostic crash and improve error handling. > > ? 
**Cleanup**: Remove unused first_red variable Vishal Chand has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/vtransform.cpp Co-authored-by: Aleksey Shipil?v ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28323/files - new: https://git.openjdk.org/jdk/pull/28323/files/f6947173..0824593d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28323&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28323&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28323.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28323/head:pull/28323 PR: https://git.openjdk.org/jdk/pull/28323 From epeter at openjdk.org Fri Nov 14 16:07:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 16:07:06 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization In-Reply-To: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> Message-ID: On Thu, 13 Nov 2025 21:34:30 GMT, Hamlin Li wrote: > Hi, > > This pr add CMoveF/D on riscv, which enable vectorization of statement like: `op_1 bop op_2 ? res_f_d_1 : res_f_d_2 in a loop`. > > This pr is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231. > > Previously it's https://github.com/openjdk/jdk/pull/25341, but at that time, C2 SLP has some issue with unsigned comparison, which is now fixed, so it's good to continue the work. > > # Test > ## Jtreg > > in progress... 
> > ## Performance > > Column name meanings: > * p: with patch > * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on > * m: without patch > * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on > > #### Average improvement > > NOTE: This PR alone brings a performance benefit for `CMoveF+CmpF`, `CMoveD+CmpD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below is based on fully implementing the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. > > For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. > > Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) > -- | -- | -- | -- > 1.022782609 | 2.198717391 | 2.162673913 | 2.199 > > I won't be able to review the RISCV part, so you'll have to find someone else there. I just dropped 2 drive-by comments about the tests :) test/hotspot/jtreg/compiler/c2/irTests/TestScalarConditionalMove.java line 36: > 34: * @test > 35: * @summary Test conditional move. > 36: * @requires vm.simpleArch == "riscv64" I would prefer if you could enable the test on all platforms, but just require the specific platform on the IR rules. What would be even more fantastic: if you were able to also enable the IR rules for `x64` and `aarch64`, but we can also file a follow-up RFE for that. test/hotspot/jtreg/compiler/c2/irTests/TestScalarConditionalMove.java line 49: > 47: "-XX:+UnlockExperimentalVMOptions", "-XX:-UseCompactObjectHeaders"); > 48: TestFramework.runWithFlags("-XX:+UseCMoveUnconditionally", "-XX:-UseVectorCmov", > 49: "-XX:+UnlockExperimentalVMOptions", "-XX:+UseCompactObjectHeaders"); Wait. Is this just a copy of the existing vector test, but run with CMove vectorization disabled? If so, we could just add these additional runs to the existing test, and guard the IR test with corresponding flags: Have an IR rule for `-XX:-UseVectorCmov` and one for `-XX:+UseVectorCmov`. 
That would allow us to reduce some code duplication. And it would also avoid letting the two tests go out of sync when people add more to one but not the other. What do you think? ------------- PR Review: https://git.openjdk.org/jdk/pull/28309#pullrequestreview-3465590460 PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2528003621 PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2528011154 From epeter at openjdk.org Fri Nov 14 16:11:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Nov 2025 16:11:33 GMT Subject: RFR: 8371674: C2 fails with Missed optimization opportunity in PhaseIterGVN for MoveL2D In-Reply-To: References: Message-ID: <0-ADAzn-YzszXJq-OaAv_PT8sLgxNkGOSLrfMpNZdYM=.279ef48b-3267-42a0-8273-0ca398eb5284@github.com> On Thu, 13 Nov 2025 11:46:14 GMT, Beno?t Maillard wrote: > This PR addresses yet another missed optimization in `PhaseIterGVN`. The way this optimization is triggered is a bit different this time though, and the notification is missing in `Node::has_special_unique_user`. > > ## Analysis > > The affected optimization is the transformation of `MoveX2Y (LoadX mem)` into `LoadY mem`. This is implemented in `MoveNode::Ideal`. The optimization is as follows: > > ```c++ > // Fold reinterpret cast into memory operation: > // MoveX2Y (LoadX mem) => LoadY mem > LoadNode* ld = in(1)->isa_Load(); > if (ld != nullptr && (ld->outcnt() == 1)) { // replace only > const Type* rt = bottom_type(); > if (ld->has_reinterpret_variant(rt)) { > if (phase->C->post_loop_opts_phase()) { > return ld->convert_to_reinterpret_load(*phase, rt); > } else { > // attempt the transformation once loop opts are over > phase->C->record_for_post_loop_opts_igvn(this); > } > } > } > > > The optimization is triggered only if the input is a `LoadNode` and the `MoveNode` is its only user. This is a relatively unusual pattern. > > The bug was found by the fuzzer. 
At some point during IGVN, we have the following subgraph: > > > CountedLoop LoadL > \ / \ > Phi MoveL2D > > In `RegionNode::Ideal`, we end up calling `set_req_X` on the `Phi` node to delete the edge from the `Phi` node to `LoadL`. As a result, the `LoadL` node only has one user left, and `MoveNode::Ideal` gets triggered at the next verification pass. > > ## Proposed Solution > > Add this particular case to `Node::has_special_unique_user`, which gets called by `Node::set_req_X`. > > ## Summary of changes > > This PR brings the following changes: > - Detect the optimization pattern in `Node::has_special_unique_user`. > - Add new test `TestMissingOptMoveX2YLoadX.java`, initially obtained from the fuzzer and then heavily reduced, both with the usual tools and manually. I tried to get a reproducer for each of the `Move` nodes, but I was only able to get one for `MoveL2D`. > > ### Testing > > - [x] https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371674 > - [x] tier1-4, plus some internal testing > > Thank you for reviewing! test/hotspot/jtreg/compiler/c2/TestMissingOptMoveX2YLoadX.java line 54: > 52: while (++e < 37) { > 53: for (f = 1; f < 7; f++) { > 54: h >>>= (int)(--g - Double.longBitsToDouble(j[e])); Drive-by comment, might review more fully next week: could the same happen with `MoveI2F`? Or with `MoveD2L`, i.e. `Double.doubleToRawLongBits`? Probably yes. Not sure if it's worth duplicating the test, up to you. 
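For reference, a minimal Java sketch of the four raw bit-reinterpretation calls this drive-by comment is asking about, each applied directly to an array load. The mapping to C2 nodes given in the comments (`MoveL2D`, `MoveD2L`, `MoveI2F`, `MoveF2I`) is an assumption about how C2 intrinsifies these methods, not something stated in the PR. The class and method names are hypothetical, and the sketch only checks that the bits round-trip; it is not a reproducer for the IGVN assert.

```java
public class MoveReinterpretSketch {
    // Hypothetical helper: load a long and reinterpret its bits as a double.
    // Assumed C2 shape: MoveL2D (LoadL mem), the case reproduced in the PR.
    static double loadAsDouble(long[] a, int i) {
        return Double.longBitsToDouble(a[i]);
    }

    // Reverse direction. Assumed C2 shape: MoveD2L (LoadD mem).
    static long loadAsLong(double[] a, int i) {
        return Double.doubleToRawLongBits(a[i]);
    }

    // 32-bit variant. Assumed C2 shape: MoveI2F (LoadI mem).
    static float loadAsFloat(int[] a, int i) {
        return Float.intBitsToFloat(a[i]);
    }

    // 32-bit reverse direction. Assumed C2 shape: MoveF2I (LoadF mem).
    static int loadAsInt(float[] a, int i) {
        return Float.floatToRawIntBits(a[i]);
    }

    public static void main(String[] args) {
        long[] longs = {0x3FF0000000000000L}; // IEEE 754 bit pattern of 1.0d
        double[] doubles = {1.0};
        int[] ints = {0x3F800000};            // IEEE 754 bit pattern of 1.0f
        float[] floats = {1.0f};

        System.out.println(loadAsDouble(longs, 0));
        System.out.println(Long.toHexString(loadAsLong(doubles, 0)));
        System.out.println(loadAsFloat(ints, 0));
        System.out.println(Integer.toHexString(loadAsInt(floats, 0)));
    }
}
```

Whether each of these shapes can actually reach the same missed-optimization state depends on the rest of the graph (the load must end up with the `Move` as its only user), so this only illustrates which Java calls are involved.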
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28290#discussion_r2528039234 From duke at openjdk.org Fri Nov 14 16:13:12 2025 From: duke at openjdk.org (Vishal Chand) Date: Fri, 14 Nov 2025 16:13:12 GMT Subject: RFR: 8371881: C2: Fix potential SEGV in VTransformReductionVectorNode tracing [v2] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 15:42:14 GMT, Emanuel Peter wrote: >> Vishal Chand has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/vtransform.cpp >> >> Co-authored-by: Aleksey Shipil?v > > @vish-chan @shipilev thanks for working on this! > > Looks reasonable to me :) @eme64 I'm not sure but your approval might be required. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28323#issuecomment-3533487132 From shade at openjdk.org Fri Nov 14 17:11:03 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 14 Nov 2025 17:11:03 GMT Subject: RFR: 8371881: C2: Fix potential SEGV in VTransformReductionVectorNode tracing [v2] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 15:59:32 GMT, Vishal Chand wrote: >> This PR fixes a potential SEGV and removes dead code: >> ? **Fix**: Prevent potential SEGV in VTransformReductionVectorNode - [from @shipilev] This fixes a crash in diagnostic code when isa_ReductionVector() unexpectedly returns nullptr. While this indicates the graph is already corrupted, the additional crash in `TRACE_OPTIMIZE` makes debugging harder. The fix adds defensive null checking to prevent the diagnostic crash and improve error handling. >> >> ? **Cleanup**: Remove unused first_red variable > > Vishal Chand has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/vtransform.cpp > > Co-authored-by: Aleksey Shipil?v Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28323#pullrequestreview-3465906006 From mdoerr at openjdk.org Fri Nov 14 17:21:50 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 14 Nov 2025 17:21:50 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v3] In-Reply-To: References: Message-ID: <-srlt_N0wyBwCwOmZTJBdGNFm66doGBHr6Yx83pqSpQ=.be8e4031-f542-49ad-8271-ac9a2c8b9128@github.com> > This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. > > The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: More minor cleanup. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28299/files - new: https://git.openjdk.org/jdk/pull/28299/files/b03e6b43..621616a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28299&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28299&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28299.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28299/head:pull/28299 PR: https://git.openjdk.org/jdk/pull/28299 From mli at openjdk.org Fri Nov 14 18:03:53 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 14 Nov 2025 18:03:53 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v2] In-Reply-To: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> Message-ID: <8Y3gUUVCNU1ZpfRkZeJqgIUomP6NCDIQqqgN-lRgk5A=.60177ffe-52ba-46de-a099-57d73f096a49@github.com> > Hi, > > This pr add CMoveF/D on riscv, which enable vectorization of statement like: 
`op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop. > > This PR also prepares for further vectorization in https://github.com/openjdk/jdk/pull/28231. > > It was previously proposed as https://github.com/openjdk/jdk/pull/25341, but at that time C2 SLP had an issue with unsigned comparison, which is now fixed, so the work can continue. > > # Test > ## Jtreg > > in progress... > > ## Performance > > Column name meanings: > * p: with patch > * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on > * m: without patch > * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on > > #### Average improvement > > NOTE: This PR alone brings a performance benefit for `CMoveF+CmpF`, `CMoveD+CmpD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below is based on fully implementing the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. > > For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. 
> > Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) > -- | -- | -- | -- > 1.022782609 | 2.198717391 | 2.162673913 | 2.199 > > Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: - add CMove+CmpP/N tests - fix cmovF/D_cmpP ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28309/files - new: https://git.openjdk.org/jdk/pull/28309/files/ec0d8cc4..5c0d645d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28309&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28309&range=00-01 Stats: 359 lines in 2 files changed: 357 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28309.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28309/head:pull/28309 PR: https://git.openjdk.org/jdk/pull/28309 From mli at openjdk.org Fri Nov 14 18:15:07 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 14 Nov 2025 18:15:07 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v2] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> Message-ID: <5PzMJntiu2waMvciTLvXaUH15Fm3dXZPsDVvkuqWPI0=.68c6456a-e5d3-413e-bef8-d8da95de40bd@github.com> On Fri, 14 Nov 2025 15:59:18 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: >> >> - add CMove+CmpP/N tests >> - fix cmovF/D_cmpP > > test/hotspot/jtreg/compiler/c2/irTests/TestScalarConditionalMove.java line 36: > >> 34: * @test >> 35: * @summary Test conditional move. >> 36: * @requires vm.simpleArch == "riscv64" > > I would prefer if you could enable the test on all platforms, but just require the specific platform on the IR rules. > What would be even more fantastic: if you were able to also enable the IR rules for `x64` and `aarch64`, but we can also file a follow-up RFE for that. Make sense. 
I filed https://bugs.openjdk.org/browse/JDK-8371920 to track the task, will do it later after this pr. > test/hotspot/jtreg/compiler/c2/irTests/TestScalarConditionalMove.java line 49: > >> 47: "-XX:+UnlockExperimentalVMOptions", "-XX:-UseCompactObjectHeaders"); >> 48: TestFramework.runWithFlags("-XX:+UseCMoveUnconditionally", "-XX:-UseVectorCmov", >> 49: "-XX:+UnlockExperimentalVMOptions", "-XX:+UseCompactObjectHeaders"); > > Wait. Is this just a copy of the existing vector test, but run with CMove vectorization disabled? > If so, we could just add these additional runs to the existing test, and guard the IR test with corresponding flags: > Have an IR rule for `-XX:-UseVectorCmov` and one for `-XX:+UseVectorCmov`. > > That would allow us to reduce some code duplication. And it would also avoid letting the two tests go out of sync when people add more to one but not the other. > > What do you think? Good idea! I can do it. What do you think about the name of the merged tests? `TestConditionalMove.java` or `TestScalarAndVectorConditionalMove.java` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2528463608 PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2528467634 From sviswanathan at openjdk.org Fri Nov 14 19:51:13 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 14 Nov 2025 19:51:13 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 16:38:49 GMT, Volodymyr Paprotski wrote: > - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline > - `SignatureBench.MLDSA` is 1.2x-2.2x faster > - Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) > - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version > - `SignatureBench.MLDSA` is upto 5% faster, never slower > > Note on intrinsic: > - The emitted (existing) AVX512 assembler was not 
"significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. > - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 > > Tests and benchmarks: > - Added a fuzz test to ensure Java and intrinsic produces exactly same result > - Added benchmark to measure the performance of intrinsic itself > > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" src/hotspot/cpu/x86/assembler_x86.cpp line 3865: > 3863: void Assembler::vmovsldup(XMMRegister dst, XMMRegister src, int vector_len) { > 3864: assert(vector_len == AVX_128bit ? VM_Version::supports_avx() : > 3865: (vector_len == AVX_256bit ? VM_Version::supports_avx2() : Vector length 256 bit is supported by AVX=1. src/hotspot/cpu/x86/assembler_x86.cpp line 3874: > 3872: void Assembler::vmovshdup(XMMRegister dst, XMMRegister src, int vector_len) { > 3873: assert(vector_len == AVX_128bit ? VM_Version::supports_avx() : > 3874: (vector_len == AVX_256bit ? VM_Version::supports_avx2() : Vector length 256 bit is supported by AVX=1. src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 83: > 81: // size 0 and 1 are used for initial and final shuffles respectivelly of > 82: // dilithiumAlmostInverseNtt and dilithiumAlmostNtt. 
> 83: // NOTE: For size 0 and 1, input1[] and input2[] are modified in-place what is the size-in-bits when size is 0 and 1? What is the difference between size 0 and size1? The overloading of size makes it confusing. src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 137: > 135: for (int i = 0; i < regCnt; i++) { > 136: // 0b-1-2-3-1 > 137: __ vshufps(output2[i], input1[i], input2[i], 0b11011101, vector_len); Did you mean this to be //0b-1-3-1-3? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2528279719 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2528288894 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2528416321 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2528610634 From sviswanathan at openjdk.org Fri Nov 14 19:51:14 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 14 Nov 2025 19:51:14 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 17:51:24 GMT, Sandhya Viswanathan wrote: >> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline >> - `SignatureBench.MLDSA` is 1.2x-2.2x faster >> - Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) >> - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version >> - `SignatureBench.MLDSA` is upto 5% faster, never slower >> >> Note on intrinsic: >> - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. 
>> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 >> >> Tests and benchmarks: >> - Added a fuzz test to ensure Java and intrinsic produces exactly same result >> - Added benchmark to measure the performance of intrinsic itself >> >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" > > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 83: > >> 81: // size 0 and 1 are used for initial and final shuffles respectivelly of >> 82: // dilithiumAlmostInverseNtt and dilithiumAlmostNtt. >> 83: // NOTE: For size 0 and 1, input1[] and input2[] are modified in-place > > what is the size-in-bits when size is 0 and 1? What is the difference between size 0 and size1? The overloading of size makes it confusing. size 0 seems to be doing a different shuffle than what is described in the diagram. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 137: > >> 135: for (int i = 0; i < regCnt; i++) { >> 136: // 0b-1-2-3-1 >> 137: __ vshufps(output2[i], input1[i], input2[i], 0b11011101, vector_len); > > Did you mean this to be //0b-1-3-1-3? or 3-1-3-1. 
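To make the question about the `0b11011101` immediate concrete, here is a small Java model of how `SHUFPS`/`VSHUFPS` reads its 8-bit immediate within one 128-bit lane. It is only a sketch: the class and method names are made up for illustration and are not part of the patch under review. It assumes the usual x86 semantics, in which the two low result dwords are selected from the first source and the two high result dwords from the second source, and the 256-bit form applies the same selection to each 128-bit lane.

```java
final class ShufpsImmSketch {
    /** Splits an 8-bit SHUFPS immediate into four 2-bit selectors, lowest destination dword first. */
    static int[] selectors(int imm8) {
        int[] sel = new int[4];
        for (int lane = 0; lane < 4; lane++) {
            sel[lane] = (imm8 >>> (2 * lane)) & 0b11;
        }
        return sel;
    }

    /** Models SHUFPS on one 128-bit lane: result dwords 0..1 come from src1, dwords 2..3 from src2. */
    static int[] shufps(int[] src1, int[] src2, int imm8) {
        int[] s = selectors(imm8);
        return new int[] { src1[s[0]], src1[s[1]], src2[s[2]], src2[s[3]] };
    }

    public static void main(String[] args) {
        // Selectors for 0b11011101, listed from the lowest destination dword upwards.
        System.out.println(java.util.Arrays.toString(selectors(0b11011101)));
    }
}
```

Read from the lowest destination dword upwards the selectors are 1, 3, 1, 3; read in the order the bits are written in the literal (most significant first) they are 3, 1, 3, 1. Under this model both spellings suggested in the review describe the same immediate, just in opposite reading directions.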
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2528747938
PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2528753295

From mchevalier at openjdk.org Fri Nov 14 20:06:19 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Fri, 14 Nov 2025 20:06:19 GMT
Subject: RFR: 8371716: C2 compilation fails with "Missed optimization opportunity in PhaseIterGVN"
Message-ID:

This bug was originally found and reported as a Valhalla problem. It quickly became apparent that it has no reason to be Valhalla-specific, although I couldn't prove it. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway.

# Analysis
## Observationally
### IGVN
During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. Their types are:

    in(1): java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact * (inline_depth=4))
    in(2): java/lang/Object * (speculative=null)

We compute the join (HS' meet):
https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/cfgnode.cpp#L1310-L1317

    t=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *)

But the current `_type` (of the `PhiNode` as a `TypeNode`) is

    _type=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue3 (compiler/valhalla/inlinetypes/MyInterface):exact *)

We filter `t` by `_type`
https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/cfgnode.cpp#L1332
and we get

    ft=java/lang/Object *

which is what we return. After the end of `Value`, the returned type becomes the new `PhiNode`'s `_type`.
https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/phaseX.cpp#L2150-L2164
and
https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/node.cpp#L1127-L1133

### Verification
On verification, `in(1)` and `in(2)` have the same types as before, and so does `t`. But this time

    _type=java/lang/Object *

and so after filtering `t` by the (new) `_type` we get

    ft=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *)

which is returned. Verification gets angry because the new `ft` is not the same as the previous one.

## But why?!
### Details on type computation
In short, we are doing

    t = typeof(in(1)) \/ typeof(in(2))
    ft = t /\ _type (* IGVN *)
    ft' = t /\ ft (* Verification *)

and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b`, which is problematic for this kind of verification that will just "try again and see if something changes".

To me, the surprising fact was that the intersection

    java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *)
    /\
    _type=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue3 (compiler/valhalla/inlinetypes/MyInterface):exact *)
    ~>
    java/lang/Object *

What happened to the speculative type? Both `MyValue2` and `MyValue3` inherit from `MyAbstract` (and implement `MyInterface`). So the code correctly finds that the intersection of these speculative types is

    compiler/valhalla/inlinetypes/MyAbstract (compiler/valhalla/inlinetypes/MyInterface):AnyNull * (flat in array),iid=top

The interesting part is that it's `AnyNull`: indeed, what else is a `MyValue2` and a `MyValue3` at the same time? And then, `above_centerline` decides it's not useful enough (too precise, too close to HS' top/normal bottom) and removes the speculative type.
https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/type.cpp#L2886-L2888

But on the verification run, `compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *` is intersected with the speculative type of `java/lang/Object *`, which is unknown (HS' bottom/normal top), so we simply get `MyValue2`. If we did not discard `AnyNull` using `above_centerline`, we would have the intersection of `MyValue2` and `AnyNull`, giving `AnyNull`, which is indeed stable.

## Ok, but the types are weird?
Indeed, they are! How can we get a speculative type `MyValue3` on the `PhiNode` when the inputs are both `Object`, and one is speculated to be a `MyValue2`? This comes from incremental inlining. It seems that we have some profiling information on the returned type of a callee, which happens to be `MyValue3` and propagates to the `PhiNode`. Later, the callee is inlined, and we get new type information (`MyValue2`) from its body (from the returned type of a callee of our callee, if I remember well), which reaches the input of our `PhiNode`.

# Reproducing
## In Valhalla
This crash is quite rare because:
1. it needs a specific speculative type setup, which depends heavily on timing
2. if `PhiNode::Value` is called a second time, it will stabilize the `_type` field before verification.

To limit the influence of 2., I've tested with an additional assert that would immediately do

    const Type* ft_ = t->filter_speculative(ft);

in `PhiNode::Value` and compare `ft` and `ft_`. Indeed, we are never sure a run of `Value` is not the last one: it should always be legal to stop anywhere (even if, in a particular case, it was going to run further). With this extra check, the crash is a bit more common, but still pretty rare.
Tests that have been seen to crash at least once:
- `compiler/valhalla/inlinetypes/TestCallingConvention.java`
- `compiler/valhalla/inlinetypes/TestIntrinsics.java`
- `compiler/valhalla/inlinetypes/TestArrays.java`
- `compiler/valhalla/inlinetypes/TestBasicFunctionality.java`

All in `compiler/valhalla/inlinetypes`, even though I was also testing with mainline tests. Suspicious, huh.

## In mainline
With the aforementioned extra check, I've tried to see if it could happen on mainline, since the involved code does not seem to be Valhalla-specific. Yet, nothing failed. Fortunately, Roland crafted an example that reproduces in mainline (and Valhalla)! You'll find it as a test here.

# Fixing?
I think changing the type system would be quite risky: it is all over the place. Also, fixing it would require not dropping the speculative type when it is `above_centerline`. This might not be desirable. On top of the complexity and the associated risk, a too specific speculative type is rather useless. If we keep the too specific type around, we probably should ignore it where we make use of it. That distributes the effort and opens the door to inconsistencies. If we should ignore it, we might just as well drop it immediately.

It is also dubious whether ordering requirements are meaningful for speculative types: they are not sound or complete. Moreover, one could argue that in the abstract, we don't even need a lattice or anything like that. A single poset whose functions are sound approximations of the concrete ones is enough. It is not uncommon in the abstract world that `a /\ b` is not smaller than `a`. For instance, in the co-interval domain (the whole universe `E` minus an interval), if we take `a = E \ [1, 2]` and `b = E \ [3, 4]`, the concrete intersection would be `E \ [1, 2] \ [3, 4]`, which is not allowed in our domain since it has 2 holes. If we want a sound approximation of that, we must remove at least one hole.
We can then take `E \ [1, 2] = a`, `E \ [3, 4] = b` or even `E` (and of course, a lot of other things, with smaller holes than `a` or `b`). Whichever our abstract domain chooses, there is never both `a /\ b < a` and `a /\ b < b`. Indeed, this poset (like many real-world domains) is not a lattice, which doesn't keep us from speaking about soundness. An interesting and related fact is that there is no best abstraction of `E \ [1, 2] \ [3, 4]`. Maybe this digression about soundness rather hints that we should not compare speculative types during verification.

Finally, as a simple solution, one could simply run `filter_speculative` twice. That should be enough, as the second filter will simply select the non-empty speculative type if there is only one, and this one won't be `above_centerline`, or it would not exist as a speculative type already. To be a bit less aggressive, we can avoid doing that in cases where we know it cannot be useful. If `ft` obtained from `filter_speculative` has no speculative type, but `t` has one, we know that it might be because it has been dropped, and computing `t->filter_speculative(ft)` could pick the speculative type of `t`. The speculative type can still be removed if, for instance, the non-speculative type of `ft` is exact and non-null, but we have still reached a fixpoint, so it's correct, just a little bit too much work.

This solution means that we are basically throwing away the speculative type in `_type` in case of a clash. That does not sound shocking to me: if we get (hopefully) more accurate information on the inputs, it seems reasonable to give up the previous speculative type if it clashes, since it's not known to be correct. One can also phrase it as "we keep the speculative type of `_type` only if we don't have anything going against it".

This PR includes the last solution.

See https://github.com/openjdk/valhalla/pull/1717 for some previous discussion, when it was not proven to happen in mainline yet.
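The instability described above, and why filtering twice settles it, can be sketched with a deliberately tiny model of the speculative part of a type. This is not HotSpot's `Type::filter_speculative`: the class name, the method names and the encoding (`null` for "no speculative type", a string for an exact speculated class) are assumptions made only for illustration. The one behaviour it borrows from the analysis is that the intersection of two distinct exact classes is empty and therefore dropped.

```java
final class SpeculativeFilterSketch {
    /**
     * Toy filter on speculative types only.
     * null models "no speculative type"; a non-null string models an exact speculated class.
     */
    static String filterSpeculative(String a, String b) {
        if (a == null) return b;
        if (b == null) return a;
        if (a.equals(b)) return a;
        // Two distinct exact classes: nothing is both at once, so the result is dropped,
        // mimicking the "above the centerline, not useful" case from the analysis.
        return null;
    }

    /** The "filter twice" idea: re-filter t by the first result so a dropped clash is repaired. */
    static String filterTwice(String t, String type) {
        String ft = filterSpeculative(t, type);
        return filterSpeculative(t, ft);
    }

    public static void main(String[] args) {
        String t = "MyValue2";     // speculative type coming from the inputs
        String type = "MyValue3";  // stale speculative type recorded on the node
        String ft = filterSpeculative(t, type);      // first run: the clash is dropped
        String ftVerify = filterSpeculative(t, ft);  // re-run against the new _type
        System.out.println(ft + " vs " + ftVerify);  // prints "null vs MyValue2"
        System.out.println(filterTwice(t, type));    // prints "MyValue2"
    }
}
```

In this toy model a single filter is not idempotent (`null` on the first run, `MyValue2` on the re-run), while the result of `filterTwice` no longer changes when filtered again, which is the fixpoint property the verification expects.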
Thanks, Marc ------------- Commit messages: - Filter twice Changes: https://git.openjdk.org/jdk/pull/28331/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28331&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371716 Stats: 160 lines in 2 files changed: 160 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28331.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28331/head:pull/28331 PR: https://git.openjdk.org/jdk/pull/28331 From mchevalier at openjdk.org Fri Nov 14 20:06:19 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 14 Nov 2025 20:06:19 GMT Subject: RFR: 8371716: C2 compilation fails with "Missed optimization opportunity in PhaseIterGVN" In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 19:56:14 GMT, Marc Chevalier wrote: > This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. > > # Analysis > ## Obervationally > ### IGVN > During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. 
Their types are: > > in(1): java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact * (inline_depth=4)) > in(2): java/lang/Object * (speculative=null) > > We compute the join (HS' meet): > https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/cfgnode.cpp#L1310-L1317 > > t=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *) > > But the current `_type` (of the `PhiNode` as a `TypeNode`) is > > _type=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue3 (compiler/valhalla/inlinetypes/MyInterface):exact *) > > We filter `t` by `_type` > https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/cfgnode.cpp#L1332 > and we get > > ft=java/lang/Object * > > which is what we return. After the end of `Value`, the returned becomes the new `PhiNode`'s `_type`. > https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 > and > https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/node.cpp#L1127-L1133 > > > ### Verification > On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time > > _type=java/lang/Object * > > and so after filtering `t` by (new) `_type` and we get > > ft=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *) > > which is retuned. Verification gets angry because the new `ft` is not the same as the previous one. > > ## But why?! > ### Details on type computation > In short, we are doing > > t = typeof(in(1)) / typeof(in(2)) > ft = t /\ _type (* IGVN *) > ft' = t /\ ft (* Verification *) > > and observing that `ft != ft'`. 
It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verfication that will just "try again and see if something change". > > To me, the surprising fact was that the intersection > > java/lang/Object *... For the reproducer, now test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28331#issuecomment-3534353796 From dlong at openjdk.org Fri Nov 14 20:12:26 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 14 Nov 2025 20:12:26 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v11] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 09:32:57 GMT, Jatin Bhateja wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution > > Hi @dean-long , @iwanowww , @dlunde , I have addressed your comments. Kindly verify. Thanks @jatin-bhateja , I like this version. But I don't understand RA enough to do a complete review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3534388767 From duke at openjdk.org Fri Nov 14 21:39:19 2025 From: duke at openjdk.org (Tobias Hotz) Date: Fri, 14 Nov 2025 21:39:19 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v10] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 12:06:02 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/igvn/IntegerDivValueTests.java line 66: >> >>> 64: // All constants available during parsing >>> 65: return getIntConstant(Integer.MIN_VALUE) / getIntConstant(-1); >>> 66: } >> >> Why not add an IR rule that the div is still present after parsing? It seems you have already had the possible issue that javac optimized the div away, right? So this would ensure the optimization really does happen in C2, and that you are checking for the right kinds of nodes. 
> > Consider doing the same in other places in this file ;) Because it is already optimized during local GVN during parsing, so after parsing, these nodes are already gone ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2529015207 From duke at openjdk.org Fri Nov 14 21:45:08 2025 From: duke at openjdk.org (Tobias Hotz) Date: Fri, 14 Nov 2025 21:45:08 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v10] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 12:07:38 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/igvn/IntegerDivValueTests.java line 286: >> >>> 284: // transform_long_divide splits up the division into multiple other nodes, such as MulHiLNode, which does not have a good Value() implemantion. >>> 285: // When JDK-8366815 is fixed, these rules should be reenabled >>> 286: // Alternatively, a better MulHiLNode::Value() implemantion should also lead to constant folding >> >> Could you have some temporary IR rule that now passes, but fails once `JDK-8366815` is fixed? Otherwise, I'm afraid we will miss these comments here, and they will never be cleaned up. > > Same elsewhere in this file. I guess it is not really possible for this method. Depending on the random constants that have been chosen, a different sequence of nodes will be emitted by `transform_long_divide`, and sometimes a `MulHiLNode` will not be emitted and the expression may constant fold after all. So I fear we just have to remember this, but as https://github.com/openjdk/jdk/pull/27886 looks pretty close to being done as well, I guess this will not be too hard. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2529036809

From duke at openjdk.org Fri Nov 14 21:50:27 2025
From: duke at openjdk.org (Tobias Hotz)
Date: Fri, 14 Nov 2025 21:50:27 GMT
Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v11]
In-Reply-To:
References:
Message-ID:

> This PR improves the value of integer division nodes.
> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case.
> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once.
> This also cleans up and unifies the code paths for DivINode and DivLNode.
> I've added some tests to validate the optimization. Without the changes, some of these tests fail.
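The "four corners" idea from the description can be sketched in plain Java. This is not the C2 implementation from the PR: the class and method names are hypothetical, the sketch covers `int` ranges only and does its arithmetic in `long` to sidestep overflow. It assumes Java's truncating division, where `Integer.MIN_VALUE / -1` wraps to `Integer.MIN_VALUE`, and it splits a divisor range that crosses zero into a negative and a positive part, each checked once.

```java
final class DivRangeSketch {
    /** Min and max of the four corner quotients; requires that [yLo, yHi] does not contain 0. */
    static long[] corners(long xLo, long xHi, long yLo, long yHi) {
        long[] q = { xLo / yLo, xLo / yHi, xHi / yLo, xHi / yHi };
        long lo = q[0];
        long hi = q[0];
        for (long v : q) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        return new long[] { lo, hi };
    }

    /** Smallest range covering both arguments; null stands for "no value yet". */
    static long[] union(long[] a, long[] b) {
        if (a == null) return b;
        if (b == null) return a;
        return new long[] { Math.min(a[0], b[0]), Math.max(a[1], b[1]) };
    }

    /** Range of x / y for a strictly negative divisor range, with MIN_VALUE / -1 handled apart. */
    static long[] negativeDivisor(long xLo, long xHi, long yLo, long yHi) {
        if (xLo != Integer.MIN_VALUE || yHi != -1) {
            return corners(xLo, xHi, yLo, yHi);
        }
        // The single pair (MIN_VALUE, -1) wraps to MIN_VALUE in int arithmetic.
        long[] r = { Integer.MIN_VALUE, Integer.MIN_VALUE };
        // The remaining pairs are covered by two sub-rectangles that exclude that pair.
        if (xHi > xLo) r = union(r, corners(xLo + 1, xHi, yLo, -1));
        if (yLo < -1) r = union(r, corners(xLo, xLo, yLo, -2));
        return r;
    }

    /** Returns {lo, hi} for x / y over the given ranges, or null if the divisor can only be 0. */
    static long[] divRange(int xLo, int xHi, int yLo, int yHi) {
        long[] r = null;
        if (yLo < 0) r = union(r, negativeDivisor(xLo, xHi, yLo, Math.min(yHi, -1)));
        if (yHi > 0) r = union(r, corners(xLo, xHi, Math.max(yLo, 1), yHi));
        return r;
    }
}
```

For example, `divRange(7, 20, 2, 3)` yields `{2, 10}` (from `7 / 3` and `20 / 2`). The corners suffice because, for a divisor range of fixed sign, the quotient is monotonic in the dividend, and for a fixed dividend it is monotonic in the divisor.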
Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: Simplify test, add temporary @IR rule for testLongRange and improve comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26143/files - new: https://git.openjdk.org/jdk/pull/26143/files/89e60231..313215bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26143&range=09-10 Stats: 28 lines in 2 files changed: 8 ins; 4 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/26143.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26143/head:pull/26143 PR: https://git.openjdk.org/jdk/pull/26143 From duke at openjdk.org Fri Nov 14 21:50:27 2025 From: duke at openjdk.org (Tobias Hotz) Date: Fri, 14 Nov 2025 21:50:27 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v10] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 21:42:28 GMT, Tobias Hotz wrote: >> Same elsewhere in this file. > > I guess it is not really possible for this method. Depending on the random constants that have been chosen, a different sequence of nodes will be emitted by `transform_long_divide`, and sometimes a `MulHiLNode` will not be emitted and the expression may constant fold after all. > So I fear we just have to remember this, but as https://github.com/openjdk/jdk/pull/27886 looks pretty close to being done as well, I guess this will not be too hard. Thinking about it some more, we can add IR validation to `testLongRange`, though. This one is not dependent on random constants, so it should always work. 
That way, we should notice the other rule as well ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26143#discussion_r2529052842 From duke at openjdk.org Fri Nov 14 21:57:08 2025 From: duke at openjdk.org (Tobias Hotz) Date: Fri, 14 Nov 2025 21:57:08 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v11] In-Reply-To: References: Message-ID: <-0DxAa1ArvKj3FFMZXv9-twDPdXrd9AebQrNLOzzeUc=.af55010e-03eb-4410-9f24-7634466f4438@github.com> On Wed, 12 Nov 2025 18:04:42 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. 
>> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution src/hotspot/cpu/x86/x86.ad line 2855: > 2853: Flag_clears_sign_flag = Node::_last_flag << 11, > 2854: Flag_ndd_demotable_flag = Node::_last_flag << 12, > 2855: Flag_ndd_commutative_flag = Node::_last_flag << 13, Why are these called Flag_ndd_demotable_flag and Flag_ndd_commutative_flag instead of just Flag_ndd_demotable and Flag_ndd_commutative? Would make more sense to me. The flags above end in flag cause they mean that the instruction does set/clear the flag in the EFLAGS register of the CPU ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2529071761 From jbhateja at openjdk.org Sat Nov 15 02:24:47 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 15 Nov 2025 02:24:47 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v12] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. 
>
> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction.
>
> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations.
>
> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size.
>
> The patch shows around 5-20% improvement in code size by facilitating NDD demotion.
>
> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint.
>
> **Micro:-**
> image
>
>
> **Baseline :-**
> image
>
> **With opt:-**
> image
>
> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html).
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26283/files - new: https://git.openjdk.org/jdk/pull/26283/files/ef51f875..cccef216 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=10-11 Stats: 65 lines in 1 file changed: 0 ins; 0 del; 65 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From vlivanov at openjdk.org Sat Nov 15 02:28:57 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 15 Nov 2025 02:28:57 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v24] In-Reply-To: References: <_bcSw73-0z5kaEyGsSz9Kox24TnFg7KxrakwEoEaf0k=.ab80c231-1afd-4cc9-8982-0db0d0311a43@github.com> Message-ID: <8c2Go5pzAatq4fAB2EzTxYhH3QSohhtIXUVjFw8VaY0=.87310088-3b09-4d57-984c-4ef146989ae1@github.com> On Fri, 14 Nov 2025 15:28:23 GMT, Emanuel Peter wrote: >>> You have a few tests already, but I'd love to see some IR tests. You could even check for the presence of ReachabilityFenceNode during some phase and then see if it goes away. Nice would be if we could even track if a SafePoint has a RF edge attached, but not sure how easy that is. >>> It would allow us not only to check for correctness, and hoping that we would catch incorrect cases with a crash/wrong result. But it would allow us to verify the graph, including the optimizations. >> >> The main complications with IR tests I see are: >> (1) very few cases where RF node is missing are known and all of them have already have a dedicated regression test; >> (2) the invariants RF imposes on the graph are non-local and it's hard to check them by inspecting IR. 
>> >> There's the transformation for loop-invariant referent I could try to add an IR unit test for, but I don't know how suitable IR test framework is for such scenario. >> >> Overall, I'd prefer to leave it as is for now and explore opportunities for IR tests as part of general effort to improve RF test coverage. > > @iwanowww I suppose adding more IR tests now would ensure that the implementation you are now adding does exactly what we expect, and not something different that just happens to behave correct. > > Examples would also help for the review. We have quite a machinery here, and understanding how the steps come together would be great. So it would be nice if you could annotate the tests we already do have: what do you expect happens to the RF's, the reference, the SafePoints, and the other relevant nodes? Maybe we can then turn some of those annotations into IR nodes right away, and some are probably too difficult. > > But I do see that the current regex matching on single IR nodes is a bit tricky here. It only allows you to detect the presence of a `ReachabilityFence` node, but not where it is and what it's connected to. This could be a motivation for extending the IR framework to allow some kind of graph matching. Though I do see that this could get quite complex. @eme64 I added IR tests for the test cases in `compiler/c2/TestReachabilityFence.java`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3535424171 From vlivanov at openjdk.org Sat Nov 15 02:28:55 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 15 Nov 2025 02:28:55 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v27] In-Reply-To: References: Message-ID: > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. 
So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." 
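For readers unfamiliar with the API being intrinsified, a minimal usage sketch of `Reference.reachabilityFence()` follows. The `NativeBuffer` class and its `handle` field are hypothetical stand-ins (a real case would hold an off-heap address released by a cleaner); only `java.lang.ref.Reference.reachabilityFence(Object)` itself is a real JDK method. The try/finally shape follows the pattern recommended in the method's Javadoc.

```java
import java.lang.ref.Reference;

final class ReachabilityFenceSketch {
    /** Hypothetical holder whose cleanup would invalidate 'handle' once the holder is unreachable. */
    static final class NativeBuffer {
        private final long[] handle = { 42L }; // stands in for an off-heap address

        long readFirst() {
            long[] h = handle; // after this load, the code below no longer uses 'this'
            try {
                return h[0];
            } finally {
                // Keeps 'this' strongly reachable up to this point, so that cleanup tied to
                // its reachability cannot run while 'h' is still being used.
                Reference.reachabilityFence(this);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(new NativeBuffer().readFirst());
    }
}
```

The fence has no observable effect on the returned value; its only job is to extend the referent's liveness, which is why the PR focuses on keeping its runtime cost as low as possible.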
> > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: IR test cases ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25315/files - new: https://git.openjdk.org/jdk/pull/25315/files/6bea1285..b411c454 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=25-26 Stats: 177 lines in 3 files changed: 90 ins; 12 del; 75 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From qamai at openjdk.org Sat Nov 15 13:03:13 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 15 Nov 2025 13:03:13 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v8] In-Reply-To: References: Message-ID: <41o9gdPdnLsm2eMmIQDwZ2U41gTV0RcfPp0hitLwG78=.e799687a-8621-41a9-a127-17ddf16f7a0d@github.com> On Thu, 13 Nov 2025 06:51:38 GMT, Quan Anh Mai wrote: >> Hi, >> >> The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. `ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`. 
This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > change the early return condition Thanks a lot for your reviews and testing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25284#issuecomment-3536464876 From qamai at openjdk.org Sat Nov 15 13:03:14 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 15 Nov 2025 13:03:14 GMT Subject: Integrated: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII In-Reply-To: References: Message-ID: On Sat, 17 May 2025 14:54:43 GMT, Quan Anh Mai wrote: > Hi, > > The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. `ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`. This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. > > Please take a look and leave your reviews, thanks a lot. This pull request has now been integrated. 
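The two forms of the runtime length test mentioned in the description can be compared with a small sketch. It assumes the comparison is `<=`, a non-negative length, and a power-of-two element size; the actual comparison direction in HotSpot is not stated in this thread, and the method names below are hypothetical.

```java
// Hypothetical helper names; this only illustrates the arithmetic, not HotSpot code.
final class PartialInlineCheckSketch {
    // Form 1: scale the length to bytes, then compare with the byte limit.
    static boolean fitsShifted(int length, int elemBytes, int limitBytes) {
        int shift = Integer.numberOfTrailingZeros(elemBytes); // log2 for powers of two
        return ((long) length << shift) <= limitBytes;
    }

    // Form 2: scale the limit to elements, then compare with the length.
    static boolean fitsDivided(int length, int elemBytes, int limitBytes) {
        return length <= limitBytes / elemBytes;
    }

    // Checks that both forms agree for small lengths and typical sizes.
    static boolean formsAgree() {
        int[] limits = {16, 32, 64};
        int[] sizes = {1, 2, 4, 8};
        for (int limit : limits) {
            for (int size : sizes) {
                for (int length = 0; length <= 200; length++) {
                    if (fitsShifted(length, size, limit) != fitsDivided(length, size, limit)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(formsAgree());
    }
}
```

Under these assumptions the two forms accept exactly the same lengths, and the second one expresses the bound in the same unit (elements) as the range of the cast.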
Changeset: f510b4a3 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/f510b4a3bafa3f0d2c9ebf0b33d48f57f3bdef95 Stats: 57 lines in 5 files changed: 11 ins; 17 del; 29 mod 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII Reviewed-by: vlivanov, roland ------------- PR: https://git.openjdk.org/jdk/pull/25284 From duke at openjdk.org Sun Nov 16 15:51:10 2025 From: duke at openjdk.org (Zihao Lin) Date: Sun, 16 Nov 2025 15:51:10 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v3] In-Reply-To: References: Message-ID: <3YIjUMnoP4-shebueWobcA3GxioQeLY7CxWa46crmuQ=.3fd96b95-0f01-4908-9dfc-2f72e96a7759@github.com> > If nodes both are constant, support constant folding. Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Add tests - Merge branch 'master' into JDK-8370196 - test - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Andrew Haley - C2: Improve (U)MulHiLNode::MulHiValue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28097/files - new: https://git.openjdk.org/jdk/pull/28097/files/924c1555..3157f735 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=01-02 Stats: 33517 lines in 772 files changed: 19256 ins; 10488 del; 3773 mod Patch: https://git.openjdk.org/jdk/pull/28097.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28097/head:pull/28097 PR: https://git.openjdk.org/jdk/pull/28097 From duke at openjdk.org Sun Nov 16 15:54:24 2025 From: duke at openjdk.org (Zihao Lin) Date: Sun, 16 Nov 2025 15:54:24 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue In-Reply-To: References: Message-ID: On Sun, 2 Nov 2025 09:53:40 GMT, Hannes Greule wrote: >> If nodes both are constant, support constant 
folding. > > Thanks for working on this. A few things: > > - You need tests to cover this change. The `Math.multiplyHigh(...)` and `Math.unsignedMultiplyHigh(...)` methods can be used to test this from the Java world. See e.g., #26143 or #25254 for inspiration. > - The current method is for both unsigned and signed multiplication. You either have to deal with that directly there or get rid of that method and implement it directly in the respective `Value(...)` methods (the latter might be cleaner imo). > - For unsigned multiplication, you can use the unsigned bounds (_uhi, _ulo) > - I think extending from simple constant folding to intervals isn't that much more work. From my understanding, there shouldn't be any overflows that need to be handled. This would also automatically deal with cases like `multiplyHigh(x, 0)` etc. > - The bottom checks are unneeded and can be removed (in fact, they would otherwise prevent proper calculation of the previous example) > - Make sure to follow the code style: `T* v`; `if (a) {` spacing. Hi @SirYwell, thanks for your comments. I changed both `Value` methods and added the relevant unit tests. Could you check whether my understanding is correct? Thank you.
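As a rough illustration of what a constant-folded result has to equal, the sketch below computes the high 64 bits of a 128-bit product with `BigInteger` and compares them with `Math.multiplyHigh` and `Math.unsignedMultiplyHigh` (the latter requires JDK 18 or newer). The class and method names are hypothetical and unrelated to the tests added in the PR.

```java
import java.math.BigInteger;

// Hypothetical reference implementation; not the test code from the pull request.
final class MulHiReference {
    // High 64 bits of the signed 128-bit product a * b.
    static long signedHigh(long a, long b) {
        return BigInteger.valueOf(a).multiply(BigInteger.valueOf(b)).shiftRight(64).longValue();
    }

    // High 64 bits of the unsigned 128-bit product a * b.
    static long unsignedHigh(long a, long b) {
        BigInteger ua = new BigInteger(Long.toUnsignedString(a));
        BigInteger ub = new BigInteger(Long.toUnsignedString(b));
        return ua.multiply(ub).shiftRight(64).longValue();
    }

    // True if the reference values match the java.lang.Math methods.
    static boolean matchesMath(long a, long b) {
        return signedHigh(a, b) == Math.multiplyHigh(a, b)
            && unsignedHigh(a, b) == Math.unsignedMultiplyHigh(a, b);
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, 42L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (long a : samples) {
            for (long b : samples) {
                if (!matchesMath(a, b)) {
                    throw new AssertionError("mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("all samples match");
    }
}
```

The `multiplyHigh(x, 0)` case mentioned in the review falls out of the same arithmetic: the product is zero for every `x`, so the high half is zero as well.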
------------- PR Comment: https://git.openjdk.org/jdk/pull/28097#issuecomment-3538892058 From mpowers at openjdk.org Sun Nov 16 17:24:15 2025 From: mpowers at openjdk.org (Mark Powers) Date: Sun, 16 Nov 2025 17:24:15 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements In-Reply-To: References: Message-ID: <_TeZd3joeNkWYg7ZOgYRwzRJJjwMcUVOfe-pdXzJTv4=.d413a241-c8de-4267-8b98-0b41c7629371@github.com> On Tue, 4 Nov 2025 16:38:49 GMT, Volodymyr Paprotski wrote: > - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline > - `SignatureBench.MLDSA` is 1.2x-2.2x faster > - Note: there are no AVX2-SHA3 intrinsics yet (being reviewed in https://github.com/vpaprotsk/jdk/pull/7) > - AVX512 intrinsic improvements are 1.24x-1.5x faster than the current version > - `SignatureBench.MLDSA` is up to 5% faster, never slower > > Note on intrinsic: > - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill.
> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 > > Tests and benchmarks: > - Added a fuzz test to ensure Java and intrinsic produces exactly same result > - Added benchmark to measure the performance of intrinsic itself > > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" You might want to have @kuksenko or @ericcaspole look at MLDSABench.java. test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java line 29: > 27: import java.lang.invoke.MethodHandle; > 28: import java.lang.invoke.MethodHandles; > 29: import java.lang.reflect.Field; unused import statement test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java line 31: > 29: import java.lang.reflect.Field; > 30: import java.lang.reflect.Method; > 31: import java.lang.reflect.Constructor; unused import test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java line 123: > 121: try { > 122: for (int i = 0; i < repeat; i++) { > 123: // seed = rnd.nextLong(); 2 lines commented out test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java line 219: > 217: int[] coeffs3 = new int[ML_DSA_N]; > 218: for (int j = 0; j 219: coeffs3[j] = `coeffs3` is written to but never read test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java line 517: > 515: }; > 516: } > 517: // java --add-opens java.base/sun.security.provider=ALL-UNNAMED -XX:+UseDilithiumIntrinsics 
test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java This line is useful. Not sure I would hide it at the bottom of the file. test/micro/org/openjdk/bench/javax/crypto/full/MLDSABench.java line 2: > 1: /* > 2: * Copyright (c) 2015, 2018, Oracle and/or its affiliates. All rights reserved. Copyright date. ------------- Marked as reviewed by mpowers (Committer). PR Review: https://git.openjdk.org/jdk/pull/28136#pullrequestreview-3470287661 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2532070492 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2532071025 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2532075447 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2532074544 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2532078122 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2532078790 From dholmes at openjdk.org Sun Nov 16 21:25:12 2025 From: dholmes at openjdk.org (David Holmes) Date: Sun, 16 Nov 2025 21:25:12 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v8] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 06:51:38 GMT, Quan Anh Mai wrote: >> Hi, >> >> The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. `ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`.
This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > change the early return condition We are seeing a crash in tier2 after this was integrated - [JDK-8371964](https://bugs.openjdk.org/browse/JDK-8371964) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25284#issuecomment-3539377810 From duke at openjdk.org Sun Nov 16 22:58:12 2025 From: duke at openjdk.org (Max Verevkin) Date: Sun, 16 Nov 2025 22:58:12 GMT Subject: RFR: 8366076: arm32: Fix register allocation for vector instructions In-Reply-To: <11z84H0pSO4eduTEEVcUelci_1MxZMimuwouswlt8W0=.a0d59c62-092c-4620-b4c2-c2ff62423c4e@github.com> References: <17J8mScwi2eBCPmmmTJd0ittihe0BfqZYuPgC638L8Q=.6e87a120-905f-4a30-a6f0-7e80fd613144@github.com> <11z84H0pSO4eduTEEVcUelci_1MxZMimuwouswlt8W0=.a0d59c62-092c-4620-b4c2-c2ff62423c4e@github.com> Message-ID: On Fri, 14 Nov 2025 02:34:48 GMT, Dean Long wrote: >> Arm32 has 32 double-precision floating point registers, the first 16 of which coincide with the 32 single-precision floating point registers. Some vector-operation nodes were implemented in terms of scalar instructions, which only really works for the first 16 doubles. This commit addresses that. > > src/hotspot/cpu/arm/arm_32.ad line 330: > >> 328: R_S16,R_S17,R_S18,R_S19, R_S20,R_S21,R_S22,R_S23, >> 329: R_S24,R_S25,R_S26,R_S27, R_S28,R_S29,R_S30,R_S31); >> 330: > > Isn't this the same as dflt_low_reg? I am not 100% sure if they are completely equivalent and `dflt_low_reg` could be used instead of defining a new class. I figured I should introduce a new class similar to how `sflt_reg` and `dflt_low_reg` are similar yet distinct. 
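The register overlap described in the quoted PR summary can be sketched as a simple index mapping. This assumes the usual Arm32 VFP/NEON layout in which `D<n>` overlays `S<2n>` and `S<2n+1>` for `n` below 16, while `D16`..`D31` have no single-precision view; the class and method names are hypothetical and do not correspond to anything in `arm_32.ad`.

```java
import java.util.OptionalInt;

// Hypothetical model of the Arm32 S/D register overlap; not HotSpot code.
final class Arm32VfpAliasSketch {
    static final int NUM_DOUBLE_REGS = 32;

    // Index of the lower S register overlaid by D<d>, if there is one.
    static OptionalInt lowSingleOf(int d) {
        if (d < 0 || d >= NUM_DOUBLE_REGS) {
            throw new IllegalArgumentException("no such D register: " + d);
        }
        // Only D0..D15 share storage with S0..S31.
        return d < 16 ? OptionalInt.of(2 * d) : OptionalInt.empty();
    }

    // True if D<d> can also be addressed through S-register views.
    static boolean hasSingleView(int d) {
        return lowSingleOf(d).isPresent();
    }

    public static void main(String[] args) {
        System.out.println("D15 -> S" + lowSingleOf(15).getAsInt());
        System.out.println("D16 has S view: " + hasSingleView(16));
    }
}
```

An instruction that names S registers can therefore only touch the storage of `D0`..`D15`, which is why operands of nodes implemented with such instructions need a register class restricted to that range.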
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27071#discussion_r2532299653 From fyang at openjdk.org Mon Nov 17 03:28:06 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 17 Nov 2025 03:28:06 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v2] In-Reply-To: <2Ts5dNdaDuen71ZoTYdKP8UNG44epCiEsIb8DeJpvps=.24618d81-4b98-489a-962b-c04e0d561270@github.com> References: <2Ts5dNdaDuen71ZoTYdKP8UNG44epCiEsIb8DeJpvps=.24618d81-4b98-489a-962b-c04e0d561270@github.com> Message-ID: On Fri, 7 Nov 2025 16:29:25 GMT, Richard Reingruber wrote: >> With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. >> >> It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. >> >> The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. >> >> The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. >> Only with incoming stack arguments unaligned offsets can be generated. But also then alignment padding is only added if vector registers larger than 64 bit are used. >> >> So the costs are effectively zero. Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) >> >> There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. 
I haven't implemented it as I thought the risk is negligible. >> >> Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. >> >> ##### Testing with fastdebug builds on AARCH64 and PPC64: >> >> hotspot_vector_1 >> hotspot_vector_2 >> jdk_vector >> jdk_vector_sanity >> >> ##### The change passed our CI testing: >> Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. >> Testing was done on the main platforms and also on Linux/PPC64le and AIX. >> >> C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. It is covered by the following tests: >> >> compiler/vectorapi/VectorRearrangeTest.java >> jdk/incubator/vector/Byte128VectorLoadStoreTests.java >> jdk/incubator/vector/Double256VectorLoadStoreTests.java >> jdk/incubator/vector/Float128VectorTests.java >> jdk/incubator/vector/Long256VectorLoadStoreTests.java >> jdk/incubator/vector/Short128VectorLoadStoreTests.java >> jdk/incubator/vector/Vector64ConversionTests.java > > Richard Reingruber has updated the pull request incrementally with two additional commits since the last revision: > > - Enhance comment > - Fix OptoAssembly for Power 8 test/hotspot/jtreg/compiler/vectorapi/TestVectorSpilling.java line 79: > 77: @Test > 78: @IR(counts = {IRNode.MEM_TO_REG_SPILL_COPY_TYPE, "vectorx", "> 0"}, > 79: phase = {CompilePhase.FINAL_CODE}) Hi, I find that this IR test is failing on riscv, where we are spilling a `vectora`. Maybe we should exclude this case?
diff --git a/test/hotspot/jtreg/compiler/vectorapi/TestVectorSpilling.java b/test/hotspot/jtreg/compiler/vectorapi/TestVectorSpilling.java index 5e8b9341d8e..9d9a85e174c 100644 --- a/test/hotspot/jtreg/compiler/vectorapi/TestVectorSpilling.java +++ b/test/hotspot/jtreg/compiler/vectorapi/TestVectorSpilling.java @@ -76,7 +76,8 @@ static void test16ByteSpilling_runner() { @Test @IR(counts = {IRNode.MEM_TO_REG_SPILL_COPY_TYPE, "vectorx", "> 0"}, - phase = {CompilePhase.FINAL_CODE}) + phase = {CompilePhase.FINAL_CODE}, + applyIfCPUFeature= {"rvv", "false"}) static long test16ByteSpilling(long l1, long l2, long l3, long l4, long l5, long l6, long l7, long l8, long l9 /* odd stack arg */) { // To be scalar replaced and spilled to stack ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2532551373 From thartmann at openjdk.org Mon Nov 17 06:29:06 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 17 Nov 2025 06:29:06 GMT Subject: RFR: 8371121: compiler/whitebox/DeoptimizeRelocatedNMethod.java fails with C1 [v2] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 21:17:08 GMT, Chad Rakoczy wrote: >> [JDK-8371121](https://bugs.openjdk.org/browse/JDK-8371121) >> >> This update aims to improve the test's stability. A previous failure occurred because the method wasn't compiled at the time of the check. I believe this could have occurred due to a deoptimization but I have not been able to reproduce it. Previously, the test ensured compilation by repeatedly invoking the function. Instead, we now use Whitebox to add the method directly to the compile queue and wait for it to finish compiling. This approach should eliminate issues caused by deoptimization from function calls. > > Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: > > - Fix comment > - Block on comp instead Looks good. I'll run some testing to confirm that the issues are gone.
------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28246#pullrequestreview-3471183393 From fyang at openjdk.org Mon Nov 17 06:41:08 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 17 Nov 2025 06:41:08 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v2] In-Reply-To: <8Y3gUUVCNU1ZpfRkZeJqgIUomP6NCDIQqqgN-lRgk5A=.60177ffe-52ba-46de-a099-57d73f096a49@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <8Y3gUUVCNU1ZpfRkZeJqgIUomP6NCDIQqqgN-lRgk5A=.60177ffe-52ba-46de-a099-57d73f096a49@github.com> Message-ID: On Fri, 14 Nov 2025 18:03:53 GMT, Hamlin Li wrote: >> Hi, >> >> This pr add CMoveF/D on riscv, which enable vectorization of statement like: `op_1 bop op_2 ? res_f_d_1 : res_f_d_2 in a loop`. >> >> This pr is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231. >> >> Previously it's https://github.com/openjdk/jdk/pull/25341, but at that time, C2 SLP has some issue with unsigned comparison, which is now fixed, so it's good to continue the work. >> >> # Test >> ## Jtreg >> >> in progress... >> >> ## Performance >> >> Column names meanings: >> * p: with patch >> * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> * m: without patch >> * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> >> #### Average improvement >> >> NOTE: With only this PR, it brings performance benefit in case of `CMoveF+CmpF`, `CMoveD+CmpD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below is based on fully implementing the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. >> >> For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv.
>> >> Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) >> -- | -- | -- | -- >> 1.022782609 | 2.198717391 | 2.162673913 | 2.199 >> >> > > Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: > > - add CMove+CmpP/N tests > - fix cmovF/D_cmpP src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2133: > 2131: break; > 2132: case BoolTest::ge: > 2133: assert(false, "Should go to BoolTest::le case"); I am not sure if it's safe to have these assertions for `ge` and `gt`. It seems to me that we should handle all possible condition codes here. Check this bug: https://bugs.openjdk.org/browse/JDK-8358892. We have added handling for `ge` and `gt` in `C2_MacroAssembler::enc_cmove_cmp_fp` to fix it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2532878358 From dbriemann at openjdk.org Mon Nov 17 06:53:13 2025 From: dbriemann at openjdk.org (David Briemann) Date: Mon, 17 Nov 2025 06:53:13 GMT Subject: RFR: 8371642: TestNumberOfContinuousZeros.java fails on PPC64 In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 15:32:03 GMT, David Briemann wrote: > Skips IR match rules for COUNT_LEADING_ZEROS_VL on PPC. VectorCastL2X is not implemented on Power for performance reasons. Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28239#issuecomment-3540223470 From dbriemann at openjdk.org Mon Nov 17 06:53:13 2025 From: dbriemann at openjdk.org (David Briemann) Date: Mon, 17 Nov 2025 06:53:13 GMT Subject: Integrated: 8371642: TestNumberOfContinuousZeros.java fails on PPC64 In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 15:32:03 GMT, David Briemann wrote: > Skips IR match rules for COUNT_LEADING_ZEROS_VL on PPC. VectorCastL2X is not implemented on Power for performance reasons. This pull request has now been integrated. 
Changeset: 77381318 Author: David Briemann URL: https://git.openjdk.org/jdk/commit/7738131835d08f47dd7c535b12bb7ea7b0ff0b90 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod 8371642: TestNumberOfContinuousZeros.java fails on PPC64 Reviewed-by: mdoerr, epeter ------------- PR: https://git.openjdk.org/jdk/pull/28239 From jbhateja at openjdk.org Mon Nov 17 07:14:19 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 17 Nov 2025 07:14:19 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v8] In-Reply-To: References: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> Message-ID: On Wed, 29 Oct 2025 21:29:08 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Removing redundant interferecne check from biasing > > Hard-coded lists in `Matcher::should_attempt_register_biasing()` and `is_commutative_oper` look fragile and hard to verify. (Especially `is_commutative_oper` which is used to check the root of matched ideal tree.) > > With proper ADLC support, that information can be placed on individual AD instructions which would make it clearer what is affected. Hi @iwanowww , @dlunde , @eme64 , @TobiHartmann , @sviswa7 , your comments have been addressed. Let me know if this is good to land in. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3540284439 From jbhateja at openjdk.org Mon Nov 17 07:16:08 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 17 Nov 2025 07:16:08 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 16:38:49 GMT, Volodymyr Paprotski wrote: > - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline > - `SignatureBench.MLDSA` is 1.2x-2.2x faster > - Note: there are no AVX2-SHA3 intrinsics yet (being reviewed in https://github.com/vpaprotsk/jdk/pull/7) > - AVX512 intrinsic improvements are 1.24x-1.5x faster than the current version > - `SignatureBench.MLDSA` is up to 5% faster, never slower > > Note on intrinsic: > - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. > - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 > > Tests and benchmarks: > - Added a fuzz test to ensure Java and intrinsic produces exactly same result > - Added benchmark to measure the performance of intrinsic itself > > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" Minor initial comments src/hotspot/cpu/x86/assembler_x86.cpp line 3867: > 3865: (vector_len == AVX_256bit ?
VM_Version::supports_avx2() : > 3866: (vector_len == AVX_512bit ? VM_Version::supports_evex() : false)), ""); > 3867: InstructionAttr attributes(vector_len, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); When you check for AVX512-VL you allow accessing 128/256 bit registers from the higher register bank [X/Y]MM(16-31), but your assertions do not check this anywhere. src/hotspot/cpu/x86/assembler_x86.cpp line 3876: > 3874: (vector_len == AVX_256bit ? VM_Version::supports_avx2() : > 3875: (vector_len == AVX_512bit ? VM_Version::supports_evex() : false)), ""); > 3876: InstructionAttr attributes(vector_len, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); When you check for AVX512-VL you allow accessing 128/256 bit registers from the higher register bank [X/Y]MM(16-31), but your assertions do not check this anywhere. src/hotspot/cpu/x86/assembler_x86.cpp line 3882: > 3880: > 3881: void Assembler::evmovsldup(XMMRegister dst, KRegister mask, XMMRegister src, bool merge, int vector_len) { > 3882: assert(VM_Version::supports_evex(), ""); Suggestion: assert(vector_len == AVX_512bit || VM_Version::supports_avx512vl(), ""); src/hotspot/cpu/x86/assembler_x86.cpp line 3894: > 3892: > 3893: void Assembler::evmovshdup(XMMRegister dst, KRegister mask, XMMRegister src, bool merge, int vector_len) { > 3894: assert(VM_Version::supports_evex(), ""); Same as above src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 397: > 395: // > 396: static address generate_dilithiumAlmostNtt_avx(StubGenerator *stubgen, > 397: int vector_len, MacroAssembler *_masm) { Indentation correctness test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java line 2: > 1: /* > 2: * Copyright (c) 2024, 2025, Oracle and/or its affiliates. All rights reserved. Suggestion: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved.
test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java line 114: > 112: rnd.setSeed(seed); > 113: //Note: it might be useful to increase this number during development of new intrinsics > 114: final int repeat = 10000000; Instead of a high repetition count, can you try tuning the tiered compilation threshold? test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java line 145: > 143: coeffs1[j] = rnd.nextInt(); > 144: coeffs2[j] = rnd.nextInt(); > 145: } You can use generators for random initialization of the array ------------- PR Review: https://git.openjdk.org/jdk/pull/28136#pullrequestreview-3471195396 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2532894350 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2532894989 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2532900199 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2532901821 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2532910907 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2532868326 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2532875974 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2532872372 From epeter at openjdk.org Mon Nov 17 07:28:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Nov 2025 07:28:07 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v3] In-Reply-To: <3YIjUMnoP4-shebueWobcA3GxioQeLY7CxWa46crmuQ=.3fd96b95-0f01-4908-9dfc-2f72e96a7759@github.com> References: <3YIjUMnoP4-shebueWobcA3GxioQeLY7CxWa46crmuQ=.3fd96b95-0f01-4908-9dfc-2f72e96a7759@github.com> Message-ID: <1xL5pz6BVbqGUdleNQcrIaCYh0j8TcYNlDjuMzdnPDU=.94ad1a91-d921-4312-9db9-0dcc129a54e2@github.com> On Sun, 16 Nov 2025 15:51:10 GMT, Zihao Lin wrote: >> If nodes both are constant, support constant folding. > > Zihao Lin has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Add tests > - Merge branch 'master' into JDK-8370196 > - test > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Andrew Haley > - C2: Improve (U)MulHiLNode::MulHiValue @linzihao1999 Thanks for working on this! To further improve coverage, could you please add `Math.multiplyHigh` and `Math.unsignedMultiplyHigh` to our template library? `./test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java` (see the TODO at the bottom of the file) We will soon add more operations there anyway. But in the meantime, it would be good to test the methods where we are adding new optimizations. @SirYwell >I think extending from simple constant folding to intervals isn't that much more work. That's probably true. But we can also do those extensions in a separate RFE. src/hotspot/share/opto/mulnode.cpp line 622: > 620: } > 621: > 622: const Type *UMulHiLNode::Value(PhaseGVN *phase) const { Suggestion: const Type* UMulHiLNode::Value(PhaseGVN* phase) const { ------------- PR Review: https://git.openjdk.org/jdk/pull/28097#pullrequestreview-3471303018 PR Review Comment: https://git.openjdk.org/jdk/pull/28097#discussion_r2532959328 From epeter at openjdk.org Mon Nov 17 07:30:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Nov 2025 07:30:14 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v2] In-Reply-To: <5PzMJntiu2waMvciTLvXaUH15Fm3dXZPsDVvkuqWPI0=.68c6456a-e5d3-413e-bef8-d8da95de40bd@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <5PzMJntiu2waMvciTLvXaUH15Fm3dXZPsDVvkuqWPI0=.68c6456a-e5d3-413e-bef8-d8da95de40bd@github.com> Message-ID: <46DWeMiCRNMC58wGr4T52KXbtRjU0PxQ4L6LuVFMZEo=.867fcc86-edd1-4492-9c1a-58f83d135969@github.com> On Fri, 14 Nov 2025 18:11:28 GMT, Hamlin Li wrote: >>
test/hotspot/jtreg/compiler/c2/irTests/TestScalarConditionalMove.java line 36: >> >>> 34: * @test >>> 35: * @summary Test conditional move. >>> 36: * @requires vm.simpleArch == "riscv64" >> >> I would prefer if you could enable the test on all platforms, but just require the specific platform on the IR rules. >> What would be even more fantastic: if you were able to also enable the IR rules for `x64` and `aarch64`, but we can also file a follow-up RFE for that. > > Make sense. I filed https://bugs.openjdk.org/browse/JDK-8371920 to track the task, will do it later after this pr. I would suggest that you already make the move from `@requires` to IR rule level restrictions. But we can look at adding `x64` and `aarch64` in the separate RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2532986110 From epeter at openjdk.org Mon Nov 17 07:42:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Nov 2025 07:42:13 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v2] In-Reply-To: <8Y3gUUVCNU1ZpfRkZeJqgIUomP6NCDIQqqgN-lRgk5A=.60177ffe-52ba-46de-a099-57d73f096a49@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <8Y3gUUVCNU1ZpfRkZeJqgIUomP6NCDIQqqgN-lRgk5A=.60177ffe-52ba-46de-a099-57d73f096a49@github.com> Message-ID: On Fri, 14 Nov 2025 18:03:53 GMT, Hamlin Li wrote: >> Hi, >> >> This pr add CMoveF/D on riscv, which enable vectorization of statement like: `op_1 bop op_2 ? res_f_d_1 : res_f_d_2 in a loop`. >> >> This pr is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231. >> >> Previously it's https://github.com/openjdk/jdk/pull/25341, but at that time, C2 SLP has some issue with unsigned comparison, which is now fixed, so it's good to continue the work. >> >> # Test >> ## Jtreg >> >> in progress... 
>> >> ## Performance >> >> Column name meanings: >> * p: with patch >> * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> * m: without patch >> * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> >> #### Average improvement >> >> NOTE: With only this PR, it brings performance benefit in case of `CMoveF+CmpF`, `CMoveD+CmpD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below is based on fully implementing the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. >> >> For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. >> >> Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) >> -- | -- | -- | -- >> 1.022782609 | 2.198717391 | 2.162673913 | 2.199 >> >> > > Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: > > - add CMove+CmpP/N tests > - fix cmovF/D_cmpP test/hotspot/jtreg/compiler/c2/irTests/TestScalarConditionalMoveCmpObj.java line 131: > 129: // applyIf = {"UseCompressedOops", "false"}) > 130: // @IR(counts = {IRNode.CMOVE_L, ">0", IRNode.CMP_N, ">0"}, > 131: // applyIf = {"UseCompressedOops", "true"}) Do you plan to still do this in this PR? Probably a future RFE would be better. It could be nice if you could link to the RFE with the issue number from this comment. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2533013052 From epeter at openjdk.org Mon Nov 17 07:46:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Nov 2025 07:46:12 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v2] In-Reply-To: <5PzMJntiu2waMvciTLvXaUH15Fm3dXZPsDVvkuqWPI0=.68c6456a-e5d3-413e-bef8-d8da95de40bd@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <5PzMJntiu2waMvciTLvXaUH15Fm3dXZPsDVvkuqWPI0=.68c6456a-e5d3-413e-bef8-d8da95de40bd@github.com> Message-ID: On Fri, 14 Nov 2025 18:12:56 GMT, Hamlin Li wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestScalarConditionalMove.java line 49: >> >>> 47: "-XX:+UnlockExperimentalVMOptions", "-XX:-UseCompactObjectHeaders"); >>> 48: TestFramework.runWithFlags("-XX:+UseCMoveUnconditionally", "-XX:-UseVectorCmov", >>> 49: "-XX:+UnlockExperimentalVMOptions", "-XX:+UseCompactObjectHeaders"); >> >> Wait. Is this just a copy of the existing vector test, but run with CMove vectorization disabled? >> If so, we could just add these additional runs to the existing test, and guard the IR test with corresponding flags: >> Have an IR rule for `-XX:-UseVectorCmov` and one for `-XX:+UseVectorCmov`. >> >> That would allow us to reduce some code duplication. And it would also avoid letting the two tests go out of sync when people add more to one but not the other. >> >> What do you think? > > Good idea! > I can do it. What do you think about the name of the merged tests? `TestConditionalMove.java` or `TestScalarAndVectorConditionalMove.java` `TestConditionalMove.java` sounds good :) It would also be nice if we could move it out of the `irTests` directory, we would like to eventually move all tests away from it, and rather sort the tests by what they test and not by how we test them. Though now it's a little tricky because we check for both vector and scalar things. 
Still, I would propose that you move it under `c2/vectorization` or `c2/loopopts/superword`, since they do include vectorization tests. An alternative could also be in a new `c2/cmove` directory. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2533020809 From epeter at openjdk.org Mon Nov 17 07:58:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Nov 2025 07:58:02 GMT Subject: RFR: 8371716: C2 compilation fails with "Missed optimization opportunity in PhaseIterGVN" In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 19:57:22 GMT, Marc Chevalier wrote: >> This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. >> >> # Analysis >> ## Obervationally >> ### IGVN >> During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. Their types are: >> >> in(1): java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact * (inline_depth=4)) >> in(2): java/lang/Object * (speculative=null) >> >> We compute the join (HS' meet): >> https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/cfgnode.cpp#L1310-L1317 >> >> t=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *) >> >> But the current `_type` (of the `PhiNode` as a `TypeNode`) is >> >> _type=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue3 (compiler/valhalla/inlinetypes/MyInterface):exact *) >> >> We filter `t` by `_type` >> https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/cfgnode.cpp#L1332 >> and we get >> >> ft=java/lang/Object * >> >> which is what we return. 
After the end of `Value`, the returned type becomes the new `PhiNode`'s `_type`. >> https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 >> and >> https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/node.cpp#L1127-L1133 >> >> >> ### Verification >> On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time >> >> _type=java/lang/Object * >> >> and so, after filtering `t` by the (new) `_type`, we get >> >> ft=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *) >> >> which is returned. Verification gets angry because the new `ft` is not the same as the previous one. >> >> ## But why?! >> ### Details on type computation >> In short, we are doing >> >> t = typeof(in(1)) \/ typeof(in(2)) >> ft = t /\ _type (* IGVN *) >> ft' = t /\ ft (* Verification *) >> >> and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again a... > > For the reproducer, now test. @marc-chevalier I tried to review the issue at hand, but it is a little tricky because the regression test uses one set of class names and your description another (valhalla reproducer). Would it be possible to adjust the explanation to your mainline reproducer? Or maybe even just annotating your mainline reproducer with comments would help already. I think it would be easier to review the PR once it is easy to follow the example. Also: the issue name should be adjusted to say something a bit more specific about the issue. 
We have a lot of "missed optimization opportunity" issues, and it would be nice to be able to tell them apart easily ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28331#issuecomment-3540413579 From epeter at openjdk.org Mon Nov 17 08:02:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Nov 2025 08:02:02 GMT Subject: RFR: 8371881: C2: Fix potential SEGV in VTransformReductionVectorNode tracing [v2] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 15:59:32 GMT, Vishal Chand wrote: >> This PR fixes a potential SEGV and removes dead code: >> - **Fix**: Prevent potential SEGV in VTransformReductionVectorNode - [from @shipilev] This fixes a crash in diagnostic code when isa_ReductionVector() unexpectedly returns nullptr. While this indicates the graph is already corrupted, the additional crash in `TRACE_OPTIMIZE` makes debugging harder. The fix adds defensive null checking to prevent the diagnostic crash and improve error handling. >> >> - **Cleanup**: Remove unused first_red variable > > Vishal Chand has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/vtransform.cpp > > Co-authored-by: Aleksey Shipilëv Sorry, missed the swap from "Comment" to "Approve" ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28323#pullrequestreview-3471424230 From chagedorn at openjdk.org Mon Nov 17 08:18:02 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 17 Nov 2025 08:18:02 GMT Subject: RFR: 8371121: compiler/whitebox/DeoptimizeRelocatedNMethod.java fails with C1 [v2] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 21:17:08 GMT, Chad Rakoczy wrote: >> [JDK-8371121](https://bugs.openjdk.org/browse/JDK-8371121) >> >> This update aims to improve the test's stability. A previous failure occurred because the method wasn't compiled at the time of the check. 
I believe this could have occurred due to a deoptimization but I have not been able to reproduce. Previously, the test ensured compilation by repeatedly invoking the function. Instead, we now use Whitebox to add the method directly to the compile queue and wait for it to finish compiling. This approach should eliminate issues caused by deoptimization from function calls. > > Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: > > - Fix comment > - Block on comp instead Looks good, thanks for the update! Let's wait for the testing to be complete. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28246#pullrequestreview-3471475643 From shade at openjdk.org Mon Nov 17 08:24:04 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 17 Nov 2025 08:24:04 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes [v4] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 19:04:56 GMT, Aleksey Shipilev wrote: >> I started this as investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test. >> >> At this point I believe we found that PhaseCCP does not reach the fix point for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to `LoadN` in question. Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. 
>> >> It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. >> >> Testing shows there are no ill effects on correctness doing this. But I would appreciate someone more savvy in this code to sanity check all of this. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails >> - [x] Linux x86_64 server fastdebug, `all` tests pass >> - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into JDK-8371581-ccp-spooky-nodes > - More comments > - More restrictive CmpP check > - Tighten up comments and signatures > - Do Value() once > - Fix Still looking for reviewers, thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28288#issuecomment-3540501243 From thartmann at openjdk.org Mon Nov 17 08:48:04 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 17 Nov 2025 08:48:04 GMT Subject: RFR: 8371121: compiler/whitebox/DeoptimizeRelocatedNMethod.java fails with C1 [v2] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 21:17:08 GMT, Chad Rakoczy wrote: >> [JDK-8371121](https://bugs.openjdk.org/browse/JDK-8371121) >> >> This update aims to improve the test's stability. A previous failure occurred because the method wasn't compiled at the time of the check. I believe this could have occurred due to a deoptimization but I have not been able to reproduce. Previously, the test ensured compilation by repeatedly invoking the function. 
Instead, we now use Whitebox to add the method directly to the compile queue and wait for it to finish compiling. This approach should eliminate issues caused by deoptimization from function calls. > > Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: > > - Fix comment > - Block on comp instead All testing passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28246#issuecomment-3540591794 From thartmann at openjdk.org Mon Nov 17 08:58:25 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 17 Nov 2025 08:58:25 GMT Subject: RFR: 8368961: Remove redundant checks in ciField.cpp [v2] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 05:12:34 GMT, Zihao Lin wrote: >> Remove redundant check in 'trust_final_non_static_fields' ciField.cpp >> >> Remove: >> 1. java_lang_System check >> 2. is_box_klass check >> 3. java_lang_String check > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > remove Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28191#pullrequestreview-3471646818 From duke at openjdk.org Mon Nov 17 09:27:06 2025 From: duke at openjdk.org (Zihao Lin) Date: Mon, 17 Nov 2025 09:27:06 GMT Subject: RFR: 8368961: Remove redundant checks in ciField.cpp [v2] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 05:12:34 GMT, Zihao Lin wrote: >> Remove redundant check in 'trust_final_non_static_fields' ciField.cpp >> >> Remove: >> 1. java_lang_System check >> 2. is_box_klass check >> 3. java_lang_String check > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > remove Thank you. 
/integrate ------------- PR Comment: https://git.openjdk.org/jdk/pull/28191#issuecomment-3540755255 From dfenacci at openjdk.org Mon Nov 17 09:34:42 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 17 Nov 2025 09:34:42 GMT Subject: RFR: 8371716: C2 compilation fails with "Missed optimization opportunity in PhaseIterGVN" In-Reply-To: References: Message-ID: <_WY0NPvc_0tlO31lVAVTl9QyznCQLl9r1nrQ4P6983U=.ecc067a2-8783-469e-850b-b9b612017e57@github.com> On Fri, 14 Nov 2025 19:56:14 GMT, Marc Chevalier wrote: > This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. > > # Analysis > ## Obervationally > ### IGVN > During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. Their types are: > > in(1): java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact * (inline_depth=4)) > in(2): java/lang/Object * (speculative=null) > > We compute the join (HS' meet): > https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/cfgnode.cpp#L1310-L1317 > > t=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *) > > But the current `_type` (of the `PhiNode` as a `TypeNode`) is > > _type=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue3 (compiler/valhalla/inlinetypes/MyInterface):exact *) > > We filter `t` by `_type` > https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/cfgnode.cpp#L1332 > and we get > > ft=java/lang/Object * > > which is what we return. After the end of `Value`, the returned becomes the new `PhiNode`'s `_type`. 
> https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 > and > https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/node.cpp#L1127-L1133 > > > ### Verification > On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time > > _type=java/lang/Object * > > and so, after filtering `t` by the (new) `_type`, we get > > ft=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *) > > which is returned. Verification gets angry because the new `ft` is not the same as the previous one. > > ## But why?! > ### Details on type computation > In short, we are doing > > t = typeof(in(1)) \/ typeof(in(2)) > ft = t /\ _type (* IGVN *) > ft' = t /\ ft (* Verification *) > > and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again and see if something changes". > > To me, the surprising fact was that the intersection > > java/lang/Object *... src/hotspot/share/opto/cfgnode.cpp line 1364: > 1362: const Type* first_ft = ft; > 1363: ft = t->filter_speculative(first_ft); > 1364: #ifdef ASSERT More of a flyby rather than a review but I was wondering if it would make sense to extract this assert block since it is the same as the one above. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2533280871 From aph at openjdk.org Mon Nov 17 09:36:36 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 17 Nov 2025 09:36:36 GMT Subject: RFR: 8261837: SIGSEGV in ciVirtualCallTypeData::translate_from [v5] In-Reply-To: References: <02JzN66pkbsxQOjb6-fbHPf4Y5p0d1jIPERz80WXv7Q=.b507b97f-2625-4b9c-8b65-dac0a609251e@github.com> Message-ID: On Thu, 23 Nov 2023 00:34:20 GMT, Dean Long wrote: >> Type profiling code based on the x86 implementation uses XOR to check if the MDO value matches the klass, then later stores that XORed value into the MDO if the MDO value was 0. However, there is a race here if we reload the MDO value to check for 0, resulting in storing OBJ_KLASS XOR MDO_KLASS back to the MDO. >> >> I took a stab at riscv, but I don't have a way to test it. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > riscv patch from Fei Yang src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 3658: > 3656: #endif > 3657: // atomic update to prevent overwriting Klass* with 0 > 3658: __ lock(); One thing I'm curious about: why is the locked update only here on x86, and not in any other port? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16750#discussion_r2533363164 From duke at openjdk.org Mon Nov 17 09:37:31 2025 From: duke at openjdk.org (duke) Date: Mon, 17 Nov 2025 09:37:31 GMT Subject: RFR: 8368961: Remove redundant checks in ciField.cpp [v2] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 05:12:34 GMT, Zihao Lin wrote: >> Remove redundant check in 'trust_final_non_static_fields' ciField.cpp >> >> Remove: >> 1. java_lang_System check >> 2. is_box_klass check >> 3. 
java_lang_String check > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > remove @linzihao1999 Your change (at version 36f8bce16c2ed50f0f0012a95955f78336456c75) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28191#issuecomment-3540796191 From mli at openjdk.org Mon Nov 17 09:43:06 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Nov 2025 09:43:06 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v2] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <8Y3gUUVCNU1ZpfRkZeJqgIUomP6NCDIQqqgN-lRgk5A=.60177ffe-52ba-46de-a099-57d73f096a49@github.com> Message-ID: On Mon, 17 Nov 2025 07:39:19 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: >> >> - add CMove+CmpP/N tests >> - fix cmovF/D_cmpP > > test/hotspot/jtreg/compiler/c2/irTests/TestScalarConditionalMoveCmpObj.java line 131: > >> 129: // applyIf = {"UseCompressedOops", "false"}) >> 130: // @IR(counts = {IRNode.CMOVE_L, ">0", IRNode.CMP_N, ">0"}, >> 131: // applyIf = {"UseCompressedOops", "true"}) > > Do you plan to still do this in this PR? Probably a future RFE would be better. It could be nice if you could link to the RFE with the issue number from this comment. In this PR, no. This one will only implement CMoveF/D and enable the vectorization of CMoveF/D, as preparation for https://github.com/openjdk/jdk/pull/28231. To guarantee the generation of CMoveI/L, it seems to me we need to improve the cost model used when transforming a phi into a conditional move. I can investigate later, as this impacts how and whether CMoveL/I can be generated and vectorized accordingly. Filed https://bugs.openjdk.org/browse/JDK-8371984 to track it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2533384835 From epeter at openjdk.org Mon Nov 17 09:55:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Nov 2025 09:55:06 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v2] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <8Y3gUUVCNU1ZpfRkZeJqgIUomP6NCDIQqqgN-lRgk5A=.60177ffe-52ba-46de-a099-57d73f096a49@github.com> Message-ID: On Mon, 17 Nov 2025 09:40:29 GMT, Hamlin Li wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestScalarConditionalMoveCmpObj.java line 131: >> >>> 129: // applyIf = {"UseCompressedOops", "false"}) >>> 130: // @IR(counts = {IRNode.CMOVE_L, ">0", IRNode.CMP_N, ">0"}, >>> 131: // applyIf = {"UseCompressedOops", "true"}) >> >> Do you plan to still do this in this PR? Probably a future RFE would be better. It could be nice if you could link to the RFE with the issue number from this comment. > > In this PR, no. This one will only implement CMoveF/D and enable the vectorization of CMoveF/D, as preparation for https://github.com/openjdk/jdk/pull/28231. > To guarantee the generation of CMoveI/L, it seems to me we need to improve the cost model used when transforming a phi into a conditional move. I can investigate later, as this impacts how and whether CMoveL/I can be generated and vectorized accordingly. Filed https://bugs.openjdk.org/browse/JDK-8371984 to track it. Ok. Sounds good. Just note: getting the cost model right here can be really difficult. People have played with the cost model in recent years, and it has also led to regressions in some cases. 
Just FYI, I'm not stopping you from trying if you like ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2533420545 From mchevalier at openjdk.org Mon Nov 17 10:21:20 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 17 Nov 2025 10:21:20 GMT Subject: RFR: 8371716: C2 compilation fails with "Missed optimization opportunity in PhaseIterGVN" In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 19:56:14 GMT, Marc Chevalier wrote: > This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. > > # Analysis > ## Obervationally > ### IGVN > During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. Their types are: > > in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) > in(2): null > > We compute the join (HS' meet): > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 > > t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > > But the current `_type` (of the `PhiNode` as a `TypeNode`) is > > _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) > > We filter `t` by `_type` > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 > and we get > > ft=java/lang/Object * > > which is what we return. After the end of `Value`, the returned becomes the new `PhiNode`'s `_type`. 
> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 > and > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 > > > ### Verification > On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time > > _type=java/lang/Object * > > and so, after filtering `t` by the (new) `_type`, we get > > ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > > which is returned. Verification gets angry because the new `ft` is not the same as the previous one. > > ## But why?! > ### Details on type computation > In short, we are doing > > t = typeof(in(1)) \/ typeof(in(2)) > ft = t /\ _type (* IGVN *) > ft' = t /\ ft (* Verification *) > > and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again and see if something changes". > > To me, the surprising fact was that the intersection > > java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > /\ > _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) > ~> > java/lang/Object * > > What happened to ... I've edited the description. I'd nevertheless suggest not caring too much about the test, so as not to get sidetracked into details that are irrelevant to the issue. I actually regret I didn't write that symbolically. 
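The non-idempotent filtering described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not HotSpot code: `ToyType`, its `filter` method, and the rule that two clashing exact speculative classes cancel each other out are hypothetical stand-ins for the real type filtering, chosen only to reproduce the `ft` and `ft'` values reported in the description.

```java
// Toy model (hypothetical, not HotSpot code) of a filter that is not idempotent.
class SpeculativeFilterSketch {
    // A fixed base type plus an optional exact speculative class (null = none).
    record ToyType(String base, String speculative) {
        // Assumed rule: keep a speculative part if only one side has one or
        // both sides agree; drop it when two exact speculative classes clash.
        ToyType filter(ToyType other) {
            String spec;
            if (speculative == null) {
                spec = other.speculative;
            } else if (other.speculative == null || speculative.equals(other.speculative)) {
                spec = speculative;
            } else {
                spec = null; // clashing speculative classes cancel out
            }
            return new ToyType(base, spec);
        }
    }

    public static void main(String[] args) {
        ToyType t = new ToyType("java/lang/Object *", "C2");    // join of the Phi inputs
        ToyType type = new ToyType("java/lang/Object *", "C1"); // current _type of the Phi
        ToyType ft = t.filter(type);    // IGVN:         ft  = t /\ _type
        ToyType ftAgain = t.filter(ft); // verification: ft' = t /\ ft
        System.out.println("ft  = " + ft);
        System.out.println("ft' = " + ftAgain);
        System.out.println("same result: " + ft.equals(ftAgain));
    }
}
```

In this model the first filtering loses the speculative part, and filtering again against the result brings `C2` back, so the two results differ. That is the `(a /\ b) /\ b != a /\ b` shape that makes a "run it again and compare" verification fail.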
------------- PR Comment: https://git.openjdk.org/jdk/pull/28331#issuecomment-3540965454 From mchevalier at openjdk.org Mon Nov 17 10:21:22 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 17 Nov 2025 10:21:22 GMT Subject: RFR: 8371716: C2 compilation fails with "Missed optimization opportunity in PhaseIterGVN" In-Reply-To: <_WY0NPvc_0tlO31lVAVTl9QyznCQLl9r1nrQ4P6983U=.ecc067a2-8783-469e-850b-b9b612017e57@github.com> References: <_WY0NPvc_0tlO31lVAVTl9QyznCQLl9r1nrQ4P6983U=.ecc067a2-8783-469e-850b-b9b612017e57@github.com> Message-ID: On Mon, 17 Nov 2025 09:09:33 GMT, Damon Fenacci wrote: >> This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. >> >> # Analysis >> ## Obervationally >> ### IGVN >> During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. Their types are: >> >> in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) >> in(2): null >> >> We compute the join (HS' meet): >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 >> >> t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> But the current `_type` (of the `PhiNode` as a `TypeNode`) is >> >> _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) >> >> We filter `t` by `_type` >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 >> and we get >> >> ft=java/lang/Object * >> >> which is what we return. After the end of `Value`, the returned becomes the new `PhiNode`'s `_type`. 
>> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 >> and >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 >> >> >> ### Verification >> On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time >> >> _type=java/lang/Object * >> >> and so, after filtering `t` by the (new) `_type`, we get >> >> ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> which is returned. Verification gets angry because the new `ft` is not the same as the previous one. >> >> ## But why?! >> ### Details on type computation >> In short, we are doing >> >> t = typeof(in(1)) \/ typeof(in(2)) >> ft = t /\ _type (* IGVN *) >> ft' = t /\ ft (* Verification *) >> >> and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again and see if something changes". >> >> To me, the surprising fact was that the intersection >> >> java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> /\ >> _type=java/lang/Objec... > > src/hotspot/share/opto/cfgnode.cpp line 1364: > >> 1362: const Type* first_ft = ft; >> 1363: ft = t->filter_speculative(first_ft); >> 1364: #ifdef ASSERT > > More of a flyby rather than a review but I was wondering if it would make sense to extract this assert block since it is the same as the one above. We surely can. But I'd rather avoid premature cleanup as it is very uncertain to me how this issue will evolve, and maybe fundamentally change. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2533502792 From snatarajan at openjdk.org Mon Nov 17 10:21:36 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 17 Nov 2025 10:21:36 GMT Subject: RFR: 8349835: C2: simplify IGV property printing [v5] In-Reply-To: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: > The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). > > ### Fix > Implemented the suggested refactoring. > > ### Testing > Github Actions, Tier 1-3 Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: testing and review on moving code to cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26902/files - new: https://git.openjdk.org/jdk/pull/26902/files/43e7b6e8..eec2619b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=03-04 Stats: 176 lines in 2 files changed: 89 ins; 87 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26902.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26902/head:pull/26902 PR: https://git.openjdk.org/jdk/pull/26902 From snatarajan at openjdk.org Mon Nov 17 10:21:37 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 17 Nov 2025 10:21:37 GMT Subject: RFR: 8349835: C2: simplify IGV property printing [v2] In-Reply-To: References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> <5qtZxVebyVn6WML3Q4508dXPwxkw-CWhD_pE6UaNfF8=.76830409-b57d-410f-a30b-c7d01b62df7f@github.com> 
<9e1r4VDSzP6VL3GMf8JQSDUcvwzjzy5XGKOFURXpGhk=.ce419221-79b3-44f6-b944-093b2d244f10@github.com> Message-ID: On Wed, 12 Nov 2025 07:40:30 GMT, Christian Hagedorn wrote: >> My reasoning is to keep the interface and implementation separate. I have kept it this way. Will that be okay? > > I'm not sure I understand the benefit of having it separate when the only user is in the source file and it's tightly coupled to the implementation of the `IdealGraphPrinter` class. This will expose it to other files while it's not needed. Or is it just for readability? Yes, it was mostly for readability. I do agree with you and have now followed your suggestion of moving the class to the source file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2533502542 From duke at openjdk.org Mon Nov 17 10:26:33 2025 From: duke at openjdk.org (duke) Date: Mon, 17 Nov 2025 10:26:33 GMT Subject: RFR: 8371881: C2: Fix potential SEGV in VTransformReductionVectorNode tracing [v2] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 15:59:32 GMT, Vishal Chand wrote: >> This PR fixes a potential SEGV and removes dead code: >> - **Fix**: Prevent potential SEGV in VTransformReductionVectorNode - [from @shipilev] This fixes a crash in diagnostic code when isa_ReductionVector() unexpectedly returns nullptr. While this indicates the graph is already corrupted, the additional crash in `TRACE_OPTIMIZE` makes debugging harder. The fix adds defensive null checking to prevent the diagnostic crash and improve error handling. >> >> - **Cleanup**: Remove unused first_red variable > > Vishal Chand has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/vtransform.cpp > > Co-authored-by: Aleksey Shipilëv @vish-chan Your change (at version 0824593d2a6f4f4ec2ad5f42d303428962866a5a) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28323#issuecomment-3540991934 From snatarajan at openjdk.org Mon Nov 17 10:27:05 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 17 Nov 2025 10:27:05 GMT Subject: RFR: 8349835: C2: simplify IGV property printing [v6] In-Reply-To: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: > The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). > > ### Fix > Implemented the suggested refactoring. > > ### Testing > Github Actions, Tier 1-3 Saranya Natarajan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: - Merge branch 'master' into JDK-8349835 - testing and review on moving code to cpp - Merge branch 'master' into JDK-8349835 - addressing review comments#2 - fixing test failure - addressing review comments - changing int to bool in a struct - fix to failing test - initial fix ------------- Changes: https://git.openjdk.org/jdk/pull/26902/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=05 Stats: 207 lines in 2 files changed: 91 ins; 109 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/26902.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26902/head:pull/26902 PR: https://git.openjdk.org/jdk/pull/26902 From mli at openjdk.org Mon Nov 17 10:27:06 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Nov 2025 10:27:06 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v2] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> 
<8Y3gUUVCNU1ZpfRkZeJqgIUomP6NCDIQqqgN-lRgk5A=.60177ffe-52ba-46de-a099-57d73f096a49@github.com> Message-ID: On Mon, 17 Nov 2025 09:51:39 GMT, Emanuel Peter wrote: >> In this PR, no. This one only implements CMoveF/D and enables the vectorization of CMoveF/D, as preparation for https://github.com/openjdk/jdk/pull/28231. >> To guarantee the generation of CMoveI/L, it seems to me we need to improve the cost model used when transforming a phi to a conditional move. I can investigate later, as this impacts how and whether CMoveL/I can be generated and vectorized accordingly. Filed https://bugs.openjdk.org/browse/JDK-8371984 to track it. > > Ok. Sounds good. Just note: getting the cost model right here can be really difficult. People have played with the cost model in recent years, and it has also led to regressions in some cases. Just FYI, I'm not stopping you from trying if you like ;) Thanks for the reminder! :) That's also the reason I won't do it in this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2533522249 From snatarajan at openjdk.org Mon Nov 17 10:36:08 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 17 Nov 2025 10:36:08 GMT Subject: RFR: 8349835: C2: simplify IGV property printing [v3] In-Reply-To: References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: On Tue, 28 Oct 2025 08:12:08 GMT, Roberto Castañeda Lozano wrote: >> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: >> >> addressing review comments#2 > > Thanks for working on this, Saranya! We do not have good test coverage of IGV graph dumping, so only running regular tier testing might not catch possible regressions introduced by this changeset.
Consider also comparing the XML graphs dumped before and after the changeset for a few well-known methods with deterministic compilation, e.g.: > > $ ${BASELINE_JAVA} -Xbatch -XX:PrintIdealGraphLevel=6 -XX:CompileOnly=java.lang.String::charAt -XX:PrintIdealGraphFile=before.xml > $ ${PATCHED_JAVA} -Xbatch -XX:PrintIdealGraphLevel=6 -XX:CompileOnly=java.lang.String::charAt -XX:PrintIdealGraphFile=after.xml > $ diff before.xml after.xml > > The only expected changes between the two files would be things like memory addresses, process and thread IDs, etc. But all the other properties should remain the same. @robcasloz: I followed the testing methodology you suggested with the commands below. I did hit a crash, which I fixed; after that, the only differences were in memory addresses, process IDs, and thread IDs. -XX:CompileOnly=java.lang.String::length -XX:CompileOnly=java.lang.String::charAt -XX:CompileOnly=java.util.ArrayList::get -XX:CompileOnly=java.util.ArrayList::add In this testing, I did NOT see prints for `is_cisc_alternate, rematerialize, has_call, is_float, is_vector, is_predicate, is_scalable, and must_spill`. However, these prints look straightforward. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26902#issuecomment-3541033097 From duke at openjdk.org Mon Nov 17 10:43:47 2025 From: duke at openjdk.org (Harshit470250) Date: Mon, 17 Nov 2025 10:43:47 GMT Subject: RFR: 8370920: [s390] C2: add instruction size in s390.ad file [v4] In-Reply-To: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> References: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> Message-ID: > This PR adds the size of the match rule nodes. > > There were a lot of nodes for which the size was variable; for those nodes I have taken the maximum possible size. Harshit470250 has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge remote-tracking branch 'origin/master' - remove whitespace - Resolved a bug - remove TODO comments - final size added - Final sizes - oop_decoder and load_const_optimized - ... and 4 more: https://git.openjdk.org/jdk/compare/3da054a9...01a9b46b ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28054/files - new: https://git.openjdk.org/jdk/pull/28054/files/91ea135e..01a9b46b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=02-03 Stats: 4616 lines in 109 files changed: 2451 ins; 1339 del; 826 mod Patch: https://git.openjdk.org/jdk/pull/28054.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28054/head:pull/28054 PR: https://git.openjdk.org/jdk/pull/28054 From duke at openjdk.org Mon Nov 17 11:51:22 2025 From: duke at openjdk.org (Zihao Lin) Date: Mon, 17 Nov 2025 11:51:22 GMT Subject: Integrated: 8368961: Remove redundant checks in ciField.cpp In-Reply-To: References: Message-ID: <5XkibiP2bCL1zcpgjJcWWYqUTYI7rgYcwqG4at2o6kU=.44c76277-4e07-476b-bf25-7ea04e93c656@github.com> On Fri, 7 Nov 2025 10:57:53 GMT, Zihao Lin wrote: > Remove redundant check in 'trust_final_non_static_fields' ciField.cpp > > Remove: > 1. java_lang_System check > 2. is_box_klass check > 3. java_lang_String check This pull request has now been integrated. 
Changeset: df35412d Author: Zihao Lin Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/df35412db1d7e883148590e24d968cfe2f5c6bbc Stats: 9 lines in 1 file changed: 0 ins; 9 del; 0 mod 8368961: Remove redundant checks in ciField.cpp Reviewed-by: bmaillard, aseoane, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/28191 From rsunderbabu at openjdk.org Mon Nov 17 12:10:06 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Mon, 17 Nov 2025 12:10:06 GMT Subject: RFR: 8293484: AArch64: TestUseSHA512IntrinsicsOptionOnSupportedCPU.java fails on CPU with SHA512 feature support In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 09:44:57 GMT, Andrew Haley wrote: >> We have a host of tests under test/hotspot/jtreg/compiler/intrinsics/sha which check whether the SHA intrinsics flags' enable/disable setting is in sync with CPU support in the underlying platform. There might be situations where the intrinsics might not be enabled despite the hardware supporting the relevant instructions. For example, there might be reliability issues or performance issues. In such situations, the tests will fail. >> >> Until now, the approach has been to exclude the platforms where the support is yet to be provided and remove the exclusion afterwards. This necessitates additional work on the test front. >> >> A more compact design would be to make predicate probes rely on intrinsics availability on the platform as opposed to hardware support availability. The migration to intrinsics availability would especially help update releases where feature backport might not be complete. >> >> PS: This fix can/should be propagated to other such tests as well. Once this PR gets approval, I will work on similar tests. > >> Hi, I suppose the failure may occur if we run this test case on CPU **with** SHA512 feature, but **disabling** SHA512Intrinsics.
>> >> As **@requires vm.flagless** is set in this jtreg case, if we specify `-XX:-UseSHA512Intrinsics`, this test case is not tested actually. Here shows the log in my machine. >> >> ```shell >> $ make test TEST=test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java JTREG="VM_OPTIONS=-XX:-UseSHA512Intrinsics" >> Building target 'test' in configuration '/tmp/local-build-fastdebug' >> Running tests using JTREG control variable 'VM_OPTIONS=-XX:-UseSHA512Intrinsics' >> Test selection 'test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java', will run: >> * jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java >> Clean up dirs for jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java >> >> Running test 'jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java' >> Test results: no tests selected >> Report written to /tmp/local-build-fastdebug/test-results/jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java/html/report.html >> Results written to /tmp/local-build-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java >> Finished running test 'jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java' >> Test report is stored in /tmp/local-build-fastdebug/test-results/jtreg_test_hotspot_jtreg_compiler_intrinsics_sha_cli_TestUseSHA512IntrinsicsOptionOnSupportedCPU_java >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR SKIP >> jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java >> 0 0 0 0 0 >> ============================== >> TEST SUCCESS >> ``` >> >> If so, I don't think it's a 
bug. Is there anything I misunderstood? > > That is correct. > > > // Determine if the compiler corresponding to the compilation level 'compLevel' > // and to the compilation context 'compilation_context' provides an intrinsic > // for the method 'method'. An intrinsi... @theRealAph could you please help with the review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28053#issuecomment-3541472220 From mli at openjdk.org Mon Nov 17 13:35:17 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Nov 2025 13:35:17 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v3] In-Reply-To: <46DWeMiCRNMC58wGr4T52KXbtRjU0PxQ4L6LuVFMZEo=.867fcc86-edd1-4492-9c1a-58f83d135969@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <5PzMJntiu2waMvciTLvXaUH15Fm3dXZPsDVvkuqWPI0=.68c6456a-e5d3-413e-bef8-d8da95de40bd@github.com> <46DWeMiCRNMC58wGr4T52KXbtRjU0PxQ4L6LuVFMZEo=.867fcc86-edd1-4492-9c1a-58f83d135969@github.com> Message-ID: On Mon, 17 Nov 2025 07:27:30 GMT, Emanuel Peter wrote: >> Make sense. I filed https://bugs.openjdk.org/browse/JDK-8371920 to track the task, will do it later after this pr. > > I would suggest that you already make the move from `@requires` to IR rule level restrictions. But we can look at adding `x64` and `aarch64` in the separate RFE. Merge of scalar and vector tests is done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2534089875 From mli at openjdk.org Mon Nov 17 13:35:15 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Nov 2025 13:35:15 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v3] In-Reply-To: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> Message-ID: > Hi, > > This PR adds CMoveF/D on riscv, which enables vectorization of statements like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop. > > This PR also prepares for further vectorization in https://github.com/openjdk/jdk/pull/28231. > > Previously this was https://github.com/openjdk/jdk/pull/25341, but at that time C2 SLP had an issue with unsigned comparison, which is now fixed, so it's good to continue the work. > > # Test > ## Jtreg > > in progress... > > ## Performance > > Column name meanings: > * p: with patch > * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on > * m: without patch > * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on > > #### Average improvement > > NOTE: This PR alone brings a performance benefit for `CMoveF+CmpF`, `CMoveD+CmpD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below is based on fully implementing the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. > > For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv.
> > Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) > -- | -- | -- | -- > 1.022782609 | 2.198717391 | 2.162673913 | 2.199 > > Hamlin Li has updated the pull request incrementally with four additional commits since the last revision: - remove TestScalarConditionalMove.java - merge scalar and vector tests - rename to TestConditionalMove.java - add CMP_N ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28309/files - new: https://git.openjdk.org/jdk/pull/28309/files/5c0d645d..51451ab5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28309&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28309&range=01-02 Stats: 10114 lines in 4 files changed: 3824 ins; 6290 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28309.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28309/head:pull/28309 PR: https://git.openjdk.org/jdk/pull/28309 From mli at openjdk.org Mon Nov 17 13:42:18 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Nov 2025 13:42:18 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v2] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <8Y3gUUVCNU1ZpfRkZeJqgIUomP6NCDIQqqgN-lRgk5A=.60177ffe-52ba-46de-a099-57d73f096a49@github.com> Message-ID: On Mon, 17 Nov 2025 06:37:15 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: >> >> - add CMove+CmpP/N tests >> - fix cmovF/D_cmpP > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2133: > >> 2131: break; >> 2132: case BoolTest::ge: >> 2133: assert(false, "Should go to BoolTest::le case"); > > I am not sure if it's safe to have these assertions for `ge` and `gt`. It seems to me that we should handle all possible condition codes here. Check this bug: https://bugs.openjdk.org/browse/JDK-8358892. We have added handling for `ge` and `gt` in `C2_MacroAssembler::enc_cmove_cmp_fp` to fix it. Make sense! Thanks! 
I'll add the implementation for these condition codes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2534113207 From mli at openjdk.org Mon Nov 17 13:42:20 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Nov 2025 13:42:20 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v3] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <5PzMJntiu2waMvciTLvXaUH15Fm3dXZPsDVvkuqWPI0=.68c6456a-e5d3-413e-bef8-d8da95de40bd@github.com> Message-ID: <0aUmOv2i1H2WJDoQV1Uirgof7C42vvPSyY73giIsKcs=.ad6b18f6-36c4-46fc-b26d-dec8d519c535@github.com> On Mon, 17 Nov 2025 07:43:02 GMT, Emanuel Peter wrote: >> Good idea! >> I can do it. What do you think about the name of the merged tests? `TestConditionalMove.java` or `TestScalarAndVectorConditionalMove.java` > > `TestConditionalMove.java` sounds good :) > > It would also be nice if we could move it out of the `irTests` directory, we would like to eventually move all tests away from it, and rather sort the tests by what they test and not by how we test them. Though now it's a little tricky because we check for both vector and scalar things. Still, I would propose that you move it under `c2/vectorization` or `c2/loopopts/superword`, since they do include vectorization tests. An alternative could also be in a new `c2/cmove` directory. I can do the move for this specific file in the last commit of this PR. Or we can move a bunch of tests (some other tests under irTests) in a separate PR, as there are `Asserts` in other tests under `irTests`. I prefer the latter, as it puts related changes in one specific PR. Please let me know what you think about it.
:) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2534108304 From rcastanedalo at openjdk.org Mon Nov 17 13:57:23 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 17 Nov 2025 13:57:23 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: <8hzrSIP5a5sdMvLuTH_my204dNYT6sHOJDfOeqA8qI0=.d0d9a056-c6ac-49a4-9d44-0afbeaa473dc@github.com> References: <8hzrSIP5a5sdMvLuTH_my204dNYT6sHOJDfOeqA8qI0=.d0d9a056-c6ac-49a4-9d44-0afbeaa473dc@github.com> Message-ID: On Fri, 14 Nov 2025 14:40:49 GMT, Roberto Castañeda Lozano wrote: >>> > It is only "hardcoded" to never let hashtags and setFuelCost escape, it just implicitly downgrades a scope on those two "dimensions". >>> >>> Is this a design choice or a constraint of the current implementation? I could imagine situations in which it could be useful to let a hashtag escape across Template boundaries, no? Something like: >>> >>> ``` >>> var innerTemplate = Template.make(() -> transparentScope(let("foo", "42"))); >>> var outerTemplate = Template.make(() -> scope( >>> innerTemplate.asToken(), >>> "// value of foo: #foo" >>> )); >>> outerTemplate.render(); >>> ``` >> >> I think this would lead to issues once you use a template recursively. What would you do if `foo` was already defined, and now you call `innerTemplate`? >> >> So I suppose it is a choice, yes. But I don't think the alternatives would be better. >> - You could just throw an exception at the second definition. But then you would need a way to check for existence of hashtag names ... not great. >> - You could just hide outer definitions... basically they would work like scoped values: you can bind and re-bind them. But that brings its own complexity that I don't want to push on the users if it's not absolutely necessary.
>> >> If you really do need access to something from an outer template, you should just pass it via template argument. That makes the flow explicit. That's my opinion. >> >> Or do you already see some case where something like a "scoped value" would be really really useful? I suppose we could still add that in the future. Another thought: hooks are a bit like "scoped value" ... except that they carry no "value" ;) >> >> What do you think? > >> So I suppose it is a choice, yes. But I don't think the alternatives would be better. >> >> * You could just throw an exception at the second definition. But then you would need a way to check for existence of hashtag names ... not great. >> * You could just hide outer definitions... basically they would work like scoped values: you can bind and re-bind them. But that brings its own complexity that I don't want to push on the users if it's not absolutely necessary. >> >> If you really do need access to something from an outer template, you should just pass it via template argument. That makes the flow explicit. That's my opinion. > > Sounds OK to me, I think it would be worth capturing this rationale and advisory somewhere in the documentation. > @robcasloz Alright, I added some extra documentation of hashtag locality with [bba7152](https://github.com/openjdk/jdk/pull/27255/commits/bba71529ac081b0f53cd5106f57fb541bd3e0ead) Thanks, the new documentation looks good! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3541944137 From aph at openjdk.org Mon Nov 17 14:35:14 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 17 Nov 2025 14:35:14 GMT Subject: RFR: 8293484: AArch64: TestUseSHA512IntrinsicsOptionOnSupportedCPU.java fails on CPU with SHA512 feature support [v2] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 07:16:02 GMT, Ramkumar Sunderbabu wrote: >> We have a host of tests under test/hotspot/jtreg/compiler/intrinsics/sha which check whether the SHA intrinsics flags' enable/disable setting is in sync with CPU support in the underlying platform. There might be situations where the intrinsics might not be enabled despite the hardware supporting the relevant instructions. For example, there might be reliability issues or performance issues. In such situations, the tests will fail. >> >> Until now, the approach has been to exclude the platforms where the support is yet to be provided and remove the exclusion afterwards. This necessitates additional work on the test front. >> >> A more compact design would be to make predicate probes rely on intrinsics availability on the platform as opposed to hardware support availability. The migration to intrinsics availability would especially help update releases where feature backport might not be complete. >> >> PS: This fix can/should be propagated to other such tests as well. Once this PR gets approval, I will work on similar tests. > > Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: > > removing requires condition Marked as reviewed by aph (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/28053#pullrequestreview-3473007317 From dlunden at openjdk.org Mon Nov 17 14:41:11 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 17 Nov 2025 14:41:11 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v8] In-Reply-To: References: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> Message-ID: On Mon, 17 Nov 2025 07:11:40 GMT, Jatin Bhateja wrote: >> Hard-coded lists in `Matcher::should_attempt_register_biasing()` and `is_commutative_oper` look fragile and hard to verify. (Especially `is_commutative_oper` which is used to check the root of matched ideal tree.) >> >> With proper ADLC support, that information can be placed on individual AD instructions which would make it clearer what is affected. > > Hi @iwanowww , @dlunde , @eme64 , @TobiHartmann , @sviswa7 , your comments have been addressed. > Let me know if this is good to land in. Thanks for the updates @jatin-bhateja, looks good to me. I'm rerunning some tests for sanity before I click approve! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3542190601 From rcastanedalo at openjdk.org Mon Nov 17 14:51:36 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 17 Nov 2025 14:51:36 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v30] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 15:22:18 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. 
It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. 
This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > document hashtag locality for Roberto test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 222: > 220: * {@link #setFuelCostScope} transparent transparent non-transparent > 221: * > 222: * Thanks for adding this table, it is very useful. Please consider completing it with an entry for `transparentScope` (even if it may sound obvious from the name). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2534386296 From duke at openjdk.org Mon Nov 17 14:53:53 2025 From: duke at openjdk.org (Zihao Lin) Date: Mon, 17 Nov 2025 14:53:53 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v4] In-Reply-To: References: Message-ID: > If nodes both are constant, support constant folding. 
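As a rough illustration of what folding a `MulHiL` or `UMulHiL` node with two constant inputs should produce, the sketch below evaluates the Java-level operations that those nodes correspond to. It is not the C2 `MulHiValue` code from the pull request; the class and method names are hypothetical. The unsigned variant is derived from `Math.multiplyHigh` with a standard sign-correction identity so the sketch does not depend on `Math.unsignedMultiplyHigh`, and a `BigInteger` reference is included as a cross-check.

```java
import java.math.BigInteger;

public class MulHiFolding {
    // Signed case: the upper 64 bits of the 128-bit signed product.
    static long foldMulHi(long a, long b) {
        return Math.multiplyHigh(a, b);
    }

    // Unsigned case, derived from the signed result: for each operand whose
    // sign bit is set, the other operand is added back into the high word.
    static long foldUMulHi(long a, long b) {
        return Math.multiplyHigh(a, b) + ((a >> 63) & b) + ((b >> 63) & a);
    }

    // Reference for the signed case using arbitrary-precision arithmetic.
    static long referenceMulHi(long a, long b) {
        return BigInteger.valueOf(a).multiply(BigInteger.valueOf(b))
                .shiftRight(64).longValue();
    }

    // Reference for the unsigned case: reinterpret both inputs as unsigned
    // 64-bit values, multiply, and keep bits 64..127.
    static long referenceUMulHi(long a, long b) {
        BigInteger mask = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);
        BigInteger ua = BigInteger.valueOf(a).and(mask);
        BigInteger ub = BigInteger.valueOf(b).and(mask);
        return ua.multiply(ub).shiftRight(64).longValue();
    }

    public static void main(String[] args) {
        long[][] pairs = {
            {Long.MAX_VALUE, 2L}, {Long.MIN_VALUE, 2L}, {-1L, -1L}, {123456789L, 987654321L}
        };
        for (long[] p : pairs) {
            System.out.println(p[0] + " * " + p[1]
                    + " -> signed hi = " + foldMulHi(p[0], p[1])
                    + ", unsigned hi = " + foldUMulHi(p[0], p[1]));
        }
    }
}
```

A compiler test for the folding could compare the compiled result against such a reference for a set of constant pairs, including the boundary values used in `main`.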
Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: Add Math to Operations.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28097/files - new: https://git.openjdk.org/jdk/pull/28097/files/3157f735..c840a8c8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=02-03 Stats: 7 lines in 2 files changed: 4 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28097.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28097/head:pull/28097 PR: https://git.openjdk.org/jdk/pull/28097 From rcastanedalo at openjdk.org Mon Nov 17 14:55:36 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 17 Nov 2025 14:55:36 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v30] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 15:22:18 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). 
>> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > document hashtag locality for Roberto test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 624: > 622: * hashtag-replacements and {@link #setFuelCost} are always implicitly > 623: * non-transparent (i.e. non-escaping) for hashtag-replacements and > 624: * {@link #setFuelCost} (e.g. 
a {@link #let} will not escape the template I could not parse the following sentence: "hashtag-replacements and setFuelCost are always implicitly non-transparent (i.e. non-escaping) for hashtag-replacements and setFuelCost", could you explain it (or rewrite it in a clearer way)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2534401314 From mdoerr at openjdk.org Mon Nov 17 15:12:46 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 17 Nov 2025 15:12:46 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v3] In-Reply-To: <6lsTW4mcptKcVAuFHu3h39LMajICZZVDhHwrkxM6Rl8=.787cc24a-ae13-49ed-bc56-9c71ad8659b0@github.com> References: <6lsTW4mcptKcVAuFHu3h39LMajICZZVDhHwrkxM6Rl8=.787cc24a-ae13-49ed-bc56-9c71ad8659b0@github.com> Message-ID: On Fri, 14 Nov 2025 07:28:10 GMT, Shawn M Emery wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> More minor cleanup. > > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 61: > >> 59: // used for everything else. >> 60: private int[] sessionKe = null; // key for encryption >> 61: private int[] sessionKd = null; // preprocessed key for decryption > > We really don't need sessionKd, since it's just assigned to K, but I'm fine leaving it as is. @smemery: I have made a proposal to remove `K`: https://github.com/TheRealMDoerr/jdk/commit/2907475958806cad6b5fc83541f66065475a93ec Please take a look! I think it's a bit better readable, but makes the change a bit larger and will probably require a Graal update. What do you prefer? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28299#discussion_r2534468284 From rcastanedalo at openjdk.org Mon Nov 17 15:18:51 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 17 Nov 2025 15:18:51 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v30] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 15:22:18 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token.
For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > document hashtag locality for Roberto I only have a few minor suggestions, looks good otherwise! 
test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 210: > 208: /** > 209: * Samples a random {@link DataName} from the filtered set, according to the weights > 210: * of the contained {@link DataName}s, and making a hashtag replacement for both Suggestion: * of the contained {@link DataName}s, and makes a hashtag replacement for both test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 252: > 250: /** > 251: * Samples a random {@link DataName} from the filtered set, according to the weights > 252: * of the contained {@link DataName}s, and making a hashtag replacement for the Suggestion: * of the contained {@link DataName}s, and makes a hashtag replacement for the test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 183: > 181: /** > 182: * Samples a random {@link StructuralName} from the filtered set, according to the weights > 183: * of the contained {@link StructuralName}s, and making a hashtag replacement for both Suggestion: * of the contained {@link StructuralName}s, and makes a hashtag replacement for both test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 198: > 196: /** > 197: * Samples a random {@link StructuralName} from the filtered set, according to the weights > 198: * of the contained {@link StructuralName}s, and making a hashtag replacement for the Suggestion: * of the contained {@link StructuralName}s, and makes a hashtag replacement for the test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 644: > 642: * scope( > 643: * // CODE2: some code in the inner scope. Names, hashtags and setFuelCost > 644: * // does not escape the inner scope. Suggestion: * // do not escape the inner scope. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 669: > 667: /** > 668: * Creates a {@link ScopeToken} that represents a completely transparent scope, allowing > 669: * anything to escape anything. 
This means that {@link DataName}s, {@link StructuralName}s, "allowing anything to escape anything" is a bit imprecise (e.g. some elements could never escape the underlying Template). I suggest just removing it, since in the next sentence you are already explaining what can escape what: Suggestion: * Creates a {@link ScopeToken} that represents a completely transparent scope. * This means that {@link DataName}s, {@link StructuralName}s, test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 800: > 798: * } > 799: * """ > 800: * // CODe3: we are back in the outer scope of CODE1, and can use Suggestion: * // CODE3: we are back in the outer scope of CODE1, and can use test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 802: > 800: * // CODe3: we are back in the outer scope of CODE1, and can use > 801: * // more fuel again in nested template uses. setFuelCost > 802: * // automatically restored to what was set before the Suggestion: * // is automatically restored to what was set before the ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27255#pullrequestreview-3473136602 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2534480174 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2534482378 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2534457009 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2534459304 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2534409840 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2534429863 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2534439672 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2534438423 From rcastanedalo at openjdk.org Mon Nov 17 15:18:54 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 17 Nov 2025 15:18:54 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 07:43:53 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> inflate abreviations to full names > > test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 53: > >> 51: * has such a set of hashtag replacements, and implicitly provides access to the >> 52: * hashtag replacements of the outer {@link TemplateFrame}s, up to the outermost >> 53: * of the current {@link Template}. If a hashtag replacemnt is added in a scope, > > Suggestion: > > * of the current {@link Template}. If a hashtag replacement is added in a scope, @eme64 please apply this suggestion. > test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 54: > >> 52: * hashtag replacements of the outer {@link TemplateFrame}s, up to the outermost >> 53: * of the current {@link Template}.
If a hashtag replacemnt is added in a scope, >> 54: * we have to find traverse to outer scopes until we find one that is not transparent > > Suggestion: > > * we have to traverse to outer scopes until we find one that is not transparent @eme64 please apply this suggestion. > test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 65: > >> 63: * on how deeply nested the code is at a given point, correlating to the runtime that >> 64: * would be spent if the code was executed. The idea is that once the fuel is depleated, >> 65: * we do not want to nest more deaply, so that there is a reasonable chance that the > > Suggestion: > > * we do not want to nest more deeply, so that there is a reasonable chance that the @eme64 please apply this suggestion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2534468193 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2534468724 PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2534469828 From duke at openjdk.org Mon Nov 17 15:36:10 2025 From: duke at openjdk.org (duke) Date: Mon, 17 Nov 2025 15:36:10 GMT Subject: RFR: 8293484: AArch64: TestUseSHA512IntrinsicsOptionOnSupportedCPU.java fails on CPU with SHA512 feature support [v2] In-Reply-To: References: Message-ID: <18fC-1tOQ5F7sktgVn3MRf7jNOwIrbAKP9CgwjjWVus=.585e8203-a765-4c64-85bd-f140c938fe58@github.com> On Fri, 14 Nov 2025 07:16:02 GMT, Ramkumar Sunderbabu wrote: >> We have a host of tests under test/hotspot/jtreg/compiler/intrinsics/sha which checks if the SHA intrinsics flags' enable/disable setting is in sync with CPU support in the underlying platform. There might be situations where the intrinsics might not be enabled despite the hardware supporting the relevant instructions. For example, there might be reliability issues or performance issues. In such situations, the tests will fail. 
>> >> Till now, the approach has been to exclude the platforms where the support is yet to be provided and remove the exclusion after. This necessitates additional work on the test front. >> >> A more compact design would be make predicate probes to rely on intrinsics availability in the platform as opposed to hardware support availability. The migration to intrinsics availability would especially help update releases where feature backport might not be complete. >> >> PS: This fix can/should be propagated to other such tests as well. Once this PR gets approval, I will work on similar tests. > > Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: > > removing requires condition @rsunderbabu Your change (at version c9cd82c2a4234520c9a7c97d1dbf15d09d722cc1) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28053#issuecomment-3542478679 From rrich at openjdk.org Mon Nov 17 15:49:03 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 17 Nov 2025 15:49:03 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v3] In-Reply-To: References: Message-ID: <_QYQGNBzH8VQTQBAatUnEhFLfzUpLEN0qPR3FBNFb1Q=.9282aa56-2996-4d31-9fc9-1f417a7c1ca7@github.com> > With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. > > It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. > > The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. > > The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. > Only with incoming stack arguments unaligned offsets can be generated. 
But also then alignment padding is only added if vector registers larger than 64 bit are used. > > So the costs are effectively zero. Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) > > There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. I havn't implemented it as I thought the risk is negligible. > > Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. > > ##### Testing with fastdebug builds on AARCH64 and PPC64: > > hotspot_vector_1 > hotspot_vector_2 > jdk_vector > jdk_vector_sanity > > ##### The change passed our CI testing: > Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. 
It is covered by the following tests: > > compiler/vectorapi/VectorRearrangeTest.java > jdk/incubator/vector/Byte128VectorLoadStoreTests.java > jdk/incubator/vector/Double256VectorLoadStoreTests.java > jdk/incubator/vector/Float128VectorTests.java > jdk/incubator/vector/Long256VectorLoadStoreTests.java > jdk/incubator/vector/Short128VectorLoadStoreTests.java > jdk/incubator/vector/Vector64ConversionTests.java Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Exclude IR check on riscv with rvv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27969/files - new: https://git.openjdk.org/jdk/pull/27969/files/7729a448..73512366 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27969&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27969&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27969/head:pull/27969 PR: https://git.openjdk.org/jdk/pull/27969 From rrich at openjdk.org Mon Nov 17 15:49:05 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 17 Nov 2025 15:49:05 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v2] In-Reply-To: References: <2Ts5dNdaDuen71ZoTYdKP8UNG44epCiEsIb8DeJpvps=.24618d81-4b98-489a-962b-c04e0d561270@github.com> Message-ID: On Mon, 17 Nov 2025 03:25:33 GMT, Fei Yang wrote: >> Richard Reingruber has updated the pull request incrementally with two additional commits since the last revision: >> >> - Enhance comment >> - Fix OptoAssembly for Power 8 > > test/hotspot/jtreg/compiler/vectorapi/TestVectorSpilling.java line 79: > >> 77: @Test >> 78: @IR(counts = {IRNode.MEM_TO_REG_SPILL_COPY_TYPE, "vectorx", "> 0"}, >> 79: phase = {CompilePhase.FINAL_CODE}) > > Hi, I find this IR test is failing on riscv where we are spilling a `vectora`. Maybe we should exclude this case? 
> > > diff --git a/test/hotspot/jtreg/compiler/vectorapi/TestVectorSpilling.java b/test/hotspot/jtreg/compiler/vectorapi/TestVectorSpilling.java > index 5e8b9341d8e..9d9a85e174c 100644 > --- a/test/hotspot/jtreg/compiler/vectorapi/TestVectorSpilling.java > +++ b/test/hotspot/jtreg/compiler/vectorapi/TestVectorSpilling.java > @@ -76,7 +76,8 @@ static void test16ByteSpilling_runner() { > > @Test > @IR(counts = {IRNode.MEM_TO_REG_SPILL_COPY_TYPE, "vectorx", "> 0"}, > - phase = {CompilePhase.FINAL_CODE}) > + phase = {CompilePhase.FINAL_CODE}, > + applyIfCPUFeature= {"rvv", "false"}) > static long test16ByteSpilling(long l1, long l2, long l3, long l4, long l5, long l6, long l7, long l8, > long l9 /* odd stack arg */) { > // To be scalar replaced and spilled to stack Thanks @RealFYang for doing the testing. I've made the change you suggested. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2534593378 From shade at openjdk.org Mon Nov 17 16:07:09 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 17 Nov 2025 16:07:09 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code [v4] In-Reply-To: References: Message-ID: > We use CTW testing for making sure compilers behave well. But we compile the code that is not executed at all, and since our inlining heuristics often looks back at profiles, we end up not actually inlining all too much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations. > > There is an intrinsic tradeoff with accepting more inilned methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data. > > After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. 
Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much, they are impractical to run in standard configurations, see data in RFE. We will enable some of that testing in special testing pipelines. > > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allow stress infra to inline some more on top of that. > > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Update src/hotspot/share/compiler/compiler_globals.hpp Co-authored-by: Tobias Hartmann - Revert separate patch - Final - Proper option name and bump the limits - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26068/files - new: https://git.openjdk.org/jdk/pull/26068/files/2a3b01b4..f381a337 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=02-03 Stats: 176131 lines in 1102 files changed: 118972 ins; 27663 del; 29496 mod Patch: https://git.openjdk.org/jdk/pull/26068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068 PR: https://git.openjdk.org/jdk/pull/26068 From mli at openjdk.org Mon Nov 17 16:40:53 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Nov 2025 16:40:53 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v4] In-Reply-To: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> Message-ID: <0VUf9cNuKR6nc_V-Z2ylwW5YpmO13QUEBoDuQcctdCg=.3041a5fa-baf1-41f5-a271-854d68720fd8@github.com> > Hi, > > This PR adds CMoveF/D on riscv, which enables vectorization of statements like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop. > > This PR also prepares for further vectorization in https://github.com/openjdk/jdk/pull/28231. > > Previously this was https://github.com/openjdk/jdk/pull/25341, but at that time C2 SLP had an issue with unsigned comparison, which is now fixed, so it's good to continue the work. > > # Test > ## Jtreg > > in progress...
> > ## Performance > > Column names meanings: > * p: with patch > * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on > * m: without patch > * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on > > #### Average improvement > > NOTE: With only this PR, it brings performance benefit in case of `CMoveF+CmpF`, `CMoveD+ComD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below is based on fullly implmenting the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. > > For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. > > Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) > -- | -- | -- | -- > 1.022782609 | 2.198717391 | 2.162673913 | 2.199 > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: add BoolTest::ge/gt code and tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28309/files - new: https://git.openjdk.org/jdk/pull/28309/files/51451ab5..cf9168a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28309&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28309&range=02-03 Stats: 1159 lines in 4 files changed: 968 ins; 4 del; 187 mod Patch: https://git.openjdk.org/jdk/pull/28309.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28309/head:pull/28309 PR: https://git.openjdk.org/jdk/pull/28309 From mli at openjdk.org Mon Nov 17 16:47:05 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Nov 2025 16:47:05 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v2] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <8Y3gUUVCNU1ZpfRkZeJqgIUomP6NCDIQqqgN-lRgk5A=.60177ffe-52ba-46de-a099-57d73f096a49@github.com> Message-ID: <4K5xXIGM3anJGkUHGJ75fs6X-zfM_aDNI6Bi9yifK4c=.bb898013-6dbc-4e9a-8666-e8858f87d93f@github.com> On Mon, 17 Nov 2025 
13:38:40 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2133: >> >>> 2131: break; >>> 2132: case BoolTest::ge: >>> 2133: assert(false, "Should go to BoolTest::le case"); >> >> I am not sure if it's safe to have these assertions for `ge` and `gt`. It seems to me that we should handle all possible condition codes here. Check this bug: https://bugs.openjdk.org/browse/JDK-8358892. We have added handling for `ge` and `gt` in `C2_MacroAssembler::enc_cmove_cmp_fp` to fix it. > > Makes sense! Thanks! > I'll add the implementation for these condition codes. I added some code and tests, but the code path for `ge`/`gt` cannot be triggered (I added some new tests based on the tests previously added in https://bugs.openjdk.org/browse/JDK-8358892). So for now, I think it's safer to keep the `assert`: if some code triggers it in the future, we can compose a jtreg test and fix it. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2534803387 From duke at openjdk.org Mon Nov 17 17:57:03 2025 From: duke at openjdk.org (Vishal Chand) Date: Mon, 17 Nov 2025 17:57:03 GMT Subject: RFR: 8371881: C2: Fix potential SEGV in VTransformReductionVectorNode tracing [v2] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 17:08:28 GMT, Aleksey Shipilev wrote: >> Vishal Chand has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/vtransform.cpp >> >> Co-authored-by: Aleksey Shipilëv > > Marked as reviewed by shade (Reviewer). Thanks @shipilev and @eme64 for approvals.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28323#issuecomment-3543182433 From jiangli at openjdk.org Mon Nov 17 22:57:07 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Mon, 17 Nov 2025 22:57:07 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes Message-ID: Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not a multiple of the block size. Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! ------------- Commit messages: - Fix Whitespace errors - Add TestAesGcmIntrinsic.java. The test is authored by tholenst at google.com and zlukas at google.com. - JDK-8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes Changes: https://git.openjdk.org/jdk/pull/28363/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371864 Stats: 119 lines in 2 files changed: 118 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28363.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28363/head:pull/28363 PR: https://git.openjdk.org/jdk/pull/28363 From vpaprotski at openjdk.org Mon Nov 17 23:35:44 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 17 Nov 2025 23:35:44 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v2] In-Reply-To: References: Message-ID: > - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline > - `SignatureBench.MLDSA` is 1.2x-2.2x faster > - Note: there are no AVX2-SHA3 intrinsics yet (being reviewed in https://github.com/vpaprotsk/jdk/pull/7) > - AVX512 intrinsic improvements are 1.24x-1.5x faster than the current version > - `SignatureBench.MLDSA` is up to 5% faster, never slower > > Note on intrinsic: > - The emitted
(existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of the NTT loop and stack spill. > - Code was refactored to allow reuse of the same assembler (where possible) for AVX512 and AVX2 > > Tests and benchmarks: > - Added a fuzz test to ensure the Java code and the intrinsic produce exactly the same result > - Added a benchmark to measure the performance of the intrinsic itself > > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" Volodymyr Paprotski has updated the pull request incrementally with two additional commits since the last revision: - whitespace - address first comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28136/files - new: https://git.openjdk.org/jdk/pull/28136/files/6d3f7794..e9133401 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28136&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28136&range=00-01 Stats: 42 lines in 5 files changed: 17 ins; 15 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/28136.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28136/head:pull/28136 PR: https://git.openjdk.org/jdk/pull/28136 From vpaprotski at openjdk.org Mon Nov 17 23:35:45 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 17 Nov 2025 23:35:45 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v2] In-Reply-To:
References: Message-ID: On Mon, 17 Nov 2025 06:44:39 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request incrementally with two additional commits since the last revision: >> >> - whitespace >> - address first comments > > src/hotspot/cpu/x86/assembler_x86.cpp line 3867: > >> 3865: (vector_len == AVX_256bit ? VM_Version::supports_avx2() : >> 3866: (vector_len == AVX_512bit ? VM_Version::supports_evex() : false)), ""); >> 3867: InstructionAttr attributes(vector_len, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); > > When you check for AVX512-VL you allow accessing 128/256 bit registers from the higher register bank [X/Y]MM(16-31) > > But your assertions are nowhere checking this. I believe those asserts are in `vex_prefix_and_encode` (https://github.com/openjdk/jdk/blob/6d3f7794ee6658d48eb2120c7bfe66ac412c6d14/src/hotspot/cpu/x86/assembler_x86.cpp#L13164) and `vex_prefix` (https://github.com/openjdk/jdk/blob/6d3f7794ee6658d48eb2120c7bfe66ac412c6d14/src/hotspot/cpu/x86/assembler_x86.cpp#L13047) I also haven't found any other instruction that does this check so I could emulate the style. > src/hotspot/cpu/x86/assembler_x86.cpp line 3882: > >> 3880: >> 3881: void Assembler::evmovsldup(XMMRegister dst, KRegister mask, XMMRegister src, bool merge, int vector_len) { >> 3882: assert(VM_Version::supports_evex(), ""); > > Suggestion: > > assert(vector_len == AVX_512 || VM_Version::supports_avx512vl), ""); Took the patch, but also kept the supports_evex() assert > test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java line 114: > >> 112: rnd.setSeed(seed); >> 113: //Note: it might be useful to increase this number during development of new intrinsics >> 114: final int repeat = 10000000; > > Instead of high repetition count can you try tuning the tiered compilation threshold. 
The purpose of the test is to try various (pseudo-random) values and compare the results to the Java implementation of the same code. A single run-through of the test doesn't always prove that there are no bugs. A bit philosophical: as is well known, when writing crypto, branches (conditional on a secret) are disallowed; but e.g. carry propagation has the same 'conditional execution' effect. (Instead of "have you tested every branch direction" it's "have you tested every carry".) Besides a very careful range/overflow analysis (which I also did; the ntt functions skate very close to the int limit), exhaustive fuzz testing is the best method to find conditions that manual (range/overflow) analysis hasn't found; fuzz testing has very little math built in, so it's also good at finding 'blind spots' that I (and whoever has to review) might not have thought of. > test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java line 145: > >> 143: coeffs1[j] = rnd.nextInt(); >> 144: coeffs2[j] = rnd.nextInt(); >> 145: } > > You can use generators for random initialization of the array I think you meant this? coeffs1 = rnd.ints(ML_DSA_N).toArray(); coeffs2 = rnd.ints(ML_DSA_N).toArray(); Didn't know about this, thanks. It does work. But the original purpose (perhaps misguided, but it's done) was to 'factor out' the allocations; the outer loop runs many million times (I've left it running for 6+ hours during development) and so I wanted a 'somewhat efficient' test.
In hindsight, these (1k) arrays could probably be stack allocated, but I did not want to depend on an optimization when I could just write it without allocations in the mainline ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2535460279 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2535804056 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2535373444 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2535199249 From vpaprotski at openjdk.org Mon Nov 17 23:35:45 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 17 Nov 2025 23:35:45 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v2] In-Reply-To: <_TeZd3joeNkWYg7ZOgYRwzRJJjwMcUVOfe-pdXzJTv4=.d413a241-c8de-4267-8b98-0b41c7629371@github.com> References: <_TeZd3joeNkWYg7ZOgYRwzRJJjwMcUVOfe-pdXzJTv4=.d413a241-c8de-4267-8b98-0b41c7629371@github.com> Message-ID: On Sun, 16 Nov 2025 16:47:29 GMT, Mark Powers wrote: >> Volodymyr Paprotski has updated the pull request incrementally with two additional commits since the last revision: >> >> - whitespace >> - address first comments > > test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java line 123: > >> 121: try { >> 122: for (int i = 0; i < repeat; i++) { >> 123: // seed = rnd.nextLong(); > > 2 lines commented out This was useful during development and might be useful hint for debugging; instead of deleting, added a comment. Let me know if that works > test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java line 517: > >> 515: }; >> 516: } >> 517: // java --add-opens java.base/sun.security.provider=ALL-UNNAMED -XX:+UseDilithiumIntrinsics test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java > > This is line is useful. Not sure I would hide it at the bottom of the file. I actually meant to delete it, but will move it to the top. 
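The fuzz-testing approach described in the replies above (seeded pseudo-random inputs, buffers allocated once outside the hot loop, and an optimized routine compared against a plain Java reference) could be sketched roughly as follows. This is only an illustration: `FuzzSketch`, `N`, `multiplyReference`, `multiplyUnrolled` and the seed-chaining scheme are hypothetical stand-ins, not code from `ML_DSA_Intrinsic_Test.java`, and the unrolled loop merely plays the role of an intrinsic.

```java
import java.util.Arrays;
import java.util.OptionalLong;
import java.util.Random;

final class FuzzSketch {
    // Hypothetical stand-in for ML_DSA_N; the real test defines its own constant.
    static final int N = 256;

    // Reference version: plain loop, relying on Java's defined int wraparound.
    static void multiplyReference(int[] out, int[] a, int[] b) {
        for (int i = 0; i < N; i++) {
            out[i] = a[i] * b[i];
        }
    }

    // Stand-in for the routine under test: same math, unrolled by four.
    static void multiplyUnrolled(int[] out, int[] a, int[] b) {
        for (int i = 0; i < N; i += 4) {
            out[i]     = a[i]     * b[i];
            out[i + 1] = a[i + 1] * b[i + 1];
            out[i + 2] = a[i + 2] * b[i + 2];
            out[i + 3] = a[i + 3] * b[i + 3];
        }
    }

    // Runs `repeat` rounds; returns the seed of the first mismatching round, if any.
    static OptionalLong fuzz(long firstSeed, int repeat) {
        Random rnd = new Random();
        // All buffers are allocated once, so the hot loop itself does not allocate.
        int[] a = new int[N];
        int[] b = new int[N];
        int[] expected = new int[N];
        int[] actual = new int[N];
        long seed = firstSeed;
        for (int round = 0; round < repeat; round++) {
            rnd.setSeed(seed); // a failing round can be replayed from this seed alone
            for (int j = 0; j < N; j++) {
                a[j] = rnd.nextInt();
                b[j] = rnd.nextInt();
            }
            multiplyReference(expected, a, b);
            multiplyUnrolled(actual, a, b);
            if (!Arrays.equals(expected, actual)) {
                return OptionalLong.of(seed);
            }
            seed = rnd.nextLong(); // assumption: derive the next round's seed from the stream
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        OptionalLong failing = fuzz(42L, 10_000);
        System.out.println(failing.isPresent()
                ? "mismatch, replay with seed " + failing.getAsLong()
                : "all rounds matched");
    }
}
```

Returning the failing seed rather than just a boolean is one possible design choice: it lets a single failing round be replayed in isolation, which fits the remark above about keeping a seed hint around for debugging.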
> test/micro/org/openjdk/bench/javax/crypto/full/MLDSABench.java line 2: > >> 1: /* >> 2: * Copyright (c) 2015, 2018, Oracle and/or its affiliates. All rights reserved. > > Copyright date. That was some copy-paste! Thanks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2535377021 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2535082275 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2535078538 From duke at openjdk.org Mon Nov 17 23:44:41 2025 From: duke at openjdk.org (duke) Date: Mon, 17 Nov 2025 23:44:41 GMT Subject: Withdrawn: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 16:34:52 GMT, Jasmine Karthikeyan wrote: > Hi all, > This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/26827 From rsunderbabu at openjdk.org Tue Nov 18 01:02:19 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Tue, 18 Nov 2025 01:02:19 GMT Subject: Integrated: 8293484: AArch64: TestUseSHA512IntrinsicsOptionOnSupportedCPU.java fails on CPU with SHA512 feature support In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 07:09:38 GMT, Ramkumar Sunderbabu wrote: > We have a host of tests under test/hotspot/jtreg/compiler/intrinsics/sha which checks if the SHA intrinsics flags' enable/disable setting is in sync with CPU support in the underlying platform. There might be situations where the intrinsics might not be enabled despite the hardware supporting the relevant instructions. For example, there might be reliability issues or performance issues. In such situations, the tests will fail. 
> > Till now, the approach has been to exclude the platforms where the support is yet to be provided and remove the exclusion after. This necessitates additional work on the test front. > > A more compact design would be make predicate probes to rely on intrinsics availability in the platform as opposed to hardware support availability. The migration to intrinsics availability would especially help update releases where feature backport might not be complete. > > PS: This fix can/should be propagated to other such tests as well. Once this PR gets approval, I will work on similar tests. This pull request has now been integrated. Changeset: 69682167 Author: Ramkumar Sunderbabu Committer: Hao Sun URL: https://git.openjdk.org/jdk/commit/696821670e11fee003906806f081038032ac4985 Stats: 4 lines in 2 files changed: 0 ins; 1 del; 3 mod 8293484: AArch64: TestUseSHA512IntrinsicsOptionOnSupportedCPU.java fails on CPU with SHA512 feature support Reviewed-by: haosun, aph ------------- PR: https://git.openjdk.org/jdk/pull/28053 From fyang at openjdk.org Tue Nov 18 05:08:47 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 18 Nov 2025 05:08:47 GMT Subject: RFR: 8372046: compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java fails IR verification Message-ID: Hi, please consider this test-only change fixing an IR test failure. This IR test fails on platforms without native support for `Float16`. The reason is that method `Float::floatToFloat16` is inlined into method `TestSubNodeFloatDoubleNegation.testHalfFloat`, which causes unexpected IR graph. One way to fix this would be disabling inlining of methods from the `java.lang.Float` class. After this change, we are doing `CallStaticJava` to convert between `Float16` and `Float` on these platforms: ...... 
259 CallStaticJava === 5 6 7 8 1 (22 1 669 1 10 1 22 ) [[ 260 261 262 264 ]] # Static java.lang.Float::float16ToFloat float ( int ) Float16::floatValue @ bci:4 (line 876) Float16::subtract @ bci:1 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 69) !jvms: Float16::floatValue @ bci:4 (line 876) Float16::subtract @ bci:1 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 69) ...... 562 CallStaticJava === 553 507 538 8 1 (526 1 1 1 1 559 559 ) [[ 563 564 565 567 ]] # Static java.lang.Float::floatToFloat16 short ( float ) Float16::valueOf @ bci:5 (line 361) Float16::subtract @ bci:9 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 67) !jvms: Float16::valueOf @ bci:5 (line 361) Float16::subtract @ bci:9 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 67) ...... Verified with fastdebug build on aarch64, x86_64 and riscv64 platforms. ------------- Commit messages: - 8372046: compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java fails IR verification Changes: https://git.openjdk.org/jdk/pull/28364/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28364&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372046 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28364.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28364/head:pull/28364 PR: https://git.openjdk.org/jdk/pull/28364 From dlong at openjdk.org Tue Nov 18 05:26:04 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 18 Nov 2025 05:26:04 GMT Subject: RFR: 8366076: arm32: Fix register allocation for vector instructions In-Reply-To: References: <17J8mScwi2eBCPmmmTJd0ittihe0BfqZYuPgC638L8Q=.6e87a120-905f-4a30-a6f0-7e80fd613144@github.com> <11z84H0pSO4eduTEEVcUelci_1MxZMimuwouswlt8W0=.a0d59c62-092c-4620-b4c2-c2ff62423c4e@github.com> Message-ID: On Sun, 16 Nov 2025 22:55:24 GMT, Max Verevkin wrote: >> src/hotspot/cpu/arm/arm_32.ad line 330: >> >>> 328: 
R_S16,R_S17,R_S18,R_S19, R_S20,R_S21,R_S22,R_S23, >>> 329: R_S24,R_S25,R_S26,R_S27, R_S28,R_S29,R_S30,R_S31); >>> 330: >> >> Isn't this the same as dflt_low_reg? > > I am not 100% sure if they are completely equivalent and `dflt_low_reg` could be used instead of defining a new class. I figured I should introduce a new class similar to how `sflt_reg` and `dflt_low_reg` are similar yet distinct. A reg_class just produces a RegMask, so there is no need to give identical masks different names. Aliasing can still be done. See `actual_dflt_reg` for example. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27071#discussion_r2536349196 From duke at openjdk.org Tue Nov 18 05:51:07 2025 From: duke at openjdk.org (duke) Date: Tue, 18 Nov 2025 05:51:07 GMT Subject: RFR: 8371121: compiler/whitebox/DeoptimizeRelocatedNMethod.java fails with C1 [v2] In-Reply-To: References: Message-ID: <1mhpkukX8wnj25pWLC8j8z5gTTyxsJHZVU03ZflpwN0=.a8f92a65-00c3-44b0-8815-7cbcdc08d822@github.com> On Thu, 13 Nov 2025 21:17:08 GMT, Chad Rakoczy wrote: >> [JDK-8371121](https://bugs.openjdk.org/browse/JDK-8371121) >> >> This update aims to improve the test's stability. A previous failure occurred because the method wasn't compiled at the time of the check. I believe this could have occurred due to a deoptimization but I have not been able to reproduce. Previously, the test ensured compilation by repeatedly invoking the function. Instead, we now use Whitebox to add the method directly to the compile queue and wait for it to finish compiling. This approach should eliminate issues caused by deoptimization from function calls. > > Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: > > - Fix comment > - Block on comp instead @chadrako Your change (at version 0cba5fc77ce4c0706983974d1c6cf609565d90d2) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28246#issuecomment-3545236873 From thartmann at openjdk.org Tue Nov 18 06:20:15 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 18 Nov 2025 06:20:15 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes In-Reply-To: References: Message-ID: On Mon, 17 Nov 2025 22:34:14 GMT, Jiangli Zhou wrote: > Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. > > Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! test/jdk/com/sun/crypto/provider/Cipher/AES/TestAesGcmIntrinsic.java line 41: > 39: public class TestAesGcmIntrinsic { > 40: > 41: static final SecureRandom SECURE_RANDOM = newDefaultSecureRandom(); Drive-by comment: Java code should use 4x whitespace indentation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2536463222 From thartmann at openjdk.org Tue Nov 18 06:50:07 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 18 Nov 2025 06:50:07 GMT Subject: RFR: 8372046: compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java fails IR verification In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 04:57:47 GMT, Fei Yang wrote: > Hi, please consider this test-only change fixing an IR test failure. > > This IR test fails on platforms without native support for `Float16`. The reason is that method `Float::floatToFloat16` is inlined into method `TestSubNodeFloatDoubleNegation.testHalfFloat`, which causes unexpected IR graph. One way to fix this would be disabling inlining of methods from the `java.lang.Float` class. > > After this change, we are doing `CallStaticJava` to convert between `Float16` and `Float` on these platforms: > > ...... 
> 259 CallStaticJava === 5 6 7 8 1 (22 1 669 1 10 1 22 ) [[ 260 261 262 264 ]] # Static java.lang.Float::float16ToFloat float ( int ) Float16::floatValue @ bci:4 (line 876) Float16::subtract @ bci:1 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 69) !jvms: Float16::floatValue @ bci:4 (line 876) Float16::subtract @ bci:1 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 69) > > ...... > > 562 CallStaticJava === 553 507 538 8 1 (526 1 1 1 1 559 559 ) [[ 563 564 565 567 ]] # Static java.lang.Float::floatToFloat16 short ( float ) Float16::valueOf @ bci:5 (line 361) Float16::subtract @ bci:9 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 67) !jvms: Float16::valueOf @ bci:5 (line 361) Float16::subtract @ bci:9 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 67) > > ...... > > > Verified with fastdebug build on aarch64, x86_64 and riscv64 platforms. Test was added by [JDK-8351515](https://bugs.openjdk.org/browse/JDK-8351515), @mhaessig might want to have a look at this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28364#issuecomment-3545670495 From duke at openjdk.org Tue Nov 18 06:52:28 2025 From: duke at openjdk.org (Vishal Chand) Date: Tue, 18 Nov 2025 06:52:28 GMT Subject: Integrated: 8371881: C2: Fix potential SEGV in VTransformReductionVectorNode tracing In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 13:24:14 GMT, Vishal Chand wrote: > This PR fixes a potential SEGV and removes dead code: > - **Fix**: Prevent potential SEGV in VTransformReductionVectorNode - [from @shipilev] This fixes a crash in diagnostic code when isa_ReductionVector() unexpectedly returns nullptr. While this indicates the graph is already corrupted, the additional crash in `TRACE_OPTIMIZE` makes debugging harder. The fix adds defensive null checking to prevent the diagnostic crash and improve error handling. > > - 
**Cleanup**: Remove unused first_red variable This pull request has now been integrated. Changeset: 16557739 Author: Vishal Chand Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/16557739791ada59dc1991f65a0218434df01f9e Stats: 6 lines in 1 file changed: 4 ins; 1 del; 1 mod 8371881: C2: Fix potential SEGV in VTransformReductionVectorNode tracing Reviewed-by: shade, epeter ------------- PR: https://git.openjdk.org/jdk/pull/28323 From epeter at openjdk.org Tue Nov 18 07:59:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Nov 2025 07:59:23 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v31] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). 
One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... 
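The ordering problem described above can be reproduced with a toy model. The sketch below does not use the Template Framework API: `TokenOrderSketch`, `Frame`, `Token`, `addName`, `sampleNow` and `sampleLater` are hypothetical names chosen only to mirror the roles of `addDataName` and `sample`. It assumes that state changes are recorded as tokens and applied during a later rendering pass, and contrasts a query answered immediately with one that is itself deferred as a token.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenOrderSketch {
    // Rendering state: the names that are currently available for sampling.
    static final class Frame {
        final List<String> names = new ArrayList<>();
    }

    // A token is a piece of work deferred until the rendering pass.
    interface Token {
        void render(Frame frame, StringBuilder out);
    }

    // State change modeled as a token: the name only exists once the token is rendered.
    static Token addName(String name) {
        return (frame, out) -> frame.names.add(name);
    }

    // Immediate query: answered while the token list is still being built.
    static String sampleNow(Frame frame) {
        return frame.names.isEmpty() ? "<none>" : frame.names.get(frame.names.size() - 1);
    }

    // Deferred query: also a token, so it is evaluated in list order.
    static Token sampleLater() {
        return (frame, out) -> out.append(sampleNow(frame));
    }

    // Rendering pass: evaluate the tokens one after the other.
    static String render(List<Token> tokens, Frame frame) {
        StringBuilder out = new StringBuilder();
        for (Token token : tokens) {
            token.render(frame, out);
        }
        return out.toString();
    }

    // Mixed style: the query "floats" above the state-changing token.
    static String immediateQuery() {
        Frame frame = new Frame();
        List<Token> tokens = new ArrayList<>();
        tokens.add(addName("x"));          // only records a token, frame is unchanged
        String sampled = sampleNow(frame); // runs right away, so "x" is not visible yet
        tokens.add((f, out) -> out.append(sampled));
        return render(tokens, frame);
    }

    // Uniform style: the query is a token and sees the effect of addName.
    static String deferredQuery() {
        Frame frame = new Frame();
        List<Token> tokens = new ArrayList<>();
        tokens.add(addName("x"));
        tokens.add(sampleLater());
        return render(tokens, frame);
    }

    public static void main(String[] args) {
        System.out.println("immediate query sees: " + immediateQuery());
        System.out.println("deferred query sees:  " + deferredQuery());
    }
}
```

In this toy model the immediate query misses the name that was "added" on the line before it, while the deferred query finds it, which is the kind of surprise the redesign is meant to remove.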
Emanuel Peter has updated the pull request incrementally with three additional commits since the last revision: - Apply suggestions from code review Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Roberto Castañeda Lozano - Apply suggestions from code review Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/bba71529..1d10b71e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=29-30 Stats: 12 lines in 4 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Tue Nov 18 08:02:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Nov 2025 08:02:31 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v30] In-Reply-To: References: Message-ID: <_cjv1sJ15_xweZHoIGQ62I57khgV9KtsaCY9NB1MgDw=.19d9352b-f56c-4506-bc9f-5167b2abb4bc@github.com> On Mon, 17 Nov 2025 14:48:17 GMT, Roberto Castañeda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> document hashtag locality for Roberto > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 222: > >> 220: * {@link #setFuelCostScope} transparent transparent non-transparent >> 221: * >> 222: * > > Thanks for adding this table, it is very useful. Please consider completing it with an entry for `transparentScope` (even if it may sound obvious from the name). Yes, adding it. Good catch! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2536725969 From epeter at openjdk.org Tue Nov 18 08:16:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Nov 2025 08:16:52 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v32] In-Reply-To: References: Message-ID: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). 
Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix up documentation for Roberto ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/1d10b71e..7c2f747f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=30-31 Stats: 15 lines in 1 file changed: 3 ins; 1 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From epeter at openjdk.org Tue Nov 18 08:16:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Nov 2025 08:16:55 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v29] In-Reply-To: References: <8hzrSIP5a5sdMvLuTH_my204dNYT6sHOJDfOeqA8qI0=.d0d9a056-c6ac-49a4-9d44-0afbeaa473dc@github.com> Message-ID: On Mon, 17 Nov 
2025 13:53:30 GMT, Roberto Castañeda Lozano wrote: >>> So I suppose it is a choice, yes. But I don't think the alternatives would be better. >>> >>> * You could just throw an exception at the second definition. But then you would need a way to check for existence of hashtag names ... not great. >>> * You could just hide outer definitions... basically they would work like scoped values: you can bind and re-bind them. But that brings its own complexity that I don't want to push on the users if it's not absolutely necessary. >>> >>> If you really do need access to something from an outer template, you should just pass it via template argument. That makes the flow explicit. That's my opinion. >> >> Sounds OK to me, I think it would be worth capturing this rationale and advisory somewhere in the documentation. > >> @robcasloz Alright, I added some extra documentation of hashtag locality with [bba7152](https://github.com/openjdk/jdk/pull/27255/commits/bba71529ac081b0f53cd5106f57fb541bd3e0ead) > > Thanks, the new documentation looks good! @robcasloz Thanks for all the suggestions! I applied them all :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3546097188 From epeter at openjdk.org Tue Nov 18 08:16:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Nov 2025 08:16:56 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v30] In-Reply-To: References: Message-ID: <-ECtNJNh0RR5BMxzku2ur1XOHmxC1xpkte43zYhAB7Y=.14f7602c-10ef-4389-b18f-42d2c382196a@github.com> On Mon, 17 Nov 2025 14:52:20 GMT, Roberto Castañeda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> document hashtag locality for Roberto > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 624: > >> 622: * hashtag-replacements and {@link #setFuelCost} are always implicitly >> 623: * non-transparent (i.e. 
non-escaping) for hashtag-replacements and >> 624: * {@link #setFuelCost} (e.g. a {@link #let} will not escape the template > > I could not parse the following sentence: "hashtag-replacements and setFuelCost are always implicitly non-transparent (i.e. non-escaping) for hashtag-replacements and setFuelCost", could you explain it (or rewrite it in a clearer way)? Right, I am repeating the `hashtag-replacements and setFuelCost` part. Fixed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27255#discussion_r2536779770 From epeter at openjdk.org Tue Nov 18 08:30:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Nov 2025 08:30:48 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v33] In-Reply-To: References: Message-ID: <0n4X2pauKWeCJlhNspG3ls5-qlVvNtN8xkJhmdbbjxA=.959cc8d5-bc8d-4cdf-af29-434fdb9cf506@github.com> > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). 
> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 142 additional commits since the last revision: - Merge branch 'master' into JDK-8367531-fix-addDataName - fix up documentation for Roberto - Apply suggestions from code review Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Roberto Castañeda Lozano - Apply suggestions from code review Co-authored-by: Roberto Castañeda Lozano - document hashtag locality for Roberto - inflate abreviations to full names - better documentation, inspired by Christian - Update test/hotspot/jtreg/compiler/lib/template_framework/ScopeToken.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/lib/template_framework/Template.java Co-authored-by: Christian Hagedorn - ... and 132 more: https://git.openjdk.org/jdk/compare/2584e350...79377438 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27255/files - new: https://git.openjdk.org/jdk/pull/27255/files/7c2f747f..79377438 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27255&range=31-32 Stats: 191326 lines in 1438 files changed: 130958 ins; 29765 del; 30603 mod Patch: https://git.openjdk.org/jdk/pull/27255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27255/head:pull/27255 PR: https://git.openjdk.org/jdk/pull/27255 From mhaessig at openjdk.org Tue Nov 18 08:42:05 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 18 Nov 2025 08:42:05 GMT Subject: RFR: 8372046: compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java fails IR verification In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 04:57:47 GMT, Fei Yang wrote: > Hi, please consider this test-only change fixing an IR test failure. > > This IR test fails on platforms without native support for `Float16`. The reason is that method `Float::floatToFloat16` is inlined into method `TestSubNodeFloatDoubleNegation.testHalfFloat`, which causes unexpected IR graph. 
One way to fix this would be disabling inlining of methods from the `java.lang.Float` class. > > After this change, we are doing `CallStaticJava` to convert between `Float16` and `Float` on these platforms: > > ...... > 259 CallStaticJava === 5 6 7 8 1 (22 1 669 1 10 1 22 ) [[ 260 261 262 264 ]] # Static java.lang.Float::float16ToFloat float ( int ) Float16::floatValue @ bci:4 (line 876) Float16::subtract @ bci:1 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 69) !jvms: Float16::floatValue @ bci:4 (line 876) Float16::subtract @ bci:1 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 69) > > ...... > > 562 CallStaticJava === 553 507 538 8 1 (526 1 1 1 1 559 559 ) [[ 563 564 565 567 ]] # Static java.lang.Float::floatToFloat16 short ( float ) Float16::valueOf @ bci:5 (line 361) Float16::subtract @ bci:9 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 67) !jvms: Float16::valueOf @ bci:5 (line 361) Float16::subtract @ bci:9 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 67) > > ...... > > > Verified with fastdebug build on aarch64, x86_64 and riscv64 platforms. Thank you for looking into this, @RealFYang. The changes look good. I tested locally on x64 with and without FP16 support and just kicked off a CI run. I'll report back with the results in a few hours. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28364#issuecomment-3546199994 From rcastanedalo at openjdk.org Tue Nov 18 08:42:35 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 18 Nov 2025 08:42:35 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v33] In-Reply-To: <0n4X2pauKWeCJlhNspG3ls5-qlVvNtN8xkJhmdbbjxA=.959cc8d5-bc8d-4cdf-af29-434fdb9cf506@github.com> References: <0n4X2pauKWeCJlhNspG3ls5-qlVvNtN8xkJhmdbbjxA=.959cc8d5-bc8d-4cdf-af29-434fdb9cf506@github.com> Message-ID: On Tue, 18 Nov 2025 08:30:48 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz. And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token.
For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 142 additional commits since the last revision: > > - Merge branch 'master' into JDK-8367531-fix-addDataName > - fix up documentation for Roberto > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - Apply suggestions from code review > > Co-authored-by: Roberto Castañeda Lozano > - Apply suggestions from code review > > Co-authored-by: Roberto Castañeda Lozano > - document hashtag locality for Roberto > - inflate abreviations to full names > - better documentation, inspired by Christian > - Update test/hotspot/jtreg/compiler/lib/template_framework/ScopeToken.java > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/lib/template_framework/Template.java > > Co-authored-by: Christian Hagedorn > - ... and 132 more: https://git.openjdk.org/jdk/compare/aaa24cd2...79377438 Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27255#pullrequestreview-3476193785 From shade at openjdk.org Tue Nov 18 09:04:23 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Nov 2025 09:04:23 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes In-Reply-To: References: Message-ID: On Mon, 17 Nov 2025 22:34:14 GMT, Jiangli Zhou wrote: > Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not a multiple of the block size. > > Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! Good catch! Some stylistic comments for the product fix, and suggestions for the test. src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3530: > 3528: __ bind(MESG_BELOW_32_BLKS); > 3529: __ subl(len, 16 * 16); > 3530: __ cmpl(len, 256); For stylistic consistency, this should be written as `16 * 16`, to match the surrounding `subl` and `addl`.
test/jdk/com/sun/crypto/provider/Cipher/AES/TestAesGcmIntrinsic.java line 39: > 37: import javax.crypto.spec.SecretKeySpec; > 38: > 39: public class TestAesGcmIntrinsic { This sounds like `TestGCMSplitBound` or some such; it is not a generic test for AES/GCM intrinsic. test/jdk/com/sun/crypto/provider/Cipher/AES/TestAesGcmIntrinsic.java line 41: > 39: public class TestAesGcmIntrinsic { > 40: > 41: static final SecureRandom SECURE_RANDOM = newDefaultSecureRandom(); Do you really need a `SecureRandom` here? `Random RANDOM = Utils.getRandomInstance();` gets you the pre-seeded random instance, which can be used to repeatably reproduce failures. test/jdk/com/sun/crypto/provider/Cipher/AES/TestAesGcmIntrinsic.java line 93: > 91: } > 92: } > 93: for (int messageSize = SPLIT_LEN; messageSize < SPLIT_LEN + 300; messageSize++) { `[SPLIT_LEN - 300; SPLIT_LEN + 300]` for completeness, perhaps? test/jdk/com/sun/crypto/provider/Cipher/AES/TestAesGcmIntrinsic.java line 96: > 94: byte[] message = randBytes(messageSize); > 95: try { > 96: byte[] ciphertext = gcmEncrypt(key, message, aad); I believe it makes sense to check that round-trip is successful, e.g. that `decrypt(encrypt(message)) == message`. Currently we implicitly rely on exceptions being thrown from the incorrectly executing code, which is IMO too weak -- in the boundary conditions like these, there might be bugs that _do not_ manifest in visible exceptions, and just the encryption is subtly broken. test/jdk/com/sun/crypto/provider/Cipher/AES/TestAesGcmIntrinsic.java line 109: > 107: TestAesGcmIntrinsic test = new TestAesGcmIntrinsic(); > 108: long startTime = System.currentTimeMillis(); > 109: while (System.currentTimeMillis() - startTime < 60 * 1000) { I get that you want a stress test. But time-limiting puts the test into weird condition: it can have different number of iterations, depending on auxiliary load on the machine. These tests are running in parallel with lots of other tests, so it is not uncommon. 
Do you even need to repeat `jitFunc()` call multiple times? Looks like it traverses the interesting configurations in one go? ------------- PR Review: https://git.openjdk.org/jdk/pull/28363#pullrequestreview-3476170085 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2536853113 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2536930778 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2536948179 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2536932434 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2536921000 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2536910947 From mli at openjdk.org Tue Nov 18 09:27:44 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Nov 2025 09:27:44 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v5] In-Reply-To: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> Message-ID: > Hi, > > This PR adds CMoveF/D on riscv, which enables vectorization of statements like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop. > > This PR is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231. > > Previously it was https://github.com/openjdk/jdk/pull/25341, but at that time, C2 SLP had some issue with unsigned comparison, which is now fixed, so it's good to continue the work. > > # Test > ## Jtreg > > in progress... > > ## Performance > > Column name meanings: > * p: with patch > * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on > * m: without patch > * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on > > #### Average improvement > > NOTE: With only this PR, it brings performance benefit in case of `CMoveF+CmpF`, `CMoveD+CmpD`, `CMoveF+CmpI`, `CMoveD+CmpL`.
The data below is based on fully implementing the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. > > For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. > > Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) > -- | -- | -- | -- > 1.022782609 | 2.198717391 | 2.162673913 | 2.199 > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: replace assert with log_warning ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28309/files - new: https://git.openjdk.org/jdk/pull/28309/files/cf9168a2..572a7b74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28309&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28309&range=03-04 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28309.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28309/head:pull/28309 PR: https://git.openjdk.org/jdk/pull/28309 From aseoane at openjdk.org Tue Nov 18 10:15:06 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Tue, 18 Nov 2025 10:15:06 GMT Subject: RFR: 8213762: Deprecate Xmaxjitcodesize Message-ID: This PR deprecates the `Xmaxjitcodesize` flag in JDK 26. Please see the CSR for specific details on why this flag is being deprecated and workarounds for users interested in keeping similar behaviour in the future.
------------- Commit messages: - Add deprecation notice to Xmaxjitcodesize Changes: https://git.openjdk.org/jdk/pull/28297/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28297&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8213762 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28297.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28297/head:pull/28297 PR: https://git.openjdk.org/jdk/pull/28297 From duke at openjdk.org Tue Nov 18 15:13:16 2025 From: duke at openjdk.org (Zihao Lin) Date: Tue, 18 Nov 2025 15:13:16 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v12] In-Reply-To: References: Message-ID: > This patch removes the slice parameter from LoadNode::make. > > I have done more work, which removes the slice parameter from StoreNode::make. > > Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thanks a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - fix conflict - Merge master - remove C2AccessValuePtr - fix assert - add more assert - rid of access.addr().type() - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - ... 
and 4 more: https://git.openjdk.org/jdk/compare/dcba014a...329e290a ------------- Changes: https://git.openjdk.org/jdk/pull/24258/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=11 Stats: 230 lines in 18 files changed: 33 ins; 55 del; 142 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From mli at openjdk.org Tue Nov 18 15:13:47 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Nov 2025 15:13:47 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 07:12:38 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify stub_id name src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2651: > 2649: address generate_counterMode_AESCrypt() { > 2650: assert(UseAESCTRIntrinsics, "need AES instructions (Zvkned extension) support"); > 2651: assert(UseZbb, "need basic bit manipulation (Zbb extension) support"); also needs an `assert(UseZvkn, "");`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2538578641 From mhaessig at openjdk.org Tue Nov 18 15:19:58 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 18 Nov 2025 15:19:58 GMT Subject: RFR: 8372046: compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java fails IR verification In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 04:57:47 GMT, Fei Yang wrote: > Hi, please consider this test-only change fixing an IR test failure. > > This IR test fails on platforms without native support for `Float16`.
The reason is that method `Float::floatToFloat16` is inlined into method `TestSubNodeFloatDoubleNegation.testHalfFloat`, which causes unexpected IR graph. One way to fix this would be disabling inlining of methods from the `java.lang.Float` class. > > After this change, we are doing `CallStaticJava` to convert between `Float16` and `Float` on these platforms: > > ...... > 259 CallStaticJava === 5 6 7 8 1 (22 1 669 1 10 1 22 ) [[ 260 261 262 264 ]] # Static java.lang.Float::float16ToFloat float ( int ) Float16::floatValue @ bci:4 (line 876) Float16::subtract @ bci:1 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 69) !jvms: Float16::floatValue @ bci:4 (line 876) Float16::subtract @ bci:1 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 69) > > ...... > > 562 CallStaticJava === 553 507 538 8 1 (526 1 1 1 1 559 559 ) [[ 563 564 565 567 ]] # Static java.lang.Float::floatToFloat16 short ( float ) Float16::valueOf @ bci:5 (line 361) Float16::subtract @ bci:9 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 67) !jvms: Float16::valueOf @ bci:5 (line 361) Float16::subtract @ bci:9 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 67) > > ...... > > > Verified with fastdebug build on aarch64, x86_64 and riscv64 platforms. Testing passed. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28364#pullrequestreview-3478350912 From mli at openjdk.org Tue Nov 18 15:21:15 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Nov 2025 15:21:15 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 07:12:38 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed.
> > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify stub_id name Hey, have brief look, and some minor comments first. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2700: > 2698: } > 2699: > 2700: void counterMode_AESCrypt(int round, Register in, Register out, Register key, Register counter, Maybe move this `counterMode_AESCrypt` above `generate_counterMode_AESCrypt`? src/hotspot/cpu/riscv/vm_version_riscv.cpp line 447: > 445: FLAG_SET_DEFAULT(UseAESCTRIntrinsics, false); > 446: } > 447: } Suggestion: if (FLAG_IS_DEFAULT(UseAESCTRIntrinsics) && UseZbb) { FLAG_SET_DEFAULT(UseAESCTRIntrinsics, true); } if (UseAESCTRIntrinsics && !UseZbb) { warning("Cannot enable UseAESCTRIntrinsics on cpu without UseZbb support."); FLAG_SET_DEFAULT(UseAESCTRIntrinsics, false); } src/hotspot/cpu/riscv/vm_version_riscv.cpp line 458: > 456: } > 457: if (UseAESCTRIntrinsics) { > 458: warning("AES/CTR intrinsics are not available on this CPU"); Suggestion: warning("Cannot enable UseAESCTRIntrinsics on cpu without UseZvkn support."); ------------- PR Review: https://git.openjdk.org/jdk/pull/25281#pullrequestreview-3478352645 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2538607124 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2538595480 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2538602383 From epeter at openjdk.org Tue Nov 18 15:23:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Nov 2025 15:23:27 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v4] In-Reply-To: References: Message-ID: On Mon, 17 Nov 2025 14:53:53 GMT, Zihao Lin wrote: >> If nodes both are constant, support constant folding. 
> > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > Add Math to Operations.java test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 252: > 250: ops.add(Expression.make(LONGS, "Math.unsignedMultiplyHigh(", LONGS, ",", LONGS, ")")); > 251: > 252: // TODO: other classes. Suggestion: // TODO: rest of Math and other classes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28097#discussion_r2538606463 From duke at openjdk.org Tue Nov 18 15:23:28 2025 From: duke at openjdk.org (Zihao Lin) Date: Tue, 18 Nov 2025 15:23:28 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v4] In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 15:17:00 GMT, Emanuel Peter wrote: >> Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: >> >> Add Math to Operations.java > > test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 252: > >> 250: ops.add(Expression.make(LONGS, "Math.unsignedMultiplyHigh(", LONGS, ",", LONGS, ")")); >> 251: >> 252: // TODO: other classes. > > Suggestion: > > // TODO: rest of Math and other classes. Thank you ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28097#discussion_r2538613256 From duke at openjdk.org Tue Nov 18 15:23:24 2025 From: duke at openjdk.org (Zihao Lin) Date: Tue, 18 Nov 2025 15:23:24 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v5] In-Reply-To: References: Message-ID: > If nodes both are constant, support constant folding. 
Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: Apply suggestion from @eme64 Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28097/files - new: https://git.openjdk.org/jdk/pull/28097/files/c840a8c8..db57746d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28097.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28097/head:pull/28097 PR: https://git.openjdk.org/jdk/pull/28097 From epeter at openjdk.org Tue Nov 18 15:35:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Nov 2025 15:35:39 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v5] In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 15:23:24 GMT, Zihao Lin wrote: >> If nodes both are constant, support constant folding. > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestion from @eme64 > > Co-authored-by: Emanuel Peter Would it be an idea to still have a `MulHiValue`, and then pass it in a `signed/unsigned` flag? That way we could avoid some code duplication. Because the only difference seems to be `multiply_high_signed` vs `multiply_high_unsigned`, right? You could even have a method `multiply_high` that takes such a `signedness` flag. 
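The single-entry-point idea above can be sketched in Java. This is only an illustration of the shape, not the HotSpot C++ code under review: the class name `MulHiSketch` and the helper `multiplyHigh(long, long, boolean)` are hypothetical, and the sketch assumes a JDK that provides `Math.unsignedMultiplyHigh` (the same method the `Operations.java` change in this PR refers to).

```java
public final class MulHiSketch {
    /**
     * Hypothetical helper: returns the upper 64 bits of the 128-bit
     * product a * b, interpreting both operands as signed or unsigned
     * depending on the flag.
     */
    public static long multiplyHigh(long a, long b, boolean signed) {
        return signed ? Math.multiplyHigh(a, b)
                      : Math.unsignedMultiplyHigh(a, b);
    }

    public static void main(String[] args) {
        // Signed: (-1) * (-1) = 1, so the high word is 0.
        System.out.println(multiplyHigh(-1L, -1L, true));
        // Unsigned: (2^64 - 1)^2 = 2^128 - 2^65 + 1, so the high word
        // is 2^64 - 2, which prints as -2 when viewed as a signed long.
        System.out.println(multiplyHigh(-1L, -1L, false));
    }
}
```

With this shape, a shared constant-folding routine would only differ in the flag it passes, which is the duplication the comment above is asking about.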
------------- PR Review: https://git.openjdk.org/jdk/pull/28097#pullrequestreview-3478417685 PR Comment: https://git.openjdk.org/jdk/pull/28097#issuecomment-3548206118 From epeter at openjdk.org Tue Nov 18 15:40:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Nov 2025 15:40:59 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v26] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 12:35:01 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add missing comma from suggestion application > > Thanks for the update Emanuel! These look good. I will now have a look at the rest of your code. @chhagedorn @robcasloz @mhaessig Thanks for all your suggestions and in-depth reviews! I'm doing some last testing and will probably integrate tomorrow. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3548224300 From hgreule at openjdk.org Tue Nov 18 15:44:45 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 18 Nov 2025 15:44:45 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v5] In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 15:23:24 GMT, Zihao Lin wrote: >> If nodes both are constant, support constant folding. > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestion from @eme64 > > Co-authored-by: Emanuel Peter Looks mostly good, and I'm fine with addressing more general implementations separately. There is some code duplication now, which could probably be avoided using templates, but I'm not sure if that's any cleaner. If no one else has a problem with the duplication, we can leave it as-is.
Edit: I see @eme64 had similar thoughts :) src/hotspot/share/opto/mulnode.cpp line 641: > 639: // Both are constant, directly computed the result > 640: if (longType1->is_con() && longType2->is_con()) { > 641: jlong highResult = multiply_high_unsigned(longType1->get_con(), longType2->get_con()); We are going from an unsigned value to a signed here, I think this is implementation-defined? Maybe it's better to use julong and `TypeLong::make_or_top(TypeIntPrototype{{min_jlong, max_jlong}, {highResult, highResult}, {0, 0}})`? (It might also make sense to have a helper function like `TypeLong::make_unsigned` for that, but I'll let others comment on whether that should be done separately) test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1533: > 1531: public static final String MUL_HI_L = PREFIX + "MUL_HI_L" + POSTFIX; > 1532: static { > 1533: superWordNodes(MUL_HI_L, "MulHiL"); This looks wrong, and I think it might make more sense to move these definitions closer to MUL_L. ------------- PR Review: https://git.openjdk.org/jdk/pull/28097#pullrequestreview-3478392898 PR Review Comment: https://git.openjdk.org/jdk/pull/28097#discussion_r2538693102 PR Review Comment: https://git.openjdk.org/jdk/pull/28097#discussion_r2538642420 From mli at openjdk.org Tue Nov 18 16:54:42 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Nov 2025 16:54:42 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 07:12:38 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. 
> > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify stub_id name src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2670: > 2668: __ enter(); > 2669: > 2670: Label L_EXIT; Suggestion: Label L_exit; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2538954824 From mli at openjdk.org Tue Nov 18 17:00:11 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Nov 2025 17:00:11 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 07:12:38 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify stub_id name src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2813: > 2811: __ bind(L_exit); > 2812: __ sw(used, Address(used_ptr)); > 2813: __ mv(x10, input_len); is this mv necessary? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2538982253 From mli at openjdk.org Tue Nov 18 17:04:06 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Nov 2025 17:04:06 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 07:12:38 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. 
> > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify stub_id name src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2663: > 2661: const Register input_len = c_rarg4; > 2662: const Register saved_encrypted_ctr = c_rarg5; > 2663: const Register used_ptr = c_rarg6; Suggestion: const Register used_len_addr = c_rarg6; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2538997048 From mli at openjdk.org Tue Nov 18 17:06:51 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Nov 2025 17:06:51 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 07:12:38 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify stub_id name src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2676: > 2674: // Compute #rounds for AES based on the length of the key array > 2675: __ lwu(keylen, Address(key, arrayOopDesc::length_offset_in_bytes() - arrayOopDesc::base_offset_in_bytes(T_INT))); > 2676: __ mv(t0, 52); what's this `52`? I see it also in `generate_aescrypt_encryptBlock`, do they mean similar things? Can you add some comment about it? and give a name rather than use the magic number. 
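One possible reading of the constant, offered as an assumption since the thread does not confirm it: if `keylen` is the length in 32-bit words of the expanded AES key array, then by the AES key schedule it is 4 * (rounds + 1), i.e. 44, 52 or 60 for AES-128, AES-192 and AES-256. A minimal Java sketch of naming those values follows; the class `AesKeyScheduleSketch`, its constants and `roundsFor` are hypothetical names, not identifiers from the patch.

```java
public final class AesKeyScheduleSketch {
    // Expanded key length in 32-bit words: 4 * (rounds + 1).
    // These names are illustrative only.
    public static final int EXPANDED_KEY_WORDS_AES128 = 44; // 10 rounds
    public static final int EXPANDED_KEY_WORDS_AES192 = 52; // 12 rounds
    public static final int EXPANDED_KEY_WORDS_AES256 = 60; // 14 rounds

    /** Maps the expanded key length (in words) to the AES round count. */
    public static int roundsFor(int expandedKeyWords) {
        switch (expandedKeyWords) {
            case EXPANDED_KEY_WORDS_AES128:
            case EXPANDED_KEY_WORDS_AES192:
            case EXPANDED_KEY_WORDS_AES256:
                return expandedKeyWords / 4 - 1;
            default:
                throw new IllegalArgumentException(
                    "unexpected expanded key length: " + expandedKeyWords);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundsFor(EXPANDED_KEY_WORDS_AES192));
    }
}
```

Under that assumption, a comparison against 52 separates the AES-192 case from the other two key sizes, and a named constant would make that visible in the stub.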
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2539009706 From kvn at openjdk.org Tue Nov 18 17:23:32 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Nov 2025 17:23:32 GMT Subject: RFR: 8213762: Deprecate Xmaxjitcodesize In-Reply-To: References: Message-ID: <4QfNZ1lcx8Lawq_iWSm6gAQNUoChkcn3wDhDyl1C7Dk=.d4499f64-ab6e-4e98-8cdf-79ff23163347@github.com> On Thu, 13 Nov 2025 14:57:30 GMT, Anton Seoane Ampudia wrote: > This PR deprecates the `Xmaxjitcodesize` flag in JDK 26. Please see the CSR for specific details on why this flag is being deprecated and workarounds for users interested in keeping similar behaviour in the future. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28297#pullrequestreview-3478934880 From mdoerr at openjdk.org Tue Nov 18 17:37:35 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 18 Nov 2025 17:37:35 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v3] In-Reply-To: <-srlt_N0wyBwCwOmZTJBdGNFm66doGBHr6Yx83pqSpQ=.be8e4031-f542-49ad-8271-ac9a2c8b9128@github.com> References: <-srlt_N0wyBwCwOmZTJBdGNFm66doGBHr6Yx83pqSpQ=.be8e4031-f542-49ad-8271-ac9a2c8b9128@github.com> Message-ID: On Fri, 14 Nov 2025 17:21:50 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > More minor cleanup. @valeriep, @jnimeh: I've seen that you have reviewed other changes in this area and I need reviews from Security Group people. I will certainly find reviewers for the hotspot part. 
May I ask you to take a look at the Java part? I would slightly prefer doing a bit more changes, but wanted to check with you, first: https://github.com/TheRealMDoerr/jdk/commit/2907475958806cad6b5fc83541f66065475a93ec ------------- PR Comment: https://git.openjdk.org/jdk/pull/28299#issuecomment-3548795174 From mli at openjdk.org Tue Nov 18 17:49:21 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Nov 2025 17:49:21 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 07:12:38 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify stub_id name src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2709: > 2707: // while (used < BLOCK_SIZE) { > 2708: // if (len == 0) goto L_exit; > 2709: // out = in ^ saved_encrypted_ctr[used]); do you mean `*out = *in ^ saved_encrypted_ctr[used]);`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2539144619 From duke at openjdk.org Tue Nov 18 18:23:21 2025 From: duke at openjdk.org (Shawn M Emery) Date: Tue, 18 Nov 2025 18:23:21 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v3] In-Reply-To: <-srlt_N0wyBwCwOmZTJBdGNFm66doGBHr6Yx83pqSpQ=.be8e4031-f542-49ad-8271-ac9a2c8b9128@github.com> References: <-srlt_N0wyBwCwOmZTJBdGNFm66doGBHr6Yx83pqSpQ=.be8e4031-f542-49ad-8271-ac9a2c8b9128@github.com> Message-ID: <_rE68-lxqQMg8l1T9Mj2zl3vf2eXCSNc3SpIKRNOFvA=.6a9f3cbb-ef76-4079-bdec-a37f12d337fa@github.com> On Fri, 14 Nov 2025 17:21:50 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. 
We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > More minor cleanup. @valeriepeng or @jnimeh are good choices for review and someone besides myself will need to be a reviewer given that I don't have reviewer privileges. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28299#issuecomment-3548990626 From mli at openjdk.org Tue Nov 18 18:27:21 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Nov 2025 18:27:21 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: Message-ID: <5HbBb-mjtZWqWTu-HQe7KrRyHG5z-UK4rbVhMzLv4bw=.b1b7e986-dbcf-4ab0-86b4-513f3f1f91ae@github.com> On Thu, 13 Nov 2025 07:12:38 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify stub_id name Some more comments and questions. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2708: > 2706: // L_encrypt_next: > 2707: // while (used < BLOCK_SIZE) { > 2708: // if (len == 0) goto L_exit; The logic of the code here is different from the logic of assembly code. Here it checks `len == 0` at the beginning of while loop; assembly code checks `len == 0` at the end of while loop. Will this difference bring some logic difference in some corner case? If not, why make it a bit different from each other? Does it bring some performance difference with following change? Label L_next, L_main_loop, L_exit; // remove L_encrypt_next ... 
__ bind(L_next); __ bgeu(used, block_size, L_main_loop); __ beqz(len, L_exit); ... // scalar processing __ subi(len, len, 1); __ j(L_next); ... __ mv(used, 0); // Check if we have a full block_size __ bltu(len, block_size, L_next); // remove L_encrypt_next ... Ask this question because the change make the comment and assembly implementation consistent, and the code easy to understand. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2748: > 2746: }; > 2747: > 2748: __ vsetivli(x0, 4, Assembler::e32, Assembler::m1); A general question, can we make it bigger than `4`, or even `m2`? ------------- PR Review: https://git.openjdk.org/jdk/pull/25281#pullrequestreview-3479185242 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2539244466 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2539247773 From valeriep at openjdk.org Tue Nov 18 18:43:07 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Tue, 18 Nov 2025 18:43:07 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v3] In-Reply-To: <-srlt_N0wyBwCwOmZTJBdGNFm66doGBHr6Yx83pqSpQ=.be8e4031-f542-49ad-8271-ac9a2c8b9128@github.com> References: <-srlt_N0wyBwCwOmZTJBdGNFm66doGBHr6Yx83pqSpQ=.be8e4031-f542-49ad-8271-ac9a2c8b9128@github.com> Message-ID: <8NWK4oQBOjqg1Z7D-NsWuWn_ZhU9E6jWSWedJJQSJ08=.b0c9ef61-7880-4300-a90a-9b89d0a1ec8f@github.com> On Fri, 14 Nov 2025 17:21:50 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > More minor cleanup. 
src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 62: > 60: private int[] sessionKe = null; // key for encryption > 61: private int[] sessionKd = null; // preprocessed key for decryption > 62: private int[] K = null; // preprocessed key in case of decryption I find the comment confusing as `K` is sometimes assigned with `sessionKe`, so it can't be used only for decryption? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28299#discussion_r2539296610 From valeriep at openjdk.org Tue Nov 18 18:55:05 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Tue, 18 Nov 2025 18:55:05 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v3] In-Reply-To: <-srlt_N0wyBwCwOmZTJBdGNFm66doGBHr6Yx83pqSpQ=.be8e4031-f542-49ad-8271-ac9a2c8b9128@github.com> References: <-srlt_N0wyBwCwOmZTJBdGNFm66doGBHr6Yx83pqSpQ=.be8e4031-f542-49ad-8271-ac9a2c8b9128@github.com> Message-ID: On Fri, 14 Nov 2025 17:21:50 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > More minor cleanup. I can review until COB this Thursday, then I will be on vacation and return on Dec 2nd. Just FYI. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28299#issuecomment-3549101690 From epeter at openjdk.org Tue Nov 18 20:19:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Nov 2025 20:19:37 GMT Subject: RFR: 8213762: Deprecate Xmaxjitcodesize In-Reply-To: References: Message-ID: <2g9ZwjJFZlz8wfEjQbi87xDgbNFB27WcGCja3XJJB8o=.4a43e98e-f5e4-4d5d-baa9-bf96dabd5cac@github.com> On Thu, 13 Nov 2025 14:57:30 GMT, Anton Seoane Ampudia wrote: > This PR deprecates the `Xmaxjitcodesize` flag in JDK 26. Please see the CSR for specific details on why this flag is being deprecated and workarounds for users interested in keeping similar behaviour in the future. Looks good to me. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28297#pullrequestreview-3479639066 From epeter at openjdk.org Tue Nov 18 20:27:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Nov 2025 20:27:07 GMT Subject: RFR: 8372046: compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java fails IR verification In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 04:57:47 GMT, Fei Yang wrote: > Hi, please consider this test-only change fixing an IR test failure. > > This IR test fails on platforms without native support for `Float16`. The reason is that method `Float::floatToFloat16` is inlined into method `TestSubNodeFloatDoubleNegation.testHalfFloat`, which causes unexpected IR graph. One way to fix this would be disabling inlining of methods from the `java.lang.Float` class. > > After this change, we are doing `CallStaticJava` to convert between `Float16` and `Float` on these platforms: > > ...... 
> 259 CallStaticJava === 5 6 7 8 1 (22 1 669 1 10 1 22 ) [[ 260 261 262 264 ]] # Static java.lang.Float::float16ToFloat float ( int ) Float16::floatValue @ bci:4 (line 876) Float16::subtract @ b ci:1 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 69) !jvms: Float16::floatValue @ bci:4 (line 876) Float16::subtract @ bci:1 (line 1185) TestSubNodeFloatDoubleNegation: :testHalfFloat @ bci:9 (line 69) > > ...... > > 562 CallStaticJava === 553 507 538 8 1 (526 1 1 1 1 559 559 ) [[ 563 564 565 567 ]] # Static java.lang.Float::floatToFloat16 short ( float ) Float16::valueOf @ bci:5 (line 361) Float16::subtra ct @ bci:9 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 67) !jvms: Float16::valueOf @ bci:5 (line 361) Float16::subtract @ bci:9 (line 1185) TestSubNodeFloatDoubleNegat ion::testHalfFloat @ bci:12 (line 67) > > ...... > > > Verified with fastdebug build on aarch64, x86_64 and riscv64 platforms. Looks reasonable. test/hotspot/jtreg/compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java line 43: > 41: TestFramework.runWithFlags("--add-modules=jdk.incubator.vector", > 42: "-XX:CompileCommand=inline,jdk.incubator.vector.Float16::*", > 43: "-XX:CompileCommand=dontinline,java.lang.Float::*"); Why not just limit to `Float.floatToIntBits`, and add a code comment why you have it here? ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28364#pullrequestreview-3479657708 PR Review Comment: https://git.openjdk.org/jdk/pull/28364#discussion_r2539615633 From vlivanov at openjdk.org Tue Nov 18 21:41:44 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 18 Nov 2025 21:41:44 GMT Subject: RFR: 8280469: C2: CHA support for interface calls when inlining through method handle linker [v2] In-Reply-To: <2GnXbYUICH6o4udyZQEqlCL6-jz9-CzSnUrZmkSbP4E=.a1d35eb5-5a62-4aff-9544-e1e0716761db@github.com> References: <2GnXbYUICH6o4udyZQEqlCL6-jz9-CzSnUrZmkSbP4E=.a1d35eb5-5a62-4aff-9544-e1e0716761db@github.com> Message-ID: <_dVgA85nf4icqZbXvxl3v5IHONCJK65mQ1xlNQsP-aA=.83bed20a-21d3-49dd-922c-f431dc563916@github.com> On Mon, 3 Nov 2025 18:38:13 GMT, Vladimir Ivanov wrote: >> Expand the optimization for interface calls introduced by [JDK-6986483](https://bugs.openjdk.org/browse/JDK-6986483) to calls through `MethodHandle.linkToInterface`. >> >> The implementation is straightforward except the fact that symbolic information is lost during `MemberName` resolution. The fix uses declaring class instead, but it's more conservative than what is done for invokeinterface case. >> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > naming Thanks for the reviews, Vladimir, Roland, and Chen. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28094#issuecomment-3549501809 From mdoerr at openjdk.org Tue Nov 18 21:48:12 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 18 Nov 2025 21:48:12 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v4] In-Reply-To: References: Message-ID: <0HJnSUSQA8RuwnNxu-SiGvZTzHYLJ5kY0_B6lG2EbAQ=.10868fac-1516-4a80-b4e5-9ff14997ba01@github.com> > This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. 
We can save the `genInvRoundKeys` computation when we only do encryption. > > The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Remove K from AES_Crypt ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28299/files - new: https://git.openjdk.org/jdk/pull/28299/files/621616a4..2b981288 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28299&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28299&range=02-03 Stats: 23 lines in 3 files changed: 6 ins; 4 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/28299.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28299/head:pull/28299 PR: https://git.openjdk.org/jdk/pull/28299 From mdoerr at openjdk.org Tue Nov 18 21:48:15 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 18 Nov 2025 21:48:15 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v3] In-Reply-To: <8NWK4oQBOjqg1Z7D-NsWuWn_ZhU9E6jWSWedJJQSJ08=.b0c9ef61-7880-4300-a90a-9b89d0a1ec8f@github.com> References: <-srlt_N0wyBwCwOmZTJBdGNFm66doGBHr6Yx83pqSpQ=.be8e4031-f542-49ad-8271-ac9a2c8b9128@github.com> <8NWK4oQBOjqg1Z7D-NsWuWn_ZhU9E6jWSWedJJQSJ08=.b0c9ef61-7880-4300-a90a-9b89d0a1ec8f@github.com> Message-ID: <4RdGgD3PA7s5RYgaZsHA-V2pgqh8BrP19FczkmVYDbM=.f9cb08ec-2fb7-4af6-9ad2-f232fb4a9004@github.com> On Tue, 18 Nov 2025 18:40:12 GMT, Valerie Peng wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> More minor cleanup. 
> > src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 62: > >> 60: private int[] sessionKe = null; // key for encryption >> 61: private int[] sessionKd = null; // preprocessed key for decryption >> 62: private int[] K = null; // preprocessed key in case of decryption > > I find the comment confusing as `K` is sometimes assigned with `sessionKe`, so it can't be used only for decryption? Thanks for looking at it! I've merged my additional proposal. `K` is removed, now. Does the Java part look ok? I'll ask for a hotspot review once the Java part is fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28299#discussion_r2539714325 From valeriep at openjdk.org Tue Nov 18 21:48:16 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Tue, 18 Nov 2025 21:48:16 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v3] In-Reply-To: <4RdGgD3PA7s5RYgaZsHA-V2pgqh8BrP19FczkmVYDbM=.f9cb08ec-2fb7-4af6-9ad2-f232fb4a9004@github.com> References: <-srlt_N0wyBwCwOmZTJBdGNFm66doGBHr6Yx83pqSpQ=.be8e4031-f542-49ad-8271-ac9a2c8b9128@github.com> <8NWK4oQBOjqg1Z7D-NsWuWn_ZhU9E6jWSWedJJQSJ08=.b0c9ef61-7880-4300-a90a-9b89d0a1ec8f@github.com> <4RdGgD3PA7s5RYgaZsHA-V2pgqh8BrP19FczkmVYDbM=.f9cb08ec-2fb7-4af6-9ad2-f232fb4a9004@github.com> Message-ID: On Tue, 18 Nov 2025 21:38:00 GMT, Martin Doerr wrote: >> src/java.base/share/classes/com/sun/crypto/provider/AES_Crypt.java line 62: >> >>> 60: private int[] sessionKe = null; // key for encryption >>> 61: private int[] sessionKd = null; // preprocessed key for decryption >>> 62: private int[] K = null; // preprocessed key in case of decryption >> >> I find the comment confusing as `K` is sometimes assigned with `sessionKe`, so it can't be used only for decryption? > > Thanks for looking at it! I've merged my additional proposal. `K` is removed, now. Does the Java part look ok? I'll ask for a hotspot review once the Java part is fine. Java part looks fine to me. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28299#discussion_r2539716623 From duke at openjdk.org Tue Nov 18 21:49:48 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 18 Nov 2025 21:49:48 GMT Subject: Integrated: 8371121: compiler/whitebox/DeoptimizeRelocatedNMethod.java fails with C1 In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 00:59:54 GMT, Chad Rakoczy wrote: > [JDK-8371121](https://bugs.openjdk.org/browse/JDK-8371121) > > This update aims to improve the test's stability. A previous failure occurred because the method wasn't compiled at the time of the check. I believe this could have occurred due to a deoptimization but I have not been able to reproduce. Previously, the test ensured compilation by repeatedly invoking the function. Instead, we now use Whitebox to add the method directly to the compile queue and wait for it to finish compiling. This approach should eliminate issues caused by deoptimization from function calls. This pull request has now been integrated. Changeset: 27a38d90 Author: Chad Rakoczy Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/27a38d9093958ae4851bc61b8d3f0d71dc780823 Stats: 11 lines in 1 file changed: 2 ins; 7 del; 2 mod 8371121: compiler/whitebox/DeoptimizeRelocatedNMethod.java fails with C1 Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/28246 From epeter at openjdk.org Tue Nov 18 21:50:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Nov 2025 21:50:37 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash In-Reply-To: References: Message-ID: <_o15-0h9_Lrf_n8VTWbumbT3ulKT6zOBLJ3E-dqIdvQ=.b2a914ad-c518-420e-b19a-f018aa0cbd41@github.com> On Fri, 14 Nov 2025 19:56:14 GMT, Marc Chevalier wrote: > This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so.
Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. > > # Analysis > ## Obervationally > ### IGVN > During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. Their types are: > > in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) > in(2): null > > We compute the join (HS' meet): > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 > > t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > > But the current `_type` (of the `PhiNode` as a `TypeNode`) is > > _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) > > We filter `t` by `_type` > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 > and we get > > ft=java/lang/Object * > > which is what we return. After the end of `Value`, the returned becomes the new `PhiNode`'s `_type`. > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 > and > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 > > > ### Verification > On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time > > _type=java/lang/Object * > > and so after filtering `t` by (new) `_type` and we get > > ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > > which is retuned. Verification gets angry because the new `ft` is not the same as the previous one. > > ## But why?! > ### Details on type computation > In short, we are doing > > t = typeof(in(1)) / typeof(in(2)) > ft = t /\ _type (* IGVN *) > ft' = t /\ ft (* Verification *) > > and observing that `ft != ft'`. 
It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verfication that will just "try again and see if something change". > > To me, the surprising fact was that the intersection > > java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > /\ > _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) > ~> > java/lang/Object * > > What happened to ... test/hotspot/jtreg/compiler/igvn/ClashingSpeculativeTypePhiNode.java line 27: > 25: /** > 26: * @test > 27: * @bug 8360561 Suggestion: * @bug 8371716 Or was the other number on purpose? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2539626573 From vlivanov at openjdk.org Tue Nov 18 22:32:20 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 18 Nov 2025 22:32:20 GMT Subject: Integrated: 8280469: C2: CHA support for interface calls when inlining through method handle linker In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 21:34:27 GMT, Vladimir Ivanov wrote: > Expand the optimization for interface calls introduced by [JDK-6986483](https://bugs.openjdk.org/browse/JDK-6986483) to calls through `MethodHandle.linkToInterface`. > > The implementation is straightforward except the fact that symbolic information is lost during `MemberName` resolution. The fix uses declaring class instead, but it's more conservative than what is done for invokeinterface case. > > Testing: hs-tier1 - hs-tier5 This pull request has now been integrated. 
Changeset: 256a9bef Author: Vladimir Ivanov URL: https://git.openjdk.org/jdk/commit/256a9beffc106d6657a912a33f97e7f97acbb1e1 Stats: 206 lines in 4 files changed: 174 ins; 2 del; 30 mod 8280469: C2: CHA support for interface calls when inlining through method handle linker Reviewed-by: kvn, roland ------------- PR: https://git.openjdk.org/jdk/pull/28094 From vlivanov at openjdk.org Tue Nov 18 23:55:20 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 18 Nov 2025 23:55:20 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v12] In-Reply-To: References: Message-ID: On Sat, 15 Nov 2025 02:24:47 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. 
Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Review comments resolutions Looks much better now, Jatin. It looks like `Matcher::should_attempt_register_biasing()` has some implicit expectations about `mdef` shape. Is it possible to materialize them (as asserts on mach nodes with `Flag_ndd_demotable` or `Flag_ndd_commutative` flags set)? So, a misplaced declaration can be caught during testing. src/hotspot/cpu/x86/x86.ad line 2641: > 2639: } > 2640: > 2641: if (mdef->num_opnds() <= oper_index || mdef->operand_index(oper_index) < 0) { Move `mdef->operand_num_edges(oper_index) == 1` check here? src/hotspot/cpu/x86/x86.ad line 2648: > 2646: // can be demoted to REX/REX2 encodings. For commutative operations with register > 2647: // operands, allocation of definition operand is biased towards both the operands. > 2648: return (((mdef->flags() & Node::PD::Flag_ndd_demotable) != 0) && It is called either with `oper_index == 1` or `oper_index == 2`.
Can you make it explicit that any other operand doesn't participate in register biasing? Also, I'd expand the check, so it becomes clear that 1st operand requires `Flag_ndd_demotable` and 2nd requires `Flag_ndd_demotable` + `Flag_ndd_commutative` set. src/hotspot/share/opto/matcher.hpp line 509: > 507: > 508: public: > 509: static bool should_attempt_register_biasing(const MachNode* mdef, int oper_index); I suggest to call it `is_register_biasing_candidate(const MachNode* mdef, int oper_index)`. ------------- PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3480085019 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2539963456 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2539976175 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2539955192 From fyang at openjdk.org Wed Nov 19 03:14:52 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 19 Nov 2025 03:14:52 GMT Subject: RFR: 8372046: compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java fails IR verification [v2] In-Reply-To: References: Message-ID: > Hi, please consider this test-only change fixing an IR test failure. > > This IR test fails on platforms without native support for `Float16`. The reason is that method `Float::floatToFloat16` is inlined into method `TestSubNodeFloatDoubleNegation.testHalfFloat`, which causes unexpected IR graph. One way to fix this would be disabling inlining of methods from the `java.lang.Float` class. > > After this change, we are doing `CallStaticJava` to convert between `Float16` and `Float` on these platforms: > > ...... 
> 259 CallStaticJava === 5 6 7 8 1 (22 1 669 1 10 1 22 ) [[ 260 261 262 264 ]] # Static java.lang.Float::float16ToFloat float ( int ) Float16::floatValue @ bci:4 (line 876) Float16::subtract @ b ci:1 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 69) !jvms: Float16::floatValue @ bci:4 (line 876) Float16::subtract @ bci:1 (line 1185) TestSubNodeFloatDoubleNegation: :testHalfFloat @ bci:9 (line 69) > > ...... > > 562 CallStaticJava === 553 507 538 8 1 (526 1 1 1 1 559 559 ) [[ 563 564 565 567 ]] # Static java.lang.Float::floatToFloat16 short ( float ) Float16::valueOf @ bci:5 (line 361) Float16::subtra ct @ bci:9 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 67) !jvms: Float16::valueOf @ bci:5 (line 361) Float16::subtract @ bci:9 (line 1185) TestSubNodeFloatDoubleNegat ion::testHalfFloat @ bci:12 (line 67) > > ...... > > > Verified with fastdebug build on aarch64, x86_64 and riscv64 platforms. Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28364/files - new: https://git.openjdk.org/jdk/pull/28364/files/f9cefa48..3ef234bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28364&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28364&range=00-01 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28364.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28364/head:pull/28364 PR: https://git.openjdk.org/jdk/pull/28364 From fyang at openjdk.org Wed Nov 19 03:14:55 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 19 Nov 2025 03:14:55 GMT Subject: RFR: 8372046: compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java fails IR verification [v2] In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 20:22:55 GMT, Emanuel Peter wrote: >> Fei Yang has updated the pull request incrementally with one additional commit since the 
last revision: >> >> Review > > test/hotspot/jtreg/compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java line 43: > >> 41: TestFramework.runWithFlags("--add-modules=jdk.incubator.vector", >> 42: "-XX:CompileCommand=inline,jdk.incubator.vector.Float16::*", >> 43: "-XX:CompileCommand=dontinline,java.lang.Float::*"); > > Why not just limit to `Float.floatToIntBits`, and add a code comment why you have it here? Done. Verified on aarch64, x86_64 and riscv64 platforms. Please take another look. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28364#discussion_r2540333230 From wenanjian at openjdk.org Wed Nov 19 05:00:22 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 19 Nov 2025 05:00:22 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: <5HbBb-mjtZWqWTu-HQe7KrRyHG5z-UK4rbVhMzLv4bw=.b1b7e986-dbcf-4ab0-86b4-513f3f1f91ae@github.com> References: <5HbBb-mjtZWqWTu-HQe7KrRyHG5z-UK4rbVhMzLv4bw=.b1b7e986-dbcf-4ab0-86b4-513f3f1f91ae@github.com> Message-ID: On Tue, 18 Nov 2025 18:23:43 GMT, Hamlin Li wrote: > Some more comments and questions. Thanks for the careful reviews! I will check the comments and reply one by one later ------------- PR Comment: https://git.openjdk.org/jdk/pull/25281#issuecomment-3550805556 From wenanjian at openjdk.org Wed Nov 19 07:29:08 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 19 Nov 2025 07:29:08 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v27] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. 
Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: update some comments, names and Pseudocode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/c1e29200..5bdfc649 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=25-26 Stats: 162 lines in 2 files changed: 77 ins; 71 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From wenanjian at openjdk.org Wed Nov 19 07:29:16 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 19 Nov 2025 07:29:16 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 15:10:49 GMT, Hamlin Li wrote: >> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> modify stub_id name > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2651: > >> 2649: address generate_counterMode_AESCrypt() { >> 2650: assert(UseAESCTRIntrinsics, "need AES instructions (Zvkned extension) support"); >> 2651: assert(UseZbb, "need basic bit manipulation (Zbb extension) support"); > > also needs an `assert(UseZvkn, "");`? fixed! > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2663: > >> 2661: const Register input_len = c_rarg4; >> 2662: const Register saved_encrypted_ctr = c_rarg5; >> 2663: const Register used_ptr = c_rarg6; > > Suggestion: > > const Register used_len_addr = c_rarg6; done > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2670: > >> 2668: __ enter(); >> 2669: >> 2670: Label L_EXIT; > > Suggestion: > > Label L_exit; I try to make it different from the L_exit in counterMode_AESCrypt function, should I change this to L_exit2 or L_exit_main? 
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2676: > >> 2674: // Compute #rounds for AES based on the length of the key array >> 2675: __ lwu(keylen, Address(key, arrayOopDesc::length_offset_in_bytes() - arrayOopDesc::base_offset_in_bytes(T_INT))); >> 2676: __ mv(t0, 52); > > what's this `52`? I see it also in `generate_aescrypt_encryptBlock`, do they mean similar things? > Can you add some comment about it? and give a name rather than use the magic number. The key length can only be {11, 13, 15} * 4 = {44, 52, 60}. I notice that x86 and aarch64 use 52 directly. I think adding some more comments will be enough? > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2700: > >> 2698: } >> 2699: >> 2700: void counterMode_AESCrypt(int round, Register in, Register out, Register key, Register counter, > > Maybe move this `counterMode_AESCrypt` above `generate_counterMode_AESCrypt`? Good point, fixed it! > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2708: > >> 2706: // L_encrypt_next: >> 2707: // while (used < BLOCK_SIZE) { >> 2708: // if (len == 0) goto L_exit; > > The logic of the code here is different from the logic of assembly code. > Here it checks `len == 0` at the beginning of while loop; assembly code checks `len == 0` at the end of while loop. > Will this difference bring some logic difference in some corner case? If not, why make it a bit different from each other? Does it bring some performance difference with following change? > > Label L_next, L_main_loop, L_exit; // remove L_encrypt_next > ... > __ bind(L_next); > __ bgeu(used, block_size, L_main_loop); > __ beqz(len, L_exit); > ... // scalar processing > __ subi(len, len, 1); > __ j(L_next); > ... > __ mv(used, 0); > // Check if we have a full block_size > __ bltu(len, block_size, L_next); // remove L_encrypt_next > ... > > > Ask this question because the change make the comment and assembly implementation consistent, and the code easy to understand.
Oh sure, There is indeed a bit of Pseudocode logic here that is consistent with the previous version, and I'll make some modifications. > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2709: > >> 2707: // while (used < BLOCK_SIZE) { >> 2708: // if (len == 0) goto L_exit; >> 2709: // out = in ^ saved_encrypted_ctr[used]); > > do you mean `*out = *in ^ saved_encrypted_ctr[used]);`? yes, fixed it > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2813: > >> 2811: __ bind(L_exit); >> 2812: __ sw(used, Address(used_ptr)); >> 2813: __ mv(x10, input_len); > > is this mv necessary? it's a return value saved to x10, it seems necessary according to aarch64 and x86, aarch64 used r0 to save it and x86 used rax > src/hotspot/cpu/riscv/vm_version_riscv.cpp line 447: > >> 445: FLAG_SET_DEFAULT(UseAESCTRIntrinsics, false); >> 446: } >> 447: } > > Suggestion: > > if (FLAG_IS_DEFAULT(UseAESCTRIntrinsics) && UseZbb) { > FLAG_SET_DEFAULT(UseAESCTRIntrinsics, true); > } > > if (UseAESCTRIntrinsics && !UseZbb) { > warning("Cannot enable UseAESCTRIntrinsics on cpu without UseZbb support."); > FLAG_SET_DEFAULT(UseAESCTRIntrinsics, false); > } Thanks, fixed! 
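For readers following the pseudocode discussed in this review, here is a minimal Java sketch of the byte-wise step it describes: leftover keystream bytes in `saved_encrypted_ctr` are XORed into the output until either the input runs out or the saved block is used up. The class and method names are hypothetical and this is not the stub itself (which is RISC-V assembly); it only illustrates the corrected `*out = *in ^ saved_encrypted_ctr[used]` line, with the `len == 0` check placed at the top of the loop as in the comment under review.

```java
// Hypothetical illustration only; not the code from the PR.
class CtrTailSketch {
    static final int BLOCK_SIZE = 16; // AES block size in bytes

    /**
     * XORs input bytes with the not-yet-used keystream bytes of the
     * previously encrypted counter block. Returns the number of bytes
     * processed; the caller would advance `used` by the same amount.
     */
    static int consumeSavedKeystream(byte[] in, int inOff,
                                     byte[] out, int outOff,
                                     int len,
                                     byte[] savedEncryptedCtr, int used) {
        int processed = 0;
        while (used < BLOCK_SIZE) {
            if (len == 0) {
                break; // plays the role of "goto L_exit" in the pseudocode
            }
            out[outOff + processed] =
                (byte) (in[inOff + processed] ^ savedEncryptedCtr[used]);
            used++;
            processed++;
            len--;
        }
        return processed;
    }

    public static void main(String[] args) {
        byte[] keystream = new byte[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE; i++) {
            keystream[i] = (byte) i;
        }
        byte[] in = {10, 20, 30};
        byte[] out = new byte[in.length];
        // Only two keystream bytes (indices 14 and 15) are still unused.
        int n = consumeSavedKeystream(in, 0, out, 0, in.length, keystream, 14);
        System.out.println("bytes processed from saved keystream: " + n);
    }
}
```

In this sketch the remaining input byte would be handled by the next freshly encrypted counter block, which is the part the real stub vectorizes.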
> src/hotspot/cpu/riscv/vm_version_riscv.cpp line 458: > >> 456: } >> 457: if (UseAESCTRIntrinsics) { >> 458: warning("AES/CTR intrinsics are not available on this CPU"); > > Suggestion: > > warning("Cannot enable UseAESCTRIntrinsics on cpu without UseZvkn support."); Thanks, fixed it ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2540834580 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2540838391 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2540836978 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2540838774 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2540835345 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2540839090 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2540845734 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2540844372 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2540834756 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2540835074 From wenanjian at openjdk.org Wed Nov 19 07:32:29 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 19 Nov 2025 07:32:29 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: <5HbBb-mjtZWqWTu-HQe7KrRyHG5z-UK4rbVhMzLv4bw=.b1b7e986-dbcf-4ab0-86b4-513f3f1f91ae@github.com> References: <5HbBb-mjtZWqWTu-HQe7KrRyHG5z-UK4rbVhMzLv4bw=.b1b7e986-dbcf-4ab0-86b4-513f3f1f91ae@github.com> Message-ID: <9cU7TShltwcj7AsPtyogUttRcBMrWnf3bWduds9GJhc=.30f2747e-5f37-4853-a260-9ea25ab8109d@github.com> On Tue, 18 Nov 2025 18:23:33 GMT, Hamlin Li wrote: >> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> modify stub_id name > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2748: > >> 2746: }; >> 2747: >> 2748: __ vsetivli(x0, 4, Assembler::e32, Assembler::m1); > > A general question, can we make 
it bigger than `4`, or even `m2`? This is a good question! I spent a fairly long time on it earlier. Initially, I tried m2 and m4. For m4, I had already developed a version that passed all the tests (which take a really long time to run), and it seemed faster. However, since the Java API supports encrypting or decrypting incomplete blocks, it is difficult to ensure that the time for the counter increment is consistent under all circumstances, which may pose a security risk (thanks to Andrew for the reminder). Additionally, the Java API allows the counter to grow up to 128 bits, and RV does not currently have a very suitable vector 128-bit add. Using other element types such as 64-bit requires handling the overflow issue, and using a setting higher than m1 makes it even harder to ensure a consistent time for each counter increment. Based on these considerations, I selected m1. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2540859234 From bmaillard at openjdk.org Wed Nov 19 08:31:05 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 19 Nov 2025 08:31:05 GMT Subject: RFR: 8371674: C2 fails with Missed optimization opportunity in PhaseIterGVN for MoveL2D [v2] In-Reply-To: References: Message-ID: > This PR addresses yet another missed optimization in `PhaseIterGVN`. The way this optimization is triggered is a bit different this time though, and the notification is missing in `Node::has_special_unique_user`. > > ## Analysis > > The affected optimization is the transformation of `MoveX2Y (LoadX mem)` into `LoadY mem`. This is implemented in `MoveNode::Ideal`.
The optimization is as follows: > > ```c++ > // Fold reinterpret cast into memory operation: > // MoveX2Y (LoadX mem) => LoadY mem > LoadNode* ld = in(1)->isa_Load(); > if (ld != nullptr && (ld->outcnt() == 1)) { // replace only > const Type* rt = bottom_type(); > if (ld->has_reinterpret_variant(rt)) { > if (phase->C->post_loop_opts_phase()) { > return ld->convert_to_reinterpret_load(*phase, rt); > } else { > // attempt the transformation once loop opts are over > phase->C->record_for_post_loop_opts_igvn(this); > } > } > } > > > The optimization is triggered only if the input is a `LoadNode` and the `MoveNode` is its only user. This is a relatively unusual pattern. > > The bug was found by the fuzzer. At some point during IGVN, we have the following subgraph: > > > CountedLoop LoadL > \ / \ > Phi MoveL2D > > In `RegionNode::Ideal`, we end up calling `set_req_X` on the `Phi` node to delete the edge from the `Phi` node to `LoadL`. As a result, the `LoadL` node only has one user left, and the `MoveNode::Ideal` gets triggered at the next verification pass. > > ## Proposed Solution > > Add this particular case to `Node::has_special_unique_user`, which gets called by `Node::set_req_X`. > > ## Summary of changes > > This PR brings the following changes: > - Detect the optimization pattern in `Node::has_special_unique_user`. > - Add new test `TestMissingOptMoveX2YLoadX.java`, initially obtained from the fuzzer and then heavily reduced, both with the usual tools and manually. I tried to get a reproducer for each of the `Move` nodes, but I was only able to get one for `MoveL2D`. > > ### Testing > > - [x] https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371674 > - [x] tier1-4, plus some internal testing > > Thank you for reviewing! 
Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: Rename test and add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28290/files - new: https://git.openjdk.org/jdk/pull/28290/files/7eecac24..2e9f05e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28290&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28290&range=00-01 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28290.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28290/head:pull/28290 PR: https://git.openjdk.org/jdk/pull/28290 From bmaillard at openjdk.org Wed Nov 19 08:31:08 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 19 Nov 2025 08:31:08 GMT Subject: RFR: 8371674: C2 fails with Missed optimization opportunity in PhaseIterGVN for MoveL2D [v2] In-Reply-To: <0-ADAzn-YzszXJq-OaAv_PT8sLgxNkGOSLrfMpNZdYM=.279ef48b-3267-42a0-8273-0ca398eb5284@github.com> References: <0-ADAzn-YzszXJq-OaAv_PT8sLgxNkGOSLrfMpNZdYM=.279ef48b-3267-42a0-8273-0ca398eb5284@github.com> Message-ID: On Fri, 14 Nov 2025 16:08:30 GMT, Emanuel Peter wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename test and add comment > > test/hotspot/jtreg/compiler/c2/TestMissingOptMoveX2YLoadX.java line 54: > >> 52: while (++e < 37) { >> 53: for (f = 1; f < 7; f++) { >> 54: h >>>= (int)(--g - Double.longBitsToDouble(j[e])); > > Drive-by comment, might review more fully next week: could the same happen with `MoveI2F`? Or with `MoveD2L`, i.e. `Double.doubleToRawLongBits`? Probably yes. Not sure if it's worth duplicating the test, up to you. Theoretically yes, but as mentioned in the description I could only reproduce it with `Double.longBitsToDouble`. I added a comment in the test to make it clear there as well.
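For readers unfamiliar with the source-level shape behind `MoveL2D (LoadL mem)`, here is a minimal Java sketch. It is not the reproducer from the PR: the class and method names are made up for illustration, and whether C2 actually builds a `MoveL2D` fed by a single-use `LoadL` for this exact method is an assumption that depends on the surrounding code and compilation.

```java
// Hypothetical illustration only; not taken from TestMissingOptMoveX2YLoadX.java.
final class MoveL2DPatternSketch {
    // Load a long from an array and reinterpret its bits as a double.
    // The loaded value has a single use (the reinterpretation), which is
    // the "only user" situation the PR description refers to.
    static double loadAndReinterpret(long[] bits, int index) {
        return Double.longBitsToDouble(bits[index]);
    }

    public static void main(String[] args) {
        long[] bits = {
            Double.doubleToRawLongBits(1.5),
            Double.doubleToRawLongBits(-2.25)
        };
        for (int i = 0; i < bits.length; i++) {
            // Reinterpreting the stored bits gives back the original doubles.
            System.out.println(loadAndReinterpret(bits, i));
        }
    }
}
```

Semantically, folding the reinterpretation into the load just means reading the same 64 bits as a double directly, which is why the result is unaffected.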
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28290#discussion_r2541015656 From galder at openjdk.org Wed Nov 19 08:40:37 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Wed, 19 Nov 2025 08:40:37 GMT Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax Message-ID: Trivial cleanup to move tests out of a test class whose description does not match these tests ------------- Commit messages: - 8371792: Refactor barrier loop tests out of TestIfMinMax Changes: https://git.openjdk.org/jdk/pull/28385/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28385&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371792 Stats: 123 lines in 2 files changed: 86 ins; 36 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28385.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28385/head:pull/28385 PR: https://git.openjdk.org/jdk/pull/28385 From bmaillard at openjdk.org Wed Nov 19 08:49:16 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 19 Nov 2025 08:49:16 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: On Fri, 14 Nov 2025 07:19:12 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/phaseX.cpp line 1101: >> >>> 1099: bool failure = verify_Identity_for(n); >>> 1100: assert(!failure, "Missed Identity optimization opportunity in PhaseIterGVN for %s", n->Name()); >>> 1101: } >> >> The alternative would be to directly assert in the verify methods, but I suppose that would be a bigger code change. >> >> Hmm, I did see some cases in the verify methods that are maybe not directly "missed optimization opportunity" but some other kind of issue. Maybe we should assert directly for those, rather than returning and ending up at this assert. 
> > Look at: > `Ideal optimization did not make progress but created new unused nodes.` > And > `Ideal optimization did not make progress but node hash changed.` > > That's all I could find now, but you should double check ;) > The alternative would be to directly assert in the verify methods, but I suppose that would be a bigger code change. Yes, I also considered it. I don't really have a strong opinion, but maybe you do. Asserting directly in the verify methods would allow us to have more targeted asserts, and more accurate reports for triaging. On the other hand, as you mentioned, this would mean more code changes. > Hmm, I did see some cases in the verify methods that are maybe not directly "missed optimization opportunity" but some other kind of issue. Maybe we should assert directly for those, rather than returning and ending up at this assert. These are not labelled as such in the printing, but I would argue these are still missed optimization opportunities, aren't they? I mean if things are still moving when calling `Ideal`, it means that this could have been done earlier. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2541071132 From mchevalier at openjdk.org Wed Nov 19 08:55:07 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 19 Nov 2025 08:55:07 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v2] In-Reply-To: References: Message-ID: > This bug was originally found and reported as a Valhalla problem. It quickly became apparent that it has no reason to be Valhalla-specific, although I couldn't prove it. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. > > # Analysis > ## Observationally > ### IGVN > During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs.
Their types are: > > in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) > in(2): null > > We compute the join (HS' meet): > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 > > t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > > But the current `_type` (of the `PhiNode` as a `TypeNode`) is > > _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) > > We filter `t` by `_type` > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 > and we get > > ft=java/lang/Object * > > which is what we return. After the end of `Value`, the returned type becomes the new `PhiNode`'s `_type`. > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 > and > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 > > > ### Verification > On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time > > _type=java/lang/Object * > > and so after filtering `t` by the (new) `_type` we get > > ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > > which is returned. Verification gets angry because the new `ft` is not the same as the previous one. > > ## But why?! > ### Details on type computation > In short, we are doing > > t = typeof(in(1)) \/ typeof(in(2)) > ft = t /\ _type (* IGVN *) > ft' = t /\ ft (* Verification *) > > and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b`, which is problematic for this kind of verification that will just "try again and see if something changes".
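The failure mode described above can be mimicked with a deliberately tiny Java model. This is only a sketch under loud assumptions: a "type" is reduced to the name of its speculative class (`null` meaning no speculative part), and `filter` is a hypothetical stand-in that merely reproduces the two results reported above. It is not HotSpot's `Type` lattice or its actual filtering code.

```java
// Toy model (assumption): only the speculative part of a type is tracked.
final class ToySpeculativeFilter {
    // Hypothetical stand-in for "t /\ _type" on speculative parts only:
    // a missing speculative part yields the other side, equal parts are
    // kept, and clashing parts are dropped.
    static String filter(String a, String b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.equals(b) ? a : null;
    }

    public static void main(String[] args) {
        String t = "C2";     // speculative part of the joined input types
        String type = "C1";  // speculative part of the Phi's current _type

        String ft = filter(t, type);     // IGVN: clash, speculative part dropped
        String ftVerify = filter(t, ft); // verification: _type is now ft

        // Prints "IGVN: null, verification: C2": applying the same
        // operand a second time changes the result.
        System.out.println("IGVN: " + ft + ", verification: " + ftVerify);
    }
}
```

In a true lattice meet, associativity and idempotence give `(a /\ b) /\ b = a /\ (b /\ b) = a /\ b`, so a second application could never change the result; the toy `filter` violates exactly that property.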
> > To me, the surprising fact was that the intersection > > java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > /\ > _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) > ~> > java/lang/Object * > > What happened to ... Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Fix bug number ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28331/files - new: https://git.openjdk.org/jdk/pull/28331/files/818895ec..7ac02cbf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28331&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28331&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28331.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28331/head:pull/28331 PR: https://git.openjdk.org/jdk/pull/28331 From rcastanedalo at openjdk.org Wed Nov 19 09:01:52 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 19 Nov 2025 09:01:52 GMT Subject: RFR: 8372097: C2: PhasePrintLevel requires setting PrintPhaseLevel explicitly to be active Message-ID: <3lLhDPNgImbNIz-0CGOOmSh8IGA-jwXjVb045IiJw8Q=.e417355f-87ba-4900-a11a-d37f39ab4ec9@github.com> This changeset aligns the behavior of `PrintPhaseLevel` with its description in `c2_globals.hpp` in the default case of `-XX:PrintPhaseLevel=0`. In particular, after the changeset, running `java -XX:CompileCommand=PhasePrintLevel,*::*,N` does print the phase names corresponding to level `N` for the matched methods, as expected: $ java -Xbatch -XX:CompileCommand=PhasePrintLevel,java.lang.StringLatin1::equals,2 CompileCommand: PhasePrintLevel java/lang/StringLatin1.equals intx PhasePrintLevel = 2 1. After Parsing 2. Iter GVN 1 3. Incremental Inline 4. Incremental Boxing Inline 5. Before Loop Optimizations 6. PhaseIdealLoop 1 7. PhaseIdealLoop 2 ... 
The changeset makes the behavior of the `PrintPhaseLevel` flag and `PhasePrintLevel` compile command consistent with the behavior of the pre-existing, analogous `PrintIdealGraphLevel` flag and `IGVPrintLevel` compile command. The changeset adds tests covering and documenting different combinations of flag and compile-command-specified print levels, and fixes a typo in the flag description in `c2_globals.hpp`. **Testing:** tier1-3 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). ------------- Commit messages: - Restrict compilation to test method, add bug number - Fix typo in description of PrintPhaseLevel - Add tests - Relax condition in Compile::should_print_phase Changes: https://git.openjdk.org/jdk/pull/28386/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28386&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372097 Stats: 112 lines in 3 files changed: 110 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28386.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28386/head:pull/28386 PR: https://git.openjdk.org/jdk/pull/28386 From mchevalier at openjdk.org Wed Nov 19 08:59:03 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 19 Nov 2025 08:59:03 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v2] In-Reply-To: <_o15-0h9_Lrf_n8VTWbumbT3ulKT6zOBLJ3E-dqIdvQ=.b2a914ad-c518-420e-b19a-f018aa0cbd41@github.com> References: <_o15-0h9_Lrf_n8VTWbumbT3ulKT6zOBLJ3E-dqIdvQ=.b2a914ad-c518-420e-b19a-f018aa0cbd41@github.com> Message-ID: On Tue, 18 Nov 2025 20:26:38 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bug number > > test/hotspot/jtreg/compiler/igvn/ClashingSpeculativeTypePhiNode.java line 27: > >> 25: /** >> 26: * @test >> 27: * @bug 8360561 > > Suggestion: > > * @bug 8371716 > > Or was the other number on purpose? It wasn't. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2541108733 From mdoerr at openjdk.org Wed Nov 19 09:08:34 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 19 Nov 2025 09:08:34 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v3] In-Reply-To: References: <-srlt_N0wyBwCwOmZTJBdGNFm66doGBHr6Yx83pqSpQ=.be8e4031-f542-49ad-8271-ac9a2c8b9128@github.com> <8NWK4oQBOjqg1Z7D-NsWuWn_ZhU9E6jWSWedJJQSJ08=.b0c9ef61-7880-4300-a90a-9b89d0a1ec8f@github.com> <4RdGgD3PA7s5RYgaZsHA-V2pgqh8BrP19FczkmVYDbM=.f9cb08ec-2fb7-4af6-9ad2-f232fb4a9004@github.com> Message-ID: On Tue, 18 Nov 2025 21:40:01 GMT, Valerie Peng wrote: >> Thanks for looking at it! I've merged my additional proposal. `K` is removed, now. Does the Java part look ok? I'll ask for a hotspot review once the Java part is fine. > > Java part looks fine to me. Thanks for reviewing! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28299#discussion_r2541140164 From mhaessig at openjdk.org Wed Nov 19 09:11:08 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 19 Nov 2025 09:11:08 GMT Subject: RFR: 8372097: C2: PhasePrintLevel requires setting PrintPhaseLevel explicitly to be active In-Reply-To: <3lLhDPNgImbNIz-0CGOOmSh8IGA-jwXjVb045IiJw8Q=.e417355f-87ba-4900-a11a-d37f39ab4ec9@github.com> References: <3lLhDPNgImbNIz-0CGOOmSh8IGA-jwXjVb045IiJw8Q=.e417355f-87ba-4900-a11a-d37f39ab4ec9@github.com> Message-ID: On Wed, 19 Nov 2025 08:51:58 GMT, Roberto Castañeda Lozano wrote: > This changeset aligns the behavior of `PrintPhaseLevel` with its description in `c2_globals.hpp` in the default case of `-XX:PrintPhaseLevel=0`.
In particular, after the changeset, running `java -XX:CompileCommand=PhasePrintLevel,*::*,N` does print the phase names corresponding to level `N` for the matched methods, as expected: > > > $ java -Xbatch -XX:CompileCommand=PhasePrintLevel,java.lang.StringLatin1::equals,2 > CompileCommand: PhasePrintLevel java/lang/StringLatin1.equals intx PhasePrintLevel = 2 > 1. After Parsing > 2. Iter GVN 1 > 3. Incremental Inline > 4. Incremental Boxing Inline > 5. Before Loop Optimizations > 6. PhaseIdealLoop 1 > 7. PhaseIdealLoop 2 > ... > > > The changeset makes the behavior of the `PrintPhaseLevel` flag and `PhasePrintLevel` compile command consistent with the behavior of the pre-existing, analogous `PrintIdealGraphLevel` flag and `IGVPrintLevel` compile command. The changeset adds tests covering and documenting different combinations of flag and compile-command-specified print levels, and fixes a typo in the flag description in `c2_globals.hpp`. > > **Testing:** tier1-3 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). Thank you for taking the time to fix this, @robcasloz! I learned a lot from the test you wrote. This looks good to me. ------------- Marked as reviewed by mhaessig (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/28386#pullrequestreview-3481593327 From mdoerr at openjdk.org Wed Nov 19 09:13:42 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 19 Nov 2025 09:13:42 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v4] In-Reply-To: <0HJnSUSQA8RuwnNxu-SiGvZTzHYLJ5kY0_B6lG2EbAQ=.10868fac-1516-4a80-b4e5-9ff14997ba01@github.com> References: <0HJnSUSQA8RuwnNxu-SiGvZTzHYLJ5kY0_B6lG2EbAQ=.10868fac-1516-4a80-b4e5-9ff14997ba01@github.com> Message-ID: <-IeLB8uRff2Hu6rXMg_C1kr1vF46RUzULxFjMastVE8=.1aa135cc-38b1-4c37-966c-59cc51d300f5@github.com> On Tue, 18 Nov 2025 21:48:12 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Remove K from AES_Crypt @vnkozlov: May I ask you to take a look at the C2 part? I had to adapt the library_call code to the new Java implementation which stores the key in "sessionKe" and "sessionKd", now. I think the hotspot part is also more comprehensive this way because it makes it clear which key is used for what. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28299#issuecomment-3551612586 From chagedorn at openjdk.org Wed Nov 19 09:22:04 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Nov 2025 09:22:04 GMT Subject: RFR: 8372097: C2: PhasePrintLevel requires setting PrintPhaseLevel explicitly to be active In-Reply-To: <3lLhDPNgImbNIz-0CGOOmSh8IGA-jwXjVb045IiJw8Q=.e417355f-87ba-4900-a11a-d37f39ab4ec9@github.com> References: <3lLhDPNgImbNIz-0CGOOmSh8IGA-jwXjVb045IiJw8Q=.e417355f-87ba-4900-a11a-d37f39ab4ec9@github.com> Message-ID: <9D8A-K71caLc9h26vduBNM0h_lHNdoEy35AF8YzrGTM=.d4e7c977-eac9-4aa9-8057-9f5e5436aa06@github.com> On Wed, 19 Nov 2025 08:51:58 GMT, Roberto Castañeda Lozano wrote: > This changeset aligns the behavior of `PrintPhaseLevel` with its description in `c2_globals.hpp` in the default case of `-XX:PrintPhaseLevel=0`. In particular, after the changeset, running `java -XX:CompileCommand=PhasePrintLevel,*::*,N` does print the phase names corresponding to level `N` for the matched methods, as expected: > > > $ java -Xbatch -XX:CompileCommand=PhasePrintLevel,java.lang.StringLatin1::equals,2 > CompileCommand: PhasePrintLevel java/lang/StringLatin1.equals intx PhasePrintLevel = 2 > 1. After Parsing > 2. Iter GVN 1 > 3. Incremental Inline > 4. Incremental Boxing Inline > 5. Before Loop Optimizations > 6. PhaseIdealLoop 1 > 7. PhaseIdealLoop 2 > ... > > > The changeset makes the behavior of the `PrintPhaseLevel` flag and `PhasePrintLevel` compile command consistent with the behavior of the pre-existing, analogous `PrintIdealGraphLevel` flag and `IGVPrintLevel` compile command. The changeset adds tests covering and documenting different combinations of flag and compile-command-specified print levels, and fixes a typo in the flag description in `c2_globals.hpp`. > > **Testing:** tier1-3 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). Looks good!
Not something for this PR: What I find confusing is the non-matching flag and compile command names: - `PrintPhaseLevel` vs. `PhasePrintLevel` - `PrintIdealGraphLevel` vs. `IGVPrintLevel` I would advocate using matching names. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28386#pullrequestreview-3481653131 From epeter at openjdk.org Wed Nov 19 09:27:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Nov 2025 09:27:35 GMT Subject: RFR: 8372046: compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java fails IR verification [v2] In-Reply-To: References: Message-ID: <_Em1rZENoaouplOETRTHExOFh1Q6gADO_jgloNgzWl8=.4fd56cf9-e17a-42b8-a6ce-b76bffac46f2@github.com> On Wed, 19 Nov 2025 03:14:52 GMT, Fei Yang wrote: >> Hi, please consider this test-only change fixing an IR test failure. >> >> This IR test fails on platforms without native support for `Float16`. The reason is that method `Float::floatToFloat16` is inlined into method `TestSubNodeFloatDoubleNegation.testHalfFloat`, which causes an unexpected IR graph. One way to fix this would be disabling inlining of methods from the `java.lang.Float` class. >> >> After this change, we are doing `CallStaticJava` to convert between `Float16` and `Float` on these platforms: >> >> ...... >> 259 CallStaticJava === 5 6 7 8 1 (22 1 669 1 10 1 22 ) [[ 260 261 262 264 ]] # Static java.lang.Float::float16ToFloat float ( int ) Float16::floatValue @ bci:4 (line 876) Float16::subtract @ bci:1 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 69) !jvms: Float16::floatValue @ bci:4 (line 876) Float16::subtract @ bci:1 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 69) >> >> ...... 
>> >> 562 CallStaticJava === 553 507 538 8 1 (526 1 1 1 1 559 559 ) [[ 563 564 565 567 ]] # Static java.lang.Float::floatToFloat16 short ( float ) Float16::valueOf @ bci:5 (line 361) Float16::subtract @ bci:9 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 67) !jvms: Float16::valueOf @ bci:5 (line 361) Float16::subtract @ bci:9 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 67) >> >> ...... >> >> >> Verified with fastdebug build on aarch64, x86_64 and riscv64 platforms. > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Review Thanks for the updates :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28364#pullrequestreview-3481682769 From bmaillard at openjdk.org Wed Nov 19 09:28:13 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 19 Nov 2025 09:28:13 GMT Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 08:30:56 GMT, Galder Zamarreño wrote: > Trivial cleanup to move tests out of a test class whose description does not match these tests Looks good to me, thanks for making the change @galderz! I have also submitted internal testing just in case. ------------- Marked as reviewed by bmaillard (Committer).
PR Review: https://git.openjdk.org/jdk/pull/28385#pullrequestreview-3481678220 From epeter at openjdk.org Wed Nov 19 09:35:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Nov 2025 09:35:15 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: On Wed, 19 Nov 2025 08:44:16 GMT, Benoît Maillard wrote: >> Look at: >> `Ideal optimization did not make progress but created new unused nodes.` >> And >> `Ideal optimization did not make progress but node hash changed.` >> >> That's all I could find now, but you should double check ;) > >> The alternative would be to directly assert in the verify methods, but I suppose that would be a bigger code change. > > Yes, I also considered it. I don't really have a strong opinion, but maybe you do. Asserting directly in the verify methods would allow us to have more targeted asserts, and more accurate reports for triaging. On the other hand, as you mentioned, this would mean more code changes. > >> Hmm, I did see some cases in the verify methods that are maybe not directly "missed optimization opportunity" but some other kind of issue. Maybe we should assert directly for those, rather than returning and ending up at this assert. > > These are not labelled as such in the printing, but I would argue these are still missed optimization opportunities, aren't they? I mean if things are still moving when calling `Ideal`, it means that this could have been done earlier. Honestly, I would do the changes, and just assert in the specific methods. That also helps us with more precise stack traces. The required changes are not that large and surely not that complicated, and we may actually end up with less code overall. `Ideal optimization did not make progress but created new unused nodes.` This one could get triggered even if we don't make progress at all.
It may be that some `Ideal` optimization always generates nodes but then does not actually insert them into the graph. That could be considered wasteful, and that is really all that assert tells us. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2541229658 From mli at openjdk.org Wed Nov 19 09:54:51 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 19 Nov 2025 09:54:51 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: Message-ID: <9oPWTWflnwws0wxHBP58IiQRIZz4Tt5bthr7RiC3BE0=.94d60901-8fad-4597-9e55-c669de73a8e6@github.com> On Wed, 19 Nov 2025 07:22:28 GMT, Anjian Wen wrote: >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2676: >> >>> 2674: // Compute #rounds for AES based on the length of the key array >>> 2675: __ lwu(keylen, Address(key, arrayOopDesc::length_offset_in_bytes() - arrayOopDesc::base_offset_in_bytes(T_INT))); >>> 2676: __ mv(t0, 52); >> >> what's this `52`? I see it also in `generate_aescrypt_encryptBlock`, do they mean similar things? >> Can you add some comment about it? and give a name rather than use the magic number. > > The key length can only be {11, 13, 15} * 4 = {44, 52, 60}. I notice that x86 and aarch64 use 52 directly. I think adding some more comments will be enough? Can you add some comments in other existing code with magic 52 if they mean the same thing? Thanks! >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2748: >> >>> 2746: }; >>> 2747: >>> 2748: __ vsetivli(x0, 4, Assembler::e32, Assembler::m1); >> >> A general question, can we make it bigger than `4`, or even `m2`? > > This is a good question! I spent a fairly long time on it earlier. > Initially, I tried m2 and m4.
For m4, I had already developed a version that passed all the tests (which take a really long time to run), and it seemed faster. However, since the Java API supports encrypting or decrypting incomplete blocks, it is difficult to ensure that the time for the counter increment is consistent under all circumstances, which may pose a security risk (thanks to Andrew for the reminder). Additionally, the Java API allows the counter to grow up to 128 bits, and RV does not currently have a very suitable vector 128-bit add. Using other element types such as 64-bit requires handling the overflow issue, and using a setting higher than m1 makes it even harder to ensure a consistent time for each counter increment. Based on these considerations, I selected m1. Thank you for the explanation! >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2813: >> >>> 2811: __ bind(L_exit); >>> 2812: __ sw(used, Address(used_ptr)); >>> 2813: __ mv(x10, input_len); >> >> is this mv necessary? > > It's the return value, saved to x10. It seems necessary, judging by aarch64 and x86: aarch64 uses r0 to hold it and x86 uses rax. There is a `mv` before the exit of `generate_counterMode_AESCrypt`; is this one still necessary?
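As a side note on the magic `52` discussed earlier in this message, the quoted arithmetic `{11, 13, 15} * 4 = {44, 52, 60}` can be spelled out in a short Java sketch. The class and method names are hypothetical; the round counts (10, 12 and 14 for 128-, 192- and 256-bit keys) come from the AES standard, and the assumption here is that the `key` array in the stub holds the expanded key as 32-bit ints.

```java
// Hypothetical helper, not part of the JDK: length of the expanded AES key in ints.
final class AesExpandedKeyLength {
    static int expandedKeyInts(int keyBits) {
        if (keyBits != 128 && keyBits != 192 && keyBits != 256) {
            throw new IllegalArgumentException("unsupported AES key size: " + keyBits);
        }
        int rounds = keyBits / 32 + 6; // 10, 12 or 14 rounds
        int roundKeys = rounds + 1;    // one extra round key for the initial AddRoundKey
        return roundKeys * 4;          // each round key is 128 bits = 4 ints
    }

    public static void main(String[] args) {
        for (int keyBits : new int[] {128, 192, 256}) {
            System.out.println(keyBits + "-bit key -> " + expandedKeyInts(keyBits) + " ints");
        }
    }
}
```

Under that assumption, 52 is the array length for a 192-bit key, so a single comparison against it is enough to tell the three key sizes apart (below, equal, above).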
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2541293550 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2541301228 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2541300426 From rcastanedalo at openjdk.org Wed Nov 19 10:01:40 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 19 Nov 2025 10:01:40 GMT Subject: RFR: 8371419: IGV: Add view to visualise dominator tree and dominator information In-Reply-To: <6Vojoez34k5CHSLTQ-sSxRERAHEraT-OV9epmtS1s2E=.462bea29-750b-455e-a20e-4a223a601374@github.com> References: <6Vojoez34k5CHSLTQ-sSxRERAHEraT-OV9epmtS1s2E=.462bea29-750b-455e-a20e-4a223a601374@github.com> Message-ID: On Thu, 13 Nov 2025 11:59:20 GMT, Damon Fenacci wrote: > This change introduces a dominator tree view in IGV's CFG panel, enabling users to toggle between the control flow graph and the dominator tree. This makes dominator relationships easier to inspect than the current stdout-based output (`-XX:+PrintDominators`). > > ## Motivation > * Today, dominator information is difficult to access (e.g. via `-XX:+PrintDominators`, which is hard to read and correlate with the graph). > * IGV already computes dominators for some phases but does not visualize them. > * Comparing dominator trees across graphs/phases was not supported. > > ## What's New > 1. Toggle in the CFG view (toolbar button) to switch between: > * Control Flow Graph (CFG) > * Dominator Tree > 2. Dominator edge coloring to indicate provenance: > * Blue: dominator info provided by C2 (from GCM phase onward for now, a follow-up RFE will handle loop optimization dominator information) > * Red: dominator info computed by IGV (pre-GCM) > 3. Graph comparison enhancements: > * Compare dominator trees between graphs (new) > * Compare CFG differences between graphs (previously missing) > 4.
Node annotations: > * `idom`: immediate dominator > * `dom_depth`: dominator depth > * `block`: numeric block ID for all nodes in a block > > The resulting main view looks like this: > Screenshot 2025-11-13 at 15 04 12 > > ## Testing > * Tier 1-3 > * Manual testing in IGV Thank you for this work Damon, this looks very useful! I have a few high-level comments: - I agree with @dlunde's comment, as a user I think the dominator tree should be a separate view and not a "mode" of the CFG view. If you do that, please do not forget to extend the combo box in the Options window with the option to select the dominator tree view by default. - Would it be possible to avoid dumping dominator information as node properties, to reduce the size of the graph dump? The block property information that you already dump should be enough for your purposes, no? If you want to show dominator information as node properties, you can instead propagate the information from blocks to their nodes in `ServerCompilerPreProcessor::preProcess()`, similarly to how it is done for liveness information. - I like the idea of distinguishing visually when control-flow information originates from HotSpot and when it is approximated by IGV, currently we just rely on the user implicitly knowing this, which is confusing and error-prone. However, there is an issue with your proposal: once the graph is saved into a file (from IGV) the information is lost, and when the graph is re-opened all dominator trees are shown as originating from HotSpot (blue edges). If we want to do this, I think we need to explicitly reflect in the serialized XML format whether control-flow information is approximated or not. Further, the representation of HotSpot/IGV origin should be consistent between the CFG and dominator tree views. 
In short, I think this is a great and much-needed IGV feature, but one that would require substantial work to get right, so my suggestion would be leaving it out of the scope of this RFE and creating a separate RFE just for it. What do you think? ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28293#pullrequestreview-3481830084 From dlunden at openjdk.org Wed Nov 19 10:01:41 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 19 Nov 2025 10:01:41 GMT Subject: RFR: 8371419: IGV: Add view to visualise dominator tree and dominator information In-Reply-To: <6Vojoez34k5CHSLTQ-sSxRERAHEraT-OV9epmtS1s2E=.462bea29-750b-455e-a20e-4a223a601374@github.com> References: <6Vojoez34k5CHSLTQ-sSxRERAHEraT-OV9epmtS1s2E=.462bea29-750b-455e-a20e-4a223a601374@github.com> Message-ID: On Thu, 13 Nov 2025 11:59:20 GMT, Damon Fenacci wrote: > This change introduces a dominator tree view in IGV's CFG panel, enabling users to toggle between the control flow graph and the dominator tree. This makes dominator relationships easier to inspect than the current stdout-based output (`-XX:+PrintDominators`). > > ## Motivation > * Today, dominator information is difficult to access (e.g. via `-XX:+PrintDominators`, which is hard to read and correlate with the graph). > * IGV already computes dominators for some phases but does not visualize them. > * Comparing dominator trees across graphs/phases was not supported. > > ## What's New > 1. Toggle in the CFG view (toolbar button (image)) to switch between: > * Control Flow Graph (CFG) > * Dominator Tree > 2. Dominator edge coloring to indicate provenance: > * Blue: dominator info provided by C2 (from GCM phase onward for now, a follow-up RFE will handle loop optimization dominator information) > * Red: dominator info computed by IGV (pre-GCM) > 3. 
Graph comparison enhancements: > * Compare dominator trees between graphs (new) > * Compare CFG differences between graphs (previously missing) > 4. Node annotations: > * `idom`: immediate dominator > * `dom_depth`: dominator depth > * `block`: numeric block ID for all nodes in a block > > The resulting main view looks like this: > Screenshot 2025-11-13 at 15 04 12 > > ## Testing > * Tier 1-3 > * Manual testing in IGV Very nice @dafedafe, could have saved me a lot of time over many past issues! Quick comment: I think that the dominator tree view should be a separate "view" (just right of the CFG view button) instead of a "mode" as you suggest. (I'm not sure about the exact IGV terminology). ------------- PR Comment: https://git.openjdk.org/jdk/pull/28293#issuecomment-3542267772 From mli at openjdk.org Wed Nov 19 09:59:30 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 19 Nov 2025 09:59:30 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: <5HbBb-mjtZWqWTu-HQe7KrRyHG5z-UK4rbVhMzLv4bw=.b1b7e986-dbcf-4ab0-86b4-513f3f1f91ae@github.com> Message-ID: On Wed, 19 Nov 2025 04:56:10 GMT, Anjian Wen wrote: > > Some more comments and questions. > > Thanks for the careful reviews! I will check the comments and reply one by one later Thanks! Overall looks good, I'll have another by this weekend. Thanks for your patience! >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2670: >> >>> 2668: __ enter(); >>> 2669: >>> 2670: Label L_EXIT; >> >> Suggestion: >> >> Label L_exit; > > I try to make it different from the L_exit in counterMode_AESCrypt function, should I change this to L_exit2 or L_exit_main? The labels are in different method, should be fine with same name? I'm not quite sure. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25281#issuecomment-3551824410 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2541309073 From epeter at openjdk.org Wed Nov 19 10:15:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Nov 2025 10:15:44 GMT Subject: RFR: 8371674: C2 fails with Missed optimization opportunity in PhaseIterGVN for MoveL2D [v2] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 08:31:05 GMT, Benoît Maillard wrote: >> This PR addresses yet another missed optimization in `PhaseIterGVN`. The way this optimization is triggered is a bit different this time though, and the notification is missing in `Node::has_special_unique_user`. >> >> ## Analysis >> >> The affected optimization is the transformation of `MoveX2Y (LoadX mem)` into `LoadY mem`. This is implemented in `MoveNode::Ideal`. The optimization is as follows: >> >> ```c++ >> // Fold reinterpret cast into memory operation: >> // MoveX2Y (LoadX mem) => LoadY mem >> LoadNode* ld = in(1)->isa_Load(); >> if (ld != nullptr && (ld->outcnt() == 1)) { // replace only >> const Type* rt = bottom_type(); >> if (ld->has_reinterpret_variant(rt)) { >> if (phase->C->post_loop_opts_phase()) { >> return ld->convert_to_reinterpret_load(*phase, rt); >> } else { >> // attempt the transformation once loop opts are over >> phase->C->record_for_post_loop_opts_igvn(this); >> } >> } >> } >> ``` >> >> The optimization is triggered only if the input is a `LoadNode` and the `MoveNode` is its only user. This is a relatively unusual pattern. >> >> The bug was found by the fuzzer. At some point during IGVN, we have the following subgraph: >> >> >> CountedLoop LoadL >> \ / \ >> Phi MoveL2D >> >> In `RegionNode::Ideal`, we end up calling `set_req_X` on the `Phi` node to delete the edge from the `Phi` node to `LoadL`. As a result, the `LoadL` node only has one user left, and the `MoveNode::Ideal` gets triggered at the next verification pass. 
>> >> ## Proposed Solution >> >> Add this particular case to `Node::has_special_unique_user`, which gets called by `Node::set_req_X`. >> >> ## Summary of changes >> >> This PR brings the following changes: >> - Detect the optimization pattern in `Node::has_special_unique_user`. >> - Add new test `TestMissingOptMoveX2YLoadX.java`, initially obtained from the fuzzer and then heavily reduced, both with the usual tools and manually. I tried to get a reproducer for each of the `Move` nodes, but I was only able to get one for `MoveL2D`. >> >> ### Testing >> >> - [x] https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371674 >> - [x] tier1-4, plus some internal testing >> >> Thank you for reviewing! > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Rename test and add comment Looks good to me, thanks for working on this :) I've been wondering how we could well test and reproduce all these issues in the past. One idea was to have some sort of special `OpaqueDelayNode` that would fold away in a very specific phase, or maybe at a random time. For example, during post-loop-opts, and then it would exactly trigger your condition here. That would allow us to even have IR rules, and make sure the fix really keeps on working. I had once filed this: [JDK-8357805](https://bugs.openjdk.org/browse/JDK-8357805). ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28290#pullrequestreview-3481884291 From aseoane at openjdk.org Wed Nov 19 10:45:59 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 19 Nov 2025 10:45:59 GMT Subject: RFR: 8213762: Deprecate Xmaxjitcodesize In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 14:57:30 GMT, Anton Seoane Ampudia wrote: > This PR deprecates the `Xmaxjitcodesize` flag in JDK 26. 
Please see the CSR for specific details on why this flag is being deprecated and workarounds for users interested in keeping similar behaviour in the future. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28297#issuecomment-3552010483 From duke at openjdk.org Wed Nov 19 10:46:00 2025 From: duke at openjdk.org (duke) Date: Wed, 19 Nov 2025 10:46:00 GMT Subject: RFR: 8213762: Deprecate Xmaxjitcodesize In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 14:57:30 GMT, Anton Seoane Ampudia wrote: > This PR deprecates the `Xmaxjitcodesize` flag in JDK 26. Please see the CSR for specific details on why this flag is being deprecated and workarounds for users interested in keeping similar behaviour in the future. @anton-seoane Your change (at version 07c96057a2a77ee3e59c93cf47b102ebc0033464) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28297#issuecomment-3552016511 From epeter at openjdk.org Wed Nov 19 10:59:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Nov 2025 10:59:42 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v2] In-Reply-To: References: Message-ID: <593ZadAybr1t8JXOULxOVn9l39MkSmTevA84NbZT-VU=.25801233-004b-4e14-a688-0d883efb0d7a@github.com> On Wed, 19 Nov 2025 10:31:53 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bug number > > src/hotspot/share/opto/cfgnode.cpp line 1360: > >> 1358: // same (union of input types), but the new `_type` has now no speculative type, the result of `t->filter_speculative(_type)` >> 1359: // has the speculative type of `t` (if it's not removed because e.g. the resulting type is exact and non null) and not empty >> 1360: // (like the previously returned type). In such a case, doing the filtering one time more allows to reach a fixpoint. 
> >> From that `ft` has empty speculative type > > I'm not very familiar with speculative types. Does "empty speculative" == TOP speculative type? Or rather "no speculative type", which essentially means it is BOTTOM type? > > Because then if we filter x with TOP we should still get TOP, but if we filter with BOTTOM we get x. And that would fit better with your statement later on: > >> but the new `_type` has now no speculative type, the result of `t->filter_speculative(_type)` has the speculative type of `t` > > Can you clarify please for my understanding? :) And does `cleanup_speculative` happen during `t->filter_speculative(_type)`, right? > src/hotspot/share/opto/cfgnode.cpp line 1365: > >> 1363: ft = t->filter_speculative(first_ft); >> 1364: #ifdef ASSERT >> 1365: // The following logic has been moved into TypeOopPtr::filter. > > What does this mean? What logic are you referring to? The one here? But then you say it was moved to TypeOopPtr::filter ...but it is still here? Can you clarify? Or are you saying it is moved "from" rather than "into", i.e. this is some sort of code duplication? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2541436857 PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2541500034 From epeter at openjdk.org Wed Nov 19 10:59:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Nov 2025 10:59:43 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v2] In-Reply-To: <593ZadAybr1t8JXOULxOVn9l39MkSmTevA84NbZT-VU=.25801233-004b-4e14-a688-0d883efb0d7a@github.com> References: <593ZadAybr1t8JXOULxOVn9l39MkSmTevA84NbZT-VU=.25801233-004b-4e14-a688-0d883efb0d7a@github.com> Message-ID: <5I961dECB28VAaa0iBILYoDQhlXt7CKzQlRVUabEwUc=.a06bd578-a810-4529-85c4-4bf1cde26af1@github.com> On Wed, 19 Nov 2025 10:34:06 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/cfgnode.cpp line 1360: >> >>> 1358: // same (union of input types), but the new `_type` has now no speculative type, the result of `t->filter_speculative(_type)` >>> 1359: // has the speculative type of `t` (if it's not removed because e.g. the resulting type is exact and non null) and not empty >>> 1360: // (like the previously returned type). In such a case, doing the filtering one time more allows to reach a fixpoint. >> >>> From that `ft` has empty speculative type >> >> I'm not very familiar with speculative types. Does "empty speculative" == TOP speculative type? Or rather "no speculative type", which essencially means it is BOTTOM type? >> >> Because then if we filter x with TOP we should still get TOP, but if we filter with BOTTOM we get x. And that would fit better with your statement later on: >> >>> but the new `_type` has now no speculative type, the result of `t->filter_speculative(_type)` has the speculative type of `t` >> >> Can you clarify please for my understanding? :) > > And does `cleanup_speculative` happen during `t->filter_speculative(_type)`, right? I wonder if a minimal example would help here. 
I'm thinking something like this: In rare cases, `_type` and `t` have incompatible opinion on speculative type, resulting into a too small intersection t: Object (A) _type: Object (B) We filter them. Since A and B have no intersection, the speculative type is removed. This means the speculative type is implicitly "Object", and not TOP, as the intersection of A and B would suggest. ft = t->filter_speculative(_type) = Object After PhiNode::Value, we assign _type = ft. During verification, we run PhiNode::Value again, but this time: t: Object (A) // same as above _type: Object // ft from above And now we get: ft = t->filter_speculative(_type) = Object (A) ... argue about fixpoint next ... It is a first proposal, and a little verbose... maybe you could find something more concise. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2541467689 From epeter at openjdk.org Wed Nov 19 10:59:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Nov 2025 10:59:41 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v2] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 08:55:07 GMT, Marc Chevalier wrote: >> This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. >> >> # Analysis >> ## Obervationally >> ### IGVN >> During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. 
Their types are: >> >> in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) >> in(2): null >> >> We compute the join (HS' meet): >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 >> >> t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> But the current `_type` (of the `PhiNode` as a `TypeNode`) is >> >> _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) >> >> We filter `t` by `_type` >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 >> and we get >> >> ft=java/lang/Object * >> >> which is what we return. After the end of `Value`, the returned becomes the new `PhiNode`'s `_type`. >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 >> and >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 >> >> >> ### Verification >> On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time >> >> _type=java/lang/Object * >> >> and so after filtering `t` by (new) `_type` and we get >> >> ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> which is retuned. Verification gets angry because the new `ft` is not the same as the previous one. >> >> ## But why?! >> ### Details on type computation >> In short, we are doing >> >> t = typeof(in(1)) / typeof(in(2)) >> ft = t /\ _type (* IGVN *) >> ft' = t /\ ft (* Verification *) >> >> and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verfication that will just "try again and see if something change". 
>> >> To me, the surprising fact was that the intersection >> >> java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> /\ >> _type=java/lang/Objec... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Fix bug number Thanks for the update. I now had a closer look, and got a little confused in multiple places. I'm not sure I have the right mental model yet. I was also a little surprised by the fact that we do have inconsistent profiling. But maybe that should not be surprising? That's where a closer look at the reproducer would help me follow, and annotations in the test itself would be fantastic ;) I have not yet looked at all the debug code in-depth, as I'd like to first understand the rest. src/hotspot/share/opto/cfgnode.cpp line 1357: > 1355: // In rare cases, `_type` and `t` have incompatible opinion on speculative type, resulting into a too small intersection > 1356: // (such as AnyNull), which is removed in cleanup_speculative. From that `ft` has empty speculative type. After the end > 1357: // of the current `Value` call, `ft` (that is returned) is becoming `_type`. If verification happens then, `t` would be the Suggestion: // of the current `Value` call, `_type` is assigned the value of `ft`. If verification happens then, `t` would be the "becoming" is a bit ambiguous here, I first thought it meant `ft = _type`, then realized one could also read it as `_type = ft`. Maybe you have an even better way to express it. src/hotspot/share/opto/cfgnode.cpp line 1360: > 1358: // same (union of input types), but the new `_type` has now no speculative type, the result of `t->filter_speculative(_type)` > 1359: // has the speculative type of `t` (if it's not removed because e.g. the resulting type is exact and non null) and not empty > 1360: // (like the previously returned type). In such a case, doing the filtering one time more allows to reach a fixpoint. 
> From that `ft` has empty speculative type I'm not very familiar with speculative types. Does "empty speculative" == TOP speculative type? Or rather "no speculative type", which essentially means it is BOTTOM type? Because then if we filter x with TOP we should still get TOP, but if we filter with BOTTOM we get x. And that would fit better with your statement later on: > but the new `_type` has now no speculative type, the result of `t->filter_speculative(_type)` has the speculative type of `t` Can you clarify please for my understanding? :) src/hotspot/share/opto/cfgnode.cpp line 1365: > 1363: ft = t->filter_speculative(first_ft); > 1364: #ifdef ASSERT > 1365: // The following logic has been moved into TypeOopPtr::filter. What does this mean? What logic are you referring to? The one here? But then you say it was moved to TypeOopPtr::filter ...but it is still here? Can you clarify? src/hotspot/share/opto/cfgnode.cpp line 1367: > 1365: // The following logic has been moved into TypeOopPtr::filter. > 1366: const Type* jt = t->join_speculative(first_ft); > 1367: if (jt->empty()) { // Emptied out??? Suggestion: if (jt->empty()) { Comment seems redundant. src/hotspot/share/opto/cfgnode.cpp line 1375: > 1373: else { > 1374: > 1375: if (jt != ft && jt->base() == ft->base()) { Formatting looks a bit funny here. 
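The non-idempotent filtering discussed in this thread may be easier to follow with a toy model. The sketch below is not HotSpot code: `ToyType`, `toy_filter` and `toy_filter_to_fixpoint` are hypothetical names introduced only for illustration. It assumes that both operands share one declared class and that two different speculative classes always clash; in C2 the speculative part is dropped whenever the intersection is too small to survive `cleanup_speculative`, which is a broader condition than "no intersection".

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for a C2 oop type: a declared class plus an optional
// speculative (profile-based) exact class. This is not HotSpot's Type class.
struct ToyType {
    std::string klass;
    std::optional<std::string> speculative;

    bool operator==(const ToyType& other) const {
        return klass == other.klass && speculative == other.speculative;
    }
};

// Toy analogue of t->filter_speculative(other), assuming both operands have
// the same declared class. "No speculative type" acts as "no extra guess",
// so the other operand's speculative part is kept unchanged.
ToyType toy_filter(const ToyType& t, const ToyType& other) {
    ToyType result{t.klass, std::nullopt};
    if (!t.speculative) {
        result.speculative = other.speculative;
    } else if (!other.speculative || *t.speculative == *other.speculative) {
        result.speculative = t.speculative;
    }
    // Otherwise the two speculative classes clash and the result keeps none.
    return result;
}

// Toy analogue of the proposed fix: filter once more against the first
// result, so that a later re-run with the updated type changes nothing.
ToyType toy_filter_to_fixpoint(const ToyType& t, const ToyType& type) {
    const ToyType first = toy_filter(t, type);
    return toy_filter(t, first);
}
```

With `t = Object (C2)` and `_type = Object (C1)`, the first `toy_filter` call drops the speculative part, and a second call against that result brings `C2` back, which is the mismatch that verification reports. Under the stated assumptions, `toy_filter_to_fixpoint` returns a value that a further `toy_filter(t, ...)` leaves unchanged.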
------------- PR Review: https://git.openjdk.org/jdk/pull/28331#pullrequestreview-3481920110 PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2541392783 PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2541429817 PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2541495880 PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2541500946 PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2541492662 From epeter at openjdk.org Wed Nov 19 11:02:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Nov 2025 11:02:42 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes [v4] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 19:04:56 GMT, Aleksey Shipilev wrote: >> I started this as investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test. >> >> At this point I believe we found that PhaseCCP does not reach the fix point for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to `LoadN` in question. Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. >> >> It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. >> >> Testing shows there are no ill effects on correctness doing this. 
But I would appreciate someone more savvy in this code to sanity check all of this. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails >> - [x] Linux x86_64 server fastdebug, `all` tests pass >> - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into JDK-8371581-ccp-spooky-nodes > - More comments > - More restrictive CmpP check > - Tighten up comments and signatures > - Do Value() once > - Fix It looks reasonable to me, thanks for working on this! We have some testing running right now, I'll approve once that passes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28288#issuecomment-3552077753 From mchevalier at openjdk.org Wed Nov 19 11:04:48 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 19 Nov 2025 11:04:48 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v2] In-Reply-To: <593ZadAybr1t8JXOULxOVn9l39MkSmTevA84NbZT-VU=.25801233-004b-4e14-a688-0d883efb0d7a@github.com> References: <593ZadAybr1t8JXOULxOVn9l39MkSmTevA84NbZT-VU=.25801233-004b-4e14-a688-0d883efb0d7a@github.com> Message-ID: On Wed, 19 Nov 2025 10:54:38 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/cfgnode.cpp line 1365: >> >>> 1363: ft = t->filter_speculative(first_ft); >>> 1364: #ifdef ASSERT >>> 1365: // The following logic has been moved into TypeOopPtr::filter. >> >> Why does this mean? What logic are you referring to? The one here? But then you say it was moved to TypeOopPtr::filter ...but it is still here? Can you clarify? > > Or are you saying it is moved "from" rather than "into", i.e. this is some sort of code duplication? 
This block is a duplication of the existing one, to go after the second filtering as @dafedafe noticed. I have no idea what is the comment for. As I told @dafedafe, if it stays, I'll factor it out, but it is premature cleanup at this point. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2541524849 From epeter at openjdk.org Wed Nov 19 11:04:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Nov 2025 11:04:49 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v2] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 08:55:07 GMT, Marc Chevalier wrote: >> This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. >> >> # Analysis >> ## Obervationally >> ### IGVN >> During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. Their types are: >> >> in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) >> in(2): null >> >> We compute the join (HS' meet): >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 >> >> t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> But the current `_type` (of the `PhiNode` as a `TypeNode`) is >> >> _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) >> >> We filter `t` by `_type` >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 >> and we get >> >> ft=java/lang/Object * >> >> which is what we return. After the end of `Value`, the returned becomes the new `PhiNode`'s `_type`. 
>> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 >> and >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 >> >> >> ### Verification >> On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time >> >> _type=java/lang/Object * >> >> and so after filtering `t` by (new) `_type` and we get >> >> ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> which is retuned. Verification gets angry because the new `ft` is not the same as the previous one. >> >> ## But why?! >> ### Details on type computation >> In short, we are doing >> >> t = typeof(in(1)) / typeof(in(2)) >> ft = t /\ _type (* IGVN *) >> ft' = t /\ ft (* Verification *) >> >> and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verfication that will just "try again and see if something change". >> >> To me, the surprising fact was that the intersection >> >> java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> /\ >> _type=java/lang/Objec... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Fix bug number test/hotspot/jtreg/compiler/igvn/ClashingSpeculativeTypePhiNode.java line 36: > 34: * -XX:CompileCommand=quiet > 35: * -XX:TypeProfileLevel=222 > 36: * -XX:+AlwaysIncrementalInline Suggestion: * -XX:+AlwaysIncrementalInline GitHub Actions is giving us this: Error: VM option 'AlwaysIncrementalInline' is develop and is available only in debug version of VM. Improperly specified VM option 'AlwaysIncrementalInline' Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit. 
Probably this would help: `-XX:+IgnoreUnrecognizedVMOptions` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2541516825 From mchevalier at openjdk.org Wed Nov 19 11:09:56 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 19 Nov 2025 11:09:56 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v2] In-Reply-To: <5I961dECB28VAaa0iBILYoDQhlXt7CKzQlRVUabEwUc=.a06bd578-a810-4529-85c4-4bf1cde26af1@github.com> References: <593ZadAybr1t8JXOULxOVn9l39MkSmTevA84NbZT-VU=.25801233-004b-4e14-a688-0d883efb0d7a@github.com> <5I961dECB28VAaa0iBILYoDQhlXt7CKzQlRVUabEwUc=.a06bd578-a810-4529-85c4-4bf1cde26af1@github.com> Message-ID: On Wed, 19 Nov 2025 10:44:09 GMT, Emanuel Peter wrote: >> And does `cleanup_speculative` happen during `t->filter_speculative(_type)`, right? > > I wonder if a minimal example would help here. I'm thinking something like this: > > In rare cases, `_type` and `t` have incompatible opinion on speculative type, resulting into a too small intersection > t: Object (A) > _type: Object (B) > We filter them. Since A and B have no intersection, the speculative type is removed. This means the speculative type is implicitly "Object", and not TOP, as the intersection of A and B would suggest. > ft = t->filter_speculative(_type) = Object > > After PhiNode::Value, we assign _type = ft. During verification, we run PhiNode::Value again, but this time: > t: Object (A) // same as above > _type: Object // ft from above > And now we get: > ft = t->filter_speculative(_type) = Object (A) > > ... argue about fixpoint next ... > > It is a first proposal, and a little verbose... maybe you could find something more concise. If we have no speculative type, it means we don't have a guess about what the type could be (more precisely than the actual type). You can say it's bottom, or that it's the same as the non-speculative type. 
And indeed, if you do `t->filter_speculative(_type)` when `_type` has no speculative type, the result has the same speculative type as `t` except when this one doesn't already pass `cleanup_speculative`, but then, it's still a fixpoint. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2541539790 From mchevalier at openjdk.org Wed Nov 19 11:20:26 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 19 Nov 2025 11:20:26 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v2] In-Reply-To: References: <593ZadAybr1t8JXOULxOVn9l39MkSmTevA84NbZT-VU=.25801233-004b-4e14-a688-0d883efb0d7a@github.com> <5I961dECB28VAaa0iBILYoDQhlXt7CKzQlRVUabEwUc=.a06bd578-a810-4529-85c4-4bf1cde26af1@github.com> Message-ID: On Wed, 19 Nov 2025 11:07:38 GMT, Marc Chevalier wrote: >> I wonder if a minimal example would help here. I'm thinking something like this: >> >> In rare cases, `_type` and `t` have incompatible opinion on speculative type, resulting into a too small intersection >> t: Object (A) >> _type: Object (B) >> We filter them. Since A and B have no intersection, the speculative type is removed. This means the speculative type is implicitly "Object", and not TOP, as the intersection of A and B would suggest. >> ft = t->filter_speculative(_type) = Object >> >> After PhiNode::Value, we assign _type = ft. During verification, we run PhiNode::Value again, but this time: >> t: Object (A) // same as above >> _type: Object // ft from above >> And now we get: >> ft = t->filter_speculative(_type) = Object (A) >> >> ... argue about fixpoint next ... >> >> It is a first proposal, and a little verbose... maybe you could find something more concise. > > If we have no speculative type, it means we don't have a guess about what the type could be (more precisely than the actual type). You can say it's bottom, or that it's the same as the non-speculative type. 
And indeed, if you do `t->filter_speculative(_type)` when `_type` has no speculative type, the result has the same speculative type as `t` except when this one doesn't already pass `cleanup_speculative`, but then, it's still a fixpoint. I don't see much value in your "minimal example". Isn't it just naming "A" and "B" instead of saying "speculative type of `t`" and "speculative type of `_type`"? I'm not fan of that: that just introduce more symbols, more names that are not in the code. Also I'm very careful with examples, they are often misleading by giving a feeling that it's exactly how it happens. For instance `Since A and B have no intersection` is not always true, it just needs the intersection to be above centerline, a small enough intersection. I might rephrase later, but it seems you got what is happening and I first need a decision on the solution before spending time in cleanup that might be entirely erased. It seems that the fixpoint way was far from consensual. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2541562487 From mchevalier at openjdk.org Wed Nov 19 11:26:03 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 19 Nov 2025 11:26:03 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v2] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 08:55:07 GMT, Marc Chevalier wrote: >> This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. >> >> # Analysis >> ## Obervationally >> ### IGVN >> During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. 
Their types are: >> >> in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) >> in(2): null >> >> We compute the join (HS' meet): >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 >> >> t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> But the current `_type` (of the `PhiNode` as a `TypeNode`) is >> >> _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) >> >> We filter `t` by `_type` >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 >> and we get >> >> ft=java/lang/Object * >> >> which is what we return. After the end of `Value`, the returned value becomes the new `PhiNode`'s `_type`. >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 >> and >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 >> >> >> ### Verification >> On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time >> >> _type=java/lang/Object * >> >> and so, after filtering `t` by the (new) `_type`, we get >> >> ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> which is returned. Verification gets angry because the new `ft` is not the same as the previous one. >> >> ## But why?! >> ### Details on type computation >> In short, we are doing >> >> t = typeof(in(1)) \/ typeof(in(2)) >> ft = t /\ _type (* IGVN *) >> ft' = t /\ ft (* Verification *) >> >> and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again and see if something changes".
>> >> To me, the surprising fact was that the intersection >> >> java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> /\ >> _type=java/lang/Objec... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Fix bug number I think reading the code and the comments to understand the situation might not be as good as reading the description of this PR. I regret I gave a reproducer and proposed a solution. Given the very obvious lack of consensus on the Valhalla PR, it is clear that this issue might evolve radically and that the proposed solution may not be the final one. Therefore, I will not do any cleanup before agreeing on the way to go, as it might very well be erased and it would be a very poor use of everybody's time. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28331#issuecomment-3552162304 From chagedorn at openjdk.org Wed Nov 19 12:26:32 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Nov 2025 12:26:32 GMT Subject: RFR: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge Message-ID: ### Strong Connection between Template Assertion Predicate and Counted Loop In [JDK-8350579](https://bugs.openjdk.org/browse/JDK-8350579), we fixed the issue that a Template Assertion Predicate for a folded loop A could end up at another loop B. We then created an Initialized Assertion Predicate at loop B from the template of loop A and used the values from the already folded, completely unrelated loop A . As a result, we crashed with a halt because loop B violated the predicate with the wrong values. As a fix, we established a strong connection between Template Assertion Predicates and their associated loop node by adding a direct link from `OpaqueTemplateAssertionPredicate` -> `CountedLoop`. 
#### Maintaining this Property In `PhaseIdealLoop::eliminate_useless_predicates()`, we walk through all counted loops and only keep those `OpaqueTemplateAssertionPredicate` nodes that can be found from the loop heads and are actually meant for this loop (using the strong connection): https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1245-L1249 All other opaque nodes are removed. ### Additional Verification for Useless `OpaqueTemplateAssertionPredicate` Nodes As an additional verification for `OpaqueTemplateAssertionPredicate` nodes that are found to be useless in `eliminate_useless_predicates()`, we check that in this case the `CountedLoop` is really dead (otherwise, we should have found the `OpaqueTemplateAssertionPredicate` in our walks through all loops): https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1294-L1301 ### Violating the Additional Verification with `-XX:+StressDuplicateBackedge` In `PhaseIdealLoop::duplicate_loop_backedge()`, we convert a loop with a merge point into two loops which should enable us to transform the new inner loop into a counted loop. This only makes sense for a `Loop` that is not yet a counted loop. However, to stress the transformation, we can also run with `-XX:+StressDuplicateBackedge` that also transforms a counted loop into an inner and an outer loop. This is a problem when we have Template Assertion Predicates above a counted loop to be stressed: [image omitted] After duplicate backedge, the Template Assertion Predicates are now at the outer non-counted `Loop`: [image omitted] In `eliminate_useless_predicates()`, we then no longer find these Template Assertion Predicates when walking up from `275 CountedLoop`. But since the counted loop is still in the graph, the additional verification above fails when checking that a useless Template Assertion Predicate is associated with a dead counted loop - which is not the case.
### Solution The solution I propose is to clone the Template Assertion Predicates to the inner counted loop. This can be guarded with an `ifdef ASSERT` because it can only happen with `StressDuplicateBackedge` which is a develop flag. This is straightforward and solves this "opaque <-> counted loop" mismatching problem. #### Additional Changes - When working on this change, I noticed that the regression test for checking that data control dependencies are correctly updated with Template Assertion Predicates in `TestAssertionPredicates.java` was no longer working (i.e. disabling `TemplateAssertionPredicate::rewire_loop_data_dependencies()` did not crash). It only triggers when running with ZGC (i.e. it produces a crash when disabling `rewire_loop_data_dependencies()`). I added an additional jtreg block with that flag setup. - I added an `ifdef ASSERT` block for some code that is only executed for `StressDuplicateBackedge`. - I ran this patch through t1-4 + hs-precheckin-comp + hs-comp-stress, once without `-XX:+StressDuplicateBackedge` and once with. In the latter run, I found that `TestVerifyLoopOptimizationsHitsMemLimit.java` hit the memory limit. This seems expected since we create more loop nodes which results in more verification work. The test already uses quite some memory when run with `VerifyLoopOptimizations`. We now add some more on top which will reach the limit of `100M` set for the test. I propose to just disable this test with `StressDuplicateBackedge`. Note that this also fails before this patch.
Thanks, Christian ------------- Commit messages: - Exclude StressDuplicateBackedge for TestVerifyLoopOptimizationsHitsMemLimit.java - 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge Changes: https://git.openjdk.org/jdk/pull/28389/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28389&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360510 Stats: 143 lines in 4 files changed: 139 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28389/head:pull/28389 PR: https://git.openjdk.org/jdk/pull/28389 From bmaillard at openjdk.org Wed Nov 19 12:33:25 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 19 Nov 2025 12:33:25 GMT Subject: RFR: 8371674: C2 fails with Missed optimization opportunity in PhaseIterGVN for MoveL2D [v2] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 10:12:06 GMT, Emanuel Peter wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename test and add comment > > Looks good to me, thanks for working on this :) > > I've been wondering how we could well test and reproduce all these issues in the past. One idea was to have some sort of special `OpaqueDelayNode` that would fold away in a very specific phase, or maybe at a random time. For example, during post-loop-opts, and then it would exactly trigger your condition here. That would allow us to even have IR rules, and make sure the fix really keeps on working. I had once filed this: [JDK-8357805](https://bugs.openjdk.org/browse/JDK-8357805). @eme64 Thanks for the review! I think that's a great idea, I have also been thinking about this exact problem but I didn't think of intrinsified identity methods. I would be happy to discuss that at some point, I may have some ideas as well.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28290#issuecomment-3552464210 From bmaillard at openjdk.org Wed Nov 19 12:37:00 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 19 Nov 2025 12:37:00 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: <4IYxeDL_YBvHSgxT9QcMtfSUwb-0EbzwZiQ3cFHEB_M=.57d829ee-9233-4a1e-a315-9bf88ca74a79@github.com> On Wed, 19 Nov 2025 09:30:45 GMT, Emanuel Peter wrote: >>> The alternative would be to directly assert in the verify methods, but I suppose that would be a bigger code change. >> >> Yes, I also considered it. I don't really have a strong opinion, but maybe you do. Asserting directly in the verify methods would allow us to have more targeted asserts, and more accurate reports for triaging. On the other side, as you mentioned, this would be more code changes. >> >>> Hmm, I did see some cases in the verify methods that are maybe not directly "missed optimization opportunity" but some other kind of issue. Maybe we should assert directly for those, rather than returning and ending up at this assert. >> >> These are not labelled as such in the printing, but I would argue these are still missed optimization opportunities, aren't they? I mean if things are still moving when calling `Ideal`, it means that this could have been done earlier. > > Honestly, I would do the changes, and just assert in the specific methods. That also helps us with more precise stack traces. The required changes are not that large and surely not that complicated, and we may actually end up with less code over all. > > `Ideal optimization did not make progress but created new unused nodes.` > > This one could get triggered even if we don't make progress at all.
It may be that some `Ideal` optimization always generates nodes but then does not actually insert them into the graph. That could be considered wasteful, and that is really all that assert tells us. What do you think? I agree, doing the changes sounds quite reasonable and is definitely worth it if it gives us more information. > This one could get triggered even if we don't make progress at all. It may be that some Ideal optimization always generates nodes but then does not actually insert them into the graph. That could be considered wasteful, and that is really all that assert tells us. What do you think? Mmh, I never saw that assert actually getting triggered, interesting. I agree as well, in this case it's a bit different. I will do the changes, thanks for the suggestions! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2541824373 From duke at openjdk.org Wed Nov 19 12:52:37 2025 From: duke at openjdk.org (Samuel Chee) Date: Wed, 19 Nov 2025 12:52:37 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v6] In-Reply-To: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: > AtomicLong.CompareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg: > > ;; cmpxchg { > 0x0000e708d144cf60: mov x8, x2 > 0x0000e708d144cf64: casal x8, x3, [x0] > 0x0000e708d144cf68: cmp x8, x2 > ;; 0x1F1F1F1F1F1F1F1F > 0x0000e708d144cf6c: mov x8, #0x1f1f1f1f1f1f1f1f > ;; } cmpxchg > 0x0000e708d144cf70: cset x8, ne // ne = any > 0x0000e708d144cf74: dmb ish > > > According to the Oracle Java Specification, AtomicLong.CompareAndSet [1] has the same memory effects as specified by VarHandle.compareAndSet which has the following effects: [2] > >> Atomically sets the value of a variable to the >> newValue with the 
memory semantics of setVolatile if >> the variable's current value, referred to as the witness >> value, == the expectedValue, as accessed with the memory >> semantics of getVolatile. > > > > Hence the release on the store due to setVolatile only occurs if the compare is successful. Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen. > > Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant) > > This is also reflected by C2 not having a dmb for the same respective method. > > [1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) > [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...) Samuel Chee has updated the pull request incrementally with one additional commit since the last revision: Add "/*with_barrier*/" comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26000/files - new: https://git.openjdk.org/jdk/pull/26000/files/135123cb..6d9030d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26000&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26000&range=04-05 Stats: 16 lines in 1 file changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/26000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26000/head:pull/26000 PR: https://git.openjdk.org/jdk/pull/26000 From duke at openjdk.org Wed Nov 19 12:59:47 2025 From: duke at openjdk.org (Ruben) Date: Wed, 19 Nov 2025 12:59:47 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v5] In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Wed, 12 Nov 2025 14:06:56 GMT, Andrew Haley wrote: >> Samuel Chee has updated the pull request with a new target 
base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Address review comments. Refine. >> - Merge from the main branch >> - Add cmpxchg_barrier helper >> >> Change-Id: I17acf999140f0c1decb256de8291361c568a4ff8 >> - Add comment >> >> Signed-off-by: Samuel Chee >> Change-Id: I9793ed6ffdff6c044552d069af23620d178f2284 >> - Add back in dmb membar for non-LSE >> >> Change-Id: Ie64565420a1758d3191eaebed82c80584ce54ef6 >> - 8360654: AArch64: Remove redundant dmb from C1 compareAndSet >> >> Change-Id: I79a0079fc2d3d90eeb671b6ed73d963968d4fa53 > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 3471: > >> 3469: bool weak, >> 3470: Register result) { >> 3471: cmpxchg(addr, expected, new_val, size, acquire, release, weak, result, false); > > Suggestion: > > cmpxchg(addr, expected, new_val, size, acquire, release, weak, result, /*with_barrier*/false); > > Reason: avoid naked booleans at call sites. > Please do this everywhere. Thank you @theRealAph. Updated, including in `ATOMIC_OP`/`ATOMIC_XCHG` - is that advisable? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26000#discussion_r2541888020 From chagedorn at openjdk.org Wed Nov 19 13:04:22 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Nov 2025 13:04:22 GMT Subject: RFR: 8371674: C2 fails with Missed optimization opportunity in PhaseIterGVN for MoveL2D [v2] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 08:31:05 GMT, Benoît Maillard wrote: >> This PR addresses yet another missed optimization in `PhaseIterGVN`. The way this optimization is triggered is a bit different this time though, and the notification is missing in `Node::has_special_unique_user`. >> >> ## Analysis >> >> The affected optimization is the transformation of `MoveX2Y (LoadX mem)` into `LoadY mem`. This is implemented in `MoveNode::Ideal`.
The optimization is as follows: >> >> ```c++ >> // Fold reinterpret cast into memory operation: >> // MoveX2Y (LoadX mem) => LoadY mem >> LoadNode* ld = in(1)->isa_Load(); >> if (ld != nullptr && (ld->outcnt() == 1)) { // replace only >> const Type* rt = bottom_type(); >> if (ld->has_reinterpret_variant(rt)) { >> if (phase->C->post_loop_opts_phase()) { >> return ld->convert_to_reinterpret_load(*phase, rt); >> } else { >> // attempt the transformation once loop opts are over >> phase->C->record_for_post_loop_opts_igvn(this); >> } >> } >> } >> ``` >> >> The optimization is triggered only if the input is a `LoadNode` and the `MoveNode` is its only user. This is a relatively unusual pattern. >> >> The bug was found by the fuzzer. At some point during IGVN, we have the following subgraph: >> >> >> CountedLoop LoadL >> \ / \ >> Phi MoveL2D >> >> In `RegionNode::Ideal`, we end up calling `set_req_X` on the `Phi` node to delete the edge from the `Phi` node to `LoadL`. As a result, the `LoadL` node only has one user left, and the `MoveNode::Ideal` gets triggered at the next verification pass. >> >> ## Proposed Solution >> >> Add this particular case to `Node::has_special_unique_user`, which gets called by `Node::set_req_X`. >> >> ## Summary of changes >> >> This PR brings the following changes: >> - Detect the optimization pattern in `Node::has_special_unique_user`. >> - Add new test `TestMissingOptMoveX2YLoadX.java`, initially obtained from the fuzzer and then heavily reduced, both with the usual tools and manually. I tried to get a reproducer for each of the `Move` nodes, but I was only able to get one for `MoveL2D`. >> >> ### Testing >> >> - [x] https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371674 >> - [x] tier1-4, plus some internal testing >> >> Thank you for reviewing! > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Rename test and add comment Looks good to me, too!
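For readers who want to see what kind of Java source can lead to the `MoveX2Y (LoadX mem)` shape discussed in this thread, here is a minimal sketch. It assumes that C2 represents `Double.longBitsToDouble` as a `MoveL2D` node, so that reinterpreting a value loaded from a `long[]` gives that node a `LoadL` input. The class and method names are hypothetical, and this is not the reproducer from the PR: that one came from the fuzzer and additionally needs a `Phi` user of the load that is removed later during IGVN.

```java
public class MoveL2DSketch {
    // Hypothetical illustration only: a long is loaded from memory and its
    // bits are reinterpreted as a double without any numeric conversion.
    static double reinterpret(long[] bits, int i) {
        return Double.longBitsToDouble(bits[i]);
    }

    // A loop around the reinterpreting load, so that the JIT has a reason
    // to compile the method once it becomes hot.
    static double sum(long[] bits) {
        double s = 0.0;
        for (int i = 0; i < bits.length; i++) {
            s += reinterpret(bits, i);
        }
        return s;
    }

    public static void main(String[] args) {
        long[] bits = {
            Double.doubleToRawLongBits(1.5),
            Double.doubleToRawLongBits(2.25)
        };
        System.out.println(sum(bits)); // prints 3.75
    }
}
```

Whether the load ends up with the `MoveL2D` as its only user depends on the surrounding graph, which is why the thread describes the pattern as relatively unusual.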
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28290#pullrequestreview-3482586988 From aseoane at openjdk.org Wed Nov 19 13:04:26 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 19 Nov 2025 13:04:26 GMT Subject: Integrated: 8213762: Deprecate Xmaxjitcodesize In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 14:57:30 GMT, Anton Seoane Ampudia wrote: > This PR deprecates the `Xmaxjitcodesize` flag in JDK 26. Please see the CSR for specific details on why this flag is being deprecated and workarounds for users interested in keeping similar behaviour in the future. This pull request has now been integrated. Changeset: 0bff5f3d Author: Anton Seoane Ampudia Committer: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/0bff5f3dbe69ab2a59db771af1020b04c0132954 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod 8213762: Deprecate Xmaxjitcodesize Reviewed-by: kvn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/28297 From aph at openjdk.org Wed Nov 19 13:15:19 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 19 Nov 2025 13:15:19 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v5] In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Wed, 12 Nov 2025 14:16:41 GMT, Andrew Haley wrote: >> Samuel Chee has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Address review comments. Refine. 
>> - Merge from the main branch >> - Add cmpxchg_barrier helper >> >> Change-Id: I17acf999140f0c1decb256de8291361c568a4ff8 >> - Add comment >> >> Signed-off-by: Samuel Chee >> Change-Id: I9793ed6ffdff6c044552d069af23620d178f2284 >> - Add back in dmb membar for non-LSE >> >> Change-Id: Ie64565420a1758d3191eaebed82c80584ce54ef6 >> - 8360654: AArch64: Remove redundant dmb from C1 compareAndSet >> >> Change-Id: I79a0079fc2d3d90eeb671b6ed73d963968d4fa53 > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 3465: > >> 3463: } >> 3464: >> 3465: void MacroAssembler::cmpxchg(Register addr, Register expected, > > Why do we need all of these non-barrier versions? Ping? I don't know what this is for. C1 will only add barriers for pre-LSE systems, and nothing else cares. You've got several new methods that no one needs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26000#discussion_r2541963608 From mchevalier at openjdk.org Wed Nov 19 13:16:05 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 19 Nov 2025 13:16:05 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v3] In-Reply-To: References: Message-ID: > This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. > > The proposed solution seems somewhat controversial. See https://github.com/openjdk/valhalla/pull/1717 for some previous discussion. Before polishing the PR, I'd like to reach an agreement on the way to go. > > # Analysis > ## Obervationally > ### IGVN > During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. 
Their types are: > > in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) > in(2): null > > We compute the join (HS' meet): > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 > > t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > > But the current `_type` (of the `PhiNode` as a `TypeNode`) is > > _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) > > We filter `t` by `_type` > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 > and we get > > ft=java/lang/Object * > > which is what we return. After the end of `Value`, the returned value becomes the new `PhiNode`'s `_type`. > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 > and > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 > > > ### Verification > On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time > > _type=java/lang/Object * > > and so, after filtering `t` by the (new) `_type`, we get > > ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > > which is returned. Verification gets angry because the new `ft` is not the same as the previous one. > > ## But why?! > ### Details on type computation > In short, we are doing > > t = typeof(in(1)) \/ typeof(in(2)) > ft = t /\ _type (* IGVN *) > ft' = t /\ ft (* Verification *) > > and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again and see if something changes". > > To me, the surprising fact was that the intersection > > java/lang/Object * (...
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: IgnoreUnrecognizedVMOptions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28331/files - new: https://git.openjdk.org/jdk/pull/28331/files/7ac02cbf..e9f3ac98 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28331&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28331&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28331.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28331/head:pull/28331 PR: https://git.openjdk.org/jdk/pull/28331 From mhaessig at openjdk.org Wed Nov 19 13:19:48 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 19 Nov 2025 13:19:48 GMT Subject: RFR: 8372046: compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java fails IR verification [v2] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 03:14:52 GMT, Fei Yang wrote: >> Hi, please consider this test-only change fixing an IR test failure. >> >> This IR test fails on platforms without native support for `Float16`. The reason is that method `Float::floatToFloat16` is inlined into method `TestSubNodeFloatDoubleNegation.testHalfFloat`, which causes unexpected IR graph. One way to fix this would be disabling inlining of methods from the `java.lang.Float` class. >> >> After this change, we are doing `CallStaticJava` to convert between `Float16` and `Float` on these platforms: >> >> ...... >> 259 CallStaticJava === 5 6 7 8 1 (22 1 669 1 10 1 22 ) [[ 260 261 262 264 ]] # Static java.lang.Float::float16ToFloat float ( int ) Float16::floatValue @ bci:4 (line 876) Float16::subtract @ bci:1 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 69) !jvms: Float16::floatValue @ bci:4 (line 876) Float16::subtract @ bci:1 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 69) >> >> ......
>> >> 562 CallStaticJava === 553 507 538 8 1 (526 1 1 1 1 559 559 ) [[ 563 564 565 567 ]] # Static java.lang.Float::floatToFloat16 short ( float ) Float16::valueOf @ bci:5 (line 361) Float16::subtract @ bci:9 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 67) !jvms: Float16::valueOf @ bci:5 (line 361) Float16::subtract @ bci:9 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 67) >> >> ...... >> >> >> Verified with fastdebug build on aarch64, x86_64 and riscv64 platforms. > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Review Changes look good. I reran the CI and everything passed. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28364#pullrequestreview-3482674655 From rcastanedalo at openjdk.org Wed Nov 19 13:28:38 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 19 Nov 2025 13:28:38 GMT Subject: RFR: 8349835: C2: simplify IGV property printing [v6] In-Reply-To: References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: On Mon, 17 Nov 2025 10:27:05 GMT, Saranya Natarajan wrote: >> The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). >> >> ### Fix >> Implemented the suggested refactoring. >> >> ### Testing >> Github Actions, Tier 1-3 > > Saranya Natarajan has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains nine commits: > > - Merge branch 'master' into JDK-8349835 > - testing and review on moving code to cpp > - Merge branch 'master' into JDK-8349835 > - addressing review comments#2 > - fixing test failure > - addressing review comments > - changing int to bool in a struct > - fix to failing test > - initial fix Thanks for cleaning up this code and testing it thoroughly, Saranya. The changes look good to me overall, I just have a few minor suggestions. There may be more properties that could be printed using the new abstraction, but I am OK with addressing those separately. src/hotspot/share/opto/idealGraphPrinter.cpp line 46: > 44: public: > 45: PrintProperties(IdealGraphPrinter* printer) : _printer(printer) {} > 46: void print_node_properties(Node* node, Compile* C); I suggest to fetch the `Compile` reference within `print_node_properties` from `_printer->C` instead. Suggestion: void print_node_properties(Node* node); src/hotspot/share/opto/idealGraphPrinter.cpp line 47: > 45: PrintProperties(IdealGraphPrinter* printer) : _printer(printer) {} > 46: void print_node_properties(Node* node, Compile* C); > 47: void print_lrg_properties(const LRG &lrg, const char* buffer); Suggestion: void print_lrg_properties(const LRG& lrg, const char* buffer); src/hotspot/share/opto/idealGraphPrinter.cpp line 72: > 70: Node* old = C->matcher()->find_old_node(node); > 71: if (old != nullptr) { > 72: print_property(true, "old_node_idx", C->matcher()->find_old_node(node)->_idx); Suggestion: print_property(true, "old_node_idx", old->_idx); src/hotspot/share/opto/idealGraphPrinter.hpp line 45: > 43: class ciMethod; > 44: class JVMState; > 45: class LRG; This forward declaration is not needed anymore. 
------------- PR Review: https://git.openjdk.org/jdk/pull/26902#pullrequestreview-3482570982 PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2541981094 PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2541975521 PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2541920560 PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2541898950 From rcastanedalo at openjdk.org Wed Nov 19 13:34:55 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 19 Nov 2025 13:34:55 GMT Subject: RFR: 8372097: C2: PhasePrintLevel requires setting PrintPhaseLevel explicitly to be active In-Reply-To: <3lLhDPNgImbNIz-0CGOOmSh8IGA-jwXjVb045IiJw8Q=.e417355f-87ba-4900-a11a-d37f39ab4ec9@github.com> References: <3lLhDPNgImbNIz-0CGOOmSh8IGA-jwXjVb045IiJw8Q=.e417355f-87ba-4900-a11a-d37f39ab4ec9@github.com> Message-ID: On Wed, 19 Nov 2025 08:51:58 GMT, Roberto Castañeda Lozano wrote: > This changeset aligns the behavior of `PrintPhaseLevel` with its description in `c2_globals.hpp` in the default case of `-XX:PrintPhaseLevel=0`. In particular, after the changeset, running `java -XX:CompileCommand=PhasePrintLevel,*::*,N` does print the phase names corresponding to level `N` for the matched methods, as expected: > > > $ java -Xbatch -XX:CompileCommand=PhasePrintLevel,java.lang.StringLatin1::equals,2 > CompileCommand: PhasePrintLevel java/lang/StringLatin1.equals intx PhasePrintLevel = 2 > 1. After Parsing > 2. Iter GVN 1 > 3. Incremental Inline > 4. Incremental Boxing Inline > 5. Before Loop Optimizations > 6. PhaseIdealLoop 1 > 7. PhaseIdealLoop 2 > ... > > > The changeset makes the behavior of the `PrintPhaseLevel` flag and `PhasePrintLevel` compile command consistent with the behavior of the pre-existing, analogous `PrintIdealGraphLevel` flag and `IGVPrintLevel` compile command.
The changeset adds tests covering and documenting different combinations of flag and compile-command-specified print levels, and fixes a typo in the flag description in `c2_globals.hpp`. > > **Testing:** tier1-3 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). Thanks Manuel and Christian for reviewing! > Not something for this PR: What I find confusing is the non-matching flag and compile command name: > > PrintPhaseLevel vs. PhasePrintLevel > PrintIdealGraphLevel vs IGVPrintLevel > I would advocate to use matching names. I totally agree. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28386#issuecomment-3552730002 From duke at openjdk.org Wed Nov 19 13:35:58 2025 From: duke at openjdk.org (Ruben) Date: Wed, 19 Nov 2025 13:35:58 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v5] In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Wed, 19 Nov 2025 13:12:59 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 3465: >> >>> 3463: } >>> 3464: >>> 3465: void MacroAssembler::cmpxchg(Register addr, Register expected, >> >> Why do we need all of these non-barrier versions? > > Ping? I don't know what this is for. C1 will only add barriers for pre-LSE systems, and nothing else cares. You've got several new methods that no one needs. Apologies for the delay in response on this. I'm currently reviewing the call sites of these methods in aarch64.ad - like https://github.com/openjdk/jdk/blob/0bff5f3dbe69ab2a59db771af1020b04c0132954/src/hotspot/cpu/aarch64/aarch64.ad#L8491 via https://github.com/openjdk/jdk/blob/0bff5f3dbe69ab2a59db771af1020b04c0132954/src/hotspot/cpu/aarch64/aarch64.ad#L3358, to understand how barriers are handled in C2. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26000#discussion_r2542031623 From duke at openjdk.org Wed Nov 19 14:01:13 2025 From: duke at openjdk.org (Ruben) Date: Wed, 19 Nov 2025 14:01:13 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v5] In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Wed, 19 Nov 2025 13:33:09 GMT, Ruben wrote: >> Ping? I don't know what this is for. C1 will only add barriers for pre-LSE systems, and nothing else cares. You've got several new methods that no one needs. > > Apologies for the delay in response on this. > I'm currently reviewing the call sites of these methods in aarch64.ad - like https://github.com/openjdk/jdk/blob/0bff5f3dbe69ab2a59db771af1020b04c0132954/src/hotspot/cpu/aarch64/aarch64.ad#L8491 via https://github.com/openjdk/jdk/blob/0bff5f3dbe69ab2a59db771af1020b04c0132954/src/hotspot/cpu/aarch64/aarch64.ad#L3358, to understand how barriers are handled in C2. As far as I can tell, this is currently used for codegen of [getAndAddAcquire](https://docs.oracle.com/en/java/javase/25/docs/api/java.base/java/lang/invoke/VarHandle.html#getAndAddAcquire(java.lang.Object...)). The method itself doesn't emit a trailing barrier, however a `membar_acquire` is inserted by `C2AccessFence` - https://github.com/openjdk/jdk/blob/0bff5f3dbe69ab2a59db771af1020b04c0132954/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L324-L332 This barrier doesn't appear to be elided. It seems, in this case the `cmpxchg_barrier` should be used and the `membar_acquire` should be elided. Achieving that would require extra changes - on the C2 side. If this conclusion is correct, should the extra modification be in scope of this PR? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26000#discussion_r2542131385 From dbriemann at openjdk.org Wed Nov 19 14:34:08 2025 From: dbriemann at openjdk.org (David Briemann) Date: Wed, 19 Nov 2025 14:34:08 GMT Subject: RFR: 8367487: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on Linux aarch64 with Cavium CPU Message-ID: Fix by requiring both floating-point half-precision (fphp) and advanced SIMD half-precision (asimdhp) on aarch64. ------------- Commit messages: - 8367487: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on Linux aarch64 with Cavium CPU Changes: https://git.openjdk.org/jdk/pull/28390/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28390&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8367487 Stats: 15 lines in 1 file changed: 13 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28390.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28390/head:pull/28390 PR: https://git.openjdk.org/jdk/pull/28390 From aph at openjdk.org Wed Nov 19 15:39:11 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 19 Nov 2025 15:39:11 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v5] In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Wed, 19 Nov 2025 13:58:50 GMT, Ruben wrote: > This barrier doesn't appear to be elided. Please post your test case so I understand what is happening. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26000#discussion_r2542552256 From bmaillard at openjdk.org Wed Nov 19 15:41:12 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 19 Nov 2025 15:41:12 GMT Subject: RFR: 8371674: C2 fails with Missed optimization opportunity in PhaseIterGVN for MoveL2D [v2] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 10:12:06 GMT, Emanuel Peter wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename test and add comment > > Looks good to me, thanks for working on this :) > > I've been wondering how we could well test and reproduce all these issues in the past. One idea was to have some sort of special `OpaqueDelayNode` that would fold away in a very specific phase, or maybe at a random time. For example, during post-loop-opts, and then it would exactly trigger your condition here. That would allow us to even have IR rules, and make sure the fix really keeps on working. I had once filed this: [JDK-8357805](https://bugs.openjdk.org/browse/JDK-8357805). Thank you for reviewing @eme64 @chhagedorn! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28290#issuecomment-3553388223 From bmaillard at openjdk.org Wed Nov 19 15:49:59 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 19 Nov 2025 15:49:59 GMT Subject: Integrated: 8371674: C2 fails with Missed optimization opportunity in PhaseIterGVN for MoveL2D In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 11:46:14 GMT, Benoît Maillard wrote: > This PR addresses yet another missed optimization in `PhaseIterGVN`. The way this optimization is triggered is a bit different this time though, and the notification is missing in `Node::has_special_unique_user`. > > ## Analysis > > The affected optimization is the transformation of `MoveX2Y (LoadX mem)` into `LoadY mem`. This is implemented in `MoveNode::Ideal`. 
The optimization is as follows:
>
> ```c++
> // Fold reinterpret cast into memory operation:
> //    MoveX2Y (LoadX mem) => LoadY mem
> LoadNode* ld = in(1)->isa_Load();
> if (ld != nullptr && (ld->outcnt() == 1)) { // replace only
>   const Type* rt = bottom_type();
>   if (ld->has_reinterpret_variant(rt)) {
>     if (phase->C->post_loop_opts_phase()) {
>       return ld->convert_to_reinterpret_load(*phase, rt);
>     } else {
>       // attempt the transformation once loop opts are over
>       phase->C->record_for_post_loop_opts_igvn(this);
>     }
>   }
> }
> ```
>
> The optimization is triggered only if the input is a `LoadNode` and the `MoveNode` is its only user. This is a relatively unusual pattern.
>
> The bug was found by the fuzzer. At some point during IGVN, we have the following subgraph:
>
>
> CountedLoop    LoadL
>          \    /    \
>           Phi       MoveL2D
>
> In `RegionNode::Ideal`, we end up calling `set_req_X` on the `Phi` node to delete the edge from the `Phi` node to `LoadL`. As a result, the `LoadL` node only has one user left, and the `MoveNode::Ideal` gets triggered at the next verification pass.
>
> ## Proposed Solution
>
> Add this particular case to `Node::has_special_unique_user`, which gets called by `Node::set_req_X`.
>
> ## Summary of changes
>
> This PR brings the following changes:
> - Detect the optimization pattern in `Node::has_special_unique_user`.
> - Add new test `TestMissingOptMoveX2YLoadX.java`, initially obtained from the fuzzer and then heavily reduced, both with the usual tools and manually. I tried to get a reproducer for each of the `Move` nodes, but I was only able to get one for `MoveL2D`.
>
> ### Testing
>
> - [x] https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8371674
> - [x] tier1-4, plus some internal testing
>
> Thank you for reviewing!

This pull request has now been integrated.
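For readers less familiar with IGVN notification, the situation in the analysis above can be mimicked with a toy graph. This is only a sketch in plain Java, not HotSpot code: `Node`, `removeEdge` and `worklist` are hypothetical names invented for illustration, and the real logic lives in `Node::set_req_X` and `Node::has_special_unique_user`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class UniqueUserSketch {
    // Hypothetical stand-in for a C2 node: it only tracks inputs and users.
    static final class Node {
        final String name;
        final List<Node> inputs = new ArrayList<>();
        final List<Node> users = new ArrayList<>();

        Node(String name, Node... ins) {
            this.name = name;
            for (Node in : ins) {
                inputs.add(in);
                in.users.add(this);
            }
        }
    }

    // Hypothetical stand-in for the IGVN worklist.
    static final Deque<Node> worklist = new ArrayDeque<>();

    // Remove the edge user -> input. If the input is left with exactly one
    // user, that user may now match a "single user" pattern, so revisit it.
    static void removeEdge(Node user, Node input) {
        user.inputs.remove(input);
        input.users.remove(user);
        if (input.users.size() == 1) {
            worklist.add(input.users.get(0));
        }
    }

    public static void main(String[] args) {
        Node load = new Node("LoadL");
        Node phi = new Node("Phi", load);
        Node move = new Node("MoveL2D", load);
        removeEdge(phi, load);                    // Phi drops its LoadL input
        System.out.println(worklist.peek().name); // prints "MoveL2D"
    }
}
```

Without the re-enqueue step in `removeEdge`, the remaining user would never be revisited, which is the toy analogue of the missed optimization reported by the verification pass.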
Changeset: 3949b0f2 Author: Benoît Maillard URL: https://git.openjdk.org/jdk/commit/3949b0f23cd9c936c12ac0306534bc38b5b8d298 Stats: 66 lines in 2 files changed: 64 ins; 0 del; 2 mod 8371674: C2 fails with Missed optimization opportunity in PhaseIterGVN for MoveL2D Reviewed-by: epeter, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/28290 From duke at openjdk.org Wed Nov 19 16:56:57 2025 From: duke at openjdk.org (Ruben) Date: Wed, 19 Nov 2025 16:56:57 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v5] In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Wed, 19 Nov 2025 15:37:01 GMT, Andrew Haley wrote: >> As far as I can tell, this is currently used for codegen of [getAndAddAcquire](https://docs.oracle.com/en/java/javase/25/docs/api/java.base/java/lang/invoke/VarHandle.html#getAndAddAcquire(java.lang.Object...)). >> >> The method itself doesn't emit a trailing barrier, however a `membar_acquire` is inserted by `C2AccessFence` - https://github.com/openjdk/jdk/blob/0bff5f3dbe69ab2a59db771af1020b04c0132954/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L324-L332 >> This barrier doesn't appear to be elided. >> It seems, in this case the `cmpxchg_barrier` should be used and the `membar_acquire` should be elided. Achieving that would require extra changes - on the C2 side. >> If this conclusion is correct, should the extra modification be in scope of this PR? > >> This barrier doesn't appear to be elided. > > Please post your test case so I understand what is happening. 
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class Test {
    int value;

    static final VarHandle handle;
    static {
        try {
            handle = MethodHandles.lookup().findVarHandle(Test.class, "value", int.class);
        } catch(Exception e) {
            throw new RuntimeException(e);
        }
    }

    static void test(Test instance) {
        handle.getAndAddAcquire(instance, 1);
    }

    public static void main(String[] args) {
        Test instance = new Test();
        for (int i = 0; i < 5000000; i++) {
            test(instance);
        }
        System.out.println(instance.value);
    }
}

I'm running it as follows:

java -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+UseLSE -XX:CompileCommand=compileonly,Test::test Test

The output includes:

050
050 + cmpxchgw R29 = [R10], R11, R13 # (int, weak) if [R10] == R11 then [R10] <-- R13csetw R29, EQ # R29 <-- (EQ ? 1 : 0)
064
064 + membar_acquire
dmb ishld

and

0x0000ef1d383bee54: casl w8, w13, [x10]
0x0000ef1d383bee58: cmp w8, w11
;; 0x1F1F1F1F1F1F1F1F
0x0000ef1d383bee5c: mov x8, #0x1f1f1f1f1f1f1f1f // #2242545357980376863
;; } cmpxchg
0x0000ef1d383bee60: cset w29, eq // eq = none
;; membar_acquire
0x0000ef1d383bee64: dmb ishld

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26000#discussion_r2542797500 From epeter at openjdk.org Wed Nov 19 17:07:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Nov 2025 17:07:45 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v2] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 11:22:09 GMT, Marc Chevalier wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bug number > > I think reading the code and the comments to understand the situation might not be as good as reading the description of this PR. I regret I gave a reproducer and proposed a solution. 
> > Given the very obvious lack of consensus on the Valhalla PR, it is clear that this issue might evolve radically and that the proposed solution may not be the final one. Therefore, I will not do any cleanup before agreeing on the way to go, as it might very well be erased and it would be a very poor use of everybody's time. @marc-chevalier @merykitty @rwestrel I only just realized that the conversation from https://github.com/openjdk/valhalla/pull/1717 still had relevance here. I had a very nice call with Marc, and he graciously explained things to me. I'll summarize my thoughts below. It seems to be possible that the local speculative type of the `phi` (`_type`) and the speculative type of its inputs (`t`) do not have an intersection that is either `null` or `top`. Initially surprising: `t->filter_speculative(_type)` does not actually produce an intersection of the speculative type in this case, but removes the speculative type in `cleanup_speculative` because the type is `above_centerline`. Marc told me that Roland added this, to avoid "over speculation" because the type is now too narrow, and more likely to be wrong. My question was if it is sane that we get this kind of inconsistent speculation: it must mean that our profiling has delivered somewhat inconsistent results. We would have to dig a bit deeper into the reproducer now, but it seems that an inlined (inner) method found one type during profiling, but the caller (outer method) found another at the phi. It would be interesting to understand a bit more what this implies. I could see these options: - We assume both assumptions (speculations are correct). i.e. if there is a path for the type of the inner method to the place of the outer method, then it must be a subtype of what we profiled at both places. And it happens that this is only null or top, so we conclude it is impossible for something non-null to come through this path. It would be reasonable to speculate on this. 
- Maybe profiling is incomplete: we see one type in the inner, and another type in the outer context. So it might be likely that both show up eventually at the phi.
- One more thought: the more speculative assumptions are "intersected", the more likely this combined speculation is to be incorrect at runtime. Conceptually, each speculation has a probability of failure, and if you apply many of them, the success probability goes lower and lower.

We could now dig deep into if Roland's `cleanup_speculative` logic that prevents "over speculation" is reasonable, or if we should extend it. But that goes out of scope for this bug fix here, in my opinion.

The question is now what we should do here. There are at least 3 options:
- The speculative type should be `null`, the natural intersection of the two speculative types. To me, that sounds like the most consistent solution. But it would require reconsidering the `cleanup_speculative` logic, a lot of effort. But maybe @merykitty has the desire to tackle such a project?
- The speculative type should be `Object` (or the union of the two speculative types). That is the same as removing the speculative type. But currently we don't mark the type as "widened", and so it is possible to later narrow it again once we filter again - so we were not able to reach a fixpoint yet, and that is a problem.
- Just pick one of the two speculative types. It's an arbitrary choice. But it's not incorrect. Marc's approach of filtering again achieves exactly that: first step removes the speculative type, the second step picks the speculative type of the incoming type. It seems to be the smallest code change, and so that may be best for a relative edge case like this.

There is a bit of a question how much of an edge case this really is. Probably quite. But we also cannot rely on our fuzzers here, as they don't really produce Object Classes at all.
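
To make the earlier point about "intersected" speculations concrete, here is a small worked calculation. It assumes, purely for illustration, that the $n$ speculations are independent and that each holds with the same probability $p$; neither assumption comes from profiling data, and the numbers are made up.

$$
P(\text{all } n \text{ speculations hold}) = p^{n}, \qquad \text{e.g. } p = 0.95,\; n = 5:\quad 0.95^{5} \approx 0.774
$$

So even when each individual speculation is right 95% of the time, stacking five of them would be expected to fail roughly one time in four under these assumptions.
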
Anyway: I'm the least qualified compared to @merykitty and @rwestrel , but I would vote for Marc's current proposal. Plus the required code cleanups, of course. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28331#issuecomment-3553781174 From jbhateja at openjdk.org Wed Nov 19 18:12:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 19 Nov 2025 18:12:22 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v13] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. > > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. 
> > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26283/files - new: https://git.openjdk.org/jdk/pull/26283/files/cccef216..ee8b0368 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=11-12 Stats: 40 lines in 8 files changed: 18 ins; 4 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From jbhateja at openjdk.org Wed Nov 19 18:12:24 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 19 Nov 2025 18:12:24 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v8] In-Reply-To: References: <-dYODIlHuNDfG5-uMVa3r9F-9HHN9Xzg_XeI9w_uT48=.b669f76b-ec7a-4350-bb69-a45540ac627f@github.com> Message-ID: On Mon, 17 Nov 2025 14:38:58 GMT, Daniel Lundén wrote: >> Hi @iwanowww , @dlunde , @eme64 , @TobiHartmann , @sviswa7 , your comments have been addressed. >> Let me know if this is good to land in. > > Thanks for the updates @jatin-bhateja, looks good to me. I'm rerunning some tests for sanity before I click approve! Hi @dlunde , @iwanowww , your comments have been addressed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3553966244 From jbhateja at openjdk.org Wed Nov 19 18:12:27 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 19 Nov 2025 18:12:27 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v12] In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 23:48:57 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Review comments resolutions > > src/hotspot/cpu/x86/x86.ad line 2648: > >> 2646: // can be demoted to REX/REX2 encodings. For commutative operations with register >> 2647: // operands, allocation of definition operand is biased towards both the operands. >> 2648: return (((mdef->flags() & Node::PD::Flag_ndd_demotable) != 0) && > > It is called either with `oper_index == 1` or `oper_index == 2`. Can you make it explicit that any other operand doesn't participate in register biasing? Also, I'd expand the check, so it becomes clear that 1st operand requires `Flag_ndd_demotable` and 2nd requires `Flag_ndd_demotable` + `Flag_ndd_commutative` set. Flag_ndd_demotable applies to any kind of NDD pattern which is a candidate for demotion, while Flag_ndd_commutative is only set for commutative operations. One very interesting fact which needs to be highlighted here is that ADLC generates state checks for both the original operand ordering and its flipped variant, e.g. 
for the addI_rReg_rReg_mem pattern, where the first source operand is a register and the second one is memory, the DFA generates two reduction rules, addI_rReg_rReg_mem and addI_rReg_rReg_mem_0; the latter is the flipped one. The existing scheme of NDD demotion is able to take care of both cases, with one constraint: a demotable operand should not cover more than one ideal input edge (which would make it a complex operand), and none of its component registers should be shared with the definition. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2543005764 From shade at openjdk.org Wed Nov 19 18:23:52 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Nov 2025 18:23:52 GMT Subject: RFR: 8372154: AArch64: Match rule failure with some CompareAndSwap operand shapes Message-ID: See bug for more details. Following up on [JDK-8371959](https://bugs.openjdk.org/browse/JDK-8371959) failures, I managed to reproduce the "bad AD" file assert. It is heavily intermittent, and needs hours of runs before we hit the lucky seed, plus [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557) to have broader testing scope. It looks like `CastII` node accepts the wider operand type (`iRegI`), which fails to match against narrower type in CAS match rules (`iRegINoSp`). It makes sense to use `iRegINoSp` for destination regs, so that we do not start writing to these special registers. But for operand registers, it makes little sense, IMO. I note that cas.m4-generated CAE/WCAS stubs actually already have the wider `iRegI` for operand types. So it looks to me the manual CAS match rules should also use `iRegI`. It would be even better to auto-generate these match rules from M4 stencils, and I tried that, but ultimately decided it obscures the actual bug fix. [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) is dedicated to moving the match rules, hopefully without the semantic change. In this change, I dropped `*NoSp` from CAS operand match rules. 
It fixes the `iRegI` mismatch, and prepares us for harmonizing these rules with the rest of CAS/CAE generated ones. Additional testing: - [x] Linux AArch64 server fastdebug, local `bad AD` assert reproducer no longer fails - [ ] Linux AArch64 server fastdebug, `all` - [ ] Linux AArch64 server fastdebug, jcstress run ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/28398/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28398&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372154 Stats: 15 lines in 1 file changed: 0 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/28398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28398/head:pull/28398 PR: https://git.openjdk.org/jdk/pull/28398 From aph at openjdk.org Wed Nov 19 19:42:11 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 19 Nov 2025 19:42:11 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v5] In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Wed, 19 Nov 2025 16:41:06 GMT, Ruben wrote: >>> This barrier doesn't appear to be elided. >> >> Please post your test case so I understand what is happening. 
>
> import java.lang.invoke.MethodHandles;
> import java.lang.invoke.VarHandle;
>
> public class Test {
>     int value;
>
>     static final VarHandle handle;
>     static {
>         try {
>             handle = MethodHandles.lookup().findVarHandle(Test.class, "value", int.class);
>         } catch(Exception e) {
>             throw new RuntimeException(e);
>         }
>     }
>
>     static void test(Test instance) {
>         handle.getAndAddAcquire(instance, 1);
>     }
>
>     public static void main(String[] args) {
>         Test instance = new Test();
>         for (int i = 0; i < 5000000; i++) {
>             test(instance);
>         }
>         System.out.println(instance.value);
>     }
> }
>
>
> I'm running it as follows:
>
> java -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+UseLSE -XX:CompileCommand=compileonly,Test::test Test
>
>
> The output includes:
>
> 050
> 050 + cmpxchgw R29 = [R10], R11, R13 # (int, weak) if [R10] == R11 then [R10] <-- R13csetw R29, EQ # R29 <-- (EQ ? 1 : 0)
> 064
> 064 + membar_acquire
> dmb ishld
>
> and
>
> 0x0000ef1d383bee54: casl w8, w13, [x10]
> 0x0000ef1d383bee58: cmp w8, w11
> ;; 0x1F1F1F1F1F1F1F1F
> 0x0000ef1d383bee5c: mov x8, #0x1f1f1f1f1f1f1f1f // #2242545357980376863
> ;; } cmpxchg
> 0x0000ef1d383bee60: cset w29, eq // eq = none
> ;; membar_acquire
> 0x0000ef1d383bee64: dmb ishld

Mmm, interesting. That one will take some extra digging.

It would be best to concentrate on eliminating trailing DMB from C1, and keep on until that job is done. That way you will "own" the whole thing. It will take several pull requests to do it, because there are a few cases that should be handled.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26000#discussion_r2543315506 From skuksenko at openjdk.org Wed Nov 19 21:54:40 2025 From: skuksenko at openjdk.org (Sergey Kuksenko) Date: Wed, 19 Nov 2025 21:54:40 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v2] In-Reply-To: References: Message-ID: On Mon, 17 Nov 2025 23:35:44 GMT, Volodymyr Paprotski wrote: >> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline >> - `SignatureBench.MLDSA` is 1.2x-2.2x faster >> - Note: there are no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) >> - AVX512 intrinsic improvements are 1.24x-1.5x faster than the current version >> - `SignatureBench.MLDSA` is up to 5% faster, never slower >> >> Note on intrinsic: >> - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. >> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 >> >> Tests and benchmarks: >> - Added a fuzz test to ensure Java and intrinsic produce exactly the same result >> - Added benchmark to measure the performance of intrinsic itself >> >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" > > Volodymyr Paprotski has updated the pull request incrementally with two additional 
commits since the last revision: > > - whitespace > - address first comments What is the reason to add a new microbenchmark? We already have enough micros covering MLDSA: org.openjdk.bench.javax.crypto.full.KeyPairGeneratorBench.MLDSA.generateKeyPair org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA.sign org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA.verify org.openjdk.bench.javax.crypto.small.KeyPairGeneratorBench.MLDSA.generateKeyPair org.openjdk.bench.javax.crypto.small.SignatureBench.MLDSA.sign org.openjdk.bench.javax.crypto.small.SignatureBench.MLDSA.verify ------------- PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3554770437 From vpaprotski at openjdk.org Wed Nov 19 22:03:09 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 19 Nov 2025 22:03:09 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v2] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 21:51:32 GMT, Sergey Kuksenko wrote: > What is the reason to add a new microbenchmark? We already have enough micros covering MLDSA: > > org.openjdk.bench.javax.crypto.full.KeyPairGeneratorBench.MLDSA.generateKeyPair org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA.sign org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA.verify org.openjdk.bench.javax.crypto.small.KeyPairGeneratorBench.MLDSA.generateKeyPair org.openjdk.bench.javax.crypto.small.SignatureBench.MLDSA.sign org.openjdk.bench.javax.crypto.small.SignatureBench.MLDSA.verify I can definitely remove it, got no strong attachment to it.. I did find it useful during development and thought it might be useful during review to verify performance.. but the usefulness of it beyond is indeed debatable. You might notice its a lot more 'granular'; it measures the performance of the intrinsics themselves, not the ("10-level deep") "wrappers". That said, those "wrappers" is what actual user will see and what we should be measuring. 
This new benchmark is only useful to another intrinsic developer.. (but it should already be usable by other platforms not just Intel?) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3554793447 From skuksenko at openjdk.org Wed Nov 19 22:46:00 2025 From: skuksenko at openjdk.org (Sergey Kuksenko) Date: Wed, 19 Nov 2025 22:46:00 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v2] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 21:59:03 GMT, Volodymyr Paprotski wrote: > > What is the reason to add a new microbenchmark? We already have enough micros covering MLDSA: > > org.openjdk.bench.javax.crypto.full.KeyPairGeneratorBench.MLDSA.generateKeyPair org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA.sign org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA.verify org.openjdk.bench.javax.crypto.small.KeyPairGeneratorBench.MLDSA.generateKeyPair org.openjdk.bench.javax.crypto.small.SignatureBench.MLDSA.sign org.openjdk.bench.javax.crypto.small.SignatureBench.MLDSA.verify > > I can definitely remove it, got no strong attachment to it.. I did find it useful during development and thought it might be useful during review to verify performance.. but the usefulness of it beyond is indeed debatable. > > You might notice its a lot more 'granular'; it measures the performance of the intrinsics themselves, not the ("10-level deep") "wrappers". That said, those "wrappers" is what actual user will see and what we should be measuring. > > This new benchmark is only useful to another intrinsic developer.. (but it should already be usable by other platforms not just Intel?) I understand your reasons. The question is whether you'll need the microbenchmark in the future. If no (or probably no), please remove the micro. If needed, please move it from the "org.openjdk.bench.javax.crypto.full" package to "org.openjdk.bench.javax.crypto". 
It is supposed to have only public API micros in packages "small" and "full" ------------- PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3554914771 From fyang at openjdk.org Thu Nov 20 02:22:49 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 20 Nov 2025 02:22:49 GMT Subject: RFR: 8372046: compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java fails IR verification In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 08:38:13 GMT, Manuel Hässig wrote: >> Hi, please consider this test-only change fixing an IR test failure. >> >> This IR test fails on platforms without native support for `Float16`. The reason is that method `Float::floatToFloat16` is inlined into method `TestSubNodeFloatDoubleNegation.testHalfFloat`, which causes unexpected IR graph. One way to fix this would be disabling inlining of methods from the `java.lang.Float` class. >> >> After this change, we are doing `CallStaticJava` to convert between `Float16` and `Float` on these platforms: >> >> ...... >> 259 CallStaticJava === 5 6 7 8 1 (22 1 669 1 10 1 22 ) [[ 260 261 262 264 ]] # Static java.lang.Float::float16ToFloat float ( int ) Float16::floatValue @ bci:4 (line 876) Float16::subtract @ b ci:1 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 69) !jvms: Float16::floatValue @ bci:4 (line 876) Float16::subtract @ bci:1 (line 1185) TestSubNodeFloatDoubleNegation: :testHalfFloat @ bci:9 (line 69) >> >> ...... >> >> 562 CallStaticJava === 553 507 538 8 1 (526 1 1 1 1 559 559 ) [[ 563 564 565 567 ]] # Static java.lang.Float::floatToFloat16 short ( float ) Float16::valueOf @ bci:5 (line 361) Float16::subtra ct @ bci:9 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 67) !jvms: Float16::valueOf @ bci:5 (line 361) Float16::subtract @ bci:9 (line 1185) TestSubNodeFloatDoubleNegat ion::testHalfFloat @ bci:12 (line 67) >> >> ...... >> >> >> Verified with fastdebug build on aarch64, x86_64 and riscv64 platforms. 
> > Thank you for looking into this, @RealFYang. The changes look good. I tested locally on x64 with and without FP16 support and just kicked off a CI run. I'll report back with the results in a few hours. @mhaessig @eme64 : Thanks for the reviews and testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28364#issuecomment-3555471264 From fyang at openjdk.org Thu Nov 20 02:22:50 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 20 Nov 2025 02:22:50 GMT Subject: Integrated: 8372046: compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java fails IR verification In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 04:57:47 GMT, Fei Yang wrote: > Hi, please consider this test-only change fixing an IR test failure. > > This IR test fails on platforms without native support for `Float16`. The reason is that method `Float::floatToFloat16` is inlined into method `TestSubNodeFloatDoubleNegation.testHalfFloat`, which causes unexpected IR graph. One way to fix this would be disabling inlining of methods from the `java.lang.Float` class. > > After this change, we are doing `CallStaticJava` to convert between `Float16` and `Float` on these platforms: > > ...... > 259 CallStaticJava === 5 6 7 8 1 (22 1 669 1 10 1 22 ) [[ 260 261 262 264 ]] # Static java.lang.Float::float16ToFloat float ( int ) Float16::floatValue @ bci:4 (line 876) Float16::subtract @ b ci:1 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 69) !jvms: Float16::floatValue @ bci:4 (line 876) Float16::subtract @ bci:1 (line 1185) TestSubNodeFloatDoubleNegation: :testHalfFloat @ bci:9 (line 69) > > ...... 
> > 562 CallStaticJava === 553 507 538 8 1 (526 1 1 1 1 559 559 ) [[ 563 564 565 567 ]] # Static java.lang.Float::floatToFloat16 short ( float ) Float16::valueOf @ bci:5 (line 361) Float16::subtra ct @ bci:9 (line 1185) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 67) !jvms: Float16::valueOf @ bci:5 (line 361) Float16::subtract @ bci:9 (line 1185) TestSubNodeFloatDoubleNegat ion::testHalfFloat @ bci:12 (line 67) > > ...... > > > Verified with fastdebug build on aarch64, x86_64 and riscv64 platforms. This pull request has now been integrated. Changeset: a3b1affb Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/a3b1affbfb23eeef32749164aae316e5d55fffaa Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod 8372046: compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java fails IR verification Reviewed-by: mhaessig, epeter ------------- PR: https://git.openjdk.org/jdk/pull/28364 From wenanjian at openjdk.org Thu Nov 20 02:54:16 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 20 Nov 2025 02:54:16 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v28] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. 
Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: add more comments for key value 52 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/5bdfc649..10725f4f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=26-27 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From wenanjian at openjdk.org Thu Nov 20 02:54:17 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 20 Nov 2025 02:54:17 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: <9oPWTWflnwws0wxHBP58IiQRIZz4Tt5bthr7RiC3BE0=.94d60901-8fad-4597-9e55-c669de73a8e6@github.com> References: <9oPWTWflnwws0wxHBP58IiQRIZz4Tt5bthr7RiC3BE0=.94d60901-8fad-4597-9e55-c669de73a8e6@github.com> Message-ID: On Wed, 19 Nov 2025 09:50:05 GMT, Hamlin Li wrote: >> The key length can only be {11, 13, 15} * 4 = {44, 52, 60}. I notice that x86 and aarch64 use 52 directly. I think adding some more comments will be enough? > > Can you add some comments in other existing code with magic 52 if they mean the same thing? Thanks! Sure, I have already added comments to the identical parts. >> It's a return value saved to x10. It seems necessary according to aarch64 and x86: aarch64 uses r0 to save it and x86 uses rax. > > There is a `mv` before exit of `generate_counterMode_AESCrypt`, is this one still necessary? Yes. The `mv` before the exit of `generate_counterMode_AESCrypt` is for a different branch, taken when input_len is zero to begin with. To avoid an additional jump, each exit from `counterMode_AESCrypt` is an independent exit, so I think we need to keep this `mv` here.
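For readers wondering where the values 44, 52 and 60 in the exchange above come from, here is a minimal Java sketch. It assumes the numbers denote the expanded AES key length in 32-bit words, i.e. (rounds + 1) round keys of 4 words each, with 10, 12 and 14 rounds for 128-, 192- and 256-bit keys. The class and method names are hypothetical and are not taken from the patch.

```java
public class AesKeyScheduleWords {
    // Number of AES rounds for a given key size in bits (10, 12 or 14).
    static int rounds(int keyBits) {
        switch (keyBits) {
            case 128: return 10;
            case 192: return 12;
            case 256: return 14;
            default:
                throw new IllegalArgumentException("unsupported AES key size: " + keyBits);
        }
    }

    // Expanded key length in 32-bit words: one 4-word round key per round,
    // plus the initial round key.
    static int expandedKeyWords(int keyBits) {
        return (rounds(keyBits) + 1) * 4;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {128, 192, 256}) {
            System.out.println(bits + "-bit key -> " + expandedKeyWords(bits) + " words");
        }
    }
}
```

Under that reading, a comparison against 52 separates the 192-bit case from the 128-bit (44) and 256-bit (60) cases.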
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2544162600 PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2544163484 From erfang at openjdk.org Thu Nov 20 04:01:09 2025 From: erfang at openjdk.org (Eric Fang) Date: Thu, 20 Nov 2025 04:01:09 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v2] In-Reply-To: References: Message-ID: <4vSKAtr0tUG0V193gIvnEFdHm18ZhqflVAwk-09IVQ0=.081806f5-6303-4b4f-975d-7c85427ccae5@github.com> > `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. > > If the vector element size is not changed, the `VectorMaskCastNode` don't generate code, otherwise code will be generated to extend or narrow the mask. This IR node is not free no matter it generates code or not because it may block some optimizations. For example: > 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevented the following optimization: `(VectorStoremask (VectorLoadMask x)) => (x)` > 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. > > In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. > > The general idea of this PR is introducing an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The funtion returns the first non `VectorMaskCastNode`. 
> > The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. > > Current optimizations related to `VectorMaskCastNode` include: > 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. > 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. > > This PR does the following optimizations: > 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct. > 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vector, it remains unchanged as long as th... Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Don't read and write the same memory in the JMH benchmarks - Merge branch 'master' into JDK-8370863-mask-cast-opt - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. 
If the vector element size is not changed, the `VectorMaskCastNode` don't generate code, otherwise code will be generated to extend or narrow the mask. This IR node is not free no matter it generates code or not because it may block some optimizations. For example: 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevented the following optimization: `(VectorStoremask (VectorLoadMask x)) => (x)` 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. The general idea of this PR is introducing an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The funtion returns the first non `VectorMaskCastNode`. The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. Current optimizations related to `VectorMaskCastNode` include: 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. This PR does the following optimizations: 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct. 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. 
Since the value before and after the pattern is a boolean vector, it remains unchanged as long as the vector length remains the same, and this is guaranteed at the API level.

I conducted some simple research on different mask generation methods and mask operations, and obtained the following table, which includes some potential optimization opportunities that may use this `uncast_mask` function.

```
mask_gen\op   toLong  anyTrue  allTrue  trueCount  firstTrue  lastTrue
compare       N/A     N/A      N/A      N/A        N/A        N/A
maskAll       TBI     TBI      TBI      TBI        TBI        TBI
fromLong      TBI     TBI      N/A      TBI        TBI        TBI

mask_gen\op   and     or       xor      andNot     not        laneIsSet
compare       N/A     N/A      N/A      N/A        TBI        N/A
maskAll       TBI     TBI      TBI      TBI        TBI        TBI
fromLong      N/A     N/A      N/A      N/A        TBI        TBI
```

`TBI` indicates that there may be potential optimizations here that require further investigation.

Benchmarks:

On an Nvidia Grace machine with 128-bit SVE2:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  59.23   0.21   148.12  0.07   2.50
microMaskLoadCastStoreDouble128  ops/us  2.43    0.00   38.31   0.01   15.73
microMaskLoadCastStoreFloat128   ops/us  6.19    0.00   75.67   0.11   12.22
microMaskLoadCastStoreInt128     ops/us  6.19    0.00   75.67   0.03   12.22
microMaskLoadCastStoreLong128    ops/us  2.43    0.00   38.32   0.01   15.74
microMaskLoadCastStoreShort64    ops/us  28.89   0.02   75.60   0.09   2.62
```

On an Nvidia Grace machine with 128-bit NEON:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  75.75   0.19   149.74  0.08   1.98
microMaskLoadCastStoreDouble128  ops/us  8.71    0.03   38.71   0.05   4.44
microMaskLoadCastStoreFloat128   ops/us  24.05   0.03   76.49   0.05   3.18
microMaskLoadCastStoreInt128     ops/us  24.06   0.02   76.51   0.05   3.18
microMaskLoadCastStoreLong128    ops/us  8.72    0.01   38.71   0.02   4.44
microMaskLoadCastStoreShort64    ops/us  24.64   0.01   76.43   0.06   3.10
```

On an AMD EPYC 9124 16-Core Processor with AVX3:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  82.13   0.31   115.14  0.08   1.40
microMaskLoadCastStoreDouble128  ops/us  0.32    0.00   0.32    0.00   1.01
microMaskLoadCastStoreFloat128   ops/us  42.18   0.05   57.56   0.07   1.36
microMaskLoadCastStoreInt128     ops/us  42.19   0.01   57.53   0.08   1.36
microMaskLoadCastStoreLong128    ops/us  0.30    0.01   0.32    0.00   1.05
microMaskLoadCastStoreShort64    ops/us  42.18   0.05   57.59   0.01   1.37
```

On an AMD EPYC 9124 16-Core Processor with AVX2:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  73.53   0.20   114.98  0.03   1.56
microMaskLoadCastStoreDouble128  ops/us  0.29    0.01   0.30    0.00   1.00
microMaskLoadCastStoreFloat128   ops/us  30.78   0.14   57.50   0.01   1.87
microMaskLoadCastStoreInt128     ops/us  30.65   0.26   57.50   0.01   1.88
microMaskLoadCastStoreLong128    ops/us  0.30    0.00   0.30    0.00   0.99
microMaskLoadCastStoreShort64    ops/us  24.92   0.00   57.49   0.01   2.31
```

On an AMD EPYC 9124 16-Core Processor with AVX1:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  79.68   0.01   248.49  0.91   3.12
microMaskLoadCastStoreDouble128  ops/us  0.28    0.00   0.28    0.00   1.00
microMaskLoadCastStoreFloat128   ops/us  31.11   0.04   95.48   2.27   3.07
microMaskLoadCastStoreInt128     ops/us  31.10   0.03   99.94   1.87   3.21
microMaskLoadCastStoreLong128    ops/us  0.28    0.00   0.28    0.00   0.99
microMaskLoadCastStoreShort64    ops/us  31.11   0.02   94.97   2.30   3.05
```

This PR was tested on 128-bit, 256-bit, and 512-bit (QEMU) aarch64 environments, and on two 512-bit x64 machines with various configurations, including sve2, sve1, neon, avx3, avx2, avx1, sse4 and sse3; all tests passed.
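The `uncast_mask` idea described above can be illustrated with a small toy model in Java. This is only a sketch under stated assumptions: the `Node` class, the opcode strings and the `simplifyStoreMask` helper are hypothetical stand-ins, not HotSpot's C2 classes, and the toy ignores the vector type and length checks that the real transformation relies on.

```java
public class UncastMaskSketch {
    // Toy IR node: an opcode name plus at most one input. Not HotSpot's Node.
    static final class Node {
        final String op;
        final Node in;

        Node(String op, Node in) {
            this.op = op;
            this.in = in;
        }
    }

    // Skip a chain of consecutive VectorMaskCast nodes and return the first
    // node that is not a VectorMaskCast (or null if the chain runs out).
    static Node uncastMask(Node n) {
        while (n != null && n.op.equals("VectorMaskCast")) {
            n = n.in;
        }
        return n;
    }

    // (VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => x
    // Returns the node unchanged when the pattern does not match.
    static Node simplifyStoreMask(Node n) {
        if (!n.op.equals("VectorStoreMask")) {
            return n;
        }
        Node inner = uncastMask(n.in);
        if (inner != null && inner.op.equals("VectorLoadMask") && inner.in != null) {
            return inner.in;
        }
        return n;
    }

    public static void main(String[] args) {
        Node x = new Node("BoolVector", null);
        Node chain = new Node("VectorStoreMask",
                new Node("VectorMaskCast",
                        new Node("VectorMaskCast",
                                new Node("VectorLoadMask", x))));
        System.out.println(simplifyStoreMask(chain) == x);
    }
}
```

The point of the helper is that the pattern matcher looks through any number of casts with one call, instead of needing one rule per chain length.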
------------- Changes: - all: https://git.openjdk.org/jdk/pull/28313/files - new: https://git.openjdk.org/jdk/pull/28313/files/fca9b3e5..3b0ff7d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=00-01 Stats: 28723 lines in 501 files changed: 18169 ins; 7171 del; 3383 mod Patch: https://git.openjdk.org/jdk/pull/28313.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28313/head:pull/28313 PR: https://git.openjdk.org/jdk/pull/28313 From erfang at openjdk.org Thu Nov 20 04:05:41 2025 From: erfang at openjdk.org (Eric Fang) Date: Thu, 20 Nov 2025 04:05:41 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 01:17:50 GMT, Eric Fang wrote: > `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. > > If the vector element size is not changed, the `VectorMaskCastNode` don't generate code, otherwise code will be generated to extend or narrow the mask. This IR node is not free no matter it generates code or not because it may block some optimizations. For example: > 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevented the following optimization: `(VectorStoremask (VectorLoadMask x)) => (x)` > 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. > > In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. 
But if the input value is changed, we can't eliminate the cast. > > The general idea of this PR is introducing an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The funtion returns the first non `VectorMaskCastNode`. > > The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. > > Current optimizations related to `VectorMaskCastNode` include: > 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. > 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. > > This PR does the following optimizations: > 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct. > 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vector, it remains unchanged as long as th... 
Updated the JMH benchmarks and the new test results:

On an Nvidia Grace machine with 128-bit SVE2:

Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  64.29   0.02   146.67  0.09   2.28
microMaskLoadCastStoreDouble128  ops/us  10.05   0.00   38.10   0.01   3.79
microMaskLoadCastStoreFloat128   ops/us  19.94   0.00   75.05   0.07   3.76
microMaskLoadCastStoreInt128     ops/us  19.94   0.00   75.13   0.01   3.77
microMaskLoadCastStoreLong128    ops/us  10.04   0.00   38.09   0.01   3.79
microMaskLoadCastStoreShort64    ops/us  31.52   0.02   75.12   0.02   2.38

On an Nvidia Grace machine with 128-bit NEON:

Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  73.33   0.01   147.01  0.06   2.00
microMaskLoadCastStoreDouble128  ops/us  8.54    0.03   38.19   0.01   4.47
microMaskLoadCastStoreFloat128   ops/us  23.75   0.01   75.27   0.10   3.17
microMaskLoadCastStoreInt128     ops/us  23.73   0.01   75.25   0.07   3.17
microMaskLoadCastStoreLong128    ops/us  8.56    0.03   38.19   0.01   4.46
microMaskLoadCastStoreShort64    ops/us  24.32   0.00   75.35   0.07   3.10

On an AMD EPYC 9124 16-Core Processor with AVX3:

Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  82.39   0.11   115.15  0.03   1.40
microMaskLoadCastStoreDouble128  ops/us  0.32    0.00   0.32    0.00   0.99
microMaskLoadCastStoreFloat128   ops/us  42.10   0.10   57.58   0.02   1.37
microMaskLoadCastStoreInt128     ops/us  42.10   0.08   57.57   0.02   1.37
microMaskLoadCastStoreLong128    ops/us  0.32    0.00   0.32    0.00   0.99
microMaskLoadCastStoreShort64    ops/us  42.16   0.05   57.54   0.04   1.36

On an AMD EPYC 9124 16-Core Processor with AVX2:

Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  73.59   0.27   115.14  0.04   1.56
microMaskLoadCastStoreDouble128  ops/us  0.30    0.00   0.30    0.00   1.01
microMaskLoadCastStoreFloat128   ops/us  30.68   0.09   57.57   0.02   1.88
microMaskLoadCastStoreInt128     ops/us  30.75   0.09   57.58   0.01   1.87
microMaskLoadCastStoreLong128    ops/us  0.30    0.00   0.30    0.00   1.00
microMaskLoadCastStoreShort64    ops/us  24.95   0.01   57.59   0.01   2.31

On an AMD EPYC 9124 16-Core Processor with AVX1:

Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  73.68   0.02   115.17  0.03   1.56
microMaskLoadCastStoreDouble128  ops/us  0.30    0.00   0.30    0.00   1.01
microMaskLoadCastStoreFloat128   ops/us  30.80   0.12   57.59   0.01   1.87
microMaskLoadCastStoreInt128     ops/us  30.70   0.11   57.58   0.01   1.88
microMaskLoadCastStoreLong128    ops/us  0.30    0.00   0.30    0.00   0.99
microMaskLoadCastStoreShort64    ops/us  24.95   0.01   57.56   0.02   2.31

------------- PR Comment: https://git.openjdk.org/jdk/pull/28313#issuecomment-3555660413 From jiangli at openjdk.org Thu Nov 20 04:59:20 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 20 Nov 2025 04:59:20 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v2] In-Reply-To: References: Message-ID: > Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. > > Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! Jiangli Zhou has updated the pull request incrementally with four additional commits since the last revision: - Address shipilev coments: - Replace time-bound loop with an iteration of three runs. - Add encrypt part and check to make sure the encrypted message is the same as the original. - Address shipilev's comments: - Rename test to TestGCMSplitBound.java - Change test range to [SPLIT_LEN - 300; SPLIT_LEN + 300]. - Stylistic change: '256' to '16 * 16'. - Fix indentation.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/28363/files - new: https://git.openjdk.org/jdk/pull/28363/files/338a99d0..f1e7291b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=00-01 Stats: 249 lines in 3 files changed: 135 ins; 113 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28363.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28363/head:pull/28363 PR: https://git.openjdk.org/jdk/pull/28363 From jiangli at openjdk.org Thu Nov 20 05:06:58 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 20 Nov 2025 05:06:58 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v3] In-Reply-To: References: Message-ID: > Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. > > Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: Fix Whitespace error. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/28363/files - new: https://git.openjdk.org/jdk/pull/28363/files/f1e7291b..528b1b47 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28363.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28363/head:pull/28363 PR: https://git.openjdk.org/jdk/pull/28363 From jiangli at openjdk.org Thu Nov 20 05:06:58 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 20 Nov 2025 05:06:58 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v3] In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 08:33:32 GMT, Aleksey Shipilev wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Whitespace error. > > src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3530: > >> 3528: __ bind(MESG_BELOW_32_BLKS); >> 3529: __ subl(len, 16 * 16); >> 3530: __ cmpl(len, 256); > > From the stylistic logic, this should be written as `16 * 16`, to match the surrounding `subl` and `addl`. Thanks for the detailed review, @shipilev! Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2544340100 From jiangli at openjdk.org Thu Nov 20 05:06:59 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 20 Nov 2025 05:06:59 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v3] In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 06:17:30 GMT, Tobias Hartmann wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Whitespace error. 
> > test/jdk/com/sun/crypto/provider/Cipher/AES/TestAesGcmIntrinsic.java line 41: > >> 39: public class TestAesGcmIntrinsic { >> 40: >> 41: static final SecureRandom SECURE_RANDOM = newDefaultSecureRandom(); > > Drive-by comment: Java code should use 4x whitespace indentation. @TobiHartmann, thanks! Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2544338893 From jiangli at openjdk.org Thu Nov 20 05:12:24 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 20 Nov 2025 05:12:24 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v3] In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 08:53:05 GMT, Aleksey Shipilev wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Whitespace error. > > test/jdk/com/sun/crypto/provider/Cipher/AES/TestAesGcmIntrinsic.java line 39: > >> 37: import javax.crypto.spec.SecretKeySpec; >> 38: >> 39: public class TestAesGcmIntrinsic { > > This sounds like `TestGCMSplitBound` or some such; it is not a generic test for AES/GCM intrinsic. I renamed to TestAesGcmIntrinsic name, when converting the original test into the jtreg test. `TestGCMSplitBound` SGTM. Changed. > test/jdk/com/sun/crypto/provider/Cipher/AES/TestAesGcmIntrinsic.java line 93: > >> 91: } >> 92: } >> 93: for (int messageSize = SPLIT_LEN; messageSize < SPLIT_LEN + 300; messageSize++) { > > `[SPLIT_LEN - 300; SPLIT_LEN + 300]` for completeness, perhaps? Done. > test/jdk/com/sun/crypto/provider/Cipher/AES/TestAesGcmIntrinsic.java line 96: > >> 94: byte[] message = randBytes(messageSize); >> 95: try { >> 96: byte[] ciphertext = gcmEncrypt(key, message, aad); > > I believe it makes sense to check that round-trip is successful, e.g. that `decrypt(encrypt(message)) == message`. 
Currently we implicitly rely on exceptions being thrown from the incorrectly executing code, which is IMO too weak -- in the boundary conditions like these, there might be bugs that _do not_ manifest in visible exceptions, and just the encryption is subtly broken. That's a good idea. I added the decrypt part and the check as suggested. > test/jdk/com/sun/crypto/provider/Cipher/AES/TestAesGcmIntrinsic.java line 109: > >> 107: TestAesGcmIntrinsic test = new TestAesGcmIntrinsic(); >> 108: long startTime = System.currentTimeMillis(); >> 109: while (System.currentTimeMillis() - startTime < 60 * 1000) { > > I get that you want a stress test. But time-limiting puts the test into a weird condition: it can have a different number of iterations, depending on auxiliary load on the machine. These tests are running in parallel with lots of other tests, so it is not uncommon. Do you even need to repeat the `jitFunc()` call multiple times? Looks like it traverses the interesting configurations in one go? I did some testing today. With the time-limited loop removed, 89 out of 200 runs fail. So I changed the test to iterate three times; with that, all 200 runs fail without the fix.
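The round-trip check discussed above, `decrypt(encrypt(message)) == message` over a window of message sizes, can be sketched with the standard `javax.crypto` API as follows. This is an illustrative sketch, not the test from the PR: the class name, the helper names and the window center of 4096 bytes are hypothetical (the actual `SPLIT_LEN` value is not given in this thread), and a short run like this may never reach the JIT-compiled intrinsic path that the real test targets.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class GcmRoundTripSketch {
    static final int TAG_BITS = 128;
    static final int IV_BYTES = 12;
    static final SecureRandom RANDOM = new SecureRandom();

    // Run one AES/GCM operation (encrypt or decrypt) with a fresh Cipher instance.
    static byte[] gcm(int mode, byte[] key, byte[] iv, byte[] aad, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new GCMParameterSpec(TAG_BITS, iv));
            cipher.updateAAD(aad);
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            // A tag mismatch on decrypt also ends up here.
            throw new IllegalStateException(e);
        }
    }

    // Encrypt then decrypt and compare with the original message.
    static boolean roundTrips(byte[] key, byte[] aad, byte[] message) {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        byte[] ciphertext = gcm(Cipher.ENCRYPT_MODE, key, iv, aad, message);
        byte[] decrypted = gcm(Cipher.DECRYPT_MODE, key, iv, aad, ciphertext);
        return Arrays.equals(message, decrypted);
    }

    // Count sizes in [center - radius, center + radius] whose round trip fails.
    static int countFailures(int center, int radius) {
        byte[] key = new byte[16];
        RANDOM.nextBytes(key);
        byte[] aad = new byte[8];
        RANDOM.nextBytes(aad);
        int failures = 0;
        for (int size = center - radius; size <= center + radius; size++) {
            byte[] message = new byte[size];
            RANDOM.nextBytes(message);
            try {
                if (!roundTrips(key, aad, message)) {
                    failures++;
                }
            } catch (IllegalStateException e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // 4096 is a hypothetical window center, not the test's SPLIT_LEN.
        System.out.println("failures: " + countFailures(4096, 300));
    }
}
```

Comparing the decrypted bytes with the input catches silent corruption, while the catch block still counts failures that surface as exceptions, which covers both failure modes raised in the review.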
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2544352236 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2544352547 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2544348398 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2544346000 From jiangli at openjdk.org Thu Nov 20 05:12:25 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 20 Nov 2025 05:12:25 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 05:06:08 GMT, Jiangli Zhou wrote: >> test/jdk/com/sun/crypto/provider/Cipher/AES/TestAesGcmIntrinsic.java line 96: >> >>> 94: byte[] message = randBytes(messageSize); >>> 95: try { >>> 96: byte[] ciphertext = gcmEncrypt(key, message, aad); >> >> I believe it makes sense to check that round-trip is successful, e.g. that `decrypt(encrypt(message)) == message`. Currently we implicitly rely on exceptions being thrown from the incorrectly executing code, which is IMO too weak -- in the boundary conditions like these, there might be bugs that _do not_ manifest in visible exceptions, and just the encryption is subtly broken. > > That's a good idea. I added decrypt part and the check as suggested. With the changes, there were more common parts in the test. I moved common code into helper methods. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2544350223 From jiangli at openjdk.org Thu Nov 20 05:16:56 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 20 Nov 2025 05:16:56 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v3] In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 08:57:02 GMT, Aleksey Shipilev wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Whitespace error. > > test/jdk/com/sun/crypto/provider/Cipher/AES/TestAesGcmIntrinsic.java line 41: > >> 39: public class TestAesGcmIntrinsic { >> 40: >> 41: static final SecureRandom SECURE_RANDOM = newDefaultSecureRandom(); > > Do you really need a `SecureRandom` here? `Random RANDOM = Utils.getRandomInstance();` gets you the pre-seeded random instance, which can be used to repeatably reproduce failures. I kept the `SecureRandom` unchanged. I think that stays closer to the original reproducer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2544361801 From duke at openjdk.org Thu Nov 20 05:38:34 2025 From: duke at openjdk.org (Harshit470250) Date: Thu, 20 Nov 2025 05:38:34 GMT Subject: RFR: 8370920: [s390] C2: add instruction size in s390.ad file [v5] In-Reply-To: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> References: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> Message-ID: > This PR adds the size of the match rule nodes. > > There were a lot of nodes for which the size was variable; for those nodes I have taken the maximum possible size. Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 15 additional commits since the last revision: - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge remote-tracking branch 'origin/master' - remove whitespace - Resolved a bug - remove TODO comments - final size added - Final sizes - ... and 5 more: https://git.openjdk.org/jdk/compare/5353dcf9...bb7d05fc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28054/files - new: https://git.openjdk.org/jdk/pull/28054/files/01a9b46b..bb7d05fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=03-04 Stats: 22326 lines in 362 files changed: 14173 ins; 5765 del; 2388 mod Patch: https://git.openjdk.org/jdk/pull/28054.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28054/head:pull/28054 PR: https://git.openjdk.org/jdk/pull/28054 From dlong at openjdk.org Thu Nov 20 05:52:28 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 20 Nov 2025 05:52:28 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v3] In-Reply-To: References: Message-ID: On Sun, 2 Nov 2025 15:48:37 GMT, Quan Anh Mai wrote: >> Hi, >> >> Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. >> >> However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). 
>> >> In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. >> >> This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierarchy. There are a lot of benefits of this: >> >> - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think of a `Type` as a set of possible values. >> - It tightens the invariants of the classes in the hierarchy. Instead of having 5 possible `ptr()` values when a `TypeInstPtr` participates in a meet/join operation, there are only 3 left (`AnyNull` is nonsensical and `TopPTR` must be an `AnyPtr`). This, in turn, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. >> >> This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these p... 
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Move dual to ASSERT only It's worth noting that Graal implements `meet` and `join` separately in its `Stamp` type system, with no "dual" tricks, and apparently no worries about being "symmetrical" or even mentioning the words "lattice" or "centerline". For example: [compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/core/common/type/AbstractObjectStamp.java](https://github.com/oracle/graal/blob/ee2e127f76d2b2fe39e74aa2994d785d0591b567/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/core/common/type/AbstractObjectStamp.java) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3555976722 From epeter at openjdk.org Thu Nov 20 06:28:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Nov 2025 06:28:55 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v13] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 13:57:08 GMT, Mikhail Ablakatov wrote: >> Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used. >> >> Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is used directly still. >> >> The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reduction micro-benchmarks. 
>> >> Benchmarks results: >> >> Neoverse-V1 (SVE 256-bit) >> >> Benchmark (size) Mode master PR Units >> ByteMaxVector.MULLanes 1024 thrpt 5447.643 11455.535 ops/ms >> ShortMaxVector.MULLanes 1024 thrpt 3388.183 7144.301 ops/ms >> IntMaxVector.MULLanes 1024 thrpt 3010.974 4911.485 ops/ms >> LongMaxVector.MULLanes 1024 thrpt 1539.137 2562.835 ops/ms >> FloatMaxVector.MULLanes 1024 thrpt 1355.551 4158.128 ops/ms >> DoubleMaxVector.MULLanes 1024 thrpt 1715.854 3284.189 ops/ms >> >> >> Fujitsu A64FX (SVE 512-bit): >> >> Benchmark (size) Mode master PR Units >> ByteMaxVector.MULLanes 1024 thrpt 1091.692 2887.798 ops/ms >> ShortMaxVector.MULLanes 1024 thrpt 597.008 1863.338 ops/ms >> IntMaxVector.MULLanes 1024 thrpt 510.642 1348.651 ops/ms >> LongMaxVector.MULLanes 1024 thrpt 468.878 878.620 ops/ms >> FloatMaxVector.MULLanes 1024 thrpt 376.284 2237.564 ops/ms >> DoubleMaxVector.MULLanes 1024 thrpt 431.343 1646.792 ops/ms > > Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 24 commits: > > - Merge commit 'c8679713402186b24608fa4c91397b6a4fd5ebf3' into 8343689 > > Change-Id: Icfa70da585e034774e4ff0f60b8f0c9ce0598399 > - cleanup: remove redundand local variables > > Change-Id: I6fb6a9a7a236537612caa5d53c5516ed2f260bad > - cleanup: remove a trivial switch-case statement > > Change-Id: Ib914ce02ae9d88057cb0b88d4880df6ca64f8184 > - Assert the exact supported VL of 32B in SVE-specific methods > > Change-Id: I8768c653ff563cd8a7a75cd06a6523a9526d15ec > - cleanup: fix long line formatting > > Change-Id: I173e70a2fa9a45f56fe50d4a6b81699665e3433d > - fixup: remove VL asserts in match rules to fix failures on >= 512b SVE platforms > > Change-Id: I721f5a97076d645905ee1716f7d57ec8c90ef6e9 > - Merge branch 'master' into 8343689 > > Change-Id: Iebe758e4c7b3ab0de5f580199f8909e96b8c6274 > - cleanup: start the SVE Integer Misc - Unpredicated section > - Merge branch 'master' > - Address review comments and simplify the implementation > > - remove the loops from gt128b methods making them 256b only > - fixup: missed fnoregs in instruct reduce_mulL_256b > - use an extra vtmp3 reg for the 256b integer method > - remove a no longer needed change in reduce_mul_integral_le128b > - cleanup: unify comments > - ... and 14 more: https://git.openjdk.org/jdk/compare/c8679713...e564d6c1 Might this work have overlap with this recent SVE regression, mostly for long reduction? [JDK-8372153](https://bugs.openjdk.org/browse/JDK-8372153) AArch64: Performance regression in long reduction microbenchmarks after JDK-8340093 It would be good to merge with the changes of [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093), and to show the benchmarks from its PR https://github.com/openjdk/jdk/pull/27803 on various SVE machines. 
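As a rough illustration of the "multiply halves of the source vector" scheme described in the quoted PR text, here is a scalar Java sketch. It is not the AArch64 stub code and makes no claim about which SVE or ASIMD instructions are used; the class and method names are hypothetical, and it assumes a power-of-two lane count.

```java
public class MulReduceSketch {
    // Hypothetical scalar model of a halving multiply reduction:
    // each step multiplies the upper half of the live lanes into the lower half,
    // so an 8-lane input shrinks to 4, then 2, then 1 lane.
    // Assumes lanes.length is a power of two.
    static long reduceMul(long[] lanes) {
        long[] v = lanes.clone(); // do not modify the caller's array
        for (int width = v.length / 2; width >= 1; width /= 2) {
            for (int i = 0; i < width; i++) {
                v[i] *= v[i + width];
            }
        }
        return v[0];
    }

    public static void main(String[] args) {
        long[] lanes = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(reduceMul(lanes)); // prints 40320
    }
}
```

In this model, the first step on a 256-bit vector would correspond to the part the PR adds, and the remaining steps to the 128-bit reduction that already exists.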
------------- PR Comment: https://git.openjdk.org/jdk/pull/23181#issuecomment-3556124319 From epeter at openjdk.org Thu Nov 20 06:33:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Nov 2025 06:33:33 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v13] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 13:57:08 GMT, Mikhail Ablakatov wrote: >> Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used. >> >> Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is used directly still. >> >> The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reduction micro-benchmarks. 
>> >> Benchmarks results: >> >> Neoverse-V1 (SVE 256-bit) >> >> Benchmark (size) Mode master PR Units >> ByteMaxVector.MULLanes 1024 thrpt 5447.643 11455.535 ops/ms >> ShortMaxVector.MULLanes 1024 thrpt 3388.183 7144.301 ops/ms >> IntMaxVector.MULLanes 1024 thrpt 3010.974 4911.485 ops/ms >> LongMaxVector.MULLanes 1024 thrpt 1539.137 2562.835 ops/ms >> FloatMaxVector.MULLanes 1024 thrpt 1355.551 4158.128 ops/ms >> DoubleMaxVector.MULLanes 1024 thrpt 1715.854 3284.189 ops/ms >> >> >> Fujitsu A64FX (SVE 512-bit): >> >> Benchmark (size) Mode master PR Units >> ByteMaxVector.MULLanes 1024 thrpt 1091.692 2887.798 ops/ms >> ShortMaxVector.MULLanes 1024 thrpt 597.008 1863.338 ops/ms >> IntMaxVector.MULLanes 1024 thrpt 510.642 1348.651 ops/ms >> LongMaxVector.MULLanes 1024 thrpt 468.878 878.620 ops/ms >> FloatMaxVector.MULLanes 1024 thrpt 376.284 2237.564 ops/ms >> DoubleMaxVector.MULLanes 1024 thrpt 431.343 1646.792 ops/ms > > Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 24 commits: > > - Merge commit 'c8679713402186b24608fa4c91397b6a4fd5ebf3' into 8343689 > > Change-Id: Icfa70da585e034774e4ff0f60b8f0c9ce0598399 > - cleanup: remove redundand local variables > > Change-Id: I6fb6a9a7a236537612caa5d53c5516ed2f260bad > - cleanup: remove a trivial switch-case statement > > Change-Id: Ib914ce02ae9d88057cb0b88d4880df6ca64f8184 > - Assert the exact supported VL of 32B in SVE-specific methods > > Change-Id: I8768c653ff563cd8a7a75cd06a6523a9526d15ec > - cleanup: fix long line formatting > > Change-Id: I173e70a2fa9a45f56fe50d4a6b81699665e3433d > - fixup: remove VL asserts in match rules to fix failures on >= 512b SVE platforms > > Change-Id: I721f5a97076d645905ee1716f7d57ec8c90ef6e9 > - Merge branch 'master' into 8343689 > > Change-Id: Iebe758e4c7b3ab0de5f580199f8909e96b8c6274 > - cleanup: start the SVE Integer Misc - Unpredicated section > - Merge branch 'master' > - Address review comments and simplify the implementation > > - remove the loops from gt128b methods making them 256b only > - fixup: missed fnoregs in instruct reduce_mulL_256b > - use an extra vtmp3 reg for the 256b integer method > - remove a no longer needed change in reduce_mul_integral_le128b > - cleanup: unify comments > - ... and 14 more: https://git.openjdk.org/jdk/compare/c8679713...e564d6c1 There is also a report about failing SVE IR tests: https://bugs.openjdk.org/browse/JDK-8371768 This is also due to [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093). Since you are touching code in `Matcher::match_rule_supported_auto_vectorization`, this could also have an effect on which cases get vectorized, and hence it may or may not change if those IR rules pass or fail. Sadly, I cannot really help with these performance and test issues, as I do not have access to SVE machines at the moment. But let me know if you have any questions. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23181#issuecomment-3556140713 From fyang at openjdk.org Thu Nov 20 07:46:54 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 20 Nov 2025 07:46:54 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v5] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> Message-ID: On Tue, 18 Nov 2025 09:27:44 GMT, Hamlin Li wrote: >> Hi, >> >> This PR adds CMoveF/D on riscv, which enables vectorization of statements like: `op_1 bop op_2 ? res_f_d_1 : res_f_d_2 in a loop`. >> >> This PR is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231. >> >> Previously this was https://github.com/openjdk/jdk/pull/25341, but at that time, C2 SLP had some issue with unsigned comparison, which is now fixed, so it's good to continue the work. >> >> # Test >> ## Jtreg >> >> in progress... >> >> ## Performance >> >> Column name meanings: >> * p: with patch >> * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> * m: without patch >> * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> >> #### Average improvement >> >> NOTE: With only this PR, it brings performance benefit in case of `CMoveF+CmpF`, `CMoveD+CmpD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below is based on fully implementing the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. >> >> For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. >> >> Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) >> -- | -- | -- | -- >> 1.022782609 | 2.198717391 | 2.162673913 | 2.199 >> >> > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > replace assert with log_warning @Hamlin-Li : Thanks for the update. I am having another look. 
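For reference, the loop shape named in the quoted description (`op_1 bop op_2 ? res_f_d_1 : res_f_d_2`) might look like the following in Java source. This is only a sketch of the source pattern: the class and method names are hypothetical, and whether C2 actually turns it into vectorized CMoveF nodes depends on the platform and on flags such as `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` mentioned in the PR.

```java
public class CMoveLoopSketch {
    // Hypothetical example of a float compare feeding a float select (CmpF + CMoveF shape):
    // out[i] takes x[i] when a[i] < b[i], otherwise y[i].
    static void select(float[] a, float[] b, float[] x, float[] y, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = (a[i] < b[i]) ? x[i] : y[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 5f, 3f};
        float[] b = {2f, 4f, 3f};
        float[] x = {10f, 20f, 30f};
        float[] y = {-1f, -2f, -3f};
        float[] out = new float[3];
        select(a, b, x, y, out);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```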
------------- PR Comment: https://git.openjdk.org/jdk/pull/28309#issuecomment-3556397859 From shade at openjdk.org Thu Nov 20 07:55:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Nov 2025 07:55:51 GMT Subject: RFR: 8372154: AArch64: Match rule failure with some CompareAndSwap operand shapes [v2] In-Reply-To: References: Message-ID: <6UEFErXO09V1ViH0p7jDLE2tVhnCWRTuOSBrRZpXt8c=.0aeb8c97-4649-451d-87c4-46850a791102@github.com> > See bug for more details. > > Following up on [JDK-8371959](https://bugs.openjdk.org/browse/JDK-8371959) failures, I managed to reproduce the "bad AD" file assert. It is heavily intermittent, and needs hours of runs before we hit the lucky seed, plus [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557) to have broader testing scope. > > It looks like `CastII` node accepts the wider operand type (`iRegI`), which fails to match against narrower type in CAS match rules (`iRegINoSp`). It makes sense to use `iRegINoSp` for destination regs, so that we do not start writing to these special registers. But for operand registers, it makes little sense, IMO. I note that cas.m4-generated CAE/WCAS stubs actually already have the wider `iRegI` for operand types. > > So it looks to me the manual CAS match rules should also use `iRegI`. It would be even better to auto-generate these match rules from M4 stencils, and I tried that, but ultimately decided it obscures the actual bug fix. [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) is dedicated to moving the match rules, hopefully without the semantic change. > > In this change, I dropped `*NoSp` from CAS operand match rules. It fixes the `iRegI` mismatch, and prepares us for harmonizing these rules with the rest of CAS/CAE generated ones. 
> > Additional testing: > - [x] Linux AArch64 server fastdebug, local `bad AD` assert reproducer no longer fails > - [ ] Linux AArch64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, jcstress run Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8372154-aarch64-cas-operand-match - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28398/files - new: https://git.openjdk.org/jdk/pull/28398/files/fa95aa0f..cb459969 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28398&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28398&range=00-01 Stats: 23201 lines in 445 files changed: 14248 ins; 5807 del; 3146 mod Patch: https://git.openjdk.org/jdk/pull/28398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28398/head:pull/28398 PR: https://git.openjdk.org/jdk/pull/28398 From qamai at openjdk.org Thu Nov 20 08:58:01 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 20 Nov 2025 08:58:01 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" Message-ID: Hi, This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. To be more specific, for this issue, we have the graph that looks like: ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. 
As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: ConI -> ConvI2L -> VectorMaskGen After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. Please take a look and leave your thoughts, thanks a lot. ------------- Commit messages: - assert in Load/StoreVectorMaskedNode::Ideal Changes: https://git.openjdk.org/jdk/pull/28410/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28410&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371964 Stats: 11 lines in 1 file changed: 9 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28410.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28410/head:pull/28410 PR: https://git.openjdk.org/jdk/pull/28410 From rcastanedalo at openjdk.org Thu Nov 20 09:17:11 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 20 Nov 2025 09:17:11 GMT Subject: Integrated: 8372097: C2: PhasePrintLevel requires setting PrintPhaseLevel explicitly to be active In-Reply-To: <3lLhDPNgImbNIz-0CGOOmSh8IGA-jwXjVb045IiJw8Q=.e417355f-87ba-4900-a11a-d37f39ab4ec9@github.com> References: <3lLhDPNgImbNIz-0CGOOmSh8IGA-jwXjVb045IiJw8Q=.e417355f-87ba-4900-a11a-d37f39ab4ec9@github.com> Message-ID: On Wed, 19 Nov 2025 08:51:58 GMT, Roberto Castañeda Lozano wrote: > This changeset aligns the behavior of `PrintPhaseLevel` with its description in `c2_globals.hpp` in the default case of `-XX:PrintPhaseLevel=0`. In particular, after the changeset, running `java -XX:CompileCommand=PhasePrintLevel,*::*,N` does print the phase names corresponding to level `N` for the matched methods, as expected: > > > $ java -Xbatch -XX:CompileCommand=PhasePrintLevel,java.lang.StringLatin1::equals,2 > CompileCommand: PhasePrintLevel java/lang/StringLatin1.equals intx PhasePrintLevel = 2 > 1. After Parsing > 2. Iter GVN 1 > 3. 
Incremental Inline > 4. Incremental Boxing Inline > 5. Before Loop Optimizations > 6. PhaseIdealLoop 1 > 7. PhaseIdealLoop 2 > ... > > > The changeset makes the behavior of the `PrintPhaseLevel` flag and `PhasePrintLevel` compile command consistent with the behavior of the pre-existing, analogous `PrintIdealGraphLevel` flag and `IGVPrintLevel` compile command. The changeset adds tests covering and documenting different combinations of flag and compile-command-specified print levels, and fixes a typo in the flag description in `c2_globals.hpp`. > > **Testing:** tier1-3 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). This pull request has now been integrated. Changeset: 6fc8e499 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/6fc8e4998019a2f3ef05ff3e73a4c855c0366d7a Stats: 112 lines in 3 files changed: 110 ins; 0 del; 2 mod 8372097: C2: PhasePrintLevel requires setting PrintPhaseLevel explicitly to be active Reviewed-by: mhaessig, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/28386 From epeter at openjdk.org Thu Nov 20 09:33:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Nov 2025 09:33:38 GMT Subject: RFR: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries [v33] In-Reply-To: <0n4X2pauKWeCJlhNspG3ls5-qlVvNtN8xkJhmdbbjxA=.959cc8d5-bc8d-4cdf-af29-434fdb9cf506@github.com> References: <0n4X2pauKWeCJlhNspG3ls5-qlVvNtN8xkJhmdbbjxA=.959cc8d5-bc8d-4cdf-af29-434fdb9cf506@github.com> Message-ID: On Tue, 18 Nov 2025 08:30:48 GMT, Emanuel Peter wrote: >> I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. >> >> So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. 
Only the framework internal tests require significant changes. >> >> Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. >> >> **Major issue with Template Framework: lambda vs token order** >> >> The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. >> Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). >> Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. >> >> var testTemplate = Template.make(() -> body( >> ... >> addDataName("name", someType, MUTABLE), >> let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), >> ... >> )); >> >> >> **Two possible solutions: all-in on lambda execution or all-in on tokens** >> >> First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. 
But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 142 additional commits since the last revision: > > - Merge branch 'master' into JDK-8367531-fix-addDataName > - fix up documentation for Roberto > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - Apply suggestions from code review > > Co-authored-by: Roberto Castañeda Lozano > - Apply suggestions from code review > > Co-authored-by: Roberto Castañeda Lozano > - document hashtag locality for Roberto > - inflate abreviations to full names > - better documentation, inspired by Christian > - Update test/hotspot/jtreg/compiler/lib/template_framework/ScopeToken.java > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/lib/template_framework/Template.java > > Co-authored-by: Christian Hagedorn > - ... and 132 more: https://git.openjdk.org/jdk/compare/4f1dcaf7...79377438 Testing was very slow, but finally completed :) Thanks again to all the reviewers @chhagedorn @mhaessig @robcasloz ! And also to @galderz for taking the time to work with the Framework, and reporting difficulties. 
That gave me the motivation to fix things :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27255#issuecomment-3556861771 From epeter at openjdk.org Thu Nov 20 09:35:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Nov 2025 09:35:49 GMT Subject: Integrated: 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries In-Reply-To: References: Message-ID: <8rnt0ZDT_RXandR74ucrzdC9aPmBiD18M1uVvVMR3UQ=.b17d3946-83e9-40db-9db0-5b53909e2e28@github.com> On Fri, 12 Sep 2025 10:16:10 GMT, Emanuel Peter wrote: > I got some feedback from users of the Template Framework, especially @galderz . And personally, I already was slightly unsatisfied by some of the issues described below, but did not expect it to be as bad as it is. > > So I'm sorry, but I think we need to do a significant re-design. It is now still early enough, and only trivial changes are required for the "real" uses of the framework. Only the framework internal tests require significant changes. > > Many thanks to @galderz for trying out the framework, and reporting the issues. And thanks to @chhagedorn for spending a few hours in an offline meeting discussing the issue. > > **Major issue with Template Framework: lambda vs token order** > > The template rendering involves some state, such as keeping track of hashtag replacements, names and fuel cost. > Some methods have side-effects (`addDataName`, `let`, ...) and others are simple queries (`sample`, ...). > Sadly, the first version of the template framework was not very consistent, and created tokens (deferred evaluation, during token evaluation) for some, and for others it queried the state and returned the result immediately (during lambda execution). One nasty consequence is that an immediately returning query can "float" above a state affecting token. 
For example, `addDataName` generated a token (so that we know if it is to be added for the template frame or a hook anchoring frame), but answered sampling queries immediately (because that means we can use the returned value immediately and make decisions based on it immediately, which is nice). Looking at the example below, this had the confusing result that `addDataName` only generates a token at first, then `sample` does not have that name available yet, and only later during token evaluation is the name actually added. > > var testTemplate = Template.make(() -> body( > ... > addDataName("name", someType, MUTABLE), > let("name", dataNames(MUTABLE).exactOf(someType).sample().name()), > ... > )); > > > **Two possible solutions: all-in on lambda execution or all-in on tokens** > > First, I thought I want to go all-in on lambda execution, and have everything have immediate effect and return results immediately. This would have the nice effect that the user feels like they are directly in control of the execution order. But I did not find a good way without exposing too many internals to the user, or getting rid of the nice "token lists" we currently have inside Templates (the list is directly concatenated). Look at the f... This pull request has now been integrated. 
Changeset: b41146cd Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/b41146cd1e5d412f69b893bfb2fd65b6206bb0d2 Stats: 4295 lines in 41 files changed: 3238 ins; 269 del; 788 mod 8367531: Template Framework: use scopes and tokens instead of misbehaving immediate-return-queries Co-authored-by: Christian Hagedorn Reviewed-by: rcastanedalo, mhaessig, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/27255 From chagedorn at openjdk.org Thu Nov 20 09:57:09 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 20 Nov 2025 09:57:09 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 08:42:46 GMT, Quan Anh Mai wrote: > Hi, > > This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. > > To be more specific, for this issue, we have the graph that looks like: > > ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen > > with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: > > ConI -> ConvI2L -> VectorMaskGen > > After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. > > Please take a look and leave your thoughts, thanks a lot. src/hotspot/share/opto/vectornode.cpp line 1152: > 1150: // Dead node, should go away > 1151: return nullptr; > 1152: } Do I understand correctly that this widening/removal of the `CastLL` node is happening on an actual dead path that is going to be removed anyway? 
It sounds like this problem is specific to post loop opts IGVN phases where we are allowed to widen `CastII/LL` nodes. Could we assert that this bailout only happens after post loop opts? Apart from that, I think your fix is reasonable. Were you able to also extract a reproducer? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28410#discussion_r2545153008 From roland at openjdk.org Thu Nov 20 10:03:40 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 20 Nov 2025 10:03:40 GMT Subject: RFR: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 12:19:02 GMT, Christian Hagedorn wrote: > ### Strong Connection between Template Assertion Predicate and Counted Loop > In [JDK-8350579](https://bugs.openjdk.org/browse/JDK-8350579), we fixed the issue that a Template Assertion Predicate for a folded loop A could end up at another loop B. We then created an Initialized Assertion Predicate at loop B from the template of loop A and used the values from the already folded, completely unrelated loop A . As a result, we crashed with a halt because loop B violated the predicate with the wrong values. As a fix, we established a strong connection between Template Assertion Predicates and their associated loop node by adding a direct link from `OpaqueTemplateAssertionPredicate` -> `CountedLoop`. > > #### Maintaining this Property > In `PhaseIdealLoop::eliminate_useless_predicates()`, we walk through all counted loops and only keep those `OpaqueTemplateAssertionPredicate` nodes that can be found from the loop heads and are actually meant for this loop (using the strong connection): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1245-L1249 > > All other opaque nodes are removed. 
> > ### Additional Verification for Useless `OpaqueTemplateAssertionPredicate` Nodes > As an additional verification for `OpaqueTemplateAssertionPredicate` nodes that are found to be useless in `eliminate_useless_predicates()`, we check that in this case the `CountedLoop` is really dead (otherwise, we should have found the `OpaqueTemplateAssertionPredicate` in our walks through all loop): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1294-L1301 > > ### Violating the Additional Verification with `-XX:+StressLoopBackedge` > In `PhaseIdealLoop::duplicate_loop_backedge()`, we convert a loop with a merge point into two loops which should enable us to transform the new inner loop into a counted loop. This only makes sense for a `Loop` that is not a counted loop, yet. However, to stress the transformation, we can also run with `-XX:+StressDuplicateBackedge` that also transforms a counted loop into an inner and an outer loop. This is a problem when we have Template Assertion Predicates above a counted loop to be stressed: > > image > > After duplicate backedge, the Template Assertion Predicates are now at the outer non-counted `Loop`: > > References: <6UEFErXO09V1ViH0p7jDLE2tVhnCWRTuOSBrRZpXt8c=.0aeb8c97-4649-451d-87c4-46850a791102@github.com> Message-ID: On Thu, 20 Nov 2025 07:55:51 GMT, Aleksey Shipilev wrote: >> See bug for more details. >> >> Following up on [JDK-8371959](https://bugs.openjdk.org/browse/JDK-8371959) failures, I managed to reproduce the "bad AD" file assert. It is heavily intermittent, and needs hours of runs before we hit the lucky seed, plus [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557) to have broader testing scope. >> >> It looks like `CastII` node accepts the wider operand type (`iRegI`), which fails to match against narrower type in CAS match rules (`iRegINoSp`). 
It makes sense to use `iRegINoSp` for destination regs, so that we do not start writing to these special registers. But for operand registers, it makes little sense, IMO. I note that cas.m4-generated CAE/WCAS stubs actually already have the wider `iRegI` for operand types. >> >> So it looks to me the manual CAS match rules should also use `iRegI`. It would be even better to auto-generate these match rules from M4 stencils, and I tried that, but ultimately decided it obscures the actual bug fix. [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) is dedicated to moving the match rules, hopefully without the semantic change. >> >> In this change, I dropped `*NoSp` from CAS operand match rules. It fixes the `iRegI` mismatch, and prepares us for harmonizing these rules with the rest of CAS/CAE generated ones. >> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, local `bad AD` assert reproducer no longer fails >> - [x] Linux AArch64 server fastdebug, `all` >> - [ ] Linux AArch64 server fastdebug, jcstress run > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8372154-aarch64-cas-operand-match > - Fix Yes, good catch. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28398#pullrequestreview-3486828474 From qamai at openjdk.org Thu Nov 20 10:14:07 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 20 Nov 2025 10:14:07 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v4] In-Reply-To: References: Message-ID: > Hi, > > Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. 
The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. > > However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). > > In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. > > This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierarchy. This has several benefits: > > - It makes the operation much more intuitive, and changes to `Type` classes can be made more easily. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think of a `Type` as a set of possible values. > - It tightens the invariants of the classes in the hierarchy. Instead of having 5 possible `ptr()` values when a `TypeInstPtr` participates in a meet/join operation, there are only 3 left (`AnyNull` is nonsensical and `TopPTR` must be an `AnyPtr`).
This, in turn, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. > > This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seems to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the pl... Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into typejoin - Move dual to ASSERT only - Keep old version for verification - whitespace - Reimplement Type::join ------------- Changes: https://git.openjdk.org/jdk/pull/28051/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=03 Stats: 1885 lines in 7 files changed: 1014 ins; 479 del; 392 mod Patch: https://git.openjdk.org/jdk/pull/28051.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28051/head:pull/28051 PR: https://git.openjdk.org/jdk/pull/28051 From epeter at openjdk.org Thu Nov 20 10:19:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Nov 2025 10:19:19 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes [v4] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 19:04:56 GMT, Aleksey Shipilev wrote: >> I started this as an investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test. >> >> At this point I believe we found that PhaseCCP does not reach the fixpoint for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to the `LoadN` in question.
Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. >> >> It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. >> >> Testing shows there are no ill effects on correctness doing this. But I would appreciate someone more savvy in this code to sanity check all of this. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails >> - [x] Linux x86_64 server fastdebug, `all` tests pass >> - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into JDK-8371581-ccp-spooky-nodes > - More comments > - More restrictive CmpP check > - Tighten up comments and signatures > - Do Value() once > - Fix Tests seem to be passing. @shipilev Thanks for working on this :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28288#pullrequestreview-3486870241 From rrich at openjdk.org Thu Nov 20 10:21:34 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 20 Nov 2025 10:21:34 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v4] In-Reply-To: References: Message-ID: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> > With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. 
Maximum alignment is StackAlignmentInBytes. > > It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bits. > > The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector spill or unspill is not aligned. > > The change is more of a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. > Unaligned offsets can only be generated with incoming stack arguments. Even then, alignment padding is only added if vector registers larger than 64 bits are used. > > So the costs are effectively zero, especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)). > > There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. I haven't implemented it as I thought the risk is negligible. > > Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. > > ##### Testing with fastdebug builds on AARCH64 and PPC64: > > hotspot_vector_1 > hotspot_vector_2 > jdk_vector > jdk_vector_sanity > > ##### The change passed our CI testing: > Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets.
It is covered by the following tests: > > compiler/vectorapi/VectorRearrangeTest.java > jdk/incubator/vector/Byte128VectorLoadStoreTests.java > jdk/incubator/vector/Double256VectorLoadStoreTests.java > jdk/incubator/vector/Float128VectorTests.java > jdk/incubator/vector/Long256VectorLoadStoreTests.java > jdk/incubator/vector/Short128VectorLoadStoreTests.java > jdk/incubator/vector/Vector64ConversionTests.java Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Merge branch 'master' - Exclude IR check on riscv with rvv - Enhance comment - Fix OptoAssembly for Power 8 - PPC: OptoAssembly for vector spilling - Assert aligned sp offsets in vector spilling - Delete TMP and !UseNewCode - Align Matcher::_new_SP for better vector spilling - TMP: trace unaligned vector spilling - Add test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27969/files - new: https://git.openjdk.org/jdk/pull/27969/files/73512366..40077745 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27969&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27969&range=02-03 Stats: 270394 lines in 2359 files changed: 173038 ins; 58553 del; 38803 mod Patch: https://git.openjdk.org/jdk/pull/27969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27969/head:pull/27969 PR: https://git.openjdk.org/jdk/pull/27969 From qamai at openjdk.org Thu Nov 20 10:23:11 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 20 Nov 2025 10:23:11 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v2] In-Reply-To: References: Message-ID: > Hi, > > This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. 
The fix I propose is to bail-out when that happens. > > To be more specific, for this issue, we have the graph that looks like: > > ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen > > with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: > > ConI -> ConvI2L -> VectorMaskGen > > After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. > > Please take a look and leave your thoughts, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: assert post_loop_opts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28410/files - new: https://git.openjdk.org/jdk/pull/28410/files/c88f5b47..ecaead7f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28410&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28410&range=00-01 Stats: 7 lines in 1 file changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28410.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28410/head:pull/28410 PR: https://git.openjdk.org/jdk/pull/28410 From qamai at openjdk.org Thu Nov 20 10:23:14 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 20 Nov 2025 10:23:14 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v2] In-Reply-To: References: Message-ID: <4MItF4KodwK0fPsG1hcNYtkOA3DUbaUZ3HixYQYs9iI=.2a3835a8-cec6-4d83-9f3e-2e049dc24d9c@github.com> On Thu, 20 Nov 2025 09:46:31 GMT, Christian Hagedorn wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> assert post_loop_opts > > 
src/hotspot/share/opto/vectornode.cpp line 1152: > >> 1150: // Dead node, should go away >> 1151: return nullptr; >> 1152: } > > Do I understand correctly that this widening/removal of the `CastLL` node is happening on an actual dead path that is going to be removed anyway? > > It sounds like this problem is specific to post loop opts IGVN phases where we are allowed to widen `CastII/LL` nodes. Could we assert that this bailout only happens after post loop opts? > > Apart from that, I think your fix is reasonable. Were you able to also extract a reproducer? Done, running `compiler/arraycopy/TestArrayCopyDisjoint.java` with `-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMacroExpansion -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers -XX:VerifyConstraintCasts=1 -XX:+StressLoopPeeling` encounters this issue. Do you think it is necessary to add a separate case for that test, then? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28410#discussion_r2545301464 From chagedorn at openjdk.org Thu Nov 20 10:29:21 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 20 Nov 2025 10:29:21 GMT Subject: RFR: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 12:19:02 GMT, Christian Hagedorn wrote: > ### Strong Connection between Template Assertion Predicate and Counted Loop > In [JDK-8350579](https://bugs.openjdk.org/browse/JDK-8350579), we fixed the issue that a Template Assertion Predicate for a folded loop A could end up at another loop B. We then created an Initialized Assertion Predicate at loop B from the template of loop A and used the values from the already folded, completely unrelated loop A . 
As a result, we crashed with a halt because loop B violated the predicate with the wrong values. As a fix, we established a strong connection between Template Assertion Predicates and their associated loop node by adding a direct link from `OpaqueTemplateAssertionPredicate` -> `CountedLoop`. > > #### Maintaining this Property > In `PhaseIdealLoop::eliminate_useless_predicates()`, we walk through all counted loops and only keep those `OpaqueTemplateAssertionPredicate` nodes that can be found from the loop heads and are actually meant for this loop (using the strong connection): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1245-L1249 > > All other opaque nodes are removed. > > ### Additional Verification for Useless `OpaqueTemplateAssertionPredicate` Nodes > As an additional verification for `OpaqueTemplateAssertionPredicate` nodes that are found to be useless in `eliminate_useless_predicates()`, we check that in this case the `CountedLoop` is really dead (otherwise, we should have found the `OpaqueTemplateAssertionPredicate` in our walks through all loops): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1294-L1301 > > ### Violating the Additional Verification with `-XX:+StressDuplicateBackedge` > In `PhaseIdealLoop::duplicate_loop_backedge()`, we convert a loop with a merge point into two loops which should enable us to transform the new inner loop into a counted loop. This only makes sense for a `Loop` that is not yet a counted loop. However, to stress the transformation, we can also run with `-XX:+StressDuplicateBackedge`, which also transforms a counted loop into an inner and an outer loop.
This is a problem when we have Template Assertion Predicates above a counted loop to be stressed: > > image > > After duplicate backedge, the Template Assertion Predicates are now at the outer non-counted `Loop`: > > Do we need to clone the template predicates for correctness as well? Maybe pre/main/post loops are created next then unrolling happens? Yes, I think so. We could probably come up with some case where we would also hit correctness issues down the road with more loop opts to be applied similar to the cases we have in `TestAssertionPredicates.java`. > Otherwise another way to fix this may have been to replace the CountedLoop with a Loop and let the next round of loop opts create a new CountedLoop Which would mean that this is not really an option: When we convert back to a `Loop`, we will remove the Template Assertion Predicates. > Where do the template predicates branch to when they fail? A predicate uncommon trap? Not sure if I understand your question. The Template Assertion Predicates themselves are never executed and just serve as templates to create Initialized Assertion Predicates from which will result in a halt if they fail at runtime. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28389#issuecomment-3557123650 From adinn at openjdk.org Thu Nov 20 10:39:53 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 20 Nov 2025 10:39:53 GMT Subject: RFR: 8372154: AArch64: Match rule failure with some CompareAndSwap operand shapes [v2] In-Reply-To: <6UEFErXO09V1ViH0p7jDLE2tVhnCWRTuOSBrRZpXt8c=.0aeb8c97-4649-451d-87c4-46850a791102@github.com> References: <6UEFErXO09V1ViH0p7jDLE2tVhnCWRTuOSBrRZpXt8c=.0aeb8c97-4649-451d-87c4-46850a791102@github.com> Message-ID: On Thu, 20 Nov 2025 07:55:51 GMT, Aleksey Shipilev wrote: >> See bug for more details. >> >> Following up on [JDK-8371959](https://bugs.openjdk.org/browse/JDK-8371959) failures, I managed to reproduce the "bad AD" file assert. 
It is heavily intermittent, and needs hours of runs before we hit the lucky seed, plus [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557) to have broader testing scope. >> >> It looks like `CastII` node accepts the wider operand type (`iRegI`), which fails to match against narrower type in CAS match rules (`iRegINoSp`). It makes sense to use `iRegINoSp` for destination regs, so that we do not start writing to these special registers. But for operand registers, it makes little sense, IMO. I note that cas.m4-generated CAE/WCAS stubs actually already have the wider `iRegI` for operand types. >> >> So it looks to me the manual CAS match rules should also use `iRegI`. It would be even better to auto-generate these match rules from M4 stencils, and I tried that, but ultimately decided it obscures the actual bug fix. [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) is dedicated to moving the match rules, hopefully without the semantic change. >> >> In this change, I dropped `*NoSp` from CAS operand match rules. It fixes the `iRegI` mismatch, and prepares us for harmonizing these rules with the rest of CAS/CAE generated ones. >> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, local `bad AD` assert reproducer no longer fails >> - [x] Linux AArch64 server fastdebug, `all` >> - [ ] Linux AArch64 server fastdebug, jcstress run > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8372154-aarch64-cas-operand-match > - Fix Marked as reviewed by adinn (Reviewer). Ah, got it. I was confused by the explanation in the description. It's the instruction encodings that are using the wrong input type restrictions (iRegXNoSp). The match rules specify the wider register classes (iRegX). Well spotted! 
------------- PR Review: https://git.openjdk.org/jdk/pull/28398#pullrequestreview-3487004950 PR Comment: https://git.openjdk.org/jdk/pull/28398#issuecomment-3557185156 From qamai at openjdk.org Thu Nov 20 10:39:55 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 20 Nov 2025 10:39:55 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v4] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 10:14:07 GMT, Quan Anh Mai wrote: >> Hi, >> >> Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. >> >> However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). >> >> In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. 
>> >> This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. There are a lot of benefits of this: >> >> - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. >> - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. >> >> This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these p... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into typejoin > - Move dual to ASSERT only > - Keep old version for verification > - whitespace > - Reimplement Type::join May someone take a look at this, please. Further note: A fatal drawback of the current design is that it is impossible to have any invariant for the internal structure of the `Type` objects. This explodes the space of possibilities and makes it impossible to reason about the soundness of multiple operations. An extreme example is Valhalla, where each `TypeAryPtr` carries 64 variants with different flatness, null restriction, and atomicity. 
In reality, only a small portion of those variants is really meaningful (`flat` arrays cannot be `not_flat` at the same time, and arrays that are `not_flat` or `not_null_free` must be `atomic`, etc.), but these invariants are impossible to maintain in the presence of `dual` types. A small example is the computation of `TypeAryKlassPtr::xdual`: https://github.com/openjdk/valhalla/blob/0e99563e6b3c76349271f58b93c2d857005fca5d/src/hotspot/share/opto/type.cpp#L7144 https://github.com/openjdk/valhalla/blob/0e99563e6b3c76349271f58b93c2d857005fca5d/src/hotspot/share/opto/type.hpp#L2077 You may ask yourself why `not_flat` and `not_null_free` are inverted but `flat` and `null_free` are not. I'm not sure, either, and looking at the implementation of `TypeAryKlassPtr::xmeet` and `TypeAryPtr::xmeet` has torn my brain apart. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3557187070 From qamai at openjdk.org Thu Nov 20 10:56:03 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 20 Nov 2025 10:56:03 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v3] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 13:16:05 GMT, Marc Chevalier wrote: >> This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, although I couldn't prove it. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. >> >> The proposed solution seems somewhat controversial. See https://github.com/openjdk/valhalla/pull/1717 for some previous discussion. Before polishing the PR, I'd like to reach an agreement on the way to go. >> >> # Analysis >> ## Observationally >> ### IGVN >> During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs.
Their types are: >> >> in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) >> in(2): null >> >> We compute the join (HS' meet): >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 >> >> t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> But the current `_type` (of the `PhiNode` as a `TypeNode`) is >> >> _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) >> >> We filter `t` by `_type` >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 >> and we get >> >> ft=java/lang/Object * >> >> which is what we return. After the end of `Value`, the returned type becomes the new `PhiNode`'s `_type`. >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 >> and >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 >> >> >> ### Verification >> On verification, `in(1)` and `in(2)` have the same values, and so does `t`. But this time >> >> _type=java/lang/Object * >> >> and so after filtering `t` by the (new) `_type` we get >> >> ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> which is returned. Verification gets angry because the new `ft` is not the same as the previous one. >> >> ## But why?! >> ### Details on type computation >> In short, we are doing >> >> t = typeof(in(1)) \/ typeof(in(2)) >> ft = t /\ _type (* IGVN *) >> ft' = t /\ ft (* Verification *) >> >> and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again...
> > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > IgnoreUnrecognizedVMOptions I'm not sure if I'm correct, but I think speculative types themselves may not be consistent. For example, if they are consistent, then you will expect that the profiled types of the return values of a method `a` when calling from method `b` would be a subset of the profiled types of the returned values of `a` in general. However, this may not be the case, as we can ask for the second information first, then another type is introduced, then suddenly a method seems not to return a type `C`, but it does seem to return `C` if calling from `b`. As a result, maybe we can abandon trying to verify the correctness of speculative type computations. Additionally, in the test case, the speculative type being empty is correct, the path is speculatively unreachable, maybe we can use that information to cut off the branches, simplify the CFG for better compilation? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28331#issuecomment-3557287250 From adinn at openjdk.org Thu Nov 20 10:59:38 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 20 Nov 2025 10:59:38 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 05:48:07 GMT, Dean Long wrote: > It's worth noting that Graal implements meet and join separately in its Stamp type system, with no "dual" tricks, and apparently no worries about being "symmetrical" or even mentioning the words "lattice" or "centerline". ... It's also worth noting that the theory on which most of the C2 optimization is based stresses heavily the need for the type hierarchy to be a 'well-formed' lattice and relies on that to ensure that the join or meet for any two types is both correct and as strong as possible. 
The use of dual types to derive the join/meet is not critical and is not computationally optimal but it does enable a uniform computation model which provides the strength guarantee. That strength guarantee depends on the lattice obeying certain well-formedness constraints which are not always met in the C2 type system. I recall the issue was that for some pairings there is not always a unique strongest type for the meet (or join?) of two OopPtr types (although there will always be a weaker Ptr type that is the parent of all such strongest valid types). This means that in some cases C2 misses opportunities to perform some optimizations. Years ago Cliff Click wrote a series of articles describing how the lattice worked which explained how this possibility arose but it disappeared from the interwebs quite a while back. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3557317590 From shade at openjdk.org Thu Nov 20 11:12:06 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Nov 2025 11:12:06 GMT Subject: RFR: 8371768: AArch64: test/hotspot/jtreg/compiler/loopopts/superword/TestReductions.java fails on SVE after JDK-8340093 Message-ID: Looks like the test should be more resilient with UseSVE > 0, which _can_ vectorise. It does not look all that reliable to me to failOn when vectorization actually happens. So I dropped some non-arch-specific rules, and amended AArch64-specific rules for UseSVE. 
Testing: - [x] Linux AArch64 server fastdebug, affected test on machine with UseSVE=1 by default - [x] Linux AArch64 server fastdebug, affected test on machine with UseSVE=0 overridden ------------- Commit messages: - A bit of mop up - UseSVE works Changes: https://git.openjdk.org/jdk/pull/28423/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28423&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371768 Stats: 163 lines in 1 file changed: 16 ins; 89 del; 58 mod Patch: https://git.openjdk.org/jdk/pull/28423.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28423/head:pull/28423 PR: https://git.openjdk.org/jdk/pull/28423 From qamai at openjdk.org Thu Nov 20 11:15:17 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 20 Nov 2025 11:15:17 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 10:57:11 GMT, Andrew Dinn wrote: >> It's worth noting that Graal implements `meet` and `join` separately in its `Stamp` type system, with no "dual" tricks, and apparently no worries about being "symmetrical" or even mentioning the words "lattice" or "centerline". For example: >> [compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/core/common/type/AbstractObjectStamp.java](https://github.com/oracle/graal/blob/ee2e127f76d2b2fe39e74aa2994d785d0591b567/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/core/common/type/AbstractObjectStamp.java) > >> It's worth noting that Graal implements meet and join separately in its Stamp type system, with no "dual" tricks, and apparently no worries about being "symmetrical" or even mentioning the words "lattice" or "centerline". ... > > It's also worth noting that the theory on which most of the C2 optimization is based stresses heavily the need for the type hierarchy to be a 'well-formed' lattice and relies on that to ensure that the join or meet for any two types is both correct and as strong as possible. 
The use of dual types to derive the join/meet is not critical and is not computationally optimal but it does enable a uniform computation model which provides the strength guarantee. > > That strength guarantee depends on the lattice obeying certain well-formedness constraints which are not always met in the C2 type system. I recall the issue was that for some pairings there is not always a unique strongest type for the meet (or join?) of two OopPtr types (although there will always be a weaker Ptr type that is the parent of all such strongest valid types). This means that in some cases C2 misses opportunities to perform some optimizations. Years ago Cliff Click wrote a series of articles describing how the lattice worked which explained how this possibility arose but it disappeared from the interwebs quite a while back. @adinn Oh, I think I kind of understand. For example, the issue I found with the current implementation when joining 2 nullable types. If we try to join `BotPTR: Integer` and `BotPTR: Float`, we will try to meet `TopPTR: Integer` and `TopPTR: Float`. In this case, we have: - `TopPTR: Integer < Null` and `TopPTR: Float < Null` or - `TopPTR: Integer < AnyNull: Integer < NotNull: Integer < NotNull: Number` and `TopPTR: Float < AnyNull: Float < NotNull: Float < NotNull: Number`. As a result, there are 2 LCAs for `TopPTR: Integer` and `TopPTR: Float` which are `Null` and `NotNull: Number`, and they do not subtype each other. 
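The example above can be checked mechanically. The following is only a sketch, assuming nothing beyond the two chains and the two `Null` orderings listed in this message; the class and method names are made up for illustration and are not C2 code. It collects everything above each of the two types, intersects the two sets, and keeps the common upper bounds that have no other common upper bound below them.

```java
import java.util.*;

class TwoMinimalUpperBounds {
    // Edges "lower < upper", copied from the orderings in the message above.
    // This is a small fragment for illustration, not the real C2 type lattice.
    static final Map<String, List<String>> ABOVE = new HashMap<>();

    static void edge(String lower, String upper) {
        ABOVE.computeIfAbsent(lower, k -> new ArrayList<>()).add(upper);
    }

    static {
        edge("TopPTR:Integer", "Null");
        edge("TopPTR:Float", "Null");
        edge("TopPTR:Integer", "AnyNull:Integer");
        edge("AnyNull:Integer", "NotNull:Integer");
        edge("NotNull:Integer", "NotNull:Number");
        edge("TopPTR:Float", "AnyNull:Float");
        edge("AnyNull:Float", "NotNull:Float");
        edge("NotNull:Float", "NotNull:Number");
    }

    // All elements reachable upwards from t, including t itself.
    static Set<String> upperSet(String t) {
        Set<String> seen = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(t));
        while (!work.isEmpty()) {
            String cur = work.pop();
            if (seen.add(cur)) {
                work.addAll(ABOVE.getOrDefault(cur, List.of()));
            }
        }
        return seen;
    }

    // Common upper bounds of a and b with no other common upper bound below them.
    static Set<String> minimalUpperBounds(String a, String b) {
        Set<String> common = upperSet(a);
        common.retainAll(upperSet(b));
        Set<String> minimal = new TreeSet<>();
        for (String c : common) {
            boolean hasSmaller = false;
            for (String d : common) {
                if (!d.equals(c) && upperSet(d).contains(c)) {
                    hasSmaller = true;
                }
            }
            if (!hasSmaller) {
                minimal.add(c);
            }
        }
        return minimal;
    }

    public static void main(String[] args) {
        // Prints [NotNull:Number, Null]: two incomparable candidates, no unique one.
        System.out.println(minimalUpperBounds("TopPTR:Integer", "TopPTR:Float"));
    }
}
```

With only these edges, the result has two elements, which matches the observation that `Null` and `NotNull: Number` do not subtype each other.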
------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3557416742 From mhaessig at openjdk.org Thu Nov 20 11:33:29 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 20 Nov 2025 11:33:29 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 10:57:11 GMT, Andrew Dinn wrote: > Years ago Cliff Click wrote a series of articles describing how the lattice worked which explained how this possibility arose but it disappeared from the interwebs quite a while back. Luckily, it was archived: https://web.archive.org/web/20160806073716/http://www.cliffc.org/blog/2012/02/12/too-much-theory/ ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3557499064 From chagedorn at openjdk.org Thu Nov 20 12:11:45 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 20 Nov 2025 12:11:45 GMT Subject: RFR: 8349835: C2: simplify IGV property printing [v6] In-Reply-To: References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: On Mon, 17 Nov 2025 10:27:05 GMT, Saranya Natarajan wrote: >> The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). >> >> ### Fix >> Implemented the suggested refactoring. >> >> ### Testing >> GitHub Actions, Tier 1-3 > > Saranya Natarajan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: > > - Merge branch 'master' into JDK-8349835 > - testing and review on moving code to cpp > - Merge branch 'master' into JDK-8349835 > - addressing review comments#2 > - fixing test failure > - addressing review comments > - changing int to bool in a struct > - fix to failing test > - initial fix Thanks for the updates!
Apart from Roberto's suggestions, it looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26902#pullrequestreview-3487497803 From chagedorn at openjdk.org Thu Nov 20 12:15:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 20 Nov 2025 12:15:53 GMT Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 08:30:56 GMT, Galder Zamarreño wrote: > Trivial cleanup to move tests out of a test class whose description does not match these tests Looks good to me, too! test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxLongLoopBarrier.java line 2: > 1: /* > 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. Not sure if this should be 2025 instead even though the code was added in 2024. No strong opinion, though. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28385#pullrequestreview-3487517659 PR Review Comment: https://git.openjdk.org/jdk/pull/28385#discussion_r2545798620 From chagedorn at openjdk.org Thu Nov 20 12:56:20 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 20 Nov 2025 12:56:20 GMT Subject: RFR: 8371789: C2: More explicit dump results for TypePtr In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 11:50:46 GMT, Quan Anh Mai wrote: > Hi, > > This patch tries to clear up the dumped information of `TypePtr` and its subclasses. It makes it immediately clear the states of the `Type` object without us having to look into the implementation of `dump2`, for example, to know that the absence of `:NotNull` implies that it is a `BotPTR`. > > Please take a look and kindly review, thanks a lot. I think that is a good improvement and makes the output much clearer (especially to detect whether it's a klass pointer). I also like that we are now more explicit with previously hidden defaults.
A concern could be that it's now too verbose. But personally, I don't think that it matters too much when debugging. And if we find that we are struggling with that, we could still come back and tighten the output again. Once this is in Valhalla, we could do a similar clean-up there (there is already an RFE filed for that: [JDK-8332036](https://bugs.openjdk.org/browse/JDK-8332036)). If you are up for it, feel free to take this over as well :-) ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28292#pullrequestreview-3487725738 From chagedorn at openjdk.org Thu Nov 20 13:03:51 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 20 Nov 2025 13:03:51 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v2] In-Reply-To: <4MItF4KodwK0fPsG1hcNYtkOA3DUbaUZ3HixYQYs9iI=.2a3835a8-cec6-4d83-9f3e-2e049dc24d9c@github.com> References: <4MItF4KodwK0fPsG1hcNYtkOA3DUbaUZ3HixYQYs9iI=.2a3835a8-cec6-4d83-9f3e-2e049dc24d9c@github.com> Message-ID: On Thu, 20 Nov 2025 10:18:40 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/vectornode.cpp line 1152: >> >>> 1150: // Dead node, should go away >>> 1151: return nullptr; >>> 1152: } >> >> Do I understand correctly that this widening/removal of the `CastLL` node is happening on an actual dead path that is going to be removed anyway? >> >> It sounds like this problem is specific to post loop opts IGVN phases where we are allowed to widen `CastII/LL` nodes. Could we assert that this bailout only happens after post loop opts? >> >> Apart from that, I think your fix is reasonable. Were you able to also extract a reproducer?
> > Done, running `compiler/arraycopy/TestArrayCopyDisjoint.java` with `-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMacroExpansion -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers -XX:VerifyConstraintCasts=1 -XX:+StressLoopPeeling` encounters this issue. Do you think it is necessary to add a separate case for that test, then? Thanks for the update! If it's a short running test/config, then I think it would be good to have this extra config to cover the changes of this patch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28410#discussion_r2545981102 From chagedorn at openjdk.org Thu Nov 20 13:03:56 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 20 Nov 2025 13:03:56 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v2] In-Reply-To: References: Message-ID: <2zTOU6YwGm8T0V6UeSG26-ZCr6PYlJtyb9QrOC99ZKo=.c44428f0-2609-47fc-8388-5dd78c16ed9e@github.com> On Thu, 20 Nov 2025 10:23:11 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. 
As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > assert post_loop_opts src/hotspot/share/opto/vectornode.cpp line 1154: > 1152: // into a constant that is outside the range of the removed cast, we may encounter it here. > 1153: // This should be a dead node then. > 1154: assert(Compile::current()->post_loop_opts_phase(), ""); For good measure, you should re-add the "Unexpected load size" message. Same below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28410#discussion_r2545983208 From duke at openjdk.org Thu Nov 20 14:21:13 2025 From: duke at openjdk.org (Ruben) Date: Thu, 20 Nov 2025 14:21:13 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v5] In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Wed, 19 Nov 2025 19:37:53 GMT, Andrew Haley wrote: > It would be best to concentrate on eliminating trailing DMB from C1 Sure, I will consider the C2 to be outside the scope. Similarly, there are references for `atomic_addal` and `atomic_addalw` in the C2. I initially expected these to be used in codegen for the above test, however it doesn't happen - apparently because the function isn't an intrinsic candidate. https://github.com/openjdk/jdk/blob/f125c76f5b53d90a09f58c22d6def7d843feaa50/src/java.base/share/classes/jdk/internal/misc/Unsafe.java#L2503-L2511 > because there are a few cases that should be handled. 
I haven't yet looked in details into other cases in C1 - presumably, an optimization similar to https://github.com/openjdk/jdk/pull/26748 should be possible for stores; this would need further investigation. However, is there anything else that should be handled within this pull request? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26000#discussion_r2546241545 From snatarajan at openjdk.org Thu Nov 20 14:39:36 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Thu, 20 Nov 2025 14:39:36 GMT Subject: RFR: 8349835: C2: Simplify IGV property printing [v7] In-Reply-To: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: <2ypfeBvd-4QqRofRuG4_ticaKt_YclSHDV6nTWRPA7Y=.93db9fae-3263-4a7a-8bfa-2405cda18b2e@github.com> > The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). > > ### Fix > Implemented the suggested refactoring. 
> > ### Testing > GitHub Actions, Tier 1-3 Saranya Natarajan has updated the pull request incrementally with two additional commits since the last revision: - merge conflict - addressing review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26902/files - new: https://git.openjdk.org/jdk/pull/26902/files/38cfdefb..85f3495c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=05-06 Stats: 14 lines in 2 files changed: 3 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/26902.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26902/head:pull/26902 PR: https://git.openjdk.org/jdk/pull/26902 From snatarajan at openjdk.org Thu Nov 20 14:42:30 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Thu, 20 Nov 2025 14:42:30 GMT Subject: RFR: 8349835: C2: Simplify IGV property printing [v6] In-Reply-To: References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: On Wed, 19 Nov 2025 13:26:06 GMT, Roberto Castañeda Lozano wrote: >> Saranya Natarajan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: >> >> - Merge branch 'master' into JDK-8349835 >> - testing and review on moving code to cpp >> - Merge branch 'master' into JDK-8349835 >> - addressing review comments#2 >> - fixing test failure >> - addressing review comments >> - changing int to bool in a struct >> - fix to failing test >> - initial fix > > Thanks for cleaning up this code and testing it thoroughly, Saranya. The changes look good to me overall, I just have a few minor suggestions. > There may be more properties that could be printed using the new abstraction, but I am OK with addressing those separately. Thank you for the review @robcasloz.
I have now addressed these and filed [JDK-8372273](https://bugs.openjdk.org/browse/JDK-8372273) to address the rest of the node properties. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26902#issuecomment-3558401901 From shade at openjdk.org Thu Nov 20 14:52:03 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Nov 2025 14:52:03 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v3] In-Reply-To: References: Message-ID: On Wed, 24 Sep 2025 13:08:14 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Drop atomic counters > - Initial version Shoo bots, still looking for reviewers. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3558454577 From aph at openjdk.org Thu Nov 20 15:29:25 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 20 Nov 2025 15:29:25 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v5] In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Thu, 20 Nov 2025 14:17:52 GMT, Ruben wrote: > > It would be best to concentrate on eliminating trailing DMB from C1 > > Sure, I will consider the C2 to be outside the scope. > > Similarly, there are references for `atomic_addal` and `atomic_addalw` in the C2. I initially expected these to be used in codegen for the above test, however it doesn't happen - apparently because the function isn't an intrinsic candidate. > > https://github.com/openjdk/jdk/blob/f125c76f5b53d90a09f58c22d6def7d843feaa50/src/java.base/share/classes/jdk/internal/misc/Unsafe.java#L2503-L2511 Mmm, perhaps. > > because there are a few cases that should be handled. > > I haven't yet looked in details into other cases in C1 - presumably, an optimization similar to #26748 should be possible for stores; this would need further investigation. > > However, is there anything else that should be handled within this pull request? I don't think so. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26000#discussion_r2546525113 From dlunden at openjdk.org Thu Nov 20 15:34:57 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 20 Nov 2025 15:34:57 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v13] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 18:12:22 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union-find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion.
>> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 12% reduction in code size footprint. >> >> **Micro:** [image not preserved in archive] >> >> **Baseline:** [image not preserved in archive] >> >> **With opt:** [image not preserved in archive] >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Thanks @jatin-bhateja, looks good. Testing also looks good. Please wait for approval from @iwanowww before integrating. I would also like to run sanity testing one last time just before you integrate. ------------- Marked as reviewed by dlunden (Committer). PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3488499766 From shade at openjdk.org Thu Nov 20 15:34:57 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Nov 2025 15:34:57 GMT Subject: RFR: 8372154: AArch64: Match rule failure with some CompareAndSwap operand shapes [v2] In-Reply-To: <6UEFErXO09V1ViH0p7jDLE2tVhnCWRTuOSBrRZpXt8c=.0aeb8c97-4649-451d-87c4-46850a791102@github.com> References: <6UEFErXO09V1ViH0p7jDLE2tVhnCWRTuOSBrRZpXt8c=.0aeb8c97-4649-451d-87c4-46850a791102@github.com> Message-ID: <7Ez-dTPbYFvVZ2ChiqBH4x9Sdy0cBpQFXTaxAIAoxso=.e6c0352c-9c57-46f3-869c-fd0d953015f0@github.com> On Thu, 20 Nov 2025 07:55:51 GMT, Aleksey Shipilev wrote: >> See bug for more details. >> >> Following up on [JDK-8371959](https://bugs.openjdk.org/browse/JDK-8371959) failures, I managed to reproduce the "bad AD" file assert. It is heavily intermittent, and needs hours of runs before we hit the lucky seed, plus [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557) to have broader testing scope.
>> >> It looks like `CastII` node accepts the wider operand type (`iRegI`), which fails to match against narrower type in CAS match rules (`iRegINoSp`). It makes sense to use `iRegINoSp` for destination regs, so that we do not start writing to these special registers. But for operand registers, it makes little sense, IMO. I note that cas.m4-generated CAE/WCAS stubs actually already have the wider `iRegI` for operand types. >> >> So it looks to me the manual CAS match rules should also use `iRegI`. It would be even better to auto-generate these match rules from M4 stencils, and I tried that, but ultimately decided it obscures the actual bug fix. [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) is dedicated to moving the match rules, hopefully without the semantic change. >> >> In this change, I dropped `*NoSp` from CAS operand match rules. It fixes the `iRegI` mismatch, and prepares us for harmonizing these rules with the rest of CAS/CAE generated ones. >> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, local `bad AD` assert reproducer no longer fails >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, jcstress run > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8372154-aarch64-cas-operand-match > - Fix Thanks for reviews! 8-hour jcstress run turns out clean on Graviton 3 machine. If there are no other comments, I'll integrate soon. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28398#issuecomment-3558678917 From krk at openjdk.org Thu Nov 20 15:52:24 2025 From: krk at openjdk.org (Kerem Kat) Date: Thu, 20 Nov 2025 15:52:24 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist Message-ID: Do not try to replace `fallthrough_memproj` when it is null, fixes crash. Test case is simplified from the ticket. Verified that the case crashes without the fix. ------------- Commit messages: - copyright format fix? - 8370502: C2: segfault while adding node to IGVN worklist Changes: https://git.openjdk.org/jdk/pull/28432/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370502 Stats: 68 lines in 2 files changed: 56 ins; 5 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/28432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28432/head:pull/28432 PR: https://git.openjdk.org/jdk/pull/28432 From mhaessig at openjdk.org Thu Nov 20 16:33:41 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 20 Nov 2025 16:33:41 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist In-Reply-To: References: Message-ID: <4X6V2ONReceYNZ7I_zUlK7VxZflXz0vJzhuUWnZEGoM=.3a5a263b-4c0a-4c20-8f65-52e7f09bb03f@github.com> On Thu, 20 Nov 2025 14:18:39 GMT, Kerem Kat wrote: > Do not try to replace `fallthrough_memproj` when it is null, fixes crash. > > Test case is simplified from the ticket. Verified that the case crashes without the fix. Thank you for working on this, @krk. And nice job reducing the test further! I have a few questions and style comments below. src/hotspot/share/opto/macro.cpp line 2314: > 2312: > 2313: Node* ctrl = unlock->in(TypeFunc::Control); > 2314: Node* mem = unlock->in(TypeFunc::Memory); Do I understand correctly, that `mem` is never `nullptr` because `UnlockNode` is a subclass of a `SafepointNode` which always has a memory input?
src/hotspot/share/opto/macro.cpp line 2323: > 2321: > 2322: // Make the merge point > 2323: Node *region = new RegionNode(3); Suggestion: Node* region = new RegionNode(3); Nit: For better or for worse, this is how we denote pointers in hotspot. src/hotspot/share/opto/macro.cpp line 2352: > 2350: _igvn.replace_node(_callprojs.fallthrough_proj, region); > 2351: > 2352: if (_callprojs.fallthrough_memproj != nullptr) { Why do we not have to hook up the memory input to the fall through projection if it does not exist in the first place? src/hotspot/share/opto/macro.cpp line 2355: > 2353: // create a Phi for the memory state > 2354: Node *mem_phi = new PhiNode( region, Type::MEMORY, TypeRawPtr::BOTTOM); > 2355: Node *memproj = transform_later(new ProjNode(call, TypeFunc::Memory)); Suggestion: Node* mem_phi = new PhiNode( region, Type::MEMORY, TypeRawPtr::BOTTOM); Node* memproj = transform_later(new ProjNode(call, TypeFunc::Memory)); test/hotspot/jtreg/compiler/c2/Test8370502.java line 1: > 1: /* The indentation in Java files should be 4 spaces. ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28432#pullrequestreview-3488659451 PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2546754846 PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2546677211 PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2546766726 PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2546678544 PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2546690640 From krk at openjdk.org Thu Nov 20 16:40:49 2025 From: krk at openjdk.org (Kerem Kat) Date: Thu, 20 Nov 2025 16:40:49 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v2] In-Reply-To: References: Message-ID: > Do not try to replace `fallthrough_memproj` when it is null, fixes crash. > > Test case is simplified from the ticket. Verified that the case crashes without the fix. 
Kerem Kat has updated the pull request incrementally with three additional commits since the last revision: - fix test spacing - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Manuel Hässig - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Manuel Hässig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28432/files - new: https://git.openjdk.org/jdk/pull/28432/files/80111b97..e8699d79 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=00-01 Stats: 20 lines in 2 files changed: 1 ins; 1 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/28432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28432/head:pull/28432 PR: https://git.openjdk.org/jdk/pull/28432 From krk at openjdk.org Thu Nov 20 16:40:51 2025 From: krk at openjdk.org (Kerem Kat) Date: Thu, 20 Nov 2025 16:40:51 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v2] In-Reply-To: <4X6V2ONReceYNZ7I_zUlK7VxZflXz0vJzhuUWnZEGoM=.3a5a263b-4c0a-4c20-8f65-52e7f09bb03f@github.com> References: <4X6V2ONReceYNZ7I_zUlK7VxZflXz0vJzhuUWnZEGoM=.3a5a263b-4c0a-4c20-8f65-52e7f09bb03f@github.com> Message-ID: On Thu, 20 Nov 2025 16:09:23 GMT, Manuel Hässig wrote: >> Kerem Kat has updated the pull request incrementally with three additional commits since the last revision: >> >> - fix test spacing >> - Update src/hotspot/share/opto/macro.cpp >> >> Co-authored-by: Manuel Hässig >> - Update src/hotspot/share/opto/macro.cpp >> >> Co-authored-by: Manuel Hässig > > test/hotspot/jtreg/compiler/c2/Test8370502.java line 1: > >> 1: /* > > The indentation in Java files should be 4 spaces. I followed another test's format style. Now I can see that there is a mixture of 2 and 4 space `Test*.java` files. Fixing.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2546785517 From duke at openjdk.org Thu Nov 20 16:44:09 2025 From: duke at openjdk.org (Zihao Lin) Date: Thu, 20 Nov 2025 16:44:09 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v5] In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 15:40:50 GMT, Hannes Greule wrote: >> Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestion from @eme64 >> >> Co-authored-by: Emanuel Peter > > src/hotspot/share/opto/mulnode.cpp line 641: > >> 639: // Both are constant, directly computed the result >> 640: if (longType1->is_con() && longType2->is_con()) { >> 641: jlong highResult = multiply_high_unsigned(longType1->get_con(), longType2->get_con()); > > We are going from an unsigned value to a signed here, I think this is implementation-defined? Maybe it's better to use julong and `TypeLong::make_or_top(TypeIntPrototype{{min_jlong, max_jlong}, {highResult, highResult}, {0, 0}})`? > > (It might also make sense to have a helper function like `TypeLong::make_unsigned` for that, but I'll let others comment on whether that should be done separately) I think `TypeLong::make` already does the work you mentioned; do we need another function for it? const TypeLong* TypeLong::make(jlong con) { julong ucon = con; return (new TypeLong(TypeIntPrototype{{con, con}, {ucon, ucon}, {~ucon, ucon}}, WidenMin, false))->hashcons()->is_long(); } > test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1533: > >> 1531: public static final String MUL_HI_L = PREFIX + "MUL_HI_L" + POSTFIX; >> 1532: static { >> 1533: superWordNodes(MUL_HI_L, "MulHiL"); > > This looks wrong, and I think it might make more sense to move these definitions closer to MUL_L. You are right, I will use beforeMatchingNameRegex instead of superWordNodes.
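For readers following the signed/unsigned question about the constant-folded high word, here is a small Java-side sketch. It uses the standard `Math.unsignedMultiplyHigh` (Java 18+) and `Math.multiplyHigh`; the class name and the helper `unsignedHigh` are made up for illustration. It only shows that the unsigned high word is carried in a signed `long` with the same bit pattern, and it says nothing about the C++ conversion rules discussed for `mulnode.cpp`.

```java
class UnsignedHighWord {
    // High 64 bits of the unsigned 128-bit product, returned in a signed long carrier.
    static long unsignedHigh(long a, long b) {
        return Math.unsignedMultiplyHigh(a, b); // available since Java 18
    }

    public static void main(String[] args) {
        long a = -1L; // reads as 2^64 - 1 when interpreted as unsigned
        long hi = unsignedHigh(a, a);
        System.out.println(Long.toHexString(hi));      // fffffffffffffffe
        System.out.println(hi);                        // -2
        System.out.println(Long.toUnsignedString(hi)); // 18446744073709551614
        System.out.println(Math.multiplyHigh(a, a));   // 0 (signed product is 1)
    }
}
```

The same 64 bits read as -2 when signed and as 2^64 - 2 when unsigned, which is why a constant type built from this value needs its signed and unsigned bounds to describe the same bit pattern.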
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28097#discussion_r2546804962 PR Review Comment: https://git.openjdk.org/jdk/pull/28097#discussion_r2546811683 From shade at openjdk.org Thu Nov 20 16:56:37 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Nov 2025 16:56:37 GMT Subject: RFR: 8372266: Relax store matchers in compiler/escapeAnalysis/TestRematerializeObjects.java test Message-ID: As you can see in the report, current matchers rely heavily on mainline C2 implementation to match specific stores. This fails when we try to backport MergeStores fixes to 25u. It would be better to relax the matchers a bit to cater for 25u backports, and also making test more robust for future MergeStores changes, if any. Additional testing: - [x] Linux x86_64 server fastdebug, mainline, affected test, 100x - [x] Linux AArch64 server fastdebug, mainline, affected test, 100x - [x] Linux x86_64 server fastdebug, jdk25u, affected test, 100x ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/28437/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28437&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372266 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28437.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28437/head:pull/28437 PR: https://git.openjdk.org/jdk/pull/28437 From roland at openjdk.org Thu Nov 20 16:59:55 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 20 Nov 2025 16:59:55 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 10:53:03 GMT, Quan Anh Mai wrote: > Additionally, in the test case, the speculative type being empty is correct, the path is speculatively unreachable, maybe we can use that information to cut off the branches, simplify the CFG for better compilation? 
Attached is another test case that reproduces the same issue. [TestSpeculativeTypes.java](https://github.com/user-attachments/files/23659130/TestSpeculativeTypes.java) I run that one with: $ for i in `seq 100`; do java -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation -XX:CompileOnly=TestSpeculativeTypes::test2 -XX:CompileOnly=TestSpeculativeTypes::inlined3 -XX:CompileCommand=quiet -XX:TypeProfileLevel=200 -XX:+AlwaysIncrementalInline -XX:VerifyIterativeGVN=10 -XX:CompileCommand=dontinline,TestSpeculativeTypes::notInlined1 -XX:+StressIncrementalInlining TestSpeculativeTypes || break; done It usually fails after a few runs. That one has conflicting profile data but no dead path. What you're suggesting has some risk and unclear benefits, so I think it would need to be investigated separately. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28331#issuecomment-3559087312 From jrose at openjdk.org Thu Nov 20 17:14:39 2025 From: jrose at openjdk.org (John R Rose) Date: Thu, 20 Nov 2025 17:14:39 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v3] In-Reply-To: References: Message-ID: On Wed, 24 Sep 2025 13:08:14 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains four commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Drop atomic counters > - Initial version src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4845: > 4843: push(temp_reg); > 4844: movptr(temp_reg, recv); > 4845: recv_reg = temp_reg; I can mentally do the appropriate `assert_different_registers` here, but an explicit one to confirm would be better. (Same comment for the next arm of the if/else.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2546917073 From krk at openjdk.org Thu Nov 20 18:28:57 2025 From: krk at openjdk.org (Kerem Kat) Date: Thu, 20 Nov 2025 18:28:57 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v2] In-Reply-To: <4X6V2ONReceYNZ7I_zUlK7VxZflXz0vJzhuUWnZEGoM=.3a5a263b-4c0a-4c20-8f65-52e7f09bb03f@github.com> References: <4X6V2ONReceYNZ7I_zUlK7VxZflXz0vJzhuUWnZEGoM=.3a5a263b-4c0a-4c20-8f65-52e7f09bb03f@github.com> Message-ID: On Thu, 20 Nov 2025 16:26:13 GMT, Manuel Hässig wrote: >> Kerem Kat has updated the pull request incrementally with three additional commits since the last revision: >> >> - fix test spacing >> - Update src/hotspot/share/opto/macro.cpp >> >> Co-authored-by: Manuel Hässig >> - Update src/hotspot/share/opto/macro.cpp >> >> Co-authored-by: Manuel Hässig > > src/hotspot/share/opto/macro.cpp line 2314: > >> 2312: >> 2313: Node* ctrl = unlock->in(TypeFunc::Control); >> 2314: Node* mem = unlock->in(TypeFunc::Memory); > > Do I understand correctly, that `mem` is never `nullptr` because `UnlockNode` is a subclass of a `SafepointNode` which always has a memory input? `UnlockNode` is created at https://github.com/openjdk/jdk/blob/b3acc4841f6d9c8fd484df68fd2882dab0aa1788/src/hotspot/share/opto/graphKit.cpp#L3542 and will assert under `GraphKit::memory` if memory is somehow null.
> src/hotspot/share/opto/macro.cpp line 2352: > >> 2350: _igvn.replace_node(_callprojs.fallthrough_proj, region); >> 2351: >> 2352: if (_callprojs.fallthrough_memproj != nullptr) { > > Why do we not have to hook up the memory input to the fall through projection if it does not exist in the first place? Could you clarify the question? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2547163948 PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2547168105 From shade at openjdk.org Thu Nov 20 19:42:03 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Nov 2025 19:42:03 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 05:06:58 GMT, Jiangli Zhou wrote: >> Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. >> >> Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Fix Whitespace error. test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 59: > 57: private static final int TAG_SIZE_IN_BYTES = 16; > 58: > 59: private Cipher getCipher(final byte[] key, final byte[] aad, byte[] nonce) Suggestion: private Cipher getCipher(final byte[] key, final byte[] aad, final byte[] nonce) test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 65: > 63: new GCMParameterSpec(8 * TAG_SIZE_IN_BYTES, nonce, 0, nonce.length); > 64: Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding"); > 65: cipher.init(Cipher.ENCRYPT_MODE, keySpec, params); Er. This is used from `gcmDecrypt`? How does it work without `Cipher.DECRYPT_MODE`? 
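
As a side illustration of the mode question raised in the review comment above, here is a minimal, self-contained Java sketch. It is not the test from the PR: the class name, helper names, all-zero key, and 12-byte nonce are assumptions made only for the example. It shows an AES/GCM round trip where decryption uses `Cipher.DECRYPT_MODE`, so the authentication tag is checked, and contrasts it with running the ciphertext through `Cipher.ENCRYPT_MODE` again, which does not verify anything and simply appends a fresh tag.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class GcmModeSketch {
    static final int TAG_SIZE_IN_BYTES = 16;

    // Runs AES/GCM/NoPadding in the given mode on a fresh Cipher instance.
    // Checked exceptions are wrapped so callers stay simple.
    static byte[] gcm(int mode, byte[] key, byte[] nonce, byte[] aad, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"),
                        new GCMParameterSpec(8 * TAG_SIZE_IN_BYTES, nonce));
            cipher.updateAAD(aad);
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] encrypt(byte[] key, byte[] nonce, byte[] aad, byte[] plaintext) {
        return gcm(Cipher.ENCRYPT_MODE, key, nonce, aad, plaintext);
    }

    // DECRYPT_MODE verifies the tag; a mismatch surfaces as IllegalStateException here.
    static byte[] decrypt(byte[] key, byte[] nonce, byte[] aad, byte[] ciphertext) {
        return gcm(Cipher.DECRYPT_MODE, key, nonce, aad, ciphertext);
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];   // all-zero demo key, for illustration only
        byte[] nonce = new byte[12]; // fixed demo nonce; real code must never reuse one
        byte[] aad = "header".getBytes(StandardCharsets.UTF_8);
        byte[] msg = "hello gcm".getBytes(StandardCharsets.UTF_8);

        byte[] ct = encrypt(key, nonce, aad, msg);
        System.out.println("round trip ok: " + Arrays.equals(decrypt(key, nonce, aad, ct), msg));

        // "Decrypting" in ENCRYPT_MODE performs no tag check and grows the output by one tag.
        byte[] notVerified = encrypt(key, nonce, aad, ct);
        System.out.println("length growth: " + (notVerified.length - ct.length));
    }
}
```

The point of the sketch is only that a call in the wrong mode need not throw, which is why a test that never compares the decrypted bytes against the original message can pass without exercising decryption.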
test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 89: > 87: System.arraycopy(ciphertext, 0, nonce, 0, IV_SIZE_IN_BYTES); > 88: Cipher cipher = getCipher(key, aad, nonce); > 89: return cipher.doFinal(ciphertext, IV_SIZE_IN_BYTES, ciphertext.length - IV_SIZE_IN_BYTES); Indenting is still 2-space here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2547113559 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2547438961 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2547112755 From jiangli at openjdk.org Thu Nov 20 20:03:32 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 20 Nov 2025 20:03:32 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v4] In-Reply-To: References: Message-ID: > Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. > > Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! 
Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: Fix indentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28363/files - new: https://git.openjdk.org/jdk/pull/28363/files/528b1b47..e99a441b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28363.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28363/head:pull/28363 PR: https://git.openjdk.org/jdk/pull/28363 From jiangli at openjdk.org Thu Nov 20 20:03:35 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 20 Nov 2025 20:03:35 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 18:08:40 GMT, Aleksey Shipilev wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Whitespace error. > > test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 89: > >> 87: System.arraycopy(ciphertext, 0, nonce, 0, IV_SIZE_IN_BYTES); >> 88: Cipher cipher = getCipher(key, aad, nonce); >> 89: return cipher.doFinal(ciphertext, IV_SIZE_IN_BYTES, ciphertext.length - IV_SIZE_IN_BYTES); > > Indenting is still 2-space here. Indeed. Fixed, thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2547495799 From jiangli at openjdk.org Thu Nov 20 20:10:00 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 20 Nov 2025 20:10:00 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v5] In-Reply-To: References: Message-ID: <05tqW1f2I7vfUwOz5JmZLKK7ztECI4GCbzhzHH3ktKo=.106c1ad2-5d95-4fa0-a79e-e6a2751fadb4@github.com> > Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. > > Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: Update test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java Applied, thanks. Co-authored-by: Aleksey Shipilëv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28363/files - new: https://git.openjdk.org/jdk/pull/28363/files/e99a441b..54fbf2b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28363.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28363/head:pull/28363 PR: https://git.openjdk.org/jdk/pull/28363 From snatarajan at openjdk.org Thu Nov 20 20:57:42 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Thu, 20 Nov 2025 20:57:42 GMT Subject: RFR: 8349835: C2: Simplify IGV property printing [v8] In-Reply-To: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID:
<_sItxPCxhSDj4-O3z7bN1PJwL3V1I06FQhCSYa6lDTA=.1d659740-1016-42df-b35c-28733dab8522@github.com> > The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). > > ### Fix > Implemented the suggested refactoring. > > ### Testing > GitHub Actions, Tier 1-3 Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: fix for merge mistake ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26902/files - new: https://git.openjdk.org/jdk/pull/26902/files/85f3495c..eaec2213 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=06-07 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26902.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26902/head:pull/26902 PR: https://git.openjdk.org/jdk/pull/26902 From kxu at openjdk.org Thu Nov 20 21:26:52 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 20 Nov 2025 21:26:52 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v22] In-Reply-To: References: Message-ID: <2Jynx8KCfMqnUciI9ya5Mr6DIg83TouU8vFXw5d8Jhc=.fd3b467b-a75f-4527-8ea9-65a9e03686e1@github.com> > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759).
Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 42 commits: - fix bad merge with ctrl_is_member() - Merge remote-tracking branch 'origin/master' into counted-loop-refactor # Conflicts: # src/hotspot/share/opto/loopnode.cpp - Merge branch 'master' into counted-loop-refactor - add missed minor changes - fix bad merge - Merge remote-tracking branch 'origin/master' into counted-loop-refactor # Conflicts: # src/hotspot/share/opto/loopnode.cpp - follow-up review 3383037106 - Update src/hotspot/share/opto/loopnode.hpp Co-authored-by: Christian Hagedorn - mark LoopExitTest::is_valid_with_bt() const - fix iv increment basic type and truncated increment check - ... and 32 more: https://git.openjdk.org/jdk/compare/b9ee9541...584a6968 ------------- Changes: https://git.openjdk.org/jdk/pull/24458/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=21 Stats: 1203 lines in 3 files changed: 610 ins; 288 del; 305 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From vlivanov at openjdk.org Thu Nov 20 21:32:45 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 20 Nov 2025 21:32:45 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v13] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 18:12:22 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. 
>> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 12% reduction in code size footprint. >> >> **Micro:-** [image not preserved] >> >> **Baseline :-** [image not preserved] >> >> **With opt:-** [image not preserved] >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback.
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution src/hotspot/cpu/x86/x86.ad line 2649: > 2647: } > 2648: > 2649: // First operand of MachNode corresponding to Intel APX NDD selection Very informative comments! Thank you. I suggest shaping it as follows:

    if ((mdef->flags() & Node::PD::Flag_ndd_demotable) != 0) {
      switch (oper_index) {
      // First operand of MachNode corresponding to Intel APX NDD selection
      // pattern can share its assigned register with definition operand if
      // their live ranges do not overlap, in such a scenario we can demote
      // it to legacy map0/map1 instruction by replacing its 4-byte extended
      // EVEX prefix with shorter REX/REX2 encoding. Demotion candidates
      // are decorated with a special flag by instruction selector.
      case 1: return true;

      // For commutative operation allocation of definition
      // operand can also be biased towards second operand.
      case 2: return (mdef->flags() & Node::PD::Flag_ndd_commutative) != 0;

      // No register biasing supported for other operands.
      default: return false;
      }
    }

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2547714121 From vlivanov at openjdk.org Thu Nov 20 21:32:46 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 20 Nov 2025 21:32:46 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v13] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 21:24:35 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution > > src/hotspot/cpu/x86/x86.ad line 2649: > >> 2647: } >> 2648: >> 2649: // First operand of MachNode corresponding to Intel APX NDD selection > > Very informative comments! Thank you.
>
> I suggest shaping it as follows:
>
>     if ((mdef->flags() & Node::PD::Flag_ndd_demotable) != 0) {
>       switch (oper_index) {
>       // First operand of MachNode corresponding to Intel APX NDD selection
>       // pattern can share its assigned register with definition operand if
>       // their live ranges do not overlap, in such a scenario we can demote
>       // it to legacy map0/map1 instruction by replacing its 4-byte extended
>       // EVEX prefix with shorter REX/REX2 encoding. Demotion candidates
>       // are decorated with a special flag by instruction selector.
>       case 1: return true;
>
>       // For commutative operation allocation of definition
>       // operand can also be biased towards second operand.
>       case 2: return (mdef->flags() & Node::PD::Flag_ndd_commutative) != 0;
>
>       // No register biasing supported for other operands.
>       default: return false;
>       }
>     }

Speaking of implicit invariants, `Node::PD::Flag_ndd_commutative` implies `Node::PD::Flag_ndd_demotable` is also set. Please add an assert to catch a missing `Node::PD::Flag_ndd_demotable` flag. Another constraint to assert: `mdef->operand_num_edges(oper_index) == 1` should be true for the 1st operand when `Node::PD::Flag_ndd_demotable` is set and also for the 2nd operand when `Node::PD::Flag_ndd_commutative` is set. Also, are there any operand ordering constraints in the AD instruction declaration? Is it possible to mess up the order of declarations, so that register biasing candidate operands don't occur as the 1st and 2nd operands?
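
The suggested `switch` and the first invariant from the comment above can be modelled in a few lines of Java. This is only a sketch of the decision logic, not the `x86.ad` code: the flag constants are hypothetical stand-ins for `Node::PD::Flag_ndd_demotable` and `Node::PD::Flag_ndd_commutative`, and the `operand_num_edges` constraint is left out because it depends on node state that the model does not have.

```java
public class NddBiasSketch {
    // Hypothetical bit values; the real flags live in HotSpot's Node::PD.
    static final int FLAG_NDD_DEMOTABLE   = 1 << 0;
    static final int FLAG_NDD_COMMUTATIVE = 1 << 1;

    // Returns true if the definition's register may be biased towards
    // the operand at operIndex (1-based, as in the review snippet).
    static boolean isBiasCandidate(int flags, int operIndex) {
        boolean demotable = (flags & FLAG_NDD_DEMOTABLE) != 0;
        boolean commutative = (flags & FLAG_NDD_COMMUTATIVE) != 0;

        // Invariant requested in the review: commutative implies demotable.
        if (commutative && !demotable) {
            throw new IllegalArgumentException("commutative flag set without demotable flag");
        }
        if (!demotable) {
            return false;
        }
        switch (operIndex) {
            case 1:  return true;        // first source operand
            case 2:  return commutative; // second source, commutative operations only
            default: return false;       // no biasing for other operands
        }
    }

    public static void main(String[] args) {
        int both = FLAG_NDD_DEMOTABLE | FLAG_NDD_COMMUTATIVE;
        System.out.println(isBiasCandidate(FLAG_NDD_DEMOTABLE, 1));
        System.out.println(isBiasCandidate(FLAG_NDD_DEMOTABLE, 2));
        System.out.println(isBiasCandidate(both, 2));
    }
}
```

Checking the invariant before the `demotable` test is a deliberate choice in this model: it makes a mis-flagged node fail loudly instead of silently returning `false`.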
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2547726591 From sviswanathan at openjdk.org Thu Nov 20 21:34:56 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 20 Nov 2025 21:34:56 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v2] In-Reply-To: References: Message-ID: On Mon, 17 Nov 2025 23:35:44 GMT, Volodymyr Paprotski wrote: >> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline >> - `SignatureBench.MLDSA` is 1.2x-2.2x faster >> - Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) >> - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version >> - `SignatureBench.MLDSA` is upto 5% faster, never slower >> >> Note on intrinsic: >> - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. 
>> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 >> >> Tests and benchmarks: >> - Added a fuzz test to ensure Java and intrinsic produces exactly same result >> - Added benchmark to measure the performance of intrinsic itself >> >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" > > Volodymyr Paprotski has updated the pull request incrementally with two additional commits since the last revision: > > - whitespace > - address first comments src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1283: > 1281: // r1 = r1 & quotient; // copy 0 or keep as is, using EqMsk as filter > 1282: for (int i = 0; i < regCnt; i++) { > 1283: // FIXME: replace with void evmovdqul(Address dst, KRegister mask, XMMRegister src, bool merge, int vector_len);? Is the fixme a leftover? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2547729185 From vlivanov at openjdk.org Thu Nov 20 22:30:04 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 20 Nov 2025 22:30:04 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v2] In-Reply-To: References: <4MItF4KodwK0fPsG1hcNYtkOA3DUbaUZ3HixYQYs9iI=.2a3835a8-cec6-4d83-9f3e-2e049dc24d9c@github.com> Message-ID: On Thu, 20 Nov 2025 12:58:55 GMT, Christian Hagedorn wrote: >> Done, running `compiler/arraycopy/TestArrayCopyDisjoint.java` with `-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMacroExpansion -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers -XX:VerifyConstraintCasts=1 -XX:+StressLoopPeeling` encounters this issue. Do you think it is necessary to add a separate case for that test, then? > > Thanks for the update! If it's a short running test/config, then I think it would be good to have this extra config to cover the changes of this patch. Is it truly specific to post-loop opts phase? Isn't it yet another paradoxical IR shape occurring in effectively dead code? In the longer term, it would be good to ensure such effectively dead nodes eventually go away. Or, better, eagerly trigger their elimination. Otherwise, it could cause issues later in compilation process unless the problematic conditions are explicitly handled everywhere (e.g., during matching or code generation for `vmask_gen_imm` on x64 and AArch64). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28410#discussion_r2547888182 From vlivanov at openjdk.org Thu Nov 20 22:31:06 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 20 Nov 2025 22:31:06 GMT Subject: RFR: 8371789: C2: More explicit dump results for TypePtr In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 11:50:46 GMT, Quan Anh Mai wrote: > Hi, > > This patch tries to clear up the dumped information of `TypePtr` and its subclasses. It makes the states of the `Type` object immediately clear without us having to look into the implementation of `dump2`, for example, to know that the absence of `:NotNull` implies that it is a `BotPTR`. > > Please take a look and kindly review, thanks a lot. Nice improvement! ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28292#pullrequestreview-3490245858 From vpaprotski at openjdk.org Thu Nov 20 22:55:07 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Thu, 20 Nov 2025 22:55:07 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v3] In-Reply-To: References: Message-ID: > - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline > - `SignatureBench.MLDSA` is 1.2x-2.2x faster > - Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) > - AVX512 intrinsic improvements are 1.24x-1.5x faster than the current version > - `SignatureBench.MLDSA` is up to 5% faster, never slower > > Note on intrinsic: > - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill.
> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 > > Tests and benchmarks: > - Added a fuzz test to ensure Java and intrinsic produces exactly same result > - Added benchmark to measure the performance of intrinsic itself > > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: next set of comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28136/files - new: https://git.openjdk.org/jdk/pull/28136/files/e9133401..b04f4f0d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28136&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28136&range=01-02 Stats: 424 lines in 2 files changed: 1 ins; 423 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28136.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28136/head:pull/28136 PR: https://git.openjdk.org/jdk/pull/28136 From sviswanathan at openjdk.org Thu Nov 20 23:09:54 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 20 Nov 2025 23:09:54 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 22:55:07 GMT, Volodymyr Paprotski wrote: >> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline >> - `SignatureBench.MLDSA` is 1.2x-2.2x faster >> - 
Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) >> - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version >> - `SignatureBench.MLDSA` is upto 5% faster, never slower >> >> Note on intrinsic: >> - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. >> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 >> >> Tests and benchmarks: >> - Added a fuzz test to ensure Java and intrinsic produces exactly same result >> - Added benchmark to measure the performance of intrinsic itself >> >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > next set of comments Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28136#pullrequestreview-3490441448 From vpaprotski at openjdk.org Thu Nov 20 23:17:36 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Thu, 20 Nov 2025 23:17:36 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v2] In-Reply-To: References: Message-ID: <-NP71XXG0bisxVHds8O-uXhLZqbnVLijJoJDwVq2ZBk=.2478c442-fc34-4ba0-9811-1f910ee3ee36@github.com> On Wed, 19 Nov 2025 22:40:41 GMT, Sergey Kuksenko wrote: > I understand your reasons. The question is whether you'll need the microbenchmark in the future. If no (or probably no), please remove the micro. If needed, please move it from the "org.openjdk.bench.javax.crypto.full" package to "org.openjdk.bench.javax.crypto". It is supposed to have only public API micros in packages "small" and "full" @kuksenko I decided to just remove it. If anyone wants it back, its in my git history (I usually keep my branches after merge..) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3560569194 From vpaprotski at openjdk.org Thu Nov 20 23:17:39 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Thu, 20 Nov 2025 23:17:39 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v2] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 21:31:31 GMT, Sandhya Viswanathan wrote: >> Volodymyr Paprotski has updated the pull request incrementally with two additional commits since the last revision: >> >> - whitespace >> - address first comments > > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1283: > >> 1281: // r1 = r1 & quotient; // copy 0 or keep as is, using EqMsk as filter >> 1282: for (int i = 0; i < regCnt; i++) { >> 1283: // FIXME: replace with void evmovdqul(Address dst, KRegister mask, XMMRegister src, bool merge, int vector_len);? > > Is the fixme a leftover? Yes. Removed. 
(I think I was considering merging this instruction with the storeXmm, but there really isnt a good way to do that) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2548005781 From vlivanov at openjdk.org Thu Nov 20 23:43:30 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 20 Nov 2025 23:43:30 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v2] In-Reply-To: <-NP71XXG0bisxVHds8O-uXhLZqbnVLijJoJDwVq2ZBk=.2478c442-fc34-4ba0-9811-1f910ee3ee36@github.com> References: <-NP71XXG0bisxVHds8O-uXhLZqbnVLijJoJDwVq2ZBk=.2478c442-fc34-4ba0-9811-1f910ee3ee36@github.com> Message-ID: On Thu, 20 Nov 2025 23:13:41 GMT, Volodymyr Paprotski wrote: > If anyone wants it back, its in my git history (I usually keep my branches after merge..) You could put a comment with the link into JBS issue to make it easier to discover later. (Or just attach the source file there.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3560656214 From jiangli at openjdk.org Fri Nov 21 00:14:20 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 21 Nov 2025 00:14:20 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v6] In-Reply-To: References: Message-ID: > Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. > > Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! Jiangli Zhou has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8371864' of ssh://github.com/jianglizhou/jdk into JDK-8371864 - Use Cipher.DECRYPT_MODE for gcmDecrypt. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/28363/files - new: https://git.openjdk.org/jdk/pull/28363/files/54fbf2b1..8617ab4c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=04-05 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28363.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28363/head:pull/28363 PR: https://git.openjdk.org/jdk/pull/28363 From dlong at openjdk.org Fri Nov 21 00:18:00 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 21 Nov 2025 00:18:00 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v2] In-Reply-To: References: Message-ID: <4mKilmLiEmqcjagWuqRsDZM1aOennI8PFK5W316PtFc=.f7f91da4-8f85-497d-bf62-fbc3b5711db2@github.com> On Thu, 20 Nov 2025 16:40:49 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. > > Kerem Kat has updated the pull request incrementally with three additional commits since the last revision: > > - fix test spacing > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel H?ssig > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel H?ssig Why is fallthrough_memproj null, and why is this an issue only for expand_unlock_node but not expand_lock_node or other code that tries to replace fallthrough_memproj? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28432#issuecomment-3560735492 From jiangli at openjdk.org Fri Nov 21 00:18:00 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 21 Nov 2025 00:18:00 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 19:37:11 GMT, Aleksey Shipilev wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Whitespace error. > > test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 65: > >> 63: new GCMParameterSpec(8 * TAG_SIZE_IN_BYTES, nonce, 0, nonce.length); >> 64: Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding"); >> 65: cipher.init(Cipher.ENCRYPT_MODE, keySpec, params); > > Er. This is used from `gcmDecrypt`? How does it work without `Cipher.DECRYPT_MODE`? Good catch. Interestingly the test passed for me on my local machine. Fixed to use Cipher.DECRYPT_MODE when doing gcmDecrypt. Also an interesting new finding, with the decrypted message verification, I see there are 2 failures out of 200 runs with AVX512. I'm filing a new issue on the specifically, so it can be investigated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2548112676 From dlong at openjdk.org Fri Nov 21 01:27:22 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 21 Nov 2025 01:27:22 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 10:57:11 GMT, Andrew Dinn wrote: >> It's worth noting that Graal implements `meet` and `join` separately in its `Stamp` type system, with no "dual" tricks, and apparently no worries about being "symmetrical" or even mentioning the words "lattice" or "centerline". 
For example: >> [compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/core/common/type/AbstractObjectStamp.java](https://github.com/oracle/graal/blob/ee2e127f76d2b2fe39e74aa2994d785d0591b567/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/core/common/type/AbstractObjectStamp.java) > >> It's worth noting that Graal implements meet and join separately in its Stamp type system, with no "dual" tricks, and apparently no worries about being "symmetrical" or even mentioning the words "lattice" or "centerline". ... > > It's also worth noting that the theory on which most of the C2 optimization is based stresses heavily the need for the type hierarchy to be a 'well-formed' lattice and relies on that to ensure that the join or meet for any two types is both correct and as strong as possible. The use of dual types to derive the join/meet is not critical and is not computationally optimal but it does enable a uniform computation model which provides the strength guarantee. > > That strength guarantee depends on the lattice obeying certain well-formedness constraints which are not always met in the C2 type system. I recall the issue was that for some pairings there is not always a unique strongest type for the meet (or join?) of two OopPtr types (although there will always be a weaker Ptr type that is the parent of all such strongest valid types). This means that in some cases C2 misses opportunities to perform some optimizations. Years ago Cliff Click wrote a series of articles describing how the lattice worked which explained how this possibility arose but it disappeared from the interwebs quite a while back. 
@adinn , I found this, part of a 3-part series: https://web.archive.org/web/20170223192730/http://www.cliffc.org/blog/2012/03/24/too-much-theory-part-3/ ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3560918116 From jiangli at openjdk.org Fri Nov 21 01:31:39 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 21 Nov 2025 01:31:39 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v7] In-Reply-To: References: Message-ID: > Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. > > Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: Change to just create a byte array for 'nonce' without generating random data in gcmDecrypt. Suggested by AI. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/28363/files - new: https://git.openjdk.org/jdk/pull/28363/files/8617ab4c..d26d0ee9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28363.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28363/head:pull/28363 PR: https://git.openjdk.org/jdk/pull/28363 From dlong at openjdk.org Fri Nov 21 01:37:02 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 21 Nov 2025 01:37:02 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 11:29:56 GMT, Manuel H?ssig wrote: >>> It's worth noting that Graal implements meet and join separately in its Stamp type system, with no "dual" tricks, and apparently no worries about being "symmetrical" or even mentioning the words "lattice" or "centerline". ... >> >> It's also worth noting that the theory on which most of the C2 optimization is based stresses heavily the need for the type hierarchy to be a 'well-formed' lattice and relies on that to ensure that the join or meet for any two types is both correct and as strong as possible. The use of dual types to derive the join/meet is not critical and is not computationally optimal but it does enable a uniform computation model which provides the strength guarantee. >> >> That strength guarantee depends on the lattice obeying certain well-formedness constraints which are not always met in the C2 type system. I recall the issue was that for some pairings there is not always a unique strongest type for the meet (or join?) of two OopPtr types (although there will always be a weaker Ptr type that is the parent of all such strongest valid types). This means that in some cases C2 misses opportunities to perform some optimizations. 
Years ago Cliff Click wrote a series of articles describing how the lattice worked which explained how this possibility arose but it disappeared from the interwebs quite a while back. > >> Years ago Cliff Click wrote a series of articles describing how the lattice worked which explained how this possibility arose but it disappeared from the interwebs quite a while back. > > Luckily, it was archived: https://web.archive.org/web/20160806073716/http://www.cliffc.org/blog/2012/02/12/too-much-theory/ Thanks @mhaessig , you beat me to it! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3560945264 From dlong at openjdk.org Fri Nov 21 01:42:19 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 21 Nov 2025 01:42:19 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v4] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 10:14:07 GMT, Quan Anh Mai wrote: >> Hi, >> >> Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. >> >> However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). >> >> In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. 
It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. >> >> This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. There are a lot of benefits of this: >> >> - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. >> - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. >> >> This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these p... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: > > - Merge branch 'master' into typejoin > - Move dual to ASSERT only > - Keep old version for verification > - whitespace > - Reimplement Type::join For completeness, part 2: https://web.archive.org/web/20170109192419/http://www.cliffc.org/blog/2012/02/27/too-much-theory-part-2/ ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3560954478 From fyang at openjdk.org Fri Nov 21 03:41:46 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 21 Nov 2025 03:41:46 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v5] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> Message-ID: On Tue, 18 Nov 2025 09:27:44 GMT, Hamlin Li wrote: >> Hi, >> >> This pr add CMoveF/D on riscv, which enable vectorization of statement like: `op_1 bop op_2 ? res_f_d_1 : res_f_d_2 in a loop`. >> >> This pr is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231. >> >> Previously it's https://github.com/openjdk/jdk/pull/25341, but at that time, C2 SLP has some issue with unsigned comparison, which is now fixed, so it's good to continue the work. >> >> # Test >> ## Jtreg >> >> in progress... >> >> ## Performance >> >> Column names meanings: >> * p: with patch >> * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> * m: without patch >> * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> >> #### Average improvement >> >> NOTE: With only this PR, it brings performance benefit in case of `CMoveF+CmpF`, `CMoveD+ComD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below is based on fullly implmenting the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. >> >> For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. 
>>
>> Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v)
>> -- | -- | -- | --
>> 1.022782609 | 2.198717391 | 2.162673913 | 2.199
>>
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   replace assert with log_warning

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1590:

> 1588:   // jump if cmp1 < cmp2 or either is NaN
> 1589:   // not jump (i.e. move src to dst) if cmp1 >= cmp2
> 1590:   float_blt(cmp1, cmp2, no_set);

I compared this with the existing `MacroAssembler::cmov_cmp_fp_ge` [1] and I witnessed some difference in the case of `NaN` handling. In `MacroAssembler::cmov_cmp_fp_ge`, we set the `is_unordered` param to true when calling `float_blt` or `double_blt`, which is not the case here. I assume we need similar handling here as well, right?

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L1338

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1636:

> 1634:   // jump if cmp1 <= cmp2 or either is NaN
> 1635:   // not jump (i.e. move src to dst) if cmp1 > cmp2
> 1636:   float_ble(cmp1, cmp2, no_set);

Same question here.
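
As a side note on why the unordered case matters for the quoted comments, here is a small Java sketch. The class and method names are hypothetical and this does not model the RISC-V `float_blt` helper itself. It only shows that "skip the move if `cmp1 < cmp2`" and "skip the move if `cmp1 < cmp2` or either is NaN" give different results for NaN inputs, because every ordered comparison with NaN is false.

```java
public class CmovNaNSketch {
    // Reference semantics from the quoted comment: move src to dst only if cmp1 >= cmp2.
    static float cmovGe(float cmp1, float cmp2, float dst, float src) {
        return (cmp1 >= cmp2) ? src : dst;
    }

    // Branch form that skips the move only on an ordered "less than".
    // With a NaN input the branch is not taken, so src is moved.
    static float cmovGeOrderedOnly(float cmp1, float cmp2, float dst, float src) {
        if (cmp1 < cmp2) {
            return dst;
        }
        return src;
    }

    // Branch form that also skips the move when either input is NaN.
    static float cmovGeUnorderedAware(float cmp1, float cmp2, float dst, float src) {
        if (cmp1 < cmp2 || Float.isNaN(cmp1) || Float.isNaN(cmp2)) {
            return dst;
        }
        return src;
    }

    public static void main(String[] args) {
        float nan = Float.NaN;
        System.out.println("reference:       " + cmovGe(nan, 1f, 10f, 20f));
        System.out.println("ordered only:    " + cmovGeOrderedOnly(nan, 1f, 10f, 20f));
        System.out.println("unordered aware: " + cmovGeUnorderedAware(nan, 1f, 10f, 20f));
    }
}
```

For ordered inputs all three methods agree; they diverge only when NaN is involved. Whether the code under review behaves like the second or the third method is the question raised in the review comment, and the sketch does not answer it.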
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2548424215
PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2548424568

From rcastanedalo at openjdk.org Fri Nov 21 06:51:23 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 21 Nov 2025 06:51:23 GMT
Subject: RFR: 8349835: C2: Simplify IGV property printing [v8]
In-Reply-To: <_sItxPCxhSDj4-O3z7bN1PJwL3V1I06FQhCSYa6lDTA=.1d659740-1016-42df-b35c-28733dab8522@github.com>
References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> <_sItxPCxhSDj4-O3z7bN1PJwL3V1I06FQhCSYa6lDTA=.1d659740-1016-42df-b35c-28733dab8522@github.com>
Message-ID:

On Thu, 20 Nov 2025 20:57:42 GMT, Saranya Natarajan wrote:

>> The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708).
>>
>> ### Fix
>> Implemented the suggested refactoring.
>>
>> ### Testing
>> Github Actions, Tier 1-3
>
> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix for merge mistake

Thanks for addressing my comments, Saranya. I just have one final suggestion; it looks good otherwise.
src/hotspot/share/opto/idealGraphPrinter.cpp line 70:

> 68:       print_property(_printer->C->matcher()->is_dontcare(node), "is_dontcare");
> 69:       print_property(!(_printer->C->matcher()->is_dontcare(node)),"is_dontcare", IdealGraphPrinter::FALSE_VALUE);
> 70:       Node* old = _printer->C->matcher()->find_old_node(node);

Suggestion:

    Matcher* matcher = _printer->C->matcher();
    if (matcher != nullptr) {
      print_property(matcher->is_shared(node),"is_shared");
      print_property(!matcher->is_shared(node), "is_shared", IdealGraphPrinter::FALSE_VALUE);
      print_property(matcher->is_dontcare(node), "is_dontcare");
      print_property(!matcher->is_dontcare(node),"is_dontcare", IdealGraphPrinter::FALSE_VALUE);
      Node* old = matcher->find_old_node(node);

-------------

Changes requested by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26902#pullrequestreview-3491425928
PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2548723380

From hgreule at openjdk.org Fri Nov 21 07:05:53 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Fri, 21 Nov 2025 07:05:53 GMT
Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v5]
In-Reply-To:
References:
Message-ID: <28yL0IfHtLkAOtMAGiLeFGNW7C-WdTRsIMalvsKeras=.60e1d08f-d403-4b38-88d4-1b51f361fd05@github.com>

On Thu, 20 Nov 2025 16:39:54 GMT, Zihao Lin wrote:

>> src/hotspot/share/opto/mulnode.cpp line 641:
>>
>>> 639:   // Both are constant, directly computed the result
>>> 640:   if (longType1->is_con() && longType2->is_con()) {
>>> 641:     jlong highResult = multiply_high_unsigned(longType1->get_con(), longType2->get_con());
>>
>> We are going from an unsigned value to a signed here, I think this is implementation-defined? Maybe it's better to use julong and `TypeLong::make_or_top(TypeIntPrototype{{min_jlong, max_jlong}, {highResult, highResult}, {0, 0}})`?
>> >> (It might also make sense to have a helper function like `TypeLong::make_unsigned` for that, but I'll let others comment on whether that should be done separately) > > I think TypeLong::make is doing the work your mentioned, do we need another function to do it? > > > const TypeLong* TypeLong::make(jlong con) { > julong ucon = con; > return (new TypeLong(TypeIntPrototype{{con, con}, {ucon, ucon}, {~ucon, ucon}}, > WidenMin, false))->hashcons()->is_long(); > } Sorry if it wasn't clear, but the problem is that `multiply_high_unsigned` returns an *unsigned* long which you currently convert into a *signed* long. But from my understanding this is implementation-defined and I *think* you need to avoid that (I might be wrong though, happy to be corrected by someone else here :) ). That would mean you need to make `highResult` a `julong` and then you can't use `TypeLong::make` anymore as this would result in the same problem again. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28097#discussion_r2548753897 From shade at openjdk.org Fri Nov 21 07:20:31 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 21 Nov 2025 07:20:31 GMT Subject: RFR: 8372266: Relax store matchers in compiler/escapeAnalysis/TestRematerializeObjects.java test In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 16:48:01 GMT, Aleksey Shipilev wrote: > As you can see in the report, current matchers rely heavily on mainline C2 implementation to match specific stores. This fails when we try to backport MergeStores fixes to 25u. It would be better to relax the matchers a bit to cater for 25u backports, and also making test more robust for future MergeStores changes, if any. > > Additional testing: > - [x] Linux x86_64 server fastdebug, mainline, affected test, 100x > - [x] Linux AArch64 server fastdebug, mainline, affected test, 100x > - [x] Linux x86_64 server fastdebug, jdk25u, affected test, 100x Hey bots, figure out the issue exists. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28437#issuecomment-3561748487

From mhaessig at openjdk.org Fri Nov 21 08:02:34 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Fri, 21 Nov 2025 08:02:34 GMT
Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v2]
In-Reply-To:
References: <4X6V2ONReceYNZ7I_zUlK7VxZflXz0vJzhuUWnZEGoM=.3a5a263b-4c0a-4c20-8f65-52e7f09bb03f@github.com>
Message-ID: <5Pvs8fapNnI_RZU8YDnaat9PrqRWc0Gn8tfongrPs-Y=.8c71eb91-34e7-48a0-a0ed-ee75986dbbdb@github.com>

On Thu, 20 Nov 2025 18:26:06 GMT, Kerem Kat wrote:

>> src/hotspot/share/opto/macro.cpp line 2352:
>>
>>> 2350:   _igvn.replace_node(_callprojs.fallthrough_proj, region);
>>> 2351:
>>> 2352:   if (_callprojs.fallthrough_memproj != nullptr) {
>>
>> Why do we not have to hook up the memory input to the fall through projection if it does not exist in the first place?
>
> Could you clarify the question?

The code before assumed that `fallthrough_memproj` is always not null. So does the rest of the code also expect this? If so, then we should perhaps use `mem` for that purpose. This is related to @dean-long's question below. There is an invariant that is being violated and your fix should take into account why it is violated.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2548864355

From thartmann at openjdk.org Fri Nov 21 08:25:19 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 21 Nov 2025 08:25:19 GMT
Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 20 Nov 2025 16:40:49 GMT, Kerem Kat wrote:

>> Do not try to replace `fallthrough_memproj` when it is null, fixes crash.
>>
>> Test case is simplified from the ticket. Verified that the case crashes without the fix.
>
> Kerem Kat has updated the pull request incrementally with three additional commits since the last revision:
>
>  - fix test spacing
>  - Update src/hotspot/share/opto/macro.cpp
>
>    Co-authored-by: Manuel Hässig
>  - Update src/hotspot/share/opto/macro.cpp
>
>    Co-authored-by: Manuel Hässig

Also, +1 to Dean's question. We need a better understanding of how we ended up in this situation.

test/hotspot/jtreg/compiler/c2/Test8370502.java line 34:

> 32: package compiler.c2;
> 33:
> 34: public class Test8370502 {

Drive-by comment: Please rename the test to something more descriptive. We don't use bug numbers for test names anymore.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28432#issuecomment-3561938940
PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2548916456

From mhaessig at openjdk.org Fri Nov 21 08:33:27 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Fri, 21 Nov 2025 08:33:27 GMT
Subject: RFR: 8370914: C2: Reimplement Type::join [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 20 Nov 2025 10:14:07 GMT, Quan Anh Mai wrote:

>> Hi,
>>
>> Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection.
>>
>> However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`).
>>
>> In other cases, the existence of dual types introduces additional values into the value set of a `Type` class.
For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. >> >> This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. There are a lot of benefits of this: >> >> - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. >> - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. >> >> This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these p... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: > > - Merge branch 'master' into typejoin > - Move dual to ASSERT only > - Keep old version for verification > - whitespace > - Reimplement Type::join I have done a bit of digging as to why we have the symmetry requirement and found [Cliff talking about it in the Coffee Compiler Club](https://youtu.be/pBx6hoNV_eQ?t=240). He uses the symmetry to guarantee work-order independence (he also calls this the [Church-Rosser property](https://en.wikipedia.org/wiki/Church%E2%80%93Rosser_theorem) of the lattice in some places) and that provably guarantees that the optimistic analysis in the form of CCP will fixpoint at a better result than the pessimistic analysis of IGVN. He mentions playing around with non-symmetric lattices for C2 but running into issues with CCP giving different results based on the order of the work list. So I guess the main question I have is how we would ensure that any lattice ensures work order independence if it is not by construction. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3561958554 From dlunden at openjdk.org Fri Nov 21 08:35:39 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 21 Nov 2025 08:35:39 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v13] In-Reply-To: References: Message-ID: <0SEns_tIvS8plwKppk2cq8FekKyH_rHeysJVLV9rbSI=.2bec3568-ab0f-449d-95a4-59690e4a13d0@github.com> On Thu, 20 Nov 2025 21:30:21 GMT, Vladimir Ivanov wrote: >> src/hotspot/cpu/x86/x86.ad line 2649: >> >>> 2647: } >>> 2648: >>> 2649: // First operand of MachNode corresponding to Intel APX NDD selection >> >> Very informative comments! Thank you. 
>>
>> I suggest to shape it as follows:
>>
>>   if ((mdef->flags() & Node::PD::Flag_ndd_demotable) != 0) {
>>     switch (oper_index) {
>>       // First operand of MachNode corresponding to Intel APX NDD selection
>>       // pattern can share its assigned register with definition operand if
>>       // their live ranges do not overlap, in such a scenario we can demote
>>       // it to legacy map0/map1 instruction by replacing its 4-byte extended
>>       // EVEX prefix with shorter REX/REX2 encoding. Demotion candidates
>>       // are decorated with a special flag by instruction selector.
>>       case 1: return true;
>>
>>       // For commutative operation allocation of definition
>>       // operand can also be biased towards second operand.
>>       case 2: return (mdef->flags() & Node::PD::Flag_ndd_commutative) != 0;
>>
>>       // No register biasing supported for other operands.
>>       default: return false;
>>     }
>>   }
>
> Speaking of implicit invariants, `Node::PD::Flag_ndd_commutative` implies `Node::PD::Flag_ndd_demotable` is also set. Please, add an assert to catch missing `Node::PD::Flag_ndd_demotable` flag.
>
> Another constraint to assert: `mdef->operand_num_edges(oper_index) == 1` should be true for 1st operand when `Node::PD::Flag_ndd_demotable` is set and, also, for 2nd operand when `Node::PD::Flag_ndd_commutative` is set.
>
> Also, any operand ordering constraints in AD instruction declaration? Is it possible to mess the order of declarations, so register biasing candidate operands don't occur as 1st and 2nd operands?

Two nits:
- Comma splice: "overlap, in such a scenario" -> "overlap. In such a scenario"
- "For commutative operation allocation of definition operand can also be biased towards second operand" reads a bit strange to me. Perhaps "Commutative operation allocations of definition operands can also be biased towards the second operand"?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2548946958 From qamai at openjdk.org Fri Nov 21 09:08:28 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 21 Nov 2025 09:08:28 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v4] In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 08:29:27 GMT, Manuel H?ssig wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into typejoin >> - Move dual to ASSERT only >> - Keep old version for verification >> - whitespace >> - Reimplement Type::join > > I have done a bit of digging as to why we have the symmetry requirement and found [Cliff talking about it in the Coffee Compiler Club](https://youtu.be/pBx6hoNV_eQ?t=240). He uses the symmetry to guarantee work-order independence (he also calls this the [Church-Rosser property](https://en.wikipedia.org/wiki/Church%E2%80%93Rosser_theorem) of the lattice in some places) and that provably guarantees that the optimistic analysis in the form of CCP will fixpoint at a better result than the pessimistic analysis of IGVN. He mentions playing around with non-symmetric lattices for C2 but running into issues with CCP giving different results based on the order of the work list. So I guess the main question I have is how we would ensure that any lattice ensures work order independence if it is not by construction. @mhaessig Firstly, the type system is already not well-formed, as pointed out above, so it is unsound to think that the properties you mentioned are withheld. Furthermore, the construction of the lattice is not enough to enforce those properties, you need each and every operation on the lattice to conform to certain properties, too. For example, without monotonicity, we can easily come up with cases where CCP gives worse results compared to GVN. 
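
To make the monotonicity point concrete, here is a toy sketch under the "a type is a set of possible values" reading from the PR description. `IntRangeSketch` is a hypothetical class, not C2's `TypeInt`; it assumes that join corresponds to set intersection and meet to the smallest range covering both operands. Monotonicity of join then means: if `a` is a subset of `a2`, then `a.join(b)` is a subset of `a2.join(b)`.

```java
public final class IntRangeSketch {
    final int lo;
    final int hi; // inclusive bounds; lo > hi encodes the empty set

    IntRangeSketch(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    boolean isEmpty() {
        return lo > hi;
    }

    // Set inclusion: every value of this range is also in other.
    boolean isSubsetOf(IntRangeSketch other) {
        return isEmpty() || (!other.isEmpty() && other.lo <= lo && hi <= other.hi);
    }

    // Join as intersection: values allowed by both operands.
    IntRangeSketch join(IntRangeSketch other) {
        return new IntRangeSketch(Math.max(lo, other.lo), Math.min(hi, other.hi));
    }

    // Meet as the smallest range covering both operands.
    IntRangeSketch meet(IntRangeSketch other) {
        if (isEmpty()) {
            return other;
        }
        if (other.isEmpty()) {
            return this;
        }
        return new IntRangeSketch(Math.min(lo, other.lo), Math.max(hi, other.hi));
    }

    @Override
    public String toString() {
        return isEmpty() ? "empty" : "[" + lo + ", " + hi + "]";
    }

    public static void main(String[] args) {
        IntRangeSketch a = new IntRangeSketch(0, 10);
        IntRangeSketch a2 = new IntRangeSketch(-5, 20); // a is a subset of a2
        IntRangeSketch b = new IntRangeSketch(5, 15);
        System.out.println("a join b  = " + a.join(b));
        System.out.println("a2 join b = " + a2.join(b));
        System.out.println("monotone: " + a.join(b).isSubsetOf(a2.join(b)));
    }
}
```

An operation that violated this inclusion for some inputs would be exactly the kind of non-monotonic transfer function the paragraph above warns about, regardless of how the lattice itself is constructed.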
> the optimistic analysis in the form of CCP will fixpoint at a better result than the pessimistic analysis of IGVN This is guaranteed by monotonicity, since the type of each node at the beginning of CCP is a subset of the corresponding type during GVN. This property will be withheld throughout each iteration of CCP. > CCP giving different results based on the order of the work list Tbh I don't see this being an issue at all. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3562064240 From mchevalier at openjdk.org Fri Nov 21 09:17:09 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 21 Nov 2025 09:17:09 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v4] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 10:14:07 GMT, Quan Anh Mai wrote: >> Hi, >> >> Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. >> >> However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). >> >> In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. 
It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. >> >> This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierarchy. There are a lot of benefits of this: >> >> - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think of a `Type` as a set of possible values. >> - It tightens the invariants of the classes in the hierarchy. Instead of having 5 possible `ptr()` values when a `TypeInstPtr` participates in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). This, in turn, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. >> >> This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these p... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into typejoin > - Move dual to ASSERT only > - Keep old version for verification > - whitespace > - Reimplement Type::join Honestly, I need proofs here. Using normal lattice vocabulary there. As far as I understand CCP is a standard analysis that starts with bottom and join and widen until reaching an approximation of a fixpoint, hopefully, the least one.
A widening is guaranteed to eventually be stationary (even if it means defaulting to top), which guarantees jumping above a fixpoint in finite time. IGVN is the dual strategy: we start from top, and we refine using meet and narrowing, which is not a dual widening: we must stay on the same side of the fixpoint (since we started on the safe one). I don't understand how the structure of the lattice would give any guarantee about the respective results of those. It seems to me that it entirely depends on the quality of our widening. For instance, a widening that would always map to top, and converge in one iteration, would make CCP terminate very fast, and be sound, while being provably worse (or equally bad) than any sound analysis. I'm not even convinced that the symmetry hypothesis is necessary to have the mentioned result. (It is clearly not sufficient from my previous example). We probably need associativity, but then, if we have an abstract domain that is a lattice, that should be rather straightforward. If we start having simple posets instead, we can be very sound, but associativity might require more care... ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3562088359 PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3562094419 From shade at openjdk.org Fri Nov 21 09:42:35 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 21 Nov 2025 09:42:35 GMT Subject: RFR: 8372154: AArch64: Match rule failure with some CompareAndSwap operand shapes [v2] In-Reply-To: <6UEFErXO09V1ViH0p7jDLE2tVhnCWRTuOSBrRZpXt8c=.0aeb8c97-4649-451d-87c4-46850a791102@github.com> References: <6UEFErXO09V1ViH0p7jDLE2tVhnCWRTuOSBrRZpXt8c=.0aeb8c97-4649-451d-87c4-46850a791102@github.com> Message-ID: <8LNT4PH-y9xyn1hrX8BOAYfq8ZIQ0QrpHKh6SSG0WoQ=.41d8d3aa-16de-4004-9de4-e1b0f8901d9e@github.com> On Thu, 20 Nov 2025 07:55:51 GMT, Aleksey Shipilev wrote: >> See bug for more details.
>> >> Following up on [JDK-8371959](https://bugs.openjdk.org/browse/JDK-8371959) failures, I managed to reproduce the "bad AD" file assert. It is heavily intermittent, and needs hours of runs before we hit the lucky seed, plus [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557) to have broader testing scope. >> >> It looks like `CastII` node accepts the wider operand type (`iRegI`), which fails to match against narrower type in CAS match rules (`iRegINoSp`). It makes sense to use `iRegINoSp` for destination regs, so that we do not start writing to these special registers. But for operand registers, it makes little sense, IMO. I note that cas.m4-generated CAE/WCAS stubs actually already have the wider `iRegI` for operand types. >> >> So it looks to me the manual CAS match rules should also use `iRegI`. It would be even better to auto-generate these match rules from M4 stencils, and I tried that, but ultimately decided it obscures the actual bug fix. [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) is dedicated to moving the match rules, hopefully without the semantic change. >> >> In this change, I dropped `*NoSp` from CAS operand match rules. It fixes the `iRegI` mismatch, and prepares us for harmonizing these rules with the rest of CAS/CAE generated ones. >> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, local `bad AD` assert reproducer no longer fails >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, jcstress run > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8372154-aarch64-cas-operand-match > - Fix There we go. Nothing bad is going to happen if I push on Friday, I am sure. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28398#issuecomment-3562192388 From shade at openjdk.org Fri Nov 21 09:42:37 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 21 Nov 2025 09:42:37 GMT Subject: Integrated: 8372154: AArch64: Match rule failure with some CompareAndSwap operand shapes In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 17:41:48 GMT, Aleksey Shipilev wrote: > See bug for more details. > > Following up on [JDK-8371959](https://bugs.openjdk.org/browse/JDK-8371959) failures, I managed to reproduce the "bad AD" file assert. It is heavily intermittent, and needs hours of runs before we hit the lucky seed, plus [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557) to have broader testing scope. > > It looks like `CastII` node accepts the wider operand type (`iRegI`), which fails to match against narrower type in CAS match rules (`iRegINoSp`). It makes sense to use `iRegINoSp` for destination regs, so that we do not start writing to these special registers. But for operand registers, it makes little sense, IMO. I note that cas.m4-generated CAE/WCAS stubs actually already have the wider `iRegI` for operand types. > > So it looks to me the manual CAS match rules should also use `iRegI`. It would be even better to auto-generate these match rules from M4 stencils, and I tried that, but ultimately decided it obscures the actual bug fix. [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) is dedicated to moving the match rules, hopefully without the semantic change. > > In this change, I dropped `*NoSp` from CAS operand match rules. It fixes the `iRegI` mismatch, and prepares us for harmonizing these rules with the rest of CAS/CAE generated ones. > > Additional testing: > - [x] Linux AArch64 server fastdebug, local `bad AD` assert reproducer no longer fails > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, jcstress run This pull request has now been integrated. 
Changeset: 88ec4e61 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/88ec4e615a3008408184b7ed92010adc75d63853 Stats: 15 lines in 1 file changed: 0 ins; 0 del; 15 mod 8372154: AArch64: Match rule failure with some CompareAndSwap operand shapes Reviewed-by: aph, adinn ------------- PR: https://git.openjdk.org/jdk/pull/28398 From roland at openjdk.org Fri Nov 21 09:51:26 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Nov 2025 09:51:26 GMT Subject: RFR: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 10:26:01 GMT, Christian Hagedorn wrote: > Not sure if I understand your question. The Template Assertion Predicates themselves are never executed and just serve as templates to create Initialized Assertion Predicates from which will result in a halt if they fail at runtime. Right. But if I remember correctly, the false projection of nodes 309 and 321 (in your graph snippets) is an uncommon trap. That uncommon trap has incorrect state if it can be reached between the outer and inner loop. That's harmless, because when a predicate is initialized, it is wired to a halt node directly. Is this correct? If it is, then, I would add a comment so the state of the uncommon trap is not used by accident in some later change. Or is it possible to wire the template assertion predicate directly to a `Halt` node when it's cloned? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28389#issuecomment-3562221771 From chagedorn at openjdk.org Fri Nov 21 10:11:02 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 21 Nov 2025 10:11:02 GMT Subject: RFR: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 12:19:02 GMT, Christian Hagedorn wrote: > ### Strong Connection between Template Assertion Predicate and Counted Loop > In [JDK-8350579](https://bugs.openjdk.org/browse/JDK-8350579), we fixed the issue that a Template Assertion Predicate for a folded loop A could end up at another loop B. We then created an Initialized Assertion Predicate at loop B from the template of loop A and used the values from the already folded, completely unrelated loop A . As a result, we crashed with a halt because loop B violated the predicate with the wrong values. As a fix, we established a strong connection between Template Assertion Predicates and their associated loop node by adding a direct link from `OpaqueTemplateAssertionPredicate` -> `CountedLoop`. > > #### Maintaining this Property > In `PhaseIdealLoop::eliminate_useless_predicates()`, we walk through all counted loops and only keep those `OpaqueTemplateAssertionPredicate` nodes that can be found from the loop heads and are actually meant for this loop (using the strong connection): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1245-L1249 > > All other opaque nodes are removed. 
> > ### Additional Verification for Useless `OpaqueTemplateAssertionPredicate` Nodes > As an additional verification for `OpaqueTemplateAssertionPredicate` nodes that are found to be useless in `eliminate_useless_predicates()`, we check that in this case the `CountedLoop` is really dead (otherwise, we should have found the `OpaqueTemplateAssertionPredicate` in our walks through all loop): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1294-L1301 > > ### Violating the Additional Verification with `-XX:+StressLoopBackedge` > In `PhaseIdealLoop::duplicate_loop_backedge()`, we convert a loop with a merge point into two loops which should enable us to transform the new inner loop into a counted loop. This only makes sense for a `Loop` that is not a counted loop, yet. However, to stress the transformation, we can also run with `-XX:+StressDuplicateBackedge` that also transforms a counted loop into an inner and an outer loop. This is a problem when we have Template Assertion Predicates above a counted loop to be stressed: > > image > > After duplicate backedge, the Template Assertion Predicates are now at the outer non-counted `Loop`: > > References: Message-ID: On Thu, 13 Nov 2025 16:12:09 GMT, Emanuel Peter wrote: > Do you know why we insert a new `CastPP` there, and why it is put not at the ctrl of the CastPP, but of the phi? I suppose the ctrl of the phi is correct, but we do lose information there, and that later prevents the `CastPP` to common. When the `Phi` is removed because all of its inputs are the same once uncasted, there is a risk of losing a dependency. To prevent that, a `CastPP` is inserted. All we know is that some casts along some inputs of the `Phi` may carry a dependency that we don't want to loose. The only possible control for the `CastPP` then is the one of the `Phi`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3562330911 From roland at openjdk.org Fri Nov 21 10:19:31 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Nov 2025 10:19:31 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v2] In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 10:13:31 GMT, Roland Westrelin wrote: >> @rwestrel >>> Then Phi#514 which has 2 CastPPs as input with identical inputs is >> transformed into another CastPP at the Phi control with the data >> control of the CastPP as input. PhiNode::unique_input() with >> uncast = true is where that happens. That's where things go wrong I >> think. >> >> Right, this is where we go from 2->3 `CastPP`. Every additional `CastPP` with the same input seems to be a liability. >> >>> The 2 CastPPs have the same data input but not same control and igvn can't common them. >> >> Do you know why we insert a new `CastPP` there, and why it is put not at the ctrl of the CastPP, but of the phi? >> I suppose the ctrl of the phi is correct, but we do lose information there, and that later prevents the `CastPP` to common. >> >>> The fix I propose is to delay the call to PhiNode::unique_input() >> with uncast = true if the Phi's inputs are cast nodes and have yet >> to be processed by igvn. This causes identical CastPPs to common and >> only then is the Phi, which has 2 identical inputs, transformed to that >> input (rather than have a new CastPP be created at a different >> control). >> >> Ok, so in this case, we prevent the phi from looking through the two CastPP and creating a new third one, because that would create a third CastPP with a new ctrl, that we cannot fold away later. Rather, we hope that the two CastPP get commoned first. >> >> Ok, it is starting to make sense to me now. >> >> @rwestrel Does what I describe match what you tried to explain so far?
> >> Do you know why we insert a new `CastPP` there, and why it is put not at the ctrl of the CastPP, but of the phi? I suppose the ctrl of the phi is correct, but we do lose information there, and that later prevents the `CastPP` to common. > > When the `Phi` is removed because all of its inputs are the same once uncasted, there is a risk of losing a dependency. To prevent that, a `CastPP` is inserted. All we know is that some casts along some inputs of the `Phi` may carry a dependency that we don't want to loose. The only possible control for the `CastPP` then is the one of the `Phi`. > @rwestrel Do you think we could have the same assert also after every IGVN? That a chain of `AddP` all have the same base? I think that would be a nice addition to this fix here, and would strengthen the invariant. > > This could be part of `VerifyIterativeGVN`. That would make sense I think. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3562338524 From roland at openjdk.org Fri Nov 21 10:21:00 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Nov 2025 10:21:00 GMT Subject: RFR: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 12:19:02 GMT, Christian Hagedorn wrote: > ### Strong Connection between Template Assertion Predicate and Counted Loop > In [JDK-8350579](https://bugs.openjdk.org/browse/JDK-8350579), we fixed the issue that a Template Assertion Predicate for a folded loop A could end up at another loop B. We then created an Initialized Assertion Predicate at loop B from the template of loop A and used the values from the already folded, completely unrelated loop A . As a result, we crashed with a halt because loop B violated the predicate with the wrong values. 
As a fix, we established a strong connection between Template Assertion Predicates and their associated loop node by adding a direct link from `OpaqueTemplateAssertionPredicate` -> `CountedLoop`. > > #### Maintaining this Property > In `PhaseIdealLoop::eliminate_useless_predicates()`, we walk through all counted loops and only keep those `OpaqueTemplateAssertionPredicate` nodes that can be found from the loop heads and are actually meant for this loop (using the strong connection): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1245-L1249 > > All other opaque nodes are removed. > > ### Additional Verification for Useless `OpaqueTemplateAssertionPredicate` Nodes > As an additional verification for `OpaqueTemplateAssertionPredicate` nodes that are found to be useless in `eliminate_useless_predicates()`, we check that in this case the `CountedLoop` is really dead (otherwise, we should have found the `OpaqueTemplateAssertionPredicate` in our walks through all loop): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1294-L1301 > > ### Violating the Additional Verification with `-XX:+StressLoopBackedge` > In `PhaseIdealLoop::duplicate_loop_backedge()`, we convert a loop with a merge point into two loops which should enable us to transform the new inner loop into a counted loop. This only makes sense for a `Loop` that is not a counted loop, yet. However, to stress the transformation, we can also run with `-XX:+StressDuplicateBackedge` that also transforms a counted loop into an inner and an outer loop. 
This is a problem when we have Template Assertion Predicates above a counted loop to be stressed: > > image > > After duplicate backedge, the Template Assertion Predicates are now at the outer non-counted `Loop`: > > References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> <_sItxPCxhSDj4-O3z7bN1PJwL3V1I06FQhCSYa6lDTA=.1d659740-1016-42df-b35c-28733dab8522@github.com> Message-ID: On Thu, 20 Nov 2025 20:57:42 GMT, Saranya Natarajan wrote: >> The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). >> >> ### Fix >> Implemented the suggested refactoring. >> >> ### Testing >> Github Actions, Tier 1-3 > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > fix for merge mistake Thanks again @sarannat. I just have one more nit/doubt. Looks good to me otherwise. src/hotspot/share/opto/idealGraphPrinter.cpp line 80: > 78: print_property(true, "mask", buffer); > 79: print_property(true, "mask_size", lrg.mask_size()); > 80: if (lrg._degree_valid) { I know it was there already, but do we still need this if? ------------- Marked as reviewed by dfenacci (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/26902#pullrequestreview-3492074926 PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2549251366 From roland at openjdk.org Fri Nov 21 10:24:46 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Nov 2025 10:24:46 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v2] In-Reply-To: References: Message-ID: <9rO80v9ZGSbJq9LkOHzl91DIp9uW6dGeTQeTIPwr6hw=.5a00ec0f-b5b4-45c2-a349-e1029a7e6fc3@github.com> On Thu, 13 Nov 2025 16:22:30 GMT, Emanuel Peter wrote: > What if the CastPP above the phi are not yet on the worklist, because their inputs are only later going to change? But that would require CastPP to have different inputs/ctrl in the first place, and that probably cannot happen from unrolling the loop, or other similar operations, such as pre/main/post? It's hard to see how that would happen. It's also hard to be convinced it can't happen. > Another thought: if it is so important that we common the just duplicated `CastPP` first, then maybe we really should take care not to duplicate them in the first place. Is that a feasible alternative approach? The duplication comes from loop body cloning so I'm not sure how we could prevent the duplication. We could try to common the `CastPP` nodes once `PhaseIdealLoop::peeled_dom_test_elim()` is called. 
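The `Phi`/`CastPP` interaction discussed in this thread can be illustrated with a minimal sketch. It uses a toy node model, not HotSpot's `Node`/`PhiNode` classes; `Node`, `uncast`, `unique_input`, `Replacement` and `replace_phi` are all illustrative names assumed here. The point it tries to show: when a `Phi` only collapses after looking through casts, a new cast at the `Phi`'s control is needed, whereas a `Phi` whose (already commoned) inputs are identical collapses without one.

```cpp
#include <vector>

// Toy node model for illustration only; not HotSpot's Node class.
struct Node {
  enum Kind { Value, Cast, Phi } kind;
  const Node* ctrl;             // control input (unused for plain values)
  std::vector<const Node*> in;  // data inputs
};

// Strip any chain of casts and return the underlying value.
const Node* uncast(const Node* n) {
  while (n->kind == Node::Cast) {
    n = n->in[0];
  }
  return n;
}

// Return the single node all Phi inputs agree on, or nullptr if they differ.
// With look_through_casts set, inputs are compared after stripping casts.
const Node* unique_input(const Node& phi, bool look_through_casts) {
  const Node* unique = nullptr;
  for (const Node* input : phi.in) {
    const Node* candidate = look_through_casts ? uncast(input) : input;
    if (unique == nullptr) {
      unique = candidate;
    } else if (unique != candidate) {
      return nullptr;
    }
  }
  return unique;
}

struct Replacement {
  const Node* value;            // node that can replace the Phi, or nullptr
  bool needs_cast_at_phi_ctrl;  // true if a fresh cast must preserve the dependency
};

// Prefer the exact match; fall back to the uncast match, which costs a new cast.
Replacement replace_phi(const Node& phi) {
  if (const Node* same = unique_input(phi, false)) {
    return {same, false};
  }
  if (const Node* same = unique_input(phi, true)) {
    return {same, true};
  }
  return {nullptr, false};
}
```

In this model, two distinct casts of the same value feeding a `Phi` yield `needs_cast_at_phi_ctrl == true`, while the same cast on both inputs does not, which is why commoning the casts before the `Phi` is processed avoids the extra node.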
------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3562360158 From snatarajan at openjdk.org Fri Nov 21 10:28:47 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Fri, 21 Nov 2025 10:28:47 GMT Subject: RFR: 8349835: C2: Simplify IGV property printing [v9] In-Reply-To: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: > The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). > > ### Fix > Implemented the suggested refactoring. > > ### Testing > Github Actions, Tier 1-3 Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: addressing review comments on Matcher* ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26902/files - new: https://git.openjdk.org/jdk/pull/26902/files/eaec2213..d26073a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=07-08 Stats: 7 lines in 1 file changed: 1 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/26902.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26902/head:pull/26902 PR: https://git.openjdk.org/jdk/pull/26902 From snatarajan at openjdk.org Fri Nov 21 10:44:28 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Fri, 21 Nov 2025 10:44:28 GMT Subject: RFR: 8349835: C2: Simplify IGV property printing [v8] In-Reply-To: References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> <_sItxPCxhSDj4-O3z7bN1PJwL3V1I06FQhCSYa6lDTA=.1d659740-1016-42df-b35c-28733dab8522@github.com> Message-ID: 
<-HT5Rtc0fSYm16hDb1cDF7pnYEsBuhKXdCOUYe7v9OM=.40199b4a-d36d-4ac1-a4aa-c8db5e17f15f@github.com> On Fri, 21 Nov 2025 10:16:23 GMT, Damon Fenacci wrote: >> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: >> >> fix for merge mistake > > src/hotspot/share/opto/idealGraphPrinter.cpp line 80: > >> 78: print_property(true, "mask", buffer); >> 79: print_property(true, "mask_size", lrg.mask_size()); >> 80: if (lrg._degree_valid) { > > I know it was there already, but do we still need this if? When I did the testing as suggested by @robcasloz, I noticed that without the` if (lrg._degree_valid)`, the call to `lrg.degree()` crashes when` lrg._degree_valid` is not true. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2549326750 From chagedorn at openjdk.org Fri Nov 21 10:45:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 21 Nov 2025 10:45:01 GMT Subject: RFR: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 12:19:02 GMT, Christian Hagedorn wrote: > ### Strong Connection between Template Assertion Predicate and Counted Loop > In [JDK-8350579](https://bugs.openjdk.org/browse/JDK-8350579), we fixed the issue that a Template Assertion Predicate for a folded loop A could end up at another loop B. We then created an Initialized Assertion Predicate at loop B from the template of loop A and used the values from the already folded, completely unrelated loop A . As a result, we crashed with a halt because loop B violated the predicate with the wrong values. As a fix, we established a strong connection between Template Assertion Predicates and their associated loop node by adding a direct link from `OpaqueTemplateAssertionPredicate` -> `CountedLoop`. 
> > #### Maintaining this Property > In `PhaseIdealLoop::eliminate_useless_predicates()`, we walk through all counted loops and only keep those `OpaqueTemplateAssertionPredicate` nodes that can be found from the loop heads and are actually meant for this loop (using the strong connection): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1245-L1249 > > All other opaque nodes are removed. > > ### Additional Verification for Useless `OpaqueTemplateAssertionPredicate` Nodes > As an additional verification for `OpaqueTemplateAssertionPredicate` nodes that are found to be useless in `eliminate_useless_predicates()`, we check that in this case the `CountedLoop` is really dead (otherwise, we should have found the `OpaqueTemplateAssertionPredicate` in our walks through all loop): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1294-L1301 > > ### Violating the Additional Verification with `-XX:+StressLoopBackedge` > In `PhaseIdealLoop::duplicate_loop_backedge()`, we convert a loop with a merge point into two loops which should enable us to transform the new inner loop into a counted loop. This only makes sense for a `Loop` that is not a counted loop, yet. However, to stress the transformation, we can also run with `-XX:+StressDuplicateBackedge` that also transforms a counted loop into an inner and an outer loop. 
This is a problem when we have Template Assertion Predicates above a counted loop to be stressed: > > image > > After duplicate backedge, the Template Assertion Predicates are now at the outer non-counted `Loop`: > > References: Message-ID: <2SSes0JlZvm16cc_d0_NetssM5opgnZ2Ii1-Uw-ZdTQ=.93a5b991-1cb4-4f29-94bc-9ec01c247063@github.com> > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. > > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union-find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint.
> > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: - Format corrections - Sharpening biasing candidate selection based on RegisterMask. Review feedback incorporated ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26283/files - new: https://git.openjdk.org/jdk/pull/26283/files/ee8b0368..f0513b87 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=12-13 Stats: 81 lines in 9 files changed: 25 ins; 7 del; 49 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From jbhateja at openjdk.org Fri Nov 21 10:53:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 21 Nov 2025 10:53:54 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v13] In-Reply-To: <0SEns_tIvS8plwKppk2cq8FekKyH_rHeysJVLV9rbSI=.2bec3568-ab0f-449d-95a4-59690e4a13d0@github.com> References: <0SEns_tIvS8plwKppk2cq8FekKyH_rHeysJVLV9rbSI=.2bec3568-ab0f-449d-95a4-59690e4a13d0@github.com> Message-ID: On Fri, 21 Nov 2025 08:33:36 GMT, Daniel Lundén wrote: >> Speaking of implicit invariants, `Node::PD::Flag_ndd_commutative` implies `Node::PD::Flag_ndd_demotable` is also set. Please, add an assert to catch missing `Node::PD::Flag_ndd_demotable` flag.
>> >> Another constraint to assert: `mdef->operand_num_edges(oper_index) == 1` should be true for 1st operand when `Node::PD::Flag_ndd_demotable` is set and, also, for 2nd operand when `Node::PD::Flag_ndd_commutative` is set. >> >> Also, any operand ordering constraints in AD instruction declaration? Is it possible to mess the order of declarations, so register biasing candidate operands don't occur as 1st and 2nd operands? > > Two nits: > - Comma splice: "overlap, in such a scenario" -> "overlap. In such a scenario" > - "For commutative operation allocation of definition operand can also be biased towards second operand" reads a bit strange to me. Perhaps "Commutative operation allocations of definition operands can also be biased towards the second operand"? > Speaking of implicit invariants, `Node::PD::Flag_ndd_commutative` implies `Node::PD::Flag_ndd_demotable` is also set. Please, add an assert to catch missing `Node::PD::Flag_ndd_demotable` flag. > > Another constraint to assert: `mdef->operand_num_edges(oper_index) == 1` should be true for 1st operand when `Node::PD::Flag_ndd_demotable` is set and, also, for 2nd operand when `Node::PD::Flag_ndd_commutative` is set. Extended Flag_ndd_commutative semantics to imply demotion: Flag_ndd_demotable_commutative > > Also, any operand ordering constraints in AD instruction declaration? Is it possible to mess the order of declarations, so register biasing candidate operands don't occur as 1st and 2nd operands? Extended this to make the candidate selection operand order agnostic. 
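The invariant requested in the review (a commutative NDD node must also be demotable) can be written down as a small check. This is only a sketch under stated assumptions: the flag bit values and the names `kNddDemotable`, `kNddCommutative`, `ndd_flags_consistent` and `bias_candidates` are hypothetical and do not reflect the actual `Node::PD` encoding or the merged `Flag_ndd_demotable_commutative` flag mentioned in the reply.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical flag bits; the real Node::PD flag values are not shown in the thread.
enum NddFlag : std::uint32_t {
  kNddDemotable   = 1u << 0,
  kNddCommutative = 1u << 1,
};

// The reviewed invariant: "commutative" is only meaningful on a demotable node.
constexpr bool ndd_flags_consistent(std::uint32_t flags) {
  return (flags & kNddCommutative) == 0 || (flags & kNddDemotable) != 0;
}

// Source operand positions (1-based) the definition may be biased towards.
// Inconsistent flag combinations yield no candidates in this sketch; a debug
// build would more likely assert instead.
std::vector<int> bias_candidates(std::uint32_t flags) {
  std::vector<int> candidates;
  if (!ndd_flags_consistent(flags)) {
    return candidates;
  }
  if ((flags & kNddDemotable) != 0) {
    candidates.push_back(1);  // first source operand
  }
  if ((flags & kNddCommutative) != 0) {
    candidates.push_back(2);  // second source operand, commutative operations only
  }
  return candidates;
}
```

Folding both properties into a single flag, as the reply describes, makes the inconsistent combination unrepresentable, so the check above would no longer be needed.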
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2549358629 From rrich at openjdk.org Fri Nov 21 11:03:58 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 21 Nov 2025 11:03:58 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v4] In-Reply-To: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> References: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> Message-ID: <6rf41XebDLm6Ja6iS4nHNrTrzlogRheQK-z-5CvkXCk=.cb31fc4f-f8ca-4d56-8e3d-06bcfae5f4ed@github.com> On Thu, 20 Nov 2025 10:21:34 GMT, Richard Reingruber wrote: >> With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. >> >> It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. >> >> The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. >> >> The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. >> Only with incoming stack arguments unaligned offsets can be generated. But also then alignment padding is only added if vector registers larger than 64 bit are used. >> >> So the costs are effectively zero. Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) >> >> There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). If this happens the compilation would fail. 
It could be retried with smaller alignment for vector spilling though. I haven't implemented it as I thought the risk is negligible. >> >> Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. >> >> ##### Testing with fastdebug builds on AARCH64 and PPC64: >> >> hotspot_vector_1 >> hotspot_vector_2 >> jdk_vector >> jdk_vector_sanity >> >> ##### The change passed our CI testing: >> Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. >> Testing was done on the main platforms and also on Linux/PPC64le and AIX. >> >> C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. It is covered by the following tests: >> >> compiler/vectorapi/VectorRearrangeTest.java >> jdk/incubator/vector/Byte128VectorLoadStoreTests.java >> jdk/incubator/vector/Double256VectorLoadStoreTests.java >> jdk/incubator/vector/Float128VectorTests.java >> jdk/incubator/vector/Long256VectorLoadStoreTests.java >> jdk/incubator/vector/Short128VectorLoadStoreTests.java >> jdk/incubator/vector/Vector64ConversionTests.java > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' > - Exclude IR check on riscv with rvv > - Enhance comment > - Fix OptoAssembly for Power 8 > - PPC: OptoAssembly for vector spilling > - Assert aligned sp offsets in vector spilling > - Delete TMP and !UseNewCode > - Align Matcher::_new_SP for better vector spilling > - TMP: trace unaligned vector spilling > - Add test I'd like to give a little example that's supposed to show that this pr will help reduce frame size rather than increase it.
Example:

- VectorX v1, v2 are spilled
- register sets are aligned to the set size, here SlotsPerVecX = 4
- simplification: no out args, out preserve

Baseline: _new_SP always aligned to SlotsPerLong = 2

    Slots
                         1
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... 99
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
    | | | | | | |un-|       |       |
    | | | | | | |usd|  v1   |  v2   | locks
    | | | | | | |   |       |       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
                ^                                  |
                _new_SP                            _old_SP
                |<--- own frame --->|

Spill area: slots 6 - 15 = 10 slots
Frame size (_new_SP to _old_SP ([see diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3837-L3856))) = 100 - 6 = 94 slots

Slots 6 and 7 are unused because v1 and v2 are aligned to their size. They are part of the frame.

Pr: _new_SP aligned SlotsPerVecX = 4 because there are spills of that size

    Slots
                         1
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... 99
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
    | | | | | | |un-|       |       |
    | | | | | | |usd|  v1   |  v2   | locks
    | | | | | | |   |       |       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
                    ^                              |
                    _new_SP                        _old_SP
                    |<--- own frame --->|

Spill area: slots 8 - 15 = 8 slots
Frame size (_new_SP to _old_SP ([see diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3837-L3856))) = 100 - 8 = 92 slots

Slots 6 and 7 are still only used for alignment but they are not part of the frame. The resulting frame size is smaller with this pr.
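
The arithmetic in the example above can be replayed with a small toy model. This is only a sketch and not HotSpot code: the class `SpillFrameSketch` and its helpers are hypothetical, and it assumes, as in the example, that slot 6 is the lowest slot at which _new_SP could be placed and that _old_SP is at slot 100.

```java
// Toy model of the slot arithmetic from the example; this is not HotSpot code.
public class SpillFrameSketch {
    static final int SLOTS_PER_LONG = 2;
    static final int SLOTS_PER_VEC_X = 4;

    // Round 'slot' up to the next multiple of 'alignment' (must be a power of two).
    static int alignUp(int slot, int alignment) {
        return (slot + alignment - 1) & -alignment;
    }

    // Number of slots between _new_SP (aligned to spAlignment) and _old_SP.
    static int frameSize(int lowestSlot, int spAlignment, int oldSp) {
        return oldSp - alignUp(lowestSlot, spAlignment);
    }

    // Slots from _new_SP to the end of two VectorX spills, including alignment padding.
    static int spillArea(int lowestSlot, int spAlignment) {
        int newSp = alignUp(lowestSlot, spAlignment);
        int v1 = alignUp(newSp, SLOTS_PER_VEC_X); // v1 is aligned to its own size
        int end = v1 + 2 * SLOTS_PER_VEC_X;       // v1 and v2 are adjacent
        return end - newSp;
    }

    public static void main(String[] args) {
        int lowestSlot = 6; // assumption taken from the example
        int oldSp = 100;    // assumption taken from the example
        System.out.println("baseline: frame=" + frameSize(lowestSlot, SLOTS_PER_LONG, oldSp)
                + " spill=" + spillArea(lowestSlot, SLOTS_PER_LONG));
        System.out.println("pr:       frame=" + frameSize(lowestSlot, SLOTS_PER_VEC_X, oldSp)
                + " spill=" + spillArea(lowestSlot, SLOTS_PER_VEC_X));
    }
}
```

With the baseline alignment the padding slots 6 and 7 sit above _new_SP and count towards the frame; with the larger alignment they fall below it.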
------------- PR Comment: https://git.openjdk.org/jdk/pull/27969#issuecomment-3562513261 From snatarajan at openjdk.org Fri Nov 21 11:10:35 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Fri, 21 Nov 2025 11:10:35 GMT Subject: RFR: 8349835: C2: Simplify IGV property printing [v10] In-Reply-To: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: > The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). > > ### Fix > Implemented the suggested refactoring. > > ### Testing > Github Actions, Tier 1-3 Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: addressing review comments - removing reduntant lrg._degree_valid ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26902/files - new: https://git.openjdk.org/jdk/pull/26902/files/d26073a6..0cbbf30e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26902.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26902/head:pull/26902 PR: https://git.openjdk.org/jdk/pull/26902 From roland at openjdk.org Fri Nov 21 11:17:15 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Nov 2025 11:17:15 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v5] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 16:20:26 GMT, Roland Westrelin wrote: >> In test cases, `mh` is initially not constant so the method handle >> invoke 
can't be inlined. It is later found to be constant, so it can >> be turned into a direct call by >> `Compile::process_late_inline_calls_no_inline()`. In the meantime, the >> `CallNode` for the mh invoke is cloned (by loop switching). In the >> process, only a shallow copy of the `JVMState` for the call is >> made. The initial `CallNode` is the first to be processed by >> `Compile::process_late_inline_calls_no_inline()` and that causes that >> `CallNode` to become dead. The cloned `CallNode` is then >> processed. The `JVMState` for that one references the initial >> `CallNode` in its caller's `JVMState`. Because that node is dead, that >> causes a crash. The fix I propose is to make a deep copy of the >> `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is >> assigned to the node. >> >> The other failure I see with these tests is: >> >> >> # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 >> # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! >> >> >> because even though the `CallNode` is cloned, there's still only one >> late inline recorded. The fix here is to increment >> `_number_of_mh_late_inlines` when the node is cloned. >> >> This was reported by the netty developers. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8370939 > - review > - more > - more > - more > - more > - test > - fix Anyone else for a review of this bug fix? 
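
For readers who want the shallow versus deep copy problem from the quoted description in miniature, here is a toy model. It is not the HotSpot `JVMState` or `CallNode` code: the classes `State` and `CallNode` below are hypothetical stand-ins, and the sketch assumes every state in the caller chain points at the call node that owns it.

```java
// Toy model only; 'State' and 'CallNode' are hypothetical stand-ins, not HotSpot classes.
public class StateCopySketch {
    static final class CallNode {
        boolean dead;
    }

    static final class State {
        final CallNode map;  // node this state belongs to
        final State caller;  // state of the calling method, or null at the outermost level

        State(CallNode map, State caller) {
            this.map = map;
            this.caller = caller;
        }

        // Copies only the innermost state; the caller chain is shared with the original.
        State shallowCopyFor(CallNode clone) {
            return new State(clone, caller);
        }

        // Copies the whole chain and re-points every level at the cloned node.
        State deepCopyFor(CallNode clone) {
            State callerCopy = (caller == null) ? null : caller.deepCopyFor(clone);
            return new State(clone, callerCopy);
        }

        // True if any level of the chain still refers to a dead node.
        boolean referencesDeadNode() {
            for (State s = this; s != null; s = s.caller) {
                if (s.map.dead) {
                    return true;
                }
            }
            return false;
        }
    }

    static State twoLevelState(CallNode call) {
        return new State(call, new State(call, null));
    }

    public static void main(String[] args) {
        CallNode original = new CallNode();
        State originalState = twoLevelState(original);

        State shallow = originalState.shallowCopyFor(new CallNode());
        State deep = originalState.deepCopyFor(new CallNode());

        original.dead = true; // the original call is processed first and dies

        System.out.println("shallow copy sees dead node: " + shallow.referencesDeadNode());
        System.out.println("deep copy sees dead node:    " + deep.referencesDeadNode());
    }
}
```

In this model the shallow copy keeps reaching the dead original through its shared caller state, which mirrors the crash scenario described; the deep copy does not.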
------------- PR Comment: https://git.openjdk.org/jdk/pull/28088#issuecomment-3562563197 From roland at openjdk.org Fri Nov 21 11:19:51 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Nov 2025 11:19:51 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v3] In-Reply-To: References: Message-ID: > The test case has an out of loop `Store` with an `AddP` address > expression that has other uses and is in the loop body. Schematically, > only showing the address subgraph and the bases for the `AddP`s: > > > Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 > -> CastPP#110 > > > Both `AddP`s have the same base, a `CastPP` that's also in the loop > body. > > That loop is a counted loop and only has 3 iterations so is fully > unrolled. First, one iteration is peeled: > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > The `AddP`s and `CastPP` are cloned (because in the loop body). As > part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is > called. It finds the test that guards `CastPP#283` in the peeled > iteration dominates and replaces the test that guards `CastPP#110` > (the test in the peeled iteration is the clone of the test in the > loop). That causes `CastPP#110`'s control to be updated to that of the > test in the peeled iteration and to be yanked from the loop. So now > `CastPP#283` and `CastPP#110` have the same inputs. > > Next unrolling happens: > > > /-> CastPP#110 > /-> AddP#400 -> AddP#401 -> CastPP#110 > Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 > \ -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > `AddP`s are cloned once more but not the `CastPP`s because they are > both in the peeled iteration now. A new `Phi` is added. > > Next igvn runs. It's going to push the `AddP`s through the `Phi`s. 
> > Through `Phi#477`: > > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 > \ -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > Through `Phi#360`: > > > /-> AddP#134 -> CastPP#110 > /-> Phi#509 -> AddP#401 -> CastPP#110 > Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 > -> Phi#514 -> CastPP#283 > -> CastP#110 > > > Then `Phi#514` which has 2 `CastPP`s as input with identical inputs is > transformed into anot... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Merge branch 'master' into JDK-8351889 - verif - Merge branch 'master' into JDK-8351889 - test seed - more - Merge branch 'master' into JDK-8351889 - Merge branch 'master' into JDK-8351889 - more - test - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25386/files - new: https://git.openjdk.org/jdk/pull/25386/files/bf984838..d52f2ded Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=01-02 Stats: 525999 lines in 5163 files changed: 366993 ins; 99823 del; 59183 mod Patch: https://git.openjdk.org/jdk/pull/25386.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25386/head:pull/25386 PR: https://git.openjdk.org/jdk/pull/25386 From roland at openjdk.org Fri Nov 21 11:19:51 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Nov 2025 11:19:51 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v2] In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 10:15:18 GMT, Roland Westrelin wrote: > This could be part of `VerifyIterativeGVN`. I added some verification code. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3562560562 From rcastanedalo at openjdk.org Fri Nov 21 11:26:18 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 21 Nov 2025 11:26:18 GMT Subject: RFR: 8349835: C2: Simplify IGV property printing [v10] In-Reply-To: References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: <_p_9ih1N3ar7RViMmUMz0tZOauBykfeWGTK6sP_3e34=.dc028ca9-2361-445f-b9f7-3b3ccea08d5d@github.com> On Fri, 21 Nov 2025 11:10:35 GMT, Saranya Natarajan wrote: >> The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). >> >> ### Fix >> Implemented the suggested refactoring. >> >> ### Testing >> Github Actions, Tier 1-3 > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments - removing reduntant lrg._degree_valid Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26902#pullrequestreview-3492326134 From roland at openjdk.org Fri Nov 21 11:29:11 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Nov 2025 11:29:11 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop [v6] In-Reply-To: References: Message-ID: <3_O4WsDAECSNuSxFasov6t2ySprWFnYaiXB4Tqr_Emw=.f18fa14f-c349-464c-bda6-ef1e41ede7c2@github.com> > In: > > > for (int i = 100; i < 1100; i++) { > v += floatArray[i - 100]; > Objects.checkIndex(i, longRange); > } > > > The int counted loop has both an int range check and a long range. The > int range check is optimized first. Assertion predicates are inserted > above the loop. 
One predicates checks that: > > > init - 100 > > The loop is then transformed to enable the optimization of the long > range check. The loop is short running, so there's no need to create a > loop nest. The counted loop is mostly left as is but, the loop's > bounds are changed from: > > > for (int i = 100; i < 1100; i++) { > > > to: > > > for (int i = 0; i < 1000; i++) { > > > The reason for that the long range check transformation expects the > loop to start at 0. > > Pre/main/post loops are created. Template Assertion predicates are > added above the main loop. The loop is unrolled. Initialized assertion > predicates are created. The one created from the condition: > > > init - 100 > > checks the value of `i` out of the pre loop which is 1. That check fails. > > The root cause of the failure is that when bounds of the counted loop > are changed, template assertion predicates need to be updated with and > adjusted init input. > > When the bounds of the loop are known, the assertion predicates can be > updated in place. Otherwise, when the loop is speculated to be short > running, the assertion predicates are updated when they are cloned. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Merge branch 'master' into JDK-8366888 - review - Merge branch 'master' into JDK-8366888 - Merge branch 'master' into JDK-8366888 - whitespaces - review - Merge branch 'master' into JDK-8366888 - Update src/hotspot/share/opto/predicates.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/predicates.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - ... 
and 3 more: https://git.openjdk.org/jdk/compare/48d8bbb7...2d329d48 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27250/files - new: https://git.openjdk.org/jdk/pull/27250/files/b0d7aab1..2d329d48 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27250&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27250&range=04-05 Stats: 61237 lines in 847 files changed: 41519 ins; 14019 del; 5699 mod Patch: https://git.openjdk.org/jdk/pull/27250.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27250/head:pull/27250 PR: https://git.openjdk.org/jdk/pull/27250 From roland at openjdk.org Fri Nov 21 11:29:13 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Nov 2025 11:29:13 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop In-Reply-To: <1EgDjfhpch9SuqvjEuZUyB0Y_NzmeBEWmDWRK-C0XEY=.3ebe62c7-abfa-426e-90c8-fafc2750f6a2@github.com> References: <1EgDjfhpch9SuqvjEuZUyB0Y_NzmeBEWmDWRK-C0XEY=.3ebe62c7-abfa-426e-90c8-fafc2750f6a2@github.com> Message-ID: On Fri, 10 Oct 2025 10:25:43 GMT, Christian Hagedorn wrote: >> In: >> >> >> for (int i = 100; i < 1100; i++) { >> v += floatArray[i - 100]; >> Objects.checkIndex(i, longRange); >> } >> >> >> The int counted loop has both an int range check and a long range. The >> int range check is optimized first. Assertion predicates are inserted >> above the loop. One predicates checks that: >> >> >> init - 100 > >> >> The loop is then transformed to enable the optimization of the long >> range check. The loop is short running, so there's no need to create a >> loop nest. The counted loop is mostly left as is but, the loop's >> bounds are changed from: >> >> >> for (int i = 100; i < 1100; i++) { >> >> >> to: >> >> >> for (int i = 0; i < 1000; i++) { >> >> >> The reason for that the long range check transformation expects the >> loop to start at 0. >> >> Pre/main/post loops are created. 
Template Assertion predicates are >> added above the main loop. The loop is unrolled. Initialized assertion >> predicates are created. The one created from the condition: >> >> >> init - 100 > >> >> checks the value of `i` out of the pre loop which is 1. That check fails. >> >> The root cause of the failure is that when bounds of the counted loop >> are changed, template assertion predicates need to be updated with and >> adjusted init input. >> >> When the bounds of the loop are known, the assertion predicates can be >> updated in place. Otherwise, when the loop is speculated to be short >> running, the assertion predicates are updated when they are cloned. > > I'll have a look today or on Monday :-) @chhagedorn I would need you to re-approve that one with the merge to latest + the tweak to the comment that Benoit asked for. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27250#issuecomment-3562593118 From roland at openjdk.org Fri Nov 21 11:33:42 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Nov 2025 11:33:42 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v6] In-Reply-To: References: Message-ID: <7nY7QRkkFjOtOuBXID1I4GluA0vnFRLy_UnRICfVkR4=.99ec7fe1-af27-4ab7-ac63-27aa12bec4ef@github.com> > In test cases, `mh` is initially not constant so the method handle > invoke can't be inlined. It is later found to be constant, so it can > be turned into a direct call by > `Compile::process_late_inline_calls_no_inline()`. In the meantime, the > `CallNode` for the mh invoke is cloned (by loop switching). In the > process, only a shallow copy of the `JVMState` for the call is > made. The initial `CallNode` is the first to be processed by > `Compile::process_late_inline_calls_no_inline()` and that causes that > `CallNode` to become dead. The cloned `CallNode` is then > processed. 
The `JVMState` for that one references the initial > `CallNode` in its caller's `JVMState`. Because that node is dead, that > causes a crash. The fix I propose is to make a deep copy of the > `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is > assigned to the node. > > The other failure I see with these tests is: > > > # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 > # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! > > > because even though the `CallNode` is cloned, there's still only one > late inline recorded. The fix here is to increment > `_number_of_mh_late_inlines` when the node is cloned. > > This was reported by the netty developers. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Merge branch 'master' into JDK-8370939 - review - Merge branch 'master' into JDK-8370939 - review - more - more - more - more - test - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28088/files - new: https://git.openjdk.org/jdk/pull/28088/files/2cc796b1..bf46ba3e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=04-05 Stats: 61237 lines in 847 files changed: 41519 ins; 14019 del; 5699 mod Patch: https://git.openjdk.org/jdk/pull/28088.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28088/head:pull/28088 PR: https://git.openjdk.org/jdk/pull/28088 From galder at openjdk.org Fri Nov 21 12:43:43 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 21 Nov 2025 12:43:43 GMT Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 12:12:16 GMT, Christian Hagedorn 
wrote: >> Trivial cleanup to move tests out of a test class whose description does not match these tests > > test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxLongLoopBarrier.java line 2: > >> 1: /* >> 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. > > Not sure if this should be 2025 instead even though the code was added in 2024. No strong opinion, though. Hmmm, I was wondering that too. I guess the file is new for 2025 so I'll go with that instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28385#discussion_r2549642466 From galder at openjdk.org Fri Nov 21 12:53:00 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 21 Nov 2025 12:53:00 GMT Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax [v2] In-Reply-To: References: Message-ID: > Trivial cleanup to move tests out of a test class whose description does not match these tests Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Change copyright to Amazon ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28385/files - new: https://git.openjdk.org/jdk/pull/28385/files/2ed8c4bf..bb287ba6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28385&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28385&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28385.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28385/head:pull/28385 PR: https://git.openjdk.org/jdk/pull/28385 From galder at openjdk.org Fri Nov 21 12:53:01 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 21 Nov 2025 12:53:01 GMT Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax [v2] In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 12:40:48 GMT, Galder Zamarreño wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxLongLoopBarrier.java line 2: >>
>>> 1: /* >>> 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. >> >> Not sure if this should be 2025 instead even though the code was added in 2024. No strong opinion, though. > > Hmmm, I was wondering that too. I guess the file is new for 2025 so I'll go with that instead. I've changed the copyright to Amazon since @caojoshua added the test originally. Their copyright notice does not have year so I added it as is. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28385#discussion_r2549659197 From kxu at openjdk.org Fri Nov 21 15:54:08 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Fri, 21 Nov 2025 15:54:08 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v23] In-Reply-To: References: Message-ID: <6Yo2VYqBk_iaUpAGdPvyCjOyn_XW2nVPN5_w8XbXvkU=.91138210-54e3-4c28-b1d8-eb706583348e@github.com> > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). 
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: fix trip counter loop-variant detection ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24458/files - new: https://git.openjdk.org/jdk/pull/24458/files/584a6968..392a010d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=21-22 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From chagedorn at openjdk.org Fri Nov 21 16:37:21 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 21 Nov 2025 16:37:21 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v23] In-Reply-To: <6Yo2VYqBk_iaUpAGdPvyCjOyn_XW2nVPN5_w8XbXvkU=.91138210-54e3-4c28-b1d8-eb706583348e@github.com> References: <6Yo2VYqBk_iaUpAGdPvyCjOyn_XW2nVPN5_w8XbXvkU=.91138210-54e3-4c28-b1d8-eb706583348e@github.com> Message-ID: On Fri, 21 Nov 2025 15:54:08 GMT, Kangcheng Xu wrote: >> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. >> >> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. >> >> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix trip counter loop-variant detection Was too busy this week, will try to come back to this next week! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3563799029 From vpaprotski at openjdk.org Fri Nov 21 17:17:53 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Fri, 21 Nov 2025 17:17:53 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v2] In-Reply-To: References: <-NP71XXG0bisxVHds8O-uXhLZqbnVLijJoJDwVq2ZBk=.2478c442-fc34-4ba0-9811-1f910ee3ee36@github.com> Message-ID: <0zIQmkXqAv1UktDyJ4wh83qqB7FGS9bM80Z3562IuHs=.f1499d7d-b025-4cf4-b7a9-b9436d0f9ab3@github.com> On Thu, 20 Nov 2025 23:39:05 GMT, Vladimir Ivanov wrote: >>> I understand your reasons. The question is whether you'll need the microbenchmark in the future. If no (or probably no), please remove the micro. If needed, please move it from the "org.openjdk.bench.javax.crypto.full" package to "org.openjdk.bench.javax.crypto". It is supposed to have only public API micros in packages "small" and "full" >> >> @kuksenko I decided to just remove it. If anyone wants it back, its in my git history (I usually keep my branches after merge..) > >> If anyone wants it back, its in my git history (I usually keep my branches after merge..) > > You could put a comment with the link into JBS issue to make it easier to discover later. (Or just attach the source file there.) @iwanowww thanks for the suggestion! attached to JBS. @mcpowers would you mind running your internal test suite for this PR? I am thinking of integrating early next week, if no objections; getting close to the release deadline, dont want to cut it even closer.. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3563948572 From bmaillard at openjdk.org Fri Nov 21 18:31:11 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Fri, 21 Nov 2025 18:31:11 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar Message-ID:

This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing.

The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially introduced to get rid of unused `Load` nodes that were only kept alive by such an edge.

https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259

In our case, it happens that the `Load` node gets folded to a constant during the initial `_gvn.transform` call in `GraphKit::make_load`. Because the value is converted before being returned, we end up with two constant nodes: one `ConL` and one `ConI`. The `ConL` only has one use, and this triggers the optimization during verification.

    static int test0() {
        var c = new MyClass();
        // the conversion ensures that the ConL node only has one use
        // in the end, which triggers the optimization
        return (int) c.l;
    }

The optimization is not triggered earlier, when we apply `_gvn.transform` on the membar, because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in `PhaseGVN::transform`. For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with `can_reshape` later.

This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load` prevents it from occurring when boxing elimination is enabled.
Boxing elimination is disabled temporarily in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)), which caused the issue to appear, but by using `-XX:-EliminateAutoBox` it became clear that the issue was on mainline. ### Testing - [x] [GitHub Actions](TODO) - [x] tier1-3, plus some internal testing Thank you for reviewing! ------------- Commit messages: - Record in GraphKit::insert_mem_bar_volatile for consistency - Improve test and fix - Add test Changes: https://git.openjdk.org/jdk/pull/28448/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28448&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8367627 Stats: 80 lines in 2 files changed: 80 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28448.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28448/head:pull/28448 PR: https://git.openjdk.org/jdk/pull/28448 From kvn at openjdk.org Fri Nov 21 18:53:07 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Nov 2025 18:53:07 GMT Subject: RFR: 8372266: Relax store matchers in compiler/escapeAnalysis/TestRematerializeObjects.java test In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 16:48:01 GMT, Aleksey Shipilev wrote: > As you can see in the report, current matchers rely heavily on mainline C2 implementation to match specific stores. This fails when we try to backport MergeStores fixes to 25u. It would be better to relax the matchers a bit to cater for 25u backports, and also making test more robust for future MergeStores changes, if any. > > Additional testing: > - [x] Linux x86_64 server fastdebug, mainline, affected test, 100x > - [x] Linux AArch64 server fastdebug, mainline, affected test, 100x > - [x] Linux x86_64 server fastdebug, jdk25u, affected test, 100x Looks good to me. I submitted testing. 
------------- PR Review: https://git.openjdk.org/jdk/pull/28437#pullrequestreview-3493982807 From kvn at openjdk.org Fri Nov 21 18:59:19 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Nov 2025 18:59:19 GMT Subject: RFR: 8372266: Relax store matchers in compiler/escapeAnalysis/TestRematerializeObjects.java test In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 16:48:01 GMT, Aleksey Shipilev wrote: > As you can see in the report, current matchers rely heavily on mainline C2 implementation to match specific stores. This fails when we try to backport MergeStores fixes to 25u. It would be better to relax the matchers a bit to cater for 25u backports, and also making test more robust for future MergeStores changes, if any. > > Additional testing: > - [x] Linux x86_64 server fastdebug, mainline, affected test, 100x > - [x] Linux AArch64 server fastdebug, mainline, affected test, 100x > - [x] Linux x86_64 server fastdebug, jdk25u, affected test, 100x @eme64 please look on this too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28437#issuecomment-3564247375 From jiangli at openjdk.org Fri Nov 21 19:33:11 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 21 Nov 2025 19:33:11 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v3] In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 00:14:43 GMT, Jiangli Zhou wrote: >> test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 65: >> >>> 63: new GCMParameterSpec(8 * TAG_SIZE_IN_BYTES, nonce, 0, nonce.length); >>> 64: Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding"); >>> 65: cipher.init(Cipher.ENCRYPT_MODE, keySpec, params); >> >> Er. This is used from `gcmDecrypt`? How does it work without `Cipher.DECRYPT_MODE`? > > Good catch. Interestingly the test passed for me on my local machine. Fixed to use Cipher.DECRYPT_MODE when doing gcmDecrypt. 
> > Also an interesting new finding: with the decrypted message verification, I see 2 failures out of 200 runs with AVX512. I'm filing a new issue for this specifically, so it can be investigated. Filed https://bugs.openjdk.org/browse/JDK-8372364. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2550760003 From kvn at openjdk.org Fri Nov 21 20:31:32 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Nov 2025 20:31:32 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v3] In-Reply-To: References: Message-ID: On Wed, 24 Sep 2025 13:08:14 GMT, Aleksey Shipilev wrote: >> See the bug for a discussion of the issues the current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Drop atomic counters > - Initial version A few comments. src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 1258: > 1256: Register recv) { > 1257: int mdp_offset = md->byte_offset_of_slot(data, in_ByteSize(0)); > 1258: __ type_profile(recv, mdo, mdp_offset); I looked at the callers and `mdo` is not used after this code.
I think we can convert it into `mdp` by adding `mdp_offset` so you don't need a 3rd parameter for `type_profile()`. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4807: > 4805: > 4806: Register offset = rscratch1; > 4807: assert_different_registers(mdp, recv, offset); We also have `rscratch2`, which we can use for register shuffling in the following code. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4818: > 4816: addptr(offset, receiver_step); > 4817: cmpptr(offset, end_receiver_offset); > 4818: jccb(Assembler::notEqual, L_loop); Fix the indentation, since these instructions are also in the loop. ------------- PR Review: https://git.openjdk.org/jdk/pull/25305#pullrequestreview-3494071362 PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2550739123 PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2550800089 PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2550750927 From kvn at openjdk.org Fri Nov 21 20:59:59 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Nov 2025 20:59:59 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v3] In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 19:19:20 GMT, Vladimir Kozlov wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls >> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls >> - Drop atomic counters >> - Initial version > > src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 1258: > >> 1256: Register recv) { >> 1257: int mdp_offset = md->byte_offset_of_slot(data, in_ByteSize(0)); >> 1258: __ type_profile(recv, mdo, mdp_offset); > > I looked at the callers and `mdo` is not used after this code. I think we can convert it into `mdp` by adding `mdp_offset` so you don't need a 3rd parameter for `type_profile()`. Hmm, but it will be an additional instruction.
I withdraw this suggestion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2550940821 From vlivanov at openjdk.org Fri Nov 21 22:07:08 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 21 Nov 2025 22:07:08 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes [v4] In-Reply-To: References: Message-ID: <_qg1PKd-1JVDr1utbe5Mhc9bDuB4PSl-Eth6-QqyCY4=.25578acf-ce2c-4503-a605-9c0c4de5127d@github.com> On Thu, 13 Nov 2025 19:04:56 GMT, Aleksey Shipilev wrote: >> I started this as investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test. >> >> At this point I believe we found that PhaseCCP does not reach the fix point for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to `LoadN` in question. Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. >> >> It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. >> >> Testing shows there are no ill effects on correctness doing this. But I would appreciate someone more savvy in this code to sanity check all of this. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails >> - [x] Linux x86_64 server fastdebug, `all` tests pass >> - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into JDK-8371581-ccp-spooky-nodes > - More comments > - More restrictive CmpP check > - Tighten up comments and signatures > - Do Value() once > - Fix Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28288#pullrequestreview-3494518124 From dlong at openjdk.org Fri Nov 21 22:57:37 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 21 Nov 2025 22:57:37 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v6] In-Reply-To: <7nY7QRkkFjOtOuBXID1I4GluA0vnFRLy_UnRICfVkR4=.99ec7fe1-af27-4ab7-ac63-27aa12bec4ef@github.com> References: <7nY7QRkkFjOtOuBXID1I4GluA0vnFRLy_UnRICfVkR4=.99ec7fe1-af27-4ab7-ac63-27aa12bec4ef@github.com> Message-ID: On Fri, 21 Nov 2025 11:33:42 GMT, Roland Westrelin wrote: >> In test cases, `mh` is initially not constant so the method handle >> invoke can't be inlined. It is later found to be constant, so it can >> be turned into a direct call by >> `Compile::process_late_inline_calls_no_inline()`. In the meantime, the >> `CallNode` for the mh invoke is cloned (by loop switching). In the >> process, only a shallow copy of the `JVMState` for the call is >> made. The initial `CallNode` is the first to be processed by >> `Compile::process_late_inline_calls_no_inline()` and that causes that >> `CallNode` to become dead. The cloned `CallNode` is then >> processed. 
The `JVMState` for that one references the initial >> `CallNode` in its caller's `JVMState`. Because that node is dead, that >> causes a crash. The fix I propose is to make a deep copy of the >> `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is >> assigned to the node. >> >> The other failure I see with these tests is: >> >> >> # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 >> # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! >> >> >> because even though the `CallNode` is cloned, there's still only one >> late inline recorded. The fix here is to increment >> `_number_of_mh_late_inlines` when the node is cloned. >> >> This was reported by the netty developers. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into JDK-8370939 > - review > - Merge branch 'master' into JDK-8370939 > - review > - more > - more > - more > - more > - test > - fix I don't see why we need _has_mh_late_inlines at all. During parse, we can just check _late_inlines.length() == 0, right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28088#issuecomment-3564903954 From dlong at openjdk.org Fri Nov 21 23:37:03 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 21 Nov 2025 23:37:03 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v4] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 10:14:07 GMT, Quan Anh Mai wrote: >> Hi, >> >> Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. 
The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. >> >> However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). >> >> In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. >> >> This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. There are a lot of benefits of this: >> >> - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. >> - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). 
This, in turn, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. >> >> This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these p... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into typejoin > - Move dual to ASSERT only > - Keep old version for verification > - whitespace > - Reimplement Type::join

Regarding the example where

    JrtFileSystemProvider:exact join sun/nio/fs/LinuxFileSystemProvider = FileSystemProvider:AnyNull *,iid=top

I think that can be considered a non-canonical result, similar to comparing different non-canonical versions of NaN in floating point. In my opinion the `FileSystemProvider` from the common superclass is wrong for join, and is just an artifact of how join is using meet and dual. Also, I don't see how the AnyNull is useful here either. I think a canonical Type::TOP would be better. I believe Graal will return either empty() or unrestricted() for similar cases. To get this in real code, we would need something like:

    void func(LinuxFileSystemProvider p) {
      if (p.getClass() == JrtFileSystemProvider.class) {
        // This is unreachable dead code because JrtFileSystemProvider is not a subclass of LinuxFileSystemProvider
        Object pp = p; // What can we deduce about the type here?
      }
    }

but the type of "pp" seems irrelevant if this is dead code.
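To make the set-based reading of join concrete, here is a minimal sketch in Java. It is not HotSpot code: the class name `JoinSketch`, the helper names, and the three-class toy universe are all made up for illustration, and `dual` is modeled as a plain set complement, which only approximates what `Type::dual` does in C2. A type is treated as the set of concrete classes a value may have, so meet is union and join is intersection.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model, not HotSpot code: a "type" is the set of concrete
// classes a reference may point to, drawn from a small fixed universe.
final class JoinSketch {
    // Assumed universe of concrete classes (names borrowed from the example above).
    static final Set<String> UNIVERSE =
        Set.of("JrtFileSystemProvider", "LinuxFileSystemProvider", "ZipFileSystemProvider");

    // meet: the value may belong to either type, i.e. set union.
    static Set<String> meet(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a);
        r.addAll(b);
        return r;
    }

    // join implemented directly: the value must belong to both types,
    // i.e. set intersection.
    static Set<String> join(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a);
        r.retainAll(b);
        return r;
    }

    // dual modeled as complement within UNIVERSE (an assumption of this sketch).
    static Set<String> dual(Set<String> a) {
        Set<String> r = new TreeSet<>(UNIVERSE);
        r.removeAll(a);
        return r;
    }

    // join expressed through meet and dual: dual(meet(dual(a), dual(b))).
    static Set<String> joinViaDual(Set<String> a, Set<String> b) {
        return dual(meet(dual(a), dual(b)));
    }

    public static void main(String[] args) {
        Set<String> jrtExact = Set.of("JrtFileSystemProvider");
        Set<String> linux = Set.of("LinuxFileSystemProvider");
        // The two classes are unrelated, so no value can be both: the join is empty.
        System.out.println(join(jrtExact, linux).isEmpty());
        // In this model the direct join and the dual-based join agree.
        System.out.println(join(jrtExact, linux).equals(joinViaDual(jrtExact, linux)));
    }
}
```

In this model the join of the two unrelated provider types is simply the empty set, which would correspond to a canonical TOP rather than to a type named after the common superclass.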
------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3564990002 From kvn at openjdk.org Fri Nov 21 23:45:11 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Nov 2025 23:45:11 GMT Subject: RFR: 8372266: Relax store matchers in compiler/escapeAnalysis/TestRematerializeObjects.java test In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 16:48:01 GMT, Aleksey Shipilev wrote: > As you can see in the report, current matchers rely heavily on mainline C2 implementation to match specific stores. This fails when we try to backport MergeStores fixes to 25u. It would be better to relax the matchers a bit to cater for 25u backports, and also making test more robust for future MergeStores changes, if any. > > Additional testing: > - [x] Linux x86_64 server fastdebug, mainline, affected test, 100x > - [x] Linux AArch64 server fastdebug, mainline, affected test, 100x > - [x] Linux x86_64 server fastdebug, jdk25u, affected test, 100x My testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28437#pullrequestreview-3494718922 From vlivanov at openjdk.org Fri Nov 21 23:46:09 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 21 Nov 2025 23:46:09 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v6] In-Reply-To: References: <7nY7QRkkFjOtOuBXID1I4GluA0vnFRLy_UnRICfVkR4=.99ec7fe1-af27-4ab7-ac63-27aa12bec4ef@github.com> Message-ID: On Fri, 21 Nov 2025 22:53:45 GMT, Dean Long wrote: > I don't see why we need _has_mh_late_inlines at all. During parse, we can just check _late_inlines.length() == 0, right? `_has_mh_late_inlines` and `_late_inlines.length() == 0` are not equivalent. A MH late inline candidate is not placed on `_late_inlines` unless it is eligible for inlining. 
But now I see a slight change in behavior in the following part of `Compile::Compile`:

    if (_late_inlines.length() == 0 && !has_mh_late_inlines() && !failing() && has_stringbuilder()) {
      inline_string_calls(true);
    }

After `dec_number_of_mh_late_inlines()` is gone, `inline_string_calls()` won't be called during parsing if any MH late inline calls are observed, irrespective of whether they are all inlined by that point or not. Roland, do you see any problem with it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28088#issuecomment-3565004490 From snatarajan at openjdk.org Fri Nov 21 23:48:54 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Fri, 21 Nov 2025 23:48:54 GMT Subject: RFR: 8370489: Some compiler tests miss the @key randomness Message-ID:

**Issue:** Some compiler tests use randomization but do not have `@key randomness` in the jtreg header.

**Fix:** The test cases that did not have `@key randomness` were listed using `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"`. This PR adds `@key randomness` to these tests.

**Note:** The following files, which are still listed by `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"` after this PR, are confirmed to be helper or support files for actual tests.
_test/hotspot/jtreg/compiler/codegen/aes/TestAESBase.java
test/hotspot/jtreg/compiler/compilercontrol/jcmd/StressAddJcmdBase.java
test/hotspot/jtreg/compiler/compilercontrol/parser/HugeDirectiveUtil.java
test/hotspot/jtreg/compiler/compilercontrol/share/scenario/CommandGenerator.java
test/hotspot/jtreg/compiler/lib/ir_framework/test/TestVM.java
test/hotspot/jtreg/compiler/lib/ir_framework/test/ArgumentValue.java
test/hotspot/jtreg/compiler/lib/ir_framework/AbstractInfo.java
test/hotspot/jtreg/compiler/lib/ir_framework/CompLevel.java
test/hotspot/jtreg/compiler/lib/generators/Generators.java
test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java
test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java
test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java
test/hotspot/jtreg/compiler/intrinsics/mathexact/Verify.java
test/hotspot/jtreg/compiler/intrinsics/bmi/BMITestRunner.java
test/hotspot/jtreg/compiler/intrinsics/unsafe/ByteBufferTest.java
test/hotspot/jtreg/compiler/arraycopy/stress/StressBooleanArrayCopy.java
test/hotspot/jtreg/compiler/arraycopy/stress/StressIntArrayCopy.java
test/hotspot/jtreg/compiler/arraycopy/stress/StressLongArrayCopy.java
test/hotspot/jtreg/compiler/arraycopy/stress/StressCharArrayCopy.java
test/hotspot/jtreg/compiler/arraycopy/stress/StressObjectArrayCopy.java
test/hotspot/jtreg/compiler/arraycopy/stress/StressByteArrayCopy.java
test/hotspot/jtreg/compiler/arraycopy/stress/StressFloatArrayCopy.java
test/hotspot/jtreg/compiler/arraycopy/stress/StressShortArrayCopy.java
test/hotspot/jtreg/compiler/arraycopy/stress/StressDoubleArrayCopy.java
test/hotspot/jtreg/compiler/codecache/cli/codeheapsize/JVMStartupRunner.java
test/hotspot/jtreg/compiler/vectorapi/reshape/utils/VectorReshapeHelper.java
test/hotspot/jtreg/compiler/jvmci/compilerToVM/DummyClass.java_

------------- Commit messages: - initial fix Changes: https://git.openjdk.org/jdk/pull/28463/files Webrev:
https://webrevs.openjdk.org/?repo=jdk&pr=28463&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370489 Stats: 105 lines in 40 files changed: 87 ins; 6 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/28463.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28463/head:pull/28463 PR: https://git.openjdk.org/jdk/pull/28463 From dlong at openjdk.org Sat Nov 22 00:10:51 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 22 Nov 2025 00:10:51 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v4] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 10:14:07 GMT, Quan Anh Mai wrote: >> Hi, >> >> Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. >> >> However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). >> >> In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. 
This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. >> >> This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. There are a lot of benefits of this: >> >> - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. >> - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. >> >> This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these p... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into typejoin > - Move dual to ASSERT only > - Keep old version for verification > - whitespace > - Reimplement Type::join Let's say we wanted to preserve duality and do the reverse of what we have now, and implement meet on top of join. Then our current data structure representation is not expressive enough and we lose information, because it only remembers a single Klass type. To represent T1.join(T2).join(T3), and not lose information needed by dual and meet, it seems like we would need to remember all 3 types in the worse case. I think this means we cannot guarantee perfect symmetry or "mirror image" without changing our representation. 
The good news is I don't think we need perfect symmetry. I think what Cliff attributes to symmetry in his YouTube video really comes from commutativity and associativity, and symmetry is only needed if we want to reduce meet to join or join to meet through the dual. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3565049965 From sviswanathan at openjdk.org Sat Nov 22 00:33:11 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 22 Nov 2025 00:33:11 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v7] In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 01:31:39 GMT, Jiangli Zhou wrote: >> Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not a multiple of the block size. >> >> Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Change to just create a byte array for 'nonce' without generating random data in gcmDecrypt. Suggested by AI. src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3531: > 3529: __ subl(len, 16 * 16); > 3530: __ cmpl(len, 16 * 16); > 3531: __ jcc(Assembler::lessEqual, ENC_DEC_DONE); I think the fix should instead be to just move the `addl` to `pos` before the `MESG_BELOW_32_BLKS` label, as below:

+  __ addl(pos, 16 * 16);
   __ bind(MESG_BELOW_32_BLKS);
   __ subl(len, 16 * 16);
-  __ addl(pos, 16 * 16);

This is because the addl is needed on the fall-through path, but not when coming from line 3479 via jcc. For the latter, the addl has already been done on line 3477.
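The effect of where the `addl` sits relative to the label can be sketched in Java. This is only a model of the control flow as described in the comment above, not the stub itself: the class and method names are hypothetical, only `pos` is tracked, and the jumping path is assumed to have already advanced `pos` once before its `jcc`.

```java
// Hypothetical model of the two paths that reach MESG_BELOW_32_BLKS; not the real stub.
final class LabelPlacementSketch {
    static final int STEP = 16 * 16; // amount added to pos by each addl in the review

    // Layout under review: bind(label); subl(len); addl(pos).
    // The addl sits after the label, so both paths execute it.
    static int posWithAddAfterLabel(int pos, boolean arrivedViaJump) {
        if (arrivedViaJump) {
            pos += STEP; // assumed: the jumping path already advanced pos before its jcc
        }
        // --- label bound here ---
        pos += STEP;     // runs on the fall-through path and on the jumping path
        return pos;
    }

    // Suggested layout: addl(pos); bind(label); subl(len).
    // The addl sits before the label, so only the fall-through path executes it.
    static int posWithAddBeforeLabel(int pos, boolean arrivedViaJump) {
        if (arrivedViaJump) {
            pos += STEP; // assumed: same earlier advance on the jumping path
        } else {
            pos += STEP; // fall-through path executes the addl placed before the label
        }
        // --- label bound here ---
        return pos;
    }

    public static void main(String[] args) {
        for (boolean viaJump : new boolean[] {false, true}) {
            System.out.println("viaJump=" + viaJump
                + " addAfterLabel=" + posWithAddAfterLabel(0, viaJump)
                + " addBeforeLabel=" + posWithAddBeforeLabel(0, viaJump));
        }
    }
}
```

With the add after the label, the jumping path advances `pos` by two steps instead of one; with the add before the label, both paths advance it exactly once.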
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2551375508 From dlong at openjdk.org Sat Nov 22 00:47:53 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 22 Nov 2025 00:47:53 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v4] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 10:14:07 GMT, Quan Anh Mai wrote: >> Hi, >> >> Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. >> >> However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). >> >> In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. >> >> This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. 
There are a lot of benefits of this: >> >> - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values. >> - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. >> >> This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these p... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into typejoin > - Move dual to ASSERT only > - Keep old version for verification > - whitespace > - Reimplement Type::join Also, I think it would make sense to target this for jdk 27 after the fork. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3565147827 From duke at openjdk.org Sat Nov 22 01:18:37 2025 From: duke at openjdk.org (Shawn M Emery) Date: Sat, 22 Nov 2025 01:18:37 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v4] In-Reply-To: <0HJnSUSQA8RuwnNxu-SiGvZTzHYLJ5kY0_B6lG2EbAQ=.10868fac-1516-4a80-b4e5-9ff14997ba01@github.com> References: <0HJnSUSQA8RuwnNxu-SiGvZTzHYLJ5kY0_B6lG2EbAQ=.10868fac-1516-4a80-b4e5-9ff14997ba01@github.com> Message-ID: On Tue, 18 Nov 2025 21:48:12 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Remove K from AES_Crypt The updated intrinsics changes looks good as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28299#issuecomment-3565218590 From serb at openjdk.org Sat Nov 22 02:31:53 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Sat, 22 Nov 2025 02:31:53 GMT Subject: RFR: 8365071: ARM32: JFR intrinsic jvm_commit triggers C2 regalloc assert In-Reply-To: References: <6MHwDW0E9bOzpj5B3pzlNmOCRPtFtnrk55NmTTxbhLM=.f0026c26-2c80-4766-8984-da9f34a31c8d@github.com> Message-ID: On Tue, 19 Aug 2025 04:38:54 GMT, Boris Ulasevich wrote: >> On 32-bit ARM, the jvm_commit JFR intrinsic builder feeds null (RegP) into a TypeLong Phi, causing mixed long/pointer register sizing and triggering the C2 register allocator assert(_num_regs == reg || !_num_regs). >> >> The fix is trivial: use an appropriate ConL constant instead. This has no effect on 64-bit systems (the generated assembly is identical) but resolves a JFR issue on 32-bit systems. 
> > Thanks! Hi @bulasevich, do you plan to backport this patch to jdk21u-dev? It seems to be affected as well; I have encountered the same crash. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26684#issuecomment-3565352748 From jbhateja at openjdk.org Sat Nov 22 02:59:21 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 22 Nov 2025 02:59:21 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v15] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. > > With the Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of the NDD definition to the first source operand, or to the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union-find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion.
> > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Generic operand traversal and sharpening candidate selection based on RegisterMask and non-interference. Review feedback incorporated ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26283/files - new: https://git.openjdk.org/jdk/pull/26283/files/f0513b87..72a5f876 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=13-14 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From vlivanov at openjdk.org Sat Nov 22 04:56:52 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 22 Nov 2025 04:56:52 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v15] In-Reply-To: References: Message-ID: <1l7k_cuDs1BmgVi3GOfp_UcreWl8uZEtWvXtZxgWSGk=.015b280b-cff7-40e1-acae-60a33a3d065f@github.com> On Sat, 22 Nov 2025 02:59:21 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not 
match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: > > Generic operand traversal and sharpening candidate selection based on RegisterMask and non-interference. Review feedback incorporated src/hotspot/share/opto/chaitin.cpp line 1522: > 1520: uint copy_lrg = _lrg_map.find(lrg._copy_bias); > 1521: OptoReg::Name reg = select_bias_lrg_color(lrg, copy_lrg); > 1522: if (reg != OptoReg::Bad) { Please, use `OptoReg::is_valid(reg)` here. I find it more readable. Also, there's repetitive pattern for `lrg._copy_bias` and `lrg._copy_bias2`. Would be nice to hide it behind a single `select_bias_lrg_color(_lrg_map, lrg)` call. src/hotspot/share/opto/chaitin.cpp line 1661: > 1659: } > 1660: > 1661: Node* def = lrg->_def; I'm concerned about the approach chosen here. It iterates over all operands trying to find a candidate for biasing irrespective of the shape of Mach node. Instead, I'd be much more comfortable with 2 operand probes at fixed positions (ideally, at indices 1 and 2). Any mismatches in Mach node shape should be reported. In other words, any failed operand probe on a mach node marked with `Flag_ndd_demotable` or `Flag_ndd_demotable_commutative` should trigger an assert. (Corresponding AD instructions can be adjusted to fit the desired pattern.) src/hotspot/share/opto/chaitin.cpp line 1663: > 1661: Node* def = lrg->_def; > 1662: MachNode* mdef = lrg->is_singledef() && !lrg->_is_bound && def->is_Mach() ? def->as_Mach() : nullptr; > 1663: if (mdef != nullptr) { Please, reshape it as follows: if (lrg->is_singledef() && !lrg->_is_bound && def->is_Mach()) { MachNode* mdef = def->as_Mach(); src/hotspot/share/opto/chaitin.cpp line 1665: > 1663: if (mdef != nullptr) { > 1664: int i = 1; > 1665: uint lrg_def = _lrg_map.find(def); The whole block can be guarded by `lrg->_copy_bias == 0` condition. 
src/hotspot/share/opto/chaitin.cpp line 1667: > 1665: uint lrg_def = _lrg_map.find(def); > 1666: for (; i < mdef->num_opnds(); i++) { > 1667: if (Matcher::is_register_biasing_candidate(mdef, 1, i)) { `_copy_bias` and `_copy_bias2` initialization code is mostly a duplication. Please, extract it into a helper function. src/hotspot/share/opto/chaitin.cpp line 1681: > 1679: // For commutative operation, def allocation can also be > 1680: // biased towards LRG of second input's def. > 1681: for (; i < mdef->num_opnds(); i++) { Same here (`lrg->_copy_bias2 == 0`). src/hotspot/share/opto/chaitin.cpp line 1686: > 1684: if (in2 != nullptr) { > 1685: uint lrg_in2 = _lrg_map.find(in2); > 1686: if (lrg_in2 != 0 && lrg->_copy_bias == 0 && !_ifg->test_edge_sq(lrg_def, lrg_in2)) { Do you have a typo here? (`s/_copy_bias/_copy_bias2/`) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2552100990 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2552064851 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2552067802 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2551966801 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2552072823 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2551968813 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2551970868 From duke at openjdk.org Sat Nov 22 08:56:37 2025 From: duke at openjdk.org (Zihao Lin) Date: Sat, 22 Nov 2025 08:56:37 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v6] In-Reply-To: References: Message-ID: > If nodes both are constant, support constant folding. 
Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28097/files - new: https://git.openjdk.org/jdk/pull/28097/files/db57746d..053966a1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=04-05 Stats: 138 lines in 4 files changed: 110 ins; 13 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/28097.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28097/head:pull/28097 PR: https://git.openjdk.org/jdk/pull/28097 From mdoerr at openjdk.org Sat Nov 22 11:04:53 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 22 Nov 2025 11:04:53 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v4] In-Reply-To: References: <0HJnSUSQA8RuwnNxu-SiGvZTzHYLJ5kY0_B6lG2EbAQ=.10868fac-1516-4a80-b4e5-9ff14997ba01@github.com> Message-ID: On Sat, 22 Nov 2025 01:15:11 GMT, Shawn M Emery wrote: > The updated intrinsics changes looks good as well, except why are lines 7456 and 8631 not changing in src/hotspot/share/opto/library_call.cpp? Thanks a lot for reviewing! These two lines use the default `is_decrypt = false` because `inline_counterMode_AESCrypt()` and `inline_galoisCounterMode_AESCrypt()` only do encryption. We could make that explicit if you prefer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28299#issuecomment-3566535235 From qamai at openjdk.org Sat Nov 22 11:19:25 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 22 Nov 2025 11:19:25 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v4] In-Reply-To: References: Message-ID: On Sat, 22 Nov 2025 00:44:56 GMT, Dean Long wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: >> >> - Merge branch 'master' into typejoin >> - Move dual to ASSERT only >> - Keep old version for verification >> - whitespace >> - Reimplement Type::join > > Also, I think it would make sense to target this for jdk 27 after the fork. @dean-long I think you are misunderstanding, the answer is incorrect because the result it gives, `java/nio/file/spi/FileSystemProvider:AnyNull *,iid=top`, is empty, while the correct answer is the set which contains the single value `null`, and is not empty. The reason for this inaccuracy is that there are 2 LCAs for the inputs on the lattice, and they do not subtype each other, we choose the wrong one out of those LCAs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3566574122 From duke at openjdk.org Sat Nov 22 14:48:17 2025 From: duke at openjdk.org (Zihao Lin) Date: Sat, 22 Nov 2025 14:48:17 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v7] In-Reply-To: References: Message-ID: > If nodes both are constant, support constant folding. 
Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28097/files - new: https://git.openjdk.org/jdk/pull/28097/files/053966a1..dfbbd5da Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=05-06 Stats: 33 lines in 4 files changed: 12 ins; 11 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/28097.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28097/head:pull/28097 PR: https://git.openjdk.org/jdk/pull/28097 From duke at openjdk.org Sat Nov 22 14:51:49 2025 From: duke at openjdk.org (Zihao Lin) Date: Sat, 22 Nov 2025 14:51:49 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v5] In-Reply-To: <28yL0IfHtLkAOtMAGiLeFGNW7C-WdTRsIMalvsKeras=.60e1d08f-d403-4b38-88d4-1b51f361fd05@github.com> References: <28yL0IfHtLkAOtMAGiLeFGNW7C-WdTRsIMalvsKeras=.60e1d08f-d403-4b38-88d4-1b51f361fd05@github.com> Message-ID: On Fri, 21 Nov 2025 07:02:58 GMT, Hannes Greule wrote: >> I think TypeLong::make is doing the work your mentioned, do we need another function to do it? >> >> >> const TypeLong* TypeLong::make(jlong con) { >> julong ucon = con; >> return (new TypeLong(TypeIntPrototype{{con, con}, {ucon, ucon}, {~ucon, ucon}}, >> WidenMin, false))->hashcons()->is_long(); >> } > > Sorry if it wasn't clear, but the problem is that `multiply_high_unsigned` returns an *unsigned* long which you currently convert into a *signed* long. But from my understanding this is implementation-defined and I *think* you need to avoid that (I might be wrong though, happy to be corrected by someone else here :) ). That would mean you need to make `highResult` a `julong` and then you can't use `TypeLong::make` anymore as this would result in the same problem again. Understood, I add `TypeLong::make_unsigned` to solve it. 
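The signed/unsigned concern raised above can be illustrated with a small standalone sketch. It is not the HotSpot code: `mulhi_u64`, `mulhi_s64` and `bits_to_signed` are hypothetical names chosen for this example, and the sketch only assumes standard C++. It shows that the same operand bits give different high halves depending on signedness, and that the unsigned result can be reinterpreted as a signed value through `memcpy` without relying on an implementation-defined narrowing conversion.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not HotSpot's multiply_high_unsigned): high 64 bits
// of the 128-bit product of two unsigned 64-bit values, built from 32-bit
// halves so that no 128-bit integer type is required.
inline uint64_t mulhi_u64(uint64_t a, uint64_t b) {
  const uint64_t mask = 0xFFFFFFFFull;
  uint64_t a_lo = a & mask, a_hi = a >> 32;
  uint64_t b_lo = b & mask, b_hi = b >> 32;

  uint64_t lo_lo = a_lo * b_lo;
  uint64_t hi_lo = a_hi * b_lo;
  uint64_t lo_hi = a_lo * b_hi;
  uint64_t hi_hi = a_hi * b_hi;

  // Contributions to bits 32..63 of the product; the sum stays below 2^34,
  // so it cannot overflow, and its upper part is the carry into bit 64.
  uint64_t cross = (lo_lo >> 32) + (hi_lo & mask) + (lo_hi & mask);
  return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (cross >> 32);
}

// Reinterpret 64 bits as a signed value. memcpy avoids the unsigned-to-signed
// conversion that is implementation-defined before C++20.
inline int64_t bits_to_signed(uint64_t bits) {
  int64_t out;
  std::memcpy(&out, &bits, sizeof out);
  return out;
}

// Signed high half derived from the unsigned one: for each negative operand,
// subtract the other operand (modulo 2^64).
inline int64_t mulhi_s64(int64_t a, int64_t b) {
  uint64_t hi = mulhi_u64(static_cast<uint64_t>(a), static_cast<uint64_t>(b));
  if (a < 0) hi -= static_cast<uint64_t>(b);
  if (b < 0) hi -= static_cast<uint64_t>(a);
  return bits_to_signed(hi);
}

// Usage example: identical operand bits, different high halves.
inline void check_mulhi_examples() {
  assert(mulhi_u64(UINT64_MAX, UINT64_MAX) == 0xFFFFFFFFFFFFFFFEull);
  assert(mulhi_s64(-1, -1) == 0);
}
```

Because the unsigned high half of an all-ones times all-ones product is `0xFFFFFFFFFFFFFFFE` while the signed high half of `-1 * -1` is `0`, a constant-folding routine has to keep track of which interpretation its result carries, which is presumably the motivation for a separate unsigned factory method.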
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28097#discussion_r2553171552 From duke at openjdk.org Sat Nov 22 15:09:51 2025 From: duke at openjdk.org (Zihao Lin) Date: Sat, 22 Nov 2025 15:09:51 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v5] In-Reply-To: References: Message-ID: On Tue, 18 Nov 2025 15:32:54 GMT, Emanuel Peter wrote: > Would it be an idea to still have a `MulHiValue`, and then pass it in a `signed/unsigned` flag? That way we could avoid some code duplication. Because the only difference seems to be `multiply_high_signed` vs `multiply_high_unsigned`, right? It's a good idea. But when I try to remove the duplicate code here, I always have to check the signed flag, like if (signed-flag) { ... } else { ... } The only duplicate code is if (t1 == Type::TOP || t2 == Type::TOP) { return Type::TOP; } if (t1 == TypeLong::ZERO || t2 == TypeLong::ZERO) { return TypeLong::ZERO; } So sharing a helper will not reduce the code complexity; for now I'd prefer to keep the two implementations separate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28097#issuecomment-3566785777 From duke at openjdk.org Sat Nov 22 15:15:52 2025 From: duke at openjdk.org (Zihao Lin) Date: Sat, 22 Nov 2025 15:15:52 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v7] In-Reply-To: References: Message-ID: On Sat, 22 Nov 2025 14:48:17 GMT, Zihao Lin wrote: >> If nodes both are constant, support constant folding. > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > Fix In addition, I took some time to add range-based folding for MulHiLNode and UMulHiLNode. Please take a look, thank you folks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/28097#issuecomment-3566789337 From jbhateja at openjdk.org Sat Nov 22 18:35:15 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 22 Nov 2025 18:35:15 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v16] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. > > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. 
> > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Extending biaising heuristics to account for bias range with minimum degree of freedom. Review feedback incorporated. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26283/files - new: https://git.openjdk.org/jdk/pull/26283/files/72a5f876..151b51af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=14-15 Stats: 101 lines in 9 files changed: 30 ins; 26 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From jbhateja at openjdk.org Sat Nov 22 18:35:19 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 22 Nov 2025 18:35:19 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v15] In-Reply-To: <1l7k_cuDs1BmgVi3GOfp_UcreWl8uZEtWvXtZxgWSGk=.015b280b-cff7-40e1-acae-60a33a3d065f@github.com> References: <1l7k_cuDs1BmgVi3GOfp_UcreWl8uZEtWvXtZxgWSGk=.015b280b-cff7-40e1-acae-60a33a3d065f@github.com> Message-ID: On Sat, 22 Nov 2025 04:51:47 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Generic operand traversal and sharpening candidate selection based on RegisterMask and non-interference. 
Review feedback incorporated > > src/hotspot/share/opto/chaitin.cpp line 1522: > >> 1520: uint copy_lrg = _lrg_map.find(lrg._copy_bias); >> 1521: OptoReg::Name reg = select_bias_lrg_color(lrg, copy_lrg); >> 1522: if (reg != OptoReg::Bad) { > > Please, use `OptoReg::is_valid(reg)` here. I find it more readable. > > Also, there's repetitive pattern for `lrg._copy_bias` and `lrg._copy_bias2`. Would be nice to hide it behind a single `select_bias_lrg_color(_lrg_map, lrg)` call. Done. > src/hotspot/share/opto/chaitin.cpp line 1661: > >> 1659: } >> 1660: >> 1661: Node* def = lrg->_def; > > I'm concerned about the approach chosen here. It iterates over all operands trying to find a candidate for biasing irrespective of the shape of Mach node. > > Instead, I'd be much more comfortable with 2 operand probes at fixed positions (ideally, at indices 1 and 2). Any mismatches in Mach node shape should be reported. In other words, any failed operand probe on a mach node marked with `Flag_ndd_demotable` or `Flag_ndd_demotable_commutative` should trigger an assert. (Corresponding AD instructions can be adjusted to fit the desired pattern.) Thanks @iwanowww, My intent with current version was to get your feedback on generic implementation :-) , entire processing is triggered only for MachNodes marked with Flag_ndd_demotable and Flag_ndd_demotable_commutative, my previous versions were only probing specific operands but expected regimented operand ordering. Current scheme is generic in nature, but I agree that regimented scheme with asserts is more appealing. 
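The difference between first-fit colour selection and the biased selection discussed in this thread can be sketched in a few lines. This is a simplified model under stated assumptions, not the code in `chaitin.cpp`: `LiveRange`, `select_color`, `bias1`, `bias2` and the 32-register bitmask are all hypothetical stand-ins for `LRG`, `_copy_bias`, `_copy_bias2` and `RegMask`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kNoReg = -1;

// Hypothetical, much simplified stand-in for a live range (LRG).
struct LiveRange {
  uint32_t allowed_mask = 0;   // bit r set => register r is permitted
  int assigned = kNoReg;       // colour chosen so far, or kNoReg
  std::vector<int> neighbors;  // indices of interfering live ranges
  int bias1 = -1;              // live range of the first source operand
  int bias2 = -1;              // second source, for commutative operations
};

// Pick a colour for lrgs[idx]: prefer the register already given to a bias
// candidate, otherwise fall back to the first available register.
inline int select_color(const std::vector<LiveRange>& lrgs, int idx) {
  assert(idx >= 0 && idx < static_cast<int>(lrgs.size()));
  const LiveRange& lrg = lrgs[idx];

  // Remove colours already taken by interfering neighbours.
  uint32_t avail = lrg.allowed_mask;
  for (int n : lrg.neighbors) {
    int r = lrgs[n].assigned;
    if (r != kNoReg) avail &= ~(1u << r);
  }

  // Biased choice: reuse a source operand's register when it is still free.
  for (int bias : {lrg.bias1, lrg.bias2}) {
    if (bias < 0) continue;
    int r = lrgs[bias].assigned;
    if (r != kNoReg && (avail & (1u << r)) != 0) return r;
  }

  // First-fit fallback.
  for (int r = 0; r < 32; r++) {
    if ((avail & (1u << r)) != 0) return r;
  }
  return kNoReg;
}
```

In this model a definition whose first source already sits in register 5 is coloured 5 when that register is free, so destination and source coincide and the assembler can emit the shorter two-address encoding. Without the bias, the same definition would simply take the lowest free register.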
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2553292523 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2553292229 From jrose at openjdk.org Sat Nov 22 21:45:51 2025 From: jrose at openjdk.org (John R Rose) Date: Sat, 22 Nov 2025 21:45:51 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v3] In-Reply-To: References: Message-ID: On Wed, 24 Sep 2025 13:08:14 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Drop atomic counters > - Initial version Code is good. Consider changing a name and adding documentation. src/hotspot/cpu/x86/interp_masm_x86.cpp line 524: > 522: LP64_ONLY(assert(Rsub_klass != r13, "r13 holds bcp");) > 523: assert(Rsub_klass != rcx, "rcx holds 2ndary super array length"); > 524: assert(Rsub_klass != rdi, "rdi holds 2ndary super array scan ptr"); I think you can kill this assert as well; rdi is no longer relevant to this function. 
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4760: > 4758: } > 4759: > 4760: void MacroAssembler::type_profile(Register recv, Register mdp, int mdp_offset) { The name chosen is subtly misleading. We have value (argument/parameter/return) profiling as well as receiver profiling. Since this particular macro-instruction is closely coupled to `ReceiverTypeData`, I suggest calling it `profile_receiver_type`, and documenting, up top, that it is precisely for collecting data into that structure. The name being replaced (`record_klass_in_profile_helper`) has the same problem. This is a historical artifact; the name was chosen before other sorts of type profiles were introduced. (And `profile_receiver_type` is surely better than `receiver_type_profile`, which is not a verb phrase.) Eventually we may wish to improve the other kinds of profiling, which have their own structures and representations. I thought for a while about what that might look like, and particularly if it factored into a different set of macro-instructions. Could we factor this proposed macro into a "find entry" part and an "increment counter" part? But no, it doesn't seem to pay off. There's benefit to preserving the jewel-like conciseness of the code pattern here. So I guess future work on other type profiles is mostly independent. But we do need a more specific name, that makes very clear the coupling to `ReceiverTypeData`. Even if the old code had that problem also. Putting it way out here in the macro-assembler makes such a problem worse, since the interpreter "knows about" MDOs, but the macro-assembler doesn't. I don't object to moving this down to the macro-assembler. It is no longer coupled to the interpreter, after the JIT learned the same trick. I think we should prepare ourselves, mentally, for similar moves with the other type profile mechanisms. I think the definition of `class ReceiverTypeData` should mention this macro. 
Otherwise we won't know where to look for updates (since it's no longer bundled with the interpreter). This macro is, in effect, a member of that class. (That's true of other MDO structures: Random assembly code is part of their APIs. The C++ code is very vague about how and where this happens. That's a problem for another time, I guess.)

Another point. I would like to see pseudo-code that sketches what this complicated macro emits. (I was the author of the other pseudo-code deleted by this patch; I like that sort of thing.) I suggest:

// Traverse and update a ReceiverTypeData record in a method-data object.
// This operation can be performed either by the interpreter or by JIT code.
// The receiver klass has already been loaded into recv.
// The base address of the MDO is mdp, and the byte offset mdp_offset is also applied.
// The emitted code traverses the array of entries and picks
// one where the expected receiver matches, or allocates a free one if necessary.
// If a matching entry exists (perhaps upon creation), a receiver count is incremented.
// If no matching entry exists, the shared (CounterData) count is incremented.
// For safety, receiver cells are claimed with a CAS. For speed, counter updates are not.
// Duplicate receiver allocation is possible due to races, but this is unlikely.
// Occasional races on counters may introduce inconsequential noise.
//
// Here is pseudocode for the emitted assembly:
//
//   int i, *cntp;
//   for (i = 0; i < receiver_count(); i++) { // optimistic loop
//     if (receiver(i) == recv) goto found_recv;
//   }
//   for (i = 0; i < receiver_count(); i++) { // allocating loop
//     if (receiver(i) == null) { }
//     if (receiver(i) == recv) goto found_recv;
//   }
//   cntp = &count(); // shared poly count, used if no match for recv
//   goto count_update;
// found_recv:
//   cntp = &receiver_count(i);
// count_update:
//   ++*cntp;
//
void MacroAssembler::receiver_type_profile?
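For readers who prefer something executable, the pseudocode above can be approximated in plain C++. This is only a sketch under stated assumptions: `ReceiverRow`, `ReceiverProfile` and the free function `profile_receiver_type` are hypothetical names, the real code is emitted as assembly against the MDO layout, and the deliberately non-atomic counter update is modelled here as a relaxed load followed by a relaxed store so that the sketch itself has no data race.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of one (receiver, count) row of a ReceiverTypeData record.
struct ReceiverRow {
  std::atomic<const void*> receiver{nullptr};
  std::atomic<uint64_t> count{0};
};

// Hypothetical model of the whole record: N rows plus the shared counter.
template <std::size_t N>
struct ReceiverProfile {
  ReceiverRow rows[N];
  std::atomic<uint64_t> poly_count{0};  // used when no row matches
};

// Lossy increment: a separate load and store, so concurrent updates may be
// dropped. This mirrors the "counter updates are not atomic" trade-off.
inline void bump(std::atomic<uint64_t>& c) {
  c.store(c.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed);
}

template <std::size_t N>
void profile_receiver_type(ReceiverProfile<N>& p, const void* recv) {
  assert(recv != nullptr);

  // Optimistic loop: look for a row that already holds recv.
  for (std::size_t i = 0; i < N; i++) {
    if (p.rows[i].receiver.load(std::memory_order_relaxed) == recv) {
      bump(p.rows[i].count);
      return;
    }
  }

  // Allocating loop: claim the first free row with a CAS. If the CAS fails,
  // `expected` holds the current occupant, which may be recv itself when
  // another thread installed it in the meantime.
  for (std::size_t i = 0; i < N; i++) {
    const void* expected = nullptr;
    if (p.rows[i].receiver.compare_exchange_strong(expected, recv) ||
        expected == recv) {
      bump(p.rows[i].count);
      return;
    }
  }

  // No match and no free row: count the call in the shared counter.
  bump(p.poly_count);
}
```

With two rows, profiling receivers A, A, B, C in that order leaves row 0 holding A with count 2, row 1 holding B with count 1, and the shared counter at 1 for C.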
I did wonder if the `mdp`/`mdp_offset` pair would be better expressed by an `Address`. Note that `Address::plus_disp` would let you move the cursor around. But then you wouldn't have such fine control over the x86 address scaling feature, which apparently contributes to the compactness of the code. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4812: > 4810: > 4811: // Optimistic: search for already set up receiver. > 4812: movptr(offset, base_receiver_offset); I wondered about using REP-CMPSQ to search the receiver array. It would require reformatting the MDO to make the receiver klasses contiguous. The x86 manual ORM (August 2023) cheers me down: > Using a REP prefix with string move instructions can provide high performance in the situations described above. However, using a REP prefix with string scan instructions (SCASB, SCASW, SCASD, SCASQ) or compare instructions (CMPSB, CMPSW, SMPSD, SMPSQ) is not recommended for high performance. Consider using SIMD instructions instead. I still wonder if, at some point, it will be profitable to make the receivers contiguous so we can use SIMD instructions to search them. Probably not any time soon. ------------- Changes requested by jrose (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25305#pullrequestreview-3489875340 PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2547626482 PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2553359799 PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2553367490 From jbhateja at openjdk.org Sun Nov 23 03:06:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 23 Nov 2025 03:06:57 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v17] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. > > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. 
> > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Extending biasing heuristics to account for bias range with minimum degree of freedom. Review feedback incorporated. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26283/files - new: https://git.openjdk.org/jdk/pull/26283/files/151b51af..038a292e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=15-16 Stats: 6 lines in 1 file changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From jbhateja at openjdk.org Sun Nov 23 03:15:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 23 Nov 2025 03:15:17 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v18] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. 
> > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: Extending biasing heuristics to account for bias range with minimum degree of freedom. Review feedback incorporated. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26283/files - new: https://git.openjdk.org/jdk/pull/26283/files/038a292e..bb41ff78 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=16-17 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From jiangli at openjdk.org Sun Nov 23 04:54:15 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Sun, 23 Nov 2025 04:54:15 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v8] In-Reply-To: References: Message-ID: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> > Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. > > Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: Fixed the ENCRYPT_16_BLKS fall through case that sviswa7 pointed out in PR review. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/28363/files - new: https://git.openjdk.org/jdk/pull/28363/files/d26d0ee9..4ea57ee7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=06-07 Stats: 4 lines in 1 file changed: 1 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28363.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28363/head:pull/28363 PR: https://git.openjdk.org/jdk/pull/28363 From jiangli at openjdk.org Sun Nov 23 05:26:24 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Sun, 23 Nov 2025 05:26:24 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v7] In-Reply-To: References: Message-ID: <1SB9sT1cB4CxswmAJpnomlCvypzbM1eHEPwWMR0mvMY=.d470192a-dba3-4951-9f33-cf3bf0dbf287@github.com> On Sat, 22 Nov 2025 00:17:51 GMT, Sandhya Viswanathan wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Change to just create a byte array for 'nonce' without generating random data in gcmDecrypt. Suggested by AI. > > src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3531: > >> 3529: __ subl(len, 16 * 16); >> 3530: __ cmpl(len, 16 * 16); >> 3531: __ jcc(Assembler::lessEqual, ENC_DEC_DONE); > > I think the fix should instead be to just move the addl to pos before the MESG_BELOW_32_BLKS, as below: > > + __ addl(pos, 16 * 16); > __ bind(MESG_BELOW_32_BLKS); > __ subl(len, 16 * 16); > - __ addl(pos, 16 * 16); > > This is because on fall through path addl is needed but not while coming from line 3479 via jcc. For the latter, the addl has already been done on line 3477. Hmmm, I think you are correct. Examining the entire flow today, I see `ENCRYPT_16_BLKS` doesn't increment `pos`. Thanks for pointing out that! 
Before my change, add `pos` was done when it fell through to `MESG_BELOW_32_BLKS`. Removed the `cmpl/jcc` change from `MESG_BELOW_32_BLKS` and moved `addl` to above `MESG_BELOW_32_BLKS`, as suggested. I did a bit debugging for the rare failures occurring to the new test case. One of the failures had message with length `1048833`. That would eventually go through `ENCRYPT_16_BLKS` then fall through `MESG_BELOW_32_BLKS`. With the updated fix, 200 runs all pass on AVX512 machines. So the rare failures were also related to this then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2553794067 From duke at openjdk.org Sun Nov 23 05:30:47 2025 From: duke at openjdk.org (Shawn M Emery) Date: Sun, 23 Nov 2025 05:30:47 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v4] In-Reply-To: <0HJnSUSQA8RuwnNxu-SiGvZTzHYLJ5kY0_B6lG2EbAQ=.10868fac-1516-4a80-b4e5-9ff14997ba01@github.com> References: <0HJnSUSQA8RuwnNxu-SiGvZTzHYLJ5kY0_B6lG2EbAQ=.10868fac-1516-4a80-b4e5-9ff14997ba01@github.com> Message-ID: On Tue, 18 Nov 2025 21:48:12 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Remove K from AES_Crypt It looks like the S390 architecture uses it for encryption and decryption in AES/CTR mode, but S390 only needs the symmetric key to derive the encryption and decryption schedules. This can be found for both in the first round. For x86, yes, encryption is only performed for both AES/CTR and AES/GCM. So, yes, I think having the 'is_decrypt' argument explicit would be ideal. 
nit: some comments still refer to the 'K' array in src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp, src/hotspot/share/opto/library_call.cpp, src/hotspot/cpu/riscv/stubGenerator_riscv.cpp (line 2446), and src/hotspot/cpu/ppc/stubGenerator_ppc.cpp. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28299#issuecomment-3567503379 From jbhateja at openjdk.org Sun Nov 23 11:50:08 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 23 Nov 2025 11:50:08 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v2] In-Reply-To: References: Message-ID: > Add a new Float16lVector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support. > - Extended the existing VectorAPI JTREG test suite for the newly added HalffloatVector operations. > > The idea here is to first be at par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). > > The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > image > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback. 
> > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Cleaning up interface as per review suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28002/files - new: https://git.openjdk.org/jdk/pull/28002/files/c60d533c..ea3ef19b Webrevs: - full: Webrev is not available because diff is too large - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28002&range=00-01 Stats: 162997 lines in 187 files changed: 75266 ins; 74548 del; 13183 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From wenanjian at openjdk.org Mon Nov 24 03:11:07 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Mon, 24 Nov 2025 03:11:07 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. 
Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: modify label L_EXIT to L_exit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25281/files - new: https://git.openjdk.org/jdk/pull/25281/files/10725f4f..9a006e9d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=27-28 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From wenanjian at openjdk.org Mon Nov 24 03:11:08 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Mon, 24 Nov 2025 03:11:08 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: <5HbBb-mjtZWqWTu-HQe7KrRyHG5z-UK4rbVhMzLv4bw=.b1b7e986-dbcf-4ab0-86b4-513f3f1f91ae@github.com> Message-ID: On Wed, 19 Nov 2025 09:54:58 GMT, Hamlin Li wrote: >> I try to make it different from the L_exit in counterMode_AESCrypt function, should I change this to L_exit2 or L_exit_main? > > The labels are in different method, should be fine with same name? I'm not quite sure. I have test it with the Label change to L_exit, which seems to be fine. fixed it! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2554551939 From wenanjian at openjdk.org Mon Nov 24 03:46:20 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Mon, 24 Nov 2025 03:46:20 GMT Subject: RFR: 8371968: RISC-V: implement AES CBC intrinsics Message-ID: Support AES CBC intrinsic on RISCV, Already passed the tests in test/hotspot/jtreg/compiler/codegen/aes/ test/jdk/com/sun/crypto ------------- Commit messages: - modify some format and add some comments - RISC-V: implement AES CBC mode intrinsics Changes: https://git.openjdk.org/jdk/pull/28320/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28320&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371968 Stats: 224 lines in 1 file changed: 224 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28320/head:pull/28320 PR: https://git.openjdk.org/jdk/pull/28320 From epeter at openjdk.org Mon Nov 24 07:02:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Nov 2025 07:02:26 GMT Subject: RFR: 8372266: Relax store matchers in compiler/escapeAnalysis/TestRematerializeObjects.java test In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 16:48:01 GMT, Aleksey Shipilev wrote: > As you can see in the report, current matchers rely heavily on mainline C2 implementation to match specific stores. This fails when we try to backport MergeStores fixes to 25u. It would be better to relax the matchers a bit to cater for 25u backports, and also making test more robust for future MergeStores changes, if any. > > Additional testing: > - [x] Linux x86_64 server fastdebug, mainline, affected test, 100x > - [x] Linux AArch64 server fastdebug, mainline, affected test, 100x > - [x] Linux x86_64 server fastdebug, jdk25u, affected test, 100x But how do you now know that the `StoreL` is really coming from the merged `StoreI`, and that it is not some other unrelated `StoreL`? 
The info that it comes from an int-array is relevant here `int[int:4]`, don't you think? What about all the other MergeStores IR tests? For consistency you would now have to adjust those too, but I hope you don't do that ;) `./test/hotspot/jtreg/compiler/c2/TestMergeStores.java` The motivation seems to be that printing of store nodes was a bit different in JDK25. But then we just have to adjust the matching a bit, maybe weaken the IR rule for backports. But I'd prefer not to weaken the IR rule on mainline. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28437#issuecomment-3569186962 From chagedorn at openjdk.org Mon Nov 24 07:26:23 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 24 Nov 2025 07:26:23 GMT Subject: RFR: 8370489: Some compiler tests miss the @key randomness In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 23:31:16 GMT, Saranya Natarajan wrote: > **Issue:** Some compiler tests use randomization but do not have `@key randomness` in the jtreg header. > > **Fix:** The test cases that did not have `@key randomness` were listed using `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"`. This PR adds `@key randomness` to these tests. > > **Note:** The following tests that are still listed with `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"` after this PR are confirmed to be helper or support files for actual tests. 
> _test/hotspot/jtreg/compiler/codegen/aes/TestAESBase.java > test/hotspot/jtreg/compiler/compilercontrol/jcmd/StressAddJcmdBase.java > test/hotspot/jtreg/compiler/compilercontrol/parser/HugeDirectiveUtil.java > test/hotspot/jtreg/compiler/compilercontrol/share/scenario/CommandGenerator.java > test/hotspot/jtreg/compiler/lib/ir_framework/test/TestVM.java > test/hotspot/jtreg/compiler/lib/ir_framework/test/ArgumentValue.java > test/hotspot/jtreg/compiler/lib/ir_framework/AbstractInfo.java > test/hotspot/jtreg/compiler/lib/ir_framework/CompLevel.java > test/hotspot/jtreg/compiler/lib/generators/Generators.java > test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java > test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java > test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java > test/hotspot/jtreg/compiler/intrinsics/mathexact/Verify.java > test/hotspot/jtreg/compiler/intrinsics/bmi/BMITestRunner.java > test/hotspot/jtreg/compiler/intrinsics/unsafe/ByteBufferTest.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressBooleanArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressIntArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressLongArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressCharArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressObjectArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressByteArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressFloatArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressShortArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressDoubleArrayCopy.java > test/hotspot/jtreg/compiler/codecache/cli/codeheapsize/JVMStartupRunner.java > test/hotspot/jtreg/compiler/vectorapi/reshape/utils/VectorReshapeHelper.java > test/hotspot/jtreg/compiler/jvmci/compilerToVM/DummyClass.java_ Looks good and trivial, thanks for cleaning these up! 
test/hotspot/jtreg/compiler/vectorization/TestVectorZeroCount.java line 25: > 23: > 24: package compiler.vectorization; > 25: import java.util.Random; Suggestion: package compiler.vectorization; import java.util.Random; ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28463#pullrequestreview-3498644956 PR Review Comment: https://git.openjdk.org/jdk/pull/28463#discussion_r2554894805 From dfenacci at openjdk.org Mon Nov 24 07:38:27 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 24 Nov 2025 07:38:27 GMT Subject: RFR: 8370489: Some compiler tests miss the @key randomness In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 23:31:16 GMT, Saranya Natarajan wrote: > **Issue:** Some compiler tests use randomization but do not have `@key randomness` in the jtreg header. > > **Fix:** The test cases that did not have `@key randomness` were listed using `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"`. This PR adds `@key randomness` to these tests. > > **Note:** The following tests that are still listed with `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"` after this PR are confirmed to be helper or support files for actual tests. 
> _test/hotspot/jtreg/compiler/codegen/aes/TestAESBase.java > test/hotspot/jtreg/compiler/compilercontrol/jcmd/StressAddJcmdBase.java > test/hotspot/jtreg/compiler/compilercontrol/parser/HugeDirectiveUtil.java > test/hotspot/jtreg/compiler/compilercontrol/share/scenario/CommandGenerator.java > test/hotspot/jtreg/compiler/lib/ir_framework/test/TestVM.java > test/hotspot/jtreg/compiler/lib/ir_framework/test/ArgumentValue.java > test/hotspot/jtreg/compiler/lib/ir_framework/AbstractInfo.java > test/hotspot/jtreg/compiler/lib/ir_framework/CompLevel.java > test/hotspot/jtreg/compiler/lib/generators/Generators.java > test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java > test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java > test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java > test/hotspot/jtreg/compiler/intrinsics/mathexact/Verify.java > test/hotspot/jtreg/compiler/intrinsics/bmi/BMITestRunner.java > test/hotspot/jtreg/compiler/intrinsics/unsafe/ByteBufferTest.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressBooleanArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressIntArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressLongArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressCharArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressObjectArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressByteArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressFloatArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressShortArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressDoubleArrayCopy.java > test/hotspot/jtreg/compiler/codecache/cli/codeheapsize/JVMStartupRunner.java > test/hotspot/jtreg/compiler/vectorapi/reshape/utils/VectorReshapeHelper.java > test/hotspot/jtreg/compiler/jvmci/compilerToVM/DummyClass.java_ Thanks for the cleanup @sarannat. Looks good to me. 
test/hotspot/jtreg/compiler/vectorapi/Test8278948.java line 33: > 31: import jdk.test.lib.Utils; > 32: > 33: /** Do we need javadoc style comments for JTreg? (we don't seem to be too consistent in our tests) ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/28463#pullrequestreview-3498676733 PR Review Comment: https://git.openjdk.org/jdk/pull/28463#discussion_r2554920604 From chagedorn at openjdk.org Mon Nov 24 07:43:40 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 24 Nov 2025 07:43:40 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop [v6] In-Reply-To: <3_O4WsDAECSNuSxFasov6t2ySprWFnYaiXB4Tqr_Emw=.f18fa14f-c349-464c-bda6-ef1e41ede7c2@github.com> References: <3_O4WsDAECSNuSxFasov6t2ySprWFnYaiXB4Tqr_Emw=.f18fa14f-c349-464c-bda6-ef1e41ede7c2@github.com> Message-ID: On Fri, 21 Nov 2025 11:29:11 GMT, Roland Westrelin wrote: >> In: >> >> >> for (int i = 100; i < 1100; i++) { >> v += floatArray[i - 100]; >> Objects.checkIndex(i, longRange); >> } >> >> >> The int counted loop has both an int range check and a long range. The >> int range check is optimized first. Assertion predicates are inserted >> above the loop. One predicates checks that: >> >> >> init - 100 > >> >> The loop is then transformed to enable the optimization of the long >> range check. The loop is short running, so there's no need to create a >> loop nest. The counted loop is mostly left as is but, the loop's >> bounds are changed from: >> >> >> for (int i = 100; i < 1100; i++) { >> >> >> to: >> >> >> for (int i = 0; i < 1000; i++) { >> >> >> The reason for that the long range check transformation expects the >> loop to start at 0. >> >> Pre/main/post loops are created. Template Assertion predicates are >> added above the main loop. The loop is unrolled. Initialized assertion >> predicates are created. 
The one created from the condition: >> >> >> init - 100 > >> >> checks the value of `i` out of the pre loop which is 1. That check fails. >> >> The root cause of the failure is that when bounds of the counted loop >> are changed, template assertion predicates need to be updated with and >> adjusted init input. >> >> When the bounds of the loop are known, the assertion predicates can be >> updated in place. Otherwise, when the loop is speculated to be short >> running, the assertion predicates are updated when they are cloned. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Merge branch 'master' into JDK-8366888 > - review > - Merge branch 'master' into JDK-8366888 > - Merge branch 'master' into JDK-8366888 > - whitespaces > - review > - Merge branch 'master' into JDK-8366888 > - Update src/hotspot/share/opto/predicates.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/predicates.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - ... and 3 more: https://git.openjdk.org/jdk/compare/679b2b4b...2d329d48 Still good! Let me submit some more testing again with latest master. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27250#pullrequestreview-3498708626 From shade at openjdk.org Mon Nov 24 07:49:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 24 Nov 2025 07:49:53 GMT Subject: RFR: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes [v4] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 19:04:56 GMT, Aleksey Shipilev wrote: >> I started this as investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test. >> >> At this point I believe we found that PhaseCCP does not reach the fix point for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to `LoadN` in question. Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. >> >> It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. >> >> Testing shows there are no ill effects on correctness doing this. But I would appreciate someone more savvy in this code to sanity check all of this. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails >> - [x] Linux x86_64 server fastdebug, `all` tests pass >> - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into JDK-8371581-ccp-spooky-nodes > - More comments > - More restrictive CmpP check > - Tighten up comments and signatures > - Do Value() once > - Fix Thanks everyone! Here goes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28288#issuecomment-3569349918 From shade at openjdk.org Mon Nov 24 07:49:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 24 Nov 2025 07:49:54 GMT Subject: Integrated: 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes In-Reply-To: References: Message-ID: <9EZABc2JYXCvPrH64YGkPKFYxafMj8zMT2d8ishdtIk=.00581ede-09a9-4ed4-bf3d-2aa3d79e7857@github.com> On Thu, 13 Nov 2025 10:49:14 GMT, Aleksey Shipilev wrote: > I started this as investigation into one rare/intermittent CTW failure that I get with [JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557). The bug seems to reproduce on a very specific JAR with a very specific random seed, so no easy regression test. > > At this point I believe we found that PhaseCCP does not reach the fix point for a peculiar reason: `LoadN` that looks deeply into the graph is not revisited and thus misses the chance to update its type. There is an exception for loads in `verify_Value_for`, but it seems to only apply to constants, and does not apply to `LoadN` in question. Revisiting `LoadN` shows that updating the types downstream performs type widenings (= current types are too narrow), which AFAICS says that this unsound analysis can lead to miscompilation. See more debugging breadcrumbs in the bug. > > It looks like we can reach the fixpoint by recording the nodes we need to revisit and doing another CCP round. This also makes CCP verification stricter: we effectively move 2 exceptional cases recorded in `verify_Value_for` into the analysis itself. 
> > Testing shows there are no ill effects on correctness doing this. But I would appreciate someone more savvy in this code to sanity check all of this. > > Additional testing: > - [x] Linux x86_64 server fastdebug, CTW reproducer no longer fails > - [x] Linux x86_64 server fastdebug, `all` tests pass > - [x] Linux x86_64 server fastdebug, Maven Central CTW passes (!) This pull request has now been integrated. Changeset: 99be0e73 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/99be0e73ce9779e85c9ec6598e0a7ce964d62e82 Stats: 66 lines in 2 files changed: 45 ins; 1 del; 20 mod 8371581: C2: PhaseCCP should reach fixpoint by revisiting deeply-Value-d nodes Reviewed-by: epeter, vlivanov, qamai ------------- PR: https://git.openjdk.org/jdk/pull/28288 From bmaillard at openjdk.org Mon Nov 24 09:12:57 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 24 Nov 2025 09:12:57 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 09:31:56 GMT, Benoît Maillard wrote: > This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing. > > The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for > `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially > introduced to get rid of unused `Load` nodes that were only kept alive by such an edge. > > > > https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259 > > In our case, it happens that the `Load` node gets folded to a constant during the initial > `_gvn.transform` call in `GraphKit::make_load`. Because the value is converted before being > returned, we end up with two constant nodes: one `ConL` and one `ConI`. 
The `ConL` only > has one usage, and this triggers the optimization during verification. > > > static int test0() { > var c = new MyClass(); > // the conversion ensures that the ConL node only has one use > // in the end, which triggers the optimization > return (int) c.l; > } > > > The optimization is not triggered earlier, when we apply `_gvn.transform` on the membar, > because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in > `PhaseGVN::transform`. > > For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created > and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with > `can_reshape` later. > > > This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load` > prevents it from occurring when boxing elimination is enabled. Boxing elimination is > disabled temporarily in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)), > which caused the issue to appear, but by using `-XX:-EliminateAutoBox` it became clear > that the issue was on mainline. > > ### Testing > - [x] [GitHub Actions](TODO) > - [x] tier1-3, plus some internal testing > > Thank you for reviewing! Testing revealed failures with `-XX:+AlwaysIncrementalInline`; the fix might be incorrect or incomplete. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28448#issuecomment-3569659040 From bmaillard at openjdk.org Mon Nov 24 09:30:12 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 24 Nov 2025 09:30:12 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v2] In-Reply-To: References: Message-ID: > This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing. 
> > The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for > `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially > introduced to get rid of unused `Load` nodes that were only kept alive by such an edge. > > > > https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259 > > In our case, it happens that the `Load` node gets folded to a constant during the initial > `_gvn.transform` call in `GraphKit::make_load`. Because the value is converted before being > returned, we end up with two constant nodes: one `ConL` and one `ConI`. The `ConL` only > has one usage, and this triggers the optimization during verification. > > > static int test0() { > var c = new MyClass(); > // the conversion ensures that the ConL node only has one use > // in the end, which triggers the optimization > return (int) c.l; > } > > > The optimization is not triggered earlier, when we apply `_gvn.transform` on the membar, > because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in > `PhaseGVN::transform`. > > For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created > and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with > `can_reshape` later. > > > This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load` > prevents it from occurring when boxing elimination is enabled. Boxing elimination is > disabled temporarily in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)), > which caused the issue to appear, but by using `-XX:-EliminateAutoBox` it became clear > that the issue was on mainline. > > ### Testing > - [x] [GitHub Actions](TODO) > - [x] tier1-3, plus some internal testing > > Thank you for reviewing! 
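For readers who want to try the quoted `test0` snippet, a self-contained version might look like the sketch below. The description does not show `MyClass`, so the shape used here (a single `volatile long l` field, chosen because a volatile read is what normally produces a `MemBarAcquire` in C2) is an assumption, as are the class name `MemBarAcquireSketch` and the warm-up loop. The sketch only demonstrates the Java-level behaviour; observing the missed `Ideal()` optimization itself would need a debug build and the verification flags used by the PR's test, which are not reproduced here.

```java
public class MemBarAcquireSketch {
    // Hypothetical holder class: the field name `l` comes from the quoted
    // snippet, but declaring it volatile is an assumption of this sketch.
    static class MyClass {
        volatile long l;
    }

    static int test0() {
        var c = new MyClass();
        // The long value is narrowed before being returned, as in the
        // quoted snippet; a fresh object's field still holds its default 0.
        return (int) c.l;
    }

    public static void main(String[] args) {
        int sum = 0;
        // Call the method often enough that a JIT compiler may compile it.
        for (int i = 0; i < 20_000; i++) {
            sum += test0();
        }
        System.out.println(sum);
    }
}
```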
Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: Add run with -XX:+AlwaysIncrementalInline, and add intermediate run for -XX:-DoEscapeAnalysis ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28448/files - new: https://git.openjdk.org/jdk/pull/28448/files/64ada0cb..c3c2ceee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28448&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28448&range=00-01 Stats: 14 lines in 1 file changed: 14 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28448.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28448/head:pull/28448 PR: https://git.openjdk.org/jdk/pull/28448 From duke at openjdk.org Mon Nov 24 09:36:41 2025 From: duke at openjdk.org (Harshit470250) Date: Mon, 24 Nov 2025 09:36:41 GMT Subject: RFR: 8370920: [s390] C2: add instruction size in s390.ad file [v6] In-Reply-To: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> References: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> Message-ID: > This pr adds the size of the match rule nodes. > > There were a lot of nodes for which the size was variable, for those node I have taken the maximum possible size. Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge remote-tracking branch 'origin/master' - remove whitespace - Resolved a bug - remove TODO comments - final size added - ... 
and 6 more: https://git.openjdk.org/jdk/compare/870872d6...9e184e43 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28054/files - new: https://git.openjdk.org/jdk/pull/28054/files/bb7d05fc..9e184e43 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=04-05 Stats: 37055 lines in 461 files changed: 26727 ins; 7453 del; 2875 mod Patch: https://git.openjdk.org/jdk/pull/28054.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28054/head:pull/28054 PR: https://git.openjdk.org/jdk/pull/28054 From chagedorn at openjdk.org Mon Nov 24 09:36:58 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 24 Nov 2025 09:36:58 GMT Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax [v2] In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 12:53:00 GMT, Galder Zamarreño wrote: >> Trivial cleanup to move tests out of a test class whose description does not match these tests > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Change copyright to Amazon Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28385#pullrequestreview-3499294028 From epeter at openjdk.org Mon Nov 24 09:49:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Nov 2025 09:49:57 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v2] In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 10:13:31 GMT, Roland Westrelin wrote: > > Do you know why we insert a new `CastPP` there, and why it is put not at the ctrl of the CastPP, but of the phi? I suppose the ctrl of the phi is correct, but we do lose information there, and that later prevents the `CastPP` to common. > > When the `Phi` is removed because all of its inputs are the same once uncasted, there is a risk of losing a dependency. 
To prevent that, a `CastPP` is inserted. All we know is that some casts along some inputs of the `Phi` may carry a dependency that we don't want to loose. The only possible control for the `CastPP` then is the one of the `Phi`. In general we can probably not do anything better. But in this case that fails here, we could have looked at both `CastPP`, and seen that they have the same ctrl, and used that one, no? But I'm not sure that is worth it yet. > The duplication comes from loop body cloning so I'm not sure how we could prevent the duplication. We could try to common the CastPP nodes once PhaseIdealLoop::peeled_dom_test_elim() is called. Right, that could be an option. Do you think that is worth it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3569810303 From hgreule at openjdk.org Mon Nov 24 09:53:10 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 24 Nov 2025 09:53:10 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v7] In-Reply-To: References: Message-ID: On Sat, 22 Nov 2025 14:48:17 GMT, Zihao Lin wrote: >> If nodes both are constant, support constant folding. > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > Fix > In addition, I take some time to add a range-based folding for MulHiLNode and UMulHiLNode. Please help to take a look, thank you folks! @linzihao1999 I think if you want to expand beyond simple constant folding, then it's better to just do the full range calculation. That should be more or less like https://github.com/openjdk/jdk/blob/43af7b59765fa9820726de276bae9d1fcd2ba3ca/src/hotspot/share/opto/mulnode.cpp#L402-L407 just with the `multiply_high_*` instead. This way, you don't need to check for ZERO, or for constants, or check for overflows. src/hotspot/share/opto/mulnode.cpp line 25: > 23: */ > 24: > 25: #include "jni_md.h" Please check which of the changed includes are actually needed. 
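As an illustration of the "full range calculation" suggested in the review above, the following Java sketch bounds the signed high multiply over two input ranges by evaluating it at the four corner combinations. It is not the HotSpot implementation and does not reproduce the linked `mulnode.cpp` lines; the class `MulHiRange` and method `range` are hypothetical names. The sketch relies on two facts: the full 128-bit product over a rectangle of inputs attains its minimum and maximum at a corner, and taking the high 64 bits (a floor division by 2^64) is monotone, so the corner results bound every other result.

```java
public class MulHiRange {
    /**
     * Returns {lo, hi} such that lo <= Math.multiplyHigh(a, b) <= hi for every
     * a in [aLo, aHi] and b in [bLo, bHi] (signed ranges, aLo <= aHi, bLo <= bHi).
     */
    static long[] range(long aLo, long aHi, long bLo, long bHi) {
        // High 64 bits of the 128-bit signed product at each corner.
        long c1 = Math.multiplyHigh(aLo, bLo);
        long c2 = Math.multiplyHigh(aLo, bHi);
        long c3 = Math.multiplyHigh(aHi, bLo);
        long c4 = Math.multiplyHigh(aHi, bHi);
        long lo = Math.min(Math.min(c1, c2), Math.min(c3, c4));
        long hi = Math.max(Math.max(c1, c2), Math.max(c3, c4));
        return new long[] { lo, hi };
    }

    public static void main(String[] args) {
        long[] r = range(1L << 62, (1L << 62) + 10, 4, 8);
        System.out.println("[" + r[0] + ", " + r[1] + "]");
    }
}
```

With this approach a constant input is simply a range whose bounds coincide, so no separate constant or zero check is needed, which matches the point made in the review.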
src/hotspot/share/opto/type.cpp line 1922:

> 1920: 
> 1921: const TypeLong* TypeLong::make_unsigned(julong ucon) {
> 1922:   jlong con = ucon;

This is the same problem of implementation-definedness again; that's why I suggest just using the full signed range and letting canonicalization deal with the rest.

-------------

PR Review: https://git.openjdk.org/jdk/pull/28097#pullrequestreview-3499356584
PR Review Comment: https://git.openjdk.org/jdk/pull/28097#discussion_r2555474014
PR Review Comment: https://git.openjdk.org/jdk/pull/28097#discussion_r2555479623

From bmaillard at openjdk.org Mon Nov 24 09:56:10 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Mon, 24 Nov 2025 09:56:10 GMT
Subject: RFR: 8367627: [v3]
In-Reply-To: 
References: 
Message-ID: 

> This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing.
> 
> The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for
> `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially
> introduced to get rid of unused `Load` nodes that were only kept alive by such an edge.
> 
> https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259
> 
> In our case, it happens that the `Load` node gets folded to a constant during the initial
> `_gvn.transform` call in `GraphKit::make_load`. Because the value is converted before being
> returned, we end up with two constant nodes: one `ConL` and one `ConI`. The `ConL` only
> has one usage, and this triggers the optimization during verification.
> 
> 
> static int test0() {
>     var c = new MyClass();
>     // the conversion ensures that the ConL node only has one use
>     // in the end, which triggers the optimization
>     return (int) c.l;
> }
> 
> 
> The optimization is not triggered earlier, when we apply `_gvn.transform` on the membar,
> because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in
> `PhaseGVN::transform`.
> 
> For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created
> and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with
> `can_reshape` later.
> 
> 
> This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load`
> prevents it from occurring when boxing elimination is enabled. Boxing elimination is
> disabled temporarily in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)),
> which caused the issue to appear, but by using `-XX:-EliminateAutoBox` it became clear
> that the issue was on mainline.
> 
> ### Testing
> - [x] [GitHub Actions](TODO)
> - [x] tier1-3, plus some internal testing
> 
> Thank you for reviewing!

Benoît Maillard has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains six additional commits since the last revision: - Merge branch 'master' into JDK-8367627 - Add notification in Node::has_special_unique_user - Add run with -XX:+AlwaysIncrementalInline, and add intermediate run for -XX:-DoEscapeAnalysis - Record in GraphKit::insert_mem_bar_volatile for consistency - Improve test and fix - Add test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28448/files - new: https://git.openjdk.org/jdk/pull/28448/files/c3c2ceee..be428cb3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28448&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28448&range=01-02 Stats: 35061 lines in 436 files changed: 25550 ins; 6980 del; 2531 mod Patch: https://git.openjdk.org/jdk/pull/28448.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28448/head:pull/28448 PR: https://git.openjdk.org/jdk/pull/28448 From epeter at openjdk.org Mon Nov 24 11:53:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Nov 2025 11:53:19 GMT Subject: RFR: 8371146: C2 SuperWord: VTransform::add_speculative_check uses pre_init that is pinned after Auto_Vectorization_Check, leading to bad graph Message-ID: **Analysis** This is a regression of [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751) / https://github.com/openjdk/jdk/pull/24278. The aliasing runtime check happens before the pre-loop. The values needed for the aliasing runtime check thus need to be available not just at the pre-loop, but even earlier: already at the aliasing check. Sadly, so far we only ever checked for `is_pre_loop_invariant`, and not `is_available_for_speculative_check`. We now found an example with the fuzzer that has a `pre_init` value that is pinned after the aliasing runtime check but before the pre-loop. Thus it passed the checks, and then created a bad graph (cyclic path, think "use before definition"). This a very rare case. 
Getting the ctrl pinned after the aliasing runtime check but before the pre-loop requires some very specific order of loop-opts, of unroll/pre-main-post/peeling etc. But it can happen, and so we must handle it right.

Solution: check for the stronger condition `is_available_for_speculative_check`.

**Future Work**

We should improve the debug printing when aliasing checks cannot be inserted. Currently the tag `SW_REJECTIONS` is a bit messy, we should fix that up. But it would be too many changes for this bug fix here.

**Details**

During `SuperWord`, we want to insert the aliasing runtime check above `687 ParsePredicate` which is annotated with `#Auto_Vectorization_Check`. For this, we require the `pre_init` value: `1244 AddI`. Sadly, this value is pinned lower down.
[image]

So when we insert the aliasing runtime check, we create a bad (circular) graph:
[image]

-------------

Commit messages:
 - better comments
 - refined fix
 - initial fix
 - JDK-8371146

Changes: https://git.openjdk.org/jdk/pull/28449/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28449&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8371146
  Stats: 164 lines in 3 files changed: 155 ins; 0 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/28449.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28449/head:pull/28449

PR: https://git.openjdk.org/jdk/pull/28449

From mli at openjdk.org Mon Nov 24 11:56:26 2025
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 24 Nov 2025 11:56:26 GMT
Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v6]
In-Reply-To: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com>
References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com>
Message-ID: <7kh5C9nj7bf6432cG35kDDvV6zhnKEspe8AcYetJ1do=.e1d9ebd3-d80d-4621-8c1e-c77dc721d0df@github.com>

> Hi,
> 
> This PR adds CMoveF/D on riscv, which enables vectorization of statements like: `op_1 bop op_2 ?
res_f_d_1 : res_f_d_2 in a loop`.
> 
> This PR is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231.
> 
> Previously this was https://github.com/openjdk/jdk/pull/25341, but at that time C2 SLP had an issue with unsigned comparison, which is now fixed, so it's a good time to continue the work.
> 
> # Test
> ## Jtreg
> 
> in progress...
> 
> ## Performance
> 
> Meaning of the column names:
> * p: with patch
> * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on
> * m: without patch
> * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on
> 
> #### Average improvement
> 
> NOTE: On its own, this PR brings a performance benefit in the cases of `CMoveF+CmpF`, `CMoveD+CmpD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below is based on fully implementing the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231.
> 
> For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv.
> > Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) > -- | -- | -- | -- > 1.022782609 | 2.198717391 | 2.162673913 | 2.199 > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix is_unordered ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28309/files - new: https://git.openjdk.org/jdk/pull/28309/files/572a7b74..46b32186 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28309&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28309&range=04-05 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28309.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28309/head:pull/28309 PR: https://git.openjdk.org/jdk/pull/28309 From mli at openjdk.org Mon Nov 24 11:56:31 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 24 Nov 2025 11:56:31 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v5] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> Message-ID: On Fri, 21 Nov 2025 03:35:03 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> replace assert with log_warning > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1590: > >> 1588: // jump if cmp1 < cmp2 or either is NaN >> 1589: // not jump (i.e. move src to dst) if cmp1 >= cmp2 >> 1590: float_blt(cmp1, cmp2, no_set); > > I compared this with the existing `MacroAssembler::cmov_cmp_fp_ge` [1] and I witnessed some difference in the case of `NaN` handling. In `MacroAssembler::cmov_cmp_fp_ge`, we set the `is_unordered` param to true when calling `float_blt` or `double_blt`, which is not the case here. I assume we need similar handling here as well, right? > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L1338 Make sense, fixed. 
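
The NaN point raised in this review comment can be illustrated in plain Java. This is a minimal sketch, assuming the conditional move has the semantics `dst = (cmp1 >= cmp2) ? src : dst` as the quoted code comments suggest; the class and method names are hypothetical and nothing here is RISC-V code. It shows that skipping the move on "less than" alone is not equivalent to skipping it on "less than or unordered" once a NaN is involved.

```java
public class CmovGeNaN {
    // Reference semantics: move src to dst only if cmp1 >= cmp2.
    static float cmovGe(float cmp1, float cmp2, float dst, float src) {
        return (cmp1 >= cmp2) ? src : dst;
    }

    // Branch-based variant that skips the move only on an ordered "less than".
    // With a NaN operand the comparison is false, so the move wrongly happens.
    static float cmovGeOrderedBranch(float cmp1, float cmp2, float dst, float src) {
        if (cmp1 < cmp2) {
            return dst;
        }
        return src;
    }

    // Branch-based variant that also skips the move when either operand is NaN,
    // mirroring "jump if cmp1 < cmp2 or either is NaN".
    static float cmovGeUnorderedBranch(float cmp1, float cmp2, float dst, float src) {
        if (cmp1 < cmp2 || Float.isNaN(cmp1) || Float.isNaN(cmp2)) {
            return dst;
        }
        return src;
    }

    public static void main(String[] args) {
        float nan = Float.NaN;
        System.out.println(cmovGe(nan, 1f, 10f, 20f));
        System.out.println(cmovGeOrderedBranch(nan, 1f, 10f, 20f));
        System.out.println(cmovGeUnorderedBranch(nan, 1f, 10f, 20f));
    }
}
```

For ordered inputs all three variants agree; they only diverge when one of the compared values is NaN.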
> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1636: > >> 1634: // jump if cmp1 <= cmp2 or either is NaN >> 1635: // not jump (i.e. move src to dst) if cmp1 > cmp2 >> 1636: float_ble(cmp1, cmp2, no_set); > > Same question here. Make sense, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2556004073 PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2556004423 From epeter at openjdk.org Mon Nov 24 11:58:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Nov 2025 11:58:02 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v3] In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 11:19:51 GMT, Roland Westrelin wrote: >> The test case has an out of loop `Store` with an `AddP` address >> expression that has other uses and is in the loop body. Schematically, >> only showing the address subgraph and the bases for the `AddP`s: >> >> >> Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> CastPP#110 >> >> >> Both `AddP`s have the same base, a `CastPP` that's also in the loop >> body. >> >> That loop is a counted loop and only has 3 iterations so is fully >> unrolled. First, one iteration is peeled: >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> The `AddP`s and `CastPP` are cloned (because in the loop body). As >> part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is >> called. It finds the test that guards `CastPP#283` in the peeled >> iteration dominates and replaces the test that guards `CastPP#110` >> (the test in the peeled iteration is the clone of the test in the >> loop). That causes `CastPP#110`'s control to be updated to that of the >> test in the peeled iteration and to be yanked from the loop. So now >> `CastPP#283` and `CastPP#110` have the same inputs. 
>> >> Next unrolling happens: >> >> >> /-> CastPP#110 >> /-> AddP#400 -> AddP#401 -> CastPP#110 >> Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 >> \ -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> `AddP`s are cloned once more but not the `CastPP`s because they are >> both in the peeled iteration now. A new `Phi` is added. >> >> Next igvn runs. It's going to push the `AddP`s through the `Phi`s. >> >> Through `Phi#477`: >> >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 >> \ -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> Through `Phi#360`: >> >> >> /-> AddP#134 -> CastPP#110 >> /-> Phi#509 -> AddP#401 -> CastPP#110 >> Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 >> -> Phi#514 -> CastPP#283 >> ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into JDK-8351889 > - verif > - Merge branch 'master' into JDK-8351889 > - test seed > - more > - Merge branch 'master' into JDK-8351889 > - Merge branch 'master' into JDK-8351889 > - more > - test > - fix src/hotspot/share/opto/phaseX.cpp line 2085: > 2083: } > 2084: return false; > 2085: } Why not call it `verify_node_invariants_for`? You should also assert immediately. @benoitmaillard Is about to make that change for everything: https://github.com/openjdk/jdk/pull/28295 src/hotspot/share/opto/phaseX.hpp line 623: > 621: // '-XX:VerifyIterativeGVN=10000' > 622: return ((VerifyIterativeGVN % 100000) / 10000) == 1; > 623: } You will need to add extra documentation to the flag. And also there is a test that uses the flag. You should adjust it to enable this bit as well. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2556012167 PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2555714627 From chagedorn at openjdk.org Mon Nov 24 12:10:41 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 24 Nov 2025 12:10:41 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop [v6] In-Reply-To: <3_O4WsDAECSNuSxFasov6t2ySprWFnYaiXB4Tqr_Emw=.f18fa14f-c349-464c-bda6-ef1e41ede7c2@github.com> References: <3_O4WsDAECSNuSxFasov6t2ySprWFnYaiXB4Tqr_Emw=.f18fa14f-c349-464c-bda6-ef1e41ede7c2@github.com> Message-ID: On Fri, 21 Nov 2025 11:29:11 GMT, Roland Westrelin wrote: >> In: >> >> >> for (int i = 100; i < 1100; i++) { >> v += floatArray[i - 100]; >> Objects.checkIndex(i, longRange); >> } >> >> >> The int counted loop has both an int range check and a long range. The >> int range check is optimized first. Assertion predicates are inserted >> above the loop. One predicates checks that: >> >> >> init - 100 > >> >> The loop is then transformed to enable the optimization of the long >> range check. The loop is short running, so there's no need to create a >> loop nest. The counted loop is mostly left as is but, the loop's >> bounds are changed from: >> >> >> for (int i = 100; i < 1100; i++) { >> >> >> to: >> >> >> for (int i = 0; i < 1000; i++) { >> >> >> The reason for that the long range check transformation expects the >> loop to start at 0. >> >> Pre/main/post loops are created. Template Assertion predicates are >> added above the main loop. The loop is unrolled. Initialized assertion >> predicates are created. The one created from the condition: >> >> >> init - 100 > >> >> checks the value of `i` out of the pre loop which is 1. That check fails. >> >> The root cause of the failure is that when bounds of the counted loop >> are changed, template assertion predicates need to be updated with and >> adjusted init input. 
>> >> When the bounds of the loop are known, the assertion predicates can be >> updated in place. Otherwise, when the loop is speculated to be short >> running, the assertion predicates are updated when they are cloned. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Merge branch 'master' into JDK-8366888 > - review > - Merge branch 'master' into JDK-8366888 > - Merge branch 'master' into JDK-8366888 > - whitespaces > - review > - Merge branch 'master' into JDK-8366888 > - Update src/hotspot/share/opto/predicates.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/predicates.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - ... and 3 more: https://git.openjdk.org/jdk/compare/e8666d08...2d329d48 Testing looked good! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27250#issuecomment-3570480127 From roland at openjdk.org Mon Nov 24 14:43:35 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 24 Nov 2025 14:43:35 GMT Subject: RFR: 8371146: C2 SuperWord: VTransform::add_speculative_check uses pre_init that is pinned after Auto_Vectorization_Check, leading to bad graph In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 09:56:02 GMT, Emanuel Peter wrote: > **Analysis** > > This is a regression of [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751) / https://github.com/openjdk/jdk/pull/24278. > > The aliasing runtime check happens before the pre-loop. The values needed for the aliasing runtime check thus need to be available not just at the pre-loop, but even earlier: already at the aliasing check. > > Sadly, so far we only ever checked for `is_pre_loop_invariant`, and not `is_available_for_speculative_check`. 
We now found an example with the fuzzer that has a `pre_init` value that is pinned after the aliasing runtime check but before the pre-loop. Thus it passed the checks, and then created a bad graph (cyclic path, think "use before definition"). > > This a very rare case. Getting the ctrl pinned after the aliasing runtime check but before the pre-loop requires some very specific order of loop-opts, of unroll/pre-main-post/peeling etc. But it can happen, and so we must handle it right. > > Solution: check for the stronger condition `is_available_for_speculative_check`. > > **Future Work** > > We should improve the debug printing when aliasing checks cannot be inserted. Currently the tag `SW_REJECTIONS` is a bit messy, we should fix that up. But it would be too many changes for this bug fix here. > > **Details** > > During `SuperWord`, we want to insert the aliasing runtime check above `687 ParsePredicate` which is annotated with `#Auto_Vectorization_Check`. For this, we require the `pre_init` value: `1244 AddI`. Sadly, this value is pinned lower down. > image > > So when we insert the aliasing runtime check, we create a bad (circular) graph: > image Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28449#pullrequestreview-3500766378 From duke at openjdk.org Mon Nov 24 15:46:20 2025 From: duke at openjdk.org (Zihao Lin) Date: Mon, 24 Nov 2025 15:46:20 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v13] In-Reply-To: References: Message-ID: > This patch remove slice parameter from LoadNode::make > > I have done more work which remove slice paramater from StoreNode::make. > > Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thank a lot! 
Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: Fix test failed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24258/files - new: https://git.openjdk.org/jdk/pull/24258/files/329e290a..35ec9135 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=11-12 Stats: 21 lines in 1 file changed: 14 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From aseoane at openjdk.org Mon Nov 24 15:57:56 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Mon, 24 Nov 2025 15:57:56 GMT Subject: RFR: 8364490: Fatal error on large SpecTrapLimitExtraEntries value Message-ID: This PR addresses VM crashes on very large values for `SpecTrapLimitExtraEntries`. The experimental `SpecTrapLimitExtraEntries` allows for a user-specified number of extra method data trap entries for speculation. Currently, this number is implemented with an `int`, which means that users can specify very large values that will translate into huge `MethodData` objects that cannot be allocated in Metaspace. An `int` range of values should not be allowed, as negative `SpecTrapLimitExtraEntries` do not make any sense, and very high values (such as the ones that cause this crash) are equally nonsensical. This changeset adds a range to the flag values to address these issues. `SpecTrapLimitExtraEntries` is `MAX`ed with HotSpot's computed heuristic, which means that in any case it can only serve as a buffer above the heuristic. Based on benchmarks where I logged heuristic-derived values for extra `DataLayout` cells, even a value of 50 for `SpecTrapLimitExtraEntries` is more than sufficient. To provide some headroom and keep things simple, I have set the upper limit to 100. 
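
The interaction between the flag and the heuristic described above can be sketched as follows. This is a hypothetical Java illustration, not HotSpot code: the class and method names and the lower bound of 0 are assumptions; only the upper limit of 100 and the "max with the heuristic" behaviour are taken from the description.

```java
public class SpecTrapEntries {
    static final int MIN_EXTRA = 0;   // assumed lower bound: negative values make no sense
    static final int MAX_EXTRA = 100; // upper limit named in the description

    // Rejects flag values outside the allowed range instead of letting them
    // translate into an unallocatably large data structure later on.
    static int checkFlag(int value) {
        if (value < MIN_EXTRA || value > MAX_EXTRA) {
            throw new IllegalArgumentException(
                "value " + value + " outside range [" + MIN_EXTRA + ", " + MAX_EXTRA + "]");
        }
        return value;
    }

    // The flag only acts as a floor: the effective count is the larger of
    // the computed heuristic and the user-supplied value.
    static int extraEntries(int heuristic, int flagValue) {
        return Math.max(heuristic, checkFlag(flagValue));
    }

    public static void main(String[] args) {
        System.out.println(extraEntries(7, 3));
        System.out.println(extraEntries(7, 50));
    }
}
```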
**Testing:** passes tiers 1-2 ------------- Commit messages: - Limit SpecTrapLimitExtraEntries to a sane range Changes: https://git.openjdk.org/jdk/pull/28451/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28451&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8364490 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28451.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28451/head:pull/28451 PR: https://git.openjdk.org/jdk/pull/28451 From mpowers at openjdk.org Mon Nov 24 16:32:52 2025 From: mpowers at openjdk.org (Mark Powers) Date: Mon, 24 Nov 2025 16:32:52 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 22:55:07 GMT, Volodymyr Paprotski wrote: >> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline >> - `SignatureBench.MLDSA` is 1.2x-2.2x faster >> - Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) >> - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version >> - `SignatureBench.MLDSA` is upto 5% faster, never slower >> >> Note on intrinsic: >> - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. 
>> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 >> >> Tests and benchmarks: >> - Added a fuzz test to ensure Java and intrinsic produces exactly same result >> - Added benchmark to measure the performance of intrinsic itself >> >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > next set of comments Always faster and never slower: SignatureBench.MLDSA with `+UseDilithiumIntrinsics` shows an average 1.61% improvement across all algorithms and data sizes. Measuring SignatureBench.MLDSA against a baseline build without the fix, shows an average 2.24% improvement across all algorithms and data sizes. There's nothing special about my benchmark. It's the one in OpenJDK (javax.crypto.full.SignatureBench). 
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single ssbd mba ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities ------------- PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3571668350 From duke at openjdk.org Mon Nov 24 16:42:40 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 24 Nov 2025 16:42:40 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 22:55:07 GMT, Volodymyr Paprotski wrote: >> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline >> - `SignatureBench.MLDSA` is 1.2x-2.2x faster >> - Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) >> - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version >> - `SignatureBench.MLDSA` is upto 5% faster, never slower >> >> Note on intrinsic: >> - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. 
>> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 >> >> Tests and benchmarks: >> - Added a fuzz test to ensure Java and intrinsic produces exactly same result >> - Added benchmark to measure the performance of intrinsic itself >> >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > next set of comments Good work! I just found a few typos in the comments. src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 88: > 86: // +-----+-----+-----+-----+----- > 87: // > 88: // NOTE: size 0 and 1 are used for initial and final shuffles respectivelly of Typo: respectivelly -> respectively src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 248: > 246: // We do Montgomery multiplications of two AVX registers in 4 steps: > 247: // 1. Do the multiplications of the corresponding even numbered slots into > 248: // the odd numbered slots of a scratch2 register. Typo: scratch2 -> scratch src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 249: > 247: // 1. Do the multiplications of the corresponding even numbered slots into > 248: // the odd numbered slots of a scratch2 register. > 249: // 2. 
Swap the even and odd numbered slots of the original input registers.* Typo: unnecessary '*' at the end src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 250: > 248: // the odd numbered slots of a scratch2 register. > 249: // 2. Swap the even and odd numbered slots of the original input registers.* > 250: // 3. Similar to step 1, but into output register. Typo: into output register -> into an output register src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 253: > 251: // 4. Combine the outputs of step 1 and step 3 into the output of the Montgomery > 252: // multiplication. > 253: // (*For levels 0-6 in the Ntt and levels 1-7 of the inverse Ntt, need NOT swap Typo: unnecessary '(*' at the beginning src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 282: > 280: const XMMRegister* scratch = scratch1 == input1 ? output: scratch1; > 281: > 282: // scratch = input1_even*intput2_even Suggestion: // scratch = input1_even * intput2_even src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 479: > 477: // level 0 - 128 > 478: // scratch1 = coeffs3 * zetas1 > 479: // coeffs3, coeffs1 = coeffs1?scratch1 Suggestion: // coeffs3, coeffs1 = coeffs1 ? 
scratch1 src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 524: > 522: // coeffs1_2 = coeffs1_2 + scratch1 > 523: loadXmms(Zetas3, zetas, level * 512, vector_len, _masm); > 524: shuffle(Scratch1, Coeffs1_2, Coeffs2_2, distance * 32); //Coeffs2_2 freed Suggestion: // Coeffs2_2 freed src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 529: > 527: > 528: loadXmms(Zetas3, zetas, 4*64 + level * 512, vector_len, _masm); > 529: shuffle(Scratch1, Coeffs3_2, Coeffs4_2, distance * 32); //Coeffs4_2 freed Suggestion: // Coeffs4_2 freed src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 554: > 552: const XMMRegister Coeffs2_2[] = {xmm4, xmm5, xmm6, xmm7}; > 553: > 554: // Since we cannot fit the entire payload into registers, we process process input -> process the input src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 555: > 553: > 554: // Since we cannot fit the entire payload into registers, we process > 555: // input in two stages. First half, load 8 registers 32 integers each apart. First half -> For the first half src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 557: > 555: // input in two stages. First half, load 8 registers 32 integers each apart. 
> 556: // With one load, we can process level 0-2 (128-, 64- and 32-integers apart) > 557: // Remaining levels, load 8 registers from consecutive memory (16-, 8-, 4-, Remaining -> For the remaining src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 558: > 556: // With one load, we can process level 0-2 (128-, 64- and 32-integers apart) > 557: // Remaining levels, load 8 registers from consecutive memory (16-, 8-, 4-, > 558: // 2-, 1-integer appart) appart -> apart src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 559: > 557: // Remaining levels, load 8 registers from consecutive memory (16-, 8-, 4-, > 558: // 2-, 1-integer appart) > 559: // Levels 5, 6, 7 (4-, 2-, 1-integer appart) require shuffles within registers appart -> apart src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 560: > 558: // 2-, 1-integer appart) > 559: // Levels 5, 6, 7 (4-, 2-, 1-integer appart) require shuffles within registers > 560: // Other levels, shuffles can be done by re-aranging register order Other -> on the other re-aranging register order -> rearranging the register order src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 562: > 560: // Other levels, shuffles can be done by re-aranging register order > 561: > 562: // Four batches of 8 registers each, 128 bytes appart appart -> apart src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 701: > 699: // In each of these iterations half of the coefficients are added to and > 700: // subtracted from the other half of the coefficients then the result of > 701: // the substration is (Montgomery) multiplied by the corresponding zetas. 
substration -> subtraction (I know this was in my own comment :-( ) src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 850: > 848: } > 849: > 850: // Four batches of 8 registers each, 128 bytes appart appart -> apart ------------- PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3571728756 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556771999 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556825899 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556836110 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556839540 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556845331 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556853907 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556865521 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556913637 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556915972 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556943987 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556925142 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556945036 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556949814 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556953155 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556942168 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556956323 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556978873 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2556961642 From shade at openjdk.org Mon Nov 24 16:46:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 24 Nov 2025 16:46:01 GMT Subject: RFR: 8372266: Relax store matchers in 
compiler/escapeAnalysis/TestRematerializeObjects.java test In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 06:59:58 GMT, Emanuel Peter wrote: > But how do you now know that the `StoreL` is really coming from the merged `StoreI`, and that it is not some other unrelated `StoreL`? Well, because there are no `long` stores in Java code at all, so whatever that `StoreL` came from, it is JIT-generated? So then the IR test verifies that whatever happens with EA and MergeStores makes sure the store either goes away, or some merged store remains. I personally dislike overly-specific tests that rely on particulars of optimization sequencing or some such, and would rather have a test that checks the generic final state, without over-specificity. > The motivation seems to be that printing of store nodes was a bit different in JDK25. But then we just have to adjust the matching a bit, maybe weaken the IR rule for backports. But I'd prefer not to weaken the IR rule on mainline. Yes. I mean, there is a tradeoff somewhere here: either mainline relaxes the test and then JDK 25 matches the test version, or JDK 25 diverges. We _usually_ try to avoid divergence, if we can, because they continuously bite us. If your preference about not relaxing the mainline version is strong, then I can yield and diverge JDK 25. It would likely be literally the same fix I have here. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28437#issuecomment-3571739533 From mdoerr at openjdk.org Mon Nov 24 17:00:06 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 24 Nov 2025 17:00:06 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v4] In-Reply-To: References: <0HJnSUSQA8RuwnNxu-SiGvZTzHYLJ5kY0_B6lG2EbAQ=.10868fac-1516-4a80-b4e5-9ff14997ba01@github.com> Message-ID: On Sun, 23 Nov 2025 05:27:23 GMT, Shawn M Emery wrote: > It looks like the S390 architecture uses it for encryption and decryption in AES/CTR mode, but S390 only needs the symmetric key to derive the encryption and decryption schedules. This can be found for both in the first round. For x86, yes, encryption is only performed for both AES/CTR and AES/GCM. So, yes, I think having the 'is_decrypt' argument explicit would be ideal. nit: some comments still refer to the 'K' array in src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp, src/hotspot/share/opto/library_call.cpp, src/hotspot/cpu/riscv/stubGenerator_riscv.cpp (line 2446), and src/hotspot/cpu/ppc/stubGenerator_ppc.cpp. Thanks for your feedback! I've made the changes with https://github.com/openjdk/jdk/pull/28299/commits/30b5b531338c225e505a30dcca3453e35b68b256. I hope I found all places. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28299#issuecomment-3571788446 From mdoerr at openjdk.org Mon Nov 24 17:00:04 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 24 Nov 2025 17:00:04 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v5] In-Reply-To: References: Message-ID: > This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. > > The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. 
Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Address review comments. - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt - Remove K from AES_Crypt - More minor cleanup. - Improve comment and minor cleanup. - 8371820: Further AES performance improvements for key schedule generation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28299/files - new: https://git.openjdk.org/jdk/pull/28299/files/2b981288..30b5b531 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28299&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28299&range=03-04 Stats: 65976 lines in 954 files changed: 44668 ins; 14938 del; 6370 mod Patch: https://git.openjdk.org/jdk/pull/28299.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28299/head:pull/28299 PR: https://git.openjdk.org/jdk/pull/28299 From vpaprotski at openjdk.org Mon Nov 24 17:19:12 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 24 Nov 2025 17:19:12 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v3] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 16:28:44 GMT, Mark Powers wrote: > SignatureBench.MLDSA with `+UseDilithiumIntrinsics` shows an average 1.61% improvement across all algorithms and data sizes. Measuring SignatureBench.MLDSA against a baseline build without the fix, shows an average 2.24% improvement across all algorithms and data sizes. Need a bit of clarification.. (I think you are saying there is a regression?). - `+UseDilithiumIntrinsics` should be redundant (i.e. `vm_version_x86.cpp` should automatically detect and turn the feature on). - So if I read correctly.. the baseline measured already has the original intrinsics (implicitly) enabled.. - therefore there is a 2.24% noise in the benchmark? 
In my measurements for AVX512 parts, I had seen between 0%->6% across `SignatureBench.MLDSA` - (some variation on desktop-vs-server parts..) - `SignatureBench.MLDSA.verify` was worse, only 0->2% depending on keysize (iirc, bigger portion of benchmark was in SHA3 instead) - `SignatureBench.MLDSA.sign` was better, 4-6% (also depending on datasize) That is also why I had included the other (deleted) microbenchmark.. `SignatureBench.MLDSA` has a lot of 'other things' (e.g. SHA3) also happening, so the AVX512 intrinsic changes were harder to differentiate from noise.. - I had measured ~25%-50% improvement on purely the 5 intrinsics changed.. Hence the claim 'never worse'.. A more precise claim..: - "New intrinsics seem to be better, but (at least for AVX512) existing intrinsics were already plenty good for MLDSA" ------------- PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3571871477 From sviswanathan at openjdk.org Mon Nov 24 17:28:24 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 24 Nov 2025 17:28:24 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v8] In-Reply-To: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> References: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> Message-ID: On Sun, 23 Nov 2025 04:54:15 GMT, Jiangli Zhou wrote: >> Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. >> >> Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Fixed the ENCRYPT_16_BLKS fall through case that sviswa7 pointed out in PR review. 
Marked as reviewed by sviswanathan (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28363#pullrequestreview-3501483238 From sviswanathan at openjdk.org Mon Nov 24 17:28:27 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 24 Nov 2025 17:28:27 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v7] In-Reply-To: <1SB9sT1cB4CxswmAJpnomlCvypzbM1eHEPwWMR0mvMY=.d470192a-dba3-4951-9f33-cf3bf0dbf287@github.com> References: <1SB9sT1cB4CxswmAJpnomlCvypzbM1eHEPwWMR0mvMY=.d470192a-dba3-4951-9f33-cf3bf0dbf287@github.com> Message-ID: On Sun, 23 Nov 2025 05:23:04 GMT, Jiangli Zhou wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3531: >> >>> 3529: __ subl(len, 16 * 16); >>> 3530: __ cmpl(len, 16 * 16); >>> 3531: __ jcc(Assembler::lessEqual, ENC_DEC_DONE); >> >> I think the fix should instead be to just move the addl to pos before the MESG_BELOW_32_BLKS label, as below: >> >> + __ addl(pos, 16 * 16); >> __ bind(MESG_BELOW_32_BLKS); >> __ subl(len, 16 * 16); >> - __ addl(pos, 16 * 16); >> >> This is because the addl is needed on the fall-through path, but not when coming from line 3479 via jcc. For the latter, the addl has already been done on line 3477. > > Hmmm, I think you are correct. Examining the entire flow today, I see `ENCRYPT_16_BLKS` doesn't increment `pos`. Thanks for pointing that out! Before my change, the add to `pos` was done when it fell through to `MESG_BELOW_32_BLKS`. > > Removed the `cmpl/jcc` change from `MESG_BELOW_32_BLKS` and moved `addl` to above `MESG_BELOW_32_BLKS`, as suggested. > > I did a bit of debugging for the rare failures occurring with the new test case. One of the failures had a message of length `1048833`. That would eventually go through `ENCRYPT_16_BLKS` then fall through to `MESG_BELOW_32_BLKS`. With the updated fix, 200 runs all pass on AVX512 machines. So the rare failures were also related to this then. 
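[Editorial sketch, not part of the thread.] The control-flow point in the quoted review can be illustrated with a toy Java model. This is only a sketch under stated assumptions: the class and method names (`PosAdvanceSketch`, `addAfterLabel`, `addBeforeLabel`) are hypothetical, the model tracks nothing but `pos`, and it is not the stub generator's actual code. It assumes, as the quoted comment states, that the path arriving by jump has already advanced `pos` while the fall-through path has not.

```java
public class PosAdvanceSketch {
    // Bytes covered by one 16-block step (16 blocks of 16 bytes), as in the quoted diff.
    static final int CHUNK = 16 * 16;

    /** Add placed after the label: it runs on both the jump path and the fall-through path. */
    static int addAfterLabel(int pos, boolean arrivedByJump) {
        if (arrivedByJump) {
            pos += CHUNK; // the jump path advanced pos before branching (assumption from the thread)
        }
        // --- label is bound here ---
        pos += CHUNK;     // executed by both paths, so the jump path advances twice
        return pos;
    }

    /** Add placed before the label: it runs only when execution falls through. */
    static int addBeforeLabel(int pos, boolean arrivedByJump) {
        if (arrivedByJump) {
            pos += CHUNK; // advanced before the jump; the jump skips the add below
        } else {
            pos += CHUNK; // fall-through executes the add that sits just before the label
        }
        // --- label is bound here ---
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(addAfterLabel(0, true));   // 512: advanced twice on the jump path
        System.out.println(addAfterLabel(0, false));  // 256
        System.out.println(addBeforeLabel(0, true));  // 256
        System.out.println(addBeforeLabel(0, false)); // 256
    }
}
```

In this model, placing the add before the label makes both arrival paths advance `pos` exactly once, which is the property the suggested diff appears to be after.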
It looks good to me now. Please close JDK-8372364 as it was an artifact of the prior fix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2557130932 From mpowers at openjdk.org Mon Nov 24 17:57:44 2025 From: mpowers at openjdk.org (Mark Powers) Date: Mon, 24 Nov 2025 17:57:44 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 22:55:07 GMT, Volodymyr Paprotski wrote: >> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline >> - `SignatureBench.MLDSA` is 1.2x-2.2x faster >> - Note: there are no AVX2-SHA3 intrinsics yet (being reviewed at https://github.com/vpaprotsk/jdk/pull/7) >> - AVX512 intrinsic improvements are 1.24x-1.5x faster than the current version >> - `SignatureBench.MLDSA` is up to 5% faster, never slower >> >> Note on intrinsic: >> - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of the NTT loop and stack spill. 
>> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 >> >> Tests and benchmarks: >> - Added a fuzz test to ensure Java and intrinsic produces exactly same result >> - Added benchmark to measure the performance of intrinsic itself >> >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > next set of comments The 2.24% improvement is the difference between `+UseDilithiumIntrinsics` and `-UseDilithiumIntrinsics.` I just repeated the testing that you documented in the description section of this PR on a different machine. My baseline is simply a build without your changes. I compared this with a build containing your changes and see a 2.24% improvement. Verification showed the least amount of improvement (same as what you observed). "never worse" is just my way of saying "always faster". ------------- PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3572037049 From qamai at openjdk.org Mon Nov 24 18:10:50 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 24 Nov 2025 18:10:50 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: Message-ID: > Hi, > > This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. 
The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. > > To be more specific, for this issue, we have the graph that looks like: > > ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen > > with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: > > ConI -> ConvI2L -> VectorMaskGen > > After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. > > Please take a look and leave your thoughts, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28410/files - new: https://git.openjdk.org/jdk/pull/28410/files/ecaead7f..ec7298ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28410&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28410&range=01-02 Stats: 8 lines in 2 files changed: 3 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28410.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28410/head:pull/28410 PR: https://git.openjdk.org/jdk/pull/28410 From qamai at openjdk.org Mon Nov 24 18:14:39 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 24 Nov 2025 18:14:39 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: <4MItF4KodwK0fPsG1hcNYtkOA3DUbaUZ3HixYQYs9iI=.2a3835a8-cec6-4d83-9f3e-2e049dc24d9c@github.com> Message-ID: On Thu, 20 Nov 2025 22:26:33 GMT, Vladimir Ivanov wrote: >> Thanks for the update! 
If it's a short-running test/config, then I think it would be good to have this extra config to cover the changes of this patch. > > Is it truly specific to the post-loop opts phase? Isn't it yet another paradoxical IR shape occurring in effectively dead code? > > In the longer term, it would be good to ensure such effectively dead nodes eventually go away. Or, better, eagerly trigger their elimination. Otherwise, it could cause issues later in the compilation process unless the problematic conditions are explicitly handled everywhere (e.g., during matching or code generation for `vmask_gen_imm` on x64 and AArch64). That's a good idea, so I changed the function to return `top` in those cases. For the `VectorMaskGenNode` itself, the situation seems harder, because it can float anywhere, so after loop opts and cast node removal, GVN may common multiple different instances of `VectorMaskGenNode`, and it is inevitable that it may be executed with an out-of-bounds input value. I think we just need to make sure that its uses can live with the fact that the result would be unspecified then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28410#discussion_r2557261246 From qamai at openjdk.org Mon Nov 24 18:15:18 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 24 Nov 2025 18:15:18 GMT Subject: RFR: 8371789: C2: More explicit dump results for TypePtr In-Reply-To: References: Message-ID: <-dkxpbmOAM22xiW9H1l9jFVp-OX5q0jAN79lUs13mow=.9fac566f-868a-4fb1-91c2-62d64273cf6c@github.com> On Thu, 13 Nov 2025 11:50:46 GMT, Quan Anh Mai wrote: > Hi, > > This patch tries to clear up the dumped information of `TypePtr` and its subclasses. It makes the state of the `Type` object immediately clear without us having to look into the implementation of `dump2`, for example, to know that the absence of `:NotNull` implies that it is a `BotPTR`. > > Please take a look and kindly review, thanks a lot. Thanks a lot for your reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28292#issuecomment-3572097226 From qamai at openjdk.org Mon Nov 24 18:15:19 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 24 Nov 2025 18:15:19 GMT Subject: Integrated: 8371789: C2: More explicit dump results for TypePtr In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 11:50:46 GMT, Quan Anh Mai wrote: > Hi, > > This patch tries to clear up the dumped information of `TypePtr` and its subclasses. It makes it immediately clear the states of the `Type` object without us having to look into the implementation of `dump2`, for example, to know that the absence of `:NotNull` implies that it is a `BotPTR`. > > Please take a look and kindly review, thanks a lot. This pull request has now been integrated. Changeset: 8bafc2f0 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/8bafc2f0aecbbe548573712a9dc31c9764f82f71 Stats: 232 lines in 3 files changed: 64 ins; 121 del; 47 mod 8371789: C2: More explicit dump results for TypePtr Reviewed-by: chagedorn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/28292 From mullan at openjdk.org Mon Nov 24 19:04:46 2025 From: mullan at openjdk.org (Sean Mullan) Date: Mon, 24 Nov 2025 19:04:46 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v8] In-Reply-To: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> References: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> Message-ID: On Sun, 23 Nov 2025 04:54:15 GMT, Jiangli Zhou wrote: >> Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. >> >> Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! 
> > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Fixed the ENCRYPT_16_BLKS fall through case that sviswa7 pointed out in PR review. @jianglizhou Please wait until someone from the Security Group reviews this - thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28363#issuecomment-3572274370 From jiangli at openjdk.org Mon Nov 24 19:14:54 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Mon, 24 Nov 2025 19:14:54 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v8] In-Reply-To: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> References: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> Message-ID: On Sun, 23 Nov 2025 04:54:15 GMT, Jiangli Zhou wrote: >> Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. >> >> Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Fixed the ENCRYPT_16_BLKS fall through case that sviswa7 pointed out in PR review. > It looks good to me now. Please close JDK-8372364 as it was an artifact of the prior fix. @sviswa7 thanks for reviewing! > @jianglizhou Please wait until someone from the Security Group reviews this - thanks. Will do. Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28363#issuecomment-3572310742 From dfuchs at openjdk.org Mon Nov 24 19:29:15 2025 From: dfuchs at openjdk.org (Daniel Fuchs) Date: Mon, 24 Nov 2025 19:29:15 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v3] In-Reply-To: <15AReOBUAseO-BiCWHW7N-OSOcknDc0Box3c90cXRZU=.5d7341db-94ea-4cdf-b3cd-fabe414dd88d@github.com> References: <15AReOBUAseO-BiCWHW7N-OSOcknDc0Box3c90cXRZU=.5d7341db-94ea-4cdf-b3cd-fabe414dd88d@github.com> Message-ID: <_SMDMjDoXuDI_Sujt62HD_YewzTQQlvqMSkpffJKq3A=.64a03981-30d1-48e5-a767-d4121c617296@github.com> On Thu, 13 Nov 2025 09:27:02 GMT, Jatin Bhateja wrote: >>> > > Some quick comments. >>> > > We should be consistent in the naming, and rename `Halfloat*` to `Float16*`. >>> > >>> > >>> > I concur, especially since there are multiple 16-bit floating-point formats in use including the IEEE 754 float16 as well as bfloat16. >>> >>> There are nomenclature issues that I am facing. Currently, all the Float16 concrete classes use the Halffloat prefix i.e., Halffloat64Vector, Halffloat128Vector; converting these to Float16 looks a little confusing, i.e., Float1664Vector, Float16128Vector, etc Kindly suggest a better name to represent these classes. >> >> Maybe we move the shape to the end e.g., `Float16Vector128`, `IntVector128`, `IntVectorMax`? > >> > > > Some quick comments. >> > > > We should be consistent in the naming, and rename `Halfloat*` to `Float16*`. >> > > >> > > >> > > I concur, especially since there are multiple 16-bit floating-point formats in use including the IEEE 754 float16 as well as bfloat16. >> > >> > >> > There are nomenclature issues that I am facing. 
Currently, all the Float16 concrete classes use the Halffloat prefix, i.e., Halffloat64Vector, Halffloat128Vector; converting these to Float16 looks a little confusing, i.e., Float1664Vector, Float16128Vector, etc. Kindly suggest a better name to represent these classes. >> >> Maybe we move the shape to the end e.g., `Float16Vector128`, `IntVector128`, `IntVectorMax`? > > This looks good, since all these are concrete vector classes not exposed to users. @jatin-bhateja it looks like you should be merging the latest changes from master; some changes shown in the diff obviously do not belong to this fix: https://github.com/openjdk/jdk/pull/28002/files#diff-7798f606ce2bbf96fd99999c8c0ef9a4bb0455c128dd7e1249dea8db23d35402 Hopefully merging the latest changes from master will make them go away? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3571013379 From jbhateja at openjdk.org Mon Nov 24 19:29:14 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 24 Nov 2025 19:29:14 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v3] In-Reply-To: References: Message-ID: <6ma1bZs5YmEe_PtNmR69pVoJ_YAWy5fUQrsnnk8nH9M=.0594b623-f494-4af3-8e1c-f88120c53aca@github.com> > Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support. > - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. > > The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). 
> > The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > image > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback. > > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Cleanups - Adding support for custom basic type T_FLOAT16, passing BasicType lane types to inline expander entries - Cleaning up interface as per review suggestions - Some cleanups - Fix some JTREG failures - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Revamped JTreg test generation and bug fixes - Cleanups - Removing redundant warmup constraint - ... 
and 5 more: https://git.openjdk.org/jdk/compare/8bafc2f0...f34d324f ------------- Changes: https://git.openjdk.org/jdk/pull/28002/files Webrev: Webrev is not available because diff is too large Stats: 509516 lines in 232 files changed: 281237 ins; 226539 del; 1740 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From jbhateja at openjdk.org Mon Nov 24 19:29:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 24 Nov 2025 19:29:17 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v3] In-Reply-To: References: <8hStIcvp252Ik7raxZL5BvFKKkXTflorjyOD9Cyakvc=.c5d1b302-5c49-46b1-91ba-2feda2e6a746@github.com> Message-ID: On Thu, 13 Nov 2025 19:47:52 GMT, Paul Sandoz wrote: >>> The basic type codes are declared and shared across Java and HotSpot - it's used in `LaneType`. Can we pass a single argument that is the basic type instead of two arguments. HotSpot should know from the basic type what the carrier class and also what the operation type without it being explicitly told, since presumably it knew the inverse - the basic type from the element class. >> >> Hi @PaulSandoz, T_HALFFLOAT used in LaneType is mainly used for differentiation of various cache keys used by conversion operation lookups. In principle, we can extend VM to acknowledge this new custom basic type on the lines of T_METADATA / T_ADDRESS; its scope for now will be restricted to VectorSupport. We can gradually expose this to C2 type, such that TypeVect for all Float16 VectorIR uses T_HALFFLOAT as its basic type; currently, we use T_SHORT as the lane type. Let me know if this looks reasonable > >> > The basic type codes are declared and shared across Java and HotSpot - it's used in `LaneType`. Can we pass a single argument that is the basic type instead of two arguments. 
HotSpot should know from the basic type what the carrier class and also what the operation type without it being explicitly told, since presumably it knew the inverse - the basic type from the element class. >> >> Hi @PaulSandoz, T_HALFFLOAT used in LaneType is mainly used for differentiation of various cache keys used by conversion operation lookups. In principle, we can extend VM to acknowledge this new custom basic type on the lines of T_METADATA / T_ADDRESS; its scope for now will be restricted to VectorSupport. We can gradually expose this to C2 type, such that TypeVect for all Float16 VectorIR uses T_HALFFLOAT as its basic type; currently, we use T_SHORT as the lane type. Let me know if this looks reasonable > > I am proposing something simpler, really as a temporary step until `Float16` becomes part of the `java.base` module. IIUC from the basic type we can reliably determine what the two arguments we currently passing are e.g., T_HALFFLOAT = { short.class, VECTOR_TYPE_FP16 }. So we don't need to pass two arguments, we can just pass one, the intrinsic can lookup the class and operation type kind. Hi @PaulSandoz, I have addressed your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3572377706 From jbhateja at openjdk.org Mon Nov 24 19:30:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 24 Nov 2025 19:30:57 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v12] In-Reply-To: References: Message-ID: <1ekSot1GL4DhXoVb7M-nbqilK0YLqilL4V0UJbs_b8U=.35b09ea0-5fc9-4e19-a892-71307ee62066@github.com> On Tue, 18 Nov 2025 23:53:13 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: >> >> Review comments resolutions > > Looks much better now, Jatin. > > It looks like `Matcher::should_attempt_register_biasing()` has some implicit expectations about `mdef` shape. Is it possible to materialize them (as asserts on mach nodes with `Flag_ndd_demotable` or `Flag_ndd_commutative` flags set)? So, a misplaced declaration can be caught during testing. Hi @iwanowww , Please let me know if this is good to land now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3572384737 From vpaprotski at openjdk.org Mon Nov 24 20:52:43 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 24 Nov 2025 20:52:43 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v4] In-Reply-To: References: Message-ID: > - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline > - `SignatureBench.MLDSA` is 1.2x-2.2x faster > - Note: there are no AVX2-SHA3 intrinsics yet (being reviewed at https://github.com/vpaprotsk/jdk/pull/7) > - AVX512 intrinsic improvements are 1.24x-1.5x faster than the current version > - `SignatureBench.MLDSA` is up to 5% faster, never slower > > Note on intrinsic: > - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of the NTT loop and stack spill. 
> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 > > Tests and benchmarks: > - Added a fuzz test to ensure Java and intrinsic produces exactly same result > - Added benchmark to measure the performance of intrinsic itself > > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into avx2-ntt - next set of comments - whitespace - address first comments - Merge remote-tracking branch 'origin/master' into avx2-ntt - add copyright, whitespace and test jtreg tags - Fixes and comments from Anas - AVX2 and AVX512 intrinsics for MLDSA ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28136/files - new: https://git.openjdk.org/jdk/pull/28136/files/b04f4f0d..cefa021a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28136&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28136&range=02-03 Stats: 242832 lines in 2033 files changed: 165193 ins; 41903 del; 35736 mod Patch: https://git.openjdk.org/jdk/pull/28136.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28136/head:pull/28136 PR: https://git.openjdk.org/jdk/pull/28136 From vpaprotski at openjdk.org Mon Nov 24 20:52:45 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 24 Nov 2025 20:52:45 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 22:55:07 GMT, Volodymyr Paprotski wrote: >> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline >> - `SignatureBench.MLDSA` is 1.2x-2.2x faster >> - Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) >> - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version >> - `SignatureBench.MLDSA` is upto 5% faster, never slower >> >> Note on intrinsic: >> - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. 
>> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 >> >> Tests and benchmarks: >> - Added a fuzz test to ensure Java and intrinsic produces exactly same result >> - Added benchmark to measure the performance of intrinsic itself >> >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > next set of comments @mcpowers Thanks for tests! @ferakocz thanks for the review! I think I took them all in, except for the montMul comment section.. Not quite what I meant so tried to reword.. see if it helps any? 
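
For readers following the montMul comment wording discussed in this thread, a minimal scalar sketch of the Montgomery multiplication that the vectorized stub is understood to perform lane by lane may help. It is not the code from `stubGenerator_x86_64_dilithium.cpp`: the class and method names are hypothetical, and the constants are assumed to be the standard ML-DSA parameters (q = 8380417, q^-1 mod 2^32 = 58728449, R = 2^32).

```java
// Hypothetical scalar illustration; not taken from the PR.
public class MontMulSketch {
    // Assumed ML-DSA modulus q = 2^23 - 2^13 + 1.
    static final int Q = 8380417;
    // Assumed q^-1 mod 2^32, as used by the usual Dilithium reduction.
    static final int Q_INV = 58728449;

    // Returns a value congruent to a * b * 2^-32 (mod Q).
    // The result is not fully reduced; it may be negative.
    static int montMul(int a, int b) {
        long prod = (long) a * b;           // full 64-bit product
        int t = (int) prod * Q_INV;         // low 32 bits of prod, times q^-1 (wraps mod 2^32)
        // prod - t*Q is divisible by 2^32, so the shift is exact.
        return (int) ((prod - (long) t * Q) >> 32);
    }

    public static void main(String[] args) {
        int rModQ = (int) ((1L << 32) % Q); // R mod q, the Montgomery form of 1
        int a = 12345;
        // Multiplying by R mod q and reducing should give back a (mod q).
        System.out.println(Math.floorMod(montMul(a, rModQ), Q));
    }
}
```

The AVX2/AVX512 stubs have to do the even and odd 32-bit slots in separate multiply steps because the vector multiply yields 64-bit products; the scalar version above sidesteps that by using a `long`.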
------------- PR Review: https://git.openjdk.org/jdk/pull/28136#pullrequestreview-3502030137 From vpaprotski at openjdk.org Mon Nov 24 20:53:01 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 24 Nov 2025 20:53:01 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v3] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 15:35:12 GMT, Ferenc Rakoczi wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> next set of comments > > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 88: > >> 86: // +-----+-----+-----+-----+----- >> 87: // >> 88: // NOTE: size 0 and 1 are used for initial and final shuffles respectivelly of > > Typo: respectivelly -> respectively done > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 248: > >> 246: // We do Montgomery multiplications of two AVX registers in 4 steps: >> 247: // 1. Do the multiplications of the corresponding even numbered slots into >> 248: // the odd numbered slots of a scratch2 register. > > Typo: scratch2 -> scratch I think I meant "the scratch2" register here.. reworded, please double check if its clearer.. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 250: > >> 248: // the odd numbered slots of a scratch2 register. >> 249: // 2. Swap the even and odd numbered slots of the original input registers.* >> 250: // 3. Similar to step 1, but into output register. > > Typo: into output register -> into an output register used 'the' to be 'specific'.. (I think the lack of articles was causing the confusion.. "the scratch2 register is combined with the output register into scratch.. or something..) Also reworded step 4? > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 253: > >> 251: // 4. Combine the outputs of step 1 and step 3 into the output of the Montgomery >> 252: // multiplication. 
>> 253: // (*For levels 0-6 in the Ntt and levels 1-7 of the inverse Ntt, need NOT swap > > Typo: unnecessary '(*' at the beginning This was my attempt to add a note to second step.. spelled out "Note"? or can just remove, since swapping only happens on second step.. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 282: > >> 280: const XMMRegister* scratch = scratch1 == input1 ? output: scratch1; >> 281: >> 282: // scratch = input1_even*intput2_even > > Suggestion: // scratch = input1_even * intput2_even done > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 479: > >> 477: // level 0 - 128 >> 478: // scratch1 = coeffs3 * zetas1 >> 479: // coeffs3, coeffs1 = coeffs1?scratch1 > > Suggestion: // coeffs3, coeffs1 = coeffs1 ? scratch1 done > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 524: > >> 522: // coeffs1_2 = coeffs1_2 + scratch1 >> 523: loadXmms(Zetas3, zetas, level * 512, vector_len, _masm); >> 524: shuffle(Scratch1, Coeffs1_2, Coeffs2_2, distance * 32); //Coeffs2_2 freed > > Suggestion: // Coeffs2_2 freed done > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 529: > >> 527: >> 528: loadXmms(Zetas3, zetas, 4*64 + level * 512, vector_len, _masm); >> 529: shuffle(Scratch1, Coeffs3_2, Coeffs4_2, distance * 32); //Coeffs4_2 freed > > Suggestion: // Coeffs4_2 freed done > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 554: > >> 552: const XMMRegister Coeffs2_2[] = {xmm4, xmm5, xmm6, xmm7}; >> 553: >> 554: // Since we cannot fit the entire payload into registers, we process > > process input -> process the input done > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 555: > >> 553: >> 554: // Since we cannot fit the entire payload into registers, we process >> 555: // input in two stages. First half, load 8 registers 32 integers each apart. > > First half -> For the first half done > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 557: > >> 555: // input in two stages. 
First half, load 8 registers 32 integers each apart. >> 556: // With one load, we can process level 0-2 (128-, 64- and 32-integers apart) >> 557: // Remaining levels, load 8 registers from consecutive memory (16-, 8-, 4-, > > Remaining -> For the remaining done > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 558: > >> 556: // With one load, we can process level 0-2 (128-, 64- and 32-integers apart) >> 557: // Remaining levels, load 8 registers from consecutive memory (16-, 8-, 4-, >> 558: // 2-, 1-integer appart) > > appart -> apart Thanks! Looks like I've always misspelled that word! :) > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 559: > >> 557: // Remaining levels, load 8 registers from consecutive memory (16-, 8-, 4-, >> 558: // 2-, 1-integer appart) >> 559: // Levels 5, 6, 7 (4-, 2-, 1-integer appart) require shuffles within registers > > appart -> apart done > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 560: > >> 558: // 2-, 1-integer appart) >> 559: // Levels 5, 6, 7 (4-, 2-, 1-integer appart) require shuffles within registers >> 560: // Other levels, shuffles can be done by re-aranging register order > > Other -> on the other > re-aranging register order -> rearranging the register order done > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 562: > >> 560: // Other levels, shuffles can be done by re-aranging register order >> 561: >> 562: // Four batches of 8 registers each, 128 bytes appart > > appart -> apart done > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 701: > >> 699: // In each of these iterations half of the coefficients are added to and >> 700: // subtracted from the other half of the coefficients then the result of >> 701: // the substration is (Montgomery) multiplied by the corresponding zetas. 
> > substration -> subtraction (I know this was in my own comment :-( ) done (funny, thats exactly how I say "substraction" in my head too :D ) > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 850: > >> 848: } >> 849: >> 850: // Four batches of 8 registers each, 128 bytes appart > > appart -> apart done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557555908 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557577559 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557589525 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557582314 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557592866 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557595337 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557596482 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557596698 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557599194 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557606458 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557606672 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557608631 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557611103 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557611341 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557616181 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557620647 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2557621206 From vlivanov at openjdk.org Mon Nov 24 20:57:25 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 24 Nov 2025 20:57:25 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v18] In-Reply-To: References: Message-ID: On Sun, 23 Nov 
2025 03:15:17 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> Jatin > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Extending biasing heuristics to account for bias range with minimum degree of freedom. Review feedback incorporated. Thanks, Jatin. Overall, looks good.

src/hotspot/cpu/x86/x86.ad line 2641:

> 2639: }
> 2640:
> 2641: if (mdef->num_opnds() <= oper_index || mdef->operand_index(oper_index) < 0 ||

IMO all 4 checks (plus, `mdef->in(mdef->operand_index(oper_index)) != null`) can be considered as structural invariants for mach nodes marked as NDD demotable. So, I'd like to see an assert ensuring they don't fail for NDD-demotable nodes.

Speaking of the code shape, I suggest restoring the previous shape with explicit checks before the switch and adding asserts before returning a negative result:

    static bool is_ndd_demotable(const MachNode* mdef) {
      return ((mdef->flags() & Node::PD::Flag_ndd_demotable) != 0) ||
             ((mdef->flags() & Node::PD::Flag_ndd_demotable_commutative) != 0);
    }

    bool Matcher::is_register_biasing_candidate(const MachNode* mdef, int oper_index) {
      if (mdef == nullptr) {
        return false;
      }
      if (mdef->num_opnds() <= oper_index || mdef->operand_index(oper_index) < 0 ||
          mdef->in(mdef->operand_index(oper_index)) == nullptr) {
        assert(!is_ndd_demotable(mdef), "%s", mdef->Name());
        return false;
      }
      // Complex memory operand covers multiple incoming edges needed for
      // address computation, biasing def towards any address component will not
      // result in NDD demotion by assembler.
      if (mdef->operand_num_edges(oper_index) != 1) {
        assert(!is_ndd_demotable(mdef), "%s", mdef->Name());
        return false;
      }
      // Demotion candidate must be register mask compatible with definition.
      const RegMask& oper_mask = mdef->in_RegMask(mdef->operand_index(oper_index));
      if (!oper_mask.overlap(mdef->out_RegMask())) {
        assert(!is_ndd_demotable(mdef), "%s", mdef->Name());
        return false;
      }
      switch (oper_index) {
        // First operand of MachNode corresponding to Intel APX NDD selection
        // pattern can share its assigned register with definition operand if
        // their live ranges do not overlap. In such a scenario we can demote
        // it to legacy map0/map1 instruction by replacing its 4-byte extended
        // EVEX prefix with shorter REX/REX2 encoding. Demotion candidates
        // are decorated with a special flag by instruction selector.
        case 1:
          return is_ndd_demotable(mdef);
        // Definition operand of commutative operation can be biased towards second operand.
        case 2:
          return (mdef->flags() & Node::PD::Flag_ndd_demotable_commutative) != 0;
        // Current scheme only selects up to two biasing candidates
        default:
          assert(false, "unhandled operand index: %s", mdef->Name());
          break;
      }
      return false;
    }

src/hotspot/share/opto/chaitin.cpp line 1475:

> 1473:
> 1474: OptoReg::Name PhaseChaitin::select_bias_lrg_color(LRG &lrg) {
> 1475: uint bias_lrg1_idx = lrg._copy_bias;

Do I get it right that `_copy_bias2 != 0` implies `_copy_bias != 0`? Can you enforce it? (Here and in `PhaseChaitin::bias_color` where `_copy_bias2` is initialized.)

src/hotspot/share/opto/chaitin.cpp line 1498:

> 1496: // the chances of register sharing once the bias live range
> 1497: // becomes the part of IFG.
> 1498: uint bias_lrg = lrgs(bias_lrg1_idx).degrees_of_freedom() >

How does it work when `bias_lrg1_idx` or `bias_lrg2_idx` indices are 0? Original code had a guard against such condition.

src/hotspot/share/opto/chaitin.cpp line 1538:

> 1536:
> 1537: // Try biasing the color with non-interfering bias live range[s].
> 1538: if (lrg._copy_bias != 0 || lrg._copy_bias2 != 0) {

IMO you can drop `(lrg._copy_bias != 0 || lrg._copy_bias2 != 0)` guard.
Original code didn't check it and there are enough guard in `select_bias_lrg_color` to catch it. ------------- PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3502091982 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2557677038 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2557617768 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2557609213 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2557605287 From ascarpino at openjdk.org Mon Nov 24 21:04:35 2025 From: ascarpino at openjdk.org (Anthony Scarpino) Date: Mon, 24 Nov 2025 21:04:35 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 22:55:07 GMT, Volodymyr Paprotski wrote: >> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline >> - `SignatureBench.MLDSA` is 1.2x-2.2x faster >> - Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) >> - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version >> - `SignatureBench.MLDSA` is upto 5% faster, never slower >> >> Note on intrinsic: >> - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. 
>> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 >> >> Tests and benchmarks: >> - Added a fuzz test to ensure Java and intrinsic produces exactly same result >> - Added benchmark to measure the performance of intrinsic itself >> >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > next set of comments Marked as reviewed by ascarpino (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28136#pullrequestreview-3502221580 From vpaprotski at openjdk.org Mon Nov 24 21:16:03 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 24 Nov 2025 21:16:03 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v5] In-Reply-To: References: Message-ID: > - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline > - `SignatureBench.MLDSA` is 1.2x-2.2x faster > - Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) > - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version > - `SignatureBench.MLDSA` is upto 5% faster, never slower > > Note on intrinsic: > - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. > - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 > > Tests and benchmarks: > - Added a fuzz test to ensure Java and intrinsic produces exactly same result > - Added benchmark to measure the performance of intrinsic itself > > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: comments from Ferenc ------------- Changes: - 
all: https://git.openjdk.org/jdk/pull/28136/files - new: https://git.openjdk.org/jdk/pull/28136/files/cefa021a..691e1dfc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28136&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28136&range=03-04 Stats: 23 lines in 1 file changed: 0 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/28136.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28136/head:pull/28136 PR: https://git.openjdk.org/jdk/pull/28136 From vpaprotski at openjdk.org Mon Nov 24 22:01:17 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 24 Nov 2025 22:01:17 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v6] In-Reply-To: References: Message-ID: > - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline > - `SignatureBench.MLDSA` is 1.2x-2.2x faster > - Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) > - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version > - `SignatureBench.MLDSA` is upto 5% faster, never slower > > Note on intrinsic: > - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. 
> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 > > Tests and benchmarks: > - Added a fuzz test to ensure Java and intrinsic produces exactly same result > - Added benchmark to measure the performance of intrinsic itself > > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: spelling ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28136/files - new: https://git.openjdk.org/jdk/pull/28136/files/691e1dfc..bfc16f1f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28136&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28136&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28136.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28136/head:pull/28136 PR: https://git.openjdk.org/jdk/pull/28136 From sviswanathan at openjdk.org Mon Nov 24 22:01:18 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 24 Nov 2025 22:01:18 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v6] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 21:56:32 GMT, Volodymyr Paprotski wrote: >> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline >> - `SignatureBench.MLDSA` is 1.2x-2.2x faster >> - Note: there is no 
AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) >> - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version >> - `SignatureBench.MLDSA` is upto 5% faster, never slower >> >> Note on intrinsic: >> - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. >> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 >> >> Tests and benchmarks: >> - Added a fuzz test to ensure Java and intrinsic produces exactly same result >> - Added benchmark to measure the performance of intrinsic itself >> >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > spelling Marked as reviewed by sviswanathan (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28136#pullrequestreview-3502367169 From vlivanov at openjdk.org Mon Nov 24 22:23:51 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 24 Nov 2025 22:23:51 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v27] In-Reply-To: References: Message-ID: On Sat, 15 Nov 2025 02:28:55 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. >> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. 
>> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > IR test cases Reviews, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3572961780 From dlong at openjdk.org Tue Nov 25 00:56:46 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 25 Nov 2025 00:56:46 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v4] In-Reply-To: References: Message-ID: On Sat, 22 Nov 2025 11:15:39 GMT, Quan Anh Mai wrote: >> Also, I think it would make sense to target this for jdk 27 after the fork. > > @dean-long I think you are misunderstanding, the answer is incorrect because the result it gives, `java/nio/file/spi/FileSystemProvider:AnyNull *,iid=top`, is empty, while the correct answer is the set which contains the single value `null`, and is not empty. The reason for this inaccuracy is that there are 2 LCAs for the inputs on the lattice, and they do not subtype each other, we choose the wrong one out of those LCAs. Thanks @merykitty. I do hope I am not misunderstanding. I would argue that in theory, empty is correct here and null is wrong, but in practice I don't think it matters except for dead code. 
Can you come up with an example where the difference matters? C2 only remembers one Klass, so it makes sense to use LCA(Klass1,Klass2) for meet(Klass1,Klass) in the summarized result, even though it loses information. But I would argue that join(Klass1,Klass2) should not be using LCA at all, and that using null, NotNull, or AnyNull to join two classes that do not subtype seems wrong to me, and I think it is notable that Graal does not do that. But again I think it may be a moot point if it makes no difference in practice. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3573324465 From duke at openjdk.org Tue Nov 25 01:01:06 2025 From: duke at openjdk.org (Kirill Shirokov) Date: Tue, 25 Nov 2025 01:01:06 GMT Subject: RFR: 8344345: test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces [v2] In-Reply-To: References: Message-ID: > This PR addresses the trailing whitespaces for a .py test. > > They were introduced in commit 916694f2c1e7fc8d6a88e7026bc2d29ba2923849 and not detected by jcheck, since checking *.py for whitespaces is not enabled in .jcheck/conf. > > So, a separate question is: do you think that a pattern for Python files should be added to [checks "whitespace"] section of .jcheck/conf? Kirill Shirokov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8344345-clean-up-trailing-spaces - 8344345: File test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27058/files - new: https://git.openjdk.org/jdk/pull/27058/files/7b0dfca6..345b18f0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27058&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27058&range=00-01 Stats: 550196 lines in 5575 files changed: 379888 ins; 107789 del; 62519 mod Patch: https://git.openjdk.org/jdk/pull/27058.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27058/head:pull/27058 PR: https://git.openjdk.org/jdk/pull/27058 From fyang at openjdk.org Tue Nov 25 02:42:23 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 25 Nov 2025 02:42:23 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v6] In-Reply-To: <7kh5C9nj7bf6432cG35kDDvV6zhnKEspe8AcYetJ1do=.e1d9ebd3-d80d-4621-8c1e-c77dc721d0df@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <7kh5C9nj7bf6432cG35kDDvV6zhnKEspe8AcYetJ1do=.e1d9ebd3-d80d-4621-8c1e-c77dc721d0df@github.com> Message-ID: On Mon, 24 Nov 2025 11:56:26 GMT, Hamlin Li wrote: >> Hi, >> >> This pr add CMoveF/D on riscv, which enable vectorization of statement like: `op_1 bop op_2 ? res_f_d_1 : res_f_d_2 in a loop`. >> >> This pr is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231. >> >> Previously it's https://github.com/openjdk/jdk/pull/25341, but at that time, C2 SLP has some issue with unsigned comparison, which is now fixed, so it's good to continue the work. >> >> # Test >> ## Jtreg >> >> in progress... 
>> >> ## Performance >> >> Column name meanings: >> * p: with patch >> * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> * m: without patch >> * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> >> #### Average improvement >> >> NOTE: This PR alone brings a performance benefit in the cases `CMoveF+CmpF`, `CMoveD+CmpD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below is based on fully implementing the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. >> >> For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. >> >> Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) >> -- | -- | -- | -- >> 1.022782609 | 2.198717391 | 2.162673913 | 2.199 >> >> > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix is_unordered src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2141: > 2139: case BoolTest::gt: > 2140: cmov_fp_cmp_fp_gt(op1, op2, dst, src, cmp_single, cmov_single); > 2141: log_warning(jit)("Float/Double BoolTest::gt path is not tested well, please report the test case!"); My local tests show this does happen. Try this: `$ make test TEST="./test/jdk/javax/sound/midi/Gervill/SoftFilter/TestProcessAudio.java" TEST_VM_OPTS="-XX:-TieredCompilation"` I think this could be a good reference if you want to add some extra tests for the two cases here. 
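For anyone writing such extra tests, the source shape that would reach a floating-point `BoolTest::gt` path is presumably a float comparison with `>` feeding a float select inside a loop. The sketch below is only an illustration under that assumption: the class and method names are hypothetical and not taken from the PR, and whether C2 actually emits a `CMoveF` for it (or vectorizes it) depends on the platform and on flags such as `-XX:+UseVectorCmov`. It includes NaN inputs because an unordered comparison must select the "false" value.

```java
// Hypothetical example; not taken from the PR or from any JDK test.
final class CmoveGtExample {
    // Computes r[i] = a[i] > b[i] ? x[i] : y[i].
    // If a[i] or b[i] is NaN, the comparison is unordered, '>' evaluates
    // to false, and y[i] must be selected.
    static float[] selectGt(float[] a, float[] b, float[] x, float[] y) {
        float[] r = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] > b[i] ? x[i] : y[i];
        }
        return r;
    }

    public static void main(String[] args) {
        float[] a = {1.0f, Float.NaN, 2.0f, 5.0f};
        float[] b = {0.0f, 0.0f, 3.0f, Float.NaN};
        float[] x = {10.0f, 20.0f, 30.0f, 40.0f};
        float[] y = {-1.0f, -2.0f, -3.0f, -4.0f};
        System.out.println(java.util.Arrays.toString(selectGt(a, b, x, y)));
    }
}
```

A test built on this shape would compare the compiled result against the interpreter for ordered and unordered inputs alike.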
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2558363671 From jbhateja at openjdk.org Tue Nov 25 03:04:01 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 25 Nov 2025 03:04:01 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v6] In-Reply-To: References: Message-ID: <7-u4fTT6SMiqErNn-Xl7o8UTVF2NIV5m0DAhStsbsk0=.5f51025e-8ed8-4d2f-911c-1257b272f9f7@github.com> On Mon, 24 Nov 2025 22:01:17 GMT, Volodymyr Paprotski wrote: >> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline >> - `SignatureBench.MLDSA` is 1.2x-2.2x faster >> - Note: there are no AVX2-SHA3 intrinsics yet (being reviewed in https://github.com/vpaprotsk/jdk/pull/7) >> - AVX512 intrinsic improvements are 1.24x-1.5x faster than the current version >> - `SignatureBench.MLDSA` is up to 5% faster, never slower >> >> Note on intrinsic: >> - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. 
>> - Code was refactored to allow reuse of the same assembler (where possible) for AVX512 and AVX2 >> >> Tests and benchmarks: >> - Added a fuzz test to ensure Java and intrinsic produce exactly the same result >> - Added benchmark to measure the performance of intrinsic itself >> >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > spelling Very nice work @vpaprotsk. Please also add links to the original reference implementation in the comments. 
src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 365: > 363: > 364: static void loadXmms(const XMMRegister destinationRegs[], Register source, int offset, > 365: int vector_len, MacroAssembler *_masm, int regCnt = -1, int memStep = -1) { Suggestion: int vector_len, MacroAssembler *_masm, int regCnt = -1, int memStep = -1) { src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 381: > 379: > 380: static void storeXmms(Register destination, int offset, const XMMRegister xmmRegs[], > 381: int vector_len, MacroAssembler *_masm, int regCnt = -1, int memStep = -1) { Suggestion: int vector_len, MacroAssembler *_masm, int regCnt = -1, int memStep = -1) { src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 659: > 657: // zetas (int[128*8]) = c_rarg1 > 658: static address generate_dilithiumAlmostInverseNtt_avx(StubGenerator *stubgen, > 659: int vector_len,MacroAssembler *_masm) { Fix indentation src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 718: > 716: > 717: // Constants for shuffle and montMul64 > 718: __ mov64(scratch, 0b1010101010101010); 64 bit constant suffix src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 901: > 899: // poly2 (int[256]) = c_rarg2 > 900: static address generate_dilithiumNttMult_avx(StubGenerator *stubgen, > 901: int vector_len, MacroAssembler *_masm) { Fix indentation src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 939: > 937: vector_len, scratch); // 2^64 mod q > 938: if (vector_len == Assembler::AVX_512bit) { > 939: __ mov64(scratch, 0b0101010101010101); Add long constant suffix src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 985: > 983: // constant (int) = c_rarg1 > 984: static address generate_dilithiumMontMulByConstant_avx(StubGenerator *stubgen, > 985: int vector_len, MacroAssembler *_masm) { Fix indentation src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1026: > 1024: __ evpbroadcastd(constant, rConstant, Assembler::AVX_512bit); // constant multiplier > 1025: 
> 1026: __ mov64(scratch, 0b0101010101010101); //dw-mask Constant suffix ------------- PR Review: https://git.openjdk.org/jdk/pull/28136#pullrequestreview-3503056034 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2558380867 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2558381318 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2558385868 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2558390552 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2558390067 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2558370904 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2558391135 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2558371478 From dlong at openjdk.org Tue Nov 25 03:31:59 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 25 Nov 2025 03:31:59 GMT Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack Message-ID: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> The problem is C2 is throwing an exception and then deoptimizing, and the -XX:+VerifyStack logic expects the stack to be empty, match the "before" state if the reexecute flag is set, or match the "after" state. C2 is using the "before" state, so for correctness it also needs to set the reexecute flag. I played around with other approaches, like: 1. setting the stack to empty 2. adding all the bytecodes that can throw to the list in AbstractInterpreter::bytecode_should_reexecute() 3. always setting the reexecute flag in add_safepoint_edges() if must_throw is set but in the end I decided to go with the minimal localized low-risk change. 
------------- Commit messages: - set reexecute flag when calling rethrow_C Changes: https://git.openjdk.org/jdk/pull/28486/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28486&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370766 Stats: 14 lines in 2 files changed: 8 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28486.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28486/head:pull/28486 PR: https://git.openjdk.org/jdk/pull/28486 From duke at openjdk.org Tue Nov 25 04:27:40 2025 From: duke at openjdk.org (Harshit470250) Date: Tue, 25 Nov 2025 04:27:40 GMT Subject: RFR: 8370920: [s390] C2: add instruction size in s390.ad file [v7] In-Reply-To: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> References: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> Message-ID: > This PR adds the size of the match rule nodes. > > There were a lot of nodes for which the size was variable; for those nodes I have taken the maximum possible size. Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge remote-tracking branch 'origin/master' - remove whitespace - Resolved a bug - remove TODO comments - ... 
and 7 more: https://git.openjdk.org/jdk/compare/2bc08f71...b094f06d ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28054/files - new: https://git.openjdk.org/jdk/pull/28054/files/9e184e43..b094f06d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=05-06 Stats: 3207 lines in 102 files changed: 1863 ins; 922 del; 422 mod Patch: https://git.openjdk.org/jdk/pull/28054.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28054/head:pull/28054 PR: https://git.openjdk.org/jdk/pull/28054 From jbhateja at openjdk.org Tue Nov 25 06:22:07 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 25 Nov 2025 06:22:07 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v19] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. > > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. 
> > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union-find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size was reduced from 136 to 120 bytes, which is around a 12% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26283/files - new: https://git.openjdk.org/jdk/pull/26283/files/bb41ff78..57a9e4bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=17-18 Stats: 58 lines in 2 files changed: 24 ins; 20 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From jbhateja at openjdk.org Tue Nov 25 06:22:15 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 25 Nov 2025 06:22:15 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v18] In-Reply-To: References: Message-ID: <-tq0SbuGsKQSgaK6yNEG_YAIHMYEcd8_YdlENAWeWLY=.c91e3959-5a21-42ee-838e-353e707063a8@github.com> On Mon, 24 Nov 2025 20:45:25 GMT, Vladimir Ivanov wrote: > if 
(mdef->operand_num_edges(oper_index) != 1) { > assert(!is_ndd_demotable(mdef), "%s", mdef->Name()); > return false; > } We don't need this assertion: an NDD commutative operation MachNode may have memory as its first input, but we may pick the second input for biasing. > src/hotspot/share/opto/chaitin.cpp line 1475: > >> 1473: >> 1474: OptoReg::Name PhaseChaitin::select_bias_lrg_color(LRG &lrg) { >> 1475: uint bias_lrg1_idx = lrg._copy_bias; > > Do I get it right that `_copy_bias2 != 0` implies `_copy_bias != 0`? Can you enforce it? (Here and in `PhaseChaitin::bias_color` where `_copy_bias2` is initialized.) Bias live ranges are independent, both while marking and during enforcement. > src/hotspot/share/opto/chaitin.cpp line 1498: > >> 1496: // the chances of register sharing once the bias live range >> 1497: // becomes the part of IFG. >> 1498: uint bias_lrg = lrgs(bias_lrg1_idx).degrees_of_freedom() > > > How does it work when `bias_lrg1_idx` or `bias_lrg2_idx` indices are 0? Original code had a guard against such condition. The original code performed an upfront non-zero check on copy_bias and then either selected the copy bias or constrained the register mask of the definition. Now we also check both copy biases upfront and select the first or second bias, each guarded by a non-zero bias check; otherwise, if none of the live ranges are part of the IFG, we select the bias with the minimum degrees of freedom. > src/hotspot/share/opto/chaitin.cpp line 1538: > >> 1536: >> 1537: // Try biasing the color with non-interfering bias live range[s]. >> 1538: if (lrg._copy_bias != 0 || lrg._copy_bias2 != 0) { > > IMO you can drop `(lrg._copy_bias != 0 || lrg._copy_bias2 != 0)` guard. Original code didn't check it and there are enough guards in `select_bias_lrg_color` to catch it. 
Hi @iwanowww , guard checks were also part of [stock code](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/chaitin.cpp#L1496) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2558681296 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2558681850 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2558681898 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2558682129 From epeter at openjdk.org Tue Nov 25 07:06:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Nov 2025 07:06:15 GMT Subject: RFR: 8372266: Relax store matchers in compiler/escapeAnalysis/TestRematerializeObjects.java test In-Reply-To: References: Message-ID: <5aBBN149tqWFhlNbZV8DOsxHgK07r5kdQWFfW3MW8ro=.99e66e96-1582-42f4-a9f4-e1fc5c1f18eb@github.com> On Mon, 24 Nov 2025 16:41:54 GMT, Aleksey Shipilev wrote: >> But how do you now know that the `StoreL` is really coming from the merged `StoreI`, and that it is not some other unrelated `StoreL`? The info that it comes from an int-array is relevant here `int[int:4]`, don't you think? >> >> What about all the other MergeStores IR tests? For consistency you would now have to adjust those too, but I hope you don't do that ;) >> `./test/hotspot/jtreg/compiler/c2/TestMergeStores.java` >> >> The motivation seems to be that printing of store nodes was a bit different in JDK25. But then we just have to adjust the matching a bit, maybe weaken the IR rule for backports. But I'd prefer not to weaken the IR rule on mainline. >> >> What do you think? > >> But how do you now know that the `StoreL` is really coming from the merged `StoreI`, and that it is not some other unrelated `StoreL`? > > Well, because there are no `long` stores in Java code at all, so whatever that `StoreL` came from, it is JIT-generated? 
So then the IR test verifies that whatever happens with EA and MergeStores makes sure the store either goes away, or some merged store remains. I personally dislike overly-specific tests that rely on particulars of optimization sequencing or some such, and would rather have a test that checks the generic final state, without over-specificity. > >> The motivation seems to be that printing of store nodes was a bit different in JDK25. But then we just have to adjust the matching a bit, maybe weaken the IR rule for backports. But I'd prefer not to weaken the IR rule on mainline. > > Yes. I mean, there is a tradeoff somewhere here: either mainline relaxes the test and then JDK 25 matches the test version, or JDK 25 diverges. We _usually_ try to avoid divergences, if we can, because they continuously bite us. If your preference about not relaxing the mainline version is strong, then I can yield and diverge JDK 25. It would likely be literally the same fix I have here. @shipilev Thanks for your response. I'm trying to think through best practices below. As far as I see, this change is motivated by a backport to JDK25, where either the printing of the MergeStore'd StoreL looks different, or MergeStores just does not happen. Since there is a functional difference between JDK25 and JDK26, it is not surprising that we need to adjust the IR rules. > Well, because there are no long stores in Java code at all, so whatever that StoreL came from, it is JIT-generated? Are we sure there cannot be another `StoreL` coming from somewhere else than the Java code? @shipilev The IR rule failed on JDK25, and must have printed the `StoreL`, right? How does that line look like? Maybe it is a really minor difference, and we just adjust the IR rules for the backport ever so slightly. > We usually try to avoid divergences, if we can, because they continuously bite us. I understand that issue, yes. 
A divergence on backporting that would bite us a little less in terms of test failures in older JDKs: just relax/remove IR rules, and rely on the functional part of the test. There is a risk that the reproducer would not reproduce the issue any more because optimizations are reordered. The failing IR rule would detect that the test is now optimized differently, which is a hint that things are not tested the same way, and probably the original bug would not be reproduced any more. The test is likely ineffective now. But it would also be a bit unclear how to then address that without immense effort (to write a different test). But I do think that mainline, and newer JDKs going forward, should have more specific IR rules. Should such an IR rule fail for some reason because of new VM changes, then it is probably worth looking into that test. The test is likely to trigger some related behavior. Worst case we can at that point relax/remove the IR rule. > I personally dislike overly-specific tests that rely on particulars of optimization sequencing Right, that is a bit of a risk. I could see a scenario where one is trying to reorder some optimizations, and then suddenly hundreds of IR rules fail because they were too specific. That would be annoying. Maybe @shipilev is right here: maybe we should not use IR rules in regression tests where a very specific ordering of optimizations was required. However, the IR rules would give us a hint that the regression test is still likely to test what we think it tests, so it would increase our confidence. But the question is what one is supposed to do when the IR rule fails... Maybe IR rules should only be used for relatively "simple" tests, that only check that a single optimization happened, and not a very specific order/interaction of optimizations? These are just my thoughts. I have very little experience with backports, so I'd really love to hear some general guidance on this from @vnkozlov . 
Maybe @chhagedorn also has an idea on the use of IR rules for regression tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28437#issuecomment-3574041572 From vlivanov at openjdk.org Tue Nov 25 07:14:59 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 25 Nov 2025 07:14:59 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v18] In-Reply-To: <-tq0SbuGsKQSgaK6yNEG_YAIHMYEcd8_YdlENAWeWLY=.c91e3959-5a21-42ee-838e-353e707063a8@github.com> References: <-tq0SbuGsKQSgaK6yNEG_YAIHMYEcd8_YdlENAWeWLY=.c91e3959-5a21-42ee-838e-353e707063a8@github.com> Message-ID: On Tue, 25 Nov 2025 06:19:13 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/x86.ad line 2641: >> >>> 2639: } >>> 2640: >>> 2641: if (mdef->num_opnds() <= oper_index || mdef->operand_index(oper_index) < 0 || >> >> IMO all 4 checks (plus, `mdef->in(mdef->operand_index(oper_index)) != null`) can be considered as structural invariants for mach nodes marked as NDD demotable. So, I'd like to see an assert ensuring they don't fail for NDD-demotable nodes. >> >> Speaking of the code shape, I suggest to restore previous shape with explicit checks before the switch and add asserts before returning negative result: >> >> static boolean is_ndd_demotable() { >> return ((mdef->flags() & Node::PD::Flag_ndd_demotable) != 0) || >> ((mdef->flags() & Node::PD::Flag_ndd_demotable_commutative) != 0)); >> } >> >> bool Matcher::is_register_biasing_candidate(const MachNode* mdef, int oper_index) { >> if (mdef == nullptr) { >> return false; >> } >> >> if (mdef->num_opnds() <= oper_index || >> mdef->operand_index(oper_index) < 0 || >> mdef->in(mdef->operand_index(oper_index)) != nullptr) { >> assert(!is_ndd_demotable(mdef), "%s", mdef->Name()); >> return false; >> } >> >> // Complex memory operand covers multiple incoming edges needed for >> // address computation, biasing def towards any address component will not >> // result into NDD demotion by assembler. 
>> if (mdef->operand_num_edges(oper_index) != 1) { >> assert(!is_ndd_demotable(mdef), "%s", mdef->Name()); >> return false; >> } >> >> // Demotion candidate must be register mask compatible with definition. >> const RegMask& oper_mask = mdef->in_RegMask(mdef->operand_index(oper_index)); >> if (!oper_mask.overlap(mdef->out_RegMask())) { >> assert(!is_ndd_demotable(mdef), "%s", mdef->Name()); >> return false; >> } >> >> switch (oper_index) { >> // First operand of MachNode corresponding to Intel APX NDD selection >> // pattern can share its assigned register with definition operand if >> // their live ranges do not overlap. In such a scenario we can demote >> // it to legacy map0/map1 instruction by replacing its 4-byte extended >> // EVEX prefix with shorter REX/REX2 encoding. Demotion candidates >> // are decorated with a special flag by instruction selector. >> case 1: return is_ndd_demotable(mdef); >> >> // Definition operand of commutative operation can be biased towards second operand. >> case 2: return (mdef->flags() & Node::PD::Flag_ndd_demotable_commutative) != 0; >> >> // Current scheme only... > >> if (mdef->operand_num_edges(oper_index) != 1) { >> assert(!is_ndd_demotable(mdef), "%s", mdef->Name()); >> return false; >> } > > We don't need this assertion, NDD commutative operation MachNode may have its first input as memory, but we may pick second input for biasing. Do you have `addI_rReg_rReg_mem_ndd` case in mind here? (It matches `Set dst (AddI src1 (LoadI src2))` and is marked as `Flag_ndd_demotable_commutative`). >> src/hotspot/share/opto/chaitin.cpp line 1475: >> >>> 1473: >>> 1474: OptoReg::Name PhaseChaitin::select_bias_lrg_color(LRG &lrg) { >>> 1475: uint bias_lrg1_idx = lrg._copy_bias; >> >> Do I get it right that `_copy_bias2 != 0` implies `_copy_bias != 0`? Can you enforce it? (Here and in `PhaseChaitin::bias_color` where `_copy_bias2` is initialized.) > > Bias live ranges are indipendent while marking and during enforcement. 
Do I get it right that memory operands in the first position for commutative operations are the reason why `copy_bias` is invalid while `copy_bias2` is not? >> src/hotspot/share/opto/chaitin.cpp line 1498: >> >>> 1496: // the chances of register sharing once the bias live range >>> 1497: // becomes the part of IFG. >>> 1498: uint bias_lrg = lrgs(bias_lrg1_idx).degrees_of_freedom() > >> >> How does it work when `bias_lrg1_idx` or `bias_lrg2_idx` indices are 0? Original code had a guard against such condition. > > The original code performed an upfront non-zero check on copy_bias and then either selected the copy bias or constrained the register mask of the definition. Now we also check both copy biases upfront and select the first or second bias, each guarded by a non-zero bias check; otherwise, if none of the live ranges are part of the IFG, we select the bias with the minimum degrees of freedom. So, the implicit assumption here (enforced by the caller) is that at least one of `bias_lrg1_idx` and `bias_lrg2_idx` values should be non-zero. At least, it deserves an assert here. >> src/hotspot/share/opto/chaitin.cpp line 1538: >> >>> 1536: >>> 1537: // Try biasing the color with non-interfering bias live range[s]. >>> 1538: if (lrg._copy_bias != 0 || lrg._copy_bias2 != 0) { >> >> IMO you can drop `(lrg._copy_bias != 0 || lrg._copy_bias2 != 0)` guard. Original code didn't check it and there are enough guards in `select_bias_lrg_color` to catch it. > > Hi @iwanowww , guard checks were also part of [stock code](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/chaitin.cpp#L1496) The check you pointed at is performed on the result of the LRG lookup (which is placed in `select_bias_lrg_color()` now) and not on the original index stored in `_copy_bias`/`_copy_bias2`. 
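To make the precondition under discussion concrete, here is a toy model of choosing between two bias indices. It is only a sketch under stated assumptions and is not HotSpot code: the class, the method, the plain `int[]` standing in for live-range data, and the tie-break toward the first bias are all hypothetical. Index 0 is taken to mean "no bias recorded", and the caller-enforced assumption that at least one index is non-zero is checked explicitly, which is the role the requested assert would play. The real `select_bias_lrg_color` also considers whether the live ranges are already part of the IFG, which this model leaves out.

```java
// Toy model only; this is not HotSpot code and omits the IFG checks.
final class BiasChoiceSketch {
    // bias1 and bias2 index into degreesOfFreedom; 0 means "no bias recorded".
    static int chooseBias(int bias1, int bias2, int[] degreesOfFreedom) {
        // Make the caller-enforced precondition explicit.
        if (bias1 == 0 && bias2 == 0) {
            throw new IllegalStateException("at least one bias index must be non-zero");
        }
        // With a single candidate there is nothing to compare.
        if (bias1 == 0) {
            return bias2;
        }
        if (bias2 == 0) {
            return bias1;
        }
        // With two candidates, prefer the more constrained one
        // (fewer degrees of freedom); ties go to the first bias (an assumption).
        return degreesOfFreedom[bias1] <= degreesOfFreedom[bias2] ? bias1 : bias2;
    }
}
```

Checking the precondition at the top keeps index 0 from ever being used for a lookup, which is the failure mode the question about zero indices is worried about.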
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2558778478 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2558787562 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2558768375 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2558758109 From duke at openjdk.org Tue Nov 25 07:20:00 2025 From: duke at openjdk.org (Shawn M Emery) Date: Tue, 25 Nov 2025 07:20:00 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v5] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 17:00:04 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Address review comments. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Remove K from AES_Crypt > - More minor cleanup. > - Improve comment and minor cleanup. 
> - 8371820: Further AES performance improvements for key schedule generation src/hotspot/share/opto/library_call.cpp line 7236: > 7234: address stubAddr = nullptr; > 7235: const char *stubName = nullptr; > 7236: bool is_decrypt= false; nit: s/is_decrypt= false/is_decrypt = false/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28299#discussion_r2558797828 From dskantz at openjdk.org Tue Nov 25 07:50:40 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Tue, 25 Nov 2025 07:50:40 GMT Subject: RFR: 8362117: C2: compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java fails with a wrong result due to invalidated liveness assumptions for data phis [v2] In-Reply-To: <11lcsXkMGpKMQr60NCKofzldqpnJka1XZtrGRrUai3o=.c2201234-bbf2-465a-b237-cd9fe8505491@github.com> References: <11lcsXkMGpKMQr60NCKofzldqpnJka1XZtrGRrUai3o=.c2201234-bbf2-465a-b237-cd9fe8505491@github.com> Message-ID: On Wed, 3 Sep 2025 08:02:04 GMT, Daniel Skantz wrote: >> This PR addresses a wrong compilation during string optimizations. >> >> During stacked string concatenation of two StringBuilder links SB1 and SB2, the pattern "append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString" may be observed, where toString is the end of SB1, and the simple diamond is part of SB2. >> >> After JDK-8291775, the Bool test to the diamond If is set to a constant zero to allow for folding the simple diamond away during IGVN, while not letting the top() value from the result projection of SB1 propagate through the graph too quickly. The assumption was that any data Phi of the Region would go away during PhaseRemoveUseless as they are no longer live -- I think that in the case of JDK-8291775, the user of phi was the constructor of SB2. However, in the attached test case, the Phi stays live as it's a parameter (input to an append) of SB2 and will be used during the transformation in `copy_string`. 
When the diamond region is later folded, the Phi's user picks up the wrong input corresponding to the false branch. >> >> The proposed solution is to disable the stacked concatenation optimization for this specific pattern. This might be pragmatic as it's an edge case and there's already a bug tail: JDK-8271341 -> JDK-8291775 -> JDK-8362117. >> >> Testing: T1-3 (aed5952). >> >> Extra testing: ran T1-3 on Linux with an instrumented build and verified that the pattern I am excluding in this PR is not seen during any other compilation than that of the proposed regression test. > > Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision: > > - store intermediate calculations > - direction convention A comment to keep this PR active. I think the current approach is at least in the right direction. As Dean points out, the dependency validation logic might be insufficient (seen in e.g. JDK-8367405), but these diamond patterns are treated as a special case in string concat code. The current bug is the result of assuming diamond-region phis are not live [1], but that can't be done if they are parameters of the concatenation. [1] https://github.com/openjdk/jdk/blob/cc5b35bf69dcf9e7e8037642c94e8d7e5847952d/src/hotspot/share/opto/stringopts.cpp#L275 ------------- PR Comment: https://git.openjdk.org/jdk/pull/27028#issuecomment-3574184332 From shade at openjdk.org Tue Nov 25 08:45:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Nov 2025 08:45:22 GMT Subject: RFR: 8372266: Relax store matchers in compiler/escapeAnalysis/TestRematerializeObjects.java test In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 16:41:54 GMT, Aleksey Shipilev wrote: >> But how do you now know that the `StoreL` is really coming from the merged `StoreI`, and that it is not some other unrelated `StoreL`? The info that it comes from an int-array is relevant here `int[int:4]`, don't you think? 
>> >> What about all the other MergeStores IR tests? For consistency you would now have to adjust those too, but I hope you don't do that ;) >> `./test/hotspot/jtreg/compiler/c2/TestMergeStores.java` >> >> The motivation seems to be that printing of store nodes was a bit different in JDK25. But then we just have to adjust the matching a bit, maybe weaken the IR rule for backports. But I'd prefer not to weaken the IR rule on mainline. >> >> What do you think? > >> But how do you now know that the `StoreL` is really coming from the merged `StoreI`, and that it is not some other unrelated `StoreL`? > > Well, because there are no `long` stores in Java code at all, so whatever that `StoreL` came from, it is JIT-generated? So then the IR test verifies that whatever happens with EA and MergeStores makes sure the store either goes away, or some merged store remains. I personally dislike overly-specific tests that rely on particulars of optimization sequencing or some such, and would rather have a test that checks the generic final state, without over-specificity. > >> The motivation seems to be that printing of store nodes was a bit different in JDK25. But then we just have to adjust the matching a bit, maybe weaken the IR rule for backports. But I'd prefer not to weaken the IR rule on mainline. > > Yes. I mean, there is a tradeoff somewhere here: either mainline relaxes the test and then JDK 25 matches the test version, or JDK 25 diverges. We _usually_ try to avoid divergences, if we can, because they continuously bite us. If your preference about not relaxing the mainline version is strong, then I can yield and diverge JDK 25. It would likely be literally the same fix I have here. > @shipilev The IR rule failed on JDK25, and must have printed the StoreL, right? How does that line look like? 
Maybe it is a really minor difference, and we just adjust the IR rules for the backport ever so slightly. It is `STORE_L`, but not `STORE_L_OF_CLASS`, AFAICS, because there is apparently some subtlety in optimizations. > Maybe @shipilev is right here: maybe we should not use IR rules in regression tests where a very specific ordering of optimizations was required. However, the IR rules would give us a hint that the regression test is still likely to test what we think it tests, so it would increase our confidence. But the question is what one is supposed to do when the IR rule fails... To be precise, I think IR rules that depend on _overly precise_ node attributes are borderline flaky. It would not help when things like [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improve node attribute reporting, and we accidentally rely on them in IR tests. I really do think that IR tests are useful, but are only solid for the clear-cut cases of "this node type is definitely not present in the final graph shape" vs "this node is expected to be there in the graph at least once". Everything else relies on the dice continuing to land in a particular way. Then again, in this particular case, it is not that important at all. I was just following SOP that if we figure the test is flaky (for some definition of it) during backports, we do the test fix in mainline, and then backport the test fix along. If we disagree this one is the flaky test, that's also fine, I'll just diverge JDK 25.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28437#issuecomment-3574404864 From duke at openjdk.org Tue Nov 25 09:00:54 2025 From: duke at openjdk.org (Shawn M Emery) Date: Tue, 25 Nov 2025 09:00:54 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v8] In-Reply-To: References: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> Message-ID: On Mon, 24 Nov 2025 19:10:35 GMT, Jiangli Zhou wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed the ENCRYPT_16_BLKS fall through case that sviswa7 pointed out in PR review. > >> It looks good to me now. Please close JDK-8372364 as it was an artifact of the prior fix. > > @sviswa7 thanks for reviewing! > >> @jianglizhou Please wait until someone from the Security Group reviews this - thanks. > > Will do. Thanks. @jianglizhou I'm currently reviewing and testing pre and post changes. Will provide updates shortly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28363#issuecomment-3574489419 From jbhateja at openjdk.org Tue Nov 25 09:05:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 25 Nov 2025 09:05:54 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v4] In-Reply-To: References: Message-ID: > Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support. > - Extend the existing VectorAPI JTREG test suite for the newly added Float16Vector operations.
> > The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). > > The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > [image: performance numbers, not preserved in this archive] > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback. > > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Fix failing jtreg test in CI ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28002/files - new: https://git.openjdk.org/jdk/pull/28002/files/f34d324f..aca6cc5d Webrevs: - full: Webrev is not available because diff is too large - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28002&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From dbriemann at openjdk.org Tue Nov 25 09:09:32 2025 From: dbriemann at openjdk.org (David Briemann) Date: Tue, 25 Nov 2025 09:09:32 GMT Subject: RFR: 8367487: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on Linux aarch64 with Cavium CPU In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 14:24:19 GMT, David Briemann wrote: > Fix by requiring both floating-point half-precision (fphp) and advanced SIMD half-precision (asimdhp) on aarch64. Hi @eme64 it would be great if you could have a look at this fix since you authored this test. Thanks.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28390#issuecomment-3574565404 From epeter at openjdk.org Tue Nov 25 09:13:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Nov 2025 09:13:56 GMT Subject: RFR: 8372266: Relax store matchers in compiler/escapeAnalysis/TestRematerializeObjects.java test In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 08:41:50 GMT, Aleksey Shipilev wrote: >>> But how do you now know that the `StoreL` is really coming from the merged `StoreI`, and that it is not some other unrelated `StoreL`? >> >> Well, because there are no `long` stores in Java code at all, so whatever that `StoreL` came from, it is JIT-generated? So then the IR test verifies that whatever happens with EA and MergeStores makes sure the store either goes away, or some merged store remains. I personally dislike overly-specific tests that rely on particulars of optimization sequencing or some such, and would rather have a test that checks the generic final state, without over-specificity. >> >>> The motivation seems to be that printing of store nodes was a bit different in JDK25. But then we just have to adjust the matching a bit, maybe weaken the IR rule for backports. But I'd prefer not to weaken the IR rule on mainline. >> >> Yes. I mean, there is a tradeoff somewhere here: either mainline relaxes the test and then JDK 25 matches the test version, or JDK 25 diverges. We _usually_ try to avoid divergences, if we can, because they continuously bite us. If your preference about not relaxing the mainline version is strong, then I can yield and diverge JDK 25. It would likely be literally the same fix I have here. > >> @shipilev The IR rule failed on JDK25, and must have printed the StoreL, right? How does that line look like? 
Maybe it is a really minor difference, and we just adjust the IR rules for the backport ever so slightly > > It is `STORE_L`, but not `STORE_L_OF_CLASS`, AFAICS, because there is apparently some subtlety in optimizations. > >> Maybe @shipilev is right here: maybe we should not use IR rules in regression tests where a very specific ordering of optimizations was required. However, the IR rules would give us a hint that the regression test is still likely to test what we think it tests, so it would increase our confidence. But the question is what one is supposed to do when the IR rule fails... > > To be precise, I think IR rules that depend on _overly precise_ node attributes are borderline flaky. It would not help when things like [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improve node attribute reporting, and we accidentally rely on them in IR tests. > > I really do think that IR tests are useful, but are only solid for the clear-cut cases of "this node type is definitely not present in the final graph shape" vs "this node is expected to be there in the graph at least once". Everything else relies on the dice continuing to land in a particular way. > > Then again, in this particular case, it is not that important at all. I was just following SOP that if we figure the test is flaky (for some definition of it) during backports, we do the test fix in mainline, and then backport the test fix along. If we disagree this one is the flaky test, that's also fine, I'll just diverge JDK 25. @shipilev Can you show the dump of the `StoreL` on JDK26 vs JDK25? > It would not help when things like [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improve node attribute reporting, and we accidentally rely on them in IR tests. Right. That is a concern. But I would argue that in this case here it is not accidental. We check that it is a `StoreL` to an `int[]`, which is very specific to `MergeStore`.
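
For readers less familiar with why a `StoreL` into an `int[]` reads as a MergeStores fingerprint, here is a minimal sketch in plain Java. It does not run C2 or touch any HotSpot internals. It only illustrates, under the assumption of a little-endian memory layout, that two adjacent 32-bit stores leave the same bytes behind as one 64-bit store of the combined value. The class and method names (`MergeStoresSketch`, `twoIntStores`, `oneLongStore`) are hypothetical and chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration only: models the memory of an int[2] as 8 raw bytes.
final class MergeStoresSketch {

    // Two adjacent 32-bit stores, standing in for a[0] = lo; a[1] = hi (two StoreI).
    static byte[] twoIntStores(int lo, int hi) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0, lo);
        buf.putInt(4, hi);
        return buf.array();
    }

    // One 64-bit store of the combined value, standing in for a single merged StoreL.
    static byte[] oneLongStore(int lo, int hi) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        // Mask lo so its sign bit does not leak into the upper half.
        long merged = (lo & 0xFFFF_FFFFL) | ((long) hi << 32);
        buf.putLong(0, merged);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] separate = twoIntStores(0x11223344, 0x55667788);
        byte[] merged = oneLongStore(0x11223344, 0x55667788);
        System.out.println(Arrays.equals(separate, merged));
    }
}
```

Since the Java source of such a test contains no `long` stores, a 64-bit store observed on an `int[]` would most plausibly come from this kind of merging, which is the property the more specific IR rule tries to pin down.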
------------- PR Comment: https://git.openjdk.org/jdk/pull/28437#issuecomment-3574583613 From epeter at openjdk.org Tue Nov 25 09:18:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Nov 2025 09:18:31 GMT Subject: RFR: 8367487: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on Linux aarch64 with Cavium CPU In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 14:24:19 GMT, David Briemann wrote: > Fix by requiring both floating-point half-precision fphp advanced SIMD half-precision (asimdhp) on aarch64. Looks reasonable. I'll run some sanity testing :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28390#issuecomment-3574591776 From mdoerr at openjdk.org Tue Nov 25 09:25:25 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 25 Nov 2025 09:25:25 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v6] In-Reply-To: References: Message-ID: <_BiA3wQ_PuxbuapWJg0uG2PSv0_0AAPOmznFOTH4hcU=.08997b37-2cde-417f-891a-779bd7291b1f@github.com> > This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. > > The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Fix missing whitespace. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/28299/files - new: https://git.openjdk.org/jdk/pull/28299/files/30b5b531..ae84912d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28299&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28299&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28299.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28299/head:pull/28299 PR: https://git.openjdk.org/jdk/pull/28299 From mdoerr at openjdk.org Tue Nov 25 09:25:31 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 25 Nov 2025 09:25:31 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v5] In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 07:16:08 GMT, Shawn M Emery wrote: >> Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Address review comments. >> - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt >> - Remove K from AES_Crypt >> - More minor cleanup. >> - Improve comment and minor cleanup. >> - 8371820: Further AES performance improvements for key schedule generation > > src/hotspot/share/opto/library_call.cpp line 7236: > >> 7234: address stubAddr = nullptr; >> 7235: const char *stubName = nullptr; >> 7236: bool is_decrypt= false; > > nit: s/is_decrypt= false/is_decrypt = false/ Thanks for catching this! Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28299#discussion_r2559217516 From aseoane at openjdk.org Tue Nov 25 09:31:30 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Tue, 25 Nov 2025 09:31:30 GMT Subject: RFR: 8280283: Dead compiler code found during the JDK-8272058 code review Message-ID: This PR removes some dead code that was found during review for [JDK-8272058](https://bugs.openjdk.org/browse/JDK-8272058). `target_addr_for_insn_or_null` is never run with a `ldrw` to `zr` (i.e. a safepoint poll). This is just a remnant from global safepointing, before we moved to using thread-local handshakes. No safepoint polling code reaches this function. More information can be read in the [original code review](https://github.com/openjdk/jdk18/pull/51#discussion_r774922087). Additionally, I have run tiers 1-6 to make sure this path is not exercised. This changeset also cleans up the unused `is_nop` function, following the comments in the issue. Other dead code mentioned there has long since disappeared.
**Testing:** passes tiers 1-4 ------------- Commit messages: - Remove more dead code - Remove dead code Changes: https://git.openjdk.org/jdk/pull/28473/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28473&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8280283 Stats: 21 lines in 4 files changed: 0 ins; 19 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28473.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28473/head:pull/28473 PR: https://git.openjdk.org/jdk/pull/28473 From aseoane at openjdk.org Tue Nov 25 09:31:31 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Tue, 25 Nov 2025 09:31:31 GMT Subject: RFR: 8280283: Dead compiler code found during the JDK-8272058 code review In-Reply-To: References: Message-ID: <5qBb-5e7j0wN1xNgWq5zLeF7-FxuIWqQjK2UCjaSivI=.9b96ab55-abd0-4021-a621-803dff1508b5@github.com> On Mon, 24 Nov 2025 09:26:13 GMT, Anton Seoane Ampudia wrote: > This PR removes some dead code that was found during review for [JDK-8272058](https://bugs.openjdk.org/browse/JDK-8272058). > > `target_addr_for_insn_or_null` is never run with a `ldrw` to `zr` (i.e. a safepoint poll). This is just a remnant from global safepointing, before we moved to using thread-local handshakes. No safepoint polling code reaches this function. More information can be read in the [original code review](https://github.com/openjdk/jdk18/pull/51#discussion_r774922087). Additionally, I have run tiers 1-6 to make sure this path did not exercise. > > This changeset also cleans up the unused `is_nop` function, following the comments in the issue. Other dead code mentioned there has since been long disappered. 
> > **Testing:** passes tiers 1-4 Hey bot, please find my issue ------------- PR Comment: https://git.openjdk.org/jdk/pull/28473#issuecomment-3574602474 From mli at openjdk.org Tue Nov 25 09:42:46 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 25 Nov 2025 09:42:46 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v6] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <7kh5C9nj7bf6432cG35kDDvV6zhnKEspe8AcYetJ1do=.e1d9ebd3-d80d-4621-8c1e-c77dc721d0df@github.com> Message-ID: On Tue, 25 Nov 2025 02:38:52 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> fix is_unordered > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2141: > >> 2139: case BoolTest::gt: >> 2140: cmov_fp_cmp_fp_gt(op1, op2, dst, src, cmp_single, cmov_single); >> 2141: log_warning(jit)("Float/Double BoolTest::gt path is not tested well, please report the test case!"); > > My local tests show this does happen. Try this: > `$ make test TEST="./test/jdk/javax/sound/midi/Gervill/SoftFilter/TestProcessAudio.java" TEST_VM_OPTS="-XX:-TieredCompilation"` > > I think this could be a good reference if you want to add some extra tests for the two cases here. Thanks, I'll check it later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2559281739 From epeter at openjdk.org Tue Nov 25 10:05:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Nov 2025 10:05:14 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 16:56:21 GMT, Roland Westrelin wrote: >> I'm not sure if I'm correct, but I think speculative types themselves may not be consistent. 
For example, if they are consistent, then you will expect that the profiled types of the return values of a method `a` when calling from method `b` would be a subset of the profiled types of the returned values of `a` in general. However, this may not be the case, as we can ask for the second information first, then another type is introduced, then suddenly a method seems not to return a type `C`, but it does seem to return `C` if calling from `b`. As a result, maybe we can abandon trying to verify the correctness of speculative type computations. >> >> Additionally, in the test case, the speculative type being empty is correct, the path is speculatively unreachable, maybe we can use that information to cut off the branches, simplify the CFG for better compilation? > >> Additionally, in the test case, the speculative type being empty is correct, the path is speculatively unreachable, maybe we can use that information to cut off the branches, simplify the CFG for better compilation? > > Attached is another test case that reproduces the same issue. > [TestSpeculativeTypes.java](https://github.com/user-attachments/files/23659130/TestSpeculativeTypes.java) > > I run that one with: > > $ for i in `seq 100`; do java -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation -XX:CompileOnly=TestSpeculativeTypes::test2 -XX:CompileOnly=TestSpeculativeTypes::inlined3 -XX:CompileCommand=quiet -XX:TypeProfileLevel=200 -XX:+AlwaysIncrementalInline -XX:VerifyIterativeGVN=10 -XX:CompileCommand=dontinline,TestSpeculativeTypes::notInlined1 -XX:+StressIncrementalInlining TestSpeculativeTypes || break; done > > It usually fails after a few runs. That one has conflicting profile data but no dead path. What you're suggesting has some risk and unclear benefits so I think would need to be investigated separately. @rwestrel @merykitty @marc-chevalier Suggestion: - Marc adds Roland's new reproducer. 
- Marc annotates both the existing and Roland's new reproducer: show where the speculative types come from, and where they flow, and where the Phi with the conflicting speculative type is located. - We keep the speculative type verification. Because I'm not convinced that we should just remove verification yet. In future RFE's we could consider: - Removing verification for speculative types. But I wonder if that is smart. - Reconsider if we should really do the "widening" if we have `above_centerline`. As Roland said: there are risks to this. There must have been some historical reason for this. But who knows, maybe it hurts in more cases than it helps. Speculation is always based on a heuristic, and we would need a few benchmarks to show the impact, and we would also need to run some bigger benchmarks to see the impact. That is a lot of effort that would need to happen outside this bugfix. I suppose it is a tradeoff between code size (win if we cut paths) and cost of recompilation (if we fail speculative checks)? -------------------------------- I suppose it depends on what our definition of "consistent" is for speculative types. Of course they can be wrong at runtime, that is the whole point of speculation, so in a sense that makes them "inconsistent". But we could at least enforce some "consistency" in the computation that we do with the speculative types: intersections and unions of types should be done "consistently", I suppose. I'm trying to reconstruct Quan-Anh's definition of "consistent": > For example, if they are consistent, then you will expect that the profiled types of the return values of a method a when calling from method b would be a subset of the profiled types of the returned values of a in general. Does this kind of problem not also arise for non-speculative types, at least during CCP or IGVN? First I'm thinking of Casts, but maybe those are special. They can have narrower type than their input type. What about Phi nodes?
During IGVN, do we always expect the input types to be a subtype of the output type? I guess so? Observation: "widening" like that what happens here because of `above_centerline` and elsewhere like integer types with their "widening" (e.g. widening because loop phi continually grows the range) still means that input types are a subset of the output type. So according to Quan-Anh's definition, we _start_ with an inconsistent speculative type. But would it not be nice if it _eventually became consistent_, specifically after CCP/IGVN. And we could already get local consistency after every `Value` call, right? If yes: it would be nice to have _verification_ for that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28331#issuecomment-3574746918 From mchevalier at openjdk.org Tue Nov 25 10:05:14 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 25 Nov 2025 10:05:14 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v3] In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 09:51:52 GMT, Emanuel Peter wrote: > In future RFE's we could consider: I might have missed something, but which solution would you then include for now? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28331#issuecomment-3574785049 From epeter at openjdk.org Tue Nov 25 10:05:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Nov 2025 10:05:14 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v3] In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 09:59:49 GMT, Marc Chevalier wrote: > > In future RFE's we could consider: > > I might have missed something, but which solution would you then include for now? 
The one you have now: one more round of filtering to get to fixpoint ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28331#issuecomment-3574795335 From epeter at openjdk.org Tue Nov 25 10:07:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Nov 2025 10:07:47 GMT Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax [v2] In-Reply-To: References: Message-ID: <00bVdr5iVqT_AHQiazRM_X9hadRH8_xOJntKv9LCpyQ=.34ee4851-8d64-46a8-8eee-edb67093dee8@github.com> On Fri, 21 Nov 2025 12:53:00 GMT, Galder Zamarreño wrote: >> Trivial cleanup to move tests out of a test class whose description does not match these tests > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Change copyright to Amazon test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxLongLoopBarrier.java line 1: > 1: /* You might as well also find a better home for this test. We are trying to move away from `irTests`, and move tests to directories with the relevant "topic". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28385#discussion_r2559374376 From epeter at openjdk.org Tue Nov 25 10:13:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Nov 2025 10:13:48 GMT Subject: RFR: 8370489: Some compiler tests miss the @key randomness In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 23:31:16 GMT, Saranya Natarajan wrote: > **Issue:** Some compiler tests use randomization but do not have `@key randomness` in the jtreg header. > > **Fix:** The test cases that did not have `@key randomness` were listed using `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"`. This PR adds `@key randomness` to these tests.
> > **Note:** The following tests that are still listed with `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"` after this PR are confirmed to be helper or support file for actual test. > _test/hotspot/jtreg/compiler/codegen/aes/TestAESBase.java > test/hotspot/jtreg/compiler/compilercontrol/jcmd/StressAddJcmdBase.java > test/hotspot/jtreg/compiler/compilercontrol/parser/HugeDirectiveUtil.java > test/hotspot/jtreg/compiler/compilercontrol/share/scenario/CommandGenerator.java > test/hotspot/jtreg/compiler/lib/ir_framework/test/TestVM.java > test/hotspot/jtreg/compiler/lib/ir_framework/test/ArgumentValue.java > test/hotspot/jtreg/compiler/lib/ir_framework/AbstractInfo.java > test/hotspot/jtreg/compiler/lib/ir_framework/CompLevel.java > test/hotspot/jtreg/compiler/lib/generators/Generators.java > test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java > test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java > test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java > test/hotspot/jtreg/compiler/intrinsics/mathexact/Verify.java > test/hotspot/jtreg/compiler/intrinsics/bmi/BMITestRunner.java > test/hotspot/jtreg/compiler/intrinsics/unsafe/ByteBufferTest.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressBooleanArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressIntArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressLongArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressCharArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressObjectArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressByteArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressFloatArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressShortArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressDoubleArrayCopy.java > test/hotspot/jtreg/compiler/codecache/cli/codeheapsize/JVMStartupRunner.java > 
test/hotspot/jtreg/compiler/vectorapi/reshape/utils/VectorReshapeHelper.java > test/hotspot/jtreg/compiler/jvmci/compilerToVM/DummyClass.java_ @sarannat Thanks for working on this. Looks good. Though you should also look at the suggestions from the others above ;) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28463#pullrequestreview-3504316119 From duke at openjdk.org Tue Nov 25 10:19:31 2025 From: duke at openjdk.org (Harshit470250) Date: Tue, 25 Nov 2025 10:19:31 GMT Subject: RFR: 8370920: [s390] C2: add instruction size in s390.ad file [v8] In-Reply-To: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> References: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> Message-ID: > This PR adds the size of the match rule nodes. > > There were a lot of nodes for which the size was variable; for those nodes I have taken the maximum possible size. Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge remote-tracking branch 'origin/master' - remove whitespace - Resolved a bug - ...
and 8 more: https://git.openjdk.org/jdk/compare/c2cc62be...077d0258 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28054/files - new: https://git.openjdk.org/jdk/pull/28054/files/b094f06d..077d0258 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=06-07 Stats: 311 lines in 19 files changed: 234 ins; 25 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/28054.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28054/head:pull/28054 PR: https://git.openjdk.org/jdk/pull/28054 From epeter at openjdk.org Tue Nov 25 10:50:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Nov 2025 10:50:28 GMT Subject: RFR: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge In-Reply-To: References: Message-ID: <7mAec3_RM8OXtjnFF8HtWXqjdEtBPLYjtPX9qhScBEk=.90ae6164-422d-40f6-9383-6a625d017dc4@github.com> On Wed, 19 Nov 2025 12:19:02 GMT, Christian Hagedorn wrote: > ### Strong Connection between Template Assertion Predicate and Counted Loop > In [JDK-8350579](https://bugs.openjdk.org/browse/JDK-8350579), we fixed the issue that a Template Assertion Predicate for a folded loop A could end up at another loop B. We then created an Initialized Assertion Predicate at loop B from the template of loop A and used the values from the already folded, completely unrelated loop A . As a result, we crashed with a halt because loop B violated the predicate with the wrong values. As a fix, we established a strong connection between Template Assertion Predicates and their associated loop node by adding a direct link from `OpaqueTemplateAssertionPredicate` -> `CountedLoop`. 
> > #### Maintaining this Property > In `PhaseIdealLoop::eliminate_useless_predicates()`, we walk through all counted loops and only keep those `OpaqueTemplateAssertionPredicate` nodes that can be found from the loop heads and are actually meant for this loop (using the strong connection): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1245-L1249 > > All other opaque nodes are removed. > > ### Additional Verification for Useless `OpaqueTemplateAssertionPredicate` Nodes > As an additional verification for `OpaqueTemplateAssertionPredicate` nodes that are found to be useless in `eliminate_useless_predicates()`, we check that in this case the `CountedLoop` is really dead (otherwise, we should have found the `OpaqueTemplateAssertionPredicate` in our walks through all loop): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1294-L1301 > > ### Violating the Additional Verification with `-XX:+StressLoopBackedge` > In `PhaseIdealLoop::duplicate_loop_backedge()`, we convert a loop with a merge point into two loops which should enable us to transform the new inner loop into a counted loop. This only makes sense for a `Loop` that is not a counted loop, yet. However, to stress the transformation, we can also run with `-XX:+StressDuplicateBackedge` that also transforms a counted loop into an inner and an outer loop. This is a problem when we have Template Assertion Predicates above a counted loop to be stressed: > > image > > After duplicate backedge, the Template Assertion Predicates are now at the outer non-counted `Loop`: > > 4190: > 4191: // Clone Template Assertion Predicates to a target loop. The target loop is the original, not-cloned loop. > 4192: // This is currently only used for StressDuplicateLoopBackedge. 
Also from the PR description: > Solution >The solution I propose is to clone the Template Assertion Predicates to the inner counted loop. This can be guarded with an ifdef ASSERT because it can only happen with StressLoopBackedge which is a develop flag. This is straight forward and solves this "opaque <-> counted loop" mismatching problem. This makes me a little nervous, applying a fix only to debug. But maybe I don't have to be nervous, let's see ;) Can you add some additional justification? What could go wrong in product if we don't have this fix in product? If I understand right: we just lose the predicates. That's not a correctness issue, we just can't optimize as much. And that's why we have your "Additional Verification". This is not about correctness but only about how much we can optimize, right? If so: you could consider writing that down a bit more explicitly, and also enhancing the message of the assert. What do you think? src/hotspot/share/opto/loopopts.cpp line 4210: > 4208: void visit(const TemplateAssertionPredicate& template_assertion_predicate) override { > 4209: _clone_predicate_to_loop.clone_template_assertion_predicate(template_assertion_predicate); > 4210: template_assertion_predicate.kill(_phase->igvn()); Are we killing the old ones? If so: `clone + kill old` -> `move`. I would suggest calling it `MoveAssertionPredicatesVisitor` src/hotspot/share/opto/loopopts.cpp line 4496: > 4494: > 4495: #ifdef ASSERT > 4496: if (StressDuplicateBackedge && head->is_CountedLoop()) { Could we somehow add an assert here? Would this work? `assert(StressDuplicateBackedge || !head->is_CountedLoop(), "counted loop only expected in stress mode");` That would give us some additional confidence that this only happens in debug. But then why limit the fix to debug, and not apply it to product too, just in case the assert fails? 
test/hotspot/jtreg/compiler/predicates/assertion/TestAssertionPredicates.java line 150: > 148: * @test id=DataUpdateZGC > 149: * @key randomness > 150: * @bug 8288981 8350577 Suggestion: * @bug 8288981 8350577 8360510 Just a suggestion, feel free to ignore if it makes no sense ;) test/hotspot/jtreg/compiler/predicates/assertion/TestStressDuplicateBackedgeWithAssertionPredicate.java line 28: > 26: * @bug 8360510 > 27: * @summary Test that StressDuplicateBackedge correctly clones Template Assertion Predicates to the inner counted loop. > 28: * @run main/othervm -Xbatch -XX:+IgnoreUnrecognizedVMOptions -XX:+StressDuplicateBackedge Would it be useful to have a run without any flags? test/hotspot/jtreg/compiler/predicates/assertion/TestStressDuplicateBackedgeWithAssertionPredicate.java line 62: > 60: // The Template Assertion Predicates are still at the outer loop. As a result, we find them to > 61: // be useless in the next predicate elimination call with EliminateUselessPredicates because > 62: // they cannot be found from the inner counted loop. However, we have verification code in place First: kudos on the annotations in this test! Really much appreciated :) I'm a bit confused here. Are you talking about the Template Assertion Predicates of the outer or inner loop here? Because you say both: - `Template Assertion Predicates are still at the outer loop` - `they cannot be found from the inner counted loop` Can you clarify?
------------- PR Review: https://git.openjdk.org/jdk/pull/28389#pullrequestreview-3504334956 PR Review Comment: https://git.openjdk.org/jdk/pull/28389#discussion_r2559480567 PR Review Comment: https://git.openjdk.org/jdk/pull/28389#discussion_r2559489861 PR Review Comment: https://git.openjdk.org/jdk/pull/28389#discussion_r2559498552 PR Review Comment: https://git.openjdk.org/jdk/pull/28389#discussion_r2559431674 PR Review Comment: https://git.openjdk.org/jdk/pull/28389#discussion_r2559408515 PR Review Comment: https://git.openjdk.org/jdk/pull/28389#discussion_r2559426922 From epeter at openjdk.org Tue Nov 25 10:50:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Nov 2025 10:50:29 GMT Subject: RFR: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge In-Reply-To: <7mAec3_RM8OXtjnFF8HtWXqjdEtBPLYjtPX9qhScBEk=.90ae6164-422d-40f6-9383-6a625d017dc4@github.com> References: <7mAec3_RM8OXtjnFF8HtWXqjdEtBPLYjtPX9qhScBEk=.90ae6164-422d-40f6-9383-6a625d017dc4@github.com> Message-ID: <8sH-bnDA7b3cJdAgGpQstkP6uwtVOGWVNGTNoF6oXiA=.9ed1812a-f052-41c9-909e-cf1ade6b707f@github.com> On Tue, 25 Nov 2025 10:21:26 GMT, Emanuel Peter wrote: >> ### Strong Connection between Template Assertion Predicate and Counted Loop >> In [JDK-8350579](https://bugs.openjdk.org/browse/JDK-8350579), we fixed the issue that a Template Assertion Predicate for a folded loop A could end up at another loop B. We then created an Initialized Assertion Predicate at loop B from the template of loop A and used the values from the already folded, completely unrelated loop A . As a result, we crashed with a halt because loop B violated the predicate with the wrong values. As a fix, we established a strong connection between Template Assertion Predicates and their associated loop node by adding a direct link from `OpaqueTemplateAssertionPredicate` -> `CountedLoop`. 
>> >> #### Maintaining this Property >> In `PhaseIdealLoop::eliminate_useless_predicates()`, we walk through all counted loops and only keep those `OpaqueTemplateAssertionPredicate` nodes that can be found from the loop heads and are actually meant for this loop (using the strong connection): >> https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1245-L1249 >> >> All other opaque nodes are removed. >> >> ### Additional Verification for Useless `OpaqueTemplateAssertionPredicate` Nodes >> As an additional verification for `OpaqueTemplateAssertionPredicate` nodes that are found to be useless in `eliminate_useless_predicates()`, we check that in this case the `CountedLoop` is really dead (otherwise, we should have found the `OpaqueTemplateAssertionPredicate` in our walks through all loop): >> https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1294-L1301 >> >> ### Violating the Additional Verification with `-XX:+StressLoopBackedge` >> In `PhaseIdealLoop::duplicate_loop_backedge()`, we convert a loop with a merge point into two loops which should enable us to transform the new inner loop into a counted loop. This only makes sense for a `Loop` that is not a counted loop, yet. However, to stress the transformation, we can also run with `-XX:+StressDuplicateBackedge` that also transforms a counted loop into an inner and an outer loop. This is a problem when we have Template Assertion Predicates above a counted loop to be stressed: >> >> image >> >> After duplicate backedge, the Template Assertion Predicates are now at the outer non-... > > test/hotspot/jtreg/compiler/predicates/assertion/TestStressDuplicateBackedgeWithAssertionPredicate.java line 62: > >> 60: // The Template Assertion Predicates are still at the outer loop. 
As a result, we find them to >> 61: // be useless in the next predicate elimination call with EliminateUselessPredicates because >> 62: // they cannot be found from the inner counted loop. However, we have verification code in place > > First: cudos on the annotations in this test! Really much appreciated :) > > I'm a bit confused here. Are you talking about the Template Assertion Predicates of the outer or inner loop here? Because you say both: > - `Template Assertion Predicates are still at the outer loop` > - `they cannot be found from the inner counted loop` > Can you clarify? Ah, maybe the confusion comes from this: - We have an "inner" loop: the one that becomes empty and is removed. - And an "inner" loop from the duplicate backedge optimization. Is this correct? If yes, name them a bit more precisely ;) Also: are the new inner/outer loops nested? The ascii art suggests that they are in sequence. Maybe drawing some backedges could help? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28389#discussion_r2559459326 From epeter at openjdk.org Tue Nov 25 11:16:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Nov 2025 11:16:28 GMT Subject: RFR: 8371768: AArch64: test/hotspot/jtreg/compiler/loopopts/superword/TestReductions.java fails on SVE after JDK-8340093 In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 11:02:54 GMT, Aleksey Shipilev wrote: > Looks like the test should be more resilient with UseSVE > 0, which _can_ vectorise. It does not look all that reliable to me to failOn when vectorization actually happens. So I dropped some non-arch-specific rules, and amended AArch64-specific rules for UseSVE. > > Testing: > - [x] Linux AArch64 server fastdebug, affected test on machine with UseSVE=1 by default > - [x] Linux AArch64 server fastdebug, affected test on machine with UseSVE=0 overridden @shipilev Thanks for working on this! 
It is really largely my fault: I did not run testing for this test with SVE (I have no SVE machine), and also failed to ask externals to run testing. My bad, I will do better next time. However, I think there is some level of misunderstanding I hope to clear up below. In the time since this issue was filed, I finally learned how to use QEMU to run with SVE. So if this is all too much for you here, I could try to run this through QEMU and fix the rules. Background on `failOn`, or what I call "negative tests": I have found it quite helpful to not just test where optimizations succeed, but also where they fail, for some reason we can fix in the future. If anyone ever fixes the issue, then the IR rule fails. One can then go read the description, and even find the issue number. This allows us to detect that one might have fixed another issue, and close that as a duplicate. One would then be expected to adjust the IR rule and turn it from negative to positive. What do you think? Does that make sense to you? ------------------------------------------------------ > Looks like the test should be more resilient with UseSVE > 0, which can vectorise. I only ran testing on NEON, so there are most likely some cases that vectorize with SVE but not NEON.
The example I can find on JIRA right now: 1) Method "private static double compiler.loopopts.superword.TestReductions.doubleAddBig()" - [Failed IR rules: 1]: * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={}, applyIfPlatformOr={}, applyIfPlatform={}, failOn={"_#V#LOAD_VECTOR_D#_"}, applyIfOr={}, applyIfCPUFeatureAnd={"asimd", "true"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - failOn: Graph contains forbidden nodes: * Constraint 1: "(\d+(\s){2}(LoadVector.*)+(\s){2}===.*vector[A-Za-z])" - Matched forbidden nodes (18): * 1125 LoadVector === 857 7 874 |351 [[ 1127 1129 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; mismatched #vectorx<D,2> This comes from that rule: @IR(failOn = IRNode.LOAD_VECTOR_D, applyIfCPUFeatureAnd = {"asimd", "true"}) // I think this could vectorize, but currently does not. Filed: JDK-8370677 // But: it is not clear that it would be profitable, given the sequential reduction. Turns out this rule passes with NEON, but fails with SVE. Now you just removed the rule, together with my comments. I would prefer if we make the IR rules more specific, using `UseSVE=0`. Interestingly, that is what you did for `doubleMulBig`. You could then write an IR rule that works for `UseSVE>0`, if you like. Or not if you don't care. Up to you. > It does not look all that reliable to me to failOn when vectorization actually happens. It is not that this rule is "unreliable", it is simply wrong for SVE ;) But one rule that should not be incorrect, and you now removed it in many places: @IR(failOn = IRNode.LOAD_VECTOR_D, applyIf = {"AutoVectorizationOverrideProfitability", "= 0"}) Did you ever see these fail? Because `AutoVectorizationOverrideProfitability=0` disables vectorization, and so you really should not see any vectors here.
@shipilev One "quick fix" in case this just creates too much noise in your CI: just add a `@requires` that disables running the test on SVE. Then we can fix the IR rules later and enable the test for SVE again. test/hotspot/jtreg/compiler/loopopts/superword/TestReductions.java line 1234: > 1232: applyIf = {"AutoVectorizationOverrideProfitability", "> 0"}) > 1233: @IR(failOn = IRNode.LOAD_VECTOR_I, > 1234: applyIf = {"AutoVectorizationOverrideProfitability", "= 0"}) Did this rule ever fail? `AutoVectorizationOverrideProfitability=0` means we should NOT vectorize on any platform. That should also not happen with SVE. test/hotspot/jtreg/compiler/loopopts/superword/TestReductions.java line 1238: > 1236: IRNode.MUL_VI, "> 0"}, > 1237: applyIfCPUFeatureOr = {"asimd", "true"}, > 1238: applyIfAnd = {"AutoVectorizationOverrideProfitability", "> 0", "UseSVE", "0"}) You could add specific IR rules for specific lengths, using `IRNode.VECTOR_SIZE`, just grep for it, there are multiple tests. This also sounds like we should file a follow-up RFE? That was the whole point of the very specific IR rules: we want to have an overview what things do vectorize, and at what vector length: hopefully at the longest that the platform supports. test/hotspot/jtreg/compiler/loopopts/superword/TestReductions.java line 1608: > 1606: // cannot use the MulVL as the vector accumulator. > 1607: @IR(failOn = IRNode.LOAD_VECTOR_L, > 1608: applyIf = {"AutoVectorizationOverrideProfitability", "= 0"}) Same comment about `AutoVectorizationOverrideProfitability=0`: it should not vectorize on any platform. Also: why did you remove my comment about NEON? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28423#pullrequestreview-3504489937 PR Comment: https://git.openjdk.org/jdk/pull/28423#issuecomment-3575097794 PR Review Comment: https://git.openjdk.org/jdk/pull/28423#discussion_r2559526184 PR Review Comment: https://git.openjdk.org/jdk/pull/28423#discussion_r2559536779 PR Review Comment: https://git.openjdk.org/jdk/pull/28423#discussion_r2559540428 From bmaillard at openjdk.org Tue Nov 25 11:22:28 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Tue, 25 Nov 2025 11:22:28 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v3] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 09:56:10 GMT, Benoît Maillard wrote: >> This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing. >> >> The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for >> `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially >> introduced to get rid of unused `Load` nodes that were only kept alive by such an edge. >> >> >> >> https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259 >> >> In our case, it happens that the `Load` node gets folded to a constant during the initial >> `_gvn.transform` call in `GraphKit::make_load`. Because the value is converted before being >> returned, we end up with two constant nodes: one `ConL` and one `ConI`. The `ConL` only >> has one usage, and this triggers the optimization during verification.
>> >> >> static int test0() { >> var c = new MyClass(); >> // the conversion ensures that the ConL node only has one use >> // in the end, which triggers the optimization >> return (int) c.l; >> } >> >> >> The optimization is not triggered earlier when we apply `_gvn.transform` on the membar, >> because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in >> `PhaseGVN::transform`. >> >> For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created >> and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with >> `can_reshape` later. >> >> >> This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load` >> prevents it from occurring when boxing elimination is enabled. Boxing elimination is >> disabled temporarily in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)), >> which caused the issue to appear, but by using `-XX:-EliminateAutoBox` it became clear >> that the issue was on mainline. >> >> ### Testing >> - [x] [GitHub Actions](TODO) >> - [x] tier1-3, plus some internal testing >> >> Thank you for reviewing! > > Benoît Maillard has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into JDK-8367627 > - Add notification in Node::has_special_unique_user > - Add run with -XX:+AlwaysIncrementalInline, and add intermediate run for -XX:-DoEscapeAnalysis > - Record in GraphKit::insert_mem_bar_volatile for consistency > - Improve test and fix > - Add test The issue is fixed now. It seems that we also need to add the constant case in `Node::has_special_unique_user()`.
Apparently `-XX:+AlwaysIncrementalInline` delays folding of the `Load` node, which causes the pattern to only appear later, and then we need a notification when the number of uses changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28448#issuecomment-3575135812 From chagedorn at openjdk.org Tue Nov 25 12:37:22 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Nov 2025 12:37:22 GMT Subject: RFR: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge [v2] In-Reply-To: References: Message-ID: > ### Strong Connection between Template Assertion Predicate and Counted Loop > In [JDK-8350579](https://bugs.openjdk.org/browse/JDK-8350579), we fixed the issue that a Template Assertion Predicate for a folded loop A could end up at another loop B. We then created an Initialized Assertion Predicate at loop B from the template of loop A and used the values from the already folded, completely unrelated loop A . As a result, we crashed with a halt because loop B violated the predicate with the wrong values. As a fix, we established a strong connection between Template Assertion Predicates and their associated loop node by adding a direct link from `OpaqueTemplateAssertionPredicate` -> `CountedLoop`. > > #### Maintaining this Property > In `PhaseIdealLoop::eliminate_useless_predicates()`, we walk through all counted loops and only keep those `OpaqueTemplateAssertionPredicate` nodes that can be found from the loop heads and are actually meant for this loop (using the strong connection): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1245-L1249 > > All other opaque nodes are removed. 
> > ### Additional Verification for Useless `OpaqueTemplateAssertionPredicate` Nodes > As an additional verification for `OpaqueTemplateAssertionPredicate` nodes that are found to be useless in `eliminate_useless_predicates()`, we check that in this case the `CountedLoop` is really dead (otherwise, we should have found the `OpaqueTemplateAssertionPredicate` in our walks through all loop): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1294-L1301 > > ### Violating the Additional Verification with `-XX:+StressLoopBackedge` > In `PhaseIdealLoop::duplicate_loop_backedge()`, we convert a loop with a merge point into two loops which should enable us to transform the new inner loop into a counted loop. This only makes sense for a `Loop` that is not a counted loop, yet. However, to stress the transformation, we can also run with `-XX:+StressDuplicateBackedge` that also transforms a counted loop into an inner and an outer loop. This is a problem when we have Template Assertion Predicates above a counted loop to be stressed: > > image > > After duplicate backedge, the Template Assertion Predicates are now at the outer non-counted `Loop`: > > References: Message-ID: On Tue, 25 Nov 2025 12:34:38 GMT, Christian Hagedorn wrote: >> ### Strong Connection between Template Assertion Predicate and Counted Loop >> In [JDK-8350579](https://bugs.openjdk.org/browse/JDK-8350579), we fixed the issue that a Template Assertion Predicate for a folded loop A could end up at another loop B. We then created an Initialized Assertion Predicate at loop B from the template of loop A and used the values from the already folded, completely unrelated loop A . As a result, we crashed with a halt because loop B violated the predicate with the wrong values. 
As a fix, we established a strong connection between Template Assertion Predicates and their associated loop node by adding a direct link from `OpaqueTemplateAssertionPredicate` -> `CountedLoop`. >> >> #### Maintaining this Property >> In `PhaseIdealLoop::eliminate_useless_predicates()`, we walk through all counted loops and only keep those `OpaqueTemplateAssertionPredicate` nodes that can be found from the loop heads and are actually meant for this loop (using the strong connection): >> https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1245-L1249 >> >> All other opaque nodes are removed. >> >> ### Additional Verification for Useless `OpaqueTemplateAssertionPredicate` Nodes >> As an additional verification for `OpaqueTemplateAssertionPredicate` nodes that are found to be useless in `eliminate_useless_predicates()`, we check that in this case the `CountedLoop` is really dead (otherwise, we should have found the `OpaqueTemplateAssertionPredicate` in our walks through all loop): >> https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1294-L1301 >> >> ### Violating the Additional Verification with `-XX:+StressLoopBackedge` >> In `PhaseIdealLoop::duplicate_loop_backedge()`, we convert a loop with a merge point into two loops which should enable us to transform the new inner loop into a counted loop. This only makes sense for a `Loop` that is not a counted loop, yet. However, to stress the transformation, we can also run with `-XX:+StressDuplicateBackedge` that also transforms a counted loop into an inner and an outer loop. This is a problem when we have Template Assertion Predicates above a counted loop to be stressed: >> >> image >> >> After duplicate backedge, the Template Assertion Predicates are now at the outer non-... > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8360510 > - Review Emanuel > - Exclude StressDuplicateBackedge for TestVerifyLoopOptimizationsHitsMemLimit.java > - 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge Thanks Emanuel for your detailed review! I pushed an update and answered in the comments. ------------- PR Review: https://git.openjdk.org/jdk/pull/28389#pullrequestreview-3504782614 From chagedorn at openjdk.org Tue Nov 25 12:37:25 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Nov 2025 12:37:25 GMT Subject: RFR: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge [v2] In-Reply-To: <7mAec3_RM8OXtjnFF8HtWXqjdEtBPLYjtPX9qhScBEk=.90ae6164-422d-40f6-9383-6a625d017dc4@github.com> References: <7mAec3_RM8OXtjnFF8HtWXqjdEtBPLYjtPX9qhScBEk=.90ae6164-422d-40f6-9383-6a625d017dc4@github.com> Message-ID: On Tue, 25 Nov 2025 10:38:21 GMT, Emanuel Peter wrote: > This makes me a little nervous, applying a fix only to debug. But maybe I don't have to be nervous, let's see ;) See my answer to another comment below. I updated the comment here and at the use-site to make it more explicit why it's guarded with an `ifdef ASSERT`. Let me know if that's clearer. > src/hotspot/share/opto/loopopts.cpp line 4496: > >> 4494: >> 4495: #ifdef ASSERT >> 4496: if (StressDuplicateBackedge && head->is_CountedLoop()) { > > Could we somehow add an assert here? > > Would this work? > `assert(StressDuplicateBackedge || !head->is_CountedLoop(), "counted loop only expected in stress mode");` > > That would give us some additional confidence that this only happens in debug. > > But then why limit the fix to debug, and not apply it to product too, just in case the assert fails?
We already have such an assert at the entry of this method which is why I did not add an additional assert here: https://github.com/openjdk/jdk/blob/49176e322bbb9ed1ef2f534b949b937770b54162/src/hotspot/share/opto/loopopts.cpp#L4231-L4235 The fix is limited to debug builds only because we can only have a counted loop here if we run with `StressDuplicateBackedge`. Otherwise, it's a normal `Loop` which does not have Template Assertion Predicates above it. Since `StressDuplicateBackedge` is a develop flag, it will always be false in product. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28389#discussion_r2559790206 PR Review Comment: https://git.openjdk.org/jdk/pull/28389#discussion_r2559756340 From chagedorn at openjdk.org Tue Nov 25 12:37:27 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Nov 2025 12:37:27 GMT Subject: RFR: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge [v2] In-Reply-To: <8sH-bnDA7b3cJdAgGpQstkP6uwtVOGWVNGTNoF6oXiA=.9ed1812a-f052-41c9-909e-cf1ade6b707f@github.com> References: <7mAec3_RM8OXtjnFF8HtWXqjdEtBPLYjtPX9qhScBEk=.90ae6164-422d-40f6-9383-6a625d017dc4@github.com> <8sH-bnDA7b3cJdAgGpQstkP6uwtVOGWVNGTNoF6oXiA=.9ed1812a-f052-41c9-909e-cf1ade6b707f@github.com> Message-ID: On Tue, 25 Nov 2025 10:31:24 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/predicates/assertion/TestStressDuplicateBackedgeWithAssertionPredicate.java line 62: >> >>> 60: // The Template Assertion Predicates are still at the outer loop. As a result, we find them to >>> 61: // be useless in the next predicate elimination call with EliminateUselessPredicates because >>> 62: // they cannot be found from the inner counted loop. However, we have verification code in place >> >> First: cudos on the annotations in this test! Really much appreciated :) >> >> I'm a bit confused here. 
Are you talking about the Template Assertion Predicates of the outer or inner loop here? Because you say both: >> - `Template Assertion Predicates are still at the outer loop` >> - `they cannot be found from the inner counted loop` >> Can you clarify? > > Ah, maybe the confusion comes from this: > - We have an "inner" loop: the one that becomes empty and is removed. > - And an "inner" loop from the duplicate backedge optimization. > > Is this correct? If yes, name them a bit more precisely ;) > > Also: are the new inner/outer loops nested? The ascii art suggests that they are in sequence. Maybe drawing some backedges could help? > First: cudos on the annotations in this test! Really much appreciated :) Thanks, I'm glad it was helpful! I updated the comments to make it more explicit what I mean by inner and outer loop. Let me know if that's more clear now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28389#discussion_r2559817197 From chagedorn at openjdk.org Tue Nov 25 12:44:05 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Nov 2025 12:44:05 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v3] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 13:16:05 GMT, Marc Chevalier wrote: >> This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. >> >> The proposed solution seems somewhat controversial. See https://github.com/openjdk/valhalla/pull/1717 for some previous discussion. Before polishing the PR, I'd like to reach an agreement on the way to go. >> >> # Analysis >> ## Obervationally >> ### IGVN >> During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. 
Their types are: >> >> in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) >> in(2): null >> >> We compute the join (HS' meet): >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 >> >> t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> But the current `_type` (of the `PhiNode` as a `TypeNode`) is >> >> _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) >> >> We filter `t` by `_type` >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 >> and we get >> >> ft=java/lang/Object * >> >> which is what we return. After the end of `Value`, the returned type becomes the new `PhiNode`'s `_type`. >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 >> and >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 >> >> >> ### Verification >> On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time >> >> _type=java/lang/Object * >> >> and so after filtering `t` by (new) `_type`, we get >> >> ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> which is returned. Verification gets angry because the new `ft` is not the same as the previous one. >> >> ## But why?! >> ### Details on type computation >> In short, we are doing >> >> t = typeof(in(1)) /\ typeof(in(2)) >> ft = t /\ _type (* IGVN *) >> ft' = t /\ ft (* Verification *) >> >> and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again...
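The `ft != ft'` observation quoted above can be reproduced in a very small toy model. The sketch below is not HotSpot's type lattice: `ToyType`, `filter` and the class `SpeculativeJoinSketch` are hypothetical names, and the only rule modelled is the assumption that two clashing exact speculative classes cancel out, while a missing speculative part lets the other side's speculative part through.

```java
import java.util.Optional;

public class SpeculativeJoinSketch {
    // Toy type: a declared class plus an optional *exact* speculative class.
    // This is an illustration only, not C2's TypeOopPtr.
    public record ToyType(String klass, Optional<String> speculative) {
        public static ToyType of(String klass) {
            return new ToyType(klass, Optional.empty());
        }

        public static ToyType of(String klass, String speculative) {
            return new ToyType(klass, Optional.of(speculative));
        }

        // Toy "/\": keep the speculative part if only one side has one,
        // drop it if both sides have one and they clash (assumed behaviour).
        public ToyType filter(ToyType other) {
            if (!klass.equals(other.klass)) {
                throw new IllegalArgumentException("toy model only handles equal declared classes");
            }
            if (speculative.isEmpty()) {
                return other;
            }
            if (other.speculative.isEmpty()) {
                return this;
            }
            return speculative.equals(other.speculative) ? this : ToyType.of(klass);
        }
    }

    public static void main(String[] args) {
        ToyType t = ToyType.of("java/lang/Object", "C2");     // join of the Phi inputs
        ToyType type = ToyType.of("java/lang/Object", "C1");  // current _type of the Phi

        ToyType ft = t.filter(type);       // IGVN:         ft  = t /\ _type
        ToyType ftPrime = t.filter(ft);    // Verification: ft' = t /\ ft

        System.out.println("ft  = " + ft);
        System.out.println("ft' = " + ftPrime);
        System.out.println("stable? " + ft.equals(ftPrime));
    }
}
```

In this toy model the first filtering loses the speculative part because `C1` and `C2` clash, and the second filtering brings `C2` back, so re-running the same computation does not reach a fixed point in one step.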
> > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > IgnoreUnrecognizedVMOptions > But would it not be nice if it eventually became consistent, specifically after CCP/IGVN. Just a side note: We remove the speculative types after incremental inlining here: https://github.com/openjdk/jdk/blob/cc5b35bf69dcf9e7e8037642c94e8d7e5847952d/src/hotspot/share/opto/compile.cpp#L2359-L2363 ------------- PR Comment: https://git.openjdk.org/jdk/pull/28331#issuecomment-3575459832 From chagedorn at openjdk.org Tue Nov 25 12:49:19 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Nov 2025 12:49:19 GMT Subject: RFR: 8364490: Fatal error on large SpecTrapLimitExtraEntries value In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 12:39:01 GMT, Anton Seoane Ampudia wrote: > This PR addresses VM crashes on very large values for `SpecTrapLimitExtraEntries`. > > The experimental `SpecTrapLimitExtraEntries` allows for a user-specified number of extra method data trap entries for speculation. Currently, this number is implemented with an `int`, which means that users can specify very large values that will translate into huge `MethodData` objects that cannot be allocated in Metaspace. > > An `int` range of values should not be allowed, as negative `SpecTrapLimitExtraEntries` do not make any sense, and very high values (such as the ones that cause this crash) are equally nonsensical. This changeset adds a range to the flag values to address these issues. > > `SpecTrapLimitExtraEntries` is `MAX`ed with HotSpot's computed heuristic, which means that in any case it can only serve as a buffer above the heuristic. Based on benchmarks where I logged heuristic-derived values for extra `DataLayout` cells, even a value of 50 for `SpecTrapLimitExtraEntries` is more than sufficient. To provide some headroom and keep things simple, I have set the upper limit to 100.
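As a rough illustration of the range and `MAX` behaviour described in the quoted PR text, here is a minimal Java sketch. It is not HotSpot's C++ flag code: the class `SpecTrapEntriesSketch`, its methods and constants are hypothetical, and only the bounds 0 and 100 and the "take the larger of flag and heuristic" rule come from the description.

```java
public class SpecTrapEntriesSketch {
    // Bounds taken from the PR description: negative values make no sense, 100 is the proposed cap.
    static final int MIN_EXTRA_ENTRIES = 0;
    static final int MAX_EXTRA_ENTRIES = 100;

    // Hypothetical stand-in for a flag range check: reject values outside [0, 100].
    static int checkRange(int flagValue) {
        if (flagValue < MIN_EXTRA_ENTRIES || flagValue > MAX_EXTRA_ENTRIES) {
            throw new IllegalArgumentException(
                "SpecTrapLimitExtraEntries=" + flagValue + " is outside the allowed range ["
                + MIN_EXTRA_ENTRIES + ", " + MAX_EXTRA_ENTRIES + "]");
        }
        return flagValue;
    }

    // The flag only acts as a floor above the heuristic: the larger value wins.
    static int extraEntries(int heuristicValue, int flagValue) {
        return Math.max(heuristicValue, checkRange(flagValue));
    }

    public static void main(String[] args) {
        System.out.println(extraEntries(7, 0));    // heuristic wins
        System.out.println(extraEntries(7, 100));  // flag wins
        try {
            extraEntries(7, Integer.MAX_VALUE);    // the kind of value that used to blow up MethodData
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a bounded flag, the worst case for the extra entries is the larger of the heuristic and 100, rather than anything up to `Integer.MAX_VALUE`.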
> > **Testing:** passes tiers 1-2 That looks reasonable to me. Can you add a sanity hello world test where we run with `SpecTrapLimitExtraEntries=0` and `SpecTrapLimitExtraEntries=100`? We do not seem to have any tests with that flag apart from one AOT test (`AOTProfileFlags.java`). ------------- PR Review: https://git.openjdk.org/jdk/pull/28451#pullrequestreview-3504922243 From roland at openjdk.org Tue Nov 25 12:52:35 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 25 Nov 2025 12:52:35 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: References: Message-ID: <6qShqR-Ohv7vamoJ_B4Ev-poU8SB96eTBo4HFJrylcI=.dac5a26f-c9f0-445b-8f1c-a7c719fa27ae@github.com> > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes are widen > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if of the type both inputs > of the `Add`s are non constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. 
Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints each of them operates under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widened in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains nine commits: - review - review - Merge branch 'master' into JDK-8354282 - review - infinite loop in gvn fix - renaming - merge - Merge branch 'master' into JDK-8354282 - fix & test ------------- Changes: https://git.openjdk.org/jdk/pull/24575/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=03 Stats: 353 lines in 13 files changed: 252 ins; 27 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From mchevalier at openjdk.org Tue Nov 25 12:56:56 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 25 Nov 2025 12:56:56 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v3] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 13:16:05 GMT, Marc Chevalier wrote: >> This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. >> >> The proposed solution seems somewhat controversial. See https://github.com/openjdk/valhalla/pull/1717 for some previous discussion. Before polishing the PR, I'd like to reach an agreement on the way to go. >> >> # Analysis >> ## Observationally >> ### IGVN >> During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs.
Their types are: >> >> in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) >> in(2): null >> >> We compute the join (HS' meet): >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 >> >> t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> But the current `_type` (of the `PhiNode` as a `TypeNode`) is >> >> _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) >> >> We filter `t` by `_type` >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 >> and we get >> >> ft=java/lang/Object * >> >> which is what we return. After the end of `Value`, the returned type becomes the new `PhiNode`'s `_type`. >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 >> and >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 >> >> >> ### Verification >> On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time >> >> _type=java/lang/Object * >> >> and so after filtering `t` by the (new) `_type` we get >> >> ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> which is returned. Verification gets angry because the new `ft` is not the same as the previous one. >> >> ## But why?! >> ### Details on type computation >> In short, we are doing >> >> t = typeof(in(1)) \/ typeof(in(2)) >> ft = t /\ _type (* IGVN *) >> ft' = t /\ ft (* Verification *) >> >> and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again...
> > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > IgnoreUnrecognizedVMOptions Interesting, and indeed, as far as I know, we crash inside `inline_incrementally`, a few lines above. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28331#issuecomment-3575506243 From roland at openjdk.org Tue Nov 25 13:03:49 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 25 Nov 2025 13:03:49 GMT Subject: RFR: 8366888: C2: incorrect assertion predicate with short running long counted loop [v6] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 13:06:38 GMT, Benoît Maillard wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8366888 >> - review >> - Merge branch 'master' into JDK-8366888 >> - Merge branch 'master' into JDK-8366888 >> - whitespaces >> - review >> - Merge branch 'master' into JDK-8366888 >> - Update src/hotspot/share/opto/predicates.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/predicates.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - ... and 3 more: https://git.openjdk.org/jdk/compare/a2a3cd79...2d329d48 > > Marked as reviewed by bmaillard (Committer). @benoitmaillard and @chhagedorn : thanks for the reviews and testing.
------------- PR Comment: https://git.openjdk.org/jdk/pull/27250#issuecomment-3575525071 From roland at openjdk.org Tue Nov 25 13:03:52 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 25 Nov 2025 13:03:52 GMT Subject: Integrated: 8366888: C2: incorrect assertion predicate with short running long counted loop In-Reply-To: References: Message-ID: On Fri, 12 Sep 2025 08:57:57 GMT, Roland Westrelin wrote: > In: > > > for (int i = 100; i < 1100; i++) { > v += floatArray[i - 100]; > Objects.checkIndex(i, longRange); > } > > > The int counted loop has both an int range check and a long range check. The > int range check is optimized first. Assertion predicates are inserted > above the loop. One predicate checks that: > > > init - 100 > > The loop is then transformed to enable the optimization of the long > range check. The loop is short running, so there's no need to create a > loop nest. The counted loop is mostly left as is, but the loop's > bounds are changed from: > > > for (int i = 100; i < 1100; i++) { > > > to: > > > for (int i = 0; i < 1000; i++) { > > > The reason is that the long range check transformation expects the > loop to start at 0. > > Pre/main/post loops are created. Template Assertion predicates are > added above the main loop. The loop is unrolled. Initialized assertion > predicates are created. The one created from the condition: > > > init - 100 > > checks the value of `i` out of the pre loop which is 1. That check fails. > > The root cause of the failure is that when bounds of the counted loop > are changed, template assertion predicates need to be updated with an > adjusted init input. > > When the bounds of the loop are known, the assertion predicates can be > updated in place. Otherwise, when the loop is speculated to be short > running, the assertion predicates are updated when they are cloned. This pull request has now been integrated.
Changeset: 35f4a741 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/35f4a7410cdaaa9d3ce68148cb81e893ad0d93de Stats: 269 lines in 8 files changed: 257 ins; 3 del; 9 mod 8366888: C2: incorrect assertion predicate with short running long counted loop Co-authored-by: Christian Hagedorn Reviewed-by: chagedorn, bmaillard ------------- PR: https://git.openjdk.org/jdk/pull/27250 From jbhateja at openjdk.org Tue Nov 25 13:04:26 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 25 Nov 2025 13:04:26 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v18] In-Reply-To: References: <-tq0SbuGsKQSgaK6yNEG_YAIHMYEcd8_YdlENAWeWLY=.c91e3959-5a21-42ee-838e-353e707063a8@github.com> Message-ID: On Tue, 25 Nov 2025 07:06:51 GMT, Vladimir Ivanov wrote: >>> if (mdef->operand_num_edges(oper_index) != 1) { >>> assert(!is_ndd_demotable(mdef), "%s", mdef->Name()); >>> return false; >>> } >> >> We don't need this assertion; an NDD commutative operation MachNode may have its first input as memory, but we may pick the second input for biasing. > > Do you have `addI_rReg_rReg_mem_ndd` case in mind here? (It matches `Set dst (AddI src1 (LoadI src2))` and is marked as `Flag_ndd_demotable_commutative`). Yes, a memory operand captures more than one edge. Effective address computation is done using BASE + INDEX x SCALE + DISP; there are multiple address computation schemes using different components, but x86 mandates that INDEX and BASE must be held in valid registers, so we cannot share either of these registers with the destination operand and demote. The new assertion check should catch violations. >> Bias live ranges are independent while marking and during enforcement. > > Do I get it right that memory operands in the first position for commutative operations are the reason why `copy_bias` is invalid while `copy_bias2` is not? Yes, that's correct. I have fine-tuned the assertion checks.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2559911309 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2559909395 From jbhateja at openjdk.org Tue Nov 25 13:04:25 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 25 Nov 2025 13:04:25 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v20] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. > > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of the NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union-find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion.
> > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: - Review comments resolution - Revert "8367780: Enable UseAPX on Intel CPUs only when both APX_F and APX_NCI_NDD_NF cpuid features are present" This reverts commit 3d4e0491940c4b4a05ac84006933d939370e7e2b. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26283/files - new: https://git.openjdk.org/jdk/pull/26283/files/57a9e4bf..912d109a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=18-19 Stats: 98 lines in 4 files changed: 28 ins; 29 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From jbhateja at openjdk.org Tue Nov 25 13:04:28 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 25 Nov 2025 13:04:28 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v18] In-Reply-To: References: <-tq0SbuGsKQSgaK6yNEG_YAIHMYEcd8_YdlENAWeWLY=.c91e3959-5a21-42ee-838e-353e707063a8@github.com> Message-ID: On Tue, 25 Nov 2025 06:56:54 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/opto/chaitin.cpp line 1538: >> >>> 1536: >>> 1537: // Try biasing the color with non-interfering bias live range[s]. 
>>> 1538: if (lrg._copy_bias != 0 || lrg._copy_bias2 != 0) { >> >> IMO you can drop the `(lrg._copy_bias != 0 || lrg._copy_bias2 != 0)` guard. Original code didn't check it and there are enough guards in `select_bias_lrg_color` to catch it. > > The check you pointed at is performed on the result of the LRG lookup (which is placed in `select_bias_lrg_color()` now) and not the original index stored in `_copy_bias`/`_copy_bias2`. Correct, I have re-structured it now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2559911773 From jbhateja at openjdk.org Tue Nov 25 13:07:59 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 25 Nov 2025 13:07:59 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v21] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. > > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of the NDD definition to the first source operand or the second source operand for the commutative class of operations.
> > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union-find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26283/files - new: https://git.openjdk.org/jdk/pull/26283/files/912d109a..d596c232 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=19-20 Stats: 29 lines in 2 files changed: 26 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From epeter at openjdk.org Tue Nov 25 13:11:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Nov 2025 13:11:47 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: Message-ID: <0df3H15uO96P1n3zLpKl5y_RKrAgc1h_V91bGB5mCr8=.06942d05-f66d-442f-a754-8135ac0eec30@github.com> On Mon, 24 Nov 2025 18:10:50 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. 
>> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > reviews Is this issue at all related to https://github.com/openjdk/jdk/pull/24575? It seems we remove a `CastLL` from the graph, because the input type is wider than the Cast's type, right? If I remember correctly from https://github.com/openjdk/jdk/pull/24575, if a CastLL is narrowing, we don't want to remove it, see `ConstraintCastNode::Identity`. Can you elaborate a bit more on where the `CastLL` came from, and what it is supposed to do? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3575564456 From roland at openjdk.org Tue Nov 25 13:13:27 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 25 Nov 2025 13:13:27 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v6] In-Reply-To: References: <7nY7QRkkFjOtOuBXID1I4GluA0vnFRLy_UnRICfVkR4=.99ec7fe1-af27-4ab7-ac63-27aa12bec4ef@github.com> Message-ID: On Fri, 21 Nov 2025 23:41:45 GMT, Vladimir Ivanov wrote: > But now I see a slight change in behavior in the following part of `Compile::Compile`: > > ``` > if (_late_inlines.length() == 0 && !has_mh_late_inlines() && !failing() && has_stringbuilder()) { > inline_string_calls(true); > } > ``` > > After `dec_number_of_mh_late_inlines()` is gone, `inline_string_calls()` won't be called during parsing if any MH late inline calls are observed irrespective of whether they are all inlined by that point or not. Good catch. I think we should investigate whether removing the call to `inline_string_calls()` here and letting `Compile::inline_incrementally()` handle all string calls is good enough. That would simplify things. My recollection is that there used to be a single call to `inline_string_calls()` and it was right after parsing.
When late inlining was added another call was added after all inlining is over. The call after parsing was conservatively left in to make sure nothing gets broken by accident but there's a good chance, a single call after late inlining is good enough. What I propose is: 1) I bring the number_of_mh_late_inlines logic back in this change 2) I file a bug to investigate removal of the `inline_string_calls()` (as a consequence no need for the number_of_mh_late_inlines logic). Does that sound ok to you? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28088#issuecomment-3575577207 From epeter at openjdk.org Tue Nov 25 14:14:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Nov 2025 14:14:53 GMT Subject: RFR: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge [v2] In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 12:37:22 GMT, Christian Hagedorn wrote: >> ### Strong Connection between Template Assertion Predicate and Counted Loop >> In [JDK-8350579](https://bugs.openjdk.org/browse/JDK-8350579), we fixed the issue that a Template Assertion Predicate for a folded loop A could end up at another loop B. We then created an Initialized Assertion Predicate at loop B from the template of loop A and used the values from the already folded, completely unrelated loop A . As a result, we crashed with a halt because loop B violated the predicate with the wrong values. As a fix, we established a strong connection between Template Assertion Predicates and their associated loop node by adding a direct link from `OpaqueTemplateAssertionPredicate` -> `CountedLoop`. 
>> >> #### Maintaining this Property >> In `PhaseIdealLoop::eliminate_useless_predicates()`, we walk through all counted loops and only keep those `OpaqueTemplateAssertionPredicate` nodes that can be found from the loop heads and are actually meant for this loop (using the strong connection): >> https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1245-L1249 >> >> All other opaque nodes are removed. >> >> ### Additional Verification for Useless `OpaqueTemplateAssertionPredicate` Nodes >> As an additional verification for `OpaqueTemplateAssertionPredicate` nodes that are found to be useless in `eliminate_useless_predicates()`, we check that in this case the `CountedLoop` is really dead (otherwise, we should have found the `OpaqueTemplateAssertionPredicate` in our walk through all loops): >> https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1294-L1301 >> >> ### Violating the Additional Verification with `-XX:+StressDuplicateBackedge` >> In `PhaseIdealLoop::duplicate_loop_backedge()`, we convert a loop with a merge point into two loops which should enable us to transform the new inner loop into a counted loop. This only makes sense for a `Loop` that is not yet a counted loop. However, to stress the transformation, we can also run with `-XX:+StressDuplicateBackedge`, which also transforms a counted loop into an inner and an outer loop. This is a problem when we have Template Assertion Predicates above a counted loop to be stressed: >> >> image >> >> After duplicate backedge, the Template Assertion Predicates are now at the outer non-... > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8360510 > - Review Emanuel > - Exclude StressDuplicateBackedge for TestVerifyLoopOptimizationsHitsMemLimit.java > - 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge Looks good, thanks for the updates! test/hotspot/jtreg/compiler/predicates/assertion/TestStressDuplicateBackedgeWithAssertionPredicate.java line 76: > 74: // Otherwise, we cannot apply the duplicate backedge optimization to the outer loop. > 75: // 4) Found to be empty and removed. > 76: for (int j = 0; j < 10; j++) {} I think there is still a mild chance for confusion here about the two "inner" loops. You also mix `CountedLoop`, `Counted Loop` and `counted Loop` - OCD triggered ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28389#pullrequestreview-3505268793 PR Review Comment: https://git.openjdk.org/jdk/pull/28389#discussion_r2560142406 From aseoane at openjdk.org Tue Nov 25 14:19:38 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Tue, 25 Nov 2025 14:19:38 GMT Subject: RFR: 8364490: Fatal error on large SpecTrapLimitExtraEntries value [v2] In-Reply-To: References: Message-ID: <47YSbEjOSDD6WAnJR1-oivkApBawNVczLFhzWIsr52I=.822ca770-c64a-48e9-a1da-4f0b333d99cb@github.com> > This PR addresses VM crashes on very large values for `SpecTrapLimitExtraEntries`. > > The experimental `SpecTrapLimitExtraEntries` allows for a user-specified number of extra method data trap entries for speculation. Currently, this number is implemented with an `int`, which means that users can specify very large values that will translate into huge `MethodData` objects that cannot be allocated in Metaspace.
> > An `int` range of values should not be allowed, as negative `SpecTrapLimitExtraEntries` do not make any sense, and very high values (such as the ones that cause this crash) are equally nonsensical. This changeset adds a range to the flag values to address these issues. > > `SpecTrapLimitExtraEntries` is `MAX`ed with HotSpot's computed heuristic, which means that in any case it can only serve as a buffer above the heuristic. Based on benchmarks where I logged heuristic-derived values for extra `DataLayout` cells, even a value of 50 for `SpecTrapLimitExtraEntries` is more than sufficient. To provide some headroom and keep things simple, I have set the upper limit to 100. > > **Testing:** passes tiers 1-2 Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: Add simple sanity test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28451/files - new: https://git.openjdk.org/jdk/pull/28451/files/1822b0a8..5ea304e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28451&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28451&range=00-01 Stats: 40 lines in 1 file changed: 40 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28451.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28451/head:pull/28451 PR: https://git.openjdk.org/jdk/pull/28451 From aseoane at openjdk.org Tue Nov 25 14:19:41 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Tue, 25 Nov 2025 14:19:41 GMT Subject: RFR: 8364490: Fatal error on large SpecTrapLimitExtraEntries value [v2] In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 12:46:02 GMT, Christian Hagedorn wrote: >> Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: >> >> Add simple sanity test > > That looks reasonable to me. Can you add a sanity hello world test where we run with `SpecTrapLimitExtraEntries=0` and `SpecTrapLimitExtraEntries=100`? 
We do not seem to have any tests with that flag apart from one AOT test (`AOTProfileFlags.java`). Makes sense @chhagedorn. I added a very basic sanity test; let me know if you were looking for something more elaborate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28451#issuecomment-3575858794 From chagedorn at openjdk.org Tue Nov 25 15:31:49 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Nov 2025 15:31:49 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v23] In-Reply-To: <6Yo2VYqBk_iaUpAGdPvyCjOyn_XW2nVPN5_w8XbXvkU=.91138210-54e3-4c28-b1d8-eb706583348e@github.com> References: <6Yo2VYqBk_iaUpAGdPvyCjOyn_XW2nVPN5_w8XbXvkU=.91138210-54e3-4c28-b1d8-eb706583348e@github.com> Message-ID: On Fri, 21 Nov 2025 15:54:08 GMT, Kangcheng Xu wrote: >> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. >> >> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. >> >> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix trip counter loop-variant detection Thanks for the updates! I did some skimming first but ended up doing another complete pass. It's already in a good state but I still found some more small things which I left as comments. But then I think it's good to go from my side (regarding the code) :-) About testing: Thanks for doing extended testing by directly inserting the old code again to have a proper comparison.
I first thought it was going to be too tricky, which is why I proposed logging - but I already feared that it's not going to be stable enough. So, I'm glad that you managed to do an old vs. new state! For next steps, I suggest I'll give your patch a spin in our standard testing once you have addressed the last comments in this batch. Then I'm also happy to run some more extended testing with your old vs. new counted loop transformation state (would be nice if you can update your branch with the latest review and also merge in latest master). Let me know if you need some help :-) src/hotspot/share/opto/loopnode.cpp line 1656: > 1654: assert(phi != nullptr && phi->in(LoopNode::LoopBackControl) == iv_incr.incr(), "No phi"); > 1655: > 1656: assert(stride.compute_non_zero_stride_con(exit_test.mask(), bt), "illegal condition"); We should be explicit here: Suggestion: assert(stride.compute_non_zero_stride_con(exit_test.mask(), bt) != 0, "illegal condition"); src/hotspot/share/opto/loopnode.cpp line 1686: > 1684: } > 1685: > 1686: // Find the trip-counter increment & limit. Limit must be loop invariant. Suggestion: // Find the trip-counter increment & limit. Limit must be loop invariant. src/hotspot/share/opto/loopnode.cpp line 1694: > 1692: // --------- > 1693: > 1694: if (!_phase->ctrl_is_member(_loop, _incr)) { // Swapped trip counter and limit? We can probably also use `is_invariant()` here src/hotspot/share/opto/loopnode.cpp line 1764: > 1762: } > 1763: > 1764: swap(_xphi, _stride_node); // 'incr' is commutative, so ok to swap Comment indentation is off. Suggestion: if (!_stride_node->is_Con()) { // Oops, swap these if (!_xphi->is_Con()) { // Is the other guy a constant?
return; // Nope, unknown stride, bail out } swap(_xphi, _stride_node); // 'incr' is commutative, so ok to swap src/hotspot/share/opto/loopnode.cpp line 1832: > 1830: while (xphi->Opcode() == Op_Cast(_iv_bt)) { > 1831: xphi = xphi->in(1); > 1832: } I'm wondering if this should be part of the `xphi` computation in `LoopIVStride`. Or in other words: Do the other use sites of `xphi()` not need this uncast logic? Maybe @rwestrel knows more. src/hotspot/share/opto/loopnode.cpp line 1843: > 1841: Node* sfpt = _back_control->in(0)->in(0); > 1842: if (_loop->_child != nullptr && sfpt->Opcode() == Op_SafePoint) { > 1843: _safepoint = sfpt->as_SafePoint(); For consistency: Suggestion: Node* safepoint = _back_control->in(0)->in(0); if (_loop->_child != nullptr && safepoint->Opcode() == Op_SafePoint) { _safepoint = safepoint->as_SafePoint(); src/hotspot/share/opto/loopnode.cpp line 2035: > 2033: > 2034: // Check trip counter will end up higher than the limit > 2035: const TypeInteger* limit_t = igvn->type(_structure.limit())->is_integer(_iv_bt); Looks like this could now be moved into the only use in `is_infinite_loop()` directly, so you do not need to pass it in as an argument. But I see that you reuse it again later in this method. I would have probably still moved it inside `is_infinite_loop()` and re-fetched it further down again. But I leave it up to you to decide :-) src/hotspot/share/opto/loopnode.cpp line 2252: > 2250: // again and can skip the predicate. > 2251: > 2252: int sov = check_stride_overflow(_structure.final_limit_correction(), limit_t, _iv_bt); I suggest renaming it to `stride_overflow_state` or something like that since `sov` is a rather non-intuitive abbreviation. The best thing is probably to turn this into a proper enum since the states -1, 0, and 1 are not that easy to comprehend. I leave it up to you if you also want to do this in this PR - minor detail.
src/hotspot/share/opto/loopnode.cpp line 2394: > 2392: // i++; > 2393: // i = i && 0x7fff; > 2394: // } Somehow indentation seems off: Suggestion: // while (true) { // sum + = array[i]; // i++; // i = i && 0x7fff; // } src/hotspot/share/opto/loopnode.cpp line 2548: > 2546: mask = BoolTest::gt; > 2547: else > 2548: ShouldNotReachHere(); We should also add braces here: Suggestion: if (mask == BoolTest::le) { mask = BoolTest::lt; } else if (mask == BoolTest::ge) { mask = BoolTest::gt; } else { ShouldNotReachHere(); } src/hotspot/share/opto/loopnode.cpp line 2572: > 2570: nphi = igvn->register_new_node_with_optimizer(nphi); > 2571: _phase->set_ctrl(nphi, _phase->get_ctrl(phi)); > 2572: igvn->replace_node(_structure.phi(), nphi); You can use `phi` instead of fetching it again with `_structure.phi()`: Suggestion: Node* phi = _structure.phi(); if (!TypeInteger::bottom(_iv_bt)->higher_equal(phi->bottom_type())) { Node* nphi = PhiNode::make(phi->in(0), phi->in(LoopNode::EntryControl), TypeInteger::bottom(_iv_bt)); nphi->set_req(LoopNode::LoopBackControl, phi->in(LoopNode::LoopBackControl)); nphi = igvn->register_new_node_with_optimizer(nphi); _phase->set_ctrl(nphi, _phase->get_ctrl(phi)); igvn->replace_node(phi, nphi); phi = nphi->as_Phi(); } src/hotspot/share/opto/loopnode.cpp line 2593: > 2591: > 2592: // Replace the old IfNode with a new LoopEndNode > 2593: Node* lex = igvn->register_new_node_with_optimizer(BaseCountedLoopEndNode::make(iff->in(0), It's somewhat difficult to follow the logic with the different abbreviations, some referring to the old loop exit and some to the newly created one. Maybe you can improve the naming here by making it more clear what belongs to what. 
But we could also do that separately at some point since it was like that before and the refactoring has already become quite large :-) src/hotspot/share/opto/loopnode.cpp line 2639: > 2637: _structure.sfpt() != nullptr && > 2638: !_loop->_has_call && > 2639: _phase->is_deleteable_safept(_structure.sfpt()); For me, the old indentation was easier to read. src/hotspot/share/opto/loopnode.cpp line 2707: > 2705: #endif > 2706: > 2707: _phase->C->print_method(PHASE_AFTER_CLOOPS, 3, l); I was thinking about moving this below setting the phi type on the next lines to have this information already available in the IGV dump. I guess you could squeeze that change in here as well. src/hotspot/share/opto/loopnode.cpp line 3067: > 3065: } > 3066: > 3067: //============================================================================= I guess this can also be removed Suggestion: src/hotspot/share/opto/loopnode.cpp line 5474: > 5472: volatile int PhaseIdealLoop::_long_loop_nests=0; // Number of long loops successfully transformed to a nest > 5473: volatile int CountedLoopConverter::_long_loop_counted_loops = 0; // Number of long loops successfully transformed to a > 5474: // counted loop I suggest to move the comment above, then it fits on one line: Suggestion: // Number of long loops successfully transformed to a counted loop volatile int CountedLoopConverter::_long_loop_counted_loops = 0; src/hotspot/share/opto/loopopts.cpp line 4282: > 4280: } > 4281: > 4282: Node* loop_incr = loop_exit.incr(); Can even be made `const`: Suggestion: const Node* loop_incr = loop_exit.incr(); ------------- PR Review: https://git.openjdk.org/jdk/pull/24458#pullrequestreview-3505053490 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560032114 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560052236 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2559970421 PR Review Comment: 
https://git.openjdk.org/jdk/pull/24458#discussion_r2560105572 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560128739 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560135552 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560179453 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560227254 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560183704 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560308736 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560323023 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560380302 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560347353 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560387712 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560064816 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560393952 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560001309 From chagedorn at openjdk.org Tue Nov 25 15:31:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Nov 2025 15:31:52 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v23] In-Reply-To: References: <6Yo2VYqBk_iaUpAGdPvyCjOyn_XW2nVPN5_w8XbXvkU=.91138210-54e3-4c28-b1d8-eb706583348e@github.com> Message-ID: <_drU8CPk9ylwjZmBb2E6d-na8ISuD-bwm_gq_VOm-i4=.d0f1b795-9829-46f8-a0b9-9c0aa2ba46a5@github.com> On Tue, 25 Nov 2025 15:00:11 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> fix trip counter loop-variant detection > > src/hotspot/share/opto/loopnode.cpp line 2572: > >> 2570: nphi = igvn->register_new_node_with_optimizer(nphi); >> 2571: _phase->set_ctrl(nphi, _phase->get_ctrl(phi)); >> 2572: igvn->replace_node(_structure.phi(), 
nphi); > > You can use `phi` instead of fetching it again with `_structure.phi()`: > Suggestion: > > Node* phi = _structure.phi(); > if (!TypeInteger::bottom(_iv_bt)->higher_equal(phi->bottom_type())) { > Node* nphi = > PhiNode::make(phi->in(0), phi->in(LoopNode::EntryControl), TypeInteger::bottom(_iv_bt)); > nphi->set_req(LoopNode::LoopBackControl, phi->in(LoopNode::LoopBackControl)); > nphi = igvn->register_new_node_with_optimizer(nphi); > _phase->set_ctrl(nphi, _phase->get_ctrl(phi)); > igvn->replace_node(phi, nphi); > phi = nphi->as_Phi(); > } Maybe also rename `nphi` to `new_phi` to better distinguish `phi` from `nphi`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560327529 From chagedorn at openjdk.org Tue Nov 25 15:31:54 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Nov 2025 15:31:54 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v17] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 14:11:11 GMT, Kangcheng Xu wrote: >> src/hotspot/share/opto/loopnode.hpp line 1338: >> >>> 1336: _back_control(back_control), >>> 1337: _loop(loop), >>> 1338: _phase(phase) {} >> >> Maybe also add an assert here that `back_control` is non-null. > > I disagree: `back_control` is not necessarily always non-null. In fact, `loop_exit_control()` could return null even if `head` and `loop` are non-null. This is also why the original code explicitly checks this as well. You're right, it could be null.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2560017691 From chagedorn at openjdk.org Tue Nov 25 15:50:40 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Nov 2025 15:50:40 GMT Subject: RFR: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge [v3] In-Reply-To: References: Message-ID: > ### Strong Connection between Template Assertion Predicate and Counted Loop > In [JDK-8350579](https://bugs.openjdk.org/browse/JDK-8350579), we fixed the issue that a Template Assertion Predicate for a folded loop A could end up at another loop B. We then created an Initialized Assertion Predicate at loop B from the template of loop A and used the values from the already folded, completely unrelated loop A . As a result, we crashed with a halt because loop B violated the predicate with the wrong values. As a fix, we established a strong connection between Template Assertion Predicates and their associated loop node by adding a direct link from `OpaqueTemplateAssertionPredicate` -> `CountedLoop`. > > #### Maintaining this Property > In `PhaseIdealLoop::eliminate_useless_predicates()`, we walk through all counted loops and only keep those `OpaqueTemplateAssertionPredicate` nodes that can be found from the loop heads and are actually meant for this loop (using the strong connection): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1245-L1249 > > All other opaque nodes are removed. 
> > ### Additional Verification for Useless `OpaqueTemplateAssertionPredicate` Nodes > As an additional verification for `OpaqueTemplateAssertionPredicate` nodes that are found to be useless in `eliminate_useless_predicates()`, we check that in this case the `CountedLoop` is really dead (otherwise, we should have found the `OpaqueTemplateAssertionPredicate` in our walks through all loop): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1294-L1301 > > ### Violating the Additional Verification with `-XX:+StressLoopBackedge` > In `PhaseIdealLoop::duplicate_loop_backedge()`, we convert a loop with a merge point into two loops which should enable us to transform the new inner loop into a counted loop. This only makes sense for a `Loop` that is not a counted loop, yet. However, to stress the transformation, we can also run with `-XX:+StressDuplicateBackedge` that also transforms a counted loop into an inner and an outer loop. This is a problem when we have Template Assertion Predicates above a counted loop to be stressed: > > image > > After duplicate backedge, the Template Assertion Predicates are now at the outer non-counted `Loop`: > > References: Message-ID: On Tue, 25 Nov 2025 14:12:04 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8360510 >> - Review Emanuel >> - Exclude StressDuplicateBackedge for TestVerifyLoopOptimizationsHitsMemLimit.java >> - 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge > > test/hotspot/jtreg/compiler/predicates/assertion/TestStressDuplicateBackedgeWithAssertionPredicate.java line 76: > >> 74: // Otherwise, we cannot apply the duplicate backedge optimization to the outer loop. >> 75: // 4) Found to be empty and removed. >> 76: for (int j = 0; j < 10; j++) {} > > I think there is still a mild chance for confusion here about the two "inner" loops. > > You also mix `CountedLoop`, `Counted Loop` and `counted Loop` - OCD triggered ? And you're right! Pushed another update :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28389#discussion_r2560505506 From chagedorn at openjdk.org Tue Nov 25 15:55:11 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Nov 2025 15:55:11 GMT Subject: RFR: 8364490: Fatal error on large SpecTrapLimitExtraEntries value [v2] In-Reply-To: <47YSbEjOSDD6WAnJR1-oivkApBawNVczLFhzWIsr52I=.822ca770-c64a-48e9-a1da-4f0b333d99cb@github.com> References: <47YSbEjOSDD6WAnJR1-oivkApBawNVczLFhzWIsr52I=.822ca770-c64a-48e9-a1da-4f0b333d99cb@github.com> Message-ID: <_Khf0dztN6uckHdLFa3dA_4SIkIKlftAxK5FQdyGReY=.5880f132-214a-47e3-8643-6aaba084d42c@github.com> On Tue, 25 Nov 2025 14:19:38 GMT, Anton Seoane Ampudia wrote: >> This PR addresses VM crashes on very large values for `SpecTrapLimitExtraEntries`. >> >> The experimental `SpecTrapLimitExtraEntries` allows for a user-specified number of extra method data trap entries for speculation. Currently, this number is implemented with an `int`, which means that users can specify very large values that will translate into huge `MethodData` objects that cannot be allocated in Metaspace. 
>> >> An `int` range of values should not be allowed, as negative `SpecTrapLimitExtraEntries` do not make any sense, and very high values (such as the ones that cause this crash) are equally nonsensical. This changeset adds a range to the flag values to address these issues. >> >> `SpecTrapLimitExtraEntries` is `MAX`ed with HotSpot's computed heuristic, which means that in any case it can only serve as a buffer above the heuristic. Based on benchmarks where I logged heuristic-derived values for extra `DataLayout` cells, even a value of 50 for `SpecTrapLimitExtraEntries` is more than sufficient. To provide some headroom and keep things simple, I have set the upper limit to 100. >> >> **Testing:** passes tiers 1-2 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Add simple sanity test Yes, that's perfect, thanks for adding the test! test/hotspot/jtreg/compiler/arguments/TestSpecTrapLimitExtraEntries.java line 25: > 23: > 24: /* > 25: * @test You can add the bug number: Suggestion: * @test * @bug 8364490 test/hotspot/jtreg/compiler/arguments/TestSpecTrapLimitExtraEntries.java line 27: > 25: * @test > 26: * @summary "Hello world" sanity test for SpecTrapLimitExtraEntries > 27: * @requires vm.flagless Is this really required? I would have expected that it also works regardless of the flags being passed in. This gives some more coverage (e.g. running with `-Xcomp` etc.). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28451#pullrequestreview-3505739361 PR Review Comment: https://git.openjdk.org/jdk/pull/28451#discussion_r2560512991 PR Review Comment: https://git.openjdk.org/jdk/pull/28451#discussion_r2560514656 From thartmann at openjdk.org Tue Nov 25 16:07:50 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 25 Nov 2025 16:07:50 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: References: Message-ID: <82x1Pyi6js121o6bp9z-zIT9zntaYTpeXrVQngbQApQ=.1f760cf8-259f-412d-bcb8-e74a4e424174@github.com> On Mon, 25 Aug 2025 15:24:48 GMT, Jasmine Karthikeyan wrote: >> @jaskarth Thanks for the fix, it looks good to me now :) >> I'm just running some internal testing now, please ping me after the weekend :) > > Thanks a lot for the testing @eme64! I think I need another review to merge it. Hi @jaskarth: Any plans to pick this back up? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26827#issuecomment-3576389451 From duke at openjdk.org Tue Nov 25 16:33:46 2025 From: duke at openjdk.org (Zihao Lin) Date: Tue, 25 Nov 2025 16:33:46 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v8] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 13:12:59 GMT, Roland Westrelin wrote: >> Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: >> >> - fix assert >> - add more assert >> - rid of access.addr().type() >> - Merge branch 'openjdk:master' into 8344116 >> - Merge branch 'openjdk:master' into 8344116 >> - Merge branch 'openjdk:master' into 8344116 >> - Fix build >> - Fix test failed >> - 8344116: C2: remove slice parameter from LoadNode::make > > src/hotspot/share/opto/callnode.cpp line 1740: > >> 1738: Node* klass_node = in(AllocateNode::KlassNode); >> 1739: Node* proto_adr = phase->transform(new AddPNode(klass_node, klass_node, phase->MakeConX(in_bytes(Klass::prototype_header_offset())))); >> 1740: mark_node = LoadNode::make(*phase, control, mem, proto_adr, TypeX_X, TypeX_X->basic_type(), MemNode::unordered); > > We could assert that C->get_alias_index(kit->type(card_adr) == Compile::AliasIdxRaw I give it a try, but it won't pass the test. Is it possible the original version is wrong? The class mark will not be `TypeRawPtr::BOTTOM`, it should equal to Klass slice index. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2560657848 From jkarthikeyan at openjdk.org Tue Nov 25 16:50:52 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 25 Nov 2025 16:50:52 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: <82x1Pyi6js121o6bp9z-zIT9zntaYTpeXrVQngbQApQ=.1f760cf8-259f-412d-bcb8-e74a4e424174@github.com> References: <82x1Pyi6js121o6bp9z-zIT9zntaYTpeXrVQngbQApQ=.1f760cf8-259f-412d-bcb8-e74a4e424174@github.com> Message-ID: <4Z9dhg1AffMk26YW2o9e0KYmN1mqiZSXpFBWKcQQkyU=.6643d98b-7e85-4340-906b-a2ad861575bb@github.com> On Tue, 25 Nov 2025 16:03:23 GMT, Tobias Hartmann wrote: >> Thanks a lot for the testing @eme64! I think I need another review to merge it. > > Hi @jaskarth: Any plans to pick this back up? Thanks! 
Hi @TobiHartmann, I haven't had a chance to take a look at this due to being busy with my studies but I'm hoping to pick it back up in a week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26827#issuecomment-3576587122 From chagedorn at openjdk.org Tue Nov 25 16:57:37 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Nov 2025 16:57:37 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 Message-ID: [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant". - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. #### Testing - [X] Tier1 - [X] Tier5 with IR framework internal tests only - [ ] Additionally testing `TestIRMatching.java` and `TestPhaseIRMatching.java` on arm, aarch64, ppc64le, ppc64be s390, and riscv64 (thanks to @mhaessig for taking care of that!) 
Thanks, Christian ------------- Commit messages: - 8372461: [IR Framework] Multiple test failures after JDK-8371789 Changes: https://git.openjdk.org/jdk/pull/28495/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28495&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372461 Stats: 8 lines in 2 files changed: 0 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/28495.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28495/head:pull/28495 PR: https://git.openjdk.org/jdk/pull/28495 From epeter at openjdk.org Tue Nov 25 17:06:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Nov 2025 17:06:18 GMT Subject: RFR: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge [v3] In-Reply-To: References: Message-ID: <3jucenA5rmOrRvQWRpAJJmOsMf_eBOLWgpAWjV6Y-Lc=.d520ea0f-0a4e-436f-9427-2629f00b8354@github.com> On Tue, 25 Nov 2025 15:50:40 GMT, Christian Hagedorn wrote: >> ### Strong Connection between Template Assertion Predicate and Counted Loop >> In [JDK-8350579](https://bugs.openjdk.org/browse/JDK-8350579), we fixed the issue that a Template Assertion Predicate for a folded loop A could end up at another loop B. We then created an Initialized Assertion Predicate at loop B from the template of loop A and used the values from the already folded, completely unrelated loop A . As a result, we crashed with a halt because loop B violated the predicate with the wrong values. As a fix, we established a strong connection between Template Assertion Predicates and their associated loop node by adding a direct link from `OpaqueTemplateAssertionPredicate` -> `CountedLoop`. 
>> >> #### Maintaining this Property >> In `PhaseIdealLoop::eliminate_useless_predicates()`, we walk through all counted loops and only keep those `OpaqueTemplateAssertionPredicate` nodes that can be found from the loop heads and are actually meant for this loop (using the strong connection): >> https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1245-L1249 >> >> All other opaque nodes are removed. >> >> ### Additional Verification for Useless `OpaqueTemplateAssertionPredicate` Nodes >> As an additional verification for `OpaqueTemplateAssertionPredicate` nodes that are found to be useless in `eliminate_useless_predicates()`, we check that in this case the `CountedLoop` is really dead (otherwise, we should have found the `OpaqueTemplateAssertionPredicate` in our walks through all loop): >> https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1294-L1301 >> >> ### Violating the Additional Verification with `-XX:+StressLoopBackedge` >> In `PhaseIdealLoop::duplicate_loop_backedge()`, we convert a loop with a merge point into two loops which should enable us to transform the new inner loop into a counted loop. This only makes sense for a `Loop` that is not a counted loop, yet. However, to stress the transformation, we can also run with `-XX:+StressDuplicateBackedge` that also transforms a counted loop into an inner and an outer loop. This is a problem when we have Template Assertion Predicates above a counted loop to be stressed: >> >> image >> >> After duplicate backedge, the Template Assertion Predicates are now at the outer non-... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Improve comments Marked as reviewed by epeter (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28389#pullrequestreview-3506049018 From aseoane at openjdk.org Tue Nov 25 17:37:27 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Tue, 25 Nov 2025 17:37:27 GMT Subject: RFR: 8364490: Fatal error on large SpecTrapLimitExtraEntries value [v3] In-Reply-To: References: Message-ID: <2DNZFggnLVM-qT_BaYB2JsKRfsE4JlbJOaUWnbbnR9Q=.33648143-4989-412e-a3a6-74cdd5622932@github.com> > This PR addresses VM crashes on very large values for `SpecTrapLimitExtraEntries`. > > The experimental `SpecTrapLimitExtraEntries` allows for a user-specified number of extra method data trap entries for speculation. Currently, this number is implemented with an `int`, which means that users can specify very large values that will translate into huge `MethodData` objects that cannot be allocated in Metaspace. > > An `int` range of values should not be allowed, as negative `SpecTrapLimitExtraEntries` do not make any sense, and very high values (such as the ones that cause this crash) are equally nonsensical. This changeset adds a range to the flag values to address these issues. > > `SpecTrapLimitExtraEntries` is `MAX`ed with HotSpot's computed heuristic, which means that in any case it can only serve as a buffer above the heuristic. Based on benchmarks where I logged heuristic-derived values for extra `DataLayout` cells, even a value of 50 for `SpecTrapLimitExtraEntries` is more than sufficient. To provide some headroom and keep things simple, I have set the upper limit to 100. 
> > **Testing:** passes tiers 1-2 Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28451/files - new: https://git.openjdk.org/jdk/pull/28451/files/5ea304e3..49097207 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28451&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28451&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28451.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28451/head:pull/28451 PR: https://git.openjdk.org/jdk/pull/28451 From aseoane at openjdk.org Tue Nov 25 17:37:32 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Tue, 25 Nov 2025 17:37:32 GMT Subject: RFR: 8364490: Fatal error on large SpecTrapLimitExtraEntries value [v2] In-Reply-To: <_Khf0dztN6uckHdLFa3dA_4SIkIKlftAxK5FQdyGReY=.5880f132-214a-47e3-8643-6aaba084d42c@github.com> References: <47YSbEjOSDD6WAnJR1-oivkApBawNVczLFhzWIsr52I=.822ca770-c64a-48e9-a1da-4f0b333d99cb@github.com> <_Khf0dztN6uckHdLFa3dA_4SIkIKlftAxK5FQdyGReY=.5880f132-214a-47e3-8643-6aaba084d42c@github.com> Message-ID: On Tue, 25 Nov 2025 15:49:17 GMT, Christian Hagedorn wrote: >> Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: >> >> Add simple sanity test > > test/hotspot/jtreg/compiler/arguments/TestSpecTrapLimitExtraEntries.java line 27: > >> 25: * @test >> 26: * @summary "Hello world" sanity test for SpecTrapLimitExtraEntries >> 27: * @requires vm.flagless > > Is this really required? I would have expected that it also works regardless of the flags being passed in. This gives some more coverage (e.g. running with `-Xcomp` etc.). No, not really. Fixed. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28451#discussion_r2560783224

From epeter at openjdk.org Tue Nov 25 17:47:01 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 25 Nov 2025 17:47:01 GMT
Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account
Message-ID: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com>

**Summary**

I created some `fill` and `copy` style benchmarks, covering both `arrays` and `MemorySegment`s.

Reasons for this benchmark:
- I want to compare auto-vectorization with intrinsics (array assembly style intrinsics, and MemorySegment Java-level special implementations). This allows us to see if some are slower than others, and if we can manage to improve the slower versions somehow in the future.
- There are some known issues we can demonstrate well with this benchmark:
  - Super-Unrolling: unrolling the vectorized loop gets us extra performance, but the exact factor may not be optimal yet for auto-vectorization.
  - Small iteration count loops: auto-vectorization can lead to slowdowns.
- Many benchmarks do not control for alignment, which creates noise. I simply go over all possible alignments, which should smooth out the noise.
- Most benchmarks do not control for 4k aliasing (x86 effect in store buffer). I make sure that loads/stores are not a multiple of 4k bytes apart, so we can avoid the noise of that effect.

----------------------------------------------------------------------

**Analysis based on this Benchmark**

Analysis done in this PR:
- Arrays: auto vectorization vs scalar loops performance
- Arrays: auto vectorization loops vs intrinsics
- MemorySegments: auto vectorization loops vs scalar loops vs `MemorySegment.fill/copy`

Future work:
- Investigate deeper, inspect assembly, etc.
- Impact of `-XX:SuperWordAutomaticAlignment=0` on small iteration count loops.
- Investigate effect of `-XX:-OptimizeFill`.
  It seems that the loops in this benchmark are not detected automatically, and so the array intrinsics are not used. Why?
- Investigate impact of `CompactObjectHeaders`. Does enabling/disabling it change performance?
- Investigate if adjusting the super-unrolling factor could improve performance for auto-vectorization: [JDK-8368061](https://bugs.openjdk.org/browse/JDK-8368061)
- Performance comparison with Graal.

----------------------------------------------------------------------

**Array Benchmark: auto vectorization vs scalar**

We can see that for arrays, auto vectorization leads to minor regressions for sizes 1-32, and then generally auto vectorization is faster for larger sizes. And this is true for both `fill` and `copy`.

Strange: `macosx_aarch64` with `copy_int`. The auto-vectorized performance has a sudden drop around 150 iterations. Also for `fill_long` we have a "phase-transition" around 64, where the curve gets steeper rather than flatter.

[Plots omitted, one per platform: `linux_x64_oci`, `windows_x64_oci`, `macosx_x64_sandybridge`, `linux_aarch64`, `macosx_aarch64`]

----------------------------------------------------------------------

**Array Benchmark: auto vectorization vs intrinsics**

Observations:
- `linux_x64_oci` and `windows_x64_oci`:
  - `Objects`:
    - `System.arraycopy` has a vectorized intrinsic, the loop does not auto vectorize.
    - `Arrays.fill`: for `null` it seems to be fast between 0-70 elements, then slow. Why, and why don't we have faster intrinsics here?
    - Null loop seems significantly faster than the others. Why?
  - `byte`, `char`, `short`: all behave very similarly.
    - Intrinsics perform very well, and have distinct "steps".
    - Auto vectorization loops are slower for all except 0 elements. That is not surprising at small iteration counts (0-150), see [JDK-8344085](https://bugs.openjdk.org/browse/JDK-8344085).
      But for larger iteration counts (150-300), it is probably due to something else, maybe the unrolling factor?
  - `int`, `long`, `float`, `double`: 4-byte and 8-byte types behave the same on both platforms.
    - copy: `linux_x64` consistently performs better with `System.arraycopy` (intrinsic) and worse with auto vectorization. But `windows_x64` has better auto vectorization for elements 0-50/100, and then performs better with the intrinsic for larger sizes. In some cases the lines are parallel (just a constant performance difference), in others the lines diverge (different unrolling factor?). I suspect we don't get consistent performance: one platform is probably AVX2 and the other AVX512. Investigate.
    - fill: strangely, the platforms are more consistent here. The intrinsics are a little faster in all cases, compared to auto vectorization. Investigate.
- `macosx_x64_sandybridge`: similar to the `x64` platforms above, but a bit different because it has different AVX support. Intrinsics generally perform better, except for the fill null loop, just like above.
- `aarch64`:
  - The plots look a little "cleaner", with less noise. The performance is also less "zig-zag-y", especially with larger iteration counts.
  - `Object`:
    - copy: intrinsics are massively faster, of course no vectorization for loops.
    - fill: null cases are much faster, and the intrinsic is a little faster still, but not much. But no fast intrinsic for variable fill. How can the intrinsic be so massively faster?
  - Primitives:
    - copy: the intrinsic is consistently solidly faster, except for 8-byte types: on one of the two platforms it looks like auto vectorization is only a bit slower for 0-250, and may even become faster above 300 iterations. Investigate.
    - fill:
      - 8-byte types: performance is identical for all versions.
      - 1-4 byte types:
        - `macosx_aarch64`: seems to have issues with the zero fill intrinsic: it has very erratic performance behaviour above 256 bytes. Investigate.
        - `linux_aarch64`: zero fill intrinsic: at first a little slower than the var fill intrinsic, but after about 400 bytes it becomes very significantly faster.
        - Auto vectorization is slower than the var fill intrinsic. Investigate.

The big questions from above:
- `x64` for `Objects`: What's up with the fill null intrinsic above 70 elements? Why is the intrinsic slower than the fill zero loop for more than 70 elements? Are we using 4 or 8 byte pointers?
- `x64` for `Primitives`: both intrinsic and loop vectorize - but why do we still see a performance difference, both for large and small iteration counts?
- `aarch64` for `Objects`: why are the copy intrinsics so massively faster compared to the loop? It is more than what vectorization could explain, it seems.
- `aarch64` for `Primitives`: Why are intrinsics faster than auto vectorization, in many cases?
- `macosx_aarch64`: erratic perf behaviour above 256 bytes, why?

[Plots omitted, one per platform: `linux_x64_oci`, `windows_x64_oci`, `macosx_x64_sandybridge`, `linux_aarch64`, `macosx_aarch64`]

----------------------------------------------------------------------

**Memory Segment Benchmark**

Quick analysis:
- Auto vectorization is quite a bit slower than `MemorySegment.copy/fill`. But there are some strange performance behaviours on x64 machines. I suspect it has to do with memory alignment: `MemorySegment.fill` probably does not align memory, and so it gets penalized for split loads/stores.
- Just like with arrays: for small iteration counts (0-32) we get a regression with auto vectorization, compared to scalar performance.
`linux_x64_oci` ms_linux_x64_oci_server `windows_x64_oci` ms_windows_x64_oci_server `macosx_x64_sandybridge` ms_macosx_x64_sandybridge `linux_aarch64` ms_linux_aarch64_server `macosx_aarch64` ms_macosx_aarch64 ------------- Commit messages: - more MS types - fix MS fill - more backing types - object array benchmarks - fix bm - ms bm update - clean up benchmark - more types - improve benchmark - Merge branch 'master' into JDK-8367158-fill-and-copy-benchmarks - ... and 4 more: https://git.openjdk.org/jdk/compare/44964181...40a80d79 Changes: https://git.openjdk.org/jdk/pull/27315/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27315&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8367158 Stats: 1055 lines in 2 files changed: 1055 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27315/head:pull/27315 PR: https://git.openjdk.org/jdk/pull/27315 From qamai at openjdk.org Tue Nov 25 17:49:19 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 25 Nov 2025 17:49:19 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: <0df3H15uO96P1n3zLpKl5y_RKrAgc1h_V91bGB5mCr8=.06942d05-f66d-442f-a754-8135ac0eec30@github.com> References: <0df3H15uO96P1n3zLpKl5y_RKrAgc1h_V91bGB5mCr8=.06942d05-f66d-442f-a754-8135ac0eec30@github.com> Message-ID: On Tue, 25 Nov 2025 13:07:54 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> reviews > > Is this issue at all related to https://github.com/openjdk/jdk/pull/24575? > > It seems we remove a `CastLL` from the graph, because the input type is wider than the Cast's type, right? > > If I remember correctly from https://github.com/openjdk/jdk/pull/24575, if a CastLL is narrowing, we don't want to remove it, see `ConstraintCastNode::Identity`. 
> > Can you elaborate a bit more on where the `CastLL` came from, and what it is supposed to do? @eme64 Yes, it is indeed similar. The issue here is that after loop opts, we try to remove almost all `CastNode`s so that the graph can be GVN-ed better (think of `x = a + b` and `y = cast(a) + b`). > Can you elaborate a bit more on where the `CastLL` came from, and what it is supposed to do? Macro expansion tries to be smart for an array copy and does this:

    byte[] dst;
    byte[] src;
    int len;
    if (len <= 32) {
        int casted_len = cast(len, 0, 32);
        vectormask mask = VectorMaskGen(casted_len);
        vector v = LoadVectorMasked(src, 0, mask);
        StoreVectorMasked(dst, 0, v, mask);
    } else {
        // do the copy normally
    }

As you can see, the masked accesses are only meaningful if `len <= 32`. But after loop opts, the cast is gone, leaving us with a `len` which happens to be larger than `32`. The path should be dead, but IGVN reaches the `LoadVectorMaskedNode` first, which triggers the assert.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3576813301 From vpaprotski at openjdk.org Tue Nov 25 20:12:26 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 25 Nov 2025 20:12:26 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v7] In-Reply-To: References: Message-ID: > - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline > - `SignatureBench.MLDSA` is 1.2x-2.2x faster > - Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) > - AVX512 intrinsic improvements are 1.24x-1.5x faster than current version > - `SignatureBench.MLDSA` is up to 5% faster, never slower > > Note on intrinsic: > - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill.
> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 > > Tests and benchmarks: > - Added a fuzz test to ensure Java and intrinsic produces exactly same result > - Added benchmark to measure the performance of intrinsic itself > > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: comments from Jatin ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28136/files - new: https://git.openjdk.org/jdk/pull/28136/files/bfc16f1f..094051e0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28136&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28136&range=05-06 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28136.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28136/head:pull/28136 PR: https://git.openjdk.org/jdk/pull/28136 From vpaprotski at openjdk.org Tue Nov 25 20:12:28 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 25 Nov 2025 20:12:28 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v6] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 22:01:17 GMT, Volodymyr Paprotski wrote: >> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline >> - `SignatureBench.MLDSA` is 1.2x-2.2x faster >> - Note: there 
is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) >> - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version >> - `SignatureBench.MLDSA` is upto 5% faster, never slower >> >> Note on intrinsic: >> - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. >> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 >> >> Tests and benchmarks: >> - Added a fuzz test to ensure Java and intrinsic produces exactly same result >> - Added benchmark to measure the performance of intrinsic itself >> >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > spelling Thanks for the review Jatin. Re links/references.. This was original work, apart from the base from Ferenc.. I did have a look at the original reference from IBM but Ferenc's multiply was already better. 
------------- PR Review: https://git.openjdk.org/jdk/pull/28136#pullrequestreview-3505809840 From sviswanathan at openjdk.org Tue Nov 25 20:12:29 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 25 Nov 2025 20:12:29 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v7] In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 20:09:36 GMT, Volodymyr Paprotski wrote: >> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline >> - `SignatureBench.MLDSA` is 1.2x-2.2x faster >> - Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) >> - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version >> - `SignatureBench.MLDSA` is upto 5% faster, never slower >> >> Note on intrinsic: >> - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. 
>> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 >> >> Tests and benchmarks: >> - Added a fuzz test to ensure Java and intrinsic produces exactly same result >> - Added benchmark to measure the performance of intrinsic itself >> >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comments from Jatin Marked as reviewed by sviswanathan (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28136#pullrequestreview-3506642227 From vpaprotski at openjdk.org Tue Nov 25 20:12:34 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 25 Nov 2025 20:12:34 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v6] In-Reply-To: <7-u4fTT6SMiqErNn-Xl7o8UTVF2NIV5m0DAhStsbsk0=.5f51025e-8ed8-4d2f-911c-1257b272f9f7@github.com> References: <7-u4fTT6SMiqErNn-Xl7o8UTVF2NIV5m0DAhStsbsk0=.5f51025e-8ed8-4d2f-911c-1257b272f9f7@github.com> Message-ID: On Tue, 25 Nov 2025 02:50:41 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> spelling > > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 365: > >> 363: >> 364: static void loadXmms(const XMMRegister destinationRegs[], Register source, int offset, >> 365: int vector_len, MacroAssembler *_masm, int regCnt = -1, int memStep = -1) { > > Suggestion: > > int vector_len, MacroAssembler *_masm, int regCnt = -1, int memStep = -1) { done > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 381: > >> 379: >> 380: static void storeXmms(Register destination, int offset, const XMMRegister xmmRegs[], >> 381: int vector_len, MacroAssembler *_masm, int regCnt = -1, int memStep = -1) { > > Suggestion: > > int vector_len, MacroAssembler *_masm, int regCnt = -1, int memStep = -1) { done > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 659: > >> 657: // zetas (int[128*8]) = c_rarg1 >> 658: static address generate_dilithiumAlmostInverseNtt_avx(StubGenerator *stubgen, >> 659: int vector_len,MacroAssembler *_masm) { > > Fix indentation I dont think this is any better: static address generate_dilithiumAlmostInverseNtt_avx(StubGenerator *stubgen, int vector_len, MacroAssembler *_masm) { I prefer more lines on the screen instead. 
I also didn't see anything in hotspot-style.md specifically on function declaration style so I figure it is up to me. Did add a space after the comma. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 718: > >> 716: >> 717: // Constants for shuffle and montMul64 >> 718: __ mov64(scratch, 0b1010101010101010); > > 64 bit constant suffix Note the `0b` prefix. `0b0000000000000000000000000000000000000000000000000101010101010101UL` is worse. And the very next line is using the constant as a 16-bit value > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 901: > >> 899: // poly2 (int[256]) = c_rarg2 >> 900: static address generate_dilithiumNttMult_avx(StubGenerator *stubgen, >> 901: int vector_len, MacroAssembler *_masm) { > > Fix indentation as above > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 939: > >> 937: vector_len, scratch); // 2^64 mod q >> 938: if (vector_len == Assembler::AVX_512bit) { >> 939: __ mov64(scratch, 0b0101010101010101); > > Add long constant suffix as above > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 985: > >> 983: // constant (int) = c_rarg1 >> 984: static address generate_dilithiumMontMulByConstant_avx(StubGenerator *stubgen, >> 985: int vector_len, MacroAssembler *_masm) { > > Fix indentation as above > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1026: > >> 1024: __ evpbroadcastd(constant, rConstant, Assembler::AVX_512bit); // constant multiplier >> 1025: >> 1026: __ mov64(scratch, 0b0101010101010101); //dw-mask > > Constant suffix as above ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2561127351 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2561128486 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2560573897 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2560581463 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2560583718 PR Review Comment:
https://git.openjdk.org/jdk/pull/28136#discussion_r2560585705 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2560635291 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2561171198 From vlivanov at openjdk.org Tue Nov 25 21:59:24 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 25 Nov 2025 21:59:24 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v21] In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 13:07:59 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion.
>> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Review comments resolutions Thanks for clarifications, Jatin. Looks good. src/hotspot/cpu/x86/x86.ad line 2689: > 2687: // operand. > 2688: case 2: > 2689: return (mdef->flags() & Node::PD::Flag_ndd_demotable_commutative) != 0; `is_ndd_demotable_commutative()` can be used instead. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3507014825 PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2561594552 From vpaprotski at openjdk.org Tue Nov 25 22:45:57 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 25 Nov 2025 22:45:57 GMT Subject: Integrated: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements In-Reply-To: References: Message-ID: On Tue, 4 Nov 2025 16:38:49 GMT, Volodymyr Paprotski wrote: > - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline > - `SignatureBench.MLDSA` is 1.2x-2.2x faster > - Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) > - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version > - `SignatureBench.MLDSA` is upto 5% faster, never slower > > Note on intrinsic: > - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. 
> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 > > Tests and benchmarks: > - Added a fuzz test to ensure Java and intrinsic produces exactly same result > - Added benchmark to measure the performance of intrinsic itself > > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" > make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" > make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" This pull request has now been integrated. Changeset: b36b6947 Author: Volodymyr Paprotski URL: https://git.openjdk.org/jdk/commit/b36b69470968b1578877cfe9658892a5fe44e38e Stats: 1827 lines in 6 files changed: 1124 ins; 255 del; 448 mod 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements Reviewed-by: sviswanathan, mpowers, ascarpino ------------- PR: https://git.openjdk.org/jdk/pull/28136 From vlivanov at openjdk.org Tue Nov 25 22:48:15 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 25 Nov 2025 22:48:15 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v6] In-Reply-To: <7nY7QRkkFjOtOuBXID1I4GluA0vnFRLy_UnRICfVkR4=.99ec7fe1-af27-4ab7-ac63-27aa12bec4ef@github.com> References: <7nY7QRkkFjOtOuBXID1I4GluA0vnFRLy_UnRICfVkR4=.99ec7fe1-af27-4ab7-ac63-27aa12bec4ef@github.com> Message-ID: <-kd-AfwkJebk8njImn0KeKvUCQnwoiqLr96cKCovlFc=.30649d16-8dee-4c9d-b1eb-ac9d7e9df86a@github.com> On Fri, 21 Nov 2025 11:33:42 GMT, Roland Westrelin wrote: >> In test cases, `mh` is 
initially not constant so the method handle >> invoke can't be inlined. It is later found to be constant, so it can >> be turned into a direct call by >> `Compile::process_late_inline_calls_no_inline()`. In the meantime, the >> `CallNode` for the mh invoke is cloned (by loop switching). In the >> process, only a shallow copy of the `JVMState` for the call is >> made. The initial `CallNode` is the first to be processed by >> `Compile::process_late_inline_calls_no_inline()` and that causes that >> `CallNode` to become dead. The cloned `CallNode` is then >> processed. The `JVMState` for that one references the initial >> `CallNode` in its caller's `JVMState`. Because that node is dead, that >> causes a crash. The fix I propose is to make a deep copy of the >> `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is >> assigned to the node. >> >> The other failure I see with these tests is: >> >> >> # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 >> # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! >> >> >> because even though the `CallNode` is cloned, there's still only one >> late inline recorded. The fix here is to increment >> `_number_of_mh_late_inlines` when the node is cloned. >> >> This was reported by the netty developers. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into JDK-8370939 > - review > - Merge branch 'master' into JDK-8370939 > - review > - more > - more > - more > - more > - test > - fix Sure, I'm fine either way. There are known cases when `dec_number_of_mh_late_inlines()` call is missing, so the patch as it is now looks fine as well considering we'll investigate the effects on `inline_string_calls()` call. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28088#issuecomment-3577907027 From dholmes at openjdk.org Wed Nov 26 00:57:59 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 26 Nov 2025 00:57:59 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v7] In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 20:12:26 GMT, Volodymyr Paprotski wrote: >> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline >> - `SignatureBench.MLDSA` is 1.2x-2.2x faster >> - Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7) >> - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version >> - `SignatureBench.MLDSA` is upto 5% faster, never slower >> >> Note on intrinsic: >> - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill. >> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2 >> >> Tests and benchmarks: >> - Added a fuzz test to ensure Java and intrinsic produces exactly same result >> - Added benchmark to measure the performance of intrinsic itself >> >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" >> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1" >> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1" > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since 
the last revision: > > comments from Jatin The new test can only run on x86 but it is not restricted to x86, thus it fails when run on AArch64. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3578268951 From syan at openjdk.org Wed Nov 26 02:17:49 2025 From: syan at openjdk.org (SendaoYan) Date: Wed, 26 Nov 2025 02:17:49 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 16:51:39 GMT, Christian Hagedorn wrote: > [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: > > - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant". > - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. > - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: > https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 > I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. > > #### Testing > - [X] Tier1 > - [X] Tier5 with IR framework internal tests only > - [ ] Additionally testing `TestIRMatching.java` and `TestPhaseIRMatching.java` on arm, aarch64, ppc64le, ppc64be s390, and riscv64 (thanks to @mhaessig for taking care of that!) > > Thanks, > Christian After applying the proposed patch, the tests testlibrary_tests/ir_framework/examples/IRExample.java, testlibrary_tests/ir_framework/tests/TestIRMatching.java and testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java passed with a fastdebug build on linux-x64. ------------- Marked as reviewed by syan (Committer).
PR Review: https://git.openjdk.org/jdk/pull/28495#pullrequestreview-3508036272 From duke at openjdk.org Wed Nov 26 04:44:50 2025 From: duke at openjdk.org (Shawn M Emery) Date: Wed, 26 Nov 2025 04:44:50 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v5] In-Reply-To: References: Message-ID: <_q96s_UbnHbgnVHqppMwnZ7J-_WEslZk7J3E0GQVbW0=.e4d9cfd3-5196-4aa3-9509-e2c309a33740@github.com> On Mon, 24 Nov 2025 17:00:04 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Address review comments. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Remove K from AES_Crypt > - More minor cleanup. > - Improve comment and minor cleanup. > - 8371820: Further AES performance improvements for key schedule generation The internal testing came back clean and performed AESReinit benchmarks on aarch64, where a 8.9% performance gain was observed with the complete set of changes! ------------- Marked as reviewed by smemery at github.com (no known OpenJDK username). 
PR Review: https://git.openjdk.org/jdk/pull/28299#pullrequestreview-3508635701 From duke at openjdk.org Wed Nov 26 05:21:54 2025 From: duke at openjdk.org (Shawn M Emery) Date: Wed, 26 Nov 2025 05:21:54 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v8] In-Reply-To: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> References: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> Message-ID: <3MOzUPn45y6gloe4p1JAazZByYIzKEi7jldiIb_iSA4=.62ca62c0-800d-48a2-b9d6-08b4066197f1@github.com> On Sun, 23 Nov 2025 04:54:15 GMT, Jiangli Zhou wrote: >> Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. >> >> Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Fixed the ENCRYPT_16_BLKS fall through case that sviswa7 pointed out in PR review. Was able to recreate the issue with the provided test case with messageSize 100101 before the fix and verified that the same test passed with the fix in place! Will try to finish the code review shortly. 
------------- PR Review: https://git.openjdk.org/jdk/pull/28363#pullrequestreview-3508800232 From epeter at openjdk.org Wed Nov 26 06:12:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Nov 2025 06:12:55 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account In-Reply-To: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> Message-ID: On Tue, 16 Sep 2025 14:28:12 GMT, Emanuel Peter wrote: > **Summary** > > I created some `fill` and `copy` style benchmarks, covering both `arrays` and `MemorySegment`s. > Reasons for this benchmark: > - I want to compare auto-vectorization with intrinsics (array assembly style intrinsics, and MemorySegment java level special implementations). This allows us to see if some are slower than others, and if we can manage to improve the slower versions somehow in the future. > - There are some known issues we can demonstrate well with this benchmark: > - Super-Unrolling: unrolling the vectorized loop gets us extra performance, but the exact factor may not be optimal yet for auto-vectorization. > - Small iteration count loops: auto-vectorization can lead to slowdowns. > - Many benchmarks do not control for alignment. But that creates noise. I just go over all possible alignments, that should smooth out the noise. > - Most benchmarks do not control for 4k aliasing (x86 effect in store buffer). I make sure that load/stores are not a multiple of 4k bytes apart, so we can avoid the noise of that effect.
> > ---------------------------------------------------------------------- > > **Analysis based on this Benchmark** > > Analysis done in this PR: > - Arrays: auto vectorization vs scalar loops performance > - Arrays: auto vectorization loops vs intrinsics > - MemorySegments: auto vectorization loops vs scalar loops vs `MemorySegment.fill/copy` > > Future work: > - Investigate deeper, inspect assembly, etc. > - Impact of `-XX:SuperWordAutomaticAlignment=0` on small iteration count loops. > - Investigate effect of `-XX:-OptimizeFill`. It seems that the loops in this benchmark are not detected automatically, and so the array intrinsics are not used. Why? > - Investigate impact of `CompactObjectHeaders`. Does enabling/disabling change any performance? > - Investigate if adjusting the super-unrolling factor could improve performance for auto-vectorization: [JDK-8368061](https://bugs.openjdk.org/browse/JDK-8368061) > - Performance comparison with Graal. > > ---------------------------------------------------------------------- > > **Array Benchmark: auto vectorization vs scalar** > > We can see that for arrays, auto vectorization leads to minor regressions for sizes 1-32, and then generally auto vectorization is faster for larger sizes. And this is true for both `fill` and `copy`. > > Strange: `macosx_aarch64` with `copy_int`. The auto vectorized performance has a sudden drop around 150 iterations. Also for `fill_long` we have a "phase-transition" around 64, that goes steeper rather... Note: there are related benchmarks in https://github.com/openjdk/jdk/pull/28260, but they do not take the same approach to "randomize" alignment.
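For readers unfamiliar with the variants being compared, the following is a minimal sketch (not the benchmark from the PR) of a loop-based `fill`/`copy` next to the library calls that may be served by intrinsics, exercised at every start offset in a small range so that different alignments are covered. The class and method names (`FillCopyVariants`, `checkAllOffsets`, and so on) are hypothetical and chosen for illustration only; the sketch checks that the variants agree and does not measure performance, for which a JMH harness would be needed.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the JDK or of the PR.
class FillCopyVariants {
    // Plain loop fill over [offset, offset + size); C2 may auto-vectorize it.
    static void fillLoop(int[] a, int offset, int size, int value) {
        for (int i = 0; i < size; i++) {
            a[offset + i] = value;
        }
    }

    // Library fill over the same range.
    static void fillLibrary(int[] a, int offset, int size, int value) {
        Arrays.fill(a, offset, offset + size, value);
    }

    // Plain loop copy of size elements.
    static void copyLoop(int[] src, int srcOffset, int[] dst, int dstOffset, int size) {
        for (int i = 0; i < size; i++) {
            dst[dstOffset + i] = src[srcOffset + i];
        }
    }

    // Library copy of the same range.
    static void copyLibrary(int[] src, int srcOffset, int[] dst, int dstOffset, int size) {
        System.arraycopy(src, srcOffset, dst, dstOffset, size);
    }

    // Runs all variants at each start offset in [0, maxOffset) and verifies
    // that loop and library versions produce identical arrays.
    // Returns the number of offsets checked.
    static int checkAllOffsets(int size, int maxOffset) {
        int[] src = new int[size + maxOffset];
        for (int i = 0; i < src.length; i++) {
            src[i] = i;
        }
        int checked = 0;
        for (int offset = 0; offset < maxOffset; offset++) {
            int[] a = new int[size + maxOffset];
            int[] b = new int[size + maxOffset];
            fillLoop(a, offset, size, 42);
            fillLibrary(b, offset, size, 42);
            if (!Arrays.equals(a, b)) {
                throw new AssertionError("fill mismatch at offset " + offset);
            }
            copyLoop(src, offset, a, 0, size);
            copyLibrary(src, offset, b, 0, size);
            if (!Arrays.equals(a, b)) {
                throw new AssertionError("copy mismatch at offset " + offset);
            }
            checked++;
        }
        return checked;
    }

    public static void main(String[] args) {
        System.out.println("offsets checked: " + checkAllOffsets(100, 16));
    }
}
```

Sweeping the start offset is only a rough stand-in for the alignment sweep described in the PR; the actual alignment of array payloads also depends on the object header layout and allocation address.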
------------- PR Comment: https://git.openjdk.org/jdk/pull/27315#issuecomment-3579394031 From chagedorn at openjdk.org Wed Nov 26 06:22:49 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Nov 2025 06:22:49 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 02:14:48 GMT, SendaoYan wrote: >> [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: >> >> - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant". >> - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. >> - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: >> https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 >> I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. >> >> #### Testing >> - [X] Tier1 >> - [X] Tier5 with IR framework internal tests only >> - [ ] Additionally testing `TestIRMatching.java` and `TestPhaseIRMatching.java` on arm, aarch64, ppc64le, ppc64be s390, and riscv64 (thanks to @mhaessig for taking care of that!) >> >> Thanks, >> Christian > > After apply the propose patch, the tests include testlibrary_tests/ir_framework/examples/IRExample.java testlibrary_tests/ir_framework/tests/TestIRMatching.java testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java run passed with fastdebug build on linux-x64. Thanks @sendaoYan for your review and verifying it as well! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28495#issuecomment-3579437027 From chagedorn at openjdk.org Wed Nov 26 06:23:50 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Nov 2025 06:23:50 GMT Subject: RFR: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge [v3] In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 15:50:40 GMT, Christian Hagedorn wrote: >> ### Strong Connection between Template Assertion Predicate and Counted Loop >> In [JDK-8350579](https://bugs.openjdk.org/browse/JDK-8350579), we fixed the issue that a Template Assertion Predicate for a folded loop A could end up at another loop B. We then created an Initialized Assertion Predicate at loop B from the template of loop A and used the values from the already folded, completely unrelated loop A . As a result, we crashed with a halt because loop B violated the predicate with the wrong values. As a fix, we established a strong connection between Template Assertion Predicates and their associated loop node by adding a direct link from `OpaqueTemplateAssertionPredicate` -> `CountedLoop`. >> >> #### Maintaining this Property >> In `PhaseIdealLoop::eliminate_useless_predicates()`, we walk through all counted loops and only keep those `OpaqueTemplateAssertionPredicate` nodes that can be found from the loop heads and are actually meant for this loop (using the strong connection): >> https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1245-L1249 >> >> All other opaque nodes are removed. 
>> >> ### Additional Verification for Useless `OpaqueTemplateAssertionPredicate` Nodes >> As an additional verification for `OpaqueTemplateAssertionPredicate` nodes that are found to be useless in `eliminate_useless_predicates()`, we check that in this case the `CountedLoop` is really dead (otherwise, we should have found the `OpaqueTemplateAssertionPredicate` in our walks through all loops): >> https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1294-L1301 >> >> ### Violating the Additional Verification with `-XX:+StressDuplicateBackedge` >> In `PhaseIdealLoop::duplicate_loop_backedge()`, we convert a loop with a merge point into two loops which should enable us to transform the new inner loop into a counted loop. This only makes sense for a `Loop` that is not yet a counted loop. However, to stress the transformation, we can also run with `-XX:+StressDuplicateBackedge`, which also transforms a counted loop into an inner and an outer loop. This is a problem when we have Template Assertion Predicates above a counted loop to be stressed: >> >> image >> >> After duplicate backedge, the Template Assertion Predicates are now at the outer non-... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Improve comments Thanks Emanuel for your review!
------------- PR Comment: https://git.openjdk.org/jdk/pull/28389#issuecomment-3579440813 From duke at openjdk.org Wed Nov 26 06:27:48 2025 From: duke at openjdk.org (Francesco Nigro) Date: Wed, 26 Nov 2025 06:27:48 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account In-Reply-To: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> Message-ID: On Tue, 16 Sep 2025 14:28:12 GMT, Emanuel Peter wrote: > **Summary** > > I created some `fill` and `copy` style benchmarks, covering both `arrays` and `MemorySegment`s. > Reasons for this benchmark: > - I want to compare auto-vectorization with intrinsics (array assembly style intrinsics, and MemorySegment java level special implementations). This allows us to see if some are slower than others, and if we can manage to improve the slower versions somehow in the future. > - There are some known issues we can demonstrate well with this benchmark: > - Super-Unrolling: unrolling the vectorized loop gets us extra performance, but the exact factor may not be optimal yet for auto-vectorization. > - Small iteration count loops: auto-vectorization can lead to slowdowns. > - Many benchmarks do not control for alignment. But that creates noise. I just go over all possible alignments, that should smooth out the noise. > - Most benchmarks do not control for 4k aliasing (x86 effect in store buffer). I make sure that load/stores are not a multiple of 4k bytes apart, so we can avoid the noise of that effect.
> > ---------------------------------------------------------------------- > > **Analysis based on this Benchmark** > > Analysis done in this PR: > - Arrays: auto vectorization vs scalar loops performance > - Arrays: auto vectorization loops vs intrinsics > - MemorySegments: auto vectorization loops vs scalar loops vs `MemorySegment.fill/copy` > > Future work: > - Investigate deeper, inspect assembly, etc. > - Impact of `-XX:SuperWordAutomaticAlignment=0` on small iteration count loops. > - Investigate effect of `-XX:-OptimizeFill`. It seems that the loops in this benchmark are not detected automatically, and so the array intrinsics are not used. Why? > - Investigate impact of `CompactObjectHeaders`. Does enabling/disabling change any performance? > - Investigate if adjusting the super-unrolling factor could improve performance for auto-vectorization: [JDK-8368061](https://bugs.openjdk.org/browse/JDK-8368061) > - Performance comparison with Graal. > > ---------------------------------------------------------------------- > > **Array Benchmark: auto vectorization vs scalar** > > We can see that for arrays, auto vectorization leads to minor regressions for sizes 1-32, and then generally auto vectorization is faster for larger sizes. And this is true for both `fill` and `copy`. > > Strange: `macosx_aarch64` with `copy_int`. The auto vectorized performance has a sudden drop around 150 iterations. Also for `fill_long` we have a "phase-transition" around 64, that goes steeper rather... Changes requested by franz1981 at github.com (no known OpenJDK username). test/micro/org/openjdk/bench/vm/compiler/VectorBulkOperationsArray.java line 155: > 153: > 154: @CompilerControl(CompilerControl.Mode.INLINE) > 155: public static int offsetLoad(int i) { return i % 64; } it's a minor but `& 63`: since i is not proven to be positive, C2 doesn't strength reduce the modulus into the cheaper form (&). you can mask `i` stripping out the negative bits too, and should work the same.
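To make the point about `% 64` versus `& 63` concrete, here is a minimal sketch. The class `ModVsMask` and its method names are hypothetical and not part of the benchmark; it only illustrates Java's arithmetic semantics and makes no claim about what C2 emits. The two forms agree for non-negative `i` and differ for negative `i`, which is why the compiler needs to know the sign before it can swap one for the other.

```java
// Hypothetical demo class, not part of VectorBulkOperationsArray.
class ModVsMask {
    // Remainder as written in the benchmark: the result takes the sign of i.
    public static int offsetMod(int i) { return i % 64; }

    // Mask form suggested in the review: the result is always in [0, 63].
    public static int offsetMask(int i) { return i & 63; }

    public static void main(String[] args) {
        int[] samples = {0, 1, 63, 64, 65, 1000, -1, -64, -65};
        for (int i : samples) {
            System.out.println(i + ": % 64 = " + offsetMod(i)
                                 + ", & 63 = " + offsetMask(i));
        }
    }
}
```

For example, `-1 % 64` is `-1` while `-1 & 63` is `63`. If every caller passes a non-negative `i`, both variants return the same offsets.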
------------- PR Review: https://git.openjdk.org/jdk/pull/27315#pullrequestreview-3509101760 PR Review Comment: https://git.openjdk.org/jdk/pull/27315#discussion_r2563459324 From duke at openjdk.org Wed Nov 26 06:27:50 2025 From: duke at openjdk.org (Francesco Nigro) Date: Wed, 26 Nov 2025 06:27:50 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account In-Reply-To: References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> Message-ID: On Wed, 26 Nov 2025 06:24:38 GMT, Francesco Nigro wrote: >> **Summary** >> >> I created some `fill` and `copy` style benchmarks, covering both `arrays` and `MemorySegment`s. >> Reasons for this benchmark: >> - I want to compare auto-vectorization with intrinsics (array assembly style intrinsics, and MemorySegment java level special implementations). This allows us to see if some are slower than others, and if we can manage to improve the slower versions somehow in the future. >> - There are some known issues we can demonstrate well with this benchmark: >> - Super-Unrolling: unrolling the vectorized loop gets us extra performance, but the exact factor may not be optimal yet for auto-vectorization. >> - Small iteration count loops: auto-vectorization can lead to slowdowns. >> - Many benchmarks do not control for alignment. But that creates noise. I just go over all possible alignments, that should smooth out the noise. >> - Most benchmarks do not control for 4k aliasing (x86 effect in store buffer). I make sure that load/stores are not a multiple of 4k bytes apart, so we can avoid the noise of that effect.
>> >> ---------------------------------------------------------------------- >> >> **Analysis based on this Benchmark** >> >> Analysis done in this PR: >> - Arrays: auto vectorization vs scalar loops performance >> - Arrays: auto vectorization loops vs intrinsics >> - MemorySegments: auto vectorization loops vs scalar loops vs `MemorySegment.fill/copy` >> >> Future work: >> - Investigate deeper, inspect assembly, etc. >> - Impact of `-XX:SuperWordAutomaticAlignment=0` on small iteration count loops. >> - Investigate effect of `-XX:-OptimizeFill`. It seems that the loops in this benchmark are not detected automatically, and so the array intrinsics are not used. Why? >> - Investigate impact of `CompactObjectHeaders`. Does enabling/disabling change any performance? >> - Investigate if adjusting the super-unrolling factor could improve performance for auto-vectorization: [JDK-8368061](https://bugs.openjdk.org/browse/JDK-8368061) >> - Performance comparison with Graal. >> >> ---------------------------------------------------------------------- >> >> **Array Benchmark: auto vectorization vs scalar** >> >> We can see that for arrays, auto vectorization leads to minor regressions for sizes 1-32, and then generally auto vectorization is faster for larger sizes. And this is true for both `fill` and `copy`. >> >> Strange: `macosx_aarch64` with `copy_int`. The auto vectorized performance has a sudden drop around 150 iterations. Also for `fill_... > > test/micro/org/openjdk/bench/vm/compiler/VectorBulkOperationsArray.java line 155: > >> 153: >> 154: @CompilerControl(CompilerControl.Mode.INLINE) >> 155: public static int offsetLoad(int i) { return i % 64; } > > it's a minor but `& 63`: since i is not proven to be positive, C2 doesn't strength reduce the modulus into the cheaper form (&). > you can mask `i` stripping out the negative bits too, and should work the same.
Same applies elsewhere in the bench ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27315#discussion_r2563462420 From dlong at openjdk.org Wed Nov 26 06:31:22 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 26 Nov 2025 06:31:22 GMT Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v2] In-Reply-To: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: > The problem is C2 is throwing an exception and then deoptimizing, and the -XX:+VerifyStack logic expects the stack to be empty, match the "before" state if the reexecute flag is set, or match the "after" state. C2 is using the "before" state, so for correctness it also needs to set the reexecute flag. > > I played around with other approaches, like: > 1. setting the stack to empty > 2. adding all the bytecodes that can throw to the list in AbstractInterpreter::bytecode_should_reexecute() > 3. always setting the reexecute flag in add_safepoint_edges() if must_throw is set > but in the end I decided to go with the minimal localized low-risk change. 
Dean Long has updated the pull request incrementally with one additional commit since the last revision: remove extra spaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28486/files - new: https://git.openjdk.org/jdk/pull/28486/files/a319cc08..8f89b007 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28486&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28486&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28486.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28486/head:pull/28486 PR: https://git.openjdk.org/jdk/pull/28486 From dlong at openjdk.org Wed Nov 26 06:31:23 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 26 Nov 2025 06:31:23 GMT Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v2] In-Reply-To: References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: On Wed, 26 Nov 2025 05:58:53 GMT, SendaoYan wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> remove extra spaces > > test/hotspot/jtreg/compiler/exceptions/TestAccessErrorInCatch.java line 37: > >> 35: * @run main/othervm -Xbatch >> 36: * -XX:CompileCommand=compileonly,IllegalAccessInCatch*::test >> 37: * -XX:+IgnoreUnrecognizedVMOptions -XX:+VerifyStack > > Maybe one whitespace will be enough between two options? Fixed. Thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28486#discussion_r2563459018 From jbhateja at openjdk.org Wed Nov 26 06:51:35 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 26 Nov 2025 06:51:35 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v22] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. > > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 12% reduction in code size footprint.
> > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26283/files - new: https://git.openjdk.org/jdk/pull/26283/files/d596c232..2fe2ab2e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=20-21 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From epeter at openjdk.org Wed Nov 26 06:53:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Nov 2025 06:53:53 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account In-Reply-To: References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> Message-ID: On Wed, 26 Nov 2025 06:25:33 GMT, Francesco Nigro wrote: >> test/micro/org/openjdk/bench/vm/compiler/VectorBulkOperationsArray.java line 155: >> >>> 153: >>> 154: @CompilerControl(CompilerControl.Mode.INLINE) >>> 155: public static int offsetLoad(int i) { return i % 64; } >> >> it's a minor but `& 63`: since i is not proven to be positive, C2 doesn't strength reduce the modulus into the cheaper form (&). >> you can mask `i` stripping out the negative bits too, and should work the same. > > Same applies elsewhere in the bench Good idea! 
Though I think in all call sites of `offsetLoad` we know that `i` is positive, so I suspect that this is already enough. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27315#discussion_r2563547610 From epeter at openjdk.org Wed Nov 26 06:53:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Nov 2025 06:53:53 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account In-Reply-To: References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> Message-ID: On Wed, 26 Nov 2025 06:49:24 GMT, Emanuel Peter wrote: >> Same applies elsewhere in the bench > > Good idea! Though I think in all call sites of `offsetLoad` we know that `i` is positive, so I suspect that this is already enough. I can also do "shift and mask" for `offsetStore`. Though I would expect C2 to do the transformation already as well here too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27315#discussion_r2563553425 From jbhateja at openjdk.org Wed Nov 26 06:57:53 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 26 Nov 2025 06:57:53 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v21] In-Reply-To: References: Message-ID: <71WSAr8Qj3QpCUcgfQHQraMTfRMK2lbPcPiY2LMg5JU=.c313bbb5-af13-4853-a0f4-deefa1c0f7dd@github.com> On Tue, 25 Nov 2025 21:56:37 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Review comments resolutions > > Thanks for clarifications, Jatin. Looks good. Thanks @iwanowww , for your comments and review approval. Hi @sviswa7 , @dlunde can you check and re-approve this version. 
Best Regards ------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3579595261 From epeter at openjdk.org Wed Nov 26 07:02:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Nov 2025 07:02:08 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: <0df3H15uO96P1n3zLpKl5y_RKrAgc1h_V91bGB5mCr8=.06942d05-f66d-442f-a754-8135ac0eec30@github.com> Message-ID: On Tue, 25 Nov 2025 17:46:28 GMT, Quan Anh Mai wrote: >> Is this issue at all related to https://github.com/openjdk/jdk/pull/24575? >> >> It seems we remove a `CastLL` from the graph, because the input type is wider than the Cast's type, right? >> >> If I remember correctly from https://github.com/openjdk/jdk/pull/24575, if a CastLL is narrowing, we don't want to remove it, see `ConstraintCastNode::Identity`. >> >> Can you elaborate a bit more on where the `CastLL` came from, and what it is supposed to do? > > @eme64 Yes, it is indeed similar. The issue here is that after loop opts, we try to remove almost all `CastNode`s so that the graph can be GVN-ed better (think of `x = a + b` and `y = cast(a) + b`). > >> Can you elaborate a bit more on where the `CastLL` came from, and what it is supposed to do? > > Macro expansion tries to be smart for an array copy and does this: > > byte[] dst; > byte[] src; > int len; > if (len <= 32) { > int casted_len = cast(len, 0, 32); > vectormask mask = VectorMaskGen(casted_len); > vector v = LoadVectorMasked(src, 0, mask); > StoreVectorMasked(dst, 0, v, mask); > } else { > // do the copy normally; > } > > As you can see, the masked accesses are only meaningful if `len <= 32`. But after loop opts, the cast is gone, leaving us with a len which happens to be larger than `32`. The path should be dead, but IGVN reaches the `LoadVectorMaskedNode` first, which triggers the assert. @merykitty Thanks for the explanations! So the `CastLL` is a narrowing cast, right? 
And `ConstraintCastNode::Identity` removes it, because the input type is wider, right? To me this part sounds incorrect. Narrowing casts should only be removed if the input is already narrower. No? Any opinions from @rwestrel ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3579635299 From fyang at openjdk.org Wed Nov 26 07:17:53 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 26 Nov 2025 07:17:53 GMT Subject: RFR: 8371968: RISC-V: implement AES CBC intrinsics In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 11:30:41 GMT, Anjian Wen wrote: > Support AES CBC intrinsic on RISCV, Already passed the tests in > test/hotspot/jtreg/compiler/codegen/aes/ > test/jdk/com/sun/crypto Hi, Thanks for making the changes. I am having a look. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2619: > 2617: // > 2618: // Output: > 2619: // x0 - input length Shouldn't this be `x10`? `x0` is the zero register on riscv. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2727: > 2725: // > 2726: // Output: > 2727: // r0 - input length Same question here. 
------------- PR Review: https://git.openjdk.org/jdk/pull/28320#pullrequestreview-3509268954 PR Review Comment: https://git.openjdk.org/jdk/pull/28320#discussion_r2563600347 PR Review Comment: https://git.openjdk.org/jdk/pull/28320#discussion_r2563602419 From duke at openjdk.org Wed Nov 26 07:18:56 2025 From: duke at openjdk.org (Shawn M Emery) Date: Wed, 26 Nov 2025 07:18:56 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v8] In-Reply-To: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> References: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> Message-ID: On Sun, 23 Nov 2025 04:54:15 GMT, Jiangli Zhou wrote: >> Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. >> >> Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Fixed the ENCRYPT_16_BLKS fall through case that sviswa7 pointed out in PR review. test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 26: > 24: /* > 25: * @test > 26: * @bug 8371864 Does it make sense to just run the unit test on architectures with `@requires vm.cpu.features ~= ".*avx512f.*" | vm.cpu.features ~= ".*avx2.*"` annotation? test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 27: > 25: * @test > 26: * @bug 8371864 > 27: * @run main/othervm/timeout=600 TestGCMSplitBound 60 was sufficient for my test runs. 
test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 59: > 57: private static final int TAG_SIZE_IN_BYTES = 16; > 58: > 59: private Cipher getCipher(final byte[] key, final byte[] aad, final byte[] nonce, int mode) nit: line > 80 characters test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 72: > 70: } > 71: > 72: private byte[] gcmEncrypt(final byte[] key, final byte[] plaintext, final byte[] aad) nit: > 80 characters test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 80: > 78: byte[] output = new byte[len]; > 79: System.arraycopy(nonce, 0, output, 0, IV_SIZE_IN_BYTES); > 80: cipher.doFinal(plaintext, 0, plaintext.length, output, IV_SIZE_IN_BYTES); nit: > 80 characters test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 89: > 87: System.arraycopy(ciphertext, 0, nonce, 0, IV_SIZE_IN_BYTES); > 88: Cipher cipher = getCipher(key, aad, nonce, Cipher.DECRYPT_MODE); > 89: return cipher.doFinal(ciphertext, IV_SIZE_IN_BYTES, ciphertext.length - IV_SIZE_IN_BYTES); nit: > 80 characters test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 118: > 116: encryptAndDecrypt(key, aad, message, PARALLEL_LEN); > 117: } > 118: for (int messageSize = SPLIT_LEN - 300; messageSize <= SPLIT_LEN + 300; messageSize++) { nit: > 80 characters test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 124: > 122: } catch (Exception e) { > 123: throw new RuntimeException( > 124: "Failed for messageSize " + Integer.toHexString(messageSize), e); nit: > 80 characters. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2563643535 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2563643699 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2563644171 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2563644443 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2563644686 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2563644886 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2563645061 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2563645227 From chagedorn at openjdk.org Wed Nov 26 07:27:47 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Nov 2025 07:27:47 GMT Subject: RFR: 8371146: C2 SuperWord: VTransform::add_speculative_check uses pre_init that is pinned after Auto_Vectorization_Check, leading to bad graph In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 09:56:02 GMT, Emanuel Peter wrote: > **Analysis** > > This is a regression of [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751) / https://github.com/openjdk/jdk/pull/24278. > > The aliasing runtime check happens before the pre-loop. The values needed for the aliasing runtime check thus need to be available not just at the pre-loop, but even earlier: already at the aliasing check. > > Sadly, so far we only ever checked for `is_pre_loop_invariant`, and not `is_available_for_speculative_check`. We now found an example with the fuzzer that has a `pre_init` value that is pinned after the aliasing runtime check but before the pre-loop. Thus it passed the checks, and then created a bad graph (cyclic path, think "use before definition"). > > This is a very rare case. Getting the ctrl pinned after the aliasing runtime check but before the pre-loop requires some very specific order of loop-opts, of unroll/pre-main-post/peeling etc.
But it can happen, and so we must handle it right. > > Solution: check for the stronger condition `is_available_for_speculative_check`. > > **Future Work** > > We should improve the debug printing when aliasing checks cannot be inserted. Currently the tag `SW_REJECTIONS` is a bit messy, we should fix that up. But it would be too many changes for this bug fix here. > > **Details** > > During `SuperWord`, we want to insert the aliasing runtime check above `687 ParsePredicate` which is annotated with `#Auto_Vectorization_Check`. For this, we require the `pre_init` value: `1244 AddI`. Sadly, this value is pinned lower down. > image > > So when we insert the aliasing runtime check, we create a bad (circular) graph: > image The fix looks good to me, too! I have only one question. src/hotspot/share/opto/vectorization.hpp line 281: > 279: // but the early ctrl is before the predicate. > 280: Node* n_early = phase()->compute_early_ctrl(n, n_ctrl); > 281: return phase()->is_dominator(n_early, check_ctrl); Here, ctrl is too far down but we have a valid early ctrl. Where do you update ctrl (i.e. `set_ctrl()`) of `n` to a valid ctrl when you use it for the speculative check? ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28449#pullrequestreview-3509349024 PR Review Comment: https://git.openjdk.org/jdk/pull/28449#discussion_r2563673238 From chagedorn at openjdk.org Wed Nov 26 07:58:23 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Nov 2025 07:58:23 GMT Subject: Integrated: 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 12:19:02 GMT, Christian Hagedorn wrote: > ### Strong Connection between Template Assertion Predicate and Counted Loop > In [JDK-8350579](https://bugs.openjdk.org/browse/JDK-8350579), we fixed the issue that a Template Assertion Predicate for a folded loop A could end up at another loop B. We then created an Initialized Assertion Predicate at loop B from the template of loop A and used the values from the already folded, completely unrelated loop A . As a result, we crashed with a halt because loop B violated the predicate with the wrong values. As a fix, we established a strong connection between Template Assertion Predicates and their associated loop node by adding a direct link from `OpaqueTemplateAssertionPredicate` -> `CountedLoop`. > > #### Maintaining this Property > In `PhaseIdealLoop::eliminate_useless_predicates()`, we walk through all counted loops and only keep those `OpaqueTemplateAssertionPredicate` nodes that can be found from the loop heads and are actually meant for this loop (using the strong connection): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1245-L1249 > > All other opaque nodes are removed. 
> > ### Additional Verification for Useless `OpaqueTemplateAssertionPredicate` Nodes > As an additional verification for `OpaqueTemplateAssertionPredicate` nodes that are found to be useless in `eliminate_useless_predicates()`, we check that in this case the `CountedLoop` is really dead (otherwise, we should have found the `OpaqueTemplateAssertionPredicate` in our walks through all loops): > https://github.com/openjdk/jdk/blob/d2926dfd9a242928877d0b1e40eac498073975bd/src/hotspot/share/opto/predicates.cpp#L1294-L1301 > > ### Violating the Additional Verification with `-XX:+StressDuplicateBackedge` > In `PhaseIdealLoop::duplicate_loop_backedge()`, we convert a loop with a merge point into two loops which should enable us to transform the new inner loop into a counted loop. This only makes sense for a `Loop` that is not yet a counted loop. However, to stress the transformation, we can also run with `-XX:+StressDuplicateBackedge`, which also transforms a counted loop into an inner and an outer loop.
This is a problem when we have Template Assertion Predicates above a counted loop to be stressed: > > image > > After duplicate backedge, the Template Assertion Predicates are now at the outer non-counted `Loop`: > > URL: https://git.openjdk.org/jdk/commit/275cb9f28799081878e0a7c53ce1c0450f4e963e Stats: 149 lines in 4 files changed: 145 ins; 0 del; 4 mod 8360510: C2: Template Assertion Predicates are not cloned to the inner counted loop with -XX:+StressDuplicateBackedge Reviewed-by: epeter, roland ------------- PR: https://git.openjdk.org/jdk/pull/28389 From epeter at openjdk.org Wed Nov 26 08:01:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Nov 2025 08:01:52 GMT Subject: RFR: 8371146: C2 SuperWord: VTransform::add_speculative_check uses pre_init that is pinned after Auto_Vectorization_Check, leading to bad graph In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 07:22:41 GMT, Christian Hagedorn wrote: >> **Analysis** >> >> This is a regression of [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751) / https://github.com/openjdk/jdk/pull/24278. >> >> The aliasing runtime check happens before the pre-loop. The values needed for the aliasing runtime check thus need to be available not just at the pre-loop, but even earlier: already at the aliasing check. >> >> Sadly, so far we only ever checked for `is_pre_loop_invariant`, and not `is_available_for_speculative_check`. We now found an example with the fuzzer that has a `pre_init` value that is pinned after the aliasing runtime check but before the pre-loop. Thus it passed the checks, and then created a bad graph (cyclic path, think "use before definition"). >> >> This is a very rare case. Getting the ctrl pinned after the aliasing runtime check but before the pre-loop requires some very specific order of loop-opts, of unroll/pre-main-post/peeling etc. But it can happen, and so we must handle it right. >> >> Solution: check for the stronger condition `is_available_for_speculative_check`.
>> >> **Future Work** >> >> We should improve the debug printing when aliasing checks cannot be inserted. Currently the tag `SW_REJECTIONS` is a bit messy, we should fix that up. But it would be too many changes for this bug fix here. >> >> **Details** >> >> During `SuperWord`, we want to insert the aliasing runtime check above `687 ParsePredicate` which is annotated with `#Auto_Vectorization_Check`. For this, we require the `pre_init` value: `1244 AddI`. Sadly, this value is pinned lower down. >> image >> >> So when we insert the aliasing runtime check, we create a bad (circular) graph: >> image > > src/hotspot/share/opto/vectorization.hpp line 281: > >> 279: // but the early ctrl is before the predicate. >> 280: Node* n_early = phase()->compute_early_ctrl(n, n_ctrl); >> 281: return phase()->is_dominator(n_early, check_ctrl); > > Here, ctrl is too far down but we have a valid early ctrl. Where do update ctrl (i.e. `set_ctrl()`) of `n` to a valid ctrl when you use it for the speculative check? I suppose I don't. Do you think I have to? What about all the inputs of `n`? Those may also have a ctrl that is too low now. So if you want consistency we would have to fix up everything up the chain... :/ But I think this issue is not limited to the new `is_available_for_speculative_check`, but already existed for the much older `is_pre_loop_invariant`, which also uses `compute_early_ctrl`. So the problem is a little bigger, if it is really a problem at all - it may well be a problem. Suggestion: - We leave the fix as is. It is at least a step in the right direction. - We enhance our loop-opts verification, and verify ctrl after SuperWord. Then we will discover that a lot of the ctrl is set inaccurately, and fix it. That way we can also have confidence that the fix is correct. If we did the fix now, without verification, this would just be a blind stab in the dark kind of fix ;) What do you think? 
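To make the difference between the two conditions concrete, here is a minimal Java sketch. It is a toy model and not HotSpot code: the class `EarlyCtrlSketch`, the block names, and the immediate-dominator map are all made up for illustration. It only mirrors the idea behind `is_dominator(n_early, check_ctrl)`: a value whose earliest ctrl sits between the check and the pre-loop looks fine when asked from the pre-loop, but is not usable at the check itself.

```java
import java.util.HashMap;
import java.util.Map;

public class EarlyCtrlSketch {
    // Toy CFG stored as an immediate-dominator tree:
    // each block maps to its immediate dominator, the root maps to null.
    private final Map<String, String> idom = new HashMap<>();

    void addBlock(String block, String immediateDominator) {
        idom.put(block, immediateDominator);
    }

    // True if 'a' dominates 'b'. Every block dominates itself.
    boolean isDominator(String a, String b) {
        for (String cur = b; cur != null; cur = idom.get(cur)) {
            if (cur.equals(a)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical straight-line layout: entry -> check -> after_check -> pre_loop.
    static EarlyCtrlSketch example() {
        EarlyCtrlSketch cfg = new EarlyCtrlSketch();
        cfg.addBlock("entry", null);
        cfg.addBlock("auto_vectorization_check", "entry");
        cfg.addBlock("after_check", "auto_vectorization_check");
        cfg.addBlock("pre_loop", "after_check");
        return cfg;
    }

    public static void main(String[] args) {
        EarlyCtrlSketch cfg = example();
        // A value whose early ctrl is "entry" can be used at the check.
        System.out.println(cfg.isDominator("entry", "auto_vectorization_check"));       // true
        // A value pinned at "after_check" is available at the pre-loop ...
        System.out.println(cfg.isDominator("after_check", "pre_loop"));                 // true
        // ... but not at the check, which is the stronger condition.
        System.out.println(cfg.isDominator("after_check", "auto_vectorization_check")); // false
    }
}
```

In this toy layout the second and third queries differ, which is the kind of case the weaker pre-loop check lets through.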
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28449#discussion_r2563811678 From epeter at openjdk.org Wed Nov 26 08:01:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Nov 2025 08:01:53 GMT Subject: RFR: 8371146: C2 SuperWord: VTransform::add_speculative_check uses pre_init that is pinned after Auto_Vectorization_Check, leading to bad graph In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 07:58:18 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectorization.hpp line 281: >> >>> 279: // but the early ctrl is before the predicate. >>> 280: Node* n_early = phase()->compute_early_ctrl(n, n_ctrl); >>> 281: return phase()->is_dominator(n_early, check_ctrl); >> >> Here, ctrl is too far down but we have a valid early ctrl. Where do update ctrl (i.e. `set_ctrl()`) of `n` to a valid ctrl when you use it for the speculative check? > > I suppose I don't. Do you think I have to? What about all the inputs of `n`? Those may also have a ctrl that is too low now. So if you want consistency we would have to fix up everything up the chain... :/ > > But I think this issue is not limited to the new `is_available_for_speculative_check`, but already existed for the much older `is_pre_loop_invariant`, which also uses `compute_early_ctrl`. So the problem is a little bigger, if it is really a problem at all - it may well be a problem. > > Suggestion: > - We leave the fix as is. It is at least a step in the right direction. > - We enhance our loop-opts verification, and verify ctrl after SuperWord. Then we will discover that a lot of the ctrl is set inaccurately, and fix it. That way we can also have confidence that the fix is correct. If we did the fix now, without verification, this would just be a blind stab in the dark kind of fix ;) > > What do you think? 
It would be continued work from this RFE: https://bugs.openjdk.org/browse/JDK-8307982 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28449#discussion_r2563814576 From dfenacci at openjdk.org Wed Nov 26 08:30:55 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 26 Nov 2025 08:30:55 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 16:51:39 GMT, Christian Hagedorn wrote: > [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: > > - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant". > - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. > - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: > https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 > I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. > > #### Testing > - [X] Tier1 > - [X] Tier5 with IR framework internal tests only > - [ ] Additionally testing `TestIRMatching.java` and `TestPhaseIRMatching.java` on arm, aarch64, ppc64le, ppc64be s390, and riscv64 (thanks to @mhaessig for taking care of that!) > > Thanks, > Christian Thanks for fixing these regexes and checks @chhagedorn. Looks good to me. ------------- Marked as reviewed by dfenacci (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/28495#pullrequestreview-3509654230 From shade at openjdk.org Wed Nov 26 08:35:13 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Nov 2025 08:35:13 GMT Subject: RFR: 8372266: Relax store matchers in compiler/escapeAnalysis/TestRematerializeObjects.java test In-Reply-To: References: Message-ID: <8yEkvkT-OLL1Z19vcobdiKc7A2zH7bJUVLP7u2kds8w=.d9bc0fb4-f89a-48e1-91f3-e68e30319482@github.com> On Thu, 20 Nov 2025 16:48:01 GMT, Aleksey Shipilev wrote: > As you can see in the report, current matchers rely heavily on mainline C2 implementation to match specific stores. This fails when we try to backport MergeStores fixes to 25u. It would be better to relax the matchers a bit to cater for 25u backports, and also making test more robust for future MergeStores changes, if any. > > Additional testing: > - [x] Linux x86_64 server fastdebug, mainline, affected test, 100x > - [x] Linux AArch64 server fastdebug, mainline, affected test, 100x > - [x] Linux x86_64 server fastdebug, jdk25u, affected test, 100x Found the missing backport in 25u that makes the test mismatch, see the comments in JIRA. So all this is actually moot. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28437#issuecomment-3580109287 From shade at openjdk.org Wed Nov 26 08:35:14 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Nov 2025 08:35:14 GMT Subject: Withdrawn: 8372266: Relax store matchers in compiler/escapeAnalysis/TestRematerializeObjects.java test In-Reply-To: References: Message-ID: <5oVcrxcjAOWCDWEHCq7NWMYHKPNNBlXwX7Qbx_yl4_s=.bb9a16d9-f505-4da7-b498-cf0445f213a2@github.com> On Thu, 20 Nov 2025 16:48:01 GMT, Aleksey Shipilev wrote: > As you can see in the report, current matchers rely heavily on mainline C2 implementation to match specific stores. This fails when we try to backport MergeStores fixes to 25u. 
It would be better to relax the matchers a bit to cater for 25u backports, and also making test more robust for future MergeStores changes, if any. > > Additional testing: > - [x] Linux x86_64 server fastdebug, mainline, affected test, 100x > - [x] Linux AArch64 server fastdebug, mainline, affected test, 100x > - [x] Linux x86_64 server fastdebug, jdk25u, affected test, 100x This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/28437 From duke at openjdk.org Wed Nov 26 08:36:09 2025 From: duke at openjdk.org (Zihao Lin) Date: Wed, 26 Nov 2025 08:36:09 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v8] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 13:20:26 GMT, Roland Westrelin wrote: >> Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: >> >> - fix assert >> - add more assert >> - rid of access.addr().type() >> - Merge branch 'openjdk:master' into 8344116 >> - Merge branch 'openjdk:master' into 8344116 >> - Merge branch 'openjdk:master' into 8344116 >> - Fix build >> - Fix test failed >> - 8344116: C2: remove slice parameter from LoadNode::make > > Can we remove `C2AccessValuePtr` entirely and use: > > Node* _addr; > > where, currently, there's: > > C2AccessValuePtr& _addr; > > ? Hi @rwestrel , I removed C2AccessValuePtr, Could you please take a look, thank you. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24258#issuecomment-3580115736 From duke at openjdk.org Wed Nov 26 08:36:12 2025 From: duke at openjdk.org (Zihao Lin) Date: Wed, 26 Nov 2025 08:36:12 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v8] In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 16:30:09 GMT, Zihao Lin wrote: >> src/hotspot/share/opto/callnode.cpp line 1740: >> >>> 1738: Node* klass_node = in(AllocateNode::KlassNode); >>> 1739: Node* proto_adr = phase->transform(new AddPNode(klass_node, klass_node, phase->MakeConX(in_bytes(Klass::prototype_header_offset())))); >>> 1740: mark_node = LoadNode::make(*phase, control, mem, proto_adr, TypeX_X, TypeX_X->basic_type(), MemNode::unordered); >> >> We could assert that C->get_alias_index(kit->type(card_adr)) == Compile::AliasIdxRaw > > Hi, I gave it a try, but it failed to pass the test. Is it possible the original version is wrong? > The mark word will not be `TypeRawPtr::BOTTOM`, it should be equal to the Klass slice index. One dump is ` 1368 AddP === _ 196 196 1367 [[ ]] Klass:precise java/util/LinkedHashMap$Entry: 0x0000000918349ca0 (java/util/Map$Entry):Constant:exact+168 *` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2563948581 From roland at openjdk.org Wed Nov 26 08:38:05 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 26 Nov 2025 08:38:05 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v3] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 08:30:18 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: >> >> - review >> - infinite loop in gvn fix >> - renaming > > @rwestrel Sorry I dropped the review on this one for a long time :/ > > I left quite a few comments. But on the whole I'm really happy with the direction you are taking. 
It's getting much clearer. I would still see some more clear explanations/comments. That way, we can make our previously implicit assumptions even more explicit :) @eme64 updated change should address your comments ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3580124357 From mdoerr at openjdk.org Wed Nov 26 09:25:51 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Nov 2025 09:25:51 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v6] In-Reply-To: <_BiA3wQ_PuxbuapWJg0uG2PSv0_0AAPOmznFOTH4hcU=.08997b37-2cde-417f-891a-779bd7291b1f@github.com> References: <_BiA3wQ_PuxbuapWJg0uG2PSv0_0AAPOmznFOTH4hcU=.08997b37-2cde-417f-891a-779bd7291b1f@github.com> Message-ID: On Tue, 25 Nov 2025 09:25:25 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Fix missing whitespace. Thanks a lot for benchmarking and for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28299#issuecomment-3580390542 From duke at openjdk.org Wed Nov 26 09:28:41 2025 From: duke at openjdk.org (Zihao Lin) Date: Wed, 26 Nov 2025 09:28:41 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v8] In-Reply-To: References: Message-ID: > If nodes both are constant, support constant folding. Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: - Merge branch 'master' into JDK-8370196 - Fix - Fix - Apply suggestion from @eme64 Co-authored-by: Emanuel Peter - Add Math to Operations.java - Add tests - Merge branch 'master' into JDK-8370196 - test - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Andrew Haley - C2: Improve (U)MulHiLNode::MulHiValue ------------- Changes: https://git.openjdk.org/jdk/pull/28097/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=07 Stats: 399 lines in 8 files changed: 363 ins; 17 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/28097.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28097/head:pull/28097 PR: https://git.openjdk.org/jdk/pull/28097 From snatarajan at openjdk.org Wed Nov 26 09:30:18 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 26 Nov 2025 09:30:18 GMT Subject: Integrated: 8349835: C2: Simplify IGV property printing In-Reply-To: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com> Message-ID: On Fri, 22 Aug 2025 13:28:22 GMT, Saranya Natarajan wrote: > The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708). > > ### Fix > Implemented the suggested refactoring. > > ### Testing > Github Actions, Tier 1-3 This pull request has now been integrated. 
Changeset: 5fe731d5 Author: Saranya Natarajan URL: https://git.openjdk.org/jdk/commit/5fe731d55a54ace42de4a15d612dba310de9d977 Stats: 207 lines in 2 files changed: 91 ins; 109 del; 7 mod 8349835: C2: Simplify IGV property printing Reviewed-by: rcastanedalo, dfenacci, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/26902 From chagedorn at openjdk.org Wed Nov 26 10:24:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Nov 2025 10:24:01 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 16:51:39 GMT, Christian Hagedorn wrote: > [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: > > - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant". > - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. > - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: > https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 > I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. > > #### Testing > - [X] Tier1 > - [X] Tier5 with IR framework internal tests only > - [ ] Additionally testing `TestIRMatching.java` and `TestPhaseIRMatching.java` on arm, aarch64, ppc64le, ppc64be s390, and riscv64 (thanks to @mhaessig for taking care of that!) > > Thanks, > Christian Thanks Damon for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28495#issuecomment-3580627883 From shade at openjdk.org Wed Nov 26 10:37:03 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Nov 2025 10:37:03 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v3] In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 19:48:21 GMT, Vladimir Kozlov wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls >> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls >> - Drop atomic counters >> - Initial version > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4807: > >> 4805: >> 4806: Register offset = rscratch1; >> 4807: assert_different_registers(mdp, recv, offset); > > We also have `rscratch2` which we can use for registers shuffling in the following code. Unfortunately not. I remember trying that and that did not work. 
I just added `rscratch2` here, and it immediately failed on this path: Stack: [0x000078f046f00000,0x000078f047000000], sp=0x000078f046ffd350, free space=1012k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x9697f9] void assert_different_registers_impl(char const*, int, Register, Register, Register, Register)+0xc9 (macroAssembler_x86.cpp:4799) V [libjvm.so+0x1676bbd] MacroAssembler::type_profile(Register, Register, int)+0xad (macroAssembler_x86.cpp:4799) V [libjvm.so+0x962da7] LIR_Assembler::emit_opTypeCheck(LIR_OpTypeCheck*)+0x757 (c1_LIRAssembler_x86.cpp:1266) V [libjvm.so+0x941a8c] LIR_OpTypeCheck::emit_code(LIR_Assembler*)+0x1c (c1_LIR.cpp:1023) V [libjvm.so+0x95157e] LIR_Assembler::emit_lir_list(LIR_List*)+0xde (c1_LIRAssembler.cpp:301) V [libjvm.so+0x951e66] LIR_Assembler::emit_code(BlockList*)+0xf6 (c1_LIRAssembler.cpp:266) V [libjvm.so+0x8faef9] Compilation::emit_code_body()+0x189 (c1_Compilation.cpp:348) V [libjvm.so+0x8fb41d] Compilation::compile_java_method()+0x3bd (c1_Compilation.cpp:409) V [libjvm.so+0x8fbbee] Compilation::compile_method()+0x25e (c1_Compilation.cpp:471) V [libjvm.so+0x8fc31f] Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, bool, DirectiveSet*)+0x35f (c1_Compilation.cpp:600) V [libjvm.so+0x8fdd7a] Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x20a (c1_Compiler.cpp:263) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2564452657 From dfenacci at openjdk.org Wed Nov 26 10:38:03 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 26 Nov 2025 10:38:03 GMT Subject: RFR: 8371419: IGV: Add view to visualise dominator tree and dominator information In-Reply-To: References: <6Vojoez34k5CHSLTQ-sSxRERAHEraT-OV9epmtS1s2E=.462bea29-750b-455e-a20e-4a223a601374@github.com> Message-ID: <7xoQR5DqhdPLxaEdzbqxBxSgGGX4dTQvADPBPfdPSpo=.a3c2ad6d-452e-463e-ad6c-6db8a42f3790@github.com> On Mon, 17 Nov 2025 
14:51:07 GMT, Daniel Lundén wrote: > I think that the dominator tree view should be a separate "view" (just right of the CFG view button) instead of a "mode" as you suggest. I think it might make sense too @dlunde. First I used the "mode" of the CFG view as they had a few "commonalities" (they are relevant in the same phases, they both have blocks and edges between blocks, etc.) but they are in fact 2 different graphs. > do not forget to extend the combo box in the Options window with the option to select the dominator tree view by default. I'm not sure I understand what you mean @robcasloz: if there is a separate view for the dominator tree, there is no need for a dominator tree option. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28293#issuecomment-3580681874 From shade at openjdk.org Wed Nov 26 10:41:00 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Nov 2025 10:41:00 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 20:47:07 GMT, John R Rose wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls >> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls >> - Drop atomic counters >> - Initial version > > src/hotspot/cpu/x86/interp_masm_x86.cpp line 524: > >> 522: LP64_ONLY(assert(Rsub_klass != r13, "r13 holds bcp");) >> 523: assert(Rsub_klass != rcx, "rcx holds 2ndary super array length"); >> 524: assert(Rsub_klass != rdi, "rdi holds 2ndary super array scan ptr"); > > I think you can kill this assert as well; rdi is no longer relevant to this function. Right. Killed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2564466483 From shade at openjdk.org Wed Nov 26 10:41:03 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Nov 2025 10:41:03 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v3] In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 19:25:08 GMT, Vladimir Kozlov wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls >> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls >> - Drop atomic counters >> - Initial version > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4818: > >> 4816: addptr(offset, receiver_step); >> 4817: cmpptr(offset, end_receiver_offset); >> 4818: jccb(Assembler::notEqual, L_loop); > > Fix indention since these instructions also in the loop. I prefer to keep these at this indentation level: this is loop infrastructure. Pretty much like I would write the post-condition: do { ... } while ((offset += receiver_step) != end_receiver_offset); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2564461790 From roland at openjdk.org Wed Nov 26 10:54:46 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 26 Nov 2025 10:54:46 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: <0df3H15uO96P1n3zLpKl5y_RKrAgc1h_V91bGB5mCr8=.06942d05-f66d-442f-a754-8135ac0eec30@github.com> Message-ID: On Tue, 25 Nov 2025 17:46:28 GMT, Quan Anh Mai wrote: >> Is this issue at all related to https://github.com/openjdk/jdk/pull/24575? >> >> It seems we remove a `CastLL` from the graph, because the input type is wider than the Cast's type, right? 
>> >> If I remember correctly from https://github.com/openjdk/jdk/pull/24575, if a CastLL is narrowing, we don't want to remove it, see `ConstraintCastNode::Identity`. >> >> Can you elaborate a bit more on where the `CastLL` came from, and what it is supposed to do? > > @eme64 Yes, it is indeed similar. The issue here is that after loop opts, we try to remove almost all `CastNode`s so that the graph can be GVN-ed better (think of `x = a + b` and `y = cast(a) + b`). > >> Can you elaborate a bit more on where the `CastLL` came from, and what it is supposed to do? > > Macro expansion tries to be smart for an array copy and does this: > > byte[] dst; > byte[] src; > int len; > if (len <= 32) { > int casted_len = cast(len, 0, 32); > vectormask mask = VectorMaskGen(casted_len); > vector v = LoadVectorMasked(src, 0, mask); > StoreVectorMasked(dst, 0, v, mask); > } else { > // do the copy normally; > } > > As you can see, the masked accesses are only meaningful if `len <= 32`. But after loop opts, the cast is gone, leaving us with a len which happens to be larger than `32`. The path should be dead, but IGVN reaches the `LoadVectorMaskedNode` first, which triggers the assert. > @merykitty Thanks for the explanations! So the `CastLL` is a narrowing cast, right? And `ConstraintCastNode::Identity` removes it, because the input type is wider, right? To me this part sounds incorrect. Narrowing casts should only be removed if the input is already narrower. No? But the type of the CastLL is widened after loop opts, right? So it's similar to https://github.com/openjdk/jdk/pull/24575 but with a constant input to the cast. That's a case that #24575 doesn't address (it doesn't prevent constant folding of a cast) and can cause issues. See https://github.com/openjdk/jdk/pull/24575#issuecomment-3356091219 I intend to create a follow up to 24575 that will address the remaining issues in a way that's similar to what @merykitty proposes here. 
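The constant-input case can be illustrated with a small Java toy model. This is only a sketch under stated assumptions, not C2 code: `CastFoldSketch`, `Range`, and `foldCast` are hypothetical names, and an empty `OptionalLong` stands in for the "top" (dead path) result. It shows how the same constant input folds to top while the cast type is narrow, but folds to the constant once the cast type has been widened.

```java
import java.util.OptionalLong;

public class CastFoldSketch {
    // Closed integer range [lo, hi], standing in for the type carried by a cast.
    static final class Range {
        final long lo;
        final long hi;

        Range(long lo, long hi) {
            this.lo = lo;
            this.hi = hi;
        }

        boolean contains(long v) {
            return lo <= v && v <= hi;
        }
    }

    // Folds a cast with a constant input: the constant if it lies inside the
    // cast's range, otherwise empty (modelling top, i.e. an unreachable path).
    static OptionalLong foldCast(long constantInput, Range castType) {
        return castType.contains(constantInput)
                ? OptionalLong.of(constantInput)
                : OptionalLong.empty();
    }

    public static void main(String[] args) {
        long len = 40; // a length that cannot occur on the "len <= 32" path
        Range narrow = new Range(0, 32);
        // Assumption for illustration: widening drops the upper bound entirely.
        Range widened = new Range(0, Long.MAX_VALUE);

        System.out.println(foldCast(len, narrow).isPresent());  // false: path is dead
        System.out.println(foldCast(len, widened).isPresent()); // true: folded to 40
    }
}
```

In the second case a consumer of the cast would see a length of 40 on a path that was only meant to handle lengths up to 32.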
------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3580754379 From epeter at openjdk.org Wed Nov 26 11:18:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Nov 2025 11:18:50 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: <0df3H15uO96P1n3zLpKl5y_RKrAgc1h_V91bGB5mCr8=.06942d05-f66d-442f-a754-8135ac0eec30@github.com> Message-ID: On Wed, 26 Nov 2025 10:52:14 GMT, Roland Westrelin wrote: >> @eme64 Yes, it is indeed similar. The issue here is that after loop opts, we try to remove almost all `CastNode`s so that the graph can be GVN-ed better (think of `x = a + b` and `y = cast(a) + b`). >> >>> Can you elaborate a bit more on where the `CastLL` came from, and what it is supposed to do? >> >> Macro expansion tries to be smart for an array copy and does this: >> >> byte[] dst; >> byte[] src; >> int len; >> if (len <= 32) { >> int casted_len = cast(len, 0, 32); >> vectormask mask = VectorMaskGen(casted_len); >> vector v = LoadVectorMasked(src, 0, mask); >> StoreVectorMasked(dst, 0, v, mask); >> } else { >> // do the copy normally; >> } >> >> As you can see, the masked accesses are only meaningful if `len <= 32`. But after loop opts, the cast is gone, leaving us with a len which happens to be larger than `32`. The path should be dead, but IGVN reaches the `LoadVectorMaskedNode` first, which triggers the assert. > >> @merykitty Thanks for the explanations! So the `CastLL` is a narrowing cast, right? And `ConstraintCastNode::Identity` removes it, because the input type is wider, right? To me this part sounds incorrect. Narrowing casts should only be removed if the input is already narrower. No? > > But the type of the CastLL is widened after loop opts, right? > So it's similar to https://github.com/openjdk/jdk/pull/24575 but with a constant input to the cast. That's a case that #24575 doesn't address (it doesn't prevent constant folding of a cast) and can cause issues. 
See https://github.com/openjdk/jdk/pull/24575#issuecomment-3356091219 > I intend to create a follow up to 24575 that will address the remaining issues in a way that's similar to what @merykitty proposes here. @rwestrel Ok, thanks for the clarifying details. That makes sense. I missed the widening after loop-opts: before the constant input lay outside the range, now it is inside and so the `CastLL` is folded, replaced with the (wrong) constant rather than top. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3580831773 From epeter at openjdk.org Wed Nov 26 11:22:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Nov 2025 11:22:54 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: <0df3H15uO96P1n3zLpKl5y_RKrAgc1h_V91bGB5mCr8=.06942d05-f66d-442f-a754-8135ac0eec30@github.com> Message-ID: On Wed, 26 Nov 2025 10:52:14 GMT, Roland Westrelin wrote: >> @eme64 Yes, it is indeed similar. The issue here is that after loop opts, we try to remove almost all `CastNode`s so that the graph can be GVN-ed better (think of `x = a + b` and `y = cast(a) + b`). >> >>> Can you elaborate a bit more on where the `CastLL` came from, and what it is supposed to do? >> >> Macro expansion tries to be smart for an array copy and does this: >> >> byte[] dst; >> byte[] src; >> int len; >> if (len <= 32) { >> int casted_len = cast(len, 0, 32); >> vectormask mask = VectorMaskGen(casted_len); >> vector v = LoadVectorMasked(src, 0, mask); >> StoreVectorMasked(dst, 0, v, mask); >> } else { >> // do the copy normally; >> } >> >> As you can see, the masked accesses are only meaningful if `len <= 32`. But after loop opts, the cast is gone, leaving us with a len which happens to be larger than `32`. The path should be dead, but IGVN reaches the `LoadVectorMaskedNode` first, which triggers the assert. > >> @merykitty Thanks for the explanations! So the `CastLL` is a narrowing cast, right? 
And `ConstraintCastNode::Identity` removes it, because the input type is wider, right? To me this part sounds incorrect. Narrowing casts should only be removed if the input is already narrower. No? > > But the type of the CastLL is widened after loop opts, right? > So it's similar to https://github.com/openjdk/jdk/pull/24575 but with a constant input to the cast. That's a case that #24575 doesn't address (it doesn't prevent constant folding of a cast) and can cause issues. See https://github.com/openjdk/jdk/pull/24575#issuecomment-3356091219 > I intend to create a follow up to 24575 that will address the remaining issues in a way that's similar to what @merykitty proposes here. @rwestrel Is there any conflict with your solution? If not, we can go ahead with @merykitty 's solution here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3580846973 From snatarajan at openjdk.org Wed Nov 26 11:24:19 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 26 Nov 2025 11:24:19 GMT Subject: RFR: 8370489: Some compiler tests miss the @key randomness [v2] In-Reply-To: References: Message-ID: > **Issue:** Some compiler tests uses randomization but does not have `@key randomness` in the jtreg header. > > **Fix:** The list of test cases that did not have `@key randomness` were listed using `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"`. This PR adds `@key randomness` to these tests. > > **Note:** The following tests that are still listed with `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"` after this PR are confirmed to be helper or support file for actual test. 
> _test/hotspot/jtreg/compiler/codegen/aes/TestAESBase.java > test/hotspot/jtreg/compiler/compilercontrol/jcmd/StressAddJcmdBase.java > test/hotspot/jtreg/compiler/compilercontrol/parser/HugeDirectiveUtil.java > test/hotspot/jtreg/compiler/compilercontrol/share/scenario/CommandGenerator.java > test/hotspot/jtreg/compiler/lib/ir_framework/test/TestVM.java > test/hotspot/jtreg/compiler/lib/ir_framework/test/ArgumentValue.java > test/hotspot/jtreg/compiler/lib/ir_framework/AbstractInfo.java > test/hotspot/jtreg/compiler/lib/ir_framework/CompLevel.java > test/hotspot/jtreg/compiler/lib/generators/Generators.java > test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java > test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java > test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java > test/hotspot/jtreg/compiler/intrinsics/mathexact/Verify.java > test/hotspot/jtreg/compiler/intrinsics/bmi/BMITestRunner.java > test/hotspot/jtreg/compiler/intrinsics/unsafe/ByteBufferTest.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressBooleanArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressIntArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressLongArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressCharArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressObjectArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressByteArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressFloatArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressShortArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressDoubleArrayCopy.java > test/hotspot/jtreg/compiler/codecache/cli/codeheapsize/JVMStartupRunner.java > test/hotspot/jtreg/compiler/vectorapi/reshape/utils/VectorReshapeHelper.java > test/hotspot/jtreg/compiler/jvmci/compilerToVM/DummyClass.java_ Saranya Natarajan has updated the pull request incrementally with one additional 
commit since the last revision: addressing review comments - removing space and javadoc style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28463/files - new: https://git.openjdk.org/jdk/pull/28463/files/fe6403d0..1e38555a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28463&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28463&range=00-01 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28463.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28463/head:pull/28463 PR: https://git.openjdk.org/jdk/pull/28463 From snatarajan at openjdk.org Wed Nov 26 11:24:21 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 26 Nov 2025 11:24:21 GMT Subject: RFR: 8370489: Some compiler tests miss the @key randomness [v2] In-Reply-To: References: Message-ID: <5r88siWBHsqHY-Ey7e0W4ZMrKdswjcLNZgJicegxrvo=.16a16b47-4f7d-4cc7-baed-69662f5d1204@github.com> On Mon, 24 Nov 2025 07:32:12 GMT, Damon Fenacci wrote: >> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: >> >> addressing review comments - removing space and javadoc style > > test/hotspot/jtreg/compiler/vectorapi/Test8278948.java line 33: > >> 31: import jdk.test.lib.Utils; >> 32: >> 33: /** > > Do we need javadoc style comments for JTreg? (we don't seem to be too consistent in our tests) That is true. There is no need for javadoc style comments (see [The JDK Test Framework: Tag Language Specification](https://openjdk.org/jtreg/tag-spec.html)). I have fixed this file. You are also right about we not being consistent with the tests. I have left the rest of the test (in this changeset) with javadoc style as it is. Do let me know if I should change them ? 
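For readers unfamiliar with the tag under discussion, here is a minimal sketch of what a jtreg header with `@key randomness` can look like. The class name `RandomnessKeyExample` and its methods are hypothetical and not part of the PR; the tests touched by the PR obtain their generator via `getRandomInstance` from the JDK test library, whereas this sketch uses plain `java.util.Random` so that it stays self-contained. The `seed` property name is likewise an assumption of the sketch.

```java
/*
 * @test
 * @key randomness
 * @summary Hypothetical example of a jtreg test that draws random inputs.
 * @run main RandomnessKeyExample
 */

import java.util.Random;

public class RandomnessKeyExample {
    // Pick a seed: honor -Dseed=<n> if given so that a failing run can be replayed,
    // otherwise draw a fresh one. The property name is an assumption of this sketch.
    static long chooseSeed() {
        Long fixed = Long.getLong("seed");
        return (fixed != null) ? fixed : new Random().nextLong();
    }

    // Sum n random ints in [0, 100) drawn from a generator with the given seed.
    static int sumOfRandoms(long seed, int n) {
        Random r = new Random(seed);
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += r.nextInt(100);
        }
        return sum;
    }

    public static void main(String[] args) {
        long seed = chooseSeed();
        System.out.println("seed = " + seed); // printed so the run can be reproduced
        int sum = sumOfRandoms(seed, 10);
        if (sum < 0 || sum > 990) {
            throw new AssertionError("sum out of range: " + sum);
        }
    }
}
```

The header here uses a plain block comment rather than a javadoc-style one, matching the change made to `Test8278948.java` in this revision.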
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28463#discussion_r2564592720 From epeter at openjdk.org Wed Nov 26 11:29:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Nov 2025 11:29:48 GMT Subject: RFR: 8370489: Some compiler tests miss the @key randomness [v2] In-Reply-To: <5r88siWBHsqHY-Ey7e0W4ZMrKdswjcLNZgJicegxrvo=.16a16b47-4f7d-4cc7-baed-69662f5d1204@github.com> References: <5r88siWBHsqHY-Ey7e0W4ZMrKdswjcLNZgJicegxrvo=.16a16b47-4f7d-4cc7-baed-69662f5d1204@github.com> Message-ID: On Wed, 26 Nov 2025 11:20:40 GMT, Saranya Natarajan wrote: >> test/hotspot/jtreg/compiler/vectorapi/Test8278948.java line 33: >> >>> 31: import jdk.test.lib.Utils; >>> 32: >>> 33: /** >> >> Do we need javadoc style comments for JTreg? (we don't seem to be too consistent in our tests) > > That is true. There is no need for javadoc style comments (see [The JDK Test Framework: Tag Language Specification](https://openjdk.org/jtreg/tag-spec.html)). I have fixed this file. > > You are also right about we not being consistent with the tests. I have left the rest of the test (in this changeset) with javadoc style as it is. Do let me know if I should change them ? Personally, I would not worry too much about comment style. Well actually: are we sure that both get executed? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28463#discussion_r2564612087 From chagedorn at openjdk.org Wed Nov 26 11:33:49 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Nov 2025 11:33:49 GMT Subject: RFR: 8371146: C2 SuperWord: VTransform::add_speculative_check uses pre_init that is pinned after Auto_Vectorization_Check, leading to bad graph In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 07:59:06 GMT, Emanuel Peter wrote: >> I suppose I don't. Do you think I have to? What about all the inputs of `n`? Those may also have a ctrl that is too low now. So if you want consistency we would have to fix up everything up the chain... 
:/ >> >> But I think this issue is not limited to the new `is_available_for_speculative_check`, but already existed for the much older `is_pre_loop_invariant`, which also uses `compute_early_ctrl`. So the problem is a little bigger, if it is really a problem at all - it may well be a problem. >> >> Suggestion: >> - We leave the fix as is. It is at least a step in the right direction. >> - We enhance our loop-opts verification, and verify ctrl after SuperWord. Then we will discover that a lot of the ctrl is set inaccurately, and fix it. That way we can also have confidence that the fix is correct. If we did the fix now, without verification, this would just be a blind stab in the dark kind of fix ;) >> >> What do you think? > > It would be continued work from this RFE: https://bugs.openjdk.org/browse/JDK-8307982 > I suppose I don't. Do you think I have to? What about all the inputs of n? Those may also have a ctrl that is too low now. So if you want consistency we would have to fix up everything up the chain... :/ That's the big question. It's probably hard to say if it's necessary or not but also hard to say if someone is actually relying on it. I would say: If ctrl is not optimal but legal, it does not matter that much. If it's illegal, we should probably fix it to avoid such wrong uses further down the line. > But I think this issue is not limited to the new is_available_for_speculative_check, but already existed for the much older is_pre_loop_invariant, which also uses compute_early_ctrl. So the problem is a little bigger, if it is really a problem at all - it may well be a problem. Right, that's my feeling as well that more places are off. As long as they are just not optimal but legal, we probably do not need to worry about them. > * We enhance our loop-opts verification, and verify ctrl after SuperWord. Then we will discover that a lot of the ctrl is set inaccurately, and fix it. That's a good idea for the future. 
Not sure how easy it will be to keep it as accurate as possible. > That way we can also have confidence that the fix is correct. If we did the fix now, without verification, this would just be a blind stab in the dark kind of fix ;) I'm concerned here that someone relies on the illegal ctrl later even though chances are probably low. But what do you think about just setting ctrl to the just computed early ctrl (if we are sure we are going to create the speculative check)? That would be an easy fix (might not be optimal but at least legal). But I agree that we should probably tackle this ctrl verification in general at some point. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28449#discussion_r2564621610 From epeter at openjdk.org Wed Nov 26 11:33:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Nov 2025 11:33:49 GMT Subject: RFR: 8367487: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on Linux aarch64 with Cavium CPU In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 14:24:19 GMT, David Briemann wrote: > Fix by requiring both floating-point half-precision (fphp) and advanced SIMD half-precision (asimdhp) on aarch64. Tests pass, changes look good. @dbriemann thanks for fixing this! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28390#pullrequestreview-3510476286 From jbhateja at openjdk.org Wed Nov 26 11:34:11 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 26 Nov 2025 11:34:11 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v5] In-Reply-To: References: Message-ID: > Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support.
> - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support. > - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. > > The idea here is to first be at par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). > > The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > [image: performance numbers] > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback. > > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28002/files - new: https://git.openjdk.org/jdk/pull/28002/files/aca6cc5d..756a0d0c Webrevs: - full: Webrev is not available because diff is too large - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28002&range=03-04 Stats: 26 lines in 9 files changed: 5 ins; 7 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From epeter at openjdk.org Wed Nov 26 11:42:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Nov 2025 11:42:46 GMT Subject: RFR: 8371146: C2 SuperWord: VTransform::add_speculative_check uses pre_init that is pinned after Auto_Vectorization_Check, leading to bad graph In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 11:30:13 GMT, Christian Hagedorn wrote: >> It would be continued work from this RFE: https://bugs.openjdk.org/browse/JDK-8307982 > >> I suppose I don't. Do you think I have to? What about all the inputs of n? 
Those may also have a ctrl that is too low now. So if you want consistency we would have to fix up everything up the chain... :/ > > That's the big question. It's probably hard to say if it's necessary or not but also hard to say if someone is actually relying on it. I would say: If ctrl is not optimal but legal, it does not matter that much. If it's illegal, we should probably fix it to avoid such wrong uses further down the line. > >> But I think this issue is not limited to the new is_available_for_speculative_check, but already existed for the much older is_pre_loop_invariant, which also uses compute_early_ctrl. So the problem is a little bigger, if it is really a problem at all - it may well be a problem. > > Right, that's my feeling as well that more places are off. As long as they are just not optimal but legal, we probably do not need to worry about them. > >> * We enhance our loop-opts verification, and verify ctrl after SuperWord. Then we will discover that a lot of the ctrl is set inaccurately, and fix it. > > That's a good idea for the future. Not sure how easy it will be to keep it as accurate as possible. > >> That way we can also have confidence that the fix is correct. If we did the fix now, without verification, this would just be a blind stab in the dark kind of fix ;) > > I'm concerned here that someone relies on the illegal ctrl later even though chances are probably low. But what do you think about just setting ctrl to the just computed early ctrl (if we are sure we are going to create the speculative check)? That would be an easy fix (might not be optimal but at least legal). But I agree that we should probably tackle this ctrl verification in general at some point. Right. The ctrl can either be legal (between the early and latest point), or outside it (illegal). 
I fear that we do have some illegal ctrl now: Imagine we use some node `n1(n2(n3(...)))`, where all of `n1-3` have ctrl between pre-loop and predicate, but their early is before the predicate. If we now use `n1` at the predicate and update its ctrl to early, then we would actually have to update the ctrl of `n2` and `n3` between early and predicate too, right? The whole expression would have to be moved, otherwise we have inputs that have a later ctrl than their outputs, and that is not right. The same is true for `is_pre_loop_invariant` with ctrl between inside pre-loop and before pre-loop. I'm very happy to even file a bug for this. But I'd prefer not to have to do it in this same issue here. Is that ok? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28449#discussion_r2564651706 From rcastanedalo at openjdk.org Wed Nov 26 11:46:50 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 26 Nov 2025 11:46:50 GMT Subject: RFR: 8371419: IGV: Add view to visualise dominator tree and dominator information In-Reply-To: References: <6Vojoez34k5CHSLTQ-sSxRERAHEraT-OV9epmtS1s2E=.462bea29-750b-455e-a20e-4a223a601374@github.com> Message-ID: On Wed, 19 Nov 2025 09:58:59 GMT, Roberto Castañeda Lozano wrote: >> This change introduces a dominator tree view in IGV's CFG panel, enabling users to toggle between the control flow graph and the dominator tree. This makes dominator relationships easier to inspect than the current stdout-based output (`-XX:+PrintDominators`). >> >> ## Motivation >> * Today, dominator information is difficult to access (e.g. via `-XX:+PrintDominators`, which is hard to read and correlate with the graph). >> * IGV already computes dominators for some phases but does not visualize them. >> * Comparing dominator trees across graphs/phases was not supported. >> >> ## What's New >> 1. 
Toggle in the CFG view (toolbar button) to switch between: >> * Control Flow Graph (CFG) >> * Dominator Tree >> 2. Dominator edge coloring to indicate provenance: >> * Blue: dominator info provided by C2 (from GCM phase onward for now, a follow-up RFE will handle loop optimization dominator information) >> * Red: dominator info computed by IGV (pre-GCM) >> 3. Graph comparison enhancements: >> * Compare dominator trees between graphs (new) >> * Compare CFG differences between graphs (previously missing) >> 4. Node annotations: >> * `idom`: immediate dominator >> * `dom_depth`: dominator depth >> * `block`: numeric block ID for all nodes in a block >> >> The resulting main view looks like this: >> [screenshot] >> >> ## Testing >> * Tier 1-3 >> * Manual testing in IGV > > Thank you for this work Damon, this looks very useful! I have a few high-level comments: > > - I agree with @dlunde's comment, as a user I think the dominator tree should be a separate view and not a "mode" of the CFG view. If you do that, please do not forget to extend the combo box in the Options window with the option to select the dominator tree view by default. > - Would it be possible to avoid dumping dominator information as node properties, to reduce the size of the graph dump? The block property information that you already dump should be enough for your purposes, no? If you want to show dominator information as node properties, you can instead propagate the information from blocks to their nodes in `ServerCompilerPreProcessor::preProcess()`, similarly to how it is done for liveness information. > - I like the idea of distinguishing visually when control-flow information originates from HotSpot and when it is approximated by IGV, currently we just rely on the user implicitly knowing this, which is confusing and error-prone. 
However, there is an issue with your proposal: once the graph is saved into a file (from IGV) the information is lost, and when the graph is re-opened all dominator trees are shown as originating from HotSpot (blue edges). If we want to do this, I think we need to explicitly reflect in the serialized XML format whether control-flow information is approximated or not. Further, the representation of HotSpot/IGV origin should be consistent between the CFG and dominator tree views. In short, I think this is a great and much-needed IGV feature, but one that would require substantial work to get right, so my suggestion would be leaving it out of the scope of this RFE and creating a separate RFE just for it. What do you think? > I'm not sure I understand what you mean @robcasloz: if there is a separate view for the dominator tree, there is no need for a dominator tree option. I'm referring to the option to select a default view (`Default View`) when a graph is opened in the `Options` window, see screenshot below. The list of options will have to be extended so that the new dominator tree view can be chosen as well (unlikely choice perhaps, but it's good to list all of the views for completeness). [screenshot: Options window] ------------- PR Comment: https://git.openjdk.org/jdk/pull/28293#issuecomment-3580934230 From dfenacci at openjdk.org Wed Nov 26 11:46:51 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 26 Nov 2025 11:46:51 GMT Subject: RFR: 8371419: IGV: Add view to visualise dominator tree and dominator information In-Reply-To: References: <6Vojoez34k5CHSLTQ-sSxRERAHEraT-OV9epmtS1s2E=.462bea29-750b-455e-a20e-4a223a601374@github.com> Message-ID: <0BPtgjs-kKjBWNbRzq97v3dUQeRfdi63ucdmh9GirxQ=.1a37fafa-6d79-45f5-b96f-72280eb112ee@github.com> On Wed, 19 Nov 2025 09:58:59 GMT, Roberto Castañeda Lozano wrote: > Would it be possible to avoid dumping dominator information as node properties, to reduce the size of the graph dump? 
The idea I had in mind was to dump all the information we compute in C2 about dominators/blocks, and the information in nodes and blocks comes from different sources (and I didn't think 2 small fields per node would make a lot of difference). When the dominator information comes from C2, the block information in the node properties comes from [`_node_to_block_mapping`](https://github.com/openjdk/jdk/blob/2347e9a4e14eb14700415c58130885a7d06522d5/src/hotspot/share/opto/block.hpp#L403) whereas the block one comes directly [from the blocks](https://github.com/openjdk/jdk/blob/2347e9a4e14eb14700415c58130885a7d06522d5/src/hotspot/share/opto/block.hpp#L118) (and the two might not be in sync yet while computing GCM, as I found out: [screenshot]). > I like the idea of distinguishing visually when control-flow information originates from HotSpot and when it is approximated by IGV, ... once the graph is saved into a file (from IGV) the information is lost Oops, I didn't really think about this. > I think this is a great and much-needed IGV feature, but one that would require substantial work to get right, so my suggestion would be leaving it out of the scope of this RFE and creating a separate RFE just for it. What do you think? Yes, it is probably better to leave it for now. I'll file a new RFE. Thanks @robcasloz. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28293#issuecomment-3580934752 From chagedorn at openjdk.org Wed Nov 26 11:53:19 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Nov 2025 11:53:19 GMT Subject: RFR: 8372585: TestVerifyLoopOptimizationsHitsMemLimit fails with product builds Message-ID: Updating `TestVerifyLoopOptimizationsHitsMemLimit.java` in [JDK-8360510](https://bugs.openjdk.org/browse/JDK-8360510) missed to add `-XX:+IgnoreUnrecognizedVMOptions` which now leads to a test failure with product builds. 
This is fixed with this patch. Thanks, Christian ------------- Commit messages: - 8372585: TestVerifyLoopOptimizationsHitsMemLimit fails with product builds Changes: https://git.openjdk.org/jdk/pull/28504/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28504&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372585 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28504/head:pull/28504 PR: https://git.openjdk.org/jdk/pull/28504 From rcastanedalo at openjdk.org Wed Nov 26 11:54:50 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 26 Nov 2025 11:54:50 GMT Subject: RFR: 8371419: IGV: Add view to visualise dominator tree and dominator information In-Reply-To: <0BPtgjs-kKjBWNbRzq97v3dUQeRfdi63ucdmh9GirxQ=.1a37fafa-6d79-45f5-b96f-72280eb112ee@github.com> References: <6Vojoez34k5CHSLTQ-sSxRERAHEraT-OV9epmtS1s2E=.462bea29-750b-455e-a20e-4a223a601374@github.com> <0BPtgjs-kKjBWNbRzq97v3dUQeRfdi63ucdmh9GirxQ=.1a37fafa-6d79-45f5-b96f-72280eb112ee@github.com> Message-ID: On Wed, 26 Nov 2025 11:44:41 GMT, Damon Fenacci wrote: > The idea I had in mind was to dump all the information we compute in C2 about dominators/blocks and the information in nodes and blocks come from different sources (and didn't think 2 small fields per node would make a lot of difference). 
When the dominator information comes from C2 the block information in the node properties comes from [_node_to_block_mapping ](https://github.com/openjdk/jdk/blob/2347e9a4e14eb14700415c58130885a7d06522d5/src/hotspot/share/opto/block.hpp#L403)whereas the block one directly [from the blocks](https://github.com/openjdk/jdk/blob/2347e9a4e14eb14700415c58130885a7d06522d5/src/hotspot/share/opto/block.hpp#L118) (and might not be yet in sync while computing GCM as I found out: Fair enough, then I agree it makes sense to dump node-level domination information as well, thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28293#issuecomment-3580958734 From shade at openjdk.org Wed Nov 26 11:59:04 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Nov 2025 11:59:04 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v3] In-Reply-To: References: Message-ID: On Sat, 22 Nov 2025 21:21:49 GMT, John R Rose wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls >> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls >> - Drop atomic counters >> - Initial version > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4760: > >> 4758: } >> 4759: >> 4760: void MacroAssembler::type_profile(Register recv, Register mdp, int mdp_offset) { > > The name chosen is subtly misleading. We have value (argument/parameter/return) profiling as well as receiver profiling. Since this particular macro-instruction is closely coupled to `ReceiverTypeData`, I suggest calling it `profile_receiver_type`, and documenting, up top, that it is precisely for collecting data into that structure. > > The name being replaced (`record_klass_in_profile_helper`) has the same problem. This is a historical artifact; the name was chosen before other sorts of type profiles were introduced. 
> > (And `profile_receiver_type` is surely better than `receiver_type_profile`, which is not a verb phrase.) > > Eventually we may wish to improve the other kinds of profiling, which have their own structures and representations. I thought for a while about what that might look like, and particularly if it factored into a different set of macro-instructions. Could we factor this proposed macro into a "find entry" part and an "increment counter" part? But no, it doesn't seem to pay off. There's benefit to preserving the jewel-like conciseness of the code pattern here. So I guess future work on other type profiles is mostly independent. > > But we do need a more specific name, that makes very clear the coupling to `ReceiverTypeData`. Even if the old code had that problem also. Putting it way out here in the macro-assembler makes such a problem worse, since the interpreter "knows about" MDOs, but the macro-assembler doesn't. > > I don't object to moving this down to the macro-assembler. It is no longer coupled to the interpreter, after the JIT learned the same trick. I think we should prepare ourselves, mentally, for similar moves with the other type profile mechanisms. > > I think the definition of `class ReceiverTypeData` should mention this macro. Otherwise we won't know where to look for updates (since it's no longer bundled with the interpreter). This macro is, in effect, a member of that class. (That's true of other MDO structures: Random assembly code is part of their APIs. The C++ code is very vague about how and where this happens. That's a problem for another time, I guess.) > > Another point. I would like to see pseudo-code that sketches what this complicated macro emits. (I was the author of the other pseudo-code deleted by this patch; I like that sort of thing.) I sugge... Renamed to `profile_receiver_type`, added some comments. My gripe with adding overly verbose comments outside the code is that they get desynced pretty often. 
So I opted for a more generic version of the comments, and then inlined them near the code in question. > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4812: > >> 4810: >> 4811: // Optimistic: search for already set up receiver. >> 4812: movptr(offset, base_receiver_offset); > > I wondered about using REP-CMPSQ to search the receiver array. It would require reformatting the MDO to make the receiver klasses contiguous. The x86 manual ORM (August 2023) cheers me down: > >> Using a REP prefix with string move instructions can provide high performance in the situations described above. However, using a REP prefix with string scan instructions (SCASB, SCASW, SCASD, SCASQ) or compare instructions (CMPSB, CMPSW, CMPSD, CMPSQ) is not recommended for high performance. Consider using SIMD instructions instead. > > I still wonder if, at some point, it will be profitable to make the receivers contiguous so we can use SIMD instructions to search them. Probably not any time soon. I would say we cross that bridge when we come to it. I think it would only be useful if we bump `TypeProfileWidth` beyond `2` for C2 configurations. Otherwise, having a very dense loop looks more profitable. We shall also see whatever comes out of scalable compiler counters, before we do any other moves in this area. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2564696630 PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2564703946 From rcastanedalo at openjdk.org Wed Nov 26 11:59:46 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 26 Nov 2025 11:59:46 GMT Subject: RFR: 8372585: TestVerifyLoopOptimizationsHitsMemLimit fails with product builds In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 11:46:51 GMT, Christian Hagedorn wrote: > Updating `TestVerifyLoopOptimizationsHitsMemLimit.java` in [JDK-8360510](https://bugs.openjdk.org/browse/JDK-8360510) missed to add `-XX:+IgnoreUnrecognizedVMOptions` which now leads to a test failure with product builds. This is fixed with this patch > > Thanks, > Christian Trivial. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28504#pullrequestreview-3510573758 From thartmann at openjdk.org Wed Nov 26 12:18:47 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 26 Nov 2025 12:18:47 GMT Subject: RFR: 8372585: TestVerifyLoopOptimizationsHitsMemLimit fails with product builds In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 11:46:51 GMT, Christian Hagedorn wrote: > Updating `TestVerifyLoopOptimizationsHitsMemLimit.java` in [JDK-8360510](https://bugs.openjdk.org/browse/JDK-8360510) missed to add `-XX:+IgnoreUnrecognizedVMOptions` which now leads to a test failure with product builds. This is fixed with this patch > > Thanks, > Christian Marked as reviewed by thartmann (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28504#pullrequestreview-3510647127 From chagedorn at openjdk.org Wed Nov 26 12:18:48 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Nov 2025 12:18:48 GMT Subject: RFR: 8372585: TestVerifyLoopOptimizationsHitsMemLimit fails with product builds In-Reply-To: References: Message-ID: <8S5RL9p9gLAz29iPrJJFv8Gj6BvY5FLm2jFF1m1tg5U=.e2cab7e7-8580-4d0c-9ce1-e7b74142c91c@github.com> On Wed, 26 Nov 2025 11:56:39 GMT, Roberto Castañeda Lozano wrote: >> Updating `TestVerifyLoopOptimizationsHitsMemLimit.java` in [JDK-8360510](https://bugs.openjdk.org/browse/JDK-8360510) missed to add `-XX:+IgnoreUnrecognizedVMOptions` which now leads to a test failure with product builds. This is fixed with this patch >> >> Thanks, >> Christian > > Trivial. Thanks @robcasloz and @TobiHartmann for the quick reviews! I'm running some sanity testing and will integrate afterwards. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28504#issuecomment-3581046551 From roland at openjdk.org Wed Nov 26 12:40:53 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 26 Nov 2025 12:40:53 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: <0df3H15uO96P1n3zLpKl5y_RKrAgc1h_V91bGB5mCr8=.06942d05-f66d-442f-a754-8135ac0eec30@github.com> Message-ID: On Wed, 26 Nov 2025 10:52:14 GMT, Roland Westrelin wrote: >> @eme64 Yes, it is indeed similar. The issue here is that after loop opts, we try to remove almost all `CastNode`s so that the graph can be GVN-ed better (think of `x = a + b` and `y = cast(a) + b`). >> >>> Can you elaborate a bit more on where the `CastLL` came from, and what it is supposed to do? 
>> >> Macro expansion tries to be smart for an array copy and does this: >> >> byte[] dst; >> byte[] src; >> int len; >> if (len <= 32) { >> int casted_len = cast(len, 0, 32); >> vectormask mask = VectorMaskGen(casted_len); >> vector v = LoadVectorMasked(src, 0, mask); >> StoreVectorMasked(dst, 0, v, mask); >> } else { >> // do the copy normally; >> } >> >> As you can see, the masked accesses are only meaningful if `len <= 32`. But after loop opts, the cast is gone, leaving us with a len which happens to be larger than `32`. The path should be dead, but IGVN reaches the `LoadVectorMaskedNode` first, which triggers the assert. > >> @merykitty Thanks for the explanations! So the `CastLL` is a narrowing cast, right? And `ConstraintCastNode::Identity` removes it, because the input type is wider, right? To me this part sounds incorrect. Narrowing casts should only be removed if the input is already narrower. No? > > But the type of the CastLL is widened after loop opts, right? > So it's similar to https://github.com/openjdk/jdk/pull/24575 but with a constant input to the cast. That's a case that #24575 doesn't address (it doesn't prevent constant folding of a cast) and can cause issues. See https://github.com/openjdk/jdk/pull/24575#issuecomment-3356091219 > I intend to create a follow up to 24575 that will address the remaining issues in a way that's similar to what @merykitty proposes here. > @rwestrel Is there any conflict with your solution? If not, we can go ahead with @merykitty 's solution here. No, no conflict. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3581138856 From chagedorn at openjdk.org Wed Nov 26 13:29:00 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Nov 2025 13:29:00 GMT Subject: Integrated: 8372585: TestVerifyLoopOptimizationsHitsMemLimit fails with product builds In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 11:46:51 GMT, Christian Hagedorn wrote: > Updating `TestVerifyLoopOptimizationsHitsMemLimit.java` in [JDK-8360510](https://bugs.openjdk.org/browse/JDK-8360510) missed to add `-XX:+IgnoreUnrecognizedVMOptions` which now leads to a test failure with product builds. This is fixed with this patch > > Thanks, > Christian This pull request has now been integrated. Changeset: 74354f23 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/74354f23dbb0fc852d216c8f1d3e5f80d406cfc6 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8372585: TestVerifyLoopOptimizationsHitsMemLimit fails with product builds Reviewed-by: rcastanedalo, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/28504 From chagedorn at openjdk.org Wed Nov 26 13:28:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Nov 2025 13:28:59 GMT Subject: RFR: 8372585: TestVerifyLoopOptimizationsHitsMemLimit fails with product builds In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 11:46:51 GMT, Christian Hagedorn wrote: > Updating `TestVerifyLoopOptimizationsHitsMemLimit.java` in [JDK-8360510](https://bugs.openjdk.org/browse/JDK-8360510) missed to add `-XX:+IgnoreUnrecognizedVMOptions` which now leads to a test failure with product builds. This is fixed with this patch > > Thanks, > Christian Testing passed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28504#issuecomment-3581318899 From shade at openjdk.org Wed Nov 26 13:49:29 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Nov 2025 13:49:29 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v4] In-Reply-To: References: Message-ID: > See the bug for discussion what issues current machinery has. > > This PR executes the plan outlined in the bug: > 1. Common the receiver type profiling code in interpreter and C1 > 2. Rewrite receiver type profiling code to only do atomic receiver slot installations > 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed > > This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - Grossly simplify register shuffling - More asserts - More comment touchups - Inline code comments - Mention the updater in ReceiverTypeData - type_profile -> profile_receiver_type - Stylistic: remove redundant assert - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - ... 
and 2 more: https://git.openjdk.org/jdk/compare/5291e1c1...33e4edb1 ------------- Changes: https://git.openjdk.org/jdk/pull/25305/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=03 Stats: 381 lines in 8 files changed: 165 ins; 197 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/25305.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25305/head:pull/25305 PR: https://git.openjdk.org/jdk/pull/25305 From shade at openjdk.org Wed Nov 26 13:49:31 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Nov 2025 13:49:31 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v3] In-Reply-To: References: Message-ID: On Wed, 24 Sep 2025 13:08:14 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Drop atomic counters > - Initial version When looking at this PR again, I realized shuffling could be much simpler if we do it outside the loop. I am testing new revision now and would do a few touchups. I'll say when the patch is ready for more thorough look. 
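[Editorial sketch, not part of the original message.] Step 2 of the plan quoted above ("only do atomic receiver slot installations", with counter updates left non-atomic) can be illustrated roughly in Java. This is an assumption-laden model, not the x86 assembly from the PR: `ReceiverProfileSketch`, its fields and `record` are hypothetical names, and a `Class<?>` array stands in for the receiver rows of the profile data.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

class ReceiverProfileSketch {
    // Receiver slots: a slot is claimed at most once, via compare-and-set.
    private final AtomicReferenceArray<Class<?>> receivers;
    // Counters are deliberately plain (non-atomic), mirroring the PR's stated scope.
    private final long[] counts;
    private long polymorphicCount;

    ReceiverProfileSketch(int rows) {
        receivers = new AtomicReferenceArray<>(rows);
        counts = new long[rows];
    }

    void record(Class<?> recv) {
        for (int i = 0; i < receivers.length(); i++) {
            Class<?> seen = receivers.get(i);
            if (seen == null) {
                // Try to install atomically. If another thread wins the race,
                // re-read the slot: the winner may have installed the same receiver.
                receivers.compareAndSet(i, null, recv);
                seen = receivers.get(i);
            }
            if (seen == recv) {
                counts[i]++;          // racy increment, accepted by design in this sketch
                return;
            }
        }
        polymorphicCount++;           // no free or matching slot
    }

    Class<?> receiver(int i) { return receivers.get(i); }
    long count(int i) { return counts[i]; }
    long polymorphicCount() { return polymorphicCount; }

    public static void main(String[] args) {
        ReceiverProfileSketch p = new ReceiverProfileSketch(2);
        p.record(String.class);
        p.record(Integer.class);
        p.record(String.class);
        System.out.println(p.receiver(0).getSimpleName() + " x" + p.count(0));
    }
}
```

With an atomic installation, two threads racing on an empty row cannot overwrite each other's receiver or install the same receiver in two different rows; the counts can still lose increments, which matches the message's remark that atomic counter updates are out of scope.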
------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3581395413 From shade at openjdk.org Wed Nov 26 13:49:33 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Nov 2025 13:49:33 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v3] In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 17:10:33 GMT, John R Rose wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls >> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls >> - Drop atomic counters >> - Initial version > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4845: > >> 4843: push(temp_reg); >> 4844: movptr(temp_reg, recv); >> 4845: recv_reg = temp_reg; > > I can mentally do the appropriate `assert_different_registers` here, but an explicit one to confirm would be better. > (Same comment for the next arm of the if/else.) https://github.com/openjdk/jdk/pull/25305#issuecomment-3581395413 :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2565073504 From mdoerr at openjdk.org Wed Nov 26 13:55:47 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Nov 2025 13:55:47 GMT Subject: RFR: 8367487: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on Linux aarch64 with Cavium CPU In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 14:24:19 GMT, David Briemann wrote: > Fix by requiring both floating-point half-precision (fphp) and advanced SIMD half-precision (asimdhp) on aarch64. +1 ------------- Marked as reviewed by mdoerr (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28390#pullrequestreview-3511077193 From dlunden at openjdk.org Wed Nov 26 14:11:07 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 26 Nov 2025 14:11:07 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v22] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 06:51:35 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion.
>> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Rerunning tests and will re-approve when finished. Latest changes look good, here are a few nits (only comment and style changes): https://github.com/openjdk/jdk/commit/e33416a7d8b9076fdd40a22914d8bb163c9b9600 Also, thanks for your patience! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3581488353 From chagedorn at openjdk.org Wed Nov 26 14:31:58 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Nov 2025 14:31:58 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: <6qShqR-Ohv7vamoJ_B4Ev-poU8SB96eTBo4HFJrylcI=.dac5a26f-c9f0-445b-8f1c-a7c719fa27ae@github.com> References: <6qShqR-Ohv7vamoJ_B4Ev-poU8SB96eTBo4HFJrylcI=.dac5a26f-c9f0-445b-8f1c-a7c719fa27ae@github.com> Message-ID: <4QQp7C7iIVfVs1MoUMC56KCgVGpXu5ziTHfZ-f2pk6o=.4ca7e1a8-3f31-44d3-aaec-30429ed7e2b0@github.com> On Tue, 25 Nov 2025 12:52:35 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. 
>> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the types of the `CastII` nodes are widened >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if the types of both inputs >> of the `Add` are non-constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints each of them operates under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widened in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression.
> > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: > > - review > - review > - Merge branch 'master' into JDK-8354282 > - review > - infinite loop in gvn fix > - renaming > - merge > - Merge branch 'master' into JDK-8354282 > - fix & test Introducing a 4th dependency type looks reasonable. It's also nice to see one more refactoring in that area which makes it very expressive now. Thanks for doing that! I left some suggestions to possibly further improve the code. src/hotspot/share/opto/castnode.cpp line 40: > 38: const ConstraintCastNode::DependencyType ConstraintCastNode::DependencyType::FloatingNonNarrowing(true, false, "floating non narrowing dependency"); // not pinned, doesn't narrow type > 39: const ConstraintCastNode::DependencyType ConstraintCastNode::DependencyType::NonFloatingNarrowing(false, true, "now floating narrowing dependency"); // pinned, narrows type > 40: const ConstraintCastNode::DependencyType ConstraintCastNode::DependencyType::NonFloatingNonNarrowing(false, false, "non floating non narrowing dependency"); // pinned, doesn't narrow type Adding `-`: Suggestion: const ConstraintCastNode::DependencyType ConstraintCastNode::DependencyType::FloatingNonNarrowing(true, false, "floating non-narrowing dependency"); // not pinned, doesn't narrow type const ConstraintCastNode::DependencyType ConstraintCastNode::DependencyType::NonFloatingNarrowing(false, true, "non-floating narrowing dependency"); // pinned, narrows type const ConstraintCastNode::DependencyType ConstraintCastNode::DependencyType::NonFloatingNonNarrowing(false, false, "non-floating non-narrowing dependency"); // pinned, doesn't narrow type src/hotspot/share/opto/castnode.cpp line 50: > 48: if (!_dependency.narrows_type()) { > 49: return this; > 50: } I suggest to split the comment to make it more clear: Suggestion: if (!_dependency.narrows_type()) { // If this cast doesn't carry a type 
dependency (i.e. not used for type narrowing), we cannot optimize it. return this; } // This cast node carries a type dependency. We can remove it if: // - Its input has a narrower type // - There's a dominating cast with same input but narrower type src/hotspot/share/opto/castnode.cpp line 634: > 632: if (wide_t != bottom_t) { > 633: // Widening the type of the Cast (to allow some commoning) causes the Cast to change how it can be optimized (if > 634: // type of its input is narrower than the Cast's type, we can't remove it to not loose the dependency). Suggestion: // type of its input is narrower than the Cast's type, we can't remove it to not loose the control dependency). src/hotspot/share/opto/castnode.hpp line 101: > 99: } > 100: return NonFloatingNonNarrowing; > 101: } Just a side note: We seem to mix the terms "(non-)pinned" with "(non-)floating" freely. Should we stick to just one? But maybe it's justified to use both depending on the situation/code context. src/hotspot/share/opto/castnode.hpp line 120: > 118: // be removed in any case otherwise the sunk node floats back into the loop. > 119: static const DependencyType NonFloatingNonNarrowing; > 120: I needed a moment to completely understand all these combinations. I rewrote the definitions in this process a little bit. Feel free to take some of it over: // All the possible combinations of floating/narrowing with example use cases: // Use case example: Range Check CastII // Floating: The Cast is only dependent on the single range check. // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely // remove the cast because the array access will be safe. static const DependencyType FloatingNarrowing; // Use case example: Widening Cast nodes' types after loop opts: We want to common Casts with slightly different types. // Floating: These Casts only depend on the single control.
// NonNarrowing: Even when the input type is narrower, we are not removing the Cast. Otherwise, the dependency // to the single control is lost, and an array access could float above its range check because we // just removed the dependency to the range check by removing the Cast. This could lead to an // out-of-bounds access. static const DependencyType FloatingNonNarrowing; // Use case example: An array access that is no longer dependent on a single range check (e.g. range check smearing). // NonFloating: The array access must be pinned below all the checks it depends on. If the check it directly depends // on with a control input is hoisted, we do hoist the Cast as well. If we allowed the Cast to float, // we risk that the array access ends up above another check it depends on (we cannot model two control // dependencies for a node in the IR). This could lead to an out-of-bounds access. // Narrowing: If the Cast does not narrow the input type, then it's safe to remove the cast because the array access // will be safe. static const DependencyType NonFloatingNarrowing; // Use case example: Sinking nodes out of a loop // Non-Floating & Non-Narrowing: We don't want the Cast that forces the node to be out of loop to be removed in any // case. Otherwise, the sunk node could float back into the loop, undoing the sinking. // This Cast is only used for pinning without caring about narrowing types.
static const DependencyType NonFloatingNonNarrowing; test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java line 100: > 98: @Run(test = "test3") > 99: public static void test3_runner() { > 100: i = RANDOM.nextInt(3, length-1); Suggestion: i = RANDOM.nextInt(3, length - 1); ------------- PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-3510584501 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2565071692 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2565111822 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2565208320 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2565130012 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2565000528 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2565211189 From chagedorn at openjdk.org Wed Nov 26 14:37:45 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Nov 2025 14:37:45 GMT Subject: RFR: 8371146: C2 SuperWord: VTransform::add_speculative_check uses pre_init that is pinned after Auto_Vectorization_Check, leading to bad graph In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 11:40:21 GMT, Emanuel Peter wrote: >>> I suppose I don't. Do you think I have to? What about all the inputs of n? Those may also have a ctrl that is too low now. So if you want consistency we would have to fix up everything up the chain... :/ >> >> That's the big question. It's probably hard to say if it's necessary or not but also hard to say if someone is actually relying on it. I would say: If ctrl is not optimal but legal, it does not matter that much. If it's illegal, we should probably fix it to avoid such wrong uses further down the line. >> >>> But I think this issue is not limited to the new is_available_for_speculative_check, but already existed for the much older is_pre_loop_invariant, which also uses compute_early_ctrl. 
So the problem is a little bigger, if it is really a problem at all - it may well be a problem. >> >> Right, that's my feeling as well that more places are off. As long as they are just not optimal but legal, we probably do not need to worry about them. >> >>> * We enhance our loop-opts verification, and verify ctrl after SuperWord. Then we will discover that a lot of the ctrl is set inaccurately, and fix it. >> >> That's a good idea for the future. Not sure how easy it will be to keep it as accurate as possible. >> >>> That way we can also have confidence that the fix is correct. If we did the fix now, without verification, this would just be a blind stab in the dark kind of fix ;) >> >> I'm concerned here that someone relies on the illegal ctrl later even though chances are probably low. But what do you think about just setting ctrl to the just computed early ctrl (if we are sure we are going to create the speculative check)? That would be an easy fix (might not be optimal but at least legal). But I agree that we should probably tackle this ctrl verification in general at some point. > > Right. The ctrl can either be legal (between the early and late point), or outside it (illegal). > > I fear that we do have some illegal ctrl now: > Imagine we use some node `n1(n2(n3(...)))`, where all of `n1-3` have ctrl between pre-loop and predicate, but their early is before the predicate. If we now use `n1` at the predicate and update its ctrl to early, then we would actually have to update the ctrl of `n2` and `n3` between early and predicate too, right? The whole expression would have to be moved, otherwise we have inputs that have a later ctrl than their outputs, and that is not right. > > The same is true for `is_pre_loop_invariant` with ctrl between inside pre-loop and before pre-loop. > > I'm very happy to file even a bug for this. But I'd prefer not to have to do it in this same issue here. Is that ok?
Yes, you are right, the problem/implications are more widespread and it's not much better when fixing one place while leaving the other ones in an illegal state. I think you have a better overview over Autovectorization to make an estimate if there are immediate problems with continuing with illegal ctrl or not (sounds like there are none - at least in theory). So, I agree with you to investigate that separately. I think it could also just be part of tackling verification for ctrl in general. You can move ahead with this PR then, thanks for discussing this aspect! :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28449#discussion_r2565252769 From epeter at openjdk.org Wed Nov 26 14:57:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Nov 2025 14:57:53 GMT Subject: RFR: 8371146: C2 SuperWord: VTransform::add_speculative_check uses pre_init that is pinned after Auto_Vectorization_Check, leading to bad graph In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 14:35:31 GMT, Christian Hagedorn wrote: >> Right. The ctrl can either be legal (between the early and late point), or outside it (illegal). >> >> I fear that we do have some illegal ctrl now: >> Imagine we use some node `n1(n2(n3(...)))`, where all of `n1-3` have ctrl between pre-loop and predicate, but their early is before the predicate. If we now use `n1` at the predicate and update its ctrl to early, then we would actually have to update the ctrl of `n2` and `n3` between early and predicate too, right? The whole expression would have to be moved, otherwise we have inputs that have a later ctrl than their outputs, and that is not right. >> >> The same is true for `is_pre_loop_invariant` with ctrl between inside pre-loop and before pre-loop. >> >> I'm very happy to file even a bug for this. But I'd prefer not to have to do it in this same issue here. Is that ok?
> > Yes, you are right, the problem/implications are more wide-spread and it's not much better when fixing one place while leaving the other ones in a illegal state. I think you have a better overview over Autovectorization to make an estimate if there are immediate problems with continuing with illegal ctrl or not (sounds like there are none - at least in theory). So, I agree with you to investigate that separately. I think it could also just be part of tackling verification for ctrl in general. You can move ahead with this PR then, thanks for discussing this aspect! :-) Yes, thanks for bringing it up! I filed [JDK-8372613](https://bugs.openjdk.org/browse/JDK-8372613). Not sure if it should be an RFE or a bug. Surely not a high-prio bug, but we should look into it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28449#discussion_r2565324638 From epeter at openjdk.org Wed Nov 26 15:02:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Nov 2025 15:02:07 GMT Subject: RFR: 8371146: C2 SuperWord: VTransform::add_speculative_check uses pre_init that is pinned after Auto_Vectorization_Check, leading to bad graph In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 14:41:22 GMT, Roland Westrelin wrote: >> **Analysis** >> >> This is a regression of [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751) / https://github.com/openjdk/jdk/pull/24278. >> >> The aliasing runtime check happens before the pre-loop. The values needed for the aliasing runtime check thus need to be available not just at the pre-loop, but even earlier: already at the aliasing check. >> >> Sadly, so far we only ever checked for `is_pre_loop_invariant`, and not `is_available_for_speculative_check`. We now found an example with the fuzzer that has a `pre_init` value that is pinned after the aliasing runtime check but before the pre-loop. Thus it passed the checks, and then created a bad graph (cyclic path, think "use before definition"). 
>> >> This is a very rare case. Getting the ctrl pinned after the aliasing runtime check but before the pre-loop requires some very specific order of loop-opts, of unroll/pre-main-post/peeling etc. But it can happen, and so we must handle it right. >> >> Solution: check for the stronger condition `is_available_for_speculative_check`. >> >> **Future Work** >> >> We should improve the debug printing when aliasing checks cannot be inserted. Currently the tag `SW_REJECTIONS` is a bit messy, we should fix that up. But it would be too many changes for this bug fix here. >> >> **Details** >> >> During `SuperWord`, we want to insert the aliasing runtime check above `687 ParsePredicate` which is annotated with `#Auto_Vectorization_Check`. For this, we require the `pre_init` value: `1244 AddI`. Sadly, this value is pinned lower down. >> image >> >> So when we insert the aliasing runtime check, we create a bad (circular) graph: >> image > > Looks good to me. @rwestrel @chhagedorn Thanks for the review, and for the conversation about getting ctrl right. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28449#issuecomment-3581710051 From epeter at openjdk.org Wed Nov 26 15:02:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Nov 2025 15:02:09 GMT Subject: Integrated: 8371146: C2 SuperWord: VTransform::add_speculative_check uses pre_init that is pinned after Auto_Vectorization_Check, leading to bad graph In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 09:56:02 GMT, Emanuel Peter wrote: > **Analysis** > > This is a regression of [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751) / https://github.com/openjdk/jdk/pull/24278. > > The aliasing runtime check happens before the pre-loop. The values needed for the aliasing runtime check thus need to be available not just at the pre-loop, but even earlier: already at the aliasing check. > > Sadly, so far we only ever checked for `is_pre_loop_invariant`, and not `is_available_for_speculative_check`.
We now found an example with the fuzzer that has a `pre_init` value that is pinned after the aliasing runtime check but before the pre-loop. Thus it passed the checks, and then created a bad graph (cyclic path, think "use before definition"). > > This is a very rare case. Getting the ctrl pinned after the aliasing runtime check but before the pre-loop requires some very specific order of loop-opts, of unroll/pre-main-post/peeling etc. But it can happen, and so we must handle it right. > > Solution: check for the stronger condition `is_available_for_speculative_check`. > > **Future Work** > > We should improve the debug printing when aliasing checks cannot be inserted. Currently the tag `SW_REJECTIONS` is a bit messy, we should fix that up. But it would be too many changes for this bug fix here. > > **Details** > > During `SuperWord`, we want to insert the aliasing runtime check above `687 ParsePredicate` which is annotated with `#Auto_Vectorization_Check`. For this, we require the `pre_init` value: `1244 AddI`. Sadly, this value is pinned lower down. > image > > So when we insert the aliasing runtime check, we create a bad (circular) graph: > image This pull request has now been integrated.
Changeset: e3a08558 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/e3a085581bfa70437b73d4b0527a084e0c5c9aac Stats: 164 lines in 3 files changed: 155 ins; 0 del; 9 mod 8371146: C2 SuperWord: VTransform::add_speculative_check uses pre_init that is pinned after Auto_Vectorization_Check, leading to bad graph Reviewed-by: roland, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/28449 From epeter at openjdk.org Wed Nov 26 15:05:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Nov 2025 15:05:03 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 16:51:39 GMT, Christian Hagedorn wrote: > [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: > > - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant". > - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. > - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: > https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 > I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. > > #### Testing > - [X] Tier1 > - [X] Tier5 with IR framework internal tests only > - [ ] Additionally testing `TestIRMatching.java` and `TestPhaseIRMatching.java` on arm, aarch64, ppc64le, ppc64be, s390, and riscv64 (thanks to @mhaessig for taking care of that!) > > Thanks, > Christian I suppose the issue that caused the regression was not properly tested? Anyway: thanks for fixing it.
------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28495#pullrequestreview-3511395470 From jbhateja at openjdk.org Wed Nov 26 15:47:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 26 Nov 2025 15:47:54 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v23] In-Reply-To: References: Message-ID: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. > > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. > > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint.
> > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 - Incorporating polished comments suggestions from Daniel - Review comments resolution - Review comments resolutions - Review comments resolution - Extending biasing heuristics to account for bias range with minimum degree of freedom. Review feedback incorporated. - Generic operand traversal and sharpening candidate selection based on RegisterMask and non-interference. Review feedback incorporated - Review comments resolution - Review comments resolutions - Moving demotion candidate marking to AD file, review comments resolutions - ... and 11 more: https://git.openjdk.org/jdk/compare/1ce2a44e...93577b83 ------------- Changes: https://git.openjdk.org/jdk/pull/26283/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=22 Stats: 283 lines in 13 files changed: 205 ins; 15 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/26283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283 PR: https://git.openjdk.org/jdk/pull/26283 From shade at openjdk.org Wed Nov 26 15:55:38 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Nov 2025 15:55:38 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v5] In-Reply-To: References: Message-ID: > See the bug for discussion what issues current machinery has. > > This PR executes the plan outlined in the bug: > 1. Common the receiver type profiling code in interpreter and C1 > 2. 
Rewrite receiver type profiling code to only do atomic receiver slot installations > 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed > > This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - Tighten up some more - Offset is always rscratch1, no need to save it - Grossly simplify register shuffling - More asserts - More comment touchups - Inline code comments - Mention the updater in ReceiverTypeData - type_profile -> profile_receiver_type - Stylistic: remove redundant assert - ... and 5 more: https://git.openjdk.org/jdk/compare/c028369d...c441209a ------------- Changes: https://git.openjdk.org/jdk/pull/25305/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=04 Stats: 383 lines in 8 files changed: 167 ins; 197 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/25305.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25305/head:pull/25305 PR: https://git.openjdk.org/jdk/pull/25305 From aseoane at openjdk.org Wed Nov 26 15:58:12 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 26 Nov 2025 15:58:12 GMT Subject: RFR: 8280283: Dead compiler code found during the JDK-8272058 code review In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 09:26:13 GMT, Anton Seoane Ampudia wrote: > This PR removes some dead code that was found during review for [JDK-8272058](https://bugs.openjdk.org/browse/JDK-8272058). > > `target_addr_for_insn_or_null` is never run with a `ldrw` to `zr` (i.e. a safepoint poll). 
This is just a remnant from global safepointing, before we moved to using thread-local handshakes. No safepoint polling code reaches this function. More information can be read in the [original code review](https://github.com/openjdk/jdk18/pull/51#discussion_r774922087). Additionally, I have run tiers 1-6 to make sure this path is not exercised. > > This changeset also cleans up the unused `is_nop` function, following the comments in the issue. Other dead code mentioned there has long since disappeared. > > **Testing:** passes tiers 1-4 Pinging @bulasevich @theRealAph, as they were in the original code review and discussed this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28473#issuecomment-3582016397 From roland at openjdk.org Wed Nov 26 16:14:43 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 26 Nov 2025 16:14:43 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v5] In-Reply-To: References: Message-ID: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out-of-bounds access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the types of the `CastII` nodes are widened > so nodes that have the same inputs but a slightly different type can > common.
> > - When pushing a `CastII` through an `Add`, if the types of both inputs > of the `Add` are non-constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_on_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints each of them operates under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widened in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression.
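The two independent properties described above (floating vs. pinned, removable vs. not removable once the cast stops narrowing its input) give four combinations. A minimal Java sketch of that matrix follows; it is only an illustration of the reasoning, and `CastKind` with its field names is hypothetical, not an identifier from `castnode.cpp`.

```java
import java.util.ArrayList;
import java.util.List;

public class CastDependencySketch {
    // Hypothetical model of the two properties discussed in the description.
    // These names are illustrative only and do not exist in HotSpot.
    record CastKind(boolean floating, boolean removableWhenNotNarrowing) {
        String describe() {
            return (floating ? "floating" : "pinned") + ", "
                 + (removableWhenNotNarrowing ? "removable" : "not removable")
                 + " when it no longer narrows its input";
        }
    }

    // Enumerates all four combinations of the two properties.
    static List<CastKind> allKinds() {
        List<CastKind> kinds = new ArrayList<>();
        for (boolean floating : new boolean[] {true, false}) {
            for (boolean removable : new boolean[] {true, false}) {
                kinds.add(new CastKind(floating, removable));
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        for (CastKind kind : allKinds()) {
            System.out.println(kind.describe());
        }
    }
}
```

In this model, the combination the description asks for is `new CastKind(true, false)`: the cast may float, but must be kept even when its type is no longer narrower than its input.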
Roland Westrelin has updated the pull request incrementally with four additional commits since the last revision: - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24575/files - new: https://git.openjdk.org/jdk/pull/24575/files/3569280e..2aa918e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=03-04 Stats: 13 lines in 2 files changed: 5 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From vpaprotski at openjdk.org Wed Nov 26 16:47:23 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 26 Nov 2025 16:47:23 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v7] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 00:55:18 GMT, David Holmes wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> comments from Jatin > > The new test can only run on x86 but it is not restricted to x86, thus it fails when run on Aarch64. @dholmes-ora Hi David, need some help with this please, don't have access to an ARM system to reproduce (or the ARM expertise).. could you point me at the failing job if thats available? Or some log if not? - Is it an issue with the options (i.e. `-XX:UseAVX=2` perhaps). I probably should had added `-XX:+IgnoreUnrecognizedVMOptions` to it.. - Otherwise, I am stumped.. the test case isn't architecture-specific.. 
it calls two methods (one of which is annotated as an intrinsic..) and expects them to return the same value.. i.e. Java and Intrinsic version should behave the same.. - Only thing I can think of.. The ARM implementation took some shortcuts in name of optimization. This can be entirely valid if the code calling the intrinsics never should get some specific value (-ranges). i.e. the tests RNG be further restricted.. - Otherwise.. is it possible its a bug in the ARM intrinsic? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3582205814 From vpaprotski at openjdk.org Wed Nov 26 16:52:03 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 26 Nov 2025 16:52:03 GMT Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v3] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 21:00:39 GMT, Anthony Scarpino wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> next set of comments > > Marked as reviewed by ascarpino (Reviewer). Oh.. realized that I should had checked JBS.. thanks @ascarpino for resolving the bug I caused! At least its just the option.. whew. > @dholmes-ora Hi David, need some help with this please, don't have access to an ARM system to reproduce (or the ARM expertise).. could you point me at the failing job if thats available? Or some log if not? > > * Is it an issue with the options (i.e. `-XX:UseAVX=2` perhaps). I probably should had added `-XX:+IgnoreUnrecognizedVMOptions` to it.. > * Otherwise, I am stumped.. the test case isn't architecture-specific.. it calls two methods (one of which is annotated as an intrinsic..) and expects them to return the same value.. i.e. Java and Intrinsic version should behave the same.. > * Only thing I can think of.. The ARM implementation took some shortcuts in name of optimization. 
This can be entirely valid if the code calling the intrinsics never should get some specific value (-ranges). i.e. the tests RNG be further restricted.. > * Otherwise.. is it possible its a bug in the ARM intrinsic? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3582226267 From roland at openjdk.org Wed Nov 26 16:55:52 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 26 Nov 2025 16:55:52 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v6] In-Reply-To: <-kd-AfwkJebk8njImn0KeKvUCQnwoiqLr96cKCovlFc=.30649d16-8dee-4c9d-b1eb-ac9d7e9df86a@github.com> References: <7nY7QRkkFjOtOuBXID1I4GluA0vnFRLy_UnRICfVkR4=.99ec7fe1-af27-4ab7-ac63-27aa12bec4ef@github.com> <-kd-AfwkJebk8njImn0KeKvUCQnwoiqLr96cKCovlFc=.30649d16-8dee-4c9d-b1eb-ac9d7e9df86a@github.com> Message-ID: On Tue, 25 Nov 2025 22:45:53 GMT, Vladimir Ivanov wrote: > Sure, I'm fine either way. There are known cases when `dec_number_of_mh_late_inlines()` call is missing, so the patch as it is now looks fine as well considering we'll investigate the effects on `inline_string_calls()` call. Ok. Let's keep the patch as it is then. Thanks for you input. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28088#issuecomment-3582248065 From shade at openjdk.org Wed Nov 26 19:42:03 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Nov 2025 19:42:03 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v3] In-Reply-To: References: Message-ID: On Sat, 22 Nov 2025 21:42:49 GMT, John R Rose wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls >> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls >> - Drop atomic counters >> - Initial version > > Code is good. 
Consider changing a name and adding documentation. Tests are still passing, ready for another review round, @rose00, @vnkozlov :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3582957544 From jiangli at openjdk.org Wed Nov 26 19:58:54 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 26 Nov 2025 19:58:54 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v8] In-Reply-To: References: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> Message-ID: On Wed, 26 Nov 2025 07:15:20 GMT, Shawn M Emery wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed the ENCRYPT_16_BLKS fall through case that sviswa7 pointed out in PR review. > > test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 26: > >> 24: /* >> 25: * @test >> 26: * @bug 8371864 > > Does it make sense to just run the unit test on architectures with `@requires vm.cpu.features ~= ".*avx512f.*" | vm.cpu.features ~= ".*avx2.*"` annotation? Thanks for reviewing and testing! > Does it make sense to just run the unit test on architectures with @requires vm.cpu.features ~= ".*avx512f.*" | vm.cpu.features ~= ".*avx2.*" annotation? Limiting the test execution on the relevant devices is a good idea. We can also check for `os.simpleArch == "x64"`. We probably could check for ".*avx512.*" instead ".*avx512f.*" just to make sure we still get the proper test coverage in case there is any future/hidden bugs with populating cpu feature flags. I just did a quick testing: On my local machine, these related cpu feature flags are set: `avx, avx2`. 
On a machine enabled with the `aesgcm_avx512` intrinsic, these are the related cpu feature flags: `avx, avx2, avx512f, avx512dq, avx512cd, avx512bw, avx512vl, avx512_vpopcntdq, avx512_vpclmulqdq, avx512_vaes, avx512_vnni, avx512_vbmi2, avx512_vbmi, avx512_bitalg, avx512_ifma` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2566334457 From jiangli at openjdk.org Wed Nov 26 23:09:19 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 26 Nov 2025 23:09:19 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v9] In-Reply-To: References: Message-ID: > Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. > > Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: Address @smemery's comments: - Add @requires - Shorten long lines ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28363/files - new: https://git.openjdk.org/jdk/pull/28363/files/4ea57ee7..64beb969 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=07-08 Stats: 20 lines in 1 file changed: 10 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/28363.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28363/head:pull/28363 PR: https://git.openjdk.org/jdk/pull/28363 From jiangli at openjdk.org Wed Nov 26 23:09:20 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 26 Nov 2025 23:09:20 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v8] In-Reply-To: References: 
<2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> Message-ID: On Wed, 26 Nov 2025 19:55:47 GMT, Jiangli Zhou wrote: >> test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 26: >> >>> 24: /* >>> 25: * @test >>> 26: * @bug 8371864 >> >> Does it make sense to just run the unit test on architectures with `@requires vm.cpu.features ~= ".*avx512f.*" | vm.cpu.features ~= ".*avx2.*"` annotation? > > Thanks for reviewing and testing! > >> Does it make sense to just run the unit test on architectures with @requires vm.cpu.features ~= ".*avx512f.*" | vm.cpu.features ~= ".*avx2.*" annotation? > > Limiting the test execution on the relevant devices is a good idea. We can also check for `os.simpleArch == "x64"`. We probably could check for ".*avx512.*" instead ".*avx512f.*" just to make sure we still get the proper test coverage in case there is any future/hidden bugs with populating cpu feature flags. > > I just did a quick testing: > On my local machine, these related cpu feature flags are set: `avx, avx2`. > > On a machine enabled with the `aesgcm_avx512` intrinsic, these are the related cpu feature flags: > `avx, avx2, avx512f, avx512dq, avx512cd, avx512bw, avx512vl, avx512_vpopcntdq, avx512_vpclmulqdq, avx512_vaes, avx512_vnni, avx512_vbmi2, avx512_vbmi, avx512_bitalg, avx512_ifma` Added `@requires`. 
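The difference between matching `.*avx512f.*` and the broader `.*avx512.*` can be sketched with plain Java regular expressions. This assumes that `~=` in `@requires` behaves like a full-string regex match, which the `.*` wrappers in the suggested expression imply; the class and the third feature list below are hypothetical, while the first two lists are the ones quoted in the message.

```java
public class CpuFeatureMatchSketch {
    // Feature list reported for the local (AVX2-only) machine.
    static final String AVX2_ONLY = "avx, avx2";

    // Feature list reported for the machine with the aesgcm_avx512 intrinsic.
    static final String AVX512_MACHINE =
        "avx, avx2, avx512f, avx512dq, avx512cd, avx512bw, avx512vl, "
        + "avx512_vpopcntdq, avx512_vpclmulqdq, avx512_vaes, avx512_vnni, "
        + "avx512_vbmi2, avx512_vbmi, avx512_bitalg, avx512_ifma";

    // Hypothetical list: an AVX512 flag is reported, but avx512f is missing.
    static final String HYPOTHETICAL_NO_F = "avx, avx512dq";

    // Full-string regex match, assumed here to mirror the '~=' operator.
    static boolean matches(String features, String pattern) {
        return features.matches(pattern);
    }

    public static void main(String[] args) {
        String[] lists = {AVX2_ONLY, AVX512_MACHINE, HYPOTHETICAL_NO_F};
        String[] patterns = {".*avx2.*", ".*avx512f.*", ".*avx512.*"};
        for (String features : lists) {
            for (String pattern : patterns) {
                System.out.println(pattern + " on [" + features + "] -> "
                                   + matches(features, pattern));
            }
        }
    }
}
```

Under that assumption, only the broader pattern still selects the hypothetical machine whose feature list lacks `avx512f`, which is the coverage concern raised in the reply.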
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2566688828 From jiangli at openjdk.org Wed Nov 26 23:09:23 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 26 Nov 2025 23:09:23 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v8] In-Reply-To: References: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> Message-ID: On Wed, 26 Nov 2025 07:15:34 GMT, Shawn M Emery wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed the ENCRYPT_16_BLKS fall through case that sviswa7 pointed out in PR review. > > test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 72: > >> 70: } >> 71: >> 72: private byte[] gcmEncrypt(final byte[] key, final byte[] plaintext, final byte[] aad) > > nit: > 80 characters Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2566689379 From jiangli at openjdk.org Wed Nov 26 23:11:51 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 26 Nov 2025 23:11:51 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v8] In-Reply-To: References: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> Message-ID: <-GTQM1Bb1GezGvj3dlPWQsH0U9L3PbW5NuIC4WQ1M2I=.0e4eaf1b-8931-4f13-b9f3-ce5886896505@github.com> On Wed, 26 Nov 2025 07:15:23 GMT, Shawn M Emery wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed the ENCRYPT_16_BLKS fall through case that sviswa7 pointed out in PR review. 
> > test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 27: > >> 25: * @test >> 26: * @bug 8371864 >> 27: * @run main/othervm/timeout=600 TestGCMSplitBound > > 60 was sufficient for my test runs. It's probably better to give larger timeout factor to prevent false failure when testing on slower machine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2566693578 From vlivanov at openjdk.org Thu Nov 27 01:21:24 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 27 Nov 2025 01:21:24 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks Message-ID: Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. The difference can be illustrated with the following simple cases: class A { void m() {} } class B extends A { void m() {} } void testInstanceOf(A obj) { if (obj instanceof B) { obj.m(); } } InstanceOf::testInstanceOf (12 bytes) @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call vs void testInstanceOfCast(A obj) { if (obj instanceof B) { B b = (B)obj; b.m(); } } InstanceOf::testInstanceOfCast (17 bytes) @ 13 InstanceOf$B::m (1 bytes) inline (hot) Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). 
I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. Testing: hs-tier1 - hs-tier5 ------------- Commit messages: - bugid - C2: Materialize type information from instanceof checks Changes: https://git.openjdk.org/jdk/pull/28517/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372634 Stats: 558 lines in 10 files changed: 524 ins; 0 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/28517.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28517/head:pull/28517 PR: https://git.openjdk.org/jdk/pull/28517 From duke at openjdk.org Thu Nov 27 01:21:25 2025 From: duke at openjdk.org (ExE Boss) Date: Thu, 27 Nov 2025 01:21:25 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 00:53:54 GMT, Vladimir Ivanov wrote: > Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. > > There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. 
> > The difference can be illustrated with the following simple cases: > > class A { void m() {} } > class B extends A { void m() {} } > > void testInstanceOf(A obj) { > if (obj instanceof B) { > obj.m(); > } > } > > InstanceOf::testInstanceOf (12 bytes) > @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call > > vs > > void testInstanceOfCast(A obj) { > if (obj instanceof B) { > B b = (B)obj; > b.m(); > } > } > > InstanceOf::testInstanceOfCast (17 bytes) > @ 13 InstanceOf$B::m (1 bytes) inline (hot) > > > Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. > > FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. > > Testing: hs-tier1 - hs-tier5 test/hotspot/jtreg/compiler/inlining/TestSubtypeCheckTypeInfo.java line 323: > 321: static boolean lateInlineIsInstanceCondPost(A o, boolean cond) { > 322: return B.class.isInstance(o) && cond; > 323: } What about the non-late version of these methods?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2566844680 From vlivanov at openjdk.org Thu Nov 27 01:21:25 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 27 Nov 2025 01:21:25 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 01:02:51 GMT, ExE Boss wrote: >> Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. >> >> There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. >> >> The difference can be illustrated with the following simple cases: >> >> class A { void m() {} } >> class B extends A { void m() {} } >> >> void testInstanceOf(A obj) { >> if (obj instanceof B) { >> obj.m(); >> } >> } >> >> InstanceOf::testInstanceOf (12 bytes) >> @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call >> >> vs >> >> void testInstanceOfCast(A obj) { >> if (obj instanceof B) { >> B b = (B)obj; >> b.m(); >> } >> } >> >> InstanceOf::testInstanceOfCast (17 bytes) >> @ 13 InstanceOf$B::m (1 bytes) inline (hot) >> >> >> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. >> >> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. 
So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. >> >> Testing: hs-tier1 - hs-tier5 > > test/hotspot/jtreg/compiler/inlining/TestSubtypeCheckTypeInfo.java line 323: > >> 321: static boolean lateInlineIsInstanceCondPost(A o, boolean cond) { >> 322: return B.class.isInstance(o) && cond; >> 323: } > > What about the non-late version of these methods? There are corresponding test cases (`testInstanceOfCondPre` et al) where conditions are embedded. The idea of `testInstanceOfCondLate` and similar test cases is to check how inlining works when condition improves receiver type during incremental inlining phase. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2566881862 From wenanjian at openjdk.org Thu Nov 27 02:40:20 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 27 Nov 2025 02:40:20 GMT Subject: RFR: 8371968: RISC-V: implement AES CBC intrinsics [v2] In-Reply-To: References: Message-ID: > Support AES CBC intrinsic on RISCV, Already passed the tests in > test/hotspot/jtreg/compiler/codegen/aes/ > test/jdk/com/sun/crypto Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: fix some comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28320/files - new: https://git.openjdk.org/jdk/pull/28320/files/280fae41..db18bda6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28320&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28320&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28320/head:pull/28320 PR: https://git.openjdk.org/jdk/pull/28320 From wenanjian at openjdk.org Thu Nov 27 02:40:21 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 27 Nov 2025 02:40:21 GMT Subject: RFR: 8371968: RISC-V: implement
AES CBC intrinsics [v2] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 07:03:48 GMT, Fei Yang wrote: >> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> fix some comments > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2619: > >> 2617: // >> 2618: // Output: >> 2619: // x0 - input length > > Shouldn't this be `x10`? `x0` is the zero register on riscv. Thanks for the reminder, fixed! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28320#discussion_r2566996650 From chagedorn at openjdk.org Thu Nov 27 06:42:47 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 27 Nov 2025 06:42:47 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 16:51:39 GMT, Christian Hagedorn wrote: > [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: > > - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant". > - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. > - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: > https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 > I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. > > #### Testing > - [X] Tier1 > - [X] Tier5 with IR framework internal tests only > - [ ] Additionally testing `TestIRMatching.java` and `TestPhaseIRMatching.java` on arm, aarch64, ppc64le, ppc64be s390, and riscv64 (thanks to @mhaessig for taking care of that!) 
> > Thanks, > Christian Thanks Emanuel for your review! I first thought about running more tests for the IR framework but thought we don't rely on the changed type information - apparently we do! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28495#issuecomment-3584408244 From duke at openjdk.org Thu Nov 27 07:15:11 2025 From: duke at openjdk.org (Harshit470250) Date: Thu, 27 Nov 2025 07:15:11 GMT Subject: RFR: 8372641: [s390x] Test failure TestMergeStores.java Message-ID: [JDK-8347405](https://bugs.openjdk.org/browse/JDK-8347405) introduced a mergeStores optimisation which requires ReverseBytesS opcode and as it was not implemented for s390 the test case is failing. I also implemented ReverseBytesUS. ------------- Commit messages: - add Opcode ReverseBytesUS - Add Opcode ReverseBytesS Changes: https://git.openjdk.org/jdk/pull/28523/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28523&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372641 Stats: 34 lines in 1 file changed: 34 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28523.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28523/head:pull/28523 PR: https://git.openjdk.org/jdk/pull/28523 From duke at openjdk.org Thu Nov 27 07:49:53 2025 From: duke at openjdk.org (Shawn M Emery) Date: Thu, 27 Nov 2025 07:49:53 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v9] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 23:09:19 GMT, Jiangli Zhou wrote: >> Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. >> >> Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! 
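The edge case described above, AES-GCM inputs whose size is not a multiple of the 16-byte block size, can be probed with a simple encrypt/decrypt round trip. The sketch below is not the `TestGCMSplitBound` test from the PR: the class name, helper names and the chosen sizes are assumptions for illustration, and it only uses the standard `javax.crypto` API.

```java
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class GcmRoundTripSketch {
    // Runs one AES-GCM operation (encrypt or decrypt) with a 128-bit tag.
    static byte[] gcm(int mode, byte[] key, byte[] iv, byte[] aad, byte[] input)
            throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(128, iv));
        cipher.updateAAD(aad);
        return cipher.doFinal(input);
    }

    // Encrypts and decrypts a message of the given size and checks the result.
    static boolean roundTrips(int messageSize) throws Exception {
        // All-zero key and IV are acceptable only because this is a test sketch.
        byte[] key = new byte[16];
        byte[] iv = new byte[12];
        byte[] aad = new byte[8];
        byte[] plaintext = new byte[messageSize];
        Arrays.fill(plaintext, (byte) 0x5a);

        byte[] ciphertext = gcm(Cipher.ENCRYPT_MODE, key, iv, aad, plaintext);
        byte[] decrypted = gcm(Cipher.DECRYPT_MODE, key, iv, aad, ciphertext);

        // GCM output is the ciphertext followed by the 16-byte tag.
        return ciphertext.length == messageSize + 16
            && Arrays.equals(plaintext, decrypted);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sizes on either side of block and multi-block boundaries.
        int[] sizes = {1, 15, 17, 127, 129, 255, 257, 4097};
        for (int size : sizes) {
            if (!roundTrips(size)) {
                throw new RuntimeException("Failed for messageSize "
                        + Integer.toHexString(size));
            }
        }
        System.out.println("all sizes round-trip");
    }
}
```

A round trip like this only detects corruption that breaks tag verification or decryption; comparing intrinsic output against an interpreter-only run would be a stronger check, and whether small inputs reach the intrinsic stubs at all depends on JIT warm-up.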
> > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Address @smemery's comments: > - Add @requires > - Shorten long lines src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4026: > 4024: //process 8 16 byte blocks at a time until all are done 'encrypt_by_8_new followed by ghash_last_8' > 4025: __ xorl(pos, pos); > 4026: __ cmpl(len, 128); Was this part of the original problem? I was trying to trace where this is called with < 128 bytes and couldn't find the path. test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 134: > 132: } catch (Exception e) { > 133: throw new RuntimeException("Failed for messageSize " + > 134: Integer.toHexString(messageSize), e); nit: `+` operator should be first and line indented >= 8 white-spaces. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2567476829 PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2567477126 From mhaessig at openjdk.org Thu Nov 27 08:18:48 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 27 Nov 2025 08:18:48 GMT Subject: RFR: 8372641: [s390x] Test failure TestMergeStores.java In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 07:09:01 GMT, Harshit470250 wrote: > [JDK-8347405](https://bugs.openjdk.org/browse/JDK-8347405) introduced a mergeStores optimisation which requires ReverseBytesS opcode and as it was not implemented for s390 the test case is failing. > I also implemented ReverseBytesUS. Thank you for fixing this, @Harshit470250. The new instructions look good to me. I am only unsure about the cost (see comments below). src/hotspot/cpu/s390/s390.ad line 11619: > 11617: match(Set dst (ReverseBytesS src)); > 11618: predicate(UseByteReverseInstruction); > 11619: ins_cost(DEFAULT_COST); Would this not be twice the `DEFAULT_COST` since it uses two instructions?
Suggestion: ins_cost(2 * DEFAULT_COST); src/hotspot/cpu/s390/s390.ad line 11635: > 11633: match(Set dst (ReverseBytesUS src)); > 11634: predicate(UseByteReverseInstruction); > 11635: ins_cost(DEFAULT_COST); Suggestion: ins_cost(2 * DEFAULT_COST); ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28523#pullrequestreview-3513961005 PR Review Comment: https://git.openjdk.org/jdk/pull/28523#discussion_r2567535924 PR Review Comment: https://git.openjdk.org/jdk/pull/28523#discussion_r2567536919 From duke at openjdk.org Thu Nov 27 08:46:26 2025 From: duke at openjdk.org (Harshit470250) Date: Thu, 27 Nov 2025 08:46:26 GMT Subject: RFR: 8372641: [s390x] Test failure TestMergeStores.java [v2] In-Reply-To: References: Message-ID: > [JDK-8347405](https://bugs.openjdk.org/browse/JDK-8347405) introduced a mergeStores optimisation which requires ReverseBytesS opcode and as it was not implemented for s390 the test case is failing. > I also implemented ReverseBytesUS. 
Harshit470250 has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Review Co-authored-by: Manuel Hässig Co-authored-by: Amit Kumar ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28523/files - new: https://git.openjdk.org/jdk/pull/28523/files/36c6de13..61352bdc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28523&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28523&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28523.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28523/head:pull/28523 PR: https://git.openjdk.org/jdk/pull/28523 From amitkumar at openjdk.org Thu Nov 27 08:46:27 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 27 Nov 2025 08:46:27 GMT Subject: RFR: 8372641: [s390x] Test failure TestMergeStores.java [v2] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 08:43:16 GMT, Harshit470250 wrote: >> [JDK-8347405](https://bugs.openjdk.org/browse/JDK-8347405) introduced a mergeStores optimisation which requires ReverseBytesS opcode and as it was not implemented for s390 the test case is failing. >> I also implemented ReverseBytesUS. > > Harshit470250 has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Review > > Co-authored-by: Manuel Hässig > Co-authored-by: Amit Kumar Changes requested by amitkumar (Committer).
src/hotspot/cpu/s390/s390.ad line 11623: > 11621: > 11622: format %{ "LRVR $dst, $src\n\t # byte reverse int" > 11623: "SRA $dst, 0x0010\t # right shift by 16" %} Suggestion: "SRA $dst, 0x0010\t # right shift by 16, sign extended" %} src/hotspot/cpu/s390/s390.ad line 11639: > 11637: > 11638: format %{ "LRVR $dst, $src\n\t # byte reverse int" > 11639: "SRL $dst, 0x0010\t # right shift by 16" %} Suggestion: "SRL $dst, 0x0010\t # right shift by 16, zero extended" %} ------------- PR Review: https://git.openjdk.org/jdk/pull/28523#pullrequestreview-3514039618 PR Review Comment: https://git.openjdk.org/jdk/pull/28523#discussion_r2567602172 PR Review Comment: https://git.openjdk.org/jdk/pull/28523#discussion_r2567603002 From chagedorn at openjdk.org Thu Nov 27 08:57:48 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 27 Nov 2025 08:57:48 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 In-Reply-To: References: Message-ID: <9CGZeCADEds8B60aZZxkUj9GWIfvQAmQ9lN8E_ft4uo=.9923fd74-e17e-453d-9f83-e2367ae96ca9@github.com> On Tue, 25 Nov 2025 16:51:39 GMT, Christian Hagedorn wrote: > [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: > > - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant". > - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. > - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: > https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 > I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. 
> > #### Testing > - [X] Tier1 > - [X] Tier5 with IR framework internal tests only > - [ ] Failing IR framework internal tests on all platforms > > Thanks, > Christian Could you help to test the previously failing tests - `TestIRMatching.java` - `TestPhaseIRMatching.java` - `IRExample.java` with the proposed patch on different platforms? - PPC (@TheRealMDoerr) - s390 (@offamitkumar) - riscv (@Hamlin-Li) That would be highly appreciated :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28495#issuecomment-3584800447 From duke at openjdk.org Thu Nov 27 08:59:09 2025 From: duke at openjdk.org (Harshit470250) Date: Thu, 27 Nov 2025 08:59:09 GMT Subject: RFR: 8372641: [s390x] Test failure TestMergeStores.java [v3] In-Reply-To: References: Message-ID: <6iaWuz5X4ol8NmIvbWoQBxmceux35b3529t1sONwCZA=.08c49f3a-87dc-4030-a5a7-1a83f4209fe0@github.com> > [JDK-8347405](https://bugs.openjdk.org/browse/JDK-8347405) introduced a mergeStores optimisation which requires ReverseBytesS opcode and as it was not implemented for s390 the test case is failing. > I also implemented ReverseBytesUS. 
Harshit470250 has updated the pull request incrementally with one additional commit since the last revision: Added whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28523/files - new: https://git.openjdk.org/jdk/pull/28523/files/61352bdc..d5ad5e4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28523&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28523&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28523.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28523/head:pull/28523 PR: https://git.openjdk.org/jdk/pull/28523 From mhaessig at openjdk.org Thu Nov 27 08:59:10 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 27 Nov 2025 08:59:10 GMT Subject: RFR: 8372641: [s390x] Test failure TestMergeStores.java [v3] In-Reply-To: <6iaWuz5X4ol8NmIvbWoQBxmceux35b3529t1sONwCZA=.08c49f3a-87dc-4030-a5a7-1a83f4209fe0@github.com> References: <6iaWuz5X4ol8NmIvbWoQBxmceux35b3529t1sONwCZA=.08c49f3a-87dc-4030-a5a7-1a83f4209fe0@github.com> Message-ID: On Thu, 27 Nov 2025 08:56:12 GMT, Harshit470250 wrote: >> [JDK-8347405](https://bugs.openjdk.org/browse/JDK-8347405) introduced a mergeStores optimisation which requires ReverseBytesS opcode and as it was not implemented for s390 the test case is failing. >> I also implemented ReverseBytesUS. > > Harshit470250 has updated the pull request incrementally with one additional commit since the last revision: > > Added whitespace Thanks for addressing my comments. This looks good to me. ------------- Marked as reviewed by mhaessig (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/28523#pullrequestreview-3514116906 From dbriemann at openjdk.org Thu Nov 27 09:14:06 2025 From: dbriemann at openjdk.org (David Briemann) Date: Thu, 27 Nov 2025 09:14:06 GMT Subject: RFR: 8367487: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on Linux aarch64 with Cavium CPU In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 14:24:19 GMT, David Briemann wrote: > Fix by requiring both floating-point half-precision (fphp) and advanced SIMD half-precision (asimdhp) on aarch64. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28390#issuecomment-3584841784 From dbriemann at openjdk.org Thu Nov 27 09:14:06 2025 From: dbriemann at openjdk.org (David Briemann) Date: Thu, 27 Nov 2025 09:14:06 GMT Subject: Integrated: 8367487: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on Linux aarch64 with Cavium CPU In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 14:24:19 GMT, David Briemann wrote: > Fix by requiring both floating-point half-precision (fphp) and advanced SIMD half-precision (asimdhp) on aarch64. This pull request has now been integrated.
Changeset: 86aae125 Author: David Briemann URL: https://git.openjdk.org/jdk/commit/86aae125f1a4e16dfe2dd0faf63f96ae1ca7bcd0 Stats: 15 lines in 1 file changed: 13 ins; 0 del; 2 mod 8367487: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on Linux aarch64 with Cavium CPU Reviewed-by: epeter, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/28390 From duke at openjdk.org Thu Nov 27 09:29:52 2025 From: duke at openjdk.org (Niklas Keller) Date: Thu, 27 Nov 2025 09:29:52 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v9] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 07:46:46 GMT, Shawn M Emery wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Address @smemery's comments: >> - Add @requires >> - Shorten long lines > > test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 134: > >> 132: } catch (Exception e) { >> 133: throw new RuntimeException("Failed for messageSize " + >> 134: Integer.toHexString(messageSize), e); > > nit: `+` operator should be first and line indented >= 8 white-spaces. Aren't these nits something a tool should check and in the best case also fix automatically? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2567760087 From mchevalier at openjdk.org Thu Nov 27 09:46:23 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 27 Nov 2025 09:46:23 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v4] In-Reply-To: References: Message-ID: > This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. 
The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. > > The proposed solution seems somewhat controversial. See https://github.com/openjdk/valhalla/pull/1717 for some previous discussion. Before polishing the PR, I'd like to reach an agreement on the way to go. > > # Analysis > ## Observationally > ### IGVN > During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. Their types are: > > in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) > in(2): null > > We compute the join (HS' meet): > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 > > t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > > But the current `_type` (of the `PhiNode` as a `TypeNode`) is > > _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) > > We filter `t` by `_type` > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 > and we get > > ft=java/lang/Object * > > which is what we return. After the end of `Value`, the returned type becomes the new `PhiNode`'s `_type`. > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 > and > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 > > > ### Verification > On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time > > _type=java/lang/Object * > > and so after filtering `t` by the (new) `_type`, we get > > ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > > which is returned. Verification gets angry because the new `ft` is not the same as the previous one. > > ## But why?!
> ### Details on type computation > In short, we are doing > > t = typeof(in(1)) /\ typeof(in(2)) > ft = t /\ _type (* IGVN *) > ft' = t /\ ft (* Verification *) > > and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again and see if something changes". > > To me, the surprising fact was that the intersection > > java/lang/Object * (... Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - test - Merge branch 'master' into JDK-8371716 - More test - IgnoreUnrecognizedVMOptions - Fix bug number - Filter twice ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28331/files - new: https://git.openjdk.org/jdk/pull/28331/files/e9f3ac98..7a092dac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28331&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28331&range=02-03 Stats: 73615 lines in 1087 files changed: 49098 ins; 16665 del; 7852 mod Patch: https://git.openjdk.org/jdk/pull/28331.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28331/head:pull/28331 PR: https://git.openjdk.org/jdk/pull/28331 From mchevalier at openjdk.org Thu Nov 27 10:01:54 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 27 Nov 2025 10:01:54 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v4] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 09:46:23 GMT, Marc Chevalier wrote: >> This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer.
The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. >> >> The proposed solution seems somewhat controversial. See https://github.com/openjdk/valhalla/pull/1717 for some previous discussion. Before polishing the PR, I'd like to reach an agreement on the way to go. >> >> # Analysis >> ## Observationally >> ### IGVN >> During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. Their types are: >> >> in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) >> in(2): null >> >> We compute the join (HS' meet): >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 >> >> t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> But the current `_type` (of the `PhiNode` as a `TypeNode`) is >> >> _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) >> >> We filter `t` by `_type` >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 >> and we get >> >> ft=java/lang/Object * >> >> which is what we return. After the end of `Value`, the returned type becomes the new `PhiNode`'s `_type`. >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 >> and >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 >> >> >> ### Verification >> On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time >> >> _type=java/lang/Object * >> >> and so after filtering `t` by the (new) `_type`, we get >> >> ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> which is returned. Verification gets angry because the new `ft` is not the same as the previous one. >> >> ## But why?!
>> ### Details on type computation >> In short, we are doing >> >> t = typeof(in(1)) /\ typeof(in(2)) >> ft = t /\ _type (* IGVN *) >> ft' = t /\ ft (* Verification *) >> >> and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again... > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - test > - Merge branch 'master' into JDK-8371716 > - More test > - IgnoreUnrecognizedVMOptions > - Fix bug number > - Filter twice I've added the second test (and I confirm that it crashes quickly on master), simplified a bit, commented, and merged with master. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28331#issuecomment-3585025088 From roland at openjdk.org Thu Nov 27 10:03:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 27 Nov 2025 10:03:56 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v13] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 15:46:20 GMT, Zihao Lin wrote: >> This patch removes the slice parameter from LoadNode::make >> >> I have done more work, which removes the slice parameter from StoreNode::make. >> >> Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 >> >> Hi team, I am new, I'd appreciate any guidance. Thanks a lot! > > Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: > > Fix test failed Changes requested by roland (Reviewer).
src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp line 720: > 718: if (ShenandoahCardBarrier) { > 719: post_barrier(kit, kit->control(), access.raw_access(), access.base(), > 720: access.addr(), access.alias_idx(), new_val, T_OBJECT, true); `access.alias_idx()` should be `C->get_alias_index(kit.gvn().type(access.addr()))` So I think we want to remove `uint _alias_idx;` from `C2AtomicParseAccess` as well. This could be done as a follow up if you think this change has already gotten too complicated. src/hotspot/share/opto/escape.cpp line 4488: > 4486: const TypePtr* adr_type = proj->adr_type(); > 4487: const TypePtr* new_adr_type = tinst->add_offset(adr_type->offset()); > 4488: if (adr_type != new_adr_type) { Can you explain that change? Did something go wrong in a merge? src/hotspot/share/opto/graphKit.cpp line 1703: > 1701: BasicType bt, > 1702: DecoratorSet decorators) { > 1703: C2AccessValuePtr addr(adr, adr_type); `adr_type` is no longer used in this and the next methods. ------------- PR Review: https://git.openjdk.org/jdk/pull/24258#pullrequestreview-3514352600 PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2567870138 PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2567854115 PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2567875036 From roland at openjdk.org Thu Nov 27 10:03:58 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 27 Nov 2025 10:03:58 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v8] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 08:30:51 GMT, Zihao Lin wrote: >> Hi, I gave it a try, but it failed to pass the test. Is it possible the original version is wrong? >> The mark word will not be `TypeRawPtr::BOTTOM`, it should be equal to the Klass slice index.
> > One dump `proto_adr ` is ` 1368 AddP === _ 196 196 1367 [[ ]] Klass:precise java/util/LinkedHashMap$Entry: 0x0000000918349ca0 (java/util/Map$Entry):Constant:exact+168 *` I think the original version was wrong indeed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2567848297 From mchevalier at openjdk.org Thu Nov 27 10:12:51 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 27 Nov 2025 10:12:51 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v4] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 09:46:23 GMT, Marc Chevalier wrote: >> This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. >> >> The proposed solution seems somewhat controversial. See https://github.com/openjdk/valhalla/pull/1717 for some previous discussion. Before polishing the PR, I'd like to reach an agreement on the way to go. >> >> # Analysis >> ## Observationally >> ### IGVN >> During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs.
Their types are: >> >> in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) >> in(2): null >> >> We compute the join (HS' meet): >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 >> >> t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> But the current `_type` (of the `PhiNode` as a `TypeNode`) is >> >> _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) >> >> We filter `t` by `_type` >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 >> and we get >> >> ft=java/lang/Object * >> >> which is what we return. After the end of `Value`, the returned type becomes the new `PhiNode`'s `_type`. >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 >> and >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 >> >> >> ### Verification >> On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time >> >> _type=java/lang/Object * >> >> and so after filtering `t` by the (new) `_type`, we get >> >> ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> which is returned. Verification gets angry because the new `ft` is not the same as the previous one. >> >> ## But why?! >> ### Details on type computation >> In short, we are doing >> >> t = typeof(in(1)) /\ typeof(in(2)) >> ft = t /\ _type (* IGVN *) >> ft' = t /\ ft (* Verification *) >> >> and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again... > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - test > - Merge branch 'master' into JDK-8371716 > - More test > - IgnoreUnrecognizedVMOptions > - Fix bug number > - Filter twice I think it would be reasonable to move forward with the current solution, at the moment. It would resolve some existing failures in Valhalla (so it would be nice if it's in soon), it is on the safe side, without putting too much effort in a case that seems quite rare, or in improvements with unclear benefits. Yet, I agree that is not fundamentally very satisfying, but this fix doesn't prevent (or make harder) a possible future broader revisit of speculative types. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28331#issuecomment-3585068039 From snatarajan at openjdk.org Thu Nov 27 10:24:58 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Thu, 27 Nov 2025 10:24:58 GMT Subject: RFR: 8370489: Some compiler tests miss the @key randomness [v2] In-Reply-To: References: <5r88siWBHsqHY-Ey7e0W4ZMrKdswjcLNZgJicegxrvo=.16a16b47-4f7d-4cc7-baed-69662f5d1204@github.com> Message-ID: On Wed, 26 Nov 2025 11:27:19 GMT, Emanuel Peter wrote: >> That is true. There is no need for javadoc style comments (see [The JDK Test Framework: Tag Language Specification](https://openjdk.org/jtreg/tag-spec.html)). I have fixed this file. >> >> You are also right about we not being consistent with the tests. I have left the rest of the test (in this changeset) with javadoc style as it is. Do let me know if I should change them ? > > Personally, I would not worry too much about comment style. Well actually: are we sure that both get executed? I tested the following tests in this changeset that started with` /**` (javadoc style) with` /**` and` /*`. Both the styles resulted in the same results. 
test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java test/hotspot/jtreg/compiler/loopopts/superword/MinMaxRed_Int.java test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java test/hotspot/jtreg/compiler/vectorapi/TestVectorAddMulReduction.java test/hotspot/jtreg/compiler/vectorapi/TestVectorCompressExpandBits.java test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java test/hotspot/jtreg/compiler/vectorapi/VectorSaturatedOperationsTest.java test/hotspot/jtreg/compiler/vectorization/TestMacroLogicVector.java ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28463#discussion_r2567943579 From mli at openjdk.org Thu Nov 27 10:33:54 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 27 Nov 2025 10:33:54 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 In-Reply-To: References: Message-ID: <0Pd1KHDasNexyrTADSFklEkdKRA4mFjf7S3xXkpzKDQ=.231aa3d7-44bb-4821-889c-fddc42673519@github.com> On Tue, 25 Nov 2025 16:51:39 GMT, Christian Hagedorn wrote: > [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: > > - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant". > - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. > - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: > https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 > I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. 
> > #### Testing > - [X] Tier1 > - [X] Tier5 with IR framework internal tests only > - [ ] Failing IR framework internal tests on all platforms > > Thanks, > Christian There is following IR test failure in TestIRMatching.java: Failed IR Rules (2) of Methods (2) ---------------------------------- 1) Method "public boolean ir_framework.tests.CheckCastArray.array(java.lang.Object[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={}, failOn={"_#CHECKCAST_ARRAY#_"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintOptoAssembly": - failOn: Graph contains forbidden nodes: * Constraint 1: "(((?i:cmp|CLFI|CLR).*aryklassptr:\[.*:Constant|.*(?i:mov|mv|or).*aryklassptr:\[.*:Constant.*\\R.*(cmp|CMP|CLR)))" - Matched forbidden node: * 066 + mv R7, narrowklass: aryklassptr:[instklassptr:ir_framework/tests/MyClass:NotNull+0 (java/lang/Cloneable,java/io/Serializable):Constant+0 # compressed klass ptr, #@loadConNKlass 072 + bne R28, R7, B5 #@cmp 2) Method "public java.lang.Object[] ir_framework.tests.CheckCastArray.arrayCopy(java.lang.Object[],java.lang.Class)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={}, failOn={"_#CHECKCAST_ARRAYCOPY#_"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintOptoAssembly": - failOn: Graph contains forbidden nodes: * Constraint 1: "(.*((?i:call_leaf_nofp,runtime)|CALL,\\s?runtime leaf nofp|BCTRL.*.leaf call).*checkcast_arraycopy.*)" - Matched forbidden node: * 1b4 + CALL, runtime leaf nofp 0x00007f88035103c0 #@CallLeafNoFPDirect checkcast_arraycopy ------------- PR Comment: https://git.openjdk.org/jdk/pull/28495#issuecomment-3585142828 From 
galder at openjdk.org Thu Nov 27 10:35:09 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Thu, 27 Nov 2025 10:35:09 GMT Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax [v2] In-Reply-To: <00bVdr5iVqT_AHQiazRM_X9hadRH8_xOJntKv9LCpyQ=.34ee4851-8d64-46a8-8eee-edb67093dee8@github.com> References: <00bVdr5iVqT_AHQiazRM_X9hadRH8_xOJntKv9LCpyQ=.34ee4851-8d64-46a8-8eee-edb67093dee8@github.com> Message-ID: On Tue, 25 Nov 2025 10:05:32 GMT, Emanuel Peter wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Change copyright to Amazon > > test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxLongLoopBarrier.java line 1: > >> 1: /* > > You might as well also find a better home for this test. We are trying to move away from `irTests`, and move tests to directories with the relevant "topic". Aha, I was wondering about that a couple of days ago... I'll move it ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28385#discussion_r2567979455 From galder at openjdk.org Thu Nov 27 10:42:30 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Thu, 27 Nov 2025 10:42:30 GMT Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax [v3] In-Reply-To: References: Message-ID: > Trivial cleanup to move tests out of a test class whose description does not match these tests Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Refactored to compiler.gcbarriers package ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28385/files - new: https://git.openjdk.org/jdk/pull/28385/files/bb287ba6..278d4bce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28385&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28385&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28385.diff Fetch: git fetch
https://git.openjdk.org/jdk.git pull/28385/head:pull/28385 PR: https://git.openjdk.org/jdk/pull/28385 From galder at openjdk.org Thu Nov 27 10:42:31 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Thu, 27 Nov 2025 10:42:31 GMT Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax [v2] In-Reply-To: <00bVdr5iVqT_AHQiazRM_X9hadRH8_xOJntKv9LCpyQ=.34ee4851-8d64-46a8-8eee-edb67093dee8@github.com> References: <00bVdr5iVqT_AHQiazRM_X9hadRH8_xOJntKv9LCpyQ=.34ee4851-8d64-46a8-8eee-edb67093dee8@github.com> Message-ID: On Tue, 25 Nov 2025 10:05:32 GMT, Emanuel Peter wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Change copyright to Amazon > > test/hotspot/jtreg/compiler/gcbarriers/TestMinMaxLongLoopBarrier.java line 1: > >> (failed to retrieve contents of file, check the PR for context) > You might as well also find a better home for this test. We are trying to move away from `irTests`, and move tests to directories with the relevant "topic". @eme64 Moved! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28385#discussion_r2567998042 From chagedorn at openjdk.org Thu Nov 27 11:38:47 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 27 Nov 2025 11:38:47 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 In-Reply-To: <0Pd1KHDasNexyrTADSFklEkdKRA4mFjf7S3xXkpzKDQ=.231aa3d7-44bb-4821-889c-fddc42673519@github.com> References: <0Pd1KHDasNexyrTADSFklEkdKRA4mFjf7S3xXkpzKDQ=.231aa3d7-44bb-4821-889c-fddc42673519@github.com> Message-ID: On Thu, 27 Nov 2025 10:30:54 GMT, Hamlin Li wrote: >> [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: >> >> - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant".
>> - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. >> - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: >> https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 >> I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. >> >> #### Testing >> - [X] Tier1 >> - [X] Tier5 with IR framework internal tests only >> - [ ] Failing IR framework internal tests on all platforms >> >> Thanks, >> Christian > > There is following IR test failure in TestIRMatching.java: > > Failed IR Rules (2) of Methods (2) > ---------------------------------- > 1) Method "public boolean ir_framework.tests.CheckCastArray.array(java.lang.Object[])" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={}, failOn={"_#CHECKCAST_ARRAY#_"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintOptoAssembly": > - failOn: Graph contains forbidden nodes: > * Constraint 1: "(((?i:cmp|CLFI|CLR).*aryklassptr:\[.*:Constant|.*(?i:mov|mv|or).*aryklassptr:\[.*:Constant.*\\R.*(cmp|CMP|CLR)))" > - Matched forbidden node: > * 066 + mv R7, narrowklass: aryklassptr:[instklassptr:ir_framework/tests/MyClass:NotNull+0 (java/lang/Cloneable,java/io/Serializable):Constant+0 # compressed klass ptr, #@loadConNKlass > 072 + bne R28, R7, B5 #@cmp > > 2) Method "public java.lang.Object[] ir_framework.tests.CheckCastArray.arrayCopy(java.lang.Object[],java.lang.Class)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, 
applyIfCPUFeatureOr={}, counts={}, failOn={"_#CHECKCAST_ARRAYCOPY#_"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintOptoAssembly": > - failOn: Graph contains forbidden nodes: > * Constraint 1: "(.*((?i:call_leaf_nofp,runtime)|CALL,\\s?runtime leaf nofp|BCTRL.*.leaf call).*checkcast_arraycopy.*)" > - Matched forbidden node: > * 1b4 + CALL, runtime leaf nofp 0x00007f88035103c0 #@CallLeafNoFPDirect checkcast_arraycopy Thanks @Hamlin-Li for testing! Can you send me the complete `.jtr` output? It is a little tricky to understand in isolation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28495#issuecomment-3585388056 From epeter at openjdk.org Thu Nov 27 11:45:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Nov 2025 11:45:21 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism Message-ID: **Context**: `VTransform::optimize`. Works a bit like IGVN, it allows each node to perform optimizations. Recently introduced during JDK26. **Problem**: I made the assumption that we don't need a worklist mechanism, we can just do multiple passes over all nodes. The assumption was that there would not be any "trickling" of updates over the graph. But that is wrong: for example we can have a long chain of dead nodes, and we need to progressively remove the last node and mark it as dead. **Solution**: Implement proper worklist mechanism, so that updates can trickle over the graph. 
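To illustrate the dead-chain problem described above, here is a minimal sketch in Java. It is not the HotSpot code: `WorklistSketch`, `Node`, `removeDeadNodes` and `deadChain` are hypothetical names invented for this example, and "dead" is simplified to "no uses and no side effect". The point it tries to show is that removing one dead node can make its inputs dead, so those inputs are put back on a unique worklist rather than waiting for another full pass over all nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy graph for illustration only; not the real VTransform classes.
final class WorklistSketch {
    static final class Node {
        final String name;
        final boolean hasSideEffect;           // e.g. a store: must stay alive
        final List<Node> inputs = new ArrayList<>();
        int uses = 0;                          // number of nodes that use this node
        boolean dead = false;

        Node(String name, boolean hasSideEffect, Node... ins) {
            this.name = name;
            this.hasSideEffect = hasSideEffect;
            for (Node in : ins) {
                inputs.add(in);
                in.uses++;
            }
        }
    }

    // Removes all transitively dead nodes and returns how many were removed.
    static int removeDeadNodes(List<Node> nodes) {
        Deque<Node> worklist = new ArrayDeque<>(nodes);
        Set<Node> onWorklist = new HashSet<>(nodes);   // keeps worklist entries unique
        int removed = 0;
        while (!worklist.isEmpty()) {
            Node n = worklist.pop();
            onWorklist.remove(n);
            if (n.dead || n.hasSideEffect || n.uses > 0) {
                continue;                              // nothing to do for this node
            }
            n.dead = true;
            removed++;
            // The update "trickles": each input lost a use and may now be dead too.
            for (Node in : n.inputs) {
                in.uses--;
                if (onWorklist.add(in)) {
                    worklist.push(in);
                }
            }
        }
        return removed;
    }

    // Builds n0 <- n1 <- ... <- n(length-1); only the last node has no use.
    static List<Node> deadChain(int length) {
        List<Node> nodes = new ArrayList<>();
        Node prev = null;
        for (int i = 0; i < length; i++) {
            Node n = (prev == null) ? new Node("n" + i, false)
                                    : new Node("n" + i, false, prev);
            nodes.add(n);
            prev = n;
        }
        return nodes;
    }
}
```

With a fixed number of passes over the node list in this order, only the last node of the chain would be found dead in each pass, so a chain longer than the pass limit would never be fully cleaned up. With the worklist, each removal schedules exactly the nodes that may have changed.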
------------- Commit messages: - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes - rm old documentation - git move to new test - streamline - refactor and verify - unique worklist - wip solution - JDK-8372451 Changes: https://git.openjdk.org/jdk/pull/28512/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28512&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372451 Stats: 202 lines in 3 files changed: 158 ins; 1 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/28512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28512/head:pull/28512 PR: https://git.openjdk.org/jdk/pull/28512 From epeter at openjdk.org Thu Nov 27 11:45:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Nov 2025 11:45:23 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 16:02:20 GMT, Emanuel Peter wrote: > **Context**: `VTransform::optimize`. Works a bit like IGVN, it allows each node to perform optimizations. Recently introduced during JDK26. > > **Problem**: I made the assumption that we don't need a worklist mechanism, we can just do multiple passes over all nodes. The assumption was that there would not be any "trickling" of updates over the graph. But that is wrong: for example we can have a long chain of dead nodes, and we need to progressively remove the last node and mark it as dead. > > **Solution**: Implement proper worklist mechanism, so that updates can trickle over the graph. src/hotspot/share/opto/vtransform.cpp line 45: > 43: // This is similar to IGVN optimization. But we are a bit lazy, and don't care about > 44: // notification / worklist, since the list of nodes is rather small, and we don't > 45: // expect optimizations that trickle over the whole graph. 
This was wrong: we did have a "trickle" over the graph with a chain of dead nodes, progressively realizing that the last node in the chain was dead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28512#discussion_r2567733145 From mli at openjdk.org Thu Nov 27 11:50:47 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 27 Nov 2025 11:50:47 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 In-Reply-To: References: Message-ID: <7AKLVLaUQCWz2QK_r8eMohTJSPnpKsqt-7E3MW4MnTM=.c6ca3df1-ccb7-4a41-9125-744b8435dee6@github.com> On Tue, 25 Nov 2025 16:51:39 GMT, Christian Hagedorn wrote: > [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: > > - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant". > - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. > - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: > https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 > I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. > > #### Testing > - [X] Tier1 > - [X] Tier5 with IR framework internal tests only > - [ ] Failing IR framework internal tests on all platforms > > Thanks, > Christian [TestIRMatching.txt](https://github.com/user-attachments/files/23795645/TestIRMatching.txt) GitHub does not allow me to upload .jtr files, so I renamed it to a .txt file. Please let me know if you need more log files. Sorry, I'm working on some other tasks, so I cannot help debug this for now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28495#issuecomment-3585433535 From epeter at openjdk.org Thu Nov 27 12:20:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Nov 2025 12:20:55 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v4] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 09:46:23 GMT, Marc Chevalier wrote: >> This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, although I couldn't prove it. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. >> >> The proposed solution seems somewhat controversial. See https://github.com/openjdk/valhalla/pull/1717 for some previous discussion. Before polishing the PR, I'd like to reach an agreement on the way to go. >> >> # Analysis >> ## Observationally >> ### IGVN >> During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. Their types are: >> >> in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) >> in(2): null >> >> We compute the join (HS' meet): >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 >> >> t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> But the current `_type` (of the `PhiNode` as a `TypeNode`) is >> >> _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) >> >> We filter `t` by `_type` >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 >> and we get >> >> ft=java/lang/Object * >> >> which is what we return. After the end of `Value`, the returned type becomes the new `PhiNode`'s `_type`. 
>> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 >> and >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 >> >> >> ### Verification >> On verification, `in(1)` and `in(2)` have the same values, and so does `t`. But this time >> >> _type=java/lang/Object * >> >> and so after filtering `t` by the (new) `_type` we get >> >> ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> which is returned. Verification gets angry because the new `ft` is not the same as the previous one. >> >> ## But why?! >> ### Details on type computation >> In short, we are doing >> >> t = typeof(in(1)) \/ typeof(in(2)) >> ft = t /\ _type (* IGVN *) >> ft' = t /\ ft (* Verification *) >> >> and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again... > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - test > - Merge branch 'master' into JDK-8371716 > - More test > - IgnoreUnrecognizedVMOptions > - Fix bug number > - Filter twice I think this solution is reasonable, and we discussed different options. Thanks @marc-chevalier for working on this. And thanks especially for the extra descriptions in the tests. It was super fascinating to see how we get to the inconsistent speculative types :) @merykitty It would be really interesting to see if the "widening" we do when we go above centerline with speculative types really makes sense. I'm sure there are micro benchmarks that would benefit from the extra speculation, and others that would suffer from it. 
I suppose we would have to make the changes and run larger benchmarks. Is that something you would want to look into? src/hotspot/share/opto/cfgnode.cpp line 1377: > 1375: stringStream ss; > 1376: > 1377: ss.print("At node:\n"); Suggestion: ss.print_cr("At node:"); src/hotspot/share/opto/cfgnode.cpp line 1382: > 1380: for (uint i = 1; i < req(); ++i) { > 1381: ss.print("in(%d): ", i); > 1382: if (r->in(i) && phase->type(r->in(i)) == Type::CONTROL) { Suggestion: if (r->in(i) != nullptr && phase->type(r->in(i)) == Type::CONTROL) { No implicit null check. src/hotspot/share/opto/cfgnode.cpp line 1407: > 1405: assert(false, "computed type would not pass verification"); > 1406: } > 1407: #endif You could consider moving this to a separate verification method, but up to you. It is also very verbose. I wonder if it really makes sense to have that much code. But again: up to you. You could also make your dump less verbose, and then format it directly into the assert. The benefit would be that if a fuzzer finds such a failure we would see more directly what's going on. test/hotspot/jtreg/compiler/igvn/ClashingSpeculativeTypePhiNode.java line 57: > 55: * compiler.igvn.ClashingSpeculativeTypePhiNode > 56: * > 57: * @run main compiler.igvn.ClashingSpeculativeTypePhiNode Suggestion: * @run main ${test.main.class} The new JTREG version allows using this template. It would prevent issues where you wrongly copy from another file and forget to fix up the class name ;) You can also apply it to the compilecommands. ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28331#pullrequestreview-3514828452 PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2568358477 PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2568359469 PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2568357864 PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2568256187 From epeter at openjdk.org Thu Nov 27 12:24:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Nov 2025 12:24:54 GMT Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax [v3] In-Reply-To: References: Message-ID: <1BZ3cuwoLIBRdM_MfBcMh3IaBQ27Yl0LcPVKZVNJYxg=.c0c255aa-620f-44b7-8493-551201fa7ff7@github.com> On Thu, 27 Nov 2025 10:42:30 GMT, Galder Zamarreño wrote: >> Trivial cleanup to move tests out of a test class whose description does not match these tests > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Refactored to compiler.gcbarriers package Changes requested by epeter (Reviewer). test/hotspot/jtreg/compiler/gcbarriers/TestMinMaxLongLoopBarrier.java line 42: > 40: * @summary Test that MinL/MaxL nodes are removed when GC barriers in loop > 41: * @library /test/lib / > 42: * @run driver compiler.c2.irTests.TestMinMaxLongLoopBarrier I think you forgot to update the class path here. 
You should do this now that it is possible; it prevents errors with wrongly copied test class names ;) Suggestion: * @run driver ${test.main.class} ------------- PR Review: https://git.openjdk.org/jdk/pull/28385#pullrequestreview-3515012522 PR Review Comment: https://git.openjdk.org/jdk/pull/28385#discussion_r2568405841 From roland at openjdk.org Thu Nov 27 12:29:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 27 Nov 2025 12:29:56 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v6] In-Reply-To: References: Message-ID: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out-of-bounds access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the types of the `CastII` nodes are widened > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if the types of both inputs > of the `Add` are non-constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. 
We already have variants of the `CastII`: > > - if the Cast can float and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints each of them operates under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widened in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - review - review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24575/files - new: https://git.openjdk.org/jdk/pull/24575/files/2aa918e2..6bbda426 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=04-05 Stats: 23 lines in 2 files changed: 10 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From roland at openjdk.org Thu Nov 27 12:33:45 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 27 Nov 2025 12:33:45 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v7] In-Reply-To: References: Message-ID: <4vqKZmOZa_hvbbySegpobemqL5dNbz1qcvIlu96fjaQ=.0bb55f83-155e-4312-aee8-038ccaeb0a88@github.com> > This is a variant 
of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out-of-bounds access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the types of the `CastII` nodes are widened > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if the types of both inputs > of the `Add` are non-constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints each of them operates under. 
So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widened in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24575/files - new: https://git.openjdk.org/jdk/pull/24575/files/6bbda426..7a65f097 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From roland at openjdk.org Thu Nov 27 12:33:46 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 27 Nov 2025 12:33:46 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: <4QQp7C7iIVfVs1MoUMC56KCgVGpXu5ziTHfZ-f2pk6o=.4ca7e1a8-3f31-44d3-aaec-30429ed7e2b0@github.com> References: <6qShqR-Ohv7vamoJ_B4Ev-poU8SB96eTBo4HFJrylcI=.dac5a26f-c9f0-445b-8f1c-a7c719fa27ae@github.com> <4QQp7C7iIVfVs1MoUMC56KCgVGpXu5ziTHfZ-f2pk6o=.4ca7e1a8-3f31-44d3-aaec-30429ed7e2b0@github.com> Message-ID: On Wed, 26 Nov 2025 14:29:06 GMT, Christian Hagedorn wrote: > Introducing a 4th dependency type looks reasonable. It's also nice to see one more refactoring in that area which makes it very expressive now. Thanks for doing that! I left some suggestions to possibly further improve the code. Thanks for the comments/suggestions. The updated change should take care of all of them. 
> src/hotspot/share/opto/castnode.hpp line 101: > >> 99: } >> 100: return NonFloatingNonNarrowing; >> 101: } > > Just a side note: We seem to mix the terms "(non-)pinned" with "(non-)floating" freely. Should we stick to just one? But maybe it's justified to use both depending on the situation/code context. The patch as it is now adds some extra uses of "pinned" and "floating". What could make sense, I suppose, would be to try to use "floating"/"non floating" instead but there are so many uses of "pinned" in the code base already, and I don't see us getting rid of them, that I wonder if it would make a difference. So, I'm not too sure what to do. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3585614507 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2568439115 From mchevalier at openjdk.org Thu Nov 27 12:35:24 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 27 Nov 2025 12:35:24 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v5] In-Reply-To: References: Message-ID: <-lKe3DkagAXc0krDm4tig5ohwSPYu-as9XtpqEDCayM=.ad81edb8-eb78-4579-b626-a78e18e3f69d@github.com> > This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, although I couldn't prove it. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. > > The proposed solution seems somewhat controversial. See https://github.com/openjdk/valhalla/pull/1717 for some previous discussion. Before polishing the PR, I'd like to reach an agreement on the way to go. > > # Analysis > ## Observationally > ### IGVN > During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. 
Their types are: > > in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) > in(2): null > > We compute the join (HS' meet): > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 > > t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > > But the current `_type` (of the `PhiNode` as a `TypeNode`) is > > _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) > > We filter `t` by `_type` > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 > and we get > > ft=java/lang/Object * > > which is what we return. After the end of `Value`, the returned type becomes the new `PhiNode`'s `_type`. > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 > and > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 > > > ### Verification > On verification, `in(1)` and `in(2)` have the same values, and so does `t`. But this time > > _type=java/lang/Object * > > and so after filtering `t` by the (new) `_type` we get > > ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > > which is returned. Verification gets angry because the new `ft` is not the same as the previous one. > > ## But why?! > ### Details on type computation > In short, we are doing > > t = typeof(in(1)) \/ typeof(in(2)) > ft = t /\ _type (* IGVN *) > ft' = t /\ ft (* Verification *) > > and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again and see if something changes". > > To me, the surprising fact was that the intersection > > java/lang/Object * (... 
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28331/files - new: https://git.openjdk.org/jdk/pull/28331/files/7a092dac..1c28403a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28331&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28331&range=03-04 Stats: 10 lines in 2 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/28331.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28331/head:pull/28331 PR: https://git.openjdk.org/jdk/pull/28331 From mchevalier at openjdk.org Thu Nov 27 12:37:56 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 27 Nov 2025 12:37:56 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v4] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 12:09:30 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - test >> - Merge branch 'master' into JDK-8371716 >> - More test >> - IgnoreUnrecognizedVMOptions >> - Fix bug number >> - Filter twice > > src/hotspot/share/opto/cfgnode.cpp line 1407: > >> 1405: assert(false, "computed type would not pass verification"); >> 1406: } >> 1407: #endif > > You could consider moving this to a separate verification method, but up to you. > > It is also very verbose. I wonder if it really makes sense to have that much code. But again: up to you. > > You could also make your dump less verbose, and then format it directly into the assert. The benefit would be that if a fuzzer finds such a failure we would see more directly what's going on. To be fair, it's not printing as much as it looks like. 
This is verbose because I cannot write ss.print_cr("t: %s", t->to_string()); or something like ss.print_cr("t: %magic", pp_type, t); instead of ss.print("t: "); t->dump_on(&ss); ss.print_cr(""); I'm rather tempted to keep it as it is (or close), because that is the kind of information I wanted to see when I was working on that. I suspect if the assert fails and someone has to look at that again, they might find the same information useful as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2568463171 From chagedorn at openjdk.org Thu Nov 27 12:47:05 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 27 Nov 2025 12:47:05 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 16:51:39 GMT, Christian Hagedorn wrote: > [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: > > - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant". > - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. > - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: > https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 > I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. > > #### Testing > - [X] Tier1 > - [X] Tier5 with IR framework internal tests only > - [ ] Failing IR framework internal tests on all platforms > > Thanks, > Christian No worries, that's already very helpful, thanks for sending the file! I'll have a look at the failure. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28495#issuecomment-3585670779 From jbhateja at openjdk.org Thu Nov 27 13:03:31 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 27 Nov 2025 13:03:31 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVXX=0 -XX:MaxVectorSize=8 Message-ID: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> This patch fixes a crash seen while querying the bottom type of the MachTempNode corresponding to the [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of the blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, thus during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), which is guarded by the target-supported vector size, remains uninitialized. It is better to reject matching of VectorBlend in such a scenario. All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8. Kindly review and share your feedback. 
Best Regards, Jatin ------------- Commit messages: - 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crash with UseAVX=0 -XX:MaxVectorSize=8 Changes: https://git.openjdk.org/jdk/pull/28533/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337791 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28533.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28533/head:pull/28533 PR: https://git.openjdk.org/jdk/pull/28533 From jbhateja at openjdk.org Thu Nov 27 13:19:24 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 27 Nov 2025 13:19:24 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v2] In-Reply-To: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: > This patch fixes a crash seen while querying the bottom type of the MachTempNode corresponding to the [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of the blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, thus during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), which is guarded by the target-supported vector size, remains uninitialized. > > It is better to reject matching of VectorBlend in such a scenario. > > All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crash with UseAVX=0 -XX:MaxVectorSize=8 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28533/files - new: https://git.openjdk.org/jdk/pull/28533/files/70600498..6ca95427 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28533.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28533/head:pull/28533 PR: https://git.openjdk.org/jdk/pull/28533 From roland at openjdk.org Thu Nov 27 13:28:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 27 Nov 2025 13:28:14 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 00:53:54 GMT, Vladimir Ivanov wrote: > Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. > > There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. > > The difference can be illustrated with the following simple cases: > > class A { void m() {} } > class B extends A { void m() {} } > > void testInstanceOf(A obj) { > if (obj instanceof B) { > obj.m(); > } > } > > InstanceOf::testInstanceOf (12 bytes) > @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call > > vs > > void testInstanceOfCast(A obj) { > if (obj instanceof B) { > B b = (B)obj; > b.m(); > } > } > > InstanceOf::testInstanceOfCast (17 bytes) > @ 13 InstanceOf$B::m (1 bytes) inline (hot) > > > Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. 
Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. > > FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. > > Testing: hs-tier1 - hs-tier5 I tried the patch and the test case out of curiosity but when I removed the change to `Parse::sharpen_type_after_if()`, the test still passed. I made the change below and it now fails without the change to `Parse::sharpen_type_after_if()` and passes with it. diff --git a/test/hotspot/jtreg/compiler/inlining/TestSubtypeCheckTypeInfo.java b/test/hotspot/jtreg/compiler/inlining/TestSubtypeCheckTypeInfo.java index cc292dc0900..bdb79966a2a 100644 --- a/test/hotspot/jtreg/compiler/inlining/TestSubtypeCheckTypeInfo.java +++ b/test/hotspot/jtreg/compiler/inlining/TestSubtypeCheckTypeInfo.java @@ -346,13 +346,14 @@ static boolean lateInlineIsInstanceCondPost(A o, boolean cond) { // Parse compilation log (-XX:+PrintCompilation -XX:+PrintInlining output). static boolean parseOutput(List output) { boolean result = true; - Pattern compilation = Pattern.compile("^\\d+\\s+\\d+\\s+b\\s+"); + Pattern compilation = Pattern.compile("^\\d+\\s+\\d+\\s+b\\s+.*"); StringBuilder inlineTree = new StringBuilder(); for (String line : output) { // Detect start of next compilation. if (compilation.matcher(line).matches()) { // Parse output for previous compilation. 
result &= validateInliningOutput(inlineTree.toString()); + inlineTree.setLength(0); } inlineTree.append(line); } ------------- PR Comment: https://git.openjdk.org/jdk/pull/28517#issuecomment-3585853817 From epeter at openjdk.org Thu Nov 27 14:09:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Nov 2025 14:09:00 GMT Subject: RFR: 8372685: C2 SuperWord: wrong requires in test after JDK-8371146 Message-ID: @MBaesken reported this issue on Windows: TestAliasingCheckPreLimitNotAvailable_all-flags-fixed-stress-seed.jtr and TestAliasingCheckPreLimitNotAvailable_all-flags-no-stress-seed.jtr show failures on Windows: [0.095s][error][gc] Failed to lookup symbol: VirtualAlloc2 Error occurred during initialization of VM ZGC requires Windows version 1803 or later AIX fails too: Error occurred during initialization of VM Option -XX:+UseZGC not supported I learned a small lesson here: `@requires vm.gc.Z` is much smarter than checking that no other GC is set, or ZGC is set. It also checks if ZGC is available, which is not always the case, e.g. on the reported Windows machine. @MBaesken Can you please confirm that this fixes the test for you?
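As an illustration of the point above, a minimal sketch of a jtreg test that relies on `@requires vm.gc.Z` might look as follows. The class name `ExampleZgcOnlyTest`, the `@summary` text, and the `gcNames()` helper are hypothetical and are not taken from the test touched by this PR; only the `@requires vm.gc.Z` property comes from the message.

```java
/*
 * @test
 * @summary Hypothetical example: run only where ZGC is supported and selectable.
 * @requires vm.gc.Z
 * @run main/othervm -XX:+UseZGC ExampleZgcOnlyTest
 */

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ExampleZgcOnlyTest {
    // Collects the names of the garbage collector beans of the running VM.
    static List<String> gcNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Under jtreg this is only reached when the @requires clause holds.
        System.out.println("Collectors in use: " + gcNames());
    }
}
```

With a header like this, jtreg should skip the test on machines where ZGC cannot be used, instead of failing during VM initialization.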
------------- Commit messages: - JDK-8372685 Changes: https://git.openjdk.org/jdk/pull/28537/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28537&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372685 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28537.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28537/head:pull/28537 PR: https://git.openjdk.org/jdk/pull/28537 From qamai at openjdk.org Thu Nov 27 14:10:55 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 27 Nov 2025 14:10:55 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 00:53:54 GMT, Vladimir Ivanov wrote: > Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. > > There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. > > The difference can be illustrated with the following simple cases: > > class A { void m() {} } > class B extends A { void m() {} } > > void testInstanceOf(A obj) { > if (obj instanceof B) { > obj.m(); > } > } > > InstanceOf::testInstanceOf (12 bytes) > @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call > > vs > > void testInstanceOfCast(A obj) { > if (obj instanceof B) { > B b = (B)obj; > b.m(); > } > } > > InstanceOf::testInstanceOfCast (17 bytes) > @ 13 InstanceOf$B::m (1 bytes) inline (hot) > > > Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. > > FTR it is not a complete fix. 
It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. > > Testing: hs-tier1 - hs-tier5 src/hotspot/share/opto/parse2.cpp line 1739: > 1737: } > 1738: > 1739: // Match an instanceof check. We seem to require that the input of `SubTypeCheck` is not `null`. What do you think about allowing `SubTypeCheck` to accept `null` and return `false`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2568870697 From mhaessig at openjdk.org Thu Nov 27 14:17:57 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 27 Nov 2025 14:17:57 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 16:02:20 GMT, Emanuel Peter wrote: > **Context**: `VTransform::optimize`. Works a bit like IGVN, it allows each node to perform optimizations. Recently introduced during JDK26. > > **Problem**: I made the assumption that we don't need a worklist mechanism, we can just do multiple passes over all nodes. The assumption was that there would not be any "trickling" of updates over the graph. But that is wrong: for example we can have a long chain of dead nodes, and we need to progressively remove the last node and mark it as dead. > > **Solution**: Implement proper worklist mechanism, so that updates can trickle over the graph. Thank you for fixing this, @eme64! The optimization mechanism you are replacing had a mechanism to detect when too many passes had been done. 
Now there is no such mechanism. How can you detect a condition where we take way too long to reach a fixpoint apart from a timeout? What do implementations of `VTransform::optimize` have to guarantee to ensure that we reach a fixpoint? ------------- PR Review: https://git.openjdk.org/jdk/pull/28512#pullrequestreview-3515641101 From chagedorn at openjdk.org Thu Nov 27 14:19:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 27 Nov 2025 14:19:13 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 [v2] In-Reply-To: References: Message-ID: > [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: > > - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant". > - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. > - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: > https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 > I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. 
> > #### Testing > - [X] Tier1 > - [X] Tier5 with IR framework internal tests only > - [ ] Failing IR framework internal tests on all platforms > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Fix wrong regex ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28495/files - new: https://git.openjdk.org/jdk/pull/28495/files/b7fcd4f1..a8f26f29 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28495&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28495&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28495.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28495/head:pull/28495 PR: https://git.openjdk.org/jdk/pull/28495 From chagedorn at openjdk.org Thu Nov 27 14:25:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 27 Nov 2025 14:25:53 GMT Subject: RFR: 8372685: C2 SuperWord: wrong requires in test after JDK-8371146 In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 13:53:12 GMT, Emanuel Peter wrote: > @MBaesken reported this issue on Windows: > > TestAliasingCheckPreLimitNotAvailable_all-flags-fixed-stress-seed.jtr and TestAliasingCheckPreLimitNotAvailable_all-flags-no-stress-seed.jtr show failures on Windows: > > [0.095s][error][gc] Failed to lookup symbol: VirtualAlloc2 > Error occurred during initialization of VM > ZGC requires Windows version 1803 or later > > AIX fails too: > Error occurred during initialization of VM > Option -XX:+UseZGC not supported > > > I learned a small lesson here: `@requires vm.gc.Z` is much smarter than checking that no other GC is set, or ZGC is set. It also checks if ZGC is available, which is not always the case, e.g. on the reported Windows machine. > > @MBaesken Can you please confirm that this fixes the test for you? That was hard to spot, looks good and trivial but let's wait for @MBaesken to confirm.
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28537#pullrequestreview-3515692306 From duke at openjdk.org Thu Nov 27 14:59:51 2025 From: duke at openjdk.org (ExE Boss) Date: Thu, 27 Nov 2025 14:59:51 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks In-Reply-To: References: Message-ID: <82Ddhg3yXemMeyKmZUCWZIPUVOTkdCbXiOcl8LO_Su0=.47680bc7-526d-4c15-9b84-dd9c7d27728d@github.com> On Thu, 27 Nov 2025 01:17:15 GMT, Vladimir Ivanov wrote: >> test/hotspot/jtreg/compiler/inlining/TestSubtypeCheckTypeInfo.java line 323: >> >>> 321: static boolean lateInlineIsInstanceCondPost(A o, boolean cond) { >>> 322: return B.class.isInstance(o) && cond; >>> 323: } >> >> What about the non-late version of these methods? > > There are corresponding test cases (`testInstanceOfCondPre` et al) where conditions are embedded. > > The idea of `testInstanceOfCondLate` and similar test cases is to check how inlining works when condition improves receiver type during incremental inlining phase. What I meant was where the `instanceof` is in the called method, the `testInstanceOfCondPre` all have the `instanceof` checks as part of the `if` statement.
--------------------------------------------------------------------------------
Something like:

static void testInstanceOfCondDefaultInlinePre(A a, boolean cond) {
    if (defaultInlineInstanceOfCondPre(a, cond)) {
        a.m();
    }
}

static void testInstanceOfCondDefaultInlinePost(A a, boolean cond) {
    if (defaultInlineInstanceOfCondPost(a, cond)) {
        a.m();
    }
}

static void testIsInstanceCondDefaultInlinePre(A a, boolean cond) {
    if (defaultInlineIsInstanceCondPre(a, cond)) {
        a.m();
    }
}

static void testIsInstanceCondDefaultInlinePost(A a, boolean cond) {
    if (defaultInlineIsInstanceCondPost(a, cond)) {
        a.m();
    }
}

--------------------------------------------------------------------------------
I suggest adding such a test because of real-world code which use different internal implementation classes but expose their public API as only a single common supertype, like `java.lang.constant.ClassDesc` and its `isPrimitive()`/`isArray()`/`isClassOrInterface()` methods (which currently don't do the `instanceof` check, but they probably should so that they can be reliably inlined). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2569108988 From mbaesken at openjdk.org Thu Nov 27 15:23:45 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 27 Nov 2025 15:23:45 GMT Subject: RFR: 8372685: C2 SuperWord: wrong requires in test after JDK-8371146 In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 13:53:12 GMT, Emanuel Peter wrote: > ZGC requires Windows version 1803 or later Maybe this can be improved to something like 'ZGC requires Windows 10 version 1803, Windows 11 or Windows server 2019 or later'. In fact, not all Windows server 2016 have VirtualAlloc2 (see also this discussion https://github.com/microsoft/ebpf-for-windows/issues/704 ). Our test machine with Windows server 2016 generated the error above.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28537#issuecomment-3586425251 From duke at openjdk.org Thu Nov 27 15:32:00 2025 From: duke at openjdk.org (Zihao Lin) Date: Thu, 27 Nov 2025 15:32:00 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v13] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 10:01:04 GMT, Roland Westrelin wrote: >> Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix test failed > > src/hotspot/share/opto/graphKit.cpp line 1703: > >> 1701: BasicType bt, >> 1702: DecoratorSet decorators) { >> 1703: C2AccessValuePtr addr(adr, adr_type); > > `adr_type` no longer used in this and next methods. Let's remove all unused `adr_type` parameters in GraphKit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2569258915 From jbhateja at openjdk.org Thu Nov 27 15:32:15 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 27 Nov 2025 15:32:15 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v3] In-Reply-To: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: <5bV8t0Bo16-WVON8_AJLfcPDDqWVHDxIjmdGPPNazE8=.51d5a17d-1b87-44d4-ad41-e9d346e6b9f7@github.com> > This patch fixes a crash seen while querying the bottom type of the MachTempNode corresponding to the [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of the blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, so during C2 type system initialization [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), which is guarded by the target-supported vector size, remains uninitialized. > > It is better to reject matching of VectorBlend in such a scenario.
> > All existing VectorAPI jtreg tests pass with -XX:UseAVX=0 and -XX:MaxVectorSize=8. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Fine tune matcher check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28533/files - new: https://git.openjdk.org/jdk/pull/28533/files/6ca95427..2c08c7db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28533.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28533/head:pull/28533 PR: https://git.openjdk.org/jdk/pull/28533 From jbhateja at openjdk.org Thu Nov 27 15:32:15 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 27 Nov 2025 15:32:15 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v3] In-Reply-To: <5bV8t0Bo16-WVON8_AJLfcPDDqWVHDxIjmdGPPNazE8=.51d5a17d-1b87-44d4-ad41-e9d346e6b9f7@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> <5bV8t0Bo16-WVON8_AJLfcPDDqWVHDxIjmdGPPNazE8=.51d5a17d-1b87-44d4-ad41-e9d346e6b9f7@github.com> Message-ID: On Thu, 27 Nov 2025 15:29:05 GMT, Jatin Bhateja wrote: >> This patch fixes a crash seen while querying the bottom type of the MachTempNode corresponding to the [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of the blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, so during C2 type system initialization [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), which is guarded by the target-supported vector size, remains uninitialized. >> >> It is better to reject matching of VectorBlend in such a scenario.
>> >> All existing VectorAPI jtreg tests pass with -XX:UseAVX=0 and -XX:MaxVectorSize=8. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Fine tune matcher check @sviswa7, @eme64, kindly check and approve. ------------- PR Review: https://git.openjdk.org/jdk/pull/28533#pullrequestreview-3516043441 From jbhateja at openjdk.org Thu Nov 27 15:32:18 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 27 Nov 2025 15:32:18 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v2] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Thu, 27 Nov 2025 13:19:24 GMT, Jatin Bhateja wrote: >> This patch fixes a crash seen while querying the bottom type of the MachTempNode corresponding to the [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of the blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, so during C2 type system initialization [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), which is guarded by the target-supported vector size, remains uninitialized. >> >> It is better to reject matching of VectorBlend in such a scenario. >> >> All existing VectorAPI jtreg tests pass with -XX:UseAVX=0 and -XX:MaxVectorSize=8. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: > > 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crash with UseAVX=0 -XX:MaxVectorSize=8 src/hotspot/cpu/x86/x86.ad line 3324: > 3322: break; > 3323: case Op_VectorBlend: > 3324: if (UseAVX == 0 && MaxVectorSize < 16) { While it is technically feasible to create a 64-bit wide byte vector blend for AVX level 0 using the blendvp selection pattern and emit a pblendv instruction for it, the machine node's inputs will not be isomorphic, i.e. the MachTemp node for xmm0 will be of the 128-bit TypeVect::VECTX type while the other inputs will be of the 64-bit TypeVect::VECTD type. Given that blend is a lanewise operation, the IR type must be compatible with the input types. It is better to use size_in_bits rather than MaxVectorSize in the guard check. We could have pushed this check to the actual pattern predicate, but it is better to lift the checks to match_rule_supported_vector since that prevents the creation of non-matchable IR during construction. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28533#discussion_r2569238618 From jbhateja at openjdk.org Thu Nov 27 15:33:07 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 27 Nov 2025 15:33:07 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v22] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 14:08:09 GMT, Daniel Lundén wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution > > Rerunning tests and will re-approve when finished. Latest changes look good, here are a few nits (only comment and style changes): https://github.com/openjdk/jdk/commit/e33416a7d8b9076fdd40a22914d8bb163c9b9600 > > Also, thanks for your patience! Hi @dlunde, waiting for your test clearance. Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3586460979 From epeter at openjdk.org Thu Nov 27 15:42:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Nov 2025 15:42:25 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism [v2] In-Reply-To: References: Message-ID: > **Context**: `VTransform::optimize`. Works a bit like IGVN, it allows each node to perform optimizations. Recently introduced during JDK26. > > **Problem**: I made the assumption that we don't need a worklist mechanism, we can just do multiple passes over all nodes. The assumption was that there would not be any "trickling" of updates over the graph. But that is wrong: for example we can have a long chain of dead nodes, and we need to progressively remove the last node and mark it as dead. > > **Solution**: Implement proper worklist mechanism, so that updates can trickle over the graph. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: limit steps of optimize, for Manuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28512/files - new: https://git.openjdk.org/jdk/pull/28512/files/f5219b12..9f5bf837 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28512&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28512&range=00-01 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28512/head:pull/28512 PR: https://git.openjdk.org/jdk/pull/28512 From epeter at openjdk.org Thu Nov 27 15:42:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Nov 2025 15:42:27 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. 
Need to implement proper worklist mechanism [v2] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 14:15:16 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> limit steps of optimize, for Manuel > > Thank you for fixing this, @eme64! > > The optimization mechanism you are replacing had a mechanism to detect when too many passes had been done. Now there is no such mechanism. How can you detect a condition where we take way too long to reach a fixpoint apart from a timeout? > > What do implementations of `VTransform::optimize` have to guarantee to ensure that we reach a fixpoint? @mhaessig Good point! I had intended to do that, but then somehow forgot. I now just limit it to `100 * number_of_initial_nodes`. Maybe that won't be good enough forever, but for now it should be good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28512#issuecomment-3586493314 From epeter at openjdk.org Thu Nov 27 15:46:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Nov 2025 15:46:46 GMT Subject: RFR: 8372685: C2 SuperWord: wrong requires in test after JDK-8371146 In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 15:21:22 GMT, Matthias Baesken wrote: >> @MBaesken reported this issue on Windows: >> >> TestAliasingCheckPreLimitNotAvailable_all-flags-fixed-stress-seed.jtr and TestAliasingCheckPreLimitNotAvailable_all-flags-no-stress-seed.jtr show failures on Windows: >> >> [0.095s][error][gc] Failed to lookup symbol: VirtualAlloc2 >> Error occurred during initialization of VM >> ZGC requires Windows version 1803 or later >> >> AIX fails too: >> Error occurred during initialization of VM >> Option -XX:+UseZGC not supported >> >> >> I learned a small lesson here: `@requires vm.gc.Z` is much smarter than checking that no other GC is set, or ZGC is set. It also checks if ZGC is available, which is not always the case, e.g. on the reported Windows machine.
>> >> @MBaesken Can you please confirm that this fixes the test for you? > >> ZGC requires Windows version 1803 or later > > Maybe this can be improved to something like 'ZGC requires Windows 10 version 1803, Windows 11 or Windows server 2019 or later'. > In fact, not all Windows server 2016 have VirtualAlloc2 (see also this discussion https://github.com/microsoft/ebpf-for-windows/issues/704 ). > Our test machine with Windows server 2016 generated the error above. > > Unfortunately MS claims here > https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc2 > Windows 10 [desktop apps only] / Windows Server 2016 [desktop apps only] > without stating any OS versions. @MBaesken Thanks for the additional information! I leave it up to you if you want to file an RFE for the error message. I don't have the expertise on Windows nor on GC. Can you confirm that this change fixes your issue though? Because I could not reproduce the issue on my machine, so I'm relying on you here ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28537#issuecomment-3586520805 From epeter at openjdk.org Thu Nov 27 15:55:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Nov 2025 15:55:49 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v3] In-Reply-To: <5bV8t0Bo16-WVON8_AJLfcPDDqWVHDxIjmdGPPNazE8=.51d5a17d-1b87-44d4-ad41-e9d346e6b9f7@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> <5bV8t0Bo16-WVON8_AJLfcPDDqWVHDxIjmdGPPNazE8=.51d5a17d-1b87-44d4-ad41-e9d346e6b9f7@github.com> Message-ID: On Thu, 27 Nov 2025 15:32:15 GMT, Jatin Bhateja wrote: >> This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling.
Here, MaxVectorSize is constrained to 8 bytes, so during C2 type system initialization [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), which is guarded by the target-supported vector size, remains uninitialized. >> >> It is better to reject matching of VectorBlend in such a scenario. >> >> All existing VectorAPI jtreg tests pass with -XX:UseAVX=0 and -XX:MaxVectorSize=8. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Fine tune matcher check @jatin-bhateja Thanks for looking into this! I think it is fine to just disable the small vector size for AVX=0. Those platforms are quite rare now anyway. Should we also adjust the predicate in the matching rule here? 22509 instruct blendvp(vec dst, vec src, vec mask, rxmm0 tmp) %{ 22510 predicate(UseAVX == 0); 22511 match(Set dst (VectorBlend (Binary dst src) mask)); 22512 format %{ "vector_blend $dst,$src,$mask\t! 
using $tmp as TEMP" %} 22513 effect(TEMP tmp); 22514 ins_encode %{ 22515 assert(UseSSE >= 4, "required"); 22516 22517 if ($mask$$XMMRegister != $tmp$$XMMRegister) { 22518 __ movdqu($tmp$$XMMRegister, $mask$$XMMRegister); 22519 } 22520 __ pblendvb($dst$$XMMRegister, $src$$XMMRegister); // uses xmm0 as mask 22521 %} 22522 ins_pipe( pipe_slow ); 22523 %} ------------- PR Review: https://git.openjdk.org/jdk/pull/28533#pullrequestreview-3516189610 From mdoerr at openjdk.org Thu Nov 27 15:55:53 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Nov 2025 15:55:53 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 In-Reply-To: <9CGZeCADEds8B60aZZxkUj9GWIfvQAmQ9lN8E_ft4uo=.9923fd74-e17e-453d-9f83-e2367ae96ca9@github.com> References: <9CGZeCADEds8B60aZZxkUj9GWIfvQAmQ9lN8E_ft4uo=.9923fd74-e17e-453d-9f83-e2367ae96ca9@github.com> Message-ID: On Thu, 27 Nov 2025 08:55:28 GMT, Christian Hagedorn wrote: > * TestIRMatching.java > * TestPhaseIRMatching.java > * IRExample.java Thanks for the ping! The 3 tests have passed with your latest PR version on linux ppc64le. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28495#issuecomment-3586556012 From duke at openjdk.org Thu Nov 27 16:01:57 2025 From: duke at openjdk.org (Zihao Lin) Date: Thu, 27 Nov 2025 16:01:57 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v13] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 09:59:31 GMT, Roland Westrelin wrote: >> Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix test failed > > src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp line 720: > >> 718: if (ShenandoahCardBarrier) { >> 719: post_barrier(kit, kit->control(), access.raw_access(), access.base(), >> 720: access.addr(), access.alias_idx(), new_val, T_OBJECT, true); > > `access.alias_idx()` should be `C->get_alias_index(kit.gvn().type(access.addr()))` > > So I think we want to remove `uint _alias_idx;` from `C2AtomicParseAccess` as well. This could be done as a follow up if you think this change has already gotten too complicated. I think we can create a separate task focused on removing `alias_idx`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2569382908 From jbhateja at openjdk.org Thu Nov 27 16:07:48 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 27 Nov 2025 16:07:48 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v3] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> <5bV8t0Bo16-WVON8_AJLfcPDDqWVHDxIjmdGPPNazE8=.51d5a17d-1b87-44d4-ad41-e9d346e6b9f7@github.com> Message-ID: On Thu, 27 Nov 2025 15:27:54 GMT, Jatin Bhateja wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Fine tune matcher check > > @sviswa7, @eme64 , kindly check and approve. > @jatin-bhateja Thanks for looking into this!
> > I think it is fine to just disable the small vector size for AVX=0. Those platforms are quite rare now anyway. > > Should we also adjust the predicate in the matching rule here? > As mentioned above https://github.com/openjdk/jdk/pull/28533#discussion_r2569238618 It's better to lift predicates to match rule supported vector, i.e. AVX == 0 and size_in_bits < 128. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28533#issuecomment-3586601660 From kxu at openjdk.org Thu Nov 27 16:14:44 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 27 Nov 2025 16:14:44 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v24] In-Reply-To: References: Message-ID: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759).
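The detect/convert split described in the PR summary above can be illustrated with a small stand-alone sketch. This is not the HotSpot code: `ToyLoop`, `ToyCountedLoopConverter`, and `convert_first_counted` are hypothetical names, and the "loop" is reduced to an init/limit/stride triple. It only shows the shape of the idea: detection reads a candidate and caches what it found, conversion is the single mutating step, so several configurations can be tried before committing to one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy description of a loop candidate; NOT a HotSpot data structure.
struct ToyLoop {
  long init;
  long limit;
  long stride;
  bool converted;
};

// Hypothetical miniature of a converter that separates detection from conversion.
class ToyCountedLoopConverter {
 public:
  explicit ToyCountedLoopConverter(ToyLoop* loop)
      : _loop(loop), _detected(false), _trip_count(0) {}

  // Detection: only reads the candidate and caches what conversion needs.
  bool is_counted_loop() {
    _detected = false;
    const long init = _loop->init;
    const long limit = _loop->limit;
    const long stride = _loop->stride;
    if (stride > 0 && init < limit) {
      _trip_count = (limit - init + stride - 1) / stride;   // ceil division
      _detected = true;
    } else if (stride < 0 && init > limit) {
      _trip_count = (init - limit - stride - 1) / -stride;  // ceil division
      _detected = true;
    }
    return _detected;
  }

  // Conversion: the only step that mutates the candidate.
  void convert() {
    assert(_detected && "convert() requires a successful is_counted_loop()");
    _loop->converted = true;
  }

  long trip_count() const { return _trip_count; }

 private:
  ToyLoop* _loop;
  bool _detected;
  long _trip_count;
};

// Tries each candidate configuration in order and converts the first match.
// Returns the index of the converted candidate, or -1 if none qualifies.
int convert_first_counted(std::vector<ToyLoop>& candidates) {
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    ToyCountedLoopConverter converter(&candidates[i]);
    if (converter.is_counted_loop()) {
      converter.convert();
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

Because `is_counted_loop()` leaves the candidate untouched, a failed detection has nothing to undo, which is the property that makes trying multiple configurations cheap in this toy model.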
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24458/files - new: https://git.openjdk.org/jdk/pull/24458/files/392a010d..d8baf0fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=22-23 Stats: 24 lines in 2 files changed: 2 ins; 1 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From epeter at openjdk.org Thu Nov 27 16:15:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Nov 2025 16:15:48 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v3] In-Reply-To: <5bV8t0Bo16-WVON8_AJLfcPDDqWVHDxIjmdGPPNazE8=.51d5a17d-1b87-44d4-ad41-e9d346e6b9f7@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> <5bV8t0Bo16-WVON8_AJLfcPDDqWVHDxIjmdGPPNazE8=.51d5a17d-1b87-44d4-ad41-e9d346e6b9f7@github.com> Message-ID: On Thu, 27 Nov 2025 15:32:15 GMT, Jatin Bhateja wrote: >> This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, thus during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719) guarded by target supported vector size remains uninitialized. >> >> It's better to reject matching of VectorBlend in such a scenario.
>> >> All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Fine tune matcher check Ok, that's fine with me too. It would be nice if you could also attach a regression test, or maybe add an additional run to the existing test, with the required flags for reproducing this issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28533#issuecomment-3586626636 From mhaessig at openjdk.org Thu Nov 27 16:22:48 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 27 Nov 2025 16:22:48 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism [v2] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 15:42:25 GMT, Emanuel Peter wrote: >> **Context**: `VTransform::optimize`. Works a bit like IGVN, it allows each node to perform optimizations. Recently introduced during JDK26. >> >> **Problem**: I made the assumption that we don't need a worklist mechanism, we can just do multiple passes over all nodes. The assumption was that there would not be any "trickling" of updates over the graph. But that is wrong: for example we can have a long chain of dead nodes, and we need to progressively remove the last node and mark it as dead. >> >> **Solution**: Implement proper worklist mechanism, so that updates can trickle over the graph. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > limit steps of optimize, for Manuel Thank you for addressing my comment. It would be good to get another testing run with the limit. Otherwise, this looks good to me. ------------- Marked as reviewed by mhaessig (Committer).
PR Review: https://git.openjdk.org/jdk/pull/28512#pullrequestreview-3516292071 From liach at openjdk.org Thu Nov 27 16:27:52 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 27 Nov 2025 16:27:52 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 16:16:05 GMT, Chen Liang wrote: > Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. > > They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. > > We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. > > Paging @minborg who requested Optional folding for review. > > I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. This uses another one of the 16-bit instanceKlassFlags, which requires runtime engineers to agree. Need compiler review to check if such IR tests are the best way to ensure constant folding for core library classes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3586679853 From mli at openjdk.org Thu Nov 27 16:33:56 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 27 Nov 2025 16:33:56 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 [v2] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 14:19:13 GMT, Christian Hagedorn wrote: >> [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: >> >> - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant". >> - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. >> - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: >> https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 >> I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. >> >> #### Testing >> - [X] Tier1 >> - [X] Tier5 with IR framework internal tests only >> - [ ] Failing IR framework internal tests on all platforms >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Fix wrong regex Latest PR passed! Nice fix! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28495#issuecomment-3586705672 From jvernee at openjdk.org Thu Nov 27 16:50:48 2025 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 27 Nov 2025 16:50:48 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting In-Reply-To: References: Message-ID: <0f6w-e-F6PVzyBNmFsu37oNVKgKSxNwQMfA1Y2GC46c=.d196d665-deeb-432c-b089-a4f5494b44ec@github.com> On Thu, 27 Nov 2025 16:16:05 GMT, Chen Liang wrote: > Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. > > They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. > > We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. > > Paging @minborg who requested Optional folding for review. > > I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. ------------- Marked as reviewed by jvernee (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28540#pullrequestreview-3516377492 From shade at openjdk.org Thu Nov 27 17:28:18 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 27 Nov 2025 17:28:18 GMT Subject: RFR: 8372188: AArch64: Generate atomic match rules from M4 stencils Message-ID: Current atomic match rules are all over the place in AArch64: - CAE and weak CAS rules are generated with the help of `cas.m4`, and then are supposed to be copy-pasted (?) 
into `aarch64.ad`. I did it about 20 times when fixing [JDK-8372154](https://bugs.openjdk.org/browse/JDK-8372154), gets tedious very quickly. - Strong CAS and get-and-set rules are still in the same section of `aarch64.ad`, and are written by hand. Yet, those can be automatically generated from M4 stencils as well. This PR cleans that up by moving all these rules into a separate `.ad` file, which one can cleanly re-generate by invoking `m4 aarch64_atomic_ad.m4 > aarch64_atomic.ad`. The meat of the change is `aarch64_atomic.m4`, everything else is either generated from it, or removed in favor of auto-generated code. There should be no semantic change, as I attempted to move the rules mostly verbatim, only changing non-semantic stuff like match rule names and some formats. Testing: - [x] Eyeballing match rules before/after - [x] Linux AArch64 server fastdebug, `hotspot_compiler` - [x] Linux AArch64 server fastdebug, `tier1` - [ ] Linux AArch64 server fastdebug, `all` - [ ] Linux AArch64 server fastdebug, jcstress run ------------- Commit messages: - More Hotspot tests do not like OptoAssembly changes - Some tests want specific formats + moves - Minor stencil touchup - Missed L_Acq variants - No more atomics in main AD - Fix Changes: https://git.openjdk.org/jdk/pull/28538/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28538&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372188 Stats: 2349 lines in 5 files changed: 1156 ins; 1193 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28538/head:pull/28538 PR: https://git.openjdk.org/jdk/pull/28538 From shade at openjdk.org Thu Nov 27 17:28:18 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 27 Nov 2025 17:28:18 GMT Subject: RFR: 8372188: AArch64: Generate atomic match rules from M4 stencils In-Reply-To: References: Message-ID: <2aGnPLAQsGhPyB6N1QwxvoI2tuzRcZhg93DH-BXCNmk=.ddb99506-85da-45c0-837e-798a5101cafe@github.com> On Thu, 27 
Nov 2025 14:54:06 GMT, Aleksey Shipilev wrote: > Current atomic match rules are all over the place in AArch64: > - CAE and weak CAS rules are generated with the help of `cas.m4`, and then are supposed to be copy-pasted (?) into `aarch64.ad`. I did it about 20 times when fixing [JDK-8372154](https://bugs.openjdk.org/browse/JDK-8372154), gets tedious very quickly. > - Strong CAS and get-and-set rules are still in the same section of `aarch64.ad`, and are written by hand. Yet, those can be automatically generated from M4 stencils as well. > > This PR cleans that up by moving all these rules into a separate `.ad` file, which one can cleanly re-generate by invoking `m4 aarch64_atomic_ad.m4 > aarch64_atomic.ad`. The meat of the change is `aarch64_atomic.m4`, everything else is either generated from it, or removed in favor of auto-generated code. There should be no semantic change, as I attempted to move the rules mostly verbatim, only changing non-semantic stuff like match rule names and some formats. > > Testing: > - [x] Eyeballing match rules before/after > - [x] Linux AArch64 server fastdebug, `hotspot_compiler` > - [x] Linux AArch64 server fastdebug, `tier1` > - [ ] Linux AArch64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, jcstress run Sanity checking with: $ git diff master | grep ^- | grep instruct | grep "%{" | grep -v \$ | sort | nl ... $ git diff master | grep ^+ | grep instruct | grep "%{" | grep -v \$ | sort | nl ... 60 match rules out, 60 match rules in. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28538#issuecomment-3586352646 From aph at openjdk.org Thu Nov 27 17:41:47 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 27 Nov 2025 17:41:47 GMT Subject: RFR: 8372188: AArch64: Generate atomic match rules from M4 stencils In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 14:54:06 GMT, Aleksey Shipilev wrote: > Current atomic match rules are all over the place in AArch64: > - CAE and weak CAS rules are generated with the help of `cas.m4`, and then are supposed to be copy-pasted (?) into `aarch64.ad`. I did it about 20 times when fixing [JDK-8372154](https://bugs.openjdk.org/browse/JDK-8372154), gets tedious very quickly. > - Strong CAS and get-and-set rules are still in the same section of `aarch64.ad`, and are written by hand. Yet, those can be automatically generated from M4 stencils as well. > > This PR cleans that up by moving all these rules into a separate `.ad` file, which one can cleanly re-generate by invoking `m4 aarch64_atomic_ad.m4 > aarch64_atomic.ad`. The meat of the change is `aarch64_atomic.m4`, everything else is either generated from it, or removed in favor of auto-generated code. There should be no semantic change, as I attempted to move the rules mostly verbatim, only changing non-semantic stuff like match rule names and some formats. > > Testing: > - [x] Eyeballing match rules before/after > - [x] Linux AArch64 server fastdebug, `hotspot_compiler` > - [x] Linux AArch64 server fastdebug, `tier1` > - [ ] Linux AArch64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, jcstress run Thanks, that's a very nice cleanup. ------------- Marked as reviewed by aph (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28538#pullrequestreview-3516508246 From krk at openjdk.org Thu Nov 27 17:51:11 2025 From: krk at openjdk.org (Kerem Kat) Date: Thu, 27 Nov 2025 17:51:11 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v3] In-Reply-To: References: Message-ID: > Do not try to replace `fallthrough_memproj` when it is null, fixes crash. > > Test case is simplified from the ticket. Verified that the case crashes without the fix. Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - rename test file - Merge branch 'master' into fix-c2-segfault-unlocknode - fix test spacing - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Manuel Hässig - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Manuel Hässig - copyright format fix? - 8370502: C2: segfault while adding node to IGVN worklist ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28432/files - new: https://git.openjdk.org/jdk/pull/28432/files/e8699d79..d0971b5b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=01-02 Stats: 42917 lines in 619 files changed: 29935 ins; 9637 del; 3345 mod Patch: https://git.openjdk.org/jdk/pull/28432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28432/head:pull/28432 PR: https://git.openjdk.org/jdk/pull/28432 From krk at openjdk.org Thu Nov 27 17:51:12 2025 From: krk at openjdk.org (Kerem Kat) Date: Thu, 27 Nov 2025 17:51:12 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v3] In-Reply-To: <4X6V2ONReceYNZ7I_zUlK7VxZflXz0vJzhuUWnZEGoM=.3a5a263b-4c0a-4c20-8f65-52e7f09bb03f@github.com> References: <4X6V2ONReceYNZ7I_zUlK7VxZflXz0vJzhuUWnZEGoM=.3a5a263b-4c0a-4c20-8f65-52e7f09bb03f@github.com>
Message-ID: On Thu, 20 Nov 2025 16:30:10 GMT, Manuel Hässig wrote: >> Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - rename test file >> - Merge branch 'master' into fix-c2-segfault-unlocknode >> - fix test spacing >> - Update src/hotspot/share/opto/macro.cpp >> >> Co-authored-by: Manuel Hässig >> - Update src/hotspot/share/opto/macro.cpp >> >> Co-authored-by: Manuel Hässig >> - copyright format fix? >> - 8370502: C2: segfault while adding node to IGVN worklist > > Thank you for working on this, @krk. And nice job reducing the test further! > > I have a few questions and style comments below. Thank you for the comments. To answer three remaining comments from @mhaessig, @dean-long and @TobiHartmann about the same subject: `CallNode::extract_projections` sets `fallthrough_memproj` to `null` when the node has `outcnt == 0`. It does not assert on this because `do_asserts = false` is passed. Regarding `mem` vs. `fallthrough_memproj`, replacing `fallthrough_memproj` with `mem` appears to change the semantics from memory outputs to memory inputs. Nothing after unlock (exit from the synchronized block) uses memory, as `a[i] = 0` would throw (`a` is a zero-length array), making the rest of the `test` method unreachable. This raises the question: why is `synchronized` reachable at all, as it is also after `a[i] = 0`. About the invariant, I have found mixed null checks of `fallthrough_memproj`: ## Analysis of all reads of `fallthrough_memproj` ### Does null-check * `PhaseMacroExpand::expand_unlock_node` checks now with the current PR.
* `ArrayCopyNode::finish_transform` * `ShenandoahBarrierC2Support::find_bottom_mem` * `GraphKit::replace_call` * `PhaseMacroExpand::expand_allocate_common` * `PhaseMacroExpand::yank_alloc_node` * `PhaseMacroExpand::eliminate_locking_node` * `StringConcat::eliminate_call` #### (checks equality with a non-null `Node*`) * `MemoryGraphFixer::get_ctrl` * `CallGenerator::do_late_inline_helper` ### No check, but `do_asserts = true` * `PhaseMacroExpand::process_users_of_allocation` asserts for `callprojs`, checks for null for `_callprojs`. ### No check * `PhaseMacroExpand::expand_lock_node` * `PhaseMacroExpand::generate_arraycopy` * `PhaseMacroExpand::generate_slow_arraycopy` * `PhaseMacroExpand::expand_arraycopy_node` ## To summarize 1. If `do_asserts = false` is passed to `CallNode::extract_projections`, it means `fallthrough_memproj` (and some others) may be `null`. After this, `fallthrough_memproj` should be null-checked. Currently, only code that needs null-checking is in `PhaseMacroExpand` class. 2. Why is `synchronized` reachable in the repro at all, as it is also after `a[i] = 0`. Based on this analysis, I propose creating new issues for these two points and keep the current PR focused on `PhaseMacroExpand::expand_unlock_node`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28432#issuecomment-3586912195 From alanb at openjdk.org Thu Nov 27 18:50:47 2025 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 27 Nov 2025 18:50:47 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 16:16:05 GMT, Chen Liang wrote: > Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. 
> > They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. > > We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. > > Paging @minborg who requested Optional folding for review. > > I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. src/hotspot/share/ci/ciField.cpp line 220: > 218: return false; > 219: // Explicit opt-in from system classes > 220: if (holder->trust_final_fields()) This is definitely nicer than listing specific classes. It would be nicer again once we can make this exceptions go away. src/java.base/share/classes/jdk/internal/vm/annotation/TrustFinalFields.java line 61: > 59: /// fields in classes specified by this annotation. > 60: /// > 61: /// This annotation is only recognized on privileged code and is ignored elsewhere. "privileged code" hints of protection domains, permissions or security manager. Some of the annotations are limited to classes defined by the boot loader, is it the case here too? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2569767299 PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2569764340 From liach at openjdk.org Thu Nov 27 19:01:52 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 27 Nov 2025 19:01:52 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting In-Reply-To: References: Message-ID: <1tazzYHm78XLDovV11RAQt2W-ujENi4b_frOa87Jv14=.45b6d8a1-cb76-49ac-8048-429916bc9c6c@github.com> On Thu, 27 Nov 2025 18:45:29 GMT, Alan Bateman wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > src/java.base/share/classes/jdk/internal/vm/annotation/TrustFinalFields.java line 61: > >> 59: /// fields in classes specified by this annotation. >> 60: /// >> 61: /// This annotation is only recognized on privileged code and is ignored elsewhere. > > "privileged code" hints of protection domains, permissions or security manager. Some of the annotations are limited to classes defined by the boot loader, is it the case here too? 
I took this sentence from `@AOTSafeClassInitializer`. The term "privileged" comes from this variable in `classFileParser.cpp`: https://github.com/openjdk/jdk/blob/d94c52ccf2fed3fc66d25a34254c9b581c175fa1/src/hotspot/share/classfile/classFileParser.cpp#L1818-L1820 The other annotations have this note, which seems incorrect from the hotspot excerpt: @implNote This annotation only takes effect for fields of classes loaded by the boot loader. Annotations on fields of classes loaded outside of the boot loader are ignored. This behavior seems to be originally changed by 6964a690ed9a23d4c0692da2dfbced46e1436355, referring to an inaccessible issue. What should I do with this? Should I leave this as-is and create a separate patch to update this comment for vm.annotation annotations, or fix this first and have the separate patch fix other annotations later? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2569787223 From kxu at openjdk.org Thu Nov 27 19:04:15 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 27 Nov 2025 19:04:15 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v25] In-Reply-To: References: Message-ID: <2aZW11q1WcJ9Yfa1vwjbS7k2F0RZmHARLinFbx8eYA8=.7f1b3542-8847-481d-a455-57f096328ed6@github.com> > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 46 commits: - Merge remote-tracking branch 'origin/master' into counted-loop-refactor - additional suggestions from code review - Apply suggestions from code review Co-authored-by: Christian Hagedorn - fix trip counter loop-variant detection - fix bad merge with ctrl_is_member() - Merge remote-tracking branch 'origin/master' into counted-loop-refactor # Conflicts: # src/hotspot/share/opto/loopnode.cpp - Merge branch 'master' into counted-loop-refactor - add missed minor changes - fix bad merge - Merge remote-tracking branch 'origin/master' into counted-loop-refactor # Conflicts: # src/hotspot/share/opto/loopnode.cpp - ... and 36 more: https://git.openjdk.org/jdk/compare/8a0672c8...4ab9a0e5 ------------- Changes: https://git.openjdk.org/jdk/pull/24458/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=24 Stats: 1228 lines in 3 files changed: 626 ins; 295 del; 307 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From kxu at openjdk.org Thu Nov 27 19:04:16 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 27 Nov 2025 19:04:16 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v23] In-Reply-To: References: <6Yo2VYqBk_iaUpAGdPvyCjOyn_XW2nVPN5_w8XbXvkU=.91138210-54e3-4c28-b1d8-eb706583348e@github.com> Message-ID: <1WjkBLNBasWBvCZCGZAcnHxJGBh4OQ9j-ILlJOxlE4o=.c909e314-ca0a-42ff-b874-282d9d1efb3c@github.com> On Tue, 25 Nov 2025 14:08:26 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> fix trip counter loop-variant detection > > src/hotspot/share/opto/loopnode.cpp line 1832: > >> 1830: while (xphi->Opcode() == Op_Cast(_iv_bt)) { >> 1831: xphi = xphi->in(1); >> 1832: } > > I'm wondering if this should be part of the `xphi` computation in `LoopIVStride`. 
Or in other words: Do the other use-sites of `xphi()` not need this uncast logic? Maybe @rwestrel knows more. Iterative uncasting was introduced in #25539, and I think that PR missed updating what's now `PhaseIdealLoop::check_counted_loop_shape()`, the only other caller of `xphi()`. `check_counted_loop_shape()` asserts after a loop nest is created. However, [new test cases in #25539](https://github.com/openjdk/jdk/blob/8a0672c819e09a16c30fbdf58dc2b81f50958da4/test/hotspot/jtreg/compiler/loopopts/TestCountedLoopCastIV.java) do not create loop nests (if I understand correctly), and they didn't trigger this assertion which would otherwise fail. It's safe to add this to `xphi()`. I'm not sure if created loop nests would contain this multiple-cast pattern at all, but it shouldn't break anything even if they don't. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2569790528 From liach at openjdk.org Thu Nov 27 19:11:47 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 27 Nov 2025 19:11:47 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 18:47:15 GMT, Alan Bateman wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test.
>> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > src/hotspot/share/ci/ciField.cpp line 220: > >> 218: return false; >> 219: // Explicit opt-in from system classes >> 220: if (holder->trust_final_fields()) > > This is definitely nicer than listing specific classes. It would be nicer again once we can make this exceptions go away. True, this occupies one of the 16 precious instance klass bits in runtime. I wish we can derive this from our final means final restrictions, but their setup is to permit use-sites to migrate more easily, and is harder for declaration sites to deduce if a declaration is easier to be permitted. We can consider blanket-trust when the JVM uses `--illegal-final-field-mutation=deny` without additional `--enable-final-field-mutation`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2569800720 From krk at openjdk.org Thu Nov 27 19:14:05 2025 From: krk at openjdk.org (Kerem Kat) Date: Thu, 27 Nov 2025 19:14:05 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v4] In-Reply-To: References: Message-ID: > Do not try to replace `fallthrough_memproj` when it is null, fixes crash. > > Test case is simplified from the ticket. Verified that the case crashes without the fix. 
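The guard described in the PR summary above ("do not try to replace `fallthrough_memproj` when it is null") can be shown in isolation. The following is a toy model, not `PhaseMacroExpand::expand_unlock_node`: `ToyNode`, `ToyCallProjections`, `ToyIGVN`, and `replace_memproj_if_present` are hypothetical names invented for this sketch, and the replacement node is just a parameter. It assumes only what the thread states, namely that the projection may legitimately be null when projections are extracted without asserts.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these are HotSpot types.
struct ToyNode {
  int id;
};

struct ToyCallProjections {
  ToyNode* fallthrough_memproj;  // may be null when the call has no memory users
};

struct ToyIGVN {
  std::vector<std::pair<ToyNode*, ToyNode*> > replaced;

  // Models a replacement that must never be handed a null "old" node.
  void replace_node(ToyNode* old_node, ToyNode* new_node) {
    assert(old_node != nullptr && "replace_node requires a live node");
    replaced.push_back(std::make_pair(old_node, new_node));
  }
};

// Replaces the memory projection only if it exists.
// Returns true if a replacement was recorded, false if there was nothing to do.
bool replace_memproj_if_present(ToyIGVN& igvn,
                                const ToyCallProjections& projs,
                                ToyNode* replacement) {
  if (projs.fallthrough_memproj == nullptr) {
    return false;  // nothing consumes memory after the call; skip the replacement
  }
  igvn.replace_node(projs.fallthrough_memproj, replacement);
  return true;
}
```

Returning a flag rather than silently skipping is a choice made for this sketch so that callers and tests can observe which path was taken; the actual patch may simply wrap the replacement in a null check.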
Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: fix rename ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28432/files - new: https://git.openjdk.org/jdk/pull/28432/files/d0971b5b..a0f0ecb9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28432/head:pull/28432 PR: https://git.openjdk.org/jdk/pull/28432 From kxu at openjdk.org Thu Nov 27 19:16:27 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 27 Nov 2025 19:16:27 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v26] In-Reply-To: References: Message-ID: <0O-2k4zYIUFPW8UOi8dMl23NGPKXRTmKKi0rTd8azw8=.36e9911f-e81c-4ee6-808a-3ac2c323b323@github.com> > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759).
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: remove trailing whitespaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24458/files - new: https://git.openjdk.org/jdk/pull/24458/files/4ab9a0e5..6b5254f9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=24-25 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From kxu at openjdk.org Thu Nov 27 19:16:27 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 27 Nov 2025 19:16:27 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v23] In-Reply-To: References: <6Yo2VYqBk_iaUpAGdPvyCjOyn_XW2nVPN5_w8XbXvkU=.91138210-54e3-4c28-b1d8-eb706583348e@github.com> Message-ID: On Tue, 25 Nov 2025 14:22:59 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> fix trip counter loop-variant detection > > src/hotspot/share/opto/loopnode.cpp line 2035: > >> 2033: >> 2034: // Check trip counter will end up higher than the limit >> 2035: const TypeInteger* limit_t = igvn->type(_structure.limit())->is_integer(_iv_bt); > > Looks like this could now be moved into the only use in `is_infinite_loop()` directly, so you do not need to pass it into as argument. But I see that you reuse it again later in this method. I would have probably still moved it inside `is_infinite_loop()` and re-fetched it further down again. But I leave it up to you to decide :-) moved to `is_infinite_loop()` > src/hotspot/share/opto/loopnode.cpp line 2252: > >> 2250: // again and can skip the predicate. 
>> 2251: >> 2252: int sov = check_stride_overflow(_structure.final_limit_correction(), limit_t, _iv_bt); > > I suggest to rename it to `stride_overflow_state` or something like that since `sov` is a rather non-intuitive abbreviation. > > The best thing is probably to turn this into a proper enum since the states -1, 0, and 1 are not that easy to comprehend. I leave it up to you if you also want to do this in this PR - minor detail. Created enum `StrideOverflowState.Overflow`, `.NoOverflow`, and `.RequireLimitCheck`. > src/hotspot/share/opto/loopnode.cpp line 2593: > >> 2591: >> 2592: // Replace the old IfNode with a new LoopEndNode >> 2593: Node* lex = igvn->register_new_node_with_optimizer(BaseCountedLoopEndNode::make(iff->in(0), > > It's somewhat difficult to follow the logic with the different abbreviations, some referring to the old loop exit and some to the newly created one. Maybe you can improve the naming here by making it more clear what belongs to what. But we could also do that separately at some point since it was like that before and the refactoring has already become quite large :-) Renamed to `new_cmp`, `new_test`, `loop_end` and `loop_end_exit`. Added corresponding comments.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2569804636 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2569806048 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2569807660 From kxu at openjdk.org Thu Nov 27 21:08:56 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 27 Nov 2025 21:08:56 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v23] In-Reply-To: References: <6Yo2VYqBk_iaUpAGdPvyCjOyn_XW2nVPN5_w8XbXvkU=.91138210-54e3-4c28-b1d8-eb706583348e@github.com> Message-ID: <8kNvPKU3I3PdOKtInEoHzV-i8T6-IETIBup-bxcr7_c=.91cc1d46-6d49-4fb7-9302-55597b7ae428@github.com> On Fri, 21 Nov 2025 16:34:03 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> fix trip counter loop-variant detection > > Was too busy this week, will try to come back to this next week! @chhagedorn Thank you for reviewing. I'm glad to hear I'm making progress. Please see [my previous comment](https://github.com/openjdk/jdk/pull/24458#discussion_r2569790528) regarding iteratively uncasting `xphi()`. > [...] give your patch a spin in our standard testing [...] Yes please. I've addressed the last few suggestions and merged in master. > [...] run some more extended testing with your old vs. new counted loop transformation state [...] Good idea. I've updated the old vs. new code based on the latest patch on this PR. Please find it on the [`counted-loop-refactor-old-vs-new` branch](https://github.com/tabjy/jdk/commits/counted-loop-refactor-old-vs-new/). Please let me know how the testing goes. Thank you very much once again!
------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3587271888 From xgong at openjdk.org Fri Nov 28 01:37:17 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 28 Nov 2025 01:37:17 GMT Subject: RFR: 8286941: Add mask IR for partial vector operations for ARM SVE [v9] In-Reply-To: <-BEiqR1zXMnGYuI9KR2jaecvz2CwR3CJ5x1PDEQnX7o=.eb9ff185-5221-4308-ac38-71abbe410ef2@github.com> References: <-BEiqR1zXMnGYuI9KR2jaecvz2CwR3CJ5x1PDEQnX7o=.eb9ff185-5221-4308-ac38-71abbe410ef2@github.com> Message-ID: On Thu, 27 Nov 2025 15:57:38 GMT, Emanuel Peter wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Save vect_type to ReductionNode and VectorMaskOpNode > > src/hotspot/share/opto/vectornode.cpp line 996: > >> 994: } >> 995: return LoadNode::Ideal(phase, can_reshape); >> 996: } > > @XiaohongGong Extremely late review ? > > Does this not prevent us from doing the `LoadNode::Ideal` optimizations for the cases where `vector_needs_partial_operations` returns true? > > See also: https://bugs.openjdk.org/browse/JDK-8371603 If `vector_needs_partial_operations` returns true, then the original `LoadVectorNode` is transformed to a `LoadVectorMaskedNode`, which is also a kind of `LoadVectorNode`. So in the next IGVN iteration, it will do `LoadNode::Ideal` optimizations. So you are right that we just want to avoid missing the optimizations from `LoadNode::Ideal` for any cases. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9037#discussion_r2570252362 From haosun at openjdk.org Fri Nov 28 02:41:54 2025 From: haosun at openjdk.org (Hao Sun) Date: Fri, 28 Nov 2025 02:41:54 GMT Subject: RFR: 8372188: AArch64: Generate atomic match rules from M4 stencils In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 14:54:06 GMT, Aleksey Shipilev wrote: > Current atomic match rules are all over the place in AArch64: > - CAE and weak CAS rules are generated with the help of `cas.m4`, and then are supposed to be copy-pasted (?) into `aarch64.ad`. I did it about 20 times when fixing [JDK-8372154](https://bugs.openjdk.org/browse/JDK-8372154), gets tedious very quickly. > - Strong CAS and get-and-set rules are still in the same section of `aarch64.ad`, and are written by hand. Yet, those can be automatically generated from M4 stencils as well. > > This PR cleans that up by moving all these rules into a separate `.ad` file, which one can cleanly re-generate by invoking `m4 aarch64_atomic_ad.m4 > aarch64_atomic.ad`. The meat of the change is `aarch64_atomic.m4`, everything else is either generated from it, or removed in favor of auto-generated code. There should be no semantic change, as I attempted to move the rules mostly verbatim, only changing non-semantic stuff like match rule names and some formats. > > Testing: > - [x] Eyeballing match rules before/after > - [x] Linux AArch64 server fastdebug, `hotspot_compiler` > - [x] Linux AArch64 server fastdebug, `tier1` > - [ ] Linux AArch64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, jcstress run LGTM. Thanks for your work. ------------- Marked as reviewed by haosun (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/28538#pullrequestreview-3517305576 From jiangli at openjdk.org Fri Nov 28 05:53:50 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 28 Nov 2025 05:53:50 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v9] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 07:46:36 GMT, Shawn M Emery wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Address @smemery's comments: >> - Add @requires >> - Shorten long lines > > src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4026: > >> 4024: //process 8 16 byte blocks at a time until all are done 'encrypt_by_8_new followed by ghash_last_8' >> 4025: __ xorl(pos, pos); >> 4026: __ cmpl(len, 128); > > Was this part of the original problem? I was trying to trace where this is called with < 128 bytes and couldn't find the path. As I documented in JDK-8371864 description, there was also a bug in AVX2 version of the intrinsic, `StubGenerator::aesgcm_avx2`. Hence the bug title mentioned both AVX512 and AVX2 intrinsics stubs. The failure can be reproduced if you run `TestGCMSplitBound.java` on a machine supports AVX2 but not AVX512 features. You would need to find a x64 machine that supports AVX2 but not AVX512 features. See [StubGenerator::generate_aes_stubs()](https://github.com/openjdk/jdk/blob/195b36f90b789b64f4a0fc867c620935d609a455/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp#L209) for how it decides which version of the stub is used. On my local machine with AVX2 support, `TestGCMSplitBound.java` fails without the fix: test result: Failed. 
Execution failed: `main' threw exception: java.lang.RuntimeException: Failed for messageSize 100001 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2570548349 From jiangli at openjdk.org Fri Nov 28 06:01:26 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 28 Nov 2025 06:01:26 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v10] In-Reply-To: References: Message-ID: > Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. > > Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: Change to break before operators. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28363/files - new: https://git.openjdk.org/jdk/pull/28363/files/64beb969..cfa36442 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28363&range=08-09 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28363.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28363/head:pull/28363 PR: https://git.openjdk.org/jdk/pull/28363 From jiangli at openjdk.org Fri Nov 28 06:10:50 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 28 Nov 2025 06:10:50 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v9] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 09:27:04 GMT, Niklas Keller wrote: >> test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java line 134: >> >>> 132: } catch (Exception e) { >>> 133: throw new RuntimeException("Failed for messageSize " + >>> 
134: Integer.toHexString(messageSize), e); >> >> nit: `+` operator should be first and line indented >= 8 white-spaces. > > Aren't these nits something a tool should check and in the best case also fix automatically? >> nit: + operator should be first and line indented >= 8 white-spaces. > > Aren't these nits something a tool should check and in the best case also fix automatically? Changed to break before operators `+`. AFAIK, we have mixed styles in existing JDK code with operator on the new line and operator at the end of previous line for breaking long lines. +1 on the suggestion to do auto-detection and auto-fix if we want to more strictly reinforce style. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28363#discussion_r2570569111 From duke at openjdk.org Fri Nov 28 06:21:49 2025 From: duke at openjdk.org (Shawn M Emery) Date: Fri, 28 Nov 2025 06:21:49 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v9] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 23:09:19 GMT, Jiangli Zhou wrote: >> Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. >> >> Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Address @smemery's comments: > - Add @requires > - Shorten long lines @jianglizhou thank you for the AVX2 related output from the unit test pre-fix. From this I was able to trace the point of failure and see that your proposed changes are good for approval. Thank you for your work on these issues! ------------- Marked as reviewed by smemery at github.com (no known OpenJDK username). 
PR Review: https://git.openjdk.org/jdk/pull/28363#pullrequestreview-3517602570 From epeter at openjdk.org Fri Nov 28 06:38:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Nov 2025 06:38:14 GMT Subject: RFR: 8286941: Add mask IR for partial vector operations for ARM SVE [v9] In-Reply-To: References: <-BEiqR1zXMnGYuI9KR2jaecvz2CwR3CJ5x1PDEQnX7o=.eb9ff185-5221-4308-ac38-71abbe410ef2@github.com> Message-ID: <2lPhbPKtIgaIf15iIV0TrtWQvF0foRRelauo6OimNtQ=.535ef920-e235-49ee-96ec-970432f0990c@github.com> On Fri, 28 Nov 2025 01:33:44 GMT, Xiaohong Gong wrote: >> src/hotspot/share/opto/vectornode.cpp line 996: >> >>> 994: } >>> 995: return LoadNode::Ideal(phase, can_reshape); >>> 996: } >> >> @XiaohongGong Extremely late review. >> >> Does this not prevent us from doing the `LoadNode::Ideal` optimizations for the cases where `vector_needs_partial_operations` returns true? >> >> See also: https://bugs.openjdk.org/browse/JDK-8371603 > > If `vector_needs_partial_operations` returns true, then the original `LoadVectorNode` is either transformed to a `LoadVectorMaskedNode` or `nullptr`. So it seems `LoadNode::Ideal` is not called if `try_to_gen_masked_vector` returns `nullptr` and some optimizations are missing? That would be an issue. Yes, exactly. I think instead of:

    return VectorNode::try_to_gen_masked_vector(phase, this, vt);

we should do

    Node* progress = VectorNode::try_to_gen_masked_vector(phase, this, vt);
    if (progress != nullptr) { return progress; }

That should be correct, right? Of course the naming here is a bit confusing, and suggests that this may not be correct. Because `vector_needs_partial_operations` would suggest we _always_ need to do partial operations. And so then we would expect that `try_to_gen_masked_vector` would have to _always_ succeed. And so maybe that is why the reviewers did not think that we should continue with `LoadNode::Ideal` if it fails, I suppose?
So I think the names should be changed to `maybe_vector_needs_partial_operations` and `transform_to_partial_vector_if_needed`, or similar. What do you think? Of course I hit an example in [JDK-8371603](https://bugs.openjdk.org/browse/JDK-8371603) where we don't step over MergeMems, which may already be an issue. But I would still like to find some other examples of missing optimizations. Let's see what I can find. Working through QEMU is a bit slow ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9037#discussion_r2570603928 From epeter at openjdk.org Fri Nov 28 07:23:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Nov 2025 07:23:13 GMT Subject: RFR: 8286941: Add mask IR for partial vector operations for ARM SVE [v9] In-Reply-To: References: <-BEiqR1zXMnGYuI9KR2jaecvz2CwR3CJ5x1PDEQnX7o=.eb9ff185-5221-4308-ac38-71abbe410ef2@github.com> Message-ID: On Fri, 28 Nov 2025 01:33:44 GMT, Xiaohong Gong wrote: >> src/hotspot/share/opto/vectornode.cpp line 996: >> >>> 994: } >>> 995: return LoadNode::Ideal(phase, can_reshape); >>> 996: } >> >> @XiaohongGong Extremely late review ? >> >> Does this not prevent us from doing the `LoadNode::Ideal` optimizations for the cases where `vector_needs_partial_operations` returns true? >> >> See also: https://bugs.openjdk.org/browse/JDK-8371603 > > If `vector_needs_partial_operations` returns true, then the original `LoadVectorNode` is either transformed to a `LoadVectorMaskedNode` or `nullptr`. So it seems `LoadNode::Ideal` is not called if `try_to_gen_masked_vector` returns `nullptr` and some optmizations are missing? That would be an issue. @XiaohongGong Yes, I was able to find a simple reproducer. 

    // java -Xbatch -XX:CompileCommand=compileonly,Test*::test -XX:CompileCommand=printcompilation,Test*::test -XX:+PrintIdeal TestOptimizeLoadVector.java

    import jdk.incubator.vector.VectorSpecies;
    import jdk.incubator.vector.IntVector;

    public class Test1 {

        static final VectorSpecies SPECIES =
            IntVector.SPECIES_256;

        static void test(int[] a) {
            // The LOAD below can be optimized away, and be replaced by the value of v1:
            // LoadVectorNode::Ideal calls LoadNode::Ideal, which looks at the memory
            // input and skips any independent stores, finding a store that matches the
            // exact location. And this store stores the value of v1, so we can replace
            // the LOAD, and just use v1 directly. Hence, the example below should have
            // only a single load, and 3 stores.
            // HOWEVER: if we somehow exit too early in LoadVectorNode::Ideal, we may
            // never reach LoadNode::Ideal and miss the optimization.
            // This happens on aarch64 SVE with 256bits, when we return true for
            // Matcher::vector_needs_partial_operations, but then do nothing when calling
            // VectorNode::try_to_gen_masked_vector. We just return nullptr instantly,
            // rather than trying the other optimizations that LoadNode::Ideal has to
            // offer.
            IntVector v1 = IntVector.fromArray(SPECIES, a, 0 * SPECIES.length());
            v1.intoArray(a, 1 * SPECIES.length()); // STORE of v1
            v1.intoArray(a, 2 * SPECIES.length()); // independent STORE - no overlap with STORE above and LOAD below.
            IntVector v2 = IntVector.fromArray(SPECIES, a, 1 * SPECIES.length()); // LOAD - is it replaced with v1?
            v2.intoArray(a, 3 * SPECIES.length());
        }

        public static void main(String[] args) {
            int[] a = new int[1000];
            for (int i = 0; i < 10_000; i++) {
                test(a);
            }
        }
    }

I'll see if we can do similar things for the other cases.
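
For readers following the thread, the control-flow problem described in the reproducer's comments can be modelled in isolation. The following is only a rough standalone sketch and not HotSpot code: `ToyNode`, `try_masked`, `generic_ideal`, `ideal_early_return` and `ideal_fall_through` are hypothetical names standing in, loosely, for `Node`, `VectorNode::try_to_gen_masked_vector`, `LoadNode::Ideal` and the two shapes of `LoadVectorNode::Ideal` discussed above. It assumes the masked transformation can decline by returning `nullptr`, as the thread suggests.

```cpp
#include <string>

// Toy stand-in for a C2 IR node; this is NOT HotSpot's Node class.
struct ToyNode {
    std::string kind;
};

// Hypothetical analogue of VectorNode::try_to_gen_masked_vector:
// returns a replacement node, or nullptr when it declines to transform.
ToyNode* try_masked(ToyNode* /*n*/, bool can_mask) {
    static ToyNode masked{"LoadVectorMasked"};
    return can_mask ? &masked : nullptr;
}

// Hypothetical analogue of LoadNode::Ideal: the generic optimizations,
// modelled here as always finding a store whose value can be forwarded.
ToyNode* generic_ideal(ToyNode* /*n*/) {
    static ToyNode forwarded{"ForwardedStoreValue"};
    return &forwarded;
}

// Early-return shape: a nullptr from try_masked ends the attempt, so the
// generic optimizations are never consulted on the partial-operations path.
ToyNode* ideal_early_return(ToyNode* n, bool needs_partial, bool can_mask) {
    if (needs_partial) {
        return try_masked(n, can_mask);
    }
    return generic_ideal(n);
}

// Fall-through shape: return early only if progress was actually made,
// otherwise continue to the generic optimizations.
ToyNode* ideal_fall_through(ToyNode* n, bool needs_partial, bool can_mask) {
    if (needs_partial) {
        ToyNode* progress = try_masked(n, can_mask);
        if (progress != nullptr) {
            return progress;
        }
    }
    return generic_ideal(n);
}
```

With `needs_partial == true` and `can_mask == false`, the early-return shape yields `nullptr` (no progress at all), while the fall-through shape still reaches the generic path. When masking succeeds, both shapes return the masked node, so the change would only affect the case where the masked transformation declines.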
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9037#discussion_r2570673883 From epeter at openjdk.org Fri Nov 28 07:23:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Nov 2025 07:23:14 GMT Subject: RFR: 8286941: Add mask IR for partial vector operations for ARM SVE [v9] In-Reply-To: References: <-BEiqR1zXMnGYuI9KR2jaecvz2CwR3CJ5x1PDEQnX7o=.eb9ff185-5221-4308-ac38-71abbe410ef2@github.com> Message-ID: On Fri, 28 Nov 2025 07:19:15 GMT, Emanuel Peter wrote: >> If `vector_needs_partial_operations` returns true, then the original `LoadVectorNode` is either transformed to a `LoadVectorMaskedNode` or `nullptr`. So it seems `LoadNode::Ideal` is not called if `try_to_gen_masked_vector` returns `nullptr` and some optmizations are missing? That would be an issue. > > @XiaohongGong Yes, I was able to find a simple reproducer. > > > // java -Xbatch -XX:CompileCommand=compileonly,Test*::test -XX:CompileCommand=printcompilation,Test*::test -XX:+PrintIdeal TestOptimizeLoadVector.java > > import jdk.incubator.vector.VectorSpecies; > import jdk.incubator.vector.IntVector; > > public class Test1 { > > static final VectorSpecies SPECIES = > IntVector.SPECIES_256; > > static void test(int[] a) { > // The LOAD below can be optimized away, and be replaced by the value of v1: > // LoadVectorNode::Ideal calls LoadNode::Ideal, which looks at the memory > // input and skips and independent stores, finding a store that matches the > // exact location. And this store stores the value of v1, so we can replace > // the LOAD, and just use v1 directly. Hence, the example below should have > // Only a single load, and 3 stores. > // HOWEVER: if we somehow exit too early in LoadVectorNode::Ideal, we may > // never reach LoadNode::Ideal and miss the optimization. > // This happens on aarch64 SVE with 256bits, when we return true for > // Matcher::vector_needs_partial_operations, but then do nothing when calling > // VectorNode::try_to_gen_masked_vector. 
We just return nullptr instantly, > // rather than trying the other optimizations that LoadNode::Ideal has to > // offer. > IntVector v1 = IntVector.fromArray(SPECIES, a, 0 * SPECIES.length()); > v1.intoArray(a, 1 * SPECIES.length()); // STORE of v1 > v1.intoArray(a, 2 * SPECIES.length()); // independent STORE - no overlap with STORE above and LOAD below. > IntVector v2 = IntVector.fromArray(SPECIES, a, 1 * SPECIES.length()); // LOAD - is it replaced with v1? > v2.intoArray(a, 3 * SPECIES.length()); > } > > public static void main(String[] args) { > int[] a = new int[1000]; > for (int i = 0; i < 10_000; i++) { > test(a); > } > } > } > > > I'll see if we can do similar things for the other cases. We can continue the conversation in: https://bugs.openjdk.org/browse/JDK-8371603 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9037#discussion_r2570674757 From xgong at openjdk.org Fri Nov 28 07:41:09 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 28 Nov 2025 07:41:09 GMT Subject: RFR: 8286941: Add mask IR for partial vector operations for ARM SVE [v9] In-Reply-To: References: <-BEiqR1zXMnGYuI9KR2jaecvz2CwR3CJ5x1PDEQnX7o=.eb9ff185-5221-4308-ac38-71abbe410ef2@github.com> Message-ID: On Fri, 28 Nov 2025 07:19:45 GMT, Emanuel Peter wrote: >> @XiaohongGong Yes, I was able to find a simple reproducer. >> >> >> // java -Xbatch -XX:CompileCommand=compileonly,Test*::test -XX:CompileCommand=printcompilation,Test*::test -XX:+PrintIdeal TestOptimizeLoadVector.java >> >> import jdk.incubator.vector.VectorSpecies; >> import jdk.incubator.vector.IntVector; >> >> public class Test1 { >> >> static final VectorSpecies SPECIES = >> IntVector.SPECIES_256; >> >> static void test(int[] a) { >> // The LOAD below can be optimized away, and be replaced by the value of v1: >> // LoadVectorNode::Ideal calls LoadNode::Ideal, which looks at the memory >> // input and skips and independent stores, finding a store that matches the >> // exact location. 
And this store stores the value of v1, so we can replace >> // the LOAD, and just use v1 directly. Hence, the example below should have >> // Only a single load, and 3 stores. >> // HOWEVER: if we somehow exit too early in LoadVectorNode::Ideal, we may >> // never reach LoadNode::Ideal and miss the optimization. >> // This happens on aarch64 SVE with 256bits, when we return true for >> // Matcher::vector_needs_partial_operations, but then do nothing when calling >> // VectorNode::try_to_gen_masked_vector. We just return nullptr instantly, >> // rather than trying the other optimizations that LoadNode::Ideal has to >> // offer. >> IntVector v1 = IntVector.fromArray(SPECIES, a, 0 * SPECIES.length()); >> v1.intoArray(a, 1 * SPECIES.length()); // STORE of v1 >> v1.intoArray(a, 2 * SPECIES.length()); // independent STORE - no overlap with STORE above and LOAD below. >> IntVector v2 = IntVector.fromArray(SPECIES, a, 1 * SPECIES.length()); // LOAD - is it replaced with v1? >> v2.intoArray(a, 3 * SPECIES.length()); >> } >> >> public static void main(String[] args) { >> int[] a = new int[1000]; >> for (int i = 0; i < 10_000; i++) { >> test(a); >> } >> } >> } >> >> >> I'll see if we can do similar things for the other cases. > > We can continue the conversation in: > https://bugs.openjdk.org/browse/JDK-8371603 > Yes, exactly. I think instead of: > > ``` > return VectorNode::try_to_gen_masked_vector(phase, this, vt); > ``` > > we should do > > ``` > Node* progress = VectorNode::try_to_gen_masked_vector(phase, this, vt); > if (progress != nullptr) { return progress; } > ``` > > That should be correct, right? Yes, I think so. > Of course the naming here is a bit confusing, and suggests that this may not be correct. Because `vector_needs_partial_operations` would suggest we _always_ need to do partial operations. And so then we would expect that `try_to_gen_masked_vector` would have to _always_ succeed. 
And so maybe that is why the reviewers did not think that we should continue with `LoadNode::Ideal` if it fails, I suppose? So I think the names should be changed to `maybe_vector_needs_partial_operations` and `transform_to_partial_vector_if_needed`, or similar. What do you think? `vector_needs_partial_operations` is used to check whether we need to consider the partial vector lanes for the specified op. Partial vector lanes means that the vector size of the op is smaller than MaxVectorSize, hence the higher lanes should be ignored. I agree with your point. I've created a JBS issue to fix it and will create a patch soon. Thank you so much for the debugging, testing and any input! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9037#discussion_r2570704667 From epeter at openjdk.org Fri Nov 28 08:06:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Nov 2025 08:06:06 GMT Subject: RFR: 8286941: Add mask IR for partial vector operations for ARM SVE [v9] In-Reply-To: References: <-BEiqR1zXMnGYuI9KR2jaecvz2CwR3CJ5x1PDEQnX7o=.eb9ff185-5221-4308-ac38-71abbe410ef2@github.com> Message-ID: On Fri, 28 Nov 2025 07:38:11 GMT, Xiaohong Gong wrote: >> We can continue the conversation in: >> https://bugs.openjdk.org/browse/JDK-8371603 > >> Yes, exactly. I think instead of: >> >> ``` >> return VectorNode::try_to_gen_masked_vector(phase, this, vt); >> ``` >> >> we should do >> >> ``` >> Node* progress = VectorNode::try_to_gen_masked_vector(phase, this, vt); >> if (progress != nullptr) { return progress; } >> ``` >> >> That should be correct, right? > > Yes, I think so. > >> Of course the naming here is a bit confusing, and suggests that this may not be correct. Because `vector_needs_partial_operations` would suggest we _always_ need to do partial operations. And so then we would expect that `try_to_gen_masked_vector` would have to _always_ succeed.
And so maybe that is why the reviewers did not think that we should continue with `LoadNode::Ideal` if it fails, I suppose? So I think the names should be changed to `maybe_vector_needs_partial_operations` and `transform_to_partial_vector_if_needed`, or similar. What do you think? > > `vector_needs_partial_operations` is used to check whether we need consider the partial vector lanes for the specified op. The partial vector lanes means the vector size of the op is lower than the MaxVectorSize, hence the higher lanes should be ignored. > > I agree with your point. I'v created a JBS (https://bugs.openjdk.org/browse/JDK-8372717) to fix it and will create a patch soon. Thank you so much for the debugging, testing and any input! @XiaohongGong I already have 2 reproducers for missing load and store optimizations attached to [JDK-8371603](https://bugs.openjdk.org/browse/JDK-8371603). You should add those 2 tests with IR verification to your PR then. And you should also verify that this patch fixes the issue from [JDK-8371603](https://bugs.openjdk.org/browse/JDK-8371603), and then we can close that one as a duplicate. It is a JDK26 bug, so we should try to fix it soon-ish ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9037#discussion_r2570749497 From epeter at openjdk.org Fri Nov 28 08:10:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Nov 2025 08:10:14 GMT Subject: RFR: 8286941: Add mask IR for partial vector operations for ARM SVE [v9] In-Reply-To: References: <-BEiqR1zXMnGYuI9KR2jaecvz2CwR3CJ5x1PDEQnX7o=.eb9ff185-5221-4308-ac38-71abbe410ef2@github.com> Message-ID: On Fri, 28 Nov 2025 08:03:23 GMT, Emanuel Peter wrote: >>> Yes, exactly. 
I think instead of: >>> >>> ``` >>> return VectorNode::try_to_gen_masked_vector(phase, this, vt); >>> ``` >>> >>> we should do >>> >>> ``` >>> Node* progress = VectorNode::try_to_gen_masked_vector(phase, this, vt); >>> if (progress != nullptr) { return progress; } >>> ``` >>> >>> That should be correct, right? >> >> Yes, I think so. >> >>> Of course the naming here is a bit confusing, and suggests that this may not be correct. Because `vector_needs_partial_operations` would suggest we _always_ need to do partial operations. And so then we would expect that `try_to_gen_masked_vector` would have to _always_ succeed. And so maybe that is why the reviewers did not think that we should continue with `LoadNode::Ideal` if it fails, I suppose? So I think the names should be changed to `maybe_vector_needs_partial_operations` and `transform_to_partial_vector_if_needed`, or similar. What do you think? >> >> `vector_needs_partial_operations` is used to check whether we need consider the partial vector lanes for the specified op. The partial vector lanes means the vector size of the op is lower than the MaxVectorSize, hence the higher lanes should be ignored. >> >> I agree with your point. I'v created a JBS (https://bugs.openjdk.org/browse/JDK-8372717) to fix it and will create a patch soon. Thank you so much for the debugging, testing and any input! > > @XiaohongGong I already have 2 reproducers for missing load and store optimizations attached to [JDK-8371603](https://bugs.openjdk.org/browse/JDK-8371603). > > You should add those 2 tests with IR verification to your PR then. And you should also verify that this patch fixes the issue from [JDK-8371603](https://bugs.openjdk.org/browse/JDK-8371603), and then we can close that one as a duplicate. It is a JDK26 bug, so we should try to fix it soon-ish ;) You filed [JDK-8372717](https://bugs.openjdk.org/browse/JDK-8372717) as a JDK27 issue. 
But it really is also a fix for the JDK26 regression of [JDK-8371603](https://bugs.openjdk.org/browse/JDK-8371603). Not sure if we should convert it to a bug, or just tag the PR with both issues, so both are resolved as a consequence of it. The JDK26 fork is soon, and so if we don't integrate before that we would have to backport, I think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9037#discussion_r2570756661 From mbaesken at openjdk.org Fri Nov 28 08:24:47 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 28 Nov 2025 08:24:47 GMT Subject: RFR: 8372685: C2 SuperWord: wrong requires in test after JDK-8371146 In-Reply-To: References: Message-ID: <5ridSmPf7i-rpGtWp8WiZ9AjmVt6RKpO5U1aXy4OGcQ=.c10657f2-abaa-4ff4-b10d-b45684a3eb66@github.com> On Thu, 27 Nov 2025 13:53:12 GMT, Emanuel Peter wrote: > @MBaesken reported this issue on Windows: > > TestAliasingCheckPreLimitNotAvailable_all-flags-fixed-stress-seed.jtr and TestAliasingCheckPreLimitNotAvailable_all-flags-no-stress-seed.jtr show failures on Windows: > > [0.095s][error][gc] Failed to lookup symbol: VirtualAlloc2 > Error occurred during initialization of VM > ZGC requires Windows version 1803 or later > > AIX fails too: > Error occurred during initialization of VM > Option -XX:+UseZGC not supported > > > I learned a small lesson here: `@requires vm.gc.Z` is much smarter than checking that no other GC is set, or ZGC is set. It also checks if ZGC is available, which is not always the case, e.g. on the reported Windows machine. > > @MBaesken Can you please confirm that this fixes the test for you? LGTM ------------- Marked as reviewed by mbaesken (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28537#pullrequestreview-3517901294

From mbaesken at openjdk.org Fri Nov 28 08:24:48 2025
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Fri, 28 Nov 2025 08:24:48 GMT
Subject: RFR: 8372685: C2 SuperWord: wrong requires in test after JDK-8371146
In-Reply-To:
References:
Message-ID:

On Thu, 27 Nov 2025 15:43:49 GMT, Emanuel Peter wrote:

>
> Can you confirm that this change fixes your issue though? Because I could not reproduce the issue on my machine, so I'm relying on you here ;)

This fixes the issue on all our machines where the problem showed up yesterday!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28537#issuecomment-3588344468

From thartmann at openjdk.org Fri Nov 28 08:26:05 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 28 Nov 2025 08:26:05 GMT
Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 26 Nov 2025 16:48:57 GMT, Volodymyr Paprotski wrote:

>> Marked as reviewed by ascarpino (Reviewer).
>
> Oh.. realized that I should have checked JBS.. thanks @ascarpino for resolving the bug I caused! At least it's just the option.. whew.
>
>> @dholmes-ora Hi David, need some help with this please, don't have access to an ARM system to reproduce (or the ARM expertise).. could you point me at the failing job if that's available? Or some log if not?
>>
>> * Is it an issue with the options (i.e. `-XX:UseAVX=2` perhaps). I probably should have added `-XX:+IgnoreUnrecognizedVMOptions` to it..
>> * Otherwise, I am stumped.. the test case isn't architecture-specific.. it calls two methods (one of which is annotated as an intrinsic..) and expects them to return the same value.. i.e. Java and Intrinsic version should behave the same..
>> * Only thing I can think of.. The ARM implementation took some shortcuts in the name of optimization.
This can be entirely valid if the code calling the intrinsics should never get some specific value (-ranges), i.e. the test's RNG should be further restricted..
>> * Otherwise.. is it possible it's a bug in the ARM intrinsic?

This caused a regression: [JDK-8372703](https://bugs.openjdk.org/browse/JDK-8372703). @vpaprotsk Could you please have a look? Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3588349196

From mbaesken at openjdk.org Fri Nov 28 08:27:52 2025
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Fri, 28 Nov 2025 08:27:52 GMT
Subject: RFR: 8372685: C2 SuperWord: wrong requires in test after JDK-8371146
In-Reply-To:
References:
Message-ID:

On Thu, 27 Nov 2025 15:43:49 GMT, Emanuel Peter wrote:

> I leave it up to you if you want to file an RFE for the error message. I don't have the expertise on Windows nor on GC.

@xmas92, @jsikstro what do you think?
Is this about the 'ZGC requires Windows version 1803 or later' message that surprised us a little bit because we see it on Windows Server 2016, but the 1803 looks like it refers to some update of good old Win 10.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28537#issuecomment-3588354070

From thartmann at openjdk.org Fri Nov 28 08:32:24 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 28 Nov 2025 08:32:24 GMT
Subject: RFR: 8372720: Problem list compiler/arguments/TestCodeEntryAlignment.java
Message-ID:

Let's problem list the test until [JDK-8372703](https://bugs.openjdk.org/browse/JDK-8372703) is fixed.
Thanks, Tobias ------------- Commit messages: - Fixed bug number - JDK-8372720 Changes: https://git.openjdk.org/jdk/pull/28550/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28550&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372720 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28550.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28550/head:pull/28550 PR: https://git.openjdk.org/jdk/pull/28550 From mchevalier at openjdk.org Fri Nov 28 08:34:48 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 28 Nov 2025 08:34:48 GMT Subject: RFR: 8372720: Problem list compiler/arguments/TestCodeEntryAlignment.java In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 08:21:16 GMT, Tobias Hartmann wrote: > Let's problem list the test until [JDK-8372703](https://bugs.openjdk.org/browse/JDK-8372703) is fixed. > > Thanks, > Tobias Marked as reviewed by mchevalier (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28550#pullrequestreview-3517932179 From epeter at openjdk.org Fri Nov 28 08:41:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Nov 2025 08:41:47 GMT Subject: RFR: 8372720: Problem list compiler/arguments/TestCodeEntryAlignment.java In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 08:21:16 GMT, Tobias Hartmann wrote: > Let's problem list the test until [JDK-8372703](https://bugs.openjdk.org/browse/JDK-8372703) is fixed. > > Thanks, > Tobias LGTM ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28550#pullrequestreview-3517953366 From jsikstro at openjdk.org Fri Nov 28 08:48:01 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 28 Nov 2025 08:48:01 GMT Subject: RFR: 8372685: C2 SuperWord: wrong requires in test after JDK-8371146 In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 08:25:20 GMT, Matthias Baesken wrote: >> @MBaesken Thanks for the additional information! >> >> I leave it up to you if you want to file an RFE for the error message. I don't have the expertise on Windows nor on GC. >> >> Can you confirm that this change fixes your issue though? Because I could not reproduce the issue on my machine, so I'm relying on you here ;) > >> I leave it up to you if you want to file an RFE for the error message. I don't have the expertise on Windows nor on GC. > > @xmas92 , @jsikstro what do you think ? > Is this about the 'ZGC requires Windows version 1803 or later' message that surprised us a little bit because we see it on Windows server 2016 , but the 1803 looks like it refers to some update of good old Win 10 . @MBaesken 1803 seems to refer to both a Windows 10 and Windows Server 2016 (internal) release number/version. Here's a version list of the old semi-annual releases of Windows Server 2016: https://en.wikipedia.org/wiki/Windows_Server#Semi-Annual_releases_(discontinued) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28537#issuecomment-3588402869 From thartmann at openjdk.org Fri Nov 28 08:48:57 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 28 Nov 2025 08:48:57 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: References: Message-ID: On Thu, 21 Aug 2025 15:21:48 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a quick patch for the assert failure in superword truncation with CastII. 
I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Update comment for constraint casts Sounds good, thanks for the update! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26827#issuecomment-3588406326 From thartmann at openjdk.org Fri Nov 28 08:49:10 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 28 Nov 2025 08:49:10 GMT Subject: RFR: 8372720: Problem list compiler/arguments/TestCodeEntryAlignment.java In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 08:21:16 GMT, Tobias Hartmann wrote: > Let's problem list the test until [JDK-8372703](https://bugs.openjdk.org/browse/JDK-8372703) is fixed. > > Thanks, > Tobias Thanks for the reviews, Marc and Emanuel! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28550#issuecomment-3588400992 From thartmann at openjdk.org Fri Nov 28 08:49:10 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 28 Nov 2025 08:49:10 GMT Subject: Integrated: 8372720: Problem list compiler/arguments/TestCodeEntryAlignment.java In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 08:21:16 GMT, Tobias Hartmann wrote: > Let's problem list the test until [JDK-8372703](https://bugs.openjdk.org/browse/JDK-8372703) is fixed. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 70b4eb24 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/70b4eb249eb4bad727f83e0b004a0ce481208726 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8372720: Problem list compiler/arguments/TestCodeEntryAlignment.java Reviewed-by: mchevalier, epeter ------------- PR: https://git.openjdk.org/jdk/pull/28550 From duke at openjdk.org Fri Nov 28 08:49:57 2025 From: duke at openjdk.org (Zihao Lin) Date: Fri, 28 Nov 2025 08:49:57 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v13] In-Reply-To: References: Message-ID: <2oDqUvcW_3hJRPRri4uttpkgfeCovL4ZZkcI0R1bB1A=.173b3a58-d0f1-4b29-94d1-77b0a350c790@github.com> On Thu, 27 Nov 2025 09:54:39 GMT, Roland Westrelin wrote: >> Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix test failed > > src/hotspot/share/opto/escape.cpp line 4488: > >> 4486: const TypePtr* adr_type = proj->adr_type(); >> 4487: const TypePtr* new_adr_type = tinst->add_offset(adr_type->offset()); >> 4488: if (adr_type != new_adr_type) { > > Can you explain that change? Did something go wrong in a merge? 
Here is a assert failed command: main -XX:CompileCommand=dontinline,compiler.arraycopy.TestArrayCopyMemoryChain::test* -Xbatch compiler.arraycopy.TestArrayCopyMemoryChain reason: User specified action: run main/othervm -XX:CompileCommand=dontinline,compiler.arraycopy.TestArrayCopyMemoryChain::test* -Xbatch compiler.arraycopy.TestArrayCopyMemoryChain started: Fri Nov 28 16:36:37.189 CST 2025 Mode: othervm [/othervm specified] Process id: 16782 finished: Fri Nov 28 16:36:37.350 CST 2025 elapsed time (seconds): 0.161 configuration: STDOUT: CompileCommand: dontinline compiler/arraycopy/TestArrayCopyMemoryChain.test* bool dontinline = true # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/Users/linzihao/Desktop/jdk-dev/src/hotspot/share/opto/escape.cpp:4184), pid=16782, tid=26115 # assert(result != nullptr) failed: new projection should have been allocated # # JRE version: OpenJDK Runtime Environment (26.0) (fastdebug build 26-internal-adhoc.linzihao.jdk-dev) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 26-internal-adhoc.linzihao.jdk-dev, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64) # No core dump will be written. Core dumps have been disabled. 
To enable core dumping, try "ulimit -c unlimited" before starting Java again # # An error report file with more information is saved as: # /Users/linzihao/Desktop/jdk-dev/build/macosx-aarch64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_arraycopy_TestArrayCopyMemoryChain_java/scratch/0/hs_err_pid16782.log # # Compiler replay data is saved as: # /Users/linzihao/Desktop/jdk-dev/build/macosx-aarch64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_arraycopy_TestArrayCopyMemoryChain_java/scratch/0/replay_pid16782.log # # If you would like to submit a bug report, please visit: # https://bugreport.java.com/bugreport/crash.jsp # ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2570853694 From duke at openjdk.org Fri Nov 28 08:57:53 2025 From: duke at openjdk.org (Zihao Lin) Date: Fri, 28 Nov 2025 08:57:53 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v13] In-Reply-To: <2oDqUvcW_3hJRPRri4uttpkgfeCovL4ZZkcI0R1bB1A=.173b3a58-d0f1-4b29-94d1-77b0a350c790@github.com> References: <2oDqUvcW_3hJRPRri4uttpkgfeCovL4ZZkcI0R1bB1A=.173b3a58-d0f1-4b29-94d1-77b0a350c790@github.com> Message-ID: <2wAnS7drj_r3dqsy5CEF9vBG40KizHsQDOxMeNymwhw=.9bc29879-eead-401c-b750-814592feff63@github.com> On Fri, 28 Nov 2025 08:47:13 GMT, Zihao Lin wrote: >> src/hotspot/share/opto/escape.cpp line 4488: >> >>> 4486: const TypePtr* adr_type = proj->adr_type(); >>> 4487: const TypePtr* new_adr_type = tinst->add_offset(adr_type->offset()); >>> 4488: if (adr_type != new_adr_type) { >> >> Can you explain that change? Did something go wrong in a merge? 
> > Here is a assert failed > > command: main -XX:CompileCommand=dontinline,compiler.arraycopy.TestArrayCopyMemoryChain::test* -Xbatch compiler.arraycopy.TestArrayCopyMemoryChain > reason: User specified action: run main/othervm -XX:CompileCommand=dontinline,compiler.arraycopy.TestArrayCopyMemoryChain::test* -Xbatch compiler.arraycopy.TestArrayCopyMemoryChain > started: Fri Nov 28 16:36:37.189 CST 2025 > Mode: othervm [/othervm specified] > Process id: 16782 > finished: Fri Nov 28 16:36:37.350 CST 2025 > elapsed time (seconds): 0.161 > configuration: > STDOUT: > CompileCommand: dontinline compiler/arraycopy/TestArrayCopyMemoryChain.test* bool dontinline = true > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/Users/linzihao/Desktop/jdk-dev/src/hotspot/share/opto/escape.cpp:4184), pid=16782, tid=26115 > # assert(result != nullptr) failed: new projection should have been allocated > # > # JRE version: OpenJDK Runtime Environment (26.0) (fastdebug build 26-internal-adhoc.linzihao.jdk-dev) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 26-internal-adhoc.linzihao.jdk-dev, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64) > # No core dump will be written. Core dumps have been disabled. 
To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /Users/linzihao/Desktop/jdk-dev/build/macosx-aarch64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_arraycopy_TestArrayCopyMemoryChain_java/scratch/0/hs_err_pid16782.log
> #
> # Compiler replay data is saved as:
> # /Users/linzihao/Desktop/jdk-dev/build/macosx-aarch64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_arraycopy_TestArrayCopyMemoryChain_java/scratch/0/replay_pid16782.log
> #
> # If you would like to submit a bug report, please visit:
> # https://bugreport.java.com/bugreport/crash.jsp
> #

The assert failed because `find_inst_mem()` skipped an Initialize memory projection whose `adr_type` was still the general slice, then tried to fetch the instance-specific projection from `_node_map` and got nullptr. That happens when a precise `NarrowMemProj` already exists: the code doesn't create a new one and also never records the mapping, so later lookup fails. The fix records the mapping even if the precise `NarrowMemProj` is already present (not newly created).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2570873232

From pminborg at openjdk.org Fri Nov 28 09:03:49 2025
From: pminborg at openjdk.org (Per Minborg)
Date: Fri, 28 Nov 2025 09:03:49 GMT
Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting
In-Reply-To:
References:
Message-ID:

On Thu, 27 Nov 2025 16:16:05 GMT, Chen Liang wrote:

> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util.
>
> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField.
>
> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test.
>
> Paging @minborg who requested Optional folding for review.
>
> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches.

I really like this one!

I wonder if we could enable the new annotation `@TrustFinalFields` on package level as well so we could get rid of _all_ the special handling in `ciField.cpp`. I am not sure this is the best way to do it but it would perhaps be possible to annotate the `package-info.java` file. For example in `java.lang.invoke.package-info.java`:

    @TrustFinalFields
    package java.lang.invoke;

Is there a better way to do it?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3588443540

From duke at openjdk.org Fri Nov 28 09:15:45 2025
From: duke at openjdk.org (Zihao Lin)
Date: Fri, 28 Nov 2025 09:15:45 GMT
Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v14]
In-Reply-To:
References:
Message-ID:

> This patch removes the slice parameter from LoadNode::make
>
> I have also done more work to remove the slice parameter from StoreNode::make.
>
> Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805
>
> Hi team, I am new, I'd appreciate any guidance. Thanks a lot!
Zihao Lin has updated the pull request incrementally with one additional commit since the last revision:

  remove adr_type from graphKit

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24258/files
  - new: https://git.openjdk.org/jdk/pull/24258/files/35ec9135..18714dae

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=12-13

  Stats: 62 lines in 6 files changed: 0 ins; 34 del; 28 mod
  Patch: https://git.openjdk.org/jdk/pull/24258.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258

PR: https://git.openjdk.org/jdk/pull/24258

From galder at openjdk.org Fri Nov 28 09:23:58 2025
From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=)
Date: Fri, 28 Nov 2025 09:23:58 GMT
Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v2]
In-Reply-To: <4vSKAtr0tUG0V193gIvnEFdHm18ZhqflVAwk-09IVQ0=.081806f5-6303-4b4f-975d-7c85427ccae5@github.com>
References: <4vSKAtr0tUG0V193gIvnEFdHm18ZhqflVAwk-09IVQ0=.081806f5-6303-4b4f-975d-7c85427ccae5@github.com>
Message-ID:

On Thu, 20 Nov 2025 04:01:09 GMT, Eric Fang wrote:

>> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common.
>>
>> If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code, otherwise code will be generated to extend or narrow the mask. This IR node is not free whether or not it generates code, because it may block some optimizations. For example:
>> 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevented the following optimization: `(VectorStoreMask (VectorLoadMask x)) => (x)`
>> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.
>>
>> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast.
>>
>> The general idea of this PR is introducing an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first non `VectorMaskCastNode`.
>>
>> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain.
>>
>> Current optimizations related to `VectorMaskCastNode` include:
>> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
>> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242.
>>
>> This PR does the following optimizations:
>> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct.
>> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vect...
>
> Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision:
>
> - Don't read and write the same memory in the JMH benchmarks
> - Merge branch 'master' into JDK-8370863-mask-cast-opt
> - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns
>
> `VectorMaskCastNode` is used to cast a vector mask from one type to
> another type. The cast may be generated by calling the vector API `cast`
> or generated by the compiler. For example, some vector mask operations
> like `trueCount` require the input mask to be integer types, so for
> floating point type masks, the compiler will cast the mask to the
> corresponding integer type mask automatically before doing the mask
> operation. This kind of cast is very common.
>
> If the vector element size is not changed, the `VectorMaskCastNode`
> doesn't generate code, otherwise code will be generated to extend or narrow
> the mask. This IR node is not free whether or not it generates code,
> because it may block some optimizations. For example:
> 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`
> The middle `VectorMaskCast` prevented the following optimization:
> `(VectorStoreMask (VectorLoadMask x)) => (x)`
> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which
> blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.
>
> In these IR patterns, the value of the input `x` is not changed, so we
> can safely do the optimization. But if the input value is changed, we
> can't eliminate the cast.
>
> The general idea of this PR is introducing an `uncast_mask` helper
> function, which can be used to uncast a chain of `VectorMaskCastNode`,
> like the existing `Node::uncast(bool)` function. The function returns
> the first non `VectorMaskCastNode`.
>
> The intended use case is when the IR pattern to be optimized may
> contain one or more consecutive `VectorMaskCastNode` and this does not
> affect the correctness of the optimization.
Then this function can be > called to eliminate the `VectorMaskCastNode` chain. > > Current optimizations related to `VectorMaskCastNode` include: > 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. > 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (... Nice improvement @erifan, just some small comments from me src/hotspot/share/opto/vectornode.cpp line 1056: > 1054: // x remains to be a bool vector with no changes. > 1055: // This function can be used to eliminate the VectorMaskCast in such patterns. > 1056: Node* VectorNode::uncast_mask(Node* n) { Could this be a static method instead? test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastIdentityTest.java line 57: > 55: applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"}) > 56: public static int testTwoCastToDifferentType() { > 57: // The types before and after the two casts are not the same, so the cast cannot be eliminated. Outdated comment. Also please expand assertion comments test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastIdentityTest.java line 79: > 77: applyIfCPUFeatureAnd = {"avx2", "true", "avx512", "false"}) > 78: public static int testTwoCastToDifferentType2() { > 79: // The types before and after the two casts are not the same, so the cast cannot be eliminated. Could you expand the documentation on the IR assertions? It's not immediately clear why with AVX-512 the cast remains but with AVX-2 it's removed. Also, this comment is outdated. test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java line 240: > 238: > 239: @Test > 240: @IR(counts = { IRNode.VECTOR_LONG_TO_MASK, "= 0", Could you add some assertion comments here as well to understand what causes the differences with different architectures? test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java line 260: > 258: > 259: @Test > 260: @IR(counts = { IRNode.VECTOR_LONG_TO_MASK, "= 0", Same here ------------- Changes requested by galder (Author). 
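The `uncast_mask` idea discussed in this review can be illustrated outside HotSpot with a toy model. The sketch below is not the C2 implementation: the `Node` class, the string opcodes, and the method names are hypothetical stand-ins, and it deliberately ignores the vector type and length checks that the real transformation has to make. It only shows the chain-skipping step and how a `VectorStoreMask`/`VectorLoadMask` pair could fold once the casts in between are looked through.

```java
public class MaskCastChain {
    /** Hypothetical toy IR node: an opcode name and at most one input. */
    public static final class Node {
        final String op;
        final Node in;

        public Node(String op, Node in) {
            this.op = op;
            this.in = in;
        }
    }

    /** Skips consecutive VectorMaskCast nodes and returns the first non-cast node. */
    public static Node uncastMask(Node n) {
        while (n != null && "VectorMaskCast".equals(n.op)) {
            n = n.in;
        }
        return n;
    }

    /** Models (VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => x. */
    public static Node simplifyStoreMask(Node n) {
        if (!"VectorStoreMask".equals(n.op)) {
            return n;
        }
        Node inner = uncastMask(n.in);
        if (inner != null && "VectorLoadMask".equals(inner.op)) {
            return inner.in; // the boolean vector x is unchanged by the casts
        }
        return n; // pattern does not match; leave the node alone
    }

    public static void main(String[] args) {
        Node x = new Node("BoolVector", null);
        Node chain = new Node("VectorStoreMask",
                new Node("VectorMaskCast",
                        new Node("VectorMaskCast",
                                new Node("VectorLoadMask", x))));
        System.out.println(simplifyStoreMask(chain) == x); // prints true
    }
}
```

Because `uncastMask` in this sketch touches no instance state, it is naturally a static helper, which is the shape the review comment on `VectorNode::uncast_mask` asks about.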
PR Review: https://git.openjdk.org/jdk/pull/28313#pullrequestreview-3518051437
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2570915091
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2570925650
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2570924373
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2570932229
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2570932750

From galder at openjdk.org Fri Nov 28 09:40:25 2025
From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=)
Date: Fri, 28 Nov 2025 09:40:25 GMT
Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax [v4]
In-Reply-To:
References:
Message-ID: <5P58y7mFExd-rdT_nGu_Ky0UG-vDGPRG2IycLX6xwIY=.403c2f90-1ab3-4096-80a7-b80d819d3ca9@github.com>

> Trivial cleanup to move tests out of a test class whose description does not match these tests

Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:

  Update test/hotspot/jtreg/compiler/gcbarriers/TestMinMaxLongLoopBarrier.java

  Co-authored-by: Emanuel Peter

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28385/files
  - new: https://git.openjdk.org/jdk/pull/28385/files/278d4bce..d023353f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28385&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28385&range=02-03

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/28385.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28385/head:pull/28385

PR: https://git.openjdk.org/jdk/pull/28385

From galder at openjdk.org Fri Nov 28 09:40:26 2025
From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=)
Date: Fri, 28 Nov 2025 09:40:26 GMT
Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax [v3]
In-Reply-To: <1BZ3cuwoLIBRdM_MfBcMh3IaBQ27Yl0LcPVKZVNJYxg=.c0c255aa-620f-44b7-8493-551201fa7ff7@github.com>
References: <1BZ3cuwoLIBRdM_MfBcMh3IaBQ27Yl0LcPVKZVNJYxg=.c0c255aa-620f-44b7-8493-551201fa7ff7@github.com>
Message-ID:

On Thu, 27 Nov 2025 12:21:35 GMT, Emanuel Peter wrote:

>> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Refactored to compiler.gcbarriers package
>
> test/hotspot/jtreg/compiler/gcbarriers/TestMinMaxLongLoopBarrier.java line 42:
>
>> 40: * @summary Test that MinL/MaxL nodes are removed when GC barriers in loop
>> 41: * @library /test/lib /
>> 42: * @run driver compiler.c2.irTests.TestMinMaxLongLoopBarrier
>
> I think you forgot to update the class path here. You should do this now that it is possible, it prevents errors with wrongly copied test class names ;)
> Suggestion:
>
> * @run driver ${test.main.class}

Good catch! Applied your suggestion as is

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28385#discussion_r2570996565

From shade at openjdk.org Fri Nov 28 09:44:53 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 28 Nov 2025 09:44:53 GMT
Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v10]
In-Reply-To:
References:
Message-ID:

On Fri, 28 Nov 2025 06:01:26 GMT, Jiangli Zhou wrote:

>> Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size.
>>
>> Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case!
>
> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision:
>
> Change to break before operators.

Oh man, the `pos`/`len` modifications in current code are confusing.
I scratched my head for quite a while trying to comprehend why `__ bind(MESG_BELOW_32_BLKS)` splits the `pos += 16` and `len -= 16`. On the surface, that just looks like a bug. But it looks that way because we do `initial_blocks_16_avx512` twice, do `pos += 16` twice, but only do the `len -= 32` after the second call. Which does not help if we shortcut after the first call.

In fact, I am not at all sure that checking `len < 32` _without_ modifying `len` beforehand does not break the assumptions downstream:

    initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, stack_offset);
    __ addl(pos, 16 * 16);
    __ cmpl(len, 32 * 16);
    __ jcc(Assembler::below, MESG_BELOW_32_BLKS);

Really, in these kinds of intrinsics, _you want_ to make sure `pos` and `len` updates are tightly bound together, otherwise these kinds of mistakes would keep happening. You will lose on code density a bit, but would have more readable and robust code.

Shouldn't it be like this?
diff --git a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp
index 1e728ffa279..a16e25b075d 100644
--- a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp
+++ b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp
@@ -3475,12 +3475,14 @@ void StubGenerator::aesgcm_avx512(Register in, Register len, Register ct, Regist
   initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, stack_offset);
   __ addl(pos, 16 * 16);
+  __ subl(len, 16 * 16);
+
   __ cmpl(len, 32 * 16);
   __ jcc(Assembler::below, MESG_BELOW_32_BLKS);
   initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, stack_offset + 16);
   __ addl(pos, 16 * 16);
-  __ subl(len, 32 * 16);
+  __ subl(len, 16 * 16);
   __ cmpl(len, 32 * 16);
   __ jcc(Assembler::below, NO_BIG_BLKS);
@@ -3491,24 +3493,27 @@ void StubGenerator::aesgcm_avx512(Register in, Register len, Register ct, Regist
   ghash16_encrypt_parallel16_avx512(in, out, ct, pos, avx512_subkeyHtbl, CTR_CHECK, rounds, key, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, true, true, false, false, false, ghashin_offset, aesout_offset, HashKey_32);
   __ addl(pos, 16 * 16);
+  __ subl(len, 16 * 16);
   ghash16_encrypt_parallel16_avx512(in, out, ct, pos, avx512_subkeyHtbl, CTR_CHECK, rounds, key, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, true, false, true, false, true, ghashin_offset + 16, aesout_offset + 16, HashKey_16);
   __ evmovdquq(AAD_HASHx, ZTMP4, Assembler::AVX_512bit);
   __ addl(pos, 16 * 16);
-  __ subl(len, 32 * 16);
+  __ subl(len, 16 * 16);
   __ jmp(ENCRYPT_BIG_BLKS_NO_HXOR);
   __ bind(ENCRYPT_BIG_NBLKS);
   ghash16_encrypt_parallel16_avx512(in, out, ct, pos, avx512_subkeyHtbl, CTR_CHECK, rounds, key, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, false, true, false, false, false, ghashin_offset, aesout_offset, HashKey_32);
   __ addl(pos, 16 * 16);
+  __ subl(len, 16 * 16);
+
   ghash16_encrypt_parallel16_avx512(in, out, ct, pos, avx512_subkeyHtbl, CTR_CHECK, rounds, key, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, false, false, true, true, true, ghashin_offset + 16, aesout_offset + 16, HashKey_16);
   __ movdqu(AAD_HASHx, ZTMP4);
   __ addl(pos, 16 * 16);
-  __ subl(len, 32 * 16);
+  __ subl(len, 16 * 16);
   __ bind(NO_BIG_BLKS);
   __ cmpl(len, 16 * 16);
@@ -3525,9 +3530,9 @@ void StubGenerator::aesgcm_avx512(Register in, Register len, Register ct, Regist
   ghash16_avx512(false, true, false, false, true, in, pos, avx512_subkeyHtbl, AAD_HASHx, SHUF_MASK, stack_offset, 16 * 16, 0, HashKey_16);
   __ addl(pos, 16 * 16);
+  __ subl(len, 16 * 16);
   __ bind(MESG_BELOW_32_BLKS);
-  __ subl(len, 16 * 16);
   gcm_enc_dec_last_avx512(len, in, pos, AAD_HASHx, SHUF_MASK, avx512_subkeyHtbl, ghashin_offset, HashKey_16, true, true);
   __ bind(GHASH_DONE);

-------------

PR Review: https://git.openjdk.org/jdk/pull/28363#pullrequestreview-3518173513

From epeter at openjdk.org Fri Nov 28 09:48:57 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 28 Nov 2025 09:48:57 GMT
Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax [v4]
In-Reply-To: <5P58y7mFExd-rdT_nGu_Ky0UG-vDGPRG2IycLX6xwIY=.403c2f90-1ab3-4096-80a7-b80d819d3ca9@github.com>
References: <5P58y7mFExd-rdT_nGu_Ky0UG-vDGPRG2IycLX6xwIY=.403c2f90-1ab3-4096-80a7-b80d819d3ca9@github.com>
Message-ID:

On Fri, 28 Nov 2025 09:40:25 GMT, Galder Zamarreño wrote:

>> Trivial cleanup to move tests out of a test class whose description does not match these tests
>
> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:
>
> Update test/hotspot/jtreg/compiler/gcbarriers/TestMinMaxLongLoopBarrier.java
>
> Co-authored-by: Emanuel Peter

Thanks for the updates and the cleanup in general :)

-------------

Marked as reviewed by epeter (Reviewer).
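As a side note on the JDK-8371864 review above, the point about keeping `pos` and `len` updates tightly bound can be sketched in plain Java. This is not the stub generator code and makes no claim about the actual fix: `ChunkedCursor`, its constants, and `consumeFullChunks` are hypothetical names used only to show the invariant `pos + len == total`, which holds at every exit as long as both cursors move in the same step.

```java
public class ChunkedCursor {
    public static final int BLOCK_BYTES = 16;               // one AES block
    public static final int CHUNK_BYTES = 16 * BLOCK_BYTES; // 16 blocks per step

    /**
     * Consumes whole 16-block chunks from an input of {@code total} bytes.
     * Returns {pos, len}: bytes consumed so far and bytes still remaining.
     */
    public static int[] consumeFullChunks(int total) {
        int pos = 0;
        int len = total;
        while (len >= CHUNK_BYTES) {
            // ... process bytes [pos, pos + CHUNK_BYTES) here ...
            pos += CHUNK_BYTES;
            len -= CHUNK_BYTES; // updated right next to pos, never deferred
        }
        return new int[] {pos, len};
    }

    public static void main(String[] args) {
        for (int total : new int[] {0, 255, 256, 300, 512, 1000}) {
            int[] r = consumeFullChunks(total);
            System.out.println(total + " -> pos=" + r[0] + ", len=" + r[1]);
        }
    }
}
```

If the `len` update were batched (for example, subtracting two chunks only after the second step), any early exit between the two steps would leave `len` describing data that `pos` has already moved past, which is the kind of mismatch the review describes.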
PR Review: https://git.openjdk.org/jdk/pull/28385#pullrequestreview-3518185744 From mbaesken at openjdk.org Fri Nov 28 10:15:16 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 28 Nov 2025 10:15:16 GMT Subject: RFR: 8372685: C2 SuperWord: wrong requires in test after JDK-8371146 In-Reply-To: References: Message-ID: <3861jYG-DmA_XdGaE8Zj5GlutdxiXc4t-jjRnixANF4=.7119ba1e-8af8-4aa6-9bb2-52c92f1b3d91@github.com> On Fri, 28 Nov 2025 08:25:20 GMT, Matthias Baesken wrote: >> @MBaesken Thanks for the additional information! >> >> I leave it up to you if you want to file an RFE for the error message. I don't have the expertise on Windows nor on GC. >> >> Can you confirm that this change fixes your issue though? Because I could not reproduce the issue on my machine, so I'm relying on you here ;) > >> I leave it up to you if you want to file an RFE for the error message. I don't have the expertise on Windows nor on GC. > > @xmas92 , @jsikstro what do you think ? > Is this about the 'ZGC requires Windows version 1803 or later' message that surprised us a little bit because we see it on Windows server 2016 , but the 1803 looks like it refers to some update of good old Win 10 . > @MBaesken 1803 seems to refer to both a Windows 10 and Windows Server 2016 (internal) release number/version. Here's a version list of the old semi-annual releases of Windows Server 2016: https://en.wikipedia.org/wiki/Windows_Server#Semi-Annual_releases_(discontinued) Thanks ! The wikipedia says 'semi-annual releases do not include any desktop environments. 
Instead, they are restricted to the Nano Server configuration installed in a [Docker](https://en.wikipedia.org/wiki/Docker_(software)) [container](https://en.wikipedia.org/wiki/Containerization_(computing)),[[17]](https://en.wikipedia.org/wiki/Windows_Server#cite_note-thomasmaurer-17)[[29]](https://en.wikipedia.org/wiki/Windows_Server#cite_note-:0-29) and the Server Core configuration, licensed only to serve as a container host' so this sounds like it is a rather special 'flavor' of Win Server 2016. So maybe it is no wonder that we get the warning and have no VirtualAlloc2 on our Win Server 2016 test machine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28537#issuecomment-3588687610 From mhaessig at openjdk.org Fri Nov 28 10:18:56 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 28 Nov 2025 10:18:56 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v4] In-Reply-To: References: Message-ID: <0i5fmZUdyfKcYATTCm9RTnMK939UqCcDsJFIkNzLtn8=.35c27371-ced6-4e25-908b-595484986c25@github.com> On Thu, 27 Nov 2025 19:14:05 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. > > Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: > > fix rename Thank you for the thorough investigation and the fixes. Filing issues and concentrating on the case at hand sounds good to me. I have a few more comments on the test. test/hotspot/jtreg/compiler/c2/TestUnlockNodeNullMemprof.java line 37: > 35: public static void main(String[] args) { > 36: int[] a = new int[0]; // test valid only when size is 0. 
> 37: for (int i = 0; i < Integer.valueOf(10000); i++) Suggestion: for (int i = 0; i < Integer.valueOf(10000); i++) // test only valid with boxed loop limit This surprised me, so I would appreciate the comment, but feel free to leave it. test/hotspot/jtreg/compiler/c2/TestUnlockNodeNullMemprof.java line 39: > 37: for (int i = 0; i < Integer.valueOf(10000); i++) > 38: try { > 39: test(a, 0); Suggestion: test(a); See below. test/hotspot/jtreg/compiler/c2/TestUnlockNodeNullMemprof.java line 44: > 42: } > 43: > 44: static void test(int[] a, int invar) { Suggestion: static void test(int[] a) { The test works without `invar`. ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28432#pullrequestreview-3518247186 PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2571147027 PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2571140040 PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2571082704 From mhaessig at openjdk.org Fri Nov 28 10:24:56 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 28 Nov 2025 10:24:56 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v11] In-Reply-To: References: Message-ID: <7yVPVVb69q3amaeSWocd9dRAosVwxXZn6LpgJ02V2JE=.fca22171-2387-4d19-9806-3f33c4e2814f@github.com> On Fri, 14 Nov 2025 21:50:27 GMT, Tobias Hotz wrote: >> This PR improves the value of integer division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. 
>> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Simplify test, add temporary @IR rule for testLongRange and improve comments Sorry for dropping the ball. The changes look good. I'll start another round of testing. ------------- PR Review: https://git.openjdk.org/jdk/pull/26143#pullrequestreview-3518341196 From epeter at openjdk.org Fri Nov 28 10:25:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Nov 2025 10:25:55 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v4] In-Reply-To: <0i5fmZUdyfKcYATTCm9RTnMK939UqCcDsJFIkNzLtn8=.35c27371-ced6-4e25-908b-595484986c25@github.com> References: <0i5fmZUdyfKcYATTCm9RTnMK939UqCcDsJFIkNzLtn8=.35c27371-ced6-4e25-908b-595484986c25@github.com> Message-ID: On Fri, 28 Nov 2025 10:16:06 GMT, Manuel Hässig wrote: >> Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: >> >> fix rename > > test/hotspot/jtreg/compiler/c2/TestUnlockNodeNullMemprof.java line 37: > >> 35: public static void main(String[] args) { >> 36: int[] a = new int[0]; // test valid only when size is 0. >> 37: for (int i = 0; i < Integer.valueOf(10000); i++) > > Suggestion: > > for (int i = 0; i < Integer.valueOf(10000); i++) // test only valid with boxed loop limit > > This surprised me, so I would appreciate the comment, but feel free to leave it. Is that maybe because it delays the discovery of the constant until CCP? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2571167713 From epeter at openjdk.org Fri Nov 28 10:25:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Nov 2025 10:25:51 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v4] In-Reply-To: References: Message-ID: <1Agol3OtcCV7ilUBseuyB3DMWXfinb4bTBnRafLtfS0=.d4081ee2-4495-471e-85e2-ffcc2f825d21@github.com> On Thu, 27 Nov 2025 19:14:05 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. > > Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: > > fix rename test/hotspot/jtreg/compiler/c2/TestUnlockNodeNullMemprof.java line 29: > 27: * @summary Do not segfault while adding node to IGVN worklist > 28: * > 29: * @run main/othervm -Xbatch compiler.c2.TestUnlockNodeNullMemprof Suggestion: * @run main/othervm -Xbatch ${test.main.class} Possible since a recent JTREG update. This avoids copying the wrong class name by mistake ;) Also: I wonder if we should also have a run without any flags? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2571167814 From mhaessig at openjdk.org Fri Nov 28 10:27:52 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 28 Nov 2025 10:27:52 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v11] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 21:50:27 GMT, Tobias Hotz wrote: >> This PR improves the value of integer division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. 
Some special logic is required for MIN_INT / -1, though, as this is a special case. >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Simplify test, add temporary @IR rule for testLongRange and improve comments Also, I would suggest that we integrate this only after the JDK 26 branch is forked on December 4th. This has interactions with a bunch of PRs that are in flight and it would be good to give this some time to bake on mainline before being released. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26143#issuecomment-3588728061 From shade at openjdk.org Fri Nov 28 10:53:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 28 Nov 2025 10:53:55 GMT Subject: RFR: 8372188: AArch64: Generate atomic match rules from M4 stencils In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 14:54:06 GMT, Aleksey Shipilev wrote: > Current atomic match rules are all over the place in AArch64: > - CAE and weak CAS rules are generated with the help of `cas.m4`, and then are supposed to be copy-pasted (?) into `aarch64.ad`. I did it about 20 times when fixing [JDK-8372154](https://bugs.openjdk.org/browse/JDK-8372154), gets tedious very quickly. > - Strong CAS and get-and-set rules are still in the same section of `aarch64.ad`, and are written by hand. Yet, those can be automatically generated from M4 stencils as well. > > This PR cleans that up by moving all these rules into a separate `.ad` file, which one can cleanly re-generate by invoking `m4 aarch64_atomic_ad.m4 > aarch64_atomic.ad`. 
The meat of the change is `aarch64_atomic.m4`, everything else is either generated from it, or removed in favor of auto-generated code. There should be no semantic change, as I attempted to move the rules mostly verbatim, only changing non-semantic stuff like match rule names and some formats. > > Testing: > - [x] Eyeballing match rules before/after > - [x] Linux AArch64 server fastdebug, `hotspot_compiler` > - [x] Linux AArch64 server fastdebug, `tier1` > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, jcstress run Thanks for reviews! All tests are passing (there are some known AArch64 failures). But it is a large patch, so I will integrate this on Monday to avoid dealing with any breakages over the weekend. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28538#issuecomment-3588824144 From alanb at openjdk.org Fri Nov 28 10:55:54 2025 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 28 Nov 2025 10:55:54 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting In-Reply-To: <1tazzYHm78XLDovV11RAQt2W-ujENi4b_frOa87Jv14=.45b6d8a1-cb76-49ac-8048-429916bc9c6c@github.com> References: <1tazzYHm78XLDovV11RAQt2W-ujENi4b_frOa87Jv14=.45b6d8a1-cb76-49ac-8048-429916bc9c6c@github.com> Message-ID: On Thu, 27 Nov 2025 18:58:59 GMT, Chen Liang wrote: >> src/java.base/share/classes/jdk/internal/vm/annotation/TrustFinalFields.java line 61: >> >>> 59: /// fields in classes specified by this annotation. >>> 60: /// >>> 61: /// This annotation is only recognized on privileged code and is ignored elsewhere. >> >> "privileged code" hints of protection domains, permissions or security manager. Some of the annotations are limited to classes defined by the boot loader, is it the case here too? > > I took this sentence from `@AOTSafeClassInitializer`. 
The term "privileged" comes from this variable in `classFileParser.cpp`: > https://github.com/openjdk/jdk/blob/d94c52ccf2fed3fc66d25a34254c9b581c175fa1/src/hotspot/share/classfile/classFileParser.cpp#L1818-L1820 > > The other annotations have this note, which seems incorrect from the hotspot excerpt: > > @implNote > This annotation only takes effect for fields of classes loaded by the boot > loader. Annotations on fields of classes loaded outside of the boot loader > are ignored. > > > This behavior seems to be originally changed by 6964a690ed9a23d4c0692da2dfbced46e1436355, referring to an inaccessible issue. > > What should I do with this? Should I leave this as-is and create a separate patch to update this comment for vm.annotation annotations, or fix this first and have the separate patch fix other annotations later? For this PR, you could just change the last sentence to say that the annotation is only effective for classes defined by the boot class loader or platform class loader. A follow-up PR could propose changes to the other annotation descriptions. As regards background, one of the significant changes in JDK 9 was that java.* modules could be mapped to the platform class loader without giving them "all permission" in the security manager execution mode. If you see JBS issues or comments speaking of "de-privileging" then it's likely related to changes that "moved" modules that were originally mapped to the boot class loader to the platform class loader. Now that the security manager execution mode is gone, we no longer have to deal with these issues. 
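For readers following along, the phrase "classes defined by the boot class loader or platform class loader" can be pictured at the Java level with a minimal sketch. This is an illustration only, not the check the VM performs (that lives in `classFileParser.cpp`), and the class and method names `DefiningLoaderCheck` and `definedByBootOrPlatformLoader` are hypothetical. It assumes a runtime such as HotSpot where `Class.getClassLoader()` reports the boot loader as `null`.

```java
public class DefiningLoaderCheck {
    /**
     * Returns true if cls was defined by the boot loader (reported as null here)
     * or by the platform class loader. Hypothetical helper for illustration.
     */
    static boolean definedByBootOrPlatformLoader(Class<?> cls) {
        ClassLoader loader = cls.getClassLoader();
        return loader == null || loader == ClassLoader.getPlatformClassLoader();
    }

    public static void main(String[] args) {
        // java.lang.String lives in java.base, which is defined by the boot loader.
        System.out.println(definedByBootOrPlatformLoader(String.class));
        // This class is loaded from the class path, so neither loader defines it.
        System.out.println(definedByBootOrPlatformLoader(DefiningLoaderCheck.class));
    }
}
```

Application classes fall outside both loaders, which matches the statement that the annotation would be ignored for them.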
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2571257172 From roland at openjdk.org Fri Nov 28 10:57:31 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 28 Nov 2025 10:57:31 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected Message-ID: Crash occurs because a `MergeMem` node references itself: 608 MergeMem === _ 1 608 1 1 1 1 1 1 1 1 1 1 878 [[ 877 878 608 420 597 ]] { - - - - - - - - - - N878:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !orig=[524] !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) ``` Before IGVN, that part of the stream is: 522 Region === 522 604 521 [[ 522 538 523 524 525 526 527 528 529 530 531 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) 524 Phi === 522 608 464 [[ 588 581 564 546 564 559 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) 538 If === 522 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) 539 IfTrue === 538 [[ 553 547 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) 540 IfFalse === 538 [[ 548 546 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) 553 If === 539 535 [[ 554 555 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) 554 IfTrue === 553 [[ 562 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) 555 IfFalse === 553 [[ 548 559 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) 548 Region === 548 _ 540 555 [[ 548 562 561 563 564 565 566 567 568 569 570 571 572 573 574 575 576 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:88 (line 60) 564 Phi === 548 _ 524 524 [[ 581 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:85 (line 61) 562 Region === 562 548 554 [[ 562 600 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 596 ]] #reducible !jvms: 
TestDeadLoopAtMergeMem::test @ bci:90 (line 62) 581 Phi === 562 564 524 [[ 420 597 610 608 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) 608 MergeMem === _ 1 581 1 1 1 1 1 1 1 1 1 1 588 [[ 524 ]] { - - - - - - - - - - N588:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) 522 is a loop head, 604 is the backedge. The loop becomes unreachable during IGVN. The loop body above is transformed to: 538 If === 604 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) 539 IfTrue === 538 [[ 562 547 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) 540 IfFalse === 538 [[ 562 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) 562 Region === 562 540 539 [[ 562 600 596 878 592 876 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) 876 Phi === 562 608 876 [[ 877 876 597 420 608 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !orig=[581] !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) 608 MergeMem === _ 1 876 1 1 1 1 1 1 1 1 1 1 878 [[ 876 878 ]] { - - - - - - - - - - N878:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !orig=[524] !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) ``` That part of the graph is dead but still being updated. It is matched by `PhiNode::try_clean_memory_phi()` and the result is that 608 reference itself. I think all that is missing here is a check that if the `Phi` is transformed a self loop is not created. Other `Phi` transformations do that. 
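The kind of guard described above can be pictured with a toy graph. This is a sketch under stated assumptions, not the actual patch and not C2 code: `Node`, `wouldCreateSelfLoop` and `replaceIfSafe` are hypothetical names, and the graph only mirrors the shape of the dead subgraph in the dump (Phi 876 and MergeMem 608 feeding each other).

```java
import java.util.ArrayList;
import java.util.List;

public class SelfLoopCheck {
    /** Toy stand-in for a C2 node: a label plus an ordered list of inputs. */
    static final class Node {
        final String label;
        final List<Node> inputs = new ArrayList<>();

        Node(String label, Node... ins) {
            this.label = label;
            inputs.addAll(List.of(ins));
        }
    }

    /** True if replacing every use of phi by replacement would leave replacement as its own input. */
    static boolean wouldCreateSelfLoop(Node phi, Node replacement) {
        return replacement == phi || replacement.inputs.contains(phi);
    }

    /** Rewires users from phi to replacement, but bails out instead of creating a self loop. */
    static boolean replaceIfSafe(Node phi, Node replacement, List<Node> users) {
        if (wouldCreateSelfLoop(phi, replacement)) {
            return false;
        }
        for (Node user : users) {
            user.inputs.replaceAll(in -> in == phi ? replacement : in);
        }
        return true;
    }

    public static void main(String[] args) {
        // Mirrors the dead subgraph: Phi 876 takes MergeMem 608 and itself as inputs,
        // and MergeMem 608 takes Phi 876 as an input.
        Node mergeMem = new Node("608 MergeMem");
        Node phi = new Node("876 Phi", mergeMem);
        phi.inputs.add(phi);
        mergeMem.inputs.add(phi);

        boolean replaced = replaceIfSafe(phi, mergeMem, List.of(mergeMem));
        System.out.println("replaced = " + replaced);
        System.out.println("608 is its own input = " + mergeMem.inputs.contains(mergeMem));
    }
}
```

In this toy model the replacement is refused, so 608 never ends up listed among its own inputs. Without the guard, rewiring the Phi's users would produce exactly the self-referencing `MergeMem` shown at the top of the report.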
------------- Commit messages: - test - fix Changes: https://git.openjdk.org/jdk/pull/28554/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28554&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371464 Stats: 90 lines in 3 files changed: 83 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/28554.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28554/head:pull/28554 PR: https://git.openjdk.org/jdk/pull/28554 From alanb at openjdk.org Fri Nov 28 10:59:48 2025 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 28 Nov 2025 10:59:48 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 19:08:40 GMT, Chen Liang wrote: >> src/hotspot/share/ci/ciField.cpp line 220: >> >>> 218: return false; >>> 219: // Explicit opt-in from system classes >>> 220: if (holder->trust_final_fields()) >> >> This is definitely nicer than listing specific classes. It would be nicer again once we can make this exceptions go away. > > True, this occupies one of the 16 precious instance klass bits in runtime. I wish we can derive this from our final means final restrictions, but their setup is to permit use-sites to migrate more easily, and is harder for declaration sites to deduce if a declaration is easier to be permitted. We can consider blanket-trust when the JVM uses `--illegal-final-field-mutation=deny` without additional `--enable-final-field-mutation`. This would be the equivalent of running with -XX:+TrustFinalNonStaticFields, which would be nice, but there would be performance surprises as soon as you enable final field mutation for any module (and likely ALL-UNNAMED). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2571272623 From alanb at openjdk.org Fri Nov 28 11:08:56 2025 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 28 Nov 2025 11:08:56 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting In-Reply-To: References: Message-ID: <27R9PsHG0Jn3Ov39a-G9IXvKoEG01P0mOKMfVVrF4S4=.82593db1-0844-428f-9eff-af1529ff9663@github.com> On Fri, 28 Nov 2025 09:00:46 GMT, Per Minborg wrote: > I wonder if we could enable the new annotation `@TrustFinalFields` on package level as well so we could get rid of _all_ the special handling in `ciField.cpp`. I am not sure this is the best way to do it but it would perhaps be possible to annotate the `package-info.java` file. For example in `java.lang.invoke.package-info.java`: The VM doesn't read/parse the package-info class. It's really only used from APIs to read the annotations. In any case, the long-term goal needs to be to remove all special handling. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3588887344 From mhaessig at openjdk.org Fri Nov 28 12:17:51 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 28 Nov 2025 12:17:51 GMT Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v2] In-Reply-To: References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: On Wed, 26 Nov 2025 06:31:22 GMT, Dean Long wrote: >> The problem is C2 is throwing an exception and then deoptimizing, and the -XX:+VerifyStack logic expects the stack to be empty, match the "before" state if the reexecute flag is set, or match the "after" state. C2 is using the "before" state, so for correctness it also needs to set the reexecute flag. >> >> I played around with other approaches, like: >> 1. setting the stack to empty >> 2. 
adding all the bytecodes that can throw to the list in AbstractInterpreter::bytecode_should_reexecute() >> 3. always setting the reexecute flag in add_safepoint_edges() if must_throw is set >> but in the end I decided to go with the minimal localized low-risk change. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > remove extra spaces Thank you for fixing this, @dean-long. It looks good to me. test/hotspot/jtreg/compiler/exceptions/TestAccessErrorInCatch.java line 26: > 24: /* > 25: * @test > 26: * @bug 8367002 Suggestion: * @bug 8367002 8370766 Perhaps we should add this bug to the test, since you modified it. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28486#pullrequestreview-3518732734 PR Review Comment: https://git.openjdk.org/jdk/pull/28486#discussion_r2571470700 From shade at openjdk.org Fri Nov 28 12:27:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 28 Nov 2025 12:27:07 GMT Subject: RFR: 8371768: AArch64: test/hotspot/jtreg/compiler/loopopts/superword/TestReductions.java fails on SVE after JDK-8340093 [v2] In-Reply-To: References: Message-ID: > Looks like the test should be more resilient with UseSVE > 0, which _can_ vectorise. It does not look all that reliable to me to failOn when vectorization actually happens. So I dropped some non-arch-specific rules, and amended AArch64-specific rules for UseSVE. > > Testing: > - [x] Linux AArch64 server fastdebug, affected test on machine with UseSVE=1 by default > - [x] Linux AArch64 server fastdebug, affected test on machine with UseSVE=0 overridden Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8371768-testbug-reduction - A bit of mop up - UseSVE works ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28423/files - new: https://git.openjdk.org/jdk/pull/28423/files/c5e4a3cc..0d922cb8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28423&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28423&range=00-01 Stats: 47164 lines in 640 files changed: 33290 ins; 9786 del; 4088 mod Patch: https://git.openjdk.org/jdk/pull/28423.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28423/head:pull/28423 PR: https://git.openjdk.org/jdk/pull/28423 From chagedorn at openjdk.org Fri Nov 28 13:06:50 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 28 Nov 2025 13:06:50 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 In-Reply-To: References: <9CGZeCADEds8B60aZZxkUj9GWIfvQAmQ9lN8E_ft4uo=.9923fd74-e17e-453d-9f83-e2367ae96ca9@github.com> Message-ID: <2WBk-Rs9In3kWDo-AXbxTW23XAfYuN6XqlQROfhG9k4=.83db3049-2630-49f6-919b-5d380ea7fb1d@github.com> On Thu, 27 Nov 2025 15:52:33 GMT, Martin Doerr wrote: >> Could you help to test the previously failing tests >> - `TestIRMatching.java` >> - `TestPhaseIRMatching.java` >> - `IRExample.java` >> >> with the proposed patch on different platforms? >> >> - PPC (@TheRealMDoerr) >> - s390 (@offamitkumar) >> - riscv (@Hamlin-Li) >> >> That would be highly appreciated :-) > >> * TestIRMatching.java >> * TestPhaseIRMatching.java >> * IRExample.java > > Thanks for the ping! The 3 tests have passed with your latest PR version on linux ppc64le. Thanks @TheRealMDoerr and @Hamlin-Li for testing! Even though we miss s390 testing, I think we can take the risk and integrate this since the tests are regularly failing at tier5. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28495#issuecomment-3589278645 From epeter at openjdk.org Fri Nov 28 13:16:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Nov 2025 13:16:01 GMT Subject: RFR: 8372461: [IR Framework] Multiple test failures after JDK-8371789 [v2] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 14:19:13 GMT, Christian Hagedorn wrote: >> [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: >> >> - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant". >> - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. >> - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: >> https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 >> I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. >> >> #### Testing >> - [X] Tier1 >> - [X] Tier5 with IR framework internal tests only >> - [ ] Failing IR framework internal tests on all platforms >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Fix wrong regex Marked as reviewed by epeter (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28495#pullrequestreview-3518971149 From jpai at openjdk.org Fri Nov 28 13:32:52 2025 From: jpai at openjdk.org (Jaikiran Pai) Date: Fri, 28 Nov 2025 13:32:52 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting In-Reply-To: References: <1tazzYHm78XLDovV11RAQt2W-ujENi4b_frOa87Jv14=.45b6d8a1-cb76-49ac-8048-429916bc9c6c@github.com> Message-ID: On Fri, 28 Nov 2025 10:52:42 GMT, Alan Bateman wrote: >> I took this sentence from `@AOTSafeClassInitializer`. The term "privileged" comes from this variable in `classFileParser.cpp`: >> https://github.com/openjdk/jdk/blob/d94c52ccf2fed3fc66d25a34254c9b581c175fa1/src/hotspot/share/classfile/classFileParser.cpp#L1818-L1820 >> >> The other annotations have this note, which seems incorrect from the hotspot excerpt: >> >> @implNote >> This annotation only takes effect for fields of classes loaded by the boot >> loader. Annotations on fields of classes loaded outside of the boot loader >> are ignored. >> >> >> This behavior seems to be originally changed by 6964a690ed9a23d4c0692da2dfbced46e1436355, referring to an inaccessible issue. >> >> What should I do with this? Should I leave this as-is and create a separate patch to update this comment for vm.annotation annotations, or fix this first and have the separate patch fix other annotations later? > > For this PR then you could just change the last sentence to say that the annotation is only effective for classes defined by the boot class loader or platform class loader. A follow-up PR could propose changes to the other annotation descriptions. > > As regards background then one of the significant changes in JDK 9 was that java.* modules could be mapped to the platform class loader without give them "all permission" in the security manager execution mode. 
If you see JBS issues or comments speaking of "de-privileging" then it's likely related to changes that "moved" modules that were originally mapped to the boot class loader to the platform class loader. Now that the security manager execution mode is gone then we don't have to deal with all these issues now. Hello Chen, should this annotation also mention what happens if a class annotated with `@TrustFinalFields` has any of its `final` fields updated? For example, `@Stable` has this to say about such unexpected updates: ...It is in general a bad idea to reset such * variables to any other value, since compiled code might have folded * an earlier stored value, and will never detect the reset value. Are there any unexpected consequences of marking a class as `@TrustFinalFields` and having a `@Stable` on any of the final fields (for example an array): @TrustFinalFields class JDKFooBar { private final String reallyFinal; @Stable private final int reallyFinalButAlsoStable; @Stable private final long[] finalAndStableArray; } Finally, would it still be recommended that a class annotated with `@TrustFinalFields` also have a final array field annotated with `@Stable` if that array field's elements are initialized to a non-default value only once? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2571701254 From jpai at openjdk.org Fri Nov 28 13:38:47 2025 From: jpai at openjdk.org (Jaikiran Pai) Date: Fri, 28 Nov 2025 13:38:47 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting In-Reply-To: References: <1tazzYHm78XLDovV11RAQt2W-ujENi4b_frOa87Jv14=.45b6d8a1-cb76-49ac-8048-429916bc9c6c@github.com> Message-ID: On Fri, 28 Nov 2025 13:30:33 GMT, Jaikiran Pai wrote: >> For this PR then you could just change the last sentence to say that the annotation is only effective for classes defined by the boot class loader or platform class loader. 
A follow-up PR could propose changes to the other annotation descriptions. >> >> As regards background then one of the significant changes in JDK 9 was that java.* modules could be mapped to the platform class loader without give them "all permission" in the security manager execution mode. If you see JBS issues or comments speaking of "de-privileging" then it's likely related to changes that "moved" modules that were originally mapped to the boot class loader to the platform class loader. Now that the security manager execution mode is gone then we don't have to deal with all these issues now. > > Hello Chen, should this annotation also mention what happens if a class annotated with `@TrustFinalFields` has any of its `final` fields updated? For example, `@Stable` has this to say about such unexpected updates: > > > ...It is in general a bad idea to reset such > * variables to any other value, since compiled code might have folded > * an earlier stored value, and will never detect the reset value. > > > Are there any unexpected consequences of marking a class as `@TrustFinalFields` and having a `@Stable` on any of the final fields (for example an array): > > > @TrustFinalFields > class JDKFooBar { > private final String reallyFinal; > > @Stable > private final int reallyFinalButAlsoStable; > > @Stable > private final long[] finalAndStableArray; > > } > > Finally, would it still be recommended that a class annotated with `@TrustFinalFields` also have a final array field annotated with `@Stable` if that array field's elements are initialized to a non-default value only once? One other question - if a class/interface is annotated with `@TrustFinalFields`, is that annotation only applicable to that specific class or would it also be applicable for any (final fields in) subclasses of that class or implementations of that interface (does the VM ignore this annotation on an interface, should it)? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2571717831 From mbaesken at openjdk.org Fri Nov 28 13:39:13 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 28 Nov 2025 13:39:13 GMT Subject: RFR: 8372730: Problem list compiler/arguments/TestCodeEntryAlignment.java on x64 Message-ID: [JDK-8372720](https://bugs.openjdk.org/browse/JDK-8372720) problem listed the test compiler/arguments/TestCodeEntryAlignment.java on macOS x64 but the issue appears on other OS running on x64 CPUs (e.g. Linux) too . ------------- Commit messages: - JDK-8372730 Changes: https://git.openjdk.org/jdk/pull/28553/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28553&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372730 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28553.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28553/head:pull/28553 PR: https://git.openjdk.org/jdk/pull/28553 From jpai at openjdk.org Fri Nov 28 13:47:48 2025 From: jpai at openjdk.org (Jaikiran Pai) Date: Fri, 28 Nov 2025 13:47:48 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 16:16:05 GMT, Chen Liang wrote: > Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. > > They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. > > We should add an annotation to opt-in for a whole class, mainly for legacy packages. 
This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. > > Paging @minborg who requested Optional folding for review. > > I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. src/java.base/share/classes/jdk/internal/vm/annotation/TrustFinalFields.java line 49: > 47: /// As a result, this should be used on classes where package-wide trusting is > 48: /// not possible due to backward compatibility concerns, such as for `java.util` > 49: /// classes. Should this sentence be reworded? It's not clear what the backward compatible concerns (for `java.util` package) are. I think it might be better to leave out any backward compatibility part when explaining which classes to use this annotation on. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2571741641 From roland at openjdk.org Fri Nov 28 14:54:52 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 28 Nov 2025 14:54:52 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v13] In-Reply-To: <2wAnS7drj_r3dqsy5CEF9vBG40KizHsQDOxMeNymwhw=.9bc29879-eead-401c-b750-814592feff63@github.com> References: <2oDqUvcW_3hJRPRri4uttpkgfeCovL4ZZkcI0R1bB1A=.173b3a58-d0f1-4b29-94d1-77b0a350c790@github.com> <2wAnS7drj_r3dqsy5CEF9vBG40KizHsQDOxMeNymwhw=.9bc29879-eead-401c-b750-814592feff63@github.com> Message-ID: <-1wiWF_UEvCO6xPuYvIsElBzPPQDejGahm9Xd5YszPU=.cfb41cb1-f681-4e75-8c29-2d928468f53b@github.com> On Fri, 28 Nov 2025 08:54:49 GMT, Zihao Lin wrote: >> Here is a assert failed >> >> command: main -XX:CompileCommand=dontinline,compiler.arraycopy.TestArrayCopyMemoryChain::test* -Xbatch compiler.arraycopy.TestArrayCopyMemoryChain >> reason: User specified action: run main/othervm 
-XX:CompileCommand=dontinline,compiler.arraycopy.TestArrayCopyMemoryChain::test* -Xbatch compiler.arraycopy.TestArrayCopyMemoryChain >> started: Fri Nov 28 16:36:37.189 CST 2025 >> Mode: othervm [/othervm specified] >> Process id: 16782 >> finished: Fri Nov 28 16:36:37.350 CST 2025 >> elapsed time (seconds): 0.161 >> configuration: >> STDOUT: >> CompileCommand: dontinline compiler/arraycopy/TestArrayCopyMemoryChain.test* bool dontinline = true >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/Users/linzihao/Desktop/jdk-dev/src/hotspot/share/opto/escape.cpp:4184), pid=16782, tid=26115 >> # assert(result != nullptr) failed: new projection should have been allocated >> # >> # JRE version: OpenJDK Runtime Environment (26.0) (fastdebug build 26-internal-adhoc.linzihao.jdk-dev) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 26-internal-adhoc.linzihao.jdk-dev, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64) >> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again >> # >> # An error report file with more information is saved as: >> # /Users/linzihao/Desktop/jdk-dev/build/macosx-aarch64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_arraycopy_TestArrayCopyMemoryChain_java/scratch/0/hs_err_pid16782.log >> # >> # Compiler replay data is saved as: >> # /Users/linzihao/Desktop/jdk-dev/build/macosx-aarch64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_arraycopy_TestArrayCopyMemoryChain_java/scratch/0/replay_pid16782.log >> # >> # If you would like to submit a bug report, please visit: >> # https://bugreport.java.com/bugreport/crash.jsp >> # > > The assert failed because `find_inst_mem()` skipped an Initialize memory projection whose `adr_type` was still the general slice, then tried to fetch the instance-specific projection from `_node_map` and got nullptr. 
That happens when a precise `NarrowMemProj` already exists: the code doesn't create a new one and also never records the mapping, so later lookup fails.
>
> The fix records the mapping even if the precise `NarrowMemProj` is already present (not newly created).

I had a closer look and I think you ran into an inconsistency. Let me see if I can get it fixed as a separate change.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2571905920

From dlunden at openjdk.org Fri Nov 28 15:05:56 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 28 Nov 2025 15:05:56 GMT
Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v23]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 26 Nov 2025 15:47:54 GMT, Jatin Bhateja wrote:

>> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges.
>>
>> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction.
>>
>> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations.
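The biasing idea in the quoted description can be pictured with a toy colour-selection routine. This is a minimal sketch under stated assumptions, not C2's allocator: the class `BiasedColoring`, the method `selectColor`, its parameters, and the plain integer register numbering are all hypothetical names introduced for illustration.

```java
import java.util.Set;

// Toy model of colour (register) selection with an optional bias.
// Hypothetical illustration only; it does not reflect C2's data structures.
final class BiasedColoring {

    /**
     * Picks a colour in [0, numColors) that no interfering neighbour uses.
     * If 'bias' (for example the register of the first source operand) is a
     * valid colour and is free, it is preferred; otherwise the first free
     * colour is chosen, which models the existing first-available policy.
     * Returns -1 when every colour is taken (the live range would spill).
     */
    static int selectColor(int numColors, Set<Integer> takenByNeighbors, int bias) {
        if (bias >= 0 && bias < numColors && !takenByNeighbors.contains(bias)) {
            return bias; // destination shares a register with the source
        }
        for (int c = 0; c < numColors; c++) {
            if (!takenByNeighbors.contains(c)) {
                return c; // fall back to the first available colour
            }
        }
        return -1;
    }
}
```

When the bias is honoured, destination and source end up in the same register, which is the situation in which the description says the assembler can demote the EVEX-encoded NDD form to a shorter REX/REX2 encoding.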
>> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 > - Incorporating polished comments suggestions from Daniel > - Review comments resolution > - Review comments resolutions > - Review comments resolution > - Extending biasing heuristics to account for bias range with minimum degree of freedom. Review feedback incorporated. > - Generic operand traversal and sharpening candidate selection based on RegisterMask and non-interference. Review feedback incorporated > - Review comments resolution > - Review comments resolutions > - Moving demotion candidate marking to AD file, review comments resolutions > - ... and 11 more: https://git.openjdk.org/jdk/compare/1ce2a44e...93577b83 Tests look good! ------------- Marked as reviewed by dlunden (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3519298644 From liach at openjdk.org Fri Nov 28 15:08:56 2025 From: liach at openjdk.org (Chen Liang) Date: Fri, 28 Nov 2025 15:08:56 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 13:45:19 GMT, Jaikiran Pai wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > src/java.base/share/classes/jdk/internal/vm/annotation/TrustFinalFields.java line 49: > >> 47: /// As a result, this should be used on classes where package-wide trusting is >> 48: /// not possible due to backward compatibility concerns, such as for `java.util` >> 49: /// classes. > > Should this sentence be reworded? It's not clear what the backward compatible concerns (for `java.util` package) are. I think it might be better to leave out any backward compatibility part when explaining which classes to use this annotation on. Existing users have been hacking java.util final fields. 
I think leaving out the backward compatibility part causes more trouble, because otherwise people can just blanket-approve java.util classes for trusting and break those applications. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2571941033 From liach at openjdk.org Fri Nov 28 15:08:57 2025 From: liach at openjdk.org (Chen Liang) Date: Fri, 28 Nov 2025 15:08:57 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting In-Reply-To: References: <1tazzYHm78XLDovV11RAQt2W-ujENi4b_frOa87Jv14=.45b6d8a1-cb76-49ac-8048-429916bc9c6c@github.com> Message-ID: On Fri, 28 Nov 2025 13:35:51 GMT, Jaikiran Pai wrote: >> Hello Chen, should this annotation also mention what happens if a class annotated with `@TrustFinalFields` has any of its `final` fields updated? For example, `@Stable` has this to say about such unexpected updates: >> >> >> ...It is in general a bad idea to reset such >> * variables to any other value, since compiled code might have folded >> * an earlier stored value, and will never detect the reset value. >> >> >> Are there any unexpected consequences of marking a class as `@TrustFinalFields` and having a `@Stable` on any of the final fields (for example an array): >> >> >> @TrustedFinalFields >> class JDKFooBar { >> private final String reallyFinal; >> >> @Stable >> private final int reallyFinalButAlsoStable; >> >> @Stable >> private final long[] finalAndStableArray; >> >> } >> >> Finally, would it still be recommended that a class annotated with `@TrustFinalFields` also have a final array field annoted with `@Stable` if that array field elements are initialized to a non-default value only once? 
> > One another question - if a class/interface is annotated with `@TargetFinalFields`, is that annotation only applicable to that specific class or would it also be applicable for any (final fields in) subclasses of that class or implementations of that interface (does the VM ignore this annotation on an interface, should it)? I don't think we should mention anything about updating final fields. If you use this field, you intend the fields not to get subsequently updated. Promising the behavior in this case only introduces more trouble and is meaningless for this annotation's readers. For inheritance, we can add a word or two. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2571939030 From duke at openjdk.org Fri Nov 28 15:21:11 2025 From: duke at openjdk.org (Max Verevkin) Date: Fri, 28 Nov 2025 15:21:11 GMT Subject: RFR: 8366076: arm32: Fix register allocation for vector instructions [v2] In-Reply-To: <17J8mScwi2eBCPmmmTJd0ittihe0BfqZYuPgC638L8Q=.6e87a120-905f-4a30-a6f0-7e80fd613144@github.com> References: <17J8mScwi2eBCPmmmTJd0ittihe0BfqZYuPgC638L8Q=.6e87a120-905f-4a30-a6f0-7e80fd613144@github.com> Message-ID: > Arm32 has 32 double-precision floating point registers, the first 16 of which coincide with the 32 single-precision floating point registers. Some vector-operation nodes were implemented in terms of scalar instructions, which only really works for the first 16 doubles. This commit addresses that. Max Verevkin has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8366076: arm32: Fix register allocation for vector instructions Arm32 has 32 double-precision floating point registers, the first 16 of which coincide with the 32 single-precision floating point registers. 
Some vector-operation nodes were implemented in terms of scalar instructions, which only really works for the first 16 doubles. This commit addresses that. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27071/files - new: https://git.openjdk.org/jdk/pull/27071/files/7b7f61fe..2faabeed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27071&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27071&range=00-01 Stats: 6 lines in 2 files changed: 0 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27071.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27071/head:pull/27071 PR: https://git.openjdk.org/jdk/pull/27071 From duke at openjdk.org Fri Nov 28 15:21:12 2025 From: duke at openjdk.org (Max Verevkin) Date: Fri, 28 Nov 2025 15:21:12 GMT Subject: RFR: 8366076: arm32: Fix register allocation for vector instructions [v2] In-Reply-To: References: <17J8mScwi2eBCPmmmTJd0ittihe0BfqZYuPgC638L8Q=.6e87a120-905f-4a30-a6f0-7e80fd613144@github.com> <11z84H0pSO4eduTEEVcUelci_1MxZMimuwouswlt8W0=.a0d59c62-092c-4620-b4c2-c2ff62423c4e@github.com> Message-ID: On Tue, 18 Nov 2025 05:23:14 GMT, Dean Long wrote: >> I am not 100% sure if they are completely equivalent and `dflt_low_reg` could be used instead of defining a new class. I figured I should introduce a new class similar to how `sflt_reg` and `dflt_low_reg` are similar yet distinct. > > A reg_class just produces a RegMask, so there is no need to give identical masks different names. Aliasing can still be done. See `actual_dflt_reg` for example. Sorry for such a long delay. 
Should be addressed now :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27071#discussion_r2571965840

From aph at openjdk.org Fri Nov 28 15:25:01 2025
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 28 Nov 2025 15:25:01 GMT
Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v5]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 26 Nov 2025 15:55:38 GMT, Aleksey Shipilev wrote:

>> See the bug for discussion what issues current machinery has.
>>
>> This PR executes the plan outlined in the bug:
>> 1. Common the receiver type profiling code in interpreter and C1
>> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations
>> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed
>>
>> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral.
>>
>> Additional testing:
>> - [x] Linux x86_64 server fastdebug, `compiler/`
>> - [x] Linux x86_64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits:
>
> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
> - Tighten up some more
> - Offset is always rscratch1, no need to save it
> - Grossly simplify register shuffling
> - More asserts
> - More comment touchups
> - Inline code comments
> - Mention the updater in ReceiverTypeData
> - type_profile -> profile_receiver_type
> - Stylistic: remove redundant assert
> - ... and 5 more: https://git.openjdk.org/jdk/compare/c028369d...c441209a

I'm seeing minor performance regressions in `InterfaceCalls.test2ndInt5Types`, before and after this PR:

Mainline:

Benchmark                                        (randomized)  Mode  Cnt    Score   Error      Units
InterfaceCalls.test2ndInt5Types                         false  avgt    4   28.185 ± 0.538      ns/op
InterfaceCalls.test2ndInt5Types:IPC                     false  avgt         2.232          insns/clk
InterfaceCalls.test2ndInt5Types:branch-misses:u         false  avgt         0.342               #/op
InterfaceCalls.test2ndInt5Types:instructions:u          false  avgt       206.028               #/op

This PR:

Benchmark                                        (randomized)  Mode  Cnt    Score   Error      Units
InterfaceCalls.test2ndInt5Types                         false  avgt    4   32.247 ± 0.109      ns/op
InterfaceCalls.test2ndInt5Types:IPC                     false  avgt         2.231          insns/clk
InterfaceCalls.test2ndInt5Types:branch-misses:u         false  avgt         0.561               #/op
InterfaceCalls.test2ndInt5Types:instructions:u          false  avgt       238.324               #/op

model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz

java -XX:+UnlockExperimentalVMOptions -XX:ProfileCaptureRatio=1 -jar /home/aph/theRealAph-jdk/build/linux-x86_64-server-release/images/test/micro/benchmarks.jar test2ndInt5Types -p randomized=false -f 1 -jvmArgs ' -XX:TieredStopAtLevel=3' -t 1 -prof perfnorm

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3589699041

From vpaprotski at openjdk.org Fri Nov 28 16:30:59 2025
From: vpaprotski at openjdk.org (Volodymyr Paprotski)
Date: Fri, 28 Nov 2025 16:30:59 GMT
Subject: RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v3]
In-Reply-To: 
References: 
Message-ID: 

On Fri, 28 Nov 2025 08:23:27 GMT, Tobias Hartmann wrote:

>> Oh.. realized that I should have checked JBS.. thanks @ascarpino for resolving the bug I caused! At least it's just the option.. whew.
>>
>>> @dholmes-ora Hi David, need some help with this please, don't have access to an ARM system to reproduce (or the ARM expertise).. could you point me at the failing job if that's available? Or some log if not?
>>>
>>> * Is it an issue with the options (i.e. `-XX:UseAVX=2` perhaps). I probably should have added `-XX:+IgnoreUnrecognizedVMOptions` to it..
>>> * Otherwise, I am stumped.. the test case isn't architecture-specific.. it calls two methods (one of which is annotated as an intrinsic..) and expects them to return the same value.. i.e.
Java and Intrinsic version should behave the same.. >>> * Only thing I can think of.. The ARM implementation took some shortcuts in name of optimization. This can be entirely valid if the code calling the intrinsics never should get some specific value (-ranges). i.e. the tests RNG be further restricted.. >>> * Otherwise.. is it possible its a bug in the ARM intrinsic? > > This caused a regression: [JDK-8372703](https://bugs.openjdk.org/browse/JDK-8372703). @vpaprotsk Could you please have a look? Thanks. @TobiHartmann looking! - Havent been able to reproduce yet (and folks with machine access I need are away today, US holiday) - From the first glance, the error is about code size (and this intrinsic is indeed large..). But that shouldnt be platform-dependent, iirc.. except I see `enum platform_dependent_constants` is no longer just a simple static sum of ints.. hmm. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3589866931 From jbhateja at openjdk.org Fri Nov 28 19:01:55 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 28 Nov 2025 19:01:55 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v3] In-Reply-To: References: <52RpYM-r-1EZcYjbaNllAEPHQP1nYhQcs-GfydIzP08=.0bfb8185-78a7-4dfb-9700-f4a36a1d0e99@github.com> Message-ID: On Fri, 19 Sep 2025 16:21:26 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix jtreg, one less spill > > Looks good to me. Hi @sviswa7. @iwanowww , can you kindly re-approve this. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3590146768 From jiangli at openjdk.org Sat Nov 29 04:59:53 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Sat, 29 Nov 2025 04:59:53 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v10] In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 09:42:22 GMT, Aleksey Shipilev wrote: > Oh man, the `pos`/`len` modifications in current code are confusing. I scratched my head for quite a while trying to comprehend why does `__ bind(MESG_BELOW_32_BLKS)` split the `pos += 16` and `len -= 16`? On a surface, that just looks like a bug. The combination of handling for the fall through from `ENCRYPT_16_BLKS` and conditional entry to `MESG_BELOW_32_BLKS` cases are subtle. I had also missed the fall through case in my initial proposed fix (with comp/jcc) until @sviswa7 pointed it out and suggested the current fix. The fix for `StubGenerator::aesgcm_avx512` with moving `__ addl(pos, 16 * 16)` to be before `__ bind(MESG_BELOW_32_BLKS)` works correctly for both the fall through and conditional jump cases now. > > But looks that way because we do `initial_blocks_16_avx512` twice, do `pos += 16` twice, but only do the `len += 32` after the second call. Which does not help if we shortcut after the first call. In fact, I am not at all sure that checking `len < 32` _without_ modifying `len` beforehand does not break the assumptions downstream: > > ``` > initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, stack_offset); > __ addl(pos, 16 * 16); > __ cmpl(len, 32 * 16); > __ jcc(Assembler::below, MESG_BELOW_32_BLKS); > ``` > > Really, in these kind of intrinsics, _you want_ to make sure `pos` and `len` updates are tightly bound together, otherwise these kinds of mistakes would keep happening. 
You will lose on code density a bit, but would have more readable and robust code. > > Shouldn't it be like this? > > ``` > diff --git a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp > index 1e728ffa279..a16e25b075d 100644 > --- a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp > +++ b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp > @@ -3475,12 +3475,14 @@ void StubGenerator::aesgcm_avx512(Register in, Register len, Register ct, Regist > > initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, stack_offset); > __ addl(pos, 16 * 16); > + __ subl(len, 16 * 16); > + > __ cmpl(len, 32 * 16); > __ jcc(Assembler::below, MESG_BELOW_32_BLKS); > > initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, stack_offset + 16); > __ addl(pos, 16 * 16); > - __ subl(len, 32 * 16); > + __ subl(len, 16 * 16); > > __ cmpl(len, 32 * 16); > __ jcc(Assembler::below, NO_BIG_BLKS); > @@ -3491,24 +3493,27 @@ void StubGenerator::aesgcm_avx512(Register in, Register len, Register ct, Regist > ghash16_encrypt_parallel16_avx512(in, out, ct, pos, avx512_subkeyHtbl, CTR_CHECK, rounds, key, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, > true, true, false, false, false, ghashin_offset, aesout_offset, HashKey_32); > __ addl(pos, 16 * 16); > + __ subl(len, 16 * 16); > > ghash16_encrypt_parallel16_avx512(in, out, ct, pos, avx512_subkeyHtbl, CTR_CHECK, rounds, key, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, > true, false, true, false, true, ghashin_offset + 16, aesout_offset + 16, HashKey_16); > __ evmovdquq(AAD_HASHx, ZTMP4, Assembler::AVX_512bit); > __ addl(pos, 16 * 16); > - __ subl(len, 32 * 16); > + __ subl(len, 16 * 16); > __ jmp(ENCRYPT_BIG_BLKS_NO_HXOR); > > __ bind(ENCRYPT_BIG_NBLKS); > 
ghash16_encrypt_parallel16_avx512(in, out, ct, pos, avx512_subkeyHtbl, CTR_CHECK, rounds, key, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, > false, true, false, false, false, ghashin_offset, aesout_offset, HashKey_32); > __ addl(pos, 16 * 16); > + __ subl(len, 16 * 16); > + > ghash16_encrypt_parallel16_avx512(in, out, ct, pos, avx512_subkeyHtbl, CTR_CHECK, rounds, key, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, > false, false, true, true, true, ghashin_offset + 16, aesout_offset + 16, HashKey_16); > > __ movdqu(AAD_HASHx, ZTMP4); > __ addl(pos, 16 * 16); > - __ subl(len, 32 * 16); > + __ subl(len, 16 * 16); > > __ bind(NO_BIG_BLKS); > __ cmpl(len, 16 * 16); > @@ -3525,9 +3530,9 @@ void StubGenerator::aesgcm_avx512(Register in, Register len, Register ct, Regist > > ghash16_avx512(false, true, false, false, true, in, pos, avx512_subkeyHtbl, AAD_HASHx, SHUF_MASK, stack_offset, 16 * 16, 0, HashKey_16); > __ addl(pos, 16 * 16); > + __ subl(len, 16 * 16); > > __ bind(MESG_BELOW_32_BLKS); > - __ subl(len, 16 * 16); > gcm_enc_dec_last_avx512(len, in, pos, AAD_HASHx, SHUF_MASK, avx512_subkeyHtbl, ghashin_offset, HashKey_16, true, true); > > __ bind(GHASH_DONE); > ``` Improving readability is a good idea, hand-rolled assembly however is mostly motivated by performance. While `sub` with immediate value is fast and takes one cpu cycle, I would agree with the original author of `aesgcm_avx512` on combining two `sub` instructions into one instruction whenever possible. 
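The readability point raised in the quoted review, keeping `pos` and `len` updates bound together, can be shown with a small model. This is a sketch under stated assumptions and not the stub's actual control flow: `BlockCursor` and its methods are hypothetical names, and the 16-block step size simply mirrors the `16 * 16` constant that appears in the diff.

```java
// Toy cursor over a buffer processed in 16-block steps.
// Hypothetical illustration; it models only the pos/len bookkeeping.
final class BlockCursor {
    static final int BLOCK_BYTES = 16;               // one AES block
    static final int CHUNK_BYTES = 16 * BLOCK_BYTES; // 16 blocks per step

    private final int total; // total bytes to process
    private int pos;         // bytes consumed so far
    private int len;         // bytes still to process

    BlockCursor(int totalBytes) {
        if (totalBytes < 0) {
            throw new IllegalArgumentException("negative length");
        }
        this.total = totalBytes;
        this.len = totalBytes;
    }

    // The only place where pos and len change, so they cannot drift apart.
    void advance(int bytes) {
        if (bytes < 0 || bytes > len) {
            throw new IllegalArgumentException("cannot advance by " + bytes);
        }
        pos += bytes;
        len -= bytes;
    }

    // Consumes as many full 16-block chunks as remain; returns how many.
    int drainFullChunks() {
        int chunks = 0;
        while (len >= CHUNK_BYTES) {
            advance(CHUNK_BYTES);
            chunks++;
        }
        return chunks;
    }

    int pos() { return pos; }

    int len() { return len; }

    // Invariant that holds whenever every update goes through advance().
    boolean consistent() { return pos + len == total; }
}
```

The trade-off discussed in the thread still applies: pairing every `add` on `pos` with a `sub` on `len` keeps an invariant like `consistent()` true at every label, while the hand-written stub may prefer to fold two subtractions into one instruction.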
------------- PR Comment: https://git.openjdk.org/jdk/pull/28363#issuecomment-3590992730 From jiangli at openjdk.org Sat Nov 29 04:59:54 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Sat, 29 Nov 2025 04:59:54 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v10] In-Reply-To: References: Message-ID: On Sat, 29 Nov 2025 04:55:44 GMT, Jiangli Zhou wrote: >> Oh man, the `pos`/`len` modifications in current code are confusing. I scratched my head for quite a while trying to comprehend why does `__ bind(MESG_BELOW_32_BLKS)` split the `pos += 16` and `len -= 16`? On a surface, that just looks like a bug. >> >> But looks that way because we do `initial_blocks_16_avx512` twice, do `pos += 16` twice, but only do the `len += 32` after the second call. Which does not help if we shortcut after the first call. In fact, I am not at all sure that checking `len < 32` _without_ modifying `len` beforehand does not break the assumptions downstream: >> >> >> initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, stack_offset); >> __ addl(pos, 16 * 16); >> __ cmpl(len, 32 * 16); >> __ jcc(Assembler::below, MESG_BELOW_32_BLKS); >> >> >> Really, in these kind of intrinsics, _you want_ to make sure `pos` and `len` updates are tightly bound together, otherwise these kinds of mistakes would keep happening. You will lose on code density a bit, but would have more readable and robust code. >> >> Shouldn't it be like this? 
>> >> >> diff --git a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp >> index 1e728ffa279..a16e25b075d 100644 >> --- a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp >> +++ b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp >> @@ -3475,12 +3475,14 @@ void StubGenerator::aesgcm_avx512(Register in, Register len, Register ct, Regist >> >> initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, stack_offset); >> __ addl(pos, 16 * 16); >> + __ subl(len, 16 * 16); >> + >> __ cmpl(len, 32 * 16); >> __ jcc(Assembler::below, MESG_BELOW_32_BLKS); >> >> initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, stack_offset + 16); >> __ addl(pos, 16 * 16); >> - __ subl(len, 32 * 16); >> + __ subl(len, 16 * 16); >> >> __ cmpl(len, 32 * 16); >> __ jcc(Assembler::below, NO_BIG_BLKS); >> @@ -3491,24 +3493,27 @@ void StubGenerator::aesgcm_avx512(Register in, Register len, Register ct, Regist >> ghash16_encrypt_parallel16_avx512(in, out, ct, pos, avx512_subkeyHtbl, CTR_CHECK, rounds, key, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, >> ... > >> Oh man, the `pos`/`len` modifications in current code are confusing. I scratched my head for quite a while trying to comprehend why does `__ bind(MESG_BELOW_32_BLKS)` split the `pos += 16` and `len -= 16`? On a surface, that just looks like a bug. > > The combination of handling for the fall through from `ENCRYPT_16_BLKS` and conditional entry to `MESG_BELOW_32_BLKS` cases are subtle. > > I had also missed the fall through case in my initial proposed fix (with comp/jcc) until @sviswa7 pointed it out and suggested the current fix. 
The fix for `StubGenerator::aesgcm_avx512` with moving `__ addl(pos, 16 * 16)` to be before `__ bind(MESG_BELOW_32_BLKS)` works correctly for both the fall through and conditional jump cases now. > >> >> But looks that way because we do `initial_blocks_16_avx512` twice, do `pos += 16` twice, but only do the `len += 32` after the second call. Which does not help if we shortcut after the first call. In fact, I am not at all sure that checking `len < 32` _without_ modifying `len` beforehand does not break the assumptions downstream: >> >> ``` >> initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, stack_offset); >> __ addl(pos, 16 * 16); >> __ cmpl(len, 32 * 16); >> __ jcc(Assembler::below, MESG_BELOW_32_BLKS); >> ``` >> >> Really, in these kind of intrinsics, _you want_ to make sure `pos` and `len` updates are tightly bound together, otherwise these kinds of mistakes would keep happening. You will lose on code density a bit, but would have more readable and robust code. >> >> Shouldn't it be like this? >> >> ``` >> diff --git a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp >> index 1e728ffa279..a16e25b075d 100644 >> --- a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp >> +++ b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp >> @@ -3475,12 +3475,14 @@ void StubGenerator::aesgcm_avx512(Register in, Register len, Register ct, Regist >> >> initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, stack_offset); >> __ addl(pos, 16 * 16); >> + __ subl(len, 16 * 16); >> + >> __ cmpl(len, 32 * 16); >> __ jcc(Assembler::below, MESG_BELOW_32_BLKS); >> >> initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, ... 
> @jianglizhou thank you for the AVX2 related output from the unit test pre-fix. From this I was able to trace the point of failure and see that your proposed changes are good for approval. Thank you for your work on these issues! @smemery Thanks for carefully testing the changes! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28363#issuecomment-3590993560 From jpai at openjdk.org Sat Nov 29 07:12:53 2025 From: jpai at openjdk.org (Jaikiran Pai) Date: Sat, 29 Nov 2025 07:12:53 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 15:06:50 GMT, Chen Liang wrote: >> src/java.base/share/classes/jdk/internal/vm/annotation/TrustFinalFields.java line 49: >> >>> 47: /// As a result, this should be used on classes where package-wide trusting is >>> 48: /// not possible due to backward compatibility concerns, such as for `java.util` >>> 49: /// classes. >> >> Should this sentence be reworded? It's not clear what the backward compatible concerns (for `java.util` package) are. I think it might be better to leave out any backward compatibility part when explaining which classes to use this annotation on. > > Existing users have been hacking java.util final fields. I think leaving out the backward compatibility part causes more trouble, because otherwise people can just blanket-approve java.util classes for trusting and break those applications. Hello Chen, > because otherwise people can just blanket-approve java.util classes for trusting and break those applications. This is one of the reasons why I asked some of the questions that I did. We have seen several PRs in the recent past where `@Stable` annotation has been introduced in the core classes of Java SE because it aids constant folding optimizations. Most of those changes have been backed merely by JMH benchmarks. 
It won't be a surprise if we start seeing another round of PRs where the usage of this new `@TrustFinalFields` gets proposed to some of these classes in the JDK because it shows an improvement in some micro benchmark. It also won't be a surprise if those PRs too won't have associated regression tests. Furthermore, unlike `@Stable` which gets applied directly on the field(s) of interest, this new annotation will be applied a bit "far away" from such fields. So it will need additional review cycles to understand if this usage can impact the code functionally in any manner. Specifying the semantics of this annotation in various usage scenarios, in its javadoc, will aid in reviewing such changes in future, instead of having to regularly look into the JVM code to understand how this annotation behaves. Classes in `java.util` aren't special in any way. So if applications are changing the values of final fields of some of those classes, then the same would be done for other packages of Java SE APIs too. If, like you note, applying `@TrustFinalFields` on such classes is going to break applications, then it will be useful to specify what kind of breakages those will be (in a similar manner to what the `@Stable` annotation's javadoc does). Very specifically, I think adding a few sentences clarifying the following scenarios in this annotation's javadoc will be useful: - Will this annotation be honoured only on the specific class that it is applied to? Or will it be taken into consideration for final fields in subclasses too? - If this annotation gets applied on a class and if that class has some final fields which are already marked `@Stable`, what kind of implications will that have, if any? - If this annotation is marked on a class which has a `final` array field (for example `final long[] ids`), is it useful to continue placing a `@Stable` annotation on such array fields if the elements of those arrays are going to be initialized to a non-default value just once?
- If after all the precautions are taken, if the final field of a class annotated with `@TrustFinalFields` does get updated to a new value, what kind of impact would it have (stating that such behaviour is unspecified and in general is a bad idea would be enough, if that's all there is to it) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2572850670 From duke at openjdk.org Sat Nov 29 08:47:35 2025 From: duke at openjdk.org (Zihao Lin) Date: Sat, 29 Nov 2025 08:47:35 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v9] In-Reply-To: References: Message-ID: > If nodes both are constant, support constant folding. Zihao Lin has updated the pull request incrementally with two additional commits since the last revision: - fix test failed - fix make unsigned ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28097/files - new: https://git.openjdk.org/jdk/pull/28097/files/a85229f0..6f57bcb2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=07-08 Stats: 82 lines in 3 files changed: 24 ins; 48 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/28097.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28097/head:pull/28097 PR: https://git.openjdk.org/jdk/pull/28097 From sviswanathan at openjdk.org Sat Nov 29 17:42:54 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 29 Nov 2025 17:42:54 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v23] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 15:47:54 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. 
>> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. >> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 12% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 21 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 > - Incorporating polished comments suggestions from Daniel > - Review comments resolution > - Review comments resolutions > - Review comments resolution > - Extending biasing heuristics to account for bias range with minimum degree of freedom. Review feedback incorporated. > - Generic operand traversal and sharpening candidate selection based on RegisterMask and non-interference. Review feedback incorporated > - Review comments resolution > - Review comments resolutions > - Moving demotion candidate marking to AD file, review comments resolutions > - ... and 11 more: https://git.openjdk.org/jdk/compare/1ce2a44e...93577b83 Marked as reviewed by sviswanathan (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3520771776 From liach at openjdk.org Sun Nov 30 05:24:43 2025 From: liach at openjdk.org (Chen Liang) Date: Sun, 30 Nov 2025 05:24:43 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v2] In-Reply-To: References: Message-ID: > Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. > > They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. > > We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. > > Paging @minborg who requested Optional folding for review. 
> > I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Essay ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28540/files - new: https://git.openjdk.org/jdk/pull/28540/files/f02b9da2..712dbf1c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28540&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28540&range=00-01 Stats: 150 lines in 2 files changed: 130 ins; 8 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/28540.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28540/head:pull/28540 PR: https://git.openjdk.org/jdk/pull/28540 From liach at openjdk.org Sun Nov 30 05:24:44 2025 From: liach at openjdk.org (Chen Liang) Date: Sun, 30 Nov 2025 05:24:44 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v2] In-Reply-To: References: Message-ID: On Sat, 29 Nov 2025 07:10:20 GMT, Jaikiran Pai wrote: >> Existing users have been hacking java.util final fields. I think leaving out the backward compatibility part causes more trouble, because otherwise people can just blanket-approve java.util classes for trusting and break those applications. > > Hello Chen, > >> because otherwise people can just blanket-approve java.util classes for trusting and break those applications. > > This is one of the reasons why I asked some of the questions that I did. We have seen several PRs in the recent past where `@Stable` annotation has been introduced in the core classes of Java SE because it aids constant folding optimizations. Most of those changes have been backed merely by JMH benchmarks. It won't be a surprise if we start seeing another round of PRs where the usage of this new `@TrustFinalFields` gets proposed to some of these classes in the JDK because it shows an improvement in some micro benchmark. 
It also won't be a surprise if those PRs too won't have associated regression tests. Furthermore, unlike `@Stable` which gets applied directly on the field(s) of interest, this new annotation will be applied a bit "far away" from such fields. So it will need additional review cycles to understand if this usage can impact the code functionally in any manner. Specifying the semantics of this annotation in various usage scenarios, in its javadoc, will aid in reviewing such changes in future, instead of having to regularly look into the JVM code to understand how this annotation behaves. > > Classes in `java.util` aren't special in any way. So if applications are changing the values of final fields of some of those classes, then the same would be done for other packages of Java SE APIs too. If, like you note, applying `@TrustFinalFields` on such classes is going to break applications, then it will be useful to specify what kind of breakages those will be (in a similar manner to what the `@Stable` annotation's javadoc does). > > Very specifically, I think adding a few sentences clarifying the following scenarios in this annotation's javadoc will be useful: > > - Will this annotation be honoured only on the specific class that it is applied to? Or will it be taken into consideration for final fields in subclasses too? > - If this annotation gets applied on a class and if that class has some final fields which are already marked `@Stable`, what kind of implications will that have, if any? > - If this annotation is marked on a class which has a `final` array field (for example `final long[] ids`), is it useful to continue placing a `@Stable` annotation on such array fields if the elements of those arrays are going to be initialized to a non-default value just once? > - If after all the precautions are taken, if the final field of a class... 
If you want an essay, I have written one - I just hope whatever bikeshedding for this essay does not affect the progress of Lazy Constant's performance demands. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2573355600 From alanb at openjdk.org Sun Nov 30 07:51:51 2025 From: alanb at openjdk.org (Alan Bateman) Date: Sun, 30 Nov 2025 07:51:51 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v2] In-Reply-To: References: Message-ID: On Sun, 30 Nov 2025 05:24:43 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Essay src/java.base/share/classes/jdk/internal/vm/annotation/constant-folding.md line 1: > 1: Constant Folding in the Hotspot Compiler I assume any write-up of HotSpot constant folding should move into src/hotspot tree, maybe a block comment in one of the source files? 
src/java.base/share/classes/jdk/internal/vm/annotation/constant-folding.md line 106: > 104: `trustedFinal` setting. > 105: > 106: ### Make Final Mean Final I think you can drop this section for now. It's okay to reference JEP 500 but it will be annoying to have to maintain this text as there are many steps to follow this one. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2573492977 PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2573493426 From alanb at openjdk.org Sun Nov 30 07:54:46 2025 From: alanb at openjdk.org (Alan Bateman) Date: Sun, 30 Nov 2025 07:54:46 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v2] In-Reply-To: References: Message-ID: On Sun, 30 Nov 2025 05:19:22 GMT, Chen Liang wrote: >> Hello Chen, >> >>> because otherwise people can just blanket-approve java.util classes for trusting and break those applications. >> >> This is one of the reasons why I asked some of the questions that I did. We have seen several PRs in the recent past where `@Stable` annotation has been introduced in the core classes of Java SE because it aids constant folding optimizations. Most of those changes have been backed merely by JMH benchmarks. It won't be a surprise if we start seeing another round of PRs where the usage of this new `@TrustFinalFields` gets proposed to some of these classes in the JDK because it shows an improvement in some micro benchmark. It also won't be a surprise if those PRs too won't have associated regression tests. Furthermore, unlike `@Stable` which gets applied directly on the field(s) of interest, this new annotation will be applied a bit "far away" from such fields. So it will need additional review cycles to understand if this usage can impact the code functionally in any manner. 
Specifying the semantics of this annotation in various usage scenarios, in its javadoc, will aid in reviewing such changes in future, instead of having to regularly look into the JVM code to understand how this annotation behaves. >> >> Classes in `java.util` aren't special in any way. So if applications are changing the values of final fields of some of those classes, then the same would be done for other packages of Java SE APIs too. If, like you note, applying `@TrustFinalFields` on such classes is going to break applications, then it will be useful to specify what kind of breakages those will be (in a similar manner to what the `@Stable` annotation's javadoc does). >> >> Very specifically, I think adding a few sentences clarifying the following scenarios in this annotation's javadoc will be useful: >> >> - Will this annotation be honoured only on the specific class that it is applied to? Or will it be taken into consideration for final fields in subclasses too? >> - If this annotation gets applied on a class and if that class has some final fields which are already marked `@Stable`, what kind of implications will that have, if any? >> - If this annotation is marked on a class which has a `final` array field (for example `final long[] ids`), is it useful to continue placing a `@Stable` annotation on such array fields if the elements of those arrays are going to be initialized to a non-default value just once? >> - If after all the precautions are taken, if... > > If you want an essay, I have written one - I just hope whatever bikeshedding for this essay does not affect the progress of Lazy Constant's performance demands. 
> * If after all the precautions are taken, if the final field of a class annotated with `@TrustFinalFields` does get updated to a new value, what kind of impact would it have (stating that such behaviour is unspecified and in general is a bad idea would be enough, if that's all there is to it) Field.set, which is probably the API that these libraries are using, already includes a warning about "unpredictable effects, including cases in which other parts of a program continue to use the original value of this field", so I think that is okay for now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2573494589 From duke at openjdk.org Sun Nov 30 08:05:52 2025 From: duke at openjdk.org (Zihao Lin) Date: Sun, 30 Nov 2025 08:05:52 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v13] In-Reply-To: <-1wiWF_UEvCO6xPuYvIsElBzPPQDejGahm9Xd5YszPU=.cfb41cb1-f681-4e75-8c29-2d928468f53b@github.com> References: <2oDqUvcW_3hJRPRri4uttpkgfeCovL4ZZkcI0R1bB1A=.173b3a58-d0f1-4b29-94d1-77b0a350c790@github.com> <2wAnS7drj_r3dqsy5CEF9vBG40KizHsQDOxMeNymwhw=.9bc29879-eead-401c-b750-814592feff63@github.com> <-1wiWF_UEvCO6xPuYvIsElBzPPQDejGahm9Xd5YszPU=.cfb41cb1-f681-4e75-8c29-2d928468f53b@github.com> Message-ID: On Fri, 28 Nov 2025 14:51:49 GMT, Roland Westrelin wrote: >> The assert failed because `find_inst_mem()` skipped an Initialize memory projection whose `adr_type` was still the general slice, then tried to fetch the instance-specific projection from `_node_map` and got nullptr. That happens when a precise `NarrowMemProj` already exists: the code doesn't create a new one and also never records the mapping, so later lookup fails. >> >> The fix records the mapping even if the precise `NarrowMemProj` is already present (not newly created). > > I had a closer look and I think you ran into an inconsistency. Let me see if I can get it fixed as a separate change. Sure, it's better to separate to another change.
I am not familiar with this part, so please ping me if you have a better solution. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2573499061 From alanb at openjdk.org Sun Nov 30 08:07:46 2025 From: alanb at openjdk.org (Alan Bateman) Date: Sun, 30 Nov 2025 08:07:46 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v2] In-Reply-To: References: Message-ID: On Sun, 30 Nov 2025 07:51:51 GMT, Alan Bateman wrote: >> If you want an essay, I have written one - I just hope whatever bikeshedding for this essay does not affect the progress of Lazy Constant's performance demands. > >> * If after all the precautions are taken, if the final field of a class annotated with `@TrustFinalFields` does get updated to a new value, what kind of impact would it have (stating that such behaviour is unspecified and in general is a bad idea would be enough, if that's all there is to it) > > Field.set, which is probably the API that these libraries are using, already includes a warning about "unpredictable effects, including cases in which other parts of a program continue to use the original value of this field", so I think that is okay for now. > Existing users have been hacking java.util final fields. I think leaving out the backward compatibility part causes more trouble, because otherwise people can just blanket-approve java.util classes for trusting and break those applications. I don't think we have a lot of data on this as it doesn't lend itself to static analysis. Aside from serialization libraries, it's possible the hacking of finals is ad hoc and in random areas (someone pointed to something in Netty hacking a final field in a class in sun.nio.ch at one point). So I probably wouldn't call out java.util specifically but maybe you brought that up specifically as there are so many performance critical classes there.
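
For readers who have not seen this kind of "final hacking", the following is a minimal sketch of the pattern being discussed: deep reflection overwriting a final instance field. The class names `FinalFieldHackSketch` and `Holder` are hypothetical and used only for illustration; nothing here is taken from the PR. It assumes a JDK on which such mutation is still permitted (newer releases may warn or refuse), and it only reads the value back reflectively, because what ordinary or JIT-compiled reads observe afterwards is exactly the "unpredictable effects" caveat quoted above.

```java
import java.lang.reflect.Field;

public final class FinalFieldHackSketch {

    // Hypothetical holder with a final instance field assigned in the constructor.
    static final class Holder {
        final int value;

        Holder(int value) {
            this.value = value;
        }
    }

    // Overwrites the final field through deep reflection and returns the
    // value as seen by a reflective read of the same field.
    static int overwriteAndReadBack(Holder holder, int newValue) {
        try {
            Field field = Holder.class.getDeclaredField("value");
            field.setAccessible(true);      // required before writing a final field
            field.setInt(holder, newValue); // the kind of write the Field.set warning covers
            return field.getInt(holder);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("final field mutation was rejected", e);
        }
    }

    public static void main(String[] args) {
        Holder holder = new Holder(1);
        System.out.println(overwriteAndReadBack(holder, 2));
    }
}
```

Code that treats such a field as a trusted constant may keep using the original value after a write like this, which is why trusting finals in widely hacked classes is a compatibility concern.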
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2573499552 From liach at openjdk.org Sun Nov 30 14:52:52 2025 From: liach at openjdk.org (Chen Liang) Date: Sun, 30 Nov 2025 14:52:52 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v2] In-Reply-To: References: Message-ID: On Sun, 30 Nov 2025 07:47:39 GMT, Alan Bateman wrote: >> Chen Liang has updated the pull request incrementally with one additional commit since the last revision: >> >> Essay > > src/java.base/share/classes/jdk/internal/vm/annotation/constant-folding.md line 1: > >> 1: Constant Folding in the Hotspot Compiler > > I assume any write-up of HotSpot constant folding should move into src/hotspot tree, maybe a block comment in one of the source files? I intend this to be a user-oriented guide on constant folding. I should just call it constant folding. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2573814710 From liach at openjdk.org Sun Nov 30 14:52:53 2025 From: liach at openjdk.org (Chen Liang) Date: Sun, 30 Nov 2025 14:52:53 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v2] In-Reply-To: References: Message-ID: On Sun, 30 Nov 2025 14:50:23 GMT, Chen Liang wrote: >> src/java.base/share/classes/jdk/internal/vm/annotation/constant-folding.md line 1: >> >>> 1: Constant Folding in the Hotspot Compiler >> >> I assume any write-up of HotSpot constant folding should move into src/hotspot tree, maybe a block comment in one of the source files? > > I intend this to be a user-oriented guide on constant folding. I should just call it constant folding. I intend this to be a user-oriented guide on constant folding. I should just call it constant folding. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2573814847 From alautiero at gmail.com Sun Nov 30 16:50:44 2025 From: alautiero at gmail.com (Alessandro Autiero) Date: Sun, 30 Nov 2025 17:50:44 +0100 Subject: [C2] PEXT/PDEP intrinsics cause performance regression on AMD pre-Zen 3 CPUs Message-ID: Hi, today I stumbled upon a performance issue with the Long.compress/expand and Integer.compress/expand intrinsics on certain AMD processors. I discovered this while working on an optimized varint decoder where I was hoping to use Long.compress() to speed up bit extraction. Instead, I found my "optimized" version was slower than my naive loop-based implementation. After some digging, I believe I understand what's happening. **Background** The compress and expand methods (added in JDK 19 via JDK-8283893 [1]) are intrinsified by C2 to use the BMI2 PEXT and PDEP instructions when the CPU reports BMI2 support. This works great on Intel Haswell+ and AMD Zen 3+, where these instructions execute in dedicated hardware with approximately 3-cycle latency. However, AMD processors from Excavator before Zen 3 implement PEXT/PDEP via microcode emulation rather than native hardware. This is confirmed by AMD's Software Optimization Guide for Family 19h Processors [2], Section 2.10.2, which states that Zen 3 has native ALU support for these instructions. Wikipedia's page on x86 Bit Manipulation Instruction Sets [3] also documents this behavior: > AMD processors before Zen 3 that implement PDEP and PEXT do so in > microcode, with a latency of 18 cycles rather than (Zen 3) 3 cycles. As a > result it is often faster to use other instructions on these processors. 
**Reproducer**

Here is a JMH benchmark that demonstrates the issue by comparing the intrinsified path against the software fallback using ControlIntrinsic flags:

```
import org.openjdk.jmh.annotations.*;

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 5, time = 1)
@State(Scope.Benchmark)
public class PextPdepPerformanceBug {
    // I'm not using constants to prevent constant folding
    private long longValue;
    private long longMask;
    private int intValue;
    private int intMask;

    @Setup(Level.Iteration)
    public void setup() {
        var rng = ThreadLocalRandom.current();
        longValue = rng.nextLong();
        longMask = rng.nextLong();
        intValue = rng.nextInt();
        intMask = rng.nextInt();
    }

    // Long.compress (PEXT 64-bit)
    @Benchmark
    @Fork(value = 2, jvmArgsAppend = {
        "-XX:+UnlockDiagnosticVMOptions",
        "-XX:ControlIntrinsic=-_compress_l",
        "-Xcomp"
    })
    public long compressLongSoftware() {
        return Long.compress(longValue, longMask);
    }

    @Benchmark
    @Fork(value = 2, jvmArgsAppend = {
        "-XX:+UnlockDiagnosticVMOptions",
        "-XX:ControlIntrinsic=+_compress_l",
        "-Xcomp"
    })
    public long compressLongIntrinsic() {
        return Long.compress(longValue, longMask);
    }

    // Long.expand (PDEP 64-bit)
    @Benchmark
    @Fork(value = 2, jvmArgsAppend = {
        "-XX:+UnlockDiagnosticVMOptions",
        "-XX:ControlIntrinsic=-_expand_l",
        "-Xcomp"
    })
    public long expandLongSoftware() {
        return Long.expand(longValue, longMask);
    }

    @Benchmark
    @Fork(value = 2, jvmArgsAppend = {
        "-XX:+UnlockDiagnosticVMOptions",
        "-XX:ControlIntrinsic=+_expand_l",
        "-Xcomp"
    })
    public long expandLongIntrinsic() {
        return Long.expand(longValue, longMask);
    }

    // Integer.compress (PEXT 32-bit)
    @Benchmark
    @Fork(value = 2, jvmArgsAppend = {
        "-XX:+UnlockDiagnosticVMOptions",
        "-XX:ControlIntrinsic=-_compress_i",
        "-Xcomp"
    })
    public int compressIntSoftware() {
        return Integer.compress(intValue, intMask);
    }

    @Benchmark
    @Fork(value = 2, jvmArgsAppend = {
        "-XX:+UnlockDiagnosticVMOptions",
        "-XX:ControlIntrinsic=+_compress_i",
        "-Xcomp"
    })
    public int compressIntIntrinsic() {
        return Integer.compress(intValue, intMask);
    }

    // Integer.expand (PDEP 32-bit)
    @Benchmark
    @Fork(value = 2, jvmArgsAppend = {
        "-XX:+UnlockDiagnosticVMOptions",
        "-XX:ControlIntrinsic=-_expand_i",
        "-Xcomp"
    })
    public int expandIntSoftware() {
        return Integer.expand(intValue, intMask);
    }

    @Benchmark
    @Fork(value = 2, jvmArgsAppend = {
        "-XX:+UnlockDiagnosticVMOptions",
        "-XX:ControlIntrinsic=+_expand_i",
        "-Xcomp"
    })
    public int expandIntIntrinsic() {
        return Integer.expand(intValue, intMask);
    }
}
```

Here are the results on an i7 9700K, which supports the BMI2 instruction set and is not affected by this issue:

```
Benchmark                                     Mode  Cnt   Score   Error  Units
PextPdepPerformanceBug.compressIntIntrinsic   avgt   10   0,545 ± 0,002  ns/op
PextPdepPerformanceBug.compressIntSoftware    avgt   10  11,357 ± 0,033  ns/op
PextPdepPerformanceBug.compressLongIntrinsic  avgt   10   0,552 ± 0,012  ns/op
PextPdepPerformanceBug.compressLongSoftware   avgt   10  16,197 ± 0,203  ns/op
PextPdepPerformanceBug.expandIntIntrinsic     avgt   10   0,546 ± 0,006  ns/op
PextPdepPerformanceBug.expandIntSoftware      avgt   10  12,179 ± 0,457  ns/op
PextPdepPerformanceBug.expandLongIntrinsic    avgt   10   0,548 ± 0,018  ns/op
PextPdepPerformanceBug.expandLongSoftware     avgt   10  17,658 ± 0,534  ns/op
```

And here are the results on a Ryzen 7 2700, which also supports the BMI2 instruction set but is affected by this issue:

```
Benchmark                                     Mode  Cnt   Score    Error  Units
PextPdepPerformanceBug.compressIntIntrinsic   avgt   10  28.010 ±  9.929  ns/op
PextPdepPerformanceBug.compressIntSoftware    avgt   10  20.008 ±  2.129  ns/op
PextPdepPerformanceBug.compressLongIntrinsic  avgt   10  48.999 ±  8.468  ns/op
PextPdepPerformanceBug.compressLongSoftware   avgt   10  28.638 ±  5.336  ns/op
PextPdepPerformanceBug.expandIntIntrinsic     avgt   10  24.860 ±  6.784  ns/op
PextPdepPerformanceBug.expandIntSoftware      avgt   10  19.277 ±  1.719  ns/op
PextPdepPerformanceBug.expandLongIntrinsic    avgt   10  43.889 ± 10.575  ns/op
PextPdepPerformanceBug.expandLongSoftware     avgt   10  27.350 ±  1.898  ns/op
```

**Precedent and Scope**

A similar issue was reported in JDK-8334474 [4], where the compress/expand intrinsics were disabled on RISC-V because the vectorized implementation caused regressions compared to the pure-Java fallback. This led me to investigate whether other JDK intrinsics relying on BMI2 instructions might be affected. The good news is that, as stated before, PEXT and PDEP are the only BMI2 instructions that AMD implemented via microcode on pre-Zen 3 processors: the others execute efficiently on all BMI2-capable hardware. I also verified that no other JDK methods use PEXT/PDEP, so the four methods covered in this report (Long.compress, Long.expand, Integer.compress, Integer.expand) should be the only ones affected. This is worth double-checking, though, as the JDK is very large and I could have missed other uses.

**Mitigation**

The intrinsic selection logic should check both BMI2 support and CPU vendor/family. Specifically, disable these intrinsics when the CPU vendor is AMD and the family is less than 0x19 (Zen 3). I think this could be implemented in x86.ad [5], alongside the existing BMI2 check, but I'm not familiar with C2's source code. Still, I would be happy to work on this myself if the issue is confirmed and it's acceptable for me to do so.

Thanks for reading!

[1] https://bugs.openjdk.org/browse/JDK-8283893
[2] https://developer.amd.com/resources/developer-guides-manuals/
[3] https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
[4] https://bugs.openjdk.org/browse/JDK-8334474
[5] https://github.com/jatin-bhateja/jdk/blob/7d35a283cf2497565d230e3d5426f563f7e5870d/src/hotspot/cpu/x86/x86.ad#L3183

-------------- next part -------------- An HTML attachment was scrubbed... URL: