From xgong at openjdk.org Fri Jan 2 03:00:57 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 2 Jan 2026 03:00:57 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v3] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 30 Dec 2025 01:26:50 GMT, Xiaohong Gong wrote: >> The test fails intermittently with the following error: >> >> >> Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) >> >> >> The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. >> >> For example, given array elements: >> >> [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] >> >> >> Sequential scalar addition produces: >> >> 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f >> >> >> However, `reduceLanes()` might compute: >> >> (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL >> >> >> The difference of the two times of calculation is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. 
>> >> Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. >> >> This patch fixes the issue by initializing the float array in test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. >> >> Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. >> >> [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Remove verification for floating-point add reduction tests ping again~ ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3704368628
From jbhateja at openjdk.org Fri Jan 2 05:18:56 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 2 Jan 2026 05:18:56 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v3] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 30 Dec 2025 01:26:50 GMT, Xiaohong Gong wrote: > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Remove verification for floating-point add reduction tests LGTM Best Regards ------------- Marked as reviewed by jbhateja (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28960#pullrequestreview-3622220598 From jbhateja at openjdk.org Fri Jan 2 05:45:58 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 2 Jan 2026 05:45:58 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v3] In-Reply-To: References: Message-ID: > Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand, which leads to a failure during register biasing. Changing the NDD demotion flag names to encode the explicit operand position, i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2**, splitting the commutative flag into separate new flags, and fine-tuning the assertion checks based on the new naming convention fixes the issue. > > Failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch. > > Kindly review and share your feedback. > > Best Regards, > Jatin > PS: Validation performed using Intel SDE 9.58. Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8373724 - 8373724: Assertion failure in TestSignumVector.java with UseAPX ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28999/files - new: https://git.openjdk.org/jdk/pull/28999/files/bc86d54d..2a63c92b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=01-02 Stats: 2683 lines in 1256 files changed: 410 ins; 251 del; 2022 mod Patch: https://git.openjdk.org/jdk/pull/28999.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28999/head:pull/28999 PR: https://git.openjdk.org/jdk/pull/28999 From bkilambi at openjdk.org Fri Jan 2 10:13:03 2026 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 2 Jan 2026 10:13:03 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v4] In-Reply-To: <4o-mRvCoV4nHqDouamLFsjYVVHhSuAOurJipQmy3xo8=.08cbadfa-ebdf-4c6b-a7f3-efe808f82b92@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <4o-mRvCoV4nHqDouamLFsjYVVHhSuAOurJipQmy3xo8=.08cbadfa-ebdf-4c6b-a7f3-efe808f82b92@github.com> Message-ID: On Wed, 24 Dec 2025 09:15:00 GMT, Jatin Bhateja wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > Common IR changes looks good to me, adding some minor comments. Hi @jatin-bhateja could you please take another look at the patch? Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3704967816 From qamai at openjdk.org Fri Jan 2 15:42:10 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 Jan 2026 15:42:10 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v3] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 30 Dec 2025 01:26:50 GMT, Xiaohong Gong wrote: > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Remove verification for floating-point add reduction tests Thanks, LGTM. ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/28960#pullrequestreview-3623364261
From duke at openjdk.org Sat Jan 3 00:23:13 2026 From: duke at openjdk.org (Shawn M Emery) Date: Sat, 3 Jan 2026 00:23:13 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2] In-Reply-To: References: Message-ID: > This change allows use of the AVX512_VBMI/VBMI2 instruction set to further optimize decompression/parsing of polynomial coefficients for ML-KEM. The speedup gained in the ML-KEM benchmarks for key generation is between 0.2 and 0.5%, encapsulation is 0.3 to 1.5%, and decapsulation is 0 to 0.9%. > > Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me. 
Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: Update copyright year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28815/files - new: https://git.openjdk.org/jdk/pull/28815/files/d2cadaf9..7cd8de53 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28815&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28815&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28815.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28815/head:pull/28815 PR: https://git.openjdk.org/jdk/pull/28815
From jiefu at openjdk.org Sat Jan 3 13:35:05 2026 From: jiefu at openjdk.org (Jie Fu) Date: Sat, 3 Jan 2026 13:35:05 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v3] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 30 Dec 2025 01:26:50 GMT, Xiaohong Gong wrote: > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Remove verification for floating-point add reduction tests Good. ------------- Marked as reviewed by jiefu (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28960#pullrequestreview-3624339667 From jbhateja at openjdk.org Sun Jan 4 10:30:23 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 4 Jan 2026 10:30:23 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v5] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Mon, 29 Dec 2025 17:39:42 GMT, Bhavana Kilambi wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. 
>>
>> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` -
>>
>> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch.
>> Ratio > 1 indicates the performance with this patch is better than the master branch.
>>
>> **N1 (UseSVE = 0, max vector length = 16B):**
>>
>> Benchmark         vectorDim  Mode   Cnt  8B    16B
>> ReductionAddFP16  256        thrpt  9    1.41  1.40
>> ReductionAddFP16  512        thrpt  9    1.41  1.41
>> ReductionAddFP16  1024       thrpt  9    1.43  1.40
>> ReductionAddFP16  2048       thrpt  9    1.43  1.40
>> ReductionMulFP16  256        thrpt  9    1.22  1.22
>> ReductionMulFP16  512        thrpt  9    1.21  1.23
>> ReductionMulFP16  1024       thrpt  9    1.21  1.22
>> ReductionMulFP16  2048       thrpt  9    1.20  1.22
>>
>>
>> On N1, the scalarized sequence of `fadd/fmul` are gener...
>
> Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
>
>  - Address review comments for the JTREG test and microbenchmark
>  - Merge branch 'master'
>  - Address review comments
>  - Fix build failures on Mac
>  - Address review comments
>  - Merge 'master'
>  - 8366444: Add support for add/mul reduction operations for Float16
>
>    This patch adds mid-end support for vectorized add/mul reduction
>    operations for half floats. It also includes backend aarch64 support for
>    these operations. Only vectorization support through autovectorization
>    is added as VectorAPI currently does not support Float16 vector species.
> > Both add and mul reduction vectorized through autovectorization mandate > the implementation to be strictly ordered. The following is how each of > these reductions is implemented for different aarch64 targets - > > For AddReduction : > On Neon only targets (UseSVE = 0): Generates scalarized additions > using the scalar "fadd" instruction for both 8B and 16B vector lengths. > This is because Neon does not provide a direct instruction for computing > strictly ordered floating point add reduction. > > On SVE targets (UseSVE > 0): Generates the "fadda" instruction which > computes add reduction for floating point in strict order. > > For MulReduction : > Both Neon and SVE do not provide a direct instruction for computing > strictly ordered floating point multiply reduction. For vector lengths > of 8B and 16B, a scalarized sequence of scalar "fmul" instructions is > generated and multiply reduction for vector lengths > 16B is not > supported. > > Below is the performance of the two newly added microbenchmarks in > Float16OperationsBenchmark.java tested on three different aarch64 > machines and with varying MaxVectorSize - > > Note: On all machines, the score (ops/ms) is compared with the master > branch without this patch which generates a sequence of loads ("ldrsh") > to load the FP16 value into an FPR and a scalar "fadd/fmul" to > add/multiply the loaded value to the running sum/product. The ratios > given below are the ratios between the throughput with this patch and > the throughput without this patch. > Ratio > 1 indicates the performance with this patch is better than the > master branch. > > N1 (UseSVE = 0... Common IR changes looks good to me. Best Regards, Jatin ------------- Marked as reviewed by jbhateja (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27526#pullrequestreview-3624931740 From xgong at openjdk.org Mon Jan 5 01:58:14 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 5 Jan 2026 01:58:14 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v3] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 30 Dec 2025 01:26:50 GMT, Xiaohong Gong wrote: > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Remove verification for floating-point add reduction tests Thanks for all the review and comments! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3708682947
From xgong at openjdk.org Mon Jan 5 01:58:15 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 5 Jan 2026 01:58:15 GMT Subject: Integrated: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 23 Dec 2025 06:45:46 GMT, Xiaohong Gong wrote: > The test fails intermittently with the following error: > > > Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) > at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) > at 
compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) > > > The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. > > For example, given array elements: > > [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] > > > Sequential scalar addition produces: > > 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f > > > However, `reduceLanes()` might compute: > > (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL > > > The difference of the two times of calculation is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. > > Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. > > This patch fixes the issue by initializing the float array in test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. > > Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. > > [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) This pull request has now been integrated. 
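The ordering effect described in this message can be reproduced with plain scalar arithmetic. Below is a minimal sketch, not code from the patch: the class `FloatReductionOrder` and its two helper methods are hypothetical names introduced for illustration, and the pairwise grouping is just one of the evaluation orders `reduceLanes()` might use.

```java
public class FloatReductionOrder {
    // Hypothetical helper: left-to-right scalar sum, like a reference loop.
    static float sequentialSum(float[] a) {
        float sum = 0.0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    // Hypothetical helper: pairwise grouping (a0 + a1) + (a2 + a3),
    // one possible order for an unordered 4-lane reduction.
    static float pairwiseSum(float[] a) {
        return (a[0] + a[1]) + (a[2] + a[3]);
    }

    public static void main(String[] args) {
        float[] a = {0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE};
        float seq = sequentialSum(a);   // MIN_NORMAL is absorbed by MAX_VALUE
        float pair = pairwiseSum(a);    // MIN_NORMAL survives
        float tolerance = Math.ulp(0.0f) * 10.0f;
        System.out.println("sequential = " + seq);
        System.out.println("pairwise   = " + pair);
        System.out.println(Math.abs(seq - pair) > tolerance); // prints true
    }
}
```

With the array from the description, the two sums differ by exactly `Float.MIN_NORMAL`, which is far above the `Math.ulp(0.0f) * 10.0f` tolerance.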
Changeset: 6eaabed5 Author: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca Stats: 43 lines in 1 file changed: 1 ins; 32 del; 10 mod 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently Reviewed-by: jiefu, jbhateja, erfang, qamai ------------- PR: https://git.openjdk.org/jdk/pull/28960 From shade at openjdk.org Mon Jan 5 06:38:07 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 Jan 2026 06:38:07 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v10] In-Reply-To: References: Message-ID: > See the bug for discussion what issues current machinery has. > > This PR executes the plan outlined in the bug: > 1. Common the receiver type profiling code in interpreter and C1 > 2. Rewrite receiver type profiling code to only do atomic receiver slot installations > 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed > > This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - More comments - Tighten up the comments - Simplify third case: no need to loop, just restart the search - Actually have a second "fast" case: receiver is not found in the table, and the table is full - Pushing/popping for rare CAS path is counter-productive - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - Tighten up some more - ... 
and 13 more: https://git.openjdk.org/jdk/compare/6eaabed5...e4a4719f ------------- Changes: https://git.openjdk.org/jdk/pull/25305/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=09 Stats: 418 lines in 8 files changed: 202 ins; 197 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/25305.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25305/head:pull/25305 PR: https://git.openjdk.org/jdk/pull/25305 From chagedorn at openjdk.org Mon Jan 5 07:47:05 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 07:47:05 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v4] In-Reply-To: References: Message-ID: <2ct7k0J84Z7D5BbrauNhR4ATvjTNnbYe7Wbjo9xgIF8=.41d0c040-76b8-478b-817a-4efba252e67d@github.com> On Fri, 12 Dec 2025 18:56:18 GMT, Benoît Maillard wrote: >> This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing. >> >> The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for >> `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially >> introduced to get rid of unused `Load` nodes that were only kept alive by such an edge. >> >> >> >> https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259 >> >> In our case, it happens that the `Load` node gets folded to a constant during the initial >> `_gvn.transform` call in `GraphKit::make_load`. Because the value is converted before being >> returned, we end up with two constant nodes: one `ConL` and one `ConI`. The `ConL` only >> has one usage, and this triggers the optimization during verification. 
>>
>>
>> static int test0() {
>>     var c = new MyClass();
>>     // the conversion ensures that the ConL node only has one use
>>     // in the end, which triggers the optimization
>>     return (int) c.l;
>> }
>>
>>
>> The optimization is not triggered earlier, when we apply `_gvn.transform` on the membar,
>> because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in
>> `PhaseGVN::transform`.
>>
>> For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created
>> and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with
>> `can_reshape` later.
>>
>>
>> This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load`
>> prevents it from occurring when boxing elimination is enabled. Boxing elimination is
>> disabled temporarily in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)),
>> which caused the issue to appear, but by using `-XX:-EliminateAutoBox` it became clear
>> that the issue was on mainline.
>>
>> ### Testing
>> - [x] [GitHub Actions](TODO)
>> - [x] tier1-3, plus some internal testing
>>
>> Thank you for reviewing!
>
> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:
>
>   Move to igvn directory and use test.main.class

Looks good to me, too, thanks!

-------------

Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28448#pullrequestreview-3625777581 From chagedorn at openjdk.org Mon Jan 5 07:47:06 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 07:47:06 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v3] In-Reply-To: <8fekfXtLDXeb2ratn8pcaehKLjzRBd9wa0xkzN0KTu4=.b09b0a5a-e4e1-480d-97a7-532ae3fbb6cf@github.com> References: <4JnkUEx4O7We0dTeMUzQa3pFdRaSJoWHrXlhB3wFc0M=.07e9b81b-187d-4e72-b7e3-492c5cda0e0c@github.com> <8fekfXtLDXeb2ratn8pcaehKLjzRBd9wa0xkzN0KTu4=.b09b0a5a-e4e1-480d-97a7-532ae3fbb6cf@github.com> Message-ID: On Fri, 12 Dec 2025 18:53:17 GMT, Benoît Maillard wrote: >> test/hotspot/jtreg/compiler/c2/igvn/TestMissingOptMemBarRemovePrecedentEdge.java line 2: >> >>> (failed to retrieve contents of file, check the PR for context) >> Should the test go into an `igvn` directory? Or something else a bit more specific? > > Moved it to `compiler/c2/igvn` I guess we could clean this up at some point. We now have `igvn`, `c2/igvn`, and `c2/gvn`. And some other C2 specific tests are in folders inside `c2` while others are in the base directory. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28448#discussion_r2660559734 From chagedorn at openjdk.org Mon Jan 5 07:47:07 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 07:47:07 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v3] In-Reply-To: References: <4JnkUEx4O7We0dTeMUzQa3pFdRaSJoWHrXlhB3wFc0M=.07e9b81b-187d-4e72-b7e3-492c5cda0e0c@github.com> <8fekfXtLDXeb2ratn8pcaehKLjzRBd9wa0xkzN0KTu4=.b09b0a5a-e4e1-480d-97a7-532ae3fbb6cf@github.com> Message-ID: <7wuX5LiMHSgw9nk11rPHxoSIBJpCrkWZeQpVkRLkRJ0=.cc7fb403-6a41-4941-8502-c5a88175cfab@github.com> On Mon, 5 Jan 2026 07:41:07 GMT, Christian Hagedorn wrote: >> Moved it to `compiler/c2/igvn` > > I guess we could clean this up at some point. We now have `igvn`, `c2/igvn`, and `c2/gvn`. 
And some other C2 specific tests are in folders inside `c2` while others are in the base directory. Suggestion: * Copyright (c) 2026, Oracle and/or its affiliates. All rights reserved. You are probably also the first one this year to change `node.cpp` and `graphKit.cpp`, so we need an update there as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28448#discussion_r2660564360 From hgreule at openjdk.org Mon Jan 5 07:57:35 2026 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 5 Jan 2026 07:57:35 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation [v6] In-Reply-To: References: Message-ID: > The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. > > Please let me know what you think. Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: update copyright year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27886/files - new: https://git.openjdk.org/jdk/pull/27886/files/db8fd790..86f2ead8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27886&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27886&range=04-05 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/27886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27886/head:pull/27886 PR: https://git.openjdk.org/jdk/pull/27886 From hgreule at openjdk.org Mon Jan 5 08:03:11 2026 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 5 Jan 2026 08:03:11 GMT Subject: RFR: 8373555: C2: Optimize redundant input calculations for sign comparisons [v2] In-Reply-To: References: Message-ID: > Instead of sign-comparisons with And,Or,Xor,Max,Min nodes, we can directly compare to one of the inputs of the binary nodes if the other input is irrelevant to the comparison. 
>
> There are potentially more operations, but these mentioned here are the most obvious ones. Max and Min could theoretically be expanded to arbitrary comparisons to constants, but I didn't want to introduce more complexity for now.
>
> Please let me know what you think :)

Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:

  update copyright year

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28782/files
  - new: https://git.openjdk.org/jdk/pull/28782/files/e007f6c9..d298bf21

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28782&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28782&range=00-01

  Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/28782.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28782/head:pull/28782

PR: https://git.openjdk.org/jdk/pull/28782

From hgreule at openjdk.org Mon Jan 5 08:03:13 2026
From: hgreule at openjdk.org (Hannes Greule)
Date: Mon, 5 Jan 2026 08:03:13 GMT
Subject: RFR: 8373555: C2: Optimize redundant input calculations for sign comparisons [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 18 Dec 2025 15:49:53 GMT, Galder Zamarreño wrote:

>> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
>>
>> update copyright year
>
> Neat! At a glance I don't see anything wrong. Just a small question: what testing did you carry out?

@galderz thanks, I mainly tested `test/hotspot/jtreg:tier1` and the tests running in GHA. It would be great if someone else could submit more tests.
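
For readers following the 8373555 thread, the idea of comparing directly against one input can be illustrated in plain Java. This is only a sketch of arithmetic identities that hold for `int` sign tests; which exact patterns and conditions the PR implements in C2 is not stated in this thread, and the class and method names below are made up for illustration.

```java
public class SignCompareSketch {
    // Sample values that cover both signs and the extremes.
    static final int[] SAMPLES = {
        Integer.MIN_VALUE, -1_000_000, -2, -1, 0, 1, 2, 1_000_000, Integer.MAX_VALUE
    };

    // Returns true if, for every sampled x and every sampled m with the
    // required sign, the sign test on the binary operation agrees with the
    // sign test on x alone.
    static boolean identitiesHold() {
        for (int x : SAMPLES) {
            boolean expected = x < 0;
            for (int m : SAMPLES) {
                if (m < 0) {
                    // m has its sign bit set: AND keeps x's sign bit,
                    // and max(x, m) is negative exactly when x is.
                    if (((x & m) < 0) != expected) return false;
                    if ((Math.max(x, m) < 0) != expected) return false;
                } else {
                    // m has a clear sign bit: OR and XOR keep x's sign bit,
                    // and min(x, m) is negative exactly when x is.
                    if (((x | m) < 0) != expected) return false;
                    if (((x ^ m) < 0) != expected) return false;
                    if ((Math.min(x, m) < 0) != expected) return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("identities hold: " + identitiesHold());
    }
}
```

In each case the second operand is "irrelevant" to the sign test only because its own sign is known, which is presumably the kind of type information the compiler would need before dropping the binary node.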
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28782#issuecomment-3709326006

From bmaillard at openjdk.org Mon Jan 5 08:13:55 2026
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Mon, 5 Jan 2026 08:13:55 GMT
Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v5]
In-Reply-To:
References:
Message-ID:

> This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing.
>
> The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for
> `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially
> introduced to get rid of unused `Load` nodes that were only kept alive by such an edge.
>
> https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259
>
> In our case, it happens that the `Load` node gets folded to a constant during the initial
> `_gvn.transform` call in `GraphKit::make_load`. Because the value is converted before being
> returned, we end up with two constant nodes: one `ConL` and one `ConI`. The `ConL` only
> has one usage, and this triggers the optimization during verification.
>
>     static int test0() {
>         var c = new MyClass();
>         // the conversion ensures that the ConL node only has one use
>         // in the end, which triggers the optimization
>         return (int) c.l;
>     }
>
> The optimization is not triggered earlier, when we apply `_gvn.transform` on the membar,
> because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in
> `PhaseGVN::transform`.
>
> For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created
> and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with
> `can_reshape` later.
>
> This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load`
> prevents it from occurring when boxing elimination is enabled. Boxing elimination is
> disabled temporarily in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)),
> which caused the issue to appear, but by using `-XX:-EliminateAutoBox` it became clear
> that the issue was on mainline.
>
> ### Testing
> - [x] [GitHub Actions](TODO)
> - [x] tier1-3, plus some internal testing
>
> Thank you for reviewing!

Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:

  Update test/hotspot/jtreg/compiler/c2/igvn/TestMissingOptMemBarRemovePrecedentEdge.java

  Co-authored-by: Christian Hagedorn

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28448/files
  - new: https://git.openjdk.org/jdk/pull/28448/files/a32ee08c..bbb7181b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28448&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28448&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/28448.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28448/head:pull/28448

PR: https://git.openjdk.org/jdk/pull/28448

From erfang at openjdk.org Mon Jan 5 08:14:10 2026
From: erfang at openjdk.org (Eric Fang)
Date: Mon, 5 Jan 2026 08:14:10 GMT
Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v8]
In-Reply-To:
References:
Message-ID:

> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common.
>
> If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code; otherwise code will be generated to extend or narrow the mask. This IR node is not free whether or not it generates code, because it may block some optimizations. For example:
> 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`: the middle `VectorMaskCast` prevents the optimization `(VectorStoreMask (VectorLoadMask x)) => (x)`.
> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.
>
> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast.
>
> The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first node that is not a `VectorMaskCastNode`.
>
> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain.
>
> Current optimizations related to `VectorMaskCastNode` include:
> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242.
>
> This PR does the following optimizations:
> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. This is correct as long as the types of the head and tail `VectorMaskCastNode` are consistent.
> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`.
Since the value before and after the pattern is a boolean vector, it remains unchanged as long as th... Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - Update copyright year to 2026 - Merge branch 'master' into JDK-8370863-mask-cast-opt - Convert the check condition for vector length into an assertion Also refined the tests. - Refine code comments - Merge branch 'master' into JDK-8370863-mask-cast-opt - Merge branch 'master' into JDK-8370863-mask-cast-opt - Add MaxVectorSize IR test condition for VectorStoreMaskIdentityTest.java - Refine the test code and comments - Merge branch 'master' into JDK-8370863-mask-cast-opt - Don't read and write the same memory in the JMH benchmarks - ... and 2 more: https://git.openjdk.org/jdk/compare/6eaabed5...9c38a6d9 ------------- Changes: https://git.openjdk.org/jdk/pull/28313/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=07 Stats: 643 lines in 7 files changed: 528 ins; 16 del; 99 mod Patch: https://git.openjdk.org/jdk/pull/28313.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28313/head:pull/28313 PR: https://git.openjdk.org/jdk/pull/28313 From roland at openjdk.org Mon Jan 5 08:45:27 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 5 Jan 2026 08:45:27 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph [v3] In-Reply-To: References: Message-ID: > A `CreateEx` gets sunk out of loop by > `PhaseIdealLoop::try_sink_out_of_loop()` and, as a consequence, the > following logic: > > > return (in(0)->is_CatchProj() && in(0)->in(0)->is_Catch() && > in(0)->in(0)->in(1) == in(1)) ? this : call->in(TypeFunc::Parms); > > > in `CreateExNode::Identity()` triggers which leads to the crash > because `call->in(TypeFunc::Parms)` is not even an object in this > particular case. 
> > It's actually not clear to me what that logic in > `CreateExNode::Identity()` is expected to do and I wonder if it's > still needed. > > Anyway, the fix I propose is to skip `CreateEx` in > `PhaseIdealLoop::try_sink_out_of_loop()`. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'master' into JDK-8373508 - Update test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java Co-authored-by: Christian Hagedorn - whitespaces - tests - more - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28842/files - new: https://git.openjdk.org/jdk/pull/28842/files/e4bdff59..968ebef2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28842&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28842&range=01-02 Stats: 16526 lines in 2401 files changed: 8803 ins; 2140 del; 5583 mod Patch: https://git.openjdk.org/jdk/pull/28842.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28842/head:pull/28842 PR: https://git.openjdk.org/jdk/pull/28842 From bmaillard at openjdk.org Mon Jan 5 09:02:33 2026 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 5 Jan 2026 09:02:33 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v3] In-Reply-To: <7wuX5LiMHSgw9nk11rPHxoSIBJpCrkWZeQpVkRLkRJ0=.cc7fb403-6a41-4941-8502-c5a88175cfab@github.com> References: <4JnkUEx4O7We0dTeMUzQa3pFdRaSJoWHrXlhB3wFc0M=.07e9b81b-187d-4e72-b7e3-492c5cda0e0c@github.com> <8fekfXtLDXeb2ratn8pcaehKLjzRBd9wa0xkzN0KTu4=.b09b0a5a-e4e1-480d-97a7-532ae3fbb6cf@github.com> <7wuX5LiMHSgw9nk11rPHxoSIBJpCrkWZeQpVkRLkRJ0=.cc7fb403-6a41-4941-8502-c5a88175cfab@github.com> Message-ID: On Mon, 5 Jan 2026 07:43:27 GMT, Christian Hagedorn wrote: >> I guess we could clean this up at some point. 
We now have `igvn`, `c2/igvn`, and `c2/gvn`. And some other C2 specific tests are in folders inside `c2` while others are in the base directory. > > Suggestion: > > * Copyright (c) 2026, Oracle and/or its affiliates. All rights reserved. > > > You are probably also the first one this year to change `node.cpp` and `graphKit.cpp`, so we need an update there as well. @chhagedorn I filed [JDK-8374511](https://bugs.openjdk.org/browse/JDK-8374511) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28448#discussion_r2660744502 From dfenacci at openjdk.org Mon Jan 5 09:08:02 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 5 Jan 2026 09:08:02 GMT Subject: RFR: 8373633: C2: Use interface receiver type to improve CHA decisions In-Reply-To: References: Message-ID: On Sat, 13 Dec 2025 02:01:10 GMT, Vladimir Ivanov wrote: > Strength-reducing an interface call to a virtual call for interfaces with > unique implementors can use receiver type information to narrow the context. > > C2 tracks interface types and receiver type information can be used to reveal > an interface with a unique implementor which can't be derived from the call > site itself. > > Since C2 effectively accumulates a union interface type from multiple subtype checks, iterating over individual components of a type may reveal a candidate for a strength-reduction. The only prerequisite is that a candidate has to be a subtype of the declared interface. > > Testing: hs-tier1 - hs-tier5 src/hotspot/share/opto/doCall.cpp line 340: > 338: // number of implementors for decl_interface is 0 or 1. If > 339: // it's 0 then no class implements decl_interface and there's > 340: // no point in inlining. Does the above comment still hold? Or did you remove it because it is not relevant anymore? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28811#discussion_r2640468378 From shade at openjdk.org Mon Jan 5 09:40:20 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 Jan 2026 09:40:20 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v10] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 06:38:07 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - More comments > - Tighten up the comments > - Simplify third case: no need to loop, just restart the search > - Actually have a second "fast" case: receiver is not found in the table, and the table is full > - Pushing/popping for rare CAS path is counter-productive > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Tighten up some more > - ... and 13 more: https://git.openjdk.org/jdk/compare/6eaabed5...e4a4719f Remerged from master, re-ran `tier1` and `hotspot_compiler` tests on Linux x86_64, all clean. 
There is an unrelated GHA infra failure (https://github.com/openjdk/jdk/pull/29030), which IMO does not block the integration, as at least Windows x86_64 passed in GHA, and Linux x86_64 passes locally. Here goes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3709622490 PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3709623137 From shade at openjdk.org Mon Jan 5 09:40:23 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 Jan 2026 09:40:23 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 15:23:42 GMT, Aleksey Shipilev wrote: > I'll task one of our folks to do it after NY break. That would be: https://bugs.openjdk.org/browse/JDK-8374513 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3709629142 From shade at openjdk.org Mon Jan 5 09:40:24 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 Jan 2026 09:40:24 GMT Subject: Integrated: 8357258: x86: Improve receiver type profiling reliability In-Reply-To: References: Message-ID: On Mon, 19 May 2025 14:59:36 GMT, Aleksey Shipilev wrote: > See the bug for discussion what issues current machinery has. > > This PR executes the plan outlined in the bug: > 1. Common the receiver type profiling code in interpreter and C1 > 2. Rewrite receiver type profiling code to only do atomic receiver slot installations > 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed > > This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [x] Linux x86_64 server fastdebug, `all` This pull request has now been integrated. 
Changeset: e676c9de Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/e676c9de3da3b820081cde1b11c0df3129787130 Stats: 418 lines in 8 files changed: 202 ins; 197 del; 19 mod 8357258: x86: Improve receiver type profiling reliability Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/25305 From bmaillard at openjdk.org Mon Jan 5 10:38:39 2026 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 5 Jan 2026 10:38:39 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v4] In-Reply-To: <2ct7k0J84Z7D5BbrauNhR4ATvjTNnbYe7Wbjo9xgIF8=.41d0c040-76b8-478b-817a-4efba252e67d@github.com> References: <2ct7k0J84Z7D5BbrauNhR4ATvjTNnbYe7Wbjo9xgIF8=.41d0c040-76b8-478b-817a-4efba252e67d@github.com> Message-ID: On Mon, 5 Jan 2026 07:43:52 GMT, Christian Hagedorn wrote: >> Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to igvn directory and use test.main.class > > Looks good to me, too, thanks! Thank you for reviewing @chhagedorn @eme64! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28448#issuecomment-3709866622 From duke at openjdk.org Mon Jan 5 11:31:26 2026 From: duke at openjdk.org (Yi Wu) Date: Mon, 5 Jan 2026 11:31:26 GMT Subject: RFR: 8373344: Add support for min/max reduction operations for Float16 [v2] In-Reply-To: <7qXqIBuLDFEKPNje6TALxZUATujnOY5hoODC30zJNFM=.07d8d157-1ae5-4751-befd-d6291370fb9c@github.com> References: <7qXqIBuLDFEKPNje6TALxZUATujnOY5hoODC30zJNFM=.07d8d157-1ae5-4751-befd-d6291370fb9c@github.com> Message-ID: <72RiEFo5wji1vOtIRzFwO03_0OsaCe2zzHjp9YPD8-k=.6a88f4eb-3fff-4fe5-bb43-260d81b7954a@github.com> > This patch adds mid-end support for vectorized min/max reduction operations for half floats. It also includes backend AArch64 support for these operations. > Both floating point min/max reductions don?t require strict order, because they are associative. 
>
> It will generate NEON fminv/fmaxv reduction instructions when max vector length is 8B or 16B. On SVE supporting machines with vector lengths > 16B, it will generate the SVE fminv/fmaxv instructions.
> The patch also adds support for partial min/max reductions on SVE machines using fminv/fmaxv.
>
> Ratio of throughput(ops/ms) > 1 indicates the performance with this patch is better than the mainline.
>
> Neoverse N1 (UseSVE = 0, max vector length = 16B):
>
> Benchmark         vectorDim  Mode   Cnt  8B    16B
> ReductionMaxFP16  256        thrpt  9    3.69  6.44
> ReductionMaxFP16  512        thrpt  9    3.71  7.62
> ReductionMaxFP16  1024       thrpt  9    4.16  8.64
> ReductionMaxFP16  2048       thrpt  9    4.44  9.12
> ReductionMinFP16  256        thrpt  9    3.69  6.43
> ReductionMinFP16  512        thrpt  9    3.70  7.62
> ReductionMinFP16  1024       thrpt  9    4.16  8.64
> ReductionMinFP16  2048       thrpt  9    4.44  9.10
>
> Neoverse V1 (UseSVE = 1, max vector length = 32B):
>
> Benchmark         vectorDim  Mode   Cnt  8B    16B   32B
> ReductionMaxFP16  256        thrpt  9    3.96  8.62  8.02
> ReductionMaxFP16  512        thrpt  9    3.54  9.25  11.71
> ReductionMaxFP16  1024       thrpt  9    3.77  8.71  14.07
> ReductionMaxFP16  2048       thrpt  9    3.88  8.44  14.69
> ReductionMinFP16  256        thrpt  9    3.96  8.61  8.03
> ReductionMinFP16  512        thrpt  9    3.54  9.28  11.69
> ReductionMinFP16  1024       thrpt  9    3.76  8.70  14.12
> ReductionMinFP16  2048       thrpt  9    3.87  8.45  14.70
>
> Neoverse V2 (UseSVE = 2, max vector length = 16B):
>
> Benchmark         vectorDim  Mode   Cnt  8B    16B
> ReductionMaxFP16  256        thrpt  9    4.78  10.00
> ReductionMaxFP16  512        thrpt  9    3.74  11.33
> ReductionMaxFP16  1024       thrpt  9    3.86  9.59
> ReductionMaxFP16  2048       thrpt  9    3.94  8.71
> ReductionMinFP16  256        thrpt  9    4.78  10.00
> ReductionMinFP16  512        thrpt  9    3.74  11.29
> ReductionMinFP16  1024       thrpt  9    3.86  9.58
> ReductionMinFP16  2048       thrpt  9    3.94  8.71
>
> Testing:
> hotspot_all, jdk (tier1-3) and langtools (tier1) all pass on Neoverse N1/V1/V2.

Yi Wu has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - Replace assert with verify
 - Add IRNode constant and code refactor
 - Merge remote-tracking branch 'origin/master' into yiwu-8373344
 - 8373344: Add support for FP16 min/max reduction operations

   This patch adds mid-end support for vectorized min/max reduction operations for half floats. It also includes backend AArch64 support for these operations.
   Both floating point min/max reductions don't require strict order, because they are associative.

   It will generate NEON fminv/fmaxv reduction instructions when max vector length is 8B or 16B. On SVE supporting machines with vector lengths > 16B, it will generate the SVE fminv/fmaxv instructions.
   The patch also adds support for partial min/max reductions on SVE machines using fminv/fmaxv.

   Ratio of throughput(ops/ms) > 1 indicates the performance with this patch is better than the mainline.
   Neoverse N1 (UseSVE = 0, max vector length = 16B):
   Benchmark         vectorDim  Mode   Cnt  8B    16B
   ReductionMaxFP16  256        thrpt  9    3.69  6.44
   ReductionMaxFP16  512        thrpt  9    3.71  7.62
   ReductionMaxFP16  1024       thrpt  9    4.16  8.64
   ReductionMaxFP16  2048       thrpt  9    4.44  9.12
   ReductionMinFP16  256        thrpt  9    3.69  6.43
   ReductionMinFP16  512        thrpt  9    3.70  7.62
   ReductionMinFP16  1024       thrpt  9    4.16  8.64
   ReductionMinFP16  2048       thrpt  9    4.44  9.10

   Neoverse V1 (UseSVE = 1, max vector length = 32B):
   Benchmark         vectorDim  Mode   Cnt  8B    16B   32B
   ReductionMaxFP16  256        thrpt  9    3.96  8.62  8.02
   ReductionMaxFP16  512        thrpt  9    3.54  9.25  11.71
   ReductionMaxFP16  1024       thrpt  9    3.77  8.71  14.07
   ReductionMaxFP16  2048       thrpt  9    3.88  8.44  14.69
   ReductionMinFP16  256        thrpt  9    3.96  8.61  8.03
   ReductionMinFP16  512        thrpt  9    3.54  9.28  11.69
   ReductionMinFP16  1024       thrpt  9    3.76  8.70  14.12
   ReductionMinFP16  2048       thrpt  9    3.87  8.45  14.70

   Neoverse V2 (UseSVE = 2, max vector length = 16B):
   Benchmark         vectorDim  Mode   Cnt  8B    16B
   ReductionMaxFP16  256        thrpt  9    4.78  10.00
   ReductionMaxFP16  512        thrpt  9    3.74  11.33
   ReductionMaxFP16  1024       thrpt  9    3.86  9.59
   ReductionMaxFP16  2048       thrpt  9    3.94  8.71
   ReductionMinFP16  256        thrpt  9    4.78  10.00
   ReductionMinFP16  512        thrpt  9    3.74  11.29
   ReductionMinFP16  1024       thrpt  9    3.86  9.58
   ReductionMinFP16  2048       thrpt  9    3.94  8.71

   Testing:
   hotspot_all, jdk (tier1-3) and langtools (tier1) all pass on Neoverse N1/V1/V2.
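
The claim that min/max reductions need no strict order can be checked with a small scalar experiment. The sketch below is an illustration only: it uses `float` and `Math.max` rather than Float16 or the vector instructions from the patch, the class and method names are invented, and the input array is the extreme-value example quoted earlier in this archive for the add-reduction test failure. It compares a left-to-right reduction with a pairwise (tree-shaped) one, which is roughly how a vector reduction may group lanes.

```java
public class ReductionOrderSketch {
    // Extreme values for which float addition is order-sensitive.
    static final float[] DATA = {0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE};

    // Left-to-right max reduction, the order a scalar loop uses.
    static float maxSequential(float[] a) {
        float r = a[0];
        for (int i = 1; i < a.length; i++) {
            r = Math.max(r, a[i]);
        }
        return r;
    }

    // Pairwise max reduction over the half-open range [lo, hi).
    static float maxPairwise(float[] a, int lo, int hi) {
        if (hi - lo == 1) {
            return a[lo];
        }
        int mid = (lo + hi) >>> 1;
        return Math.max(maxPairwise(a, lo, mid), maxPairwise(a, mid, hi));
    }

    // Left-to-right sum, for contrast.
    static float sumSequential(float[] a) {
        float r = a[0];
        for (int i = 1; i < a.length; i++) {
            r += a[i];
        }
        return r;
    }

    // Pairwise sum over the half-open range [lo, hi), for contrast.
    static float sumPairwise(float[] a, int lo, int hi) {
        if (hi - lo == 1) {
            return a[lo];
        }
        int mid = (lo + hi) >>> 1;
        return sumPairwise(a, lo, mid) + sumPairwise(a, mid, hi);
    }

    public static void main(String[] args) {
        boolean maxAgrees =
            Float.compare(maxSequential(DATA), maxPairwise(DATA, 0, DATA.length)) == 0;
        boolean sumAgrees =
            Float.compare(sumSequential(DATA), sumPairwise(DATA, 0, DATA.length)) == 0;
        System.out.println("max orders agree: " + maxAgrees);
        System.out.println("sum orders agree: " + sumAgrees);
    }
}
```

On this input the two max reductions agree while the two sums do not, because `Float.MIN_NORMAL` is absorbed when added to `Float.MAX_VALUE` in the sequential order. That difference is why an add reduction needs either a strict order or a tolerance, while a min/max reduction can be regrouped freely.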
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28828/files
  - new: https://git.openjdk.org/jdk/pull/28828/files/2f80bc4f..9971752e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28828&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28828&range=00-01

  Stats: 17385 lines in 2438 files changed: 9261 ins; 2408 del; 5716 mod
  Patch: https://git.openjdk.org/jdk/pull/28828.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28828/head:pull/28828

PR: https://git.openjdk.org/jdk/pull/28828

From duke at openjdk.org Mon Jan 5 11:33:33 2026
From: duke at openjdk.org (Yi Wu)
Date: Mon, 5 Jan 2026 11:33:33 GMT
Subject: RFR: 8373344: Add support for min/max reduction operations for Float16 [v2]
In-Reply-To:
References: <7qXqIBuLDFEKPNje6TALxZUATujnOY5hoODC30zJNFM=.07d8d157-1ae5-4751-befd-d6291370fb9c@github.com>
Message-ID:

On Mon, 22 Dec 2025 09:40:42 GMT, Galder Zamarreño wrote:

>> Yi Wu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>>
>> - Replace assert with verify
>> - Add IRNode constant and code refactor
>> - Merge remote-tracking branch 'origin/master' into yiwu-8373344
>> - 8373344: Add support for FP16 min/max reduction operations
>>
>> This patch adds mid-end support for vectorized min/max reduction
>> operations for half floats. It also includes backend AArch64 support
>> for these operations.
>> Both floating point min/max reductions don't require strict order,
>> because they are associative.
>>
>> It will generate NEON fminv/fmaxv reduction instructions when
>> max vector length is 8B or 16B. On SVE supporting machines
>> with vector lengths > 16B, it will generate the SVE fminv/fmaxv
>> instructions.
>> The patch also adds support for partial min/max reductions on
>> SVE machines using fminv/fmaxv.
>>
>> Ratio of throughput(ops/ms) > 1 indicates the performance with
>> this patch is better than the mainline.
>>
>> Neoverse N1 (UseSVE = 0, max vector length = 16B):
>> Benchmark         vectorDim  Mode   Cnt  8B    16B
>> ReductionMaxFP16  256        thrpt  9    3.69  6.44
>> ReductionMaxFP16  512        thrpt  9    3.71  7.62
>> ReductionMaxFP16  1024       thrpt  9    4.16  8.64
>> ReductionMaxFP16  2048       thrpt  9    4.44  9.12
>> ReductionMinFP16  256        thrpt  9    3.69  6.43
>> ReductionMinFP16  512        thrpt  9    3.70  7.62
>> ReductionMinFP16  1024       thrpt  9    4.16  8.64
>> ReductionMinFP16  2048       thrpt  9    4.44  9.10
>>
>> Neoverse V1 (UseSVE = 1, max vector length = 32B):
>> Benchmark         vectorDim  Mode   Cnt  8B    16B   32B
>> ReductionMaxFP16  256        thrpt  9    3.96  8.62  8.02
>> ReductionMaxFP16  512        thrpt  9    3.54  9.25  11.71
>> ReductionMaxFP16  1024       thrpt  9    3.77  8.71  14.07
>> ReductionMaxFP16  2048       thrpt  9    3.88  8.44  14.69
>> ReductionMinFP16  256        thrpt  9    3.96  8.61  8.03
>> ReductionMinFP16  512        thrpt  9    3.54  9.28  11.69
>> ReductionMinFP16  1024       thrpt  9    3.76  8.70  14.12
>> ReductionMinFP16  2048       thrpt  9    3.87  8.45  14.70
>>
>> Neoverse V2 (UseSVE = 2, max vector length = 16B)...
> > Thanks @yiwu0b11, some superficial comments Thanks @galderz for the code review, I've updated the code and also replaced assert with [verify](https://github.com/openjdk/jdk/pull/28095) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28828#issuecomment-3710056269 From chagedorn at openjdk.org Mon Jan 5 11:34:10 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 11:34:10 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v3] In-Reply-To: References: <4JnkUEx4O7We0dTeMUzQa3pFdRaSJoWHrXlhB3wFc0M=.07e9b81b-187d-4e72-b7e3-492c5cda0e0c@github.com> <8fekfXtLDXeb2ratn8pcaehKLjzRBd9wa0xkzN0KTu4=.b09b0a5a-e4e1-480d-97a7-532ae3fbb6cf@github.com> <7wuX5LiMHSgw9nk11rPHxoSIBJpCrkWZeQpVkRLkRJ0=.cc7fb403-6a41-4941-8502-c5a88175cfab@github.com> Message-ID: On Mon, 5 Jan 2026 08:59:24 GMT, Beno?t Maillard wrote: >> Suggestion: >> >> * Copyright (c) 2026, Oracle and/or its affiliates. All rights reserved. >> >> >> You are probably also the first one this year to change `node.cpp` and `graphKit.cpp`, so we need an update there as well. > > @chhagedorn I filed [JDK-8374511](https://bugs.openjdk.org/browse/JDK-8374511) > You are probably also the first one this year to change node.cpp and graphKit.cpp, so we need an update there as well. Can you double-check if you also need to update those with latest master? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28448#discussion_r2661187908 From epeter at openjdk.org Mon Jan 5 11:38:55 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 Jan 2026 11:38:55 GMT Subject: RFR: 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs Message-ID: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> In `VLoopMemorySlices::find_memory_slices`, we analyze the memory slices. In some cases, we only find loads in the slice, and no phi. 
So the memory input of the loads comes from before the loop.

When I refactored the code, I made the assumption that all loads should have the same memory input. After all: any store before the loop would have to have happened before we enter the loop, and execute any loads from the loop.

The assumption held for a long time, but now we have a counter example.

Summary: one load has its memory input optimized, the other is not put on the IGVN worklist again, and keeps the old memory input (even though in this case we could have optimized it just the same). Thus, both choices of memory input are correct, and the assumption of the assert does not hold.

Solution: Just bail out of auto vectorization if this assumption is violated. This is an edge case, and the assert has not been hit until the fuzzer found this example.

Alternatives: we could track the multiple memory inputs, but that would be more effort to implement, and hard to test because it is difficult to create examples.

---------------------------

**Details**

Below, look at `1145 LoadB` and `1131 LoadB`. One has memory input `Param 7` (initial program state), the other `711 Phi` (outer loop). Both loads are inside the `1147 CountedLoop`. But their states come from outside, both originally from `711 Phi`. But then `1145 LoadB` is optimized with `LoadBNode::Ideal` -> `LoadNode::Ideal` -> `LoadNode::split_through_phi`: it realizes that the backedge of the `711 Phi` only goes by the `593 CallStaticJava`, which cannot modify the `Byte::value` field of the `LoadB` (unless it was to use reflection, but that unlocks undefined behavior anyway, so it can be ignored). So it is ok to split through the phi, as the `Byte::value` cannot be modified during the outer loop.

[image]

`1131 LoadB` could also do the same optimization, but it just does not end up on the IGVN worklist. The issue is that we don't have any adequate notification that goes down through the `MergeMem - Call - Proj` structure.
We did not want to have that until now, because in theory we could have a long series of calls, and the traversals could become too expensive. ------------- Commit messages: - fix - Merge branch 'master' into JDK-8373453-SW-same-input-v2 - JDK-8373453 Changes: https://git.openjdk.org/jdk/pull/29028/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29028&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373453 Stats: 116 lines in 3 files changed: 109 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/29028.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29028/head:pull/29028 PR: https://git.openjdk.org/jdk/pull/29028 From thartmann at openjdk.org Mon Jan 5 12:00:15 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 5 Jan 2026 12:00:15 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v11] In-Reply-To: References: Message-ID: <5swbCzLMjJxib2WAy0PoLxwSbnID63a-1mygNQSTol8=.f78e3810-1861-4a36-8999-e14a0b2d7353@github.com> On Wed, 17 Dec 2025 17:43:34 GMT, Tobias Hotz wrote: >> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplify test, add temporary @IR rule for testLongRange and improve comments > > Thanks everybody! @ichttt There is a bug in the test, could you please have a look at [JDK-8374436](https://bugs.openjdk.org/browse/JDK-8374436)? Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26143#issuecomment-3710134476 From bmaillard at openjdk.org Mon Jan 5 12:14:17 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 5 Jan 2026 12:14:17 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v3] In-Reply-To: References: <4JnkUEx4O7We0dTeMUzQa3pFdRaSJoWHrXlhB3wFc0M=.07e9b81b-187d-4e72-b7e3-492c5cda0e0c@github.com> <8fekfXtLDXeb2ratn8pcaehKLjzRBd9wa0xkzN0KTu4=.b09b0a5a-e4e1-480d-97a7-532ae3fbb6cf@github.com> <7wuX5LiMHSgw9nk11rPHxoSIBJpCrkWZeQpVkRLkRJ0=.cc7fb403-6a41-4941-8502-c5a88175cfab@github.com> Message-ID: On Mon, 5 Jan 2026 11:30:51 GMT, Christian Hagedorn wrote: > Can you double-check if you also need to update those with latest master? Sorry @chhagedorn, I fell into the trap of only reading the suggested change. I have checked, and we need to change those indeed. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28448#discussion_r2661288840 From bmaillard at openjdk.org Mon Jan 5 12:14:15 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 5 Jan 2026 12:14:15 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v6] In-Reply-To: References: Message-ID: > This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing. > > The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for > `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially > introduced to get rid of unused `Load` nodes that were only kept alive by such an edge. > > > > https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259 > > In our case, it happens that the `Load` node gets folded to a constant during the initial > `_gvn.transform` call in `GraphKit::make_load`.
Because the value is converted before being > returned, we end up with two constant nodes: one `ConL` and one `ConI`. The `ConL` only > has one usage, and this triggers the optimization during verification. > > > static int test0() { > var c = new MyClass(); > // the conversion ensures that the ConL node only has one use > // in the end, which triggers the optimization > return (int) c.l; > } > > > The optimization is not triggered earlier, when we apply `_gvn.transform` on the membar, > because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in > `PhaseGVN::transform`. > > For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created > and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with > `can_reshape` later. > > > This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load` > prevents it from occurring when boxing elimination is enabled. Boxing elimination is > disabled temporarily in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)), > which caused the issue to appear, but by using `-XX:-EliminateAutoBox` it became clear > that the issue was on mainline. > > ### Testing > - [x] [GitHub Actions](TODO) > - [x] tier1-3, plus some internal testing > > Thank you for reviewing!
Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision: - Update copyright year in graphKit.cpp - Update copyright year in node.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28448/files - new: https://git.openjdk.org/jdk/pull/28448/files/bbb7181b..00b169b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28448&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28448&range=04-05 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28448.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28448/head:pull/28448 PR: https://git.openjdk.org/jdk/pull/28448 From epeter at openjdk.org Mon Jan 5 12:38:27 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 Jan 2026 12:38:27 GMT Subject: RFR: 8374489: Template Library: need to tag Float16.float16ToRawShortBits as having non-deterministic result because of multiple NaN bit patterns Message-ID: In [JDK-8370922](https://bugs.openjdk.org/browse/JDK-8370922), I added Float16 support to the Template Library. I forgot to tag `Float16.float16ToRawShortBits` as having a non-deterministic result. We need to do that, just like for the float and double equivalents: Arithmetic operations (e.g. add, mul, fma, ...) are allowed to pick either input if they get two NaNs of different bit patterns. That way, those operations can generate different NaN bit patterns depending on whether we use the interpreter, C1, or C2. If we now convert to raw bits, we get different bits, and would wrongly conclude that we get a wrong result. Since this is a test-bug, I have no regression test. But I verified manually that the same seed (for the ExpressionFuzzer) that fails before this change now succeeds after the change.
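To illustrate the NaN point above, here is a minimal sketch using plain `float` rather than `Float16` (an assumption made so the example needs no incubator module). The class name `NanBitsDemo`, its helper methods, and the two bit patterns are hypothetical and chosen only for illustration. It shows that two values can both be NaN, and collapse to the same canonical pattern under `Float.floatToIntBits`, while `Float.floatToRawIntBits` may expose different payload bits.

```java
public class NanBitsDemo {
    // Hypothetical helper: build a float from a raw bit pattern.
    static float fromBits(int bits) {
        return Float.intBitsToFloat(bits);
    }

    // Canonicalizing conversion: every NaN maps to 0x7fc00000.
    static int canonicalBits(float f) {
        return Float.floatToIntBits(f);
    }

    // Raw conversion: NaN payload bits may be preserved, so results can differ.
    static int rawBits(float f) {
        return Float.floatToRawIntBits(f);
    }

    public static void main(String[] args) {
        // Two quiet NaNs that differ only in their payload (illustrative values).
        float a = fromBits(0x7fc00000);
        float b = fromBits(0x7fc00001);

        System.out.println("both NaN:  " + (Float.isNaN(a) && Float.isNaN(b)));
        System.out.println("canonical: " + Integer.toHexString(canonicalBits(a))
                + " / " + Integer.toHexString(canonicalBits(b)));
        // The raw patterns are platform dependent, so no expected output is claimed here.
        System.out.println("raw:       " + Integer.toHexString(rawBits(a))
                + " / " + Integer.toHexString(rawBits(b)));
    }
}
```

A test that compares raw bits across the interpreter, C1, and C2 can therefore report a mismatch even though every result is a valid NaN, which is presumably why the raw conversion needs the non-deterministic tag.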
------------- Commit messages: - JDK-8374489 Changes: https://git.openjdk.org/jdk/pull/29033/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29033&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374489 Stats: 5 lines in 1 file changed: 2 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29033.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29033/head:pull/29033 PR: https://git.openjdk.org/jdk/pull/29033 From dzhang at openjdk.org Mon Jan 5 12:40:35 2026 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 5 Jan 2026 12:40:35 GMT Subject: RFR: 8374525: RISC-V: Several masked float16 vector operations are not supported Message-ID: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> Hi, Can you help to review this patch? Thanks! Currently, the masked versions of the following 8 Float16 operations are not supported. But we return true in `Matcher::match_rule_supported_vector_masked` for these operations on RISC-V platforms with Zvfh. We need to explicitly disable them on this CPU platform to make it clear. Op_AddVHF: Op_SubVHF: Op_MulVHF: Op_DivVHF: Op_MaxVHF: Op_MinVHF: Op_SqrtVHF: Op_FmaVHF: When the support for Float16 vector classes is added in VectorAPI and the masked Float16 IR can be generated, these masked operations will be enabled and relevant backend support added. 
------------- Commit messages: - 8374525: RISC-V: Several masked float16 vector operations are not supported Changes: https://git.openjdk.org/jdk/pull/29035/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29035&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374525 Stats: 19 lines in 1 file changed: 17 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29035/head:pull/29035 PR: https://git.openjdk.org/jdk/pull/29035 From bkilambi at openjdk.org Mon Jan 5 12:43:36 2026 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 5 Jan 2026 12:43:36 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: <_QEYCQm138PWv2vGjMFvEJ6kfMjGEn_vsuEZ_EPaRxQ=.b42967e5-cc22-4c98-a454-6698ce0a70cf@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <_QEYCQm138PWv2vGjMFvEJ6kfMjGEn_vsuEZ_EPaRxQ=.b42967e5-cc22-4c98-a454-6698ce0a70cf@github.com> Message-ID: On Thu, 11 Dec 2025 12:06:49 GMT, Marc Chevalier wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. 
>> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. >> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... > > As for the IR verification failure, I've looked a bit and couldn't find such an issue already. Since it reproduces on master, I suggest you file a ticket, indeed. Thanks! Hi @marc-chevalier @eme64 Would you please be able to run some testing internally before I integrate this patch? Thanks! 
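As background for the strict-ordering requirement in the quoted description, the following sketch shows why the order of a floating-point add reduction matters. It uses `float` instead of Float16, and the class name `ReductionOrderDemo`, both method names, and the input values are assumptions made for illustration; it is not the code generated by the patch.

```java
public class ReductionOrderDemo {
    // Strictly ordered reduction: ((v[0] + v[1]) + v[2]) + ...
    static float sumInOrder(float[] v) {
        float sum = 0.0f;
        for (float x : v) {
            sum += x;
        }
        return sum;
    }

    // Tree-shaped reduction over v[lo, hi): adds the two halves separately,
    // the kind of reassociation a non-strict vector reduction might perform.
    static float sumPairwise(float[] v, int lo, int hi) {
        if (hi - lo == 1) {
            return v[lo];
        }
        int mid = (lo + hi) >>> 1;
        return sumPairwise(v, lo, mid) + sumPairwise(v, mid, hi);
    }

    public static void main(String[] args) {
        float[] v = {0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE};
        // In order, MIN_NORMAL is absorbed by MAX_VALUE before the cancellation.
        System.out.println("in order: " + sumInOrder(v));
        // Pairwise, MAX_VALUE cancels first, so MIN_NORMAL survives.
        System.out.println("pairwise: " + sumPairwise(v, 0, v.length));
    }
}
```

Because the two orders give different results for the same input, an auto-vectorized reduction that must match the scalar loop has to keep the scalar order.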
------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3710264065 From fjiang at openjdk.org Mon Jan 5 12:46:10 2026 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 5 Jan 2026 12:46:10 GMT Subject: RFR: 8374525: RISC-V: Several masked float16 vector operations are not supported In-Reply-To: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> References: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> Message-ID: <_PQTBDZTtAOzgIiMGM5AXmZVc8XaAoX7RZFyy7susrE=.730ca502-0931-4318-bf23-5bd4880547da@github.com> On Mon, 5 Jan 2026 12:32:13 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > Currently, the masked versions of the following 8 Float16 operations are not supported. > But we return true in `Matcher::match_rule_supported_vector_masked` for these operations > on RISC-V platforms with Zvfh. We need to explicitly disable them on this CPU platform > to make it clear. > > Op_AddVHF: > Op_SubVHF: > Op_MulVHF: > Op_DivVHF: > Op_MaxVHF: > Op_MinVHF: > Op_SqrtVHF: > Op_FmaVHF: > > When the support for Float16 vector classes is added in VectorAPI and the masked > Float16 IR can be generated, these masked operations will be enabled and relevant > backend support added. Nice catch! ------------- Marked as reviewed by fjiang (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/29035#pullrequestreview-3626694940 From fyang at openjdk.org Mon Jan 5 12:52:08 2026 From: fyang at openjdk.org (Fei Yang) Date: Mon, 5 Jan 2026 12:52:08 GMT Subject: RFR: 8374525: RISC-V: Several masked float16 vector operations are not supported In-Reply-To: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> References: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> Message-ID: <85y6cMcHNDFQwLJwbZw3IO-ffovhQkj0kGCbgSUX1i8=.ea2c8254-d413-4cd4-8712-0aa5113d7c21@github.com> On Mon, 5 Jan 2026 12:32:13 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > Currently, the masked versions of the following 8 Float16 operations are not supported. > But we return true in `Matcher::match_rule_supported_vector_masked` for these operations > on RISC-V platforms with Zvfh. We need to explicitly disable them on this CPU platform > to make it clear. > > Op_AddVHF: > Op_SubVHF: > Op_MulVHF: > Op_DivVHF: > Op_MaxVHF: > Op_MinVHF: > Op_SqrtVHF: > Op_FmaVHF: > > When the support for Float16 vector classes is added in VectorAPI and the masked > Float16 IR can be generated, these masked operations will be enabled and relevant > backend support added. Looks reasonable. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29035#pullrequestreview-3626711028 From roland at openjdk.org Mon Jan 5 13:35:15 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 5 Jan 2026 13:35:15 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph [v2] In-Reply-To: <72DTyUKDKlyEKHeDIYF5GUV6u__CGKT5LDAKglL0s6M=.4eaf8397-02d7-48b0-9261-276ffeb236ed@github.com> References: <72DTyUKDKlyEKHeDIYF5GUV6u__CGKT5LDAKglL0s6M=.4eaf8397-02d7-48b0-9261-276ffeb236ed@github.com> Message-ID: On Sat, 20 Dec 2025 01:39:52 GMT, Dean Long wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java >> >> Co-authored-by: Christian Hagedorn > > I'll running Oracle testing before approving. @dean-long @chhagedorn I merged with latest. Can one of you approve that PR again? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28842#issuecomment-3710441551 From chagedorn at openjdk.org Mon Jan 5 13:38:17 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 13:38:17 GMT Subject: RFR: 8374518: C1: Remove dead LinearScanStatistic::Counter::counter_fpu_stack Message-ID: The usages of `LinearScanStatistic::Counter::counter_fpu_stack` were removed with [JDK-8351156](https://bugs.openjdk.org/browse/JDK-8351156) when the FPU stack support was removed. But the definition in the `Counter` enum was missed to clean up. When now printing the counters with `-XX:+CountLinearScan -XX:+CITime`, we reach a `ShouldNotReach` in `LinearScanStatistic::counter_name()` for `Counter::counter_fpu_stack` because there is no longer an entry there - it's dead. I added a simple hello world sanity test that runs with `-XX:+CountLinearScan -XX:+CITime`. 
Thanks, Christian ------------- Commit messages: - 8374518: C1: Remove dead LinearScanStatistic::Counter::counter_fpu_stack Changes: https://git.openjdk.org/jdk/pull/29037/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29037&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374518 Stats: 39 lines in 2 files changed: 37 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29037.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29037/head:pull/29037 PR: https://git.openjdk.org/jdk/pull/29037 From thartmann at openjdk.org Mon Jan 5 13:46:09 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 5 Jan 2026 13:46:09 GMT Subject: RFR: 8374518: C1: Remove dead LinearScanStatistic::Counter::counter_fpu_stack In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 13:28:41 GMT, Christian Hagedorn wrote: > The usages of `LinearScanStatistic::Counter::counter_fpu_stack` were removed with [JDK-8351156](https://bugs.openjdk.org/browse/JDK-8351156) when the FPU stack support was removed. But the definition in the `Counter` enum was missed to clean up. When now printing the counters with `-XX:+CountLinearScan -XX:+CITime`, we reach a `ShouldNotReach` in `LinearScanStatistic::counter_name()` for `Counter::counter_fpu_stack` because there is no longer an entry there - it's dead. > > I added a simple hello world sanity test that runs with `-XX:+CountLinearScan -XX:+CITime`. > > Thanks, > Christian Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29037#pullrequestreview-3626893282 From mdoerr at openjdk.org Mon Jan 5 13:46:10 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 5 Jan 2026 13:46:10 GMT Subject: RFR: 8374518: C1: Remove dead LinearScanStatistic::Counter::counter_fpu_stack In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 13:28:41 GMT, Christian Hagedorn wrote: > The usages of `LinearScanStatistic::Counter::counter_fpu_stack` were removed with [JDK-8351156](https://bugs.openjdk.org/browse/JDK-8351156) when the FPU stack support was removed. But the definition in the `Counter` enum was missed to clean up. When now printing the counters with `-XX:+CountLinearScan -XX:+CITime`, we reach a `ShouldNotReach` in `LinearScanStatistic::counter_name()` for `Counter::counter_fpu_stack` because there is no longer an entry there - it's dead. > > I added a simple hello world sanity test that runs with `-XX:+CountLinearScan -XX:+CITime`. > > Thanks, > Christian LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29037#pullrequestreview-3626894269 From chagedorn at openjdk.org Mon Jan 5 13:50:41 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 13:50:41 GMT Subject: RFR: 8374518: C1: Remove dead LinearScanStatistic::Counter::counter_fpu_stack In-Reply-To: References: Message-ID: <-VWBBSYz-jpRw7qkrR89ff5XdFIq3t74swGb6TXPMrY=.d11a109f-f5a6-419c-81de-174cddfa7f41@github.com> On Mon, 5 Jan 2026 13:42:30 GMT, Tobias Hartmann wrote: >> The usages of `LinearScanStatistic::Counter::counter_fpu_stack` were removed with [JDK-8351156](https://bugs.openjdk.org/browse/JDK-8351156) when the FPU stack support was removed. But the definition in the `Counter` enum was missed to clean up. 
When now printing the counters with `-XX:+CountLinearScan -XX:+CITime`, we reach a `ShouldNotReach` in `LinearScanStatistic::counter_name()` for `Counter::counter_fpu_stack` because there is no longer an entry there - it's dead. >> >> I added a simple hello world sanity test that runs with `-XX:+CountLinearScan -XX:+CITime`. >> >> Thanks, >> Christian > > Looks good and trivial. Thanks for your reviews @TobiHartmann and @TheRealMDoerr! I will wait until some sanity testing passed before integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29037#issuecomment-3710491827 From chagedorn at openjdk.org Mon Jan 5 13:51:45 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 13:51:45 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph [v3] In-Reply-To: References: Message-ID: <-WXHvbhVQ6IBgkIcfqljfvJLmirqvCJYto3q3FGW87c=.1ec527ec-fc14-4a81-81ef-44c613325d76@github.com> On Mon, 5 Jan 2026 08:45:27 GMT, Roland Westrelin wrote: >> A `CreateEx` gets sunk out of loop by >> `PhaseIdealLoop::try_sink_out_of_loop()` and, as a consequence, the >> following logic: >> >> >> return (in(0)->is_CatchProj() && in(0)->in(0)->is_Catch() && >> in(0)->in(0)->in(1) == in(1)) ? this : call->in(TypeFunc::Parms); >> >> >> in `CreateExNode::Identity()` triggers which leads to the crash >> because `call->in(TypeFunc::Parms)` is not even an object in this >> particular case. >> >> It's actually not clear to me what that logic in >> `CreateExNode::Identity()` is expected to do and I wonder if it's >> still needed. >> >> Anyway, the fix I propose is to skip `CreateEx` in >> `PhaseIdealLoop::try_sink_out_of_loop()`. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into JDK-8373508 > - Update test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java > > Co-authored-by: Christian Hagedorn > - whitespaces > - tests > - more > - fix Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28842#pullrequestreview-3626912910 From chagedorn at openjdk.org Mon Jan 5 13:54:19 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 13:54:19 GMT Subject: RFR: 8374489: Template Library: need to tag Float16.float16ToRawShortBits as having non-deterministic result because of multiple NaN bit patterns In-Reply-To: References: Message-ID: <-slQ5lBZXQXTQvWnTZQJGLJer-qfKmygd1eah6aNdeA=.a0ef8145-9167-4eeb-b8b2-991581eecfc1@github.com> On Mon, 5 Jan 2026 11:55:01 GMT, Emanuel Peter wrote: > In [JDK-8370922](https://bugs.openjdk.org/browse/JDK-8370922), I added Float16 support to the Template Library. I missed to tag Float16.float16ToRawShortBits as having non-deterministic result. > > We need to do that, just like for float and double equivalents: > Arithmetic operations (e.g. add, mul, fma, ...) are allowed to pick either input if it gets two NaNs of different bit patterns. > That way, those operations can generate different NaN bit patterns depending on if we use interpreter/C1/C2. > If we now convert to raw bits, we get different bits, and would wrongly conclude that we get a wrong result. > > Since this is a test-bug, I have no regression test. But I verified it manually, that with the same seed (for the ExpressionFuzzer) that fails before this change, we now succeed after the change. Looks good and trivial! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29033#pullrequestreview-3626921086 From chagedorn at openjdk.org Mon Jan 5 13:56:23 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 13:56:23 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v6] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 12:14:15 GMT, Benoît Maillard wrote: >> This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing. >> >> The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for >> `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially >> introduced to get rid of unused `Load` nodes that were only kept alive by such an edge. >> >> >> >> https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259 >> >> In our case, it happens that the `Load` node gets folded to a constant during the initial >> `_gvn.transform` call in `GraphKit::make_load`. Because the value is converted before being >> returned, we end up with two constant nodes: one `ConL` and one `ConI`. The `ConL` only >> has one usage, and this triggers the optimization during verification. >> >> >> static int test0() { >> var c = new MyClass(); >> // the conversion ensures that the ConL node only has one use >> // in the end, which triggers the optimization >> return (int) c.l; >> } >> >> >> The optimization is not triggered earlier, when we apply `_gvn.transform` on the membar, >> because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in >> `PhaseGVN::transform`. >> >> For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created >> and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with >> `can_reshape` later.
>> >> >> This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load` >> prevents it from occurring when boxing elimination is enabled. Boxing elimination is >> disabled temporarily in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)), >> which caused the issue to appear, but by using `-XX:-EliminateAutoBox` it became clear >> that the issue was on mainline. >> >> ### Testing >> - [x] [GitHub Actions](TODO) >> - [x] tier1-3, plus some internal testing >> >> Thank you for reviewing! > > Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision: > > - Update copyright year in graphKit.cpp > - Update copyright year in node.cpp Looks good, thanks for updating the copyright years! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28448#pullrequestreview-3626922994 From roland at openjdk.org Mon Jan 5 14:04:09 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 5 Jan 2026 14:04:09 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph [v2] In-Reply-To: <72DTyUKDKlyEKHeDIYF5GUV6u__CGKT5LDAKglL0s6M=.4eaf8397-02d7-48b0-9261-276ffeb236ed@github.com> References: <72DTyUKDKlyEKHeDIYF5GUV6u__CGKT5LDAKglL0s6M=.4eaf8397-02d7-48b0-9261-276ffeb236ed@github.com> Message-ID: <0x_C5-vZljPjiExoPBsPcdxT77UVCh6objTPPr1VD1o=.92722c22-a04d-4d89-ab43-b25748a22e5a@github.com> On Sat, 20 Dec 2025 01:39:52 GMT, Dean Long wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java >> >> Co-authored-by: Christian Hagedorn > > I'll run Oracle testing before approving.
@dean-long @chhagedorn thanks for the reviews and testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/28842#issuecomment-3710540340 From roland at openjdk.org Mon Jan 5 14:06:29 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 5 Jan 2026 14:06:29 GMT Subject: Integrated: 8373508: C2: sinking CreateEx out of loop breaks the graph In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 11:04:52 GMT, Roland Westrelin wrote: > A `CreateEx` gets sunk out of loop by > `PhaseIdealLoop::try_sink_out_of_loop()` and, as a consequence, the > following logic: > > > return (in(0)->is_CatchProj() && in(0)->in(0)->is_Catch() && > in(0)->in(0)->in(1) == in(1)) ? this : call->in(TypeFunc::Parms); > > > in `CreateExNode::Identity()` triggers which leads to the crash > because `call->in(TypeFunc::Parms)` is not even an object in this > particular case. > > It's actually not clear to me what that logic in > `CreateExNode::Identity()` is expected to do and I wonder if it's > still needed. > > Anyway, the fix I propose is to skip `CreateEx` in > `PhaseIdealLoop::try_sink_out_of_loop()`. This pull request has now been integrated. Changeset: 6ae3e064 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/6ae3e064352a56c5be140fba1ad6d040219432b0 Stats: 159 lines in 3 files changed: 159 ins; 0 del; 0 mod 8373508: C2: sinking CreateEx out of loop breaks the graph Reviewed-by: chagedorn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/28842 From krk at openjdk.org Mon Jan 5 14:32:01 2026 From: krk at openjdk.org (Kerem Kat) Date: Mon, 5 Jan 2026 14:32:01 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v7] In-Reply-To: References: Message-ID: > Do not try to replace `fallthrough_memproj` when it is null, fixes crash. > > Test case is simplified from the ticket. Verified that the case crashes without the fix. Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge branch 'master' into fix-c2-segfault-unlocknode - Merge branch 'master' into fix-c2-segfault-unlocknode - Merge branch 'master' into fix-c2-segfault-unlocknode - address comments - fix rename - rename test file - Merge branch 'master' into fix-c2-segfault-unlocknode - fix test spacing - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Manuel Hässig - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Manuel Hässig - ... and 2 more: https://git.openjdk.org/jdk/compare/3f9191b0...64e3dc5d ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28432/files - new: https://git.openjdk.org/jdk/pull/28432/files/b5e878c7..64e3dc5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=05-06 Stats: 36125 lines in 2664 files changed: 21296 ins; 6892 del; 7937 mod Patch: https://git.openjdk.org/jdk/pull/28432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28432/head:pull/28432 PR: https://git.openjdk.org/jdk/pull/28432 From bmaillard at openjdk.org Mon Jan 5 14:42:10 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 5 Jan 2026 14:42:10 GMT Subject: Integrated: 8367627: C2: Missed Ideal() optimization opportunity with MemBar In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 09:31:56 GMT, Benoît Maillard wrote: > This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing. > > The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for > `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially > introduced to get rid of unused `Load` nodes that were only kept alive by such an edge.
> > > > https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259 > > In our case, it happens that the `Load` node gets folded to a constant during the initial > `_gvn.transform` call in `GraphKit::make_load`. Because the value is converted before being > returned, we end up with two constant nodes: one `ConL` and one `ConI`. The `ConL` only > has one usage, and this triggers the optimization during verification. > > > static int test0() { > var c = new MyClass(); > // the conversion ensures that the ConL node only has one use > // in the end, which triggers the optimization > return (int) c.l; > } > > > The optimization is not triggered earlier, when we apply `_gvn.transform` on the membar, > because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in > `PhaseGVN::transform`. > > For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created > and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with > `can_reshape` later. > > > This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load` > prevents it from occurring when boxing elimination is enabled. Boxing elimination is > disabled temporarily in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)), > which caused the issue to appear, but by using `-XX:-EliminateAutoBox` it became clear > that the issue was on mainline. > > ### Testing > - [x] [GitHub Actions](TODO) > - [x] tier1-3, plus some internal testing > > Thank you for reviewing! This pull request has now been integrated.
Changeset: 4458cab4 Author: Benoît Maillard URL: https://git.openjdk.org/jdk/commit/4458cab4b0063f39333392321f542d0aa0db490d Stats: 97 lines in 3 files changed: 94 ins; 0 del; 3 mod 8367627: C2: Missed Ideal() optimization opportunity with MemBar Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/28448 From roland at openjdk.org Mon Jan 5 16:30:13 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 5 Jan 2026 16:30:13 GMT Subject: [jdk26] RFR: 8373524: C2: no reachable node should have no use Message-ID: Hi all, This pull request contains a backport of commit [e72f205a](https://github.com/openjdk/jdk/commit/e72f205ae312b15ebab0cbeedb73bbf86e485251) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Roland Westrelin on 19 Dec 2025 and was reviewed by Christian Hagedorn and Manuel Hässig. Thanks! ------------- Commit messages: - Backport e72f205ae312b15ebab0cbeedb73bbf86e485251 Changes: https://git.openjdk.org/jdk/pull/29042/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29042&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373524 Stats: 94 lines in 2 files changed: 91 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/29042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29042/head:pull/29042 PR: https://git.openjdk.org/jdk/pull/29042 From roland at openjdk.org Mon Jan 5 16:30:31 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 5 Jan 2026 16:30:31 GMT Subject: [jdk26] RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis Message-ID: Hi all, This pull request contains a backport of commit [2ba423db](https://github.com/openjdk/jdk/commit/2ba423db9925355348106fc9fcf84450123d2605) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
The commit being backported was authored by Roland Westrelin on 18 Dec 2025 and was reviewed by Roberto Castañeda Lozano, Daniel Lundén and Damon Fenacci. Thanks! ------------- Commit messages: - Backport 2ba423db9925355348106fc9fcf84450123d2605 Changes: https://git.openjdk.org/jdk/pull/29041/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29041&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370200 Stats: 195 lines in 6 files changed: 173 ins; 16 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/29041.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29041/head:pull/29041 PR: https://git.openjdk.org/jdk/pull/29041 From kxu at openjdk.org Mon Jan 5 16:34:13 2026 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 5 Jan 2026 16:34:13 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v27] In-Reply-To: References: Message-ID: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 49 commits: - Update license header years - Merge remote-tracking branch 'origin/master' into counted-loop-refactor - remove trailing whitespaces - Merge remote-tracking branch 'origin/master' into counted-loop-refactor - additional suggestions from code review - Apply suggestions from code review Co-authored-by: Christian Hagedorn - fix trip counter loop-variant detection - fix bad merge with ctrl_is_member() - Merge remote-tracking branch 'origin/master' into counted-loop-refactor # Conflicts: # src/hotspot/share/opto/loopnode.cpp - Merge branch 'master' into counted-loop-refactor - ... and 39 more: https://git.openjdk.org/jdk/compare/4458cab4...8b5dfad6 ------------- Changes: https://git.openjdk.org/jdk/pull/24458/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=26 Stats: 1231 lines in 3 files changed: 626 ins; 295 del; 310 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From shade at openjdk.org Mon Jan 5 16:46:08 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 Jan 2026 16:46:08 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code [v7] In-Reply-To: References: Message-ID: > We use CTW testing for making sure compilers behave well. But we compile the code that is not executed at all, and since our inlining heuristics often look back at profiles, we end up not actually inlining all too much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations. > > There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data.
> > After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much, they are impractical to run in standard configurations, see data in RFE. We will enable some of that testing in special testing pipelines. > > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allow stress infra to inline some more on top of that. > > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 12 additional commits since the last revision: - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Enable more testing - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Update src/hotspot/share/compiler/compiler_globals.hpp Co-authored-by: Tobias Hartmann - Revert separate patch - Final - ... and 2 more: https://git.openjdk.org/jdk/compare/040ed4ab...dbd560dc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26068/files - new: https://git.openjdk.org/jdk/pull/26068/files/2d02b713..dbd560dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=05-06 Stats: 63794 lines in 3039 files changed: 40196 ins; 14158 del; 9440 mod Patch: https://git.openjdk.org/jdk/pull/26068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068 PR: https://git.openjdk.org/jdk/pull/26068 From sviswanathan at openjdk.org Mon Jan 5 17:04:47 2026 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 5 Jan 2026 17:04:47 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v3] In-Reply-To: References: Message-ID: On Fri, 2 Jan 2026 05:45:58 GMT, Jatin Bhateja wrote: >> Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand, this leads to an failure during register biasing. Changing the NDD demotion flags names to encode explicit operand position i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2** , splitting commutative flag into seperate new flags and fine tuning assertion checks based on new naming convention fixes the issue. >> >> Failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch. 
>> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> PS: Validation performed using Intel SDE 9.58. > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8373724 > - 8373724: Assertion failure in TestSignumVector.java with UseAPX @jatin-bhateja Thanks for looking into this. There is a build failure in GHA with the following message: "src/hotspot/cpu/x86/x86.ad:2645:13: error: ?bool is_ndd_demotable(const MachNode*)? defined but not used [-Werror=unused-function]" ------------- PR Comment: https://git.openjdk.org/jdk/pull/28999#issuecomment-3711301714 From jbhateja at openjdk.org Mon Jan 5 17:45:39 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 Jan 2026 17:45:39 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v4] In-Reply-To: References: Message-ID: <0Mz4vIOBTO7xZMs7IJKmHKsV7KWyKipwBeWkpzENCBw=.b033cde2-02ba-4090-85e0-b607bb9bb74c@github.com> > Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand, this leads to an failure during register biasing. Changing the NDD demotion flags names to encode explicit operand position i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2** , splitting commutative flag into seperate new flags and fine tuning assertion checks based on new naming convention fixes the issue. > > Failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch. > > Kindly review and share your feedback. > > Best Regards, > Jatin > PS: Validation performed using Intel SDE 9.58. Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Use ASSERT pre-processor macro instead of PRODUCT to fix optimized build failure - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8373724 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8373724 - 8373724: Assertion failure in TestSignumVector.java with UseAPX ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28999/files - new: https://git.openjdk.org/jdk/pull/28999/files/2a63c92b..05db5651 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=02-03 Stats: 925 lines in 37 files changed: 628 ins; 238 del; 59 mod Patch: https://git.openjdk.org/jdk/pull/28999.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28999/head:pull/28999 PR: https://git.openjdk.org/jdk/pull/28999 From jbhateja at openjdk.org Mon Jan 5 17:50:10 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 Jan 2026 17:50:10 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v5] In-Reply-To: References: Message-ID: > Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand, this leads to an failure during register biasing. Changing the NDD demotion flags names to encode explicit operand position i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2** , splitting commutative flag into seperate new flags and fine tuning assertion checks based on new naming convention fixes the issue. > > Failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch. > > Kindly review and share your feedback. > > Best Regards, > Jatin > PS: Validation performed using Intel SDE 9.58. 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Update copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28999/files - new: https://git.openjdk.org/jdk/pull/28999/files/05db5651..29093665 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28999.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28999/head:pull/28999 PR: https://git.openjdk.org/jdk/pull/28999 From duke at openjdk.org Mon Jan 5 18:40:51 2026 From: duke at openjdk.org (Tobias Hotz) Date: Mon, 5 Jan 2026 18:40:51 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v11] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 21:50:27 GMT, Tobias Hotz wrote: >> This PR improves the value of integer division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Simplify test, add temporary @IR rule for testLongRange and improve comments Oops, sorry for that! I've created #29045 to fix the issue.
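For readers following along, the "four corners" idea from the quoted PR description can be sketched in plain Java. This is only an illustration under stated assumptions, not the C2 implementation: `DivRangeSketch`, `divRange` and `corners` are hypothetical names, and the `MIN_INT / -1` case is handled by conservatively falling back to the full `int` range rather than by whatever special logic the actual patch uses.

```java
public class DivRangeSketch {
    /**
     * Hypothetical helper (not HotSpot code): returns {lo, hi} such that every
     * defined int quotient a / b with a in [aLo, aHi] and b in [bLo, bHi]
     * lies in [lo, hi]. Divisions by zero throw and produce no value.
     */
    static int[] divRange(int aLo, int aHi, int bLo, int bHi) {
        long lo = Long.MAX_VALUE;
        long hi = Long.MIN_VALUE;
        // Negative part of the divisor range: [bLo, min(bHi, -1)].
        if (bLo < 0) {
            long[] r = corners(aLo, aHi, bLo, Math.min(bHi, -1));
            lo = Math.min(lo, r[0]);
            hi = Math.max(hi, r[1]);
        }
        // Positive part of the divisor range: [max(bLo, 1), bHi].
        if (bHi > 0) {
            long[] r = corners(aLo, aHi, Math.max(bLo, 1), bHi);
            lo = Math.min(lo, r[0]);
            hi = Math.max(hi, r[1]);
        }
        // Either the divisor can only be zero, or MIN_VALUE / -1 is possible
        // (its exact quotient 2^31 does not fit in an int): stay conservative.
        if (lo > hi || hi > Integer.MAX_VALUE) {
            return new int[] {Integer.MIN_VALUE, Integer.MAX_VALUE};
        }
        return new int[] {(int) lo, (int) hi};
    }

    // Quotient bounds for a divisor range that does not contain zero.
    // Truncating division is monotone in each argument on such a range,
    // so the extrema are attained at the four corners.
    static long[] corners(long aLo, long aHi, long bLo, long bHi) {
        long c1 = aLo / bLo;
        long c2 = aLo / bHi;
        long c3 = aHi / bLo;
        long c4 = aHi / bHi;
        return new long[] {
            Math.min(Math.min(c1, c2), Math.min(c3, c4)),
            Math.max(Math.max(c1, c2), Math.max(c3, c4))
        };
    }

    public static void main(String[] args) {
        int[] r = divRange(-7, 9, -2, 3);
        System.out.println("[" + r[0] + ", " + r[1] + "]");
    }
}
```

Doing the arithmetic in `long` keeps the sketch free of overflow; a range that crosses zero is split once into its negative and positive parts, as the description says.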
------------- PR Comment: https://git.openjdk.org/jdk/pull/26143#issuecomment-3711646576 From duke at openjdk.org Mon Jan 5 18:45:42 2026 From: duke at openjdk.org (Tobias Hotz) Date: Mon, 5 Jan 2026 18:45:42 GMT Subject: RFR: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero Message-ID: This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero ------------- Commit messages: - Fix div by zero due to const 2 being zero cauing failing tests Changes: https://git.openjdk.org/jdk/pull/29045/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29045&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374436 Stats: 10 lines in 1 file changed: 8 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29045.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29045/head:pull/29045 PR: https://git.openjdk.org/jdk/pull/29045 From kvn at openjdk.org Mon Jan 5 18:47:52 2026 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 Jan 2026 18:47:52 GMT Subject: RFR: 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs In-Reply-To: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> References: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> Message-ID: On Mon, 5 Jan 2026 07:37:34 GMT, Emanuel Peter wrote: > In `VLoopMemorySlices::find_memory_slices`, we analyze the memory slices. In some cases, we only find loads in the slice, and no phi. So the memory input of the loads comes from before the loop. 
When I refactored the code, I made the assumption that all loads should have the same memory input. After all: any store before the loop would have to have happened before we enter the loop, and execute any loads from the loop. The assumption held for a long time, but now we have a counter example. > > Summary: one load has its memory input optimized, the other is not put on the IGVN worklist again, and keeps the old memory input (even though in this case we could have optimized it just the same). Thus, both choices of memory input are correct, and the assumption of the assert does not hold. > > Solution: Just bail out of auto vectorization if this assumption is violated. This is an edge case, and the assert has not been hit until the fuzzer found this example. > Alternatives: we could track the multiple memory inputs, but that would be more effort to implement, and hard to test because it is difficult to create examples. > > --------------------------- > > **Details** > > Below, look at `1145 LoadB` and `1131 LoadB`. One has memory input `Param 7` (initial program state), the other `711 Phi` (outer loop). Both loads are inside the `1147 CountedLoop`. But their states come from outside, both originally from `711 Phi`. But then `1145 LoadB` is optimized with `LoadBNode::Ideal` -> `LoadNode::Ideal` -> `LoadNode::split_through_phi`: it realizes that the backedge of the `711 Phi` only goes by the `593 CallStaticJava`, which cannot modify the `Byte::value` field of the `LoadB` (unless it was to use reflection, but that unlocks undefined behavior anyway, so it can be ignored). So it is ok to split through the phi, as the `Byte::value` cannot be modified during the outer loop. > > image > > `1131 LoadB` could also do the same optimization, but it just does not end up on the IGVN worklist. The issue is that we don't have any adequate notification that goes down through the `MergeMem - Call - Proj` structure. 
We did not want to have that until now, because in theory we could have a long series of calls, and the traversals could become too expensive. Good. I approve this conservative fix. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29028#pullrequestreview-3628029840 From kvn at openjdk.org Mon Jan 5 18:49:29 2026 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 Jan 2026 18:49:29 GMT Subject: RFR: 8374489: Template Library: need to tag Float16.float16ToRawShortBits as having non-deterministic result because of multiple NaN bit patterns In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 11:55:01 GMT, Emanuel Peter wrote: > In [JDK-8370922](https://bugs.openjdk.org/browse/JDK-8370922), I added Float16 support to the Template Library. I missed to tag Float16.float16ToRawShortBits as having non-deterministic result. > > We need to do that, just like for float and double equivalents: > Arithmetic operations (e.g. add, mul, fma, ...) are allowed to pick either input if it gets two NaNs of different bit patterns. > That way, those operations can generate different NaN bit patterns depending on if we use interpreter/C1/C2. > If we now convert to raw bits, we get different bits, and would wrongly conclude that we get a wrong result. > > Since this is a test-bug, I have no regression test. But I verified it manually, that with the same seed (for the ExpressionFuzzer) that fails before this change, we now succeed after the change. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29033#pullrequestreview-3628033479 From kxu at openjdk.org Mon Jan 5 18:58:49 2026 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 5 Jan 2026 18:58:49 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v27] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 16:34:13 GMT, Kangcheng Xu wrote: >> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. >> >> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. >> >> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). > > Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 49 commits: > > - Update license header years > - Merge remote-tracking branch 'origin/master' into counted-loop-refactor > - remove trailing whitespaces > - Merge remote-tracking branch 'origin/master' into counted-loop-refactor > - additional suggestions from code review > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - fix trip counter loop-variant detection > - fix bad merge with ctrl_is_member() > - Merge remote-tracking branch 'origin/master' into counted-loop-refactor > > # Conflicts: > # src/hotspot/share/opto/loopnode.cpp > - Merge branch 'master' into counted-loop-refactor > - ... and 39 more: https://git.openjdk.org/jdk/compare/4458cab4...8b5dfad6 Merged in the latest master and updated license headers. [counted-loop-refactor-old-vs-new](https://github.com/tabjy/jdk/commits/counted-loop-refactor-old-vs-new/) branch is also updated. 
Please note GHA job `linux-x64 / build (debug)` is currently failing across the jdk repo due to insufficient disk space. I'll try trigger it again tomorrow. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3711694813 From sviswanathan at openjdk.org Mon Jan 5 21:43:30 2026 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 5 Jan 2026 21:43:30 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v5] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 17:50:10 GMT, Jatin Bhateja wrote: >> Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand, this leads to an failure during register biasing. Changing the NDD demotion flags names to encode explicit operand position i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2** , splitting commutative flag into seperate new flags and fine tuning assertion checks based on new naming convention fixes the issue. >> >> Failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> PS: Validation performed using Intel SDE 9.58. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright src/hotspot/cpu/x86/x86.ad line 9947: > 9945: match(Set dst (AddI src1 (LoadI src2))); > 9946: effect(KILL cr); > 9947: flag(PD::Flag_sets_overflow_flag, PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_carry_flag, PD::Flag_sets_parity_flag, PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); Remove PD::Flag_ndd_demotable_opr2 as the second operand is a memory operand. 
src/hotspot/cpu/x86/x86.ad line 10237: > 10235: match(Set dst (AddL src1 (LoadL src2))); > 10236: effect(KILL cr); > 10237: flag(PD::Flag_sets_overflow_flag, PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_carry_flag, PD::Flag_sets_parity_flag, PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); Remove PD::Flag_ndd_demotable_opr2 as the second operand is a memory operand. src/hotspot/cpu/x86/x86.ad line 11585: > 11583: match(Set dst (MulL src1 (LoadL src2))); > 11584: effect(KILL cr); > 11585: flag(PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); Remove PD::Flag_ndd_demotable_opr2 as the second operand is a memory operand. src/hotspot/cpu/x86/x86.ad line 13038: > 13036: match(Set dst (AndI src1 (LoadI src2))); > 13037: effect(KILL cr); > 13038: flag(PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_parity_flag, PD::Flag_clears_overflow_flag, PD::Flag_clears_carry_flag, PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); Remove PD::Flag_ndd_demotable_opr2 as the second operand is a memory operand. src/hotspot/cpu/x86/x86.ad line 13683: > 13681: match(Set dst (AndL src1 (LoadL src2))); > 13682: effect(KILL cr); > 13683: flag(PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_parity_flag, PD::Flag_clears_overflow_flag, PD::Flag_clears_carry_flag, PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); Remove PD::Flag_ndd_demotable_opr2 as the second operand is a memory operand. src/hotspot/cpu/x86/x86.ad line 14000: > 13998: match(Set dst (OrL src1 (LoadL src2))); > 13999: effect(KILL cr); > 14000: flag(PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_parity_flag, PD::Flag_clears_overflow_flag, PD::Flag_clears_carry_flag, PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); Remove PD::Flag_ndd_demotable_opr2 as the second operand is a memory operand. 
src/hotspot/cpu/x86/x86.ad line 14182: > 14180: match(Set dst (XorL src1 (LoadL src2))); > 14181: effect(KILL cr); > 14182: flag(PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_parity_flag, PD::Flag_clears_overflow_flag, PD::Flag_clears_carry_flag, PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); Remove PD::Flag_ndd_demotable_opr2 here as the second operand is a memory operand. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2662838200 PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2662836624 PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2662831056 PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2662827932 PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2662821662 PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2662367856 PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2662358281 From xgong at openjdk.org Tue Jan 6 02:47:14 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 6 Jan 2026 02:47:14 GMT Subject: [jdk26] RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently Message-ID: Hi all, This pull request contains a backport of commit [6eaabed5](https://github.com/openjdk/jdk/commit/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Xiaohong Gong on 5 Jan 2026 and was reviewed by Jie Fu, Jatin Bhateja, Eric Fang and Quan Anh Mai. Thanks! 
------------- Commit messages: - Backport 6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca Changes: https://git.openjdk.org/jdk/pull/29053/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29053&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373722 Stats: 43 lines in 1 file changed: 1 ins; 32 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/29053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29053/head:pull/29053 PR: https://git.openjdk.org/jdk/pull/29053 From jbhateja at openjdk.org Tue Jan 6 04:20:14 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 Jan 2026 04:20:14 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v5] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 18:02:50 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Update copyright > > src/hotspot/cpu/x86/x86.ad line 14000: > >> 13998: match(Set dst (OrL src1 (LoadL src2))); >> 13999: effect(KILL cr); >> 14000: flag(PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_parity_flag, PD::Flag_clears_overflow_flag, PD::Flag_clears_carry_flag, PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); > > Remove PD::Flag_ndd_demotable_opr2 as the second operand is a memory operand. We already have a check for [memory operands (mapping to multiple input edges](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L2664)) in place, ADLC generates DFA for both direct and flipped versions of the memory patterns. So Flag_ndd_demotable_opr2 will cover the flipped operand case. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2663532523 From wenanjian at openjdk.org Tue Jan 6 06:22:34 2026 From: wenanjian at openjdk.org (Anjian Wen) Date: Tue, 6 Jan 2026 06:22:34 GMT Subject: RFR: 8374184: RISC-V: implement GCM intrinsic with Zvkg and Zvkned extension Message-ID: This patch implement GCM intrinsic with Zvkg and Zvkned extension in RISCV. According to java api of `implGCMCrypt0` in `src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java`, we only deal with the data multiples of PARALLEL_LEN(512). Passed related jtreg in test/hotspot/jtreg/compiler/codegen/aes/ test/jdk/com/sun/crypto/ ------------- Commit messages: - modify x10 register use - modify register use - make some clean up - optimize tmp register use - change andr to andi - modify the input according to api and some name - RISC-V: implement GCM intrinsic with Zvkg and Zvkned extension Changes: https://git.openjdk.org/jdk/pull/28894/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28894&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374184 Stats: 128 lines in 1 file changed: 128 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28894.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28894/head:pull/28894 PR: https://git.openjdk.org/jdk/pull/28894 From thartmann at openjdk.org Tue Jan 6 07:25:00 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 6 Jan 2026 07:25:00 GMT Subject: [jdk26] RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 02:39:58 GMT, Xiaohong Gong wrote: > Hi all, > > This pull request contains a backport of commit [6eaabed5](https://github.com/openjdk/jdk/commit/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. 
> > The commit being backported was authored by Xiaohong Gong on 5 Jan 2026 and was reviewed by Jie Fu, Jatin Bhateja, Eric Fang and Quan Anh Mai. > > Thanks! Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29053#pullrequestreview-3629740362 From xgong at openjdk.org Tue Jan 6 07:40:58 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 6 Jan 2026 07:40:58 GMT Subject: RFR: 8373344: Add support for min/max reduction operations for Float16 [v2] In-Reply-To: <72RiEFo5wji1vOtIRzFwO03_0OsaCe2zzHjp9YPD8-k=.6a88f4eb-3fff-4fe5-bb43-260d81b7954a@github.com> References: <7qXqIBuLDFEKPNje6TALxZUATujnOY5hoODC30zJNFM=.07d8d157-1ae5-4751-befd-d6291370fb9c@github.com> <72RiEFo5wji1vOtIRzFwO03_0OsaCe2zzHjp9YPD8-k=.6a88f4eb-3fff-4fe5-bb43-260d81b7954a@github.com> Message-ID: On Mon, 5 Jan 2026 11:31:26 GMT, Yi Wu wrote: >> This patch adds mid-end support for vectorized min/max reduction operations for half floats. It also includes backend AArch64 support for these operations. >> Both floating point min/max reductions don?t require strict order, because they are associative. >> >> It will generate NEON fminv/fmaxv reduction instructions when max vector length is 8B or 16B. On SVE supporting machines with vector lengths > 16B, it will generate the SVE fminv/fmaxv instructions. >> The patch also adds support for partial min/max reductions on SVE machines using fminv/fmaxv. >> >> Ratio of throughput(ops/ms) > 1 indicates the performance with this patch is better than the mainline. 
>> >> Neoverse N1 (UseSVE = 0, max vector length = 16B): >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionMaxFP16 256 thrpt 9 3.69 6.44 >> ReductionMaxFP16 512 thrpt 9 3.71 7.62 >> ReductionMaxFP16 1024 thrpt 9 4.16 8.64 >> ReductionMaxFP16 2048 thrpt 9 4.44 9.12 >> ReductionMinFP16 256 thrpt 9 3.69 6.43 >> ReductionMinFP16 512 thrpt 9 3.70 7.62 >> ReductionMinFP16 1024 thrpt 9 4.16 8.64 >> ReductionMinFP16 2048 thrpt 9 4.44 9.10 >> >> >> Neoverse V1 (UseSVE = 1, max vector length = 32B): >> >> Benchmark vectorDim Mode Cnt 8B 16B 32B >> ReductionMaxFP16 256 thrpt 9 3.96 8.62 8.02 >> ReductionMaxFP16 512 thrpt 9 3.54 9.25 11.71 >> ReductionMaxFP16 1024 thrpt 9 3.77 8.71 14.07 >> ReductionMaxFP16 2048 thrpt 9 3.88 8.44 14.69 >> ReductionMinFP16 256 thrpt 9 3.96 8.61 8.03 >> ReductionMinFP16 512 thrpt 9 3.54 9.28 11.69 >> ReductionMinFP16 1024 thrpt 9 3.76 8.70 14.12 >> ReductionMinFP16 2048 thrpt 9 3.87 8.45 14.70 >> >> >> Neoverse V2 (UseSVE = 2, max vector length = 16B): >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionMaxFP16 256 thrpt 9 4.78 10.00 >> ReductionMaxFP16 512 thrpt 9 3.74 11.33 >> ReductionMaxFP16 1024 thrpt 9 3.86 9.59 >> ReductionMaxFP16 2048 thrpt 9 3.94 8.71 >> ReductionMinFP16 256 thrpt 9 4.78 10.00 >> ReductionMinFP16 512 thrpt 9 3.74 11.29 >> ReductionMinFP16 1024 thrpt 9 3.86 9.58 >> ReductionMinFP16 2048 thrpt 9 3.94 8.71 >> >> >> Testing: >> hotspot_all, jdk (tier1-3) and langtools (tier1) all pass ... > > Yi Wu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Replace assert with verify > - Add IRNode constant and code refactor > - Merge remote-tracking branch 'origin/master' into yiwu-8373344 > - 8373344: Add support for FP16 min/max reduction operations > > This patch adds mid-end support for vectorized min/max reduction > operations for half floats. It also includes backend AArch64 support > for these operations. > Both floating point min/max reductions don't require strict order, > because they are associative. > > It will generate NEON fminv/fmaxv reduction instructions when > max vector length is 8B or 16B. On SVE supporting machines > with vector lengths > 16B, it will generate the SVE fminv/fmaxv > instructions. > The patch also adds support for partial min/max reductions on > SVE machines using fminv/fmaxv. > > Ratio of throughput(ops/ms) > 1 indicates the performance with > this patch is better than the mainline. > > Neoverse N1 (UseSVE = 0, max vector length = 16B): > Benchmark vectorDim Mode Cnt 8B 16B > ReductionMaxFP16 256 thrpt 9 3.69 6.44 > ReductionMaxFP16 512 thrpt 9 3.71 7.62 > ReductionMaxFP16 1024 thrpt 9 4.16 8.64 > ReductionMaxFP16 2048 thrpt 9 4.44 9.12 > ReductionMinFP16 256 thrpt 9 3.69 6.43 > ReductionMinFP16 512 thrpt 9 3.70 7.62 > ReductionMinFP16 1024 thrpt 9 4.16 8.64 > ReductionMinFP16 2048 thrpt 9 4.44 9.10 > > Neoverse V1 (UseSVE = 1, max vector length = 32B): > Benchmark vectorDim Mode Cnt 8B 16B 32B > ReductionMaxFP16 256 thrpt 9 3.96 8.62 8.02 > ReductionMaxFP16 512 thrpt 9 3.54 9.25 11.71 > ReductionMaxFP16 1024 thrpt 9 3.77 8.71 14.07 > ReductionMaxFP16 2048 thrpt 9 3.88 8.44 14.69 > ReductionMinFP16 256 thrpt 9 3.96 8.61 8.03 > ReductionMinFP16 512 thrpt 9 3.54 9.28 11.69 > ReductionMinFP16 1024 thrpt 9 3.76 8.70 14.12 > ReductionMinFP16 2048 thrpt 9 3.87 8.45 14.70 > > Neoverse V2 (UseSVE = 2, max vector length = 16B): > Benchmark vectorDim Mode Cnt 8B 16B > ReductionMaxFP16 256 t...
src/hotspot/cpu/aarch64/aarch64_vector.ad line 381: > 379: case Op_XorReductionV: > 380: case Op_MinReductionVHF: > 381: case Op_MaxReductionVHF: We can use the NEON instructions if the vector size <= 16B as well for partial cases. Did you test the performance with NEON instead of using predicated SVE instructions? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28828#discussion_r2663933727 From chagedorn at openjdk.org Tue Jan 6 07:53:31 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 Jan 2026 07:53:31 GMT Subject: RFR: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 18:37:54 GMT, Tobias Hotz wrote: > This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 > I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. > The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. > I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero That looks good, thanks for fixing this! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29045#pullrequestreview-3629813472 From hgreule at openjdk.org Tue Jan 6 08:04:31 2026 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 6 Jan 2026 08:04:31 GMT Subject: RFR: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 07:50:31 GMT, Christian Hagedorn wrote: >> This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 >> I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. 
>> The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. >> I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero > > That looks good, thanks for fixing this! @chhagedorn does this need a copyright update? Otherwise I updated it in #27886 already as I adjusted the test there. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29045#issuecomment-3713582662 From chagedorn at openjdk.org Tue Jan 6 08:24:13 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 Jan 2026 08:24:13 GMT Subject: [jdk26] RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 16:20:29 GMT, Roland Westrelin wrote: > Hi all, > > This pull request contains a backport of commit [2ba423db](https://github.com/openjdk/jdk/commit/2ba423db9925355348106fc9fcf84450123d2605) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 18 Dec 2025 and was reviewed by Roberto Castañeda Lozano, Daniel Lundén and Damon Fenacci. > > Thanks! Looks good! I will submit some testing for it together with https://github.com/openjdk/jdk/pull/29042. ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/29041#pullrequestreview-3629909672 From chagedorn at openjdk.org Tue Jan 6 08:24:14 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 Jan 2026 08:24:14 GMT Subject: [jdk26] RFR: 8373524: C2: no reachable node should have no use In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 16:20:56 GMT, Roland Westrelin wrote: > Hi all, > > This pull request contains a backport of commit [e72f205a](https://github.com/openjdk/jdk/commit/e72f205ae312b15ebab0cbeedb73bbf86e485251) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 19 Dec 2025 and was reviewed by Christian Hagedorn and Manuel Hässig. > > Thanks! Looks good! I will submit some testing for it together with https://github.com/openjdk/jdk/pull/29041. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29042#pullrequestreview-3629909720 From chagedorn at openjdk.org Tue Jan 6 08:33:29 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 Jan 2026 08:33:29 GMT Subject: RFR: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 18:37:54 GMT, Tobias Hotz wrote: > This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 > I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. > The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. > I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero Good catch! Yes, we should update it. I guess it does not hurt if we wait until you update it with your PR.
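
The fix quoted above for 8374436 (detect a zero constant and then expect the exception instead of a quotient) can be sketched roughly as follows. This is a minimal standalone illustration, not the code in `IntegerDivValueTests.java`: the class `ZeroDivisorCheckSketch`, the helper names, and the way the constant is derived from a seed are all assumptions made up for the example.

```java
import java.util.Random;

final class ZeroDivisorCheckSketch {
    // Hypothetical stand-in for a seed-dependent constant such as INT_CONST_2.
    // The result is in [-2, 2], so it is zero for some seeds.
    static int pickConstant(long seed) {
        return new Random(seed).nextInt(5) - 2;
    }

    static int divide(int dividend, int divisor) {
        return dividend / divisor; // throws ArithmeticException when divisor == 0
    }

    // Returns true if the observed behaviour matches what we expect for this divisor:
    // an exception for a zero divisor, the correct quotient otherwise.
    static boolean check(int dividend, int divisor) {
        try {
            int quotient = divide(dividend, divisor);
            // Reference value computed in long arithmetic, then wrapped to int.
            return divisor != 0 && quotient == (int) ((long) dividend / divisor);
        } catch (ArithmeticException e) {
            return divisor == 0;
        }
    }

    public static void main(String[] args) {
        for (long seed = 0; seed < 10; seed++) {
            int c = pickConstant(seed);
            System.out.println("seed=" + seed + " divisor=" + c + " ok=" + check(42, c));
        }
    }
}
```

The point of the sketch is only that the zero case is treated as an expected outcome rather than left to escape as an uncaught exception.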
------------- PR Comment: https://git.openjdk.org/jdk/pull/29045#issuecomment-3713674804 From epeter at openjdk.org Tue Jan 6 08:54:55 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 Jan 2026 08:54:55 GMT Subject: RFR: 8374489: Template Library: need to tag Float16.float16ToRawShortBits as having non-deterministic result because of multiple NaN bit patterns In-Reply-To: References: Message-ID: <_PLUeKZwBJ17zJFW2nzUASQpFvsx592Kf4ZawxXn1jc=.3afa6dc5-72b1-42ea-b565-7f2753da1c8f@github.com> On Mon, 5 Jan 2026 18:46:10 GMT, Vladimir Kozlov wrote: >> In [JDK-8370922](https://bugs.openjdk.org/browse/JDK-8370922), I added Float16 support to the Template Library. I missed to tag Float16.float16ToRawShortBits as having non-deterministic result. >> >> We need to do that, just like for float and double equivalents: >> Arithmetic operations (e.g. add, mul, fma, ...) are allowed to pick either input if it gets two NaNs of different bit patterns. >> That way, those operations can generate different NaN bit patterns depending on if we use interpreter/C1/C2. >> If we now convert to raw bits, we get different bits, and would wrongly conclude that we get a wrong result. >> >> Since this is a test-bug, I have no regression test. But I verified it manually, that with the same seed (for the ExpressionFuzzer) that fails before this change, we now succeed after the change. > > Good. @vnkozlov @chhagedorn Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29033#issuecomment-3713746281 From epeter at openjdk.org Tue Jan 6 08:54:57 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 Jan 2026 08:54:57 GMT Subject: Integrated: 8374489: Template Library: need to tag Float16.float16ToRawShortBits as having non-deterministic result because of multiple NaN bit patterns In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 11:55:01 GMT, Emanuel Peter wrote: > In [JDK-8370922](https://bugs.openjdk.org/browse/JDK-8370922), I added Float16 support to the Template Library. I missed to tag Float16.float16ToRawShortBits as having non-deterministic result. > > We need to do that, just like for float and double equivalents: > Arithmetic operations (e.g. add, mul, fma, ...) are allowed to pick either input if it gets two NaNs of different bit patterns. > That way, those operations can generate different NaN bit patterns depending on if we use interpreter/C1/C2. > If we now convert to raw bits, we get different bits, and would wrongly conclude that we get a wrong result. > > Since this is a test-bug, I have no regression test. But I verified it manually, that with the same seed (for the ExpressionFuzzer) that fails before this change, we now succeed after the change. This pull request has now been integrated. 
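
To illustrate why raw-bit conversions are tagged as non-deterministic, here is a small sketch using `float` rather than `Float16` (an assumption made so the example runs without the incubator module; the `float` methods `floatToRawIntBits` and `floatToIntBits` are the analogues the description refers to). The class name and constants are hypothetical.

```java
final class NaNBitsSketch {
    static final int QUIET_NAN_A = 0x7fc00000; // canonical quiet NaN
    static final int QUIET_NAN_B = 0x7fc00001; // quiet NaN with a different payload

    // floatToIntBits collapses every NaN to the canonical pattern 0x7fc00000.
    static int canonicalBitsOf(int bits) {
        return Float.floatToIntBits(Float.intBitsToFloat(bits));
    }

    // Adds two NaNs with different payloads; the result is always a NaN,
    // but which payload survives is not pinned down.
    static float addNaNs() {
        float a = Float.intBitsToFloat(QUIET_NAN_A);
        float b = Float.intBitsToFloat(QUIET_NAN_B);
        return a + b;
    }

    public static void main(String[] args) {
        float sum = addNaNs();
        System.out.println("isNaN:     " + Float.isNaN(sum));
        // Stable across execution modes: always the canonical NaN.
        System.out.println("canonical: " + Integer.toHexString(Float.floatToIntBits(sum)));
        // Not stable: may differ between interpreter, C1 and C2, so a test
        // must not compare this value across execution modes.
        System.out.println("raw:       " + Integer.toHexString(Float.floatToRawIntBits(sum)));
    }
}
```

Comparing the canonical bits (or just `isNaN`) is safe; comparing the raw bits is what would lead a fuzzer to report a false mismatch.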
Changeset: 2cb228e1 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/2cb228e142369ec73d768d8a69653a984b1c5908 Stats: 5 lines in 1 file changed: 2 ins; 1 del; 2 mod 8374489: Template Library: need to tag Float16.float16ToRawShortBits as having non-deterministic result because of multiple NaN bit patterns Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/29033 From xgong at openjdk.org Tue Jan 6 09:26:23 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 6 Jan 2026 09:26:23 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector Message-ID: ### Problem: Test `compiler/vectorapi/VectorMaskToLongTest.java` crashes intermittently (approximately once per 200+ runs) with stress VM options such as `-XX:+StressIGVN`: // A fatal error has been detected by the Java Runtime Environment: // // Internal Error (jdk/src/hotspot/share/opto/type.hpp:2287), pid=69056, tid=28419 // assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector // ... The crash occurs in following code when calling `is_vect()` in the assertion added by JDK-8367292 [1]: https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/src/hotspot/share/opto/vectornode.cpp#L1920-L1924 ### Root Cause: The mask's type becomes TOP (unreachable) during compiler optimizations when the mask node is marked as dead before all its users are removed from the ideal graph. If `Ideal()` is subsequently called on a user node, it may access the TOP type, triggering the assertion. 
Here is the simplified ideal graph showing the crash scenario: Con #top | ConI \ / \ / VectorStoreMask | VectorMaskToLong # !jvms: IntMaxVector$IntMaxMask::toLong ### Detailed Scenario: Following is the method in the test case that hits the assertion: https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L65-L70 This method accepts a `VectorSpecies` parameter and calls vector APIs `VectorMask.fromLong()` and `toLong()`. It is called with species ranging from `ByteVector.SPECIES_MAX` to `DoubleVector.SPECIES_MAX`. During compilation, C2 speculatively generates fast paths for `toLong()` for all possible species. When compiling a specific test case such as: https://github.com/openjdk/jdk/blob/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L177-L179 the compiler inlines the method and attempts to optimize away unreachable branches. The following graph shows the situation before the mask becomes `TOP`: VectorBox # DoubleMaxMask, generated by VectorMask.fromLong() / \ AddP \ | \ LoadNClass \ ConP #IntMaxMask | | \ | | \ DecodeNClass | \ / | \ / | CmpP | | | Bool #ne | | / If / | / IfFalse / | / | / CheckCastPP # IntMaxMask | VectorUnbox # Start of inlining IntMaxMask::toLong() | \ ConI \ / VectorStoreMask | VectorMaskToLong The generated mask (`VectorBox`) is a `DoubleMaxMask`, but the code path expects an `IntMaxMask` for `IntMaxMask::toLong()`. Since this is an unreachable branch, the control input of `CheckCastPP` becomes `TOP` during IGVN, propagating the `TOP` type to subsequent data nodes until reaching `VectorStoreMask`. `VectorStoreMask` has another non-TOP input (`ConI`), which stops further `TOP` propagation. With stress VM options, the IGVN worklist order is shuffled, causing `VectorMaskToLongNode::Ideal()` to be invoked before dead path cleanup completes, which triggers the assertion failure. 
### Solution: Replace `is_vect()` with the safer `isa_vect()`, which checks whether the type is a vector type before casting and returns `nullptr` if it is not. Additionally, check for `nullptr` and skip the optimization if the type check fails. An alternative solution would be to detect `top` inputs during IGVN for the relevant vector nodes and skip certain optimizations when such inputs are encountered. That is probably the right long-term direction. However, because this handling is currently missing for all vector nodes, I'd like to leave it as a separate follow-up topic for discussion. ### Testing: Ran the test 800 times on SVE/NEON/AVX2 systems with no failures observed. Note that no new test case was added because it is so challenging to me to reproduce this issue reliably. The issue depends on a specific IGVN optimization sequence that occurs non-deterministically due to the worklist shuffling behavior under stress VM options. [1] https://bugs.openjdk.org/browse/JDK-8367292 ------------- Commit messages: - 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector Changes: https://git.openjdk.org/jdk/pull/29057/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29057&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374043 Stats: 12 lines in 2 files changed: 7 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/29057.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29057/head:pull/29057 PR: https://git.openjdk.org/jdk/pull/29057 From chagedorn at openjdk.org Tue Jan 6 10:27:30 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 Jan 2026 10:27:30 GMT Subject: Integrated: 8374518: C1: Remove dead LinearScanStatistic::Counter::counter_fpu_stack In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 13:28:41 GMT, Christian Hagedorn wrote: > The usages of `LinearScanStatistic::Counter::counter_fpu_stack` were removed with [JDK-8351156](https://bugs.openjdk.org/browse/JDK-8351156) when the FPU stack 
support was removed. But the definition in the `Counter` enum was missed to clean up. When now printing the counters with `-XX:+CountLinearScan -XX:+CITime`, we reach a `ShouldNotReach` in `LinearScanStatistic::counter_name()` for `Counter::counter_fpu_stack` because there is no longer an entry there - it's dead. > > I added a simple hello world sanity test that runs with `-XX:+CountLinearScan -XX:+CITime`. > > Thanks, > Christian This pull request has now been integrated. Changeset: 938bbd5b Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/938bbd5b604e990514b64a0451ed1bceb07eb23b Stats: 39 lines in 2 files changed: 37 ins; 1 del; 1 mod 8374518: C1: Remove dead LinearScanStatistic::Counter::counter_fpu_stack Reviewed-by: thartmann, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/29037 From epeter at openjdk.org Tue Jan 6 10:30:27 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 Jan 2026 10:30:27 GMT Subject: RFR: 8374528: C2 SuperWord: TestAliasingFuzzer.java strengthen no-multiversioning IR rule Message-ID: Now that some of the related issues are fixed, we should strengthen the IR rule for multiversioning. I ran the fuzzer test many times on many platforms. That is not proof that we won't find a failing case in the CI within a week. But it also is not helpful to run more tests before integration. If there are failures, we can just backout this change and address the broken example at that point. 
------------- Commit messages: - JDK-8374528 Changes: https://git.openjdk.org/jdk/pull/29036/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29036&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374528 Stats: 30 lines in 1 file changed: 0 ins; 15 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/29036.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29036/head:pull/29036 PR: https://git.openjdk.org/jdk/pull/29036 From chagedorn at openjdk.org Tue Jan 6 10:39:45 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 Jan 2026 10:39:45 GMT Subject: RFR: 8374528: C2 SuperWord: TestAliasingFuzzer.java strengthen no-multiversioning IR rule In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 13:09:36 GMT, Emanuel Peter wrote: > Now that some of the related issues are fixed, we should strengthen the IR rule for multiversioning. > > I ran the fuzzer test many times on many platforms. That is not proof that we won't find a failing case in the CI within a week. But it also is not helpful to run more tests before integration. If there are failures, we can just backout this change and address the broken example at that point. Looks good! > But it also is not helpful to run more tests before integration. If there are failures, we can just backout this change and address the broken example at that point. Absolutely, I agree with that. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29036#pullrequestreview-3630390469 From krk at openjdk.org Tue Jan 6 11:42:15 2026 From: krk at openjdk.org (Kerem Kat) Date: Tue, 6 Jan 2026 11:42:15 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v7] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 14:32:01 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. 
> > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - address comments > - fix rename > - rename test file > - Merge branch 'master' into fix-c2-segfault-unlocknode > - fix test spacing > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - ... and 2 more: https://git.openjdk.org/jdk/compare/a429b9dc...64e3dc5d Merged from master to take https://bugs.openjdk.org/browse/JDK-8374507 ------------- PR Comment: https://git.openjdk.org/jdk/pull/28432#issuecomment-3714385051 From krk at openjdk.org Tue Jan 6 11:42:10 2026 From: krk at openjdk.org (Kerem Kat) Date: Tue, 6 Jan 2026 11:42:10 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v8] In-Reply-To: References: Message-ID: > Do not try to replace `fallthrough_memproj` when it is null, fixes crash. > > Test case is simplified from the ticket. Verified that the case crashes without the fix. Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 13 additional commits since the last revision: - Merge branch 'master' into fix-c2-segfault-unlocknode - Merge branch 'master' into fix-c2-segfault-unlocknode - Merge branch 'master' into fix-c2-segfault-unlocknode - Merge branch 'master' into fix-c2-segfault-unlocknode - address comments - fix rename - rename test file - Merge branch 'master' into fix-c2-segfault-unlocknode - fix test spacing - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Manuel Hässig - ... and 3 more: https://git.openjdk.org/jdk/compare/3586f365...8713f16d ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28432/files - new: https://git.openjdk.org/jdk/pull/28432/files/64e3dc5d..8713f16d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=06-07 Stats: 1445 lines in 220 files changed: 392 ins; 709 del; 344 mod Patch: https://git.openjdk.org/jdk/pull/28432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28432/head:pull/28432 PR: https://git.openjdk.org/jdk/pull/28432 From dzhang at openjdk.org Tue Jan 6 12:52:24 2026 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 6 Jan 2026 12:52:24 GMT Subject: Integrated: 8374525: RISC-V: Several masked float16 vector operations are not supported In-Reply-To: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> References: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> Message-ID: On Mon, 5 Jan 2026 12:32:13 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > Currently, the masked versions of the following 8 Float16 operations are not supported. > But we return true in `Matcher::match_rule_supported_vector_masked` for these operations > on RISC-V platforms with Zvfh. We need to explicitly disable them on this CPU platform > to make it clear.
> > Op_AddVHF: > Op_SubVHF: > Op_MulVHF: > Op_DivVHF: > Op_MaxVHF: > Op_MinVHF: > Op_SqrtVHF: > Op_FmaVHF: > > When the support for Float16 vector classes is added in VectorAPI and the masked > Float16 IR can be generated, these masked operations will be enabled and relevant > backend support added. This pull request has now been integrated. Changeset: df5b49e6 Author: Dingli Zhang URL: https://git.openjdk.org/jdk/commit/df5b49e604d3204c6383484ba3807d39abd0b0f1 Stats: 19 lines in 1 file changed: 17 ins; 2 del; 0 mod 8374525: RISC-V: Several masked float16 vector operations are not supported Reviewed-by: fjiang, fyang ------------- PR: https://git.openjdk.org/jdk/pull/29035 From dzhang at openjdk.org Tue Jan 6 12:52:24 2026 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 6 Jan 2026 12:52:24 GMT Subject: RFR: 8374525: RISC-V: Several masked float16 vector operations are not supported In-Reply-To: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> References: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> Message-ID: On Mon, 5 Jan 2026 12:32:13 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > Currently, the masked versions of the following 8 Float16 operations are not supported. > But we return true in `Matcher::match_rule_supported_vector_masked` for these operations > on RISC-V platforms with Zvfh. We need to explicitly disable them on this CPU platform > to make it clear. > > Op_AddVHF: > Op_SubVHF: > Op_MulVHF: > Op_DivVHF: > Op_MaxVHF: > Op_MinVHF: > Op_SqrtVHF: > Op_FmaVHF: > > When the support for Float16 vector classes is added in VectorAPI and the masked > Float16 IR can be generated, these masked operations will be enabled and relevant > backend support added. Thanks all for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29035#issuecomment-3714580978 From thartmann at openjdk.org Tue Jan 6 14:00:37 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 6 Jan 2026 14:00:37 GMT Subject: RFR: 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs In-Reply-To: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> References: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> Message-ID: <4IzjJa5BUpNHSmMmUoafC3uyv0COPwPKIfFEYO_NnOE=.4a286d36-53db-4a9f-bec0-6b1fb1ef8503@github.com> On Mon, 5 Jan 2026 07:37:34 GMT, Emanuel Peter wrote: > In `VLoopMemorySlices::find_memory_slices`, we analyze the memory slices. In some cases, we only find loads in the slice, and no phi. So the memory input of the loads comes from before the loop. When I refactored the code, I made the assumption that all loads should have the same memory input. After all: any store before the loop would have to have happened before we enter the loop, and execute any loads from the loop. The assumption held for a long time, but now we have a counter example. > > Summary: one load has its memory input optimized, the other is not put on the IGVN worklist again, and keeps the old memory input (even though in this case we could have optimized it just the same). Thus, both choices of memory input are correct, and the assumption of the assert does not hold. > > Solution: Just bail out of auto vectorization if this assumption is violated. This is an edge case, and the assert has not been hit until the fuzzer found this example. > Alternatives: we could track the multiple memory inputs, but that would be more effort to implement, and hard to test because it is difficult to create examples. > > --------------------------- > > **Details** > > Below, look at `1145 LoadB` and `1131 LoadB`. 
One has memory input `Param 7` (initial program state), the other `711 Phi` (outer loop). Both loads are inside the `1147 CountedLoop`. But their states come from outside, both originally from `711 Phi`. But then `1145 LoadB` is optimized with `LoadBNode::Ideal` -> `LoadNode::Ideal` -> `LoadNode::split_through_phi`: it realizes that the backedge of the `711 Phi` only goes by the `593 CallStaticJava`, which cannot modify the `Byte::value` field of the `LoadB` (unless it was to use reflection, but that unlocks undefined behavior anyway, so it can be ignored). So it is ok to split through the phi, as the `Byte::value` cannot be modified during the outer loop. > > image > > `1131 LoadB` could also do the same optimization, but it just does not end up on the IGVN worklist. The issue is that we don't have any adequate notification that goes down through the `MergeMem - Call - Proj` structure. We did not want to have that until now, because in theory we could have a long series of calls, and the traversals could become too expensive. Looks good to me too. Great that we have a regression test for this rare case now. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29028#pullrequestreview-3630997120 From thartmann at openjdk.org Tue Jan 6 14:20:09 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 6 Jan 2026 14:20:09 GMT Subject: RFR: 8374307: Fix deoptimization storm caused by Action_none in GraphKit::uncommon_trap In-Reply-To: References: Message-ID: <0ZP63PHTbXgTcztA8wpWQd3Zj7YzLkOW9udimgYmSTs=.94d58938-cd00-4e85-80c1-2ca8b610afac@github.com> On Tue, 23 Dec 2025 18:16:33 GMT, Boris Ulasevich wrote: > We observed a deoptimization storm caused by GraphKit::uncommon_trap generator logic. GraphKit::uncommon_trap considers the too_many_recompiles metric. If the threshold is overflowed, it replaces Deoptimization::Action_reinterpret with Deoptimization::Action_none (see code snippet below). 
> > This replacement changes the uncommon_trap logic: once execution hits a trap, the VM performs deoptimization but does not recompile the method anymore. In an "unlucky" case, when the code part calling this uncommon_trap becomes frequent, a deoptimization storm occurs (thousands of deoptimizations per second) causing a significant performance drop. > > The original problematic method, which triggered repeated recompilations, is a high-performance compressed binary serialization algorithm with heavy use of conditional branches driven by bitmasks. See a standalone [synthetic benchmark](https://bugs.openjdk.org/secure/attachment/118045/UnstableIf.java) to reproduce the issue. > > The issue arises when the method overcomes a global recompilation threshold before stabilizing specific trap counters. > > Current thresholds: > - Recompilation Limit (too_many_recompiles): > Condition: decompile_count() >= (PerMethodRecompilationCutoff / 2) + 1 > Default: 201 (derived from default PerMethodRecompilationCutoff = 400). > - Specific Trap Limits (too_many_traps): > Checks if the trap count for a specific reason exceeds: > PerMethodTrapLimit (Default: 100) - for Reason_unstable_if, Reason_unstable_fused_if, etc. > PerMethodSpecTrapLimit (Default: 5000) - for Reason_speculate_class_check, Reason_speculate_null_check, etc. > > With the given defaults, if the only reason for the method recompilation is unstable_if, the system stabilizes after 100 traps (PerMethodTrapLimit). However, if the method experiences traps and recompilations for different reasons, the total number of recompilations can exceed 200 before hitting the limit for unstable_if traps. This triggers Action_none and causes the deopt storm. > > The proposal is a minimal change in GraphKit::uncommon_trap: apply the same `too_many_recompiles` threshold inside `Parse::path_is_suitable_for_uncommon_trap` - this ensures that on the final recompilation C2 gets a hint not to speculate on untaken branches anymore.
> > As an alternative solution, we can revisit GraphKit::uncommon_trap. This "Temporary fix" has persisted in the codebase for 17 years, so it is probably time to change it as well. Any comments are welcome > > case Deoptimization::Action_reinter... Is this related to [JDK-8243615](https://bugs.openjdk.org/browse/JDK-8243615)? Could you convert your `UnstableIf.java` test to a jtreg test? Maybe by running in a different process and counting the number of deoptimization events? [JDK-8243615](https://bugs.openjdk.org/browse/JDK-8243615) also has a test attached. ------------- PR Review: https://git.openjdk.org/jdk/pull/28966#pullrequestreview-3631065370 From thartmann at openjdk.org Tue Jan 6 14:45:39 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 6 Jan 2026 14:45:39 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v11] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 21:50:27 GMT, Tobias Hotz wrote: >> This PR improves the value of integer division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Simplify test, add temporary @IR rule for testLongRange and improve comments Thank you!
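
The "four corners" idea from the quoted 8364766 description can be sketched in plain Java as below. This is an illustration under stated assumptions, not the C2 implementation: the class and method names are made up, the corners are evaluated in `long` arithmetic to sidestep the `MIN_INT / -1` wrap-around, and the overflow case simply falls back to the full `int` range, which is a conservative simplification and may be less precise than what the actual patch computes.

```java
import java.util.Arrays;

final class DivRangeSketch {
    // Returns {min, max} of a / b for a in [aLo, aHi] and b in [bLo, bHi], b != 0.
    static int[] divRange(int aLo, int aHi, int bLo, int bHi) {
        if (bLo == 0 && bHi == 0) {
            throw new IllegalArgumentException("divisor is always zero");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        if (bLo < 0) { // negative part of the divisor range: [bLo, min(bHi, -1)]
            long[] r = corners(aLo, aHi, bLo, Math.min(bHi, -1));
            min = Math.min(min, r[0]);
            max = Math.max(max, r[1]);
        }
        if (bHi > 0) { // positive part of the divisor range: [max(bLo, 1), bHi]
            long[] r = corners(aLo, aHi, Math.max(bLo, 1), bHi);
            min = Math.min(min, r[0]);
            max = Math.max(max, r[1]);
        }
        if (max > Integer.MAX_VALUE) {
            // Only MIN_VALUE / -1 can exceed the int range; it wraps to MIN_VALUE.
            // Conservative fallback (assumption of this sketch): give up precision.
            return new int[] { Integer.MIN_VALUE, Integer.MAX_VALUE };
        }
        return new int[] { (int) min, (int) max };
    }

    // For a divisor range of a single sign, a / b is monotone in each argument,
    // so the extrema are found among the four corner quotients.
    static long[] corners(long aLo, long aHi, long bLo, long bHi) {
        long c1 = aLo / bLo, c2 = aLo / bHi, c3 = aHi / bLo, c4 = aHi / bHi;
        long min = Math.min(Math.min(c1, c2), Math.min(c3, c4));
        long max = Math.max(Math.max(c1, c2), Math.max(c3, c4));
        return new long[] { min, max };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(divRange(10, 20, 2, 5)));    // [2, 10]
        System.out.println(Arrays.toString(divRange(-10, 10, -2, 3))); // [-10, 10]
    }
}
```

Splitting a zero-crossing divisor range into its negative and positive halves is what lets the corner argument apply to each half separately; a zero divisor contributes no value because the division throws.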
------------- PR Comment: https://git.openjdk.org/jdk/pull/26143#issuecomment-3714967557 From thartmann at openjdk.org Tue Jan 6 14:45:44 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 6 Jan 2026 14:45:44 GMT Subject: RFR: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 18:37:54 GMT, Tobias Hotz wrote: > This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 > I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. > The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. > I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero Looks good to me too. Thanks for quickly fixing this. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29045#pullrequestreview-3631165318 From epeter at openjdk.org Tue Jan 6 15:37:20 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 Jan 2026 15:37:20 GMT Subject: RFR: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 18:37:54 GMT, Tobias Hotz wrote: > This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 > I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. > The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. 
> I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero test/hotspot/jtreg/compiler/igvn/IntegerDivValueTests.java line 207: > 205: @Run(test = {"testIntConstantFolding", "testIntConstantFoldingSpecialCase"}) > 206: public void checkIntConstants(RunInfo info) { > 207: if (INT_CONST_2 == 0) { Since you are working on this: Could `testIntRandomLimits` not also have a division by zero exception? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29045#discussion_r2665332193 From dfenacci at openjdk.org Tue Jan 6 16:21:14 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 6 Jan 2026 16:21:14 GMT Subject: RFR: 8342772: Assert in LateInlineMHCallGenerator::do_late_inline_check Message-ID: # Issue The assertion https://github.com/openjdk/jdk/blob/1c16d8a900928ebfa5e2343cf33312539509e815/src/hotspot/share/opto/callGenerator.cpp#L421 in `LateInlineMHCallGenerator::do_late_inline_check` fails while running a 24h-long Renaissance benchmark test. # Cause The assert failure is due to both `!cg->is_late_inline()` and `cg->is_mh_late_inline()` being false because the CallGenerator `cg` is of type `LateInlineVirtualCallGenerator`. `cg` is created just above by calling `for_method_handle_inline`. The only way for `cg` to be of type `LateInlineVirtualCallGenerator` is that `for_method_handle_inline` (which finds out that the intrinsic id is `vmIntrinsics::_linkToInterface`) calls `optimize_virtual_call`, which apparently cannot devirtualize the call, and then calls `call_generator`, which creates and returns a `LateInlineVirtualCallGenerator` (at the end of the method). # Fix There seems to be no apparent reason why the CallGenerator returned by `for_method_handle_inline` couldn't be of type `LateInlineVirtualCallGenerator`. So the sensible fix is to relax the assert to accommodate this type of call generator. # Testing Tier 1-3+ Failing test before and after.
Unfortunately it has proven impractical to create a specific test that consistently (or even only intermittently) reproduces the issue. ------------- Commit messages: - JDK-8342772: update copyright year - Merge branch 'master' into JDK-8342772 - JDK-8342772: new line - JDK-8342772: Assert in LateInlineMHCallGenerator::do_late_inline_check Changes: https://git.openjdk.org/jdk/pull/28793/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28793&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342772 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28793.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28793/head:pull/28793 PR: https://git.openjdk.org/jdk/pull/28793 From sviswanathan at openjdk.org Tue Jan 6 16:54:50 2026 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 6 Jan 2026 16:54:50 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v5] In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 04:16:18 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/x86.ad line 14000: >> >>> 13998: match(Set dst (OrL src1 (LoadL src2))); >>> 13999: effect(KILL cr); >>> 14000: flag(PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_parity_flag, PD::Flag_clears_overflow_flag, PD::Flag_clears_carry_flag, PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); >> >> Remove PD::Flag_ndd_demotable_opr2 as the second operand is a memory operand. > > We already have a check for [memory operands (mapping to multiple input edges](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L2664)) in place, ADLC generates DFA for both direct and flipped versions of the memory patterns. So Flag_ndd_demotable_opr2 will cover the flipped operand case. Thanks for the clarification. 
Maybe we should then add PD::Flag_ndd_demotable_opr2 to the following as well to be consistent: xorI_rReg_rReg_mem_ndd orI_rReg_rReg_mem_ndd mulI_rReg_rReg_mem_ndd ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2665582938 From qamai at openjdk.org Tue Jan 6 16:57:18 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 6 Jan 2026 16:57:18 GMT Subject: RFR: 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs In-Reply-To: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> References: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> Message-ID: On Mon, 5 Jan 2026 07:37:34 GMT, Emanuel Peter wrote: > In `VLoopMemorySlices::find_memory_slices`, we analyze the memory slices. In some cases, we only find loads in the slice, and no phi. So the memory input of the loads comes from before the loop. When I refactored the code, I made the assumption that all loads should have the same memory input. After all: any store before the loop would have to have happened before we enter the loop, and execute any loads from the loop. The assumption held for a long time, but now we have a counterexample. > > Summary: one load has its memory input optimized, the other is not put on the IGVN worklist again, and keeps the old memory input (even though in this case we could have optimized it just the same). Thus, both choices of memory input are correct, and the assumption of the assert does not hold. > > Solution: Just bail out of auto vectorization if this assumption is violated. This is an edge case, and the assert has not been hit until the fuzzer found this example. > Alternatives: we could track the multiple memory inputs, but that would be more effort to implement, and hard to test because it is difficult to create examples.
> > --------------------------- > > **Details** > > Below, look at `1145 LoadB` and `1131 LoadB`. One has memory input `Param 7` (initial program state), the other `711 Phi` (outer loop). Both loads are inside the `1147 CountedLoop`. But their states come from outside, both originally from `711 Phi`. But then `1145 LoadB` is optimized with `LoadBNode::Ideal` -> `LoadNode::Ideal` -> `LoadNode::split_through_phi`: it realizes that the backedge of the `711 Phi` only goes by the `593 CallStaticJava`, which cannot modify the `Byte::value` field of the `LoadB` (unless it was to use reflection, but that unlocks undefined behavior anyway, so it can be ignored). So it is ok to split through the phi, as the `Byte::value` cannot be modified during the outer loop. > > image > > `1131 LoadB` could also do the same optimization, but it just does not end up on the IGVN worklist. The issue is that we don't have any adequate notification that goes down through the `MergeMem - Call - Proj` structure. We did not want to have that until now, because in theory we could have a long series of calls, and the traversals could become too expensive. Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29028#pullrequestreview-3631677709 From duke at openjdk.org Tue Jan 6 18:30:06 2026 From: duke at openjdk.org (Tobias Hotz) Date: Tue, 6 Jan 2026 18:30:06 GMT Subject: RFR: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 15:34:06 GMT, Emanuel Peter wrote: >> This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 >> I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. >> The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. 
>> I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero > > test/hotspot/jtreg/compiler/igvn/IntegerDivValueTests.java line 207: > >> 205: @Run(test = {"testIntConstantFolding", "testIntConstantFoldingSpecialCase"}) >> 206: public void checkIntConstants(RunInfo info) { >> 207: if (INT_CONST_2 == 0) { > > Since you are working on this: Could `testIntRandomLimits` not also have a division by zero exception? Yes, it could! But this case is already covered in https://github.com/openjdk/jdk/pull/29045/files#diff-6f6b705b394c4ecdf97f05cfa5b4bd12cbac18a60a95a1ec78c943d5055a0f80R501 (the code is a bit more complex because, due to the clamping, we can't just check a single value). I just forgot this case since the initial version of this test did not have random constants. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29045#discussion_r2665875848 From vlivanov at openjdk.org Tue Jan 6 22:18:29 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 6 Jan 2026 22:18:29 GMT Subject: RFR: 8342772: Assert in LateInlineMHCallGenerator::do_late_inline_check In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 14:39:00 GMT, Damon Fenacci wrote: > # Issue > The assertion > https://github.com/openjdk/jdk/blob/1c16d8a900928ebfa5e2343cf33312539509e815/src/hotspot/share/opto/callGenerator.cpp#L421 > in `LateInlineMHCallGenerator::do_late_inline_check` fails while running a 24h-long Renaissance benchmark test. > > # Cause > The assert failure is due to both `!cg->is_late_inline()` and `cg->is_mh_late_inline()` being false because the CallGenerator `cg` is of type `LateInlineVirtualCallGenerator`. `cg` is created just above by calling `for_method_handle_inline`.
The only way for `cg` to be of type `LateInlineVirtualCallGenerator` is that `for_method_handle_inline` (which finds out that the intrinsic id is `vmIntrinsics::_linkToInterface`) calls `optimize_virtual_call`, which apparently cannot devirtualize the call, and then calls `call_generator`, which creates and returns a `LateInlineVirtualCallGenerator` (at the end of the method). > > # Fix > There seems to be no apparent reason why the CallGenerator returned by `for_method_handle_inline` couldn't be of type `LateInlineVirtualCallGenerator`. So the sensible fix is to relax the assert to accommodate this type of call generator. > > # Testing > Tier 1-3+ > Failing test before and after. > > Unfortunately it has proven impractical to create a specific test that consistently (or even only intermittently) reproduces the issue. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28793#pullrequestreview-3632709444 From jkarthikeyan at openjdk.org Tue Jan 6 23:48:09 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 6 Jan 2026 23:48:09 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v4] In-Reply-To: References: Message-ID: > Hi all, > This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains six additional commits since the last revision: - Update copyright year - Merge branch 'master' into jdk-8365570 - Remove CompLevel.C2 from test - Merge branch 'master' into jdk-8365570 - Update comment for constraint casts - Fix truncation assert for constraint casts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26827/files - new: https://git.openjdk.org/jdk/pull/26827/files/f433930e..ebe5a1d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=02-03 Stats: 75767 lines in 3383 files changed: 40560 ins; 15067 del; 20140 mod Patch: https://git.openjdk.org/jdk/pull/26827.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26827/head:pull/26827 PR: https://git.openjdk.org/jdk/pull/26827 From jkarthikeyan at openjdk.org Tue Jan 6 23:53:18 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 6 Jan 2026 23:53:18 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v5] In-Reply-To: References: Message-ID: <3L4a5JCekql7Q6DsQiS886cJUT_A3VOudur5kqwXDko=.f99d0326-c9fa-4156-8cae-2cd54ae1b3d0@github.com> > Hi all, > This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! 
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Use Xcomp test run instead of Warmup(0) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26827/files - new: https://git.openjdk.org/jdk/pull/26827/files/ebe5a1d1..50bc1326 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=03-04 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26827.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26827/head:pull/26827 PR: https://git.openjdk.org/jdk/pull/26827 From jkarthikeyan at openjdk.org Tue Jan 6 23:53:20 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 6 Jan 2026 23:53:20 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 08:46:37 GMT, Tobias Hartmann wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Update comment for constraint casts > > Sounds good, thanks for the update! @TobiHartmann @chhagedorn May I have some reviews on the updated patch? Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26827#issuecomment-3716759982 From jkarthikeyan at openjdk.org Tue Jan 6 23:53:21 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 6 Jan 2026 23:53:21 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: References: <1z430wmE_HRTJqmLIC15VMUktLyUEE7qjkppr1GniAI=.e560a4e9-59f0-4013-ad65-5d7261cdbf0e@github.com> <8jnY6pqofieRIfV5fCqFxvHZMF3nAZbh7yAD7C_G5FU=.a12c98f6-c715-43f5-9528-62fcfdfc6e59@github.com> Message-ID: On Sat, 6 Dec 2025 20:24:28 GMT, Jasmine Karthikeyan wrote: >> @jaskarth Thanks for looking into it! 
>> >> I would still add the fix, just in case. And I think the test as well, even if it does not reproduce any more. >> >> I was wondering: before the merge, when the test still reproduced: >> If you removed the `@Warmup(0)` and `CompLevel.C2`, and instead just do `framework.addFlags` with `-Xcomp`, would that reproduce too? If so, you could have one framework run with and one without Xcomp; the one with Xcomp should also have a compileonly. What do you think? >> >> Or we just push the patch as is, to be sure this is done and integrated. What do you think @chhagedorn ? > > Yep, I can replicate the crash on the old commit with `TestFramework.runWithFlags("-Xcomp", "-XX:CompileCommand=compileonly,*TestSubwordTruncation::*");` instead of `@Warmup(0)`. I think this would also be a good option, as it would let you get coverage with Xcomp on the other tests as well. I've pushed a commit that changes the Warmup(0) to the second test run. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2666621951 From vpaprotski at openjdk.org Wed Jan 7 00:22:35 2026 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 7 Jan 2026 00:22:35 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2] In-Reply-To: References: Message-ID: On Sat, 3 Jan 2026 00:23:13 GMT, Shawn M Emery wrote: >> This change allows use of the AVX512_VBMI/VBMI2 instruction set to further optimize decompression/parsing of polynomial coefficients for ML-KEM. The speedup gained in the ML-KEM benchmarks for key generation is between 0.2 and 0.5%, encapsulation is 0.3 to 1.5%, and decapsulation is 0 to 0.9%. >> >> Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me. > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright year "Insert 0b0000 nibble after every third nibble".
I only have two questions, looks good otherwise. PS: things I've considered: - Loop controls? - ML_KEM.java guarantees (per callee comment and assert) lengths are a multiple of 64 - also same as original code - Why not simply a vpermb? Have zeroes already from the masked load with k1.. - shuffle granularity is actually 4-bits, not 8-bits - logical shift already zeroes top bits, so `vpand` not required? - odd columns not shifted, so still have extra bits that need clearing - Why VBMI? - needed for `evpermb` src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 862: > 860: __ addptr(condensed, condensedOffs); > 861: > 862: if (VM_Version::supports_avx512_vbmi2()) { Which instruction needs vbmi2? All I could spot was that `evpermb` that needs vbmi. Relax the restriction slightly? src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 906: > 904: __ addptr(condensed, 192); > 905: __ addptr(parsed, 256); > 906: __ subl(parsedLength, 128); (128 instead of 256 here because `parsedLength` is an index into a `short` array.) I am confused by the stride. The `twelve2Sixteen()` seems to (almost) guarantee that the parsed length is a multiple of 64 (last block can be 48 bytes). This would imply a stride of 128 bytes for `parsed`, and 96 for `condensed`. This is exactly how the existing code already behaves, so I am less concerned, but I would like an explanation of why it works.
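As a reference point for the nibble insertion and the byte counts discussed above, here is a scalar Java sketch of 12-bit to 16-bit unpacking. It is neither the `ML_KEM.java` code nor the stub: the class and method names are hypothetical, and the little-endian 12-bit packing (two coefficients per three bytes) is an assumption based on the ML-KEM byte-decoding convention. It only shows the ratios between coefficients, condensed bytes and parsed bytes; it does not settle whether the stub's loop bound is right.

```java
public final class TwelveToSixteenSketch {

    // Illustrative scalar model (hypothetical names, assumed little-endian
    // packing): every 3 input bytes (6 nibbles) become 2 shorts (8 nibbles).
    // The top nibble of each short is the inserted 0b0000.
    // parsedLength is a count of shorts and is assumed to be even.
    public static void unpack(byte[] condensed, int condensedOffs,
                              short[] parsed, int parsedLength) {
        for (int i = 0; i < parsedLength / 2; i++) {
            int b0 = condensed[condensedOffs + 3 * i] & 0xFF;
            int b1 = condensed[condensedOffs + 3 * i + 1] & 0xFF;
            int b2 = condensed[condensedOffs + 3 * i + 2] & 0xFF;
            // First coefficient: all of b0 plus the low nibble of b1.
            parsed[2 * i] = (short) (b0 | ((b1 & 0x0F) << 8));
            // Second coefficient: the high nibble of b1 plus all of b2.
            parsed[2 * i + 1] = (short) ((b1 >>> 4) | (b2 << 4));
        }
    }

    // Bytes consumed from the condensed input for n coefficients (12 bits each).
    public static int condensedBytes(int coefficients) {
        return coefficients * 3 / 2;
    }

    // Bytes produced in the parsed output for n coefficients (16 bits each).
    public static int parsedBytes(int coefficients) {
        return coefficients * 2;
    }

    public static void main(String[] args) {
        short[] out = new short[2];
        unpack(new byte[] {(byte) 0xBC, 0x3A, 0x12}, 0, out, 2);
        System.out.println(Integer.toHexString(out[0])); // prints abc
        System.out.println(Integer.toHexString(out[1])); // prints 123
        System.out.println(condensedBytes(128) + " -> " + parsedBytes(128)); // prints 192 -> 256
        System.out.println(condensedBytes(64) + " -> " + parsedBytes(64));   // prints 96 -> 128
    }
}
```

Under this model, 128 coefficients correspond to 192 condensed bytes and 256 parsed bytes, while 64 coefficients correspond to 96 and 128 bytes, which are the two sets of numbers compared in the stride question.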
------------- PR Review: https://git.openjdk.org/jdk/pull/28815#pullrequestreview-3632845110 PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2666594767 PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2666663039 From jbhateja at openjdk.org Wed Jan 7 02:15:22 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 7 Jan 2026 02:15:22 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v6] In-Reply-To: References: Message-ID: > Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand; this leads to a failure during register biasing. Changing the NDD demotion flag names to encode the explicit operand position, i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2**, splitting the commutative flag into separate new flags, and fine-tuning the assertion checks based on the new naming convention fixes the issue. > > Failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch. > > Kindly review and share your feedback. > > Best Regards, > Jatin > PS: Validation performed using Intel SDE 9.58.
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28999/files - new: https://git.openjdk.org/jdk/pull/28999/files/29093665..de6b115c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=04-05 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28999.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28999/head:pull/28999 PR: https://git.openjdk.org/jdk/pull/28999 From jbhateja at openjdk.org Wed Jan 7 02:15:23 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 7 Jan 2026 02:15:23 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v5] In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 16:51:00 GMT, Sandhya Viswanathan wrote: >> We already have a check for [memory operands (mapping to multiple input edges](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L2664)) in place, ADLC generates DFA for both direct and flipped versions of the memory patterns. So Flag_ndd_demotable_opr2 will cover the flipped operand case. > > Thanks for the clarification. Maybe we should then add PD::Flag_ndd_demotable_opr2 to the following as well to be consistent: > xorI_rReg_rReg_mem_ndd > orI_rReg_rReg_mem_ndd > mulI_rReg_rReg_mem_ndd Thanks @sviswa7, comment addressed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2666833538