From xgong at openjdk.org Fri Jan 2 03:00:57 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 2 Jan 2026 03:00:57 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v3] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 30 Dec 2025 01:26:50 GMT, Xiaohong Gong wrote: >> The test fails intermittently with the following error: >> >> >> Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) >> >> >> The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. >> >> For example, given array elements: >> >> [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] >> >> >> Sequential scalar addition produces: >> >> 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f >> >> >> However, `reduceLanes()` might compute: >> >> (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL >> >> >> The difference between the two results is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases.
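A minimal Java sketch of the order dependence described above: it sums the same four values left to right, as the scalar reference loop does, and in a pairwise (tree) order, which is one of the orders `reduceLanes()` is permitted to use. The class and method names are hypothetical and this is not the test's code; the pairwise order is only an example, since the real lane order depends on the platform and vector size.

```java
final class ReductionOrderDemo {
    // Left-to-right (sequential) sum, as a scalar reference loop computes it.
    static float sequentialSum(float[] a) {
        float sum = 0.0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    // Pairwise (tree-shaped) sum over the non-empty range [lo, hi):
    // one possible order for an unordered vector reduction.
    static float pairwiseSum(float[] a, int lo, int hi) {
        if (hi - lo == 1) {
            return a[lo];
        }
        int mid = (lo + hi) >>> 1;
        return pairwiseSum(a, lo, mid) + pairwiseSum(a, mid, hi);
    }

    public static void main(String[] args) {
        float[] a = {0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE};
        float seq = sequentialSum(a);
        float tree = pairwiseSum(a, 0, a.length);
        // Same tolerance formula as quoted in the failure message above.
        float tolerance = Math.ulp(0.0f) * 10.0f;
        System.out.println("sequential=" + seq + " pairwise=" + tree
                + " exceedsTolerance=" + (Math.abs(seq - tree) > tolerance));
    }
}
```

In the sequential order `Float.MIN_NORMAL` is absorbed when `Float.MAX_VALUE` is added, so the result is `0.0f`; in the pairwise order the two large values cancel first, leaving `Float.MIN_NORMAL`.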
>> >> Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. >> >> This patch fixes the issue by initializing the float array in the test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. >> >> Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. >> >> [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Remove verification for floating-point add reduction tests ping again~ ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3704368628 From jbhateja at openjdk.org Fri Jan 2 05:18:56 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 2 Jan 2026 05:18:56 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v3] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 30 Dec 2025 01:26:50 GMT, Xiaohong Gong wrote: > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Remove verification for floating-point add reduction tests LGTM Best Regards ------------- Marked as reviewed by jbhateja (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28960#pullrequestreview-3622220598 From jbhateja at openjdk.org Fri Jan 2 05:45:58 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 2 Jan 2026 05:45:58 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v3] In-Reply-To: References: Message-ID: > Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand, which leads to a failure during register biasing. Renaming the NDD demotion flags to encode the explicit operand position, i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2**, splitting the commutative flag into separate new flags, and fine-tuning the assertion checks based on the new naming convention fixes the issue. > > Failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch. > > Kindly review and share your feedback. > > Best Regards, > Jatin > PS: Validation performed using Intel SDE 9.58. Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains two additional commits since the last revision: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8373724 - 8373724: Assertion failure in TestSignumVector.java with UseAPX ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28999/files - new: https://git.openjdk.org/jdk/pull/28999/files/bc86d54d..2a63c92b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=01-02 Stats: 2683 lines in 1256 files changed: 410 ins; 251 del; 2022 mod Patch: https://git.openjdk.org/jdk/pull/28999.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28999/head:pull/28999 PR: https://git.openjdk.org/jdk/pull/28999 From bkilambi at openjdk.org Fri Jan 2 10:13:03 2026 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 2 Jan 2026 10:13:03 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v4] In-Reply-To: <4o-mRvCoV4nHqDouamLFsjYVVHhSuAOurJipQmy3xo8=.08cbadfa-ebdf-4c6b-a7f3-efe808f82b92@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <4o-mRvCoV4nHqDouamLFsjYVVHhSuAOurJipQmy3xo8=.08cbadfa-ebdf-4c6b-a7f3-efe808f82b92@github.com> Message-ID: On Wed, 24 Dec 2025 09:15:00 GMT, Jatin Bhateja wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > Common IR changes look good to me, adding some minor comments. Hi @jatin-bhateja, could you please take another look at the patch? Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3704967816 From qamai at openjdk.org Fri Jan 2 15:42:10 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 Jan 2026 15:42:10 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v3] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 30 Dec 2025 01:26:50 GMT, Xiaohong Gong wrote: > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Remove verification for floating-point add reduction tests Thanks, LGTM. ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/28960#pullrequestreview-3623364261 From duke at openjdk.org Sat Jan 3 00:23:13 2026 From: duke at openjdk.org (Shawn M Emery) Date: Sat, 3 Jan 2026 00:23:13 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2] In-Reply-To: References: Message-ID: > This change allows the use of the AVX512_VBMI/AVX512_VBMI2 instruction sets to further optimize decompression/parsing of polynomial coefficients for ML-KEM. The speedup in the ML-KEM benchmarks is 0.2% to 0.5% for key generation, 0.3% to 1.5% for encapsulation, and 0% to 0.9% for decapsulation. > > Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me.
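For readers unfamiliar with the parsing step being optimized, here is a scalar Java sketch of unpacking 12-bit coefficients, where every 3 bytes yield two values in little-endian bit order. This is the layout FIPS 203 uses in ByteDecode with d = 12 and in SampleNTT. It is only an illustrative reference under that assumption. The class name is hypothetical, and it is neither the JDK's ML-KEM code nor the AVX512_VBMI intrinsic.

```java
final class MlKemUnpack12 {
    // Unpacks little-endian 12-bit values: every 3 input bytes yield 2 values.
    // Returns the raw 12-bit values (0..4095); any reduction modulo q = 3329
    // or rejection against q is left to the caller.
    static int[] unpack12(byte[] in) {
        if (in.length % 3 != 0) {
            throw new IllegalArgumentException("input length must be a multiple of 3");
        }
        int[] out = new int[in.length / 3 * 2];
        for (int i = 0, j = 0; i < in.length; i += 3, j += 2) {
            int b0 = in[i] & 0xFF;      // mask to undo Java's signed byte
            int b1 = in[i + 1] & 0xFF;
            int b2 = in[i + 2] & 0xFF;
            out[j] = b0 | ((b1 & 0x0F) << 8);    // low 8 bits from b0, high 4 bits from b1's low nibble
            out[j + 1] = (b1 >>> 4) | (b2 << 4); // low 4 bits from b1's high nibble, high 8 bits from b2
        }
        return out;
    }

    public static void main(String[] args) {
        int[] c = unpack12(new byte[] {(byte) 0x01, (byte) 0x23, (byte) 0x45});
        System.out.printf("0x%03x 0x%03x%n", c[0], c[1]);
    }
}
```

For the input bytes `0x01 0x23 0x45`, the sketch produces `0x301` and `0x452`. A vectorized version performs the same byte shuffling and shifting across many coefficients at once.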
Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: Update copyright year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28815/files - new: https://git.openjdk.org/jdk/pull/28815/files/d2cadaf9..7cd8de53 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28815&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28815&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28815.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28815/head:pull/28815 PR: https://git.openjdk.org/jdk/pull/28815 From jiefu at openjdk.org Sat Jan 3 13:35:05 2026 From: jiefu at openjdk.org (Jie Fu) Date: Sat, 3 Jan 2026 13:35:05 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v3] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 30 Dec 2025 01:26:50 GMT, Xiaohong Gong wrote: > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Remove verification for floating-point add reduction tests Good. ------------- Marked as reviewed by jiefu (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28960#pullrequestreview-3624339667 From jbhateja at openjdk.org Sun Jan 4 10:30:23 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 4 Jan 2026 10:30:23 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v5] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Mon, 29 Dec 2025 17:39:42 GMT, Bhavana Kilambi wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added, as the Vector API currently does not support a Float16 vector species. >> >> Both add and mul reductions vectorized through autovectorization require the implementation to be strictly ordered. Each of these reductions is implemented as follows on the different aarch64 targets: >> >> **For AddReduction :** >> On Neon-only targets (UseSVE = 0): generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing a strictly ordered floating-point add reduction. >> >> On SVE targets (UseSVE > 0): generates the `fadda` instruction, which computes a floating-point add reduction in strict order. >> >> **For MulReduction :** >> Neither Neon nor SVE provides a direct instruction for computing a strictly ordered floating-point multiply reduction. For vector lengths of 8B and 16B, a sequence of scalar `fmul` instructions is generated; multiply reduction for vector lengths > 16B is not supported.
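As a rough scalar model of what "strictly ordered" means here, the following Java sketch folds FP16 values, stored as raw bits in a `short[]`, strictly from left to right. Each step widens to `float` and rounds back to FP16. It assumes JDK 20+ for `Float.float16ToFloat` and `Float.floatToFloat16`. The class and method names are hypothetical, and this is not the loop in `Float16OperationsBenchmark.java`. Because each step depends on the previous accumulator, the lanes have to be combined in order. On Neon, that requires a chain of scalar `fadd`/`fmul` instructions, whereas SVE `fadda` performs an in-order add fold.

```java
final class Fp16OrderedReduction {
    // Strictly ordered add reduction over FP16 values stored as raw bits.
    // Each step widens the accumulator and the next element to float, adds,
    // and rounds the result back to FP16, so elements are combined left to right.
    static short addReduce(short init, short[] a) {
        short acc = init;
        for (short h : a) {
            acc = Float.floatToFloat16(Float.float16ToFloat(acc) + Float.float16ToFloat(h));
        }
        return acc;
    }

    // Same shape for multiplication: each product depends on the previous one.
    static short mulReduce(short init, short[] a) {
        short acc = init;
        for (short h : a) {
            acc = Float.floatToFloat16(Float.float16ToFloat(acc) * Float.float16ToFloat(h));
        }
        return acc;
    }

    // Helper: encodes float values as FP16 bit patterns (round to nearest even).
    static short[] toFp16(float... values) {
        short[] out = new short[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = Float.floatToFloat16(values[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        short zero = Float.floatToFloat16(0.0f);
        // Near 2048 the spacing between FP16 values is 2, so 2048 + 1 rounds back to 2048.
        float inOrder = Float.float16ToFloat(addReduce(zero, toFp16(2048.0f, 1.0f, 1.0f)));
        float reordered = Float.float16ToFloat(addReduce(zero, toFp16(1.0f, 1.0f, 2048.0f)));
        System.out.println(inOrder + " vs " + reordered);
    }
}
```

The program prints `2048.0 vs 2050.0`. The same elements give different sums depending on order, which is why an unordered tree reduction would not be a legal replacement for this loop.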
>> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java`, tested on three different aarch64 machines with varying `MaxVectorSize`: >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch, which generates a sequence of loads (`ldrsh`) to load each FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the throughput with this patch divided by the throughput without it. >> A ratio > 1 indicates that performance with this patch is better than on the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >>
>> Benchmark          vectorDim  Mode   Cnt  8B    16B
>> ReductionAddFP16   256        thrpt  9    1.41  1.40
>> ReductionAddFP16   512        thrpt  9    1.41  1.41
>> ReductionAddFP16   1024       thrpt  9    1.43  1.40
>> ReductionAddFP16   2048       thrpt  9    1.43  1.40
>> ReductionMulFP16   256        thrpt  9    1.22  1.22
>> ReductionMulFP16   512        thrpt  9    1.21  1.23
>> ReductionMulFP16   1024       thrpt  9    1.21  1.22
>> ReductionMulFP16   2048       thrpt  9    1.20  1.22
>>
>> On N1, the scalarized sequence of `fadd/fmul` are gener... > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Address review comments for the JTREG test and microbenchmark > - Merge branch 'master' > - Address review comments > - Fix build failures on Mac > - Address review comments > - Merge 'master' > - 8366444: Add support for add/mul reduction operations for Float16 Common IR changes look good to me. Best Regards, Jatin ------------- Marked as reviewed by jbhateja (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27526#pullrequestreview-3624931740 From xgong at openjdk.org Mon Jan 5 01:58:14 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 5 Jan 2026 01:58:14 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v3] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 30 Dec 2025 01:26:50 GMT, Xiaohong Gong wrote: > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Remove verification for floating-point add reduction tests Thanks for all the review and comments! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3708682947 From xgong at openjdk.org Mon Jan 5 01:58:15 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 5 Jan 2026 01:58:15 GMT Subject: Integrated: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 23 Dec 2025 06:45:46 GMT, Xiaohong Gong wrote: > The test fails intermittently with the following error: > > > Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) > at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) > at 
compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) > > > The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. > > For example, given array elements: > > [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] > > > Sequential scalar addition produces: > > 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f > > > However, `reduceLanes()` might compute: > > (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL > > > The difference between the two results is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. > > Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. > > This patch fixes the issue by initializing the float array in the test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. > > Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. > > [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) This pull request has now been integrated.
Changeset: 6eaabed5 Author: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca Stats: 43 lines in 1 file changed: 1 ins; 32 del; 10 mod 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently Reviewed-by: jiefu, jbhateja, erfang, qamai ------------- PR: https://git.openjdk.org/jdk/pull/28960 From shade at openjdk.org Mon Jan 5 06:38:07 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 Jan 2026 06:38:07 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v10] In-Reply-To: References: Message-ID: > See the bug for a discussion of the issues with the current machinery. > > This PR executes the plan outlined in the bug: > 1. Common the receiver type profiling code in interpreter and C1 > 2. Rewrite receiver type profiling code to only do atomic receiver slot installations > 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed > > This PR does _not_ make the counter updates themselves atomic, as that may have much wider performance implications, including regressions. This PR should be at least performance-neutral. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - More comments - Tighten up the comments - Simplify third case: no need to loop, just restart the search - Actually have a second "fast" case: receiver is not found in the table, and the table is full - Pushing/popping for rare CAS path is counter-productive - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - Tighten up some more - ... 
and 13 more: https://git.openjdk.org/jdk/compare/6eaabed5...e4a4719f ------------- Changes: https://git.openjdk.org/jdk/pull/25305/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=09 Stats: 418 lines in 8 files changed: 202 ins; 197 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/25305.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25305/head:pull/25305 PR: https://git.openjdk.org/jdk/pull/25305 From chagedorn at openjdk.org Mon Jan 5 07:47:05 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 07:47:05 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v4] In-Reply-To: References: Message-ID: <2ct7k0J84Z7D5BbrauNhR4ATvjTNnbYe7Wbjo9xgIF8=.41d0c040-76b8-478b-817a-4efba252e67d@github.com> On Fri, 12 Dec 2025 18:56:18 GMT, Benoît Maillard wrote: >> This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing. >> >> The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for >> `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially >> introduced to get rid of unused `Load` nodes that were only kept alive by such an edge. >> >> >> >> https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259 >> >> In our case, the `Load` node happens to get folded to a constant during the initial >> `_gvn.transform` call in `GraphKit::make_load`. Because the value is converted before being >> returned, we end up with two constant nodes: one `ConL` and one `ConI`. The `ConL` has only >> one use, and this triggers the optimization during verification.
>> >> >> static int test0() { >> var c = new MyClass(); >> // the conversion ensures that the ConL node only has one use >> // in the end, which triggers the optimization >> return (int) c.l; >> } >> >> >> The optimization is not triggered earlier, when we apply `_gvn.transform` on the membar, >> because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in >> `PhaseGVN::transform`. >> >> For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created >> and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with >> `can_reshape` later. >> >> >> This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load` >> prevents it from occurring when boxing elimination is enabled. Boxing elimination is >> temporarily disabled in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)), >> which caused the issue to appear, but running with `-XX:-EliminateAutoBox` made it clear >> that the issue also exists on mainline. >> >> ### Testing >> - [x] [GitHub Actions](TODO) >> - [x] tier1-3, plus some internal testing >> >> Thank you for reviewing! > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Move to igvn directory and use test.main.class Looks good to me, too, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28448#pullrequestreview-3625777581 From chagedorn at openjdk.org Mon Jan 5 07:47:06 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 07:47:06 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v3] In-Reply-To: <8fekfXtLDXeb2ratn8pcaehKLjzRBd9wa0xkzN0KTu4=.b09b0a5a-e4e1-480d-97a7-532ae3fbb6cf@github.com> References: <4JnkUEx4O7We0dTeMUzQa3pFdRaSJoWHrXlhB3wFc0M=.07e9b81b-187d-4e72-b7e3-492c5cda0e0c@github.com> <8fekfXtLDXeb2ratn8pcaehKLjzRBd9wa0xkzN0KTu4=.b09b0a5a-e4e1-480d-97a7-532ae3fbb6cf@github.com> Message-ID: On Fri, 12 Dec 2025 18:53:17 GMT, Benoît Maillard wrote: >> test/hotspot/jtreg/compiler/c2/igvn/TestMissingOptMemBarRemovePrecedentEdge.java line 2: >> >>> (failed to retrieve contents of file, check the PR for context) >> Should the test go into an `igvn` directory? Or something else a bit more specific? > > Moved it to `compiler/c2/igvn` I guess we could clean this up at some point. We now have `igvn`, `c2/igvn`, and `c2/gvn`. And some other C2 specific tests are in folders inside `c2` while others are in the base directory. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28448#discussion_r2660559734 From chagedorn at openjdk.org Mon Jan 5 07:47:07 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 07:47:07 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v3] In-Reply-To: References: <4JnkUEx4O7We0dTeMUzQa3pFdRaSJoWHrXlhB3wFc0M=.07e9b81b-187d-4e72-b7e3-492c5cda0e0c@github.com> <8fekfXtLDXeb2ratn8pcaehKLjzRBd9wa0xkzN0KTu4=.b09b0a5a-e4e1-480d-97a7-532ae3fbb6cf@github.com> Message-ID: <7wuX5LiMHSgw9nk11rPHxoSIBJpCrkWZeQpVkRLkRJ0=.cc7fb403-6a41-4941-8502-c5a88175cfab@github.com> On Mon, 5 Jan 2026 07:41:07 GMT, Christian Hagedorn wrote: >> Moved it to `compiler/c2/igvn` > > I guess we could clean this up at some point. We now have `igvn`, `c2/igvn`, and `c2/gvn`.
And some other C2 specific tests are in folders inside `c2` while others are in the base directory. Suggestion: * Copyright (c) 2026, Oracle and/or its affiliates. All rights reserved. You are probably also the first one this year to change `node.cpp` and `graphKit.cpp`, so we need an update there as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28448#discussion_r2660564360 From hgreule at openjdk.org Mon Jan 5 07:57:35 2026 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 5 Jan 2026 07:57:35 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation [v6] In-Reply-To: References: Message-ID: > The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. > > Please let me know what you think. Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: update copyright year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27886/files - new: https://git.openjdk.org/jdk/pull/27886/files/db8fd790..86f2ead8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27886&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27886&range=04-05 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/27886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27886/head:pull/27886 PR: https://git.openjdk.org/jdk/pull/27886 From hgreule at openjdk.org Mon Jan 5 08:03:11 2026 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 5 Jan 2026 08:03:11 GMT Subject: RFR: 8373555: C2: Optimize redundant input calculations for sign comparisons [v2] In-Reply-To: References: Message-ID: > Instead of sign-comparisons with And,Or,Xor,Max,Min nodes, we can directly compare to one of the inputs of the binary nodes if the other input is irrelevant to the comparison. 
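The principle can be checked outside the compiler. The sketch below is illustrative only. The specific identities it uses (which operand becomes irrelevant once the sign of the other is known) are our own examples of the idea described above; they are not a transcription of the rules in the patch. The sketch brute-forces these identities over a set of edge cases and over random `int` pairs:

```java
import java.util.Random;

public class SignCompareIdentities {
    // Returns true if every identity whose precondition on y holds
    // also holds for the pair (x, y).
    static boolean check(int x, int y) {
        boolean ok = true;
        if (y >= 0) {
            // y contributes a 0 sign bit, so the sign of the result is the sign of x.
            ok &= ((x | y) < 0) == (x < 0);
            ok &= ((x ^ y) < 0) == (x < 0);
            ok &= (Math.min(x, y) < 0) == (x < 0);
        } else {
            // y contributes a 1 sign bit, so again only x decides the sign.
            ok &= ((x & y) < 0) == (x < 0);
            ok &= (Math.max(x, y) < 0) == (x < 0);
        }
        return ok;
    }

    // Checks all pairs of edge values plus `samples` random pairs.
    static boolean checkAll(long samples, long seed) {
        int[] edges = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : edges) {
            for (int y : edges) {
                if (!check(x, y)) {
                    return false;
                }
            }
        }
        Random r = new Random(seed);
        for (long i = 0; i < samples; i++) {
            if (!check(r.nextInt(), r.nextInt())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(checkAll(1_000_000, 42) ? "all identities hold" : "counterexample found");
    }
}
```

In C2, the precondition on `y` would presumably come from its type (for example, an integer range that excludes negative values). In this sketch, the precondition is simply tested at run time.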
> > There are potentially more operations, but the ones mentioned here are the most obvious. Max and Min could theoretically be expanded to arbitrary comparisons to constants, but I didn't want to introduce more complexity for now. > > Please let me know what you think :) Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: update copyright year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28782/files - new: https://git.openjdk.org/jdk/pull/28782/files/e007f6c9..d298bf21 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28782&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28782&range=00-01 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28782.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28782/head:pull/28782 PR: https://git.openjdk.org/jdk/pull/28782 From hgreule at openjdk.org Mon Jan 5 08:03:13 2026 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 5 Jan 2026 08:03:13 GMT Subject: RFR: 8373555: C2: Optimize redundant input calculations for sign comparisons [v2] In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 15:49:53 GMT, Galder Zamarreño wrote: >> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: >> >> update copyright year > > Neat! At a glance I don't see anything wrong. Just a small question: what testing did you carry out? @galderz thanks, I mainly tested `test/hotspot/jtreg:tier1` and the tests run in GHA. It would be great if someone else could submit more tests. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28782#issuecomment-3709326006 From bmaillard at openjdk.org Mon Jan 5 08:13:55 2026 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 5 Jan 2026 08:13:55 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v5] In-Reply-To: References: Message-ID: > This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing. > > The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for > `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially > introduced to get rid of unused `Load` nodes that were only kept alive by such an edge. > > > > https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259 > > In our case, it happens that the `Load` node gets folded to a constant during the initial > `_gvn.transform` call in `GraphKit::make_load`. Because the value is converted before being > returned, we end up with two constant nodes: one `ConL` and one `ConI`. The `ConL` only > has one usage, and this triggers the optimization during verification. > > > static int test0() { > var c = new MyClass(); > // the conversion ensures that the ConL node only has one use > // in the end, which triggers the optimization > return (int) c.l; > } > > > The optimization is not triggered earlier, when we apply `_gvn.transform` on the membar, > because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in > `PhaseGVN::transform`. > > For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created > and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with > `can_reshape` later.
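For experimenting, the `test0()` snippet can be wrapped in a standalone program. The description does not show `MyClass`, so the class below is a hypothetical stand-in. It assumes `l` is a `volatile long`, because C2 normally places a trailing `MemBarAcquire`, with a `Precedent` edge to the load, after a volatile field load. The loop only makes `test0()` hot enough to be JIT-compiled. Actually observing the missed optimization would also require a debug build with IGVN verification enabled, and the description does not give the flags for that.

```java
public class MemBarPrecedentSketch {
    // Hypothetical stand-in for the test's MyClass; the volatile modifier is an assumption.
    static class MyClass {
        volatile long l; // never written, so it always reads as 0
    }

    static int test0() {
        var c = new MyClass();
        // The narrowing conversion leaves the folded long constant (ConL)
        // with a single use, which is the situation described in the PR.
        return (int) c.l;
    }

    public static void main(String[] args) {
        long sum = 0;
        // Warm-up loop so that test0() becomes hot and gets JIT-compiled.
        for (int i = 0; i < 200_000; i++) {
            sum += test0();
        }
        System.out.println("sum = " + sum);
    }
}
```

According to the description, the problem only became visible on mainline with `-XX:-EliminateAutoBox`, so any attempt to reproduce it with this program would need to include that flag.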
> > > This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load` > prevents it from occurring when boxing elimination is enabled. Boxing elimination is > disabled temporarily in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)), > which caused the issue to appear, but by using `-XX:-EliminateAutoBox` it became clear > that the issue was also present on mainline. > > ### Testing > - [x] [GitHub Actions](TODO) > - [x] tier1-3, plus some internal testing > > Thank you for reviewing! Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/c2/igvn/TestMissingOptMemBarRemovePrecedentEdge.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28448/files - new: https://git.openjdk.org/jdk/pull/28448/files/a32ee08c..bbb7181b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28448&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28448&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28448.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28448/head:pull/28448 PR: https://git.openjdk.org/jdk/pull/28448 From erfang at openjdk.org Mon Jan 5 08:14:10 2026 From: erfang at openjdk.org (Eric Fang) Date: Mon, 5 Jan 2026 08:14:10 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v8] In-Reply-To: References: Message-ID: > `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be of integer type, so for floating-point masks, the compiler will automatically cast the mask to the corresponding integer mask type before doing the mask operation. 
This kind of cast is very common.
> > If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code; otherwise, code will be generated to extend or narrow the mask. This IR node is not free, whether or not it generates code, because it may block some optimizations. For example: > 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`: the middle `VectorMaskCast` prevents the following optimization: `(VectorStoreMask (VectorLoadMask x)) => (x)` > 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. > > In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. > > The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`s, like the existing `Node::uncast(bool)` function. The function returns the first node in the chain that is not a `VectorMaskCastNode`. > > The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode`s and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. > > Current optimizations related to `VectorMaskCastNode` include: > 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. > 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. > > This PR does the following optimizations: > 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` to `(VectorMaskCast (VectorMaskCast ... (VectorMaskCast x))) => (x)`. This is correct as long as the types of the head and tail `VectorMaskCastNode` are consistent. > 2. 
Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`.
Since the value before and after the pattern is a boolean vector, it remains unchanged as long as th... Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - Update copyright year to 2026 - Merge branch 'master' into JDK-8370863-mask-cast-opt - Convert the check condition for vector length into an assertion Also refined the tests. - Refine code comments - Merge branch 'master' into JDK-8370863-mask-cast-opt - Merge branch 'master' into JDK-8370863-mask-cast-opt - Add MaxVectorSize IR test condition for VectorStoreMaskIdentityTest.java - Refine the test code and comments - Merge branch 'master' into JDK-8370863-mask-cast-opt - Don't read and write the same memory in the JMH benchmarks - ... and 2 more: https://git.openjdk.org/jdk/compare/6eaabed5...9c38a6d9 ------------- Changes: https://git.openjdk.org/jdk/pull/28313/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=07 Stats: 643 lines in 7 files changed: 528 ins; 16 del; 99 mod Patch: https://git.openjdk.org/jdk/pull/28313.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28313/head:pull/28313 PR: https://git.openjdk.org/jdk/pull/28313 From roland at openjdk.org Mon Jan 5 08:45:27 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 5 Jan 2026 08:45:27 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph [v3] In-Reply-To: References: Message-ID: > A `CreateEx` gets sunk out of loop by > `PhaseIdealLoop::try_sink_out_of_loop()` and, as a consequence, the > following logic: > > > return (in(0)->is_CatchProj() && in(0)->in(0)->is_Catch() && > in(0)->in(0)->in(1) == in(1)) ? this : call->in(TypeFunc::Parms); > > > in `CreateExNode::Identity()` triggers which leads to the crash > because `call->in(TypeFunc::Parms)` is not even an object in this > particular case. 
> > It's actually not clear to me what that logic in > `CreateExNode::Identity()` is expected to do and I wonder if it's > still needed. > > Anyway, the fix I propose is to skip `CreateEx` in > `PhaseIdealLoop::try_sink_out_of_loop()`. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'master' into JDK-8373508 - Update test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java Co-authored-by: Christian Hagedorn - whitespaces - tests - more - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28842/files - new: https://git.openjdk.org/jdk/pull/28842/files/e4bdff59..968ebef2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28842&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28842&range=01-02 Stats: 16526 lines in 2401 files changed: 8803 ins; 2140 del; 5583 mod Patch: https://git.openjdk.org/jdk/pull/28842.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28842/head:pull/28842 PR: https://git.openjdk.org/jdk/pull/28842 From bmaillard at openjdk.org Mon Jan 5 09:02:33 2026 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 5 Jan 2026 09:02:33 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v3] In-Reply-To: <7wuX5LiMHSgw9nk11rPHxoSIBJpCrkWZeQpVkRLkRJ0=.cc7fb403-6a41-4941-8502-c5a88175cfab@github.com> References: <4JnkUEx4O7We0dTeMUzQa3pFdRaSJoWHrXlhB3wFc0M=.07e9b81b-187d-4e72-b7e3-492c5cda0e0c@github.com> <8fekfXtLDXeb2ratn8pcaehKLjzRBd9wa0xkzN0KTu4=.b09b0a5a-e4e1-480d-97a7-532ae3fbb6cf@github.com> <7wuX5LiMHSgw9nk11rPHxoSIBJpCrkWZeQpVkRLkRJ0=.cc7fb403-6a41-4941-8502-c5a88175cfab@github.com> Message-ID: On Mon, 5 Jan 2026 07:43:27 GMT, Christian Hagedorn wrote: >> I guess we could clean this up at some point. 
We now have `igvn`, `c2/igvn`, and `c2/gvn`. And some other C2 specific tests are in folders inside `c2` while others are in the base directory. > > Suggestion: > > * Copyright (c) 2026, Oracle and/or its affiliates. All rights reserved. > > > You are probably also the first one this year to change `node.cpp` and `graphKit.cpp`, so we need an update there as well. @chhagedorn I filed [JDK-8374511](https://bugs.openjdk.org/browse/JDK-8374511) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28448#discussion_r2660744502 From dfenacci at openjdk.org Mon Jan 5 09:08:02 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 5 Jan 2026 09:08:02 GMT Subject: RFR: 8373633: C2: Use interface receiver type to improve CHA decisions In-Reply-To: References: Message-ID: On Sat, 13 Dec 2025 02:01:10 GMT, Vladimir Ivanov wrote: > Strength-reducing an interface call to a virtual call for interfaces with > unique implementors can use receiver type information to narrow the context. > > C2 tracks interface types and receiver type information can be used to reveal > an interface with a unique implementor which can't be derived from the call > site itself. > > Since C2 effectively accumulates a union interface type from multiple subtype checks, iterating over individual components of a type may reveal a candidate for a strength-reduction. The only prerequisite is that a candidate has to be a subtype of the declared interface. > > Testing: hs-tier1 - hs-tier5 src/hotspot/share/opto/doCall.cpp line 340: > 338: // number of implementors for decl_interface is 0 or 1. If > 339: // it's 0 then no class implements decl_interface and there's > 340: // no point in inlining. Does the above comment still hold? Or did you remove it because it is not relevant anymore? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28811#discussion_r2640468378 From shade at openjdk.org Mon Jan 5 09:40:20 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 Jan 2026 09:40:20 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v10] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 06:38:07 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - More comments > - Tighten up the comments > - Simplify third case: no need to loop, just restart the search > - Actually have a second "fast" case: receiver is not found in the table, and the table is full > - Pushing/popping for rare CAS path is counter-productive > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Tighten up some more > - ... and 13 more: https://git.openjdk.org/jdk/compare/6eaabed5...e4a4719f Remerged from master, re-ran `tier1` and `hotspot_compiler` tests on Linux x86_64, all clean. 
There is an unrelated GHA infra failure (https://github.com/openjdk/jdk/pull/29030), which IMO does not block the integration, as at least Windows x86_64 passed in GHA, and Linux x86_64 passes locally. Here goes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3709622490 PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3709623137 From shade at openjdk.org Mon Jan 5 09:40:23 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 Jan 2026 09:40:23 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 15:23:42 GMT, Aleksey Shipilev wrote: > I'll task one of our folks to do it after NY break. That would be: https://bugs.openjdk.org/browse/JDK-8374513 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3709629142 From shade at openjdk.org Mon Jan 5 09:40:24 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 Jan 2026 09:40:24 GMT Subject: Integrated: 8357258: x86: Improve receiver type profiling reliability In-Reply-To: References: Message-ID: On Mon, 19 May 2025 14:59:36 GMT, Aleksey Shipilev wrote: > See the bug for discussion what issues current machinery has. > > This PR executes the plan outlined in the bug: > 1. Common the receiver type profiling code in interpreter and C1 > 2. Rewrite receiver type profiling code to only do atomic receiver slot installations > 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed > > This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [x] Linux x86_64 server fastdebug, `all` This pull request has now been integrated. 
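A plain-Java model can make the slot-installation scheme from the quoted plan concrete. This is a sketch under stated assumptions, not HotSpot's interpreter or C1 code. All class, field, and method names are hypothetical. `AtomicReferenceArray.compareAndSet` stands in for the atomic installation of a receiver into an empty row. The per-row counters use plain, racy increments, because the PR explicitly does not make counter updates atomic.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class ReceiverProfileSketch {
    // Receiver rows: null means "empty"; a row is only ever claimed via CAS.
    private final AtomicReferenceArray<Class<?>> receivers;
    // Per-row counters, updated with plain increments (lost updates are tolerated).
    private final long[] counts;
    // Hypothetical overflow counter for receivers that do not fit in the table.
    private long overflowCount;

    public ReceiverProfileSketch(int rows) {
        receivers = new AtomicReferenceArray<>(rows);
        counts = new long[rows];
    }

    public void record(Class<?> receiver) {
        while (true) {
            int firstEmpty = -1;
            for (int i = 0; i < counts.length; i++) {
                Class<?> r = receivers.get(i);
                if (r == receiver) {
                    counts[i]++;        // case 1: receiver already installed
                    return;
                }
                if (r == null && firstEmpty < 0) {
                    firstEmpty = i;
                }
            }
            if (firstEmpty < 0) {
                overflowCount++;        // case 2: not found, and the table is full
                return;
            }
            // Case 3: try to install the receiver into the first empty row.
            if (receivers.compareAndSet(firstEmpty, null, receiver)) {
                counts[firstEmpty]++;
                return;
            }
            // Lost the race: another thread filled that row (possibly with this
            // same receiver), so restart the whole search.
        }
    }

    public long countFor(Class<?> receiver) {
        for (int i = 0; i < counts.length; i++) {
            if (receivers.get(i) == receiver) {
                return counts[i];
            }
        }
        return 0;
    }

    public long overflowCount() {
        return overflowCount;
    }
}
```

In this sketch, the third case restarts the search after a failed CAS instead of looping on that row. This mirrors the "just restart the search" simplification listed among the PR's commits. For example, with two rows, recording `String`, `String`, `Integer`, `Long` leaves `Long` uninstalled and counts it once in the overflow counter.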
Changeset: e676c9de Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/e676c9de3da3b820081cde1b11c0df3129787130 Stats: 418 lines in 8 files changed: 202 ins; 197 del; 19 mod 8357258: x86: Improve receiver type profiling reliability Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/25305 From bmaillard at openjdk.org Mon Jan 5 10:38:39 2026 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 5 Jan 2026 10:38:39 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v4] In-Reply-To: <2ct7k0J84Z7D5BbrauNhR4ATvjTNnbYe7Wbjo9xgIF8=.41d0c040-76b8-478b-817a-4efba252e67d@github.com> References: <2ct7k0J84Z7D5BbrauNhR4ATvjTNnbYe7Wbjo9xgIF8=.41d0c040-76b8-478b-817a-4efba252e67d@github.com> Message-ID: On Mon, 5 Jan 2026 07:43:52 GMT, Christian Hagedorn wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to igvn directory and use test.main.class > > Looks good to me, too, thanks! Thank you for reviewing @chhagedorn @eme64! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28448#issuecomment-3709866622 From duke at openjdk.org Mon Jan 5 11:31:26 2026 From: duke at openjdk.org (Yi Wu) Date: Mon, 5 Jan 2026 11:31:26 GMT Subject: RFR: 8373344: Add support for min/max reduction operations for Float16 [v2] In-Reply-To: <7qXqIBuLDFEKPNje6TALxZUATujnOY5hoODC30zJNFM=.07d8d157-1ae5-4751-befd-d6291370fb9c@github.com> References: <7qXqIBuLDFEKPNje6TALxZUATujnOY5hoODC30zJNFM=.07d8d157-1ae5-4751-befd-d6291370fb9c@github.com> Message-ID: <72RiEFo5wji1vOtIRzFwO03_0OsaCe2zzHjp9YPD8-k=.6a88f4eb-3fff-4fe5-bb43-260d81b7954a@github.com> > This patch adds mid-end support for vectorized min/max reduction operations for half floats. It also includes backend AArch64 support for these operations. > Both floating point min/max reductions don't require strict order, because they are associative. 
> > It will generate NEON fminv/fmaxv reduction instructions when max vector length is 8B or 16B. On SVE supporting machines with vector lengths > 16B, it will generate the SVE fminv/fmaxv instructions. > The patch also adds support for partial min/max reductions on SVE machines using fminv/fmaxv. > > Ratio of throughput(ops/ms) > 1 indicates the performance with this patch is better than the mainline. > > Neoverse N1 (UseSVE = 0, max vector length = 16B): > > Benchmark vectorDim Mode Cnt 8B 16B > ReductionMaxFP16 256 thrpt 9 3.69 6.44 > ReductionMaxFP16 512 thrpt 9 3.71 7.62 > ReductionMaxFP16 1024 thrpt 9 4.16 8.64 > ReductionMaxFP16 2048 thrpt 9 4.44 9.12 > ReductionMinFP16 256 thrpt 9 3.69 6.43 > ReductionMinFP16 512 thrpt 9 3.70 7.62 > ReductionMinFP16 1024 thrpt 9 4.16 8.64 > ReductionMinFP16 2048 thrpt 9 4.44 9.10 > > > Neoverse V1 (UseSVE = 1, max vector length = 32B): > > Benchmark vectorDim Mode Cnt 8B 16B 32B > ReductionMaxFP16 256 thrpt 9 3.96 8.62 8.02 > ReductionMaxFP16 512 thrpt 9 3.54 9.25 11.71 > ReductionMaxFP16 1024 thrpt 9 3.77 8.71 14.07 > ReductionMaxFP16 2048 thrpt 9 3.88 8.44 14.69 > ReductionMinFP16 256 thrpt 9 3.96 8.61 8.03 > ReductionMinFP16 512 thrpt 9 3.54 9.28 11.69 > ReductionMinFP16 1024 thrpt 9 3.76 8.70 14.12 > ReductionMinFP16 2048 thrpt 9 3.87 8.45 14.70 > > > Neoverse V2 (UseSVE = 2, max vector length = 16B): > > Benchmark vectorDim Mode Cnt 8B 16B > ReductionMaxFP16 256 thrpt 9 4.78 10.00 > ReductionMaxFP16 512 thrpt 9 3.74 11.33 > ReductionMaxFP16 1024 thrpt 9 3.86 9.59 > ReductionMaxFP16 2048 thrpt 9 3.94 8.71 > ReductionMinFP16 256 thrpt 9 4.78 10.00 > ReductionMinFP16 512 thrpt 9 3.74 11.29 > ReductionMinFP16 1024 thrpt 9 3.86 9.58 > ReductionMinFP16 2048 thrpt 9 3.94 8.71 > > > Testing: > hotspot_all, jdk (tier1-3) and langtools (tier1) all pass on Neoverse N1/V1/V2. Yi Wu has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Replace assert with verify - Add IRNode constant and code refactor - Merge remote-tracking branch 'origin/master' into yiwu-8373344 - 8373344: Add support for FP16 min/max reduction operations This patch adds mid-end support for vectorized min/max reduction operations for half floats. It also includes backend AArch64 support for these operations. Both floating point min/max reductions don't require strict order, because they are associative. It will generate NEON fminv/fmaxv reduction instructions when max vector length is 8B or 16B. On SVE supporting machines with vector lengths > 16B, it will generate the SVE fminv/fmaxv instructions. The patch also adds support for partial min/max reductions on SVE machines using fminv/fmaxv. Ratio of throughput(ops/ms) > 1 indicates the performance with this patch is better than the mainline.
Neoverse N1 (UseSVE = 0, max vector length = 16B):

Benchmark         vectorDim  Mode   Cnt  8B    16B
ReductionMaxFP16  256        thrpt  9    3.69  6.44
ReductionMaxFP16  512        thrpt  9    3.71  7.62
ReductionMaxFP16  1024       thrpt  9    4.16  8.64
ReductionMaxFP16  2048       thrpt  9    4.44  9.12
ReductionMinFP16  256        thrpt  9    3.69  6.43
ReductionMinFP16  512        thrpt  9    3.70  7.62
ReductionMinFP16  1024       thrpt  9    4.16  8.64
ReductionMinFP16  2048       thrpt  9    4.44  9.10

Neoverse V1 (UseSVE = 1, max vector length = 32B):

Benchmark         vectorDim  Mode   Cnt  8B    16B   32B
ReductionMaxFP16  256        thrpt  9    3.96  8.62  8.02
ReductionMaxFP16  512        thrpt  9    3.54  9.25  11.71
ReductionMaxFP16  1024       thrpt  9    3.77  8.71  14.07
ReductionMaxFP16  2048       thrpt  9    3.88  8.44  14.69
ReductionMinFP16  256        thrpt  9    3.96  8.61  8.03
ReductionMinFP16  512        thrpt  9    3.54  9.28  11.69
ReductionMinFP16  1024       thrpt  9    3.76  8.70  14.12
ReductionMinFP16  2048       thrpt  9    3.87  8.45  14.70

Neoverse V2 (UseSVE = 2, max vector length = 16B):

Benchmark         vectorDim  Mode   Cnt  8B    16B
ReductionMaxFP16  256        thrpt  9    4.78  10.00
ReductionMaxFP16  512        thrpt  9    3.74  11.33
ReductionMaxFP16  1024       thrpt  9    3.86  9.59
ReductionMaxFP16  2048       thrpt  9    3.94  8.71
ReductionMinFP16  256        thrpt  9    4.78  10.00
ReductionMinFP16  512        thrpt  9    3.74  11.29
ReductionMinFP16  1024       thrpt  9    3.86  9.58
ReductionMinFP16  2048       thrpt  9    3.94  8.71

Testing: hotspot_all, jdk (tier1-3) and langtools (tier1) all pass on Neoverse N1/V1/V2.
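A plain-Java sketch can illustrate the claim that min/max reductions do not need a strict order, while floating-point addition does. It uses `float` as a stand-in for `Float16`; this sketch does not use the `Float16` type. It compares two evaluation orders. The first is a left-to-right scalar loop. The second is a pairwise tree, which is one order a vectorized reduction might use. The array of extreme values was chosen for illustration.

```java
public class ReductionOrderSketch {
    public interface FloatOp {
        float apply(float x, float y);
    }

    // Left-to-right reduction, as a scalar loop would compute it.
    public static float sequential(float[] a, FloatOp op) {
        float acc = a[0];
        for (int i = 1; i < a.length; i++) {
            acc = op.apply(acc, a[i]);
        }
        return acc;
    }

    // Pairwise (tree-shaped) reduction over a[lo, hi); requires hi > lo.
    public static float pairwise(float[] a, int lo, int hi, FloatOp op) {
        if (hi - lo == 1) {
            return a[lo];
        }
        int mid = (lo + hi) >>> 1;
        return op.apply(pairwise(a, lo, mid, op), pairwise(a, mid, hi, op));
    }

    public static void main(String[] args) {
        float[] a = {0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE};
        FloatOp add = (x, y) -> x + y;
        FloatOp min = Math::min;
        FloatOp max = Math::max;
        System.out.println("add: " + sequential(a, add) + " vs " + pairwise(a, 0, a.length, add));
        System.out.println("min: " + sequential(a, min) + " vs " + pairwise(a, 0, a.length, min));
        System.out.println("max: " + sequential(a, max) + " vs " + pairwise(a, 0, a.length, max));
    }
}
```

The sum depends on the order. The tiny `Float.MIN_NORMAL` term is absorbed when it is added to `Float.MAX_VALUE` first, but it survives when the two huge values cancel first. By contrast, `Math.min` and `Math.max` select the same element in either order. This sketch does not address whether Java's NaN and signed-zero rules for `Math.min`/`Math.max` match the semantics of the AArch64 `fminv`/`fmaxv` instructions.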
------------- Changes: - all: https://git.openjdk.org/jdk/pull/28828/files - new: https://git.openjdk.org/jdk/pull/28828/files/2f80bc4f..9971752e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28828&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28828&range=00-01 Stats: 17385 lines in 2438 files changed: 9261 ins; 2408 del; 5716 mod Patch: https://git.openjdk.org/jdk/pull/28828.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28828/head:pull/28828 PR: https://git.openjdk.org/jdk/pull/28828 From duke at openjdk.org Mon Jan 5 11:33:33 2026 From: duke at openjdk.org (Yi Wu) Date: Mon, 5 Jan 2026 11:33:33 GMT Subject: RFR: 8373344: Add support for min/max reduction operations for Float16 [v2] In-Reply-To: References: <7qXqIBuLDFEKPNje6TALxZUATujnOY5hoODC30zJNFM=.07d8d157-1ae5-4751-befd-d6291370fb9c@github.com> Message-ID: On Mon, 22 Dec 2025 09:40:42 GMT, Galder Zamarreño wrote: >> Yi Wu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Replace assert with verify >> - Add IRNode constant and code refactor >> - Merge remote-tracking branch 'origin/master' into yiwu-8373344 >> - 8373344: Add support for FP16 min/max reduction operations >> >> This patch adds mid-end support for vectorized min/max reduction >> operations for half floats. It also includes backend AArch64 support >> for these operations. >> Both floating point min/max reductions don't require strict order, >> because they are associative. >> >> It will generate NEON fminv/fmaxv reduction instructions when >> max vector length is 8B or 16B. On SVE supporting machines >> with vector lengths > 16B, it will generate the SVE fminv/fmaxv >> instructions. >> The patch also adds support for partial min/max reductions on >> SVE machines using fminv/fmaxv. 
>> >> Ratio of throughput(ops/ms) > 1 indicates the performance with >> this patch is better than the mainline. >> >> Neoverse N1 (UseSVE = 0, max vector length = 16B): >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionMaxFP16 256 thrpt 9 3.69 6.44 >> ReductionMaxFP16 512 thrpt 9 3.71 7.62 >> ReductionMaxFP16 1024 thrpt 9 4.16 8.64 >> ReductionMaxFP16 2048 thrpt 9 4.44 9.12 >> ReductionMinFP16 256 thrpt 9 3.69 6.43 >> ReductionMinFP16 512 thrpt 9 3.70 7.62 >> ReductionMinFP16 1024 thrpt 9 4.16 8.64 >> ReductionMinFP16 2048 thrpt 9 4.44 9.10 >> >> Neoverse V1 (UseSVE = 1, max vector length = 32B): >> Benchmark vectorDim Mode Cnt 8B 16B 32B >> ReductionMaxFP16 256 thrpt 9 3.96 8.62 8.02 >> ReductionMaxFP16 512 thrpt 9 3.54 9.25 11.71 >> ReductionMaxFP16 1024 thrpt 9 3.77 8.71 14.07 >> ReductionMaxFP16 2048 thrpt 9 3.88 8.44 14.69 >> ReductionMinFP16 256 thrpt 9 3.96 8.61 8.03 >> ReductionMinFP16 512 thrpt 9 3.54 9.28 11.69 >> ReductionMinFP16 1024 thrpt 9 3.76 8.70 14.12 >> ReductionMinFP16 2048 thrpt 9 3.87 8.45 14.70 >> >> Neoverse V2 (UseSVE = 2, max vector length = 16B)... 
> > Thanks @yiwu0b11, some superficial comments Thanks @galderz for the code review, I've updated the code and also replaced assert with [verify](https://github.com/openjdk/jdk/pull/28095) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28828#issuecomment-3710056269 From chagedorn at openjdk.org Mon Jan 5 11:34:10 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 11:34:10 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v3] In-Reply-To: References: <4JnkUEx4O7We0dTeMUzQa3pFdRaSJoWHrXlhB3wFc0M=.07e9b81b-187d-4e72-b7e3-492c5cda0e0c@github.com> <8fekfXtLDXeb2ratn8pcaehKLjzRBd9wa0xkzN0KTu4=.b09b0a5a-e4e1-480d-97a7-532ae3fbb6cf@github.com> <7wuX5LiMHSgw9nk11rPHxoSIBJpCrkWZeQpVkRLkRJ0=.cc7fb403-6a41-4941-8502-c5a88175cfab@github.com> Message-ID: On Mon, 5 Jan 2026 08:59:24 GMT, Beno?t Maillard wrote: >> Suggestion: >> >> * Copyright (c) 2026, Oracle and/or its affiliates. All rights reserved. >> >> >> You are probably also the first one this year to change `node.cpp` and `graphKit.cpp`, so we need an update there as well. > > @chhagedorn I filed [JDK-8374511](https://bugs.openjdk.org/browse/JDK-8374511) > You are probably also the first one this year to change node.cpp and graphKit.cpp, so we need an update there as well. Can you double-check if you also need to update those with latest master? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28448#discussion_r2661187908 From epeter at openjdk.org Mon Jan 5 11:38:55 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 Jan 2026 11:38:55 GMT Subject: RFR: 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs Message-ID: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> In `VLoopMemorySlices::find_memory_slices`, we analyze the memory slices. In some cases, we only find loads in the slice, and no phi. 
So the memory input of the loads comes from before the loop. When I refactored the code, I made the assumption that all loads should have the same memory input. After all: any store before the loop would have to have happened before we enter the loop, and execute any loads from the loop. The assumption held for a long time, but now we have a counterexample. Summary: one load has its memory input optimized, the other is not put on the IGVN worklist again, and keeps the old memory input (even though in this case we could have optimized it just the same). Thus, both choices of memory input are correct, and the assumption of the assert does not hold. Solution: Just bail out of auto vectorization if this assumption is violated. This is an edge case, and the assert had not been hit until the fuzzer found this example. Alternatives: we could track the multiple memory inputs, but that would be more effort to implement, and hard to test because it is difficult to create examples. --------------------------- **Details** Look at `1145 LoadB` and `1131 LoadB`. One has memory input `Param 7` (initial program state), the other `711 Phi` (outer loop). Both loads are inside the `1147 CountedLoop`. But their states come from outside, both originally from `711 Phi`. But then `1145 LoadB` is optimized with `LoadBNode::Ideal` -> `LoadNode::Ideal` -> `LoadNode::split_through_phi`: it realizes that the backedge of the `711 Phi` only passes through the `593 CallStaticJava`, which cannot modify the `Byte::value` field of the `LoadB` (unless it was to use reflection, but that unlocks undefined behavior anyway, so it can be ignored). So it is ok to split through the phi, as the `Byte::value` cannot be modified during the outer loop. `1131 LoadB` could also do the same optimization, but it just does not end up on the IGVN worklist. The issue is that we don't have any adequate notification that goes down through the `MergeMem - Call - Proj` structure.
We did not want to have that until now, because in theory we could have a long series of calls, and the traversals could become too expensive. ------------- Commit messages: - fix - Merge branch 'master' into JDK-8373453-SW-same-input-v2 - JDK-8373453 Changes: https://git.openjdk.org/jdk/pull/29028/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29028&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373453 Stats: 116 lines in 3 files changed: 109 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/29028.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29028/head:pull/29028 PR: https://git.openjdk.org/jdk/pull/29028 From thartmann at openjdk.org Mon Jan 5 12:00:15 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 5 Jan 2026 12:00:15 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v11] In-Reply-To: References: Message-ID: <5swbCzLMjJxib2WAy0PoLxwSbnID63a-1mygNQSTol8=.f78e3810-1861-4a36-8999-e14a0b2d7353@github.com> On Wed, 17 Dec 2025 17:43:34 GMT, Tobias Hotz wrote: >> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplify test, add temporary @IR rule for testLongRange and improve comments > > Thanks everybody! @ichttt There is a bug in the test, could you please have a look at [JDK-8374436](https://bugs.openjdk.org/browse/JDK-8374436)? Thanks. 
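The bailout chosen in the 8373453 RFR above can be modeled outside of C2 in a few lines of Java. This is only a hypothetical sketch: the `Load` record, the `LoadSliceCheck` class and the `commonMemoryInput` method are invented for illustration and do not correspond to HotSpot's C++ `VLoopMemorySlices` code. The sketch checks whether all loads in a load-only slice share one memory input, and it reports "bail out" instead of asserting when they do not.

```java
import java.util.List;
import java.util.Optional;

final class LoadSliceCheck {
    // Hypothetical stand-in for a load node: its node index and the index of its memory input.
    record Load(int id, int memoryInput) {}

    // Returns the memory input shared by every load in a load-only slice.
    // An empty result means "bail out of auto-vectorization" rather than asserting.
    // An empty slice also yields an empty result, since there is nothing to share.
    static Optional<Integer> commonMemoryInput(List<Load> loads) {
        if (loads.isEmpty()) {
            return Optional.empty();
        }
        int expected = loads.get(0).memoryInput();
        for (Load load : loads) {
            if (load.memoryInput() != expected) {
                // Inputs diverged, e.g. IGVN optimized one load but never revisited the other.
                return Optional.empty();
            }
        }
        return Optional.of(expected);
    }

    public static void main(String[] args) {
        // Node numbers from the description: 1145 LoadB uses Param 7, 1131 LoadB uses 711 Phi.
        List<Load> diverged = List.of(new Load(1145, 7), new Load(1131, 711));
        List<Load> agreeing = List.of(new Load(1145, 711), new Load(1131, 711));
        System.out.println(commonMemoryInput(diverged)); // Optional.empty
        System.out.println(commonMemoryInput(agreeing)); // Optional[711]
    }
}
```

Returning an empty `Optional` keeps the "give up" decision with the caller, which mirrors the PR's choice of bailing out rather than tracking several memory inputs.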
------------- PR Comment: https://git.openjdk.org/jdk/pull/26143#issuecomment-3710134476 From bmaillard at openjdk.org Mon Jan 5 12:14:17 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 5 Jan 2026 12:14:17 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v3] In-Reply-To: References: <4JnkUEx4O7We0dTeMUzQa3pFdRaSJoWHrXlhB3wFc0M=.07e9b81b-187d-4e72-b7e3-492c5cda0e0c@github.com> <8fekfXtLDXeb2ratn8pcaehKLjzRBd9wa0xkzN0KTu4=.b09b0a5a-e4e1-480d-97a7-532ae3fbb6cf@github.com> <7wuX5LiMHSgw9nk11rPHxoSIBJpCrkWZeQpVkRLkRJ0=.cc7fb403-6a41-4941-8502-c5a88175cfab@github.com> Message-ID: On Mon, 5 Jan 2026 11:30:51 GMT, Christian Hagedorn wrote: > Can you double-check if you also need to update those with latest master? Sorry @chhagedorn, I fell into the trap of only reading the suggested change. I have checked, and we need to change those indeed. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28448#discussion_r2661288840 From bmaillard at openjdk.org Mon Jan 5 12:14:15 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 5 Jan 2026 12:14:15 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v6] In-Reply-To: References: Message-ID: > This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing. > > The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for > `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially > introduced to get rid of unused `Load` nodes that were only kept alive by such an edge. > > > > https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259 > > In our case, it happens that the `Load` node gets folded to a constant during the initial > `_gvn.transform` call in `GraphKit::make_load`.
Because the value is converted before being > returned, we end up with two constant nodes: one `ConL` and one `ConI`. The `ConL` only > has one usage, and this triggers the optimization during verification. > > > static int test0() { > var c = new MyClass(); > // the conversion ensures that the ConL node only has one use > // in the end, which triggers the optimization > return (int) c.l; > } > > > The optimization is not triggered earlier, when we apply `_gvn.transform` on the membar, > because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in > `PhaseGVN::transform`. > > For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created > and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with > `can_reshape` later. > > > This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load` > prevents it from occurring when boxing elimination is enabled. Boxing elimination is > disabled temporarily in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)), > which caused the issue to appear, but by using `-XX:-EliminateAutoBox` it became clear > that the issue was on mainline. > > ### Testing > - [x] [GitHub Actions](TODO) > - [x] tier1-3, plus some internal testing > > Thank you for reviewing!
Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision: - Update copyright year in graphKit.cpp - Update copyright year in node.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28448/files - new: https://git.openjdk.org/jdk/pull/28448/files/bbb7181b..00b169b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28448&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28448&range=04-05 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28448.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28448/head:pull/28448 PR: https://git.openjdk.org/jdk/pull/28448 From epeter at openjdk.org Mon Jan 5 12:38:27 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 Jan 2026 12:38:27 GMT Subject: RFR: 8374489: Template Library: need to tag Float16.float16ToRawShortBits as having non-deterministic result because of multiple NaN bit patterns Message-ID: In [JDK-8370922](https://bugs.openjdk.org/browse/JDK-8370922), I added Float16 support to the Template Library. I missed to tag Float16.float16ToRawShortBits as having non-deterministic result. We need to do that, just like for float and double equivalents: Arithmetic operations (e.g. add, mul, fma, ...) are allowed to pick either input if it gets two NaNs of different bit patterns. That way, those operations can generate different NaN bit patterns depending on if we use interpreter/C1/C2. If we now convert to raw bits, we get different bits, and would wrongly conclude that we get a wrong result. Since this is a test-bug, I have no regression test. But I verified it manually, that with the same seed (for the ExpressionFuzzer) that fails before this change, we now succeed after the change.
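The distinction behind 8374489 shows up with the `float` counterparts of the Float16 conversion methods. The sketch below uses `float` so that it stays independent of Float16; the class name and the two NaN bit patterns are arbitrary choices for illustration. `Float.floatToIntBits` collapses every NaN to one canonical encoding, while `Float.floatToRawIntBits` returns the payload unchanged. A fuzzer that compares raw bits across interpreter, C1 and C2 can therefore report a mismatch even though every result is a NaN. This assumes a platform such as x86-64 or AArch64 where `Float.intBitsToFloat` preserves a quiet NaN's payload; the `Float` Javadoc warns that this is not guaranteed everywhere.

```java
final class NaNRawBits {
    // Two quiet NaN encodings chosen for illustration; they differ only in payload bit 0.
    static final float NAN_A = Float.intBitsToFloat(0x7fc00000);
    static final float NAN_B = Float.intBitsToFloat(0x7fc00001);

    // floatToIntBits maps every NaN to the canonical 0x7fc00000, so the two results agree.
    static boolean canonicalBitsEqual() {
        return Float.floatToIntBits(NAN_A) == Float.floatToIntBits(NAN_B);
    }

    // floatToRawIntBits keeps the payload, so the two results can differ.
    static boolean rawBitsEqual() {
        return Float.floatToRawIntBits(NAN_A) == Float.floatToRawIntBits(NAN_B);
    }

    public static void main(String[] args) {
        System.out.println("both NaN:             " + (Float.isNaN(NAN_A) && Float.isNaN(NAN_B)));
        System.out.println("canonical bits equal: " + canonicalBitsEqual());
        System.out.println("raw bits equal:       " + rawBitsEqual());
    }
}
```

Comparing through the canonicalizing conversion, or tagging the raw conversion as non-deterministic as this change does, avoids a false positive from payload differences.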
------------- Commit messages: - JDK-8374489 Changes: https://git.openjdk.org/jdk/pull/29033/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29033&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374489 Stats: 5 lines in 1 file changed: 2 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29033.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29033/head:pull/29033 PR: https://git.openjdk.org/jdk/pull/29033 From dzhang at openjdk.org Mon Jan 5 12:40:35 2026 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 5 Jan 2026 12:40:35 GMT Subject: RFR: 8374525: RISC-V: Several masked float16 vector operations are not supported Message-ID: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> Hi, Can you help to review this patch? Thanks! Currently, the masked versions of the following 8 Float16 operations are not supported. But we return true in `Matcher::match_rule_supported_vector_masked` for these operations on RISC-V platforms with Zvfh. We need to explicitly disable them on this CPU platform to make it clear. Op_AddVHF: Op_SubVHF: Op_MulVHF: Op_DivVHF: Op_MaxVHF: Op_MinVHF: Op_SqrtVHF: Op_FmaVHF: When the support for Float16 vector classes is added in VectorAPI and the masked Float16 IR can be generated, these masked operations will be enabled and relevant backend support added. 
------------- Commit messages: - 8374525: RISC-V: Several masked float16 vector operations are not supported Changes: https://git.openjdk.org/jdk/pull/29035/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29035&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374525 Stats: 19 lines in 1 file changed: 17 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29035/head:pull/29035 PR: https://git.openjdk.org/jdk/pull/29035 From bkilambi at openjdk.org Mon Jan 5 12:43:36 2026 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 5 Jan 2026 12:43:36 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: <_QEYCQm138PWv2vGjMFvEJ6kfMjGEn_vsuEZ_EPaRxQ=.b42967e5-cc22-4c98-a454-6698ce0a70cf@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <_QEYCQm138PWv2vGjMFvEJ6kfMjGEn_vsuEZ_EPaRxQ=.b42967e5-cc22-4c98-a454-6698ce0a70cf@github.com> Message-ID: On Thu, 11 Dec 2025 12:06:49 GMT, Marc Chevalier wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. 
>> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. >> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):**
>>
>>     Benchmark         vectorDim  Mode   Cnt  8B    16B
>>     ReductionAddFP16  256        thrpt  9    1.41  1.40
>>     ReductionAddFP16  512        thrpt  9    1.41  1.41
>>     ReductionAddFP16  1024       thrpt  9    1.43  1.40
>>     ReductionAddFP16  2048       thrpt  9    1.43  1.40
>>     ReductionMulFP16  256        thrpt  9    1.22  1.22
>>     ReductionMulFP16  512        thrpt  9    1.21  1.23
>>     ReductionMulFP16  1024       thrpt  9    1.21  1.22
>>     ReductionMulFP16  2048       thrpt  9    1.20  1.22
>>
>> On N1, the scalarized sequence of `fadd/fmul` are gener... > > As for the IR verification failure, I've looked a bit and couldn't find such an issue already. Since it reproduces on master, I suggest you file a ticket, indeed. Thanks! Hi @marc-chevalier @eme64 Would you please be able to run some testing internally before I integrate this patch? Thanks!
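Floating-point addition is not associative. Auto-vectorized reductions must therefore preserve the scalar loop's left-to-right order, and `fadda` or a scalarized `fadd`/`fmul` sequence is needed for that. The following minimal Java sketch shows this with `float` rather than Float16 for simplicity. The class and method names are hypothetical. It compares a strictly ordered sum with one possible reassociated order that adds adjacent pairs first, using 0.0f, Float.MIN_NORMAL, Float.MAX_VALUE and -Float.MAX_VALUE as inputs.

```java
final class ReductionOrder {
    // Strictly ordered reduction, matching the scalar loop: (((0 + a0) + a1) + a2) + ...
    static float orderedSum(float[] a) {
        float sum = 0.0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    // One possible reassociated order: sum adjacent pairs first, then accumulate the pair sums.
    // Assumes an even-length array to keep the sketch short.
    static float pairwiseSum(float[] a) {
        float sum = 0.0f;
        for (int i = 0; i + 1 < a.length; i += 2) {
            sum += a[i] + a[i + 1];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] a = {0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE};
        // Ordered: Float.MIN_NORMAL is absorbed when added to Float.MAX_VALUE, so the result is 0.0.
        System.out.println(orderedSum(a)); // 0.0
        // Pairwise: MAX_VALUE - MAX_VALUE cancels exactly first, so Float.MIN_NORMAL survives.
        System.out.println(pairwiseSum(a) == Float.MIN_NORMAL); // true
    }
}
```

The same effect explains why `reduceLanes()`, which is allowed to reassociate, can differ from a sequential scalar sum, as in the 8373722 test failure earlier in this digest.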
------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3710264065 From fjiang at openjdk.org Mon Jan 5 12:46:10 2026 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 5 Jan 2026 12:46:10 GMT Subject: RFR: 8374525: RISC-V: Several masked float16 vector operations are not supported In-Reply-To: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> References: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> Message-ID: <_PQTBDZTtAOzgIiMGM5AXmZVc8XaAoX7RZFyy7susrE=.730ca502-0931-4318-bf23-5bd4880547da@github.com> On Mon, 5 Jan 2026 12:32:13 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > Currently, the masked versions of the following 8 Float16 operations are not supported. > But we return true in `Matcher::match_rule_supported_vector_masked` for these operations > on RISC-V platforms with Zvfh. We need to explicitly disable them on this CPU platform > to make it clear. > > Op_AddVHF: > Op_SubVHF: > Op_MulVHF: > Op_DivVHF: > Op_MaxVHF: > Op_MinVHF: > Op_SqrtVHF: > Op_FmaVHF: > > When the support for Float16 vector classes is added in VectorAPI and the masked > Float16 IR can be generated, these masked operations will be enabled and relevant > backend support added. Nice catch! ------------- Marked as reviewed by fjiang (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/29035#pullrequestreview-3626694940 From fyang at openjdk.org Mon Jan 5 12:52:08 2026 From: fyang at openjdk.org (Fei Yang) Date: Mon, 5 Jan 2026 12:52:08 GMT Subject: RFR: 8374525: RISC-V: Several masked float16 vector operations are not supported In-Reply-To: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> References: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> Message-ID: <85y6cMcHNDFQwLJwbZw3IO-ffovhQkj0kGCbgSUX1i8=.ea2c8254-d413-4cd4-8712-0aa5113d7c21@github.com> On Mon, 5 Jan 2026 12:32:13 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > Currently, the masked versions of the following 8 Float16 operations are not supported. > But we return true in `Matcher::match_rule_supported_vector_masked` for these operations > on RISC-V platforms with Zvfh. We need to explicitly disable them on this CPU platform > to make it clear. > > Op_AddVHF: > Op_SubVHF: > Op_MulVHF: > Op_DivVHF: > Op_MaxVHF: > Op_MinVHF: > Op_SqrtVHF: > Op_FmaVHF: > > When the support for Float16 vector classes is added in VectorAPI and the masked > Float16 IR can be generated, these masked operations will be enabled and relevant > backend support added. Looks reasonable. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29035#pullrequestreview-3626711028 From roland at openjdk.org Mon Jan 5 13:35:15 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 5 Jan 2026 13:35:15 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph [v2] In-Reply-To: <72DTyUKDKlyEKHeDIYF5GUV6u__CGKT5LDAKglL0s6M=.4eaf8397-02d7-48b0-9261-276ffeb236ed@github.com> References: <72DTyUKDKlyEKHeDIYF5GUV6u__CGKT5LDAKglL0s6M=.4eaf8397-02d7-48b0-9261-276ffeb236ed@github.com> Message-ID: On Sat, 20 Dec 2025 01:39:52 GMT, Dean Long wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java >> >> Co-authored-by: Christian Hagedorn > > I'll be running Oracle testing before approving. @dean-long @chhagedorn I merged with latest. Can one of you approve that PR again? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28842#issuecomment-3710441551 From chagedorn at openjdk.org Mon Jan 5 13:38:17 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 13:38:17 GMT Subject: RFR: 8374518: C1: Remove dead LinearScanStatistic::Counter::counter_fpu_stack Message-ID: The usages of `LinearScanStatistic::Counter::counter_fpu_stack` were removed with [JDK-8351156](https://bugs.openjdk.org/browse/JDK-8351156) when the FPU stack support was removed. But the definition in the `Counter` enum was missed to clean up. When now printing the counters with `-XX:+CountLinearScan -XX:+CITime`, we reach a `ShouldNotReach` in `LinearScanStatistic::counter_name()` for `Counter::counter_fpu_stack` because there is no longer an entry there - it's dead. I added a simple hello world sanity test that runs with `-XX:+CountLinearScan -XX:+CITime`.
Thanks, Christian ------------- Commit messages: - 8374518: C1: Remove dead LinearScanStatistic::Counter::counter_fpu_stack Changes: https://git.openjdk.org/jdk/pull/29037/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29037&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374518 Stats: 39 lines in 2 files changed: 37 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29037.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29037/head:pull/29037 PR: https://git.openjdk.org/jdk/pull/29037 From thartmann at openjdk.org Mon Jan 5 13:46:09 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 5 Jan 2026 13:46:09 GMT Subject: RFR: 8374518: C1: Remove dead LinearScanStatistic::Counter::counter_fpu_stack In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 13:28:41 GMT, Christian Hagedorn wrote: > The usages of `LinearScanStatistic::Counter::counter_fpu_stack` were removed with [JDK-8351156](https://bugs.openjdk.org/browse/JDK-8351156) when the FPU stack support was removed. But the definition in the `Counter` enum was missed to clean up. When now printing the counters with `-XX:+CountLinearScan -XX:+CITime`, we reach a `ShouldNotReach` in `LinearScanStatistic::counter_name()` for `Counter::counter_fpu_stack` because there is no longer an entry there - it's dead. > > I added a simple hello world sanity test that runs with `-XX:+CountLinearScan -XX:+CITime`. > > Thanks, > Christian Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29037#pullrequestreview-3626893282 From mdoerr at openjdk.org Mon Jan 5 13:46:10 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 5 Jan 2026 13:46:10 GMT Subject: RFR: 8374518: C1: Remove dead LinearScanStatistic::Counter::counter_fpu_stack In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 13:28:41 GMT, Christian Hagedorn wrote: > The usages of `LinearScanStatistic::Counter::counter_fpu_stack` were removed with [JDK-8351156](https://bugs.openjdk.org/browse/JDK-8351156) when the FPU stack support was removed. But the definition in the `Counter` enum was missed to clean up. When now printing the counters with `-XX:+CountLinearScan -XX:+CITime`, we reach a `ShouldNotReach` in `LinearScanStatistic::counter_name()` for `Counter::counter_fpu_stack` because there is no longer an entry there - it's dead. > > I added a simple hello world sanity test that runs with `-XX:+CountLinearScan -XX:+CITime`. > > Thanks, > Christian LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29037#pullrequestreview-3626894269 From chagedorn at openjdk.org Mon Jan 5 13:50:41 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 13:50:41 GMT Subject: RFR: 8374518: C1: Remove dead LinearScanStatistic::Counter::counter_fpu_stack In-Reply-To: References: Message-ID: <-VWBBSYz-jpRw7qkrR89ff5XdFIq3t74swGb6TXPMrY=.d11a109f-f5a6-419c-81de-174cddfa7f41@github.com> On Mon, 5 Jan 2026 13:42:30 GMT, Tobias Hartmann wrote: >> The usages of `LinearScanStatistic::Counter::counter_fpu_stack` were removed with [JDK-8351156](https://bugs.openjdk.org/browse/JDK-8351156) when the FPU stack support was removed. But the definition in the `Counter` enum was missed to clean up. 
When now printing the counters with `-XX:+CountLinearScan -XX:+CITime`, we reach a `ShouldNotReach` in `LinearScanStatistic::counter_name()` for `Counter::counter_fpu_stack` because there is no longer an entry there - it's dead. >> >> I added a simple hello world sanity test that runs with `-XX:+CountLinearScan -XX:+CITime`. >> >> Thanks, >> Christian > > Looks good and trivial. Thanks for your reviews @TobiHartmann and @TheRealMDoerr! I will wait until some sanity testing passed before integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29037#issuecomment-3710491827 From chagedorn at openjdk.org Mon Jan 5 13:51:45 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 13:51:45 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph [v3] In-Reply-To: References: Message-ID: <-WXHvbhVQ6IBgkIcfqljfvJLmirqvCJYto3q3FGW87c=.1ec527ec-fc14-4a81-81ef-44c613325d76@github.com> On Mon, 5 Jan 2026 08:45:27 GMT, Roland Westrelin wrote: >> A `CreateEx` gets sunk out of loop by >> `PhaseIdealLoop::try_sink_out_of_loop()` and, as a consequence, the >> following logic: >> >> >> return (in(0)->is_CatchProj() && in(0)->in(0)->is_Catch() && >> in(0)->in(0)->in(1) == in(1)) ? this : call->in(TypeFunc::Parms); >> >> >> in `CreateExNode::Identity()` triggers which leads to the crash >> because `call->in(TypeFunc::Parms)` is not even an object in this >> particular case. >> >> It's actually not clear to me what that logic in >> `CreateExNode::Identity()` is expected to do and I wonder if it's >> still needed. >> >> Anyway, the fix I propose is to skip `CreateEx` in >> `PhaseIdealLoop::try_sink_out_of_loop()`. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into JDK-8373508 > - Update test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java > > Co-authored-by: Christian Hagedorn > - whitespaces > - tests > - more > - fix Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28842#pullrequestreview-3626912910 From chagedorn at openjdk.org Mon Jan 5 13:54:19 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 13:54:19 GMT Subject: RFR: 8374489: Template Library: need to tag Float16.float16ToRawShortBits as having non-deterministic result because of multiple NaN bit patterns In-Reply-To: References: Message-ID: <-slQ5lBZXQXTQvWnTZQJGLJer-qfKmygd1eah6aNdeA=.a0ef8145-9167-4eeb-b8b2-991581eecfc1@github.com> On Mon, 5 Jan 2026 11:55:01 GMT, Emanuel Peter wrote: > In [JDK-8370922](https://bugs.openjdk.org/browse/JDK-8370922), I added Float16 support to the Template Library. I missed to tag Float16.float16ToRawShortBits as having non-deterministic result. > > We need to do that, just like for float and double equivalents: > Arithmetic operations (e.g. add, mul, fma, ...) are allowed to pick either input if it gets two NaNs of different bit patterns. > That way, those operations can generate different NaN bit patterns depending on if we use interpreter/C1/C2. > If we now convert to raw bits, we get different bits, and would wrongly conclude that we get a wrong result. > > Since this is a test-bug, I have no regression test. But I verified it manually, that with the same seed (for the ExpressionFuzzer) that fails before this change, we now succeed after the change. Looks good and trivial! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29033#pullrequestreview-3626921086 From chagedorn at openjdk.org Mon Jan 5 13:56:23 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 13:56:23 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v6] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 12:14:15 GMT, Benoît Maillard wrote: >> This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing. >> >> The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for >> `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially >> introduced to get rid of unused `Load` nodes that were only kept alive by such an edge. >> >> >> >> https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259 >> >> In our case, it happens that the `Load` node gets folded to a constant during the initial >> `_gvn.transform` call in `GraphKit::make_load`. Because the value is converted before being >> returned, we end up with two constant nodes: one `ConL` and one `ConI`. The `ConL` only >> has one usage, and this triggers the optimization during verification. >> >> >> static int test0() { >> var c = new MyClass(); >> // the conversion ensures that the ConL node only has one use >> // in the end, which triggers the optimization >> return (int) c.l; >> } >> >> >> The optimization is not triggered earlier, when we apply `_gvn.transform` on the membar, >> because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in >> `PhaseGVN::transform`. >> >> For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created >> and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with >> `can_reshape` later.
>> >> >> This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load` >> prevents it from occurring when boxing elimination is enabled. Boxing elimination is >> disabled temporarily in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)), >> which caused the issue to appear, but by using `-XX:-EliminateAutoBox` it became clear >> that the issue was on mainline. >> >> ### Testing >> - [x] [GitHub Actions](TODO) >> - [x] tier1-3, plus some internal testing >> >> Thank you for reviewing! > > Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision: > > - Update copyright year in graphKit.cpp > - Update copyright year in node.cpp Looks good, thanks for updating the copyright years! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28448#pullrequestreview-3626922994 From roland at openjdk.org Mon Jan 5 14:04:09 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 5 Jan 2026 14:04:09 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph [v2] In-Reply-To: <72DTyUKDKlyEKHeDIYF5GUV6u__CGKT5LDAKglL0s6M=.4eaf8397-02d7-48b0-9261-276ffeb236ed@github.com> References: <72DTyUKDKlyEKHeDIYF5GUV6u__CGKT5LDAKglL0s6M=.4eaf8397-02d7-48b0-9261-276ffeb236ed@github.com> Message-ID: <0x_C5-vZljPjiExoPBsPcdxT77UVCh6objTPPr1VD1o=.92722c22-a04d-4d89-ab43-b25748a22e5a@github.com> On Sat, 20 Dec 2025 01:39:52 GMT, Dean Long wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java >> >> Co-authored-by: Christian Hagedorn > > I'll be running Oracle testing before approving.
@dean-long @chhagedorn thanks for the reviews and testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/28842#issuecomment-3710540340 From roland at openjdk.org Mon Jan 5 14:06:29 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 5 Jan 2026 14:06:29 GMT Subject: Integrated: 8373508: C2: sinking CreateEx out of loop breaks the graph In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 11:04:52 GMT, Roland Westrelin wrote: > A `CreateEx` gets sunk out of loop by > `PhaseIdealLoop::try_sink_out_of_loop()` and, as a consequence, the > following logic: > > > return (in(0)->is_CatchProj() && in(0)->in(0)->is_Catch() && > in(0)->in(0)->in(1) == in(1)) ? this : call->in(TypeFunc::Parms); > > > in `CreateExNode::Identity()` triggers which leads to the crash > because `call->in(TypeFunc::Parms)` is not even an object in this > particular case. > > It's actually not clear to me what that logic in > `CreateExNode::Identity()` is expected to do and I wonder if it's > still needed. > > Anyway, the fix I propose is to skip `CreateEx` in > `PhaseIdealLoop::try_sink_out_of_loop()`. This pull request has now been integrated. Changeset: 6ae3e064 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/6ae3e064352a56c5be140fba1ad6d040219432b0 Stats: 159 lines in 3 files changed: 159 ins; 0 del; 0 mod 8373508: C2: sinking CreateEx out of loop breaks the graph Reviewed-by: chagedorn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/28842 From krk at openjdk.org Mon Jan 5 14:32:01 2026 From: krk at openjdk.org (Kerem Kat) Date: Mon, 5 Jan 2026 14:32:01 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v7] In-Reply-To: References: Message-ID: > Do not try to replace `fallthrough_memproj` when it is null, fixes crash. > > Test case is simplified from the ticket. Verified that the case crashes without the fix. Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge branch 'master' into fix-c2-segfault-unlocknode - Merge branch 'master' into fix-c2-segfault-unlocknode - Merge branch 'master' into fix-c2-segfault-unlocknode - address comments - fix rename - rename test file - Merge branch 'master' into fix-c2-segfault-unlocknode - fix test spacing - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Manuel Hässig - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Manuel Hässig - ... and 2 more: https://git.openjdk.org/jdk/compare/3f9191b0...64e3dc5d ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28432/files - new: https://git.openjdk.org/jdk/pull/28432/files/b5e878c7..64e3dc5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=05-06 Stats: 36125 lines in 2664 files changed: 21296 ins; 6892 del; 7937 mod Patch: https://git.openjdk.org/jdk/pull/28432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28432/head:pull/28432 PR: https://git.openjdk.org/jdk/pull/28432 From bmaillard at openjdk.org Mon Jan 5 14:42:10 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 5 Jan 2026 14:42:10 GMT Subject: Integrated: 8367627: C2: Missed Ideal() optimization opportunity with MemBar In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 09:31:56 GMT, Benoît Maillard wrote: > This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing. > > The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for > `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially > introduced to get rid of unused `Load` nodes that were only kept alive by such an edge.
> > > > https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259 > > In our case, it happens that the `Load` node gets folded to a constant during the initial > `_gvn.transform` call in `GraphKit::make_load`. Because the value is converted before being > returned, we end up with two constant nodes: one `ConL` and one `ConI`. The `ConL` only > has one usage, and this triggers the optimization during verification. > > > static int test0() { > var c = new MyClass(); > // the conversion ensures that the ConL node only has one use > // in the end, which triggers the optimization > return (int) c.l; > } > > > The optimization is not triggered earlier, when we apply `_gvn.transform` on the membar, > because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in > `PhaseGVN::transform`. > > For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created > and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with > `can_reshape` later. > > > This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load` > prevents it from occurring when boxing elimination is enabled. Boxing elimination is > disabled temporarily in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)), > which caused the issue to appear, but by using `-XX:-EliminateAutoBox` it became clear > that the issue was on mainline. > > ### Testing > - [x] [GitHub Actions](TODO) > - [x] tier1-3, plus some internal testing > > Thank you for reviewing! This pull request has now been integrated.
Changeset: 4458cab4 Author: Benoît Maillard URL: https://git.openjdk.org/jdk/commit/4458cab4b0063f39333392321f542d0aa0db490d Stats: 97 lines in 3 files changed: 94 ins; 0 del; 3 mod 8367627: C2: Missed Ideal() optimization opportunity with MemBar Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/28448 From roland at openjdk.org Mon Jan 5 16:30:13 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 5 Jan 2026 16:30:13 GMT Subject: [jdk26] RFR: 8373524: C2: no reachable node should have no use Message-ID: Hi all, This pull request contains a backport of commit [e72f205a](https://github.com/openjdk/jdk/commit/e72f205ae312b15ebab0cbeedb73bbf86e485251) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Roland Westrelin on 19 Dec 2025 and was reviewed by Christian Hagedorn and Manuel Hässig. Thanks! ------------- Commit messages: - Backport e72f205ae312b15ebab0cbeedb73bbf86e485251 Changes: https://git.openjdk.org/jdk/pull/29042/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29042&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373524 Stats: 94 lines in 2 files changed: 91 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/29042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29042/head:pull/29042 PR: https://git.openjdk.org/jdk/pull/29042 From roland at openjdk.org Mon Jan 5 16:30:31 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 5 Jan 2026 16:30:31 GMT Subject: [jdk26] RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis Message-ID: Hi all, This pull request contains a backport of commit [2ba423db](https://github.com/openjdk/jdk/commit/2ba423db9925355348106fc9fcf84450123d2605) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
The commit being backported was authored by Roland Westrelin on 18 Dec 2025 and was reviewed by Roberto Castañeda Lozano, Daniel Lundén and Damon Fenacci. Thanks! ------------- Commit messages: - Backport 2ba423db9925355348106fc9fcf84450123d2605 Changes: https://git.openjdk.org/jdk/pull/29041/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29041&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370200 Stats: 195 lines in 6 files changed: 173 ins; 16 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/29041.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29041/head:pull/29041 PR: https://git.openjdk.org/jdk/pull/29041 From kxu at openjdk.org Mon Jan 5 16:34:13 2026 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 5 Jan 2026 16:34:13 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v27] In-Reply-To: References: Message-ID: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 49 commits: - Update license header years - Merge remote-tracking branch 'origin/master' into counted-loop-refactor - remove trailing whitespaces - Merge remote-tracking branch 'origin/master' into counted-loop-refactor - additional suggestions from code review - Apply suggestions from code review Co-authored-by: Christian Hagedorn - fix trip counter loop-variant detection - fix bad merge with ctrl_is_member() - Merge remote-tracking branch 'origin/master' into counted-loop-refactor # Conflicts: # src/hotspot/share/opto/loopnode.cpp - Merge branch 'master' into counted-loop-refactor - ... and 39 more: https://git.openjdk.org/jdk/compare/4458cab4...8b5dfad6 ------------- Changes: https://git.openjdk.org/jdk/pull/24458/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=26 Stats: 1231 lines in 3 files changed: 626 ins; 295 del; 310 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From shade at openjdk.org Mon Jan 5 16:46:08 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 Jan 2026 16:46:08 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code [v7] In-Reply-To: References: Message-ID: > We use CTW testing for making sure compilers behave well. But we compile the code that is not executed at all, and since our inlining heuristics often look back at profiles, we end up not actually inlining all too much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations. > > There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data.
> > After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much, they are impractical to run in standard configurations, see data in RFE. We will enable some of that testing in special testing pipelines. > > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allow stress infra to inline some more on top of that. > > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 12 additional commits since the last revision: - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Enable more testing - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Update src/hotspot/share/compiler/compiler_globals.hpp Co-authored-by: Tobias Hartmann - Revert separate patch - Final - ... and 2 more: https://git.openjdk.org/jdk/compare/040ed4ab...dbd560dc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26068/files - new: https://git.openjdk.org/jdk/pull/26068/files/2d02b713..dbd560dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=05-06 Stats: 63794 lines in 3039 files changed: 40196 ins; 14158 del; 9440 mod Patch: https://git.openjdk.org/jdk/pull/26068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068 PR: https://git.openjdk.org/jdk/pull/26068 From sviswanathan at openjdk.org Mon Jan 5 17:04:47 2026 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 5 Jan 2026 17:04:47 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v3] In-Reply-To: References: Message-ID: On Fri, 2 Jan 2026 05:45:58 GMT, Jatin Bhateja wrote: >> Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand, which leads to a failure during register biasing. Renaming the NDD demotion flags to encode the explicit operand position, i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2**, splitting the commutative flag into separate new flags, and fine-tuning the assertion checks based on the new naming convention fixes the issue. >> >> Failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch.
>> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> PS: Validation performed using Intel SDE 9.58. > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8373724 > - 8373724: Assertion failure in TestSignumVector.java with UseAPX @jatin-bhateja Thanks for looking into this. There is a build failure in GHA with the following message: "src/hotspot/cpu/x86/x86.ad:2645:13: error: ‘bool is_ndd_demotable(const MachNode*)’ defined but not used [-Werror=unused-function]" ------------- PR Comment: https://git.openjdk.org/jdk/pull/28999#issuecomment-3711301714 From jbhateja at openjdk.org Mon Jan 5 17:45:39 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 Jan 2026 17:45:39 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v4] In-Reply-To: References: Message-ID: <0Mz4vIOBTO7xZMs7IJKmHKsV7KWyKipwBeWkpzENCBw=.b033cde2-02ba-4090-85e0-b607bb9bb74c@github.com> > Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand, which leads to a failure during register biasing. Renaming the NDD demotion flags to encode the explicit operand position, i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2**, splitting the commutative flag into separate new flags, and fine-tuning the assertion checks based on the new naming convention fixes the issue. > > Failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch. > > Kindly review and share your feedback. > > Best Regards, > Jatin > PS: Validation performed using Intel SDE 9.58. Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Use ASSERT pre-processor macro instead of PRODUCT to fix optimized build failure - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8373724 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8373724 - 8373724: Assertion failure in TestSignumVector.java with UseAPX ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28999/files - new: https://git.openjdk.org/jdk/pull/28999/files/2a63c92b..05db5651 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=02-03 Stats: 925 lines in 37 files changed: 628 ins; 238 del; 59 mod Patch: https://git.openjdk.org/jdk/pull/28999.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28999/head:pull/28999 PR: https://git.openjdk.org/jdk/pull/28999 From jbhateja at openjdk.org Mon Jan 5 17:50:10 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 Jan 2026 17:50:10 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v5] In-Reply-To: References: Message-ID: > Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand, which leads to a failure during register biasing. Renaming the NDD demotion flags to encode the explicit operand position, i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2**, splitting the commutative flag into separate new flags, and fine-tuning the assertion checks based on the new naming convention fixes the issue. > > Failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch. > > Kindly review and share your feedback. > > Best Regards, > Jatin > PS: Validation performed using Intel SDE 9.58.
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Update copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28999/files - new: https://git.openjdk.org/jdk/pull/28999/files/05db5651..29093665 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28999.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28999/head:pull/28999 PR: https://git.openjdk.org/jdk/pull/28999 From duke at openjdk.org Mon Jan 5 18:40:51 2026 From: duke at openjdk.org (Tobias Hotz) Date: Mon, 5 Jan 2026 18:40:51 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v11] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 21:50:27 GMT, Tobias Hotz wrote: >> This PR improves the value of integer division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check the negative and positive parts of the range separately. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Simplify test, add temporary @IR rule for testLongRange and improve comments
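The four-corner idea in the 8364766 description can be sketched in plain Java. This is a simplified model under stated assumptions, not C2's actual `DivINode::Value()`. The `Range` record and the `divRange`/`cornersOneSign` helpers are hypothetical names. The `MIN_INT / -1` case is handled conservatively by widening to the full `int` range, not with the special logic the PR refers to. Java `int` division truncates toward zero, so as long as `y` keeps one sign, `x / y` is monotone in `x` and in `y` separately. The extrema over a rectangle of inputs therefore sit at its corners. A divisor range that crosses zero is split into its negative and positive parts.

```java
import java.util.Arrays;

public class DivRangeSketch {
    /** Hypothetical closed interval [lo, hi] of int values. */
    record Range(int lo, int hi) {}

    // Range of x / y for x in [xlo, xhi] and y in [ylo, yhi], where the divisor
    // range lies entirely on one side of zero. Truncating division is monotone
    // in each argument separately there, so the extrema are at the corners.
    static Range cornersOneSign(int xlo, int xhi, int ylo, int yhi) {
        int[] c = { xlo / ylo, xlo / yhi, xhi / ylo, xhi / yhi };
        return new Range(Arrays.stream(c).min().getAsInt(),
                         Arrays.stream(c).max().getAsInt());
    }

    // Returns null if the divisor range is exactly [0, 0] (every division throws).
    static Range divRange(int xlo, int xhi, int ylo, int yhi) {
        // Integer.MIN_VALUE / -1 overflows and wraps to Integer.MIN_VALUE, so the
        // corner argument breaks down; this sketch just gives up precision.
        if (xlo == Integer.MIN_VALUE && ylo <= -1 && yhi >= -1) {
            return new Range(Integer.MIN_VALUE, Integer.MAX_VALUE);
        }
        // Split a divisor range that crosses zero; y == 0 contributes no value.
        Range neg = ylo <= -1 ? cornersOneSign(xlo, xhi, ylo, Math.min(yhi, -1)) : null;
        Range pos = yhi >= 1 ? cornersOneSign(xlo, xhi, Math.max(ylo, 1), yhi) : null;
        if (neg == null) return pos;
        if (pos == null) return neg;
        return new Range(Math.min(neg.lo(), pos.lo()), Math.max(neg.hi(), pos.hi()));
    }

    public static void main(String[] args) {
        System.out.println(divRange(-10, 10, 2, 5)); // Range[lo=-5, hi=5]
        System.out.println(divRange(7, 7, -3, 3));   // Range[lo=-7, hi=7]
    }
}
```

Returning `null` for a divisor range of exactly `[0, 0]` reflects the fact that such a division can only throw `ArithmeticException`.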
------------- PR Comment: https://git.openjdk.org/jdk/pull/26143#issuecomment-3711646576 From duke at openjdk.org Mon Jan 5 18:45:42 2026 From: duke at openjdk.org (Tobias Hotz) Date: Mon, 5 Jan 2026 18:45:42 GMT Subject: RFR: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero Message-ID: This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero ------------- Commit messages: - Fix div by zero due to const 2 being zero cauing failing tests Changes: https://git.openjdk.org/jdk/pull/29045/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29045&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374436 Stats: 10 lines in 1 file changed: 8 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29045.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29045/head:pull/29045 PR: https://git.openjdk.org/jdk/pull/29045 From kvn at openjdk.org Mon Jan 5 18:47:52 2026 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 Jan 2026 18:47:52 GMT Subject: RFR: 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs In-Reply-To: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> References: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> Message-ID: On Mon, 5 Jan 2026 07:37:34 GMT, Emanuel Peter wrote: > In `VLoopMemorySlices::find_memory_slices`, we analyze the memory slices. In some cases, we only find loads in the slice, and no phi. So the memory input of the loads comes from before the loop. 
When I refactored the code, I made the assumption that all loads should have the same memory input. After all: any store before the loop would have to have happened before we enter the loop and execute any loads in it. The assumption held for a long time, but now we have a counterexample. > > Summary: one load has its memory input optimized, the other is not put on the IGVN worklist again, and keeps the old memory input (even though in this case we could have optimized it just the same). Thus, both choices of memory input are correct, and the assumption of the assert does not hold. > > Solution: Just bail out of auto vectorization if this assumption is violated. This is an edge case, and the assert had not been hit until the fuzzer found this example. > Alternatives: we could track the multiple memory inputs, but that would be more effort to implement, and hard to test because it is difficult to create examples. > > --------------------------- > > **Details** > > Below, look at `1145 LoadB` and `1131 LoadB`. One has memory input `Param 7` (initial program state), the other `711 Phi` (outer loop). Both loads are inside the `1147 CountedLoop`. But their states come from outside, both originally from `711 Phi`. But then `1145 LoadB` is optimized with `LoadBNode::Ideal` -> `LoadNode::Ideal` -> `LoadNode::split_through_phi`: it realizes that the backedge of the `711 Phi` only goes through the `593 CallStaticJava`, which cannot modify the `Byte::value` field of the `LoadB` (unless it were to use reflection, but that unlocks undefined behavior anyway, so it can be ignored). So it is ok to split through the phi, as the `Byte::value` cannot be modified during the outer loop. > > `1131 LoadB` could also do the same optimization, but it just does not end up on the IGVN worklist. The issue is that we don't have any adequate notification that goes down through the `MergeMem - Call - Proj` structure.
We did not want to have that until now, because in theory we could have a long series of calls, and the traversals could become too expensive. Good. I approve this conservative fix. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29028#pullrequestreview-3628029840 From kvn at openjdk.org Mon Jan 5 18:49:29 2026 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 Jan 2026 18:49:29 GMT Subject: RFR: 8374489: Template Library: need to tag Float16.float16ToRawShortBits as having non-deterministic result because of multiple NaN bit patterns In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 11:55:01 GMT, Emanuel Peter wrote: > In [JDK-8370922](https://bugs.openjdk.org/browse/JDK-8370922), I added Float16 support to the Template Library. I forgot to tag Float16.float16ToRawShortBits as having a non-deterministic result. > > We need to do that, just like for the float and double equivalents: > Arithmetic operations (e.g. add, mul, fma, ...) are allowed to pick either input if they get two NaNs with different bit patterns. > That way, those operations can generate different NaN bit patterns depending on whether we use the interpreter, C1, or C2. > If we then convert to raw bits, we get different bits, and would wrongly conclude that we get a wrong result. > > Since this is a test bug, I have no regression test. But I verified manually that with the same seed (for the ExpressionFuzzer) that fails before this change, the test now succeeds after the change. Good. ------------- Marked as reviewed by kvn (Reviewer).
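The following short Java sketch shows why comparing raw bits is unreliable for NaN results. It uses `float`, whose `Float.floatToIntBits` / `Float.floatToRawIntBits` pair is the analogue of the `Float16` methods discussed above. It is an illustration only, not code from the Template Library, and the helper name `sameResult` is hypothetical. `floatToIntBits` collapses every NaN to the canonical `0x7fc00000`, so a comparison built on it does not depend on which NaN payload an operation produced. `floatToRawIntBits` keeps the payload, so it can differ between the interpreter, C1, and C2.

```java
public class NaNBitsSketch {
    // Hypothetical helper: compares two float results while treating all NaNs
    // as equal, regardless of their payload bits.
    static boolean sameResult(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    public static void main(String[] args) {
        // Two quiet NaNs with different payloads.
        float nan1 = Float.intBitsToFloat(0x7fc00001);
        float nan2 = Float.intBitsToFloat(0x7fc00002);
        System.out.println(Float.isNaN(nan1) && Float.isNaN(nan2)); // true
        // floatToIntBits canonicalizes every NaN to 0x7fc00000.
        System.out.println(Integer.toHexString(Float.floatToIntBits(nan1))); // 7fc00000
        System.out.println(sameResult(nan1, nan2)); // true
        // floatToRawIntBits preserves the payload (on typical platforms), so a
        // raw-bits check may report a spurious mismatch here.
        System.out.println(Float.floatToRawIntBits(nan1) == Float.floatToRawIntBits(nan2));
    }
}
```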
PR Review: https://git.openjdk.org/jdk/pull/29033#pullrequestreview-3628033479 From kxu at openjdk.org Mon Jan 5 18:58:49 2026 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 5 Jan 2026 18:58:49 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v27] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 16:34:13 GMT, Kangcheng Xu wrote: >> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. >> >> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. >> >> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). > > Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 49 commits: > > - Update license header years > - Merge remote-tracking branch 'origin/master' into counted-loop-refactor > - remove trailing whitespaces > - Merge remote-tracking branch 'origin/master' into counted-loop-refactor > - additional suggestions from code review > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - fix trip counter loop-variant detection > - fix bad merge with ctrl_is_member() > - Merge remote-tracking branch 'origin/master' into counted-loop-refactor > > # Conflicts: > # src/hotspot/share/opto/loopnode.cpp > - Merge branch 'master' into counted-loop-refactor > - ... and 39 more: https://git.openjdk.org/jdk/compare/4458cab4...8b5dfad6 Merged in the latest master and updated license headers. The [counted-loop-refactor-old-vs-new](https://github.com/tabjy/jdk/commits/counted-loop-refactor-old-vs-new/) branch is also updated.
Please note that the GHA job `linux-x64 / build (debug)` is currently failing across the jdk repo due to insufficient disk space. I'll try to trigger it again tomorrow. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3711694813 From sviswanathan at openjdk.org Mon Jan 5 21:43:30 2026 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 5 Jan 2026 21:43:30 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v5] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 17:50:10 GMT, Jatin Bhateja wrote: >> Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand, which leads to a failure during register biasing. Renaming the NDD demotion flags to encode the explicit operand position, i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2**, splitting the commutative flag into separate new flags, and fine-tuning the assertion checks based on the new naming convention fixes the issue. >> >> Failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> PS: Validation performed using Intel SDE 9.58. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright src/hotspot/cpu/x86/x86.ad line 9947: > 9945: match(Set dst (AddI src1 (LoadI src2))); > 9946: effect(KILL cr); > 9947: flag(PD::Flag_sets_overflow_flag, PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_carry_flag, PD::Flag_sets_parity_flag, PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); Remove PD::Flag_ndd_demotable_opr2 as the second operand is a memory operand.
src/hotspot/cpu/x86/x86.ad line 10237: > 10235: match(Set dst (AddL src1 (LoadL src2))); > 10236: effect(KILL cr); > 10237: flag(PD::Flag_sets_overflow_flag, PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_carry_flag, PD::Flag_sets_parity_flag, PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); Remove PD::Flag_ndd_demotable_opr2 as the second operand is a memory operand. src/hotspot/cpu/x86/x86.ad line 11585: > 11583: match(Set dst (MulL src1 (LoadL src2))); > 11584: effect(KILL cr); > 11585: flag(PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); Remove PD::Flag_ndd_demotable_opr2 as the second operand is a memory operand. src/hotspot/cpu/x86/x86.ad line 13038: > 13036: match(Set dst (AndI src1 (LoadI src2))); > 13037: effect(KILL cr); > 13038: flag(PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_parity_flag, PD::Flag_clears_overflow_flag, PD::Flag_clears_carry_flag, PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); Remove PD::Flag_ndd_demotable_opr2 as the second operand is a memory operand. src/hotspot/cpu/x86/x86.ad line 13683: > 13681: match(Set dst (AndL src1 (LoadL src2))); > 13682: effect(KILL cr); > 13683: flag(PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_parity_flag, PD::Flag_clears_overflow_flag, PD::Flag_clears_carry_flag, PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); Remove PD::Flag_ndd_demotable_opr2 as the second operand is a memory operand. src/hotspot/cpu/x86/x86.ad line 14000: > 13998: match(Set dst (OrL src1 (LoadL src2))); > 13999: effect(KILL cr); > 14000: flag(PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_parity_flag, PD::Flag_clears_overflow_flag, PD::Flag_clears_carry_flag, PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); Remove PD::Flag_ndd_demotable_opr2 as the second operand is a memory operand. 
src/hotspot/cpu/x86/x86.ad line 14182: > 14180: match(Set dst (XorL src1 (LoadL src2))); > 14181: effect(KILL cr); > 14182: flag(PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_parity_flag, PD::Flag_clears_overflow_flag, PD::Flag_clears_carry_flag, PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); Remove PD::Flag_ndd_demotable_opr2 here as the second operand is a memory operand. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2662838200 PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2662836624 PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2662831056 PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2662827932 PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2662821662 PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2662367856 PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2662358281 From xgong at openjdk.org Tue Jan 6 02:47:14 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 6 Jan 2026 02:47:14 GMT Subject: [jdk26] RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently Message-ID: Hi all, This pull request contains a backport of commit [6eaabed5](https://github.com/openjdk/jdk/commit/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Xiaohong Gong on 5 Jan 2026 and was reviewed by Jie Fu, Jatin Bhateja, Eric Fang and Quan Anh Mai. Thanks! 
------------- Commit messages: - Backport 6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca Changes: https://git.openjdk.org/jdk/pull/29053/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29053&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373722 Stats: 43 lines in 1 file changed: 1 ins; 32 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/29053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29053/head:pull/29053 PR: https://git.openjdk.org/jdk/pull/29053 From jbhateja at openjdk.org Tue Jan 6 04:20:14 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 Jan 2026 04:20:14 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v5] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 18:02:50 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Update copyright > > src/hotspot/cpu/x86/x86.ad line 14000: > >> 13998: match(Set dst (OrL src1 (LoadL src2))); >> 13999: effect(KILL cr); >> 14000: flag(PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_parity_flag, PD::Flag_clears_overflow_flag, PD::Flag_clears_carry_flag, PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); > > Remove PD::Flag_ndd_demotable_opr2 as the second operand is a memory operand. We already have a check for [memory operands (mapping to multiple input edges)](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L2664) in place; ADLC generates DFA for both direct and flipped versions of the memory patterns. So Flag_ndd_demotable_opr2 will cover the flipped operand case.
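The idea behind the operand-position flags can be shown with a toy model. This is a hedged sketch under simplifying assumptions, not HotSpot or ADLC code. The `Operand` record, the `canDemote` method, and its boolean parameters are hypothetical stand-ins for `Flag_ndd_demotable_opr1`/`Flag_ndd_demotable_opr2`. The model assumes that an APX NDD instruction `dst = opr1 OP opr2` can be re-encoded in the two-operand form when `dst` is assigned the same register as `opr1`, or, where the operation permits it (commutative, or a negate written as `0 - src`), the same register as `opr2`. In the `negI/L_rReg_ndd` case from the PR description, the register input sits in position 2, so a check that only considers position 1 tests the wrong operand. A memory operand never occupies the destination register, so in this model a position-2 flag on a memory operand simply never fires.

```java
public class NddDemotionSketch {
    enum Kind { REG, MEM, IMM }

    /** Hypothetical operand model: a register number, a memory operand, or an immediate. */
    record Operand(Kind kind, int reg) {
        static Operand ofReg(int r) { return new Operand(Kind.REG, r); }
        static Operand ofMem() { return new Operand(Kind.MEM, -1); }
        static Operand ofImm() { return new Operand(Kind.IMM, -1); }

        boolean isReg(int r) { return kind == Kind.REG && reg == r; }
    }

    // Can "dst = opr1 OP opr2" be re-encoded in a two-operand form?
    // demotableOpr1: allowed when dst is assigned the same register as opr1.
    // demotableOpr2: allowed when dst is assigned the same register as opr2
    //                (e.g. a commutative OP, or a negate written as 0 - src).
    static boolean canDemote(int dst, Operand opr1, Operand opr2,
                             boolean demotableOpr1, boolean demotableOpr2) {
        return (demotableOpr1 && opr1.isReg(dst)) || (demotableOpr2 && opr2.isReg(dst));
    }

    public static void main(String[] args) {
        // add r1, r1, r2: dst matches opr1.
        System.out.println(canDemote(1, Operand.ofReg(1), Operand.ofReg(2), true, true));  // true
        // add r2, r1, r2: dst matches opr2, so the opr2 flag is required.
        System.out.println(canDemote(2, Operand.ofReg(1), Operand.ofReg(2), true, false)); // false
        System.out.println(canDemote(2, Operand.ofReg(1), Operand.ofReg(2), true, true));  // true
        // neg-style r3 = 0 - r3: the register input sits in position 2.
        System.out.println(canDemote(3, Operand.ofImm(), Operand.ofReg(3), true, false));  // false
        System.out.println(canDemote(3, Operand.ofImm(), Operand.ofReg(3), false, true));  // true
        // add r1, r1, [mem] with only the opr2 flag: a memory operand never matches dst.
        System.out.println(canDemote(1, Operand.ofReg(1), Operand.ofMem(), false, true));  // false
    }
}
```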
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2663532523 From wenanjian at openjdk.org Tue Jan 6 06:22:34 2026 From: wenanjian at openjdk.org (Anjian Wen) Date: Tue, 6 Jan 2026 06:22:34 GMT Subject: RFR: 8374184: RISC-V: implement GCM intrinsic with Zvkg and Zvkned extension Message-ID: This patch implements the GCM intrinsic with the Zvkg and Zvkned extensions on RISC-V. According to the Java API of `implGCMCrypt0` in `src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java`, we only handle data whose length is a multiple of PARALLEL_LEN (512). Passed the related jtreg tests in test/hotspot/jtreg/compiler/codegen/aes/ and test/jdk/com/sun/crypto/ ------------- Commit messages: - modify x10 register use - modify register use - make some clean up - optimize tmp register use - change andr to andi - modify the input according to api and some name - RISC-V: implement GCM intrinsic with Zvkg and Zvkned extension Changes: https://git.openjdk.org/jdk/pull/28894/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28894&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374184 Stats: 128 lines in 1 file changed: 128 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28894.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28894/head:pull/28894 PR: https://git.openjdk.org/jdk/pull/28894 From thartmann at openjdk.org Tue Jan 6 07:25:00 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 6 Jan 2026 07:25:00 GMT Subject: [jdk26] RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 02:39:58 GMT, Xiaohong Gong wrote: > Hi all, > > This pull request contains a backport of commit [6eaabed5](https://github.com/openjdk/jdk/commit/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
> > The commit being backported was authored by Xiaohong Gong on 5 Jan 2026 and was reviewed by Jie Fu, Jatin Bhateja, Eric Fang and Quan Anh Mai. > > Thanks! Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29053#pullrequestreview-3629740362 From xgong at openjdk.org Tue Jan 6 07:40:58 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 6 Jan 2026 07:40:58 GMT Subject: RFR: 8373344: Add support for min/max reduction operations for Float16 [v2] In-Reply-To: <72RiEFo5wji1vOtIRzFwO03_0OsaCe2zzHjp9YPD8-k=.6a88f4eb-3fff-4fe5-bb43-260d81b7954a@github.com> References: <7qXqIBuLDFEKPNje6TALxZUATujnOY5hoODC30zJNFM=.07d8d157-1ae5-4751-befd-d6291370fb9c@github.com> <72RiEFo5wji1vOtIRzFwO03_0OsaCe2zzHjp9YPD8-k=.6a88f4eb-3fff-4fe5-bb43-260d81b7954a@github.com> Message-ID: On Mon, 5 Jan 2026 11:31:26 GMT, Yi Wu wrote: >> This patch adds mid-end support for vectorized min/max reduction operations for half floats. It also includes backend AArch64 support for these operations. >> Both floating point min/max reductions don't require strict order, because they are associative. >> >> It will generate NEON fminv/fmaxv reduction instructions when max vector length is 8B or 16B. On SVE supporting machines with vector lengths > 16B, it will generate the SVE fminv/fmaxv instructions. >> The patch also adds support for partial min/max reductions on SVE machines using fminv/fmaxv. >> >> Ratio of throughput(ops/ms) > 1 indicates the performance with this patch is better than the mainline.
>>
>> Neoverse N1 (UseSVE = 0, max vector length = 16B):
>>
>> Benchmark         vectorDim  Mode   Cnt  8B    16B
>> ReductionMaxFP16  256        thrpt  9    3.69  6.44
>> ReductionMaxFP16  512        thrpt  9    3.71  7.62
>> ReductionMaxFP16  1024       thrpt  9    4.16  8.64
>> ReductionMaxFP16  2048       thrpt  9    4.44  9.12
>> ReductionMinFP16  256        thrpt  9    3.69  6.43
>> ReductionMinFP16  512        thrpt  9    3.70  7.62
>> ReductionMinFP16  1024       thrpt  9    4.16  8.64
>> ReductionMinFP16  2048       thrpt  9    4.44  9.10
>>
>>
>> Neoverse V1 (UseSVE = 1, max vector length = 32B):
>>
>> Benchmark         vectorDim  Mode   Cnt  8B    16B   32B
>> ReductionMaxFP16  256        thrpt  9    3.96  8.62  8.02
>> ReductionMaxFP16  512        thrpt  9    3.54  9.25  11.71
>> ReductionMaxFP16  1024       thrpt  9    3.77  8.71  14.07
>> ReductionMaxFP16  2048       thrpt  9    3.88  8.44  14.69
>> ReductionMinFP16  256        thrpt  9    3.96  8.61  8.03
>> ReductionMinFP16  512        thrpt  9    3.54  9.28  11.69
>> ReductionMinFP16  1024       thrpt  9    3.76  8.70  14.12
>> ReductionMinFP16  2048       thrpt  9    3.87  8.45  14.70
>>
>>
>> Neoverse V2 (UseSVE = 2, max vector length = 16B):
>>
>> Benchmark         vectorDim  Mode   Cnt  8B    16B
>> ReductionMaxFP16  256        thrpt  9    4.78  10.00
>> ReductionMaxFP16  512        thrpt  9    3.74  11.33
>> ReductionMaxFP16  1024       thrpt  9    3.86  9.59
>> ReductionMaxFP16  2048       thrpt  9    3.94  8.71
>> ReductionMinFP16  256        thrpt  9    4.78  10.00
>> ReductionMinFP16  512        thrpt  9    3.74  11.29
>> ReductionMinFP16  1024       thrpt  9    3.86  9.58
>> ReductionMinFP16  2048       thrpt  9    3.94  8.71
>>
>>
>> Testing:
>> hotspot_all, jdk (tier1-3) and langtools (tier1) all pass ...
>
> Yi Wu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
> The pull request contains four additional commits since the last revision:
>
> - Replace assert with verify
> - Add IRNode constant and code refactor
> - Merge remote-tracking branch 'origin/master' into yiwu-8373344
> - 8373344: Add support for FP16 min/max reduction operations
>
> This patch adds mid-end support for vectorized min/max reduction
> operations for half floats. It also includes backend AArch64 support
> for these operations.
> Both floating point min/max reductions don't require strict order,
> because they are associative.
>
> It will generate NEON fminv/fmaxv reduction instructions when
> max vector length is 8B or 16B. On SVE supporting machines
> with vector lengths > 16B, it will generate the SVE fminv/fmaxv
> instructions.
> The patch also adds support for partial min/max reductions on
> SVE machines using fminv/fmaxv.
>
> Ratio of throughput(ops/ms) > 1 indicates the performance with
> this patch is better than the mainline.
>
> Neoverse N1 (UseSVE = 0, max vector length = 16B):
> Benchmark         vectorDim  Mode   Cnt  8B    16B
> ReductionMaxFP16  256        thrpt  9    3.69  6.44
> ReductionMaxFP16  512        thrpt  9    3.71  7.62
> ReductionMaxFP16  1024       thrpt  9    4.16  8.64
> ReductionMaxFP16  2048       thrpt  9    4.44  9.12
> ReductionMinFP16  256        thrpt  9    3.69  6.43
> ReductionMinFP16  512        thrpt  9    3.70  7.62
> ReductionMinFP16  1024       thrpt  9    4.16  8.64
> ReductionMinFP16  2048       thrpt  9    4.44  9.10
>
> Neoverse V1 (UseSVE = 1, max vector length = 32B):
> Benchmark         vectorDim  Mode   Cnt  8B    16B   32B
> ReductionMaxFP16  256        thrpt  9    3.96  8.62  8.02
> ReductionMaxFP16  512        thrpt  9    3.54  9.25  11.71
> ReductionMaxFP16  1024       thrpt  9    3.77  8.71  14.07
> ReductionMaxFP16  2048       thrpt  9    3.88  8.44  14.69
> ReductionMinFP16  256        thrpt  9    3.96  8.61  8.03
> ReductionMinFP16  512        thrpt  9    3.54  9.28  11.69
> ReductionMinFP16  1024       thrpt  9    3.76  8.70  14.12
> ReductionMinFP16  2048       thrpt  9    3.87  8.45  14.70
>
> Neoverse V2 (UseSVE = 2, max vector length = 16B):
> Benchmark         vectorDim  Mode   Cnt  8B    16B
> ReductionMaxFP16  256        t...
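The commit message states that floating-point min/max reductions do not need a strict evaluation order, whereas floating-point addition does. A minimal sketch of why, using plain `float` as a stand-in for `Float16` and assuming a pairwise (tree-shaped) fold as one example of the reordering a vector reduction may perform; the class and method names are hypothetical and not part of the JDK or this patch:

```java
import java.util.function.BinaryOperator;

/** Hypothetical illustration: sequential vs. pairwise reduction of a float array. */
final class ReductionOrderSketch {

    /** Left-to-right fold: ((a0 op a1) op a2) op a3 ... */
    static float sequential(float[] a, BinaryOperator<Float> op) {
        float acc = a[0];
        for (int i = 1; i < a.length; i++) {
            acc = op.apply(acc, a[i]);
        }
        return acc;
    }

    /** Pairwise fold over [from, to); for four elements: (a0 op a1) op (a2 op a3). */
    static float pairwise(float[] a, int from, int to, BinaryOperator<Float> op) {
        if (to - from == 1) {
            return a[from];
        }
        int mid = (from + to) >>> 1;
        return op.apply(pairwise(a, from, mid, op), pairwise(a, mid, to, op));
    }

    public static void main(String[] args) {
        float[] a = {0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE};
        BinaryOperator<Float> add = Float::sum;
        BinaryOperator<Float> min = Math::min;
        // Floating-point addition is not associative: the two orders give different sums.
        System.out.println(sequential(a, add) + " vs " + pairwise(a, 0, a.length, add));
        // min is associative and commutative, so any evaluation order gives the same result.
        System.out.println(sequential(a, min) + " vs " + pairwise(a, 0, a.length, min));
    }
}
```

For this input the sequential sum loses `Float.MIN_NORMAL` when it is added to `Float.MAX_VALUE`, while the pairwise sum keeps it; both min reductions return `-Float.MAX_VALUE`.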
src/hotspot/cpu/aarch64/aarch64_vector.ad line 381: > 379: case Op_XorReductionV: > 380: case Op_MinReductionVHF: > 381: case Op_MaxReductionVHF: We can use the NEON instructions if the vector size <= 16B as well for partial cases. Did you test the performance with NEON instead of using predicated SVE instructions? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28828#discussion_r2663933727 From chagedorn at openjdk.org Tue Jan 6 07:53:31 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 Jan 2026 07:53:31 GMT Subject: RFR: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 18:37:54 GMT, Tobias Hotz wrote: > This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 > I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. > The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. > I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero That looks good, thanks for fixing this! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29045#pullrequestreview-3629813472 From hgreule at openjdk.org Tue Jan 6 08:04:31 2026 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 6 Jan 2026 08:04:31 GMT Subject: RFR: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 07:50:31 GMT, Christian Hagedorn wrote: >> This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 >> I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. 
>> The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. >> I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero > > That looks good, thanks for fixing this! @chhagedorn does this need a copyright update? Otherwise I updated it in #27886 already as I adjusted the test there. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29045#issuecomment-3713582662 From chagedorn at openjdk.org Tue Jan 6 08:24:13 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 Jan 2026 08:24:13 GMT Subject: [jdk26] RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 16:20:29 GMT, Roland Westrelin wrote: > Hi all, > > This pull request contains a backport of commit [2ba423db](https://github.com/openjdk/jdk/commit/2ba423db9925355348106fc9fcf84450123d2605) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 18 Dec 2025 and was reviewed by Roberto Castañeda Lozano, Daniel Lundén and Damon Fenacci. > > Thanks! Looks good! I will submit some testing for it together with https://github.com/openjdk/jdk/pull/29042. ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/29041#pullrequestreview-3629909672 From chagedorn at openjdk.org Tue Jan 6 08:24:14 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 Jan 2026 08:24:14 GMT Subject: [jdk26] RFR: 8373524: C2: no reachable node should have no use In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 16:20:56 GMT, Roland Westrelin wrote: > Hi all, > > This pull request contains a backport of commit [e72f205a](https://github.com/openjdk/jdk/commit/e72f205ae312b15ebab0cbeedb73bbf86e485251) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 19 Dec 2025 and was reviewed by Christian Hagedorn and Manuel Hässig. > > Thanks! Looks good! I will submit some testing for it together with https://github.com/openjdk/jdk/pull/29041. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29042#pullrequestreview-3629909720 From chagedorn at openjdk.org Tue Jan 6 08:33:29 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 Jan 2026 08:33:29 GMT Subject: RFR: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 18:37:54 GMT, Tobias Hotz wrote: > This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 > I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. > The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. > I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero Good catch! Yes, we should update it. I guess it does not hurt if we wait until you update it with your PR.
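The 8374436 fix (expect an `ArithmeticException` when a seed-dependent divisor such as `INT_CONST_2` happens to be zero, instead of comparing results) follows a common test pattern. A minimal sketch of that pattern, assuming a hypothetical `divide` method under test and a hypothetical `verify` helper; it is not the actual code in `IntegerDivValueTests.java`:

```java
/** Hypothetical sketch: check a quotient, or expect ArithmeticException for a zero divisor. */
final class DivByZeroExpectationSketch {

    /** Stand-in for the method under test. */
    static int divide(int a, int b) {
        return a / b;
    }

    /**
     * Returns normally if divide(a, b) follows Java semantics: ArithmeticException
     * when b == 0, otherwise the expected (truncated) quotient.
     */
    static void verify(int a, int b, int expectedIfNonZero) {
        if (b == 0) {
            try {
                divide(a, b);
            } catch (ArithmeticException e) {
                return; // expected: integer division by zero
            }
            throw new AssertionError("expected ArithmeticException for " + a + " / 0");
        }
        int actual = divide(a, b);
        if (actual != expectedIfNonZero) {
            throw new AssertionError(a + " / " + b + ": expected " + expectedIfNonZero + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        verify(7, 2, 3);
        verify(-7, 2, -3); // Java integer division truncates toward zero
        verify(7, 0, 0);   // the expected value is ignored when the divisor is zero
        System.out.println("ok");
    }
}
```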
------------- PR Comment: https://git.openjdk.org/jdk/pull/29045#issuecomment-3713674804 From epeter at openjdk.org Tue Jan 6 08:54:55 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 Jan 2026 08:54:55 GMT Subject: RFR: 8374489: Template Library: need to tag Float16.float16ToRawShortBits as having non-deterministic result because of multiple NaN bit patterns In-Reply-To: References: Message-ID: <_PLUeKZwBJ17zJFW2nzUASQpFvsx592Kf4ZawxXn1jc=.3afa6dc5-72b1-42ea-b565-7f2753da1c8f@github.com> On Mon, 5 Jan 2026 18:46:10 GMT, Vladimir Kozlov wrote: >> In [JDK-8370922](https://bugs.openjdk.org/browse/JDK-8370922), I added Float16 support to the Template Library, but I forgot to tag Float16.float16ToRawShortBits as having a non-deterministic result. >> >> We need to do that, just like for the float and double equivalents: >> Arithmetic operations (e.g. add, mul, fma, ...) are allowed to return either input when they get two NaNs with different bit patterns. >> As a result, those operations can produce different NaN bit patterns depending on whether we use the interpreter, C1, or C2. >> If we then convert to raw bits, we get different bits and would wrongly conclude that the result is incorrect. >> >> Since this is a test bug, I have no regression test. But I verified manually that, with the same seed (for the ExpressionFuzzer) that fails before this change, the test succeeds after the change. > > Good. @vnkozlov @chhagedorn Thanks for the reviews!
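The NaN issue behind 8374489 can be illustrated with `float`, used here as a stand-in for `Float16`. A minimal sketch, assuming hypothetical `sameRawBits` and `sameResult` helpers: two NaNs with different payloads are both NaN, `Float.floatToRawIntBits` keeps their payloads, and `Float.floatToIntBits` collapses every NaN to the canonical `0x7fc00000`, so only the latter is a stable basis for comparing results across the interpreter, C1, and C2.

```java
/** Hypothetical sketch: comparing float results when NaN payloads may differ between executions. */
final class NaNBitsSketch {

    /** Fragile: compares exact bit patterns, so two different NaNs can compare as different. */
    static boolean sameRawBits(float x, float y) {
        return Float.floatToRawIntBits(x) == Float.floatToRawIntBits(y);
    }

    /** Robust: floatToIntBits maps every NaN to 0x7fc00000 but still distinguishes 0.0f and -0.0f. */
    static boolean sameResult(float x, float y) {
        return Float.floatToIntBits(x) == Float.floatToIntBits(y);
    }

    public static void main(String[] args) {
        float a = Float.intBitsToFloat(0x7fc00000); // canonical quiet NaN
        float b = Float.intBitsToFloat(0x7fc00001); // quiet NaN with a different payload
        System.out.println(Float.isNaN(a) + " " + Float.isNaN(b));
        // May print false: the payloads differ (intBitsToFloat is not guaranteed to preserve them exactly).
        System.out.println("raw bits equal: " + sameRawBits(a, b));
        System.out.println("canonical bits equal: " + sameResult(a, b));
    }
}
```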
------------- PR Comment: https://git.openjdk.org/jdk/pull/29033#issuecomment-3713746281 From epeter at openjdk.org Tue Jan 6 08:54:57 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 Jan 2026 08:54:57 GMT Subject: Integrated: 8374489: Template Library: need to tag Float16.float16ToRawShortBits as having non-deterministic result because of multiple NaN bit patterns In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 11:55:01 GMT, Emanuel Peter wrote: > In [JDK-8370922](https://bugs.openjdk.org/browse/JDK-8370922), I added Float16 support to the Template Library, but I forgot to tag Float16.float16ToRawShortBits as having a non-deterministic result. > > We need to do that, just like for the float and double equivalents: > Arithmetic operations (e.g. add, mul, fma, ...) are allowed to return either input when they get two NaNs with different bit patterns. > As a result, those operations can produce different NaN bit patterns depending on whether we use the interpreter, C1, or C2. > If we then convert to raw bits, we get different bits and would wrongly conclude that the result is incorrect. > > Since this is a test bug, I have no regression test. But I verified manually that, with the same seed (for the ExpressionFuzzer) that fails before this change, the test succeeds after the change. This pull request has now been integrated.
Changeset: 2cb228e1 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/2cb228e142369ec73d768d8a69653a984b1c5908 Stats: 5 lines in 1 file changed: 2 ins; 1 del; 2 mod 8374489: Template Library: need to tag Float16.float16ToRawShortBits as having non-deterministic result because of multiple NaN bit patterns Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/29033 From xgong at openjdk.org Tue Jan 6 09:26:23 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 6 Jan 2026 09:26:23 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector Message-ID:
### Problem:
Test `compiler/vectorapi/VectorMaskToLongTest.java` crashes intermittently (approximately once per 200+ runs) with stress VM options such as `-XX:+StressIGVN`:

    // A fatal error has been detected by the Java Runtime Environment:
    //
    // Internal Error (jdk/src/hotspot/share/opto/type.hpp:2287), pid=69056, tid=28419
    // assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector
    // ...

The crash occurs in the following code when calling `is_vect()` in the assertion added by JDK-8367292 [1]:
https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/src/hotspot/share/opto/vectornode.cpp#L1920-L1924

### Root Cause:
The mask's type becomes TOP (unreachable) during compiler optimizations when the mask node is marked as dead before all its users are removed from the ideal graph. If `Ideal()` is subsequently called on a user node, it may access the TOP type, triggering the assertion.
Here is the simplified ideal graph showing the crash scenario:

    Con #top
       |      ConI
        \     /
         \   /
     VectorStoreMask
           |
     VectorMaskToLong # !jvms: IntMaxVector$IntMaxMask::toLong

### Detailed Scenario:
The following method in the test case hits the assertion:
https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L65-L70

This method accepts a `VectorSpecies` parameter and calls the vector APIs `VectorMask.fromLong()` and `toLong()`. It is called with species ranging from `ByteVector.SPECIES_MAX` to `DoubleVector.SPECIES_MAX`. During compilation, C2 speculatively generates fast paths for `toLong()` for all possible species. When compiling a specific test case such as:
https://github.com/openjdk/jdk/blob/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L177-L179
the compiler inlines the method and attempts to optimize away unreachable branches. The following graph shows the situation before the mask becomes `TOP`:

    VectorBox # DoubleMaxMask, generated by VectorMask.fromLong()
     |     \
     |      AddP
     |       |
     |   LoadNClass
     |       |
     |  DecodeNClass    ConP #IntMaxMask
     |        \          /
     |         \        /
     |          \      /
     |            CmpP
     |             |
     |          Bool #ne
     |             |
     |             If
     |             |
     |          IfFalse
     |           /
     |          /
     |         /
    CheckCastPP # IntMaxMask
     |
    VectorUnbox # Start of inlining IntMaxMask::toLong()
     |
     |    ConI
      \   /
    VectorStoreMask
     |
    VectorMaskToLong

The generated mask (`VectorBox`) is a `DoubleMaxMask`, but the code path expects an `IntMaxMask` for `IntMaxMask::toLong()`. Since this is an unreachable branch, the control input of `CheckCastPP` becomes `TOP` during IGVN, propagating the `TOP` type to subsequent data nodes until reaching `VectorStoreMask`. `VectorStoreMask` has another non-TOP input (`ConI`), which stops further `TOP` propagation. With stress VM options, the IGVN worklist order is shuffled, causing `VectorMaskToLongNode::Ideal()` to be invoked before dead path cleanup completes, which triggers the assertion failure.
### Solution:
Replace `is_vect()` with the safer `isa_vect()`, which checks whether the type is a vector type before casting and returns `nullptr` if it is not. Additionally, check for `nullptr` and skip the optimization if the type check fails.

An alternative solution would be to detect `top` inputs during IGVN for the relevant vector nodes and skip certain optimizations when such inputs are encountered. That is probably the right long-term direction. However, because this handling is currently missing for all vector nodes, I'd like to leave it as a separate follow-up topic for discussion.

### Testing:
Ran the test 800 times on SVE/NEON/AVX2 systems with no failures observed.

Note that no new test case was added because the issue is very difficult to reproduce reliably: it depends on a specific IGVN optimization sequence that occurs non-deterministically due to the worklist shuffling behavior under stress VM options.

[1] https://bugs.openjdk.org/browse/JDK-8367292
------------- Commit messages: - 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector Changes: https://git.openjdk.org/jdk/pull/29057/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29057&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374043 Stats: 12 lines in 2 files changed: 7 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/29057.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29057/head:pull/29057 PR: https://git.openjdk.org/jdk/pull/29057 From chagedorn at openjdk.org Tue Jan 6 10:27:30 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 Jan 2026 10:27:30 GMT Subject: Integrated: 8374518: C1: Remove dead LinearScanStatistic::Counter::counter_fpu_stack In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 13:28:41 GMT, Christian Hagedorn wrote: > The usages of `LinearScanStatistic::Counter::counter_fpu_stack` were removed with [JDK-8351156](https://bugs.openjdk.org/browse/JDK-8351156) when the FPU stack
support was removed. However, the definition in the `Counter` enum was not cleaned up. When printing the counters with `-XX:+CountLinearScan -XX:+CITime`, we now reach a `ShouldNotReach` in `LinearScanStatistic::counter_name()` for `Counter::counter_fpu_stack` because there is no longer an entry there - it's dead. > > I added a simple hello world sanity test that runs with `-XX:+CountLinearScan -XX:+CITime`. > > Thanks, > Christian This pull request has now been integrated. Changeset: 938bbd5b Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/938bbd5b604e990514b64a0451ed1bceb07eb23b Stats: 39 lines in 2 files changed: 37 ins; 1 del; 1 mod 8374518: C1: Remove dead LinearScanStatistic::Counter::counter_fpu_stack Reviewed-by: thartmann, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/29037 From epeter at openjdk.org Tue Jan 6 10:30:27 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 Jan 2026 10:30:27 GMT Subject: RFR: 8374528: C2 SuperWord: TestAliasingFuzzer.java strengthen no-multiversioning IR rule Message-ID: Now that some of the related issues are fixed, we should strengthen the IR rule for multiversioning. I ran the fuzzer test many times on many platforms. That is not proof that we won't find a failing case in the CI within a week. But it also is not helpful to run more tests before integration. If there are failures, we can just backout this change and address the broken example at that point.
------------- Commit messages: - JDK-8374528 Changes: https://git.openjdk.org/jdk/pull/29036/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29036&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374528 Stats: 30 lines in 1 file changed: 0 ins; 15 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/29036.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29036/head:pull/29036 PR: https://git.openjdk.org/jdk/pull/29036 From chagedorn at openjdk.org Tue Jan 6 10:39:45 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 Jan 2026 10:39:45 GMT Subject: RFR: 8374528: C2 SuperWord: TestAliasingFuzzer.java strengthen no-multiversioning IR rule In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 13:09:36 GMT, Emanuel Peter wrote: > Now that some of the related issues are fixed, we should strengthen the IR rule for multiversioning. > > I ran the fuzzer test many times on many platforms. That is not proof that we won't find a failing case in the CI within a week. But it also is not helpful to run more tests before integration. If there are failures, we can just backout this change and address the broken example at that point. Looks good! > But it also is not helpful to run more tests before integration. If there are failures, we can just backout this change and address the broken example at that point. Absolutely, I agree with that. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29036#pullrequestreview-3630390469 From krk at openjdk.org Tue Jan 6 11:42:15 2026 From: krk at openjdk.org (Kerem Kat) Date: Tue, 6 Jan 2026 11:42:15 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v7] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 14:32:01 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. 
> > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - address comments > - fix rename > - rename test file > - Merge branch 'master' into fix-c2-segfault-unlocknode > - fix test spacing > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - ... and 2 more: https://git.openjdk.org/jdk/compare/a429b9dc...64e3dc5d Merged from master to take https://bugs.openjdk.org/browse/JDK-8374507 ------------- PR Comment: https://git.openjdk.org/jdk/pull/28432#issuecomment-3714385051 From krk at openjdk.org Tue Jan 6 11:42:10 2026 From: krk at openjdk.org (Kerem Kat) Date: Tue, 6 Jan 2026 11:42:10 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v8] In-Reply-To: References: Message-ID: > Do not try to replace `fallthrough_memproj` when it is null, fixes crash. > > Test case is simplified from the ticket. Verified that the case crashes without the fix. Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 13 additional commits since the last revision: - Merge branch 'master' into fix-c2-segfault-unlocknode - Merge branch 'master' into fix-c2-segfault-unlocknode - Merge branch 'master' into fix-c2-segfault-unlocknode - Merge branch 'master' into fix-c2-segfault-unlocknode - address comments - fix rename - rename test file - Merge branch 'master' into fix-c2-segfault-unlocknode - fix test spacing - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Manuel Hässig - ... and 3 more: https://git.openjdk.org/jdk/compare/3586f365...8713f16d ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28432/files - new: https://git.openjdk.org/jdk/pull/28432/files/64e3dc5d..8713f16d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=06-07 Stats: 1445 lines in 220 files changed: 392 ins; 709 del; 344 mod Patch: https://git.openjdk.org/jdk/pull/28432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28432/head:pull/28432 PR: https://git.openjdk.org/jdk/pull/28432 From dzhang at openjdk.org Tue Jan 6 12:52:24 2026 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 6 Jan 2026 12:52:24 GMT Subject: Integrated: 8374525: RISC-V: Several masked float16 vector operations are not supported In-Reply-To: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> References: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> Message-ID: On Mon, 5 Jan 2026 12:32:13 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > Currently, the masked versions of the following 8 Float16 operations are not supported. > But we return true in `Matcher::match_rule_supported_vector_masked` for these operations > on RISC-V platforms with Zvfh. We need to explicitly disable them on this CPU platform > to make it clear.
> > Op_AddVHF: > Op_SubVHF: > Op_MulVHF: > Op_DivVHF: > Op_MaxVHF: > Op_MinVHF: > Op_SqrtVHF: > Op_FmaVHF: > > When the support for Float16 vector classes is added in VectorAPI and the masked > Float16 IR can be generated, these masked operations will be enabled and relevant > backend support added. This pull request has now been integrated. Changeset: df5b49e6 Author: Dingli Zhang URL: https://git.openjdk.org/jdk/commit/df5b49e604d3204c6383484ba3807d39abd0b0f1 Stats: 19 lines in 1 file changed: 17 ins; 2 del; 0 mod 8374525: RISC-V: Several masked float16 vector operations are not supported Reviewed-by: fjiang, fyang ------------- PR: https://git.openjdk.org/jdk/pull/29035 From dzhang at openjdk.org Tue Jan 6 12:52:24 2026 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 6 Jan 2026 12:52:24 GMT Subject: RFR: 8374525: RISC-V: Several masked float16 vector operations are not supported In-Reply-To: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> References: <9nCd1-IrGcIIyH0yZSiJ_JOjoESPzG_QaKnJ6VKE53Q=.0c473fa1-b18d-4fd7-a7de-39c2496df156@github.com> Message-ID: On Mon, 5 Jan 2026 12:32:13 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > Currently, the masked versions of the following 8 Float16 operations are not supported. > But we return true in `Matcher::match_rule_supported_vector_masked` for these operations > on RISC-V platforms with Zvfh. We need to explicitly disable them on this CPU platform > to make it clear. > > Op_AddVHF: > Op_SubVHF: > Op_MulVHF: > Op_DivVHF: > Op_MaxVHF: > Op_MinVHF: > Op_SqrtVHF: > Op_FmaVHF: > > When the support for Float16 vector classes is added in VectorAPI and the masked > Float16 IR can be generated, these masked operations will be enabled and relevant > backend support added. Thanks all for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29035#issuecomment-3714580978 From thartmann at openjdk.org Tue Jan 6 14:00:37 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 6 Jan 2026 14:00:37 GMT Subject: RFR: 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs In-Reply-To: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> References: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> Message-ID: <4IzjJa5BUpNHSmMmUoafC3uyv0COPwPKIfFEYO_NnOE=.4a286d36-53db-4a9f-bec0-6b1fb1ef8503@github.com> On Mon, 5 Jan 2026 07:37:34 GMT, Emanuel Peter wrote: > In `VLoopMemorySlices::find_memory_slices`, we analyze the memory slices. In some cases, we only find loads in the slice, and no phi. So the memory input of the loads comes from before the loop. When I refactored the code, I made the assumption that all loads should have the same memory input. After all: any store before the loop would have to have happened before we enter the loop, and execute any loads from the loop. The assumption held for a long time, but now we have a counter example. > > Summary: one load has its memory input optimized, the other is not put on the IGVN worklist again, and keeps the old memory input (even though in this case we could have optimized it just the same). Thus, both choices of memory input are correct, and the assumption of the assert does not hold. > > Solution: Just bail out of auto vectorization if this assumption is violated. This is an edge case, and the assert has not been hit until the fuzzer found this example. > Alternatives: we could track the multiple memory inputs, but that would be more effort to implement, and hard to test because it is difficult to create examples. > > --------------------------- > > **Details** > > Below, look at `1145 LoadB` and `1131 LoadB`. 
One has memory input `Param 7` (initial program state), the other `711 Phi` (outer loop). Both loads are inside the `1147 CountedLoop`. But their states come from outside, both originally from `711 Phi`. But then `1145 LoadB` is optimized with `LoadBNode::Ideal` -> `LoadNode::Ideal` -> `LoadNode::split_through_phi`: it realizes that the backedge of the `711 Phi` only goes by the `593 CallStaticJava`, which cannot modify the `Byte::value` field of the `LoadB` (unless it was to use reflection, but that unlocks undefined behavior anyway, so it can be ignored). So it is ok to split through the phi, as the `Byte::value` cannot be modified during the outer loop. > > image > > `1131 LoadB` could also do the same optimization, but it just does not end up on the IGVN worklist. The issue is that we don't have any adequate notification that goes down through the `MergeMem - Call - Proj` structure. We did not want to have that until now, because in theory we could have a long series of calls, and the traversals could become too expensive. Looks good to me too. Great that we have a regression test for this rare case now. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29028#pullrequestreview-3630997120 From thartmann at openjdk.org Tue Jan 6 14:20:09 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 6 Jan 2026 14:20:09 GMT Subject: RFR: 8374307: Fix deoptimization storm caused by Action_none in GraphKit::uncommon_trap In-Reply-To: References: Message-ID: <0ZP63PHTbXgTcztA8wpWQd3Zj7YzLkOW9udimgYmSTs=.94d58938-cd00-4e85-80c1-2ca8b610afac@github.com> On Tue, 23 Dec 2025 18:16:33 GMT, Boris Ulasevich wrote: > We observed a deoptimization storm caused by GraphKit::uncommon_trap generator logic. GraphKit::uncommon_trap considers the too_many_recompiles metric. If the threshold is overflowed, it replaces Deoptimization::Action_reinterpret with Deoptimization::Action_none (see code snippet below). 
> > This replacement changes the uncommon_trap logic: once execution hits a trap, the VM performs deoptimization but does not recompile the method anymore. In an "unlucky" case, when the code path containing this uncommon_trap becomes frequently executed, a deoptimization storm occurs (thousands of deoptimizations per second), causing a significant performance drop. > > The original problematic method, which triggered repeated recompilations, is a high-performance compressed binary serialization algorithm with heavy use of conditional branches driven by bitmasks. See a standalone [synthetic benchmark](https://bugs.openjdk.org/secure/attachment/118045/UnstableIf.java) to reproduce the issue. > > The issue arises when the method exceeds a global recompilation threshold before the specific trap counters stabilize. > > Current thresholds: > - Recompilation Limit (too_many_recompiles): > Condition: decompile_count() >= (PerMethodRecompilationCutoff / 2) + 1 > Default: 201 (derived from default PerMethodRecompilationCutoff = 400). > - Specific Trap Limits (too_many_traps): > Checks if the trap count for a specific reason exceeds: > PerMethodTrapLimit (Default: 100) - for Reason_unstable_if, Reason_unstable_fused_if, etc. > PerMethodSpecTrapLimit (Default: 5000) - for Reason_speculate_class_check, Reason_speculate_null_check, etc. > > With the given defaults, if the only reason for the method recompilation is unstable_if, the system stabilizes after 100 traps (PerMethodTrapLimit). However, if the method experiences traps and recompilations for different reasons, the total number of recompilations can exceed 200 before hitting the limit for unstable_if traps. This triggers Action_none and causes the deopt storm. > > The proposal is a minimal change in GraphKit::uncommon_trap: apply the same `too_many_recompiles` threshold inside `Parse::path_is_suitable_for_uncommon_trap` - this ensures that on the final recompilation C2 gets a hint not to speculate on untaken branches anymore.
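The threshold arithmetic quoted above can be checked in a few lines. A minimal sketch that only mirrors the quoted condition `decompile_count() >= (PerMethodRecompilationCutoff / 2) + 1` and the quoted defaults; the names are hypothetical, this is not HotSpot code, and the example history assumes, purely for illustration, one recompilation per trap:

```java
/** Hypothetical sketch of the recompilation threshold quoted in the description (not HotSpot code). */
final class RecompileThresholdSketch {
    static final int PER_METHOD_RECOMPILATION_CUTOFF = 400; // quoted default
    static final int PER_METHOD_TRAP_LIMIT = 100;           // quoted default for Reason_unstable_if

    /** Mirrors: decompile_count() >= (PerMethodRecompilationCutoff / 2) + 1 */
    static boolean tooManyRecompiles(int decompileCount, int cutoff) {
        return decompileCount >= cutoff / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println("recompile limit = " + (PER_METHOD_RECOMPILATION_CUTOFF / 2 + 1));
        // Illustrative history: 90 unstable_if-driven recompiles (still below the trap limit)
        // plus 111 caused by other trap reasons.
        int unstableIf = 90;
        int otherReasons = 111;
        System.out.println("unstable_if below its trap limit: " + (unstableIf < PER_METHOD_TRAP_LIMIT));
        System.out.println("too many recompiles: "
                + tooManyRecompiles(unstableIf + otherReasons, PER_METHOD_RECOMPILATION_CUTOFF));
    }
}
```

In this illustrative history the global recompilation limit is reached while the unstable_if trap count is still below `PerMethodTrapLimit`, which is the situation the description says leads to `Action_none`.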
> > As an alternative solution, we can revisit GraphKit::uncommon_trap. This "Temporary fix" has persisted in the codebase for 17 years, so it is probably time to change it as well. Any comments are welcome. > > case Deoptimization::Action_reinter... Is this related to [JDK-8243615](https://bugs.openjdk.org/browse/JDK-8243615)? Could you convert your `UnstableIf.java` test to a jtreg test? Maybe by running in a different process and counting the number of deoptimization events? [JDK-8243615](https://bugs.openjdk.org/browse/JDK-8243615) also has a test attached. ------------- PR Review: https://git.openjdk.org/jdk/pull/28966#pullrequestreview-3631065370 From thartmann at openjdk.org Tue Jan 6 14:45:39 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 6 Jan 2026 14:45:39 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v11] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 21:50:27 GMT, Tobias Hotz wrote: >> This PR improves the Value() of integer division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Simplify test, add temporary @IR rule for testLongRange and improve comments Thank you!
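The corner-based idea described for 8364766 can be sketched on plain `int` ranges. This is a hypothetical illustration, not the HotSpot `DivINode::Value()` code: it evaluates the four corners in `long` arithmetic, splits a divisor range that crosses zero into its negative and positive parts (a zero divisor throws, so it contributes no value), and handles the `MIN_VALUE / -1` overflow by conservatively widening to the full `int` range, which may be less precise than the patch. Evaluating only the corners is sufficient because, once the divisor's sign is fixed, truncating division is monotonic in each operand separately.

```java
/** Hypothetical sketch: bound x / y for x in [xlo, xhi] and y in [ylo, yhi] under Java int semantics. */
final class DivRangeSketch {

    /**
     * Returns {lo, hi} containing every x / y with y != 0, or null if the divisor
     * range is exactly [0, 0] (every division would throw ArithmeticException).
     */
    static int[] divRange(int xlo, int xhi, int ylo, int yhi) {
        long lo = Long.MAX_VALUE;
        long hi = Long.MIN_VALUE;
        if (ylo <= -1) { // negative part of the divisor range
            long[] r = corners(xlo, xhi, ylo, Math.min(yhi, -1));
            lo = Math.min(lo, r[0]);
            hi = Math.max(hi, r[1]);
        }
        if (yhi >= 1) { // positive part of the divisor range
            long[] r = corners(xlo, xhi, Math.max(ylo, 1), yhi);
            lo = Math.min(lo, r[0]);
            hi = Math.max(hi, r[1]);
        }
        if (lo > hi) {
            return null;
        }
        if (hi > Integer.MAX_VALUE) {
            // Only MIN_VALUE / -1 (2^31 in long arithmetic) gets here; as an int it wraps
            // to MIN_VALUE. Conservatively widen to the full int range.
            return new int[] {Integer.MIN_VALUE, Integer.MAX_VALUE};
        }
        return new int[] {(int) lo, (int) hi};
    }

    /** Min and max of x / y over the four corners; [ylo, yhi] must not contain 0. */
    private static long[] corners(long xlo, long xhi, long ylo, long yhi) {
        long a = xlo / ylo, b = xlo / yhi, c = xhi / ylo, d = xhi / yhi;
        return new long[] {
            Math.min(Math.min(a, b), Math.min(c, d)),
            Math.max(Math.max(a, b), Math.max(c, d))
        };
    }

    public static void main(String[] args) {
        int[] r = divRange(-7, 5, -3, 2); // divisor range crosses zero
        System.out.println("[" + r[0] + ", " + r[1] + "]");
    }
}
```

For x in [-7, 5] and y in [-3, 2] the sketch yields [-7, 7] (from -7 / 1 and -7 / -1), which matches a brute-force enumeration of all non-zero divisors.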
------------- PR Comment: https://git.openjdk.org/jdk/pull/26143#issuecomment-3714967557 From thartmann at openjdk.org Tue Jan 6 14:45:44 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 6 Jan 2026 14:45:44 GMT Subject: RFR: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 18:37:54 GMT, Tobias Hotz wrote: > This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 > I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. > The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. > I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero Looks good to me too. Thanks for quickly fixing this. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29045#pullrequestreview-3631165318 From epeter at openjdk.org Tue Jan 6 15:37:20 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 Jan 2026 15:37:20 GMT Subject: RFR: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 18:37:54 GMT, Tobias Hotz wrote: > This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 > I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. > The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. 
> I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero test/hotspot/jtreg/compiler/igvn/IntegerDivValueTests.java line 207: > 205: @Run(test = {"testIntConstantFolding", "testIntConstantFoldingSpecialCase"}) > 206: public void checkIntConstants(RunInfo info) { > 207: if (INT_CONST_2 == 0) { Since you are working on this: Could `testIntRandomLimits` not also have a division by zero exception? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29045#discussion_r2665332193 From dfenacci at openjdk.org Tue Jan 6 16:21:14 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 6 Jan 2026 16:21:14 GMT Subject: RFR: 8342772: Assert in LateInlineMHCallGenerator::do_late_inline_check Message-ID: # Issue The assertion https://github.com/openjdk/jdk/blob/1c16d8a900928ebfa5e2343cf33312539509e815/src/hotspot/share/opto/callGenerator.cpp#L421 in `LateInlineMHCallGenerator::do_late_inline_check` fails while running a 24h-long Renaissance benchmark test. # Cause The assert failure is due to both `!cg->is_late_inline()` and `cg->is_mh_late_inline()` being false because the CallGenerator `cg` is of type `LateInlineVirtualCallGenerator`. `cg` is created just above by calling `for_method_handle_inline`. The only way for `cg` to be of type `LateInlineVirtualCallGenerator` is that `for_method_handle_inline` (which finds out that the intrinsic id is `vmIntrinsics::_linkToInterface`) calls `optimize_virtual_call`, which apparently cannot devirtualize the call, and then calls `call_generator` and this creates and returns a `LateInlineVirtualCallGenerator` (at the end of the method). # Fix There seem to be no apparent reason why the CallGenerator returned by `for_method_handle_inline` couldn't be of type `LateInlineVirtualCallGenerator`. So the sensible fix is to relax the assert to accommodate this type of call generator. # Testing Tier 1-3+ Failing test before and after. 
Unfortunately it has proven impractical to create a specific test that consistently (or even only intermittently) reproduces the issue. ------------- Commit messages: - JDK-8342772: update copyright year - Merge branch 'master' into JDK-8342772 - JDK-8342772: new line - JDK-8342772: Assert in LateInlineMHCallGenerator::do_late_inline_check Changes: https://git.openjdk.org/jdk/pull/28793/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28793&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342772 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28793.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28793/head:pull/28793 PR: https://git.openjdk.org/jdk/pull/28793 From sviswanathan at openjdk.org Tue Jan 6 16:54:50 2026 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 6 Jan 2026 16:54:50 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v5] In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 04:16:18 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/x86.ad line 14000: >> >>> 13998: match(Set dst (OrL src1 (LoadL src2))); >>> 13999: effect(KILL cr); >>> 14000: flag(PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_parity_flag, PD::Flag_clears_overflow_flag, PD::Flag_clears_carry_flag, PD::Flag_ndd_demotable_opr1, PD::Flag_ndd_demotable_opr2); >> >> Remove PD::Flag_ndd_demotable_opr2 as the second operand is a memory operand. > > We already have a check for [memory operands (mapping to multiple input edges](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L2664)) in place, ADLC generates DFA for both direct and flipped versions of the memory patterns. So Flag_ndd_demotable_opr2 will cover the flipped operand case. Thanks for the clarification. 
Maybe we should then add PD::Flag_ndd_demotable_opr2 to the following as well to be consistent: xorI_rReg_rReg_mem_ndd orI_rReg_rReg_mem_ndd mulI_rReg_rReg_mem_ndd ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2665582938 From qamai at openjdk.org Tue Jan 6 16:57:18 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 6 Jan 2026 16:57:18 GMT Subject: RFR: 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs In-Reply-To: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> References: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> Message-ID: On Mon, 5 Jan 2026 07:37:34 GMT, Emanuel Peter wrote: > In `VLoopMemorySlices::find_memory_slices`, we analyze the memory slices. In some cases, we only find loads in the slice, and no phi. So the memory input of the loads comes from before the loop. When I refactored the code, I made the assumption that all loads should have the same memory input. After all: any store before the loop would have to have happened before we enter the loop, and execute any loads from the loop. The assumption held for a long time, but now we have a counterexample. > > Summary: one load has its memory input optimized, the other is not put on the IGVN worklist again, and keeps the old memory input (even though in this case we could have optimized it just the same). Thus, both choices of memory input are correct, and the assumption of the assert does not hold. > > Solution: Just bail out of auto vectorization if this assumption is violated. This is an edge case, and the assert has not been hit until the fuzzer found this example. > Alternatives: we could track the multiple memory inputs, but that would be more effort to implement, and hard to test because it is difficult to create examples.
> > --------------------------- > > **Details** > > Below, look at `1145 LoadB` and `1131 LoadB`. One has memory input `Param 7` (initial program state), the other `711 Phi` (outer loop). Both loads are inside the `1147 CountedLoop`. But their states come from outside, both originally from `711 Phi`. But then `1145 LoadB` is optimized with `LoadBNode::Ideal` -> `LoadNode::Ideal` -> `LoadNode::split_through_phi`: it realizes that the backedge of the `711 Phi` only goes by the `593 CallStaticJava`, which cannot modify the `Byte::value` field of the `LoadB` (unless it was to use reflection, but that unlocks undefined behavior anyway, so it can be ignored). So it is ok to split through the phi, as the `Byte::value` cannot be modified during the outer loop. > > image > > `1131 LoadB` could also do the same optimization, but it just does not end up on the IGVN worklist. The issue is that we don't have any adequate notification that goes down through the `MergeMem - Call - Proj` structure. We did not want to have that until now, because in theory we could have a long series of calls, and the traversals could become too expensive. Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29028#pullrequestreview-3631677709 From duke at openjdk.org Tue Jan 6 18:30:06 2026 From: duke at openjdk.org (Tobias Hotz) Date: Tue, 6 Jan 2026 18:30:06 GMT Subject: RFR: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 15:34:06 GMT, Emanuel Peter wrote: >> This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 >> I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. >> The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. 
>> I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero > > test/hotspot/jtreg/compiler/igvn/IntegerDivValueTests.java line 207: > >> 205: @Run(test = {"testIntConstantFolding", "testIntConstantFoldingSpecialCase"}) >> 206: public void checkIntConstants(RunInfo info) { >> 207: if (INT_CONST_2 == 0) { > > Since you are working on this: Could `testIntRandomLimits` not also have a division by zero exception? Yes, it could! But this case is already covered in https://github.com/openjdk/jdk/pull/29045/files#diff-6f6b705b394c4ecdf97f05cfa5b4bd12cbac18a60a95a1ec78c943d5055a0f80R501 (the code is a bit more complex because, due to the clamping, we can't just check a single value). I just forgot this case because the initial version of this test did not have random constants. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29045#discussion_r2665875848 From vlivanov at openjdk.org Tue Jan 6 22:18:29 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 6 Jan 2026 22:18:29 GMT Subject: RFR: 8342772: Assert in LateInlineMHCallGenerator::do_late_inline_check In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 14:39:00 GMT, Damon Fenacci wrote: > # Issue > The assertion > https://github.com/openjdk/jdk/blob/1c16d8a900928ebfa5e2343cf33312539509e815/src/hotspot/share/opto/callGenerator.cpp#L421 > in `LateInlineMHCallGenerator::do_late_inline_check` fails while running a 24h-long Renaissance benchmark test. > > # Cause > The assert failure is due to both `!cg->is_late_inline()` and `cg->is_mh_late_inline()` being false because the CallGenerator `cg` is of type `LateInlineVirtualCallGenerator`. `cg` is created just above by calling `for_method_handle_inline`.
The only way for `cg` to be of type `LateInlineVirtualCallGenerator` is that `for_method_handle_inline` (which finds out that the intrinsic id is `vmIntrinsics::_linkToInterface`) calls `optimize_virtual_call`, which apparently cannot devirtualize the call, and then calls `call_generator` and this creates and returns a `LateInlineVirtualCallGenerator` (at the end of the method). > > # Fix > There seem to be no apparent reason why the CallGenerator returned by `for_method_handle_inline` couldn't be of type `LateInlineVirtualCallGenerator`. So the sensible fix is to relax the assert to accommodate this type of call generator. > > # Testing > Tier 1-3+ > Failing test before and after. > > Unfortunately it has proven impractical to create a specific test that consistently (or even only intermittently) reproduces the issue. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28793#pullrequestreview-3632709444 From jkarthikeyan at openjdk.org Tue Jan 6 23:48:09 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 6 Jan 2026 23:48:09 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v4] In-Reply-To: References: Message-ID: > Hi all, > This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Update copyright year - Merge branch 'master' into jdk-8365570 - Remove CompLevel.C2 from test - Merge branch 'master' into jdk-8365570 - Update comment for constraint casts - Fix truncation assert for constraint casts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26827/files - new: https://git.openjdk.org/jdk/pull/26827/files/f433930e..ebe5a1d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=02-03 Stats: 75767 lines in 3383 files changed: 40560 ins; 15067 del; 20140 mod Patch: https://git.openjdk.org/jdk/pull/26827.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26827/head:pull/26827 PR: https://git.openjdk.org/jdk/pull/26827 From jkarthikeyan at openjdk.org Tue Jan 6 23:53:18 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 6 Jan 2026 23:53:18 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v5] In-Reply-To: References: Message-ID: <3L4a5JCekql7Q6DsQiS886cJUT_A3VOudur5kqwXDko=.f99d0326-c9fa-4156-8cae-2cd54ae1b3d0@github.com> > Hi all, > This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! 
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Use Xcomp test run instead of Warmup(0) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26827/files - new: https://git.openjdk.org/jdk/pull/26827/files/ebe5a1d1..50bc1326 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=03-04 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26827.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26827/head:pull/26827 PR: https://git.openjdk.org/jdk/pull/26827 From jkarthikeyan at openjdk.org Tue Jan 6 23:53:20 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 6 Jan 2026 23:53:20 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 08:46:37 GMT, Tobias Hartmann wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Update comment for constraint casts > > Sounds good, thanks for the update! @TobiHartmann @chhagedorn May I have some reviews on the updated patch? Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26827#issuecomment-3716759982 From jkarthikeyan at openjdk.org Tue Jan 6 23:53:21 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 6 Jan 2026 23:53:21 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: References: <1z430wmE_HRTJqmLIC15VMUktLyUEE7qjkppr1GniAI=.e560a4e9-59f0-4013-ad65-5d7261cdbf0e@github.com> <8jnY6pqofieRIfV5fCqFxvHZMF3nAZbh7yAD7C_G5FU=.a12c98f6-c715-43f5-9528-62fcfdfc6e59@github.com> Message-ID: On Sat, 6 Dec 2025 20:24:28 GMT, Jasmine Karthikeyan wrote: >> @jaskarth Thanks for looking into it! 
>> >> I would still add the fix, just in case. And I think the test as well, even if it does not reproduce any more. >> >> I was wondering: before the merge, when the test still reproduced: >> If you removed the `@Warmup(0)` and `CompLevel.C2`, and instead just do `framework.addFlags` with `-Xcomp`, would that reproduce too? If so, you could have a framework run with and one without Xcomp, the one with Xcomp also should have a compileonly. What do you think? >> >> Or we just push the patch as is, to be sure this is done and integrated. What do you think @chhagedorn ? > > Yep, I can replicate the crash on the old commit with `TestFramework.runWithFlags("-Xcomp", "-XX:CompileCommand=compileonly,*TestSubwordTruncation::*");` instead of `@Warmup(0)`. I think this would also be a good option, as it would let you get coverage with Xcomp on the other tests as well. I've pushed a commit that changes the Warmup(0) to the second test run. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2666621951 From vpaprotski at openjdk.org Wed Jan 7 00:22:35 2026 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 7 Jan 2026 00:22:35 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2] In-Reply-To: References: Message-ID: On Sat, 3 Jan 2026 00:23:13 GMT, Shawn M Emery wrote: >> This change allows use of the AVX512_VBMI/VMBI2 instruction set to further optimize decompression/parsing of polynomial coefficients for ML-KEM. The speedup gained in the ML-KEM benchmarks for key generation is between 0.2 to 0.5%, encapsulation is 0.3 to 1.5%, and decapsulation is 0 to 0.9%. >> >> Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me. > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright year "Insert 0b0000 nibble after every third nibble". 
I only have two questions, looks good otherwise. PS: things I've considered: - Loop controls? - ML_KEM.java guarantees (per callee comment and assert) lengths are multiple of 64 - also same as original code - Why not simply a vpermb? Have zeroes already from the masked load with k1.. - shuffle granularity is actually 4-bits, not 8-bits - logical shift already zeroes top bits, so `vpand` not required? - odd columns not shifted, so still have extra bits that need clearing - Why VBMI? - needed for `evpermb` src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 862: > 860: __ addptr(condensed, condensedOffs); > 861: > 862: if (VM_Version::supports_avx512_vbmi2()) { Which instruction needs vbmi2? All I could spot was that `evpermb` that needs vbmi. Relax the restriction slightly? src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 906: > 904: __ addptr(condensed, 192); > 905: __ addptr(parsed, 256); > 906: __ subl(parsedLength, 128); (128 instead of 256 here because `parsedLength` is an index to an `short` array..) I am confused by the stride. The `twelve2Sixteen()` seems to (almost) guarantee that the parsed length is a multiple of 64 (last block can be 48 bytes). This would imply a stride of 128 bytes for `parsed`. And 96 for `condensed`. This is exactly how the existing code already behaves so I am less concerned, but I would like an explanation why it works? 
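For reference while reading the stride discussion, here is a scalar Java sketch of the 12-to-16-bit unpacking that `twelve2Sixteen()` performs: every 3 condensed bytes hold two 12-bit coefficients, which is the "insert a zero nibble after every third nibble" view. The class name, method name and plain-array signature are hypothetical; this is not the code in `ML_KEM.java` or the stub. It also checks the per-iteration ratio used by the loop quoted above: 192 condensed bytes expand to 128 `short` coefficients, i.e. 256 bytes of `parsed` output.

```java
/** Hypothetical scalar reference for 12-bit -> 16-bit coefficient unpacking (not the JDK code). */
public class Twelve2SixteenSketch {

    /** Unpacks condensed[cOff..] into parsed[0..parsedLength); assumes parsedLength is even. */
    static void twelve2Sixteen(byte[] condensed, int cOff, short[] parsed, int parsedLength) {
        for (int i = 0, j = cOff; i < parsedLength; i += 2, j += 3) {
            int b0 = condensed[j] & 0xFF;     // mask to treat bytes as unsigned
            int b1 = condensed[j + 1] & 0xFF;
            int b2 = condensed[j + 2] & 0xFF;
            parsed[i] = (short) (b0 | ((b1 & 0x0F) << 8));    // low nibble of b1 on top of b0
            parsed[i + 1] = (short) ((b1 >> 4) | (b2 << 4));  // high nibble of b1 below b2
        }
    }

    public static void main(String[] args) {
        byte[] in = {(byte) 0xAB, (byte) 0xCD, (byte) 0xEF};
        short[] out = new short[2];
        twelve2Sixteen(in, 0, out, 2);
        System.out.printf("%03X %03X%n", out[0], out[1]); // DAB EFC

        // Per-iteration ratio of the stub loop: 192 condensed bytes -> 128 shorts -> 256 bytes.
        int shorts = 192 * 8 / 12;
        System.out.println(shorts + " shorts = " + shorts * Short.BYTES + " bytes"); // 128 shorts = 256 bytes
    }
}
```

With that 3-bytes-to-2-coefficients ratio, each 192-byte step over `condensed` produces 128 `short` elements (256 bytes of `parsed`), which is half of a 256-coefficient ML-KEM polynomial.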
------------- PR Review: https://git.openjdk.org/jdk/pull/28815#pullrequestreview-3632845110 PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2666594767 PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2666663039 From jbhateja at openjdk.org Wed Jan 7 02:15:22 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 7 Jan 2026 02:15:22 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v6] In-Reply-To: References: Message-ID: > Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand, this leads to an failure during register biasing. Changing the NDD demotion flags names to encode explicit operand position i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2** , splitting commutative flag into seperate new flags and fine tuning assertion checks based on new naming convention fixes the issue. > > Failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch. > > Kindly review and share your feedback. > > Best Regards, > Jatin > PS: Validation performed using Intel SDE 9.58. 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28999/files - new: https://git.openjdk.org/jdk/pull/28999/files/29093665..de6b115c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=04-05 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28999.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28999/head:pull/28999 PR: https://git.openjdk.org/jdk/pull/28999 From jbhateja at openjdk.org Wed Jan 7 02:15:23 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 7 Jan 2026 02:15:23 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v5] In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 16:51:00 GMT, Sandhya Viswanathan wrote: >> We already have a check for [memory operands (mapping to multiple input edges](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L2664)) in place, ADLC generates DFA for both direct and flipped versions of the memory patterns. So Flag_ndd_demotable_opr2 will cover the flipped operand case. > > Thanks for the clarification. 
Maybe we should then add PD::Flag_ndd_demotable_opr2 to the following as well to be consistent: > xorI_rReg_rReg_mem_ndd > orI_rReg_rReg_mem_ndd > mulI_rReg_rReg_mem_ndd Thanks @sviswa7, comment addressed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28999#discussion_r2666833538 From duke at openjdk.org Wed Jan 7 06:22:48 2026 From: duke at openjdk.org (Shawn M Emery) Date: Wed, 7 Jan 2026 06:22:48 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2] In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 23:29:52 GMT, Volodymyr Paprotski wrote: >> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: >> >> Update copyright year > > src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 862: > >> 860: __ addptr(condensed, condensedOffs); >> 861: >> 862: if (VM_Version::supports_avx512_vbmi2()) { > > Which instruction needs vbmi2? All I could spot was that `evpermb` that needs vbmi. Relax the restriction slightly? Good catch! Initially the code was using `vpshldvw`, but was changed to just use `vpsrlvw`. Fixed in next commit. I should probably update the bug synopsis to exclude VBMI2? > src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 906: > >> 904: __ addptr(condensed, 192); >> 905: __ addptr(parsed, 256); >> 906: __ subl(parsedLength, 128); > > (128 instead of 256 here because `parsedLength` is an index to an `short` array..) > > I am confused by the stride. The `twelve2Sixteen()` seems to (almost) guarantee that the parsed length is a multiple of 64 (last block can be 48 bytes). This would imply a stride of 128 bytes for `parsed`. And 96 for `condensed`. > > This is exactly how the existing code already behaves so I am less concerned, but I would like an explanation why it works? I believe the numbers are right: with each pass 256 bytes of coefficients are `parsed` into the parse buffer.
This means that half of the coefficients have been processed (`parseLength` = 128). Would having a comment stating as such be sufficient for your concerns? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2667206396 PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2667206828 From chagedorn at openjdk.org Wed Jan 7 06:48:21 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 7 Jan 2026 06:48:21 GMT Subject: RFR: 8342772: Assert in LateInlineMHCallGenerator::do_late_inline_check In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 14:39:00 GMT, Damon Fenacci wrote: > # Issue > The assertion > https://github.com/openjdk/jdk/blob/1c16d8a900928ebfa5e2343cf33312539509e815/src/hotspot/share/opto/callGenerator.cpp#L421 > in `LateInlineMHCallGenerator::do_late_inline_check` fails while running a 24h-long Renaissance benchmark test. > > # Cause > The assert failure is due to both `!cg->is_late_inline()` and `cg->is_mh_late_inline()` being false because the CallGenerator `cg` is of type `LateInlineVirtualCallGenerator`. `cg` is created just above by calling `for_method_handle_inline`. The only way for `cg` to be of type `LateInlineVirtualCallGenerator` is that `for_method_handle_inline` (which finds out that the intrinsic id is `vmIntrinsics::_linkToInterface`) calls `optimize_virtual_call`, which apparently cannot devirtualize the call, and then calls `call_generator` and this creates and returns a `LateInlineVirtualCallGenerator` (at the end of the method). > > # Fix > There seem to be no apparent reason why the CallGenerator returned by `for_method_handle_inline` couldn't be of type `LateInlineVirtualCallGenerator`. So the sensible fix is to relax the assert to accommodate this type of call generator. > > # Testing > Tier 1-3+ > Failing test before and after. > > Unfortunately it has proven impractical to create a specific test that consistently (or even only intermittently) reproduces the issue. 
Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28793#pullrequestreview-3633559533 From chagedorn at openjdk.org Wed Jan 7 06:54:33 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 7 Jan 2026 06:54:33 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v5] In-Reply-To: <3L4a5JCekql7Q6DsQiS886cJUT_A3VOudur5kqwXDko=.f99d0326-c9fa-4156-8cae-2cd54ae1b3d0@github.com> References: <3L4a5JCekql7Q6DsQiS886cJUT_A3VOudur5kqwXDko=.f99d0326-c9fa-4156-8cae-2cd54ae1b3d0@github.com> Message-ID: On Tue, 6 Jan 2026 23:53:18 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Use Xcomp test run instead of Warmup(0) Update looks good! Thanks for coming back to this. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26827#pullrequestreview-3633565775 From thartmann at openjdk.org Wed Jan 7 06:54:34 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 7 Jan 2026 06:54:34 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v5] In-Reply-To: <3L4a5JCekql7Q6DsQiS886cJUT_A3VOudur5kqwXDko=.f99d0326-c9fa-4156-8cae-2cd54ae1b3d0@github.com> References: <3L4a5JCekql7Q6DsQiS886cJUT_A3VOudur5kqwXDko=.f99d0326-c9fa-4156-8cae-2cd54ae1b3d0@github.com> Message-ID: On Tue, 6 Jan 2026 23:53:18 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! 
> > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Use Xcomp test run instead of Warmup(0) Looks good to me too, thanks! I re-submitted some quick testing and will report back once it passed. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26827#pullrequestreview-3633571895 From fyang at openjdk.org Wed Jan 7 06:59:48 2026 From: fyang at openjdk.org (Fei Yang) Date: Wed, 7 Jan 2026 06:59:48 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v9] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 12:56:01 GMT, Jatin Bhateja wrote: >> Add a new Float16lVector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. >> - Add necessary inline expander support. >> - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. >> - Use existing Float16 vector IR and backend support. >> - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. >> >> The idea here is to first be at par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). >> >> The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. >> >> image >> >> Initial RFP[1] was floated on the panama-dev mailing list. >> >> Kindly review the draft PR and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 23 commits: > > - Fix from Bhavana Kilambi for failing JTREG regressions on AARCH64 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Including test changes from Bhavana Kilambi (ARM) > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Optimizing tail handling > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Cleanups > - Fix failing jtreg test in CI > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Cleanups > - ... and 13 more: https://git.openjdk.org/jdk/compare/5e7ae281...703f313d Hi, I have a minor question about the tests. test/jdk/jdk/incubator/vector/Float16Vector64Tests.java line 1893: > 1891: VectorMask m = three.compare(VectorOperators.LE, higher); > 1892: assert(m.allTrue()); > 1893: m = higher.min((short)-1).test(VectorOperators.IS_NEGATIVE); I find that `higher.min((short)-1)` produces a float16 vector of 4 NaNs. So are we testing for negative NaNs with `VectorOperators.IS_NEGATIVE`? Is it more reasonable to test `VectorOperators.IS_NAN` instead? ------------- PR Review: https://git.openjdk.org/jdk/pull/28002#pullrequestreview-3633583291 PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2667282723 From thartmann at openjdk.org Wed Jan 7 07:05:57 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 7 Jan 2026 07:05:57 GMT Subject: RFR: 8342772: Assert in LateInlineMHCallGenerator::do_late_inline_check In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 14:39:00 GMT, Damon Fenacci wrote: > # Issue > The assertion > https://github.com/openjdk/jdk/blob/1c16d8a900928ebfa5e2343cf33312539509e815/src/hotspot/share/opto/callGenerator.cpp#L421 > in `LateInlineMHCallGenerator::do_late_inline_check` fails while running a 24h-long Renaissance benchmark test. 
> > # Cause > The assert failure is due to both `!cg->is_late_inline()` and `cg->is_mh_late_inline()` being false because the CallGenerator `cg` is of type `LateInlineVirtualCallGenerator`. `cg` is created just above by calling `for_method_handle_inline`. The only way for `cg` to be of type `LateInlineVirtualCallGenerator` is that `for_method_handle_inline` (which finds out that the intrinsic id is `vmIntrinsics::_linkToInterface`) calls `optimize_virtual_call`, which apparently cannot devirtualize the call, and then calls `call_generator` and this creates and returns a `LateInlineVirtualCallGenerator` (at the end of the method). > > # Fix > There seem to be no apparent reason why the CallGenerator returned by `for_method_handle_inline` couldn't be of type `LateInlineVirtualCallGenerator`. So the sensible fix is to relax the assert to accommodate this type of call generator. > > # Testing > Tier 1-3+ > Failing test before and after. > > Unfortunately it has proven impractical to create a specific test that consistently (or even only intermittently) reproduces the issue. Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28793#pullrequestreview-3633598263 From chagedorn at openjdk.org Wed Jan 7 07:23:09 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 7 Jan 2026 07:23:09 GMT Subject: [jdk26] RFR: 8373524: C2: no reachable node should have no use In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 16:20:56 GMT, Roland Westrelin wrote: > Hi all, > > This pull request contains a backport of commit [e72f205a](https://github.com/openjdk/jdk/commit/e72f205ae312b15ebab0cbeedb73bbf86e485251) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 19 Dec 2025 and was reviewed by Christian Hagedorn and Manuel H?ssig. > > Thanks! Testing passed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29042#issuecomment-3717629017 From chagedorn at openjdk.org Wed Jan 7 07:23:08 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 7 Jan 2026 07:23:08 GMT Subject: [jdk26] RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 16:20:29 GMT, Roland Westrelin wrote: > Hi all, > > This pull request contains a backport of commit [2ba423db](https://github.com/openjdk/jdk/commit/2ba423db9925355348106fc9fcf84450123d2605) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 18 Dec 2025 and was reviewed by Roberto Castañeda Lozano, Daniel Lundén and Damon Fenacci. > > Thanks! Testing passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29041#issuecomment-3717629364 From dfenacci at openjdk.org Wed Jan 7 07:32:54 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 7 Jan 2026 07:32:54 GMT Subject: RFR: 8342772: Assert in LateInlineMHCallGenerator::do_late_inline_check In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 22:15:42 GMT, Vladimir Ivanov wrote: >> # Issue >> The assertion >> https://github.com/openjdk/jdk/blob/1c16d8a900928ebfa5e2343cf33312539509e815/src/hotspot/share/opto/callGenerator.cpp#L421 >> in `LateInlineMHCallGenerator::do_late_inline_check` fails while running a 24h-long Renaissance benchmark test. >> >> # Cause >> The assert failure is due to both `!cg->is_late_inline()` and `cg->is_mh_late_inline()` being false because the CallGenerator `cg` is of type `LateInlineVirtualCallGenerator`. `cg` is created just above by calling `for_method_handle_inline`.
The only way for `cg` to be of type `LateInlineVirtualCallGenerator` is that `for_method_handle_inline` (which finds out that the intrinsic id is `vmIntrinsics::_linkToInterface`) calls `optimize_virtual_call`, which apparently cannot devirtualize the call, and then calls `call_generator` and this creates and returns a `LateInlineVirtualCallGenerator` (at the end of the method). >> >> # Fix >> There seems to be no apparent reason why the CallGenerator returned by `for_method_handle_inline` couldn't be of type `LateInlineVirtualCallGenerator`. So the sensible fix is to relax the assert to accommodate this type of call generator. >> >> # Testing >> Tier 1-3+ >> Failing test before and after. >> >> Unfortunately it has proven impractical to create a specific test that consistently (or even only intermittently) reproduces the issue. > > Looks good. Thanks @iwanowww @TobiHartmann @chhagedorn for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28793#issuecomment-3717648402 From dfenacci at openjdk.org Wed Jan 7 07:32:55 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 7 Jan 2026 07:32:55 GMT Subject: Integrated: 8342772: Assert in LateInlineMHCallGenerator::do_late_inline_check In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 14:39:00 GMT, Damon Fenacci wrote: > # Issue > The assertion > https://github.com/openjdk/jdk/blob/1c16d8a900928ebfa5e2343cf33312539509e815/src/hotspot/share/opto/callGenerator.cpp#L421 > in `LateInlineMHCallGenerator::do_late_inline_check` fails while running a 24h-long Renaissance benchmark test. > > # Cause > The assert failure is due to both `!cg->is_late_inline()` and `cg->is_mh_late_inline()` being false because the CallGenerator `cg` is of type `LateInlineVirtualCallGenerator`. `cg` is created just above by calling `for_method_handle_inline`.
The only way for `cg` to be of type `LateInlineVirtualCallGenerator` is that `for_method_handle_inline` (which finds out that the intrinsic id is `vmIntrinsics::_linkToInterface`) calls `optimize_virtual_call`, which apparently cannot devirtualize the call, and then calls `call_generator` and this creates and returns a `LateInlineVirtualCallGenerator` (at the end of the method). > > # Fix > There seems to be no apparent reason why the CallGenerator returned by `for_method_handle_inline` couldn't be of type `LateInlineVirtualCallGenerator`. So the sensible fix is to relax the assert to accommodate this type of call generator. > > # Testing > Tier 1-3+ > Failing test before and after. > > Unfortunately it has proven impractical to create a specific test that consistently (or even only intermittently) reproduces the issue. This pull request has now been integrated. Changeset: c1c0ac87 Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/c1c0ac877033c3edb0c2681c2c5f825be8adcfb3 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod 8342772: Assert in LateInlineMHCallGenerator::do_late_inline_check Reviewed-by: vlivanov, chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/28793 From roland at openjdk.org Wed Jan 7 08:08:51 2026 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 7 Jan 2026 08:08:51 GMT Subject: [jdk26] RFR: 8373524: C2: no reachable node should have no use In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 07:19:34 GMT, Christian Hagedorn wrote: >> Hi all, >> >> This pull request contains a backport of commit [e72f205a](https://github.com/openjdk/jdk/commit/e72f205ae312b15ebab0cbeedb73bbf86e485251) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Roland Westrelin on 19 Dec 2025 and was reviewed by Christian Hagedorn and Manuel Hässig. >> >> Thanks! > > Testing passed.
@chhagedorn thanks for the review and testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/29042#issuecomment-3717741652 From roland at openjdk.org Wed Jan 7 08:08:53 2026 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 7 Jan 2026 08:08:53 GMT Subject: [jdk26] Integrated: 8373524: C2: no reachable node should have no use In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 16:20:56 GMT, Roland Westrelin wrote: > Hi all, > > This pull request contains a backport of commit [e72f205a](https://github.com/openjdk/jdk/commit/e72f205ae312b15ebab0cbeedb73bbf86e485251) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 19 Dec 2025 and was reviewed by Christian Hagedorn and Manuel Hässig. > > Thanks! This pull request has now been integrated. Changeset: ebe89745 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/ebe8974556296137b57f356db3a29df705755c56 Stats: 94 lines in 2 files changed: 91 ins; 0 del; 3 mod 8373524: C2: no reachable node should have no use Reviewed-by: chagedorn Backport-of: e72f205ae312b15ebab0cbeedb73bbf86e485251 ------------- PR: https://git.openjdk.org/jdk/pull/29042 From roland at openjdk.org Wed Jan 7 08:11:35 2026 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 7 Jan 2026 08:11:35 GMT Subject: [jdk26] RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis In-Reply-To: References: Message-ID: <1dQGxW_YqFILD8HsgAezgmQK0o_igjzmOL00WpbMrSA=.7442939c-26ea-4840-93c5-6f82c47344f1@github.com> On Wed, 7 Jan 2026 07:19:44 GMT, Christian Hagedorn wrote: >> Hi all, >> >> This pull request contains a backport of commit [2ba423db](https://github.com/openjdk/jdk/commit/2ba423db9925355348106fc9fcf84450123d2605) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
>> >> The commit being backported was authored by Roland Westrelin on 18 Dec 2025 and was reviewed by Roberto Castañeda Lozano, Daniel Lundén and Damon Fenacci. >> >> Thanks! > > Testing passed. @chhagedorn thanks for the review and testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/29041#issuecomment-3717743505 From roland at openjdk.org Wed Jan 7 08:11:37 2026 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 7 Jan 2026 08:11:37 GMT Subject: [jdk26] Integrated: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis In-Reply-To: References: Message-ID: <1QwWQBXomjV6PcMAbtI-3ercEJA_yndvjfJyRjZIOcI=.f55919b8-dd96-461d-91d2-bd41b16c9860@github.com> On Mon, 5 Jan 2026 16:20:29 GMT, Roland Westrelin wrote: > Hi all, > > This pull request contains a backport of commit [2ba423db](https://github.com/openjdk/jdk/commit/2ba423db9925355348106fc9fcf84450123d2605) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 18 Dec 2025 and was reviewed by Roberto Castañeda Lozano, Daniel Lundén and Damon Fenacci. > > Thanks! This pull request has now been integrated.
Changeset: 32134656 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/32134656dfd314649957fa6da8d86d5f77011cef Stats: 195 lines in 6 files changed: 173 ins; 16 del; 6 mod 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis Reviewed-by: chagedorn Backport-of: 2ba423db9925355348106fc9fcf84450123d2605 ------------- PR: https://git.openjdk.org/jdk/pull/29041 From roland at openjdk.org Wed Jan 7 08:16:55 2026 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 7 Jan 2026 08:16:55 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v4] In-Reply-To: References: Message-ID: > The base input of `AddP` is expected to only be set for heap accesses > but I noticed some inconsistencies so I added an assert in the `AddP` > constructor and fixed issues that it caught. AFAICT, the > inconsistencies shouldn't create issues. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - more - more - review - Merge branch 'master' into JDK-8373343 - review - review - review - merge - more - more - ...
and 3 more: https://git.openjdk.org/jdk/compare/695159e3...b20f41db ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28769/files - new: https://git.openjdk.org/jdk/pull/28769/files/007e73cd..b20f41db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28769&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28769&range=02-03 Stats: 16548 lines in 2404 files changed: 8811 ins; 2140 del; 5597 mod Patch: https://git.openjdk.org/jdk/pull/28769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28769/head:pull/28769 PR: https://git.openjdk.org/jdk/pull/28769 From roland at openjdk.org Wed Jan 7 08:20:43 2026 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 7 Jan 2026 08:20:43 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v3] In-Reply-To: References: Message-ID: On Mon, 22 Dec 2025 07:23:34 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/graphKit.cpp line 3590: > >> 3588: } >> 3589: constant_value = Klass::_lh_neutral_value; // put in a known value >> 3590: Node* lhp = basic_plus_adr(top(), klass_node, in_bytes(Klass::layout_helper_offset())); > > Same thought here: could we have a separate `off_heap_plus_addr()` or something like that instead of passing in `top()` on each call site? > > This and the other suggestion could also be done separately. I gave that one a try and I found that pattern to be common and I think it would be best done as a separate change. WDYT? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2667477734 From dfenacci at openjdk.org Wed Jan 7 08:28:14 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 7 Jan 2026 08:28:14 GMT Subject: [jdk26] RFR: 8342772: Assert in LateInlineMHCallGenerator::do_late_inline_check Message-ID: <6c3aoRq3DcmnhdnhUWP9SLRnBbhJ1o8DwVVW5zArqqM=.537ab223-d9af-41f6-a4d8-ee29d0fbfa55@github.com> Hi all, This pull request contains a backport of commit [c1c0ac87](https://github.com/openjdk/jdk/commit/c1c0ac877033c3edb0c2681c2c5f825be8adcfb3) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Damon Fenacci on 7 Jan 2026 and was reviewed by Vladimir Ivanov, Christian Hagedorn and Tobias Hartmann. Thanks! ------------- Commit messages: - Backport c1c0ac877033c3edb0c2681c2c5f825be8adcfb3 Changes: https://git.openjdk.org/jdk/pull/29079/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29079&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342772 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29079/head:pull/29079 PR: https://git.openjdk.org/jdk/pull/29079 From roland at openjdk.org Wed Jan 7 08:30:08 2026 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 7 Jan 2026 08:30:08 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v4] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 08:16:55 GMT, Roland Westrelin wrote: >> The base input of `AddP` is expected to only be set for heap accesses >> but I noticed some inconsistencies so I added an assert in the `AddP` >> constructor and fixed issues that it caught. AFAICT, the >> inconsistencies shouldn't create issues. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - more > - more > - review > - Merge branch 'master' into JDK-8373343 > - review > - review > - review > - merge > - more > - more > - ... and 3 more: https://git.openjdk.org/jdk/compare/d5b742aa...b20f41db I made one tweak in the updated change: `ClearArrayNode::clear_memory()` now takes an extra argument that tells whether it's writing to raw memory or not. That feels cleaner given whether raw memory is used or not can be figured out from where `ClearArrayNode::clear_memory()` is called. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28769#issuecomment-3717804909 From jbhateja at openjdk.org Wed Jan 7 09:05:47 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 7 Jan 2026 09:05:47 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v9] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 06:55:45 GMT, Fei Yang wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: >> >> - Fix from Bhavana Kilambi for failing JTREG regressions on AARCH64 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 >> - Including test changes from Bhavana Kilambi (ARM) >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 >> - Optimizing tail handling >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 >> - Cleanups >> - Fix failing jtreg test in CI >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 >> - Cleanups >> - ... 
and 13 more: https://git.openjdk.org/jdk/compare/5e7ae281...703f313d > > test/jdk/jdk/incubator/vector/Float16Vector64Tests.java line 1893: > >> 1891: VectorMask m = three.compare(VectorOperators.LE, higher); >> 1892: assert(m.allTrue()); >> 1893: m = higher.min((short)-1).test(VectorOperators.IS_NEGATIVE); > > I find that `higher.min((short)-1)` produces a float16 vector of 4 NaNs. So are we testing for negative NaNs with `VectorOperators.IS_NEGATIVE`? Is it more reasonable to test `VectorOperators.IS_NAN` instead? Thanks for catching this, all the Float16Vector lanes and short argument passed to shorthand APIs are assumed to be encoded in IEEE 754 binary 16 format, we should be passing Float16 bit representation of -1 here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2667602759 From qamai at openjdk.org Wed Jan 7 09:42:22 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 7 Jan 2026 09:42:22 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v5] In-Reply-To: References: Message-ID: > Hi, > > This is extracted from #28570 , there are 2 issues here: > > - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. > - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage. > > Please kindly review, thanks a lot. 
Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'master' into intrinsicsadrtype - copyright year - Merge branch 'master' into intrinsicsadrtype - consolidate the memory effect into a function - Use MemBar instead of widening the intrinsic memory - Fix Shenandoah - Fix memory around intrinsics nodes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28789/files - new: https://git.openjdk.org/jdk/pull/28789/files/c3503ed9..b871ba8a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28789&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28789&range=03-04 Stats: 39504 lines in 2993 files changed: 15957 ins; 4935 del; 18612 mod Patch: https://git.openjdk.org/jdk/pull/28789.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28789/head:pull/28789 PR: https://git.openjdk.org/jdk/pull/28789 From xgong at openjdk.org Wed Jan 7 09:43:05 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 7 Jan 2026 09:43:05 GMT Subject: [jdk26] RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 07:21:27 GMT, Tobias Hartmann wrote: >> Hi all, >> >> This pull request contains a backport of commit [6eaabed5](https://github.com/openjdk/jdk/commit/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Xiaohong Gong on 5 Jan 2026 and was reviewed by Jie Fu, Jatin Bhateja, Eric Fang and Quan Anh Mai. >> >> Thanks! > > Looks good. Thanks for your review @TobiHartmann ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29053#issuecomment-3718037535 From xgong at openjdk.org Wed Jan 7 09:43:08 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 7 Jan 2026 09:43:08 GMT Subject: [jdk26] RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: Message-ID: <1NHAS5n-BzZG5q4O0ab3BwZoPovdBgqQHF6qhImVJ0k=.d477e7b0-4392-4cff-8c25-c977ec551c67@github.com> On Tue, 6 Jan 2026 02:39:58 GMT, Xiaohong Gong wrote: > Hi all, > > This pull request contains a backport of commit [6eaabed5](https://github.com/openjdk/jdk/commit/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Xiaohong Gong on 5 Jan 2026 and was reviewed by Jie Fu, Jatin Bhateja, Eric Fang and Quan Anh Mai. > > Thanks! Seems the GHA failure is not caused by this change as I can observe the same issue on other PRs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29053#issuecomment-3718041844 From xgong at openjdk.org Wed Jan 7 09:46:25 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 7 Jan 2026 09:46:25 GMT Subject: [jdk26] Integrated: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 02:39:58 GMT, Xiaohong Gong wrote: > Hi all, > > This pull request contains a backport of commit [6eaabed5](https://github.com/openjdk/jdk/commit/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Xiaohong Gong on 5 Jan 2026 and was reviewed by Jie Fu, Jatin Bhateja, Eric Fang and Quan Anh Mai. > > Thanks! This pull request has now been integrated. 
Changeset: 93675e6e Author: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/93675e6e044c4dcfcbafac658f47554a44eb27a8 Stats: 43 lines in 1 file changed: 1 ins; 32 del; 10 mod 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently Reviewed-by: thartmann Backport-of: 6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca ------------- PR: https://git.openjdk.org/jdk/pull/29053 From qamai at openjdk.org Wed Jan 7 09:56:26 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 7 Jan 2026 09:56:26 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v7] In-Reply-To: References: Message-ID: > Hi, > > Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. > > However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). > > In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. 
This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. > > This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierarchy. There are a lot of benefits to this: > > - It makes the operation much more intuitive, and changes to `Type` classes can be made more easily. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think of a `Type` as a set of possible values. > - It tightens the invariants of the classes in the hierarchy. Instead of 5 possible `ptr()` values when a `TypeInstPtr` participates in a meet/join operation, there are only 3 left (`AnyNull` is nonsensical and `TopPTR` must be an `AnyPtr`). This, in turn, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. > > This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the pl... Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase.
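The set-of-values view in the 8370914 (`Type::join`) description above can be illustrated with a toy model. The sketch below is not C2's `TypeInt` and not the PR's code; `IntRange`, `meet`, and `join` are hypothetical names for an integer-range type in which meet returns the smallest range covering both operands and join is computed directly as set intersection, with no dual types involved.

```java
import java.util.Objects;

// Toy model (NOT C2's TypeInt): a type is a set of int values, stored as [lo, hi] or EMPTY.
public final class IntRange {
    // The empty set: no possible value (the "top" of this toy lattice).
    public static final IntRange EMPTY = new IntRange(1, 0);
    // Every int: nothing is known (the "bottom" of this toy lattice).
    public static final IntRange ALL = new IntRange(Integer.MIN_VALUE, Integer.MAX_VALUE);

    final int lo;
    final int hi;

    private IntRange(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    public static IntRange of(int lo, int hi) {
        return lo > hi ? EMPTY : new IntRange(lo, hi);
    }

    public boolean isEmpty() {
        return lo > hi;
    }

    // meet: a range containing every value of either operand (may over-approximate the union).
    public IntRange meet(IntRange other) {
        if (isEmpty()) return other;
        if (other.isEmpty()) return this;
        return of(Math.min(lo, other.lo), Math.max(hi, other.hi));
    }

    // join: exactly the values present in both operands (set intersection), computed
    // directly instead of as dual(meet(dual(a), dual(b))).
    public IntRange join(IntRange other) {
        if (isEmpty() || other.isEmpty()) return EMPTY;
        return of(Math.max(lo, other.lo), Math.min(hi, other.hi));
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof IntRange r)) return false;
        return (isEmpty() && r.isEmpty()) || (lo == r.lo && hi == r.hi);
    }

    @Override
    public int hashCode() {
        return isEmpty() ? 0 : Objects.hash(lo, hi);
    }

    @Override
    public String toString() {
        return isEmpty() ? "EMPTY" : "[" + lo + ", " + hi + "]";
    }

    public static void main(String[] args) {
        IntRange a = of(0, 10);
        IntRange b = of(5, 20);
        IntRange c = of(30, 40);
        System.out.println(a.meet(b)); // [0, 20]
        System.out.println(a.join(b)); // [5, 10]
        System.out.println(a.join(c)); // EMPTY
        System.out.println(a.meet(c)); // [0, 40], covering more than the union
    }
}
```

In this toy model every value is just a set, so no extra "dual" values are needed to express a join. The price of the simple interval representation is that meet can over-approximate, as `[0, 10]` meet `[30, 40]` giving `[0, 40]` shows.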
The pull request now contains nine commits: - copyright year - Merge branch 'master' into typejoin - sort order - Merge branch 'master' into typejoin - Merge branch 'master' into typejoin - Move dual to ASSERT only - Keep old version for verification - whitespace - Reimplement Type::join ------------- Changes: https://git.openjdk.org/jdk/pull/28051/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=06 Stats: 1889 lines in 7 files changed: 1013 ins; 479 del; 397 mod Patch: https://git.openjdk.org/jdk/pull/28051.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28051/head:pull/28051 PR: https://git.openjdk.org/jdk/pull/28051 From epeter at openjdk.org Wed Jan 7 10:00:11 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 Jan 2026 10:00:11 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v5] In-Reply-To: <3L4a5JCekql7Q6DsQiS886cJUT_A3VOudur5kqwXDko=.f99d0326-c9fa-4156-8cae-2cd54ae1b3d0@github.com> References: <3L4a5JCekql7Q6DsQiS886cJUT_A3VOudur5kqwXDko=.f99d0326-c9fa-4156-8cae-2cd54ae1b3d0@github.com> Message-ID: On Tue, 6 Jan 2026 23:53:18 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Use Xcomp test run instead of Warmup(0) Marked as reviewed by epeter (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/26827#pullrequestreview-3634120853 From thartmann at openjdk.org Wed Jan 7 10:16:11 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 7 Jan 2026 10:16:11 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v5] In-Reply-To: <3L4a5JCekql7Q6DsQiS886cJUT_A3VOudur5kqwXDko=.f99d0326-c9fa-4156-8cae-2cd54ae1b3d0@github.com> References: <3L4a5JCekql7Q6DsQiS886cJUT_A3VOudur5kqwXDko=.f99d0326-c9fa-4156-8cae-2cd54ae1b3d0@github.com> Message-ID: On Tue, 6 Jan 2026 23:53:18 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Use Xcomp test run instead of Warmup(0) I see timeouts with the test in various configurations, maybe the default timeout should be increased? compiler.lib.ir_framework.driver.TestVMException: There were one or multiple errors. Please check stderr for more information. 
at compiler.lib.ir_framework.driver.TestVMProcess.throwTestVMException(TestVMProcess.java:251) at compiler.lib.ir_framework.driver.TestVMProcess.checkTestVMExitCode(TestVMProcess.java:232) at compiler.lib.ir_framework.driver.TestVMProcess.<init>(TestVMProcess.java:77) at compiler.lib.ir_framework.TestFramework.runTestVM(TestFramework.java:879) at compiler.lib.ir_framework.TestFramework.start(TestFramework.java:839) at compiler.lib.ir_framework.TestFramework.start(TestFramework.java:431) at compiler.lib.ir_framework.TestFramework.runWithFlags(TestFramework.java:257) at compiler.vectorization.TestSubwordTruncation.main(TestSubwordTruncation.java:509) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:565) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) at java.base/java.lang.Thread.run(Thread.java:1516) JavaTest Message: Test threw exception: compiler.lib.ir_framework.driver.TestVMException JavaTest Message: shutting down test result: Error. "driver" action timed out with a timeout of 480 seconds on agent 136; but completed after timeout - suppressed status: "Failed. `main' threw exception: compiler.lib.ir_framework.driver.TestVMException: There were one or multiple errors. Please check stderr for more information." test result: Error. "driver" action timed out with a timeout of 480 seconds on agent 136; but completed after timeout - suppressed status: "Failed. `main' threw exception: compiler.lib.ir_framework.driver.TestVMException: There were one or multiple errors. Please check stderr for more information."
------------- PR Comment: https://git.openjdk.org/jdk/pull/26827#issuecomment-3718183534 From thartmann at openjdk.org Wed Jan 7 10:18:46 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 7 Jan 2026 10:18:46 GMT Subject: [jdk26] RFR: 8342772: Assert in LateInlineMHCallGenerator::do_late_inline_check In-Reply-To: <6c3aoRq3DcmnhdnhUWP9SLRnBbhJ1o8DwVVW5zArqqM=.537ab223-d9af-41f6-a4d8-ee29d0fbfa55@github.com> References: <6c3aoRq3DcmnhdnhUWP9SLRnBbhJ1o8DwVVW5zArqqM=.537ab223-d9af-41f6-a4d8-ee29d0fbfa55@github.com> Message-ID: On Wed, 7 Jan 2026 08:17:55 GMT, Damon Fenacci wrote: > Hi all, > > This pull request contains a backport of commit [c1c0ac87](https://github.com/openjdk/jdk/commit/c1c0ac877033c3edb0c2681c2c5f825be8adcfb3) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Damon Fenacci on 7 Jan 2026 and was reviewed by Vladimir Ivanov, Christian Hagedorn and Tobias Hartmann. > > Thanks! Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29079#pullrequestreview-3634199871 From duke at openjdk.org Wed Jan 7 11:36:32 2026 From: duke at openjdk.org (Tobias Hotz) Date: Wed, 7 Jan 2026 11:36:32 GMT Subject: RFR: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero In-Reply-To: References: Message-ID: <2IdQg8dyfH2eYVDeAa4OdHep2-aAxs6yFCZuJ9UhX00=.26d1937d-eda5-4969-aaf2-c5811ec30329@github.com> On Mon, 5 Jan 2026 18:37:54 GMT, Tobias Hotz wrote: > This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 > I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. > The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. 
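The fix described for 8374436 (expect an `ArithmeticException` when a seed-dependent divisor such as `INT_CONST_2` happens to be zero) follows a common test pattern. A minimal sketch, assuming a hypothetical `divide` helper standing in for the compiled test method; the actual IntegerDivValueTests code is not shown here, and the same idea would apply to the `long` variant (`LONG_CONST_2`).

```java
import java.util.Random;

public class ZeroDivisorCheck {

    // Hypothetical stand-in for a compiled test method that divides by a constant.
    static int divide(int dividend, int divisor) {
        return dividend / divisor;
    }

    // Runs divide() and verifies the outcome: a zero divisor must throw
    // ArithmeticException, any other divisor must return the plain quotient.
    static String check(int dividend, int divisor) {
        try {
            int result = divide(dividend, divisor);
            if (divisor == 0) {
                throw new AssertionError("expected ArithmeticException, got " + result);
            }
            return "ok: " + result;
        } catch (ArithmeticException e) {
            if (divisor != 0) {
                throw new AssertionError("unexpected ArithmeticException for divisor " + divisor, e);
            }
            return "ok: ArithmeticException";
        }
    }

    public static void main(String[] args) {
        // A seed-dependent divisor that can be zero, mimicking the failure mode.
        Random random = new Random(42);
        for (int i = 0; i < 5; i++) {
            int divisor = random.nextInt(3) - 1; // -1, 0 or 1
            System.out.println("100 / " + divisor + " -> " + check(100, divisor));
        }
    }
}
```

Deciding the expected outcome from the divisor value, rather than assuming it is non-zero, keeps the check valid for every seed.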
> I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero GHA Failures look unrelated ------------- PR Comment: https://git.openjdk.org/jdk/pull/29045#issuecomment-3718459487 From duke at openjdk.org Wed Jan 7 11:36:33 2026 From: duke at openjdk.org (duke) Date: Wed, 7 Jan 2026 11:36:33 GMT Subject: RFR: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 18:37:54 GMT, Tobias Hotz wrote: > This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 > I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. > The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. > I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero @ichttt Your change (at version 31ddd36956c796a263ef310809a6eb351d25bbc4) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29045#issuecomment-3718466951 From duke at openjdk.org Wed Jan 7 11:52:54 2026 From: duke at openjdk.org (Tobias Hotz) Date: Wed, 7 Jan 2026 11:52:54 GMT Subject: Integrated: 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 18:37:54 GMT, Tobias Hotz wrote: > This PR is a follow-up to https://github.com/openjdk/jdk/pull/26143 > I've missed that INT_CONST_2 and LONG_CONST_2 may be zero depending on the seed, which causes arithmetic exceptions that are not being caught. > The fix is simple: Detect if these constants are zero, and if so, expect a div by zero exception to be thrown. 
> I've not added additional test as this is a testbug, but I verified this test works correctly now if INT_CONST_2 is zero This pull request has now been integrated. Changeset: d7a3df63 Author: Tobias Hotz Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/d7a3df639977ac8442eec1efb41de6dc50384150 Stats: 10 lines in 1 file changed: 8 ins; 0 del; 2 mod 8374436: compiler/igvn/IntegerDivValueTests.java failed with division by zero Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/29045 From epeter at openjdk.org Wed Jan 7 12:40:54 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 Jan 2026 12:40:54 GMT Subject: RFR: 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs In-Reply-To: References: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> Message-ID: On Tue, 6 Jan 2026 16:53:22 GMT, Quan Anh Mai wrote: >> In `VLoopMemorySlices::find_memory_slices`, we analyze the memory slices. In some cases, we only find loads in the slice, and no phi. So the memory input of the loads comes from before the loop. When I refactored the code, I made the assumption that all loads should have the same memory input. After all: any store before the loop would have to have happened before we enter the loop, and execute any loads from the loop. The assumption held for a long time, but now we have a counter example. >> >> Summary: one load has its memory input optimized, the other is not put on the IGVN worklist again, and keeps the old memory input (even though in this case we could have optimized it just the same). Thus, both choices of memory input are correct, and the assumption of the assert does not hold. >> >> Solution: Just bail out of auto vectorization if this assumption is violated. This is an edge case, and the assert has not been hit until the fuzzer found this example. 
>> Alternatives: we could track the multiple memory inputs, but that would be more effort to implement, and hard to test because it is difficult to create examples. >> >> --------------------------- >> >> **Details** >> >> Below, look at `1145 LoadB` and `1131 LoadB`. One has memory input `Param 7` (initial program state), the other `711 Phi` (outer loop). Both loads are inside the `1147 CountedLoop`. But their states come from outside, both originally from `711 Phi`. But then `1145 LoadB` is optimized with `LoadBNode::Ideal` -> `LoadNode::Ideal` -> `LoadNode::split_through_phi`: it realizes that the backedge of the `711 Phi` only goes by the `593 CallStaticJava`, which cannot modify the `Byte::value` field of the `LoadB` (unless it was to use reflection, but that unlocks undefined behavior anyway, so it can be ignored). So it is ok to split through the phi, as the `Byte::value` cannot be modified during the outer loop. >> >> `1131 LoadB` could also do the same optimization, but it just does not end up on the IGVN worklist. The issue is that we don't have any adequate notification that goes down through the `MergeMem - Call - Proj` structure. We did not want to have that until now, because in theory we could have a long series of calls, and the traversals could become too expensive. > > Marked as reviewed by qamai (Committer). @merykitty @TobiHartmann @vnkozlov Thanks for the reviews!
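For the 8374436 test fix described above ("Detect if these constants are zero, and if so, expect a div by zero exception to be thrown"), the shape of such a check can be sketched in plain Java. This is a minimal illustration, not the code in `IntegerDivValueTests.java`: the class and method names are hypothetical, and the `divisor` parameter stands in for a seed-dependent constant such as `INT_CONST_2` or `LONG_CONST_2`.

```java
/**
 * Hypothetical sketch: validate a division whose divisor may be zero
 * depending on the random seed (stand-in for INT_CONST_2 / LONG_CONST_2).
 */
final class DivByZeroAwareCheck {

    /** Returns true if the outcome is the expected one for this divisor. */
    static boolean checkIntDivision(int dividend, int divisor) {
        try {
            int ignored = dividend / divisor;
            // No exception: only acceptable for a non-zero divisor.
            return divisor != 0;
        } catch (ArithmeticException e) {
            // "/ by zero": only acceptable for a zero divisor.
            return divisor == 0;
        }
    }

    /** Same check for long operands. */
    static boolean checkLongDivision(long dividend, long divisor) {
        try {
            long ignored = dividend / divisor;
            return divisor != 0;
        } catch (ArithmeticException e) {
            return divisor == 0;
        }
    }
}
```

Note that `Long.MIN_VALUE / -1L` does not throw in Java (it wraps to `Long.MIN_VALUE`), so a zero divisor is the only case that needs the exception path.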
------------- PR Comment: https://git.openjdk.org/jdk/pull/29028#issuecomment-3718669694 From epeter at openjdk.org Wed Jan 7 12:40:56 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 Jan 2026 12:40:56 GMT Subject: Integrated: 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs In-Reply-To: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> References: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com> Message-ID: On Mon, 5 Jan 2026 07:37:34 GMT, Emanuel Peter wrote: > In `VLoopMemorySlices::find_memory_slices`, we analyze the memory slices. In some cases, we only find loads in the slice, and no phi. So the memory input of the loads comes from before the loop. When I refactored the code, I made the assumption that all loads should have the same memory input. After all: any store before the loop would have to have happened before we enter the loop, and execute any loads from the loop. The assumption held for a long time, but now we have a counter example. > > Summary: one load has its memory input optimized, the other is not put on the IGVN worklist again, and keeps the old memory input (even though in this case we could have optimized it just the same). Thus, both choices of memory input are correct, and the assumption of the assert does not hold. > > Solution: Just bail out of auto vectorization if this assumption is violated. This is an edge case, and the assert has not been hit until the fuzzer found this example. > Alternatives: we could track the multiple memory inputs, but that would be more effort to implement, and hard to test because it is difficult to create examples. > > --------------------------- > > **Details** > > Below, look at `1145 LoadB` and `1131 LoadB`. One has memory input `Param 7` (initial program state), the other `711 Phi` (outer loop). Both loads are inside the `1147 CountedLoop`. 
But their states come from outside, both originally from `711 Phi`. But then `1145 LoadB` is optimized with `LoadBNode::Ideal` -> `LoadNode::Ideal` -> `LoadNode::split_through_phi`: it realizes that the backedge of the `711 Phi` only goes by the `593 CallStaticJava`, which cannot modify the `Byte::value` field of the `LoadB` (unless it was to use reflection, but that unlocks undefined behavior anyway, so it can be ignored). So it is ok to split through the phi, as the `Byte::value` cannot be modified during the outer loop. > > `1131 LoadB` could also do the same optimization, but it just does not end up on the IGVN worklist. The issue is that we don't have any adequate notification that goes down through the `MergeMem - Call - Proj` structure. We did not want to have that until now, because in theory we could have a long series of calls, and the traversals could become too expensive. This pull request has now been integrated. Changeset: da14813a Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/da14813a5bdadaf0a1f81fa57ff6e1b103eaf113 Stats: 116 lines in 3 files changed: 109 ins; 0 del; 7 mod 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs Reviewed-by: kvn, thartmann, qamai ------------- PR: https://git.openjdk.org/jdk/pull/29028 From duke at openjdk.org Wed Jan 7 13:21:35 2026 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 7 Jan 2026 13:21:35 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 00:18:43 GMT, Volodymyr Paprotski wrote: > "Insert 0b0000 nibble after every third nibble". I only have two questions, looks good otherwise. Yes, that is the idea. > > PS: things I've considered: > > * Loop controls? > > * ML_KEM.java guarantees (per callee comment and assert) lengths are multiple of 64 > * also same as original code > * Why not simply a vpermb?
Have zeroes already from the masked load with k1.. It *is* using vpermb (evpermb() generates the EVEX encoded VPERMB) > > * shuffle granularity is actually 4-bits, not 8-bits Really? In what instruction? I hadn't found it in the manual. > * logical shift already zeroes top bits, so `vpand` not required? Only every 2nd byte is shifted, the rest needs to be masked. > > * odd columns not shifted, so still have extra bits that need clearing Yes, that is what the vpand does. (actually, it also (unnecessarily) masks the shifted bytes. > * Why VBMI? > > * needed for `evpermb` Yes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28815#issuecomment-3718842604 From chagedorn at openjdk.org Wed Jan 7 13:37:10 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 7 Jan 2026 13:37:10 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: References: <1z430wmE_HRTJqmLIC15VMUktLyUEE7qjkppr1GniAI=.e560a4e9-59f0-4013-ad65-5d7261cdbf0e@github.com> <8jnY6pqofieRIfV5fCqFxvHZMF3nAZbh7yAD7C_G5FU=.a12c98f6-c715-43f5-9528-62fcfdfc6e59@github.com> Message-ID: On Tue, 6 Jan 2026 23:48:51 GMT, Jasmine Karthikeyan wrote: >> Yep, I can replicate the crash on the old commit with `TestFramework.runWithFlags("-Xcomp", "-XX:CompileCommand=compileonly,*TestSubwordTruncation::*");` instead of `@Warmup(0)`. I think this would also be a good option, as it would let you get coverage with Xcomp on the other tests as well. > > I've pushed a commit that changes the Warmup(0) to the second test run. Given the timeout reported by Tobias, I would rather opt for `@Warmup(0)` with one `run()` only. We will eventually run the test with `-Xcomp` on higher tiers, for example at tier6, so we get `Xcomp` coverage at some point. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2668480186 From bulasevich at openjdk.org Wed Jan 7 14:35:08 2026 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 7 Jan 2026 14:35:08 GMT Subject: RFR: 8374307: Fix deoptimization storm caused by Action_none in GraphKit::uncommon_trap [v2] In-Reply-To: References: Message-ID: > We observed a deoptimization storm caused by GraphKit::uncommon_trap generator logic. GraphKit::uncommon_trap considers the too_many_recompiles metric. If the threshold is exceeded, it replaces Deoptimization::Action_reinterpret with Deoptimization::Action_none (see code snippet below). > > This replacement changes the uncommon_trap logic: once execution hits a trap, the VM performs deoptimization but does not recompile the method anymore. In an "unlucky" case, when the code path containing this uncommon_trap becomes frequently executed, a deoptimization storm occurs (thousands of deoptimizations per second) causing a significant performance drop. > > The original problematic method, which triggered repeated recompilations, is a high-performance compressed binary serialization algorithm with heavy use of conditional branches driven by bitmasks. See a standalone [synthetic benchmark](https://bugs.openjdk.org/secure/attachment/118045/UnstableIf.java) to reproduce the issue. > > The issue arises when the method exceeds a global recompilation threshold before stabilizing specific trap counters. > > Current thresholds: > - Recompilation Limit (too_many_recompiles): > Condition: decompile_count() >= (PerMethodRecompilationCutoff / 2) + 1 > Default: 201 (derived from default PerMethodRecompilationCutoff = 400). > - Specific Trap Limits (too_many_traps): > Checks if the trap count for a specific reason exceeds: > PerMethodTrapLimit (Default: 100) - for Reason_unstable_if, Reason_unstable_fused_if, etc. > PerMethodSpecTrapLimit (Default: 5000) - for Reason_speculate_class_check, Reason_speculate_null_check, etc.
> > With the given defaults, if the only reason for the method recompilation is unstable_if, the system stabilizes after 100 traps (PerMethodTrapLimit). However, if the method experiences traps and recompilations for different reasons, the total number of recompilations can exceed 200 before hitting the limit for unstable_if traps. This triggers Action_none and causes the deopt storm. > > The proposal is a minimal change in GraphKit::uncommon_trap: apply the same `too_many_recompiles` threshold inside `Parse::path_is_suitable_for_uncommon_trap` - this ensures that on the final recompilation C2 gets a hint not to speculate on untaken branches anymore. > > As an alternative solution, we can revisit GraphKit::uncommon_trap. This "Temporary fix" has persisted in the codebase for 17 years, so it is probably time to change it as well. Any comments are welcome. > > case Deoptimization::Action_reinter... Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: using too_many_traps_or_recompiles.
adding DeoptStorm jtreg test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28966/files - new: https://git.openjdk.org/jdk/pull/28966/files/258a9673..74c02fe1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28966&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28966&range=00-01 Stats: 106 lines in 2 files changed: 104 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28966/head:pull/28966 PR: https://git.openjdk.org/jdk/pull/28966 From sviswanathan at openjdk.org Wed Jan 7 16:07:23 2026 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 7 Jan 2026 16:07:23 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v6] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 02:15:22 GMT, Jatin Bhateja wrote: >> Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand; this leads to a failure during register biasing. Renaming the NDD demotion flags to encode the explicit operand position, i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2**, splitting the commutative flag into separate new flags, and fine-tuning the assertion checks based on the new naming convention fixes the issue. >> >> Failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> PS: Validation performed using Intel SDE 9.58. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer).
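To make the threshold interaction in the 8374307 description concrete, here is a simplified Java model of which limit a method reaches first. It is a sketch under stated assumptions, not HotSpot code: it assumes every trap leads to exactly one recompilation, that only every k-th recompilation is caused by an `unstable_if` trap, and it uses the defaults quoted in the description (`PerMethodRecompilationCutoff = 400`, `PerMethodTrapLimit = 100`). The class and method names are hypothetical.

```java
/** Hypothetical model of the two competing limits (not HotSpot code). */
final class DeoptLimitModel {
    static final int PER_METHOD_RECOMPILATION_CUTOFF = 400; // default quoted in the PR
    static final int PER_METHOD_TRAP_LIMIT = 100;           // default for Reason_unstable_if

    // Mirrors the quoted condition: decompile_count() >= (PerMethodRecompilationCutoff / 2) + 1
    static final int RECOMPILE_LIMIT = PER_METHOD_RECOMPILATION_CUTOFF / 2 + 1; // 201

    /**
     * Simulates recompilations where only every k-th one is caused by an
     * unstable_if trap (assumption), and reports which limit is hit first.
     */
    static String firstLimitReached(int k) {
        int unstableIfTraps = 0;
        for (int recompiles = 1; ; recompiles++) {
            if (recompiles % k == 0) {
                unstableIfTraps++;
            }
            if (unstableIfTraps >= PER_METHOD_TRAP_LIMIT) {
                return "trap limit";      // per the description, the method stabilizes
            }
            if (recompiles >= RECOMPILE_LIMIT) {
                return "recompile limit"; // Action_reinterpret is replaced by Action_none
            }
        }
    }
}
```

Under these assumptions the `unstable_if` limit wins only when at least every second recompilation is caused by `unstable_if`; any sparser mix (for example `k = 3`) reaches the 201-recompilation limit first, which is the situation the description identifies as the trigger for the deopt storm.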
PR Review: https://git.openjdk.org/jdk/pull/28999#pullrequestreview-3635600641 From mchevalier at openjdk.org Wed Jan 7 16:26:58 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 7 Jan 2026 16:26:58 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v7] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 09:56:26 GMT, Quan Anh Mai wrote: >> Hi, >> >> Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. >> >> However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). >> >> In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. >> >> This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierarchy.
There are several benefits to this: >> >> - It makes the operation much more intuitive, and changes to `Type` classes become easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think of a `Type` as a set of possible values. >> - It tightens the invariants of the classes in the hierarchy. Instead of having 5 possible `ptr()` values when a `TypeInstPtr` participates in a meet/join operation, there are only 3 left (`AnyNull` is nonsensical and `TopPTR` must be an `AnyPtr`). This, in turn, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. >> >> This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these p... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: > > - copyright year > - Merge branch 'master' into typejoin > - sort order > - Merge branch 'master' into typejoin > - Merge branch 'master' into typejoin > - Move dual to ASSERT only > - Keep old version for verification > - whitespace > - Reimplement Type::join Good improvement! It's really much simpler and easier to reason about... If we have problems with operations, it's probably not for lack of dual. I'm not super familiar with the abstraction of some derived classes of `TypePtr` (like `TypeMetadataPtr`), but I didn't see anything shocking. I do have a few little comments. I can think of only one type-related thing that would be even better: swapping the names of top, bottom, join and meet to match the terminology of the rest of the world. Seriously, that would be nice...
src/hotspot/share/opto/memnode.cpp line 2019: > 2017: if (is_mismatched_access()) { > 2018: return _type; > 2019: } How is that related to the reimplementation of join? src/hotspot/share/opto/rangeinference.cpp line 700: > 698: return int_type_union(t1, t2); > 699: } else { > 700: return CT::make_or_top(TypeIntPrototype{{MAX2(t1->_lo, t2->_lo), MIN2(t1->_hi, t2->_hi)}, Nothing dramatic, but why is this branch implemented directly, while the other branch calls another function, `int_type_union`? This function seems to be used only here and in testing. If I understand correctly, this branch is meant to go away when `_is_dual` is removed, but it is (almost) the same as in `int_type_xjoin` (which makes sense). I'm undecided about what would be best, but I'm slightly annoyed that they look different. Should we have an `int_type_intersection`? But I don't see what it would do that `int_type_xjoin` doesn't do. Or simply, why do we need `int_type_union`? Once this else-branch is removed, wouldn't `int_type_xmeet` simply be a call to `int_type_union`, so that maybe we could avoid this extra step? I'm fine if the answer is "it's part of the future clean up work", but then I think we could have a tracking ticket and a TODO comment. src/hotspot/share/opto/rangeinference.cpp line 710: > 708: > 709: template > 710: const Type* TypeIntHelper::int_type_xjoin(const CT* t1, const CT* t2) { Unlike `int_type_union`, I couldn't find a test for `int_type_xjoin`. Have I missed something? Is that expected? I agree it's not the scariest function, but... why not! And it would at least serve as a skeleton when adding other abstract domains that might be less obvious.
src/hotspot/share/opto/subnode.cpp line 2014: > 2012: const Type* in_type = phase->type(in1); > 2013: if ((in_type->isa_int() && in_type->is_int()->_lo >= 0) || > 2014: (in_type->isa_long() && in_type->is_long()->_lo >= 0)) { I'm a bit annoyed here because we look quite a lot inside the implementation details of the types. I'm not a fan of the previous situation either, as at most one side of the `||` would make sense. For instance, if we want to make use of bitwise information for this, we could look at the highest bit (the sign bit in two's complement). That is just an example; we could find other tricks to conclude that an abstract int is non-negative that may not involve ranges (or not only ranges), and I find it unfortunate to look inside the type rather than having the type tell us. I think the usual approach would be to check that the guard/intersection (depending on the formalism) with the negative numbers is empty (or that the intersection with the non-negative numbers is the same), but we quickly run again into the problem that `in_type` can be either int or long, and many ways of writing that would need to split cases. Maybe an approximation would be to have `TypeInt::is_non_negative` and the like, so as not to overengineer guarding with arbitrary expressions, but still limit the scope of who looks into the implementation? src/hotspot/share/opto/type.cpp line 1029: > 1027: tty->print("t1 meets t2 = "); mt1->dump(); tty->cr(); > 1028: tty->print("t2 meets t1 = "); mt2->dump(); tty->cr(); > 1029: fatal("meet not commutative"); I see it was like that before, but I think it's discouraged to issue many separate tty->print calls, because output from concurrent prints can interleave. The preferred solution is to use a stringstream rather than a lock. Also, do we need a flush before the fatal? I've seen some (other) prints that are cut off before the end on assert failures. Same below.
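The set-based reading of meet and join in this review (in C2's convention, meet widens toward BOTTOM, join narrows toward TOP), together with the suggestion to ask the type whether it is non-negative instead of reading `_lo` directly, can be illustrated with a hypothetical closed integer range. This is a simplified model, not `TypeInt`: the real class carries more than a signed range (for example a widen counter), and the `IntRange` name, its methods, and the use of `null` for the empty set are assumptions made for the sketch.

```java
/** Hypothetical closed range [lo, hi] of int values; null stands for the empty set (C2's TOP). */
record IntRange(int lo, int hi) {
    IntRange {
        if (lo > hi) {
            throw new IllegalArgumentException("empty range: use null instead");
        }
    }

    /** Meet: smallest range containing both inputs (moves toward BOTTOM). */
    static IntRange meet(IntRange a, IntRange b) {
        if (a == null) return b; // TOP meet x == x
        if (b == null) return a;
        return new IntRange(Math.min(a.lo, b.lo), Math.max(a.hi, b.hi));
    }

    /** Join: intersection of both inputs (moves toward TOP); null if they are disjoint. */
    static IntRange join(IntRange a, IntRange b) {
        if (a == null || b == null) return null; // TOP join x == TOP
        int lo = Math.max(a.lo, b.lo);
        int hi = Math.min(a.hi, b.hi);
        return lo <= hi ? new IntRange(lo, hi) : null;
    }

    /** Non-negative iff the intersection with the negative numbers is empty. */
    boolean isNonNegative() {
        return join(this, new IntRange(Integer.MIN_VALUE, -1)) == null;
    }
}
```

The design choice mirrors the review comment: a caller asks `isNonNegative()` and stays agnostic of how the type represents its values, so the representation could later gain other information (such as known bits) without touching callers.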
src/hotspot/share/opto/type.cpp line 1515: > 1513: return this; > 1514: case FloatCon: > 1515: assert(jint_cast(_f) != jint_cast(t->getf()), "Equivalent instances should not appear here"); Because of the if (t1 == t2) { return t1; } in the binary `Type::xmeet`, right? (Similar below.) src/hotspot/share/opto/type.cpp line 2401: > 2399: > 2400: //------------------------------meet------------------------------------------- > 2401: // Compute the MEET of two types. It returns a new Type object. Should we clean up these comments? I think we can keep it on the base method, but it's maybe not that useful on each override? Also, it's not a great comment: it adds little beyond the signature, and it speaks about two types, but one of them is `this` and therefore implicit, which is somewhat confusing (it would make more sense on `const Type* Type::xmeet(const Type* t1, const Type* t2)`). The comment says it returns a new Type object, which is ambiguous: these methods can return `this` or `t`, which are not new objects by most definitions I'd give. I think it means that it doesn't mutate `this` or `t` but returns the result, but that is clear from the `const`s. Some of these comments have a header line of dashes containing `xmeet` and some (like this one) only `meet`. Overall, I think these comments bring nothing, or possibly confusion, and this would be a good opportunity to get rid of them.
------------- PR Review: https://git.openjdk.org/jdk/pull/28051#pullrequestreview-3635235449 PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2668746050 PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2668821963 PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2668829347 PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2668881848 PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2668967484 PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2669031908 PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2669077305 From vpaprotski at openjdk.org Wed Jan 7 16:42:24 2026 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 7 Jan 2026 16:42:24 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2] In-Reply-To: References: Message-ID: <7gLxGLuYKPYchraxk4Z4hh_ThfgGsGGdYAL2LVaDBvg=.63281063-7480-431f-b24e-1304ef92326e@github.com> On Wed, 7 Jan 2026 13:18:50 GMT, Ferenc Rakoczi wrote: > > "Insert 0b0000 nibble after every third nibble". I only have two questions, looks good otherwise. @ferakocz apologies for the misunderstanding; everything after the PS was not a request for change.. those were the questions that occurred to me and I found the answer.. The only reason I put them in was for the next reviewer. Or if I am wrong, e.g. no, I did not find a better instruction than vpermb either. (My first reaction to seeing the java code, was 'oh, this is easy, just a `vpermb`, then had to reason out why not..) 
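The idea confirmed earlier in this thread, "Insert 0b0000 nibble after every third nibble", corresponds to the scalar 12-bit to 16-bit unpacking that the AVX-512 stub vectorizes with `evpermb` plus per-lane shifts and masking. Below is a hedged scalar sketch in Java. The class and method names are hypothetical (this is not the `twelve2Sixteen` method in `ML_KEM.java`), and the little-endian packing of two 12-bit values into three bytes is assumed to follow the ML-KEM 12-bit encoding.

```java
/** Hypothetical scalar sketch of 12-bit -> 16-bit unpacking (not the JDK implementation). */
final class TwelveToSixteen {
    /**
     * Every 3 input bytes hold two little-endian 12-bit values. Reading the
     * nibbles n0..n5 from the low nibble of byte 0 upward, the outputs are
     * [n0 n1 n2] and [n3 n4 n5], each with a zero top nibble: a zero nibble
     * is inserted after every third nibble.
     */
    static short[] unpack(byte[] condensed) {
        if (condensed.length % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        short[] parsed = new short[condensed.length / 3 * 2];
        for (int i = 0, j = 0; i < condensed.length; i += 3, j += 2) {
            int b0 = condensed[i] & 0xFF;
            int b1 = condensed[i + 1] & 0xFF;
            int b2 = condensed[i + 2] & 0xFF;
            parsed[j] = (short) (b0 | ((b1 & 0x0F) << 8));    // n0, n1 from b0; n2 from low half of b1
            parsed[j + 1] = (short) ((b1 >>> 4) | (b2 << 4)); // n3 from high half of b1; n4, n5 from b2
        }
        return parsed;
    }
}
```

For example, the bytes `0x21, 0x43, 0x65` unpack to `0x321` and `0x654`. With this layout, 192 condensed bytes yield 128 coefficients (256 bytes of `short` output), which is consistent with the `addptr(condensed, 192)`, `addptr(parsed, 256)` and `subl(parsedLength, 128)` strides discussed in the stub review.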
------------- PR Comment: https://git.openjdk.org/jdk/pull/28815#issuecomment-3719734572 From vpaprotski at openjdk.org Wed Jan 7 16:42:27 2026 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 7 Jan 2026 16:42:27 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 06:18:55 GMT, Shawn M Emery wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 862: >> >>> 860: __ addptr(condensed, condensedOffs); >>> 861: >>> 862: if (VM_Version::supports_avx512_vbmi2()) { >> >> Which instruction needs vbmi2? All I could spot was `evpermb`, which needs vbmi. Relax the restriction slightly? > > Good catch! Initially the code was using `vpshldvw`, but was changed to just use `vpsrlvw`. Fixed in next commit. > I should probably update the bug synopsis to exclude VBMI2? I would be happy with just the code being pedantic. Everything else is 'just nice' :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2669186665 From vpaprotski at openjdk.org Wed Jan 7 16:46:08 2026 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 7 Jan 2026 16:46:08 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 06:19:09 GMT, Shawn M Emery wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 906: >> >>> 904: __ addptr(condensed, 192); >>> 905: __ addptr(parsed, 256); >>> 906: __ subl(parsedLength, 128); >> >> (128 instead of 256 here because `parsedLength` is an index into a `short` array.) >> >> I am confused by the stride. The `twelve2Sixteen()` seems to (almost) guarantee that the parsed length is a multiple of 64 (last block can be 48 bytes). This would imply a stride of 128 bytes for `parsed`. And 96 for `condensed`.
>> >> This is exactly how the existing code already behaves so I am less concerned, but I would like an explanation of why it works. > > I believe the numbers are right: with each pass 256 bytes of coefficients are `parsed` into the parse buffer. This means that half of the coefficients have been processed (`parsedLength` = 128). Would having a comment stating this address your concerns? I wasn't clear enough in my question. The asm is indeed processing the bytes in the increment. What I was trying to convince myself of was: 'how come we are not reading past the end of the array? Or are we?'. On one hand, this is exactly what the existing asm code does, so I will assume that it's correct. However, on the Java side/version of this code, I could only convince myself about processing ~two AVX512 vectors at a time, not four. So either I can't count, or there are some further (implicit) restrictions on the callers of `twelve2Sixteen`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2669202305 From jbhateja at openjdk.org Wed Jan 7 17:05:31 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 7 Jan 2026 17:05:31 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v6] In-Reply-To: References: Message-ID: <4n-meCFhxfGZQNKXkwobe_O0_z3vOA7nZv5I6ooEFns=.a4901d5f-3a64-4ae3-abca-252b3ddc9b90@github.com> On Wed, 7 Jan 2026 16:03:13 GMT, Sandhya Viswanathan wrote: > Looks good to me. Thanks @sviswa7 for your reviews.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28999#issuecomment-3719835527 From jbhateja at openjdk.org Wed Jan 7 17:05:33 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 7 Jan 2026 17:05:33 GMT Subject: Integrated: 8373724: Assertion failure in TestSignumVector.java with UseAPX In-Reply-To: References: Message-ID: On Fri, 26 Dec 2025 12:31:39 GMT, Jatin Bhateja wrote: > Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand; this leads to a failure during register biasing. Renaming the NDD demotion flags to encode the explicit operand position, i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2**, splitting the commutative flag into separate new flags, and fine-tuning the assertion checks based on the new naming convention fixes the issue. > > Failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch. > > Kindly review and share your feedback. > > Best Regards, > Jatin > PS: Validation performed using Intel SDE 9.58. This pull request has now been integrated. Changeset: 640343f7 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/640343f7d94894b0378ea5b1768eeac203a9aaf8 Stats: 78 lines in 1 file changed: 2 ins; 1 del; 75 mod 8373724: Assertion failure in TestSignumVector.java with UseAPX Reviewed-by: sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/28999 From kxu at openjdk.org Wed Jan 7 17:08:57 2026 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 7 Jan 2026 17:08:57 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v28] In-Reply-To: References: Message-ID: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found.
> > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 50 commits: - Merge remote-tracking branch 'origin/master' into counted-loop-refactor - Update license header years - Merge remote-tracking branch 'origin/master' into counted-loop-refactor - remove trailing whitespaces - Merge remote-tracking branch 'origin/master' into counted-loop-refactor - additional suggestions from code review - Apply suggestions from code review Co-authored-by: Christian Hagedorn - fix trip counter loop-variant detection - fix bad merge with ctrl_is_member() - Merge remote-tracking branch 'origin/master' into counted-loop-refactor # Conflicts: # src/hotspot/share/opto/loopnode.cpp - ... and 40 more: https://git.openjdk.org/jdk/compare/640343f7...7783d609 ------------- Changes: https://git.openjdk.org/jdk/pull/24458/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=27 Stats: 1231 lines in 3 files changed: 626 ins; 295 del; 310 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From bulasevich at openjdk.org Wed Jan 7 17:28:14 2026 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 7 Jan 2026 17:28:14 GMT Subject: RFR: 8374307: Fix deoptimization storm caused by Action_none in GraphKit::uncommon_trap [v3] In-Reply-To: References: Message-ID: > We observed a deoptimization storm caused by GraphKit::uncommon_trap generator logic. GraphKit::uncommon_trap considers the too_many_recompiles metric. If the threshold is exceeded, it replaces Deoptimization::Action_reinterpret with Deoptimization::Action_none (see code snippet below).
> > This replacement changes the uncommon_trap logic: once execution hits a trap, the VM performs deoptimization but does not recompile the method anymore. In an "unlucky" case, when the code path containing this uncommon_trap becomes frequently executed, a deoptimization storm occurs (thousands of deoptimizations per second) causing a significant performance drop. > > The original problematic method, which triggered repeated recompilations, is a high-performance compressed binary serialization algorithm with heavy use of conditional branches driven by bitmasks. See a standalone [synthetic benchmark](https://bugs.openjdk.org/secure/attachment/118045/UnstableIf.java) to reproduce the issue. > > The issue arises when the method exceeds a global recompilation threshold before stabilizing specific trap counters. > > Current thresholds: > - Recompilation Limit (too_many_recompiles): > Condition: decompile_count() >= (PerMethodRecompilationCutoff / 2) + 1 > Default: 201 (derived from default PerMethodRecompilationCutoff = 400). > - Specific Trap Limits (too_many_traps): > Checks if the trap count for a specific reason exceeds: > PerMethodTrapLimit (Default: 100) - for Reason_unstable_if, Reason_unstable_fused_if, etc. > PerMethodSpecTrapLimit (Default: 5000) - for Reason_speculate_class_check, Reason_speculate_null_check, etc. > > With the given defaults, if the only reason for the method recompilation is unstable_if, the system stabilizes after 100 traps (PerMethodTrapLimit). However, if the method experiences traps and recompilations for different reasons, the total number of recompilations can exceed 200 before hitting the limit for unstable_if traps. This triggers Action_none and causes the deopt storm. > > The proposal is a minimal change in GraphKit::uncommon_trap: apply the same `too_many_recompiles` threshold inside `Parse::path_is_suitable_for_uncommon_trap` - this ensures that on the final recompilation C2 gets a hint not to speculate on untaken branches anymore.
>
> As an alternative solution, we can revisit GraphKit::uncommon_trap. This "Temporary fix" has persisted in the codebase for 17 years, so it is probably time to change it as well. Any comments are welcome.
>
>     case Deoptimization::Action_reinter...

Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:

  using too_many_traps_or_recompiles. adding DeoptStorm jtreg test

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/28966/files
 - new: https://git.openjdk.org/jdk/pull/28966/files/74c02fe1..074fade0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28966&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28966&range=01-02

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/28966.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28966/head:pull/28966

PR: https://git.openjdk.org/jdk/pull/28966

From dlunden at openjdk.org Wed Jan 7 17:31:43 2026
From: dlunden at openjdk.org (Daniel Lundén)
Date: Wed, 7 Jan 2026 17:31:43 GMT
Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v6]
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jan 2026 02:15:22 GMT, Jatin Bhateja wrote:

>> Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand, which leads to a failure during register biasing. Changing the NDD demotion flag names to encode the explicit operand position, i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2**, splitting the commutative flag into separate new flags, and fine-tuning assertion checks based on the new naming convention fixes the issue.
>>
>> Failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch.
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>> PS: Validation performed using Intel SDE 9.58.
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review comments resolutions

I will review this changeset now after integration, but, for future reference, please note that HotSpot changes require at least **two** reviews before integration (see https://openjdk.org/guide/#life-of-a-pr).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28999#issuecomment-3719942433

From duke at openjdk.org Wed Jan 7 17:38:11 2026
From: duke at openjdk.org (Yi Wu)
Date: Wed, 7 Jan 2026 17:38:11 GMT
Subject: RFR: 8373344: Add support for min/max reduction operations for Float16 [v2]
In-Reply-To:
References: <7qXqIBuLDFEKPNje6TALxZUATujnOY5hoODC30zJNFM=.07d8d157-1ae5-4751-befd-d6291370fb9c@github.com> <72RiEFo5wji1vOtIRzFwO03_0OsaCe2zzHjp9YPD8-k=.6a88f4eb-3fff-4fe5-bb43-260d81b7954a@github.com>
Message-ID:

On Tue, 6 Jan 2026 07:28:59 GMT, Xiaohong Gong wrote:

>> Yi Wu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>>
>>  - Replace assert with verify
>>  - Add IRNode constant and code refactor
>>  - Merge remote-tracking branch 'origin/master' into yiwu-8373344
>>  - 8373344: Add support for FP16 min/max reduction operations
>>
>> This patch adds mid-end support for vectorized min/max reduction
>> operations for half floats. It also includes backend AArch64 support
>> for these operations.
>> Both floating point min/max reductions don't require strict order,
>> because they are associative.
>>
>> It will generate NEON fminv/fmaxv reduction instructions when
>> max vector length is 8B or 16B. On SVE supporting machines
>> with vector lengths > 16B, it will generate the SVE fminv/fmaxv
>> instructions.
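Why associativity matters for an unordered reduction can be shown with a small Java sketch. It uses `float` and `Math.max` as stand-ins for FP16 values and the AArch64 `fmaxv` instruction. That substitution is an assumption made for illustration, and the exact NaN and signed-zero behavior of the hardware instruction is not modeled. The sketch compares a left-to-right reduction with a pairwise (tree) reduction, which is one order a vector reduction might use. The max results agree, while the same two orders produce different sums for the chosen inputs.

```java
// Compares a sequential reduction with a pairwise (tree) reduction of the same array.
final class ReductionOrder {
    // Scalar, left-to-right order.
    static float sequentialMax(float[] a) {
        float r = a[0];
        for (int i = 1; i < a.length; i++) {
            r = Math.max(r, a[i]);
        }
        return r;
    }

    // Pairwise order over a[lo, hi): one possible vector-reduction order.
    static float pairwiseMax(float[] a, int lo, int hi) {
        if (hi - lo == 1) {
            return a[lo];
        }
        int mid = (lo + hi) >>> 1;
        return Math.max(pairwiseMax(a, lo, mid), pairwiseMax(a, mid, hi));
    }

    static float sequentialSum(float[] a) {
        float r = a[0];
        for (int i = 1; i < a.length; i++) {
            r += a[i];
        }
        return r;
    }

    static float pairwiseSum(float[] a, int lo, int hi) {
        if (hi - lo == 1) {
            return a[lo];
        }
        int mid = (lo + hi) >>> 1;
        return pairwiseSum(a, lo, mid) + pairwiseSum(a, mid, hi);
    }

    public static void main(String[] args) {
        float[] a = {1e8f, 1f, -1e8f, 1f};
        // max is associative: both orders print 1.0E8
        System.out.println(sequentialMax(a) + " " + pairwiseMax(a, 0, a.length));
        // addition is not: prints "1.0 0.0", since adding 1f to +/-1e8f is lost to rounding
        System.out.println(sequentialSum(a) + " " + pairwiseSum(a, 0, a.length));
    }
}
```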
>> The patch also adds support for partial min/max reductions on
>> SVE machines using fminv/fmaxv.
>>
>> Ratio of throughput (ops/ms) > 1 indicates the performance with
>> this patch is better than the mainline.
>>
>> Neoverse N1 (UseSVE = 0, max vector length = 16B):
>> Benchmark         vectorDim  Mode   Cnt  8B    16B
>> ReductionMaxFP16  256        thrpt  9    3.69  6.44
>> ReductionMaxFP16  512        thrpt  9    3.71  7.62
>> ReductionMaxFP16  1024       thrpt  9    4.16  8.64
>> ReductionMaxFP16  2048       thrpt  9    4.44  9.12
>> ReductionMinFP16  256        thrpt  9    3.69  6.43
>> ReductionMinFP16  512        thrpt  9    3.70  7.62
>> ReductionMinFP16  1024       thrpt  9    4.16  8.64
>> ReductionMinFP16  2048       thrpt  9    4.44  9.10
>>
>> Neoverse V1 (UseSVE = 1, max vector length = 32B):
>> Benchmark         vectorDim  Mode   Cnt  8B    16B   32B
>> ReductionMaxFP16  256        thrpt  9    3.96  8.62  8.02
>> ReductionMaxFP16  512        thrpt  9    3.54  9.25  11.71
>> ReductionMaxFP16  1024       thrpt  9    3.77  8.71  14.07
>> ReductionMaxFP16  2048       thrpt  9    3.88  8.44  14.69
>> ReductionMinFP16  256        thrpt  9    3.96  8.61  8.03
>> ReductionMinFP16  512        thrpt  9    3.54  9.28  11.69
>> ReductionMinFP16  1024       thrpt  9    3.76  8.70  14.12
>> ReductionMinFP16  2048       thrpt  9    3.87  8.45  14.70
>>
>> Neoverse V2 (UseSVE = 2, max vector length = 16B)...
>
> src/hotspot/cpu/aarch64/aarch64_vector.ad line 381:
>
>> 379:     case Op_XorReductionV:
>> 380:     case Op_MinReductionVHF:
>> 381:     case Op_MaxReductionVHF:
>
> We can use the NEON instructions if the vector size <= 16B as well for partial cases. Did you test the performance with NEON instead of using predicated SVE instructions?

You mean move it down, like `Op_AddReductionVI` and `Op_AddReductionVL`, to use `return !VM_Version::use_neon_for_vector(length_in_bytes);`? It doesn't seem to make much of a difference.
Neoverse V1 (UseSVE = 1, max vector length = 32B)
Benchmark         vectorDim  Mode   Cnt  8B(old)  8B(new)  chg2/chg1  16B(old)  16B(new)  chg2/chg1  32B(old)  32B(new)  chg2/chg1
ReductionMaxFP16  256        thrpt  9    3.96     3.96     1.00       8.63      8.62      1.00       8.02      8.02      1.00
ReductionMaxFP16  512        thrpt  9    3.54     3.54     1.00       9.25      9.25      1.00       11.71     11.71     1.00
ReductionMaxFP16  1024       thrpt  9    3.77     3.77     1.00       8.70      8.71      1.00       14.12     14.07     1.00
ReductionMaxFP16  2048       thrpt  9    3.88     3.88     1.00       8.45      8.44      1.00       14.69     14.69     1.00
ReductionMinFP16  256        thrpt  9    3.96     3.96     1.00       8.62      8.61      1.00       8.02      8.03      1.00
ReductionMinFP16  512        thrpt  9    3.55     3.54     1.00       9.26      9.28      1.00       11.72     11.69     1.00
ReductionMinFP16  1024       thrpt  9    3.76     3.76     1.00       8.69      8.70      1.00       14.10     14.12     1.00
ReductionMinFP16  2048       thrpt  9    3.87     3.87     1.00       8.44      8.45      1.00       14.76     14.70     1.00

Neoverse V2 (UseSVE = 2, max vector length = 16B)
Benchmark         vectorDim  Mode   Cnt  8B(old)  8B(new)  chg2/chg1  16B(old)  16B(new)  chg2/chg1
ReductionMaxFP16  256        thrpt  9    4.77     4.78     1.00       10.00     10.00     1.00
ReductionMaxFP16  512        thrpt  9    3.75     3.74     1.00       11.32     11.33     1.00
ReductionMaxFP16  1024       thrpt  9    3.87     3.86     1.00       9.59      9.59      1.00
ReductionMaxFP16  2048       thrpt  9    3.94     3.94     1.00       8.72      8.71      1.00
ReductionMinFP16  256        thrpt  9    4.77     4.78     1.00       9.97      10.00     1.00
ReductionMinFP16  512        thrpt  9    3.77     3.74     0.99       11.35     11.29     0.99
ReductionMinFP16  1024       thrpt  9    3.86     3.86     1.00       9.56      9.58      1.00
ReductionMinFP16  2048       thrpt  9    3.94     3.94     1.00       8.71      8.71      1.00

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28828#discussion_r2669419647

From duke at openjdk.org Wed Jan 7 17:43:05 2026
From: duke at openjdk.org (Benjamin Peterson)
Date: Wed, 7 Jan 2026 17:43:05 GMT
Subject: RFR: 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs
In-Reply-To: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com>
References: <6A8IBLAgkPPh3xe0UPZKvi95LR8vketGgp_WlVDHUSM=.d91aad65-c9bf-4ce7-9868-6dcd86ac1e0b@github.com>
Message-ID:

On Mon, 5 Jan 2026 07:37:34 GMT, Emanuel Peter wrote:

>
> In `VLoopMemorySlices::find_memory_slices`, we analyze the memory slices. In some cases, we only find loads in the slice, and no phi. So the memory input of the loads comes from before the loop. When I refactored the code, I made the assumption that all loads should have the same memory input. After all, any store before the loop would have to have happened before we enter the loop and execute any of its loads. The assumption held for a long time, but now we have a counterexample.
>
> Summary: one load has its memory input optimized, while the other is not put on the IGVN worklist again and keeps the old memory input (even though in this case we could have optimized it just the same). Thus, both choices of memory input are correct, and the assumption of the assert does not hold.
>
> Solution: Just bail out of auto vectorization if this assumption is violated. This is an edge case, and the assert had not been hit until the fuzzer found this example.
> Alternatives: we could track the multiple memory inputs, but that would be more effort to implement, and hard to test because it is difficult to create examples.
>
> ---------------------------
>
> **Details**
>
> Below, look at `1145 LoadB` and `1131 LoadB`. One has memory input `Param 7` (initial program state), the other `711 Phi` (outer loop). Both loads are inside the `1147 CountedLoop`. But their states come from outside, both originally from `711 Phi`. But then `1145 LoadB` is optimized with `LoadBNode::Ideal` -> `LoadNode::Ideal` -> `LoadNode::split_through_phi`: it realizes that the backedge of the `711 Phi` only passes through the `593 CallStaticJava`, which cannot modify the `Byte::value` field of the `LoadB` (unless it used reflection, but that unlocks undefined behavior anyway, so it can be ignored). So it is ok to split through the phi, as the `Byte::value` cannot be modified during the outer loop.
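The bail-out described in the Solution paragraph is a uniformity check over the loads of a slice that has no phi. The Java sketch below is a hypothetical model, not the C2 code in `vectorization.cpp`. `Load` stands in for a load node and is reduced to the node id of its memory input, and an empty result means "bail out of auto vectorization".

```java
import java.util.List;
import java.util.OptionalInt;

// Hypothetical model of the check; not the actual VLoopMemorySlices code.
final class MemorySliceCheck {
    // Stand-in for a load node: only the id of its memory input matters here.
    record Load(String name, int memoryInputId) {}

    // Returns the memory input shared by all loads of a slice without a phi,
    // or an empty result if the loads disagree (bail out of auto vectorization).
    static OptionalInt commonMemoryInput(List<Load> loads) {
        OptionalInt common = OptionalInt.empty();
        for (Load load : loads) {
            if (common.isEmpty()) {
                common = OptionalInt.of(load.memoryInputId());
            } else if (common.getAsInt() != load.memoryInputId()) {
                return OptionalInt.empty();
            }
        }
        return common;
    }

    public static void main(String[] args) {
        // Mirrors the example above: one load was split through the phi, the other was not.
        List<Load> slice = List.of(new Load("1145 LoadB", 7), new Load("1131 LoadB", 711));
        System.out.println(commonMemoryInput(slice).isPresent()); // false -> bail out
    }
}
```

Tracking multiple inputs, the alternative mentioned in the description, would replace the single `OptionalInt` with a collection of inputs. That approach carries the extra implementation and testing effort the description points out.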
>
> `1131 LoadB` could also do the same optimization, but it just does not end up on the IGVN worklist. The issue is that we don't have any adequate notification that goes down through the `MergeMem - Call - Proj` structure. We did not want to have that until now, because in theory we could have a long series of calls, and the traversals could become too expensive.

src/hotspot/share/opto/vectorization.cpp line 238:

> 236: // For now, we can only handle slices with a single memory input before the loop,
> 237: // so if we find multiple, we bail out of auto vectorization. If this becomes
> 238: // too restrictive in the fututure, we could consider tracking multiple inputs.

typo "fututure"

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29028#discussion_r2669456472

From duke at openjdk.org Wed Jan 7 17:50:06 2026
From: duke at openjdk.org (Ferenc Rakoczi)
Date: Wed, 7 Jan 2026 17:50:06 GMT
Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2]
In-Reply-To:
References:
Message-ID: <_6Hgz5wXGpIYkofBL-qbExAyU3wAvzbyjiUHFUA4IK8=.40cf0b66-f7ff-458c-86ca-7e997e7c3abf@github.com>

On Wed, 7 Jan 2026 16:43:30 GMT, Volodymyr Paprotski wrote:

>> I believe the numbers are right: with each pass 256 bytes of coefficients are `parsed` into the parse buffer. This means that half of the coefficients have been processed (`parsedLength` = 128). Would having a comment stating as such address your concerns?
>
> I wasn't clear enough in my question. The asm is indeed processing the bytes in the increment. What I was trying to convince myself of: 'how come we are not reading past the end of the array? Or are we?'
>
> On one hand, this is exactly what the existing asm code does, so I will assume that it's correct. However, on the Java side/version of this code, I could only convince myself about processing ~two AVX512 vectors at a time, not four.
>
> So either I can't count, or there are some further (implicit) restrictions on the callers of `twelve2Sixteen`.

In ML_KEM.java there is this assert (and this is the only call to implKyber12To16()):

    assert ((remainder == 0) || (remainder == 48)) &&
            (index + i * 96 <= condensed.length);
    implKyber12To16(condensed, index, parsed, parsedLength);

and one can check how the callers of twelve2Sixteen() make sure that this is the case.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2669490940

From qamai at openjdk.org Wed Jan 7 18:00:44 2026
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 7 Jan 2026 18:00:44 GMT
Subject: RFR: 8370914: C2: Reimplement Type::join [v8]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection.
>
> However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`).
>
> In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place.
> Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand.
>
> This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierarchy. This has several benefits:
>
> - It makes the operation much more intuitive, and changes to `Type` classes can be made more easily. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think of a `Type` as a set of possible values.
> - It tightens the invariants of the classes in the hierarchy. Instead of having 5 possible `ptr()` values when a `TypeInstPtr` participates in a meet/join operation, there are only 3 left (`AnyNull` is nonsensical and `TopPTR` must be an `AnyPtr`). This, in turn, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about.
>
> This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the pl...
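Treating a type as a set of possible values makes the two operations easy to state for a simple integer range. Meet is (an over-approximation of) set union, join is set intersection, and only the latter can come out empty, which C2 represents as top (compare `CT::make_or_top` in the review below). The Java sketch is a hypothetical toy domain. HotSpot's `TypeInt` carries more information than a single signed range.

```java
import java.util.Optional;

// Toy integer-range domain (hypothetical); a range is never empty by construction.
final class IntRange {
    final long lo;
    final long hi;

    private IntRange(long lo, long hi) {
        this.lo = lo;
        this.hi = hi;
    }

    static IntRange of(long lo, long hi) {
        if (lo > hi) {
            throw new IllegalArgumentException("empty range");
        }
        return new IntRange(lo, hi);
    }

    // Meet: the smallest range containing both inputs (over-approximates the union).
    // The union of two non-empty sets is never empty, so no special case is needed.
    IntRange meet(IntRange other) {
        return new IntRange(Math.min(lo, other.lo), Math.max(hi, other.hi));
    }

    // Join: the exact intersection, which may be empty; Optional.empty() plays
    // the role of "top" (no possible value).
    Optional<IntRange> join(IntRange other) {
        long newLo = Math.max(lo, other.lo);
        long newHi = Math.min(hi, other.hi);
        return newLo <= newHi ? Optional.of(new IntRange(newLo, newHi)) : Optional.empty();
    }

    @Override
    public String toString() {
        return "[" + lo + ", " + hi + "]";
    }

    public static void main(String[] args) {
        IntRange a = of(0, 5);
        System.out.println(a.meet(of(6, 9)));  // [0, 9]
        System.out.println(a.join(of(6, 9)));  // Optional.empty
        System.out.println(a.join(of(3, 20))); // Optional[[3, 5]]
    }
}
```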
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:

  reviews

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/28051/files
 - new: https://git.openjdk.org/jdk/pull/28051/files/5e0917cf..5124698d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=06-07

  Stats: 83 lines in 2 files changed: 22 ins; 21 del; 40 mod
  Patch: https://git.openjdk.org/jdk/pull/28051.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28051/head:pull/28051

PR: https://git.openjdk.org/jdk/pull/28051

From qamai at openjdk.org Wed Jan 7 18:00:52 2026
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 7 Jan 2026 18:00:52 GMT
Subject: RFR: 8370914: C2: Reimplement Type::join [v7]
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jan 2026 15:03:44 GMT, Marc Chevalier wrote:

>> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits:
>>
>>  - copyright year
>>  - Merge branch 'master' into typejoin
>>  - sort order
>>  - Merge branch 'master' into typejoin
>>  - Merge branch 'master' into typejoin
>>  - Move dual to ASSERT only
>>  - Keep old version for verification
>>  - whitespace
>>  - Reimplement Type::join
>
> src/hotspot/share/opto/rangeinference.cpp line 700:
>
>> 698:       return int_type_union(t1, t2);
>> 699:     } else {
>> 700:       return CT::make_or_top(TypeIntPrototype{{MAX2(t1->_lo, t2->_lo), MIN2(t1->_hi, t2->_hi)},
>
> Nothing dramatic, but why is this branch implemented directly, while the other branch calls another function, `int_type_union`? This function seems to be used only here and in testing. If I understand correctly, this branch is meant to go away when `_is_dual` is removed, but it is (almost) the same as in `int_type_xjoin` (which makes sense).
>
> I'm not sure what would be best, but I'm slightly annoyed that they look different.
> Should we have an `int_type_intersection`? But I don't see what it would do that `int_type_xjoin` doesn't do. Or simply, why do we need `int_type_union`? Once this else-branch is removed, wouldn't `int_type_xmeet` simply be a call to `int_type_union`, and so maybe we could avoid this extra step?
>
> I'm fine if the answer is "it's part of the future clean-up work", but then, I think we could have a tracking ticket and a TODO comment.

The asymmetry comes from the fact that a union of 2 non-empty sets is always non-empty, while an intersection may be empty, and there is no construct for `TypeIntMirror-or-empty` yet. I have at least refactored the `TypeIntPrototype` computation into a common function.

> src/hotspot/share/opto/rangeinference.cpp line 710:
>
>> 708:
>> 709: template
>> 710: const Type* TypeIntHelper::int_type_xjoin(const CT* t1, const CT* t2) {
>
> Unlike `int_type_union`, I couldn't find a test for `int_type_xjoin`. Have I missed something? Is that expected? I agree it's not the scariest function, but... why not! And it would at least be there as a skeleton when adding other abstract domains that might be less obvious.

It is a pretty trivial function, and like the other `meet` and `join` operations, it is tested against the fundamental laws of set operations. I don't think there is a test for `int_type_union`, either. It appears in the test file because we want to meet `TypeIntMirror`s there.

> src/hotspot/share/opto/type.cpp line 1029:
>
>> 1027:     tty->print("t1 meets t2 = "); mt1->dump(); tty->cr();
>> 1028:     tty->print("t2 meets t1 = "); mt2->dump(); tty->cr();
>> 1029:     fatal("meet not commutative");
>
> I see it was like that before, but I think it's discouraged to have many tty->print calls, to avoid interleaved output from concurrent prints. The preferred solution is to use a stringstream rather than a lock. Also, do we need a flush before the fatal? I've seen some (other) prints that are cut before the end on assert failures.
>
> Same below.
That's a good idea; I have replaced the series of `tty->print` calls in this function with a `stringStream`.

> src/hotspot/share/opto/type.cpp line 2401:
>
>> 2399:
>> 2400: //------------------------------meet-------------------------------------------
>> 2401: // Compute the MEET of two types. It returns a new Type object.
>
> Should we clean up these comments? I think we can have it on the base method, but it is maybe not that useful on each override? Also, it's not a great comment: it doesn't bring much more than the signature; it speaks about two types, but one of them is `this` and so implicit, which is somewhat confusing (it would make more sense on `const Type* Type::xmeet(const Type* t1, const Type* t2)`). The comment says it returns a new Type object, which is ambiguous: these methods can return `this` or `t`, which are not new objects, for most definitions I'd give to this. I think it means that it doesn't mutate `this` or `t`, but returns the result, but that is clear from the `const`s.
>
> Some of these comments have a header line of dashes that says `xmeet` and some (like this one) only `meet`.
>
> Overall, I think these comments bring nothing, or possibly confusion, and it would be a good opportunity to get rid of them.
Done, I have removed them.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2669527966
PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2669521513
PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2669511484
PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2669509065

From vpaprotski at openjdk.org Wed Jan 7 18:04:14 2026
From: vpaprotski at openjdk.org (Volodymyr Paprotski)
Date: Wed, 7 Jan 2026 18:04:14 GMT
Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2]
In-Reply-To: <_6Hgz5wXGpIYkofBL-qbExAyU3wAvzbyjiUHFUA4IK8=.40cf0b66-f7ff-458c-86ca-7e997e7c3abf@github.com>
References: <_6Hgz5wXGpIYkofBL-qbExAyU3wAvzbyjiUHFUA4IK8=.40cf0b66-f7ff-458c-86ca-7e997e7c3abf@github.com>
Message-ID:

On Wed, 7 Jan 2026 17:47:59 GMT, Ferenc Rakoczi wrote:

>> I wasn't clear enough in my question. The asm is indeed processing the bytes in the increment. What I was trying to convince myself of: 'how come we are not reading past the end of the array? Or are we?'
>>
>> On one hand, this is exactly what the existing asm code does, so I will assume that it's correct. However, on the Java side/version of this code, I could only convince myself about processing ~two AVX512 vectors at a time, not four.
>>
>> So either I can't count, or there are some further (implicit) restrictions on the callers of `twelve2Sixteen`.
>
> In ML_KEM.java there is this assert (and this is the only call to implKyber12To16()):
>
>     assert ((remainder == 0) || (remainder == 48)) &&
>             (index + i * 96 <= condensed.length);
>     implKyber12To16(condensed, index, parsed, parsedLength);
>
> and one can check how the callers of twelve2Sixteen() make sure that this is the case.

Yep, that's exactly the assert I was looking at as well... it looks to me like it's dividing the 'expanded-short-array-length' by 64 and ensuring the remainder is zero (ignoring the 48 for a bit...
and the condensed-length check).

For simplicity: the 'expanded' array length should then be a multiple of 64 shorts, i.e. 128 bytes. But we stride the expanded array by 256 bytes? (And parsedLength by 128 shorts...)

I haven't checked the callers of `twelve2Sixteen`, but I suspect that the length of the expanded array is always a multiple of 256 bytes (128 shorts)... in which case, the assert is 'incomplete'?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2669535184

From qamai at openjdk.org Wed Jan 7 18:05:55 2026
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 7 Jan 2026 18:05:55 GMT
Subject: RFR: 8370914: C2: Reimplement Type::join [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 21 Nov 2025 09:13:29 GMT, Marc Chevalier wrote:

>> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>>
>>  - Merge branch 'master' into typejoin
>>  - Move dual to ASSERT only
>>  - Keep old version for verification
>>  - whitespace
>>  - Reimplement Type::join
>
> We probably need associativity, but then, if we have an abstract domain that is a lattice, that should be rather straightforward. If we start having simple posets instead, we can be very sound, but associativity might require more care...

@marc-chevalier Thanks for your reviews. I have addressed and responded to your concerns.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3720085865

From qamai at openjdk.org Wed Jan 7 18:06:00 2026
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 7 Jan 2026 18:06:00 GMT
Subject: RFR: 8370914: C2: Reimplement Type::join [v7]
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jan 2026 14:44:35 GMT, Marc Chevalier wrote:

>> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase.
>> The pull request now contains nine commits:
>>
>>  - copyright year
>>  - Merge branch 'master' into typejoin
>>  - sort order
>>  - Merge branch 'master' into typejoin
>>  - Merge branch 'master' into typejoin
>>  - Move dual to ASSERT only
>>  - Keep old version for verification
>>  - whitespace
>>  - Reimplement Type::join
>
> src/hotspot/share/opto/memnode.cpp line 2019:
>
>> 2017:   if (is_mismatched_access()) {
>> 2018:     return _type;
>> 2019:   }
>
> How is that related to the reimplementation of join?

For mismatched accesses, the code below may perform `meet` and `join` of unrelated types, such as when we try to `LoadL` from a `byte[]`. The new `meet` and `join` forbid that, and the code below tries to reason about the value stored at a memory location, which is impossible for mismatched accesses anyway.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2669543481

From qamai at openjdk.org Wed Jan 7 18:10:08 2026
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 7 Jan 2026 18:10:08 GMT
Subject: RFR: 8370914: C2: Reimplement Type::join [v7]
In-Reply-To:
References:
Message-ID: <3xUCngr7Ia4rUEzYTJaXZvQF1AfzMusCfjt1qydWlJE=.69a96499-a02a-4cf6-9cfc-f0dc489fcd2a@github.com>

On Wed, 7 Jan 2026 15:19:04 GMT, Marc Chevalier wrote:

>> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase.
>> The pull request now contains nine commits:
>>
>>  - copyright year
>>  - Merge branch 'master' into typejoin
>>  - sort order
>>  - Merge branch 'master' into typejoin
>>  - Merge branch 'master' into typejoin
>>  - Move dual to ASSERT only
>>  - Keep old version for verification
>>  - whitespace
>>  - Reimplement Type::join
>
> src/hotspot/share/opto/subnode.cpp line 2014:
>
>> 2012:     const Type* in_type = phase->type(in1);
>> 2013:     if ((in_type->isa_int() && in_type->is_int()->_lo >= 0) ||
>> 2014:         (in_type->isa_long() && in_type->is_long()->_lo >= 0)) {
>
> I'm a bit annoyed here because we look quite a lot inside the details of the implementation of the types. I'm not a fan of the previous situation either, as at most one side of the `||` would make sense.
>
> For instance, if we want to make use of bitwise information for this, we could look at the highest bit (that is, the sign bit in two's complement). That is just an example; we could find other tricks to conclude that an abstract int is non-negative that may not involve ranges (or not only ranges), and I find it unfortunate to look inside the type rather than have the type tell us. I think the usual approach would be to check that the guard/intersection (depending on the formalism) with the negative numbers is empty (or that the one with the non-negative numbers is unchanged), but we quickly run into the problem again that `in_type` can be either int or long, and a lot of ways to write that would need to split cases. Maybe an approximation would be to have `TypeInt::is_non_negative` and the like, so as not to overengineer guarding with arbitrary expressions, but still limit the scope of who looks into the implementation?

But checking the bit is also looking into the details of the types, right? Trying to answer whether a value can be negative with `_lo >= 0` seems to be the most straightforward approach to me. Anyway, the internal details of `TypeInt` are designed to be exposed, and we use them all the time.
The methods act more as utilities than encapsulations.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2669566956

From qamai at openjdk.org Wed Jan 7 18:35:46 2026
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 7 Jan 2026 18:35:46 GMT
Subject: RFR: 8370914: C2: Reimplement Type::join [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jan 2026 18:02:31 GMT, Quan Anh Mai wrote:

>> We probably need associativity, but then, if we have an abstract domain that is a lattice, that should be rather straightforward. If we start having simple posets instead, we can be very sound, but associativity might require more care...
>
> @marc-chevalier Thanks for your reviews. I have addressed and responded to your concerns.

> Thanks @merykitty. I do hope I am not misunderstanding. I would argue that in theory, empty is correct here and null is wrong, but in practice I don't think it matters except for dead code. Can you come up with an example where the difference matters? C2 only remembers one Klass, so it makes sense to use LCA(Klass1,Klass2) for meet(Klass1,Klass2) in the summarized result, even though it loses information.
Something that looks like this:

    Object o;
    Integer i = (Integer) o;
    Float f = (Float) o;
    // C2 may wrongly assume it is dead code here

The reason we do not encounter this is that `GraphKit::gen_checkcast` handles the null path and the non-null path separately, so we end up with `Phi(null, non-null Float)`, and since the `non-null Float` is dead, it is left with `null`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3720195876

From bulasevich at openjdk.org Wed Jan 7 18:41:42 2026
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Wed, 7 Jan 2026 18:41:42 GMT
Subject: RFR: 8374307: Fix deoptimization storm caused by Action_none in GraphKit::uncommon_trap [v3]
In-Reply-To: <0ZP63PHTbXgTcztA8wpWQd3Zj7YzLkOW9udimgYmSTs=.94d58938-cd00-4e85-80c1-2ca8b610afac@github.com>
References: <0ZP63PHTbXgTcztA8wpWQd3Zj7YzLkOW9udimgYmSTs=.94d58938-cd00-4e85-80c1-2ca8b610afac@github.com>
Message-ID:

On Tue, 6 Jan 2026 14:16:42 GMT, Tobias Hartmann wrote:

> Is this related to [JDK-8243615](https://bugs.openjdk.org/browse/JDK-8243615)?

Oh, yes - this is related, and we had a similar fix five years ago. @wzhuo

> Could you convert your `UnstableIf.java` test to a jtreg test?

Done. I converted it to a jtreg test. I'm skipping the heavyweight part (200+ lines of code) that reproduces the issue without changing the PerMethodRecompilationCutoff limit.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28966#issuecomment-3720217553

From duke at openjdk.org Wed Jan 7 19:03:03 2026
From: duke at openjdk.org (Ferenc Rakoczi)
Date: Wed, 7 Jan 2026 19:03:03 GMT
Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2]
In-Reply-To:
References: <_6Hgz5wXGpIYkofBL-qbExAyU3wAvzbyjiUHFUA4IK8=.40cf0b66-f7ff-458c-86ca-7e997e7c3abf@github.com>
Message-ID:

On Wed, 7 Jan 2026 17:59:52 GMT, Volodymyr Paprotski wrote:

>> In ML_KEM.java there is this assert (and this is the only call to implKyber12To16()):
>>
>>     assert ((remainder == 0) || (remainder == 48)) &&
>>             (index + i * 96 <= condensed.length);
>>     implKyber12To16(condensed, index, parsed, parsedLength);
>>
>> and one can check how the callers of twelve2Sixteen() make sure that this is the case.
>
> Yep, that's exactly the assert I was looking at as well... it looks to me like it's dividing the 'expanded-short-array-length' by 64 and ensuring the remainder is zero (ignoring the 48 for a bit... and the condensed-length check).
>
> For simplicity: the 'expanded' array length should then be a multiple of 64 shorts, i.e. 128 bytes. But we stride the expanded array by 256 bytes? (And parsedLength by 128 shorts...)
>
> I haven't checked the callers of `twelve2Sixteen`, but I suspect that the length of the expanded array is always a multiple of 256 bytes (128 shorts)... in which case, the assert is 'incomplete'?

Oooops, yes, the assert and the comment on twelve2Sixteen() should be fixed. All of the calls process 192 or 384 bytes (producing 128 or 256 shorts). The comment and assert belonged to an earlier version and were not updated when I changed my mind about the implementation.
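The byte and short counts in this exchange follow from the 12-bit packing: every 3 condensed bytes hold two 12-bit coefficients. So 96 bytes give 64 shorts, 192 bytes give 128, and 384 bytes give 256. A scalar Java sketch of that unpacking is below. It uses little-endian bit order, which is how ML-KEM packs 12-bit coefficients. FIPS 203's ByteDecode additionally reduces modulo q, and that step is omitted here. The sketch is illustrative only and is neither the JDK's `twelve2Sixteen` nor the AVX-512 intrinsic. `Kyber12To16` and `twelveToSixteen` are hypothetical names.

```java
// Scalar sketch of 12-bit -> 16-bit unpacking; hypothetical names, not the JDK code.
final class Kyber12To16 {
    // Every 3 input bytes hold two 12-bit values, least significant bits first.
    static short[] twelveToSixteen(byte[] condensed) {
        if (condensed.length % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        short[] parsed = new short[condensed.length / 3 * 2];
        for (int i = 0, j = 0; i < condensed.length; i += 3, j += 2) {
            int b0 = condensed[i] & 0xFF;
            int b1 = condensed[i + 1] & 0xFF;
            int b2 = condensed[i + 2] & 0xFF;
            parsed[j] = (short) (b0 | ((b1 & 0x0F) << 8));   // first 12 bits
            parsed[j + 1] = (short) ((b1 >>> 4) | (b2 << 4)); // next 12 bits
        }
        return parsed;
    }

    public static void main(String[] args) {
        System.out.println(twelveToSixteen(new byte[192]).length); // 128
        System.out.println(twelveToSixteen(new byte[384]).length); // 256
        short[] p = twelveToSixteen(new byte[] {0x01, 0x23, 0x45});
        System.out.printf("%03x %03x%n", p[0], p[1]);             // 301 452
    }
}
```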
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2669726088

From vlivanov at openjdk.org Wed Jan 7 19:16:17 2026
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 7 Jan 2026 19:16:17 GMT
Subject: RFR: 8374307: Fix deoptimization storm caused by Action_none in GraphKit::uncommon_trap [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jan 2026 17:28:14 GMT, Boris Ulasevich wrote:

>> We observed a deoptimization storm caused by GraphKit::uncommon_trap generator logic. GraphKit::uncommon_trap considers the too_many_recompiles metric. If the threshold is exceeded, it replaces Deoptimization::Action_reinterpret with Deoptimization::Action_none (see code snippet below).
>>
>> This replacement changes the uncommon_trap logic: once execution hits a trap, the VM performs deoptimization but does not recompile the method anymore. In an "unlucky" case, when the code path containing this uncommon_trap becomes frequently executed, a deoptimization storm occurs (thousands of deoptimizations per second), causing a significant performance drop.
>>
>> The original problematic method, which triggered repeated recompilations, is a high-performance compressed binary serialization algorithm with heavy use of conditional branches driven by bitmasks. See a standalone [synthetic benchmark](https://bugs.openjdk.org/secure/attachment/118045/UnstableIf.java) to reproduce the issue.
>>
>> The issue arises when the method exceeds a global recompilation threshold before its specific trap counters stabilize.
>>
>> Current thresholds:
>> - Recompilation Limit (too_many_recompiles):
>>   Condition: decompile_count() >= (PerMethodRecompilationCutoff / 2) + 1
>>   Default: 201 (derived from default PerMethodRecompilationCutoff = 400).
>> - Specific Trap Limits (too_many_traps):
>>   Checks if the trap count for a specific reason exceeds:
>>   PerMethodTrapLimit (Default: 100) - for Reason_unstable_if, Reason_unstable_fused_if, etc.
>> PerMethodSpecTrapLimit (Default: 5000) - for Reason_speculate_class_check, Reason_speculate_null_check, etc. >> >> With the given defaults, if the only reason for the method recompilation is unstable_if, the system stabilizes after 100 traps (PerMethodTrapLimit). However, if the method experiences traps and recompilations for different reasons, the total number of recompilations can exceed 200 before hitting the limit for unstable_if traps. This triggers Action_none and causes the deopt storm. >> >> The proposal is a minimal change: inside `Parse::path_is_suitable_for_uncommon_trap`, apply the same `too_many_recompiles` threshold that GraphKit::uncommon_trap checks - this ensures that on the final recompilation C2 gets a hint not to speculate on untaken branches anymore. >> >> As an alternative solution, we can revisit GraphKit::uncommon_trap. This "Temporary fix" has persisted in the codebase for 17 years, so it is probably time to change it as well. Any comments are we... > > Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > using too_many_traps_or_recompiles. adding DeoptStorm jtreg test I believe every place where an uncommon trap with `Action_reinterpret` is guarded by `too_many_traps` is susceptible to the very same problem. The culprit seems to be the discrepancy between `too_many_traps` and `too_many_recompiles`: many places where uncommon traps are inserted are guarded by `too_many_traps`, while `GraphKit::uncommon_trap()` checks specifically for `too_many_recompiles`. As the bug demonstrates, disabling recompilation while keeping the uncommon trap in place (substituting `Action_reinterpret`/`Action_maybe_recompile` with `Action_none`) can induce a lot of overhead.
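To make the interaction between the two limits concrete, here is a deliberately simplified Java model. It is not HotSpot code: the class, the method names, and the decision strings are hypothetical stand-ins for what C2 emits at an unstable_if branch. Only the default flag values and the recompile condition come from the PR description; whether the per-reason check uses `>` or `>=` is an assumption of the model.

```java
/**
 * Simplified model (NOT HotSpot source) of the two limits quoted in the
 * 8374307 description. Flag values are the quoted defaults.
 */
public final class DeoptThresholdModel {
    static final int PER_METHOD_RECOMPILATION_CUTOFF = 400; // PerMethodRecompilationCutoff
    static final int PER_METHOD_TRAP_LIMIT = 100;            // PerMethodTrapLimit

    // Quoted condition: decompile_count() >= (PerMethodRecompilationCutoff / 2) + 1
    public static boolean tooManyRecompiles(int decompileCount) {
        return decompileCount >= PER_METHOD_RECOMPILATION_CUTOFF / 2 + 1;
    }

    // Per-reason limit; the exact comparison is an assumption of this model.
    public static boolean tooManyTraps(int trapsForReason) {
        return trapsForReason >= PER_METHOD_TRAP_LIMIT;
    }

    // Behavior as described: the branch only consults the per-reason counter,
    // and the trap action is downgraded afterwards if recompiles are exhausted.
    public static String currentDecision(int decompileCount, int unstableIfTraps) {
        if (tooManyTraps(unstableIfTraps)) {
            return "compile untaken branch";
        }
        return tooManyRecompiles(decompileCount)
                ? "trap, Action_none"          // deoptimizes but never recompiles
                : "trap, Action_reinterpret";
    }

    // Proposed direction: stop speculating once either limit is reached.
    public static String proposedDecision(int decompileCount, int unstableIfTraps) {
        if (tooManyTraps(unstableIfTraps) || tooManyRecompiles(decompileCount)) {
            return "compile untaken branch";
        }
        return "trap, Action_reinterpret";
    }

    public static void main(String[] args) {
        // 201 recompilations for mixed reasons, but only 60 unstable_if traps.
        System.out.println(currentDecision(201, 60));  // prints trap, Action_none
        System.out.println(proposedDecision(201, 60)); // prints compile untaken branch
    }
}
```

The `(201, 60)` case is the storm scenario: the per-reason counter never reaches its limit, so the trap stays in the code, but recompilation is already disabled.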
So, a better strategy is to avoid an uncommon trap in the first place rather than letting it degenerate into `Action_none` and, also, assert whenever the situation occurs at runtime. Speaking of the proposed fix, my concern is that it addresses only one particular instance of the problem. Can we do better and fix similar bugs all at once? That would require aligning `too_many_traps` and `too_many_recompiles` use sites. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28966#issuecomment-3720372003 From vlivanov at openjdk.org Wed Jan 7 19:20:28 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 7 Jan 2026 19:20:28 GMT Subject: RFR: 8374307: Fix deoptimization storm caused by Action_none in GraphKit::uncommon_trap [v3] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 17:28:14 GMT, Boris Ulasevich wrote: >> We observed a deoptimization storm caused by GraphKit::uncommon_trap generator logic. GraphKit::uncommon_trap considers the too_many_recompiles metric. If the threshold is exceeded, it replaces Deoptimization::Action_reinterpret with Deoptimization::Action_none (see code snippet below). >> >> This replacement changes the uncommon_trap logic: once execution hits a trap, the VM performs deoptimization but does not recompile the method anymore. In an "unlucky" case, when the code path containing this uncommon_trap becomes hot, a deoptimization storm occurs (thousands of deoptimizations per second) causing a significant performance drop. >> >> The original problematic method, which triggered repeated recompilations, is a high-performance compressed binary serialization algorithm with heavy use of conditional branches driven by bitmasks. See a standalone [synthetic benchmark](https://bugs.openjdk.org/secure/attachment/118045/UnstableIf.java) to reproduce the issue. >> >> The issue arises when the method exceeds a global recompilation threshold before the specific trap counters stabilize.
>> >> Current thresholds: >> - Recompilation Limit (too_many_recompiles): >> Condition: decompile_count() >= (PerMethodRecompilationCutoff / 2) + 1 >> Default: 201 (derived from default PerMethodRecompilationCutoff = 400). >> - Specific Trap Limits (too_many_traps): >> Checks if the trap count for a specific reason exceeds: >> PerMethodTrapLimit (Default: 100) - for Reason_unstable_if, Reason_unstable_fused_if, etc. >> PerMethodSpecTrapLimit (Default: 5000) - for Reason_speculate_class_check, Reason_speculate_null_check, etc. >> >> With the given defaults, if the only reason for the method recompilation is unstable_if, the system stabilizes after 100 traps (PerMethodTrapLimit). However, if the method experiences traps and recompilations for different reasons, the total number of recompilations can exceed 200 before hitting the limit for unstable_if traps. This triggers Action_none and causes the deopt storm. >> >> The proposal is a minimal change: inside `Parse::path_is_suitable_for_uncommon_trap`, apply the same `too_many_recompiles` threshold that GraphKit::uncommon_trap checks - this ensures that on the final recompilation C2 gets a hint not to speculate on untaken branches anymore. >> >> As an alternative solution, we can revisit GraphKit::uncommon_trap. This "Temporary fix" has persisted in the codebase for 17 years, so it is probably time to change it as well. Any comments are we... > > Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > using too_many_traps_or_recompiles. adding DeoptStorm jtreg test BTW [JDK-6529811](https://bugs.openjdk.org/browse/JDK-6529811) did not introduce the heuristic in `GraphKit::uncommon_trap()`. The code predates OpenJDK. JDK-6529811 mentions an alternative way to fix the pathological behavior: 5.
The Action_none bailout is dangerous. GraphKit::uncommon_trap should bail out to Action_make_not_compilable. That way the log will print an interesting failure event, and performance will degrade into the interpreter, which is faster than the deoptimizer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28966#issuecomment-3720391303 From vlivanov at openjdk.org Wed Jan 7 19:58:39 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 7 Jan 2026 19:58:39 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v4] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 18:31:17 GMT, Quan Anh Mai wrote: >> @marc-chevalier Thanks for your reviews. I have addressed and responded to your concerns. > >> Thanks @merykitty. I do hope I am not misunderstanding. I would argue that in theory, empty is correct here and null is wrong, but in practice I don't think it matters except for dead code. Can you come up with an example where the difference matters? > C2 only remembers one Klass, so it makes sense to use LCA(Klass1,Klass2) for meet(Klass1,Klass2) in the summarized result, even though it loses information. > > Something that looks like this: > > Object o; > Integer i = (Integer) o; > Float f = (Float) o; > // C2 may wrongly assume it is dead code here, the reason we do not encounter this is that `GraphKit::gen_checkcast` does the null path and the non-null path separately, so we end up with `Phi(null, non-null Float)` and since the `non-null Float` is dead it is left with `null` I haven't looked at the PR in detail, but I have a data point on `Type::join()` as it is implemented now: C2 tracks interface types these days and while experimenting with JDK-8373633 I tried to filter receiver type and narrow it using a context interface (think of `(I1 & I2 & I3 & I4).filter(I1) == (I1 & I2)` where `I2 <: I1` and `I3`/`I4` are unrelated.
But the current implementation doesn't have any effect and leaves the original type intact (`(I1 & I2 & I3 & I4).filter(I1) == (I1 & I2 & I3 & I4)`). As a result of `Type::join()` and `Type::dual()` decoupling we can improve `Type::join` and make results more accurate for oop types (both interfaces and classes). I'm glad to see we are moving in that direction. Good work, @merykitty. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28051#issuecomment-3720520666 From psandoz at openjdk.org Wed Jan 7 20:25:31 2026 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 7 Jan 2026 20:25:31 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v9] In-Reply-To: References: Message-ID: <3PzPbEnPapV-B3OenjmG6paXsyLFayh33S-f0IBI-LY=.773757f6-c9e0-48ac-b89d-aa81fd6b47f8@github.com> On Wed, 17 Dec 2025 12:56:01 GMT, Jatin Bhateja wrote: >> Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. >> - Add necessary inline expander support. >> - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. >> - Use existing Float16 vector IR and backend support. >> - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. >> >> The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). >> >> The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. >> >> [image: performance chart, not preserved] >> >> Initial RFP[1] was floated on the panama-dev mailing list. >> >> Kindly review the draft PR and share your feedback.
>> >> Best Regards, >> Jatin >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: > > - Fix from Bhavana Kilambi for failing JTREG regressions on AARCH64 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Including test changes from Bhavana Kilambi (ARM) > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Optimizing tail handling > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Cleanups > - Fix failing jtreg test in CI > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Cleanups > - ... and 13 more: https://git.openjdk.org/jdk/compare/5e7ae281...703f313d Just some quick comments for now. I think this is better and heading in the right direction. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractSpecies.java line 436: > 434: } else { > 435: assert(Float16.valueOf(i).intValue() == i); > 436: } It would be clearer to copy the same pattern as for the other types. Assign and assert, no need to check bounds. We don't need to be performant here. (This code may become even clearer when we can leverage patterns on the primitive types and custom numeric types.) src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 3083: > 3081: * @return a {@code Float16Vector} with the same shape and information content > 3082: */ > 3083: public abstract Float16Vector reinterpretAsFloat16s(); At some point we should consider consolidating these methods into one which accepts the lane element type as an argument.
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorShape.java line 277: > 275: if (etype == Float16.class) { > 276: etype = short.class; > 277: } My suggestion may not be worth it, but I was wondering if we could get the lane type and then use the carrier type, rather than encoding this more specifically. ------------- PR Review: https://git.openjdk.org/jdk/pull/28002#pullrequestreview-3636482293 PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2669808367 PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2669818576 PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2669827315 From duke at openjdk.org Wed Jan 7 23:28:33 2026 From: duke at openjdk.org (duke) Date: Wed, 7 Jan 2026 23:28:33 GMT Subject: Withdrawn: 8367706: Remove redundant register used by cmove in C1 LIR generation In-Reply-To: References: Message-ID: On Tue, 16 Sep 2025 09:35:03 GMT, lusou-zhangquan wrote: > This PR removes a redundant temp register used by cmove in C1 LIRGenerator::do_LookupSwitch and LIRGenerator::do_TableSwitch. The issue [8367706](https://bugs.openjdk.org/browse/JDK-8367706) was reported by me and it's my pleasure to fix it. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/27307 From duke at openjdk.org Thu Jan 8 00:24:11 2026 From: duke at openjdk.org (Shawn M Emery) Date: Thu, 8 Jan 2026 00:24:11 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v3] In-Reply-To: References: Message-ID: > This change allows use of the AVX512_VBMI/VBMI2 instruction set to further optimize decompression/parsing of polynomial coefficients for ML-KEM. The speedup gained in the ML-KEM benchmarks for key generation is 0.3 to 0.6%, encapsulation is 0.4 to 1.7%, and decapsulation is 0.1 to 0.9%. > > Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me.
Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI Change Swap to Dup named function/variable Check for only VBMI support (not VBMI2) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28815/files - new: https://git.openjdk.org/jdk/pull/28815/files/7cd8de53..4af75963 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28815&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28815&range=01-02 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28815.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28815/head:pull/28815 PR: https://git.openjdk.org/jdk/pull/28815 From jkarthikeyan at openjdk.org Thu Jan 8 04:03:48 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 8 Jan 2026 04:03:48 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v6] In-Reply-To: References: Message-ID: > Hi all, > This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Revert "Use Xcomp test run instead of Warmup(0)" This reverts commit 50bc132676e5a1276bdf2c236ae57873375a773d. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/26827/files - new: https://git.openjdk.org/jdk/pull/26827/files/50bc1326..0b445a5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=04-05 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26827.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26827/head:pull/26827 PR: https://git.openjdk.org/jdk/pull/26827 From jkarthikeyan at openjdk.org Thu Jan 8 04:03:49 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 8 Jan 2026 04:03:49 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v5] In-Reply-To: References: <3L4a5JCekql7Q6DsQiS886cJUT_A3VOudur5kqwXDko=.f99d0326-c9fa-4156-8cae-2cd54ae1b3d0@github.com> Message-ID: On Wed, 7 Jan 2026 10:14:05 GMT, Tobias Hartmann wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Use Xcomp test run instead of Warmup(0) > > I see timeouts with the test in various configurations, maybe the default timeout should be increased? > > > compiler.lib.ir_framework.driver.TestVMException: There were one or multiple errors. Please check stderr for more information. 
> at compiler.lib.ir_framework.driver.TestVMProcess.throwTestVMException(TestVMProcess.java:251) > at compiler.lib.ir_framework.driver.TestVMProcess.checkTestVMExitCode(TestVMProcess.java:232) > at compiler.lib.ir_framework.driver.TestVMProcess.<init>(TestVMProcess.java:77) > at compiler.lib.ir_framework.TestFramework.runTestVM(TestFramework.java:879) > at compiler.lib.ir_framework.TestFramework.start(TestFramework.java:839) > at compiler.lib.ir_framework.TestFramework.start(TestFramework.java:431) > at compiler.lib.ir_framework.TestFramework.runWithFlags(TestFramework.java:257) > at compiler.vectorization.TestSubwordTruncation.main(TestSubwordTruncation.java:509) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) > at java.base/java.lang.Thread.run(Thread.java:1516) > > JavaTest Message: Test threw exception: compiler.lib.ir_framework.driver.TestVMException > JavaTest Message: shutting down test > > result: Error. "driver" action timed out with a timeout of 480 seconds on agent 136; but completed after timeout - suppressed status: "Failed. `main' threw exception: compiler.lib.ir_framework.driver.TestVMException: There were one or multiple errors. Please check stderr for more information." > > > test result: Error. "driver" action timed out with a timeout of 480 seconds on agent 136; but completed after timeout - suppressed status: "Failed. `main' threw exception: compiler.lib.ir_framework.driver.TestVMException: There were one or multiple errors. Please check stderr for more information." Thank you for running the tests @TobiHartmann! I've pushed a commit that reverts the change to run with `-Xcomp` directly, which should hopefully avoid the timeout.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26827#issuecomment-3721777830 From jkarthikeyan at openjdk.org Thu Jan 8 04:03:50 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 8 Jan 2026 04:03:50 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: References: <1z430wmE_HRTJqmLIC15VMUktLyUEE7qjkppr1GniAI=.e560a4e9-59f0-4013-ad65-5d7261cdbf0e@github.com> <8jnY6pqofieRIfV5fCqFxvHZMF3nAZbh7yAD7C_G5FU=.a12c98f6-c715-43f5-9528-62fcfdfc6e59@github.com> Message-ID: <8lQeYpLJeew0hmhS9FTgLe6faFvQk5u0ZmXLGpJ54O4=.5cd6e3b9-cea0-40f2-84b7-2566bf2cc87c@github.com> On Wed, 7 Jan 2026 13:32:34 GMT, Christian Hagedorn wrote: >> I've pushed a commit that changes the Warmup(0) to the second test run. > > Given the timeout reported by Tobias, I would rather opt for `@Warmup(0)` with one `run()` only. We will eventually run the test with `-Xcomp` on higher tiers, for example at tier6, so we get `Xcomp` coverage at some point. I think that's a good point; I've reverted this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2670769891 From duke at openjdk.org Thu Jan 8 05:17:19 2026 From: duke at openjdk.org (Shawn M Emery) Date: Thu, 8 Jan 2026 05:17:19 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v2] In-Reply-To: References: <_6Hgz5wXGpIYkofBL-qbExAyU3wAvzbyjiUHFUA4IK8=.40cf0b66-f7ff-458c-86ca-7e997e7c3abf@github.com> Message-ID: On Wed, 7 Jan 2026 18:59:16 GMT, Ferenc Rakoczi wrote: >> Yep, that's exactly the assert I was looking at as well.. looks to me like it's dividing the 'expanded-short-array-length' by 64 and ensuring the remainder is zero (ignoring the 48 for a bit.. and the condensed-length check). >> >> (for simplicity) So the 'expanded' array length should be a multiple of 64; i.e. 128-bytes. But we stride the expanded array by 256 bytes? (and parsedLength by 128-shorts..)
>> >> I haven't checked the callers of `twelve2Sixteen` but I suspect that the length of the expanded array is always a multiple of 256-bytes (128-shorts).. in which case, the assert is 'incomplete'? > > Oooops, yes, the assert and the comment on twelve2Sixteen() should be fixed. All of the calls are processing 192 or 384 bytes (and producing 128 or 256 shorts). The comment and assert belonged to an earlier version and were not updated when I changed my mind about the implementation. I've filed bug [JDK-8374755](https://bugs.openjdk.org/browse/JDK-8374755) "ML-KEM's 12-bit decompression uses incorrect assertions" to track this issue. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2670893072 From xgong at openjdk.org Thu Jan 8 06:04:09 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 8 Jan 2026 06:04:09 GMT Subject: RFR: 8373344: Add support for min/max reduction operations for Float16 [v2] In-Reply-To: References: <7qXqIBuLDFEKPNje6TALxZUATujnOY5hoODC30zJNFM=.07d8d157-1ae5-4751-befd-d6291370fb9c@github.com> <72RiEFo5wji1vOtIRzFwO03_0OsaCe2zzHjp9YPD8-k=.6a88f4eb-3fff-4fe5-bb43-260d81b7954a@github.com> Message-ID: <5gzL6qDaWABljSVUvMLm1oyMgTXwrcjDv1h9ZQH6_DA=.5b284a21-fa15-4c2b-8517-665a6ae5a4c6@github.com> On Wed, 7 Jan 2026 17:33:42 GMT, Yi Wu wrote: >You mean move it down, like Op_AddReductionVI and Op_AddReductionVL to use return !VM_Version::use_neon_for_vector(length_in_bytes);? Yes, that is what I meant. > It doesn't seem to make much of a difference. So what does `8B/16B/32B` mean? I guess it means the real vector size of the reduction operation? But how did you test these cases? I noticed the benchmark code does not have any parallelization differences. Is the vectorization factor decided by using different `MaxVectorSize` VM options? If so, then I think the partial cases are not touched. Could you please check whether a `VectorMaskGenNode` instruction appears in the generated code?
I would expect a difference, because for partial cases (vector_size < MaxVectorSize) SVE predicated instructions were used before this change, while NEON instructions are used after it. And the latency/throughput of SVE reduction instructions is much worse than that of the NEON ones. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28828#discussion_r2670981173 From thartmann at openjdk.org Thu Jan 8 07:05:35 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 Jan 2026 07:05:35 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v6] In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 04:03:48 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Use Xcomp test run instead of Warmup(0)" > > This reverts commit 50bc132676e5a1276bdf2c236ae57873375a773d. Sounds good, I re-submitted testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26827#issuecomment-3722377780 From dfenacci at openjdk.org Thu Jan 8 07:28:27 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 8 Jan 2026 07:28:27 GMT Subject: [jdk26] RFR: 8342772: Assert in LateInlineMHCallGenerator::do_late_inline_check In-Reply-To: References: <6c3aoRq3DcmnhdnhUWP9SLRnBbhJ1o8DwVVW5zArqqM=.537ab223-d9af-41f6-a4d8-ee29d0fbfa55@github.com> Message-ID: <9zkeOHB-VI6OhJK6i6NTQkB7BUzQ1rHpZChfCDeNdHU=.77f6a6e6-7957-4e38-91c1-d96f4fdaf5b8@github.com> On Wed, 7 Jan 2026 10:15:26 GMT, Tobias Hartmann wrote: >> Hi all, >> >> This pull request contains a backport of commit [c1c0ac87](https://github.com/openjdk/jdk/commit/c1c0ac877033c3edb0c2681c2c5f825be8adcfb3) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
>> >> The commit being backported was authored by Damon Fenacci on 7 Jan 2026 and was reviewed by Vladimir Ivanov, Christian Hagedorn and Tobias Hartmann. >> >> Thanks! > > Looks good and trivial. Thanks for your review @TobiHartmann. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29079#issuecomment-3722454855 From dfenacci at openjdk.org Thu Jan 8 07:28:28 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 8 Jan 2026 07:28:28 GMT Subject: [jdk26] Integrated: 8342772: Assert in LateInlineMHCallGenerator::do_late_inline_check In-Reply-To: <6c3aoRq3DcmnhdnhUWP9SLRnBbhJ1o8DwVVW5zArqqM=.537ab223-d9af-41f6-a4d8-ee29d0fbfa55@github.com> References: <6c3aoRq3DcmnhdnhUWP9SLRnBbhJ1o8DwVVW5zArqqM=.537ab223-d9af-41f6-a4d8-ee29d0fbfa55@github.com> Message-ID: On Wed, 7 Jan 2026 08:17:55 GMT, Damon Fenacci wrote: > Hi all, > > This pull request contains a backport of commit [c1c0ac87](https://github.com/openjdk/jdk/commit/c1c0ac877033c3edb0c2681c2c5f825be8adcfb3) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Damon Fenacci on 7 Jan 2026 and was reviewed by Vladimir Ivanov, Christian Hagedorn and Tobias Hartmann. > > Thanks! This pull request has now been integrated. 
Changeset: 5964a12a Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/5964a12adc186be17cb34ab8032b9d07caae1551 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod 8342772: Assert in LateInlineMHCallGenerator::do_late_inline_check Reviewed-by: thartmann Backport-of: c1c0ac877033c3edb0c2681c2c5f825be8adcfb3 ------------- PR: https://git.openjdk.org/jdk/pull/29079 From mchevalier at openjdk.org Thu Jan 8 07:49:35 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 8 Jan 2026 07:49:35 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v7] In-Reply-To: <3xUCngr7Ia4rUEzYTJaXZvQF1AfzMusCfjt1qydWlJE=.69a96499-a02a-4cf6-9cfc-f0dc489fcd2a@github.com> References: <3xUCngr7Ia4rUEzYTJaXZvQF1AfzMusCfjt1qydWlJE=.69a96499-a02a-4cf6-9cfc-f0dc489fcd2a@github.com> Message-ID: <5vqLsR213YqedBOeImZLEoqWkr3gSKkEmqQYBtuCFY8=.9c974619-917f-435c-b6e3-4146de76d600@github.com> On Wed, 7 Jan 2026 18:06:29 GMT, Quan Anh Mai wrote: > But checking the bit is also looking into the details of the types, right? yes, and if it's done in `AbsNode`, I wouldn't be happy either. But if it's done in `TypeInt`, it's fine. > Anyway, the internal details of TypeInt are designed to be exposed, and we use them all the time. The methods act more as utilities than encapsulations. That's exactly what I'm challenging. I think adding some new abstract domains and make use of them would be easier if we don't look inside the details all the time. In particular, we are rarely doing very specialized things, mostly abstract integer with a value, or abstract integers together... I think looking at the bounds of ranges everywhere makes improvements to the type system harder, and the local code less clear. 
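Marc's argument for querying the abstract value instead of reading its bounds at each use site can be illustrated with a small Java sketch. `IntRange`, `isNonNegative()`, and `absIfTrivial()` are hypothetical names invented for this example; C2's `TypeInt` is C++ and its use sites read `_lo`/`_hi` directly, which is the pattern under discussion.

```java
/**
 * Hypothetical, minimal integer-range abstraction (not C2's TypeInt) in which
 * use sites ask a question of the abstract value instead of reading its bounds.
 */
public record IntRange(int lo, int hi) {
    public IntRange {
        if (lo > hi) {
            throw new IllegalArgumentException("empty range: [" + lo + ", " + hi + "]");
        }
    }

    /** True if no value in the range is negative. */
    public boolean isNonNegative() {
        return lo >= 0;
    }

    /**
     * Example use site in the spirit of an Abs transform: if the input can never
     * be negative, |x| == x and the input type is returned unchanged.
     * Returns null when this sketch cannot improve the type.
     */
    public static IntRange absIfTrivial(IntRange t) {
        return t.isNonNegative() ? t : null;
    }

    public static void main(String[] args) {
        System.out.println(absIfTrivial(new IntRange(0, 10)));  // prints IntRange[lo=0, hi=10]
        System.out.println(absIfTrivial(new IntRange(-1, 10))); // prints null
    }
}
```

If the range later gained another component (for example, known sign bits), only `isNonNegative()` would need to change, not each use site.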
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2671257676 From chagedorn at openjdk.org Thu Jan 8 08:01:11 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 8 Jan 2026 08:01:11 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v6] In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 04:03:48 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Use Xcomp test run instead of Warmup(0)" > > This reverts commit 50bc132676e5a1276bdf2c236ae57873375a773d. Looks good, thanks for the update! Let's wait for the testing to be completed. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26827#pullrequestreview-3638228406 From mhaessig at openjdk.org Thu Jan 8 08:08:20 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 8 Jan 2026 08:08:20 GMT Subject: RFR: 8374528: C2 SuperWord: TestAliasingFuzzer.java strengthen no-multiversioning IR rule In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 13:09:36 GMT, Emanuel Peter wrote: > Now that some of the related issues are fixed, we should strengthen the IR rule for multiversioning. > > I ran the fuzzer test many times on many platforms. That is not proof that we won't find a failing case in the CI within a week. But it also is not helpful to run more tests before integration. If there are failures, we can just backout this change and address the broken example at that point. Looks good and I agree strengthening is a good idea. ------------- Marked as reviewed by mhaessig (Committer).
PR Review: https://git.openjdk.org/jdk/pull/29036#pullrequestreview-3638261062 From qamai at openjdk.org Thu Jan 8 08:17:48 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 8 Jan 2026 08:17:48 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v7] In-Reply-To: <5vqLsR213YqedBOeImZLEoqWkr3gSKkEmqQYBtuCFY8=.9c974619-917f-435c-b6e3-4146de76d600@github.com> References: <3xUCngr7Ia4rUEzYTJaXZvQF1AfzMusCfjt1qydWlJE=.69a96499-a02a-4cf6-9cfc-f0dc489fcd2a@github.com> <5vqLsR213YqedBOeImZLEoqWkr3gSKkEmqQYBtuCFY8=.9c974619-917f-435c-b6e3-4146de76d600@github.com> Message-ID: On Thu, 8 Jan 2026 07:46:34 GMT, Marc Chevalier wrote: >> But checking the bit is also looking into the details of the types, right? Trying to answer whether a value can be negative with `_lo >= 0` seems to be the most straightforward approach to me. Anyway, the internal details of `TypeInt` are designed to be exposed, and we use them all the time. The methods act more as utilities than encapsulations. > >> But checking the bit is also looking into the details of the types, right? > > yes, and if it's done in `AbsNode`, I wouldn't be happy either. But if it's done in `TypeInt`, it's fine. > >> Anyway, the internal details of TypeInt are designed to be exposed, and we use them all the time. The methods act more as utilities than encapsulations. > > That's exactly what I'm challenging. I think adding some new abstract domains and make use of them would be easier if we don't look inside the details all the time. In particular, we are rarely doing very specialized things, mostly abstract integer with a value, or abstract integers together... I think looking at the bounds of ranges everywhere makes improvements to the type system harder, and the local code less clear. I see, can we leave it for a future cleanup? I see there are around 15 places where we do `_lo >= 0`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2671358540 From mhaessig at openjdk.org Thu Jan 8 08:19:26 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 8 Jan 2026 08:19:26 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 09:18:32 GMT, Xiaohong Gong wrote: > ### Problem: > > Test `compiler/vectorapi/VectorMaskToLongTest.java` crashes intermittently (approximately once per 200+ runs) with stress VM options such as `-XX:+StressIGVN`: > > > // A fatal error has been detected by the Java Runtime Environment: > // > // Internal Error (jdk/src/hotspot/share/opto/type.hpp:2287), pid=69056, tid=28419 > // assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector > // ... > > > The crash occurs in the following code when calling `is_vect()` in the assertion added by JDK-8367292 [1]: > > https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/src/hotspot/share/opto/vectornode.cpp#L1920-L1924 > > ### Root Cause: > > The mask's type becomes TOP (unreachable) during compiler optimizations when the mask node is marked as dead before all its users are removed from the ideal graph. If `Ideal()` is subsequently called on a user node, it may access the TOP type, triggering the assertion. > > Here is the simplified ideal graph showing the crash scenario: > > > Con #top > | ConI > \ / > \ / > VectorStoreMask > | > VectorMaskToLong # !jvms: IntMaxVector$IntMaxMask::toLong > > > ### Detailed Scenario: > > Following is the method in the test case that hits the assertion: > > https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L65-L70 > > This method accepts a `VectorSpecies` parameter and calls the Vector API methods `VectorMask.fromLong()` and `toLong()`.
> It is called with species ranging from `ByteVector.SPECIES_MAX` to `DoubleVector.SPECIES_MAX`. During compilation, C2 speculatively generates fast paths for `toLong()` for all possible species.
>
> When compiling a specific test case such as:
> https://github.com/openjdk/jdk/blob/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L177-L179
>
> the compiler inlines the method and attempts to optimize away unreachable branches. The following graph shows the situation before the mask becomes `TOP`:
>
>
>                         VectorBox  # DoubleMaxMask, generated by VectorMask.fromLong()
>                          /      \
>                       AddP       \
>                        |          \
>                    LoadNClass      \
> ConP #IntMaxMask        |          |
>        \                |          |
>         \          DecodeNClass    |
>          \            /            |
>           \          /             |
>             CmpP ...

A drive-by comment on the reproducibility:

- Does this only reproduce for specific hardware features or on all relatively new vector instruction sets?
- Have you tried to reproduce this using the `StressSeed` flag? In the hs-error file you should find it with all the hotspot flags, and rerunning the test with that seed often leads to a reproducible failure.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29057#issuecomment-3722723846

From xgong at openjdk.org Thu Jan 8 08:36:28 2026
From: xgong at openjdk.org (Xiaohong Gong)
Date: Thu, 8 Jan 2026 08:36:28 GMT
Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector
In-Reply-To:
References:
Message-ID: <9T5JoGJnjYUMJiPNOGfQOmlNjbtqv30YqfAraq2Egx0=.e3ed686d-17b7-46da-a3c4-6799b22e0540@github.com>

On Thu, 8 Jan 2026 08:16:04 GMT, Manuel Hässig wrote:

> A drive-by comment on the reproducibility:
>
> * Does this only reproduce for specific hardware features or on all relatively new vector instruction sets?

Thanks for looking at this PR! This can be reproduced on hardware that 1) supports the Vector API well in the backend and 2) does not support predicate features like AVX-512 and RVV.

> * Have you tried to reproduce this using the `StressSeed` flag?
>   In the hs-error file you should find it with all the hotspot flags, and rerunning the test with that seed often leads to a reproducible failure.

Yes, I can reproduce this issue with all the stress flags reported in the hs-error file, but only with the existing test case, and even then the failure happens randomly. I tried the failing seed, but I could not reproduce the failure with it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29057#issuecomment-3722785955

From epeter at openjdk.org Thu Jan 8 08:36:38 2026
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 8 Jan 2026 08:36:38 GMT
Subject: RFR: 8374528: C2 SuperWord: TestAliasingFuzzer.java strengthen no-multiversioning IR rule
In-Reply-To:
References:
Message-ID:

On Tue, 6 Jan 2026 10:35:55 GMT, Christian Hagedorn wrote:

>> Now that some of the related issues are fixed, we should strengthen the IR rule for multiversioning.
>>
>> I ran the fuzzer test many times on many platforms. That is not proof that we won't find a failing case in the CI within a week. But it also is not helpful to run more tests before integration. If there are failures, we can just back out this change and address the broken example at that point.
>
> Looks good!
>
>> But it also is not helpful to run more tests before integration. If there are failures, we can just back out this change and address the broken example at that point.
>
> Absolutely, I agree with that.

@chhagedorn @mhaessig Thanks for the reviews! One more step towards reducing "brittleness" of the auto vectorizer. More to come.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/29036#issuecomment-3722776225

From epeter at openjdk.org Thu Jan 8 08:36:39 2026
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 8 Jan 2026 08:36:39 GMT
Subject: Integrated: 8374528: C2 SuperWord: TestAliasingFuzzer.java strengthen no-multiversioning IR rule
In-Reply-To:
References:
Message-ID:

On Mon, 5 Jan 2026 13:09:36 GMT, Emanuel Peter wrote:

> Now that some of the related issues are fixed, we should strengthen the IR rule for multiversioning.
>
> I ran the fuzzer test many times on many platforms. That is not proof that we won't find a failing case in the CI within a week. But it also is not helpful to run more tests before integration. If there are failures, we can just back out this change and address the broken example at that point.

This pull request has now been integrated.

Changeset: a71326a0
Author: Emanuel Peter
URL: https://git.openjdk.org/jdk/commit/a71326a0e2660158fdb85282da4b59ce61c66ee3
Stats: 30 lines in 1 file changed: 0 ins; 15 del; 15 mod

8374528: C2 SuperWord: TestAliasingFuzzer.java strengthen no-multiversioning IR rule

Reviewed-by: chagedorn, mhaessig

-------------

PR: https://git.openjdk.org/jdk/pull/29036

From mchevalier at openjdk.org Thu Jan 8 08:46:47 2026
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 8 Jan 2026 08:46:47 GMT
Subject: RFR: 8370914: C2: Reimplement Type::join [v7]
In-Reply-To:
References: <3xUCngr7Ia4rUEzYTJaXZvQF1AfzMusCfjt1qydWlJE=.69a96499-a02a-4cf6-9cfc-f0dc489fcd2a@github.com> <5vqLsR213YqedBOeImZLEoqWkr3gSKkEmqQYBtuCFY8=.9c974619-917f-435c-b6e3-4146de76d600@github.com>
Message-ID:

On Thu, 8 Jan 2026 08:15:20 GMT, Quan Anh Mai wrote:

>>> But checking the bit is also looking into the details of the types, right?
>>
>> yes, and if it's done in `AbsNode`, I wouldn't be happy either. But if it's done in `TypeInt`, it's fine.
>>
>>> Anyway, the internal details of TypeInt are designed to be exposed, and we use them all the time.
>>> The methods act more as utilities than encapsulations.
>>
>> That's exactly what I'm challenging. I think adding some new abstract domains and making use of them would be easier if we don't look inside the details all the time. In particular, we are rarely doing very specialized things, mostly abstract integer with a value, or abstract integers together... I think looking at the bounds of ranges everywhere makes improvements to the type system harder, and the local code less clear.
>
> I see, can we leave it for a future cleanup? I see there are around 15 places where we do `_lo >= 0`.

Fine with me!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2671440987

From qamai at openjdk.org Thu Jan 8 08:52:32 2026
From: qamai at openjdk.org (Quan Anh Mai)
Date: Thu, 8 Jan 2026 08:52:32 GMT
Subject: RFR: 8374180: C2 crash in PhaseCCP::verify_type - fatal error: Not monotonic [v3]
In-Reply-To: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com>
References: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com>
Message-ID:

> Hi,
>
> The issue here is the inconsistency in computing the `_widen` field of the `TypeInt`. At the first step, the types of the operands are:
>
> t1 = int:0
> t2 = int:-2..3, widen = 3
>
> Since the type of the first operand is a constant zero, `AddNode::Value` returns the type of the second operand directly, as `x ^ 0 == x for all x`. In the second step, `t1` is widened to `0..2`. This triggers the real computation of the result. The algorithm then splits `t2` into `t21 = int:-2..-1` and `t22 = int:0..3`. The `Xor` of these with `t1` are `r1 = int:-4..-1` and `r2 = int:0..3`. As both have `_hi - _lo <= SMALL_TYPEINT_THRESHOLD == 3`, their `_widen`s are normalized to `0`. As a result, their `meet` also has `_widen == 0`. This value is smaller than the one from the previous step, `3`, which leads to the failure.
>
> The root cause here is that the `_widen` value of a node should be computed and normalized on the whole range of the node, not on its subranges, which may normalize it to `0` in more cases than expected. As a result, my proposed solution is to ignore the `_widen` value of the subranges and pass the expected `_widen` value when composing the final result.
>
> Please take a look and leave your reviews, thanks a lot.

Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - Merge branch 'master' into widen
 - copyright year
 - Merge branch 'master' into widen
 - RangeInference::infer should ensure correct value of _widen

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/28952/files
 - new: https://git.openjdk.org/jdk/pull/28952/files/2fb0af13..ecee9cff

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28952&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28952&range=01-02

Stats: 18971 lines in 1438 files changed: 3853 ins; 2027 del; 13091 mod
Patch: https://git.openjdk.org/jdk/pull/28952.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/28952/head:pull/28952

PR: https://git.openjdk.org/jdk/pull/28952

From mchevalier at openjdk.org Thu Jan 8 09:33:42 2026
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 8 Jan 2026 09:33:42 GMT
Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order
Message-ID: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com>

As it says: randomize the order in which the late inlines are processed, and slightly factor it with macro nodes.
I didn't add shuffle to the `GrowableArray` class: it seems a rather specialized method (and it would likely be a controversial change to a widely used class), it would make `GrowableArray` depend on `Compile` for random number generation (or require a callback, for instance, giving a non-trivial signature and usage), and I couldn't find other shuffling of such an object.
There is also shuffling of `UniqueNodeList` (for `StressIGVN`), but it seems hard to unify: access and length are not written quite the same, so it would probably be no simpler than duplicating the implementation (which is simple, so it's fine in my opinion).

Thanks,
Marc

-------------

Commit messages:
 - shuffle

Changes: https://git.openjdk.org/jdk/pull/29110/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29110&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8374622
Stats: 33 lines in 2 files changed: 24 ins; 6 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/29110.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/29110/head:pull/29110

PR: https://git.openjdk.org/jdk/pull/29110

From mhaessig at openjdk.org Thu Jan 8 10:02:44 2026
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 8 Jan 2026 10:02:44 GMT
Subject: RFR: 8373555: C2: Optimize redundant input calculations for sign comparisons [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 5 Jan 2026 08:03:11 GMT, Hannes Greule wrote:

>> Instead of sign-comparisons with And, Or, Xor, Max, Min nodes, we can directly compare to one of the inputs of the binary nodes if the other input is irrelevant to the comparison.
>>
>> There are potentially more operations, but the ones mentioned here are the most obvious. Max and Min could theoretically be expanded to arbitrary comparisons to constants, but I didn't want to introduce more complexity for now.
>>
>> Please let me know what you think :)
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
>
> update copyright year

Thank you for this neat improvement, @SirYwell! The logic looks good, but I have some suggestions below. Out of curiosity: have you thought about writing the test using the Template Framework to maybe get away with a less repetitive test? Also, I started some testing, since the logic looks sound.

src/hotspot/share/opto/subnode.cpp line 1546:

> 1544: // For comparisons of the form op(a, b) < 0 or op(a, b) >= 0,
> 1545: // it might be enough to compare a < 0 or a >= 0 (or b < 0 or b >= 0) instead.
> 1546: // As a special case, xor requires negating the test

Could you please point out that the trick has to do with the sign bit, so the next person does not have to derive it? Perhaps

Suggestion:

// For comparisons of the form op(a, b) < 0 or op(a, b) >= 0,
// it might be enough to compare a < 0 or a >= 0 (or b < 0 or b >= 0) instead.
// Consider a & b >= 0 for example: if we know a < 0, then we know the sign bit is 1,
// so we only need to check whether b >= 0 to know the result.
// As a special case, xor requires negating the test

src/hotspot/share/opto/subnode.cpp line 1549:

> 1547: // if one argument is known to be negative: -1 ^ b < 0 <==> b >= 0
> 1548: // but not if it is known to be nonnegative: 0 ^ b < 0 <==> b < 0
> 1549: static Node* simplify_sign_invariant_comparison_input(PhaseGVN* phase, BoolNode* bool_node) {

Please add `const` to the local variables and arguments that you are not mutating.

src/hotspot/share/opto/subnode.cpp line 1556:

> 1554: }
> 1555: BasicType bt = cop == Op_CmpI ? T_INT : T_LONG;
> 1556: Node* in_op = cmp->in(1);

Please assert the precondition that `cmp->in(2)` is a zero constant.
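The sign-bit argument in the suggested comment can be checked mechanically. The Java sketch below is written for illustration only and is not the C2 code in `subnode.cpp`; the class `SignInvariantDemo`, the method `ltZeroFromB`, and the sample values are hypothetical. For And, Or, Xor, Max and Min, it decides `op(a, b) < 0` from the known sign of `a` plus the sign of `b`, then compares that result against the direct computation.

```java
// Illustrative sketch, not C2 code: once the sign of `a` is known,
// op(a, b) < 0 depends only on the sign of `b` (or is a constant).
public class SignInvariantDemo {
    enum Op { AND, OR, XOR, MAX, MIN }

    // Boundary-heavy sample values, including both extremes and zero.
    static final int[] SAMPLES = {Integer.MIN_VALUE, -42, -1, 0, 1, 42, Integer.MAX_VALUE};

    // The real computation of op(a, b).
    static int apply(Op op, int a, int b) {
        switch (op) {
            case AND: return a & b;
            case OR:  return a | b;
            case XOR: return a ^ b;
            case MAX: return Math.max(a, b);
            default:  return Math.min(a, b);
        }
    }

    // Decides op(a, b) < 0 from the known sign of `a` and the sign of `b` only.
    static boolean ltZeroFromB(Op op, boolean aNegative, int b) {
        boolean bNegative = b < 0;
        switch (op) {
            case AND: // sign bit of a & b is set iff both sign bits are set
            case MAX: // max(a, b) < 0 iff both are negative
                return aNegative && bNegative;
            case OR:  // sign bit of a | b is set iff either sign bit is set
            case MIN: // min(a, b) < 0 iff either is negative
                return aNegative || bNegative;
            default:  // XOR: a negative `a` flips b's sign bit, so the test is negated
                return aNegative != bNegative;
        }
    }

    // Compares the shortcut with the direct computation on all sample pairs.
    static boolean checkAll() {
        for (Op op : Op.values()) {
            for (int a : SAMPLES) {
                for (int b : SAMPLES) {
                    if (ltZeroFromB(op, a < 0, b) != (apply(op, a, b) < 0)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(checkAll()); // prints true
    }
}
```

Once the sign of `a` is fixed, each case collapses to `b < 0`, to `b >= 0` (the negated Xor case, matching `-1 ^ b < 0 <==> b >= 0`), or to a constant. That is why the comparison can use the other input directly.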
test/hotspot/jtreg/compiler/c2/gvn/BoolNodeSimplifySignInvariantTests.java line 85:

> 83:
> 84: @Run(test = {
> 85: "knownAndInputInt1", "knownAndInputInt2", "knownAndInputInt3", "knownAndInputInt4",

For the readability of the test it would be much easier if the methods were not numbered, but named with `Lt` or `Ge` for the comparison and `L` and `R` for which side is pinned to the "interesting" range. That would help a lot when trying to see if the test is correct.

test/hotspot/jtreg/compiler/c2/gvn/BoolNodeSimplifySignInvariantTests.java line 711:

> 709: }
> 710:
> 711: record IntRange(int lo, int hi) {

Do you think this would be useful beyond this test? If so, it would be good to add it to the test library.

-------------

Changes requested by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/28782#pullrequestreview-3638337799
PR Review Comment: https://git.openjdk.org/jdk/pull/28782#discussion_r2671453075
PR Review Comment: https://git.openjdk.org/jdk/pull/28782#discussion_r2671684814
PR Review Comment: https://git.openjdk.org/jdk/pull/28782#discussion_r2671394680
PR Review Comment: https://git.openjdk.org/jdk/pull/28782#discussion_r2671651386
PR Review Comment: https://git.openjdk.org/jdk/pull/28782#discussion_r2671701046

From mhaessig at openjdk.org Thu Jan 8 10:02:45 2026
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 8 Jan 2026 10:02:45 GMT
Subject: RFR: 8373555: C2: Optimize redundant input calculations for sign comparisons [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 8 Jan 2026 08:27:32 GMT, Manuel Hässig wrote:

>> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
>>
>> update copyright year
>
> src/hotspot/share/opto/subnode.cpp line 1556:
>
>> 1554: }
>> 1555: BasicType bt = cop == Op_CmpI ? T_INT : T_LONG;
>> 1556: Node* in_op = cmp->in(1);
>
> Please assert the precondition that `cmp->in(2)` is a zero constant.
Also, asserting that the test is what we expect would be good.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28782#discussion_r2671610528

From shade at openjdk.org Thu Jan 8 10:29:03 2026
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 8 Jan 2026 10:29:03 GMT
Subject: RFR: 8360557: CTW: Inline cold methods to reach more code [v8]
In-Reply-To:
References:
Message-ID:

> We use CTW testing for making sure compilers behave well. But we compile code that is not executed at all, and since our inlining heuristics often look back at profiles, we end up not actually inlining all that much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations.
>
> There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data.
>
> After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much that they are impractical to run in standard configurations; see data in the RFE. We will enable some of that testing in special testing pipelines.
>
> Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in the RFE as well: Xcomp causes _a lot_ of stray compilations for the JDK and the CTW infra itself. For small JARs in a large corpus, this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there.
> Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better.
>
> Tobias had an idea to implement stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it: it provides the base case of inlining most reasonable methods, and then allows the stress infra to inline some more on top of that.
>
> Additional testing:
> - [x] GHA
> - [x] Linux x86_64 server fastdebug, `applications/ctw/modules`
> - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways)

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:

 Fix

-------------

Changes: https://git.openjdk.org/jdk/pull/26068/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=07
Stats: 20 lines in 4 files changed: 18 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/26068.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068

PR: https://git.openjdk.org/jdk/pull/26068

From galder at openjdk.org Thu Jan 8 10:42:27 2026
From: galder at openjdk.org (Galder Zamarreño)
Date: Thu, 8 Jan 2026 10:42:27 GMT
Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v6]
In-Reply-To:
References:
Message-ID:

> Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test the min/max reassociation implementation with IR tests reliably.
>
> Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added an `is_MinMax` node query to simplify the fix.
>
> I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`.
>
> To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977).
>
> During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future.
>
> I've run tier1-3 tests on linux/x64 successfully.

Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:

 - Merge branch 'master' into topic.uses-min-max
 - Module also needed in the wrapper test class
 - It's the templated test that needs the module
 - Add missing module to test
 - Merge branch 'master' into topic.uses-min-max
 - Test Float16
 - Only apply to uses that match original IR node
 - Merge branch 'master' into topic.uses-min-max
 - Use is_MinMax() instead of spelling out individual Min/Max opcodes
 - Refactor MaxNode to MinMaxNode and add is_MinMax() query
 - ...
   and 3 more: https://git.openjdk.org/jdk/compare/067fd3cb...7229e345

-------------

Changes: https://git.openjdk.org/jdk/pull/28895/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28895&range=05
Stats: 272 lines in 9 files changed: 200 ins; 21 del; 51 mod
Patch: https://git.openjdk.org/jdk/pull/28895.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/28895/head:pull/28895

PR: https://git.openjdk.org/jdk/pull/28895

From hgreule at openjdk.org Thu Jan 8 10:55:18 2026
From: hgreule at openjdk.org (Hannes Greule)
Date: Thu, 8 Jan 2026 10:55:18 GMT
Subject: RFR: 8373555: C2: Optimize redundant input calculations for sign comparisons [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 8 Jan 2026 10:00:19 GMT, Manuel Hässig wrote:

>> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
>>
>> update copyright year
>
> test/hotspot/jtreg/compiler/c2/gvn/BoolNodeSimplifySignInvariantTests.java line 711:
>
>> 709: }
>> 710:
>> 711: record IntRange(int lo, int hi) {
>
> Do you think this would be useful beyond this test? If so, it would be good to add it to the test library.

Yes, there are quite a few tests with the same code (sometimes `IntRange` and `LongRange`, sometimes just `Range`). Do you want me to extract the record in this PR so the other existing tests can be cleaned up afterwards?
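If the record moves to the test library, it might look roughly like the Java sketch below. Only the `lo`/`hi` components come from the test under review. The compact-constructor check and the `contains`/`clamp` helpers are hypothetical additions, and no package or location in the test library is implied.

```java
// Hypothetical sketch of a shared range record for the test library.
// Only the (lo, hi) components come from the test under review; the
// validation and helper methods are illustrative assumptions.
public record IntRange(int lo, int hi) {
    // Compact constructor: reject empty ranges early.
    public IntRange {
        if (lo > hi) {
            throw new IllegalArgumentException("lo > hi: " + lo + " > " + hi);
        }
    }

    // True if v lies in the closed interval [lo, hi].
    public boolean contains(int v) {
        return lo <= v && v <= hi;
    }

    // Forces v into [lo, hi], e.g. to pin a test input to an "interesting" range.
    public int clamp(int v) {
        return Math.max(lo, Math.min(hi, v));
    }

    public static void main(String[] args) {
        IntRange negative = new IntRange(Integer.MIN_VALUE, -1);
        System.out.println(negative.contains(-5) + " " + negative.clamp(7)); // prints "true -1"
    }
}
```

A `LongRange` twin would follow the same shape. Whether to keep two records or one generic type is a design choice for the follow-up RFE.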
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28782#discussion_r2671870388

From mhaessig at openjdk.org Thu Jan 8 11:11:47 2026
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 8 Jan 2026 11:11:47 GMT
Subject: RFR: 8373555: C2: Optimize redundant input calculations for sign comparisons [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 8 Jan 2026 10:51:55 GMT, Hannes Greule wrote:

>> test/hotspot/jtreg/compiler/c2/gvn/BoolNodeSimplifySignInvariantTests.java line 711:
>>
>>> 709: }
>>> 710:
>>> 711: record IntRange(int lo, int hi) {
>>
>> Do you think this would be useful beyond this test? If so, it would be good to add it to the test library.
>
> Yes, there are quite a few tests with the same code (sometimes `IntRange` and `LongRange`, sometimes just `Range`). Do you want me to extract the record in this PR so the other existing tests can be cleaned up afterwards?

Moving it to the test library and cleaning up in a follow-up RFE sounds good

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28782#discussion_r2671928859

From thartmann at openjdk.org Thu Jan 8 11:28:16 2026
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 8 Jan 2026 11:28:16 GMT
Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v6]
In-Reply-To:
References:
Message-ID:

On Thu, 8 Jan 2026 04:03:48 GMT, Jasmine Karthikeyan wrote:

>> Hi all,
>> This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks!
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert "Use Xcomp test run instead of Warmup(0)"
>
> This reverts commit 50bc132676e5a1276bdf2c236ae57873375a773d.

Marked as reviewed by thartmann (Reviewer).

All tests passed.
-------------

PR Review: https://git.openjdk.org/jdk/pull/26827#pullrequestreview-3638976060
PR Comment: https://git.openjdk.org/jdk/pull/26827#issuecomment-3723435745

From chagedorn at openjdk.org Thu Jan 8 11:46:20 2026
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 8 Jan 2026 11:46:20 GMT
Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jan 2026 08:18:25 GMT, Roland Westrelin wrote:

>> src/hotspot/share/opto/graphKit.cpp line 3590:
>>
>>> 3588: }
>>> 3589: constant_value = Klass::_lh_neutral_value; // put in a known value
>>> 3590: Node* lhp = basic_plus_adr(top(), klass_node, in_bytes(Klass::layout_helper_offset()));
>>
>> Same thought here: could we have a separate `off_heap_plus_addr()` or something like that instead of passing in `top()` on each call site?
>>
>> This and the other suggestion could also be done separately.
>
> I gave that one a try and I found that pattern to be common and I think it would be best done as a separate change. WDYT?

Sounds good, let's do it separately.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2672027289

From chagedorn at openjdk.org Thu Jan 8 11:51:13 2026
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 8 Jan 2026 11:51:13 GMT
Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jan 2026 08:16:55 GMT, Roland Westrelin wrote:

>> The base input of `AddP` is expected to only be set for heap accesses
>> but I noticed some inconsistencies so I added an assert in the `AddP`
>> constructor and fixed issues that it caught. AFAICT, the
>> inconsistencies shouldn't create issues.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
> The pull request contains 13 additional commits since the last revision:
>
> - more
> - more
> - review
> - Merge branch 'master' into JDK-8373343
> - review
> - review
> - review
> - merge
> - more
> - more
> - ... and 3 more: https://git.openjdk.org/jdk/compare/c28f62fa...b20f41db

Update looks good! Let me give this another spin in our testing.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/28769#pullrequestreview-3639046036

From roland at openjdk.org Thu Jan 8 11:56:32 2026
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 8 Jan 2026 11:56:32 GMT
Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 8 Jan 2026 11:42:40 GMT, Christian Hagedorn wrote:

>> I gave that one a try and I found that pattern to be common and I think it would be best done as a separate change. WDYT?
>
> Sounds good, let's do it separately.

I filed: https://bugs.openjdk.org/browse/JDK-8374789

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2672057908

From mchevalier at openjdk.org Thu Jan 8 12:05:36 2026
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 8 Jan 2026 12:05:36 GMT
Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v5]
In-Reply-To:
References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com>
Message-ID:

On Mon, 29 Dec 2025 17:39:42 GMT, Bhavana Kilambi wrote:

>> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species.
>>
>> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered.
>> The following is how each of these reductions is implemented for different aarch64 targets -
>>
>> **For AddReduction :**
>> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction.
>>
>> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order.
>>
>> **For MulReduction :**
>> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported.
>>
>> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` -
>>
>> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch.
>> Ratio > 1 indicates the performance with this patch is better than the master branch.
>>
>> **N1 (UseSVE = 0, max vector length = 16B):**
>>
>> Benchmark          vectorDim  Mode   Cnt  8B    16B
>> ReductionAddFP16   256        thrpt  9    1.41  1.40
>> ReductionAddFP16   512        thrpt  9    1.41  1.41
>> ReductionAddFP16   1024       thrpt  9    1.43  1.40
>> ReductionAddFP16   2048       thrpt  9    1.43  1.40
>> ReductionMulFP16   256        thrpt  9    1.22  1.22
>> ReductionMulFP16   512        thrpt  9    1.21  1.23
>> ReductionMulFP16   1024       thrpt  9    1.21  1.22
>> ReductionMulFP16   2048       thrpt  9    1.20  1.22
>>
>>
>> On N1, the scalarized sequence of `fadd/fmul` are gener...
>
> Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
>
> - Address review comments for the JTREG test and microbenchmark
> - Merge branch 'master'
> - Address review comments
> - Fix build failures on Mac
> - Address review comments
> - Merge 'master'
> - 8366444: Add support for add/mul reduction operations for Float16
>
> This patch adds mid-end support for vectorized add/mul reduction
> operations for half floats. It also includes backend aarch64 support for
> these operations. Only vectorization support through autovectorization
> is added as VectorAPI currently does not support Float16 vector species.
>
> Both add and mul reduction vectorized through autovectorization mandate
> the implementation to be strictly ordered. The following is how each of
> these reductions is implemented for different aarch64 targets -
>
> For AddReduction :
> On Neon only targets (UseSVE = 0): Generates scalarized additions
> using the scalar "fadd" instruction for both 8B and 16B vector lengths.
> This is because Neon does not provide a direct instruction for computing
> strictly ordered floating point add reduction.
>
> On SVE targets (UseSVE > 0): Generates the "fadda" instruction which
> computes add reduction for floating point in strict order.
>
> For MulReduction :
> Both Neon and SVE do not provide a direct instruction for computing
> strictly ordered floating point multiply reduction. For vector lengths
> of 8B and 16B, a scalarized sequence of scalar "fmul" instructions is
> generated and multiply reduction for vector lengths > 16B is not
> supported.
>
> Below is the performance of the two newly added microbenchmarks in
> Float16OperationsBenchmark.java tested on three different aarch64
> machines and with varying MaxVectorSize -
>
> Note: On all machines, the score (ops/ms) is compared with the master
> branch without this patch which generates a sequence of loads ("ldrsh")
> to load the FP16 value into an FPR and a scalar "fadd/fmul" to
> add/multiply the loaded value to the running sum/product. The ratios
> given below are the ratios between the throughput with this patch and
> the throughput without this patch.
> Ratio > 1 indicates the performance with this patch is better than the
> master branch.
>
> N1 (UseSVE = 0...

Sure, I'll take care of that!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3723564004

From thartmann at openjdk.org Thu Jan 8 12:27:46 2026
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 8 Jan 2026 12:27:46 GMT
Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order
In-Reply-To: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com>
References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com>
Message-ID:

On Thu, 8 Jan 2026 09:25:35 GMT, Marc Chevalier wrote:

> As it says: randomize the order in which the late inlines are processed, and slightly factor it with macro nodes.
>
> I didn't add shuffle to the `GrowableArray` class: it seems a rather specialized method (and it would likely be a controversial change to a widely used class), it would make `GrowableArray` depend on `Compile` for random number generation (or require a callback, for instance, giving a non-trivial signature and usage), and I couldn't find other shuffling of such an object.
> There is also shuffling of `UniqueNodeList` (for `StressIGVN`), but it seems hard to unify: access and length are not written quite the same, it would probably be not simpler than duplicating the implem (which is simple, so it's fine in my opinion). > > Thanks, > Marc Looks good to me. Please make sure that this doesn't immediately trigger bugs in our testing :slightly_smiling_face: ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29110#pullrequestreview-3639163754 From qamai at openjdk.org Thu Jan 8 12:38:10 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 8 Jan 2026 12:38:10 GMT Subject: Withdrawn: 8350208: CTW: GraphKit::add_safepoint_edges asserts "not enough operands for reexecution" In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:30:46 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the issue of the compiler crashing with "not enough operands for reexecution". The issue here is that during `Parse::catch_inline_exceptions`, the old stack is gone, and we cannot reexecute the current bytecode anymore. However, there are some places where we try to insert safepoints into the graph, such as if the handler is a backward jump, or if one of the exceptions in the handlers is not loaded. Since the `_reexecute` state of the current jvms is "undefined", it is inferred automatically that it should reexecute for some bytecodes such as `putfield`. The solution then is to explicitly set `_reexecute` to false. > > I can manage to write a unit test for the case of a backward handler, for the other cases, since the exceptions that can be thrown for a bytecode that is inferred to reexecute are `NullPointerException`, `ArrayIndexOutOfBoundsException`, and `ArrayStoreException`. I find it hard to construct such a test in which one of them is not loaded. > > Please kindly review, thanks a lot. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/28597 From mchevalier at openjdk.org Thu Jan 8 12:44:10 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 8 Jan 2026 12:44:10 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order In-Reply-To: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> Message-ID: On Thu, 8 Jan 2026 09:25:35 GMT, Marc Chevalier wrote: > As it says: randomize the order the late inlines are processed, and slightly factor it with macro nodes. > > I didn't add shuffle to the `GrowableArray` class since it seems a bit method specialized method (and it seems it'd be a controversial change to a widely use class), it would make `GrowableArray` depends on `Compile` for random number generation (or require a callback, for instance, giving a non-trivial signature and usage), I couldn't find other shuffling of such an object. > There is also shuffling of `UniqueNodeList` (for `StressIGVN`), but it seems hard to unify: access and length are not written quite the same, it would probably be not simpler than duplicating the implem (which is simple, so it's fine in my opinion). > > Thanks, > Marc Right, I didn't write it there, but I've tested with tiers 1-3 and internal tests. All good. I'll test with more flags, though.
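The shuffle the description refers to is simple enough that duplicating it is cheap. As an illustration only, here is a minimal Java sketch of a seeded Fisher-Yates shuffle; HotSpot's real code is C++ and gets its random numbers from `Compile`, as noted above. The class `SeededShuffle` and the element names are hypothetical. The property a stress flag relies on is that the same seed always reproduces the same processing order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SeededShuffle {
    // In-place Fisher-Yates shuffle. Given a uniform random source, every
    // permutation is equally likely, and the same seed reproduces the same order.
    static <T> void shuffle(List<T> list, Random rnd) {
        for (int i = list.size() - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1); // uniform index in [0, i]
            T tmp = list.get(i);
            list.set(i, list.get(j));
            list.set(j, tmp);
        }
    }

    public static void main(String[] args) {
        // Hypothetical placeholders standing in for pending late-inline call sites.
        List<String> first = new ArrayList<>(List.of("callA", "callB", "callC", "callD"));
        List<String> second = new ArrayList<>(first);
        shuffle(first, new Random(42));
        shuffle(second, new Random(42));
        System.out.println(first.equals(second)); // true: replayable from the seed
    }
}
```

For Java lists, `java.util.Collections.shuffle(List, Random)` does the same job; the loop is spelled out here to show how little code a hand-written shuffle over an array-backed list needs.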
------------- PR Comment: https://git.openjdk.org/jdk/pull/29110#issuecomment-3723681897 From chagedorn at openjdk.org Thu Jan 8 13:00:49 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 8 Jan 2026 13:00:49 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v28] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 17:08:57 GMT, Kangcheng Xu wrote: >> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. >> >> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. >> >> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). > > Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 50 commits: > > - Merge remote-tracking branch 'origin/master' into counted-loop-refactor > - Update license header years > - Merge remote-tracking branch 'origin/master' into counted-loop-refactor > - remove trailing whitespaces > - Merge remote-tracking branch 'origin/master' into counted-loop-refactor > - additional suggestions from code review > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - fix trip counter loop-variant detection > - fix bad merge with ctrl_is_member() > - Merge remote-tracking branch 'origin/master' into counted-loop-refactor > > # Conflicts: > # src/hotspot/share/opto/loopnode.cpp > - ... and 40 more: https://git.openjdk.org/jdk/compare/640343f7...7783d609 The last update looks good, thanks for addressing all my comments! 
I've submitted some testing up to tier4 with this patch and your prepared [counted-loop-refactor-old-vs-new](https://github.com/tabjy/jdk/commits/counted-loop-refactor-old-vs-new/) branch. Thanks a lot for doing that! I will report back once I have the results. If they look good, I will also run this through some higher tiers over the weekend. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3723761609 From thartmann at openjdk.org Thu Jan 8 13:23:36 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 Jan 2026 13:23:36 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order In-Reply-To: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> Message-ID: On Thu, 8 Jan 2026 09:25:35 GMT, Marc Chevalier wrote: > As it says: randomize the order the late inlines are processed, and slightly factor it with macro nodes. > > I didn't add shuffle to the `GrowableArray` class since it seems a bit method specialized method (and it seems it'd be a controversial change to a widely use class), it would make `GrowableArray` depends on `Compile` for random number generation (or require a callback, for instance, giving a non-trivial signature and usage), I couldn't find other shuffling of such an object. > There is also shuffling of `UniqueNodeList` (for `StressIGVN`), but it seems hard to unify: access and length are not written quite the same, it would probably be not simpler than duplicating the implem (which is simple, so it's fine in my opinion). > > Thanks, > Marc Sounds good, thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/29110#issuecomment-3723850098 From epeter at openjdk.org Thu Jan 8 15:01:25 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 Jan 2026 15:01:25 GMT Subject: RFR: 8374785: Template Library: need to tag Float16.copySign as having non-deterministic result because of multiple NaNs with different sign bits Message-ID:
This is a very similar issue to https://github.com/openjdk/jdk/pull/29033 / [JDK-8374489](https://bugs.openjdk.org/browse/JDK-8374489).

There are `NaN` encodings that have the sign bit set, and others that do not. If we copy the sign from such a `NaN` to a numeric value (e.g. `1`), we can get either `1` or `-1`.

jshell> var a = Float.NaN;
a ==> NaN
jshell> var b = Float.intBitsToFloat(0xFFC00000);
b ==> NaN
jshell> Math.copySign(1f, a)
==> 1.0
jshell> Math.copySign(1f, b)
==> -1.0
jshell> Float16.copySign(Float16.valueOf(1f), Float16.valueOf(a))
==> 1.0
jshell> Float16.copySign(Float16.valueOf(1f), Float16.valueOf(b))
==> -1.0

Since `NaN` values with different encodings are interchangeable, we cannot know which `NaN` we get. The sign bit is therefore arbitrary, so we also cannot know the sign of the result of `Float16.copySign`. We have to mark it as non-deterministic and hence disable result verification.

Since this is a test bug, I have no regression test. But I verified manually that, with the same seed (for the ExpressionFuzzer) that failed before this change, the test now passes.
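The jshell session can be reproduced as a standalone program with plain `float`. This is a minimal sketch; the class name `NaNSignBit` is hypothetical, and `Float16` is left out because it lives in an incubator module. It shows that the two NaNs differ only in the sign bit and that `Math.copySign` can expose that bit. The `Math.copySign` documentation explicitly allows implementations to treat a NaN sign argument as either positive or negative, which is why the result cannot be verified. `StrictMath.copySign` is shown for contrast: it always treats a NaN sign argument as positive.

```java
public class NaNSignBit {
    public static void main(String[] args) {
        float a = Float.NaN;                        // canonical NaN, sign bit clear
        float b = Float.intBitsToFloat(0xFFC00000); // quiet NaN with the sign bit set

        // Both values are NaN; NaN-aware code cannot tell them apart.
        System.out.println(Float.isNaN(a) && Float.isNaN(b)); // true

        // Their raw encodings differ only in the sign bit.
        System.out.println(Integer.toHexString(Float.floatToRawIntBits(a))); // 7fc00000
        System.out.println(Integer.toHexString(Float.floatToRawIntBits(b))); // ffc00000

        // Math.copySign may use the raw sign bit of a NaN, so the result depends on
        // which NaN encoding was produced (1.0 and -1.0 in the jshell session above).
        System.out.println(Math.copySign(1f, a));
        System.out.println(Math.copySign(1f, b));

        // StrictMath.copySign always treats a NaN sign argument as positive.
        System.out.println(StrictMath.copySign(1f, b)); // 1.0
    }
}
```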
------------- Commit messages: - JDK-8374785 Changes: https://git.openjdk.org/jdk/pull/29118/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29118&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374785 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29118.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29118/head:pull/29118 PR: https://git.openjdk.org/jdk/pull/29118 From thartmann at openjdk.org Thu Jan 8 15:01:26 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 Jan 2026 15:01:26 GMT Subject: RFR: 8374785: Template Library: need to tag Float16.copySign as having non-deterministic result because of multiple NaNs with different sign bits In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 14:50:46 GMT, Emanuel Peter wrote: > This is a very similar issue as https://github.com/openjdk/jdk/pull/29033 / [JDK-8374489](https://bugs.openjdk.org/browse/JDK-8374489). > > There are `NaN` encodings that have the sign bit set, and others that have it not set. > If we now copy the sign from such a `NaN` to a numeric value (e.g. `1`), we can get `1` or `-1`. > > > jshell> var a = Float.NaN; > a ==> NaN > jshell> var b = Float.intBitsToFloat(0xFFC00000); > b ==> NaN > jshell> Math.copySign(1f, a) > ==> 1.0 > jshell> Math.copySign(1f, b) > ==> -1.0 > jshell> Float16.copySign(Float16.valueOf(1f), Float16.valueOf(a)) > ==> 1.0 > jshell> Float16.copySign(Float16.valueOf(1f), Float16.valueOf(b)) > ==> -1.0 > > > Since `NaN` values of different encodings are interchangable, and we cannot know what `NaN` we get, and hence the sign bit is arbitrary, we can also not know the sign of the result of `Float16.copySign`. We have to mark it as non-deterministic and hence disable result verification. > > Since this is a test-bug, I have no regression test. But I verified it manually, that with the same seed (for the ExpressionFuzzer) that fails before this change, we now succeed after the change. 
Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29118#pullrequestreview-3639814696 From epeter at openjdk.org Thu Jan 8 15:24:54 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 Jan 2026 15:24:54 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <_QEYCQm138PWv2vGjMFvEJ6kfMjGEn_vsuEZ_EPaRxQ=.b42967e5-cc22-4c98-a454-6698ce0a70cf@github.com> Message-ID: On Mon, 5 Jan 2026 12:40:21 GMT, Bhavana Kilambi wrote: >> As for the IR verification failure, I've looked a bit and couldn't find such an issue already. Since it reproduces on master, I suggest you file a ticket, indeed. Thanks! > > Hi @marc-chevalier @eme64 Would you please be able to run some testing internally before I integrate this patch? Thanks! @Bhavana-Kilambi This looks like a great addition! I see that you have some new benchmarks and IR tests. I wonder if it would make sense to add `Float16` benchmarks to this existing test: `test/hotspot/jtreg/compiler/loopopts/superword/TestReductions.java` And also to this Benchmark: `test/micro/org/openjdk/bench/vm/compiler/VectorReduction2.java` It would be nice if we test `simple`, `dotProduct` and `Big` reductions. It can have an impact on profitability. Your `ReductionAddFP16` and `ReductionMulFP16` are already "simple" reductions, so I'd suspect that the other reductions are also profitable. 
For reference: https://github.com/openjdk/jdk/pull/27803 and https://github.com/openjdk/jdk/pull/25387 ------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3724348378 From epeter at openjdk.org Thu Jan 8 15:24:58 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 Jan 2026 15:24:58 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v5] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Mon, 29 Dec 2025 17:39:42 GMT, Bhavana Kilambi wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. 
>> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Address review comments for the JTREG test and microbenchmark > - Merge branch 'master' > - Address review comments > - Fix build failures on Mac > - Address review comments > - Merge 'master' > - 8366444: Add support for add/mul reduction operations for Float16 > > This patch adds mid-end support for vectorized add/mul reduction > operations for half floats. It also includes backend aarch64 support for > these operations. Only vectorization support through autovectorization > is added as VectorAPI currently does not support Float16 vector species. 
> > Both add and mul reduction vectorized through autovectorization mandate > the implementation to be strictly ordered. The following is how each of > these reductions is implemented for different aarch64 targets - > > For AddReduction : > On Neon only targets (UseSVE = 0): Generates scalarized additions > using the scalar "fadd" instruction for both 8B and 16B vector lengths. > This is because Neon does not provide a direct instruction for computing > strictly ordered floating point add reduction. > > On SVE targets (UseSVE > 0): Generates the "fadda" instruction which > computes add reduction for floating point in strict order. > > For MulReduction : > Both Neon and SVE do not provide a direct instruction for computing > strictly ordered floating point multiply reduction. For vector lengths > of 8B and 16B, a scalarized sequence of scalar "fmul" instructions is > generated and multiply reduction for vector lengths > 16B is not > supported. > > Below is the performance of the two newly added microbenchmarks in > Float16OperationsBenchmark.java tested on three different aarch64 > machines and with varying MaxVectorSize - > > Note: On all machines, the score (ops/ms) is compared with the master > branch without this patch which generates a sequence of loads ("ldrsh") > to load the FP16 value into an FPR and a scalar "fadd/fmul" to > add/multiply the loaded value to the running sum/product. The ratios > given below are the ratios between the throughput with this patch and > the throughput without this patch. > Ratio > 1 indicates the performance with this patch is better than the > master branch. > > N1 (UseSVE = 0... src/hotspot/cpu/aarch64/aarch64_vector.ad line 267: > 265: // Only the Neon instructions need this check. SVE supports half-precision floats > 266: // by default. > 267: if (length_in_bytes < 8 || (UseSVE == 0 && !is_feat_fp16_supported())) { Was the comment about `FEAT_FP16` not supposed to stay at the top of the first Float16 operation? 
Now it has moved down... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2672782299 From epeter at openjdk.org Thu Jan 8 15:30:01 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 Jan 2026 15:30:01 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v5] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Mon, 29 Dec 2025 17:39:42 GMT, Bhavana Kilambi wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. 
>> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Address review comments for the JTREG test and microbenchmark > - Merge branch 'master' > - Address review comments > - Fix build failures on Mac > - Address review comments > - Merge 'master' > - 8366444: Add support for add/mul reduction operations for Float16 > > This patch adds mid-end support for vectorized add/mul reduction > operations for half floats. It also includes backend aarch64 support for > these operations. Only vectorization support through autovectorization > is added as VectorAPI currently does not support Float16 vector species. 
> > Both add and mul reduction vectorized through autovectorization mandate > the implementation to be strictly ordered. The following is how each of > these reductions is implemented for different aarch64 targets - > > For AddReduction : > On Neon only targets (UseSVE = 0): Generates scalarized additions > using the scalar "fadd" instruction for both 8B and 16B vector lengths. > This is because Neon does not provide a direct instruction for computing > strictly ordered floating point add reduction. > > On SVE targets (UseSVE > 0): Generates the "fadda" instruction which > computes add reduction for floating point in strict order. > > For MulReduction : > Both Neon and SVE do not provide a direct instruction for computing > strictly ordered floating point multiply reduction. For vector lengths > of 8B and 16B, a scalarized sequence of scalar "fmul" instructions is > generated and multiply reduction for vector lengths > 16B is not > supported. > > Below is the performance of the two newly added microbenchmarks in > Float16OperationsBenchmark.java tested on three different aarch64 > machines and with varying MaxVectorSize - > > Note: On all machines, the score (ops/ms) is compared with the master > branch without this patch which generates a sequence of loads ("ldrsh") > to load the FP16 value into an FPR and a scalar "fadd/fmul" to > add/multiply the loaded value to the running sum/product. The ratios > given below are the ratios between the throughput with this patch and > the throughput without this patch. > Ratio > 1 indicates the performance with this patch is better than the > master branch. > > N1 (UseSVE = 0... test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 459: > 457: short result = (short) 0; > 458: for (int i = 0; i < LEN; i++) { > 459: result = float16ToRawShortBits(add(shortBitsToFloat16(result), shortBitsToFloat16(input1[i]))); Why all the conversions from and to `short` / `Float16`? 
Is there any benefit to using `short` for the intermediate results? Why not make `result` a `Float16`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2672804207 From bmaillard at openjdk.org Thu Jan 8 15:48:18 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 8 Jan 2026 15:48:18 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v6] In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 10:42:27 GMT, Galder Zamarreño wrote: >> Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably. >> >> Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added a `is_MinMax` node query to simplify the fix. >> >> I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. >> >> To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). >> >> During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. >> >> I've run tier1-3 tests on linux/x64 successfully. > > Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 13 commits: > > - Merge branch 'master' into topic.uses-min-max > - Module also needed in the wrapper test class > - It's the templated test that needs the module > - Add missing module to test > - Merge branch 'master' into topic.uses-min-max > - Test Float16 > - Only apply to uses that match original IR node > - Merge branch 'master' into topic.uses-min-max > - Use is_MinMax() instead of spelling out individual Min/Max opcodes > - Refactor MaxNode to MinMaxNode and add is_MinMax() query > - ... and 3 more: https://git.openjdk.org/jdk/compare/067fd3cb...7229e345 Looks good to me, thanks for making this change. I like the move from `MaxNode` to `MinMaxNode`; I always thought the naming was a bit confusing. I have also kicked off internal testing and will come back with the results once done. ------------- Marked as reviewed by bmaillard (Committer). PR Review: https://git.openjdk.org/jdk/pull/28895#pullrequestreview-3640027372 From bmaillard at openjdk.org Thu Jan 8 15:48:21 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 8 Jan 2026 15:48:21 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v2] In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 08:33:48 GMT, Galder Zamarreño wrote: >> src/hotspot/share/opto/phaseX.cpp line 2609: >> >>> 2607: for (DUIterator_Fast i2max, i2 = use->fast_outs(i2max); i2 < i2max; i2++) { >>> 2608: Node* u = use->fast_out(i2); >>> 2609: if (u->Opcode() == use->Opcode()) { >> >> So there are no Min(Max()) or Max(Min()) patterns we need to worry about? I was expecting this line to be >> >> if (u->is_MinMax()) { > > Good question. There could be some patterns but I couldn't think of any when I was working on this, so I limited it to the patterns that I knew for sure required this, e.g. Max(Max()), Min(Min()). I took a quick look and I didn't find anything obvious.
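The same-kind nestings discussed above, Max(Max()) and Min(Min()), fold when an operand repeats because max and min are idempotent and associative: Max(A, Max(B, C)) with A == B or A == C equals Max(B, C). As a minimal sketch of that algebra only, and not of C2's `MinMaxNode` code, the identity can be spot-checked for `int` in plain Java. The class and method names are hypothetical, and floating-point inputs would also need NaN and signed-zero handling, which this sketch does not cover.

```java
import java.util.Random;

public class MinMaxIdentity {
    // Max(A, Max(B, C)) with A == B or A == C collapses to Max(B, C).
    static boolean maxFolds(int a, int c) {
        return Math.max(a, Math.max(a, c)) == Math.max(a, c)   // A == B
            && Math.max(a, Math.max(c, a)) == Math.max(c, a);  // A == C
    }

    // The same reasoning applies to Min(A, Min(B, C)).
    static boolean minFolds(int a, int c) {
        return Math.min(a, Math.min(a, c)) == Math.min(a, c)
            && Math.min(a, Math.min(c, a)) == Math.min(c, a);
    }

    public static void main(String[] args) {
        int[] edge = {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE};
        for (int a : edge) {
            for (int c : edge) {
                if (!maxFolds(a, c) || !minFolds(a, c)) throw new AssertionError(a + ", " + c);
            }
        }
        Random rnd = new Random(42);
        for (int i = 0; i < 100_000; i++) {
            int a = rnd.nextInt();
            int c = rnd.nextInt();
            if (!maxFolds(a, c) || !minFolds(a, c)) throw new AssertionError(a + ", " + c);
        }
        System.out.println("identities hold for all sampled inputs");
    }
}
```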
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28895#discussion_r2672883785 From bmaillard at openjdk.org Thu Jan 8 15:53:48 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 8 Jan 2026 15:53:48 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v6] In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 10:42:27 GMT, Galder Zamarreño wrote: >> Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably. >> >> Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added a `is_MinMax` node query to simplify the fix. >> >> I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. >> >> To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). >> >> During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. >> >> I've run tier1-3 tests on linux/x64 successfully. > > Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 13 commits: > > - Merge branch 'master' into topic.uses-min-max > - Module also needed in the wrapper test class > - It's the templated test that needs the module > - Add missing module to test > - Merge branch 'master' into topic.uses-min-max > - Test Float16 > - Only apply to uses that match original IR node > - Merge branch 'master' into topic.uses-min-max > - Use is_MinMax() instead of spelling out individual Min/Max opcodes > - Refactor MaxNode to MinMaxNode and add is_MinMax() query > - ... and 3 more: https://git.openjdk.org/jdk/compare/067fd3cb...7229e345 src/hotspot/share/opto/phaseX.cpp line 2605: > 2603: } > 2604: } > 2605: // Check for max(a, max(b, c)) patterns Suggestion: // Check for Max/Min(A, Max/Min(B, C)) where A == B or A == C Nit: I find it nice when we have the exact same string as in the comment where the optimization actually takes place, so we can just find it with `ctrl+f` easily. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28895#discussion_r2672898190 From epeter at openjdk.org Thu Jan 8 16:10:08 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 Jan 2026 16:10:08 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v6] In-Reply-To: References: Message-ID: <-hubTcbRW128C1Uuvoq1xx9KfIFLnYGh8gUB757p6iA=.3596b81a-b8b8-491a-a709-7268e3300a92@github.com> On Mon, 22 Dec 2025 12:09:01 GMT, Jatin Bhateja wrote: >> Emulate multiplier using LEA addressing scheme, where effective address = BASE + INDEX * SCALE + OFFSET >> Refer to section "3.5.1.2 Using LEA" of Intel's optimization manual for details regarding slow vs fast lea instructions. >> Given that latency of IMUL with register operands is 3 cycles, a combination of two fast LEAs each with 1 cycle latency to emulate multiplier is performant. >> >> Consider X as the multiplicand; by varying the scale of the first LEA instruction we can generate 4 inputs, i.e.
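>>
>> X + X * 1 = 2X
>> X + X * 2 = 3X
>> X + X * 4 = 5X
>> X + X * 8 = 9X
>>
>> The following table lists various multiplier combinations obtained by feeding the output of the first LEA into BASE and/or INDEX and varying the scale of the second fast LEA instruction. We only handle the cases that cannot be handled by just a shift + add.
>>
>> BASE   INDEX  SCALE  MULTIPLIER
>> X      X      1      2  (Terminal)
>> X      X      2      3  (Terminal)
>> X      X      4      5  (Terminal)
>> X      X      8      9  (Terminal)
>> 3X     3X     1      6
>> X      3X     2      7
>> 5X     5X     1      10
>> X      5X     2      11
>> X      3X     4      13
>> 5X     5X     2      15
>> X      2X     8      17
>> 9X     9X     1      18
>> X      9X     2      19
>> X      5X     4      21
>> 5X     5X     4      25
>> 9X     9X     2      27
>> X      9X     4      37
>> X      5X     8      41
>> 9X     9X     4      45
>> X      9X     8      73
>> 9X     9X     8      81
>>
>> All the non-unity inputs tied to BASE / INDEX are derived from the terminal cases, which represent the first fast LEA. Thus, all the multipliers can be computed using just two LEA instructions.
>>
>> Performance numbers for the new microbenchmark included with this patch show around **5-50% improvements** on the latest x86 servers.
>>
>> System: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.10GHz Emerald Rapids:-
>> Baseline:-
>> Benchmark Mode Cnt Score Error Units
>> ConstantMultiplierOptimization.testConstMultiplierI thrpt 2 189.690 ops/min
>> ConstantMultiplierOptimization.testConstMultiplierL thrpt 2 196.283 ops/min
>>
>> Withopt:-
>> Benchmark Mode Cnt Score Error Units
>> Constant...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Extending micro and jtreg tests for memory patterns

Looks like a neat optimization :) I have a few initial comments.

test/hotspot/jtreg/compiler/c2/TestConstantMultiplier.java line 89:

> 87: ));
> 88: var testTemplate1 = Template.make(() -> scope(
> 89: IntStream.of(81, 73, 45, 41, 37, 27, 25, 21, 19, 13, 11).mapToObj(

Is there something special about these values? If yes: add a code comment :)
If no: could we add random values to the list to improve coverage and find edge cases?

test/hotspot/jtreg/compiler/c2/TestConstantMultiplier.java line 102:

> 100: private static void runMultBy#{multiplier}I() {
> 101: int multiplicand = RANDOM.nextInt();
> 102: Verify.checkEQ(#{multiplier} * multiplicand, testMultBy#{multiplier}I(multiplicand));

I think that the `@Run` method also gets compiled, so probably both sides of the verification are compiled. Is that your intention? Probably not, right?

------------- PR Review: https://git.openjdk.org/jdk/pull/28759#pullrequestreview-3640094256 PR Review Comment: https://git.openjdk.org/jdk/pull/28759#discussion_r2672927447 PR Review Comment: https://git.openjdk.org/jdk/pull/28759#discussion_r2672963980 From epeter at openjdk.org Thu Jan 8 16:10:10 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 Jan 2026 16:10:10 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v6] In-Reply-To: <-hubTcbRW128C1Uuvoq1xx9KfIFLnYGh8gUB757p6iA=.3596b81a-b8b8-491a-a709-7268e3300a92@github.com> References: <-hubTcbRW128C1Uuvoq1xx9KfIFLnYGh8gUB757p6iA=.3596b81a-b8b8-491a-a709-7268e3300a92@github.com> Message-ID: On Thu, 8 Jan 2026 15:57:39 GMT, Emanuel Peter wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Extending micro and jtreg tests for memory patterns
>
> test/hotspot/jtreg/compiler/c2/TestConstantMultiplier.java line 89:
>
>> 87: ));
>> 88: var testTemplate1 = Template.make(() -> scope(
>> 89: IntStream.of(81, 73, 45, 41, 37, 27, 25, 21, 19, 13, 11).mapToObj(
>
> Is there something special about these values? If yes: add a code comment :)
> If no: could we add random values to the list to improve coverage and find edge cases?

Ah, I see now that these are the values from your lookup table in the optimization. I think it would still be good if you added random values, just for result verification. And only enable IR rules for the special values.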
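The table's arithmetic can be checked mechanically. The following Java sketch models each LEA as `base + index * scale` with no displacement. It then verifies every non-terminal row of the table above against plain multiplication, including inputs that overflow, which is safe because both sides wrap modulo 2^64. The sketch covers only the arithmetic, not the patch's matcher rules or instruction selection, and the names `LeaDecomposition`, `lea`, `firstLea` and `multiply` are hypothetical. When BASE and INDEX use the same first-LEA result, real code would compute it once; the sketch simply recomputes it.

```java
import java.util.Random;

public class LeaDecomposition {
    // Models an x86 LEA without displacement: base + index * scale, scale in {1, 2, 4, 8}.
    static long lea(long base, long index, int scale) {
        if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
            throw new IllegalArgumentException("invalid LEA scale " + scale);
        }
        return base + index * scale;
    }

    // First LEA: yields k*x for k in {2, 3, 5, 9} (terminal cases); k == 1 is x itself.
    static long firstLea(long x, int k) {
        return k == 1 ? x : lea(x, x, k - 1);
    }

    // Non-terminal rows of the table: {BASE factor, INDEX factor, SCALE, MULTIPLIER}.
    static final int[][] TABLE = {
        {3, 3, 1, 6}, {1, 3, 2, 7}, {5, 5, 1, 10}, {1, 5, 2, 11}, {1, 3, 4, 13},
        {5, 5, 2, 15}, {1, 2, 8, 17}, {9, 9, 1, 18}, {1, 9, 2, 19}, {1, 5, 4, 21},
        {5, 5, 4, 25}, {9, 9, 2, 27}, {1, 9, 4, 37}, {1, 5, 8, 41}, {9, 9, 4, 45},
        {1, 9, 8, 73}, {9, 9, 8, 81},
    };

    // Second LEA combines the first-LEA results into multiplier * x.
    static long multiply(long x, int[] row) {
        long base = firstLea(x, row[0]);
        long index = firstLea(x, row[1]);
        return lea(base, index, row[2]);
    }

    public static void main(String[] args) {
        Random rnd = new Random(1234);
        for (int[] row : TABLE) {
            for (int i = 0; i < 1000; i++) {
                long x = rnd.nextLong();
                if (multiply(x, row) != x * row[3]) {
                    throw new AssertionError("mismatch for multiplier " + row[3]);
                }
            }
        }
        System.out.println("all " + TABLE.length + " decompositions verified"); // 17
    }
}
```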
test/hotspot/jtreg/compiler/c2/TestConstantMultiplier.java line 102: > 100: private static void runMultBy#{multiplier}I() { > 101: int multiplicand = RANDOM.nextInt(); > 102: Verify.checkEQ(#{multiplier} * multiplicand, testMultBy#{multiplier}I(multiplicand)); I think that the `@Run` method also gets compiled, so probably both sides of the verification are compiled. Is that your intention? Probably not, right? ------------- PR Review: https://git.openjdk.org/jdk/pull/28759#pullrequestreview-3640094256 PR Review Comment: https://git.openjdk.org/jdk/pull/28759#discussion_r2672927447 PR Review Comment: https://git.openjdk.org/jdk/pull/28759#discussion_r2672963980 From epeter at openjdk.org Thu Jan 8 16:10:10 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 Jan 2026 16:10:10 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v6] In-Reply-To: <-hubTcbRW128C1Uuvoq1xx9KfIFLnYGh8gUB757p6iA=.3596b81a-b8b8-491a-a709-7268e3300a92@github.com> References: <-hubTcbRW128C1Uuvoq1xx9KfIFLnYGh8gUB757p6iA=.3596b81a-b8b8-491a-a709-7268e3300a92@github.com> Message-ID: On Thu, 8 Jan 2026 15:57:39 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Extending micro and jtreg tests for memory patterns > > test/hotspot/jtreg/compiler/c2/TestConstantMultiplier.java line 89: > >> 87: )); >> 88: var testTemplate1 = Template.make(() -> scope( >> 89: IntStream.of(81, 73, 45, 41, 37, 27, 25, 21, 19, 13, 11).mapToObj( > > Is there something special about these values? If yes: add a code comment :) > If no: could we add random values to the list to improve coverage and find edge cases? Ah, I see now that these are the values from your lookup table in the optimization. I think it would still be good if you added random values, just for result verification. And only enable IR rules for the special values. 
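Following the suggestion above, one way to structure the multiplier list is to give IR rules only to the LEA-table constants and to add other constants purely for result verification. Below is a minimal Java sketch that assumes a hypothetical `buildCases` helper; the real test generates code with the Template Framework, which is not modeled here. The sketch also includes neighbours of the table entries, so that an optimization that accidentally fires for nearby constants would be exercised. Random constants widen coverage further.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public class MultiplierCasesSketch {
    // Constants handled by the LEA lookup table in the PR; only these would get IR rules.
    static final int[] LEA_TABLE = {81, 73, 45, 41, 37, 27, 25, 21, 19, 13, 11};

    // Maps each multiplier to whether IR rules should be enabled for it.
    // LinkedHashMap keeps a stable order; putIfAbsent never downgrades a table entry.
    static Map<Integer, Boolean> buildCases(long seed, int randomCount) {
        Map<Integer, Boolean> cases = new LinkedHashMap<>();
        for (int m : LEA_TABLE) {
            cases.put(m, true);
        }
        // Neighbours of table entries: result verification only.
        for (int m : LEA_TABLE) {
            for (int d = -2; d <= 2; d++) {
                cases.putIfAbsent(m + d, false);
            }
        }
        // Random constants: result verification only.
        Random rnd = new Random(seed);
        for (int i = 0; i < randomCount; i++) {
            cases.putIfAbsent(rnd.nextInt(), false);
        }
        return cases;
    }

    public static void main(String[] args) {
        buildCases(1234L, 8).forEach((m, checkIR) ->
            System.out.println(m + (checkIR ? "  [IR rules + verification]" : "  [verification only]")));
    }
}
```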
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28759#discussion_r2672960296 From vpaprotski at openjdk.org Thu Jan 8 16:28:10 2026 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Thu, 8 Jan 2026 16:28:10 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v3] In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 00:24:11 GMT, Shawn M Emery wrote: >> This change allows use of the AVX512_VBMI instruction set to further optimize decompression/parsing of polynomial coefficients for ML-KEM. The speedup gained in the ML-KEM benchmarks is 0.3 to 0.6% for key generation, 0.4 to 1.7% for encapsulation, and 0.3 to 1.9% for decapsulation. >> >> Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me. > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI > Change Swap to Dup named function/variable > Check for only VBMI support (not VBMI2) Looks good to me ------------- Marked as reviewed by vpaprotski (Committer). PR Review: https://git.openjdk.org/jdk/pull/28815#pullrequestreview-3640216667 From duke at openjdk.org Thu Jan 8 17:59:35 2026 From: duke at openjdk.org (Shawn M Emery) Date: Thu, 8 Jan 2026 17:59:35 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v4] In-Reply-To: References: Message-ID: <-zQuk2uNHFvht7KASCP7OUrLploN8tVFFAldVmkzcuo=.0249f1ba-e2c9-4b33-b438-4fb7f2edf4c6@github.com> > This change allows use of the AVX512_VBMI instruction set to further optimize decompression/parsing of polynomial coefficients for ML-KEM. The speedup gained in the ML-KEM benchmarks is 0.3 to 0.6% for key generation, 0.4 to 1.7% for encapsulation, and 0.3 to 1.9% for decapsulation.
> > Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me. Shawn M Emery has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Merge with mainline - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI Change Swap to Dup named function/variable Check for only VBMI support (not VBMI2) - Update copyright year - Merge with mainline - Swap parameter operation with source - Remove wrong mask from evpsrlvw - Reverse ordering for vpermb and vpsrlvw instructions - Switch from vpshldvw to vpsrlvw - Fix whitespaces - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28815/files - new: https://git.openjdk.org/jdk/pull/28815/files/4af75963..373b1339 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28815&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28815&range=02-03 Stats: 26668 lines in 2610 files changed: 7287 ins; 4136 del; 15245 mod Patch: https://git.openjdk.org/jdk/pull/28815.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28815/head:pull/28815 PR: https://git.openjdk.org/jdk/pull/28815 From missa at openjdk.org Thu Jan 8 18:08:13 2026 From: missa at openjdk.org (Mohamed Issa) Date: Thu, 8 Jan 2026 18:08:13 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v6] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 17:27:01 GMT, Daniel Lundén wrote: > I will review this changeset now after integration, but, for future reference, please note that HotSpot changes require at least **two** reviews before integration (see https://openjdk.org/guide/#life-of-a-pr). @dlunde I'm planning to backport this to JDK-26.
Please let me know as soon as you're finished reviewing, so I can do so. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28999#issuecomment-3725042180 From vlivanov at openjdk.org Thu Jan 8 18:45:13 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 8 Jan 2026 18:45:13 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order In-Reply-To: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> Message-ID: On Thu, 8 Jan 2026 09:25:35 GMT, Marc Chevalier wrote: > As it says: randomize the order in which the late inlines are processed, and slightly factor it with macro nodes. > > I didn't add shuffle to the `GrowableArray` class since it seems a rather specialized method (and it seems it'd be a controversial change to a widely used class), it would make `GrowableArray` depend on `Compile` for random number generation (or require a callback, for instance, giving a non-trivial signature and usage), and I couldn't find other shuffling of such an object. > There is also shuffling of `UniqueNodeList` (for `StressIGVN`), but it seems hard to unify: access and length are not written quite the same, and it would probably not be simpler than duplicating the implementation (which is simple, so it's fine in my opinion). > > Thanks, > Marc src/hotspot/share/opto/compile.cpp line 2196: > 2194: > 2195: if (StressIncrementalInlining) { > 2196: shuffle_late_inlines(); It shuffles initial list, but doesn't have any effects on elements added during incremental inlining. Do we want to shuffle them as well?
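The review question above concerns entries appended after the initial shuffle. The following Java sketch illustrates the general idea only. It is not the HotSpot code, which, per the description, shuffles the late-inline list in C++ and draws random numbers from `Compile`. The sketch uses an in-place Fisher-Yates shuffle driven by a seeded `Random`, applied once to the initial worklist and again after new entries are appended. Java's `Collections.shuffle(List, Random)` does the same job; the loop is spelled out here to mirror a hand-written shuffle over an array-like container. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class WorklistShuffleSketch {
    // In-place Fisher-Yates shuffle: every permutation is equally likely.
    // A seeded Random makes a stress run reproducible.
    static <T> void shuffle(List<T> list, Random rnd) {
        for (int i = list.size() - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);   // uniform index in [0, i]
            T tmp = list.get(i);
            list.set(i, list.get(j));
            list.set(j, tmp);
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        List<String> worklist = new ArrayList<>(List.of("a", "b", "c", "d"));
        shuffle(worklist, rnd);          // randomize the initial entries
        worklist.add("e");               // entries discovered while processing
        worklist.add("f");
        shuffle(worklist, rnd);          // shuffle again so the new entries are randomized too
        System.out.println(worklist);
    }
}
```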
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29110#discussion_r2673467791 From coleenp at openjdk.org Thu Jan 8 23:24:18 2026 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 8 Jan 2026 23:24:18 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache Message-ID: Save this address so that the code that uses it in the valhalla repo will have it. This doesn't fail in the main jdk repository. See CR for more information. Tested with tier1-4. ------------- Commit messages: - 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache Changes: https://git.openjdk.org/jdk/pull/29129/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29129&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374828 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29129.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29129/head:pull/29129 PR: https://git.openjdk.org/jdk/pull/29129 From xgong at openjdk.org Fri Jan 9 01:44:56 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 9 Jan 2026 01:44:56 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector mask relative code in c2 Message-ID: The VectorMask implementation in the Vector API involves complex interactions between types, nodes, and platform-specific features, making the related code in HotSpot difficult to understand and review. This patch adds comprehensive comments for vector mask related types, nodes, and methods in C2 to improve code clarity and maintainability. Note: This patch only adds comments; no functional changes are made.
------------- Commit messages: - 8370666: VectorAPI: Add clear comments for vector mask relative code in c2 Changes: https://git.openjdk.org/jdk/pull/29130/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29130&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370666 Stats: 149 lines in 5 files changed: 129 ins; 2 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/29130.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29130/head:pull/29130 PR: https://git.openjdk.org/jdk/pull/29130 From duke at openjdk.org Fri Jan 9 05:19:07 2026 From: duke at openjdk.org (duke) Date: Fri, 9 Jan 2026 05:19:07 GMT Subject: Withdrawn: 8367789: AArch64 missing acquire in JNI_FastGetField::generate_fast_get_int_field0 In-Reply-To: References: Message-ID: On Tue, 16 Sep 2025 21:30:54 GMT, Justin King wrote: > Use a load-acquire to match the store-release used by C++ to update `safepoint_counter` during arming. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/27325 From jkarthikeyan at openjdk.org Fri Jan 9 05:19:08 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 9 Jan 2026 05:19:08 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v6] In-Reply-To: References: Message-ID: <7R6EAsjVO5fMlO4VHAecrzkJVK7HyFe17-uufU646Wg=.5634023d-8fa8-4cd1-8478-9e56406a9b4a@github.com> On Thu, 8 Jan 2026 04:03:48 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Use Xcomp test run instead of Warmup(0)" > > This reverts commit 50bc132676e5a1276bdf2c236ae57873375a773d. Thank you for the review and testing! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26827#issuecomment-3727227190 From jkarthikeyan at openjdk.org Fri Jan 9 05:19:08 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 9 Jan 2026 05:19:08 GMT Subject: Integrated: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII In-Reply-To: References: Message-ID: On Mon, 18 Aug 2025 16:34:52 GMT, Jasmine Karthikeyan wrote: > Hi all, > This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! This pull request has now been integrated. Changeset: 775f48de Author: Jasmine Karthikeyan URL: https://git.openjdk.org/jdk/commit/775f48de6129092d05650fec17dad171944e6d89 Stats: 33 lines in 2 files changed: 31 ins; 0 del; 2 mod 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII Reviewed-by: chagedorn, thartmann, epeter ------------- PR: https://git.openjdk.org/jdk/pull/26827 From shade at openjdk.org Fri Jan 9 07:51:34 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 Jan 2026 07:51:34 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 23:18:19 GMT, Coleen Phillimore wrote: > Save this address so that the code that uses it in the valhalla repo will have it. This doesn't fail in the main jdk repository. > See CR for more information. > Tested with tier1-4. This looks OK on the surface, but I have a suspicion this points to a bug. AOT code in mainline is not GC-sensitive, meaning you can generate the AOT cache with one GC and use it with another. IIRC, the only generated code we save in mainline is the adapters. Some adapters do have GC barriers in them, mostly around peeking into OopHandles. IIRC, this is why there are phantom barriers referenced in the AOT table.
That peeking involves no GC action. At some point, we reasoned it is "safe" to do even if GC changes. (Yes, it is fairly awkward and flimsy.) But seeing the need for a non-phantom ZGC barrier implies something new is happening in lworld? Is it really safe then? In the sense that we might be missing a real GC barrier if GC selection changes between AOT creation and usage? That safety needs to be established separately. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3727593202 From mchevalier at openjdk.org Fri Jan 9 08:41:54 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 9 Jan 2026 08:41:54 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order In-Reply-To: References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> Message-ID: <47G92GiYmF8jY_vJ2hUNaQQoGBLqpkg_pYJT2r4jn9g=.94cb4ca9-8dfd-4a31-b77b-dbcc5b8f29d0@github.com> On Thu, 8 Jan 2026 18:41:45 GMT, Vladimir Ivanov wrote: >> As it says: randomize the order in which the late inlines are processed, and slightly factor it with macro nodes. >> >> I didn't add shuffle to the `GrowableArray` class since it seems a rather specialized method (and it seems it'd be a controversial change to a widely used class), it would make `GrowableArray` depend on `Compile` for random number generation (or require a callback, for instance, giving a non-trivial signature and usage), and I couldn't find other shuffling of such an object. >> There is also shuffling of `UniqueNodeList` (for `StressIGVN`), but it seems hard to unify: access and length are not written quite the same, and it would probably not be simpler than duplicating the implementation (which is simple, so it's fine in my opinion).
>> >> Thanks, >> Marc > > src/hotspot/share/opto/compile.cpp line 2196: > >> 2194: >> 2195: if (StressIncrementalInlining) { >> 2196: shuffle_late_inlines(); > > It shuffles initial list, but doesn't have any effects on elements added during incremental inlining. Do we want to shuffle them as well? Good point. I don't see why we wouldn't want that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29110#discussion_r2675322174 From stefank at openjdk.org Fri Jan 9 08:43:07 2026 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 9 Jan 2026 08:43:07 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 07:48:16 GMT, Aleksey Shipilev wrote: > That peeking involves no GC action. Not directly related to this PR, but this caught my eye. Do you have more information about this somewhere? On the surface this sounds incorrect for ZGC, so I'd like to make sure that there's no bug lurking in there. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3727807930 From epeter at openjdk.org Fri Jan 9 08:43:15 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 9 Jan 2026 08:43:15 GMT Subject: [jdk26] RFR: 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs Message-ID: <-lUfH47pEisQOr4Qi038k3ErKZp2MQFD_YgGjp6_Oj4=.21b442c3-51b1-4d94-a361-98c40c74cf99@github.com> Hi all, This pull request contains a backport of commit [da14813a](https://github.com/openjdk/jdk/commit/da14813a5bdadaf0a1f81fa57ff6e1b103eaf113) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Emanuel Peter on 7 Jan 2026 and was reviewed by Vladimir Kozlov, Tobias Hartmann and Quan Anh Mai. Thanks!
------------- Commit messages: - Backport da14813a5bdadaf0a1f81fa57ff6e1b103eaf113 Changes: https://git.openjdk.org/jdk/pull/29123/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29123&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373453 Stats: 116 lines in 3 files changed: 109 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/29123.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29123/head:pull/29123 PR: https://git.openjdk.org/jdk/pull/29123 From thartmann at openjdk.org Fri Jan 9 08:43:16 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 9 Jan 2026 08:43:16 GMT Subject: [jdk26] RFR: 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs In-Reply-To: <-lUfH47pEisQOr4Qi038k3ErKZp2MQFD_YgGjp6_Oj4=.21b442c3-51b1-4d94-a361-98c40c74cf99@github.com> References: <-lUfH47pEisQOr4Qi038k3ErKZp2MQFD_YgGjp6_Oj4=.21b442c3-51b1-4d94-a361-98c40c74cf99@github.com> Message-ID: On Thu, 8 Jan 2026 17:51:09 GMT, Emanuel Peter wrote: > Hi all, > > This pull request contains a backport of commit [da14813a](https://github.com/openjdk/jdk/commit/da14813a5bdadaf0a1f81fa57ff6e1b103eaf113) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Emanuel Peter on 7 Jan 2026 and was reviewed by Vladimir Kozlov, Tobias Hartmann and Quan Anh Mai. > > Thanks! Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29123#pullrequestreview-3642921859 From thartmann at openjdk.org Fri Jan 9 08:49:33 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 9 Jan 2026 08:49:33 GMT Subject: [jdk26] RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII Message-ID: Hi all, This pull request contains a backport of commit [775f48de](https://github.com/openjdk/jdk/commit/775f48de6129092d05650fec17dad171944e6d89) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Jasmine Karthikeyan on 9 Jan 2026 and was reviewed by Christian Hagedorn, Tobias Hartmann and Emanuel Peter. Thanks! ------------- Commit messages: - Backport 775f48de6129092d05650fec17dad171944e6d89 Changes: https://git.openjdk.org/jdk/pull/29134/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29134&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8365570 Stats: 33 lines in 2 files changed: 31 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29134.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29134/head:pull/29134 PR: https://git.openjdk.org/jdk/pull/29134 From shade at openjdk.org Fri Jan 9 08:55:55 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 Jan 2026 08:55:55 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 08:38:49 GMT, Stefan Karlsson wrote: > > That peeking involves no GC action. > > Not directly related to this PR, but this caught my eyes. Do you have more information about this somewhere? On the surface this sounds incorrect for ZGC, so I'd like to make sure that there's no bug lurking in there. IIRC, it is somewhere here in `BarrierSetAssembler::c2i_entry_barrier`: void BarrierSetAssembler::c2i_entry_barrier(MacroAssembler* masm) { ... 
__ movptr(tmp1, Address(tmp1, ClassLoaderData::holder_offset()));
__ resolve_weak_handle(tmp1, tmp2); // <--- does IN_NATIVE | ON_PHANTOM_OOP_REF inside
__ cmpptr(tmp1, 0);
__ jcc(Assembler::notEqual, method_live);
...
}

The code that is emitted in this method is part of the C2I adapter, so it is stored in the AOT cache. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3727863469 From bmaillard at openjdk.org Fri Jan 9 08:59:02 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Fri, 9 Jan 2026 08:59:02 GMT Subject: [jdk26] RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 08:41:55 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [775f48de](https://github.com/openjdk/jdk/commit/775f48de6129092d05650fec17dad171944e6d89) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Jasmine Karthikeyan on 9 Jan 2026 and was reviewed by Christian Hagedorn, Tobias Hartmann and Emanuel Peter. > > Thanks! Looks good to me! ------------- Marked as reviewed by bmaillard (Committer). PR Review: https://git.openjdk.org/jdk/pull/29134#pullrequestreview-3642972437 From epeter at openjdk.org Fri Jan 9 09:06:53 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 9 Jan 2026 09:06:53 GMT Subject: [jdk26] RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 08:41:55 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [775f48de](https://github.com/openjdk/jdk/commit/775f48de6129092d05650fec17dad171944e6d89) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
> > The commit being backported was authored by Jasmine Karthikeyan on 9 Jan 2026 and was reviewed by Christian Hagedorn, Tobias Hartmann and Emanuel Peter. > > Thanks! Thanks for the backport :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29134#pullrequestreview-3643009928 From thartmann at openjdk.org Fri Jan 9 09:11:11 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 9 Jan 2026 09:11:11 GMT Subject: [jdk26] RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 08:41:55 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [775f48de](https://github.com/openjdk/jdk/commit/775f48de6129092d05650fec17dad171944e6d89) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Jasmine Karthikeyan on 9 Jan 2026 and was reviewed by Christian Hagedorn, Tobias Hartmann and Emanuel Peter. > > Thanks! Thanks for the reviews, Benoît and Emanuel! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29134#issuecomment-3727950849 From qamai at openjdk.org Fri Jan 9 10:23:45 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 9 Jan 2026 10:23:45 GMT Subject: RFR: 8372302: C2: IGVN verification fails because ModXNode::Ideal creates unused intermediate nodes [v3] In-Reply-To: References: Message-ID: <7VLl8Vka_xtTY4N4sVcw5a-4bTpcKvpvnU0M0uBCOHc=.85b6ef00-ad8a-4c0b-a8c0-62d6708af753@github.com> On Fri, 12 Dec 2025 19:05:15 GMT, Benoît Maillard wrote: >> This PR addresses a failure in IGVN verification with `ModI` and `ModL` nodes. >> >> In `ModXNode::Ideal`, we have code to optimize a modulo expression by expressing it in terms of other operations.
There are actually two distinct cases: one where the divisor is a constant of the form `2^k-1` for some integer `k`, and a more general case where other transformations do not succeed. Because these transformations involve creating several new nodes (sometimes in a loop) and calling `phase->transform(...)` on them, we want to avoid accidentally triggering optimizations on the "unfinished" state of the subgraph. For this, we create a temporary dummy node and add edges to the nodes being constructed. >> >> There are some execution paths where the node is not destroyed before `Ideal` returns, and this creates issues during IGVN verification, as the verification code checks whether the number of nodes has changed after calling `Ideal` on a node where no change is expected. >> >> The path in question is when we exit because the divisor is a constant and is the minimum value: >> https://github.com/openjdk/jdk/blob/c19b12927d2ac901ec8ccaa2de5897ee4c47af56/src/hotspot/share/opto/divnode.cpp#L1146-L1147 >> >> The zero case does not cause problems (this seems to be because it would hide behind a `div0_check` anyway). >> >> The fix is simply to create the temporary node only when it is needed, thus avoiding returning without destroying it. >> >> ### Testing >> - [x] [GitHub Actions](TODO) >> - [x] tier1-4, plus some internal testing >> >> Thank you for reviewing! > > Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision: > > - Update package > - Move to compiler/c2/igvn Marked as reviewed by qamai (Committer).
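For the `2^k - 1` divisor case mentioned above, the useful identity is that 2^k ≡ 1 (mod 2^k - 1). Summing the base-2^k digits of the dividend therefore preserves its remainder. Below is a minimal Java sketch of that arithmetic. It is an illustration under that assumption, not the C2 code, which, per the description, builds IR nodes for these steps. `modPow2Minus1` is a hypothetical name. The sketch follows Java's `%` semantics, where the result takes the sign of the dividend.

```java
public class ModPow2Minus1Sketch {
    // Computes x % ((1 << k) - 1) without division, for 1 <= k <= 31,
    // with Java's % semantics (the result takes the sign of the dividend).
    static int modPow2Minus1(int x, int k) {
        long m = (1L << k) - 1;
        long a = Math.abs((long) x);   // widen first: Math.abs(Integer.MIN_VALUE) would overflow
        while (a > m) {
            // 2^k == 1 (mod m), so adding the low k bits to the remaining high bits
            // keeps the remainder and strictly shrinks a while a > m.
            a = (a & m) + (a >>> k);
        }
        if (a == m) {
            a = 0;                     // m itself is congruent to 0
        }
        return (int) (x < 0 ? -a : a);
    }

    public static void main(String[] args) {
        System.out.println(modPow2Minus1(1000, 3) + " " + (1000 % 7));
        System.out.println(modPow2Minus1(-1000, 3) + " " + (-1000 % 7));
    }
}
```

Widening to `long` before taking the absolute value is the detail that keeps `Integer.MIN_VALUE` correct. With `int`, `Math.abs(Integer.MIN_VALUE)` returns a negative number.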
------------- PR Review: https://git.openjdk.org/jdk/pull/28488#pullrequestreview-3643312911 From jbhateja at openjdk.org Fri Jan 9 10:28:00 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 Jan 2026 10:28:00 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v6] In-Reply-To: References: <-hubTcbRW128C1Uuvoq1xx9KfIFLnYGh8gUB757p6iA=.3596b81a-b8b8-491a-a709-7268e3300a92@github.com> Message-ID: On Thu, 8 Jan 2026 16:05:47 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/TestConstantMultiplier.java line 89: >> >>> 87: )); >>> 88: var testTemplate1 = Template.make(() -> scope( >>> 89: IntStream.of(81, 73, 45, 41, 37, 27, 25, 21, 19, 13, 11).mapToObj( >> >> Is there something special about these values? If yes: add a code comment :) >> If no: could we add random values to the list to improve coverage and find edge cases? > > Ah, I see now that these are the values from your lookup table in the optimization. > > I think it would still be good if you added random values, just for result verification. > And only enable IR rules for the special values. We also do some clever optimization for POT (power-of-two) multipliers in the MulI/MulLNode::Ideal routines, which breaks multiplication into LShift/Add/Sub nodes, but it is target-agnostic. The reason I only selected these constants is that we specifically handle these cases through an optimal LEA-based instruction sequence in the backend, and the machine-level IR annotation guarantees that the required constant operand pattern was indeed selected during matching.
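The reply above contrasts the target-agnostic shift/add/sub rewrite with the LEA table. Below is a hedged Java sketch of that kind of strength reduction; it does not reproduce the exact patterns matched by `MulINode`/`MulLNode::Ideal`. The sketch rewrites `x * c` as one shift plus at most one add or subtract when `c` is a power of two or one away from a power of two, and reports no rewrite otherwise. `mulByShiftAdd` is a hypothetical name. Constants such as 11, 13, 19 and 21 fall through, which fits the PR's statement that LEA is used only for cases that shift + add cannot handle.

```java
import java.util.OptionalInt;

public class ShiftAddSketch {
    // Strength-reduces x * c for c = 2^k, 2^k + 1 or 2^k - 1 (c > 0).
    // Int arithmetic wraps modulo 2^32, so each rewrite equals x * c for every x.
    static OptionalInt mulByShiftAdd(int x, int c) {
        if (c > 0 && Integer.bitCount(c) == 1) {                              // c = 2^k
            return OptionalInt.of(x << Integer.numberOfTrailingZeros(c));
        }
        if (c > 1 && Integer.bitCount(c - 1) == 1) {                          // c = 2^k + 1
            return OptionalInt.of((x << Integer.numberOfTrailingZeros(c - 1)) + x);
        }
        if (c > 0 && c < Integer.MAX_VALUE && Integer.bitCount(c + 1) == 1) { // c = 2^k - 1
            return OptionalInt.of((x << Integer.numberOfTrailingZeros(c + 1)) - x);
        }
        return OptionalInt.empty();   // e.g. 11, 13, 19, 21
    }

    public static void main(String[] args) {
        for (int c : new int[] {8, 9, 15, 17, 11, 13}) {
            OptionalInt r = mulByShiftAdd(7, c);
            System.out.println(c + " -> " + (r.isPresent() ? r.getAsInt() : "no shift/add rewrite"));
        }
    }
}
```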
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28759#discussion_r2675631954 From jbhateja at openjdk.org Fri Jan 9 10:28:05 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 Jan 2026 10:28:05 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v6] In-Reply-To: <-hubTcbRW128C1Uuvoq1xx9KfIFLnYGh8gUB757p6iA=.3596b81a-b8b8-491a-a709-7268e3300a92@github.com> References: <-hubTcbRW128C1Uuvoq1xx9KfIFLnYGh8gUB757p6iA=.3596b81a-b8b8-491a-a709-7268e3300a92@github.com> Message-ID: <7vaiclFr4CafZJPXdzEDMSHZKKQ9KeE5h23NOetMI-A=.1d794b6e-e744-43b8-8407-2a00f24e81e9@github.com> On Thu, 8 Jan 2026 16:06:51 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Extending micro and jtreg tests for memory patterns > > test/hotspot/jtreg/compiler/c2/TestConstantMultiplier.java line 102: > >> 100: private static void runMultBy#{multiplier}I() { >> 101: int multiplicand = RANDOM.nextInt(); >> 102: Verify.checkEQ(#{multiplier} * multiplicand, testMultBy#{multiplier}I(multiplicand)); > > I think that the `@Run` method also gets compiled, so probably both sides of the verification are compiled. Is that your intention? Probably not, right? I didn't follow it, I don't intend to invoke Run in StandAlone Mode. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28759#discussion_r2675637149 From epeter at openjdk.org Fri Jan 9 10:28:57 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 9 Jan 2026 10:28:57 GMT Subject: RFR: 8372302: C2: IGVN verification fails because ModXNode::Ideal creates unused intermediate nodes [v3] In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 19:05:15 GMT, Beno?t Maillard wrote: >> This PR addresses a failure in IGVN verification with `ModI` and `ModL` nodes. 
>> >> In `ModeXNode::Ideal`, we have code to optimize a modulo expression by expressing it in terms of other operations. There are actually two distinct cases, one where the divisor is a constant and is equal to `modulo 2^k-1` for some integer `k`, and a more general case where other transformations do not succeed. Because these transformations involve creating several new nodes (sometimes in a loop) and calling `phase->transform(...)` on them, we want to avoid accidentally triggering optimizations on the "unfinished" state of the subgraph. For this, we create a temporary dummy node and add edges to the nodes being constructed. >> >> There are some execution paths where the node is not destroyed before `Ideal` returns, and this creates issues during IGVN verification, as the verification code checks if the number of nodes has changed after having called `Ideal` on a given node and not expecting changes. >> >> The path in question is when we exit because the divisor is a constant and is the minimum value: >> https://github.com/openjdk/jdk/blob/c19b12927d2ac901ec8ccaa2de5897ee4c47af56/src/hotspot/share/opto/divnode.cpp#L1146-L1147 >> >> The zero case does not cause problems (this seems to be because it would hide behind a `div0_check` anyway). >> >> The fix is simply to only create the temporary node when it is needed, and thus avoiding returning without destroying it. >> >> ### Testing >> - [x] [GitHub Actions](TODO) >> - [x] tier1-4, plus some internal testing >> >> Thank you for reviewing! > > Beno?t Maillard has updated the pull request incrementally with two additional commits since the last revision: > > - Update package > - Move to compiler/c2/igvn Marked as reviewed by epeter (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28488#pullrequestreview-3643323542 From thartmann at openjdk.org Fri Jan 9 10:42:51 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 9 Jan 2026 10:42:51 GMT Subject: [jdk26] Integrated: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 08:41:55 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [775f48de](https://github.com/openjdk/jdk/commit/775f48de6129092d05650fec17dad171944e6d89) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Jasmine Karthikeyan on 9 Jan 2026 and was reviewed by Christian Hagedorn, Tobias Hartmann and Emanuel Peter. > > Thanks! This pull request has now been integrated. Changeset: 10d97c5e Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/10d97c5e6e701d8db8cf2140d8893dafbc51c2c7 Stats: 33 lines in 2 files changed: 31 ins; 0 del; 2 mod 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII Reviewed-by: bmaillard, epeter Backport-of: 775f48de6129092d05650fec17dad171944e6d89 ------------- PR: https://git.openjdk.org/jdk/pull/29134 From epeter at openjdk.org Fri Jan 9 12:07:12 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 9 Jan 2026 12:07:12 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v6] In-Reply-To: References: <-hubTcbRW128C1Uuvoq1xx9KfIFLnYGh8gUB757p6iA=.3596b81a-b8b8-491a-a709-7268e3300a92@github.com> Message-ID: On Fri, 9 Jan 2026 10:20:40 GMT, Jatin Bhateja wrote: >> Ah, I see now that these are the values from your lookup table in the optimization. >> >> I think it would still be good if you added random values, just for result verification. >> And only enable IR rules for the special values. 
> > We also do some clever optimization for power-of-two (POT) multipliers in the MulI/MulLNode::Ideal routines, which break the multiplication into LShift/Add/Sub nodes, but it's target agnostic. > > The reason I only selected these constants is that we specifically handle these cases through an optimal LEA-based instruction sequence in the backend, and the machine-level IR annotation guarantees that the required constant operand pattern was indeed selected during matching. Right, I understand. But it is generally a good idea to not just verify your specific values, but also fuzz around a bit, just in case your optimization messes up and touches other cases nearby. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28759#discussion_r2675947140 From dlunden at openjdk.org Fri Jan 9 12:33:31 2026 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 9 Jan 2026 12:33:31 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v6] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 17:27:01 GMT, Daniel Lundén wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > I will review this changeset now after integration, but, for future reference, please note that HotSpot changes require at least **two** reviews before integration (see https://openjdk.org/guide/#life-of-a-pr). > @dlunde I'm planning to backport this to JDK-26. Please let me know when you're finished reviewing as soon as possible, so I can do so. Thanks! @missa-prime Will do!
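To make the two-LEA sequences discussed in the 8373480 review concrete, here is a small Java model of the multiplier table from that PR. It assumes each LEA computes `base + index * scale` with `scale` in {1, 2, 4, 8} and 32-bit wrap-around arithmetic. The first LEA produces `t*X` with `t` in {2, 3, 5, 9}, and the second uses either `X` or `t*X` as the base and `t*X` as the index. The class and method names are hypothetical; this is an arithmetic model of the table, not the backend's matcher rules.

```java
import java.util.Random;

final class LeaMulModel {
    static final int[] SCALES = {1, 2, 4, 8};

    // Models "lea dst, [base + index*scale]" on ints (wrap-around like the hardware).
    static int lea(int base, int index, int scale) {
        return base + index * scale;
    }

    // Returns {s1, xIsBase, s2} such that
    //   t = lea(x, x, s1); result = lea(xIsBase == 1 ? x : t, t, s2)
    // equals x * c for every int x, or null if no two-LEA sequence of this shape exists.
    static int[] plan(int c) {
        for (int s1 : SCALES) {
            int t = 1 + s1;                                         // first LEA yields 2X, 3X, 5X or 9X
            for (int s2 : SCALES) {
                if (t + t * s2 == c) return new int[] {s1, 0, s2};  // base = index = tX
                if (1 + t * s2 == c) return new int[] {s1, 1, s2};  // base = X, index = tX
            }
        }
        return null;
    }

    // Executes a plan produced by plan(c) on a concrete multiplicand x.
    static int mulViaLea(int x, int[] p) {
        int t = lea(x, x, p[0]);
        return lea(p[1] == 1 ? x : t, t, p[2]);
    }

    public static void main(String[] args) {
        int[] table = {6, 7, 10, 11, 13, 15, 17, 18, 19, 21, 25, 27, 37, 41, 45, 73, 81};
        Random rnd = new Random(1);
        for (int c : table) {
            int[] p = plan(c);
            if (p == null) throw new AssertionError("no plan for " + c);
            for (int i = 0; i < 1000; i++) {
                int x = rnd.nextInt();
                if (mulViaLea(x, p) != x * c) throw new AssertionError(x + " * " + c);
            }
        }
        System.out.println("all table entries reproduced");
    }
}
```

The search returns the first matching combination, which can differ from the PR's table row (for 6 it finds `2X + 2X*2` rather than `3X + 3X*1`). Both are equivalent, because int arithmetic wraps modulo 2^32 in exactly the same way for both.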
------------- PR Comment: https://git.openjdk.org/jdk/pull/28999#issuecomment-3728723019 From epeter at openjdk.org Fri Jan 9 12:36:02 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 9 Jan 2026 12:36:02 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v6] In-Reply-To: <7vaiclFr4CafZJPXdzEDMSHZKKQ9KeE5h23NOetMI-A=.1d794b6e-e744-43b8-8407-2a00f24e81e9@github.com> References: <-hubTcbRW128C1Uuvoq1xx9KfIFLnYGh8gUB757p6iA=.3596b81a-b8b8-491a-a709-7268e3300a92@github.com> <7vaiclFr4CafZJPXdzEDMSHZKKQ9KeE5h23NOetMI-A=.1d794b6e-e744-43b8-8407-2a00f24e81e9@github.com> Message-ID: On Fri, 9 Jan 2026 10:22:23 GMT, Jatin Bhateja wrote: >> test/hotspot/jtreg/compiler/c2/TestConstantMultiplier.java line 102: >> >>> 100: private static void runMultBy#{multiplier}I() { >>> 101: int multiplicand = RANDOM.nextInt(); >>> 102: Verify.checkEQ(#{multiplier} * multiplicand, testMultBy#{multiplier}I(multiplicand)); >> >> I think that the `@Run` method also gets compiled, so probably both sides of the verification are compiled. Is that your intention? Probably not, right? > > I didn't follow it, I don't intend to invoke Run in StandAlone Mode. Let me clarify: - Your `@Run` gets invoked many times, eventually it will compile. - You invoke the `testMultBy`, and eventually it will get compiled. - Now, both the multiplication in the test, and the run method are compiled. If there was a bug, it would be the same wrong result in the test and run, verification would pass, and we would not catch the bug. 
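Emanuel's two points are to fuzz around the special constants, and to avoid deriving the expected value from the same compiled code as the result. Both can be illustrated outside the IR framework. The following plain-Java sketch is hypothetical: the class name, the chosen constants (37 and 81 come from the PR's table; 36, 38, 80 and 82 are neighbours), and the iteration count are assumptions. It computes the expected value with `BigInteger`, whose `intValue()` returns the low-order 32 bits of the exact product, which is exactly Java's wrap-around `int` multiplication. A miscompiled constant multiply is therefore unlikely to corrupt both sides in the same way.

```java
import java.math.BigInteger;
import java.util.Random;
import java.util.function.IntUnaryOperator;

final class ConstMulCheck {
    // Each lambda multiplies by a compile-time constant, so the JIT is free to
    // strength-reduce it; neighbouring constants are included to "fuzz around".
    static final int[] CONSTS = {36, 37, 38, 80, 81, 82};
    static final IntUnaryOperator[] OPS = {
        x -> x * 36, x -> x * 37, x -> x * 38,
        x -> x * 80, x -> x * 81, x -> x * 82,
    };

    // Independent reference: exact product truncated to its low 32 bits.
    static int reference(int x, int c) {
        return BigInteger.valueOf(x).multiply(BigInteger.valueOf(c)).intValue();
    }

    static void verify(long seed, int iterations) {
        Random rnd = new Random(seed);
        for (int i = 0; i < iterations; i++) {
            int x = rnd.nextInt(); // random multiplicand, including negatives and overflowing products
            for (int j = 0; j < CONSTS.length; j++) {
                int actual = OPS[j].applyAsInt(x);
                int expected = reference(x, CONSTS[j]);
                if (actual != expected) {
                    throw new AssertionError(x + " * " + CONSTS[j] + ": got " + actual + ", expected " + expected);
                }
            }
        }
    }

    public static void main(String[] args) {
        verify(42, 100_000); // typically enough calls for the lambdas to be JIT-compiled
        System.out.println("ok");
    }
}
```

This sketch does not show which mechanism the PR itself adopted for its golden values; it only demonstrates the principle of an independent reference.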
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28759#discussion_r2676027884 From stefank at openjdk.org Fri Jan 9 12:47:08 2026 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 9 Jan 2026 12:47:08 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 08:50:10 GMT, Aleksey Shipilev wrote: > > > That peeking involves no GC action. > > > > > > Not directly related to this PR, but this caught my eyes. Do you have more information about this somewhere? On the surface this sounds incorrect for ZGC, so I'd like to make sure that there's no bug lurking in there. > > IIRC, it is somewhere here in `BarrierSetAssembler::c2i_entry_barrier`: > > ``` > void BarrierSetAssembler::c2i_entry_barrier(MacroAssembler* masm) { > ... > __ movptr(tmp1, Address(tmp1, ClassLoaderData::holder_offset())); > __ resolve_weak_handle(tmp1, tmp2); // <--- does IN_NATIVE | ON_PHANTOM_OOP_REF inside > __ cmpptr(tmp1, 0); > __ jcc(Assembler::notEqual, method_live); > ... > } > ``` > > Again, IIRC, the code that is emitted in this method is part of C2I adapter, so it is stored in AOT cache. That operation performs a GC action. It performs a GC load barrier when running with ZGC. I'm quite out-of-context here so I probably misunderstand some of what's being said here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3728764548 From coleenp at openjdk.org Fri Jan 9 13:21:56 2026 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 9 Jan 2026 13:21:56 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 23:18:19 GMT, Coleen Phillimore wrote: > Save this address so that the code that uses it in the valhalla repo will have it. This doesn't fail in main jdk repository. > See CR for more information. > Tested with tier1-4. These are all excellent questions. 
I ran into this when running the valhalla tests with --enable-preview and -XX:+UseZGC, and this fixed it. I supposed that the valhalla adapters were using this method so it needed to be stored. Will ask someone who knows. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3728883489 From epeter at openjdk.org Fri Jan 9 13:36:31 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 9 Jan 2026 13:36:31 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector mask relative code in c2 In-Reply-To: References: Message-ID: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> On Fri, 9 Jan 2026 01:36:50 GMT, Xiaohong Gong wrote: > The VectorMask implementation in Vector API involves complex interactions between types, nodes, and platform-specific > features, making the related code in HotSpot difficult to understand and review. > > This patch adds comprehensive comments for vector mask related types, nodes, and methods in C2 to improve code clarity and > maintainability. > > Note: This patch only adds comments; no functional changes are made. Nice work, thanks for taking the time for this, much appreciated! On the whole I'm super happy with this, but left a few extra comments :) src/hotspot/share/opto/type.cpp line 2452: > 2450: // stored in a predicate/mask register. > 2451: // - Returns a normal vector type (i.e. TypeVectA ~ TypeVectZ) otherwise, where > 2452: // the vector mask is stored in a vector register. The first case is `PVectMask`, and the second `NVectMask`, right? src/hotspot/share/opto/vectornode.hpp line 81: > 79: // the masked nodes. > 80: // > 81: // For example, "AddVBNode" might have two versions: Might?
Suggestion: // For example: src/hotspot/share/opto/vectornode.hpp line 1478: > 1476: }; > 1477: > 1478: //-------------------------- Vector mask broadcast ------------------------------ A nit that has been bothering me for a while: I would just remove all the "title lines" with the `------`. They don't really add anything. And they are currently inconsistent in this file anyway. Some are a more of a description like here. Some repeat the node name. And in some cases they are missing anyway. src/hotspot/share/opto/vectornode.hpp line 1739: > 1737: VectorRearrangeNode(Node* vec1, Node* shuffle) > 1738: : VectorNode(vec1, shuffle, vec1->bottom_type()->is_vect()) { > 1739: // assert(mask->is_VectorMask(), "VectorBlendNode requires that third argument be a mask"); Can you add a comment for Rearrange as well? src/hotspot/share/opto/vectornode.hpp line 1841: > 1839: > 1840: // Select elements from two source vectors based on the wrapped indexes held in > 1841: // the first vector. Can you improve the documentation a little? Also: "first vector" might be understood to refer to `src1`, but that's not the case, right? You could base the description on `selectFrom`: https://download.java.net/java/early_access/jdk26/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#selectFrom(jdk.incubator.vector.Vector,jdk.incubator.vector.Vector) src/hotspot/share/opto/vectornode.hpp line 1855: > 1853: //------------------------------VectorLoadShuffleNode------------------------------ > 1854: // The target may not directly support the rearrange operation for an element type. > 1855: // In those cases, we can transform the rearrange into a different element type. Can you specify a bit more about the inputs and outputs, and what exactly the transformation does? src/hotspot/share/opto/vectornode.hpp line 1874: > 1872: // Convert a "BVectMask" into a platform-specific vector mask (either "NVectMask" > 1873: // or "PVectMask"). 
> 1874: class VectorLoadMaskNode : public VectorNode { I'd love to rename this. Because it is (as you say in the comments) a conversion, and not a "load" (memory op). What about `VectorConvertBooleans2MaskNode`. And below, rename `VectorStoreMaskNode` to `VectorConvertMask2BooleansNode`. You may have an even better idea. ------------- PR Review: https://git.openjdk.org/jdk/pull/29130#pullrequestreview-3643441539 PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2675764381 PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2675755496 PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2676211590 PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2676174171 PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2676190790 PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2676195417 PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2676205674 From epeter at openjdk.org Fri Jan 9 13:36:32 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 9 Jan 2026 13:36:32 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector mask relative code in c2 In-Reply-To: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> References: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> Message-ID: On Fri, 9 Jan 2026 13:26:52 GMT, Emanuel Peter wrote: >> The VectorMask implementation in Vector API involves complex interactions between types, nodes, and platform-specific >> features, making the related code in HotSpot difficult to understand and review. >> >> This patch adds comprehensive comments for vector mask related types, nodes, and methods in C2 to improve code clarity and >> maintainability. >> >> Note: This patch only adds comments; no functional changes are made. 
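Emanuel suggested basing the documentation of the two-source selection node on `selectFrom`, and the semantics may be easier to follow with a scalar model. The sketch below models the documented behavior of `Vector.selectFrom(v1, v2)` on plain `int` arrays. Each index comes from the vector on which `selectFrom` is called and is wrapped into `[0, 2*VLENGTH)` with `Math.floorMod`. If the wrapped index is below `VLENGTH`, the lane is taken from `v1`; otherwise it is taken from `v2` at `index - VLENGTH`. This is a hypothetical illustration on arrays, not the Vector API itself and not C2's node, so the Javadoc remains the authoritative definition.

```java
import java.util.Arrays;

final class SelectFromModel {
    // Scalar model of two-table selection: lane n of the result takes v1[i] if the
    // wrapped index i < VLENGTH, otherwise v2[i - VLENGTH].
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = indexes.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("length mismatch");
        }
        int[] result = new int[vlen];
        for (int n = 0; n < vlen; n++) {
            int i = Math.floorMod(indexes[n], 2 * vlen); // wrap into [0, 2*VLENGTH)
            result[n] = (i < vlen) ? v1[i] : v2[i - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] idx = {0, 5, -1, 9};
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        System.out.println(Arrays.toString(selectFrom(idx, v1, v2)));
    }
}
```

With the sample inputs in `main`, the program prints `[10, 21, 23, 11]`. Index 5 selects `v2[1]`, -1 wraps to 7 and selects `v2[3]`, and 9 wraps to 1 and selects `v1[1]`.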
> > src/hotspot/share/opto/vectornode.hpp line 1855: > >> 1853: //------------------------------VectorLoadShuffleNode------------------------------ >> 1854: // The target may not directly support the rearrange operation for an element type. >> 1855: // In those cases, we can transform the rearrange into a different element type. > > Can you specify a bit more about the inputs and outputs, and what exactly the transformation does? Is this a memory instruction? Because it is called `Load`. If not: can we rename it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2676197752 From epeter at openjdk.org Fri Jan 9 13:49:51 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 9 Jan 2026 13:49:51 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v6] In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 10:42:27 GMT, Galder Zamarre?o wrote: >> Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably. >> >> Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added a `is_MinMax` node query to simplify the fix. >> >> I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. >> >> To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). 
>> >> During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. >> >> I've run tier1-3 tests on linux/x64 successfully. > > Galder Zamarre?o has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Merge branch 'master' into topic.uses-min-max > - Module also needed in the wrapper test class > - It's the templated test that needs the module > - Add missing module to test > - Merge branch 'master' into topic.uses-min-max > - Test Float16 > - Only apply to uses that match original IR node > - Merge branch 'master' into topic.uses-min-max > - Use is_MinMax() instead of spelling out individual Min/Max opcodes > - Refactor MaxNode to MinMaxNode and add is_MinMax() query > - ... and 3 more: https://git.openjdk.org/jdk/compare/067fd3cb...7229e345 Looks great, thanks for working on this Galder! src/hotspot/share/opto/addnode.hpp line 334: > 332: > 333: public: > 334: MinMaxNode( Node *in1, Node *in2 ) : AddNode(in1,in2) { Suggestion: MinMaxNode(Node* in1, Node* in2) : AddNode(in1, in2) { Might as well fix code style while touching it. test/hotspot/jtreg/compiler/igvn/TestMinMaxIdentity.java line 26: > 24: /* > 25: * @test > 26: * @bug 8354244 Bug ID is different to issue number. Intentional? 
Suggestion: * @bug 8373134 ------------- PR Review: https://git.openjdk.org/jdk/pull/28895#pullrequestreview-3643975774 PR Review Comment: https://git.openjdk.org/jdk/pull/28895#discussion_r2676235207 PR Review Comment: https://git.openjdk.org/jdk/pull/28895#discussion_r2676243435 From epeter at openjdk.org Fri Jan 9 13:49:54 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 9 Jan 2026 13:49:54 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v6] In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 13:40:13 GMT, Emanuel Peter wrote: >> Galder Zamarre?o has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: >> >> - Merge branch 'master' into topic.uses-min-max >> - Module also needed in the wrapper test class >> - It's the templated test that needs the module >> - Add missing module to test >> - Merge branch 'master' into topic.uses-min-max >> - Test Float16 >> - Only apply to uses that match original IR node >> - Merge branch 'master' into topic.uses-min-max >> - Use is_MinMax() instead of spelling out individual Min/Max opcodes >> - Refactor MaxNode to MinMaxNode and add is_MinMax() query >> - ... and 3 more: https://git.openjdk.org/jdk/compare/067fd3cb...7229e345 > > src/hotspot/share/opto/addnode.hpp line 334: > >> 332: >> 333: public: >> 334: MinMaxNode( Node *in1, Node *in2 ) : AddNode(in1,in2) { > > Suggestion: > > MinMaxNode(Node* in1, Node* in2) : AddNode(in1, in2) { > > Might as well fix code style while touching it. 
Do the same below ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28895#discussion_r2676235821 From epeter at openjdk.org Fri Jan 9 13:57:32 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 9 Jan 2026 13:57:32 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <_gdObRdkYkS7d3fQQ6bcms709TpeM1IQtuPJtI0fcyE=.073d0496-8dfc-48a9-aee8-b64b408a6e62@github.com> Message-ID: On Mon, 10 Nov 2025 16:07:35 GMT, Fei Gao wrote: >> @fg1417 Are you still working on this? > > Hi @eme64, many thanks for your review. It?s really comprehensive and insightful. I?ve given a thumbs-up to all the comments that have been resolved in this commit. > >> I have one concern: We now have changed the branches. There is now a long sequence of branches if we have very few iterations, so that we only go through pre and post loop. It would be interesting to see what the performance difference is between master and patch. > > Regarding this concern, I re-ran the microbenchmarks (now merged with the existing `VectorThroughputForIterationCount.java` ), named as `bench03*_drain_memoryBound`, and collected data across different platforms, including `128-bit` and `256-bit` `AArch64` machines as well as a `512-bit` `x86` machine. > > To summarize, I observe a minor performance regression for small-iteration loops on the `128-bit` and `256-bit` `AArch64` platforms. For larger-iteration loops, there is either a performance improvement or no noticeable change. The performance data on the `512-bit x86` machine shows a similar trend, though the regression is more significant. > > **The test range of `ITERATION_COUNT` is `0?300`. For larger `ITERATION_COUNT` values, there is either a performance improvement or no noticeable change, so those results are omitted. 
The following data only shows cases with regressions.** > > > (FIXED_OFFSET) (RANDOMIZE_OFFSETS) (REPETITIONS) (seed) Mode Cnt > 0 TRUE 1024 42 avgt 3 > > `Diff = (patch - master) / master` > > On `128-bit aarch64` platform: > > Benchmark (ITERATION_COUNT) Units Diff > bench031B_drain_memoryBound 1 ns/op 15.15% > bench031B_drain_memoryBound 2 ns/op 10.89% > bench031B_drain_memoryBound 3 ns/op 9.27% > bench031B_drain_memoryBound 4 ns/op 7.39% > bench031B_drain_memoryBound 5 ns/op 5.86% > bench031B_drain_memoryBound 6 ns/op 5.31% > bench031B_drain_memoryBound 7 ns/op 4.39% > bench031B_drain_memoryBound 8 ns/op 4.27% > bench031B_drain_memoryBound 9 ns/op 3.60% > bench031B_drain_memoryBound 10 ns/op 3.11% > bench031B_drain_memoryBound 11 ns/op 2.97% > bench031B_drain_memoryBound 12 ns/op 3.19% > bench031B_drain_memoryBound 13 ns/op 2.90% > bench031B_drain_memoryBound 14 ns/op 2.68% > bench031B_drain_memoryBound 15 ns/op 2.37% > bench031B_drain_memoryBound 16 ns/op 2.44% > bench031B_drain_memoryBound 17 ns/op 2.11% > bench031B_drain_memoryBound 18 ns... @fg1417 I hope you had a good start into the new year. I'd love to make this PR a bit of a priority in the next weeks. Would you mind synching with master and fixing merge conflicts? I'd review, run testing and look into running some benchmarks myself. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3729006498 From jbhateja at openjdk.org Fri Jan 9 14:06:41 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 Jan 2026 14:06:41 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v7] In-Reply-To: References: Message-ID: <8ccfiOiaL6cfTltcKo3FxCUP1X8ECsW6JchUFt2YSkI=.5c44da1f-c19c-42ab-80d9-7e438a84536b@github.com> > Emulate multiplier using LEA addressing scheme, where effective address = BASE + INDEX * SCALE + OFFSET > Refer to section "3.5.1.2 Using LEA" of Intel's optimization manual for details reagarding slow vs fast lea instructions. > Given that latency of IMUL with register operands is 3 cycles, a combination of two fast LEAs each with 1 cycle latency to emulate multipler is performant. > > Consider X as the multiplicand, by variying the scale of first LEA instruction we can generate 4 input i.e. > > > X + X * 1 = 2X > X + X * 2 = 3X > X + X * 4 = 5X > X + X * 8 = 9X > > > Following table list downs various multiplier combinations for output of first LEA at BASE and/or INDEX by varying the > scale of second fast LEA instruction. We will only handle the cases which cannot be handled by just shift + add. > > > BASE INDEX SCALE MULTIPLER > X X 1 2 (Terminal) > X X 2 3 (Terminal) > X X 4 5 (Terminal) > X X 8 9 (Terminal) > 3X 3X 1 6 > X 3X 2 7 > 5X 5X 1 10 > X 5X 2 11 > X 3X 4 13 > 5X 5X 2 15 > X 2X 8 17 > 9X 9X 1 18 > X 9X 2 19 > X 5X 4 21 > 5X 5X 4 25 > 9X 9X 2 27 > X 9X 4 37 > X 5X 8 41 > 9X 9X 4 45 > X 9X 8 73 > 9X 9X 8 81 > > > All the non-unity inputs tied to BASE / INDEX are derived out of terminal cases which represent first FAST LEA. Thus, all the multipliers can be computed using just two LEA instructions. > > Performance numbers for new micro benchmark included with this patch shows around **5-50% improvments** on latest x86 servers. 
> > > System: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.10GHz Emerald Rapids:- > Baseline:- > Benchmark Mode Cnt Score Error Units > ConstantMultiplierOptimization.testConstMultiplierI thrpt 2 189.690 ops/min > ConstantMultiplierOptimization.testConstMultiplierL thrpt 2 196.283 ops/min > > > Withopt:- > Benchmark Mode Cnt Score Error Units > ConstantMultiplierOptimization.testConstMultiplierI thrpt 2 283.827 ops/min > ConstantMultiplierOptimization... Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - Review comments resolutions - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8373480 - Extending micro and jtreg tests for memory patterns - Review comments resolutions - Minor cleanup in Template-Framework test - Using template-framework for JTREG test generation - Adding IR framework tests - Adding benchmark - 8373480: Optimize constant input multiplication using LEA instructions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28759/files - new: https://git.openjdk.org/jdk/pull/28759/files/b7756730..13c71c16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28759&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28759&range=05-06 Stats: 46866 lines in 3224 files changed: 20314 ins; 7158 del; 19394 mod Patch: https://git.openjdk.org/jdk/pull/28759.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28759/head:pull/28759 PR: https://git.openjdk.org/jdk/pull/28759 From jbhateja at openjdk.org Fri Jan 9 14:06:42 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 Jan 2026 14:06:42 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v6] In-Reply-To: References: <-hubTcbRW128C1Uuvoq1xx9KfIFLnYGh8gUB757p6iA=.3596b81a-b8b8-491a-a709-7268e3300a92@github.com> 
<7vaiclFr4CafZJPXdzEDMSHZKKQ9KeE5h23NOetMI-A=.1d794b6e-e744-43b8-8407-2a00f24e81e9@github.com> Message-ID: <-KKGCQRZroe5N_-ftHQBuyu2M2ukHJ8yZTCxPBngJGA=.de3f8524-5ab0-40d5-b024-dab49731d217@github.com> On Fri, 9 Jan 2026 12:32:39 GMT, Emanuel Peter wrote: >> I didn't follow it, I don't intend to invoke Run in StandAlone Mode. > > Let me clarify: > - Your `@Run` gets invoked many times, eventually it will compile. > - You invoke the `testMultBy`, and eventually it will get compiled. > - Now, both the multiplication in the test, and the run method are compiled. If there was a bug, it would be the same wrong result in the test and run, verification would pass, and we would not catch the bug. Got it, fixed! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28759#discussion_r2676304297 From fgao at openjdk.org Fri Jan 9 14:20:00 2026 From: fgao at openjdk.org (Fei Gao) Date: Fri, 9 Jan 2026 14:20:00 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <_gdObRdkYkS7d3fQQ6bcms709TpeM1IQtuPJtI0fcyE=.073d0496-8dfc-48a9-aee8-b64b408a6e62@github.com> Message-ID: On Mon, 10 Nov 2025 16:07:35 GMT, Fei Gao wrote: >> @fg1417 Are you still working on this? > > Hi @eme64, many thanks for your review. It?s really comprehensive and insightful. I?ve given a thumbs-up to all the comments that have been resolved in this commit. > >> I have one concern: We now have changed the branches. There is now a long sequence of branches if we have very few iterations, so that we only go through pre and post loop. It would be interesting to see what the performance difference is between master and patch. 
> > Regarding this concern, I re-ran the microbenchmarks (now merged with the existing `VectorThroughputForIterationCount.java` ), named as `bench03*_drain_memoryBound`, and collected data across different platforms, including `128-bit` and `256-bit` `AArch64` machines as well as a `512-bit` `x86` machine. > > To summarize, I observe a minor performance regression for small-iteration loops on the `128-bit` and `256-bit` `AArch64` platforms. For larger-iteration loops, there is either a performance improvement or no noticeable change. The performance data on the `512-bit x86` machine shows a similar trend, though the regression is more significant. > > **The test range of `ITERATION_COUNT` is `0?300`. For larger `ITERATION_COUNT` values, there is either a performance improvement or no noticeable change, so those results are omitted. The following data only shows cases with regressions.** > > > (FIXED_OFFSET) (RANDOMIZE_OFFSETS) (REPETITIONS) (seed) Mode Cnt > 0 TRUE 1024 42 avgt 3 > > `Diff = (patch - master) / master` > > On `128-bit aarch64` platform: > > Benchmark (ITERATION_COUNT) Units Diff > bench031B_drain_memoryBound 1 ns/op 15.15% > bench031B_drain_memoryBound 2 ns/op 10.89% > bench031B_drain_memoryBound 3 ns/op 9.27% > bench031B_drain_memoryBound 4 ns/op 7.39% > bench031B_drain_memoryBound 5 ns/op 5.86% > bench031B_drain_memoryBound 6 ns/op 5.31% > bench031B_drain_memoryBound 7 ns/op 4.39% > bench031B_drain_memoryBound 8 ns/op 4.27% > bench031B_drain_memoryBound 9 ns/op 3.60% > bench031B_drain_memoryBound 10 ns/op 3.11% > bench031B_drain_memoryBound 11 ns/op 2.97% > bench031B_drain_memoryBound 12 ns/op 3.19% > bench031B_drain_memoryBound 13 ns/op 2.90% > bench031B_drain_memoryBound 14 ns/op 2.68% > bench031B_drain_memoryBound 15 ns/op 2.37% > bench031B_drain_memoryBound 16 ns/op 2.44% > bench031B_drain_memoryBound 17 ns/op 2.11% > bench031B_drain_memoryBound 18 ns... > @fg1417 I hope you had a good start into the new year. Hi @eme64, Happy New Year! 
> I'd love to make this PR a bit of a priority in the next weeks. Would you mind synching with master and fixing merge conflicts? Yes, absolutely. I?ve rebased it internally, and the new commit is currently under testing. Once the testing is complete, I?ll push it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3729090839 From duke at openjdk.org Fri Jan 9 14:50:14 2026 From: duke at openjdk.org (Ferenc Rakoczi) Date: Fri, 9 Jan 2026 14:50:14 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression uses incorrect assertions Message-ID: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> The preconditions for the aarch64 and the AVX-512 intrinsic implementations of the implKyber12To16() method of com.sun.crypto.provider.ML_KEM are different and the AVX-512 one has stricter preconditions on the input, which was not recorded in the assert() before calling the function (although they were satisfied by all calling code). Now the assert() is corrected, and with these preconditions, the aarch64 implementation is simplified. 
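The sketch below may help readers unfamiliar with the "12 to 16" conversion that `implKyber12To16()` is named after. It assumes that this conversion is generic 12-bit unpacking, where every 3 bytes hold two 12-bit values in little-endian order, as in ML-KEM's (FIPS 203) ByteDecode12 before any reduction modulo q. This is a hypothetical scalar illustration. It does not reproduce the JDK intrinsic, its parameters, or the input preconditions that the corrected `assert()` now checks.

```java
final class Unpack12 {
    // Unpacks 12-bit values packed little-endian (two values per 3 bytes)
    // into 16-bit shorts; every output lies in [0, 4095].
    static short[] unpack12(byte[] in) {
        if (in.length % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        short[] out = new short[in.length / 3 * 2];
        for (int i = 0, j = 0; i < in.length; i += 3, j += 2) {
            int b0 = in[i] & 0xFF, b1 = in[i + 1] & 0xFF, b2 = in[i + 2] & 0xFF;
            out[j] = (short) (b0 | ((b1 & 0x0F) << 8));      // low 8 bits from b0, high 4 from b1
            out[j + 1] = (short) ((b1 >>> 4) | (b2 << 4));   // low 4 bits from b1, high 8 from b2
        }
        return out;
    }

    public static void main(String[] args) {
        short[] values = unpack12(new byte[] {0x21, 0x43, 0x65});
        System.out.printf("0x%03X 0x%03X%n", values[0], values[1]);
    }
}
```

For the input bytes `0x21 0x43 0x65`, the two unpacked values are `0x321` and `0x654`.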
------------- Commit messages: - 8374755: ML-KEM's 12-bit decompression uses incorrect assertions Changes: https://git.openjdk.org/jdk/pull/29141/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29141&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374755 Stats: 90 lines in 2 files changed: 4 ins; 73 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/29141.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29141/head:pull/29141 PR: https://git.openjdk.org/jdk/pull/29141 From mchevalier at openjdk.org Fri Jan 9 15:40:35 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 9 Jan 2026 15:40:35 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order In-Reply-To: <47G92GiYmF8jY_vJ2hUNaQQoGBLqpkg_pYJT2r4jn9g=.94cb4ca9-8dfd-4a31-b77b-dbcc5b8f29d0@github.com> References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> <47G92GiYmF8jY_vJ2hUNaQQoGBLqpkg_pYJT2r4jn9g=.94cb4ca9-8dfd-4a31-b77b-dbcc5b8f29d0@github.com> Message-ID: On Fri, 9 Jan 2026 08:39:43 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/compile.cpp line 2196: >> >>> 2194: >>> 2195: if (StressIncrementalInlining) { >>> 2196: shuffle_late_inlines(); >> >> It shuffles initial list, but doesn't have any effects on elements added during incremental inlining. Do we want to shuffle them as well? > > Good point. I don't see why we wouldn't want that. Actually, I'm not sure. I see that https://github.com/openjdk/jdk/blob/8737a8ca73952d60129e7fc2f7e17eea3b800af7/src/hotspot/share/opto/compile.hpp#L480 and https://github.com/openjdk/jdk/blob/8737a8ca73952d60129e7fc2f7e17eea3b800af7/src/hotspot/share/opto/compile.hpp#L1059-L1062 But reading the code, I don't understand why it's important to insert just at this position, and not simply at this position or after (and then, why not at the end, to avoid shifting?). 
In `inline_incrementally_one` https://github.com/openjdk/jdk/blob/8737a8ca73952d60129e7fc2f7e17eea3b800af7/src/hotspot/share/opto/compile.cpp#L2123-L2126 We seem to stop as soon as something happens, so we wouldn't really use the fact that the coming elements in `_late_inlines` are related. Overall, I don't see where this assumption of depth-first is used. A little bit of testing doesn't catch fire when inserting potentially after `_late_inlines_pos`. I'll read and test more, but maybe someone already has more context? Maybe @iwanowww or @rwestrel from looking at `git blame`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29110#discussion_r2676647403 From galder at openjdk.org Fri Jan 9 15:59:39 2026 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 9 Jan 2026 15:59:39 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v7] In-Reply-To: References: Message-ID: > Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably. > > Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added a `is_MinMax` node query to simplify the fix. > > I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. > > To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). 
> > During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. > > I've run tier1-3 tests on linux/x64 successfully. Galder Zamarre?o has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/addnode.hpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/phaseX.cpp Co-authored-by: Beno?t Maillard ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28895/files - new: https://git.openjdk.org/jdk/pull/28895/files/7229e345..fe71e03f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28895&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28895&range=05-06 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28895.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28895/head:pull/28895 PR: https://git.openjdk.org/jdk/pull/28895 From galder at openjdk.org Fri Jan 9 15:59:44 2026 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 9 Jan 2026 15:59:44 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v6] In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 15:49:55 GMT, Beno?t Maillard wrote: >> Galder Zamarre?o has updated the pull request with a new target base due to a merge or a rebase. 
>> The pull request now contains 13 commits:
>>
>>  - Merge branch 'master' into topic.uses-min-max
>>  - Module also needed in the wrapper test class
>>  - It's the templated test that needs the module
>>  - Add missing module to test
>>  - Merge branch 'master' into topic.uses-min-max
>>  - Test Float16
>>  - Only apply to uses that match original IR node
>>  - Merge branch 'master' into topic.uses-min-max
>>  - Use is_MinMax() instead of spelling out individual Min/Max opcodes
>>  - Refactor MaxNode to MinMaxNode and add is_MinMax() query
>>  - ... and 3 more: https://git.openjdk.org/jdk/compare/067fd3cb...7229e345
>
> src/hotspot/share/opto/phaseX.cpp line 2605:
>
>> 2603: }
>> 2604: }
>> 2605: // Check for max(a, max(b, c)) patterns
>
> Suggestion:
>
>     // Check for Max/Min(A, Max/Min(B, C)) where A == B or A == C
>
> Nit: I find it nice when we have the exact same string as in the comment where the optimization actually takes place, so we can just find it with `ctrl+f` easily.

Yeah sure, makes sense. I've integrated the change.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28895#discussion_r2676713054

From galder at openjdk.org Fri Jan 9 16:14:17 2026
From: galder at openjdk.org (Galder Zamarreño)
Date: Fri, 9 Jan 2026 16:14:17 GMT
Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v8]
In-Reply-To:
References:
Message-ID:

> Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably.
>
> Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added an `is_MinMax` node query to simplify the fix.
>
> I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`.
>
> To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977).
>
> During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future.
>
> I've run tier1-3 tests on linux/x64 successfully.

Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision:

 - Update test/hotspot/jtreg/compiler/igvn/TestMinMaxIdentity.java
   Co-authored-by: Emanuel Peter
 - Fix style

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28895/files
  - new: https://git.openjdk.org/jdk/pull/28895/files/fe71e03f..2c0b0e43

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28895&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28895&range=06-07

  Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/28895.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28895/head:pull/28895

PR: https://git.openjdk.org/jdk/pull/28895

From galder at openjdk.org Fri Jan 9 16:14:18 2026
From: galder at openjdk.org (Galder Zamarreño)
Date: Fri, 9 Jan 2026 16:14:18 GMT
Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v6]
In-Reply-To:
References:
Message-ID:

On Fri, 9 Jan 2026 13:40:27 GMT, Emanuel Peter wrote:

>> src/hotspot/share/opto/addnode.hpp line 334:
>>
>>> 332:
>>> 333: public:
>>> 334: MinMaxNode( Node *in1, Node *in2 ) : AddNode(in1,in2) {
>>
>> Suggestion:
>>
>>     MinMaxNode(Node* in1, Node* in2) : AddNode(in1, in2) {
>>
>> Might as well fix code style while touching it.
>
> Do the same below ;)

Sure, I've fixed other places too

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28895#discussion_r2676762121

From galder at openjdk.org Fri Jan 9 16:14:23 2026
From: galder at openjdk.org (Galder Zamarreño)
Date: Fri, 9 Jan 2026 16:14:23 GMT
Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v6]
In-Reply-To:
References:
Message-ID:

On Fri, 9 Jan 2026 13:42:50 GMT, Emanuel Peter wrote:

>> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:
>>
>>  - Merge branch 'master' into topic.uses-min-max
>>  - Module also needed in the wrapper test class
>>  - It's the templated test that needs the module
>>  - Add missing module to test
>>  - Merge branch 'master' into topic.uses-min-max
>>  - Test Float16
>>  - Only apply to uses that match original IR node
>>  - Merge branch 'master' into topic.uses-min-max
>>  - Use is_MinMax() instead of spelling out individual Min/Max opcodes
>>  - Refactor MaxNode to MinMaxNode and add is_MinMax() query
>>  - ... and 3 more: https://git.openjdk.org/jdk/compare/067fd3cb...7229e345
>
> test/hotspot/jtreg/compiler/igvn/TestMinMaxIdentity.java line 26:
>
>> 24: /*
>> 25: * @test
>> 26: * @bug 8354244
>
> Bug ID is different to issue number. Intentional?
>
> Suggestion:
>
>  * @bug 8373134

Not intentional, I think I copied from a previous change

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28895#discussion_r2676764771

From dlunden at openjdk.org Fri Jan 9 20:19:39 2026
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 9 Jan 2026 20:19:39 GMT
Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v6]
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jan 2026 02:15:22 GMT, Jatin Bhateja wrote:

>> Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand; this leads to a failure during register biasing. Changing the NDD demotion flag names to encode the explicit operand position, i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2**, splitting the commutative flag into separate new flags, and fine-tuning assertion checks based on the new naming convention fixes the issue.
>>
>> Failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch.
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>> PS: Validation performed using Intel SDE 9.58.
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review comments resolutions

I agree with these changes. Nice that the assert caught this case!

I'm running some tests for the changeset over the weekend, for both JDK 26 and JDK 27. I'll report back on Monday, and then you can go ahead with the backport @missa-prime (if no issues show up in testing).
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28999#issuecomment-3730438445

From vlivanov at openjdk.org Fri Jan 9 20:50:13 2026
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 9 Jan 2026 20:50:13 GMT
Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order
In-Reply-To:
References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com>
 <47G92GiYmF8jY_vJ2hUNaQQoGBLqpkg_pYJT2r4jn9g=.94cb4ca9-8dfd-4a31-b77b-dbcc5b8f29d0@github.com>
Message-ID:

On Fri, 9 Jan 2026 15:36:28 GMT, Marc Chevalier wrote:

>> Good point. I don't see why we wouldn't want that.
>
> Actually, I'm not sure. I see that
> https://github.com/openjdk/jdk/blob/8737a8ca73952d60129e7fc2f7e17eea3b800af7/src/hotspot/share/opto/compile.hpp#L480
> and
> https://github.com/openjdk/jdk/blob/8737a8ca73952d60129e7fc2f7e17eea3b800af7/src/hotspot/share/opto/compile.hpp#L1059-L1062
> But reading the code, I don't understand why it's important to insert just at this position, and not simply at this position or after (and then, why not at the end, to avoid shifting?). In `inline_incrementally_one`
> https://github.com/openjdk/jdk/blob/8737a8ca73952d60129e7fc2f7e17eea3b800af7/src/hotspot/share/opto/compile.cpp#L2123-L2126
> We seem to stop as soon as something happens, so we wouldn't really use the fact that the coming elements in `_late_inlines` are related. Overall, I don't see where this assumption of depth-first is used.
>
> A little bit of testing doesn't catch fire when inserting potentially after `_late_inlines_pos`. I'll read and test more, but maybe someone already has more context? Maybe @iwanowww or @rwestrel from looking at `git blame`?

During late inlining we want to preserve the relative order of candidate sites for late inlining. It matters when the compiler runs out of inlining budget. If it appends candidates instead, it turns DFS call site traversal into BFS.
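To make the ordering argument concrete, here is a minimal C++ sketch of a simplified late-inline worklist. Everything in it is hypothetical (the call graph, `late_inline_order`, and a budget counted in call sites); it is not C2's `Compile`/`CallGenerator` code and it ignores the IGVN rounds and the early exit in `inline_incrementally_one`. Inserting a candidate's callees right after it, in the spirit of the `_late_inlines_pos` insertion point discussed above, gives depth-first order; appending them gives breadth-first order.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical call graph: caller -> callees that become late-inline
// candidates once the caller itself has been inlined.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Simplified model of a late-inline worklist. With depth_first == true, the
// callees of the candidate being processed are inserted right after it
// (mimicking an insertion position like _late_inlines_pos); otherwise they
// are appended at the end. Stops once `budget` candidates have been inlined
// and returns them in the order they were inlined.
std::vector<std::string> late_inline_order(const CallGraph& graph,
                                           std::vector<std::string> queue,
                                           bool depth_first, std::size_t budget) {
  std::vector<std::string> inlined;
  for (std::size_t i = 0; i < queue.size() && inlined.size() < budget; i++) {
    const std::string current = queue[i];  // copy: insert() may reallocate
    inlined.push_back(current);
    auto it = graph.find(current);
    if (it == graph.end()) continue;       // leaf: nothing new to queue
    std::size_t pos = i + 1;               // insertion point for depth-first order
    for (const std::string& callee : it->second) {
      if (depth_first) {
        queue.insert(queue.begin() + static_cast<std::ptrdiff_t>(pos), callee);
        pos++;
      } else {
        queue.push_back(callee);
      }
    }
  }
  return inlined;
}
```

For example, with `a` calling `a1` and `a2`, `a1` calling `a11`, `b` calling `b1`, an initial worklist `{a, b}` and a budget of 3, the depth-first variant inlines `a, a1, a11` while the appending variant inlines `a, b, a1`; this is the kind of difference that only becomes visible once the budget runs out.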
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29110#discussion_r2677580953

From chagedorn at openjdk.org Fri Jan 9 22:47:33 2026
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 9 Jan 2026 22:47:33 GMT
Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v28]
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jan 2026 17:08:57 GMT, Kangcheng Xu wrote:

>> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found.
>>
>> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think.
>>
>> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759).
>
> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 50 commits:
>
>  - Merge remote-tracking branch 'origin/master' into counted-loop-refactor
>  - Update license header years
>  - Merge remote-tracking branch 'origin/master' into counted-loop-refactor
>  - remove trailing whitespaces
>  - Merge remote-tracking branch 'origin/master' into counted-loop-refactor
>  - additional suggestions from code review
>  - Apply suggestions from code review
>
>    Co-authored-by: Christian Hagedorn
>  - fix trip counter loop-variant detection
>  - fix bad merge with ctrl_is_member()
>  - Merge remote-tracking branch 'origin/master' into counted-loop-refactor
>
>    # Conflicts:
>    #   src/hotspot/share/opto/loopnode.cpp
>  - ... and 40 more: https://git.openjdk.org/jdk/compare/640343f7...7783d609

There are quite a few failures with the same assert (probably all related).
Can be triggered, for example, by running `compiler/predicates/assertion/TestAssertionPredicates.java#NoLoopPredicationXbatch` with `-XX:+UseSerialGC`:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/opt/mach5/mesos/work_dir/slaves/da1065b5-7b94-4f0d-85e9-a3a252b9a32e-S11864/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/c6afc1de-b432-44d4-bd71-2c035e46dc9e/runs/88cff2b5-6582-4c32-8cb2-92c8c5d2feeb/workspace/open/src/hotspot/share/opto/loopnode.hpp:1450), pid=182310, tid=182326
#  Error: assert(!has_ctrl(n)) failed

..........

Current CompileTask:
C2:300 95 b 4 compiler.predicates.assertion.TestAssertionPredicates::testTrySplitUpNonOpaqueExpressionNode (163 bytes)

Stack: [0x00007f27d75cc000,0x00007f27d76cc000], sp=0x00007f27d76c6b00, free space=1002k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x156bff8] PhaseIdealLoop::get_loop(Node const*) const+0x68 (loopnode.hpp:1450)
V [libjvm.so+0x15a07f7] IdealLoopTree::remove_safepoints(PhaseIdealLoop*, bool)+0x167 (loopnode.cpp:4672)
V [libjvm.so+0x15b7dee] IdealLoopTree::counted_loop(PhaseIdealLoop*)+0x11e (loopnode.cpp:4700)
V [libjvm.so+0x15b7d7a] IdealLoopTree::counted_loop(PhaseIdealLoop*)+0xaa (loopnode.cpp:4719)
V [libjvm.so+0x15b7d7a] IdealLoopTree::counted_loop(PhaseIdealLoop*)+0xaa (loopnode.cpp:4719)
V [libjvm.so+0x15bcc07] PhaseIdealLoop::build_and_optimize()+0xaf7 (loopnode.cpp:5285)
V [libjvm.so+0xbb8130] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x4c0 (loopnode.hpp:1226)
V [libjvm.so+0xbb1995] Compile::Optimize()+0x685 (compile.cpp:2466)
V [libjvm.so+0xbb5173] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x2023 (compile.cpp:862)
V [libjvm.so+0x9cc3e8] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x498 (c2compiler.cpp:147)
V [libjvm.so+0xbc4660] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x780 (compileBroker.cpp:2345)
V [libjvm.so+0xbc5ec0] CompileBroker::compiler_thread_loop()+0x530 (compileBroker.cpp:1989)
V [libjvm.so+0x112635b] JavaThread::thread_main_inner()+0x13b (javaThread.cpp:776)
V [libjvm.so+0x1bb30b6] Thread::call_run()+0xb6 (thread.cpp:242)
V [libjvm.so+0x1808c98] thread_native_entry(Thread*)+0x118 (os_linux.cpp:860)

The branch with the old vs. new code also hit the diff assert for a closed test. I will check next week if I can extract a reproducer to share.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3730828674

From duke at openjdk.org Fri Jan 9 22:52:10 2026
From: duke at openjdk.org (Shawn M Emery)
Date: Fri, 9 Jan 2026 22:52:10 GMT
Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v2]
In-Reply-To: <7gLxGLuYKPYchraxk4Z4hh_ThfgGsGGdYAL2LVaDBvg=.63281063-7480-431f-b24e-1304ef92326e@github.com>
References: <7gLxGLuYKPYchraxk4Z4hh_ThfgGsGGdYAL2LVaDBvg=.63281063-7480-431f-b24e-1304ef92326e@github.com>
Message-ID:

On Wed, 7 Jan 2026 16:36:52 GMT, Volodymyr Paprotski wrote:

>>> "Insert 0b0000 nibble after every third nibble". I only have two questions, looks good otherwise.
>>
>> Yes, that is the idea.
>>
>>>
>>> PS: things I've considered:
>>>
>>> * Loop controls?
>>>
>>> * ML_KEM.java guarantees (per callee comment and assert) lengths are multiple of 64
>>> * also same as original code
>>> * Why not simply a vpermb? Have zeroes already from the masked load with k1..
>>
>> It *is* using vpermb (evpermb() generates the EVEX encoded VPERMB)
>>
>>>
>>> * shuffle granularity is actually 4-bits, not 8-bits
>>
>> Really? In what instruction? I hadn't found it in the manual.
>>
>>> * logical shift already zeroes top bits, so `vpand` not required?
>>
>> Only every 2nd byte is shifted, the rest needs to be masked.
>>>
>>> * odd columns not shifted, so still have extra bits that need clearing
>>
>> Yes, that is what the vpand does. (actually, it also (unnecessarily) masks the shifted bytes.
>>>
>>> * Why VBMI?
>>>
>>> * needed for `evpermb`
>>
>> Yes.
>
>> > "Insert 0b0000 nibble after every third nibble". I only have two questions, looks good otherwise.
>
> @ferakocz apologies for the misunderstanding; everything after the PS was not a request for change.. those were the questions that occurred to me and I found the answer.. The only reason I put them in was for the next reviewer. Or if I am wrong, e.g. no, I did not find a better instruction than vpermb either. (My first reaction to seeing the java code, was 'oh, this is easy, just a `vpermb`, then had to reason out why not..)

@vpaprotsk I've rerun related regression tests and benchmarks after implementing your code review comments and remerging with the master branch. These have all come back with the expected results. Could you reapprove after the merge commit? Thank you.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28815#issuecomment-3730835813

From chagedorn at openjdk.org Fri Jan 9 22:51:14 2026
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 9 Jan 2026 22:51:14 GMT
Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v3]
In-Reply-To:
References:
Message-ID: <7AGclBaK6DfxQoJmjWgcjUamtX8ueUTJC8PdC9hC2dU=.e1881c83-2f73-4556-b00b-fb10be886bec@github.com>

On Thu, 8 Jan 2026 11:53:25 GMT, Roland Westrelin wrote:

>> Sounds good, let's do it separately.
>
> I filed: https://bugs.openjdk.org/browse/JDK-8374789

Thanks!
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2677831123

From vpaprotski at openjdk.org Fri Jan 9 23:21:24 2026
From: vpaprotski at openjdk.org (Volodymyr Paprotski)
Date: Fri, 9 Jan 2026 23:21:24 GMT
Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v4]
In-Reply-To: <-zQuk2uNHFvht7KASCP7OUrLploN8tVFFAldVmkzcuo=.0249f1ba-e2c9-4b33-b438-4fb7f2edf4c6@github.com>
References: <-zQuk2uNHFvht7KASCP7OUrLploN8tVFFAldVmkzcuo=.0249f1ba-e2c9-4b33-b438-4fb7f2edf4c6@github.com>
Message-ID:

On Thu, 8 Jan 2026 17:59:35 GMT, Shawn M Emery wrote:

>> This change allows use of the AVX512_VBMI instruction set to further optimize decompression/parsing of polynomial coefficients for ML-KEM. The speedup gained in the ML-KEM benchmarks for key generation is between 0.3 and 0.6%, encapsulation is 0.4 to 1.7%, and decapsulation is 0.3 to 1.9%.
>>
>> Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me.
>
> Shawn M Emery has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision:
>
>  - Merge with mainline
>  - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI
>    Change Swap to Dup named function/variable
>    Check for only VBMI support (not VBMI2)
>  - Update copyright year
>  - Merge with mainline
>  - Swap parameter operation with source
>  - Remove wrong mask from evpsrlvw
>  - Reverse ordering for vpermb and vpsrlvw instructions
>  - Switch from vpshldvw to vpsrlvw
>  - Fix whitespaces
>  - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2

> @vpaprotsk I've rerun related regression tests and benchmarks after implementing your code review comments and remerging with the master branch. These have all come back with the expected results.
> Could you reapprove after the merge commit? Thank you.

Sure! Though my approval won't let you integrate, I am a committer, not a reviewer. Maybe we can ask @ascarpino

-------------

Marked as reviewed by vpaprotski (Committer).

PR Review: https://git.openjdk.org/jdk/pull/28815#pullrequestreview-3645900397

From dlong at openjdk.org Fri Jan 9 23:38:36 2026
From: dlong at openjdk.org (Dean Long)
Date: Fri, 9 Jan 2026 23:38:36 GMT
Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v8]
In-Reply-To:
References:
Message-ID:

On Fri, 9 Jan 2026 16:14:17 GMT, Galder Zamarreño wrote:

>> Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably.
>>
>> Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added an `is_MinMax` node query to simplify the fix.
>>
>> I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`.
>>
>> To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977).
>>
>> During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future.
>>
>> I've run tier1-3 tests on linux/x64 successfully.
>
> Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Update test/hotspot/jtreg/compiler/igvn/TestMinMaxIdentity.java
>
>    Co-authored-by: Emanuel Peter
>  - Fix style

Marked as reviewed by dlong (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/28895#pullrequestreview-3645940347

From dlong at openjdk.org Sat Jan 10 00:20:11 2026
From: dlong at openjdk.org (Dean Long)
Date: Sat, 10 Jan 2026 00:20:11 GMT
Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jan 2026 08:16:55 GMT, Roland Westrelin wrote:

>> The base input of `AddP` is expected to only be set for heap accesses
>> but I noticed some inconsistencies so I added an assert in the `AddP`
>> constructor and fixed issues that it caught. AFAICT, the
>> inconsistencies shouldn't create issues.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision:
>
>  - more
>  - more
>  - review
>  - Merge branch 'master' into JDK-8373343
>  - review
>  - review
>  - review
>  - merge
>  - more
>  - more
>  - ... and 3 more: https://git.openjdk.org/jdk/compare/6b450b5c...b20f41db

src/hotspot/share/opto/macroArrayCopy.cpp line 936:

> 934: end = transform_later(new AndXNode(end, MakeConX(~end_round)) );
> 935: mem = ClearArrayNode::clear_memory(ctrl, mem, dest,
> 936: start_con, end, false,&_igvn);

Suggestion:

    start_con, end, false, &_igvn);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2677995370

From jbhateja at openjdk.org Sat Jan 10 03:26:20 2026
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sat, 10 Jan 2026 03:26:20 GMT
Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v4]
In-Reply-To: <-zQuk2uNHFvht7KASCP7OUrLploN8tVFFAldVmkzcuo=.0249f1ba-e2c9-4b33-b438-4fb7f2edf4c6@github.com>
References: <-zQuk2uNHFvht7KASCP7OUrLploN8tVFFAldVmkzcuo=.0249f1ba-e2c9-4b33-b438-4fb7f2edf4c6@github.com>
Message-ID:

On Thu, 8 Jan 2026 17:59:35 GMT, Shawn M Emery wrote:

>> This change allows use of the AVX512_VBMI instruction set to further optimize decompression/parsing of polynomial coefficients for ML-KEM. The speedup gained in the ML-KEM benchmarks for key generation is between 0.3 and 0.6%, encapsulation is 0.4 to 1.7%, and decapsulation is 0.3 to 1.9%.
>>
>> Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me.
>
> Shawn M Emery has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
> The pull request contains 10 additional commits since the last revision:
>
>  - Merge with mainline
>  - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI
>    Change Swap to Dup named function/variable
>    Check for only VBMI support (not VBMI2)
>  - Update copyright year
>  - Merge with mainline
>  - Swap parameter operation with source
>  - Remove wrong mask from evpsrlvw
>  - Reverse ordering for vpermb and vpsrlvw instructions
>  - Switch from vpshldvw to vpsrlvw
>  - Fix whitespaces
>  - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2

src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 876:

> 874: __ evmovdquq(xmm22, Address(perms), Assembler::AVX_512bit);
> 875:
> 876: __ BIND(VBMILoop);

Better to align loop starting address to OptoLoopAlignment

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2678272848

From jbhateja at openjdk.org Sat Jan 10 03:26:21 2026
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sat, 10 Jan 2026 03:26:21 GMT
Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v4]
In-Reply-To:
References: <-zQuk2uNHFvht7KASCP7OUrLploN8tVFFAldVmkzcuo=.0249f1ba-e2c9-4b33-b438-4fb7f2edf4c6@github.com>
Message-ID:

On Sat, 10 Jan 2026 03:18:56 GMT, Jatin Bhateja wrote:

>> Shawn M Emery has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
>> The pull request contains 10 additional commits since the last revision:
>>
>>  - Merge with mainline
>>  - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI
>>    Change Swap to Dup named function/variable
>>    Check for only VBMI support (not VBMI2)
>>  - Update copyright year
>>  - Merge with mainline
>>  - Swap parameter operation with source
>>  - Remove wrong mask from evpsrlvw
>>  - Reverse ordering for vpermb and vpsrlvw instructions
>>  - Switch from vpshldvw to vpsrlvw
>>  - Fix whitespaces
>>  - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 876:
>
>> 874: __ evmovdquq(xmm22, Address(perms), Assembler::AVX_512bit);
>> 875:
>> 876: __ BIND(VBMILoop);
>
> Better to align loop starting address to OptoLoopAlignment

I will run the micro benchmark on AMD Turin and report back by early next week.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2678273572

From duke at openjdk.org Sat Jan 10 07:15:50 2026
From: duke at openjdk.org (Shawn M Emery)
Date: Sat, 10 Jan 2026 07:15:50 GMT
Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v4]
In-Reply-To:
References: <-zQuk2uNHFvht7KASCP7OUrLploN8tVFFAldVmkzcuo=.0249f1ba-e2c9-4b33-b438-4fb7f2edf4c6@github.com>
Message-ID: <28KSbLDo353fDhRsW-5aaLpYvQ9XGPnyOqO1YN_LnPs=.eaa3ddbf-9228-4fce-a973-dda88d5abf75@github.com>

On Sat, 10 Jan 2026 03:20:18 GMT, Jatin Bhateja wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 876:
>>
>>> 874: __ evmovdquq(xmm22, Address(perms), Assembler::AVX_512bit);
>>> 875:
>>> 876: __ BIND(VBMILoop);
>>
>> Better to align loop starting address to OptoLoopAlignment
>
> I will run the micro benchmark on AMD Turin and report back by early next week.

> Better to align loop starting address to OptoLoopAlignment

For parity, should I do this for the other labels in the file as well?
> I will run the micro benchmark on AMD Turin and report back by early next week.

That would be great, thank you for doing this!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2678401684

From duke at openjdk.org Sun Jan 11 06:59:20 2026
From: duke at openjdk.org (Shawn M Emery)
Date: Sun, 11 Jan 2026 06:59:20 GMT
Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v5]
In-Reply-To:
References:
Message-ID:

> This change allows use of the AVX512_VBMI instruction set to further optimize decompression/parsing of polynomial coefficients for ML-KEM. The speedup gained in the ML-KEM benchmarks for key generation is between 0.4 and 0.5%, encapsulation is 0.2 to 1.7%, and decapsulation is 0.3 to 2.0%.
>
> Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me.

Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision:

  Update to use OptoLoopAlignment for VBMILoop

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28815/files
  - new: https://git.openjdk.org/jdk/pull/28815/files/373b1339..f278a63f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28815&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28815&range=03-04

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/28815.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28815/head:pull/28815

PR: https://git.openjdk.org/jdk/pull/28815

From jbhateja at openjdk.org Sun Jan 11 09:55:14 2026
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sun, 11 Jan 2026 09:55:14 GMT
Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v4]
In-Reply-To: <28KSbLDo353fDhRsW-5aaLpYvQ9XGPnyOqO1YN_LnPs=.eaa3ddbf-9228-4fce-a973-dda88d5abf75@github.com>
References: <-zQuk2uNHFvht7KASCP7OUrLploN8tVFFAldVmkzcuo=.0249f1ba-e2c9-4b33-b438-4fb7f2edf4c6@github.com>
 <28KSbLDo353fDhRsW-5aaLpYvQ9XGPnyOqO1YN_LnPs=.eaa3ddbf-9228-4fce-a973-dda88d5abf75@github.com>
Message-ID:

On Sat, 10 Jan 2026 07:11:48 GMT, Shawn M Emery wrote:

>> I will run the micro benchmark on AMD Turin and report back by early next week.
>
>> Better to align loop starting address to OptoLoopAlignment
>
> For parity, should I do this for the other labels in the file as well?
>
>> I will run the micro benchmark on AMD Turin and report back by early next week.
>
> That would be great, thank you for doing this!

Just a note on LoopAlignment: there are multiple moving parts here. First, aligning the starting address of a loop to 64 bytes ([recommendation from Zen5 optimization guide](https://docs.amd.com/v/u/en-US/58455_1.00) section 2.8.3) ensures that small loop bodies are not split across cache lines. If that happens, there is a code-entry penalty, since for the first iteration of the loop the front end has to read multiple L1I cache lines. Once the loop is decoded and its uops are in the Op-cache (AMD) or DSB (Intel), the uop stream for successive loop iterations is emitted from the Op-cache. Since the Op-cache is shared between 2 HW threads in an SMT configuration, in noisy-neighbor scenarios or after context switches we may hit the code-entry penalty during the lifetime of the loop.

So it's advisable to add alignment in this case; for the other labels before loops, we already have OptoLoopAlignment in place.
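The cache-line arithmetic behind this note can be sketched in a few lines of C++. The helper names, the 0x1030 start address and the 48-byte loop body are hypothetical; 64-byte cache lines and a 64-byte alignment target are assumed, as in the note. In HotSpot stubs the padding itself is typically emitted by the macro assembler (for example `__ align(OptoLoopAlignment)` before the loop label) rather than computed like this.

```cpp
#include <cassert>
#include <cstdint>

// Padding (in bytes) needed to move `addr` up to the next multiple of
// `alignment`; `alignment` must be a power of two.
constexpr std::uint64_t padding_to_align(std::uint64_t addr, std::uint64_t alignment) {
  return (alignment - (addr & (alignment - 1))) & (alignment - 1);
}

// Number of `line_size`-byte cache lines touched by the code bytes
// [start, start + size).
constexpr std::uint64_t cache_lines_spanned(std::uint64_t start, std::uint64_t size,
                                            std::uint64_t line_size) {
  return size == 0 ? 0 : (start + size - 1) / line_size - start / line_size + 1;
}

// Hypothetical 48-byte loop body with 64-byte lines.
static_assert(cache_lines_spanned(0x1030, 48, 64) == 2, "unaligned: straddles two lines");
static_assert(padding_to_align(0x1030, 64) == 16, "16 bytes of padding reach 0x1040");
static_assert(cache_lines_spanned(0x1030 + 16, 48, 64) == 1, "aligned: fits in one line");
```

An unaligned 48-byte body at 0x1030 touches two lines; 16 bytes of padding move it to 0x1040, where it fits in a single line, so the front end only has to fetch one line for it on loop entry.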
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2679380724

From jbhateja at openjdk.org Sun Jan 11 09:55:15 2026
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sun, 11 Jan 2026 09:55:15 GMT
Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v4]
In-Reply-To:
References: <-zQuk2uNHFvht7KASCP7OUrLploN8tVFFAldVmkzcuo=.0249f1ba-e2c9-4b33-b438-4fb7f2edf4c6@github.com>
 <28KSbLDo353fDhRsW-5aaLpYvQ9XGPnyOqO1YN_LnPs=.eaa3ddbf-9228-4fce-a973-dda88d5abf75@github.com>
Message-ID:

On Sun, 11 Jan 2026 09:31:03 GMT, Jatin Bhateja wrote:

>>> Better to align loop starting address to OptoLoopAlignment
>>
>> For parity, should I do this for the other labels in the file as well?
>>
>>> I will run the micro benchmark on AMD Turin and report back by early next week.
>>
>> That would be great, thank you for doing this!
>
> Just a note on LoopAlignment: there are multiple moving parts here. First, aligning the starting address of a loop to 64 bytes ([recommendation from Zen5 optimization guide](https://docs.amd.com/v/u/en-US/58455_1.00) section 2.8.3) ensures that small loop bodies are not split across cache lines. If that happens, there is a code-entry penalty, since for the first iteration of the loop the front end has to read multiple L1I cache lines. Once the loop is decoded and its uops are in the Op-cache (AMD) or DSB (Intel), the uop stream for successive loop iterations is emitted from the Op-cache. Since the Op-cache is shared between 2 HW threads in an SMT configuration, in noisy-neighbor scenarios or after context switches we may hit the code-entry penalty during the lifetime of the loop.
>
> So it's advisable to add alignment in this case; for the other labels before loops, we already have OptoLoopAlignment in place.

> > Better to align loop starting address to OptoLoopAlignment
>
> For parity, should I do this for the other labels in the file as well?
>
> > I will run the micro benchmark on AMD Turin and report back by early next week.
>
> That would be great, thank you for doing this!

Here are the scores on Turin.

Baseline:
Benchmark                                    (algorithm)  (keyLength)  (provider)   Mode  Cnt      Score  Error  Units
KeyPairGeneratorBench.MLKEM.generateKeyPair   ML-KEM-512            0              thrpt    2  62235.790         ops/s
KeyPairGeneratorBench.MLKEM.generateKeyPair   ML-KEM-768            0              thrpt    2  38238.390         ops/s
KeyPairGeneratorBench.MLKEM.generateKeyPair  ML-KEM-1024            0              thrpt    2  24725.512         ops/s

Withopt:
Benchmark                                    (algorithm)  (keyLength)  (provider)   Mode  Cnt      Score  Error  Units
KeyPairGeneratorBench.MLKEM.generateKeyPair   ML-KEM-512            0              thrpt    2  62483.697         ops/s
KeyPairGeneratorBench.MLKEM.generateKeyPair   ML-KEM-768            0              thrpt    2  38464.272         ops/s
KeyPairGeneratorBench.MLKEM.generateKeyPair  ML-KEM-1024            0              thrpt    2  24702.044         ops/s

Baseline:
Benchmark             (algorithm)  (provider)   Mode  Cnt      Score  Error  Units
KEMBench.decapsulate   ML-KEM-512              thrpt    2  46416.479         ops/s
KEMBench.decapsulate   ML-KEM-768              thrpt    2  28516.289         ops/s
KEMBench.decapsulate  ML-KEM-1024              thrpt    2  19250.020         ops/s
KEMBench.encapsulate   ML-KEM-512              thrpt    2  60374.724         ops/s
KEMBench.encapsulate   ML-KEM-768              thrpt    2  36226.100         ops/s
KEMBench.encapsulate  ML-KEM-1024              thrpt    2  23656.223         ops/s

Withopt:
Benchmark             (algorithm)  (provider)   Mode  Cnt      Score  Error  Units
KEMBench.decapsulate   ML-KEM-512              thrpt    2  46730.153         ops/s
KEMBench.decapsulate   ML-KEM-768              thrpt    2  28650.349         ops/s
KEMBench.decapsulate  ML-KEM-1024              thrpt    2  19390.927         ops/s
KEMBench.encapsulate   ML-KEM-512              thrpt    2  60238.211         ops/s
KEMBench.encapsulate   ML-KEM-768              thrpt    2  36454.138         ops/s
KEMBench.encapsulate  ML-KEM-1024              thrpt    2  23649.839         ops/s

System was set at fixed frequency of 2.7 GHz during benchmarking.
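For readers following the VBMI discussion in this thread, the "insert a 0b0000 nibble after every third nibble" step appears to correspond to unpacking 12-bit values: every three input bytes hold two 12-bit coefficients (low nibble first), and widening each one to 16 bits adds a zero nibble. The C++ sketch below is a hypothetical scalar model, not the stub code. It assumes a lane layout in which a byte permute places bytes (b0, b1) in one 16-bit lane and (b1, b2) in the next, then shifts odd lanes right by 4 and masks every lane to 12 bits, and it checks that result against the direct formula. In this model the mask is a no-op on the shifted lanes, which matches the remark above that the `vpand` also (unnecessarily) masks the shifted elements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference decode: every 3 input bytes hold two 12-bit little-endian values.
// Widening each 12-bit value to 16 bits is the same as inserting a 0b0000
// nibble after every third nibble of the input stream.
std::vector<std::uint16_t> decode12_reference(const std::vector<std::uint8_t>& in) {
  std::vector<std::uint16_t> out;
  for (std::size_t i = 0; i + 2 < in.size(); i += 3) {
    out.push_back(static_cast<std::uint16_t>(in[i] | ((in[i + 1] & 0x0F) << 8)));
    out.push_back(static_cast<std::uint16_t>((in[i + 1] >> 4) | (in[i + 2] << 4)));
  }
  return out;
}

// Scalar emulation of a permute / variable-shift / mask sequence:
//  1. "permute": 16-bit lane 2k gets bytes (b0, b1), lane 2k+1 gets (b1, b2)
//  2. "shift":   odd lanes are shifted right by 4, even lanes by 0
//  3. "mask":    keep the low 12 bits of every lane
std::vector<std::uint16_t> decode12_permute_shift_mask(const std::vector<std::uint8_t>& in) {
  const std::array<int, 2> shift = {0, 4};
  std::vector<std::uint16_t> out;
  for (std::size_t i = 0; i + 2 < in.size(); i += 3) {
    const std::array<std::uint16_t, 2> lanes = {
        static_cast<std::uint16_t>(in[i] | (in[i + 1] << 8)),       // bytes b0, b1
        static_cast<std::uint16_t>(in[i + 1] | (in[i + 2] << 8))};  // bytes b1, b2
    for (std::size_t lane = 0; lane < 2; lane++) {
      out.push_back(static_cast<std::uint16_t>((lanes[lane] >> shift[lane]) & 0x0FFF));
    }
  }
  return out;
}
```

For the input bytes `0x21, 0x43, 0x65` (nibbles 1 to 6, low nibble first) both functions produce `0x0321, 0x0654`, i.e. nibbles `1 2 3 0 4 5 6 0`.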
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2679382599

From qamai at openjdk.org Sun Jan 11 12:29:25 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 11 Jan 2026 12:29:25 GMT Subject: RFR: 8374969: Incorrect results of LoadStoreNode::adr_type and SCMemProj::adr_type Message-ID: 

Hi,

This is extracted from #28570.

This PR fixes the return value of `LoadStoreNode::adr_type` and `SCMemProj::adr_type`. For the former, we can simply do what `MemNode` does. For the latter, the implementation of `ProjNode::adr_type` is adequate.

Please kindly review, thanks a lot.

-------------

Commit messages:
 - Fix LoadStoreNode::adr_type and SCMemProj::adr_type

Changes: https://git.openjdk.org/jdk/pull/29154/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29154&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8374969
Stats: 17 lines in 2 files changed: 9 ins; 7 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/29154.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/29154/head:pull/29154

PR: https://git.openjdk.org/jdk/pull/29154

From qamai at openjdk.org Mon Jan 12 03:39:01 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 12 Jan 2026 03:39:01 GMT Subject: RFR: 8347365: C2: Fix the handling of depends_only_on_test Message-ID: 

Hi,

This PR fixes the handling of `depends_only_on_test` when the control graph is transformed. It concerns the theoretical meaning of `depends_only_on_test`; the following is copied from the JBS issue description:

To start with, what is `depends_only_on_test`? Given a node `n` with the control input `c`, if `c` can be deduced from `c'` and `n->depends_only_on_test() == true`, then we can rewire the control input of `n` to `c'`. In other words, `depends_only_on_test` does not mean that the node depends on some test; it means that the node depends on the particular test that is its control input.
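The rewiring rule above can be mirrored at the Java source level using the two cases this description goes on to discuss. This is only an analogy: C2 rewires the control input of an IR node rather than moving Java statements, and the class, its methods and the `NOT_DIVIDED` sentinel are hypothetical. Moving the division to a test that implies its own guard keeps the behaviour, while moving it to an unrelated test lets it execute before `y != 0` has been checked.

```java
// Source-level analogy only; C2 reasons about IR control inputs, not Java code.
public final class DependsOnlyOnTestDemo {
    // Hypothetical sentinel meaning "the division was not executed".
    static final int NOT_DIVIDED = Integer.MIN_VALUE;

    // First case: the division's control input is the inner `y != 0`,
    // which is implied by the outer `y != 0`.
    static int firstOriginal(int x, int y) {
        if (y != 0) {
            if (x > 0) {
                if (y != 0) {
                    return x / y;
                }
            }
        }
        return NOT_DIVIDED;
    }

    // The division rewired to the outer `y != 0`: still safe.
    static int firstRewired(int x, int y) {
        if (y != 0) {
            int q = x / y;
            if (x > 0) {
                return q;
            }
        }
        return NOT_DIVIDED;
    }

    // Second case: the division's control input is the inner `x > 0`, an
    // unrelated test; the test it really needs, `y != 0`, sits in between.
    static int secondOriginal(int x, int y) {
        if (x > 0) {
            if (y != 0) {
                if (x > 0) {
                    return x / y;
                }
            }
        }
        return NOT_DIVIDED;
    }

    // Rewiring to the outer `x > 0` lets the division float above `y != 0`.
    static int secondRewired(int x, int y) {
        if (x > 0) {
            int q = x / y; // throws ArithmeticException when y == 0
            if (y != 0) {
                return q;
            }
        }
        return NOT_DIVIDED;
    }
}
```

The first pair agrees on every input, while `secondRewired(5, 0)` throws where `secondOriginal(5, 0)` simply skips the division.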
For example:

    if (y != 0) {
        if (x > 0) {
            if (y != 0) {
                x / y;
            }
        }
    }

Here `x/y` is `depends_only_on_test`, because its control input is the test `y != 0`. We can therefore rewire the control input of the division to the outer `y != 0`, resulting in:

    if (y != 0) {
        x / y;
        if (x > 0) {
        }
    }

On the other hand, consider this case:

    if (x > 0) {
        if (y != 0) {
            if (x > 0) {
                x / y;
            }
        }
    }

Here `x/y` is not `depends_only_on_test`, because its control input is the unrelated test `x > 0`: if we rewired the division to the outer `x > 0` test, the division would float above the test it actually needs, `y != 0`. This means that `depends_only_on_test` is a dynamic property of a node, not a static property of the division operation. It can change when we transform the graph, and it can differ between nodes of the same kind.

More details can be found in the description of `Node::depends_only_on_test` and `Node::pin_node_under_control` in this change.

Please take a look and leave your reviews, thanks a lot.

-------------

Commit messages:
 - Fix depends_only_on_test

Changes: https://git.openjdk.org/jdk/pull/29158/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29158&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8347365
Stats: 418 lines in 22 files changed: 201 ins; 103 del; 114 mod
Patch: https://git.openjdk.org/jdk/pull/29158.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/29158/head:pull/29158

PR: https://git.openjdk.org/jdk/pull/29158

From qamai at openjdk.org Mon Jan 12 03:49:34 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 12 Jan 2026 03:49:34 GMT Subject: RFR: 8347365: C2: Fix the handling of depends_only_on_test In-Reply-To: References: Message-ID: <0LnGTky0Ll7EHXLGY2nLuwaPV1pQKkMwD9NGUCCP8u4=.8397aef9-d2dd-410d-9b19-0e4d6b23004d@github.com>

On Mon, 12 Jan 2026 03:30:35 GMT, Quan Anh Mai wrote:

> Hi,
>
> This PR fixes the handling of `depends_only_on_test` when the control graph is transformed.
test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java line 89:

> 87:     @Test
> 88:     @IR(phase = CompilePhase.ITER_GVN1, counts = { IRNode.CAST_II, "4" })
> 89:     @IR(phase = CompilePhase.OPTIMIZE_FINISHED, counts = { IRNode.CAST_II, "3" })

This test needs changing because of range check smearing. Just after loop opts, the graph looks like this:

    checkIndex(i - 3, length);
    j = cast(i, [3, length + 2]) - 3;
    checkIndex(i, length);
    j += cast(i, [0, length - 1]);
    checkIndex(i - 2, length);
    j += cast(i, [2, length + 1]) - 2;
    checkIndex(i - 1, length);
    j += cast(i, [1, length]) - 1;

Range check smearing removes the last 2 range checks, which moves their dependent casts to the range check `checkIndex(i, length)`. However, since they now depend on both the first and the second range checks, they are pinned and become non-floating. Furthermore, after loop opts, the `CastII`s are widened to `[0, max_int]`. As a result, the OPTIMIZE_FINISHED graph has 3 `CastII`s: 1 floating non-narrowing cast under `checkIndex(i - 3, length)`, 1 floating non-narrowing cast under `checkIndex(i, length)`, and 1 non-floating non-narrowing cast under `checkIndex(i, length)`.

Maybe it's possible to optimize even further by canonicalizing the `_dependency` of all `CastII`s to the strongest one later during compilation.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29158#discussion_r2680775488

From qamai at openjdk.org Mon Jan 12 03:49:34 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 12 Jan 2026 03:49:34 GMT Subject: RFR: 8347365: C2: Fix the handling of depends_only_on_test In-Reply-To: <0LnGTky0Ll7EHXLGY2nLuwaPV1pQKkMwD9NGUCCP8u4=.8397aef9-d2dd-410d-9b19-0e4d6b23004d@github.com> References: <0LnGTky0Ll7EHXLGY2nLuwaPV1pQKkMwD9NGUCCP8u4=.8397aef9-d2dd-410d-9b19-0e4d6b23004d@github.com> Message-ID: 

On Mon, 12 Jan 2026 03:44:19 GMT, Quan Anh Mai wrote:

>> Hi,
>>
>> This PR fixes the handling of `depends_only_on_test` when the control graph is transformed.
>
> test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java line 89:
>
>> 87:     @Test
>> 88:     @IR(phase = CompilePhase.ITER_GVN1, counts = { IRNode.CAST_II, "4" })
>> 89:     @IR(phase = CompilePhase.OPTIMIZE_FINISHED, counts = { IRNode.CAST_II, "3" })
>
> This test needs changing because of range check smearing. Just after loop opts, the graph looks like this:
>
>     checkIndex(i - 3, length);
>     j = cast(i, [3, length + 2]) - 3;
>     checkIndex(i, length);
>     j += cast(i, [0, length - 1]);
>     checkIndex(i - 2, length);
>     j += cast(i, [2, length + 1]) - 2;
>     checkIndex(i - 1, length);
>     j += cast(i, [1, length]) - 1;
>
> Range check smearing removes the last 2 range checks, which moves their dependent casts to the range check `checkIndex(i, length)`. However, since they now depend on both the first and the second range checks, they are pinned and become non-floating. Furthermore, after loop opts, the `CastII`s are widened to `[0, max_int]`. As a result, the OPTIMIZE_FINISHED graph has 3 `CastII`s: 1 floating non-narrowing cast under `checkIndex(i - 3, length)`, 1 floating non-narrowing cast under `checkIndex(i, length)`, and 1 non-floating non-narrowing cast under `checkIndex(i, length)`.
>
> Maybe it's possible to optimize even further by canonicalizing the `_dependency` of all `CastII`s to the strongest one later during compilation.

Before this PR, the casts under the elided range checks were not pinned during range check smearing, which is incorrect. As a result, both `CastII`s under `checkIndex(i, length)` were identical and could be GVN-ed.
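The claim that the two surviving range checks cover the removed ones, and that the moved casts then rely on both of them, can be checked with a source-level model. This is a sketch, not C2 code: `Objects.checkIndex` stands in for a range check, its returned index stands in for the `CastII`, and `SmearingDemo`, `fourChecks`, `smeared` and `run` are hypothetical names.

```java
import java.util.Objects;
import java.util.function.IntBinaryOperator;

// Source-level model of the quoted graph, not C2 code.
public final class SmearingDemo {
    // The four checks, in the order shown in the review comment.
    static int fourChecks(int i, int length) {
        int j = Objects.checkIndex(i - 3, length);
        j += Objects.checkIndex(i, length);
        j += Objects.checkIndex(i - 2, length);
        j += Objects.checkIndex(i - 1, length);
        return j;
    }

    // After smearing: only the checks with the smallest and largest offsets remain.
    static int smeared(int i, int length) {
        Objects.checkIndex(i - 3, length); // establishes i >= 3
        Objects.checkIndex(i, length);     // establishes i < length
        // i - 2 and i - 1 are in bounds only because BOTH checks above passed:
        // the lower bound comes from the first, the upper bound from the second.
        return (i - 3) + i + (i - 2) + (i - 1);
    }

    // Returns the result, or null if a range check failed.
    static Integer run(IntBinaryOperator f, int i, int length) {
        try {
            return f.applyAsInt(i, length);
        } catch (IndexOutOfBoundsException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        for (int length = 0; length <= 10; length++) {
            for (int i = -5; i <= 15; i++) {
                Integer a = run(SmearingDemo::fourChecks, i, length);
                Integer b = run(SmearingDemo::smeared, i, length);
                if (!Objects.equals(a, b)) {
                    throw new AssertionError("mismatch at i=" + i + ", length=" + length);
                }
            }
        }
        System.out.println("equivalent on the sampled range");
    }
}
```

The comment in `smeared` is the crux: a value with the range of `i - 2` needs the lower bound from one check and the upper bound from the other, so it can no longer depend on a single test, which is why the moved casts get pinned.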
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29158#discussion_r2680777278

From xgong at openjdk.org Mon Jan 12 05:40:34 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 12 Jan 2026 05:40:34 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector mask relative code in c2 In-Reply-To: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> References: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> Message-ID: 

On Fri, 9 Jan 2026 11:00:42 GMT, Emanuel Peter wrote:

>> The VectorMask implementation in Vector API involves complex interactions between types, nodes, and platform-specific
>> features, making the related code in HotSpot difficult to understand and review.
>>
>> This patch adds comprehensive comments for vector mask related types, nodes, and methods in C2 to improve code clarity and
>> maintainability.
>>
>> Note: This patch only adds comments; no functional changes are made.
>
> src/hotspot/share/opto/vectornode.hpp line 81:
>
>> 79: // the masked nodes.
>> 80: //
>> 81: // For example, "AddVBNode" might have two versions:
>
> Might?
> Suggestion:
>
> // For example:

Yes. It has two versions on architectures with a predicate feature, such as SVE, while the masked version does not exist on others, such as NEON.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2680925193

From xgong at openjdk.org Mon Jan 12 05:50:35 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 12 Jan 2026 05:50:35 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector mask relative code in c2 In-Reply-To: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> References: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> Message-ID: 

On Fri, 9 Jan 2026 11:03:47 GMT, Emanuel Peter wrote:

>> The VectorMask implementation in Vector API involves complex interactions between types, nodes, and platform-specific
>> features, making the related code in HotSpot difficult to understand and review.
>>
>> This patch adds comprehensive comments for vector mask related types, nodes, and methods in C2 to improve code clarity and
>> maintainability.
>>
>> Note: This patch only adds comments; no functional changes are made.
>
> src/hotspot/share/opto/type.cpp line 2452:
>
>> 2450: // stored in a predicate/mask register.
>> 2451: // - Returns a normal vector type (i.e. TypeVectA ~ TypeVectZ) otherwise, where
>> 2452: // the vector mask is stored in a vector register.
>
> The first case is `PVectMask`, and the second `NVectMask`, right?

Yes, correct.

> src/hotspot/share/opto/vectornode.hpp line 1478:
>
>> 1476: };
>> 1477:
>> 1478: //-------------------------- Vector mask broadcast ------------------------------
>
> A nit that has been bothering me for a while:
>
> I would just remove all the "title lines" with the `------`.
> They don't really add anything. And they are currently inconsistent in this file anyway.
> Some are more of a description, like here. Some repeat the node name. And in some cases they are missing anyway.

OK, I will remove these lines. Thanks for your suggestion!
> src/hotspot/share/opto/vectornode.hpp line 1739:
>
>> 1737: VectorRearrangeNode(Node* vec1, Node* shuffle)
>> 1738: : VectorNode(vec1, shuffle, vec1->bottom_type()->is_vect()) {
>> 1739: // assert(mask->is_VectorMask(), "VectorBlendNode requires that third argument be a mask");
>
> Can you add a comment for Rearrange as well?

Sure. I will add one in the next commit.

> src/hotspot/share/opto/vectornode.hpp line 1874:
>
>> 1872: // Convert a "BVectMask" into a platform-specific vector mask (either "NVectMask"
>> 1873: // or "PVectMask").
>> 1874: class VectorLoadMaskNode : public VectorNode {
>
> I'd love to rename this. Because it is (as you say in the comments) a conversion, and not a "load" (memory op).
> What about `VectorConvertBooleans2MaskNode`.
>
> And below, rename `VectorStoreMaskNode` to `VectorConvertMask2BooleansNode`.
>
> You may have an even better idea.

Yeah, I remember that @PaulSandoz suggested a name for this before, and I will consider it. However, renaming these two IR nodes is not an easy task: we would need to go through all the code in the mid-end and in the backends of all platforms that support the Vector API. I'd like to leave it as a task separate from this PR. WDYT?
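The three mask representations discussed in this review can be modelled with plain Java arrays to show what the conversion nodes do. This is a hypothetical scalar model: the class and method names are invented, the all-ones/all-zeros lane encoding for "NVectMask" and the one-bit-per-lane encoding for "PVectMask" are assumptions about typical hardware, and the real nodes operate on registers rather than arrays.

```java
// Hypothetical scalar model of "BVectMask", "NVectMask" and "PVectMask";
// not HotSpot code.
public final class MaskModel {
    // BVectMask -> NVectMask: each 0/1 byte becomes an all-zeros/all-ones lane,
    // roughly the conversion VectorLoadMaskNode performs on targets without
    // predicate registers (lane encoding assumed).
    static int[] toNVectMask(byte[] booleans) {
        int[] lanes = new int[booleans.length];
        for (int k = 0; k < booleans.length; k++) {
            lanes[k] = -booleans[k]; // 1 -> -1 (0xFFFFFFFF), 0 -> 0
        }
        return lanes;
    }

    // BVectMask -> PVectMask: one bit per lane, as in a predicate register.
    static long toPVectMask(byte[] booleans) {
        if (booleans.length > Long.SIZE) {
            throw new IllegalArgumentException("model supports at most 64 lanes");
        }
        long bits = 0L;
        for (int k = 0; k < booleans.length; k++) {
            if (booleans[k] != 0) {
                bits |= 1L << k;
            }
        }
        return bits;
    }

    // NVectMask -> BVectMask, the direction handled by VectorStoreMaskNode.
    static byte[] toBooleans(int[] lanes) {
        byte[] out = new byte[lanes.length];
        for (int k = 0; k < lanes.length; k++) {
            out[k] = (byte) (lanes[k] & 1);
        }
        return out;
    }

    // Without a predicated instruction, a masked result can be formed with a
    // blend: take b where the mask lane is set and a elsewhere.
    static int[] blend(int[] a, int[] b, int[] nmask) {
        int[] out = new int[a.length];
        for (int k = 0; k < a.length; k++) {
            out[k] = (a[k] & ~nmask[k]) | (b[k] & nmask[k]);
        }
        return out;
    }
}
```

For the booleans `{1, 0, 1, 1}`, the vector-register form is `{-1, 0, -1, -1}` and the predicate form is the bit pattern `0b1101`.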
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2680926486
PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2680927789
PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2680928497
PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2680943534

From xgong at openjdk.org Mon Jan 12 05:50:36 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 12 Jan 2026 05:50:36 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector mask relative code in c2 In-Reply-To: References: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> Message-ID: 

On Fri, 9 Jan 2026 13:27:35 GMT, Emanuel Peter wrote:

>> src/hotspot/share/opto/vectornode.hpp line 1855:
>>
>>> 1853: //------------------------------VectorLoadShuffleNode------------------------------
>>> 1854: // The target may not directly support the rearrange operation for an element type.
>>> 1855: // In those cases, we can transform the rearrange into a different element type.
>>
>> Can you specify a bit more about the inputs and outputs, and what exactly the transformation does?
>
> Is this a memory instruction? Because it is called `Load`. If not: can we rename it?

It's not a memory instruction. Like `VectorLoadMask`, it converts a shuffle vector into a platform-supported vector. Previously, the shuffle was always a byte vector, and this node converted that byte vector into a vector of the actual element type. It is no longer needed on some architectures, since the vector shuffle's payload has been changed to the integral type with the same size as the normal vector's elements, i.e. byte, short, int, or long.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2680934137

From aboldtch at openjdk.org Mon Jan 12 06:34:29 2026 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 12 Jan 2026 06:34:29 GMT Subject: RFR: 8374450: GTest opto.canonicalize_constraints cannot run without VM Message-ID: <1ApOTqaq-f4GnIBX4NkCQAw9x8SOozI0Y0EouKZ23xM=.57c1b9c0-70c6-42d0-b568-d6e45bdb845a@github.com>

The `opto.canonicalize_constraints` test explicitly uses symbols that are set up in `Type::Initialize_shared` (which seems to happen as a side effect of generating stubs at VM start; also see the comment in `Type::Initialize_shared`). So the test must be a VM test.

* Testing
  * GHA
  * Verified that `opto.canonicalize_constraints` no longer crashes with a segmentation fault when run in isolation.
  * Tier 1 on Oracle supported platforms

-------------

Commit messages:
 - 8374450: GTest opto.canonicalize_constraints cannot run without VM

Changes: https://git.openjdk.org/jdk/pull/29159/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29159&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8374450
Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/29159.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/29159/head:pull/29159

PR: https://git.openjdk.org/jdk/pull/29159

From qamai at openjdk.org Mon Jan 12 06:53:21 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 12 Jan 2026 06:53:21 GMT Subject: RFR: 8374450: GTest opto.canonicalize_constraints cannot run without VM In-Reply-To: <1ApOTqaq-f4GnIBX4NkCQAw9x8SOozI0Y0EouKZ23xM=.57c1b9c0-70c6-42d0-b568-d6e45bdb845a@github.com> References: <1ApOTqaq-f4GnIBX4NkCQAw9x8SOozI0Y0EouKZ23xM=.57c1b9c0-70c6-42d0-b568-d6e45bdb845a@github.com> Message-ID: 

On Mon, 12 Jan 2026 06:28:19 GMT, Axel Boldt-Christmas wrote:

> The `opto.canonicalize_constraints` test explicitly uses symbols that are set up in `Type::Initialize_shared` (which seems to happen as a side effect of generating stubs at VM start; also see the comment in `Type::Initialize_shared`). So the test must be a VM test.
>
> * Testing
>   * GHA
>   * Verified that `opto.canonicalize_constraints` no longer crashes with a segmentation fault when run in isolation.
>   * Tier 1 on Oracle supported platforms

Thanks a lot for fixing this, it looks good to me.

-------------

Marked as reviewed by qamai (Committer).

PR Review: https://git.openjdk.org/jdk/pull/29159#pullrequestreview-3649359405

From xgong at openjdk.org Mon Jan 12 06:56:37 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 12 Jan 2026 06:56:37 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector mask relative code in c2 In-Reply-To: References: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> Message-ID: 

On Mon, 12 Jan 2026 05:42:29 GMT, Xiaohong Gong wrote:

>> Is this a memory instruction? Because it is called `Load`. If not: can we rename it?
>
> It's not a memory instruction. Like `VectorLoadMask`, it converts a shuffle vector into a platform-supported vector. Previously, the shuffle was always a byte vector for all vector types, and this node converted that byte vector into the vector needed by the rearrange instructions. It is no longer needed on some architectures, since the vector shuffle's payload has been changed to the integral type with the same size as the normal vector's elements, i.e. byte, short, int, or long. I searched the code and found that it is now used only by x86, on specific architectures and for specific types.

> Can you specify a bit more about the inputs and outputs, and what exactly the transformation does?

The existing comments already explain why this node is needed on some architectures. The input is the original shuffle vector from the API, with the specified element type, while the output is a shuffle vector transformed so that the vector rearrange can use it directly, depending on the backend's code generation requirements?
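The transformation described in the quoted node comment ("we can transform the rearrange into a different element type") can be pictured by expressing an int-lane shuffle as an equivalent byte-lane shuffle. This is a scalar sketch under that assumption only: `ShuffleModel` and its methods are hypothetical, a little-endian byte view is assumed, and the actual node may perform a different conversion depending on the backend.

```java
import java.util.Arrays;

// Hypothetical scalar model; not VectorLoadShuffleNode itself.
public final class ShuffleModel {
    // Reference semantics on int lanes: out[k] = src[indices[k]].
    static int[] rearrangeInts(int[] src, int[] indices) {
        int[] out = new int[indices.length];
        for (int k = 0; k < indices.length; k++) {
            out[k] = src[indices[k]];
        }
        return out;
    }

    // The same operation on byte lanes.
    static byte[] rearrangeBytes(byte[] src, int[] indices) {
        byte[] out = new byte[indices.length];
        for (int k = 0; k < indices.length; k++) {
            out[k] = src[indices[k]];
        }
        return out;
    }

    // Rewrite an int-lane shuffle as a byte-lane shuffle: int lane k reading
    // source lane s becomes bytes 4k..4k+3 reading bytes 4s..4s+3.
    static int[] toByteShuffle(int[] intIndices) {
        int[] byteIndices = new int[intIndices.length * Integer.BYTES];
        for (int k = 0; k < intIndices.length; k++) {
            for (int b = 0; b < Integer.BYTES; b++) {
                byteIndices[k * Integer.BYTES + b] = intIndices[k] * Integer.BYTES + b;
            }
        }
        return byteIndices;
    }

    // Little-endian byte view of an int vector.
    static byte[] toBytes(int[] v) {
        byte[] out = new byte[v.length * Integer.BYTES];
        for (int k = 0; k < v.length; k++) {
            for (int b = 0; b < Integer.BYTES; b++) {
                out[k * Integer.BYTES + b] = (byte) (v[k] >>> (8 * b));
            }
        }
        return out;
    }

    // Reassemble int lanes from the little-endian byte view.
    static int[] toInts(byte[] bytes) {
        int[] out = new int[bytes.length / Integer.BYTES];
        for (int k = 0; k < out.length; k++) {
            int v = 0;
            for (int b = 0; b < Integer.BYTES; b++) {
                v |= (bytes[k * Integer.BYTES + b] & 0xFF) << (8 * b);
            }
            out[k] = v;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] src = {10, -20, 300000, 7};
        int[] shuffle = {3, 0, 0, 2};
        int[] direct = rearrangeInts(src, shuffle);
        int[] viaBytes = toInts(rearrangeBytes(toBytes(src), toByteShuffle(shuffle)));
        System.out.println(Arrays.toString(direct) + " " + Arrays.toString(viaBytes));
    }
}
```

Both paths produce `[7, 10, 10, 300000]` for the sample in `main`, which is the property a backend relies on when it implements an int rearrange with a byte shuffle.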
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2681054525

From epeter at openjdk.org Mon Jan 12 06:58:33 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 Jan 2026 06:58:33 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v8] In-Reply-To: References: Message-ID: 

On Fri, 9 Jan 2026 16:14:17 GMT, Galder Zamarreño wrote:

>> Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably.
>>
>> Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added an `is_MinMax` node query to simplify the fix.
>>
>> I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`.
>>
>> To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977).
>>
>> During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future.
>>
>> I've run tier1-3 tests on linux/x64 successfully.
>
> Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Update test/hotspot/jtreg/compiler/igvn/TestMinMaxIdentity.java
>
>    Co-authored-by: Emanuel Peter
>  - Fix style

Looks good to me now, thanks for the updates :)

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/28895#pullrequestreview-3649370155

From epeter at openjdk.org Mon Jan 12 07:02:12 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 Jan 2026 07:02:12 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v6] In-Reply-To: References: Message-ID: 

On Thu, 8 Jan 2026 15:42:42 GMT, Benoît Maillard wrote:

>> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:
>>
>>  - Merge branch 'master' into topic.uses-min-max
>>  - Module also needed in the wrapper test class
>>  - It's the templated test that needs the module
>>  - Add missing module to test
>>  - Merge branch 'master' into topic.uses-min-max
>>  - Test Float16
>>  - Only apply to uses that match original IR node
>>  - Merge branch 'master' into topic.uses-min-max
>>  - Use is_MinMax() instead of spelling out individual Min/Max opcodes
>>  - Refactor MaxNode to MinMaxNode and add is_MinMax() query
>>  - ... and 3 more: https://git.openjdk.org/jdk/compare/067fd3cb...7229e345
>
> Looks good to me, thanks for making this change. I like the move from `MaxNode` to `MinMaxNode`, I always thought the naming was a bit confusing.
>
> I have also kicked off internal testing and will come back with the results once done.

The testing that @benoitmaillard ran does not seem to show any related failures. I think the recent changes were not substantial enough to warrant another round of internal testing; GitHub Actions is probably enough.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28895#issuecomment-3737139291 From epeter at openjdk.org Mon Jan 12 07:21:09 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 Jan 2026 07:21:09 GMT Subject: [jdk26] RFR: 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs In-Reply-To: References: <-lUfH47pEisQOr4Qi038k3ErKZp2MQFD_YgGjp6_Oj4=.21b442c3-51b1-4d94-a361-98c40c74cf99@github.com> Message-ID: On Fri, 9 Jan 2026 08:39:31 GMT, Tobias Hartmann wrote: >> Hi all, >> >> This pull request contains a backport of commit [da14813a](https://github.com/openjdk/jdk/commit/da14813a5bdadaf0a1f81fa57ff6e1b103eaf113) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Emanuel Peter on 7 Jan 2026 and was reviewed by Vladimir Kozlov, Tobias Hartmann and Quan Anh Mai. >> >> Thanks! > > Looks good. @TobiHartmann Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29123#issuecomment-3737174273 From epeter at openjdk.org Mon Jan 12 07:21:11 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 Jan 2026 07:21:11 GMT Subject: [jdk26] Integrated: 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs In-Reply-To: <-lUfH47pEisQOr4Qi038k3ErKZp2MQFD_YgGjp6_Oj4=.21b442c3-51b1-4d94-a361-98c40c74cf99@github.com> References: <-lUfH47pEisQOr4Qi038k3ErKZp2MQFD_YgGjp6_Oj4=.21b442c3-51b1-4d94-a361-98c40c74cf99@github.com> Message-ID: On Thu, 8 Jan 2026 17:51:09 GMT, Emanuel Peter wrote: > Hi all, > > This pull request contains a backport of commit [da14813a](https://github.com/openjdk/jdk/commit/da14813a5bdadaf0a1f81fa57ff6e1b103eaf113) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Emanuel Peter on 7 Jan 2026 and was reviewed by Vladimir Kozlov, Tobias Hartmann and Quan Anh Mai. > > Thanks! 
This pull request has now been integrated.

Changeset: 2d267303
Author: Emanuel Peter
URL: https://git.openjdk.org/jdk/commit/2d267303de943982b6b8a5b1b55df7bb623d8a81
Stats: 116 lines in 3 files changed: 109 ins; 0 del; 7 mod

8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs

Reviewed-by: thartmann
Backport-of: da14813a5bdadaf0a1f81fa57ff6e1b103eaf113

-------------

PR: https://git.openjdk.org/jdk/pull/29123

From duke at openjdk.org Mon Jan 12 07:26:11 2026 From: duke at openjdk.org (Shawn M Emery) Date: Mon, 12 Jan 2026 07:26:11 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v4] In-Reply-To: References: <-zQuk2uNHFvht7KASCP7OUrLploN8tVFFAldVmkzcuo=.0249f1ba-e2c9-4b33-b438-4fb7f2edf4c6@github.com> <28KSbLDo353fDhRsW-5aaLpYvQ9XGPnyOqO1YN_LnPs=.eaa3ddbf-9228-4fce-a973-dda88d5abf75@github.com> Message-ID: 

On Sun, 11 Jan 2026 09:33:43 GMT, Jatin Bhateja wrote:

>> Just a note on loop alignment: there are multiple moving parts here. First, aligning the starting address of a loop to 64 bytes ([recommendation from Zen5 optimization guide](https://docs.amd.com/v/u/en-US/58455_1.00) section 2.8.3) ensures that small loop bodies are not split across cache lines. If that happens, there is a cold-entry penalty in the first iteration of the loop, where the front end has to read multiple L1I cache lines. Once the loop is decoded and its uops are in the Op-cache (AMD) or DSB (Intel), the uop stream for successive loop iterations is issued from the op-cache from then on. Since the op-cache is shared between the 2 hardware threads in an SMT configuration, noisy-neighbor scenarios or context switches may cause us to hit the cold-entry penalty again during the lifetime of the loop.
>>
>> So it's advisable to add alignment for the other labels before loops in this case; we already have OptoLoopAlignment in place.
>
>> > Better to align the loop starting address to OptoLoopAlignment.
>>
>> For parity, should I do this for the other labels in the file as well?
>>
>> > I will run the micro benchmark on AMD Turin and report back by early next week.
>>
>> That would be great, thank you for doing this!
>
> Here are the scores on Turin.

Thank you for sharing these results. It is disconcerting to see the drop in performance for i) key generation with ML-KEM-1024, ii) encapsulation with ML-KEM-512, and iii) encapsulation with ML-KEM-1024, though I don't know the standard error (SE) for these runs.
During my testing on a AMD EPYC 9J14 96-Core Processor I consistently get noticeable performance increases for all ML-KEM operations: [Publish ML_KEM Benchmarks - Sheet1.pdf](https://github.com/user-attachments/files/24559070/Publish.ML_KEM.Benchmarks.-.Sheet1.pdf) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2681114748 From mhaessig at openjdk.org Mon Jan 12 07:27:08 2026 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 12 Jan 2026 07:27:08 GMT Subject: RFR: 8374969: Incorrect results of LoadStoreNode::adr_type and SCMemProj::adr_type In-Reply-To: References: Message-ID: On Sun, 11 Jan 2026 12:22:16 GMT, Quan Anh Mai wrote: > Hi, > > This is extracted from #28570 > > This PR fixes the return value of `LoadStoreNode::adr_type` and `SCMemProj::adr_type`. For the former, it is trivial that we can do what `MemNode` does. And for the latter, the implementation of `ProjNode::adr_type` is adequate. > > Please kindly review, thanks a lot. src/hotspot/share/opto/memnode.cpp line 3923: > 3921: init_req(MemNode::ValueIn, val); > 3922: init_class_id(Class_LoadStore); > 3923: DEBUG_ONLY(_adr_type = at; adr_type();) Why make this debug only? AFAICT lots of non-debug code uses `adr_type()`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29154#discussion_r2681114370 From thartmann at openjdk.org Mon Jan 12 07:32:46 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 12 Jan 2026 07:32:46 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v8] In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 16:14:17 GMT, Galder Zamarre?o wrote: >> Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. 
I need this fix to test the min/max reassociation implementation with IR tests reliably. >> >> Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added an `is_MinMax` node query to simplify the fix. >> >> I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. >> >> To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). >> >> During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. >> >> I've run tier1-3 tests on linux/x64 successfully. > > Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/hotspot/jtreg/compiler/igvn/TestMinMaxIdentity.java > > Co-authored-by: Emanuel Peter > - Fix style Do we risk hitting [JDK-8374896](https://bugs.openjdk.org/browse/JDK-8374896) now that the verification will be enabled by this PR?
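The enqueueing rule in 8373134 can be pictured with a toy model. The following is a minimal sketch, assuming a made-up, heavily simplified node type (`Node`, `Op`, `push_unique` and `add_users_to_worklist` are hypothetical names, not C2's classes and not the actual patch). When a node changes, its direct users are revisited as usual. For any user that is itself a Min/Max, that user's Min/Max users are revisited too, since a shape like `Max(a, Max(a, b))` may only become foldable once the inner node has been processed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified node model; not C2's Node class.
enum class Op { MinI, MaxI, AddI, ConI };

struct Node {
  Op op;
  std::vector<Node*> outs;  // the users of this node
  bool is_MinMax() const { return op == Op::MinI || op == Op::MaxI; }
};

// Append n to the worklist unless it is already queued.
static void push_unique(std::vector<Node*>& worklist, Node* n) {
  if (std::find(worklist.begin(), worklist.end(), n) == worklist.end()) {
    worklist.push_back(n);
  }
}

// After `changed` has been transformed, revisit its direct users and, for
// every user that is a Min/Max, also revisit that user's Min/Max users.
void add_users_to_worklist(const Node* changed, std::vector<Node*>& worklist) {
  assert(changed != nullptr);
  for (Node* use : changed->outs) {
    push_unique(worklist, use);
    if (!use->is_MinMax()) {
      continue;
    }
    for (Node* use_of_use : use->outs) {
      if (use_of_use->is_MinMax()) {
        push_unique(worklist, use_of_use);
      }
    }
  }
}
```

For a changed node whose only user is a `MaxI` that in turn feeds another `MaxI` and an `AddI`, this sketch queues both Max nodes but not the `AddI`. The second-level `AddI` is skipped because it is not a Min/Max.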
------------- PR Comment: https://git.openjdk.org/jdk/pull/28895#issuecomment-3737207951 From mchevalier at openjdk.org Mon Jan 12 07:50:55 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 12 Jan 2026 07:50:55 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order In-Reply-To: References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> <47G92GiYmF8jY_vJ2hUNaQQoGBLqpkg_pYJT2r4jn9g=.94cb4ca9-8dfd-4a31-b77b-dbcc5b8f29d0@github.com> Message-ID: On Fri, 9 Jan 2026 20:47:24 GMT, Vladimir Ivanov wrote: >> Actually, I'm not sure. I see that >> https://github.com/openjdk/jdk/blob/8737a8ca73952d60129e7fc2f7e17eea3b800af7/src/hotspot/share/opto/compile.hpp#L480 >> and >> https://github.com/openjdk/jdk/blob/8737a8ca73952d60129e7fc2f7e17eea3b800af7/src/hotspot/share/opto/compile.hpp#L1059-L1062 >> But reading the code, I don't understand why it's important to insert exactly at this position, and not simply at this position or after it (and then, why not at the end, to avoid shifting?). In `inline_incrementally_one` >> https://github.com/openjdk/jdk/blob/8737a8ca73952d60129e7fc2f7e17eea3b800af7/src/hotspot/share/opto/compile.cpp#L2123-L2126 >> We seem to stop as soon as something happens, so we wouldn't really use the fact that the coming elements in `_late_inlines` are related. Overall, I don't see where this assumption of depth-first is used. >> >> A little bit of testing doesn't catch fire when inserting potentially after `_late_inlines_pos`. I'll read and test more, but maybe someone already has more context? Maybe @iwanowww or @rwestrel from looking at `git blame`? > > During late inlining we want to preserve the relative order of candidate sites for late inlining. It matters when the compiler runs out of inlining budget. If it appends candidates instead, it turns DFS call site traversal into BFS.
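Vladimir's DFS/BFS point can be made concrete with a toy model. The following is a minimal sketch, assuming a made-up call graph where inlining a site exposes its callee's call sites as new candidates (`CallGraph` and `late_inline` are hypothetical names, not C2's data structures). Inserting the new sites right after the one being processed keeps a depth-first order, while appending them gives a breadth-first order.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: inlining a call site exposes the call sites in the
// callee's body as new late-inline candidates.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Inline candidates in list order until `budget` sites have been inlined and
// return the sites that were inlined. If `insert_after_current` is true, newly
// exposed sites go right after the site being processed (depth-first);
// otherwise they are appended to the end of the list (breadth-first).
std::vector<std::string> late_inline(const CallGraph& graph,
                                     std::vector<std::string> candidates,
                                     std::size_t budget,
                                     bool insert_after_current) {
  std::vector<std::string> inlined;
  std::size_t pos = 0;
  while (pos < candidates.size() && inlined.size() < budget) {
    const std::string site = candidates[pos];  // copy: insert() may reallocate
    inlined.push_back(site);
    auto callees = graph.find(site);
    if (callees != graph.end()) {
      auto where = insert_after_current
                       ? candidates.begin() + static_cast<std::ptrdiff_t>(pos + 1)
                       : candidates.end();
      candidates.insert(where, callees->second.begin(), callees->second.end());
    }
    pos++;
  }
  return inlined;
}
```

Consider a graph where `a` calls `a1` and `a2`, `b` calls `b1`, and the initial candidates are `{a, b}` with a budget of 3. The depth-first variant inlines `a, a1, a2` and the breadth-first variant inlines `a, b, a1`. With an unlimited budget, both variants inline the same set of sites. The choice of insertion point only changes the inlining decisions once the budget runs out, which is the case Vladimir describes.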
With a different order (assuming budget exhaustion), we would end up with different inlining decisions, but it would still be correct, right? If the budget runs out, randomizing the list before processing can also lead to different inlining decisions, so I guess it would be fine to indeed insert new elements anywhere after `_late_inlines_pos`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29110#discussion_r2681167547 From dfenacci at openjdk.org Mon Jan 12 07:56:21 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 12 Jan 2026 07:56:21 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order In-Reply-To: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> Message-ID: <_CuG78MuZJi0wYKBSomNbE8P1F1HvJIoEtfoVNblC8Y=.e7391d98-d84d-46fe-925e-6ddbf2196bdb@github.com> On Thu, 8 Jan 2026 09:25:35 GMT, Marc Chevalier wrote: > As it says: randomize the order in which the late inlines are processed, and slightly factor it with macro nodes. > > I didn't add shuffle to the `GrowableArray` class since it seems a rather specialized method (and it seems it'd be a controversial change to a widely used class), it would make `GrowableArray` depend on `Compile` for random number generation (or require a callback, for instance, giving a non-trivial signature and usage), and I couldn't find other shuffling of such an object. > There is also shuffling of `UniqueNodeList` (for `StressIGVN`), but it seems hard to unify: access and length are not written quite the same, so it would probably not be simpler than duplicating the implementation (which is simple, so it's fine in my opinion). > > Thanks, > Marc Good stuff. Thank you for adding this @marc-chevalier!
Looks good (just an additional minor question) src/hotspot/share/opto/compile.cpp line 2177: > 2175: if (array.length() < 2) { > 2176: return; > 2177: } Do we need this check? ------------- PR Review: https://git.openjdk.org/jdk/pull/29110#pullrequestreview-3649489600 PR Review Comment: https://git.openjdk.org/jdk/pull/29110#discussion_r2681166345 From mchevalier at openjdk.org Mon Jan 12 07:57:32 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 12 Jan 2026 07:57:32 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v5] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Mon, 29 Dec 2025 17:39:42 GMT, Bhavana Kilambi wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. 
For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. >> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):**
>>
>> Benchmark         vectorDim   Mode  Cnt    8B   16B
>> ReductionAddFP16        256  thrpt    9  1.41  1.40
>> ReductionAddFP16        512  thrpt    9  1.41  1.41
>> ReductionAddFP16       1024  thrpt    9  1.43  1.40
>> ReductionAddFP16       2048  thrpt    9  1.43  1.40
>> ReductionMulFP16        256  thrpt    9  1.22  1.22
>> ReductionMulFP16        512  thrpt    9  1.21  1.23
>> ReductionMulFP16       1024  thrpt    9  1.21  1.22
>> ReductionMulFP16       2048  thrpt    9  1.20  1.22
>>
>> On N1, the scalarized sequence of `fadd/fmul` are gener... > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Address review comments for the JTREG test and microbenchmark > - Merge branch 'master' > - Address review comments > - Fix build failures on Mac > - Address review comments > - Merge 'master' > - 8366444: Add support for add/mul reduction operations for Float16 > > This patch adds mid-end support for vectorized add/mul reduction > operations for half floats.
It also includes backend aarch64 support for > these operations. Only vectorization support through autovectorization > is added as VectorAPI currently does not support Float16 vector species. > > Both add and mul reduction vectorized through autovectorization mandate > the implementation to be strictly ordered. The following is how each of > these reductions is implemented for different aarch64 targets - > > For AddReduction : > On Neon only targets (UseSVE = 0): Generates scalarized additions > using the scalar "fadd" instruction for both 8B and 16B vector lengths. > This is because Neon does not provide a direct instruction for computing > strictly ordered floating point add reduction. > > On SVE targets (UseSVE > 0): Generates the "fadda" instruction which > computes add reduction for floating point in strict order. > > For MulReduction : > Both Neon and SVE do not provide a direct instruction for computing > strictly ordered floating point multiply reduction. For vector lengths > of 8B and 16B, a scalarized sequence of scalar "fmul" instructions is > generated and multiply reduction for vector lengths > 16B is not > supported. > > Below is the performance of the two newly added microbenchmarks in > Float16OperationsBenchmark.java tested on three different aarch64 > machines and with varying MaxVectorSize - > > Note: On all machines, the score (ops/ms) is compared with the master > branch without this patch which generates a sequence of loads ("ldrsh") > to load the FP16 value into an FPR and a scalar "fadd/fmul" to > add/multiply the loaded value to the running sum/product. The ratios > given below are the ratios between the throughput with this patch and > the throughput without this patch. > Ratio > 1 indicates the performance with this patch is better than the > master branch. > > N1 (UseSVE = 0... Testing passed fine, but keeping in mind it might not have a very exhaustive range of hardware (but nothing I can do about that, I fear). 
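The requirement that auto-vectorized add/mul reductions stay strictly ordered is easy to illustrate. The following is a minimal sketch that uses plain 32-bit `float` instead of Float16, only to keep it portable (`ordered_sum` and `pairwise_sum` are made-up names). Floating-point addition is not associative, so replacing the scalar loop's left-to-right accumulation with a tree-shaped, SIMD-style reduction can change the result.

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <vector>

// Strictly ordered reduction: (((0 + x0) + x1) + x2) + ..., the order in
// which a scalar Java loop accumulates.
float ordered_sum(const std::vector<float>& xs) {
  float acc = 0.0f;
  for (float x : xs) {
    acc += x;
  }
  return acc;
}

// Pairwise (tree) reduction, one possible reassociated order that a SIMD
// horizontal add might use. Requires a non-empty, power-of-two-sized input.
float pairwise_sum(std::vector<float> xs) {
  assert(!xs.empty() && (xs.size() & (xs.size() - 1)) == 0);
  while (xs.size() > 1) {
    std::vector<float> next;
    for (std::size_t i = 0; i + 1 < xs.size(); i += 2) {
      next.push_back(xs[i] + xs[i + 1]);
    }
    xs = next;
  }
  return xs[0];
}
```

Take the input `{0.0f, FLT_MIN, FLT_MAX, -FLT_MAX}`. In the sequential order, `FLT_MIN` is absorbed into `FLT_MAX` and then cancelled away, so the result is `0.0f`. In the tree order, the two huge values cancel each other first, so the result is `FLT_MIN`. This is why the Neon path falls back to a scalar `fadd`/`fmul` sequence unless an ordered instruction such as SVE `fadda` is available.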
------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3737278627 From bmaillard at openjdk.org Mon Jan 12 08:00:36 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 12 Jan 2026 08:00:36 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v8] In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 16:14:17 GMT, Galder Zamarreño wrote: >> Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test the min/max reassociation implementation with IR tests reliably. >> >> Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added an `is_MinMax` node query to simplify the fix. >> >> I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. >> >> To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). >> >> During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. >> >> I've run tier1-3 tests on linux/x64 successfully. > > Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/hotspot/jtreg/compiler/igvn/TestMinMaxIdentity.java > > Co-authored-by: Emanuel Peter > - Fix style Marked as reviewed by bmaillard (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/28895#pullrequestreview-3649520191 From bmaillard at openjdk.org Mon Jan 12 08:04:08 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 12 Jan 2026 08:04:08 GMT Subject: RFR: 8372302: C2: IGVN verification fails because ModXNode::Ideal creates unused intermediate nodes [v3] In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 10:24:50 GMT, Emanuel Peter wrote: >> Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update package >> - Move to compiler/c2/igvn > > Marked as reviewed by epeter (Reviewer). Thank you @eme64 @merykitty! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28488#issuecomment-3737290364 From bmaillard at openjdk.org Mon Jan 12 08:04:10 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 12 Jan 2026 08:04:10 GMT Subject: Integrated: 8372302: C2: IGVN verification fails because ModXNode::Ideal creates unused intermediate nodes In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 10:35:36 GMT, Benoît Maillard wrote: > This PR addresses a failure in IGVN verification with `ModI` and `ModL` nodes. > > In `ModXNode::Ideal`, we have code to optimize a modulo expression by expressing it in terms of other operations. There are actually two distinct cases: one where the divisor is a constant of the form `2^k-1` for some integer `k`, and a more general case where other transformations do not succeed. Because these transformations involve creating several new nodes (sometimes in a loop) and calling `phase->transform(...)` on them, we want to avoid accidentally triggering optimizations on the "unfinished" state of the subgraph. For this, we create a temporary dummy node and add edges to the nodes being constructed.
> > There are some execution paths where the node is not destroyed before `Ideal` returns, and this creates issues during IGVN verification, as the verification code checks whether the number of nodes has changed after calling `Ideal` on a node for which no changes are expected. > > The path in question is when we exit because the divisor is a constant and is the minimum value: > https://github.com/openjdk/jdk/blob/c19b12927d2ac901ec8ccaa2de5897ee4c47af56/src/hotspot/share/opto/divnode.cpp#L1146-L1147 > > The zero case does not cause problems (this seems to be because it would hide behind a `div0_check` anyway). > > The fix is simply to only create the temporary node when it is needed, thus avoiding returning without destroying it. > > ### Testing > - [x] [GitHub Actions](TODO) > - [x] tier1-4, plus some internal testing > > Thank you for reviewing! This pull request has now been integrated. Changeset: 49040462 Author: Benoît Maillard URL: https://git.openjdk.org/jdk/commit/49040462f3d2761435cded1bd8898d0c6b16fc02 Stats: 70 lines in 2 files changed: 63 ins; 4 del; 3 mod 8372302: C2: IGVN verification fails because ModXNode::Ideal creates unused intermediate nodes Reviewed-by: epeter, qamai ------------- PR: https://git.openjdk.org/jdk/pull/28488 From mchevalier at openjdk.org Mon Jan 12 08:06:39 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 12 Jan 2026 08:06:39 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order In-Reply-To: <_CuG78MuZJi0wYKBSomNbE8P1F1HvJIoEtfoVNblC8Y=.e7391d98-d84d-46fe-925e-6ddbf2196bdb@github.com> References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> <_CuG78MuZJi0wYKBSomNbE8P1F1HvJIoEtfoVNblC8Y=.e7391d98-d84d-46fe-925e-6ddbf2196bdb@github.com> Message-ID: On Mon, 12 Jan 2026 07:47:04 GMT, Damon Fenacci wrote: >> As it says: randomize the order in which the late inlines are processed, and slightly factor it with macro nodes.
>> >> I didn't add shuffle to the `GrowableArray` class since it seems a rather specialized method (and it seems it'd be a controversial change to a widely used class), it would make `GrowableArray` depend on `Compile` for random number generation (or require a callback, for instance, giving a non-trivial signature and usage), and I couldn't find other shuffling of such an object. >> There is also shuffling of `UniqueNodeList` (for `StressIGVN`), but it seems hard to unify: access and length are not written quite the same, so it would probably not be simpler than duplicating the implementation (which is simple, so it's fine in my opinion). >> >> Thanks, >> Marc > > src/hotspot/share/opto/compile.cpp line 2177: > >> 2175: if (array.length() < 2) { >> 2176: return; >> 2177: } > > Do we need this check? Arguable. As you can see, it was there, and I didn't question it. But given the implementation below, `for (uint i = array.length() - 1; i >= 1; i--)`, if the array is empty, `array.length() - 1` wraps around to a huge unsigned value and something bad will happen, so we need at least this check. And then, why not also take `< 2`, at this point? The other option is to use an increasing loop index, or a signed one. I think the decreasing loop index is natural here. A signed `i` would probably work. I think some people have opinions against that in iterations... I don't mind either way. Overall, as it is now, we need this check. We could do without it if we made other changes. I'm ok with how it's written now, especially because it keeps it similar to `PhaseIterGVN::shuffle_worklist()`, but I don't mind doing it another way if someone has a strong opinion.
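The underflow Marc describes can be reproduced outside HotSpot. The following is a minimal sketch of a Fisher-Yates shuffle that uses the same decreasing unsigned index. It substitutes `std::vector` and `std::mt19937` for `GrowableArray` and C2's own random source, and `shuffle_in_place` is a hypothetical name. Without the early return, an empty array would make the initial index wrap around.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Fisher-Yates shuffle with a decreasing unsigned index, mirroring the loop
// shape discussed above. std::mt19937 stands in for C2's random source.
template <typename T>
void shuffle_in_place(std::vector<T>& array, std::mt19937& rng) {
  // For an empty array, array.size() - 1 would wrap around to the largest
  // std::size_t value. Arrays of length 1 need no shuffling either.
  if (array.size() < 2) {
    return;
  }
  for (std::size_t i = array.size() - 1; i >= 1; i--) {
    // Choose j uniformly in [0, i] and swap it into slot i.
    std::uniform_int_distribution<std::size_t> pick(0, i);
    std::size_t j = pick(rng);
    assert(j <= i);
    std::swap(array[i], array[j]);
  }
}
```

Without the guard, a single-element array would already be harmless, because `i` starts at 0 and the loop body never runs. Only the empty case actually needs protection. As Marc notes, `< 2` simply covers both cases in one test.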
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29110#discussion_r2681202904 From bkilambi at openjdk.org Mon Jan 12 08:16:33 2026 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 12 Jan 2026 08:16:33 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v5] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Thu, 8 Jan 2026 15:27:01 GMT, Emanuel Peter wrote: >> Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Address review comments for the JTREG test and microbenchmark >> - Merge branch 'master' >> - Address review comments >> - Fix build failures on Mac >> - Address review comments >> - Merge 'master' >> - 8366444: Add support for add/mul reduction operations for Float16 >> >> This patch adds mid-end support for vectorized add/mul reduction >> operations for half floats. It also includes backend aarch64 support for >> these operations. Only vectorization support through autovectorization >> is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate >> the implementation to be strictly ordered. The following is how each of >> these reductions is implemented for different aarch64 targets - >> >> For AddReduction : >> On Neon only targets (UseSVE = 0): Generates scalarized additions >> using the scalar "fadd" instruction for both 8B and 16B vector lengths. >> This is because Neon does not provide a direct instruction for computing >> strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the "fadda" instruction which >> computes add reduction for floating point in strict order. 
>> >> For MulReduction : >> Both Neon and SVE do not provide a direct instruction for computing >> strictly ordered floating point multiply reduction. For vector lengths >> of 8B and 16B, a scalarized sequence of scalar "fmul" instructions is >> generated and multiply reduction for vector lengths > 16B is not >> supported. >> >> Below is the performance of the two newly added microbenchmarks in >> Float16OperationsBenchmark.java tested on three different aarch64 >> machines and with varying MaxVectorSize - >> >> Note: On all machines, the score (ops/ms) is compared with the master >> branch without this patch which generates a sequence of loads ("ldrsh") >> to load the FP16 value into an FPR and a scalar "fadd/fmul" to >> add/multiply the loaded value to the running sum/product. The ratios >> given below are the ratios between the throughput with this patch and >> the throughput without this patch. >> Ratio > 1 indicate... > > test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 459: > >> 457: short result = (short) 0; >> 458: for (int i = 0; i < LEN; i++) { >> 459: result = float16ToRawShortBits(add(shortBitsToFloat16(result), shortBitsToFloat16(input1[i]))); > > Why all the conversions from and to `short` / `Float16`? > Is there any benefit to use `short` for the intermediate results? Why not make `result` a `Float16`? If I remember correctly, I tried doing that initially but the loop did not get vectorized. The Ideal graph showed there were a lot of nodes related to object creation (probably for the intermediate `Float16` result) which bloated the size of the loop resulting in the loop not getting unrolled (and eventually not vectorized). I also tried a standalone loop where I do not return the intermediate result hoping that escape analysis could help in avoiding the object creation but did not help either. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2681225725 From qamai at openjdk.org Mon Jan 12 08:24:38 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 12 Jan 2026 08:24:38 GMT Subject: RFR: 8374969: Incorrect results of LoadStoreNode::adr_type and SCMemProj::adr_type In-Reply-To: References: Message-ID: <982wRPnfMzLaaPDmavsPsTuEIR41n3cncJnl2CnkZJo=.f78bb862-f90c-4f91-95e3-62d100074431@github.com> On Mon, 12 Jan 2026 07:23:27 GMT, Manuel Hässig wrote: >> Hi, >> >> This is extracted from #28570 >> >> This PR fixes the return value of `LoadStoreNode::adr_type` and `SCMemProj::adr_type`. For the former, it is trivial that we can do what `MemNode` does. And for the latter, the implementation of `ProjNode::adr_type` is adequate. >> >> Please kindly review, thanks a lot. > > src/hotspot/share/opto/memnode.cpp line 3923: > >> 3921: init_req(MemNode::ValueIn, val); >> 3922: init_class_id(Class_LoadStore); >> 3923: DEBUG_ONLY(_adr_type = at; adr_type();) > > Why make this debug only? AFAICT lots of non-debug code uses `adr_type()`? Similar to `MemNode`, `LoadStoreNode::_adr_type` is for verification only; the `adr_type` is computed from the bottom type of the address input. Please see `LoadStoreNode::adr_type` and `MemNode::adr_type`.
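Quan's answer describes a common verification pattern. The result is always recomputed from the address input, and the stored `_adr_type` exists only so that debug builds can cross-check it. The following is a minimal sketch of that pattern. It assumes a locally defined `DEBUG_ONLY` keyed off `NDEBUG`, plus made-up `AddrType`, `AddressInput` and `MemLikeNode` types. None of these are HotSpot's real definitions.

```cpp
#include <cassert>

// Local stand-in for HotSpot's DEBUG_ONLY: the argument only exists in builds
// where NDEBUG is not defined.
#ifdef NDEBUG
#define DEBUG_ONLY(code)
#else
#define DEBUG_ONLY(code) code
#endif

struct AddrType {
  int alias_idx;
};

// Hypothetical address input: its bottom type is the source of truth.
struct AddressInput {
  AddrType bottom;
  const AddrType* bottom_type() const { return &bottom; }
};

class MemLikeNode {
  const AddressInput* _address;
  DEBUG_ONLY(const AddrType* _adr_type;)  // recorded for verification only

 public:
  MemLikeNode(const AddressInput* address, const AddrType* at) : _address(address) {
    (void)at;  // unused in release builds
    DEBUG_ONLY(_adr_type = at; adr_type();)  // cross-check once at construction
  }

  // Always derived from the address input, in every build flavor.
  const AddrType* adr_type() const {
    const AddrType* computed = _address->bottom_type();
    // Debug builds also check the creator's claim against the derived value.
    assert(computed->alias_idx == _adr_type->alias_idx);
    return computed;
  }
};
```

In this sketch, release-build callers of `adr_type()` still get a correct answer, because that answer never depends on the debug-only field. The field only adds an assertion that the type passed at construction agrees with what the address input implies.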
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29154#discussion_r2681245393 From galder at openjdk.org Mon Jan 12 08:38:27 2026 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 12 Jan 2026 08:38:27 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v8] In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 07:29:09 GMT, Tobias Hartmann wrote: >> Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update test/hotspot/jtreg/compiler/igvn/TestMinMaxIdentity.java >> >> Co-authored-by: Emanuel Peter >> - Fix style > > Do we risk hitting [JDK-8374896](https://bugs.openjdk.org/browse/JDK-8374896) now that the verification will be enabled by this PR? @TobiHartmann and I just had a chat and he is right. If the changes in `PhaseIterGVN::verify_Identity_for` are applied, we will hit [JDK-8374896](https://bugs.openjdk.org/browse/JDK-8374896). So, I will revert that part of the change. Note that the IR test does not rely on `VerifyIterativeGVN` being enabled. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28895#issuecomment-3737402164 From qamai at openjdk.org Mon Jan 12 08:40:57 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 12 Jan 2026 08:40:57 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v10] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 15:41:07 GMT, Zihao Lin wrote: >> If both nodes are constant, support constant folding. > > Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Merge branch 'openjdk:master' into JDK-8370196 > - fix test failed > - fix make unsigned > - Merge branch 'master' into JDK-8370196 > - Fix > - Fix > - Apply suggestion from @eme64 > > Co-authored-by: Emanuel Peter > - Add Math to Operations.java > - Add tests > - Merge branch 'master' into JDK-8370196 > - ...
and 3 more: https://git.openjdk.org/jdk/compare/a62296d8...30fa1f03 Please refactor this patch so that these methods are handled similar to `And/Or/XorNode::Value`. You may also need to wait for #28952 . ------------- PR Comment: https://git.openjdk.org/jdk/pull/28097#issuecomment-3737407274 From thartmann at openjdk.org Mon Jan 12 08:49:36 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 12 Jan 2026 08:49:36 GMT Subject: RFR: 8374450: GTest opto.canonicalize_constraints cannot run without VM In-Reply-To: <1ApOTqaq-f4GnIBX4NkCQAw9x8SOozI0Y0EouKZ23xM=.57c1b9c0-70c6-42d0-b568-d6e45bdb845a@github.com> References: <1ApOTqaq-f4GnIBX4NkCQAw9x8SOozI0Y0EouKZ23xM=.57c1b9c0-70c6-42d0-b568-d6e45bdb845a@github.com> Message-ID: On Mon, 12 Jan 2026 06:28:19 GMT, Axel Boldt-Christmas wrote: > The `opto.canonicalize_constraints` test explicitly uses symbols which are setup in `Type::Initialize_shared` (which seems to happen as a side effect of generating stubs at VM start. Also see comment in `Type::Initialize_shared`). So the test is required to be a VM test. > > * Testing > * GHA > * Verified that it `opto.canonicalize_constraints` now does not segmentation fault when run in isolation. > * Tier 1 on Oracle supported platforms Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29159#pullrequestreview-3649668598 From epeter at openjdk.org Mon Jan 12 08:50:46 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 Jan 2026 08:50:46 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v5] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Mon, 12 Jan 2026 08:13:04 GMT, Bhavana Kilambi wrote: >> test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 459: >> >>> 457: short result = (short) 0; >>> 458: for (int i = 0; i < LEN; i++) { >>> 459: result = float16ToRawShortBits(add(shortBitsToFloat16(result), shortBitsToFloat16(input1[i]))); >> >> Why all the conversions from and to `short` / `Float16`? >> Is there any benefit to use `short` for the intermediate results? Why not make `result` a `Float16`? > > If I remember correctly, I tried doing that initially but the loop did not get vectorized. The Ideal graph showed there were a lot of nodes related to object creation (probably for the intermediate `Float16` result) which bloated the size of the loop resulting in the loop not getting unrolled (and eventually not vectorized). I also tried a standalone loop where I do not return the intermediate result hoping that escape analysis could help in avoiding the object creation but did not help either. Hmm, I see. That sounds like a deficiency in the auto unboxing of Float16. Suggestion: You should create both variants of the IR tests. And then file an RFE for the one that does not yet vectorize because of the boxing issues. Because the way things are now, it's not a huge win, to be honest. Which user is supposed to write their code in such a convoluted way, having to cast back and forth? Would they not expect they could just use Float16 all the way through? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2681318247 From dfenacci at openjdk.org Mon Jan 12 08:50:57 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 12 Jan 2026 08:50:57 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order In-Reply-To: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> Message-ID: <5QnDGQJ-UMZgpHsF4s4tVbbeuQFUaU0fJFIQ7SRaqMQ=.e4d696a9-6521-46aa-a56a-5b09c5900544@github.com> On Thu, 8 Jan 2026 09:25:35 GMT, Marc Chevalier wrote: > As it says: randomize the order in which the late inlines are processed, and slightly factor it with macro nodes. > > I didn't add shuffle to the `GrowableArray` class since it seems a rather specialized method (and it seems it'd be a controversial change to a widely used class), it would make `GrowableArray` depend on `Compile` for random number generation (or require a callback, for instance, giving a non-trivial signature and usage), and I couldn't find other shuffling of such an object. > There is also shuffling of `UniqueNodeList` (for `StressIGVN`), but it seems hard to unify: access and length are not written quite the same, so it would probably not be simpler than duplicating the implementation (which is simple, so it's fine in my opinion). > > Thanks, > Marc Marked as reviewed by dfenacci (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/29110#pullrequestreview-3649673026 From epeter at openjdk.org Mon Jan 12 08:50:47 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 Jan 2026 08:50:47 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v5] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Mon, 12 Jan 2026 08:47:07 GMT, Emanuel Peter wrote: >> If I remember correctly, I tried doing that initially but the loop did not get vectorized. The Ideal graph showed there were a lot of nodes related to object creation (probably for the intermediate `Float16` result) which bloated the size of the loop resulting in the loop not getting unrolled (and eventually not vectorized). I also tried a standalone loop where I do not return the intermediate result hoping that escape analysis could help in avoiding the object creation but did not help either. > > Hmm, I see. That sounds like a deficiency in the auto unboxing of Float16. > > Suggestion: You should create both variants of the IR tests. And then file an RFE for the one that does not yet vectorize because of the boxing issues. > > Because the way things are now, it's not a huge win, to be honest. Which user is supposed to write their code in such a convoluted way, having to cast back and forth? Would they not expect they could just use Float16 all the way through? @jatin-bhateja What do you think? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2681319220 From dfenacci at openjdk.org Mon Jan 12 08:50:58 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 12 Jan 2026 08:50:58 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order In-Reply-To: References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> <_CuG78MuZJi0wYKBSomNbE8P1F1HvJIoEtfoVNblC8Y=.e7391d98-d84d-46fe-925e-6ddbf2196bdb@github.com> Message-ID: On Mon, 12 Jan 2026 08:03:07 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/compile.cpp line 2177: >> >>> 2175: if (array.length() < 2) { >>> 2176: return; >>> 2177: } >> >> Do we need this check? > > Arguable. As you can see, it was there, and I didn't question it. But given the implementation under: > > for (uint i = array.length() - 1; i >= 1; i--) > > if the array is empty, something bad will happen, so we need at least this check. And then, why not also take `< 2`, at this point? > > The other option is to use an increasing loop index, or a signed one. I think the decreasing loop index is natural here. A signed `i` would probably work. I think some people have opinion against that in iterations... I don't mind either way. > > Overall, as it is now, we need this check. We could do without if we do other changes. I'm ok with how it's written now, especially because it keeps it similar to `PhaseIterGVN::shuffle_worklist()`, but I don't mind doing another way if someone has a strong opinion. Fair enough (I was just wondering about the signed option since `array.length()` seems to be returning it anyway). Thanks @marc-chevalier. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29110#discussion_r2681318197 From shade at openjdk.org Mon Jan 12 08:56:29 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Jan 2026 08:56:29 GMT Subject: RFR: 8374450: GTest opto.canonicalize_constraints cannot run without VM In-Reply-To: <1ApOTqaq-f4GnIBX4NkCQAw9x8SOozI0Y0EouKZ23xM=.57c1b9c0-70c6-42d0-b568-d6e45bdb845a@github.com> References: <1ApOTqaq-f4GnIBX4NkCQAw9x8SOozI0Y0EouKZ23xM=.57c1b9c0-70c6-42d0-b568-d6e45bdb845a@github.com> Message-ID: On Mon, 12 Jan 2026 06:28:19 GMT, Axel Boldt-Christmas wrote: > The `opto.canonicalize_constraints` test explicitly uses symbols which are set up in `Type::Initialize_shared` (this seems to happen as a side effect of generating stubs at VM start; see also the comment in `Type::Initialize_shared`). So the test is required to be a VM test. > > * Testing > * GHA > * Verified that `opto.canonicalize_constraints` no longer crashes with a segmentation fault when run in isolation. > * Tier 1 on Oracle supported platforms Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29159#pullrequestreview-3649697038 From galder at openjdk.org Mon Jan 12 09:01:44 2026 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 12 Jan 2026 09:01:44 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v9] In-Reply-To: References: Message-ID: > Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to reliably test the min/max reassociation implementation with IR tests. > > Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added an `is_MinMax` node query to simplify the fix.
> > I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for`, since this change fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. > > To test this, I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet; see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). > > During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both the `a, b` and `b, a` orders so that all possible orders are covered and regressions don't slip through in the future. > > I've run tier1-3 tests on linux/x64 successfully. Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Revert "Remove exclude or Min/Max in verify identity" This reverts commit cf24abad55db9a320930379c4f0f3154791d26e2.
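A small integer example may help show why a user of a changed Min/Max node can become foldable, so that re-enqueueing it can pay off. This is a sketch under stated assumptions: plain `std::min`/`std::max` on `int` stand in for C2's MinI/MaxI nodes, the helper names are hypothetical, and nothing here claims which of these rules C2 actually implements. Same-opcode nesting collapses because max and min are idempotent, while mixed nesting such as `min(a, max(a, b)) == a` is the absorption law:

```cpp
#include <algorithm>
#include <cassert>

// Same-opcode nesting: max (min) is associative and idempotent, so the
// outer node adds nothing once the inner node already contains `a`.
inline int max_of_max(int a, int b) { return std::max(a, std::max(a, b)); }
inline int min_of_min(int a, int b) { return std::min(a, std::min(a, b)); }

// Mixed nesting (absorption): for integers these collapse to `a`.
inline int min_of_max(int a, int b) { return std::min(a, std::max(a, b)); }
inline int max_of_min(int a, int b) { return std::max(a, std::min(a, b)); }
```

For floating-point Min/Max with Java semantics, absorption does not hold unconditionally. For example, `Math.min(a, Math.max(a, Float.NaN))` is NaN rather than `a`, so any such rule would need to be restricted by type.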
------------- Changes: - all: https://git.openjdk.org/jdk/pull/28895/files - new: https://git.openjdk.org/jdk/pull/28895/files/2c0b0e43..438aeff3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28895&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28895&range=07-08 Stats: 21 lines in 1 file changed: 21 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28895.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28895/head:pull/28895 PR: https://git.openjdk.org/jdk/pull/28895 From galder at openjdk.org Mon Jan 12 09:01:45 2026 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 12 Jan 2026 09:01:45 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v8] In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 16:14:17 GMT, Galder Zamarreño wrote: >> Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to reliably test the min/max reassociation implementation with IR tests. >> >> Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added an `is_MinMax` node query to simplify the fix. >> >> I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for`, since this change fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. >> >> To test this, I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet; see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977).
>> >> During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both the `a, b` and `b, a` orders so that all possible orders are covered and regressions don't slip through in the future. >> >> I've run tier1-3 tests on linux/x64 successfully. > > Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/hotspot/jtreg/compiler/igvn/TestMinMaxIdentity.java > > Co-authored-by: Emanuel Peter > - Fix style I've pushed a revert commit for the `PhaseIterGVN::verify_Identity_for` change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28895#issuecomment-3737478153 From qamai at openjdk.org Mon Jan 12 09:42:55 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 12 Jan 2026 09:42:55 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector mask relative code in c2 In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 01:36:50 GMT, Xiaohong Gong wrote: > The VectorMask implementation in the Vector API involves complex interactions between types, nodes, and platform-specific features, making the related code in HotSpot difficult to understand and review. > > This patch adds comprehensive comments for vector-mask-related types, nodes, and methods in C2 to improve code clarity and maintainability. > > Note: This patch only adds comments; no functional changes are made. Marked as reviewed by qamai (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/29130#pullrequestreview-3649886055 From qamai at openjdk.org Mon Jan 12 09:44:32 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 12 Jan 2026 09:44:32 GMT Subject: RFR: 8374785: Template Library: need to tag Float16.copySign as having non-deterministic result because of multiple NaNs with different sign bits In-Reply-To: References: Message-ID: <_fmn_POaQ-XvbVSkK3jUNvq32Hr1cPfs68I0EDsZb8s=.56241967-8443-46da-8053-d4aec8077fec@github.com> On Thu, 8 Jan 2026 14:50:46 GMT, Emanuel Peter wrote: > This is a very similar issue to https://github.com/openjdk/jdk/pull/29033 / [JDK-8374489](https://bugs.openjdk.org/browse/JDK-8374489). > > There are `NaN` encodings that have the sign bit set, and others that do not. > If we copy the sign from such a `NaN` to a numeric value (e.g. `1`), we can get `1` or `-1`. > > > jshell> var a = Float.NaN; > a ==> NaN > jshell> var b = Float.intBitsToFloat(0xFFC00000); > b ==> NaN > jshell> Math.copySign(1f, a) > ==> 1.0 > jshell> Math.copySign(1f, b) > ==> -1.0 > jshell> Float16.copySign(Float16.valueOf(1f), Float16.valueOf(a)) > ==> 1.0 > jshell> Float16.copySign(Float16.valueOf(1f), Float16.valueOf(b)) > ==> -1.0 > > > Since `NaN` values of different encodings are interchangeable, we cannot know which `NaN` we get; its sign bit is arbitrary, so we also cannot know the sign of the result of `Float16.copySign`. We have to mark it as non-deterministic and hence disable result verification. > > Since this is a test bug, I have no regression test. But I verified manually that, with the same seed (for the ExpressionFuzzer) that failed before this change, the test now succeeds after the change. Marked as reviewed by qamai (Committer).
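The jshell session in the description can be reproduced outside Java. The C++ sketch below is an analogue, not the Template Library code, and the helper names are hypothetical. It builds two quiet-NaN bit patterns that differ only in the sign bit: `0x7FC00000` (the bits of `Float.NaN`) and `0xFFC00000`. It then shows that, on IEEE 754 platforms, `std::copysign` transfers that sign bit. The result therefore depends on which NaN encoding happens to flow in:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float without undefined behavior.
inline float float_from_bits(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Both patterns are quiet NaNs; they differ only in the sign bit.
inline float positive_nan() { return float_from_bits(0x7FC00000u); }
inline float negative_nan() { return float_from_bits(0xFFC00000u); }
```

Both inputs are equally valid NaNs, so a verifier cannot predict whether `copysign(1, nan)` will produce `1` or `-1`. That is why the change tags the operation as non-deterministic.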
------------- PR Review: https://git.openjdk.org/jdk/pull/29118#pullrequestreview-3649895604 From galder at openjdk.org Mon Jan 12 09:55:18 2026 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Mon, 12 Jan 2026 09:55:18 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v2] In-Reply-To: References: Message-ID: <-vRCcNHAqJCJwuCpAeev537_6oTiUvJ6HCSywIOuJ_g=.41c8903c-d23b-4bc5-a147-82b238fbb0db@github.com> On Thu, 18 Dec 2025 23:17:06 GMT, Dean Long wrote: >> Galder Zamarre?o has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: >> >> - Merge branch 'master' into topic.uses-min-max >> - Test Float16 >> - Only apply to uses that match original IR node >> - Merge branch 'master' into topic.uses-min-max >> - Use is_MinMax() instead of spelling out individual Min/Max opcodes >> - Refactor MaxNode to MinMaxNode and add is_MinMax() query >> - Add max(a, max(b, c)) patterns to add users of use >> - Add templated test >> - Remove exclude or Min/Max in verify identity > > src/hotspot/share/opto/phaseX.cpp line 2609: > >> 2607: for (DUIterator_Fast i2max, i2 = use->fast_outs(i2max); i2 < i2max; i2++) { >> 2608: Node* u = use->fast_out(i2); >> 2609: if (u->Opcode() == use->Opcode()) { > > So there are no Min(Max()) or Max(Min()) patterns we need to worry about? I was expecting this line to be > > if (u->is_MinMax()) { I've been looking at this PR again, having seen the issue in [JDK-8374896](https://bugs.openjdk.org/browse/JDK-8374896), and that scenario is exactly the one @dean-long mentions. I think fixing [JDK-8374896](https://bugs.openjdk.org/browse/JDK-8374896) can be done in the PR for that issue, and the fix is likely the one @dean-long suggests above. Agree @TobiHartmann @dean-long? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28895#discussion_r2681543378 From mdoerr at openjdk.org Mon Jan 12 10:07:42 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 12 Jan 2026 10:07:42 GMT Subject: [jdk26] RFR: 8374195: TestReplaceNarrowPhiWithBottomPhi fails on ppc64 platforms in (fast)debug Message-ID: Clean backport of [JDK-8374195](https://bugs.openjdk.org/browse/JDK-8374195). ------------- Commit messages: - Backport e4e923a1ffc8ff059c983c7e9201d0ee3273482d Changes: https://git.openjdk.org/jdk/pull/29162/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29162&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374195 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29162.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29162/head:pull/29162 PR: https://git.openjdk.org/jdk/pull/29162 From shade at openjdk.org Mon Jan 12 10:07:42 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Jan 2026 10:07:42 GMT Subject: [jdk26] RFR: 8374195: TestReplaceNarrowPhiWithBottomPhi fails on ppc64 platforms in (fast)debug In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 09:57:46 GMT, Martin Doerr wrote: > Clean backport of [JDK-8374195](https://bugs.openjdk.org/browse/JDK-8374195). Marked as reviewed by shade (Reviewer). Yeah, you just need a regular PR review for JDK 26 stabilization branch. Here it is. 
------------- PR Review: https://git.openjdk.org/jdk/pull/29162#pullrequestreview-3649986492 PR Comment: https://git.openjdk.org/jdk/pull/29162#issuecomment-3737755843 From mdoerr at openjdk.org Mon Jan 12 10:07:43 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 12 Jan 2026 10:07:43 GMT Subject: [jdk26] RFR: 8374195: TestReplaceNarrowPhiWithBottomPhi fails on ppc64 platforms in (fast)debug In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 09:57:46 GMT, Martin Doerr wrote: > Clean backport of [JDK-8374195](https://bugs.openjdk.org/browse/JDK-8374195). Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29162#issuecomment-3737755653 From adinn at openjdk.org Mon Jan 12 10:36:58 2026 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 12 Jan 2026 10:36:58 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: <6Aj0qmsVSD0iVIeK3Nesrbdzhn3r6OKfOnnGAeV66SQ=.2370d05a-0213-4d2a-8659-284c11a449dc@github.com> On Thu, 8 Jan 2026 23:18:19 GMT, Coleen Phillimore wrote: > Save this address so that the code that uses it in the valhalla repo will have it. This doesn't fail in the main jdk repository. > See CR for more information. > Tested with tier1-4. Aleksey is right that the current code assumes that it can AOT-save and restore i2c/c2i adapters while employing different GCs at each end of the operation because "i2c/c2i adapters are GC-neutral". The truth of that statement relies on the observation that *in both mainline and premain* calls to `resolve_weak_handle` with flags `IN_NATIVE | ON_PHANTOM_OOP_REF` do not generate GC-specific code. The above call occurs when planting c2i code to test whether a target method holder is null/non-null, i.e. whether the method is still alive. So, the fact that you get a missing function in Valhalla confirms that Valhalla is planting a GC-specific barrier for this case.
Adding the missing function to the extrs list will indeed allow a reference to the function to be detected/marked at AOT-save and relinked at AOT-restore, making the current AOT save/restore work in Valhalla. However, it will break the assumption that the code is GC-neutral, which means switching GCs between assembly time and run time will fail. This addition should indeed raise no error in mainline, since the mainline barrier should not plant a reference to the function. I'd recommend that the fix be added only in Valhalla rather than mainline, since the current error serves as a canary, indicating that the assumption above has been invalidated. In the longer term, we need to find some way of making code in the AOT cache GC-neutral or give up on being able to switch GCs. I have a patch pending to save/restore all the enumerated stubs (Shared/C1/C2/StubGen), at which point we will definitely lose GC-neutrality unless we implement some barrier patching mechanism; and we will soon be adding nmethods to the cache, exacerbating the problem yet further. We are still discussing options for this in the Leyden dev meetings.
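The save/relink step described above can be modelled as a table that maps external addresses to stable indices. The sketch below is hypothetical C++: the class and method names are invented, and it is not HotSpot's AOT code cache address table. It assumes both processes register the same entries in the same order, so an index written at save time resolves to the matching address at load time. An address missing from the table has no index, which corresponds to the failure seen in Valhalla:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical external-address table; names are illustrative only.
// Each process registers the same entries in the same order, so an
// index is stable across processes even though raw addresses are not.
class ExternalAddressTable {
 public:
  void add(const void* addr) { entries_.push_back(addr); }

  // Save time: map an absolute address to its index, or -1 if the
  // address was never registered (the reference cannot be relocated).
  int id_for(const void* addr) const {
    for (std::size_t i = 0; i < entries_.size(); i++) {
      if (entries_[i] == addr) {
        return static_cast<int>(i);
      }
    }
    return -1;
  }

  // Load time: map a saved index back to this process's address.
  const void* address_for(int id) const {
    if (id < 0 || static_cast<std::size_t>(id) >= entries_.size()) {
      return nullptr;
    }
    return entries_[static_cast<std::size_t>(id)];
  }

 private:
  std::vector<const void*> entries_;
};
```

Such a table only fixes relocation. It does not make the referenced code GC-neutral: if saved code calls a GC-specific runtime entry, loading it under a different GC is still wrong, which is the concern raised in the message above.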
------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3737864179 From adinn at openjdk.org Mon Jan 12 10:39:35 2026 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 12 Jan 2026 10:39:35 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression uses incorrect assertions In-Reply-To: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: On Fri, 9 Jan 2026 14:41:07 GMT, Ferenc Rakoczi wrote: > The preconditions for the aarch64 and the AVX-512 intrinsic implementations of the implKyber12To16() method of com.sun.crypto.provider.ML_KEM are different and the AVX-512 one has stricter preconditions on the input, which was not recorded in the assert() before calling the function (although they were satisfied by all calling code). Now the assert() is corrected, and with these preconditions, the aarch64 implementation is simplified. Changes look good. What testing have you run? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29141#issuecomment-3737885979 From roland at openjdk.org Mon Jan 12 11:08:38 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 12 Jan 2026 11:08:38 GMT Subject: [jdk26] RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph Message-ID: Hi all, This pull request contains a backport of commit [6ae3e064](https://github.com/openjdk/jdk/commit/6ae3e064352a56c5be140fba1ad6d040219432b0) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Roland Westrelin on 5 Jan 2026 and was reviewed by Christian Hagedorn and Dean Long. Thanks! 
------------- Commit messages: - Backport 6ae3e064352a56c5be140fba1ad6d040219432b0 Changes: https://git.openjdk.org/jdk/pull/29166/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29166&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373508 Stats: 159 lines in 3 files changed: 159 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29166.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29166/head:pull/29166 PR: https://git.openjdk.org/jdk/pull/29166 From mdoerr at openjdk.org Mon Jan 12 11:14:38 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 12 Jan 2026 11:14:38 GMT Subject: [jdk26] RFR: 8374195: TestReplaceNarrowPhiWithBottomPhi fails on ppc64 platforms in (fast)debug In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 09:57:46 GMT, Martin Doerr wrote: > Clean backport of [JDK-8374195](https://bugs.openjdk.org/browse/JDK-8374195). The GHA error is "No space left on device". ------------- PR Comment: https://git.openjdk.org/jdk/pull/29162#issuecomment-3738020338 From shade at openjdk.org Mon Jan 12 11:14:38 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Jan 2026 11:14:38 GMT Subject: [jdk26] RFR: 8374195: TestReplaceNarrowPhiWithBottomPhi fails on ppc64 platforms in (fast)debug In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 11:09:49 GMT, Martin Doerr wrote: > The GHA error is "No space left on device". Yup, will be fixed in the JDK 26 branch with https://github.com/openjdk/jdk/pull/29161.
------------- PR Comment: https://git.openjdk.org/jdk/pull/29162#issuecomment-3738025993 From epeter at openjdk.org Mon Jan 12 11:20:47 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 Jan 2026 11:20:47 GMT Subject: RFR: 8374785: Template Library: need to tag Float16.copySign as having non-deterministic result because of multiple NaNs with different sign bits In-Reply-To: <_fmn_POaQ-XvbVSkK3jUNvq32Hr1cPfs68I0EDsZb8s=.56241967-8443-46da-8053-d4aec8077fec@github.com> References: <_fmn_POaQ-XvbVSkK3jUNvq32Hr1cPfs68I0EDsZb8s=.56241967-8443-46da-8053-d4aec8077fec@github.com> Message-ID: On Mon, 12 Jan 2026 09:41:32 GMT, Quan Anh Mai wrote: >> This is a very similar issue to https://github.com/openjdk/jdk/pull/29033 / [JDK-8374489](https://bugs.openjdk.org/browse/JDK-8374489). >> >> There are `NaN` encodings that have the sign bit set, and others that do not. >> If we copy the sign from such a `NaN` to a numeric value (e.g. `1`), we can get `1` or `-1`. >> >> >> jshell> var a = Float.NaN; >> a ==> NaN >> jshell> var b = Float.intBitsToFloat(0xFFC00000); >> b ==> NaN >> jshell> Math.copySign(1f, a) >> ==> 1.0 >> jshell> Math.copySign(1f, b) >> ==> -1.0 >> jshell> Float16.copySign(Float16.valueOf(1f), Float16.valueOf(a)) >> ==> 1.0 >> jshell> Float16.copySign(Float16.valueOf(1f), Float16.valueOf(b)) >> ==> -1.0 >> >> >> Since `NaN` values of different encodings are interchangeable, we cannot know which `NaN` we get; its sign bit is arbitrary, so we also cannot know the sign of the result of `Float16.copySign`. We have to mark it as non-deterministic and hence disable result verification. >> >> Since this is a test bug, I have no regression test. But I verified manually that, with the same seed (for the ExpressionFuzzer) that failed before this change, the test now succeeds after the change. > > Marked as reviewed by qamai (Committer). @merykitty @TobiHartmann Thanks for the reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/29118#issuecomment-3738052994 From epeter at openjdk.org Mon Jan 12 11:20:48 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 Jan 2026 11:20:48 GMT Subject: Integrated: 8374785: Template Library: need to tag Float16.copySign as having non-deterministic result because of multiple NaNs with different sign bits In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 14:50:46 GMT, Emanuel Peter wrote: > This is a very similar issue to https://github.com/openjdk/jdk/pull/29033 / [JDK-8374489](https://bugs.openjdk.org/browse/JDK-8374489). > > There are `NaN` encodings that have the sign bit set, and others that do not. > If we copy the sign from such a `NaN` to a numeric value (e.g. `1`), we can get `1` or `-1`. > > > jshell> var a = Float.NaN; > a ==> NaN > jshell> var b = Float.intBitsToFloat(0xFFC00000); > b ==> NaN > jshell> Math.copySign(1f, a) > ==> 1.0 > jshell> Math.copySign(1f, b) > ==> -1.0 > jshell> Float16.copySign(Float16.valueOf(1f), Float16.valueOf(a)) > ==> 1.0 > jshell> Float16.copySign(Float16.valueOf(1f), Float16.valueOf(b)) > ==> -1.0 > > > Since `NaN` values of different encodings are interchangeable, we cannot know which `NaN` we get; its sign bit is arbitrary, so we also cannot know the sign of the result of `Float16.copySign`. We have to mark it as non-deterministic and hence disable result verification. > > Since this is a test bug, I have no regression test. But I verified manually that, with the same seed (for the ExpressionFuzzer) that failed before this change, the test now succeeds after the change. This pull request has now been integrated.
Changeset: 2fbe4755 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/2fbe47559e9ba45306bd08c3636647f865a75abd Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8374785: Template Library: need to tag Float16.copySign as having non-deterministic result because of multiple NaNs with different sign bits Reviewed-by: thartmann, qamai ------------- PR: https://git.openjdk.org/jdk/pull/29118 From thartmann at openjdk.org Mon Jan 12 11:21:06 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 12 Jan 2026 11:21:06 GMT Subject: [jdk26] RFR: 8373579: Problem list compiler/runtime/Test7196199.java In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 17:50:45 GMT, Benoît Maillard wrote: > Hi all, > > This pull request contains a backport of commit [a05d5d25](https://github.com/openjdk/jdk/commit/a05d5d2514c835f2bfeaf7a8c7df0ac241f0177f) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Benoît Maillard on 12 Dec 2025 and was reviewed by Christian Hagedorn and Emanuel Peter. > > Thanks! Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28798#pullrequestreview-3650270797 From bmaillard at openjdk.org Mon Jan 12 11:24:29 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 12 Jan 2026 11:24:29 GMT Subject: [jdk26] Integrated: 8373579: Problem list compiler/runtime/Test7196199.java In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 17:50:45 GMT, Benoît Maillard wrote: > Hi all, > > This pull request contains a backport of commit [a05d5d25](https://github.com/openjdk/jdk/commit/a05d5d2514c835f2bfeaf7a8c7df0ac241f0177f) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Benoît Maillard on 12 Dec 2025 and was reviewed by Christian Hagedorn and Emanuel Peter. > > Thanks! This pull request has now been integrated.
Changeset: e8f5d2f4 Author: Benoît Maillard URL: https://git.openjdk.org/jdk/commit/e8f5d2f4f726d62f05faf7b6985279ef37521f21 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8373579: Problem list compiler/runtime/Test7196199.java Reviewed-by: thartmann Backport-of: a05d5d2514c835f2bfeaf7a8c7df0ac241f0177f ------------- PR: https://git.openjdk.org/jdk/pull/28798 From thartmann at openjdk.org Mon Jan 12 12:02:04 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 12 Jan 2026 12:02:04 GMT Subject: [jdk26] RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 10:58:40 GMT, Roland Westrelin wrote: > Hi all, > > This pull request contains a backport of commit [6ae3e064](https://github.com/openjdk/jdk/commit/6ae3e064352a56c5be140fba1ad6d040219432b0) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 5 Jan 2026 and was reviewed by Christian Hagedorn and Dean Long. > > Thanks! Looks good and trivial to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29166#pullrequestreview-3650438958 From roland at openjdk.org Mon Jan 12 12:11:51 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 12 Jan 2026 12:11:51 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v5] In-Reply-To: References: Message-ID: <_qZm_vXhwEf_OcRMb72w4t7vk1XKxjxwc_8eO1SmJsk=.d5ed1803-78b5-403a-baea-bbc5567facc7@github.com> > The base input of `AddP` is expected to only be set for heap accesses, but I noticed some inconsistencies, so I added an assert in the `AddP` constructor and fixed the issues that it caught. AFAICT, the inconsistencies shouldn't create issues.
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/macroArrayCopy.cpp Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28769/files - new: https://git.openjdk.org/jdk/pull/28769/files/b20f41db..507b8f45 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28769&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28769&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28769/head:pull/28769 PR: https://git.openjdk.org/jdk/pull/28769 From thartmann at openjdk.org Mon Jan 12 12:22:27 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 12 Jan 2026 12:22:27 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v2] In-Reply-To: <-vRCcNHAqJCJwuCpAeev537_6oTiUvJ6HCSywIOuJ_g=.41c8903c-d23b-4bc5-a147-82b238fbb0db@github.com> References: <-vRCcNHAqJCJwuCpAeev537_6oTiUvJ6HCSywIOuJ_g=.41c8903c-d23b-4bc5-a147-82b238fbb0db@github.com> Message-ID: On Mon, 12 Jan 2026 09:52:47 GMT, Galder Zamarreño wrote: >> src/hotspot/share/opto/phaseX.cpp line 2609: >> >>> 2607: for (DUIterator_Fast i2max, i2 = use->fast_outs(i2max); i2 < i2max; i2++) { >>> 2608: Node* u = use->fast_out(i2); >>> 2609: if (u->Opcode() == use->Opcode()) { >> >> So there are no Min(Max()) or Max(Min()) patterns we need to worry about? I was expecting this line to be >> >> if (u->is_MinMax()) { > > I've been looking at this PR again, having seen the issue in [JDK-8374896](https://bugs.openjdk.org/browse/JDK-8374896), and that scenario is exactly the one @dean-long mentions. I think fixing [JDK-8374896](https://bugs.openjdk.org/browse/JDK-8374896) can be done in the PR for that issue, and the fix is likely the one @dean-long suggests above.
Agree @TobiHartmann @dean-long? Agreed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28895#discussion_r2682042049 From stefank at openjdk.org Mon Jan 12 12:34:35 2026 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 12 Jan 2026 12:34:35 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: <6Aj0qmsVSD0iVIeK3Nesrbdzhn3r6OKfOnnGAeV66SQ=.2370d05a-0213-4d2a-8659-284c11a449dc@github.com> References: <6Aj0qmsVSD0iVIeK3Nesrbdzhn3r6OKfOnnGAeV66SQ=.2370d05a-0213-4d2a-8659-284c11a449dc@github.com> Message-ID: On Mon, 12 Jan 2026 10:31:52 GMT, Andrew Dinn wrote: > Aleksey is right that the current code assumes that it can AOT-save and restore i2c/c2i adapters while employing different GCs at each end of the operation because "i2c/c2i adapters are GC-neutral". The truth of that statement relies on the observation that in both mainline and premain calls to resolve_weak_handle with flags IN_NATIVE | ON_PHANTOM_OOP_REF do not generate GC-specific code. The above call occurs when planting c2i code to test whether a target method holder is null/non-null, i.e. whether the method is still alive. Could you clarify what I'm missing here? This is the code we are talking about:

void MacroAssembler::resolve_weak_handle(Register result, Register tmp1, Register tmp2) {
  assert_different_registers(result, tmp1, tmp2);
  Label resolved;

  // A null weak handle resolves to null.
  cbz(result, resolved);

  // Only 64 bit platforms support GCs that require a tmp register
  // WeakHandle::resolve is an indirection like jweak.
  access_load_at(T_OBJECT, IN_NATIVE | ON_PHANTOM_OOP_REF,
                 result, Address(result), tmp1, tmp2);
  bind(resolved);
}

which calls:

void MacroAssembler::access_load_at(BasicType type, DecoratorSet decorators,
                                    Register dst, Address src,
                                    Register tmp1, Register tmp2) {
  BarrierSetAssembler *bs = BarrierSet::barrier_set()->barrier_set_assembler();
  decorators = AccessInternal::decorator_fixup(decorators, type);
  bool as_raw = (decorators & AS_RAW) != 0;
  if (as_raw) {
    bs->BarrierSetAssembler::load_at(this, decorators, type, dst, src, tmp1, tmp2);
  } else {
    bs->load_at(this, decorators, type, dst, src, tmp1, tmp2);
  }
}

and the `bs->load_at` call is a virtual call that could call into `ZBarrierSetAssembler::load_at`. It is still unclear to me in what way this is not GC-specific. Sorry if I'm missing something obvious here and thereby derailing this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3738330153 From shade at openjdk.org Mon Jan 12 12:46:59 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Jan 2026 12:46:59 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code [v9] In-Reply-To: References: Message-ID: > We use CTW testing to make sure compilers behave well. But we compile code that is not executed at all, and since our inlining heuristics often look at profiles, we end up not inlining very much at all! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations. > > There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. Accepting just the cold methods keeps CTW times reasonable, eating up the improvements we have recently committed in mainline. And it still finds bugs. See the RFE for sample data. > > After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits.
Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, compilation times suffer so much that such runs are impractical in standard configurations; see the data in the RFE. We will enable some of that testing in special testing pipelines. > > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in the RFE as well: Xcomp causes _a lot_ of stray compilations for the JDK and the CTW infra itself. For small JARs in a large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it: it provides the base case of inlining most reasonable methods, and then allows the stress infra to inline some more on top of that. > > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains one commit: Fix ------------- Changes: https://git.openjdk.org/jdk/pull/26068/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=08 Stats: 18 lines in 3 files changed: 18 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068 PR: https://git.openjdk.org/jdk/pull/26068 From adinn at openjdk.org Mon Jan 12 14:07:41 2026 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 12 Jan 2026 14:07:41 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: <6Aj0qmsVSD0iVIeK3Nesrbdzhn3r6OKfOnnGAeV66SQ=.2370d05a-0213-4d2a-8659-284c11a449dc@github.com> Message-ID: On Mon, 12 Jan 2026 12:30:43 GMT, Stefan Karlsson wrote: > and the bs->load_at call is a virtual call that could call into ZBarrierSetAssembler::load_at. > > It is still unclear to me in what way this is not GC-specific. Sorry if I'm missing something obvious here and thereby derailing this PR. Yes, I understand that there is a virtual call that enters a GC-specific barrier implementation. However, the issue is not whether GC-specific code is executed at generation time but whether the generated code that results differs from one GC to the next. When we added save and restore of i2c/c2i stubs to the AOT cache, all the GCs for which AOT caching was then an option generated the same code if the flags were passed as `IN_NATIVE | ON_PHANTOM_OOP_REF`, i.e. a plain load (use of ZGC when generating or consuming an AOT cache was not an option at that point). That meant that i2c/c2i adapter code generated in the assembly run and saved to the cache was suitable for use in a production run whatever the GC setting. Are you saying that for this specific flag combination the code generated by `ZBarrierSetAssembler::load_at` as currently implemented in *mainline* is more than just a simple load?
Clearly, that is true in the Valhalla tree where the error reported by @coleenp indicates the barrier is generating code that references `ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded_addr` etc. However, she suggested this was not an issue in mainline. Does the mainline code differ in some other way? If so then this would mean that the assumption made above has been weakened and that an AOT cache generated with ZGC enabled can only be used in a production run using ZGC and vice versa. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3738724324 From adinn at openjdk.org Mon Jan 12 14:14:19 2026 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 12 Jan 2026 14:14:19 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 23:18:19 GMT, Coleen Phillimore wrote: > Save this address so that the code that uses it in the valhalla repo will have it. This doesn't fail in main jdk repository. > See CR for more information. > Tested with tier1-4. Also: The ability to use a different GC in assembly and production runs is not something we necessarily need to support. Switching in and out alternative JVM configurations is arguably a bad idea in that it is likely to increase the divergence between what gets run in training and what gets executed in production. However, we do need to clearly document what will not work and, in cases where it might cause a potential error, reject use of an AOT cache. So, it would be good to confirm what the precise limitations are in the case of ZGC. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3738746466 From stefank at openjdk.org Mon Jan 12 14:27:53 2026 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 12 Jan 2026 14:27:53 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: <6Aj0qmsVSD0iVIeK3Nesrbdzhn3r6OKfOnnGAeV66SQ=.2370d05a-0213-4d2a-8659-284c11a449dc@github.com> Message-ID: <421ijmW9qZwmopebtIuH9ZNG9vp6nRvRnvI7Q4E9H_4=.570b3b20-2c0a-4a7a-b7ff-db6b5eed60e5@github.com> On Mon, 12 Jan 2026 14:05:38 GMT, Andrew Dinn wrote: > Are you saying that for this specific flag combination the code generated by ZBarrierSetAssembler::load_at as currently implemented in mainline is more than just a simple load? Yes, that's what I'm trying to say. If we follow `ZBarrierSetAssembler::load_at`, we see the call: ` __ call_VM_leaf(ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded_addr(decorators), 2);`

```cpp
void ZBarrierSetAssembler::load_at(MacroAssembler* masm,
                                   DecoratorSet decorators,
                                   BasicType type,
                                   Register dst,
                                   Address src,
                                   Register tmp1,
                                   Register tmp2) {
  if (!ZBarrierSet::barrier_needed(decorators, type)) {
    // Barrier not needed
    BarrierSetAssembler::load_at(masm, decorators, type, dst, src, tmp1, tmp2);
    return;
  }

  assert_different_registers(tmp1, tmp2, src.base(), noreg);
  assert_different_registers(tmp1, tmp2, src.index());
  assert_different_registers(tmp1, tmp2, dst, noreg);
  assert_different_registers(tmp2, rscratch1);

  Label done;
  Label uncolor;

  // Load bad mask into scratch register.
  const bool on_non_strong =
    (decorators & ON_WEAK_OOP_REF) != 0 ||
    (decorators & ON_PHANTOM_OOP_REF) != 0;

  if (on_non_strong) {
    __ ldr(tmp1, mark_bad_mask_from_thread(rthread));
  } else {
    __ ldr(tmp1, load_bad_mask_from_thread(rthread));
  }

  __ lea(tmp2, src);
  __ ldr(dst, tmp2);

  // Test reference against bad mask. If mask bad, then we need to fix it up.
  __ tst(dst, tmp1);
  __ br(Assembler::EQ, uncolor);

  {
    // Call VM
    ZRuntimeCallSpill rcs(masm, dst);

    if (c_rarg0 != dst) {
      __ mov(c_rarg0, dst);
    }
    __ mov(c_rarg1, tmp2);

    __ call_VM_leaf(ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded_addr(decorators), 2);
  }
```

> If so then this would mean that the assumption made above has been weakened and that an AOT cache generated with ZGC enabled can only be used in a production run using ZGC and vice versa. OK. Maybe @fisk has thought about this in the context of Leyden? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3738806713 From stefank at openjdk.org Mon Jan 12 14:35:48 2026 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 12 Jan 2026 14:35:48 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 14:10:39 GMT, Andrew Dinn wrote: > Also: The ability to use a different GC in assembly and production runs is not something we necessarily need to support. Switching in and out alternative JVM configurations is arguably a bad idea in that it is likely to increase the divergence between what gets run in training and what gets executed in production. However, we do need to clearly document what will not work and, in cases where it might cause a potential error, reject use of an AOT cache. So, it would be good to confirm what the precise limitations are in the case of ZGC. There is still something unresolved here. For weak handles we have two operations: 1) "resolve": ZGC, Shenandoah and G1 have GC-specific code that handles this; 2) "peek": maybe only ZGC has GC-specific code for this. `resolve_weak_handle` is a "resolve" operation, and the three mentioned GCs have GC-specific code for that. At the same time, Aleksey mentioned that we were mostly "peeking" in OopHandles. So, I'm just wondering if we have started "resolving" OopHandles, where we previously only "peeked"?
(I have not tried to look at the code or follow the history for this.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3738834130 From shade at openjdk.org Mon Jan 12 14:38:26 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Jan 2026 14:38:26 GMT Subject: [jdk26] RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 10:58:40 GMT, Roland Westrelin wrote: > Hi all, > > This pull request contains a backport of commit [6ae3e064](https://github.com/openjdk/jdk/commit/6ae3e064352a56c5be140fba1ad6d040219432b0) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 5 Jan 2026 and was reviewed by Christian Hagedorn and Dean Long. > > Thanks! Marked as reviewed by shade (Reviewer). GHA failure is due to (the absence of) https://github.com/openjdk/jdk/pull/29161, which is now in JDK 26. ------------- PR Review: https://git.openjdk.org/jdk/pull/29166#pullrequestreview-3651080408 PR Comment: https://git.openjdk.org/jdk/pull/29166#issuecomment-3738842582 From rcastanedalo at openjdk.org Mon Jan 12 14:44:01 2026 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 12 Jan 2026 14:44:01 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v13] In-Reply-To: References: Message-ID: On Tue, 23 Dec 2025 18:08:10 GMT, Quan Anh Mai wrote: > What do you think? Is leaving those default constructions fine, or which is the more preferable solution? Thanks! Leaving the default constructions should be fine, I think; we should not compromise code readability beyond what is necessary.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3738865498 From rcastanedalo at openjdk.org Mon Jan 12 14:44:05 2026 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 12 Jan 2026 14:44:05 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v16] In-Reply-To: <2EiCLUhS6yKEXp8RQlAOmaZrwvxPghYYw6u_PW5m2iM=.3ad470ab-c926-4fe4-bf12-8872ce22c7ef@github.com> References: <2EiCLUhS6yKEXp8RQlAOmaZrwvxPghYYw6u_PW5m2iM=.3ad470ab-c926-4fe4-bf12-8872ce22c7ef@github.com> Message-ID: On Mon, 29 Dec 2025 14:53:46 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch is an alternative to #28764 but it does the analysis during IGVN instead. >> >> ## The current PR: >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. >> >> I do not see a noticeable difference in C2 runtime with and without this patch. >> >> ## Future work: >> >> 1. Fold a memory `Phi`. >> >> This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. >> >> 2. Fold a pointer `Phi`.
>> >> Currently, this PR is doing the trivial approach, just give up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: >> >> Point p1 = new Point; >> Point p2 = new Point; >> p1.x = v1; >> p2.x = v2; >> Point p = Phi(p1, p2); >> int a = p.x; >> >> Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. >> >> Another interesting case: >> >> Point p = Phi(p1, p2); >> p.x = v; >> p1.x = v1; >> int a = p.x; >> >> Then, theoretically, we can fold `a` to `Phi(v1, v)` if `p1` and `p2` are known not to alias. >> >> 3. Nested objects >> >> It can be observed that if an object is stored into a memory that has not escaped, then it can be considered that the object has not escaped. For example: >> >> Point p = new Point; >> PointHolder h = new PointHolder; >> h.p = p; >> int x = p.x; >> escape(h); >> >> Then, `p` can be considered that it has not escaped until `escape(h)`. To do this, the computation of `_aliases` in the constructor of `LocalEA` needs to be more comprehensive. See the comments in `LocalEA::check_escape_status`. >> >> Please... > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Fix escape at store src/hotspot/share/opto/memnode.cpp line 708: > 706: > 707: Node* mem = in(MemNode::Memory); // start searching here... > 708: Would it make sense to check and bail out early for some trivial non-candidates here? It feels a bit wasteful e.g. to run the LocalEA machinery for loads from `ThreadLocal`. 
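The escape test described in the PR (an allocation has not escaped at a call if no escaping use has a control input that is a transitive control input of that call) can be pictured with a toy control graph. This is a minimal sketch under stated assumptions: `CtrlNode`, `is_transitive_control_input` and `not_escaped_at` are hypothetical names, each node records only its control inputs, and an escaping use is represented by its control input. It is not the LocalEA code from the PR.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Toy control node (hypothetical): `inputs` are its control predecessors.
// Not C2's Node class; only the control structure is modeled.
struct CtrlNode {
  std::vector<const CtrlNode*> inputs;
};

// True if `target` is reachable from `start` by repeatedly following control
// inputs, i.e. `target` is a transitive control input of `start`.
// `start` itself does not count unless a loop leads back to it.
bool is_transitive_control_input(const CtrlNode* target, const CtrlNode* start) {
  std::vector<const CtrlNode*> worklist{start};
  std::unordered_set<const CtrlNode*> visited;
  while (!worklist.empty()) {
    const CtrlNode* n = worklist.back();
    worklist.pop_back();
    if (!visited.insert(n).second) {
      continue;  // already explored; keeps the walk finite in loops
    }
    for (const CtrlNode* in : n->inputs) {
      if (in == target) {
        return true;
      }
      worklist.push_back(in);
    }
  }
  return false;
}

// `escape_ctrls` holds the control input of every use that makes the
// allocation escape. If none of them is a transitive control input of `call`,
// no escape can have happened before `call`, so a load may look past `call`
// when searching for the store it observes.
bool not_escaped_at(const CtrlNode* call, const std::vector<const CtrlNode*>& escape_ctrls) {
  for (const CtrlNode* c : escape_ctrls) {
    if (is_transitive_control_input(c, call)) {
      return false;
    }
  }
  return true;
}
```

For a straight-line chain `start -> alloc -> call1 -> call2` whose only escaping use is controlled by `call1`, the model reports that the allocation has not escaped at `call1` (the escape happens after it) but may have escaped by `call2`.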
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2682571384 From aseoane at openjdk.org Mon Jan 12 14:50:38 2026 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Mon, 12 Jan 2026 14:50:38 GMT Subject: RFR: 8280283: Dead compiler code found during the JDK-8272058 code review [v2] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 10:08:33 GMT, Anton Seoane Ampudia wrote: >> This PR removes some dead code that was found during review for [JDK-8272058](https://bugs.openjdk.org/browse/JDK-8272058). >> >> `target_addr_for_insn_or_null` is never run with an `ldrw` to `zr` (i.e. a safepoint poll). This is just a remnant from global safepointing, before we moved to using thread-local handshakes. No safepoint polling code reaches this function. More information can be read in the [original code review](https://github.com/openjdk/jdk18/pull/51#discussion_r774922087). Additionally, I have run tiers 1-6 to make sure this path is never exercised. >> >> This changeset also cleans up the unused `is_nop` function, following the comments in the issue. Other dead code mentioned there has long since disappeared. >> >> **Testing:** passes tiers 1-4 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Delete more unused code Any reviews, please? (@theRealAph maybe?) Any reviews, please? (@theRealAph maybe?)
-- [my last comment got ignored by the bridgebot] ------------- PR Comment: https://git.openjdk.org/jdk/pull/28473#issuecomment-3660873945 PR Comment: https://git.openjdk.org/jdk/pull/28473#issuecomment-3738906326 From qamai at openjdk.org Mon Jan 12 15:05:29 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 12 Jan 2026 15:05:29 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v17] In-Reply-To: References: Message-ID: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > ## The current PR: > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. > > I do not see a noticeable difference in C2 runtime with and without this patch. > > ## Future work: > > 1. Fold a memory `Phi`. > > This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. > > 2. Fold a pointer `Phi`. 
> > Currently, this PR is doing the trivial approach, just give up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: > > Point p1 = new Point; > Point p2 = new Point; > p1.x = v1; > p2.x = v2; > Point p = Phi(p1, p2); > int a = p.x; > > Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. > > Another interesting case: > > Point p = Phi(p1, p2); > p.x = v; > p1.x = v1; > int a = p.x; > > Then, theoretically, we can fold `a` to `Phi(v1, v)` if `p1` and `p2` are known not to alias. > > 3. Nested objects > > It can be observed that if an object is stored into a memory that has not escaped, then it can be considered that the object has not escaped. For example: > > Point p = new Point; > PointHolder h = new PointHolder; > h.p = p; > int x = p.x; > escape(h); > > Then, `p` can be considered that it has not escaped until `escape(h)`. To do this, the computation of `_aliases` in the constructor of `LocalEA` needs to be more comprehensive. See the comments in `LocalEA::check_escape_status`. > > Please take a look and leave your thoughts, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: - Merge branch 'master' into loadfoldingigvn - Early return when not a heap access - Fix escape at store - Fix outdated and unclear comments - copyright year, return, comments, whitespace - Merge branch 'master' into loadfoldingigvn - ea of phis and nested objects - Add test scenarios - Add a flag to turn off the feature - Much more comments, refactor the data into a separate class - ... 
and 9 more: https://git.openjdk.org/jdk/compare/a371025e...c275e6e6 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/06fb10fe..c275e6e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=15-16 Stats: 18971 lines in 1438 files changed: 3853 ins; 2027 del; 13091 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From qamai at openjdk.org Mon Jan 12 15:05:31 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 12 Jan 2026 15:05:31 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v16] In-Reply-To: References: <2EiCLUhS6yKEXp8RQlAOmaZrwvxPghYYw6u_PW5m2iM=.3ad470ab-c926-4fe4-bf12-8872ce22c7ef@github.com> Message-ID: On Mon, 12 Jan 2026 14:41:30 GMT, Roberto Castañeda Lozano wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix escape at store > > src/hotspot/share/opto/memnode.cpp line 708: > >> 706: >> 707: Node* mem = in(MemNode::Memory); // start searching here... >> 708: > > Would it make sense to check and bail out early for some trivial non-candidates here? It feels a bit wasteful e.g. to run the LocalEA machinery for loads from `ThreadLocal`. That seems reasonable. I added an early return case when the base that is accessed is not an oop.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2682643157 From adinn at openjdk.org Mon Jan 12 15:06:47 2026 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 12 Jan 2026 15:06:47 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: <421ijmW9qZwmopebtIuH9ZNG9vp6nRvRnvI7Q4E9H_4=.570b3b20-2c0a-4a7a-b7ff-db6b5eed60e5@github.com> References: <6Aj0qmsVSD0iVIeK3Nesrbdzhn3r6OKfOnnGAeV66SQ=.2370d05a-0213-4d2a-8659-284c11a449dc@github.com> <421ijmW9qZwmopebtIuH9ZNG9vp6nRvRnvI7Q4E9H_4=.570b3b20-2c0a-4a7a-b7ff-db6b5eed60e5@github.com> Message-ID: On Mon, 12 Jan 2026 14:24:06 GMT, Stefan Karlsson wrote: > If we follow the ZBarrierSetAssembler::load_at we see the call: > ` __ call_VM_leaf(ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded_addr(decorators), 2);` Hmm, well, if that call is being generated then any attempt to save an AOT cache when using ZGC ought to fail, as it did in the Valhalla case, because the generated code will reference an unknown external address. I just checked head and `load_barrier_on_oop_field_preloaded_addr` is not added as an external address. Oddly, though, `load_barrier_on_phantom_oop_field_preloaded_addr` is included -- it ought not to be needed as no saved code should be referring to it. That may be a hangover from when the code was copied from Leyden premain, but it also suggests that maybe something has changed in the Z barrier code and the AOT support is not up to date with it?
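The failure mode described here (saved code referencing an address the cache does not know about) can be modeled in a few lines. This sketch assumes only that saving generated code fails when one of its call targets is missing from a table of known external addresses; `ExternalAddressTable`, `can_save` and the two barrier functions are hypothetical stand-ins, not the AOT code cache's actual API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for two runtime barrier entry points. The bodies
// differ so that the two functions keep distinct addresses.
static int g_barrier_calls = 0;
void load_barrier_strong() { g_barrier_calls += 1; }
void load_barrier_phantom() { g_barrier_calls += 2; }

using Entry = void (*)();

// Hypothetical table of external addresses that saved code may reference.
struct ExternalAddressTable {
  std::vector<Entry> entries;

  void set_address(Entry e) { entries.push_back(e); }

  bool contains(Entry e) const {
    for (Entry known : entries) {
      if (known == e) {
        return true;
      }
    }
    return false;
  }
};

// Saving generated code succeeds only if every call target it references is
// registered; otherwise the target is an "unknown external address".
bool can_save(const ExternalAddressTable& table, const std::vector<Entry>& call_targets) {
  for (Entry target : call_targets) {
    if (!table.contains(target)) {
      return false;
    }
  }
  return true;
}
```

In the model, registering only the phantom-reference entry is enough for code that calls nothing else; code that also calls the other barrier entry cannot be saved until that entry is registered too.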
------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3739011831 From stefank at openjdk.org Mon Jan 12 15:24:03 2026 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 12 Jan 2026 15:24:03 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: <6Aj0qmsVSD0iVIeK3Nesrbdzhn3r6OKfOnnGAeV66SQ=.2370d05a-0213-4d2a-8659-284c11a449dc@github.com> <421ijmW9qZwmopebtIuH9ZNG9vp6nRvRnvI7Q4E9H_4=.570b3b20-2c0a-4a7a-b7ff-db6b5eed60e5@github.com> Message-ID: On Mon, 12 Jan 2026 15:04:42 GMT, Andrew Dinn wrote: > > If we follow the ZBarrierSetAssembler::load_at we see the call: > > ` __ call_VM_leaf(ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded_addr(decorators), 2);` > > Hmm, well if that call is being generated then any attempt to save an AOT cache when using ZGC ought fail as it did in the Valhalla case because the generated code will reference an unknown external address. I just checked head and `load_barrier_on_oop_field_preloaded_addr` is not added as an external address. Oddly though `load_barrier_on_phantom_oop_field_preloaded_addr` is included -- it ought not to be needed as no saved code should be referring to it. Ahh. I see now that there are two overloads named `load_barrier_on_oop_field_preloaded_addr` and the one I referred to above returns `load_barrier_on_phantom_oop_field_preloaded_addr`. 
This function could probably have a clearer name:

```cpp
address ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded_addr(DecoratorSet decorators) {
  if (decorators & AS_NO_KEEPALIVE) {
    if (decorators & ON_PHANTOM_OOP_REF) {
      return no_keepalive_load_barrier_on_phantom_oop_field_preloaded_addr();
    } else if (decorators & ON_WEAK_OOP_REF) {
      return no_keepalive_load_barrier_on_weak_oop_field_preloaded_addr();
    } else {
      assert((decorators & ON_STRONG_OOP_REF), "Expected type");
      // Normal loads on strong oop never keep objects alive
      return load_barrier_on_oop_field_preloaded_addr();
    }
  } else {
    if (decorators & ON_PHANTOM_OOP_REF) {
      return load_barrier_on_phantom_oop_field_preloaded_addr();
    } else if (decorators & ON_WEAK_OOP_REF) {
      return load_barrier_on_weak_oop_field_preloaded_addr();
    } else {
      assert((decorators & ON_STRONG_OOP_REF), "Expected type");
      return load_barrier_on_oop_field_preloaded_addr();
    }
  }
}
```

> That may be a hangover from when the code was copied from Leyden premain but it also suggests that maybe something has changed in the Z barrier code and the AOT support is not up to date with it? I don't think we have changed barriers in a long time.
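To see why the decorator overload can return a different entry than the zero-argument overload of the same name, here is a self-contained model of the dispatch quoted above. The decorator bit values are hypothetical (HotSpot's real `DecoratorSet` constants differ), the function name `select_load_barrier_entry` is invented for illustration, and it returns entry names instead of addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical decorator bits for this model only.
using DecoratorSet = std::uint64_t;
constexpr DecoratorSet ON_STRONG_OOP_REF  = DecoratorSet{1} << 0;
constexpr DecoratorSet ON_WEAK_OOP_REF    = DecoratorSet{1} << 1;
constexpr DecoratorSet ON_PHANTOM_OOP_REF = DecoratorSet{1} << 2;
constexpr DecoratorSet AS_NO_KEEPALIVE    = DecoratorSet{1} << 3;
constexpr DecoratorSet IN_NATIVE          = DecoratorSet{1} << 4;

// Same branch structure as the quoted selector, returning the name of the
// chosen runtime entry. Strong-reference cases fall through to the default.
std::string select_load_barrier_entry(DecoratorSet decorators) {
  if (decorators & AS_NO_KEEPALIVE) {
    if (decorators & ON_PHANTOM_OOP_REF) {
      return "no_keepalive_load_barrier_on_phantom_oop_field_preloaded";
    }
    if (decorators & ON_WEAK_OOP_REF) {
      return "no_keepalive_load_barrier_on_weak_oop_field_preloaded";
    }
    // Normal loads on strong oops never keep objects alive.
    return "load_barrier_on_oop_field_preloaded";
  }
  if (decorators & ON_PHANTOM_OOP_REF) {
    return "load_barrier_on_phantom_oop_field_preloaded";
  }
  if (decorators & ON_WEAK_OOP_REF) {
    return "load_barrier_on_weak_oop_field_preloaded";
  }
  return "load_barrier_on_oop_field_preloaded";
}
```

For `IN_NATIVE | ON_PHANTOM_OOP_REF`, the combination discussed for the i2c/c2i adapters, the model selects the phantom-reference entry rather than the strong one.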
And another ahh (See the `()` at the end of the line): SET_ADDRESS(_extrs, ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded_addr()); SET_ADDRESS(_extrs, ZBarrierSetRuntime::load_barrier_on_phantom_oop_field_preloaded_addr()); The `SET_ADDRESS` is done with the value returned from calling the above functions. The Shenandoah barrier functions are referred to directly: SET_ADDRESS(_extrs, ShenandoahRuntime::write_barrier_pre); SET_ADDRESS(_extrs, ShenandoahRuntime::load_reference_barrier_phantom); SET_ADDRESS(_extrs, ShenandoahRuntime::load_reference_barrier_phantom_narrow); So, I think you need to make sure to look for reference to load_barrier_on_oop_field_preloaded and `load_barrier_on_phantom_oop_field_preloaded`. I don't know if that helps or not. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3739144000 From coleenp at openjdk.org Mon Jan 12 16:01:20 2026 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 12 Jan 2026 16:01:20 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: <0t_XRa5DYhFUJFlQdO3JuYBWqQgLP5I7f2cE7PJqRgM=.348276f4-9bb4-4dc1-bbb0-c33dbba7d2de@github.com> On Thu, 8 Jan 2026 23:18:19 GMT, Coleen Phillimore wrote: > Save this address so that the code that uses it in the valhalla repo will have it. This doesn't fail in main jdk repository. > See CR for more information. > Tested with tier1-4. I're read these comments and @TobiHartmann in the JBS issue: https://bugs.openjdk.org/browse/JDK-8374828?focusedId=14845838&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14845838 I had assumed that if the AOT cache was created with ZGC, it must be used with ZGC. I didn't think the adapters could be GC agnostic in the current code, or see no evidence of it. 
I could simply make this change in the Valhalla repo because it fixes crashes, but I wanted to do it in mainline because otherwise it will get lost and forgotten in Valhalla among other things. I could file a new issue to re-examine this (but is including the other ZGC barrier method, `load_barrier_on_phantom_oop_field_preloaded_addr`, wrong but not this one?). Should it be a mainline issue? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3739289225 From adinn at openjdk.org Mon Jan 12 17:39:35 2026 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 12 Jan 2026 17:39:35 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 23:18:19 GMT, Coleen Phillimore wrote: > Save this address so that the code that uses it in the valhalla repo will have it. This doesn't fail in main jdk repository. > See CR for more information. > Tested with tier1-4. Marked as reviewed by adinn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29129#pullrequestreview-3651923453 From adinn at openjdk.org Mon Jan 12 17:39:37 2026 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 12 Jan 2026 17:39:37 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: <6Aj0qmsVSD0iVIeK3Nesrbdzhn3r6OKfOnnGAeV66SQ=.2370d05a-0213-4d2a-8659-284c11a449dc@github.com> <421ijmW9qZwmopebtIuH9ZNG9vp6nRvRnvI7Q4E9H_4=.570b3b20-2c0a-4a7a-b7ff-db6b5eed60e5@github.com> Message-ID: On Mon, 12 Jan 2026 15:20:45 GMT, Stefan Karlsson wrote: > Ahh. I see now that there are two overloads named load_barrier_on_oop_field_preloaded_addr and the one I referred to above returns load_barrier_on_phantom_oop_field_preloaded_addr.
This function could probably have a clearer name Ok, so that explains why mainline is adding whatever address `load_barrier_on_phantom_oop_field_preloaded_addr()` returns to the external addresses list and hence why we don't see a crash in mainline when using an AOT cache with ZGC. I failed to spot that this was being done when Vladimir committed the associated patch to mainline. However, the fact that ZGC requires a barrier for an off-heap phantom oop load does mean that we must stick with ZGC in production when we use it in assembly and vice versa (while we can safely mix and match Serial, Parallel, G1 and Shenandoah GCs). @coleenp Looking at the JIRA, I understand now how these other two target addresses get embedded in adapters generated with the Valhalla code but are never referenced from adapters generated in mainline. There is no real danger of these two address registrations being forgotten about when the Valhalla i2c/c2i changes are upstreamed. All AOT save/restore tests will find that they are missing and throw an exception. It won't cause any harm to push this in mainline before Valhalla lands (although there is always the possibility that someone performing a zealous cleanup will take them out again ;-). So, go ahead and commit if you want. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3739700481 From missa at openjdk.org Mon Jan 12 19:23:06 2026 From: missa at openjdk.org (Mohamed Issa) Date: Mon, 12 Jan 2026 19:23:06 GMT Subject: [jdk26] RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX Message-ID: Hi all, This pull request contains a backport of commit [640343f7](https://github.com/openjdk/jdk/commit/640343f7d94894b0378ea5b1768eeac203a9aaf8) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Jatin Bhateja on 7 Jan 2026 and was reviewed by Sandhya Viswanathan. Thanks!
------------- Commit messages: - Backport 640343f7d94894b0378ea5b1768eeac203a9aaf8 Changes: https://git.openjdk.org/jdk/pull/29176/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29176&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373724 Stats: 78 lines in 1 file changed: 2 ins; 1 del; 75 mod Patch: https://git.openjdk.org/jdk/pull/29176.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29176/head:pull/29176 PR: https://git.openjdk.org/jdk/pull/29176 From dlunden at openjdk.org Mon Jan 12 19:38:30 2026 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 12 Jan 2026 19:38:30 GMT Subject: [jdk26] RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 19:12:34 GMT, Mohamed Issa wrote: > Hi all, > > This pull request contains a backport of commit [640343f7](https://github.com/openjdk/jdk/commit/640343f7d94894b0378ea5b1768eeac203a9aaf8) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Jatin Bhateja on 7 Jan 2026 and was reviewed by Sandhya Viswanathan. > > Thanks! Looks good, thanks! ------------- Marked as reviewed by dlunden (Committer). PR Review: https://git.openjdk.org/jdk/pull/29176#pullrequestreview-3652385884 From sviswanathan at openjdk.org Mon Jan 12 19:38:31 2026 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 12 Jan 2026 19:38:31 GMT Subject: [jdk26] RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 19:12:34 GMT, Mohamed Issa wrote: > Hi all, > > This pull request contains a backport of commit [640343f7](https://github.com/openjdk/jdk/commit/640343f7d94894b0378ea5b1768eeac203a9aaf8) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Jatin Bhateja on 7 Jan 2026 and was reviewed by Sandhya Viswanathan. > > Thanks! Looks good to me.
------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29176#pullrequestreview-3652408161 From qamai at openjdk.org Mon Jan 12 19:42:19 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 12 Jan 2026 19:42:19 GMT Subject: RFR: 8374435: "assert(addp->is_AddP()) failed: must be AddP" failed intermittently when running tools/jpackage/share/AsyncTest.java with ZGC Message-ID: Hi, This PR fixes the assert in `ConnectionGraph::get_addp_base` during escape analysis. The issue here is that, after splitting a `Load` through a `Phi`, we try to adjust the `ConnectionGraph` as the split creates new nodes. As we visit each input of `data_phi`, we fail to take into consideration the possibility that the input can be folded to a `Load` but not from an `AddP`. This is the case for `Object::getClass`, as we load the `OopHandle` from the `Klass` object, then load the class mirror from that `OopHandle`. Since we are loading from raw memory, the base is not a scalar replaceable Java object. Similar to the case below, we are done processing this input. Please take a look and leave your reviews, thanks a lot. 
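The guard described in this PR can be illustrated with a toy IR in which a `Phi` input may be a `Load` whose address is not an `AddP` (as with the load of the class mirror through an `OopHandle`). This is a hypothetical sketch: `Node`, `Op` and `field_load_inputs` are invented for illustration and are not `ConnectionGraph` code. Such inputs are treated as fully processed instead of being passed to a helper that assumes an `AddP`.

```cpp
#include <cassert>
#include <vector>

enum class Op { AddP, Load, Other };

// Toy IR node (hypothetical): for a Load, `addr` is its address input.
struct Node {
  Op op;
  const Node* addr = nullptr;
};

// Returns the Phi inputs that are field loads through an AddP, i.e. loads
// whose base object can be examined further. A Load whose address is not an
// AddP (for example a load from raw memory) is skipped: there is no
// scalar-replaceable base to track, so that input is done being processed.
std::vector<const Node*> field_load_inputs(const std::vector<const Node*>& phi_inputs) {
  std::vector<const Node*> result;
  for (const Node* in : phi_inputs) {
    if (in->op != Op::Load) {
      continue;
    }
    if (in->addr == nullptr || in->addr->op != Op::AddP) {
      continue;  // raw-memory load: do not hand it to AddP-only logic
    }
    result.push_back(in);
  }
  return result;
}
```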
------------- Commit messages: - Fix comment - Fix assert addp->is_AddP() during EA Changes: https://git.openjdk.org/jdk/pull/29177/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29177&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374435 Stats: 92 lines in 2 files changed: 91 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29177.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29177/head:pull/29177 PR: https://git.openjdk.org/jdk/pull/29177 From eosterlund at openjdk.org Mon Jan 12 22:50:33 2026 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 12 Jan 2026 22:50:33 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 23:18:19 GMT, Coleen Phillimore wrote: > Save this address so that the code that uses it in the valhalla repo will have it. This doesn't fail in main jdk repository. > See CR for more information. > Tested with tier1-4. I think AOT code caching will require the assembly and prod runs to use the same GC. This does not concern me too much. It seems like a weird use case to train in a different way than you are running. Just don't do that. So the use case is not obvious in a Leyden context, and fixing this is fiddly, although doable, and we have tried some T-shirts. But that would likely be out of scope of the first version of the AOT compiled code caching. What I care more about is the AOT cache shipped with the JDK, which does not embed any compiled code. It's important that these archives are offered for all GCs, despite not knowing which GC will be selected. Is that still the case?
------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3740859802 From duke at openjdk.org Mon Jan 12 22:57:43 2026 From: duke at openjdk.org (duke) Date: Mon, 12 Jan 2026 22:57:43 GMT Subject: [jdk26] RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 19:12:34 GMT, Mohamed Issa wrote: > Hi all, > > This pull request contains a backport of commit [640343f7](https://github.com/openjdk/jdk/commit/640343f7d94894b0378ea5b1768eeac203a9aaf8) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Jatin Bhateja on 7 Jan 2026 and was reviewed by Sandhya Viswanathan. > > Thanks! @missa-prime Your change (at version 3bf6ac253c2ca630844b1d9910c5a1c3763916c5) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29176#issuecomment-3740877807 From missa at openjdk.org Mon Jan 12 23:02:43 2026 From: missa at openjdk.org (Mohamed Issa) Date: Mon, 12 Jan 2026 23:02:43 GMT Subject: [jdk26] Integrated: 8373724: Assertion failure in TestSignumVector.java with UseAPX In-Reply-To: References: Message-ID: <1ygfCXcjicqEZp3pP23VhLphu_Fa6OsNRyJqp6K1y_E=.3bce68d5-eade-40bb-9d21-be31a1810bba@github.com> On Mon, 12 Jan 2026 19:12:34 GMT, Mohamed Issa wrote: > Hi all, > > This pull request contains a backport of commit [640343f7](https://github.com/openjdk/jdk/commit/640343f7d94894b0378ea5b1768eeac203a9aaf8) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Jatin Bhateja on 7 Jan 2026 and was reviewed by Sandhya Viswanathan. > > Thanks! This pull request has now been integrated. 
Changeset: 2da14e26 Author: Mohamed Issa Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/2da14e26e9a4c00bdee5603b5bfd975eab024d7e Stats: 78 lines in 1 file changed: 2 ins; 1 del; 75 mod 8373724: Assertion failure in TestSignumVector.java with UseAPX Reviewed-by: dlunden, sviswanathan Backport-of: 640343f7d94894b0378ea5b1768eeac203a9aaf8 ------------- PR: https://git.openjdk.org/jdk/pull/29176 From qamai at openjdk.org Tue Jan 13 03:31:36 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 Jan 2026 03:31:36 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v4] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 19:49:52 GMT, Chen Liang wrote: >> Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Move test, fix merge garbage > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const > - Typo > - assert > - refactorings > - Typo > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const > - Cleanup > - identity hash support in C2 > - ... and 2 more: https://git.openjdk.org/jdk/compare/91227712...67a3954f src/hotspot/share/ci/ciArray.cpp line 93: > 91: // Returns T_ILLEGAL if there is no element at the given index. 
> 92: ciConstant ciArray::element_value(int index) { > 93: assert(index >= 0, "out-of-bounds index: %d", index); IIUC, this is because you use `-1` as the offset for hashcode, so you need to make sure we are accessing a real element here, or the cache access will return something dubious. I think it is then more uniform to save the value at the cache using the offset instead of the element index. src/hotspot/share/ci/ciObject.cpp line 233: > 231: // Observed value is cached, so it doesn't change during compilation. > 232: ciConstant ciObject::identity_hash() { > 233: if (!is_null_object()) { Just a very small nitpick: Usually it is recommended to do an early return pattern instead. src/hotspot/share/ci/ciObject.hpp line 76: > 74: }; > 75: > 76: const int IDENTITY_HASH_OFFSET = -1; `const` is fine, but `constexpr` is often preferred. Also, is `static` needed here? Another nitpick is that constants are usually not in uppercase in C++, as macros are often in uppercase. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2684670938 PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2684645067 PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2684655292 From qamai at openjdk.org Tue Jan 13 03:31:36 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 Jan 2026 03:31:36 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v4] In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 03:16:44 GMT, Quan Anh Mai wrote: >> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 12 additional commits since the last revision: >> >> - Move test, fix merge garbage >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const >> - Typo >> - assert >> - refactorings >> - Typo >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const >> - Cleanup >> - identity hash support in C2 >> - ... and 2 more: https://git.openjdk.org/jdk/compare/91227712...67a3954f > > src/hotspot/share/ci/ciObject.hpp line 76: > >> 74: }; >> 75: >> 76: const int IDENTITY_HASH_OFFSET = -1; > > `const` is fine, but `constexpr` is often preferred. Also, is `static` needed here? Another nitpick is that constants are usually not in uppercase in C++, as macros are often in uppercase. It is also useful to note what this value is. It is not clear at first glance why offset is -1 here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2684658970 From galder at openjdk.org Tue Jan 13 05:07:53 2026 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 13 Jan 2026 05:07:53 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v9] In-Reply-To: References: Message-ID: <06u---XvC4cu2hGxf11yerkUbqN7mEQsQZFHpxrZEXQ=.b24603bb-192a-4dfd-9ea1-be7802f10382@github.com> On Mon, 12 Jan 2026 09:01:44 GMT, Galder Zamarreño wrote: >> Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably. >> >> Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added an `is_MinMax` node query to simplify the fix.
>> >> I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. >> >> To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). >> >> During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. >> >> I've run tier1-3 tests on linux/x64 successfully. > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Remove exclude or Min/Max in verify identity" > > This reverts commit cf24abad55db9a320930379c4f0f3154791d26e2. The CI failed with an infrastructure issue when building the mac debug build. Can someone move this PR forward? The hosted runner lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28895#issuecomment-3741959506 From galder at openjdk.org Tue Jan 13 05:34:46 2026 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 13 Jan 2026 05:34:46 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v8] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 18:34:58 GMT, Jatin Bhateja wrote: >> test/hotspot/jtreg/compiler/lib/generators/Generators.java line 375: >> >>> 373: * @return Random float16 generator. >>> 374: */ >>> 375: public Generator float16s() { >> >> Why do you not generate a `Float16` here?
This here would probably conflict with a future `Short` generator which we might add in the future... > > To avoid dependency on an incubating module. I was trying to use this API today and I was wondering the exact same thing. I would have expected this to be `Generator float16s()`. I can see @jatin-bhateja's point but this should have been noted in the code for future readers? Is this being tracked somewhere? Anyway, again for future readers, this is what I've done to actually get a `Float16[]`, to avoid the need to first generate a `short[]` and fill that before transforming it to `Float16[]`:

    private static Float16[] input_47 = new Float16[10000];
    private static final Generator GEN_input_47 = Generators.G.float16s();

    static void fill_input_47(Float16[] a) {
        for (int i = 0; i < a.length; i++) {
            a[i] = Float16.shortBitsToFloat16(GEN_input_47.next());
        }
    }

    static {
        fill_input_47(input_47);
    }

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r2684887233 From aboldtch at openjdk.org Tue Jan 13 06:51:59 2026 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 Jan 2026 06:51:59 GMT Subject: RFR: 8374450: GTest opto.canonicalize_constraints cannot run without VM In-Reply-To: <1ApOTqaq-f4GnIBX4NkCQAw9x8SOozI0Y0EouKZ23xM=.57c1b9c0-70c6-42d0-b568-d6e45bdb845a@github.com> References: <1ApOTqaq-f4GnIBX4NkCQAw9x8SOozI0Y0EouKZ23xM=.57c1b9c0-70c6-42d0-b568-d6e45bdb845a@github.com> Message-ID: On Mon, 12 Jan 2026 06:28:19 GMT, Axel Boldt-Christmas wrote: > The `opto.canonicalize_constraints` test explicitly uses symbols which are set up in `Type::Initialize_shared` (which seems to happen as a side effect of generating stubs at VM start. Also see comment in `Type::Initialize_shared`). So the test is required to be a VM test. > > * Testing > * GHA > * Verified that `opto.canonicalize_constraints` no longer crashes with a segmentation fault when run in isolation. > * Tier 1 on Oracle supported platforms Thanks for the reviews.
------------- PR Comment: https://git.openjdk.org/jdk/pull/29159#issuecomment-3742288012 From aboldtch at openjdk.org Tue Jan 13 06:54:23 2026 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 Jan 2026 06:54:23 GMT Subject: Integrated: 8374450: GTest opto.canonicalize_constraints cannot run without VM In-Reply-To: <1ApOTqaq-f4GnIBX4NkCQAw9x8SOozI0Y0EouKZ23xM=.57c1b9c0-70c6-42d0-b568-d6e45bdb845a@github.com> References: <1ApOTqaq-f4GnIBX4NkCQAw9x8SOozI0Y0EouKZ23xM=.57c1b9c0-70c6-42d0-b568-d6e45bdb845a@github.com> Message-ID: On Mon, 12 Jan 2026 06:28:19 GMT, Axel Boldt-Christmas wrote: > The `opto.canonicalize_constraints` test explicitly uses symbols which are set up in `Type::Initialize_shared` (which seems to happen as a side effect of generating stubs at VM start. Also see comment in `Type::Initialize_shared`). So the test is required to be a VM test. > > * Testing > * GHA > * Verified that `opto.canonicalize_constraints` no longer crashes with a segmentation fault when run in isolation. > * Tier 1 on Oracle supported platforms This pull request has now been integrated. Changeset: 586846b8 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/586846b84a38d285c5905437e903cfc57f609410 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8374450: GTest opto.canonicalize_constraints cannot run without VM Reviewed-by: qamai, thartmann, shade ------------- PR: https://git.openjdk.org/jdk/pull/29159 From hgreule at openjdk.org Tue Jan 13 07:35:28 2026 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 13 Jan 2026 07:35:28 GMT Subject: RFR: 8374180: C2 crash in PhaseCCP::verify_type - fatal error: Not monotonic [v3] In-Reply-To: References: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> Message-ID: On Thu, 8 Jan 2026 08:52:32 GMT, Quan Anh Mai wrote: >> Hi, >> >> The issue here is the inconsistency in computing the `_widen` field of the `TypeInt`.
At the first step, the types of the operands are `t1 = int:0` and `t2 = int:-2..3, widen = 3`. Since the type of the first operand is a constant zero, `AddNode::Value` returns the type of the second operand directly, as `x ^ 0 == x for all x`. In the second step, `t1` is widened to `0..2`. This triggers the real computation of the result. The algorithm then splits `t2` into `t21 = int:-2..-1` and `t22 = int:0..3`. The `Xor`s of these with `t1` are `r1 = int:-4..-1` and `r2 = int:0..3`. As both have `_hi - _lo <= SMALL_TYPEINT_THRESHOLD == 3`, their `_widen`s are normalized to `0`. As a result, their `meet` also has `_widen == 0`. This value is smaller than that from the previous step, which was `3`, and this leads to the failure. >> >> The root cause here is that the `_widen` value of a node should be computed and normalized on the whole range of the node, not on its subranges; normalizing on the subranges may reset it to `0` in more cases than expected. As a result, my proposed solution is to ignore the `_widen` value of the subranges, and pass the expected `_widen` value when composing the final result. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into widen > - copyright year > - Merge branch 'master' into widen > - RangeInference::infer should ensure correct value of _widen The change looks good. I wonder if it's worth adding a comment about it somewhere? ------------- Marked as reviewed by hgreule (Committer).
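The range arithmetic in the 8374180 description above can be checked by brute force. The Java sketch below is only a model, not C2's `RangeInference` code: `SMALL_TYPEINT_THRESHOLD` takes the value quoted in the description, while `xorRange` and `normalizeWiden` are hypothetical helpers. It assumes the widen to preserve is the `3` carried by `t2`, and that a meet combines widens with a max.

```java
public class XorWidenSketch {
    // Value quoted in the PR description; not read from HotSpot.
    static final int SMALL_TYPEINT_THRESHOLD = 3;

    // Exact [min, max] of x ^ y for x in [xLo, xHi] and y in [yLo, yHi], by brute force.
    static int[] xorRange(int xLo, int xHi, int yLo, int yHi) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int x = xLo; x <= xHi; x++) {
            for (int y = yLo; y <= yHi; y++) {
                int r = x ^ y;
                min = Math.min(min, r);
                max = Math.max(max, r);
            }
        }
        return new int[] {min, max};
    }

    // Hypothetical model of the normalization described in the PR: small ranges get widen 0.
    static int normalizeWiden(int lo, int hi, int widen) {
        return (long) hi - lo <= SMALL_TYPEINT_THRESHOLD ? 0 : widen;
    }

    public static void main(String[] args) {
        int[] r1 = xorRange(0, 2, -2, -1); // t1 = int:0..2, t21 = int:-2..-1
        int[] r2 = xorRange(0, 2, 0, 3);   // t1 = int:0..2, t22 = int:0..3
        int lo = Math.min(r1[0], r2[0]);
        int hi = Math.max(r1[1], r2[1]);
        int inputWiden = 3; // widen carried by t2 in the PR's example

        // Normalizing each subrange first loses the widen before the meet.
        int perSubrange = Math.max(normalizeWiden(r1[0], r1[1], inputWiden),
                                   normalizeWiden(r2[0], r2[1], inputWiden));
        // Normalizing once over the composed range keeps it.
        int wholeRange = normalizeWiden(lo, hi, inputWiden);

        System.out.printf("r1=%d..%d r2=%d..%d meet=%d..%d%n", r1[0], r1[1], r2[0], r2[1], lo, hi);
        System.out.printf("widen per-subrange=%d whole-range=%d%n", perSubrange, wholeRange);
    }
}
```

Running it prints `r1=-4..-1 r2=0..3 meet=-4..3` followed by `widen per-subrange=0 whole-range=3`. Each subrange spans at most 3, so normalizing per subrange drops the widen to 0. The combined range `-4..3` spans 7, so normalizing once on it keeps the widen at 3, which is the value the earlier step had.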
PR Review: https://git.openjdk.org/jdk/pull/28952#pullrequestreview-3654272073 From epeter at openjdk.org Tue Jan 13 07:50:15 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 Jan 2026 07:50:15 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float Message-ID: I found this bug with the Template Framework (Vector API Library extension): https://github.com/openjdk/jdk/pull/28873 In `VectorCastNode::opcode`, we have an assert that we **cannot** have an `unsigned cast from float`, it would be nonsense. https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/hotspot/share/opto/vectornode.cpp#L1490-L1491 When we intrinsify `VectorSupport.convert`, we get a constant from the VectorAPI, that determines if we have a signed or unsigned cast, and other constants that determine the `from` and `to` types. At runtime, the VectorAPI implementation can (I assume) never take a path of `unsigned cast from float`. But it seems that nothing prevents the VM from compiling such an (unreachable) path. Here is how I think it happens: - `AbstractVector::castShape` creates a `C` conversion (lane-wise conversion). So at runtime, we will take the `C` switch-case in `AbstractVector::convert0`. - Profiling can also make the `Z` (lane-wise reinterpret) switch-case live, because of other cast and reinterpret calls that use that code path. And we may not (yet) have proven that the `Z` switch-case is never taken. - So we end up compiling the `Z` path, and call `VectorSupport.convert` with types `float` and `long`. And since the `Z` path sees that the from size is smaller than the to size, we get a `UCAST` (zero extension). Hence, we try to intrinsify a vector unsigned cast from float, and hit the assert. 
https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java#L742-L743 That the `Z` path is unreachable for `castShape` seems to be an invariant that only the VectorAPI knows about, and the VM cannot directly know that and determine that it is dead code. Thus, **I propose that we just check for the condition when trying to intrinsify**, and refuse intrinsification if it is violated. ------------- Commit messages: - Merge branch 'master' into JDK-8374889-VectorAPI-convert0-ucast - fix - JDK-8374889 Changes: https://git.openjdk.org/jdk/pull/29169/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29169&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374889 Stats: 102 lines in 2 files changed: 101 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29169.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29169/head:pull/29169 PR: https://git.openjdk.org/jdk/pull/29169 From qamai at openjdk.org Tue Jan 13 07:50:16 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 Jan 2026 07:50:16 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 13:34:28 GMT, Emanuel Peter wrote: > I found this bug with the Template Framework (Vector API Library extension): https://github.com/openjdk/jdk/pull/28873 > > In `VectorCastNode::opcode`, we have an assert that we **cannot** have an `unsigned cast from float`, it would be nonsense. > https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/hotspot/share/opto/vectornode.cpp#L1490-L1491 > > When we intrinsify `VectorSupport.convert`, we get a constant from the VectorAPI, that determines if we have a signed or unsigned cast, and other constants that determine the `from` and `to` types. 
> > At runtime, the VectorAPI implementation can (I assume) never take a path of `unsigned cast from float`. > But it seems that nothing prevents the VM from compiling such an (unreachable) path. > > Here is how I think it happens: > - `AbstractVector::castShape` creates a `C` conversion (lane-wise conversion). So at runtime, we will take the `C` switch-case in `AbstractVector::convert0`. > - Profiling can also make the `Z` (lane-wise reinterpret) switch-case live, because of other cast and reinterpret calls that use that code path. And we may not (yet) have proven that the `Z` switch-case is never taken. > - So we end up compiling the `Z` path, and call `VectorSupport.convert` with types `float` and `long`. And since the `Z` path sees that the from size is smaller than the to size, we get a `UCAST` (zero extension). Hence, we try to intrinsify a vector unsigned cast from float, and hit the assert. > > https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java#L742-L743 > > That the `Z` path is unreachable for `castShape` seems to be an invariant that only the VectorAPI knows about, and the VM cannot directly know that and determine that it is dead code. Thus, **I propose that we just check for the condition when trying to intrinsify**, and refuse intrinsification if it is violated. src/hotspot/share/opto/vectorIntrinsics.cpp line 2338: > 2336: // expected to violate this at runtime, but we may compile unreachable code > 2337: // where such impossible combinations arise. > 2338: if (is_ucast && (!is_integral_type(elem_bt_from) || elem_bt_from == T_LONG)) { I would suggest eagerly killing this path to be extra sure that it is indeed a dead path and we are not encountering some other issue here. 
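The guard quoted above (line 2338 of `vectorIntrinsics.cpp`) can be restated as a small Java model that makes explicit which source lane types it rejects for an unsigned cast. The enum and `canIntrinsify` are hypothetical stand-ins for HotSpot's `BasicType` and the C++ check, restricted to the six Vector API lane types. In the real code, a refusal causes the intrinsic to return `false`, so the call is not intrinsified.

```java
import java.util.EnumSet;
import java.util.Set;

public class UcastGuardSketch {
    // Hypothetical stand-in for HotSpot's BasicType, limited to Vector API lane types.
    enum BasicType { T_BYTE, T_SHORT, T_INT, T_LONG, T_FLOAT, T_DOUBLE }

    static final Set<BasicType> INTEGRAL =
        EnumSet.of(BasicType.T_BYTE, BasicType.T_SHORT, BasicType.T_INT, BasicType.T_LONG);

    // Mirrors the quoted condition: an unsigned (zero-extending) cast is only
    // meaningful from an integral lane type narrower than long.
    static boolean canIntrinsify(boolean isUcast, BasicType from) {
        boolean impossible = isUcast && (!INTEGRAL.contains(from) || from == BasicType.T_LONG);
        return !impossible;
    }

    public static void main(String[] args) {
        for (BasicType from : BasicType.values()) {
            System.out.println(from + " ucast -> " + (canIntrinsify(true, from) ? "intrinsify" : "refuse"));
        }
    }
}
```

In this model, only `byte`, `short` and `int` sources are accepted for an unsigned cast. Signed casts are never rejected by this particular check.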
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29169#discussion_r2683626829 From epeter at openjdk.org Tue Jan 13 07:50:16 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 Jan 2026 07:50:16 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float In-Reply-To: References: Message-ID: <2_LedgpPkdp6NCH-9U3_xw3dJfHqiWklnsc3XosnUlk=.e0438e35-c139-4498-8a48-d5a5fec2c024@github.com> On Mon, 12 Jan 2026 19:36:04 GMT, Quan Anh Mai wrote: >> I found this bug with the Template Framework (Vector API Library extension): https://github.com/openjdk/jdk/pull/28873 >> >> In `VectorCastNode::opcode`, we have an assert that we **cannot** have an `unsigned cast from float`, it would be nonsense. >> https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/hotspot/share/opto/vectornode.cpp#L1490-L1491 >> >> When we intrinsify `VectorSupport.convert`, we get a constant from the VectorAPI, that determines if we have a signed or unsigned cast, and other constants that determine the `from` and `to` types. >> >> At runtime, the VectorAPI implementation can (I assume) never take a path of `unsigned cast from float`. >> But it seems that nothing prevents the VM from compiling such an (unreachable) path. >> >> Here is how I think it happens: >> - `AbstractVector::castShape` creates a `C` conversion (lane-wise conversion). So at runtime, we will take the `C` switch-case in `AbstractVector::convert0`. >> - Profiling can also make the `Z` (lane-wise reinterpret) switch-case live, because of other cast and reinterpret calls that use that code path. And we may not (yet) have proven that the `Z` switch-case is never taken. >> - So we end up compiling the `Z` path, and call `VectorSupport.convert` with types `float` and `long`. And since the `Z` path sees that the from size is smaller than the to size, we get a `UCAST` (zero extension). 
Hence, we try to intrinsify a vector unsigned cast from float, and hit the assert. >> >> https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java#L742-L743 >> >> That the `Z` path is unreachable for `castShape` seems to be an invariant that only the VectorAPI knows about, and the VM cannot directly know that and determine that it is dead code. Thus, **I propose that we just check for the condition when trying to intrinsify**, and refuse intrinsification if it is violated. > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2338: > >> 2336: // expected to violate this at runtime, but we may compile unreachable code >> 2337: // where such impossible combinations arise. >> 2338: if (is_ucast && (!is_integral_type(elem_bt_from) || elem_bt_from == T_LONG)) { > > I would suggest eagerly killing this path to be extra sure that it is indeed a dead path and we are not encountering some other issue here. Hmm, indeed, we could try to put a `Halt` node here, right? @merykitty How exactly would you do that? Are there places we already do that? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29169#discussion_r2685230666 From epeter at openjdk.org Tue Jan 13 07:50:18 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 Jan 2026 07:50:18 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 13:34:28 GMT, Emanuel Peter wrote: > I found this bug with the Template Framework (Vector API Library extension): https://github.com/openjdk/jdk/pull/28873 > > In `VectorCastNode::opcode`, we have an assert that we **cannot** have an `unsigned cast from float`, it would be nonsense. 
> https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/hotspot/share/opto/vectornode.cpp#L1490-L1491 > > When we intrinsify `VectorSupport.convert`, we get a constant from the VectorAPI, that determines if we have a signed or unsigned cast, and other constants that determine the `from` and `to` types. > > At runtime, the VectorAPI implementation can (I assume) never take a path of `unsigned cast from float`. > But it seems that nothing prevents the VM from compiling such an (unreachable) path. > > Here is how I think it happens: > - `AbstractVector::castShape` creates a `C` conversion (lane-wise conversion). So at runtime, we will take the `C` switch-case in `AbstractVector::convert0`. > - Profiling can also make the `Z` (lane-wise reinterpret) switch-case live, because of other cast and reinterpret calls that use that code path. And we may not (yet) have proven that the `Z` switch-case is never taken. > - So we end up compiling the `Z` path, and call `VectorSupport.convert` with types `float` and `long`. And since the `Z` path sees that the from size is smaller than the to size, we get a `UCAST` (zero extension). Hence, we try to intrinsify a vector unsigned cast from float, and hit the assert. > > https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java#L742-L743 > > That the `Z` path is unreachable for `castShape` seems to be an invariant that only the VectorAPI knows about, and the VM cannot directly know that and determine that it is dead code. Thus, **I propose that we just check for the condition when trying to intrinsify**, and refuse intrinsification if it is violated. 
src/hotspot/share/opto/vectorIntrinsics.cpp line 2343: > 2341: } > 2342: > 2343: int cast_vopc = VectorCastNode::opcode(-1, elem_bt_from, !is_ucast); Note: the relevant precondition is: https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/hotspot/share/opto/vectornode.cpp#L1490-L1491 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29169#discussion_r2682460727 From chagedorn at openjdk.org Tue Jan 13 08:04:26 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 13 Jan 2026 08:04:26 GMT Subject: RFR: 8374435: "assert(addp->is_AddP()) failed: must be AddP" failed intermittently when running tools/jpackage/share/AsyncTest.java with ZGC In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 19:32:53 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the assert in `ConnectionGraph::get_addp_base` during escape analysis. The issue here is that, after splitting a `Load` through a `Phi`, we try to adjust the `ConnectionGraph` as the split creates new nodes. As we visit each input of `data_phi`, we fail to take into consideration the possibility that the input can be folded to a `Load` but not from an `AddP`. This is the case for `Object::getClass`, as we load the `OopHandle` from the `Klass` object, then load the class mirror from that `OopHandle`. > > Since we are loading from raw memory, the base is not a scalar replaceable Java object. Similar to the case below, we are done processing this input. > > Please take a look and leave your reviews, thanks a lot. Two small comments, otherwise, the fix looks good to me, thanks! src/hotspot/share/opto/escape.cpp line 1090: > 1088: // > 1089: // In this case, o1 is folded to o.getClass() which is a Load but not from an AddP, but from > 1090: // an OopHandle that is loaded from the Klass of o. Nice summary! 
test/hotspot/jtreg/compiler/escapeAnalysis/Test8374435.java line 32: > 30: * @summary assert during escape analysis when splitting a Load through a Phi does not result in a > 31: * Phi of Loads > 32: * @run main/othervm -XX:-UseOnStackReplacement -XX:-UseCompressedOops ${test.main.class} The JBS title mentions ZGC but it seems it's not required. Can you update the title? test/hotspot/jtreg/compiler/escapeAnalysis/Test8374435.java line 34: > 32: * @run main/othervm -XX:-UseOnStackReplacement -XX:-UseCompressedOops ${test.main.class} > 33: */ > 34: public class Test8374435 { Can you rename the test into something more descriptive? ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29177#pullrequestreview-3654349551 PR Review Comment: https://git.openjdk.org/jdk/pull/29177#discussion_r2685285856 PR Review Comment: https://git.openjdk.org/jdk/pull/29177#discussion_r2685260071 PR Review Comment: https://git.openjdk.org/jdk/pull/29177#discussion_r2685257634 From roland at openjdk.org Tue Jan 13 08:14:19 2026 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 13 Jan 2026 08:14:19 GMT Subject: [jdk26] RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 11:59:37 GMT, Tobias Hartmann wrote: >> Hi all, >> >> This pull request contains a backport of commit [6ae3e064](https://github.com/openjdk/jdk/commit/6ae3e064352a56c5be140fba1ad6d040219432b0) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Roland Westrelin on 5 Jan 2026 and was reviewed by Christian Hagedorn and Dean Long. >> >> Thanks! > > Looks good and trivial to me. @TobiHartmann @shipilev thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29166#issuecomment-3742684750 From roland at openjdk.org Tue Jan 13 08:14:21 2026 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 13 Jan 2026 08:14:21 GMT Subject: [jdk26] Integrated: 8373508: C2: sinking CreateEx out of loop breaks the graph In-Reply-To: References: Message-ID: <0Ue3hWwJaw2E0capJaQw-93PIQhlaDdrx-MIltrcU7A=.1d4c4e59-402a-4e6c-8010-260b892aafe3@github.com> On Mon, 12 Jan 2026 10:58:40 GMT, Roland Westrelin wrote: > Hi all, > > This pull request contains a backport of commit [6ae3e064](https://github.com/openjdk/jdk/commit/6ae3e064352a56c5be140fba1ad6d040219432b0) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 5 Jan 2026 and was reviewed by Christian Hagedorn and Dean Long. > > Thanks! This pull request has now been integrated. Changeset: d87c05ca Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/d87c05ca8d219c1917dd4c5becb5803172b6eeaa Stats: 159 lines in 3 files changed: 159 ins; 0 del; 0 mod 8373508: C2: sinking CreateEx out of loop breaks the graph Reviewed-by: thartmann, shade Backport-of: 6ae3e064352a56c5be140fba1ad6d040219432b0 ------------- PR: https://git.openjdk.org/jdk/pull/29166 From qamai at openjdk.org Tue Jan 13 08:28:00 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 Jan 2026 08:28:00 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float In-Reply-To: <2_LedgpPkdp6NCH-9U3_xw3dJfHqiWklnsc3XosnUlk=.e0438e35-c139-4498-8a48-d5a5fec2c024@github.com> References: <2_LedgpPkdp6NCH-9U3_xw3dJfHqiWklnsc3XosnUlk=.e0438e35-c139-4498-8a48-d5a5fec2c024@github.com> Message-ID: On Tue, 13 Jan 2026 07:43:23 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 2338: >> >>> 2336: // expected to violate this at runtime, but we may compile unreachable code >>> 2337: // where such impossible combinations 
arise. >>> 2338: if (is_ucast && (!is_integral_type(elem_bt_from) || elem_bt_from == T_LONG)) { >> >> I would suggest eagerly killing this path to be extra sure that it is indeed a dead path and we are not encountering some other issue here. > > Hmm, indeed, we could try to put a `Halt` node here, right? > @merykitty How exactly would you do that? Are there places we already do that? If you look at the end of `GraphKit::uncommon_trap`, the procedure would look like this:

    HaltNode* halt = new HaltNode(control(), frameptr(), "uncommon trap returned which should never happen"
                                  PRODUCT_ONLY(COMMA /*reachable*/false));
    _gvn.set_type_bottom(halt);
    root()->add_req(halt);
    stop_and_kill_map();

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29169#discussion_r2685366719 From qamai at openjdk.org Tue Jan 13 08:38:46 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 Jan 2026 08:38:46 GMT Subject: RFR: 8374435: assert(addp->is_AddP()) failed: must be AddP during EA with -XX:-UseCompressedOops [v2] In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 07:52:23 GMT, Christian Hagedorn wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> rename test file > > test/hotspot/jtreg/compiler/escapeAnalysis/Test8374435.java line 34: > >> 32: * @run main/othervm -XX:-UseOnStackReplacement -XX:-UseCompressedOops ${test.main.class} >> 33: */ >> 34: public class Test8374435 { > > Can you rename the test into something more descriptive? Done! > test/hotspot/jtreg/compiler/escapeAnalysis/TestSplitLoadThroughPhiDuringEA.java line 32: > >> (failed to retrieve contents of file, check the PR for context) > The JBS title mentions ZGC but it seems it's not required. Can you update the title? I have updated it; the important bit here is `-UseCompressedOops`, which is implied by ZGC
I have updated it, the important bit here is `-UseCompressedOops`, which is implied by ZGC ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29177#discussion_r2685392551 PR Review Comment: https://git.openjdk.org/jdk/pull/29177#discussion_r2685394603 From qamai at openjdk.org Tue Jan 13 08:38:43 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 Jan 2026 08:38:43 GMT Subject: RFR: 8374435: assert(addp->is_AddP()) failed: must be AddP during EA with -XX:-UseCompressedOops [v2] In-Reply-To: References: Message-ID: > Hi, > > This PR fixes the assert in `ConnectionGraph::get_addp_base` during escape analysis. The issue here is that, after splitting a `Load` through a `Phi`, we try to adjust the `ConnectionGraph` as the split creates new nodes. As we visit each input of `data_phi`, we fail to take into consideration the possibility that the input can be folded to a `Load` but not from an `AddP`. This is the case for `Object::getClass`, as we load the `OopHandle` from the `Klass` object, then load the class mirror from that `OopHandle`. > > Since we are loading from raw memory, the base is not a scalar replaceable Java object. Similar to the case below, we are done processing this input. > > Please take a look and leave your reviews, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: rename test file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29177/files - new: https://git.openjdk.org/jdk/pull/29177/files/06145038..ab7a2454 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29177&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29177&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/29177.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29177/head:pull/29177 PR: https://git.openjdk.org/jdk/pull/29177 From shade at openjdk.org Tue Jan 13 08:44:33 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Jan 2026 08:44:33 GMT Subject: RFR: 8375055: C2: Better dead loop detection printout Message-ID: Chasing the maddeningly intermittent CTW failure. When C2 fails dead loop verification checks, it prints: # Internal Error (/home/shipilev/shipilev-jdk/src/hotspot/share/opto/phaseX.cpp:784), pid=64465, tid=917 # assert(no_dead_loop) failed: dead loop detected It also dumps the bad node graph to tty. This is not really convenient in automated testing and/or driver tests like CTW. With this fix, we now print: # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:781), pid=645409, tid=645480 # fatal error: Dead loop detected, node references itself # # Node: 1606 CastPP === 1 1606 [[ 3163 1460 1006 1006 1606 1111 ]] #instptr:java/lang/Object:NotNull+0,iid=bot floating narrowing dependency Oop:instptr:java/lang/Object:NotNull+0,iid=bot !orig=[1613],[1629],[1015],1442 !jvms: MethodInfo::get @ bci:236 (line 119) ...which gives more clues about where things may go wrong, and helps classify the failures better as well.
Additional testing: - [x] Ad-hoc crashes with selected seeds - [x] Linux x86_64 server fastdebug, `hotspot_compiler` - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/29185/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29185&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375055 Stats: 35 lines in 1 file changed: 17 ins; 2 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/29185.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29185/head:pull/29185 PR: https://git.openjdk.org/jdk/pull/29185 From qamai at openjdk.org Tue Jan 13 08:53:39 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 Jan 2026 08:53:39 GMT Subject: RFR: 8374180: C2 crash in PhaseCCP::verify_type - fatal error: Not monotonic [v4] In-Reply-To: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> References: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> Message-ID: > Hi, > > The issue here is the inconsistency in computing the `_widen` field of the `TypeInt`. At the first step, the types of the operands are: > > t1 = int:0 > t2 = int:-2..3, widen = 3 > > Since the type of the first operand is a constant zero, `AddNode::Value` returns the type of the second operand directly, as `x ^ 0 == x for all x`. In the second step, `t1` is widened to `0..2`. This triggers the real computation of the result. The algorithm then splits `t2` into `t21 = int:-2..-1` and `t22 = int:0..3`. The `Xor` of these with `t1` are `r1 = int:-4..-1` and `r2 = int:0..3`. As both have `_hi - _lo <= SMALL_TYPEINT_THRESHOLD == 3`, their `_widen`s are normalized to `0`. As a result, their `meet` also has `_widen == 0`. This value is smaller than that from the previous step, which was `3`, which leads to the failure. 
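The range arithmetic in the description above can be checked with a small brute-force sketch. It is an illustration only, not C2's `RangeInference` code: `XorRangeSketch`, `xorRange`, and `normalizeWiden` are hypothetical names, and `normalizeWiden` merely models the rule quoted in the PR ("`_hi - _lo <= SMALL_TYPEINT_THRESHOLD == 3`" normalizes `_widen` to `0`). It shows that each sub-range result is small enough to lose its `_widen`, while the whole result range `-4..3` is not.

```java
// Hypothetical illustration only: brute-forces the ranges quoted in the PR
// description. This is not C2's RangeInference implementation.
final class XorRangeSketch {
    // Threshold value as quoted in the PR description.
    static final int SMALL_TYPEINT_THRESHOLD = 3;

    // Returns {min, max} of x ^ y over x in [lo1, hi1] and y in [lo2, hi2].
    // Intended for small ranges only (the loops would not terminate if
    // hi1 or hi2 were Integer.MAX_VALUE).
    static int[] xorRange(int lo1, int hi1, int lo2, int hi2) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int x = lo1; x <= hi1; x++) {
            for (int y = lo2; y <= hi2; y++) {
                int r = x ^ y;
                min = Math.min(min, r);
                max = Math.max(max, r);
            }
        }
        return new int[] {min, max};
    }

    // Hypothetical model of the normalization described in the PR:
    // a range spanning at most SMALL_TYPEINT_THRESHOLD gets widen 0.
    static int normalizeWiden(int lo, int hi, int widen) {
        return ((long) hi - lo <= SMALL_TYPEINT_THRESHOLD) ? 0 : widen;
    }

    public static void main(String[] args) {
        // t1 = 0..2 (after widening); t2 = -2..3 is split into -2..-1 and 0..3.
        int[] r1 = xorRange(0, 2, -2, -1);
        int[] r2 = xorRange(0, 2, 0, 3);
        int[] whole = xorRange(0, 2, -2, 3);
        System.out.println("r1 = " + r1[0] + ".." + r1[1]
                + ", widen " + normalizeWiden(r1[0], r1[1], 3));
        System.out.println("r2 = " + r2[0] + ".." + r2[1]
                + ", widen " + normalizeWiden(r2[0], r2[1], 3));
        System.out.println("whole = " + whole[0] + ".." + whole[1]
                + ", widen " + normalizeWiden(whole[0], whole[1], 3));
    }
}
```

Running it prints `r1 = -4..-1, widen 0`, `r2 = 0..3, widen 0`, and `whole = -4..3, widen 3`. This is why the fix computes `_widen` on the composed range rather than inheriting it from the normalized sub-ranges.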
> > The root cause here is that the `_widen` value of a node should be computed and normalized on the whole range of the node, not on its subranges, which may normalize it to `0` in more cases than expected. As a result, my proposed solution is to ignore the `_widen` value of the subranges, and pass the expected `_widen` value when composing the final result. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: add comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28952/files - new: https://git.openjdk.org/jdk/pull/28952/files/ecee9cff..ff7fd535 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28952&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28952&range=02-03 Stats: 15 lines in 1 file changed: 15 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28952.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28952/head:pull/28952 PR: https://git.openjdk.org/jdk/pull/28952 From qamai at openjdk.org Tue Jan 13 08:53:42 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 Jan 2026 08:53:42 GMT Subject: RFR: 8374180: C2 crash in PhaseCCP::verify_type - fatal error: Not monotonic [v3] In-Reply-To: References: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> Message-ID: <2a7qbQFcC4KXMQhqjUpsVTEmxxfJ-l7RC_UpUUENiNs=.33ec7e82-99c1-4592-9c0b-8a18e240781e@github.com> On Tue, 13 Jan 2026 07:31:53 GMT, Hannes Greule wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into widen >> - copyright year >> - Merge branch 'master' into widen >> - RangeInference::infer should ensure correct value of _widen > > The change looks good. I wonder if it's worth adding a comment about it somewhere? @SirYwell Thanks a lot for the reviews, I have added a comment about it in `RangeInference::infer_binary`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28952#issuecomment-3742923349 From xgong at openjdk.org Tue Jan 13 08:54:00 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 13 Jan 2026 08:54:00 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v2] In-Reply-To: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> References: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> Message-ID: On Fri, 9 Jan 2026 13:32:54 GMT, Emanuel Peter wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Add more comments for vector nodes > > Nice work, thanks for taking the time for this, much appreciated! > > On the whole I'm super happy with this, but left a few extra comments :) Hi @eme64, I updated the vector nodes part by adding comments for more vector nodes. Would you mind taking another look? Thanks a lot!
------------- PR Comment: https://git.openjdk.org/jdk/pull/29130#issuecomment-3742907657 From xgong at openjdk.org Tue Jan 13 08:54:01 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 13 Jan 2026 08:54:01 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v2] In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 09:38:54 GMT, Quan Anh Mai wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Add more comments for vector nodes > > Marked as reviewed by qamai (Committer). Thanks so much for your review @merykitty! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29130#issuecomment-3742908865 From xgong at openjdk.org Tue Jan 13 08:54:04 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 13 Jan 2026 08:54:04 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v2] In-Reply-To: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> References: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> Message-ID: <_HUfHvmarK-86th7vlKm6oj3W6s7JC-2DZf4jVBSuvc=.a7fbc6ff-60b9-4d62-aa36-a6c3a55d9653@github.com> On Mon, 12 Jan 2026 05:37:55 GMT, Xiaohong Gong wrote: >> src/hotspot/share/opto/type.cpp line 2452: >> >>> 2450: // stored in a predicate/mask register. >>> 2451: // - Returns a normal vector type (i.e. TypeVectA ~ TypeVectZ) otherwise, where >>> 2452: // the vector mask is stored in a vector register. >> >> The first case is `PVectMask`, and the second `NVectMask`, right? > > Yes, correct. Do you mean I should mention these names in the comment as well? If so, I will refine the comment in the next commit. Thanks!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2685444345 From thartmann at openjdk.org Tue Jan 13 08:52:39 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 13 Jan 2026 08:52:39 GMT Subject: RFR: 8374435: assert(addp->is_AddP()) failed: must be AddP during EA with -XX:-UseCompressedOops [v2] In-Reply-To: References: Message-ID: <_Gw9f-fw29kIWVcWvEqLjk--D7Zr_n4G0EI-6yUthk8=.85cdfff4-5639-4454-9ca5-97d855b08a99@github.com> On Tue, 13 Jan 2026 08:38:43 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the assert in `ConnectionGraph::get_addp_base` during escape analysis. The issue here is that, after splitting a `Load` through a `Phi`, we try to adjust the `ConnectionGraph` as the split creates new nodes. As we visit each input of `data_phi`, we fail to take into consideration the possibility that the input can be folded to a `Load` but not from an `AddP`. This is the case for `Object::getClass`, as we load the `OopHandle` from the `Klass` object, then load the class mirror from that `OopHandle`. >> >> Since we are loading from raw memory, the base is not a scalar replaceable Java object. Similar to the case below, we are done processing this input. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > rename test file Looks good to me too. Thanks for fixing this! ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29177#pullrequestreview-3654612771 From xgong at openjdk.org Tue Jan 13 08:53:58 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 13 Jan 2026 08:53:58 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v2] In-Reply-To: References: Message-ID: > The VectorMask implementation in Vector API involves complex interactions between types, nodes, and platform-specific > features, making the related code in HotSpot difficult to understand and review. > > This patch adds comprehensive comments for vector mask related types, nodes, and methods in C2 to improve code clarity and > maintainability. > > Note: This patch only adds comments; no functional changes are made. Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Add more comments for vector nodes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29130/files - new: https://git.openjdk.org/jdk/pull/29130/files/ed5e79f7..6782b7f6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29130&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29130&range=00-01 Stats: 227 lines in 1 file changed: 54 ins; 130 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/29130.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29130/head:pull/29130 PR: https://git.openjdk.org/jdk/pull/29130 From qamai at openjdk.org Tue Jan 13 09:14:57 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 Jan 2026 09:14:57 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 09:18:32 GMT, Xiaohong Gong wrote: > ### Problem: > > Test `compiler/vectorapi/VectorMaskToLongTest.java` crashes intermittently (approximately once per 200+ runs) with stress VM options such as `-XX:+StressIGVN`: > > > // A fatal error has been detected by the Java Runtime Environment: > // > // 
Internal Error (jdk/src/hotspot/share/opto/type.hpp:2287), pid=69056, tid=28419 > // assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector > // ... > > > The crash occurs in following code when calling `is_vect()` in the assertion added by JDK-8367292 [1]: > > https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/src/hotspot/share/opto/vectornode.cpp#L1920-L1924 > > ### Root Cause: > > The mask's type becomes TOP (unreachable) during compiler optimizations when the mask node is marked as dead before all its users are removed from the ideal graph. If `Ideal()` is subsequently called on a user node, it may access the TOP type, triggering the assertion. > > Here is the simplified ideal graph showing the crash scenario: > > > Con #top > | ConI > \ / > \ / > VectorStoreMask > | > VectorMaskToLong # !jvms: IntMaxVector$IntMaxMask::toLong > > > ### Detailed Scenario: > > Following is the method in the test case that hits the assertion: > > https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L65-L70 > > This method accepts a `VectorSpecies` parameter and calls vector APIs `VectorMask.fromLong()` and `toLong()`. It is called with species ranging from `ByteVector.SPECIES_MAX` to `DoubleVector.SPECIES_MAX`. During compilation, C2 speculatively generates fast paths for `toLong()` for all possible species. > > When compiling a specific test case such as: > https://github.com/openjdk/jdk/blob/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L177-L179 > > the compiler inlines the method and attempts to optimize away unreachable branches. The following graph shows the situation before the mask becomes `TOP`: > > > VectorBox # DoubleMaxMask, generated by VectorMask.fromLong() > / \ > AddP \ > | \ > LoadNClass \ > ConP #IntMaxMask | | > \ | | > \ DecodeNClass | > \ / | > \ / | > CmpP ... 
src/hotspot/share/opto/vectornode.cpp line 1923: > 1921: Node* mask = in1->in(1); > 1922: const TypeVect* mask_vt = mask->bottom_type()->isa_vect(); > 1923: if (mask_vt == nullptr) { It is better to filter the exact `Type::TOP` instance and assert that otherwise, this must be a `TypeVect`. Additionally, if the type of the input is `Type::TOP`, we can eagerly return `C->top()` to kill it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29057#discussion_r2685541505 From chagedorn at openjdk.org Tue Jan 13 09:24:44 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 13 Jan 2026 09:24:44 GMT Subject: RFR: 8374435: assert(addp->is_AddP()) failed: must be AddP during EA with -XX:-UseCompressedOops [v2] In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 08:38:43 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the assert in `ConnectionGraph::get_addp_base` during escape analysis. The issue here is that, after splitting a `Load` through a `Phi`, we try to adjust the `ConnectionGraph` as the split creates new nodes. As we visit each input of `data_phi`, we fail to take into consideration the possibility that the input can be folded to a `Load` but not from an `AddP`. This is the case for `Object::getClass`, as we load the `OopHandle` from the `Klass` object, then load the class mirror from that `OopHandle`. >> >> Since we are loading from raw memory, the base is not a scalar replaceable Java object. Similar to the case below, we are done processing this input. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > rename test file Thanks for the updates! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29177#pullrequestreview-3654766375 From xgong at openjdk.org Tue Jan 13 09:33:12 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 13 Jan 2026 09:33:12 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector In-Reply-To: References: Message-ID: <25g0wlqf-9SAO0wVIh8Allc1RzTOyuwflTOj2On19fU=.a8e2fb88-31ef-4337-ac4a-d9c7a43f6e74@github.com> On Tue, 13 Jan 2026 09:11:25 GMT, Quan Anh Mai wrote: >> ### Problem: >> >> Test `compiler/vectorapi/VectorMaskToLongTest.java` crashes intermittently (approximately once per 200+ runs) with stress VM options such as `-XX:+StressIGVN`: >> >> >> // A fatal error has been detected by the Java Runtime Environment: >> // >> // Internal Error (jdk/src/hotspot/share/opto/type.hpp:2287), pid=69056, tid=28419 >> // assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector >> // ... >> >> >> The crash occurs in following code when calling `is_vect()` in the assertion added by JDK-8367292 [1]: >> >> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/src/hotspot/share/opto/vectornode.cpp#L1920-L1924 >> >> ### Root Cause: >> >> The mask's type becomes TOP (unreachable) during compiler optimizations when the mask node is marked as dead before all its users are removed from the ideal graph. If `Ideal()` is subsequently called on a user node, it may access the TOP type, triggering the assertion. 
>> >> Here is the simplified ideal graph showing the crash scenario: >> >> >> Con #top >> | ConI >> \ / >> \ / >> VectorStoreMask >> | >> VectorMaskToLong # !jvms: IntMaxVector$IntMaxMask::toLong >> >> >> ### Detailed Scenario: >> >> Following is the method in the test case that hits the assertion: >> >> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L65-L70 >> >> This method accepts a `VectorSpecies` parameter and calls vector APIs `VectorMask.fromLong()` and `toLong()`. It is called with species ranging from `ByteVector.SPECIES_MAX` to `DoubleVector.SPECIES_MAX`. During compilation, C2 speculatively generates fast paths for `toLong()` for all possible species. >> >> When compiling a specific test case such as: >> https://github.com/openjdk/jdk/blob/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L177-L179 >> >> the compiler inlines the method and attempts to optimize away unreachable branches. The following graph shows the situation before the mask becomes `TOP`: >> >> >> VectorBox # DoubleMaxMask, generated by VectorMask.fromLong() >> / \ >> AddP \ >> | \ >> LoadNClass \ >> ConP #IntMaxMask | | >> \ | | >> \ DecodeN... > > src/hotspot/share/opto/vectornode.cpp line 1923: > >> 1921: Node* mask = in1->in(1); >> 1922: const TypeVect* mask_vt = mask->bottom_type()->isa_vect(); >> 1923: if (mask_vt == nullptr) { > > It is better to filter the exact `Type::TOP` instance and assert that otherwise, this must be a `TypeVect`. Additionally, if the type of the input is `Type::TOP`, we can eagerly return `C->top()` to kill it. Thanks for looking at this change. > It is better to filter the exact Type::TOP instance and assert that otherwise, this must be a TypeVect Agreed, I cannot find a case where the type is not a `TypeVect` except `TOP`.
But I'd like to keep the `nullptr` check here to make the code more robust, in case the same crash happens in corner cases that I haven't found. > Additionally, if the type of the input is Type::TOP, we can eagerly return C->top() to kill it. Correct. This would be the right direction, as I also noted in the `Solution` part of the commit message. However, it's better to check for `TOP` inputs in all vector nodes in the chain, and we need more test cases for that. Since this check is currently missing for all vector nodes, I'd like to leave it as a separate topic. WDYT? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29057#discussion_r2685604768 From mdoerr at openjdk.org Tue Jan 13 09:36:26 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 13 Jan 2026 09:36:26 GMT Subject: [jdk26] Integrated: 8374195: TestReplaceNarrowPhiWithBottomPhi fails on ppc64 platforms in (fast)debug In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 09:57:46 GMT, Martin Doerr wrote: > Clean backport of [JDK-8374195](https://bugs.openjdk.org/browse/JDK-8374195). This pull request has now been integrated. Changeset: 2a2b704d Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/2a2b704d9cc3cc8e092fb5131ef501fb38effbed Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8374195: TestReplaceNarrowPhiWithBottomPhi fails on ppc64 platforms in (fast)debug Reviewed-by: shade Backport-of: e4e923a1ffc8ff059c983c7e9201d0ee3273482d ------------- PR: https://git.openjdk.org/jdk/pull/29162 From chagedorn at openjdk.org Tue Jan 13 09:36:43 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 13 Jan 2026 09:36:43 GMT Subject: RFR: 8375055: C2: Better dead loop detection printout In-Reply-To: References: Message-ID: <3kixVZqfeuEJ8PbcAhHAcdIDDcFNv--lrCbiwzxBcdA=.ce2d1c8d-cbf2-4ed8-b51d-b2f25435920d@github.com> On Tue, 13 Jan 2026 08:37:16 GMT, Aleksey Shipilev wrote: > Chasing the maddeningly intermittent CTW failure.
When C2 fails dead loop verification checks, it prints: > > > # Internal Error (/home/shipilev/shipilev-jdk/src/hotspot/share/opto/phaseX.cpp:784), pid=64465, tid=917 > # assert(no_dead_loop) failed: dead loop detected > > > It also dumps the bad node graph to tty. This is not really convenient in automated testing and/or driver tests like CTW. With this fix, we now print: > > > # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:781), pid=645409, tid=645480 > # fatal error: Dead loop detected, node references itself > # > # Node: 1606 CastPP === 1 1606 [[ 3163 1460 1006 1006 1606 1111 ]] #instptr:java/lang/Object:NotNull+0,iid=bot floating narrowing dependency Oop:instptr:java/lang/Object:NotNull+0,iid=bot !orig=[1613],[1629],[1015],1442 !jvms: MethodInfo::get @ bci:236 (line 119) > > > ...which gives more clues about where things may go wrong, and helps classify the failures better as well. > > Additional testing: > - [x] Ad-hoc crashes with selected seeds > - [x] Linux x86_64 server fastdebug, `hotspot_compiler` > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` Otherwise, I like this improvement, thanks for cleaning the code up! src/hotspot/share/opto/phaseX.cpp line 780: > 778: ss.print_cr("Dead loop detected, node references itself"); > 779: ss.print("#\n# Node: "); > 780: n->dump("", false, &ss); A full node dump for the failure message seems a little verbose. We also already dump the node as part of `dump_bfs()` above. How about just printing `n->_idx` and `n->Name()` for the failure message itself to get a good first hint? Same below.
------------- PR Review: https://git.openjdk.org/jdk/pull/29185#pullrequestreview-3654810525 PR Review Comment: https://git.openjdk.org/jdk/pull/29185#discussion_r2685610244 From jbhateja at openjdk.org Tue Jan 13 09:48:46 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 13 Jan 2026 09:48:46 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v10] In-Reply-To: References: Message-ID: > Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support. > - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. > > The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). > > The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback. > > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 26 commits: - Review comments resolutions - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Fix incorrect argument passed to smokeTest - Fix from Bhavana Kilambi for failing JTREG regressions on AARCH64 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Including test changes from Bhavana Kilambi (ARM) - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Optimizing tail handling - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Cleanups - ... and 16 more: https://git.openjdk.org/jdk/compare/7e18de13...14861d5e ------------- Changes: https://git.openjdk.org/jdk/pull/28002/files Webrev: Webrev is not available because diff is too large Stats: 515469 lines in 232 files changed: 284458 ins; 229216 del; 1795 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From jbhateja at openjdk.org Tue Jan 13 09:48:50 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 13 Jan 2026 09:48:50 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v9] In-Reply-To: <3PzPbEnPapV-B3OenjmG6paXsyLFayh33S-f0IBI-LY=.773757f6-c9e0-48ac-b89d-aa81fd6b47f8@github.com> References: <3PzPbEnPapV-B3OenjmG6paXsyLFayh33S-f0IBI-LY=.773757f6-c9e0-48ac-b89d-aa81fd6b47f8@github.com> Message-ID: <212JnoGQ5FDIEHJMmZ2zFLxmM4BCjqrnFyuZ6CqeZ-c=.cfa4503a-e6da-401b-95b4-f3c384d12e3f@github.com> On Wed, 7 Jan 2026 19:29:03 GMT, Paul Sandoz wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 23 commits: >> >> - Fix from Bhavana Kilambi for failing JTREG regressions on AARCH64 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 >> - Including test changes from Bhavana Kilambi (ARM) >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 >> - Optimizing tail handling >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 >> - Cleanups >> - Fix failing jtreg test in CI >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 >> - Cleanups >> - ... and 13 more: https://git.openjdk.org/jdk/compare/5e7ae281...703f313d > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractSpecies.java line 436: > >> 434: } else { >> 435: assert(Float16.valueOf(i).intValue() == i); >> 436: } > > It would be clearer if the same pattern were used as for the other types. Assign and assert, no need to check bounds. We don't need to be performant here. (This code may become even clearer when we can leverage patterns on the primitive types and custom numeric types.) Done > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorShape.java line 277: > >> 275: if (etype == Float16.class) { >> 276: etype = short.class; >> 277: } > > My suggestion may not be worth it, but I was wondering if we could get the lane type and then use the carrier type, rather than encoding this more specifically. Addressed.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2685649510 PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2685649160 From rcastanedalo at openjdk.org Tue Jan 13 09:53:34 2026 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 13 Jan 2026 09:53:34 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v16] In-Reply-To: References: <2EiCLUhS6yKEXp8RQlAOmaZrwvxPghYYw6u_PW5m2iM=.3ad470ab-c926-4fe4-bf12-8872ce22c7ef@github.com> Message-ID: On Mon, 12 Jan 2026 14:58:43 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/memnode.cpp line 708: >> >>> 706: >>> 707: Node* mem = in(MemNode::Memory); // start searching here... >>> 708: >> >> Would it make sense to check and bail out early for some trivial non-candidates here? It feels a bit wasteful e.g. to run the LocalEA machinery for loads from `ThreadLocal`. > > That seems reasonable. I added an early return case when the base that is accessed is not an oop. Thanks! Could the early return case be hoisted to `MemNode::find_previous_store` so that we avoid constructing `local_ea`? Or is there any case where `base` is not an OOP and `find_previous_store` would still find something useful? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2685661054 From jbhateja at openjdk.org Tue Jan 13 09:58:29 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 13 Jan 2026 09:58:29 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v11] In-Reply-To: References: Message-ID: > Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA.
> - Use existing Float16 vector IR and backend support. > - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. > > The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). > > The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback. > > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding testpoint for JDK-8373574 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28002/files - new: https://git.openjdk.org/jdk/pull/28002/files/14861d5e..d1043144 Webrevs: - full: Webrev is not available because diff is too large - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28002&range=09-10 Stats: 113 lines in 1 file changed: 113 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From qamai at openjdk.org Tue Jan 13 10:06:47 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 Jan 2026 10:06:47 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v16] In-Reply-To: References: <2EiCLUhS6yKEXp8RQlAOmaZrwvxPghYYw6u_PW5m2iM=.3ad470ab-c926-4fe4-bf12-8872ce22c7ef@github.com> Message-ID: <2RFAYXSlsxmta9WQoZ0GEGWM0To_FarSOPvjAevBCmE=.9d3da257-e88e-4952-889f-0c508358cc82@github.com> On Tue, 13 Jan 2026 09:50:23 GMT, Roberto Castañeda Lozano wrote: >> That seems reasonable. I added an early return case when the base that is accessed is not an oop. > > Thanks!
Could the early return case be hoisted to `MemNode::find_previous_store` so that we avoid constructing `local_ea`? Or is there any case where `base` is not an OOP and `find_previous_store` would still find something useful? Constructing a local variable is cheap. I think it is better to modify `Unique_Node_List` to be more C++ idiomatic (i.e. not allocating on default construction). However, it should be a separate issue. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2685701147 From fyang at openjdk.org Tue Jan 13 10:14:33 2026 From: fyang at openjdk.org (Fei Yang) Date: Tue, 13 Jan 2026 10:14:33 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v9] In-Reply-To: References: Message-ID: <8tmcHaaICcnQ5AjwdCe334bxoA3cIJ3IhqWexblRgkM=.6ffed558-f343-4072-a38b-ee7f5d292373@github.com> On Wed, 7 Jan 2026 09:03:03 GMT, Jatin Bhateja wrote: >> test/jdk/jdk/incubator/vector/Float16Vector64Tests.java line 1893: >> >>> 1891: VectorMask m = three.compare(VectorOperators.LE, higher); >>> 1892: assert(m.allTrue()); >>> 1893: m = higher.min((short)-1).test(VectorOperators.IS_NEGATIVE); >> >> I find that `higher.min((short)-1)` produces a float16 vector of 4 NaNs. So are we testing for negative NaNs with `VectorOperators.IS_NEGATIVE`? Is it more reasonable to test `VectorOperators.IS_NAN` instead? > > Thanks for catching this. All Float16Vector lanes and short arguments passed to the shorthand APIs are assumed to be encoded in IEEE 754 binary16 format, so we should be passing the Float16 bit representation of -1 here. Thanks for confirming this. And I see similar occurrences in the Float / Double variants of the tests. Maybe we should fix them as well?
test/jdk/jdk/incubator/vector/FloatVector256Tests.java test/jdk/jdk/incubator/vector/FloatVector128Tests.java test/jdk/jdk/incubator/vector/FloatVector64Tests.java test/jdk/jdk/incubator/vector/FloatVector512Tests.java test/jdk/jdk/incubator/vector/FloatVectorMaxTests.java test/jdk/jdk/incubator/vector/DoubleVector128Tests.java test/jdk/jdk/incubator/vector/DoubleVector64Tests.java test/jdk/jdk/incubator/vector/DoubleVector256Tests.java test/jdk/jdk/incubator/vector/DoubleVector512Tests.java test/jdk/jdk/incubator/vector/DoubleVectorMaxTests.java ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2685722818 From fyang at openjdk.org Tue Jan 13 10:18:46 2026 From: fyang at openjdk.org (Fei Yang) Date: Tue, 13 Jan 2026 10:18:46 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v9] In-Reply-To: <8tmcHaaICcnQ5AjwdCe334bxoA3cIJ3IhqWexblRgkM=.6ffed558-f343-4072-a38b-ee7f5d292373@github.com> References: <8tmcHaaICcnQ5AjwdCe334bxoA3cIJ3IhqWexblRgkM=.6ffed558-f343-4072-a38b-ee7f5d292373@github.com> Message-ID: On Tue, 13 Jan 2026 10:10:40 GMT, Fei Yang wrote: >> Thanks for catching this. All Float16Vector lanes and short arguments passed to the shorthand APIs are assumed to be encoded in IEEE 754 binary16 format, so we should be passing the Float16 bit representation of -1 here. > > Thanks for confirming this. And I see similar occurrences in the Float / Double variants of the tests. > Maybe we should fix them as well?
> > > test/jdk/jdk/incubator/vector/FloatVector256Tests.java > test/jdk/jdk/incubator/vector/FloatVector128Tests.java > test/jdk/jdk/incubator/vector/FloatVector64Tests.java > test/jdk/jdk/incubator/vector/FloatVector512Tests.java > test/jdk/jdk/incubator/vector/FloatVectorMaxTests.java > > test/jdk/jdk/incubator/vector/DoubleVector128Tests.java > test/jdk/jdk/incubator/vector/DoubleVector64Tests.java > test/jdk/jdk/incubator/vector/DoubleVector256Tests.java > test/jdk/jdk/incubator/vector/DoubleVector512Tests.java > test/jdk/jdk/incubator/vector/DoubleVectorMaxTests.java Ah, that doesn't seem necessary after another look. Float16 is special here. So please ignore my comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2685745615 From epeter at openjdk.org Tue Jan 13 10:23:42 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 Jan 2026 10:23:42 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v8] In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 05:31:00 GMT, Galder Zamarreño wrote: >> To avoid dependency on an incubating module. > > I was trying to use this API today and I was wondering the exact same thing. I would have expected this to be `Generator<Float16> float16s()`. I can see @jatin-bhateja's point but this should have been noted in the code for future readers? Is this being tracked somewhere? > > Anyway, again for future readers, this is what I've done to actually get a `Float16[]`, to avoid the need to first generate a `short[]` and fill that before transforming it to `Float16[]`: > > > private static Float16[] input_47 = new Float16[10000]; > private static final Generator<Short> GEN_input_47 = Generators.G.float16s(); > > static void fill_input_47(Float16[] a) { > for (int i = 0; i < a.length; i++) { > a[i] = Float16.shortBitsToFloat16(GEN_input_47.next()); > } > } > > static { > fill_input_47(input_47); > } @galderz No, it is not tracked.
Feel free to file an RFE and add documentation. One reason to use `short[]` is that it allows us to auto-vectorize. A `Float[]` is an Object array and that would prevent vectorization. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r2685757916 From fyang at openjdk.org Tue Jan 13 10:27:39 2026 From: fyang at openjdk.org (Fei Yang) Date: Tue, 13 Jan 2026 10:27:39 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v11] In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 09:58:29 GMT, Jatin Bhateja wrote: >> Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. >> - Add necessary inline expander support. >> - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. >> - Use existing Float16 vector IR and backend support. >> - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. >> >> The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc.). >> >> The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. >> >> [image not preserved] >> >> Initial RFP[1] was floated on the panama-dev mailing list. >> >> Kindly review the draft PR and share your feedback.
>> >> Best Regards, >> Jatin >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding testpoint for JDK-8373574 src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16Vector.java line 1653: > 1651: * > 1652: * @param e the input scalar > 1653: * @return the result of multiplying this vector by the given scalar The code comment mentions "multiplying", which doesn't seem correct to me. Are we doing any multiplication for min/max? src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16Vector.java line 1694: > 1692: * > 1693: * @param e the input scalar > 1694: * @return the result of multiplying this vector by the given scalar Similar question here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2685766276 PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2685767490 From shade at openjdk.org Tue Jan 13 10:47:16 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Jan 2026 10:47:16 GMT Subject: RFR: 8375055: C2: Better dead loop detection printout [v2] In-Reply-To: References: Message-ID: > Chasing the maddeningly intermittent CTW failure. When C2 fails dead loop verification checks, it prints: > > > # Internal Error (/home/shipilev/shipilev-jdk/src/hotspot/share/opto/phaseX.cpp:784), pid=64465, tid=917 > # assert(no_dead_loop) failed: dead loop detected > > > It also dumps the bad node graph to tty. This is not really convenient in automated testing and/or driver tests like CTW. With this fix, we now print: > > > # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:776), pid=973305, tid=973372 > # fatal error: Dead loop detected, node references itself: CastPP > > > ...which allows us to have more clues where things may go wrong, and allows us to classify the failures better as well.
> > Additional testing: > - [x] Ad-hoc crashes with selected seeds > - [x] Linux x86_64 server fastdebug, `hotspot_compiler` > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: - Indenting - Only do node name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29185/files - new: https://git.openjdk.org/jdk/pull/29185/files/b0eadca5..037e92dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29185&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29185&range=00-01 Stats: 15 lines in 1 file changed: 0 ins; 7 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/29185.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29185/head:pull/29185 PR: https://git.openjdk.org/jdk/pull/29185 From shade at openjdk.org Tue Jan 13 10:49:00 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Jan 2026 10:49:00 GMT Subject: RFR: 8375055: C2: Better dead loop detection printout [v3] In-Reply-To: References: Message-ID: > Chasing the maddeningly intermittent CTW failure. When C2 fails dead loop verification checks, it prints: > > > # Internal Error (/home/shipilev/shipilev-jdk/src/hotspot/share/opto/phaseX.cpp:784), pid=64465, tid=917 > # assert(no_dead_loop) failed: dead loop detected > > > It also dumps the bad node graph to tty. This is not really convenient in automated testing and/or driver tests like CTW. With this fix, we now print: > > > # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:776), pid=973305, tid=973372 > # fatal error: Dead loop detected, node references itself: CastPP > > > ...which allows us to have more clues where things may go wrong, and allows us to classify the failures better as well.
> > Additional testing: > - [x] Ad-hoc crashes with selected seeds > - [x] Linux x86_64 server fastdebug, `hotspot_compiler` > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` Aleksey Shipilev has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. The pull request now contains two commits: - Only do node name - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29185/files - new: https://git.openjdk.org/jdk/pull/29185/files/037e92dd..6b0e6e8d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29185&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29185&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29185.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29185/head:pull/29185 PR: https://git.openjdk.org/jdk/pull/29185 From shade at openjdk.org Tue Jan 13 10:56:50 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Jan 2026 10:56:50 GMT Subject: RFR: 8375055: C2: Better dead loop detection printout [v4] In-Reply-To: References: Message-ID: <7csVu7A77KVBA0VDpVOtTPKpZDbAiKgedLkdKXm4a70=.e85bea88-30d6-47b9-b104-533ebe28a10e@github.com> > Chasing the maddeningly intermittent CTW failure. When C2 fails dead loop verification checks, it prints: > > > # Internal Error (/home/shipilev/shipilev-jdk/src/hotspot/share/opto/phaseX.cpp:784), pid=64465, tid=917 > # assert(no_dead_loop) failed: dead loop detected > > > It also dumps the bad node graph to tty. This is not really convenient in automated testing and/or driver tests like CTW. With this fix, we now print: > > > # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:776), pid=973305, tid=973372 > # fatal error: Dead loop detected, node references itself: CastPP > > > ...which allows us to have more clues where things may go wrong, and allows us to classify the failures better as well.
> > Additional testing: > - [x] Ad-hoc crashes with selected seeds > - [x] Linux x86_64 server fastdebug, `hotspot_compiler` > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: - Also print node idx - Indenting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29185/files - new: https://git.openjdk.org/jdk/pull/29185/files/6b0e6e8d..24fc095b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29185&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29185&range=02-03 Stats: 6 lines in 1 file changed: 3 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/29185.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29185/head:pull/29185 PR: https://git.openjdk.org/jdk/pull/29185 From qamai at openjdk.org Tue Jan 13 11:20:55 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 Jan 2026 11:20:55 GMT Subject: RFR: 8375055: C2: Better dead loop detection printout [v4] In-Reply-To: <7csVu7A77KVBA0VDpVOtTPKpZDbAiKgedLkdKXm4a70=.e85bea88-30d6-47b9-b104-533ebe28a10e@github.com> References: <7csVu7A77KVBA0VDpVOtTPKpZDbAiKgedLkdKXm4a70=.e85bea88-30d6-47b9-b104-533ebe28a10e@github.com> Message-ID: On Tue, 13 Jan 2026 10:56:50 GMT, Aleksey Shipilev wrote: >> Chasing the maddeningly intermittent CTW failure. When C2 fails dead loop verification checks, it prints: >> >> >> # Internal Error (/home/shipilev/shipilev-jdk/src/hotspot/share/opto/phaseX.cpp:784), pid=64465, tid=917 >> # assert(no_dead_loop) failed: dead loop detected >> >> >> It also dumps the bad node graph to tty. This is not really convenient in automated testing and/or driver tests like CTW. 
With this fix, we now print: >> >> >> # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:776), pid=973305, tid=973372 >> # fatal error: Dead loop detected, node references itself: CastPP >> >> >> ...which allows us to have more clues where things may go wrong, and allows us to classify the failures better as well. >> >> Additional testing: >> - [x] Ad-hoc crashes with selected seeds >> - [x] Linux x86_64 server fastdebug, `hotspot_compiler` >> - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > > Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: > > - Also print node idx > - Indenting Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29185#pullrequestreview-3655250434 From shade at openjdk.org Tue Jan 13 11:20:58 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Jan 2026 11:20:58 GMT Subject: RFR: 8375055: C2: Better dead loop detection printout [v4] In-Reply-To: <3kixVZqfeuEJ8PbcAhHAcdIDDcFNv--lrCbiwzxBcdA=.ce2d1c8d-cbf2-4ed8-b51d-b2f25435920d@github.com> References: <3kixVZqfeuEJ8PbcAhHAcdIDDcFNv--lrCbiwzxBcdA=.ce2d1c8d-cbf2-4ed8-b51d-b2f25435920d@github.com> Message-ID: On Tue, 13 Jan 2026 09:30:37 GMT, Christian Hagedorn wrote: >> Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: >> >> - Also print node idx >> - Indenting > > src/hotspot/share/opto/phaseX.cpp line 780: > >> 778: ss.print_cr("Dead loop detected, node references itself"); >> 779: ss.print("#\n# Node: "); >> 780: n->dump("", false, &ss); > > A full node dump for the failure message seems a little verbose. We also already dump the node as part of `dump_bfs()` above. How about just printing `n->_idx` and `n->Name()` for the failure message itself to get a good first hint? Same below. Yes, OK; name and idx are also already good enough to classify which dead node we are likely failing on.
Done in a new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29185#discussion_r2685959501 From fgao at openjdk.org Tue Jan 13 11:27:53 2026 From: fgao at openjdk.org (Fei Gao) Date: Tue, 13 Jan 2026 11:27:53 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: > In C2's loop optimization, for a counted loop, if any of these conditions (RCE, unrolling) is met, we switch to the > `pre-main-post-loop` model. Then the counted loop is split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running. For example, if a main loop is unrolled to `8x`, the main loop guard tests if the loop has fewer than `8` iterations and then decides which way to go. > > Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is going to be further multiplied. After the main loop is super-unrolled, the minimum trip guard test will be updated. Assuming one vector can process `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test if the remaining trip count is less than `8 * 4 = 32`. > > To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. And both trip guards, of the main loop and of the vectorized drain loop, jump to the scalar post loop.
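To make the guard arithmetic in this description concrete, here is a minimal Java model using the numbers from the PR description: a vector covers 8 iterations, the main loop is super-unrolled 4x, and 25 iterations remain after the pre-loop. `DrainLoopTripModel` is a hypothetical illustration only. It ignores the pre-loop, limit adjustments, and everything else C2 actually emits, and simply counts how many iterations each loop would cover under the original guard wiring and under the proposed one.

```java
import java.util.Arrays;

/** Hypothetical model of how remaining iterations are split across loops (not C2 code). */
final class DrainLoopTripModel {
    /** Iterations covered by {vector main, vector drain, scalar post} once the main guard is passed. */
    static int[] distribute(int remaining, int vlen, int unroll) {
        int mainStride = vlen * unroll;                        // e.g. 8 lanes * 4x super-unroll = 32
        int mainIters = (remaining / mainStride) * mainStride; // whole super-unrolled rounds
        int drainIters = ((remaining - mainIters) / vlen) * vlen; // whole single-vector rounds
        int postIters = remaining - mainIters - drainIters;    // leftover scalar iterations
        return new int[] {mainIters, drainIters, postIters};
    }

    /** Original shape: a failing main-loop guard jumps straight to the scalar post loop. */
    static int[] oldShape(int remaining, int vlen, int unroll) {
        if (remaining < vlen * unroll) {
            return new int[] {0, 0, remaining};                // drain loop is skipped as well
        }
        return distribute(remaining, vlen, unroll);
    }

    /** Proposed shape: a failing main-loop guard falls through to the drain-loop guard instead. */
    static int[] newShape(int remaining, int vlen, int unroll) {
        return distribute(remaining, vlen, unroll);
    }

    public static void main(String[] args) {
        System.out.println("old: " + Arrays.toString(oldShape(25, 8, 4)));
        System.out.println("new: " + Arrays.toString(newShape(25, 8, 4)));
    }
}
```

Running it prints `old: [0, 0, 25]` and `new: [0, 24, 1]`: with the original wiring all 25 iterations run in the scalar post loop, while with the proposed wiring the drain loop covers 3 vector rounds and only one scalar iteration is left.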
> > The problem here is that, if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed: because the minimum trip guard test of the main loop fails, the execution will jump over both the main loop and the vectorized drain loop. For example, in the above case, if a loop still has `25` iterations after the pre-loop, we could run `3` rounds of the vectorized drain loop, but that never happens. It would be better if the minimum trip guard test of the main loop did not jump over the vectorized drain loop. > > This patch improves this by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change in control flow. > > The whole process is done by the function `insert_post_loop()`. > > We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to the vectorized drain loop with mode `InsertVectorizedDrain`: > > 1. The fall-in control flow to the vectorized drain loop comes from a `RegionNode` merging exits ... Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Fix build failure after rebasing and address review comments - Merge branch 'master' into optimize-atomic-post - Fixed new test failures after rebasing and refined parts of the code to address review comments - Merge branch 'master' into optimize-atomic-post - Merge branch 'master' into optimize-atomic-post - Clean up comments for consistency and add spacing for readability - Fix some corner case failures and refined part of code - Merge branch 'master' into optimize-atomic-post - Refine ascii art, rename some variables and resolve conflicts - Merge branch 'master' into optimize-atomic-post - ...
and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 ------------- Changes: https://git.openjdk.org/jdk/pull/22629/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22629&range=03 Stats: 1624 lines in 8 files changed: 1417 ins; 63 del; 144 mod Patch: https://git.openjdk.org/jdk/pull/22629.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22629/head:pull/22629 PR: https://git.openjdk.org/jdk/pull/22629 From adinn at openjdk.org Tue Jan 13 11:36:04 2026 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 13 Jan 2026 11:36:04 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 22:47:04 GMT, Erik Österlund wrote: > What I care more about is the AOT cache shipped with the JDK, which does not embed any compiled code. It's important that these archives are offered for all GCs, despite not knowing which GC will be selected. Is that still the case? I believe (and @iklam is the one who can confirm) that the JDK, both in the jdk26 release and in the mainline dev tree, still ships with a default static CDS archive rather than an AOT archive (well, rather two i.e. a coops and non-coops CDS archive). If and when we do switch to shipping with a default AOT archive we will need to ensure that said archive is GC-neutral, the simple option as far as GC barrier support is concerned being to ensure that the archive does not contain any (generated method or stub) code n.b. that option is available by disabling generation of an embedded code cache when creating the default archive. We also still need to finesse the choice of coops/non-coops as the layout of the archive heap section depends on the setting at assembly time and the same setting may be unavailable at runtime (if, say, we build with coops and then a very large heap max is specified).
We want users to be able to benefit from using a default archive whatever heap size they specify on the command line without also having to force them to use non-coops. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3743843749 From chagedorn at openjdk.org Tue Jan 13 11:47:24 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 13 Jan 2026 11:47:24 GMT Subject: RFR: 8375055: C2: Better dead loop detection printout [v4] In-Reply-To: <7csVu7A77KVBA0VDpVOtTPKpZDbAiKgedLkdKXm4a70=.e85bea88-30d6-47b9-b104-533ebe28a10e@github.com> References: <7csVu7A77KVBA0VDpVOtTPKpZDbAiKgedLkdKXm4a70=.e85bea88-30d6-47b9-b104-533ebe28a10e@github.com> Message-ID: On Tue, 13 Jan 2026 10:56:50 GMT, Aleksey Shipilev wrote: >> Chasing the maddeningly intermittent CTW failure. When C2 fails dead loop verification checks, it prints: >> >> >> # Internal Error (/home/shipilev/shipilev-jdk/src/hotspot/share/opto/phaseX.cpp:784), pid=64465, tid=917 >> # assert(no_dead_loop) failed: dead loop detected >> >> >> It also dumps the bad node graph to tty. This is not really convenient in automated testing and/or driver tests like CTW. With this fix, we now print: >> >> >> # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:776), pid=973305, tid=973372 >> # fatal error: Dead loop detected, node references itself: CastPP >> >> >> ...which allows to have more clues where things may go wrong, and allows to classify the failures better as well. >> >> Additional testing: >> - [x] Ad-hoc crashes with selected seeds >> - [x] Linux x86_64 server fastdebug, `hotspot_compiler` >> - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > > Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: > > - Also print node idx > - Indenting That looks good, thanks for the update! ------------- Marked as reviewed by chagedorn (Reviewer). 
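Andrew Dinn's point that a compressed-oops (coops) setting chosen at assembly time may be unavailable at runtime when a very large maximum heap is requested comes down to simple arithmetic: a 32-bit narrow oop that is shifted left by log2(object alignment) can only address a bounded range. The sketch below assumes the usual base-plus-shifted-offset encoding; `CompressedOopsRange` is a hypothetical helper, not HotSpot code, and the exact threshold HotSpot applies when deciding whether to enable coops may be slightly lower than the raw figure computed here.

```java
/** Hypothetical helper illustrating the range of 32-bit compressed oops (not HotSpot code). */
final class CompressedOopsRange {
    /** Bytes addressable by a 32-bit narrow oop shifted left by log2(objectAlignmentInBytes). */
    static long maxAddressableHeap(int objectAlignmentInBytes) {
        if (objectAlignmentInBytes <= 0 || Integer.bitCount(objectAlignmentInBytes) != 1) {
            throw new IllegalArgumentException("alignment must be a positive power of two");
        }
        int shift = Integer.numberOfTrailingZeros(objectAlignmentInBytes); // 8 -> 3
        return (1L << 32) << shift;
    }

    /** Decoding as modelled here: address = heapBase + (unsigned narrowOop << shift). */
    static long decode(long heapBase, int narrowOop, int shift) {
        return heapBase + (Integer.toUnsignedLong(narrowOop) << shift);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        System.out.println(maxAddressableHeap(8) / gib + " GiB");   // default 8-byte alignment
        System.out.println(maxAddressableHeap(16) / gib + " GiB");  // -XX:ObjectAlignmentInBytes=16
    }
}
```

This prints `32 GiB` and then `64 GiB`. With the default 8-byte alignment, a maximum heap beyond roughly 32 GiB cannot be covered by narrow oops, which is why an archive assembled with coops cannot simply be reused in such a configuration.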
PR Review: https://git.openjdk.org/jdk/pull/29185#pullrequestreview-3655351607 From rcastanedalo at openjdk.org Tue Jan 13 12:20:44 2026 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 13 Jan 2026 12:20:44 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v16] In-Reply-To: <2RFAYXSlsxmta9WQoZ0GEGWM0To_FarSOPvjAevBCmE=.9d3da257-e88e-4952-889f-0c508358cc82@github.com> References: <2EiCLUhS6yKEXp8RQlAOmaZrwvxPghYYw6u_PW5m2iM=.3ad470ab-c926-4fe4-bf12-8872ce22c7ef@github.com> <2RFAYXSlsxmta9WQoZ0GEGWM0To_FarSOPvjAevBCmE=.9d3da257-e88e-4952-889f-0c508358cc82@github.com> Message-ID: On Tue, 13 Jan 2026 10:04:19 GMT, Quan Anh Mai wrote: >> Thanks! Could the early return case be hoisted to `MemNode::find_previous_store` so that we avoid constructing `local_ea`? Or is there any case where `base` is not an OOP and `find_previous_store` would still find something useful? > > Constructing a local variable is cheap. I think it is better to modify `Unique_Node_List` to be more C++ idiomatic (i.e. not allocating on default construction). However, it should be a separate issue. Constructing local variables is cheap, but arena allocation is one of the main sources of overhead for C2. If we can avoid the allocations of `LocalEA::_aliases` and `LocalEA::_not_escaped_controls` in some cases by simply bailing out earlier, why not? Is there any drawback I am missing?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2686166465 From roland at openjdk.org Tue Jan 13 12:21:39 2026 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 13 Jan 2026 12:21:39 GMT Subject: RFR: 8374969: Incorrect results of LoadStoreNode::adr_type and SCMemProj::adr_type In-Reply-To: References: Message-ID: <8mc5R1y-ieib0Y_BZvQgs2j6UC3_OANbp4ZGg3a05xI=.1fdbebaa-7ad4-4215-9ade-4f7ae7aad0e6@github.com> On Sun, 11 Jan 2026 12:22:16 GMT, Quan Anh Mai wrote: > Hi, > > This is extracted from #28570 > > This PR fixes the return value of `LoadStoreNode::adr_type` and `SCMemProj::adr_type`. For the former, it is trivial that we can do what `MemNode` does. And for the latter, the implementation of `ProjNode::adr_type` is adequate. > > Please kindly review, thanks a lot. Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29154#pullrequestreview-3655491863 From mhaessig at openjdk.org Tue Jan 13 12:27:01 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 13 Jan 2026 12:27:01 GMT Subject: RFR: 8374969: Incorrect results of LoadStoreNode::adr_type and SCMemProj::adr_type In-Reply-To: References: Message-ID: <9uGXjm9xq0lGzSbOlJdRr9xvKSH8ih_pn-HVLVZiD2Y=.0a34eebc-e105-47f1-85bd-e2c4c3f5ee78@github.com> On Sun, 11 Jan 2026 12:22:16 GMT, Quan Anh Mai wrote: > Hi, > > This is extracted from #28570 > > This PR fixes the return value of `LoadStoreNode::adr_type` and `SCMemProj::adr_type`. For the former, it is trivial that we can do what `MemNode` does. And for the latter, the implementation of `ProjNode::adr_type` is adequate. > > Please kindly review, thanks a lot. Thanks for answering my question. Looks good. ------------- Marked as reviewed by mhaessig (Committer).
PR Review: https://git.openjdk.org/jdk/pull/29154#pullrequestreview-3655512396 From qamai at openjdk.org Tue Jan 13 12:28:10 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 Jan 2026 12:28:10 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v16] In-Reply-To: References: <2EiCLUhS6yKEXp8RQlAOmaZrwvxPghYYw6u_PW5m2iM=.3ad470ab-c926-4fe4-bf12-8872ce22c7ef@github.com> <2RFAYXSlsxmta9WQoZ0GEGWM0To_FarSOPvjAevBCmE=.9d3da257-e88e-4952-889f-0c508358cc82@github.com> Message-ID: On Tue, 13 Jan 2026 12:17:23 GMT, Roberto Castañeda Lozano wrote: >> Constructing a local variable is cheap. I think it is better to modify `Unique_Node_List` to be more C++ idiomatic (i.e. not allocating on default construction). However, it should be a separate issue. > > Constructing local variables is cheap, but arena allocation is one of the main sources of overhead for C2. If we can avoid the allocations of `LocalEA::_aliases` and `LocalEA::_not_escaped_controls` in some cases by simply bailing out earlier, why not? Is there any drawback I am missing? Yes, that's why I'm suggesting improving `Unique_Node_List` default constructor not to allocate instead. The drawback I see here is that it is not trivial whether the current behaviour has any effect on non-oop memory accesses. And even if it is truly so, refactoring the function like that expands the scope of this PR a little bit.
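Quan Anh Mai's suggestion that a default-constructed node list should not allocate can be illustrated with a small Java analogue. `LazyUniqueList` is hypothetical and is not HotSpot's C++, arena-allocated `Unique_Node_List`; the sketch only shows the idea that construction itself is free and the backing storage is created on the first insertion, so a code path that bails out early never pays for an allocation.

```java
import java.util.Arrays;

/**
 * Hypothetical list of unique elements whose default constructor allocates nothing.
 * Backing storage is created lazily on the first push (illustration only, not HotSpot code).
 */
final class LazyUniqueList {
    private static final Object[] EMPTY = new Object[0]; // shared, so construction is free

    private Object[] elems = EMPTY;
    private int size = 0;
    private int allocations = 0;   // number of backing arrays allocated so far

    /** Adds e unless it is already present (identity check); returns true if it was added. */
    boolean push(Object e) {
        for (int i = 0; i < size; i++) {
            if (elems[i] == e) {
                return false;      // linear scan keeps the sketch short; a real list might use a bit set
            }
        }
        if (size == elems.length) {
            elems = Arrays.copyOf(elems, Math.max(4, elems.length * 2));
            allocations++;
        }
        elems[size++] = e;
        return true;
    }

    int size() { return size; }

    int allocations() { return allocations; }

    public static void main(String[] args) {
        LazyUniqueList unused = new LazyUniqueList(); // e.g. a bail-out path never touches it
        LazyUniqueList used = new LazyUniqueList();
        Object node = new Object();
        used.push(node);
        used.push(node);                              // duplicate is ignored
        System.out.println(unused.allocations() + " " + used.allocations() + " " + used.size());
    }
}
```

Running `main` prints `0 1 1`: the list that is constructed but never used performs no allocation, while the used list allocates its backing array exactly once.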
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2686188483 From qamai at openjdk.org Tue Jan 13 12:46:12 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 Jan 2026 12:46:12 GMT Subject: Integrated: 8374969: Incorrect results of LoadStoreNode::adr_type and SCMemProj::adr_type In-Reply-To: References: Message-ID: <-Lu9KEaEZAmJa09cgzL8MIpACIx6bfzIZJV_y9fMR8E=.f83b833f-1346-4ac4-be6f-e04ccc9abcfa@github.com> On Sun, 11 Jan 2026 12:22:16 GMT, Quan Anh Mai wrote: > Hi, > > This is extracted from #28570 > > This PR fixes the return value of `LoadStoreNode::adr_type` and `SCMemProj::adr_type`. For the former, it is trivial that we can do what `MemNode` does. And for the latter, the implementation of `ProjNode::adr_type` is adequate. > > Please kindly review, thanks a lot. This pull request has now been integrated. Changeset: a90c7eee Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/a90c7eee6f7e950edea4d94cf2b109fdb5e49909 Stats: 17 lines in 2 files changed: 9 ins; 7 del; 1 mod 8374969: Incorrect results of LoadStoreNode::adr_type and SCMemProj::adr_type Reviewed-by: roland, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/29154 From qamai at openjdk.org Tue Jan 13 12:43:19 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 Jan 2026 12:43:19 GMT Subject: RFR: 8374969: Incorrect results of LoadStoreNode::adr_type and SCMemProj::adr_type In-Reply-To: References: Message-ID: On Sun, 11 Jan 2026 12:22:16 GMT, Quan Anh Mai wrote: > Hi, > > This is extracted from #28570 > > This PR fixes the return value of `LoadStoreNode::adr_type` and `SCMemProj::adr_type`. For the former, it is trivial that we can do what `MemNode` does. And for the latter, the implementation of `ProjNode::adr_type` is adequate. > > Please kindly review, thanks a lot. Thanks a lot for your reviews, testing passed. 
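------------- PR Comment: https://git.openjdk.org/jdk/pull/29154#issuecomment-3744123670 From rcastanedalo at openjdk.org Tue Jan 13 12:57:35 2026 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 13 Jan 2026 12:57:35 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v16] In-Reply-To: References: <2EiCLUhS6yKEXp8RQlAOmaZrwvxPghYYw6u_PW5m2iM=.3ad470ab-c926-4fe4-bf12-8872ce22c7ef@github.com> <2RFAYXSlsxmta9WQoZ0GEGWM0To_FarSOPvjAevBCmE=.9d3da257-e88e-4952-889f-0c508358cc82@github.com> Message-ID: <6qRIhrIR-wNADGS1RZUFPQ93E2UyQcbIZwDJ-VT5pFs=.abec7af1-96f1-4053-b7b6-66b40508fc4e@github.com> On Tue, 13 Jan 2026 12:24:06 GMT, Quan Anh Mai wrote: >> Constructing local variables is cheap, but arena allocation is one of the main sources of overhead for C2. If we can avoid the allocations of `LocalEA::_aliases` and `LocalEA::_not_escaped_controls` in some cases by simply bailing out earlier, why not? Is there any drawback I am missing? > > Yes, that's why I'm suggesting improving `Unique_Node_List` default constructor not to allocate instead. The drawback I see here is that it is not trivial whether the current behaviour has any effect on non-oop memory accesses. And even if it is truly so, refactoring the function like that expands the scope of this PR a little bit. Fair enough, I agree with not extending the scope of the PR if we are not sure about possible side-effects. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2686295679 From krk at openjdk.org Tue Jan 13 13:02:57 2026 From: krk at openjdk.org (Kerem Kat) Date: Tue, 13 Jan 2026 13:02:57 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v5] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 00:04:53 GMT, Dean Long wrote: >> Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Merge branch 'master' into fix-c2-segfault-unlocknode >> - address comments >> - fix rename >> - rename test file >> - Merge branch 'master' into fix-c2-segfault-unlocknode >> - fix test spacing >> - Update src/hotspot/share/opto/macro.cpp >> >> Co-authored-by: Manuel Hässig >> - Update src/hotspot/share/opto/macro.cpp >> >> Co-authored-by: Manuel Hässig >> - copyright format fix? >> - 8370502: C2: segfault while adding node to IGVN worklist > > Yes, it would be good to know if expand_lock_node() also needs a null check. I was assuming the lock and unlock node shapes were basically the same, but now I see that the shapes are different for some reason. The LockNode gets a FastLockNode edge early, while the UnlockNode creates its FastUnlockNode late. I failed to get expand_lock_node() to crash with -XX:+StressMacroExpansion but that doesn't mean there isn't the same problem there. Now that the holiday season is over, could I get a review @dean-long, @mhaessig, @eme64?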
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Merge branch 'master' into fix-c2-segfault-unlocknode >> - address comments >> - fix rename >> - rename test file >> - Merge branch 'master' into fix-c2-segfault-unlocknode >> - fix test spacing >> - Update src/hotspot/share/opto/macro.cpp >> >> Co-authored-by: Manuel Hässig >> - Update src/hotspot/share/opto/macro.cpp >> >> Co-authored-by: Manuel Hässig >> - copyright format fix? >> - 8370502: C2: segfault while adding node to IGVN worklist > > Yes, it would be good to know if expand_lock_node() also needs a null check. I was assuming the lock and unlock node shapes were basically the same, but now I see that the shapes are different for some reason. The LockNode gets a FastLockNode edge early, while the UnlockNode creates its FastUnlockNode late. I failed to get expand_lock_node() to crash with -XX:+StressMacroExpansion but that doesn't mean there isn't the same problem there. Now that the holiday season is over, could I get a review @dean-long, @mhaessig, @eme64? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28432#issuecomment-3744217399 From stefank at openjdk.org Tue Jan 13 13:46:22 2026 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 13 Jan 2026 13:46:22 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: <6Aj0qmsVSD0iVIeK3Nesrbdzhn3r6OKfOnnGAeV66SQ=.2370d05a-0213-4d2a-8659-284c11a449dc@github.com> <421ijmW9qZwmopebtIuH9ZNG9vp6nRvRnvI7Q4E9H_4=.570b3b20-2c0a-4a7a-b7ff-db6b5eed60e5@github.com> Message-ID: On Mon, 12 Jan 2026 17:35:36 GMT, Andrew Dinn wrote: > However, the fact that ZGC requires a barrier for an off-heap phantom oop load does mean that we must stick with ZGC in production when we use it in assembly and vice versa (while we can safely mix and match Serial, Parallel, G1 and Shenandoah GCs).
I don't think this is true. Both G1 and Shenandoah GC need their respective SATB load barrier when they are loading an off-heap phantom oop. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3744404716 From coleenp at openjdk.org Tue Jan 13 13:46:23 2026 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 13 Jan 2026 13:46:23 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 11:33:58 GMT, Andrew Dinn wrote: > What I care more about is the AOT cache shipped with the JDK, which does not embed any compiled code. It's important that these archives are offered for all GCs, despite not knowing which GC will be selected. Is that still the case? I don't know the answer to this. @iklam or @vnkozlov could you look at this PR and comment? Thanks @adinn for your comments and review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3744405035 PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3744408061 From fgao at openjdk.org Tue Jan 13 15:12:39 2026 From: fgao at openjdk.org (Fei Gao) Date: Tue, 13 Jan 2026 15:12:39 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <_gdObRdkYkS7d3fQQ6bcms709TpeM1IQtuPJtI0fcyE=.073d0496-8dfc-48a9-aee8-b64b408a6e62@github.com> Message-ID: On Fri, 9 Jan 2026 13:54:49 GMT, Emanuel Peter wrote: >> Hi @eme64, many thanks for your review. It's really comprehensive and insightful. I've given a thumbs-up to all the comments that have been resolved in this commit. >> >>> I have one concern: We now have changed the branches. There is now a long sequence of branches if we have very few iterations, so that we only go through pre and post loop.
It would be interesting to see what the performance difference is between master and patch. >> >> Regarding this concern, I re-ran the microbenchmarks (now merged with the existing `VectorThroughputForIterationCount.java` ), named as `bench03*_drain_memoryBound`, and collected data across different platforms, including `128-bit` and `256-bit` `AArch64` machines as well as a `512-bit` `x86` machine. >> >> To summarize, I observe a minor performance regression for small-iteration loops on the `128-bit` and `256-bit` `AArch64` platforms. For larger-iteration loops, there is either a performance improvement or no noticeable change. The performance data on the `512-bit x86` machine shows a similar trend, though the regression is more significant. >> >> **The test range of `ITERATION_COUNT` is `0-300`. For larger `ITERATION_COUNT` values, there is either a performance improvement or no noticeable change, so those results are omitted. The following data only shows cases with regressions.** >> >> >> (FIXED_OFFSET) (RANDOMIZE_OFFSETS) (REPETITIONS) (seed) Mode Cnt >> 0 TRUE 1024 42 avgt 3 >> >> `Diff = (patch - master) / master` >> >> On `128-bit aarch64` platform: >> >> Benchmark (ITERATION_COUNT) Units Diff >> bench031B_drain_memoryBound 1 ns/op 15.15% >> bench031B_drain_memoryBound 2 ns/op 10.89% >> bench031B_drain_memoryBound 3 ns/op 9.27% >> bench031B_drain_memoryBound 4 ns/op 7.39% >> bench031B_drain_memoryBound 5 ns/op 5.86% >> bench031B_drain_memoryBound 6 ns/op 5.31% >> bench031B_drain_memoryBound 7 ns/op 4.39% >> bench031B_drain_memoryBound 8 ns/op 4.27% >> bench031B_drain_memoryBound 9 ns/op 3.60% >> bench031B_drain_memoryBound 10 ns/op 3.11% >> bench031B_drain_memoryBound 11 ns/op 2.97% >> bench031B_drain_memoryBound 12 ns/op 3.19% >> bench031B_drain_memoryBound 13 ns/op 2.90% >> bench031B_drain_memoryBound 14 ns/op 2.68% >> bench031B_drain_memoryBound 15 ns/op 2.37% >> bench031B_drain_memoryBound 16 ns/op 2.44% >> bench031B_drain_memo...
> > @fg1417 I hope you had a good start into the new year. I'd love to make this PR a bit of a priority in the next weeks. Would you mind synching with master and fixing merge conflicts? > > I'd review, run testing and look into running some benchmarks myself. Hi @eme64 the PR is ready for review and testing. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3744877402 From epeter at openjdk.org Tue Jan 13 16:25:55 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 Jan 2026 16:25:55 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v2] In-Reply-To: References: Message-ID: > I found this bug with the Template Framework (Vector API Library extension): https://github.com/openjdk/jdk/pull/28873 > > In `VectorCastNode::opcode`, we have an assert that we **cannot** have an `unsigned cast from float`, it would be nonsense. > https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/hotspot/share/opto/vectornode.cpp#L1490-L1491 > > When we intrinsify `VectorSupport.convert`, we get a constant from the VectorAPI, that determines if we have a signed or unsigned cast, and other constants that determine the `from` and `to` types. > > At runtime, the VectorAPI implementation can (I assume) never take a path of `unsigned cast from float`. > But it seems that nothing prevents the VM from compiling such an (unreachable) path. > > Here is how I think it happens: > - `AbstractVector::castShape` creates a `C` conversion (lane-wise conversion). So at runtime, we will take the `C` switch-case in `AbstractVector::convert0`. > - Profiling can also make the `Z` (lane-wise reinterpret) switch-case live, because of other cast and reinterpret calls that use that code path. And we may not (yet) have proven that the `Z` switch-case is never taken. > - So we end up compiling the `Z` path, and call `VectorSupport.convert` with types `float` and `long`. 
And since the `Z` path sees that the from size is smaller than the to size, we get a `UCAST` (zero extension). Hence, we try to intrinsify a vector unsigned cast from float, and hit the assert. > > https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java#L742-L743 > > That the `Z` path is unreachable for `castShape` seems to be an invariant that only the VectorAPI knows about, and the VM cannot directly know that and determine that it is dead code. Thus, **I propose that we just check for the condition when trying to intrinsify**, and refuse intrinsification if it is violated. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: intrinsify with Halt instead ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29169/files - new: https://git.openjdk.org/jdk/pull/29169/files/33fec21c..4c717b5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29169&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29169&range=00-01 Stats: 10 lines in 1 file changed: 9 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29169.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29169/head:pull/29169 PR: https://git.openjdk.org/jdk/pull/29169 From epeter at openjdk.org Tue Jan 13 16:25:55 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 Jan 2026 16:25:55 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v2] In-Reply-To: References: <2_LedgpPkdp6NCH-9U3_xw3dJfHqiWklnsc3XosnUlk=.e0438e35-c139-4498-8a48-d5a5fec2c024@github.com> Message-ID: On Tue, 13 Jan 2026 08:24:52 GMT, Quan Anh Mai wrote: >> Hmm, indeed, we could try to put a `Halt` node here, right? >> @merykitty How exactly would you do that? Are there places we already do that? 
> > If you look at the end of `GraphKit::uncommon_trap`, the procedure would look like this: > > HaltNode* halt = new HaltNode(control(), frameptr(), "uncommon trap returned which should never happen" > PRODUCT_ONLY(COMMA /*reachable*/false)); > _gvn.set_type_bottom(halt); > root()->add_req(halt); > > stop_and_kill_map(); Nice idea! I applied it and tested it on the reproducer. Turns out the HaltNode is not in the final graph, so the path probably got folded away later. So we are able to prove that it is dead later on most likely. Running testing again now... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29169#discussion_r2687142861 From qamai at openjdk.org Tue Jan 13 16:37:28 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 Jan 2026 16:37:28 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v2] In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 16:25:55 GMT, Emanuel Peter wrote: >> I found this bug with the Template Framework (Vector API Library extension): https://github.com/openjdk/jdk/pull/28873 >> >> In `VectorCastNode::opcode`, we have an assert that we **cannot** have an `unsigned cast from float`, it would be nonsense. >> https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/hotspot/share/opto/vectornode.cpp#L1490-L1491 >> >> When we intrinsify `VectorSupport.convert`, we get a constant from the VectorAPI, that determines if we have a signed or unsigned cast, and other constants that determine the `from` and `to` types. >> >> At runtime, the VectorAPI implementation can (I assume) never take a path of `unsigned cast from float`. >> But it seems that nothing prevents the VM from compiling such an (unreachable) path. >> >> Here is how I think it happens: >> - `AbstractVector::castShape` creates a `C` conversion (lane-wise conversion). So at runtime, we will take the `C` switch-case in `AbstractVector::convert0`. 
>> - Profiling can also make the `Z` (lane-wise reinterpret) switch-case live, because of other cast and reinterpret calls that use that code path. And we may not (yet) have proven that the `Z` switch-case is never taken. >> - So we end up compiling the `Z` path, and call `VectorSupport.convert` with types `float` and `long`. And since the `Z` path sees that the from size is smaller than the to size, we get a `UCAST` (zero extension). Hence, we try to intrinsify a vector unsigned cast from float, and hit the assert. >> >> https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java#L742-L743 >> >> That the `Z` path is unreachable for `castShape` seems to be an invariant that only the VectorAPI knows about, and the VM cannot directly know that and determine that it is dead code. Thus, **I propose that we just check for the condition when trying to intrinsify**, and refuse intrinsification if it is violated. >> >> **Update: ** instead of not intrinsifying, we chose a stronger path: we intrinsify it with a `HaltNode` that should never be encountered at runtime. The reproducer I have eventually even is able to fold away the `HaltNode`, so it turns out it is indeed a dead path. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > intrinsify with Halt instead Thanks, I think it looks good. ------------- Marked as reviewed by qamai (Committer). 
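A minimal sketch of the guard discussed in this thread, under stated assumptions: the element types, the `'C'`/`'Z'` conversion codes as plain `char`s, and every function name below are hypothetical stand-ins, not HotSpot or Vector API code. It models only what the PR description states: the `C` path is a signed lane-wise cast, the `Z` path zero-extends (`UCAST`) when the source lanes are narrower than the destination lanes, and an unsigned cast from a floating-point type is treated as a dead path (a `HaltNode` in the updated patch) instead of being built.

```cpp
#include <cassert>

// Hypothetical lane element types; only lane size and the integral vs.
// floating-point distinction matter for this sketch.
enum class ElemType { Byte, Short, Int, Long, Float, Double };

int lane_bytes(ElemType t) {
  switch (t) {
    case ElemType::Byte:   return 1;
    case ElemType::Short:  return 2;
    case ElemType::Int:    return 4;
    case ElemType::Float:  return 4;
    case ElemType::Long:   return 8;
    case ElemType::Double: return 8;
  }
  assert(false && "unknown element type");
  return 0;
}

bool is_floating(ElemType t) {
  return t == ElemType::Float || t == ElemType::Double;
}

// Only the two cases described in the PR are modeled; everything else
// (e.g. a same-size reinterpret) is lumped into Other.
enum class CastKind { Signed, Unsigned, Other };

CastKind cast_kind_for(char conversion, ElemType from, ElemType to) {
  if (conversion == 'C') return CastKind::Signed;   // lane-wise conversion
  if (conversion == 'Z' && lane_bytes(from) < lane_bytes(to)) {
    return CastKind::Unsigned;                       // zero extension (UCAST)
  }
  return CastKind::Other;
}

enum class Decision { Intrinsify, EmitHalt };

// The guard: an unsigned cast from float/double is meaningless, so instead
// of building the cast (and tripping the assert) the path is marked dead.
Decision intrinsify_convert(char conversion, ElemType from, ElemType to) {
  if (cast_kind_for(conversion, from, to) == CastKind::Unsigned &&
      is_floating(from)) {
    return Decision::EmitHalt;
  }
  return Decision::Intrinsify;
}
```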
PR Review: https://git.openjdk.org/jdk/pull/29169#pullrequestreview-3656756740 From eastigeevich at openjdk.org Tue Jan 13 16:43:59 2026 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 13 Jan 2026 16:43:59 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache [v3] In-Reply-To: References: Message-ID: <9cUx5KUaP2YdgxXDmJabY3oRQMXV4X4W1uf5DZfXZKA=.ac785876-da4c-4ac1-a2bb-d8d8ebc981e0@github.com> On Fri, 12 Dec 2025 23:04:29 GMT, Chad Rakoczy wrote: >> ### Summary >> This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. >> >> ### Description >> The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. >> >> Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. >> >> The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. 
>> >> The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). >> >> Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. >> >> ### Performance >> Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. >> >> ### Testing >> * CodeCache tests have been updated to cover the new `HotCodeHeap`. >> * Added ded... > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add HotCodeGrouperMoveFunction test src/hotspot/share/runtime/hotCodeGrouper.cpp line 99: > 97: > 98: void HotCodeGrouper::do_grouping(ThreadSampler& sampler) { > 99: while (sampler.has_candidates()) { We need to check if HotCodeHeap has enough space. If it does not we should not relocate candidates. 
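The selection step from the PR summary, together with the space check requested in the review comment, can be sketched roughly as follows. This is a hypothetical model, not the code in `hotCodeGrouper.cpp`: `CodeBlob`, `BlobMap`, `find_blob`, and `select_hot` are illustrative names, sampled PCs are plain addresses, the ratio's denominator is assumed to be the total number of samples taken, and relocating callees is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a C2 nmethod occupying [start, start + size).
struct CodeBlob {
  std::string name;
  std::uintptr_t start;
  std::size_t size;
};

using BlobMap = std::map<std::uintptr_t, CodeBlob>;  // keyed by start address

// Maps a sampled PC to the blob that contains it, or nullptr if none does.
const CodeBlob* find_blob(const BlobMap& blobs, std::uintptr_t pc) {
  auto it = blobs.upper_bound(pc);  // first blob starting after pc
  if (it == blobs.begin()) return nullptr;
  --it;                             // last blob starting at or before pc
  const CodeBlob& b = it->second;
  return pc < b.start + b.size ? &b : nullptr;
}

// Ranks blobs by sample count and picks them, hottest first, until the picked
// blobs cover at least `ratio` of all samples. Blobs that would overflow the
// remaining hot-heap `capacity` are skipped rather than relocated.
std::vector<std::string> select_hot(const BlobMap& blobs,
                                    const std::vector<std::uintptr_t>& pcs,
                                    double ratio, std::size_t capacity) {
  assert(ratio > 0.0 && ratio <= 1.0);
  std::map<const CodeBlob*, int> counts;
  for (std::uintptr_t pc : pcs) {
    if (const CodeBlob* b = find_blob(blobs, pc)) ++counts[b];
  }
  std::vector<std::pair<const CodeBlob*, int>> ranked(counts.begin(), counts.end());
  std::sort(ranked.begin(), ranked.end(), [](const auto& x, const auto& y) {
    if (x.second != y.second) return x.second > y.second;  // hotter first
    return x.first->start < y.first->start;                // deterministic ties
  });

  const double total = static_cast<double>(pcs.size());
  std::vector<std::string> selected;
  std::size_t used = 0;
  int covered = 0;
  for (const auto& [blob, samples] : ranked) {
    if (covered >= ratio * total) break;         // target ratio reached
    if (used + blob->size > capacity) continue;  // does not fit: skip it
    selected.push_back(blob->name);
    used += blob->size;
    covered += samples;
  }
  return selected;
}
```

Skipping a candidate that does not fit lets smaller, colder candidates still be placed; stopping at the first candidate that does not fit would be an equally plausible reading of the review comment.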
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27858#discussion_r2687217064 From krk at openjdk.org Tue Jan 13 17:42:15 2026 From: krk at openjdk.org (Kerem Kat) Date: Tue, 13 Jan 2026 17:42:15 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded Message-ID: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> The check `vect->is_Vector() || vect->is_LoadVector()` doesn't handle `Proj` nodes that resolve to vector types, causing an assertion failure when such nodes flow through a `Phi` into `VectorBox`. ------------- Commit messages: - 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded Changes: https://git.openjdk.org/jdk/pull/29200/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29200&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375010 Stats: 55 lines in 2 files changed: 54 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29200/head:pull/29200 PR: https://git.openjdk.org/jdk/pull/29200 From iklam at openjdk.org Tue Jan 13 18:05:36 2026 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 13 Jan 2026 18:05:36 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 23:18:19 GMT, Coleen Phillimore wrote: > Save this address so that the code that uses it in the valhalla repo will have it. This doesn't fail in main jdk repository. > See CR for more information. > Tested with tier1-4. LGTM. Adding this in the mainline will be harmless (see my comment [here](https://github.com/openjdk/jdk/pull/29129#issuecomment-3745672633)) and will help with Valhalla integration. ------------- Marked as reviewed by iklam (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29129#pullrequestreview-3657142050 From iklam at openjdk.org Tue Jan 13 18:05:37 2026 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 13 Jan 2026 18:05:37 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 13:41:45 GMT, Coleen Phillimore wrote: > > What I care more about is the AOT cache shipped with the JDK, which does not embed any compiled code. It's important that these archives are offered for all GCs, despite not knowing which GC will be selected. Is that still the case? > > I don't know the answer to this. @iklam or @vnkozlov could you look at this PR and comment? Compiled code is not included in the default CDS archive (aka the AOT cache shipped with the JDK), or any "old" CDS archives generated with the -Xshare:dump option (`ac` is the "AOT Code" region): $ java -Xshare:dump -Xlog:cds | grep region..ac [0.416s][info][cds] Shared file region (ac) 4: 0 bytes $ java -Xshare:dump --enable-preview -Xlog:cds | grep region..ac [0.436s][info][cds] Shared file region (ac) 4: 0 bytes $ java -Xshare:dump --enable-preview -XX:+UseZGC -Xlog:cds | grep region..ac [0.429s][info][cds] Shared file region (ac) 4: 0 bytes $ java -Xshare:dump --enable-preview -XX:+AOTClassLinking -Xlog:cds | grep region..ac [0.415s][info][cds] Shared file region (ac) 4: 0 bytes Compiled code *is* included in AOT archives generated with `-XX:AOTMode=create -XX:AOTCache=xxx` or `-XX:AOTCacheOutput=xxx`: $ java -XX:+UseSerialGC -Xlog:aot -cp HelloWorld.jar -XX:AOTCacheOutput=hw.aot HelloWorld | grep region..ac [1.722s][info][aot] Shared file region (ac) 4: 0 bytes Picked up JAVA_TOOL_OPTIONS: -Djava.class.path=HelloWorld.jar -XX:+UseSerialGC -Xlog:aot -XX:AOTCacheOutput=hw.aot -XX:AOTConfiguration=hw.aot.config -XX:AOTMode=create [1.387s][info][aot] Shared file region (ac) 4: 303480 bytes, addr 0x0000000800a37000 file offset 0x00a37000 crc 0x02b369fd $ java 
-XX:+UseSerialGC -cp HelloWorld.jar -XX:AOTCache=hw.aot -Xlog:aot*=debug HelloWorld | grep codecache [.....] [0.003s][debug][aot,codecache,init] Mapped 249856 bytes at address 0x000071bc10ef1000 at AOT Code Cache [0.003s][info ][aot,codecache,init] Loaded 322 AOT code entries from AOT Code Cache [0.003s][debug][aot,codecache,init] Adapters: total=322 [0.003s][debug][aot,codecache,init] Shared Blobs: total=0 [0.003s][debug][aot,codecache,init] C1 Blobs: total=0 [0.003s][debug][aot,codecache,init] C2 Blobs: total=0 [0.003s][debug][aot,codecache,init] AOT code cache size: 243680 bytes [0.003s][debug][aot,codecache,init] External addresses recorded [0.003s][info ][aot,codecache,init] Early stubs recorded [0.003s][debug][aot,codecache,init] Early shared blobs recorded [0.003s][debug][aot,codecache,stubs] Reading blob '' (id=1, kind=Adapter) from AOT Code Cache [0.003s][debug][aot,codecache,init ] Read 322 entries table at offset 225648 from AOT Code Cache [0.003s][debug][aot,codecache,stubs] Read blob '' (id=1, kind=Adapter) from AOT Code Cache [0.003s][debug][aot,codecache,stubs] Reading blob 'DI' (id=98, kind=Adapter) from AOT Code Cache ... However, if this AOT cache is not using the same GC that was used to create the cache, the code cache will be disabled. The rest of the AOT cache will still be used. 
$ java -XX:-UseSerialGC -cp HelloWorld.jar -XX:AOTCache=hw.aot -Xlog:aot*=debug HelloWorld | grep codecache [0.004s][debug][aot,codecache,init] Mapped 249856 bytes at address 0x0000748bcee9c000 at AOT Code Cache [0.004s][info ][aot,codecache,init] Loaded 322 AOT code entries from AOT Code Cache [0.004s][debug][aot,codecache,init] Adapters: total=322 [0.004s][debug][aot,codecache,init] Shared Blobs: total=0 [0.004s][debug][aot,codecache,init] C1 Blobs: total=0 [0.004s][debug][aot,codecache,init] C2 Blobs: total=0 [0.004s][debug][aot,codecache,init] AOT code cache size: 243680 bytes [0.004s][debug][aot,codecache,init] AOT Code Cache disabled: it was created with different GC: serial gc vs current g1 gc [0.004s][info ][aot,codecache,init] Unable to use AOT Code Cache. So with Valhalla, after this PR, ZGC-specific operations will be baked into the `ac` region, but this region will not be used unless ZGC is selected in the production run. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3745672633 From duke at openjdk.org Tue Jan 13 22:44:52 2026 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 13 Jan 2026 22:44:52 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache [v4] In-Reply-To: References: Message-ID: > ### Summary > This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. > > ### Description > The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. > > Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). 
During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. > > The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. > > The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). > > Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. > > ### Performance > Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. > > ### Testing > * CodeCache tests have been updated to cover the new `HotCodeHeap`. > * Added dedicated tests for the `HotCodeGrouper` > ... 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Add check for full HotCodeHeap ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27858/files - new: https://git.openjdk.org/jdk/pull/27858/files/3697718f..002bffab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=02-03 Stats: 40 lines in 2 files changed: 37 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27858.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27858/head:pull/27858 PR: https://git.openjdk.org/jdk/pull/27858 From duke at openjdk.org Tue Jan 13 23:21:42 2026 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 13 Jan 2026 23:21:42 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache [v5] In-Reply-To: References: Message-ID: <3ei-Gj5iRPb71kXk4a9-DP4fO8mmwJWV7jGaaAFtabI=.2cfcfdbb-90cc-4da8-8133-a33c15a4b783@github.com> > ### Summary > This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. > > ### Description > The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. > > Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. > > The `HotCodeGrouper` uses the sampling data to select methods for grouping. 
Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. > > The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). > > Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. > > ### Performance > Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. > > ### Testing > * CodeCache tests have been updated to cover the new `HotCodeHeap`. > * Added dedicated tests for the `HotCodeGrouper` > ... Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 27 additional commits since the last revision: - Fix merge - Merge remote-tracking branch 'origin/master' into JDK-8326205 - Add check for full HotCodeHeap - Add HotCodeGrouperMoveFunction test - Add StessHotCodeGrouper test - Update blob checks - Merge fix - Merge remote-tracking branch 'origin/master' into JDK-8326205 - Clean up - New implementation - ... and 17 more: https://git.openjdk.org/jdk/compare/a5480b0b...de746482 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27858/files - new: https://git.openjdk.org/jdk/pull/27858/files/002bffab..de746482 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=03-04 Stats: 71388 lines in 3424 files changed: 37998 ins; 12707 del; 20683 mod Patch: https://git.openjdk.org/jdk/pull/27858.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27858/head:pull/27858 PR: https://git.openjdk.org/jdk/pull/27858 From duke at openjdk.org Wed Jan 14 01:33:37 2026 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 14 Jan 2026 01:33:37 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache [v6] In-Reply-To: References: Message-ID: > ### Summary > This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. > > ### Description > The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. > > Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). 
During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. > > The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. > > The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). > > Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. > > ### Performance > Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. > > ### Testing > * CodeCache tests have been updated to cover the new `HotCodeHeap`. > * Added dedicated tests for the `HotCodeGrouper` > ... 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Fix builds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27858/files - new: https://git.openjdk.org/jdk/pull/27858/files/de746482..9999bf7b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=04-05 Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27858.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27858/head:pull/27858 PR: https://git.openjdk.org/jdk/pull/27858 From dholmes at openjdk.org Wed Jan 14 02:08:59 2026 From: dholmes at openjdk.org (David Holmes) Date: Wed, 14 Jan 2026 02:08:59 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 08:42:07 GMT, Guanqiang Han wrote: > Please review this change. Thanks! > > **Description:** > > When -XX:+PrintDeoptimizationDetails is enabled, vframeArray.cpp prints interpreted frames under ttyLocker. > https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/runtime/vframeArray.cpp#L493-L503 > In the WizardMode && Verbose branch it calls Method::print_codes() / print_codes_on(). The bytecode printing path may acquire MDOExtraData_lock (rank nosafepoint-1). > https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/interpreter/bytecodeTracer.cpp#L597-L602 > Calling it while holding tty_lock violates the global lock ranking order and triggers a lock rank inversion assertion. 
> > **Fix:** > > Generate the bytecode dump before taking tty_lock, and print it afterwards under ttyLocker to keep the output coherent while preserving correct lock acquisition order. > To avoid double buffering when the target stream is already a thread-local stringStream, extend BytecodeTracer::print_method_codes with a buffered flag. The new call site uses buffered=false when dumping into the temporary stringStream. > > **Test:** > > GHA I think this looks like a good solution. I flagged the compiler folk as I'm unsure about the test location - but I see it is where the only other test that uses `PrintDeoptimizationDetails` exists. One small change requested (while you await the second review). Thanks src/hotspot/share/runtime/vframeArray.cpp line 494: > 492: #ifndef PRODUCT > 493: if (PrintDeoptimizationDetails) { > 494: const bool dump_codes = WizardMode && Verbose; Suggestion: const bool print_codes = WizardMode && Verbose; src/hotspot/share/runtime/vframeArray.cpp line 497: > 495: ResourceMark rm(thread); > 496: stringStream codes_ss; > 497: if (dump_codes) { Suggestion: if (print_codes) { src/hotspot/share/runtime/vframeArray.cpp line 511: > 509: vframe* f = vframe::new_vframe(iframe(), &map, thread); > 510: f->print(); > 511: if (dump_codes) { Suggestion: if (print_codes) { ------------- Changes requested by dholmes (Reviewer).
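The ordering problem and the fix described above can be modelled with a toy ranked lock. This is a hedged sketch, not HotSpot code. `RankedMutex`, the numeric ranks and the printing functions are hypothetical stand-ins for `tty_lock`, `MDOExtraData_lock` and `Method::print_codes_on()`. Violations are counted instead of asserted, so both orders can be compared.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <vector>

// Counts out-of-order acquisitions instead of asserting.
int& rank_violations() {
  static int count = 0;
  return count;
}

// Hypothetical ranked mutex: a thread may only acquire a lock whose rank is
// strictly lower than every rank it already holds.
class RankedMutex {
 public:
  explicit RankedMutex(int rank) : rank_(rank) {}
  void lock() {
    for (int held : held_ranks()) {
      if (rank_ >= held) ++rank_violations();  // out-of-order acquisition
    }
    mutex_.lock();
    held_ranks().push_back(rank_);
  }
  void unlock() {
    std::vector<int>& held = held_ranks();
    held.erase(std::find(held.begin(), held.end(), rank_));
    mutex_.unlock();
  }

 private:
  static std::vector<int>& held_ranks() {
    thread_local std::vector<int> ranks;
    return ranks;
  }
  int rank_;
  std::mutex mutex_;
};

// Illustrative ranks only: the output lock ranks below the data lock.
RankedMutex tty_lock(1);
RankedMutex mdo_extra_data_lock(10);

// Stand-in for the bytecode printer, which needs the data lock.
void print_codes_on(std::ostream& out) {
  std::lock_guard<RankedMutex> guard(mdo_extra_data_lock);
  out << "  0 iload_0\n  1 ireturn\n";
}

// Original order: the data lock is requested while the output lock is held.
void print_frame_nested(std::ostream& out) {
  std::lock_guard<RankedMutex> tty(tty_lock);
  out << "frame\n";
  print_codes_on(out);
}

// Fixed order: render the dump first, then emit everything under the output lock.
void print_frame_prerendered(std::ostream& out) {
  std::ostringstream codes;
  print_codes_on(codes);  // data lock taken and released with no output lock held
  std::lock_guard<RankedMutex> tty(tty_lock);
  out << "frame\n" << codes.str();  // output stays contiguous
}
```

Both variants print the same text. Only the pre-rendered one avoids requesting the data lock while the output lock is held.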
PR Review: https://git.openjdk.org/jdk/pull/29186#pullrequestreview-3658551282 PR Review Comment: https://git.openjdk.org/jdk/pull/29186#discussion_r2688674709 PR Review Comment: https://git.openjdk.org/jdk/pull/29186#discussion_r2688680012 PR Review Comment: https://git.openjdk.org/jdk/pull/29186#discussion_r2688680468 From ghan at openjdk.org Wed Jan 14 02:32:59 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Wed, 14 Jan 2026 02:32:59 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v2] In-Reply-To: References: Message-ID: > Please review this change. Thanks! > > **Description:** > > When -XX:+PrintDeoptimizationDetails is enabled, vframeArray.cpp prints interpreted frames under ttyLocker. > https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/runtime/vframeArray.cpp#L493-L503 > In the WizardMode && Verbose branch it calls Method::print_codes() / print_codes_on(). The bytecode printing path may acquire MDOExtraData_lock (rank nosafepoint-1). > https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/interpreter/bytecodeTracer.cpp#L597-L602 > Calling it while holding tty_lock violates the global lock ranking order and triggers a lock rank inversion assertion. > > **Fix:** > > Generate the bytecode dump before taking tty_lock, and print it afterwards under ttyLocker to keep the output coherent while preserving correct lock acquisition order. > To avoid double buffering when the target stream is already a thread-local stringStream, extend BytecodeTracer::print_method_codes with a buffered flag. The new call site uses buffered=false when dumping into the temporary stringStream. > > **Test:** > > GHA Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - change variable name - Merge remote-tracking branch 'upstream/master' into 8374862 - fix a compile error - fix 8374862 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29186/files - new: https://git.openjdk.org/jdk/pull/29186/files/56f513d3..d584c0e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29186&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29186&range=00-01 Stats: 2610 lines in 100 files changed: 1488 ins; 667 del; 455 mod Patch: https://git.openjdk.org/jdk/pull/29186.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29186/head:pull/29186 PR: https://git.openjdk.org/jdk/pull/29186 From dlong at openjdk.org Wed Jan 14 02:41:12 2026 From: dlong at openjdk.org (Dean Long) Date: Wed, 14 Jan 2026 02:41:12 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v2] In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 16:25:55 GMT, Emanuel Peter wrote: >> I found this bug with the Template Framework (Vector API Library extension): https://github.com/openjdk/jdk/pull/28873 >> >> In `VectorCastNode::opcode`, we have an assert that we **cannot** have an `unsigned cast from float`, it would be nonsense. >> https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/hotspot/share/opto/vectornode.cpp#L1490-L1491 >> >> When we intrinsify `VectorSupport.convert`, we get a constant from the VectorAPI, that determines if we have a signed or unsigned cast, and other constants that determine the `from` and `to` types. >> >> At runtime, the VectorAPI implementation can (I assume) never take a path of `unsigned cast from float`. >> But it seems that nothing prevents the VM from compiling such an (unreachable) path. 
>> >> Here is how I think it happens: >> - `AbstractVector::castShape` creates a `C` conversion (lane-wise conversion). So at runtime, we will take the `C` switch-case in `AbstractVector::convert0`. >> - Profiling can also make the `Z` (lane-wise reinterpret) switch-case live, because of other cast and reinterpret calls that use that code path. And we may not (yet) have proven that the `Z` switch-case is never taken. >> - So we end up compiling the `Z` path, and call `VectorSupport.convert` with types `float` and `long`. And since the `Z` path sees that the from size is smaller than the to size, we get a `UCAST` (zero extension). Hence, we try to intrinsify a vector unsigned cast from float, and hit the assert. >> >> https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java#L742-L743 >> >> That the `Z` path is unreachable for `castShape` seems to be an invariant that only the VectorAPI knows about, and the VM cannot directly know that and determine that it is dead code. Thus, **I propose that we just check for the condition when trying to intrinsify**, and refuse intrinsification if it is violated. >> >> **Update:** instead of not intrinsifying, we chose a stronger path: we intrinsify it with a `HaltNode` that should never be encountered at runtime. The reproducer I have is eventually even able to fold away the `HaltNode`, so it turns out it is indeed a dead path. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > intrinsify with Halt instead src/hotspot/share/opto/vectorIntrinsics.cpp line 2345: > 2343: HaltNode* halt = new HaltNode(control(), frameptr(), ss.as_string(C->comp_arena()) > 2344: PRODUCT_ONLY(COMMA /*reachable*/false)); > 2345: _gvn.set_type_bottom(halt); We create HaltNodes in several places, and all seem to be slightly different.
But it seems that more call sites use transform() instead of set_type_bottom(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29169#discussion_r2688729647 From dlong at openjdk.org Wed Jan 14 02:57:33 2026 From: dlong at openjdk.org (Dean Long) Date: Wed, 14 Jan 2026 02:57:33 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v2] In-Reply-To: References: Message-ID: <3dWsNCY4TW22aUx9We8p35GiVX7JCgGs0j_Z7PUM1vQ=.043adab6-1667-4599-988a-c6b3ea1a79dc@github.com> On Wed, 14 Jan 2026 02:32:59 GMT, Guanqiang Han wrote: >> Please review this change. Thanks! >> >> **Description:** >> >> When -XX:+PrintDeoptimizationDetails is enabled, vframeArray.cpp prints interpreted frames under ttyLocker. >> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/runtime/vframeArray.cpp#L493-L503 >> In the WizardMode && Verbose branch it calls Method::print_codes() / print_codes_on(). The bytecode printing path may acquire MDOExtraData_lock (rank nosafepoint-1). >> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/interpreter/bytecodeTracer.cpp#L597-L602 >> Calling it while holding tty_lock violates the global lock ranking order and triggers a lock rank inversion assertion. >> >> **Fix:** >> >> Generate the bytecode dump before taking tty_lock, and print it afterwards under ttyLocker to keep the output coherent while preserving correct lock acquisition order. >> To avoid double buffering when the target stream is already a thread-local stringStream, extend BytecodeTracer::print_method_codes with a buffered flag. The new call site uses buffered=false when dumping into the temporary stringStream.
>> >> **Test:** >> >> GHA > > Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - change variable name > - Merge remote-tracking branch 'upstream/master' into 8374862 > - fix a compile error > - fix 8374862 I think adding `virtual bool is_buffered()` to outputStream might be a cleaner solution. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29186#issuecomment-3747452892 From qamai at openjdk.org Wed Jan 14 03:37:51 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 14 Jan 2026 03:37:51 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v2] In-Reply-To: References: Message-ID: <34PvepRyfjNSmAEoE5AJfOmXe7j7VwlcQg7DqmYh3FE=.ac059f5c-b12b-4f6e-a126-08dc17e90908@github.com> On Wed, 14 Jan 2026 02:38:02 GMT, Dean Long wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> intrinsify with Halt instead > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2345: > >> 2343: HaltNode* halt = new HaltNode(control(), frameptr(), ss.as_string(C->comp_arena()) >> 2344: PRODUCT_ONLY(COMMA /*reachable*/false)); >> 2345: _gvn.set_type_bottom(halt); > > We create HaltNodes in several places, and all seem to be slightly different. But it seems that more call sites use transform() instead of set_type_bottom(). It seems there is an opportunity to create a method `GraphKit::halt(const char* reason)` that can be called by these places.
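Dean Long's alternative to the extra `buffered` argument (8374862) is a virtual `is_buffered()` query on the stream itself. It might look roughly like this. The classes below are simplified, hypothetical stand-ins for `outputStream` and `stringStream`. `print_method_codes` here only mimics the idea: buffer locally when the target stream does not already buffer, so no caller has to pass a flag.

```cpp
#include <cassert>
#include <string>

// Hypothetical stream base: streams report whether they already collect output.
class OutputStream {
 public:
  virtual ~OutputStream() {}
  virtual void write(const std::string& s) = 0;
  virtual bool is_buffered() const { return false; }
};

// Stand-in for a string-backed stream: already buffered.
class StringStream : public OutputStream {
 public:
  void write(const std::string& s) override { buf_ += s; }
  bool is_buffered() const override { return true; }
  const std::string& str() const { return buf_; }

 private:
  std::string buf_;
};

// Unbuffered sink that counts write() calls, to make the difference visible.
class CountingStream : public OutputStream {
 public:
  void write(const std::string& s) override {
    out_ += s;
    ++writes_;
  }
  int writes() const { return writes_; }
  const std::string& str() const { return out_; }

 private:
  std::string out_;
  int writes_ = 0;
};

// Print a (fake) bytecode listing. Unbuffered targets get a single write of a
// locally collected buffer; buffered targets are written to directly, which
// avoids double buffering.
void print_method_codes(OutputStream* st) {
  const char* lines[] = {"0 iload_0\n", "1 ireturn\n"};
  if (st->is_buffered()) {
    for (const char* line : lines) st->write(line);
    return;
  }
  StringStream tmp;
  for (const char* line : lines) tmp.write(line);
  st->write(tmp.str());
}
```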
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29169#discussion_r2688811481 From jrose at openjdk.org Wed Jan 14 03:41:40 2026 From: jrose at openjdk.org (John R Rose) Date: Wed, 14 Jan 2026 03:41:40 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v2] In-Reply-To: <34PvepRyfjNSmAEoE5AJfOmXe7j7VwlcQg7DqmYh3FE=.ac059f5c-b12b-4f6e-a126-08dc17e90908@github.com> References: <34PvepRyfjNSmAEoE5AJfOmXe7j7VwlcQg7DqmYh3FE=.ac059f5c-b12b-4f6e-a126-08dc17e90908@github.com> Message-ID: On Wed, 14 Jan 2026 03:35:01 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 2345: >> >>> 2343: HaltNode* halt = new HaltNode(control(), frameptr(), ss.as_string(C->comp_arena()) >>> 2344: PRODUCT_ONLY(COMMA /*reachable*/false)); >>> 2345: _gvn.set_type_bottom(halt); >> >> We create HaltNodes in several places, and all seem to be slightly different. But it seems that more call sites use transform() instead of set_type_bottom(). > > It seems there is an opportunity to create a method `GraphKit::halt(const char* reason)` that can be called by these places. Yes. I recommend doing this cleanup in this PR. Unless we think there is a risk that gvn.transform and set_type_bottom will have different effects? I think they amount to the same thing for a halt node. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29169#discussion_r2688817256 From qamai at openjdk.org Wed Jan 14 03:47:48 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 14 Jan 2026 03:47:48 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v2] In-Reply-To: References: <34PvepRyfjNSmAEoE5AJfOmXe7j7VwlcQg7DqmYh3FE=.ac059f5c-b12b-4f6e-a126-08dc17e90908@github.com> Message-ID: On Wed, 14 Jan 2026 03:38:07 GMT, John R Rose wrote: >> It seems there is an opportunity to create a method `GraphKit::halt(const char* reason)` that can be called by these places. > > Yes. 
I recommend doing this cleanup in this PR. > Unless we think there is a risk that gvn.transform and set_type_bottom will have different effects? > I think they amount to the same thing for a halt node. Also, the `reachable` parameter here seems misleading; of course a `Halt` is unreachable. The parameter seems to be about whether we will emit code for the `HaltNode`.

    instruct ShouldNotReachHere() %{
      match(Halt);
      format %{ "stop\t# ShouldNotReachHere" %}
      ins_encode %{
        if (is_reachable()) {
          const char* str = __ code_string(_halt_reason);
          __ stop(str);
        }
      %}
      ins_pipe(pipe_slow);
    %}

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29169#discussion_r2688827029 From duke at openjdk.org Wed Jan 14 05:34:23 2026 From: duke at openjdk.org (Harshit470250) Date: Wed, 14 Jan 2026 05:34:23 GMT Subject: RFR: 8370920: [s390] C2: add instruction size in s390.ad file [v10] In-Reply-To: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> References: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> Message-ID: > This PR adds the size of the match rule nodes. > > There were a lot of nodes whose size was variable; for those nodes I have taken the maximum possible size. Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision: - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge remote-tracking branch 'origin/master' - ...
and 10 more: https://git.openjdk.org/jdk/compare/fca30c1d...3ca1be39 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28054/files - new: https://git.openjdk.org/jdk/pull/28054/files/05c649cb..3ca1be39 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=08-09 Stats: 93214 lines in 3655 files changed: 53159 ins; 18612 del; 21443 mod Patch: https://git.openjdk.org/jdk/pull/28054.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28054/head:pull/28054 PR: https://git.openjdk.org/jdk/pull/28054 From xgong at openjdk.org Wed Jan 14 05:41:26 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 14 Jan 2026 05:41:26 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v3] In-Reply-To: References: Message-ID: > The VectorMask implementation in Vector API involves complex interactions between types, nodes, and platform-specific > features, making the related code in HotSpot difficult to understand and review. > > This patch adds comprehensive comments for vector mask related types, nodes, and methods in C2 to improve code clarity and > maintainability. > > Note: This patch only adds comments; no functional changes are made. 
Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Update comments in type.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29130/files - new: https://git.openjdk.org/jdk/pull/29130/files/6782b7f6..083f5754 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29130&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29130&range=01-02 Stats: 7 lines in 1 file changed: 0 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/29130.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29130/head:pull/29130 PR: https://git.openjdk.org/jdk/pull/29130 From xgong at openjdk.org Wed Jan 14 06:21:44 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 14 Jan 2026 06:21:44 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v3] In-Reply-To: <_HUfHvmarK-86th7vlKm6oj3W6s7JC-2DZf4jVBSuvc=.a7fbc6ff-60b9-4d62-aa36-a6c3a55d9653@github.com> References: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> <_HUfHvmarK-86th7vlKm6oj3W6s7JC-2DZf4jVBSuvc=.a7fbc6ff-60b9-4d62-aa36-a6c3a55d9653@github.com> Message-ID: On Tue, 13 Jan 2026 08:46:25 GMT, Xiaohong Gong wrote: >> Yes, correct. > > Do you mean I'd better comment these names as well? If so, I will refine the comment in the next commit. Thanks! Refined the comments. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2689108679 From thartmann at openjdk.org Wed Jan 14 06:23:21 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 14 Jan 2026 06:23:21 GMT Subject: RFR: 8374435: assert(addp->is_AddP()) failed: must be AddP during EA with -XX:-UseCompressedOops [v2] In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 08:38:43 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the assert in `ConnectionGraph::get_addp_base` during escape analysis.
The issue here is that, after splitting a `Load` through a `Phi`, we try to adjust the `ConnectionGraph` as the split creates new nodes. As we visit each input of `data_phi`, we fail to take into consideration the possibility that the input can be folded to a `Load` but not from an `AddP`. This is the case for `Object::getClass`, as we load the `OopHandle` from the `Klass` object, then load the class mirror from that `OopHandle`. >> >> Since we are loading from raw memory, the base is not a scalar replaceable Java object. Similar to the case below, we are done processing this input. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > rename test file @merykitty Let's integrate this today and backport to JDK 26 before we enter RDP 2 tomorrow. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29177#issuecomment-3747971363 From duke at openjdk.org Wed Jan 14 06:46:18 2026 From: duke at openjdk.org (Shawn M Emery) Date: Wed, 14 Jan 2026 06:46:18 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression uses incorrect assertions In-Reply-To: References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: On Mon, 12 Jan 2026 10:37:31 GMT, Andrew Dinn wrote: > Changes look good. What testing have you run? Thank you for your review. I've run the ACVP tests for ML-KEM, which have all passed (default and intrinsics (aarch64)).
------------- PR Comment: https://git.openjdk.org/jdk/pull/29141#issuecomment-3748038784 From qamai at openjdk.org Wed Jan 14 07:11:21 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 14 Jan 2026 07:11:21 GMT Subject: RFR: 8374435: assert(addp->is_AddP()) failed: must be AddP during EA with -XX:-UseCompressedOops [v2] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 06:20:05 GMT, Tobias Hartmann wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> rename test file > > @merykitty Let's integrate this today and backport to JDK 26 before we enter RDP 2 tomorrow. Thanks! Got it, thanks @TobiHartmann and @chhagedorn for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29177#issuecomment-3748111025 From qamai at openjdk.org Wed Jan 14 07:14:06 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 14 Jan 2026 07:14:06 GMT Subject: Integrated: 8374435: assert(addp->is_AddP()) failed: must be AddP during EA with -XX:-UseCompressedOops In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 19:32:53 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the assert in `ConnectionGraph::get_addp_base` during escape analysis. The issue here is that, after splitting a `Load` through a `Phi`, we try to adjust the `ConnectionGraph` as the split creates new nodes. As we visit each input of `data_phi`, we fail to take into consideration the possibility that the input can be folded to a `Load` but not from an `AddP`. This is the case for `Object::getClass`, as we load the `OopHandle` from the `Klass` object, then load the class mirror from that `OopHandle`. > > Since we are loading from raw memory, the base is not a scalar replaceable Java object. Similar to the case below, we are done processing this input. > > Please take a look and leave your reviews, thanks a lot. This pull request has now been integrated. 
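The escape-analysis fix (8374435) described above hinges on one decision: what to do with each Load produced by the split. That decision can be sketched as a check on the Load's address. This is a toy model with hypothetical types, not `ConnectionGraph` code. The class-mirror load in `Object::getClass` reads through an `OopHandle`, so its address is another Load rather than an `AddP`. Such a Load is treated as finished instead of tripping an assertion.

```cpp
#include <cassert>

// Toy node kinds, loosely named after C2's AddP and Load nodes (illustrative only).
enum class Kind { AddP, Load, Other };

struct Node {
  Kind kind;
  const Node* address;  // address input for a Load, nullptr otherwise
};

enum class Action { TrackField, Done };

// Decide what to do with one Load created by splitting a Load through a Phi.
Action classify_split_load(const Node& load) {
  assert(load.kind == Kind::Load && load.address != nullptr);
  if (load.address->kind != Kind::AddP) {
    // Raw memory (e.g. dereferencing an OopHandle): the base cannot be a
    // scalar-replaceable Java object, so processing of this input is done.
    return Action::Done;
  }
  return Action::TrackField;  // a field load whose base may need tracking
}
```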
Changeset: 624d7144 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/624d7144f757c39215ae3dfed1b78cdd3b3e4f8e Stats: 92 lines in 2 files changed: 91 ins; 0 del; 1 mod 8374435: assert(addp->is_AddP()) failed: must be AddP during EA with -XX:-UseCompressedOops Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/29177 From epeter at openjdk.org Wed Jan 14 07:22:35 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 Jan 2026 07:22:35 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v2] In-Reply-To: References: <34PvepRyfjNSmAEoE5AJfOmXe7j7VwlcQg7DqmYh3FE=.ac059f5c-b12b-4f6e-a126-08dc17e90908@github.com> Message-ID: On Wed, 14 Jan 2026 03:43:20 GMT, Quan Anh Mai wrote: >> Yes. I recommend doing this cleanup in this PR. >> Unless we think there is a risk that gvn.transform and set_type_bottom will have different effects? >> I think they amount to the same thing for a halt node. > > Also, the `reachable` parameter here seems misleading, of course a `Halt` is unreachable. The parameter seems to be about whether we will emit code for the `HaltNode`. > > instruct ShouldNotReachHere() %{ > match(Halt); > format %{ "stop\t# ShouldNotReachHere" %} > ins_encode %{ > if (is_reachable()) { > const char* str = __ code_string(_halt_reason); > __ stop(str); > } > %} > ins_pipe(pipe_slow); > %} It will probably have to be `GraphKit::halt(Node* ctrl, Node* frameptr, const char* reason, bool generate_code_in_product = true)`, because there are lots of different uses, and the `ctrl`, `frameptr` and "reachability" are all used in different ways.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29169#discussion_r2689255977 From shade at openjdk.org Wed Jan 14 07:25:28 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Jan 2026 07:25:28 GMT Subject: RFR: 8375055: C2: Better dead loop detection printout [v4] In-Reply-To: <7csVu7A77KVBA0VDpVOtTPKpZDbAiKgedLkdKXm4a70=.e85bea88-30d6-47b9-b104-533ebe28a10e@github.com> References: <7csVu7A77KVBA0VDpVOtTPKpZDbAiKgedLkdKXm4a70=.e85bea88-30d6-47b9-b104-533ebe28a10e@github.com> Message-ID: On Tue, 13 Jan 2026 10:56:50 GMT, Aleksey Shipilev wrote: >> Chasing the maddeningly intermittent CTW failure. When C2 fails dead loop verification checks, it prints: >> >> >> # Internal Error (/home/shipilev/shipilev-jdk/src/hotspot/share/opto/phaseX.cpp:784), pid=64465, tid=917 >> # assert(no_dead_loop) failed: dead loop detected >> >> >> It also dumps the bad node graph to tty. This is not really convenient in automated testing and/or driver tests like CTW. With this fix, we now print: >> >> >> # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:776), pid=973305, tid=973372 >> # fatal error: Dead loop detected, node references itself: CastPP >> >> >> ...which gives more clues about where things may go wrong, and allows classifying the failures better as well. >> >> Additional testing: >> - [x] Ad-hoc crashes with selected seeds >> - [x] Linux x86_64 server fastdebug, `hotspot_compiler` >> - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > > Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: > > - Also print node idx > - Indenting Thanks folks! I am integrating to help further debugging.
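The new failure message names the node that failed the check instead of reporting a generic "dead loop detected". A toy version of a direct self-reference check that builds the message quoted above might look like this. The following points are assumptions:

- `Node` and `check_self_reference` are hypothetical.
- The real verification in `phaseX.cpp` may check more than direct self-inputs.
- The integrated version also prints the node index, which this sketch omits.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy IR node: `name` stands in for the node class name (e.g. "CastPP").
struct Node {
  std::string name;
  std::vector<const Node*> inputs;
};

// Returns an empty string when no input of `n` is `n` itself; otherwise a
// message that names the offending node class.
std::string check_self_reference(const Node& n) {
  for (const Node* in : n.inputs) {
    if (in == &n) {
      return "Dead loop detected, node references itself: " + n.name;
    }
  }
  return std::string();
}
```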
------------- PR Comment: https://git.openjdk.org/jdk/pull/29185#issuecomment-3748151110 From shade at openjdk.org Wed Jan 14 07:25:30 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Jan 2026 07:25:30 GMT Subject: Integrated: 8375055: C2: Better dead loop detection printout In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 08:37:16 GMT, Aleksey Shipilev wrote: > Chasing the maddeningly intermittent CTW failure. When C2 fails dead loop verification checks, it prints: > > > # Internal Error (/home/shipilev/shipilev-jdk/src/hotspot/share/opto/phaseX.cpp:784), pid=64465, tid=917 > # assert(no_dead_loop) failed: dead loop detected > > > It also dumps the bad node graph to tty. This is not really convenient in automated testing and/or driver tests like CTW. With this fix, we now print: > > > # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:776), pid=973305, tid=973372 > # fatal error: Dead loop detected, node references itself: CastPP > > > ...which gives more clues about where things may go wrong, and allows classifying the failures better as well. > > Additional testing: > - [x] Ad-hoc crashes with selected seeds > - [x] Linux x86_64 server fastdebug, `hotspot_compiler` > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` This pull request has now been integrated.
Changeset: 1b6c2bdd Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/1b6c2bdd7b57891ed35e3c067871d2c0bf282824 Stats: 30 lines in 1 file changed: 12 ins; 2 del; 16 mod 8375055: C2: Better dead loop detection printout Reviewed-by: chagedorn, qamai ------------- PR: https://git.openjdk.org/jdk/pull/29185 From hgreule at openjdk.org Wed Jan 14 07:25:47 2026 From: hgreule at openjdk.org (Hannes Greule) Date: Wed, 14 Jan 2026 07:25:47 GMT Subject: RFR: 8374180: C2 crash in PhaseCCP::verify_type - fatal error: Not monotonic [v4] In-Reply-To: References: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> Message-ID: On Tue, 13 Jan 2026 08:53:39 GMT, Quan Anh Mai wrote: >> Hi, >> >> The issue here is the inconsistency in computing the `_widen` field of the `TypeInt`. At the first step, the types of the operands are: >> >> t1 = int:0 >> t2 = int:-2..3, widen = 3 >> >> Since the type of the first operand is a constant zero, `AddNode::Value` returns the type of the second operand directly, as `x ^ 0 == x for all x`. In the second step, `t1` is widened to `0..2`. This triggers the real computation of the result. The algorithm then splits `t2` into `t21 = int:-2..-1` and `t22 = int:0..3`. The `Xor` of these with `t1` are `r1 = int:-4..-1` and `r2 = int:0..3`. As both have `_hi - _lo <= SMALL_TYPEINT_THRESHOLD == 3`, their `_widen`s are normalized to `0`. As a result, their `meet` also has `_widen == 0`. This value is smaller than that from the previous step, which was `3`, which leads to the failure. >> >> The root cause here is that the `_widen` value of a node should be computed and normalized on the whole range of the node, not on its subranges, which may normalize it to `0` in more cases than what is expected. As a result, my proposed solution is to ignore the `_widen` value of the subranges, and pass the expected `_widen` value when composing the final result.
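The non-monotonic widen described above can be reproduced with a toy range type. This is a simplified model, not the `TypeInt` code. `IntRange`, `make_range` and the two compose functions are hypothetical, and the toy meet keeps the larger widen. `expected_widen` is passed in explicitly (the test uses 3, the widen from the previous step in the PR example), because the thread does not show how the patch derives it.

```cpp
#include <algorithm>
#include <cassert>

constexpr int SMALL_TYPEINT_THRESHOLD = 3;

// Toy stand-in for TypeInt: an inclusive range plus a widen counter.
struct IntRange {
  int lo;
  int hi;
  int widen;
};

// Normalization as described in the PR: small ranges always get widen == 0.
IntRange make_range(int lo, int hi, int widen) {
  if (hi - lo <= SMALL_TYPEINT_THRESHOLD) widen = 0;
  return IntRange{lo, hi, widen};
}

// Before the fix (as described): the result is the meet of per-subrange
// results, each already normalized, so small subranges drag the widen to 0.
IntRange compose_meet(const IntRange& a, const IntRange& b) {
  return make_range(std::min(a.lo, b.lo), std::max(a.hi, b.hi),
                    std::max(a.widen, b.widen));
}

// After the fix (as described): ignore the subranges' widen and normalize the
// whole range with the widen the caller expects.
IntRange compose_with_widen(const IntRange& a, const IntRange& b,
                            int expected_widen) {
  return make_range(std::min(a.lo, b.lo), std::max(a.hi, b.hi), expected_widen);
}
```

With `r1 = [-4, -1]` and `r2 = [0, 3]` both normalized to widen 0, the meet covers `[-4, 3]` but keeps widen 0. That is below the 3 from the previous step, so monotonicity breaks. Composing with the expected widen keeps it at 3.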
>> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add comments Thanks for adding the comment, just one minor typo. src/hotspot/share/opto/rangeinference.hpp line 357: > 355: // - During the first step of CCP, type(x) = {0}, type(y) = [-2, 2], w = 3. > 356: // Since x is a constant that is the identity element of the xor operation, type(r) = type(y) = [-2, 2], w = 3 > 357: // - During the second step, type(x) is widen to [0, 2], w = 0. Suggestion: // - During the second step, type(x) is widened to [0, 2], w = 0. ------------- PR Review: https://git.openjdk.org/jdk/pull/28952#pullrequestreview-3659230093 PR Review Comment: https://git.openjdk.org/jdk/pull/28952#discussion_r2689262433 From epeter at openjdk.org Wed Jan 14 07:38:34 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 Jan 2026 07:38:34 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v3] In-Reply-To: References: Message-ID: > I found this bug with the Template Framework (Vector API Library extension): https://github.com/openjdk/jdk/pull/28873 > > In `VectorCastNode::opcode`, we have an assert that we **cannot** have an `unsigned cast from float`, it would be nonsense. > https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/hotspot/share/opto/vectornode.cpp#L1490-L1491 > > When we intrinsify `VectorSupport.convert`, we get a constant from the VectorAPI, that determines if we have a signed or unsigned cast, and other constants that determine the `from` and `to` types. > > At runtime, the VectorAPI implementation can (I assume) never take a path of `unsigned cast from float`. > But it seems that nothing prevents the VM from compiling such an (unreachable) path. > > Here is how I think it happens: > - `AbstractVector::castShape` creates a `C` conversion (lane-wise conversion). 
So at runtime, we will take the `C` switch-case in `AbstractVector::convert0`. > - Profiling can also make the `Z` (lane-wise reinterpret) switch-case live, because of other cast and reinterpret calls that use that code path. And we may not (yet) have proven that the `Z` switch-case is never taken. > - So we end up compiling the `Z` path, and call `VectorSupport.convert` with types `float` and `long`. And since the `Z` path sees that the from size is smaller than the to size, we get a `UCAST` (zero extension). Hence, we try to intrinsify a vector unsigned cast from float, and hit the assert. > > https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java#L742-L743 > > That the `Z` path is unreachable for `castShape` seems to be an invariant that only the VectorAPI knows about, and the VM cannot directly know that and determine that it is dead code. Thus, **I propose that we just check for the condition when trying to intrinsify**, and refuse intrinsification if it is violated. > > **Update:** instead of not intrinsifying, we chose a stronger path: we intrinsify it with a `HaltNode` that should never be encountered at runtime. The reproducer I have is eventually even able to fold away the `HaltNode`, so it turns out it is indeed a dead path.
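The guard described above reduces to a small predicate at intrinsification time. The sketch below uses made-up enums (`LaneType`, `CastKind`, `Outcome`) and is not the code in `vectorIntrinsics.cpp`. It models two steps from the description:

- The reinterpret (`Z`) path requests a zero-extending cast when the source lanes are narrower.
- An unsigned cast from a floating-point lane type yields a halt instead of a cast node.

```cpp
#include <cassert>

// Toy lane types and cast kinds; names are illustrative, not HotSpot's.
enum class LaneType { Byte, Short, Int, Long, Float, Double };
enum class CastKind { Signed, Unsigned };
enum class Outcome { EmitCast, EmitHalt };

int lane_bytes(LaneType t) {
  switch (t) {
    case LaneType::Byte:   return 1;
    case LaneType::Short:  return 2;
    case LaneType::Int:
    case LaneType::Float:  return 4;
    case LaneType::Long:
    case LaneType::Double: return 8;
  }
  return 0;
}

bool is_floating(LaneType t) {
  return t == LaneType::Float || t == LaneType::Double;
}

// Per the description: the "Z" path zero-extends when the source lanes are
// narrower than the destination lanes (e.g. float -> long).
bool z_path_zero_extends(LaneType from, LaneType to) {
  return lane_bytes(from) < lane_bytes(to);
}

// Guard at intrinsification time: an unsigned cast from a floating-point lane
// type is meaningless, so the (dead) path gets a halt instead of a cast node.
Outcome intrinsify_convert(LaneType from, CastKind kind) {
  if (kind == CastKind::Unsigned && is_floating(from)) {
    return Outcome::EmitHalt;
  }
  return Outcome::EmitCast;
}
```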
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: halt refactor by demand of reviewers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29169/files - new: https://git.openjdk.org/jdk/pull/29169/files/4c717b5a..486f8930 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29169&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29169&range=01-02 Stats: 21 lines in 4 files changed: 8 ins; 7 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/29169.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29169/head:pull/29169 PR: https://git.openjdk.org/jdk/pull/29169 From epeter at openjdk.org Wed Jan 14 07:38:36 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 Jan 2026 07:38:36 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v2] In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 16:25:55 GMT, Emanuel Peter wrote: >> I found this bug with the Template Framework (Vector API Library extension): https://github.com/openjdk/jdk/pull/28873 >> >> In `VectorCastNode::opcode`, we have an assert that we **cannot** have an `unsigned cast from float`, it would be nonsense. >> https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/hotspot/share/opto/vectornode.cpp#L1490-L1491 >> >> When we intrinsify `VectorSupport.convert`, we get a constant from the VectorAPI, that determines if we have a signed or unsigned cast, and other constants that determine the `from` and `to` types. >> >> At runtime, the VectorAPI implementation can (I assume) never take a path of `unsigned cast from float`. >> But it seems that nothing prevents the VM from compiling such an (unreachable) path. >> >> Here is how I think it happens: >> - `AbstractVector::castShape` creates a `C` conversion (lane-wise conversion). So at runtime, we will take the `C` switch-case in `AbstractVector::convert0`. 
>> - Profiling can also make the `Z` (lane-wise reinterpret) switch-case live, because of other cast and reinterpret calls that use that code path. And we may not (yet) have proven that the `Z` switch-case is never taken. >> - So we end up compiling the `Z` path, and call `VectorSupport.convert` with types `float` and `long`. And since the `Z` path sees that the from size is smaller than the to size, we get a `UCAST` (zero extension). Hence, we try to intrinsify a vector unsigned cast from float, and hit the assert. >> >> https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java#L742-L743 >> >> That the `Z` path is unreachable for `castShape` seems to be an invariant that only the VectorAPI knows about, and the VM cannot directly know that and determine that it is dead code. Thus, **I propose that we just check for the condition when trying to intrinsify**, and refuse intrinsification if it is violated. >> >> **Update: ** instead of not intrinsifying, we chose a stronger path: we intrinsify it with a `HaltNode` that should never be encountered at runtime. The reproducer I have eventually even is able to fold away the `HaltNode`, so it turns out it is indeed a dead path. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > intrinsify with Halt instead @dean-long @merykitty @rose00 I did the refactor. We could now consider doing a separate refactor for the non-parsing use-cases of `HaltNode`, but that's out of scope. Of course there is a small risk that I messed up something here, so please review carefully - we don't want to accidentally mess up `HaltNode`s and lose their asserting powers!
------------- PR Comment: https://git.openjdk.org/jdk/pull/29169#issuecomment-3748212015 From duke at openjdk.org Wed Jan 14 07:50:17 2026 From: duke at openjdk.org (Shawn M Emery) Date: Wed, 14 Jan 2026 07:50:17 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression uses incorrect assertions In-Reply-To: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: On Fri, 9 Jan 2026 14:41:07 GMT, Ferenc Rakoczi wrote: > The preconditions for the aarch64 and the AVX-512 intrinsic implementations of the implKyber12To16() method of com.sun.crypto.provider.ML_KEM are different and the AVX-512 one has stricter preconditions on the input, which was not recorded in the assert() before calling the function (although they were satisfied by all calling code). Now the assert() is corrected, and with these preconditions, the aarch64 implementation is simplified. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 6217: > 6215: __ sub(parsedLength, parsedLength, 64); > 6216: __ cmp(parsedLength, (u1)0); > 6217: __ br(Assembler::GE, L_loop); Should this be GT now? src/java.base/share/classes/com/sun/crypto/provider/ML_KEM.java line 1364: > 1362: int n = (parsedLength + 127) / 128; > 1363: assert ((parsed.length >= n * 128) && > 1364: (condensed.length >= index + n * 192)); Given the comments, can this be simplified to just: - int n = (parsedLength + 127) / 128; - assert ((parsed.length >= n * 128) && - (condensed.length >= index + n * 192)); + assert(((parsed.length % 128) == 0) && (condensed.length % 192 == 0)); If the length is smaller than the constant then the remainder will be non-zero.
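The original assert ties both buffer lengths to `parsedLength` by a ceiling division into 128-coefficient blocks. Each coefficient is 12 bits, so one block occupies 128 * 12 / 8 = 192 condensed bytes. The following is a minimal Java sketch of that arithmetic. The helper names are hypothetical, and this is not the `ML_KEM` code itself.

```java
// Hypothetical helpers mirroring the bounds checked by the original assert.
final class KyberLengths {
    static final int COEFFS_PER_BLOCK = 128;
    // 128 coefficients * 12 bits / 8 bits per byte = 192 bytes.
    static final int BYTES_PER_BLOCK = COEFFS_PER_BLOCK * 12 / 8;

    // Ceiling division: number of 128-coefficient blocks covering parsedLength.
    static int blocks(int parsedLength) {
        return (parsedLength + COEFFS_PER_BLOCK - 1) / COEFFS_PER_BLOCK;
    }

    // Minimum parsed.length required by the original assert.
    static int minParsedLength(int parsedLength) {
        return blocks(parsedLength) * COEFFS_PER_BLOCK;
    }

    // Minimum condensed.length required by the original assert, starting at index.
    static int minCondensedLength(int index, int parsedLength) {
        return index + blocks(parsedLength) * BYTES_PER_BLOCK;
    }

    public static void main(String[] args) {
        System.out.println(BYTES_PER_BLOCK);             // 192
        System.out.println(blocks(200));                 // 2
        System.out.println(minParsedLength(200));        // 256
        System.out.println(minCondensedLength(0, 200));  // 384
    }
}
```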
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29141#discussion_r2689338785 PR Review Comment: https://git.openjdk.org/jdk/pull/29141#discussion_r2689173853 From epeter at openjdk.org Wed Jan 14 07:57:47 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 Jan 2026 07:57:47 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v2] In-Reply-To: References: Message-ID: <6ZdTxspqu8jgS92JRpoy3cI-lJvtyhWfz1Ol49Vjg-8=.944ff828-eacb-4c8b-8d78-21cea803dc3f@github.com> > This is an exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. > > Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) > > **Discussion** > > Observations: > - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. > - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. > - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow > - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow > - `linux_x64_oci_server`: Vector API leads to really nice speedups > - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks > - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. > - Compact Object Headers has some negative effect on some loop benchmarks. > - `linux_aarch64_server`: `reduceAddI`, `copyI` > - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` > - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`?
> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit > - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` > - Intrinsics can be much faster than auto vectorized or Vector API code. > - `linux_aarch64_server`: `copyI` > - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. > - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact, I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). > > **Benchmark Plots** > > Units: nanoseconds per algorithm invocation. > > [Plot images for `linux_x64_oci`, `windows_x64_oci` and `macosx_x64_sandybridge` not preserved in the archive] > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allows stress infra to inline some more on top of that.
> > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8360557-ctw-inlining - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26068/files - new: https://git.openjdk.org/jdk/pull/26068/files/438d9ecf..fd0f506e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=08-09 Stats: 18248 lines in 189 files changed: 10242 ins; 4562 del; 3444 mod Patch: https://git.openjdk.org/jdk/pull/26068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068 PR: https://git.openjdk.org/jdk/pull/26068 From jbhateja at openjdk.org Wed Jan 14 11:02:26 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 14 Jan 2026 11:02:26 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v11] In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 10:22:25 GMT, Fei Yang wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding testpoint for JDK-8373574 > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16Vector.java line 1653: > >> 1651: * >> 1652: * @param e the input scalar >> 1653: * @return the result of multiplying this vector by the given scalar > > The code comment mentions "multiplying", which doesn't seem correct to me. Are we doing any multiplication for min/max? This is the problem in JDK-mainline code also, we should address it separately. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2689974123 From aph at openjdk.org Wed Jan 14 11:04:11 2026 From: aph at openjdk.org (Andrew Haley) Date: Wed, 14 Jan 2026 11:04:11 GMT Subject: RFR: 8354853: Clean up x86 registers after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 09:37:09 GMT, Manuel Hässig wrote: > This PR cleans up some 32-bit remnants in the x86 register code. This also presented the opportunity to convert the unscoped enums into typed constants. > > Testing: > - [ ] Github Actions > - [ ] tier1,tier2 on linux-x64-debug, linux-x64, windows-x64-debug, windows-x64, macosx-x64-debug, macosx-x64 Looks good. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29224#pullrequestreview-3660145106 From shade at openjdk.org Wed Jan 14 11:09:53 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Jan 2026 11:09:53 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: <5QyCbXyc0I8USepmyY3aega_4CXemrB2t8-vKlTVZ4U=.810d8e35-ff7f-493a-92c7-dd3b2036e743@github.com> On Fri, 9 Jan 2026 08:50:10 GMT, Aleksey Shipilev wrote: >>> That peeking involves no GC action. >> >> Not directly related to this PR, but this caught my eyes. Do you have more information about this somewhere? On the surface this sounds incorrect for ZGC, so I'd like to make sure that there's no bug lurking in there. > >> > That peeking involves no GC action. >> >> Not directly related to this PR, but this caught my eyes. Do you have more information about this somewhere? On the surface this sounds incorrect for ZGC, so I'd like to make sure that there's no bug lurking in there. > > IIRC, it is somewhere here in `BarrierSetAssembler::c2i_entry_barrier`: > > > void BarrierSetAssembler::c2i_entry_barrier(MacroAssembler* masm) { > ...
>      __ movptr(tmp1, Address(tmp1, ClassLoaderData::holder_offset())); > __ resolve_weak_handle(tmp1, tmp2); // <--- does IN_NATIVE | ON_PHANTOM_OOP_REF inside > __ cmpptr(tmp1, 0); > __ jcc(Assembler::notEqual, method_live); > ... > } > > > Again, IIRC, the code that is emitted in this method is part of C2I adapter, so it is stored in AOT cache. > @shipilev any ideas? I honestly do not remember what we were thinking :/ I do vaguely recall it was about peeking for nullptrs into phantoms; that is how IIRC we convinced ourselves it was "fine", as there was no GC-specific difference. For current non-peeking resolution in adapters, it does not sound right indeed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3749015167 From fyang at openjdk.org Wed Jan 14 11:14:06 2026 From: fyang at openjdk.org (Fei Yang) Date: Wed, 14 Jan 2026 11:14:06 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v11] In-Reply-To: References: Message-ID: <53ngdNhoWJy4Cacq2Xs1bC0ulSPPVcUdz6WoSb3Pp6U=.699dfca7-3a7c-4f6b-addb-3c26b8f4c357@github.com> On Wed, 14 Jan 2026 10:58:44 GMT, Jatin Bhateja wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16Vector.java line 1653: >> >>> 1651: * >>> 1652: * @param e the input scalar >>> 1653: * @return the result of multiplying this vector by the given scalar >> >> The code comment mentions "multiplying", which doesn't seem correct to me. Are we doing any multiplication for min/max? > > This is the problem in JDK-mainline code also, we should address it separately. Sure. I just realized that it is there for Float / Double variants as well in mainline.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2690008963 From chagedorn at openjdk.org Wed Jan 14 11:28:36 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 14 Jan 2026 11:28:36 GMT Subject: RFR: 8375271: [IR Framework] Rename IREncoding to ApplicableIRRules and driver/flag/test VM to Driver/Flag/Test VM Message-ID: This patch is part of a series of patches split off from a prototype implementation for replacing the hotspot-pid file based transfer of IR dumps between the Test VM and the Driver VM with a socket based transfer (see [JDK-8375271](https://bugs.openjdk.org/browse/JDK-8375271) for more information). This should ease reviews. This patch is about naming updates: `IREncoding` is quite generic and does not really tell what it is about by looking at its name. I suggest renaming it to `ApplicableIRRules` to better reflect what it is about. I also capitalized the first letter of "driver/flag/test VM" to better indicate that these are proper names.
Thanks, Christian ------------- Commit messages: - 8375271: [IR Framework] Rename IREncoding to ApplicableIRRules and driver/flag/test VM to Driver/Flag/Test VM Changes: https://git.openjdk.org/jdk/pull/29229/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29229&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375271 Stats: 496 lines in 28 files changed: 162 ins; 159 del; 175 mod Patch: https://git.openjdk.org/jdk/pull/29229.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29229/head:pull/29229 PR: https://git.openjdk.org/jdk/pull/29229 From mhaessig at openjdk.org Wed Jan 14 12:05:46 2026 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 14 Jan 2026 12:05:46 GMT Subject: RFR: 8375271: [IR Framework] Rename IREncoding to ApplicableIRRules and driver/flag/test VM to Driver/Flag/Test VM In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 11:21:58 GMT, Christian Hagedorn wrote: > This patch is part of a series of patches split off from a prototype implementation for replacing the hotspot-pid file based transfer of IR dumps between the Test VM and the Driver VM with a socket based transfer (see [JDK-8375271](https://bugs.openjdk.org/browse/JDK-8375271) for more information). This should ease reviews. > > This patch is about naming updates: > > `IREncoding` is quite generic and does not really tell what it is about by looking at its name. I suggest renaming it to `ApplicableIRRules` to better reflect what it is about. I also capitalized the first letter of "driver/flag/test VM" to better indicate that these are proper names. > > Thanks, > Christian Nice and straightforward, thank you @chhagedorn. I found two typos, otherwise this looks good. test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 242: > 240: *
    > 241: *     • The {@code flags} override any set VM or Javaoptions flags by JTreg by default.
    > 242: * Use {@code -DPreferCommandLineFlags=true} if you want to prefer the JTreg VM and Javaoptions flags over Suggestion: * Use {@code -DPreferCommandLineFlags=true} if you want to prefer the JTreg VM and Javaoptions flags over Since you are modifying this comment, you might also remove this superfluous space. test/hotspot/jtreg/compiler/lib/ir_framework/driver/TestVMProcess.java line 44: > 42: > 43: /** > 44: * This class prepares, creates, and runs the "test" VM with verification of proper termination. The class also stores Suggestion: * This class prepares, creates, and runs the "Test" VM with verification of proper termination. The class also stores You missed a capitalization ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/29229#pullrequestreview-3660373869 PR Review Comment: https://git.openjdk.org/jdk/pull/29229#discussion_r2690158732 PR Review Comment: https://git.openjdk.org/jdk/pull/29229#discussion_r2690146222 From mhaessig at openjdk.org Wed Jan 14 12:17:27 2026 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 14 Jan 2026 12:17:27 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v8] In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 11:42:10 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 13 additional commits since the last revision: > > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - address comments > - fix rename > - rename test file > - Merge branch 'master' into fix-c2-segfault-unlocknode > - fix test spacing > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - ... and 3 more: https://git.openjdk.org/jdk/compare/8f725dbd...8713f16d Marked as reviewed by mhaessig (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28432#pullrequestreview-3660447395 From shade at openjdk.org Wed Jan 14 12:19:59 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Jan 2026 12:19:59 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: <5QyCbXyc0I8USepmyY3aega_4CXemrB2t8-vKlTVZ4U=.810d8e35-ff7f-493a-92c7-dd3b2036e743@github.com> References: <5QyCbXyc0I8USepmyY3aega_4CXemrB2t8-vKlTVZ4U=.810d8e35-ff7f-493a-92c7-dd3b2036e743@github.com> Message-ID: On Wed, 14 Jan 2026 11:06:56 GMT, Aleksey Shipilev wrote: > > @shipilev any ideas? > > I honestly do not remember what we were thinking :/ I do vaguely recall it was about peeking for nullptrs into phantoms; that is how IIRC we convinced ourselves it was "fine", as there was no GC-specific difference. For current non-peeking resolution in adapters, it does not sound right indeed. I have a prototype that checks this condition mechanically, and sure it fails. Filed: https://bugs.openjdk.org/browse/JDK-8375298.
------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3749294696 From chagedorn at openjdk.org Wed Jan 14 12:34:33 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 14 Jan 2026 12:34:33 GMT Subject: RFR: 8375271: [IR Framework] Rename IREncoding to ApplicableIRRules and driver/flag/test VM to Driver/Flag/Test VM [v2] In-Reply-To: References: Message-ID: > This patch is part of a series of patches split off from a prototype implementation for replacing the hotspot-pid file based transfer of IR dumps between the Test VM and the Driver VM with a socket based transfer (see [JDK-8375271](https://bugs.openjdk.org/browse/JDK-8375271) for more information). This should ease reviews. > > This patch is about naming updates: > > `IREncoding` is quite generic and does not really tell what it is about by looking at its name. I suggest renaming it to `ApplicableIRRules` to better reflect what it is about. I also capitalized the first letter of "driver/flag/test VM" to better indicate that these are proper names.
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: - Update Test VM - Update test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java Co-authored-by: Manuel Hässig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29229/files - new: https://git.openjdk.org/jdk/pull/29229/files/43d3be12..17afa9ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29229&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29229&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29229.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29229/head:pull/29229 PR: https://git.openjdk.org/jdk/pull/29229 From chagedorn at openjdk.org Wed Jan 14 12:34:35 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 14 Jan 2026 12:34:35 GMT Subject: RFR: 8375271: [IR Framework] Rename IREncoding to ApplicableIRRules and driver/flag/test VM to Driver/Flag/Test VM [v2] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 12:02:32 GMT, Manuel Hässig wrote: >> Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update Test VM >> - Update test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java >> >> Co-authored-by: Manuel Hässig > > Nice and straightforward, thank you @chhagedorn. I found two typos, otherwise this looks good. Thanks @mhaessig for your review! I pushed an update addressing your comments. > test/hotspot/jtreg/compiler/lib/ir_framework/driver/TestVMProcess.java line 44: > >> 42: >> 43: /** >> 44: * This class prepares, creates, and runs the "test" VM with verification of proper termination. The class also stores > > Suggestion: > > * This class prepares, creates, and runs the "Test" VM with verification of proper termination.
The class also stores > > You missed a capitalization Good catch, changed it to Test VM instead. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29229#issuecomment-3749348074 PR Review Comment: https://git.openjdk.org/jdk/pull/29229#discussion_r2690245636 From mhaessig at openjdk.org Wed Jan 14 12:41:22 2026 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 14 Jan 2026 12:41:22 GMT Subject: RFR: 8375271: [IR Framework] Rename IREncoding to ApplicableIRRules and driver/flag/test VM to Driver/Flag/Test VM [v2] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 12:34:33 GMT, Christian Hagedorn wrote: >> This patch is part of a series of patches split off from a prototype implementation for replacing the hotspot-pid file based transfer of IR dumps between the Test VM and the Driver VM with a socket based transfer (see [JDK-8375271](https://bugs.openjdk.org/browse/JDK-8375271) for more information). This should ease reviews. >> >> This patch is about naming updates: >> >> `IREncoding` is quite generic and does not really tell what it is about by looking at its name. I suggest renaming it to `ApplicableIRRules` to better reflect what it is about. I also capitalized the first letter of "driver/flag/test VM" to better indicate that these are proper names. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - Update Test VM > - Update test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java > > Co-authored-by: Manuel Hässig Thank you for addressing my comments. ------------- Marked as reviewed by mhaessig (Committer).
PR Review: https://git.openjdk.org/jdk/pull/29229#pullrequestreview-3660540669 From bmaillard at openjdk.org Wed Jan 14 12:54:58 2026 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 14 Jan 2026 12:54:58 GMT Subject: RFR: 8373999: C2: apply KnownBits and unsigned bounds to Add / Sub operations [v4] In-Reply-To: References: Message-ID: On Sun, 28 Dec 2025 07:33:45 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR improves the implementation of `AddNode/SubNode::Value` by taking advantage of the additional information in `TypeInt`. The implementation has some pretty non-trivial logic. Fortunately, the test infrastructure is already there. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - copyright year > - Merge branch 'master' into addsub > - Merge branch 'master' into addsub > - include order > - Improve Add/SubNode::Value with unsigned bounds and known bits Great work! I went through all the calculations, and tried to reproduce them independently. It all looks sound to me. I only have a few comments, mostly about notation. src/hotspot/share/opto/rangeinference.hpp line 419: > 417: // Similarly, if st1._lo < 0 and st2._lo < 0, we have: > 418: // - 2^(n-1) <= st1._ulo <= v1 <= st1._uhi <= 2^n - 1 > 419: // - 2^(n-1) <= st2._ulo <= v2 <= st2._uhi <= 2^n - 1 At first I thought the `-` character was "minus", which confused me a little. Maybe we should avoid this, and only have indentation? src/hotspot/share/opto/rangeinference.hpp line 437: > 435: // non-negative, the signed addition does not overflow, we can compute it directly. > 436: lo = S(st1._ulo + st2._ulo); > 437: hi = S(st1._uhi + st2._uhi); Why not use the signed bounds directly since they are equal anyway? 
I find it a bit easier to read, and we can avoid the cast. Suggestion: lo = st1._lo + st2._lo; hi = st1._hi + st2._hi; src/hotspot/share/opto/rangeinference.hpp line 453: > 451: // sum[i] = bit & 1; > 452: // carry[i - 1] = (bit >= 2); > 453: // } Is there a specific reason why the notation here goes from `n-1` to `0` and not the reverse? I find it more intuitive to have index `0` for the least significant bit, but maybe there is some convention I am not aware of. It does not matter too much in any case, so feel free to do whatever. src/hotspot/share/opto/rangeinference.hpp line 480: > 478: // > 479: // If we gather the min_bits into a value tmp, it is clear that > 480: // tmp = st1._bits._ones + st2._bits._ones: It feels like we don't need to "initialize" `tmp`, but maybe I am missing something src/hotspot/share/opto/rangeinference.hpp line 491: > 489: // min_bit >= 2 if and only if either: > 490: // - st1._bits._ones[i] == st2._bits._ones[i] == 1 > 491: // - (st1._bits._ones[i] == 1 || st2._bits._ones[i] == 1) && ((min_bit & 1) == 0) If I am not mistaken we could also write it this way, and I personally find this a bit more intuitive (and also more consistent with the subtraction case). And for the subsequent computations we could replace `|` by `^`. Suggestion: // - (st1._bits._ones[i] != st2._bits._ones[i]) && ((min_bit & 1) == 0) src/hotspot/share/opto/rangeinference.hpp line 535: > 533: // compute the signed bounds. 
> 534: lo = S(st1._ulo - st2._uhi); > 535: hi = S(st1._uhi - st2._ulo); Same comment as for `infer_add` Suggestion: lo = st1._lo - st2._hi; hi = st1._hi - st2._lo; ------------- PR Review: https://git.openjdk.org/jdk/pull/28897#pullrequestreview-3659450715 PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2689670444 PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2690270626 PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2689734122 PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2690053424 PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2689804049 PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2690277995 From epeter at openjdk.org Wed Jan 14 13:02:55 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 Jan 2026 13:02:55 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v9] In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 09:01:44 GMT, Galder Zamarreño wrote: >> Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably. >> >> Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added a `is_MinMax` node query to simplify the fix. >> >> I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. >> >> To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE.
Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). >> >> During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. >> >> I've run tier1-3 tests on linux/x64 successfully. > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Remove exclude or Min/Max in verify identity" > > This reverts commit cf24abad55db9a320930379c4f0f3154791d26e2. Thanks for the updates. Looks good to me, and thanks for your continued working on Min/Max :) I'm not going to run internal testing again, as you only removed the verification code, so I think GitHub Actions is sufficient now. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28895#pullrequestreview-3660616313 From duke at openjdk.org Wed Jan 14 13:06:15 2026 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 14 Jan 2026 13:06:15 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression uses incorrect assertions [v2] In-Reply-To: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: > The preconditions for the aarch64 and the AVX-512 intrinsic implementations of the implKyber12To16() method of com.sun.crypto.provider.ML_KEM are different and the AVX-512 one has stricter preconditions on the input, which was not recorded in the assert() before calling the function (although they were satisfied by all calling code). Now the assert() is corrected, and with these preconditions, the aarch64 implementation is simplified.
Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Fix off-by-one error discovered by Shawn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29141/files - new: https://git.openjdk.org/jdk/pull/29141/files/f2437a69..2fca58bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29141&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29141&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29141.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29141/head:pull/29141 PR: https://git.openjdk.org/jdk/pull/29141 From duke at openjdk.org Wed Jan 14 13:12:34 2026 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 14 Jan 2026 13:12:34 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression uses incorrect assertions [v2] In-Reply-To: References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: On Wed, 14 Jan 2026 10:43:23 GMT, Andrew Dinn wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 6217: >> >>> 6215: __ sub(parsedLength, parsedLength, 64); >>> 6216: __ cmp(parsedLength, (u1)0); >>> 6217: __ br(Assembler::GE, L_loop); >> >> Should this be GT now? > > Yes, I believe it should. That makes me wonder why the test did not fail. I would have expected it to loop back to the top and try to consume an extra 96 bytes of non-existent input and write it to 64 bytes of non-existent output buffer? Did this erroneous computation not happen? Or was the error simply not manifest? It is a buffer overflow, so if the memory after the arrays is there, it would be read/written; if you are lucky, it doesn't overwrite anything that is used later, so it might be able to pass a test program (which definitely had happened here).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29141#discussion_r2690369308 From shade at openjdk.org Wed Jan 14 13:22:50 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Jan 2026 13:22:50 GMT Subject: RFR: 8354853: Clean up x86 registers after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 09:37:09 GMT, Manuel Hässig wrote: > This PR cleans up some 32-bit remnants in the x86 register code. This also presented the opportunity to convert the unscoped enums into typed constants. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 on linux-x64-debug, linux-x64, windows-x64-debug, windows-x64, macosx-x64-debug, macosx-x64 Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29224#pullrequestreview-3660710615 From duke at openjdk.org Wed Jan 14 13:31:23 2026 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 14 Jan 2026 13:31:23 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression uses incorrect assertions [v2] In-Reply-To: References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: On Wed, 14 Jan 2026 13:09:29 GMT, Ferenc Rakoczi wrote: >> Yes, I believe it should. That makes me wonder why the test did not fail. I would have expected it to loop back to the top and try to consume an extra 96 bytes of non-existent input and write it to 64 bytes of non-existent output buffer? Did this erroneous computation not happen? Or was the error simply not manifest? > > It is a buffer overflow, so if the memory after the arrays is there, it would be read/written; if you are lucky, it doesn't overwrite anything that is used later, so it might be able to pass a test program (which definitely had happened here). Yes, it should. Fixed.
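Why `GE` runs one pass too many can be seen in a toy C++ model of the quoted loop tail (`sub` 64, `cmp` with 0, branch back to `L_loop`). This is not the stub code: `loop_iterations` is a hypothetical name, and the model assumes the loop is entered without a top test and that `parsedLength` is a positive multiple of 64.

```cpp
// Toy model of the quoted loop tail:
//   sub parsedLength, parsedLength, 64; cmp parsedLength, 0; br <cond>, L_loop
// The body runs at least once because the test sits at the bottom of the loop.
constexpr int loop_iterations(int parsedLength, bool branch_if_ge) {
  int iterations = 0;
  do {
    ++iterations;        // one pass over the loop body
    parsedLength -= 64;  // sub(parsedLength, parsedLength, 64)
  } while (branch_if_ge ? parsedLength >= 0   // br(Assembler::GE, L_loop)
                        : parsedLength > 0);  // br(Assembler::GT, L_loop)
  return iterations;
}

// With GE, a counter that reaches exactly 0 still branches back: one extra pass.
static_assert(loop_iterations(128, /*branch_if_ge=*/true) == 3, "GE overshoots");
static_assert(loop_iterations(128, /*branch_if_ge=*/false) == 2, "GT is exact");
```

In the real stub, that extra pass reads and writes beyond the ends of the arrays. As explained above, this can go unnoticed when the memory past the arrays happens to be accessible.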
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29141#discussion_r2690431947 From duke at openjdk.org Wed Jan 14 13:31:28 2026 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 14 Jan 2026 13:31:28 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression uses incorrect assertions [v2] In-Reply-To: References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: On Wed, 14 Jan 2026 06:51:00 GMT, Shawn M Emery wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix off-by-one error discovered by Shawn > > src/java.base/share/classes/com/sun/crypto/provider/ML_KEM.java line 1364: > >> 1362: int n = (parsedLength + 127) / 128; >> 1363: assert ((parsed.length >= n * 128) && >> 1364: (condensed.length >= index + n * 192)); > > Given the comments, can this be simplified to just: > > > - int n = (parsedLength + 127) / 128; > - assert ((parsed.length >= n * 128) && > - (condensed.length >= index + n * 192)); > + assert((parsed.length % 128) == 0) && (condensed.length % 192 == 0)); > > > If the length is smaller than the constant then the remainder will be non-zero. These are the exact conditions that the most demanding intrinsic (the AVX-512 one) requires. If we could rely on the callers satisfying these, we wouldn't need the assert :-). The loop in the intrinsic will read n * 192 bytes and write n * 128 shorts; your suggestion would not ensure that the arrays have at least that much space.
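A small C++ model makes the gap between the two checks concrete. `patch_precondition` follows the Java assert quoted above: `n = ceil(parsedLength / 128)` blocks, each reading 192 condensed bytes and writing 128 shorts. `modulo_check` follows the suggested simplification. Both function names, and the specific lengths in the checks, are illustrative assumptions.

```cpp
// Mirrors the assert quoted from ML_KEM.java: the intrinsic processes n blocks,
// each reading 192 bytes of `condensed` and writing 128 entries of `parsed`.
constexpr bool patch_precondition(int parsedLength, int parsedArrayLength,
                                  int index, int condensedArrayLength) {
  int n = (parsedLength + 127) / 128;  // number of 128-element blocks, rounded up
  return parsedArrayLength >= n * 128 &&
         condensedArrayLength >= index + n * 192;
}

// The suggested simplification only checks divisibility of the array lengths.
constexpr bool modulo_check(int parsedArrayLength, int condensedArrayLength) {
  return parsedArrayLength % 128 == 0 && condensedArrayLength % 192 == 0;
}

// Arrays sized for one block while parsedLength asks for two: divisibility
// holds, but the intrinsic would run past the end of both arrays.
static_assert(modulo_check(128, 192), "divisibility holds");
static_assert(!patch_precondition(256, 128, 0, 192), "no room for two blocks");
```

The modulo form also never looks at `index`, the starting offset into `condensed`. A zero-length array, for example, passes the divisibility test trivially.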
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29141#discussion_r2690431489 From qamai at openjdk.org Wed Jan 14 13:36:11 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 14 Jan 2026 13:36:11 GMT Subject: RFR: 8373999: C2: apply KnownBits and unsigned bounds to Add / Sub operations [v5] In-Reply-To: References: Message-ID: > Hi, > > This PR improves the implementation of `AddNode/SubNode::Value` by taking advantage of the additional information in `TypeInt`. The implementation has some pretty non-trivial logic. Fortunately, the test infrastructure is already there. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Improve comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28897/files - new: https://git.openjdk.org/jdk/pull/28897/files/fe534505..ae17b24e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28897&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28897&range=03-04 Stats: 15 lines in 1 file changed: 0 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/28897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28897/head:pull/28897 PR: https://git.openjdk.org/jdk/pull/28897 From qamai at openjdk.org Wed Jan 14 13:36:15 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 14 Jan 2026 13:36:15 GMT Subject: RFR: 8373999: C2: apply KnownBits and unsigned bounds to Add / Sub operations [v4] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 09:33:44 GMT, Benoît Maillard wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: >> >> - copyright year >> - Merge branch 'master' into addsub >> - Merge branch 'master' into addsub >> - include order >> - Improve Add/SubNode::Value with unsigned bounds and known bits > > src/hotspot/share/opto/rangeinference.hpp line 419: > >> 417: // Similarly, if st1._lo < 0 and st2._lo < 0, we have: >> 418: // - 2^(n-1) <= st1._ulo <= v1 <= st1._uhi <= 2^n - 1 >> 419: // - 2^(n-1) <= st2._ulo <= v2 <= st2._uhi <= 2^n - 1 > > At first I thought the `-` character was "minus", which confused me a little. Maybe we should avoid this, and only have indentation? Yes, you are right; that was questionable on my side. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2690440286 From mchevalier at openjdk.org Wed Jan 14 13:37:26 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 14 Jan 2026 13:37:26 GMT Subject: RFR: 8354853: Clean up x86 registers after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 09:37:09 GMT, Manuel Hässig wrote: > This PR cleans up some 32-bit remnants in the x86 register code. This also presented the opportunity to convert the unscoped enums into typed constants. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 on linux-x64-debug, linux-x64, windows-x64-debug, windows-x64, macosx-x64-debug, macosx-x64 Marked as reviewed by mchevalier (Committer).
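The Add/Sub review in this thread relies on two ideas. The first is computing sum bounds with wrap-around arithmetic, so that signed overflow (which is UB in C++) never occurs. The second is deriving the known bits of a sum from the two extreme operands, where every unknown bit is taken as 0 or as 1. The C++ sketch below illustrates both on plain 32-bit values. It is not the `rangeinference.hpp` code. The range part widens to 64 bits and wraps back through `uint32_t`. The known-bits part uses the well-known min/max-carry argument, similar in spirit to LLVM's `KnownBits` addition. All names (`Range32`, `KnownBits32`, `add_ranges`, `add_known_bits`) are hypothetical.

```cpp
#include <cstdint>

// ---- Range addition without signed-overflow UB ----------------------------
struct Range32 {
  int32_t lo;
  int32_t hi;
};

// Which 2^32-wide window an exact sum lies in:
// -1 = below INT32_MIN, 0 = representable, +1 = above INT32_MAX.
constexpr int window(int64_t v) {
  return v < INT32_MIN ? -1 : (v > INT32_MAX ? 1 : 0);
}

// Reduce modulo 2^32 via uint32_t (well defined), then reinterpret as int32_t
// (modular on mainstream compilers; guaranteed since C++20).
constexpr int32_t wrap(int64_t v) {
  return static_cast<int32_t>(static_cast<uint32_t>(v));
}

// Exact sums lie in [a.lo + b.lo, a.hi + b.hi] (no overflow in 64 bits). If
// both ends are in the same window, wrapping shifts the whole interval by the
// same multiple of 2^32, so it stays contiguous; otherwise fall back to the
// full range.
constexpr Range32 add_ranges(Range32 a, Range32 b) {
  int64_t lo = int64_t{a.lo} + b.lo;
  int64_t hi = int64_t{a.hi} + b.hi;
  if (window(lo) != window(hi)) {
    return Range32{INT32_MIN, INT32_MAX};
  }
  return Range32{wrap(lo), wrap(hi)};
}

// ---- Known-bits addition -----------------------------------------------------
// A bit set in `zeros` is known to be 0, a bit set in `ones` is known to be 1;
// bits set in neither mask are unknown.
struct KnownBits32 {
  uint32_t zeros;
  uint32_t ones;
};

constexpr KnownBits32 add_known_bits(KnownBits32 a, KnownBits32 b) {
  uint32_t min_sum = a.ones + b.ones;      // every unknown bit chosen as 0
  uint32_t max_sum = ~a.zeros + ~b.zeros;  // every unknown bit chosen as 1
  // sum_bit = x_bit ^ y_bit ^ carry_in, so the carries into each bit are:
  uint32_t min_carry = min_sum ^ a.ones ^ b.ones;
  uint32_t max_carry = max_sum ^ ~a.zeros ^ ~b.zeros;
  // Carries only grow when operand bits go from 0 to 1, so a carry that agrees
  // in both extreme cases is the same for every concrete choice.
  uint32_t known = (a.zeros | a.ones) & (b.zeros | b.ones) & ~(min_carry ^ max_carry);
  return KnownBits32{~min_sum & known, min_sum & known};
}

// Usage: [INT32_MAX - 1, INT32_MAX] + [2, 2] overflows at both ends and wraps
// to the contiguous range [INT32_MIN, INT32_MIN + 1].
static_assert(add_ranges({INT32_MAX - 1, INT32_MAX}, {2, 2}).lo == INT32_MIN, "");
// Usage: x in {0, 1} plus the constant 1 is 1 or 2, so only bits 0 and 1 are unknown.
static_assert(add_known_bits({~1u, 0u}, {~1u, 1u}).zeros == ~3u, "");
```

Doing the arithmetic in a wider or unsigned type, and converting back only at the end, mirrors the concern raised in the thread about signed overflow being UB.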
------------- PR Review: https://git.openjdk.org/jdk/pull/29224#pullrequestreview-3660764093 From qamai at openjdk.org Wed Jan 14 13:39:40 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 14 Jan 2026 13:39:40 GMT Subject: RFR: 8373999: C2: apply KnownBits and unsigned bounds to Add / Sub operations [v4] In-Reply-To: References: Message-ID: <7CmCvn7fnzAMJgIaHHHw3qySJlnMNbyJpoOcAUOTfxU=.4e27e114-e539-49dc-8061-5a9f4be6ec09@github.com> On Wed, 14 Jan 2026 09:51:06 GMT, Benoît Maillard wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
> src/hotspot/share/opto/rangeinference.hpp line 480: > >> 478: // >> 479: // If we gather the min_bits into a value tmp, it is clear that >> 480: // tmp = st1._bits._ones + st2._bits._ones: > > It feels like we don't need to "initialize" `tmp`, but maybe I am missing something This is not an initialization, though. This section just describes that a `tmp` constructed using the loop below is the same as the one constructed by adding these 2 values. > src/hotspot/share/opto/rangeinference.hpp line 491: > >> 489: // min_bit >= 2 if and only if either: >> 490: // - st1._bits._ones[i] == st2._bits._ones[i] == 1 >> 491: // - (st1._bits._ones[i] == 1 || st2._bits._ones[i] == 1) && ((min_bit & 1) == 0) > > If I am not mistaken we could also write it this way, and I personally find this a bit more intuitive (and also more consistent with the subtraction case). And for the subsequent computations we could replace `|` by `^`. > Suggestion: > > // - (st1._bits._ones[i] != st2._bits._ones[i]) && ((min_bit & 1) == 0) Yes you are right, that's a good idea. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2690456483 PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2690461242 PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2690457238 From qamai at openjdk.org Wed Jan 14 13:42:29 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 14 Jan 2026 13:42:29 GMT Subject: RFR: 8373999: C2: apply KnownBits and unsigned bounds to Add / Sub operations [v4] In-Reply-To: References: Message-ID: <0U9hUIU5fRcLc0lgHqBP9Or7pv6bepUsza7LWdKtRTI=.1ae45b96-caf0-4fee-ab10-53078bf5589f@github.com> On Wed, 14 Jan 2026 12:38:18 GMT, Beno?t Maillard wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: >> >> - copyright year >> - Merge branch 'master' into addsub >> - Merge branch 'master' into addsub >> - include order >> - Improve Add/SubNode::Value with unsigned bounds and known bits > > src/hotspot/share/opto/rangeinference.hpp line 437: > >> 435: // non-negative, the signed addition does not overflow, we can compute it directly. >> 436: lo = S(st1._ulo + st2._ulo); >> 437: hi = S(st1._uhi + st2._uhi); > > Why not use the signed bounds directly since they are equal anyway? I find it a bit easier to read, and we can avoid the cast. > Suggestion: > > lo = st1._lo + st2._lo; > hi = st1._hi + st2._hi; It is because `st1._lo` can be a 3-bit signed `int`. I don't want to implement arithmetic for these signed classes, since signed arithmetic is normally UB in the presence of overflow. It also does not seem like a good idea to either introduce UB for `int3_t` addition, or have inconsistent behaviour between the types we test with and the real types. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2690469664 From qamai at openjdk.org Wed Jan 14 13:45:16 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 14 Jan 2026 13:45:16 GMT Subject: RFR: 8373999: C2: apply KnownBits and unsigned bounds to Add / Sub operations [v4] In-Reply-To: References: Message-ID: <1-0yFVb-o9jqV2oatwVcNLGBSZS_oQXpbPmM_-VWkLY=.69d72a7e-94d5-4f30-b614-37584ed712b9@github.com> On Wed, 14 Jan 2026 12:51:50 GMT, Benoît Maillard wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: >> >> - copyright year >> - Merge branch 'master' into addsub >> - Merge branch 'master' into addsub >> - include order >> - Improve Add/SubNode::Value with unsigned bounds and known bits > > Great work! I went through all the calculations, and tried to reproduce them independently. It all looks sound to me. I only have a few comments, mostly about notation. @benoitmaillard Thanks a lot for your reviews! I have addressed your comments. I think this PR should wait for #28952, so it would be great if you or anyone could take a look there. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28897#issuecomment-3749622511 From roland at openjdk.org Wed Jan 14 13:53:59 2026 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 14 Jan 2026 13:53:59 GMT Subject: RFR: 8374725: C2: assert(x_ctrl == get_late_ctrl_with_anti_dep(x->as_Load(), early_ctrl, x_ctrl)) failed: anti-dependences were already checked Message-ID: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` caches intermediate results in `_dom_lca_tags` when the late control is computed by `PhaseIdealLoop::get_late_ctrl()` for a node `n`: the code iterates over all uses of `n` potentially calling `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` multiple times. `_dom_lca_tags` is used to cache data that is specific to the lca computation for `n`. `_dom_lca_tags` is set to a tag that depends on `n` to mark the cached data as only valid during the lca computation for `n`. 
`PhaseIdealLoop::try_sink_out_of_loop()` checks that all uses of a node are out of loop with `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` which also needs to consider anti-dependences for `Load`s and also calls `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` through `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`: this computes the late control for a node and one particular out of loop use. `_dom_lca_tags` values computed by an earlier `PhaseIdealLoop::get_late_ctrl()` should be ignored (because it computes the late control for a node and all its uses). To address that issue, the tag that's used by `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` is made different on each call from `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` by incrementing `_dom_lca_tags_round`. The issue here is that one `Load` node is input to a `Phi` twice. So the `Phi` is considered twice as a use of the node along 2 different paths. `PhaseIdealLoop::get_late_ctrl_with_anti_dep()` is called twice from `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` but `_dom_lca_tags_round` is not incremented between the 2 calls. `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` when called for the second `Phi` input uses incorrect cached data which, in turn, causes an incorrect computation. The fix I propose is to make sure `_dom_lca_tags_round` is incremented for every call to `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`. 
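The stale-cache hazard Roland describes can be reproduced with a toy model. In the C++ sketch below, a per-node memo entry is trusted only when its stored tag matches the tag of the current query, and `next_tag()` stands in for incrementing `_dom_lca_tags_round`. The names are hypothetical and the model is not HotSpot's `PhaseIdealLoop` code. If one tag is reused for two independent queries about the same node (as when the same `Load` feeds a `Phi` twice), the second query gets the first query's result.

```cpp
#include <cstdint>

// Toy per-node memo: an entry is valid only for the tag it was stored under.
struct TaggedMemo {
  static constexpr int kNodes = 8;
  uint32_t tag[kNodes] = {};
  int value[kNodes] = {};
  uint32_t round = 0;

  // Start an independent computation: a fresh tag makes every older entry stale.
  constexpr uint32_t next_tag() { return ++round; }

  // Return the cached value for `node` if it was stored under `current_tag`;
  // otherwise store and return `fresh_value` (the newly computed result).
  constexpr int lookup(int node, uint32_t current_tag, int fresh_value) {
    if (tag[node] == current_tag) {
      return value[node];  // cache hit
    }
    tag[node] = current_tag;
    value[node] = fresh_value;
    return fresh_value;
  }
};

// Two independent queries about node 3 whose correct answers are 10 and 20.
constexpr int second_query_same_tag() {
  TaggedMemo memo{};
  uint32_t t = memo.next_tag();
  memo.lookup(3, t, 10);
  return memo.lookup(3, t, 20);  // stale hit: returns 10
}

constexpr int second_query_new_round() {
  TaggedMemo memo{};
  memo.lookup(3, memo.next_tag(), 10);
  return memo.lookup(3, memo.next_tag(), 20);  // fresh tag: returns 20
}

static_assert(second_query_same_tag() == 10, "stale result when the tag is reused");
static_assert(second_query_new_round() == 20, "correct once the round is bumped");
```

Bumping the round before every independent query, rather than once per caller, is what makes the second result correct in this model.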
------------- Commit messages: - test - fix Changes: https://git.openjdk.org/jdk/pull/29231/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29231&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374725 Stats: 74 lines in 2 files changed: 69 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29231.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29231/head:pull/29231 PR: https://git.openjdk.org/jdk/pull/29231 From ghan at openjdk.org Wed Jan 14 13:56:46 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Wed, 14 Jan 2026 13:56:46 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v6] In-Reply-To: References: Message-ID: > Please review this change. Thanks! > > **Description:** > > When -XX:+PrintDeoptimizationDetails is enabled, vframeArray.cpp prints interpreted frames under ttyLocker. > https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/runtime/vframeArray.cpp#L493-L503 > In the WizardMode && Verbose branch it calls Method::print_codes() / print_codes_on(). The bytecode printing path may acquire MDOExtraData_lock (rank nosafepoint-1). > https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/interpreter/bytecodeTracer.cpp#L597-L602 > Calling it while holding tty_lock violates the global lock ranking order and triggers a lock rank inversion assertion. > > **Fix:** > > Generate the bytecode dump before taking tty_lock, and print it afterwards under ttyLocker to keep the output coherent while preserving correct lock acquisition order. > To avoid double buffering when the target stream is already a thread-local stringStream, extend BytecodeTracer::print_method_codes with a buffered flag. The new call site uses buffered=false when dumping into the temporary stringStream.
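The reordering in this fix (render the dump first, then emit it while holding the tty lock) is a general way to avoid taking a lower-ranked lock under a higher-ranked one. The C++ sketch below shows the pattern with `std::mutex` and `std::ostringstream` as stand-ins. `tty_mutex`, `tty_out`, `dump_bytecodes`, `print_frame_details` and `usage_example` are hypothetical names. In HotSpot, the rendering step is where `MDOExtraData_lock` may be acquired.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>

std::mutex tty_mutex;        // stand-in for tty_lock
std::ostringstream tty_out;  // stand-in for the shared tty stream

// Stand-in for the bytecode dump. In the VM this step may take a lower-ranked
// lock, so it must not run while tty_mutex is held.
void dump_bytecodes(std::ostringstream& out, const std::string& method) {
  out << "  bytecodes of " << method << "\n";
}

void print_frame_details(const std::string& method) {
  // 1. Render into a private buffer before taking the tty lock.
  std::ostringstream buffer;
  dump_bytecodes(buffer, method);
  // 2. Hold the tty lock only while emitting, so the output stays contiguous.
  std::lock_guard<std::mutex> guard(tty_mutex);
  tty_out << "frame " << method << "\n" << buffer.str();
}

// Usage: the frame header and the pre-rendered dump come out together.
void usage_example() {
  print_frame_details("foo");
  assert(tty_out.str() == "frame foo\n  bytecodes of foo\n");
}
```

The extra copy through a private buffer is the price of the fix. This is presumably why the PR adds a `buffered` flag: it avoids buffering a second time when the target is already a temporary stringStream.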
> > **Test:** > > GHA Guanqiang Han has updated the pull request incrementally with one additional commit since the last revision: fix a compile error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29186/files - new: https://git.openjdk.org/jdk/pull/29186/files/9dc71b7d..ea011598 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29186&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29186&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29186.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29186/head:pull/29186 PR: https://git.openjdk.org/jdk/pull/29186 From roland at openjdk.org Wed Jan 14 15:02:59 2026 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 14 Jan 2026 15:02:59 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v4] In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 11:48:13 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: >> >> - more >> - more >> - review >> - Merge branch 'master' into JDK-8373343 >> - review >> - review >> - review >> - merge >> - more >> - more >> - ... and 3 more: https://git.openjdk.org/jdk/compare/90d0a72d...b20f41db > > Update looks good! Let me give this another spin in our testing. @chhagedorn, do you have an update regarding the testing you started last week?
------------- PR Comment: https://git.openjdk.org/jdk/pull/28769#issuecomment-3749970175 From shade at openjdk.org Wed Jan 14 15:12:10 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Jan 2026 15:12:10 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 23:18:19 GMT, Coleen Phillimore wrote: > Save this address so that the code that uses it in the valhalla repo will have it. This doesn't fail in the main jdk repository. > See CR for more information. > Tested with tier1-4. Let's do this for mainline. I'll follow up on AOT-related stuff separately. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29129#pullrequestreview-3661238039 From epeter at openjdk.org Wed Jan 14 15:15:45 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 Jan 2026 15:15:45 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Tue, 13 Jan 2026 11:27:53 GMT, Fei Gao wrote: >> In C2's loop optimization, for a counted loop, if any of these conditions (RCE, unrolling) are met, we switch to the >> `pre-main-post-loop` model. A counted loop is then split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test whether the remaining trip count is less than the loop stride (after unrolling). If so, execution jumps over the loop code to avoid over-running the loop. For example, if a main loop is unrolled `8x`, the main loop guard tests whether the loop has fewer than `8` iterations and then decides which way to go. >> >> Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is further multiplied.
After the main loop is super-unrolled, the minimum trip guard test is updated accordingly. Assuming one vector operation covers `8` iterations and the super-unrolling factor is `4`, the trip guard of the main loop tests whether the remaining trip count is less than `8 * 4 = 32`. >> >> To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. Both trip guards, the main loop's and the vectorized drain loop's, jump to the scalar post loop. >> >> The problem is that if the remaining trip count when exiting the pre-loop is relatively small but larger than the vector length, the vectorized drain loop is never executed: because the minimum trip guard test of the main loop fails, execution jumps over both the main loop and the vectorized drain loop. For example, in the above case, if a loop still has `25` iterations after the pre-loop, `3` rounds of the vectorized drain loop could handle `24` of them, but execution goes straight to the scalar post loop instead. It would be better if the minimum trip guard test of the main loop did not jump over the vectorized drain loop. >> >> This patch improves this by modifying the control flow when the minimum trip guard test of the main loop fails. All data uses and control uses then have to be updated to match the changed control flow. >> >> The whole process is done by the function `insert_post_loop()`. >> >> We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to the vectorized drain loop with mode `InsertVectorizedDrain`: >> >> 1. The fall-in control flow to the vectorized drain loop comes fr... > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 13 commits: > > - Fix build failure after rebasing and address review comments > - Merge branch 'master' into optimize-atomic-post > - Fixed new test failures after rebasing and refined parts of the code to address review comments > - Merge branch 'master' into optimize-atomic-post > - Merge branch 'master' into optimize-atomic-post > - Clean up comments for consistency and add spacing for readability > - Fix some corner case failures and refined part of code > - Merge branch 'master' into optimize-atomic-post > - Refine ascii art, rename some variables and resolve conflicts > - Merge branch 'master' into optimize-atomic-post > - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 @fg1417 Thanks for merging! I took an hour to dig back into the PR now, and only just read through the first few dozens of lines. I have some questions and naming suggestions :) I'll continue with the review in the next days. src/hotspot/share/opto/loopTransform.cpp line 1337: > 1335: } else { > 1336: if (get_ctrl(n) != back_ctrl) { return n; } > 1337: } Is the `clone_up_backedge` name still correct for all cases? It seems before we only cloned up nodes that belonged to `back_ctrl`. But now we also clone from exit pre-loop **exit path**. I never liked the `goo` anyway... And: why is it called `clone_up`? What "up" does it refer to? What about `clone_up(FromBackedge, ...` and `clone_up(FromPreLoopExit, ...`, using two enums? Then we can be a bit more explicit which case we are in, and add corresponding asserts for `back_ctrl` (non-null vs null). src/hotspot/share/opto/loopTransform.cpp line 1402: > 1400: Node* main_backedge_ctrl = main_head->back_control(); > 1401: // For the post loop, we call clone_up_backedge_goo() to obtain the fall-out values > 1402: // from the main loop, which serve as the fall-in values for the post loop. 
The naming of `clone_up_backedge_goo` is confusing me a bit: We seem to clone down (main -> post) and we clone fall-out (exit) values, and not backedge values. src/hotspot/share/opto/loopTransform.cpp line 1408: > 1406: main_phi->in(LoopNode::LoopBackControl), > 1407: visited, clones); > 1408: } I did not dig super deep here, but I'm wondering if/how `ControlAroundStripMined` relates to the post loop case? Also, if the branch is not taken, you assume it can only be the drain loop. Could we assert `mode == InsertVectorizedDrain` down there? src/hotspot/share/opto/loopTransform.cpp line 1413: > 1411: // we now need to make the fall-in values to the vectorized drain > 1412: // loop come from phis merging exit values from the pre loop and > 1413: // the main loop. Suggestion: // the main loop, see "drain_input". src/hotspot/share/opto/loopTransform.cpp line 1460: > 1458: // TestVectorizedDrainLoop.java. > 1459: Node* drain_input = nullptr; > 1460: Node* iv_after_main = main_phi->in(LoopNode::LoopBackControl); Is the `phi` we are looking at here always only the `iv`/trip counter? - If always `iv`: how do we patch the other `phi`s? And is this loop not covering all phis? https://github.com/openjdk/jdk/pull/22629/files#diff-6a59f91cb710d682247df87c75faf602f0ff9f87e2855ead1b80719704fbedffL1770-L1781 - If we cover all phis: we should probably change the naming of the variables to indicate that it is any `phi` and not just the `iv`. What do you think? src/hotspot/share/opto/loopTransform.cpp line 1464: > 1462: // We try to look up target phi from all uses of node 'iv_after_main'. > 1463: drain_input = find_merge_phi_for_vectorized_drain(iv_after_main, main_merge_region); > 1464: } What is the if for here? Why do we need that condition? Ah, I suppose if `iv_after_main` is not on the backedge, it is in the main-loop body, right? Still, I don't see through yet ... can you clarify? src/hotspot/share/opto/loopTransform.cpp line 1473: > 1471: // otherwise return 'iv_after_main'. 
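The effect of the control-flow change described in this PR can be illustrated with the numbers from its description: vectors of 8 lanes and a super-unroll factor of 4, so the main loop's guard requires at least 32 remaining iterations. The C++ sketch below only counts how many original iterations each loop would execute after the pre-loop, before and after the change. It models the guards, not C2's IR, and `LoopSplit`, `split_before` and `split_after` are hypothetical names.

```cpp
// How many original-loop iterations each loop executes after the pre-loop.
struct LoopSplit {
  int main_iters;   // super-unrolled vector main loop
  int drain_iters;  // vectorized drain loop
  int post_iters;   // scalar post loop
};

constexpr int kVectorLanes = 8;                           // one vector op = 8 iterations
constexpr int kSuperUnroll = 4;                           // super-unrolling factor
constexpr int kMainStride = kVectorLanes * kSuperUnroll;  // main-loop guard threshold: 32

// Shared tail: run the drain loop while a full vector remains, then go scalar.
constexpr LoopSplit finish(LoopSplit s, int remaining) {
  while (remaining >= kVectorLanes) {
    s.drain_iters += kVectorLanes;
    remaining -= kVectorLanes;
  }
  s.post_iters = remaining;
  return s;
}

// Before: a failing main-loop guard jumps straight to the scalar post loop,
// skipping the vectorized drain loop too.
constexpr LoopSplit split_before(int remaining) {
  LoopSplit s{0, 0, 0};
  if (remaining < kMainStride) {
    s.post_iters = remaining;
    return s;
  }
  while (remaining >= kMainStride) {
    s.main_iters += kMainStride;
    remaining -= kMainStride;
  }
  return finish(s, remaining);
}

// After: a failing main-loop guard falls through to the drain loop's guard.
constexpr LoopSplit split_after(int remaining) {
  LoopSplit s{0, 0, 0};
  while (remaining >= kMainStride) {
    s.main_iters += kMainStride;
    remaining -= kMainStride;
  }
  return finish(s, remaining);
}

// The example from the description: 25 iterations left after the pre-loop.
static_assert(split_before(25).post_iters == 25, "drain loop skipped");
static_assert(split_after(25).drain_iters == 24 && split_after(25).post_iters == 1,
              "3 vector rounds + 1 scalar iteration");
```

For trip counts of at least 32 the two versions behave identically. The change only matters in the window between one vector (8) and one super-unrolled main-loop iteration (32).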
> 1472: iv_after_main = clone_up_backedge_goo(main_backedge_ctrl, main_merge_region->in(2), > 1473: iv_after_main, visited, clones); I think this part would make more sense if we actually started with a variable `main_backedge` instead of `after_main`, and then the clone gets us the `after_main` value, because now it has been cloned out of the loop. src/hotspot/share/opto/loopTransform.cpp line 1475: > 1473: iv_after_main, visited, clones); > 1474: drain_input = PhiNode::make(main_merge_region, iv_after_main); > 1475: Node* pre_incr = main_phi->in(LoopNode::EntryControl); What about renaming `pre_incr` -> `main_input` or `after_pre`. That would remove the `iv` connotation, and be more parallel to `drain_input` or `after_main`. But a similar question here: what if we had a split-through-phi here? Before split-through-phi, we would have had some input `x`, but afterwards we'd have `op(x)` here. Would that not mean that we should use `x` for the input to `drain_input`, but are getting `op(x)`? src/hotspot/share/opto/loopTransform.cpp line 1482: > 1480: pre_incr = clone_up_backedge_goo(nullptr, main_merge_region->in(1), pre_incr, visited, clones); > 1481: } > 1482: drain_input->set_req(1, pre_incr); Just a control question: above you did: `drain_input = PhiNode::make(main_merge_region, iv_after_main);` Does that not put `iv_after_main` at slot `1`, and now we overwrite it with `pre_incr`? src/hotspot/share/opto/loopTransform.cpp line 1491: > 1489: // Remove the new phi from the graph and use the hit > 1490: _igvn.remove_dead_node(drain_input); > 1491: drain_input = hit; Does this actually ever happen? Would we not have expected that `find_merge_phi_for_vectorized_drain` would have succeeded? 
------------- PR Review: https://git.openjdk.org/jdk/pull/22629#pullrequestreview-3660935313 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2690594036 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2690614731 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2690636076 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2690649517 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2690688876 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2690755348 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2690774526 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2690820214 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2690838851 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2690849347 From epeter at openjdk.org Wed Jan 14 15:15:47 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 Jan 2026 15:15:47 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Wed, 14 Jan 2026 14:22:21 GMT, Emanuel Peter wrote: >> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 13 commits: >> >> - Fix build failure after rebasing and address review comments >> - Merge branch 'master' into optimize-atomic-post >> - Fixed new test failures after rebasing and refined parts of the code to address review comments >> - Merge branch 'master' into optimize-atomic-post >> - Merge branch 'master' into optimize-atomic-post >> - Clean up comments for consistency and add spacing for readability >> - Fix some corner case failures and refined part of code >> - Merge branch 'master' into optimize-atomic-post >> - Refine ascii art, rename some variables and resolve conflicts >> - Merge branch 'master' into optimize-atomic-post >> - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 > > src/hotspot/share/opto/loopTransform.cpp line 1413: > >> 1411: // we now need to make the fall-in values to the vectorized drain >> 1412: // loop come from phis merging exit values from the pre loop and >> 1413: // the main loop. > > Suggestion: > > // the main loop, see "drain_input". Would this be correct? It would allow the reader to search for "drain_input" and immediately find the right point to focus in the ASCII art below :) > src/hotspot/share/opto/loopTransform.cpp line 1460: > >> 1458: // TestVectorizedDrainLoop.java. >> 1459: Node* drain_input = nullptr; >> 1460: Node* iv_after_main = main_phi->in(LoopNode::LoopBackControl); > > Is the `phi` we are looking at here always only the `iv`/trip counter? > - If always `iv`: how do we patch the other `phi`s? And is this loop not covering all phis? https://github.com/openjdk/jdk/pull/22629/files#diff-6a59f91cb710d682247df87c75faf602f0ff9f87e2855ead1b80719704fbedffL1770-L1781 > - If we cover all phis: we should probably change the naming of the variables to indicate that it is any `phi` and not just the `iv`. > > What do you think? Also: how sure are you that the backedge `main_phi->in(LoopNode::LoopBackControl)` is the same as the the value after main `iv_after_main`? 
What if we did some split-through-phi action at some point? Example:

    x = ...
    LOOP:
      x = op(x); // x now serves as exit value and backedge value
      exit check;
      goto LOOP;

If we split `op` through the LOOP Phi, we get:

    x = ...
    x = op(x);
    LOOP:
      // the exit value is the phi
      exit check;
      x = op(x); // x after op is the backedge
      goto LOOP;

I'm not sure this currently ever happens, but what if it did? > src/hotspot/share/opto/loopTransform.cpp line 1473: > >> 1471: // otherwise return 'iv_after_main'. >> 1472: iv_after_main = clone_up_backedge_goo(main_backedge_ctrl, main_merge_region->in(2), >> 1473: iv_after_main, visited, clones); > > I think this part would make more sense if we actually started with a variable `main_backedge` instead of `after_main`, and then the clone gets us the `after_main` value, because now it has been cloned out of the loop. So maybe it should not be: `Node* iv_after_main = main_phi->in(LoopNode::LoopBackControl);` But instead: `Node* main_backedge = main_phi->in(LoopNode::LoopBackControl);` Because it is at that point not yet clear that it is really the `after_main` value, or if we need to clone it first. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2690652154 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2690744427 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2690779855 From qamai at openjdk.org Wed Jan 14 15:30:30 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 14 Jan 2026 15:30:30 GMT Subject: [jdk26] RFR: 8374435: assert(addp->is_AddP()) failed: must be AddP during EA with -XX:-UseCompressedOops In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 08:01:24 GMT, Quan Anh Mai wrote: > Hi all, > > This pull request contains a backport of commit [624d7144](https://github.com/openjdk/jdk/commit/624d7144f757c39215ae3dfed1b78cdd3b3e4f8e) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
> > The commit being backported was authored by Quan Anh Mai on 14 Jan 2026 and was reviewed by Christian Hagedorn and Tobias Hartmann. > > Thanks! Thanks for your approval, do I need a second review for this backport? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29219#issuecomment-3750096822 From thartmann at openjdk.org Wed Jan 14 15:39:44 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 14 Jan 2026 15:39:44 GMT Subject: [jdk26] RFR: 8374435: assert(addp->is_AddP()) failed: must be AddP during EA with -XX:-UseCompressedOops In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 08:01:24 GMT, Quan Anh Mai wrote: > Hi all, > > This pull request contains a backport of commit [624d7144](https://github.com/openjdk/jdk/commit/624d7144f757c39215ae3dfed1b78cdd3b3e4f8e) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Quan Anh Mai on 14 Jan 2026 and was reviewed by Christian Hagedorn and Tobias Hartmann. > > Thanks! I think this is good to go. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29219#issuecomment-3750133033 From qamai at openjdk.org Wed Jan 14 15:56:56 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 14 Jan 2026 15:56:56 GMT Subject: [jdk26] Integrated: 8374435: assert(addp->is_AddP()) failed: must be AddP during EA with -XX:-UseCompressedOops In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 08:01:24 GMT, Quan Anh Mai wrote: > Hi all, > > This pull request contains a backport of commit [624d7144](https://github.com/openjdk/jdk/commit/624d7144f757c39215ae3dfed1b78cdd3b3e4f8e) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Quan Anh Mai on 14 Jan 2026 and was reviewed by Christian Hagedorn and Tobias Hartmann. > > Thanks! This pull request has now been integrated. 
Changeset: ffc6d1b7 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/ffc6d1b74baeecc63de869b7fadef49b438ac131 Stats: 92 lines in 2 files changed: 91 ins; 0 del; 1 mod 8374435: assert(addp->is_AddP()) failed: must be AddP during EA with -XX:-UseCompressedOops Reviewed-by: thartmann Backport-of: 624d7144f757c39215ae3dfed1b78cdd3b3e4f8e ------------- PR: https://git.openjdk.org/jdk/pull/29219 From kxu at openjdk.org Wed Jan 14 16:38:25 2026 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 14 Jan 2026 16:38:25 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v29] In-Reply-To: References: Message-ID: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759).
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: fix safepoint detection ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24458/files - new: https://git.openjdk.org/jdk/pull/24458/files/7783d609..e39a2a63 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=27-28 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From vlivanov at openjdk.org Wed Jan 14 16:54:40 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 14 Jan 2026 16:54:40 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order In-Reply-To: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> Message-ID: On Thu, 8 Jan 2026 09:25:35 GMT, Marc Chevalier wrote: > As it says: randomize the order the late inlines are processed, and slightly factor it with macro nodes. > > I didn't add shuffle to the `GrowableArray` class since it seems a bit method specialized method (and it seems it'd be a controversial change to a widely use class), it would make `GrowableArray` depends on `Compile` for random number generation (or require a callback, for instance, giving a non-trivial signature and usage), I couldn't find other shuffling of such an object. > There is also shuffling of `UniqueNodeList` (for `StressIGVN`), but it seems hard to unify: access and length are not written quite the same, it would probably be not simpler than duplicating the implem (which is simple, so it's fine in my opinion). 
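For readers unfamiliar with the technique: randomizing a processing order is commonly done with a Fisher-Yates shuffle driven by a seeded generator, so that a failing stress run can be replayed from its seed. The Java sketch below is only an illustration under stated assumptions: a plain `List` stands in for the late-inline queue, and the class and method names are hypothetical. The actual patch is HotSpot C++ code and, per the description, draws its random numbers from `Compile` rather than `java.util.Random`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ShuffleSketch {
    // Hypothetical helper: in-place Fisher-Yates shuffle. Every permutation is
    // equally likely, and the same seed always yields the same order.
    static <T> void shuffle(List<T> items, Random rng) {
        for (int i = items.size() - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1); // uniform index in [0, i]
            T tmp = items.get(i);
            items.set(i, items.get(j));
            items.set(j, tmp);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a queue of pending late-inline call sites (hypothetical).
        List<String> pending = new ArrayList<>(List.of("a", "b", "c", "d", "e"));
        shuffle(pending, new Random(42)); // a fixed seed makes the run reproducible
        System.out.println(pending);
    }
}
```

In plain Java, `java.util.Collections.shuffle(List, Random)` does the same job; the loop is spelled out here only to show the algorithm.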
> > Thanks, > Marc Thinking more about it, there's already a stress mode for incremental inlining which is `AlwaysIncrementalInline`. It makes sense to extend it with shuffling logic. (And I don't mind renaming `AlwaysIncrementalInline` to `StressIncrementalInline`). ------------- PR Comment: https://git.openjdk.org/jdk/pull/29110#issuecomment-3750522940 From vlivanov at openjdk.org Wed Jan 14 16:54:42 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 14 Jan 2026 16:54:42 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order In-Reply-To: References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> <47G92GiYmF8jY_vJ2hUNaQQoGBLqpkg_pYJT2r4jn9g=.94cb4ca9-8dfd-4a31-b77b-dbcc5b8f29d0@github.com> Message-ID: On Mon, 12 Jan 2026 07:47:39 GMT, Marc Chevalier wrote: > we would end-up with different inlining decisions, but it would still be correct, right? Yes and it perfectly fits stressing mode scenario. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29110#discussion_r2691242099 From vlivanov at openjdk.org Wed Jan 14 17:06:37 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 14 Jan 2026 17:06:37 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded In-Reply-To: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> Message-ID: On Tue, 13 Jan 2026 17:35:35 GMT, Kerem Kat wrote: > The check `vect->is_Vector() || vect->is_LoadVector()` doesn't handle `Proj` nodes that resolve to vector types, causing an assertion failure when such nodes flow through a `Phi` into `VectorBox`. src/hotspot/share/opto/vector.cpp line 357: > 355: // any of the inputs to VectorBoxNode are value-numbered they can only > 356: // move up and are guaranteed to dominate. 
> 357: if (vbox->is_Phi() && vect->bottom_type()->isa_vect()) { Does `vect->bottom_type()->isa_vect()` check become redundant? In other words, is it possible to observe a non-vector value here? It seems like the important bit is whether `vect` is a `Phi` or not. Another observation: `vbox->is_Phi() && vect->is_Phi()` and `vbox->is_Phi() && !vect->is_Phi()` cases can be commoned. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29200#discussion_r2691294922 From vlivanov at openjdk.org Wed Jan 14 17:15:30 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 14 Jan 2026 17:15:30 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v17] In-Reply-To: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> References: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> Message-ID: On Mon, 12 Jan 2026 15:05:29 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch is an alternative to #28764 but it does the analysis during IGVN instead. >> >> ## The current PR: >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. 
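A minimal Java illustration of the case the description calls out, an object that escapes only at `return`. The class and method names are hypothetical, and the program prints the same result with or without the optimization. The comments only mark which load, according to the description, the patch could fold past the call, assuming `opaque()` is not inlined by the JIT.

```java
public class EscapeAtReturnSketch {
    static final class Point { int x; int y; }

    static int counter;

    // Stand-in for a call that C2 does not inline (an assumption: whether it
    // is actually inlined depends on the JIT's heuristics).
    static void opaque() { counter++; }

    static Point make(int v) {
        Point p = new Point();
        p.x = v;
        opaque();     // p has not escaped at this call...
        int a = p.x;  // ...so this load is a candidate for folding to v
        p.y = a + 1;
        return p;     // p escapes only here
    }

    public static void main(String[] args) {
        Point p = make(41);
        System.out.println(p.x + " " + p.y); // prints "41 42"
    }
}
```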
If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. >> >> I do not see a noticeable difference in C2 runtime with and without this patch. >> >> ## Future work: >> >> 1. Fold a memory `Phi`. >> >> This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. >> >> 2. Fold a pointer `Phi`. >> >> Currently, this PR is doing the trivial approach, just give up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: >> >> Point p1 = new Point; >> Point p2 = new Point; >> p1.x = v1; >> p2.x = v2; >> Point p = Phi(p1, p2); >> int a = p.x; >> >> Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. >> >> Another interesting case: >> >> Point p = Phi(p1, p2); >> p.x = v; >> p1.x = v1; >> int a = p.x; >> >> Then, theoretically, we can fold `a` to `Phi(v1, v)` if `p1` and `p2` are known not to alias. >> >> 3. Nested objects >> >> It can be observed that if an object is stored into a memory that has not escaped, then it can be considered that the object has not escaped. For example: >> >> Point p = new Point; >> PointHolder h = new PointHolder; >> h.p = p; >> int x = p.x; >> escape(h); >> >> Then, `p` can be considered that it has not escaped until `escape(h)`. To do this, the computation of `_aliases` in the constructor of `LocalEA` needs to be more comprehensive. See the comments in `LocalEA::check_escape_status`. >> >> Please... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 19 additional commits since the last revision: > > - Merge branch 'master' into loadfoldingigvn > - Early return when not a heap access > - Fix escape at store > - Fix outdated and unclear comments > - copyright year, return, comments, whitespace > - Merge branch 'master' into loadfoldingigvn > - ea of phis and nested objects > - Add test scenarios > - Add a flag to turn off the feature > - Much more comments, refactor the data into a separate class > - ... and 9 more: https://git.openjdk.org/jdk/compare/5d335f95...c275e6e6 Very nice! Considering local EA is part of IGVN, is any additional work needed to ensure that changes at use sites (which may affect escape state) trigger reprocessing of affected memory nodes? In other words, if IGVN improves object escape state, do all corresponding memory operations promptly notice that? ------------- PR Review: https://git.openjdk.org/jdk/pull/28812#pullrequestreview-3661821455 From mchevalier at openjdk.org Wed Jan 14 17:18:04 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 14 Jan 2026 17:18:04 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order In-Reply-To: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> Message-ID: On Thu, 8 Jan 2026 09:25:35 GMT, Marc Chevalier wrote: > As it says: randomize the order the late inlines are processed, and slightly factor it with macro nodes. > > I didn't add shuffle to the `GrowableArray` class since it seems a bit method specialized method (and it seems it'd be a controversial change to a widely use class), it would make `GrowableArray` depends on `Compile` for random number generation (or require a callback, for instance, giving a non-trivial signature and usage), I couldn't find other shuffling of such an object. 
> There is also shuffling of `UniqueNodeList` (for `StressIGVN`), but it seems hard to unify: access and length are not written quite the same, it would probably be not simpler than duplicating the implem (which is simple, so it's fine in my opinion). > > Thanks, > Marc I could find only very little difference between `Always` and `Stress`. This (hence their names): https://github.com/openjdk/jdk/blob/56545328f849c3ebf062e3ff601224084fa3b46e/src/hotspot/share/opto/compile.hpp#L1108-L1109 and the fact that `Always` is a develop flag while `Stress` is diagnostic. And there is only one location where the `Stress` version appears alone: to initialize the random generator. It seems that the `Stress` version should do the shuffling (that's the randomized one). And indeed, I suppose we can do it for the `Always` version, but its current behavior is deterministic and that would change that. That would also make it more questionable to have both the `Stress` and `Always` versions. Wdyt? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29110#issuecomment-3750643909 From qamai at openjdk.org Wed Jan 14 17:33:49 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 14 Jan 2026 17:33:49 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped In-Reply-To: References: Message-ID: On Sat, 13 Dec 2025 03:51:32 GMT, Vladimir Ivanov wrote: >> Hi, >> >> This patch is an alternative to #28764 but it does the analysis during IGVN instead. >> >> ## The current PR: >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return.
>> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. >> >> I do not see a noticeable difference in C2 runtime with and without this patch. >> >> ## Future work: >> >> 1. Fold a memory `Phi`. >> >> This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. >> >> 2. Fold a pointer `Phi`. >> >> Currently, this PR is doing the trivial approach, just give up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: >> >> Point p1 = new Point; >> Point p2 = new Point; >> p1.x = v1; >> p2.x = v2; >> Point p = Phi(p1, p2); >> int a = p.x; >> >> Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. >> >> Another interesting case: >> >> Point p = Phi(p1, p2); >> p.x = v; >> p1.x = v1; >> int a = p.x; >> >> Then, theoretically, we can fold `a` to `Phi(v1, v)` if `p1` and `p2` are known not to alias. >> >> 3. Nested objects >> >> It can be observed that if an object is stored into a memory that has not escaped, then it can be considered that the object has not escaped. For example: >> >> Point p = new Point; >> PointHolder h = new PointHolder; >> h.p = p; >> int x = p.x; >> escape(h); >> >> Then, `p` can be considered that it has not escaped until `escape(h)`. 
To do this, the computation of `_aliases` in the constructor of `LocalEA` needs to be more comprehensive. See the comments in `LocalEA::check_escape_status`. >> >> Please... > > Very nice! I definitely prefer the approach here to #28764. > > I see that the unit test stays the same and there's an adjustment in some other test, so I assume this version is functionally more powerful than #28764 version. > > Have you had a chance to measure how much it affects compilation speed compared to #28764? > > (The code is dense and hard to reason about, so some polishing/refactoring would help make it more readable. Also, please, think about verification checks.) @iwanowww Unfortunately, I believe there is no such feature yet. That's why we skip `LoadNode`s in `PhaseIterGVN::verify_Ideal|Identity|Value_for`. I think it is profitable to investigate manually appending them to the work list after each round of IGVN, similar to how it is handled in `PhaseCCP`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3750712766 From coleenp at openjdk.org Wed Jan 14 18:54:26 2026 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 14 Jan 2026 18:54:26 GMT Subject: RFR: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 23:18:19 GMT, Coleen Phillimore wrote: > Save this address so that the code that uses it in the valhalla repo will have it. This doesn't fail in main jdk repository. > See CR for more information. > Tested with tier1-4. Thank you, Ioi, for the information, and Andrew, Stefan and Aleksey for the discussion and reviews. This does take one small thing off the valhalla list for us.
------------- PR Comment: https://git.openjdk.org/jdk/pull/29129#issuecomment-3751091399 From coleenp at openjdk.org Wed Jan 14 18:57:12 2026 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 14 Jan 2026 18:57:12 GMT Subject: Integrated: 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 23:18:19 GMT, Coleen Phillimore wrote: > Save this address so that the code that uses it in the valhalla repo will have it. This doesn't fail in main jdk repository. > See CR for more information. > Tested with tier1-4. This pull request has now been integrated. Changeset: 60fbaf5b Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/60fbaf5b26d7d359b1258898d4c4dfd86010b8a5 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8374828: Save load_barrier_on_oop_field_preloaded in aot CodeCache Reviewed-by: adinn, iklam, shade ------------- PR: https://git.openjdk.org/jdk/pull/29129 From vlivanov at openjdk.org Wed Jan 14 19:48:07 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 14 Jan 2026 19:48:07 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v17] In-Reply-To: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> References: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> Message-ID: On Mon, 12 Jan 2026 15:05:29 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch is an alternative to #28764 but it does the analysis during IGVN instead. >> >> ## The current PR: >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. 
>> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. >> >> I do not see a noticeable difference in C2 runtime with and without this patch. >> >> ## Future work: >> >> 1. Fold a memory `Phi`. >> >> This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. >> >> 2. Fold a pointer `Phi`. >> >> Currently, this PR is doing the trivial approach, just give up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: >> >> Point p1 = new Point; >> Point p2 = new Point; >> p1.x = v1; >> p2.x = v2; >> Point p = Phi(p1, p2); >> int a = p.x; >> >> Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. >> >> Another interesting case: >> >> Point p = Phi(p1, p2); >> p.x = v; >> p1.x = v1; >> int a = p.x; >> >> Then, theoretically, we can fold `a` to `Phi(v1, v)` if `p1` and `p2` are known not to alias. >> >> 3. Nested objects >> >> It can be observed that if an object is stored into a memory that has not escaped, then it can be considered that the object has not escaped. For example: >> >> Point p = new Point; >> PointHolder h = new PointHolder; >> h.p = p; >> int x = p.x; >> escape(h); >> >> Then, `p` can be considered that it has not escaped until `escape(h)`. 
To do this, the computation of `_aliases` in the constructor of `LocalEA` needs to be more comprehensive. See the comments in `LocalEA::check_escape_status`. >> >> Please... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: > > - Merge branch 'master' into loadfoldingigvn > - Early return when not a heap access > - Fix escape at store > - Fix outdated and unclear comments > - copyright year, return, comments, whitespace > - Merge branch 'master' into loadfoldingigvn > - ea of phis and nested objects > - Add test scenarios > - Add a flag to turn off the feature > - Much more comments, refactor the data into a separate class > - ... and 9 more: https://git.openjdk.org/jdk/compare/44c7f6cd...c275e6e6 Please, file an RFE to address such scenarios then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3751313371 From vlivanov at openjdk.org Wed Jan 14 21:06:50 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 14 Jan 2026 21:06:50 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order In-Reply-To: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> Message-ID: On Thu, 8 Jan 2026 09:25:35 GMT, Marc Chevalier wrote: > As it says: randomize the order the late inlines are processed, and slightly factor it with macro nodes. 
> > I didn't add shuffle to the `GrowableArray` class since it seems a bit method specialized method (and it seems it'd be a controversial change to a widely use class), it would make `GrowableArray` depends on `Compile` for random number generation (or require a callback, for instance, giving a non-trivial signature and usage), I couldn't find other shuffling of such an object. > There is also shuffling of `UniqueNodeList` (for `StressIGVN`), but it seems hard to unify: access and length are not written quite the same, it would probably be not simpler than duplicating the implem (which is simple, so it's fine in my opinion). > > Thanks, > Marc There's clearly duplication between `StressIncrementalInlining` and `AlwaysIncrementalInline`. Any particular reason to keep `AlwaysIncrementalInline`? It was introduced as part of the initial implementation to test the functionality, but `StressIncrementalInlining` looks more flexible when it comes to testing various combinations of inlining decisions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29110#issuecomment-3751705845 From vlivanov at openjdk.org Wed Jan 14 21:13:21 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 14 Jan 2026 21:13:21 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 09:11:25 GMT, Quan Anh Mai wrote: >> ### Problem: >> >> Test `compiler/vectorapi/VectorMaskToLongTest.java` crashes intermittently (approximately once per 200+ runs) with stress VM options such as `-XX:+StressIGVN`: >> >> >> // A fatal error has been detected by the Java Runtime Environment: >> // >> // Internal Error (jdk/src/hotspot/share/opto/type.hpp:2287), pid=69056, tid=28419 >> // assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector >> // ...
>> >> >> The crash occurs in following code when calling `is_vect()` in the assertion added by JDK-8367292 [1]: >> >> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/src/hotspot/share/opto/vectornode.cpp#L1920-L1924 >> >> ### Root Cause: >> >> The mask's type becomes TOP (unreachable) during compiler optimizations when the mask node is marked as dead before all its users are removed from the ideal graph. If `Ideal()` is subsequently called on a user node, it may access the TOP type, triggering the assertion. >> >> Here is the simplified ideal graph showing the crash scenario: >> >> >> Con #top >> | ConI >> \ / >> \ / >> VectorStoreMask >> | >> VectorMaskToLong # !jvms: IntMaxVector$IntMaxMask::toLong >> >> >> ### Detailed Scenario: >> >> Following is the method in the test case that hits the assertion: >> >> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L65-L70 >> >> This method accepts a `VectorSpecies` parameter and calls vector APIs `VectorMask.fromLong()` and `toLong()`. It is called with species ranging from `ByteVector.SPECIES_MAX` to `DoubleVector.SPECIES_MAX`. During compilation, C2 speculatively generates fast paths for `toLong()` for all possible species. >> >> When compiling a specific test case such as: >> https://github.com/openjdk/jdk/blob/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L177-L179 >> >> the compiler inlines the method and attempts to optimize away unreachable branches. The following graph shows the situation before the mask becomes `TOP`: >> >> >> VectorBox # DoubleMaxMask, generated by VectorMask.fromLong() >> / \ >> AddP \ >> | \ >> LoadNClass \ >> ConP #IntMaxMask | | >> \ | | >> \ DecodeN... 
> > src/hotspot/share/opto/vectornode.cpp line 1923: > >> 1921: Node* mask = in1->in(1); >> 1922: const TypeVect* mask_vt = mask->bottom_type()->isa_vect(); >> 1923: if (mask_vt == nullptr) { > > It is better to filter the exact `Type::TOP` instance and assert that otherwise, this must be a `TypeVect`. Additionally, if the type of the input is `Type::TOP`, we can eagerly return `C->top()` to kill it. I second @merykitty's suggestion. It's better to improve input validation and fail-fast IGVN/intrinsification attempt when any inputs are dead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29057#discussion_r2692058439 From kxu at openjdk.org Wed Jan 14 23:24:13 2026 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 14 Jan 2026 23:24:13 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v28] In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 22:44:30 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 50 commits: >> >> - Merge remote-tracking branch 'origin/master' into counted-loop-refactor >> - Update license header years >> - Merge remote-tracking branch 'origin/master' into counted-loop-refactor >> - remove trailing whitespaces >> - Merge remote-tracking branch 'origin/master' into counted-loop-refactor >> - additional suggestions from code review >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - fix trip counter loop-variant detection >> - fix bad merge with ctrl_is_member() >> - Merge remote-tracking branch 'origin/master' into counted-loop-refactor >> >> # Conflicts: >> # src/hotspot/share/opto/loopnode.cpp >> - ... and 40 more: https://git.openjdk.org/jdk/compare/640343f7...7783d609 > > There are quite some failures with the same assert (probably all related). 
Can be triggered, for example, by running `compiler/predicates/assertion/TestAssertionPredicates.java#NoLoopPredicationXbatch` with `-XX:+UseSerialGC`: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/opt/mach5/mesos/work_dir/slaves/da1065b5-7b94-4f0d-85e9-a3a252b9a32e-S11864/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/c6afc1de-b432-44d4-bd71-2c035e46dc9e/runs/88cff2b5-6582-4c32-8cb2-92c8c5d2feeb/workspace/open/src/hotspot/share/opto/loopnode.hpp:1450), pid=182310, tid=182326 > # Error: assert(!has_ctrl(n)) failed > .......... > Current CompileTask: > C2:300 95 b 4 compiler.predicates.assertion.TestAssertionPredicates::testTrySplitUpNonOpaqueExpressionNode (163 bytes) > > Stack: [0x00007f27d75cc000,0x00007f27d76cc000], sp=0x00007f27d76c6b00, free space=1002k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x156bff8] PhaseIdealLoop::get_loop(Node const*) const+0x68 (loopnode.hpp:1450) > V [libjvm.so+0x15a07f7] IdealLoopTree::remove_safepoints(PhaseIdealLoop*, bool)+0x167 (loopnode.cpp:4672) > V [libjvm.so+0x15b7dee] IdealLoopTree::counted_loop(PhaseIdealLoop*)+0x11e (loopnode.cpp:4700) > V [libjvm.so+0x15b7d7a] IdealLoopTree::counted_loop(PhaseIdealLoop*)+0xaa (loopnode.cpp:4719) > V [libjvm.so+0x15b7d7a] IdealLoopTree::counted_loop(PhaseIdealLoop*)+0xaa (loopnode.cpp:4719) > V [libjvm.so+0x15bcc07] PhaseIdealLoop::build_and_optimize()+0xaf7 (loopnode.cpp:5285) > V [libjvm.so+0xbb8130] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x4c0 (loopnode.hpp:1226) > V [libjvm.so+0xbb1995] Compile::Optimize()+0x685 (compile.cpp:2466) > V [libjvm.so+0xbb5173] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x2023 (compile.cpp:862) > V [libjvm.so+0x9cc3e8] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x498 (c2compiler.cpp:147) > V [libjvm.so+0xbc4660] 
CompileBroker::invoke_compiler_on_method(CompileTask*)+0x780 (compileBroker.cpp:2345) > V [libjvm.so+0xbc5ec0] CompileBroker::compiler_thread_loop()+0x530 (compileBroker.cpp:1989) > V [libjvm.so+0x112635b] JavaThread::thread_main_inner()+0x13b (javaThread.cpp:776) > V [libjvm.so+0x1bb30b6] Thread::call_run()+0xb6 (thread.cpp:242) > V [libjvm.so+0x1808c98] thread_native_entry(Thread*)+0x118 (os_linux.cpp:860) > > > The branch with the old vs. new code also hit the diff asser... @chhagedorn Sorry, I made a mistake with safepoint detection. Upon inspecting the original code, `_safepoint` should be set to `null` if `.opcode() != Op_SafePoint`. This logic is missing from my refactored code. How the test only fails with `-XX:+UseSerialGC` is beyond me. > I will check next week if I can extract a reproducer to share. Yes, the diff assert is curious. I'd appreciate it if you could share more information. Thank you very much! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3752153513 From ghan at openjdk.org Thu Jan 15 00:42:21 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Thu, 15 Jan 2026 00:42:21 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v6] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 02:04:51 GMT, David Holmes wrote: >> Guanqiang Han has updated the pull request incrementally with one additional commit since the last revision: >> >> fix a compile error > > I think this looks like a good solution. I flagged the compiler folk as I'm unsure about the test location - but I see it is where the only other test that uses `PrintDeoptimizationDetails` exists. > > One small change requested (while you await the second review). > > Thanks Hi @dholmes-ora @dean-long, thanks for the suggestion.
I've made the changes; could you please take another look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29186#issuecomment-3752341939 From duke at openjdk.org Thu Jan 15 01:55:59 2026 From: duke at openjdk.org (Shawn M Emery) Date: Thu, 15 Jan 2026 01:55:59 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression uses incorrect assertions [v2] In-Reply-To: References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: On Wed, 14 Jan 2026 13:27:50 GMT, Ferenc Rakoczi wrote: >> src/java.base/share/classes/com/sun/crypto/provider/ML_KEM.java line 1364: >> >>> 1362: int n = (parsedLength + 127) / 128; >>> 1363: assert ((parsed.length >= n * 128) && >>> 1364: (condensed.length >= index + n * 192)); >> >> Given the comments, can this be simplified to just: >> >> >> - int n = (parsedLength + 127) / 128; >> - assert ((parsed.length >= n * 128) && >> - (condensed.length >= index + n * 192)); >> + assert((parsed.length % 128) == 0) && (condensed.length % 192 == 0)); >> >> >> If the length is smaller than the constant then the remainder will be non-zero. > > These are the exact conditions that the most demanding intrinsic (the AVX-512 one) requires. If we would rely on that the callers satisfy these, we wouldn't need the assert :-) . The loop in the intrinsic will read n * 192 bytes and write n * 128 shorts, your suggestion would not ensure that the arrays have at least that much space. Ah, I see that now.
Maybe an update to the comments would alleviate my confusion? NEW:

    // An intrinsic implementation assumes that the input and output buffers
    // are such that condensed can be read in n-multiples of 192 bytes and
    // parsed can be written in n-multiples of 128 shorts, so callers should allocate
    // the condensed and parsed arrays to at least these amounts, see the assert()

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29141#discussion_r2692668352 From vpaprotski at openjdk.org Thu Jan 15 04:33:45 2026 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Thu, 15 Jan 2026 04:33:45 GMT Subject: RFR: 8360271: String.indexOf intrinsics fail with +EnableX86ECoreOpts and -CompactStrings Message-ID: Failure always for UU case, needle=2, len=17 - (Note: `len=len-offset` in `library_call.cpp`, i.e. stub does not see the same len as the test case) Following down the code layout:

    if len==0 return 0
    if len>needle return -1
    if len<=16|32 && needle<=3|6
      optimized_short_cases
    if len>16|32 // big switch
      switch(needle) {
        default >10
        cases 2..10 // BUG IS HERE: len 17|34, needle 2|4, case=4
      }
    else // small switch
      switch(needle) {
        cases 7..10 // others under optimized_short_cases
      }

Furthermore, the big switch case itself has two cases:

    if len-needle>31
      // works
      // loop
    else // len-needle<=31
      // BUG HERE

The else case corrects mask misalignment; the 'correction shift' is off-by-1 for the UTF16 case.

-----
Why not found before?
- testcase issue, needle was UTF8 for UTF16 case
Why only needle==2?
- Possibly because the mask for words has two bits, so tolerated off-by-one ------------- Commit messages: - whitespace - off-by-1 in UU/UL case Changes: https://git.openjdk.org/jdk/pull/29242/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29242&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360271 Stats: 19 lines in 2 files changed: 11 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/29242.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29242/head:pull/29242 PR: https://git.openjdk.org/jdk/pull/29242 From duke at openjdk.org Thu Jan 15 05:31:20 2026 From: duke at openjdk.org (duke) Date: Thu, 15 Jan 2026 05:31:20 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v9] In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 09:01:44 GMT, Galder Zamarreño wrote: >> Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably. >> >> Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added a `is_MinMax` node query to simplify the fix. >> >> I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. >> >> To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977).
>> >> During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. >> >> I've run tier1-3 tests on linux/x64 successfully. > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Remove exclude or Min/Max in verify identity" > > This reverts commit cf24abad55db9a320930379c4f0f3154791d26e2. @galderz Your change (at version 438aeff326b1f055893c8b0206cd4ee5fdf1057c) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28895#issuecomment-3752963377 From xgong at openjdk.org Thu Jan 15 05:48:05 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 15 Jan 2026 05:48:05 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 09:11:25 GMT, Quan Anh Mai wrote: >> ### Problem: >> >> Test `compiler/vectorapi/VectorMaskToLongTest.java` crashes intermittently (approximately once per 200+ runs) with stress VM options such as `-XX:+StressIGVN`: >> >> >> // A fatal error has been detected by the Java Runtime Environment: >> // >> // Internal Error (jdk/src/hotspot/share/opto/type.hpp:2287), pid=69056, tid=28419 >> // assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector >> // ... >> >> >> The crash occurs in following code when calling `is_vect()` in the assertion added by JDK-8367292 [1]: >> >> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/src/hotspot/share/opto/vectornode.cpp#L1920-L1924 >> >> ### Root Cause: >> >> The mask's type becomes TOP (unreachable) during compiler optimizations when the mask node is marked as dead before all its users are removed from the ideal graph.
If `Ideal()` is subsequently called on a user node, it may access the TOP type, triggering the assertion. >> >> Here is the simplified ideal graph showing the crash scenario: >> >> >> Con #top >> | ConI >> \ / >> \ / >> VectorStoreMask >> | >> VectorMaskToLong # !jvms: IntMaxVector$IntMaxMask::toLong >> >> >> ### Detailed Scenario: >> >> Following is the method in the test case that hits the assertion: >> >> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L65-L70 >> >> This method accepts a `VectorSpecies` parameter and calls vector APIs `VectorMask.fromLong()` and `toLong()`. It is called with species ranging from `ByteVector.SPECIES_MAX` to `DoubleVector.SPECIES_MAX`. During compilation, C2 speculatively generates fast paths for `toLong()` for all possible species. >> >> When compiling a specific test case such as: >> https://github.com/openjdk/jdk/blob/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L177-L179 >> >> the compiler inlines the method and attempts to optimize away unreachable branches. The following graph shows the situation before the mask becomes `TOP`: >> >> >> VectorBox # DoubleMaxMask, generated by VectorMask.fromLong() >> / \ >> AddP \ >> | \ >> LoadNClass \ >> ConP #IntMaxMask | | >> \ | | >> \ DecodeN... > > src/hotspot/share/opto/vectornode.cpp line 1923: > >> 1921: Node* mask = in1->in(1); >> 1922: const TypeVect* mask_vt = mask->bottom_type()->isa_vect(); >> 1923: if (mask_vt == nullptr) { > > It is better to filter the exact `Type::TOP` instance and assert that otherwise, this must be a `TypeVect`. Additionally, if the type of the input is `Type::TOP`, we can eagerly return `C->top()` to kill it. OK, I will check TOP input instead and convert back the assertion changes. Thanks a lot for your input @merykitty and @iwanowww ! 
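To make the suggested input check concrete, below is a minimal, self-contained C++ sketch of the pattern: first test for the exact TOP type and fail fast, then treat any other input as a vector type and assert if it is not one. `TypeBase`, `Type`, `Node`, `IdealResult`, and `ideal_mask_to_long` are hypothetical stand-ins written for illustration. They are not HotSpot's `Type`/`TypeVect`/`Node` API, and the actual fix may be structured differently.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins for C2's type lattice and nodes.
enum class TypeBase { Top, Int, VectorMask, VectorZ };

struct Type {
  TypeBase base;
  bool is_top() const { return base == TypeBase::Top; }
  bool is_vector() const {
    return base == TypeBase::VectorMask || base == TypeBase::VectorZ;
  }
};

struct Node {
  Type type;
};

enum class IdealResult { ReplaceWithTop, Transformed };

// Sketch of an Ideal()-style step for a node whose input is a vector mask.
IdealResult ideal_mask_to_long(const Node& mask) {
  // A dead input has type TOP: fail fast so the caller can replace this
  // node with top, instead of querying vector properties of a non-vector.
  if (mask.type.is_top()) {
    return IdealResult::ReplaceWithTop;
  }
  // Any other input must be a vector type; anything else indicates a bug.
  assert(mask.type.is_vector() && "mask input must have a vector type");
  // ... a real transformation would inspect the vector type here ...
  return IdealResult::Transformed;
}
```

Checking for TOP explicitly, rather than quietly returning "no change" for every non-vector type, means the assertion still catches inputs that are genuinely malformed.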
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29057#discussion_r2693069575 From syan at openjdk.org Thu Jan 15 05:52:36 2026 From: syan at openjdk.org (SendaoYan) Date: Thu, 15 Jan 2026 05:52:36 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version In-Reply-To: References: Message-ID: <8c8QLPcy701RGF_uMOjert2_G9bYg6c6EDrWhe3rFcA=.a1add9ea-7189-4002-b913-a343f4752841@github.com> On Sat, 10 Jan 2026 00:08:12 GMT, Guanqiang Han wrote: > Please review this change. Thanks! > > Description: > > This change fixes a crash in Deoptimization::uncommon_trap_inner when running with -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp. > > With -XX:-ProfileTraps, create_if_missing is set to false. > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2121-L2122 > > When create_if_missing is false, new mdo can not be created when m()->method_data() return null, so get_method_data() may return null . > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L1911-L1912 > > and trap_mdo can be null as a result > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2134-L2136 > > The crash happens here because trap_mdo is null > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2157 > > Fix: > > The fix makes acquisition of extra_data_lock conditional on trap_mdo being non-null, preserving the required lock ordering (extra_data_lock before ttyLocker) when an MDO exists. 
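The fix described above acquires the MDO's lock only when an MDO exists, and keeps the order "MDO lock first, output lock second". Here is a minimal sketch of that shape using standard C++ mutexes instead of HotSpot's `MutexLocker`/`ttyLocker`. `MethodData`, `tty_lock`, `tty_out`, and `report_trap` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical per-method profile object guarded by its own lock.
struct MethodData {
  std::mutex extra_data_lock;
  int trap_count = 0;
};

std::mutex tty_lock;         // hypothetical output lock, always taken after the MDO lock
std::ostringstream tty_out;  // hypothetical output stream guarded by tty_lock

void report_trap(MethodData* trap_mdo, const std::string& reason) {
  // Lock the MDO only if there is one. A default-constructed unique_lock
  // owns no mutex, so the null case simply skips this lock.
  std::unique_lock<std::mutex> mdo_guard;
  if (trap_mdo != nullptr) {
    mdo_guard = std::unique_lock<std::mutex>(trap_mdo->extra_data_lock);
  }
  // The output lock is always acquired second, so the order never inverts.
  std::lock_guard<std::mutex> tty_guard(tty_lock);
  tty_out << "trap: " << reason;
  if (trap_mdo != nullptr) {
    tty_out << " count=" << ++trap_mdo->trap_count;
  }
  tty_out << '\n';
}
```

When the function returns, the guards are destroyed in reverse declaration order. The output lock is released first, then the MDO lock (if one was held).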
> > Test: > > GHA test/hotspot/jtreg/compiler/uncommontrap/TestPrintDiagnosticsWithoutProfileTraps.java line 29: > 27: * @summary Regression test for -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp crash > 28: * @requires vm.debug > 29: * @run main/othervm -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp compiler.uncommontrap.TestPrintDiagnosticsWithoutProfileTraps Should we split this line into two or three lines? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29147#discussion_r2693077929 From dholmes at openjdk.org Thu Jan 15 06:40:39 2026 From: dholmes at openjdk.org (David Holmes) Date: Thu, 15 Jan 2026 06:40:39 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v6] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 13:56:46 GMT, Guanqiang Han wrote: >> Please review this change. Thanks! >> >> **Description:** >> >> When -XX:+PrintDeoptimizationDetails is enabled, vframeArray.cpp prints interpreted frames under ttyLocker. >> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/runtime/vframeArray.cpp#L493-L503 >> In the WizardMode && Verbose branch it calls Method::print_codes() / print_codes_on(). The bytecode printing path may acquire MDOExtraData_lock (rank nosafepoint-1). >> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/interpreter/bytecodeTracer.cpp#L597-L602 >> Calling it while holding tty_lock violates the global lock ranking order and triggers a lock rank inversion assertion.
>> To avoid double buffering when the target stream is already a thread-local stringStream, extend BytecodeTracer::print_method_codes with a buffered flag . The new call site uses buffered=false when dumping into the temporary stringStream. >> >> **Test:** >> >> GHA > > Guanqiang Han has updated the pull request incrementally with one additional commit since the last revision: > > fix a compile error src/hotspot/share/utilities/ostream.hpp line 289: > 287: char* as_string(bool c_heap = false) const; > 288: char* as_string(Arena* arena) const; > 289: bool is_buffered() const { return true; } Suggestion: bool is_buffered() const override { return true; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29186#discussion_r2693164755 From dholmes at openjdk.org Thu Jan 15 06:53:23 2026 From: dholmes at openjdk.org (David Holmes) Date: Thu, 15 Jan 2026 06:53:23 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v6] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 13:56:46 GMT, Guanqiang Han wrote: >> Please review this change. Thanks! >> >> **Description:** >> >> When -XX:+PrintDeoptimizationDetails is enabled, vframeArray.cpp prints interpreted frames under ttyLocker. >> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/runtime/vframeArray.cpp#L493-L503 >> In the WizardMode && Verbose branch it calls Method::print_codes() / print_codes_on(). The bytecode printing path may acquire MDOExtraData_lock (rank nosafepoint-1). >> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/interpreter/bytecodeTracer.cpp#L597-L602 >> Calling it while holding tty_lock violates the global lock ranking order and triggers a lock rank inversion assertion. 
>> >> **Fix:** >> >> Generate the bytecode dump before taking tty_lock, and print it afterwards under ttyLocker to keep the output coherent while preserving correct lock acquisition order. >> To avoid double buffering when the target stream is already a thread-local stringStream, extend BytecodeTracer::print_method_codes with a buffered flag . The new call site uses buffered=false when dumping into the temporary stringStream. >> >> **Test:** >> >> GHA > > Guanqiang Han has updated the pull request incrementally with one additional commit since the last revision: > > fix a compile error src/hotspot/share/utilities/ostream.hpp line 162: > 160: virtual ~outputStream() {} // close properly on deletion > 161: // Return true if this stream buffers/accumulates output in memory (e.g., stringStream) > 162: virtual bool is_buffered() const { return false; } Really you are using this as a proxy for "do I have a stringStream?" and it doesn't quite work because we also have a `bufferedStream` class that you have not marked as buffered - though I'm not sure the "buffering" in the two cases is actually the same thing. @dean-long you made this suggestion so how do you see `is_buffered` fitting in to the whole stream hierarchy? 
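The ordering in the Fix paragraph is: build the bytecode dump first, then print it while holding the output lock. It can be sketched with standard C++ types in place of `stringStream`, `ttyLocker`, and `MDOExtraData_lock`. All names below (`profile_lock`, `output_lock`, `out`, `dump_codes`, `print_frame`) are hypothetical. The only point is that the step which may take the other lock runs before the output lock is acquired, and the finished text is then written in one piece.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <sstream>
#include <string>
#include <vector>

std::mutex profile_lock;  // stands in for a lock that must not be taken under output_lock
std::mutex output_lock;   // stands in for tty_lock
std::ostringstream out;   // stands in for tty

// Producing the dump may take profile_lock, so it must run while
// output_lock is NOT held.
std::string dump_codes(const std::vector<std::string>& codes) {
  std::lock_guard<std::mutex> guard(profile_lock);
  std::ostringstream buf;  // local buffer, playing the role of a stringStream
  for (std::size_t i = 0; i < codes.size(); i++) {
    buf << i << ": " << codes[i] << '\n';
  }
  return buf.str();
}

void print_frame(const std::string& header, const std::vector<std::string>& codes) {
  // 1. Format the dump with no output lock held.
  std::string dump = dump_codes(codes);
  // 2. Lock once and emit header and dump together so the output stays coherent.
  std::lock_guard<std::mutex> guard(output_lock);
  out << header << '\n' << dump;
}
```

Here the buffering belongs to the call site (a local buffer) and is not a property of the output stream's type.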
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29186#discussion_r2693188249 From ghan at openjdk.org Thu Jan 15 06:57:53 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Thu, 15 Jan 2026 06:57:53 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v6] In-Reply-To: References: Message-ID: <4S_zPFzylTWkeSFvL5g3It1MQbcGfPYYzL-_lhWf87Y=.67f070b4-f3f1-4709-904d-a933a74f9372@github.com> On Thu, 15 Jan 2026 06:37:34 GMT, David Holmes wrote: >> Guanqiang Han has updated the pull request incrementally with one additional commit since the last revision: >> >> fix a compile error > > src/hotspot/share/utilities/ostream.hpp line 289: > >> 287: char* as_string(bool c_heap = false) const; >> 288: char* as_string(Arena* arena) const; >> 289: bool is_buffered() const { return true; } > > Suggestion: > > bool is_buffered() const override { return true; } I originally added override to is_buffered(), but on macOS the build fails with -Werror -Winconsistent-missing-override. In ostream.hpp, the existing implementations of virtual functions also do not use override, so adding override only to is_buffered() makes the class inconsistent and triggers the warning. I removed it to keep consistency with the current style and avoid the macOS build break. 
Do you still want me to add override back (and then update the other overridden methods in this class accordingly)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29186#discussion_r2693199895 From ghan at openjdk.org Thu Jan 15 07:23:44 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Thu, 15 Jan 2026 07:23:44 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v6] In-Reply-To: References: Message-ID: On Thu, 15 Jan 2026 06:49:55 GMT, David Holmes wrote: >> Guanqiang Han has updated the pull request incrementally with one additional commit since the last revision: >> >> fix a compile error > > src/hotspot/share/utilities/ostream.hpp line 162: > >> 160: virtual ~outputStream() {} // close properly on deletion >> 161: // Return true if this stream buffers/accumulates output in memory (e.g., stringStream) >> 162: virtual bool is_buffered() const { return false; } > > Really you are using this as a proxy for "do I have a stringStream?" and it doesn't quite work because we also have a `bufferedStream` class that you have not marked as buffered - though I'm not sure the "buffering" in the two cases is actually the same thing. > > @dean-long you made this suggestion so how do you see `is_buffered` fitting in to the whole stream hierarchy? I agree with the concern here. The buffering we need is local to this call site to keep the output coherent (collect everything and print once). Whether we need to buffer/accumulate output for coherence is scenario-dependent, rather than a property that should permanently classify a stream type as "buffered" vs. "unbuffered". @dean-long what's your view on this?
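As background on the macOS build failure mentioned above, here is a small illustration using hypothetical classes, not the real `outputStream` hierarchy. Clang's `-Winconsistent-missing-override` is reported when a class marks some of its overriding members with `override` but not others. Adding `override` only to `is_buffered()`, in a class whose other overrides lack it, therefore triggers the warning. Marking every overriding member, or none of them, avoids it.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class, loosely modeled on outputStream.
class OutStream {
 public:
  virtual ~OutStream() {}
  virtual void write(const char* s) = 0;
  // Default: output is not accumulated in memory.
  virtual bool is_buffered() const { return false; }
};

// Every overriding member is marked `override`. If only one of them were
// marked, clang would report -Winconsistent-missing-override for the other.
class StringOut : public OutStream {
 public:
  void write(const char* s) override { buf_ += s; }
  bool is_buffered() const override { return true; }
  const std::string& str() const { return buf_; }

 private:
  std::string buf_;
};
```

Usage: writing `"ab"` and then `"c"` through an `OutStream&` that refers to a `StringOut` leaves `"abc"` in `str()`, and `is_buffered()` dispatches to the override and returns `true`.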
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29186#discussion_r2693255744 From epeter at openjdk.org Thu Jan 15 07:27:24 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 Jan 2026 07:27:24 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v5] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Mon, 12 Jan 2026 08:47:29 GMT, Emanuel Peter wrote: >> Hmm, I see. That sounds like a deficiency in the auto unboxing of Float16. >> >> Suggestion: You should create both variants of the IR tests. And then file an RFE for the one that does not yet vectorize because of the boxing issues. >> >> Because the way things are now, it's not a huge win, to be honest. Which user is supposed to write their code in such a convoluted way, having to cast back and forth? Would they not expect they could just use Float16 all the way through? > > @jatin-bhateja What do you think? Someone filed the RFE: https://bugs.openjdk.org/browse/JDK-8375321 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2693259787 From galder at openjdk.org Thu Jan 15 07:27:35 2026 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 15 Jan 2026 07:27:35 GMT Subject: Integrated: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 11:31:30 GMT, Galder Zamarreño wrote: > Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably. > > Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added a `is_MinMax` node query to simplify the fix.
> > I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. > > To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). > > During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. > > I've run tier1-3 tests on linux/x64 successfully. This pull request has now been integrated. Changeset: d16a9b2e Author: Galder Zamarreño Committer: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/d16a9b2ec507251a44f034f1ccf8039f02023d52 Stats: 251 lines in 9 files changed: 200 ins; 0 del; 51 mod 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN Reviewed-by: epeter, bmaillard, dlong ------------- PR: https://git.openjdk.org/jdk/pull/28895 From galder at openjdk.org Thu Jan 15 07:42:50 2026 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 15 Jan 2026 07:42:50 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v8] In-Reply-To: References: Message-ID: <8GX5KSy_IE_8Z5hxM8tOsyKe561P1bzsH6zhz6iuQmI=.8a63eccb-efa4-49dd-a147-7aa21055366d@github.com> On Fri, 9 Jan 2026 23:35:05 GMT, Dean Long wrote: >> Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update test/hotspot/jtreg/compiler/igvn/TestMinMaxIdentity.java >> >> Co-authored-by: Emanuel Peter >> - Fix style > > Marked as reviewed by dlong (Reviewer).
Thanks @dean-long @eme64 @benoitmaillard @TobiHartmann @rwestrel for all your feedback! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28895#issuecomment-3753267033 From mhaessig at openjdk.org Thu Jan 15 07:52:10 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 15 Jan 2026 07:52:10 GMT Subject: RFR: 8354853: Clean up x86 registers after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 09:37:09 GMT, Manuel Hässig wrote: > This PR cleans up some 32-bit remnants in the x86 register code. This also presented the opportunity to convert the unscoped enums into typed constants. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 on linux-x64-debug, linux-x64, windows-x64-debug, windows-x64, macosx-x64-debug, macosx-x64 Thank you all for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29224#issuecomment-3753291287 From mhaessig at openjdk.org Thu Jan 15 07:54:21 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 15 Jan 2026 07:54:21 GMT Subject: Integrated: 8354853: Clean up x86 registers after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 09:37:09 GMT, Manuel Hässig wrote: > This PR cleans up some 32-bit remnants in the x86 register code. This also presented the opportunity to convert the unscoped enums into typed constants. > > Testing: > - [x] Github Actions > - [x] tier1,tier2 on linux-x64-debug, linux-x64, windows-x64-debug, windows-x64, macosx-x64-debug, macosx-x64 This pull request has now been integrated.
Changeset: f6d26c6b Author: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/f6d26c6b32a3ea394cc9b7f6046cd9d7d635c568 Stats: 62 lines in 5 files changed: 0 ins; 39 del; 23 mod 8354853: Clean up x86 registers after 32-bit x86 removal Reviewed-by: aph, shade, mchevalier ------------- PR: https://git.openjdk.org/jdk/pull/29224 From bmaillard at openjdk.org Thu Jan 15 08:32:53 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 15 Jan 2026 08:32:53 GMT Subject: RFR: 8373999: C2: apply KnownBits and unsigned bounds to Add / Sub operations [v4] In-Reply-To: <0U9hUIU5fRcLc0lgHqBP9Or7pv6bepUsza7LWdKtRTI=.1ae45b96-caf0-4fee-ab10-53078bf5589f@github.com> References: <0U9hUIU5fRcLc0lgHqBP9Or7pv6bepUsza7LWdKtRTI=.1ae45b96-caf0-4fee-ab10-53078bf5589f@github.com> Message-ID: On Wed, 14 Jan 2026 13:40:22 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/rangeinference.hpp line 437: >> >>> 435: // non-negative, the signed addition does not overflow, we can compute it directly. >>> 436: lo = S(st1._ulo + st2._ulo); >>> 437: hi = S(st1._uhi + st2._uhi); >> >> Why not use the signed bounds directly since they are equal anyway? I find it a bit easier to read, and we can avoid the cast. >> Suggestion: >> >> lo = st1._lo + st2._lo; >> hi = st1._hi + st2._hi; > > It is because `st1._lo` can be a 3-bit signed `int`. And I don't want to implement arithmetic for these signed classes since normally signed arithmetic is UB in the presence of overflow. And it seems not a good idea to either introduce UB for `int3_t` addition, or have inconsistent behaviour between things we test with and things that are the real guys. Oh right, I didn't think of that, sorry. Thanks for explaining. >> src/hotspot/share/opto/rangeinference.hpp line 453: >> >>> 451: // sum[i] = bit & 1; >>> 452: // carry[i - 1] = (bit >= 2); >>> 453: // } >> >> Is there a specific reason why the notation here goes from `n-1` to `0` and not the reverse?
I find it more intuitive to have index `0` for the least significant bit, but maybe there is some convention I am not aware of. It does not matter too much in any case, so feel free to do whatever. > > Since we are viewing the binary number as a bit string, I tend to think that it is more intuitive to imagine a value 0b1011 as a bit string "1011", which means the first index is the msb and the last index is the lsb. Normally, when I do Maths I use 0 as the lsb, though, since numbers are unbounded. But for this presentation, I think doing it this way is easier for the others to conceptualize. Right, why not. >> src/hotspot/share/opto/rangeinference.hpp line 480: >> >>> 478: // >>> 479: // If we gather the min_bits into a value tmp, it is clear that >>> 480: // tmp = st1._bits._ones + st2._bits._ones: >> >> It feels like we don't need to "initialize" `tmp`, but maybe I am missing something > > This is not an initialization, though. This section just describes that a `tmp` constructed using the loop below is the same as the one constructed by adding these 2 values. Oh I see, makes sense now. I think I was just confused about whether this was pseudo c++ or a mathematical expression. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2693426631 PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2693424670 PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2693422440 From bmaillard at openjdk.org Thu Jan 15 08:38:55 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 15 Jan 2026 08:38:55 GMT Subject: RFR: 8373999: C2: apply KnownBits and unsigned bounds to Add / Sub operations [v4] In-Reply-To: References: Message-ID: <4m46JmkQxxNKZoYMVsKjSJZ2Gl8HWMJgg1G7f8Co5Bk=.48ecd3e9-ceae-488e-92db-cf19de82815d@github.com> On Wed, 14 Jan 2026 12:51:50 GMT, Benoît Maillard wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - copyright year >> - Merge branch 'master' into addsub >> - Merge branch 'master' into addsub >> - include order >> - Improve Add/SubNode::Value with unsigned bounds and known bits > > Great work! I went through all the calculations, and tried to reproduce them independently. It all looks sound to me. I only have a few comments, mostly about notation. > @benoitmaillard Thanks a lot for your reviews! I have addressed your comments. I think this PR should wait for #28952, so it would be great if you or anyone could take a look there. I will try to take a look today if I have time, worst case tomorrow. Thank you for addressing my comments! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28897#issuecomment-3753552647 From bmaillard at openjdk.org Thu Jan 15 08:38:58 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 15 Jan 2026 08:38:58 GMT Subject: RFR: 8373999: C2: apply KnownBits and unsigned bounds to Add / Sub operations [v4] In-Reply-To: <7CmCvn7fnzAMJgIaHHHw3qySJlnMNbyJpoOcAUOTfxU=.4e27e114-e539-49dc-8061-5a9f4be6ec09@github.com> References: <7CmCvn7fnzAMJgIaHHHw3qySJlnMNbyJpoOcAUOTfxU=.4e27e114-e539-49dc-8061-5a9f4be6ec09@github.com> Message-ID: On Wed, 14 Jan 2026 13:36:14 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/rangeinference.hpp line 491: >> >>> 489: // min_bit >= 2 if and only if either: >>> 490: // - st1._bits._ones[i] == st2._bits._ones[i] == 1 >>> 491: // - (st1._bits._ones[i] == 1 || st2._bits._ones[i] == 1) && ((min_bit & 1) == 0) >> >> If I am not mistaken we could also write it this way, and I personally find this a bit more intuitive (and also more consistent with the subtraction case). And for the subsequent computations we could replace `|` by `^`.
>> Suggestion: >> >> // - (st1._bits._ones[i] != st2._bits._ones[i]) && ((min_bit & 1) == 0) > > Yes you are right, that's a good idea. Then we should probably also replace `|` by `^` in the subsequent calculations/expressions, right? I have added suggestions for that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2693441539 From bmaillard at openjdk.org Thu Jan 15 08:39:02 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 15 Jan 2026 08:39:02 GMT Subject: RFR: 8373999: C2: apply KnownBits and unsigned bounds to Add / Sub operations [v5] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 13:36:11 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR improves the implementation of `AddNode/SubNode::Value` by taking advantage of the additional information in `TypeInt`. The implementation has some pretty non-trivial logic. Fortunately, the test infrastructure is already there. >> >> Please take a look and leave your reviews, thanks a lot.
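The `|` to `^` substitution discussed here can be sanity-checked outside HotSpot. The following C++ sketch is a hypothetical standalone model, not code from `rangeinference.hpp`: plain `uint32_t` words `a` and `b` stand in for `st1._bits._ones` and `st2._bits._ones`, and the helper names are made up. Unlike the comments in the PR, which index bits from the most significant end, it uses bit 0 for the least significant bit. It rebuilds the sum bit by bit to show that the gathered sum bits equal `a + b`, and checks that both variants of the carry formula reproduce the carries recovered as `a ^ b ^ (a + b)`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; plain 32-bit words stand in for st1._bits._ones
// and st2._bits._ones. Bit 0 is the least significant bit here.

// Ripple-carry addition done one bit at a time. Returns the gathered sum
// bits ("tmp") and stores in *carries_in the carry entering each bit.
uint32_t ripple_add(uint32_t a, uint32_t b, uint32_t* carries_in) {
  uint32_t sum = 0, carries = 0, c = 0;
  for (int i = 0; i < 32; i++) {
    uint32_t ai = (a >> i) & 1u, bi = (b >> i) & 1u;
    carries |= c << i;                // carry entering bit i
    sum |= (ai ^ bi ^ c) << i;        // sum bit i
    c = (ai & bi) | ((ai ^ bi) & c);  // carry leaving bit i
  }
  *carries_in = carries;
  return sum;
}

// Carry formula with XOR (the suggested version).
uint32_t carry_xor(uint32_t a, uint32_t b) {
  return ((a & b) | ((a ^ b) & ~(a + b))) << 1;
}

// Same formula with OR (the original version). When a[i] == b[i] == 1 the
// (a & b) term already sets the carry, so both variants give the same result.
uint32_t carry_or(uint32_t a, uint32_t b) {
  return ((a & b) | ((a | b) & ~(a + b))) << 1;
}
```

The two variants only differ in the term for `a[i] == b[i] == 1`, where `(a & b)` already produces the carry; that is why replacing `|` by `^` leaves the result unchanged. In the PR the same expression is applied to `_ones` for `min_carry` and to `~_zeros` for `max_carry`.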
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Improve comments src/hotspot/share/opto/rangeinference.hpp line 496: > 494: // min_carry[i - 1] == 1 iff either: > 495: // + (st1._bits._ones[i] & st2._bits._ones[i]) == 1 > 496: // + ((st1._bits._ones[i] | st2._bits._ones[i]) & (~tmp[i])) == 1 Suggestion: // + ((st1._bits._ones[i] ^ st2._bits._ones[i]) & (~tmp[i])) == 1 src/hotspot/share/opto/rangeinference.hpp line 499: > 497: // > 498: // As a result, we can calculate min_carry: > 499: // min_carry = ((st1._bits._ones & st2._bits._ones) | ((st1._bits._ones | st2._bits._ones) & (~(st1._bits._ones + st2._bits._ones)))) << 1 Suggestion: // min_carry = ((st1._bits._ones & st2._bits._ones) | ((st1._bits._ones ^ st2._bits._ones) & (~(st1._bits._ones + st2._bits._ones)))) << 1 src/hotspot/share/opto/rangeinference.hpp line 501: > 499: // min_carry = ((st1._bits._ones & st2._bits._ones) | ((st1._bits._ones | st2._bits._ones) & (~(st1._bits._ones + st2._bits._ones)))) << 1 > 500: U min_carry = ((st1._bits._ones & st2._bits._ones) | > 501: ((st1._bits._ones | st2._bits._ones) & (~(st1._bits._ones + st2._bits._ones)))); Suggestion: ((st1._bits._ones ^ st2._bits._ones) & (~(st1._bits._ones + st2._bits._ones)))); src/hotspot/share/opto/rangeinference.hpp line 505: > 503: // Similarly, we can calculate max_carry from ~st1._bits._zeros and ~st2._bits._zeros > 504: U max_carry = ((~st1._bits._zeros & ~st2._bits._zeros) | > 505: ((~st1._bits._zeros | ~st2._bits._zeros) & (~(~st1._bits._zeros + ~st2._bits._zeros)))); Suggestion: ((~st1._bits._zeros ^ ~st2._bits._zeros) & (~(~st1._bits._zeros + ~st2._bits._zeros)))); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2693431904 PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2693433557 PR Review Comment: https://git.openjdk.org/jdk/pull/28897#discussion_r2693434441 PR Review Comment: 
https://git.openjdk.org/jdk/pull/28897#discussion_r2693435496 From qamai at openjdk.org Thu Jan 15 08:45:39 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 15 Jan 2026 08:45:39 GMT Subject: RFR: 8374180: C2 crash in PhaseCCP::verify_type - fatal error: Not monotonic [v5] In-Reply-To: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> References: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> Message-ID: > Hi, > > The issue here is the inconsistency in computing the `_widen` field of the `TypeInt`. At the first step, the types of the operands are: > > t1 = int:0 > t2 = int:-2..3, widen = 3 > > Since the type of the first operand is a constant zero, `AddNode::Value` returns the type of the second operand directly, as `x ^ 0 == x for all x`. In the second step, `t1` is widened to `0..2`. This triggers the real computation of the result. The algorithm then splits `t2` into `t21 = int:-2..-1` and `t22 = int:0..3`. The `Xor` of these with `t1` are `r1 = int:-4..-1` and `r2 = int:0..3`. As both have `_hi - _lo <= SMALL_TYPEINT_THRESHOLD == 3`, their `_widen`s are normalized to `0`. As a result, their `meet` also has `_widen == 0`. This value is smaller than that from the previous step, which was `3`, which leads to the failure. > > The root cause here is that, the `_widen` value of a node should be computed and normalized on the whole range of the node, not on its subranges, which may normalize it to `0` in more cases than what is expected. As a result, my proposed solution is to ignore the `_widen` value of the subranges, and pass the expected `_widen` value when composing the final result. > > Please take a look and leave your reviews, thanks a lot. 
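The ordering problem described above can be reproduced with a deliberately simplified model. In this C++ sketch, `Range`, `make_range`, `meet`, and `kSmallThreshold` are hypothetical stand-ins, not HotSpot's `TypeInt` API. The model only assumes what the description states: a range with `hi - lo <= 3` has its widen normalized to 0, and a meet keeps the larger widen of its inputs. Composing the result from normalized subranges loses the widen of 3, while building the whole range with the expected widen preserves it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Simplified, hypothetical model of an int range with a widen counter.
// It is NOT HotSpot's TypeInt; it only reproduces the ordering problem.
constexpr int64_t kSmallThreshold = 3;  // stands in for SMALL_TYPEINT_THRESHOLD

struct Range {
  int64_t lo;
  int64_t hi;
  int widen;
};

// Normalization: a narrow range gets its widen reset to 0.
Range make_range(int64_t lo, int64_t hi, int widen) {
  if (hi - lo <= kSmallThreshold) widen = 0;
  return Range{lo, hi, widen};
}

// Meet: the smallest range covering both inputs, keeping the larger widen.
Range meet(const Range& x, const Range& y) {
  return make_range(std::min(x.lo, y.lo), std::max(x.hi, y.hi),
                    std::max(x.widen, y.widen));
}
```

In this model, the fix corresponds to ignoring the subranges' widen values and supplying the expected widen only when the final range is built (here simply the widen of the input `t2`); which value the real patch passes is decided in the PR, not by this sketch.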
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: grammar ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28952/files - new: https://git.openjdk.org/jdk/pull/28952/files/ff7fd535..86ced264 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28952&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28952&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28952.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28952/head:pull/28952 PR: https://git.openjdk.org/jdk/pull/28952 From jbhateja at openjdk.org Thu Jan 15 08:56:21 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 15 Jan 2026 08:56:21 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v12] In-Reply-To: References: Message-ID: > Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support. > - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. > > The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). > > The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > (performance chart not included in the archive) > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback.
> > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Adding testpoint for JDK-8373574 - Review comments resolutions - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Fix incorrect argument passed to smokeTest - Fix from Bhavana Kilambi for failing JTREG regressions on AARCH64 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Including test changes from Bhavana Kilambi (ARM) - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Optimizing tail handling - ... and 18 more: https://git.openjdk.org/jdk/compare/499b5882...273b219e ------------- Changes: https://git.openjdk.org/jdk/pull/28002/files Webrev: Webrev is not available because diff is too large Stats: 519849 lines in 228 files changed: 285040 ins; 233032 del; 1777 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From dholmes at openjdk.org Thu Jan 15 09:20:00 2026 From: dholmes at openjdk.org (David Holmes) Date: Thu, 15 Jan 2026 09:20:00 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v6] In-Reply-To: References: Message-ID: On Thu, 15 Jan 2026 07:21:41 GMT, Guanqiang Han wrote: >> src/hotspot/share/utilities/ostream.hpp line 162: >> >>> 160: virtual ~outputStream() {} // close properly on deletion >>> 161: // Return true if this stream buffers/accumulates output in memory (e.g., stringStream) >>> 162: virtual bool is_buffered() const { return false; } >> >> Really 
you are using this as a proxy for "do I have a stringStream?" and it doesn't quite work because we also have a `bufferedStream` class that you have not marked as buffered - though I'm not sure the "buffering" in the two cases is actually the same thing. >> >> @dean-long you made this suggestion so how do you see `is_buffered` fitting into the whole stream hierarchy? > > I agree with the concern here. The buffering we need is local to this call site to keep the output coherent (collect everything and print once). > Whether we need to buffer/accumulate output for coherence is scenario-dependent, rather than a property that should permanently classify a stream type as "buffered" vs. "unbuffered". > @dean-long what's your view on this? No - sorry I forgot that you have to add override to all methods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29186#discussion_r2693585193 From ghan at openjdk.org Thu Jan 15 09:55:13 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Thu, 15 Jan 2026 09:55:13 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v2] In-Reply-To: References: Message-ID: > Please review this change. Thanks! > > Description: > > This change fixes a crash in Deoptimization::uncommon_trap_inner when running with -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp. > > With -XX:-ProfileTraps, create_if_missing is set to false. > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2121-L2122 > > When create_if_missing is false, a new MDO cannot be created when m()->method_data() returns null, so get_method_data() may return null.
> https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L1911-L1912 > > and trap_mdo can be null as a result > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2134-L2136 > > The crash happens here because trap_mdo is null > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2157 > > Fix: > > The fix makes acquisition of extra_data_lock conditional on trap_mdo being non-null, preserving the required lock ordering (extra_data_lock before ttyLocker) when an MDO exists. > > Test: > > GHA Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - split long line - Merge remote-tracking branch 'upstream/master' into 8374807 - fix 8374807 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29147/files - new: https://git.openjdk.org/jdk/pull/29147/files/e22b4098..9445014e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29147&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29147&range=00-01 Stats: 32340 lines in 473 files changed: 20787 ins; 6752 del; 4801 mod Patch: https://git.openjdk.org/jdk/pull/29147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29147/head:pull/29147 PR: https://git.openjdk.org/jdk/pull/29147 From aph at openjdk.org Thu Jan 15 10:59:34 2026 From: aph at openjdk.org (Andrew Haley) Date: Thu, 15 Jan 2026 10:59:34 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression uses incorrect assertions [v2] In-Reply-To: References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: On Wed, 14 Jan 2026 13:06:15 GMT, Ferenc Rakoczi wrote: >> The preconditions 
for the aarch64 and the AVX-512 intrinsic implementations of the implKyber12To16() method of com.sun.crypto.provider.ML_KEM are different and the AVX-512 one has stricter preconditions on the input, which was not recorded in the assert() before calling the function (although they were satisfied by all calling code). Now the assert() is corrected, and with these preconditions, the aarch64 implementation is simplified. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Fix off-by-one error discovered by Shawn src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 6084: > 6082: // byte[] condensed, int index, short[] parsed, int parsedLength) {} > 6083: // > 6084: // it is assumed that parsed and condensed are allocated such that for By whom? :-) Suggestion: // we assume that parsed and condensed are allocated such that for src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 6280: > 6278: vs_st2_post(vs_front(vs_front(vb)), __ T8H, parsed); > 6279: > 6280: __ BIND(L_end); This is a substantial change, not a mere matter of "incorrect assertions". Perhaps this PR needs a more appropriate title. 
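For reference, the conversion that `implKyber12To16` performs can be written as a scalar loop. This C++ sketch assumes the usual FIPS 203 packing of two 12-bit values into three bytes, least-significant bits first. The function name and parameter types are hypothetical, modeled on the Java signature quoted in the stub comment, and the sketch does not reproduce the allocation preconditions the intrinsics rely on; it simply assumes `parsed_length` is even and both buffers are large enough.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference for 12-bit decompression, modeled on
//   implKyber12To16(byte[] condensed, int index, short[] parsed, int parsedLength)
// Every 3 condensed bytes hold two 12-bit values, low bits first.
// Assumes parsed_length is even and condensed has at least
// index + 3 * parsed_length / 2 bytes.
void kyber12_to_16(const uint8_t* condensed, int index, int16_t* parsed, int parsed_length) {
  for (int i = 0; i < parsed_length / 2; i++) {
    uint32_t b0 = condensed[index + 3 * i];
    uint32_t b1 = condensed[index + 3 * i + 1];
    uint32_t b2 = condensed[index + 3 * i + 2];
    // First value: all 8 bits of b0 plus the low nibble of b1.
    parsed[2 * i] = static_cast<int16_t>(b0 | ((b1 & 0x0Fu) << 8));
    // Second value: the high nibble of b1 plus all 8 bits of b2.
    parsed[2 * i + 1] = static_cast<int16_t>((b1 >> 4) | (b2 << 4));
  }
}
```

A scalar version like this can serve as an oracle when comparing the aarch64 and AVX-512 stubs, since it makes no alignment or over-read assumptions of its own.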
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29141#discussion_r2693915262 PR Review Comment: https://git.openjdk.org/jdk/pull/29141#discussion_r2693920223 From ghan at openjdk.org Thu Jan 15 11:03:47 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Thu, 15 Jan 2026 11:03:47 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v2] In-Reply-To: <8c8QLPcy701RGF_uMOjert2_G9bYg6c6EDrWhe3rFcA=.a1add9ea-7189-4002-b913-a343f4752841@github.com> References: <8c8QLPcy701RGF_uMOjert2_G9bYg6c6EDrWhe3rFcA=.a1add9ea-7189-4002-b913-a343f4752841@github.com> Message-ID: <2WaIFRMuY26N0hgdMe4TdZho_WIyDt8_yooXIt1FWQk=.91f24456-35fd-4732-b39b-5540b1493227@github.com> On Thu, 15 Jan 2026 05:48:59 GMT, SendaoYan wrote: >> Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - split long line >> - Merge remote-tracking branch 'upstream/master' into 8374807 >> - fix 8374807 > > test/hotspot/jtreg/compiler/uncommontrap/TestPrintDiagnosticsWithoutProfileTraps.java line 29: > >> 27: * @summary Regression test for -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp crash >> 28: * @requires vm.debug >> 29: * @run main/othervm -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp compiler.uncommontrap.TestPrintDiagnosticsWithoutProfileTraps > > Should we split this line into two or three lines? Fixed, thanks!
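The shape of the 8374807 fix (take the MDO lock only when an MDO exists, while keeping it ordered before the tty lock) can be illustrated with standard C++ locks. `MethodDataModel`, `tty_lock_model`, and `trace_uncommon_trap` are hypothetical stand-ins, not HotSpot types; the point is only the conditional acquisition through an initially empty `std::unique_lock`.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins; these are not HotSpot's MethodData or ttyLocker.
struct MethodDataModel {
  std::mutex extra_data_lock;
};

std::mutex tty_lock_model;

// Takes the MDO lock only if an MDO exists, and always before the tty lock,
// so the order "extra_data_lock, then tty" is preserved.
// Returns whether the MDO lock was held while "printing".
bool trace_uncommon_trap(MethodDataModel* trap_mdo) {
  std::unique_lock<std::mutex> mdo_guard;  // empty: owns no mutex yet
  if (trap_mdo != nullptr) {
    mdo_guard = std::unique_lock<std::mutex>(trap_mdo->extra_data_lock);
  }
  std::lock_guard<std::mutex> tty_guard(tty_lock_model);
  // ... print diagnostics here; only touch MDO extra data if trap_mdo != nullptr ...
  return mdo_guard.owns_lock();
}
```

Because the MDO guard is declared before the tty guard, the locks are released in reverse order on return, and when `trap_mdo` is null nothing dereferences it.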
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29147#discussion_r2693937392 From thartmann at openjdk.org Thu Jan 15 12:33:37 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 15 Jan 2026 12:33:37 GMT Subject: RFR: 8360271: String.indexOf intrinsics fail with +EnableX86ECoreOpts and -CompactStrings In-Reply-To: References: Message-ID: <3GHpdG1lUc6errJsYXUAfKMXw_6xeOi0iLT79IoGlE8=.976148c9-bf0e-4c02-b3ed-adc8b4eaeea3@github.com> On Thu, 15 Jan 2026 04:24:24 GMT, Volodymyr Paprotski wrote: > Failure always for UU case, needle=2, len=17 > - (Note: `len=len-offset` in `library_call.cpp`, i.e., the stub does not see the same len as the test case) > > Following down the code layout: > > if needle==0 > return 0 > if needle>len > return -1 > if len<=16|32 && needle<=3|6 > optimized_short_cases > if len>16|32 > // big switch > switch(needle) { > default >10 > cases 2..10 // BUG IS HERE: len 17|34, needle 2|4, case=4 > } > else > // small switch > switch(needle) { > cases 7..10 > // others under optimized_short_cases > } > > Furthermore, the big switch case itself has two cases: > > if len-needle>31 > // works > // loop > else // len-needle<=31 > // BUG HERE > > The else case corrects mask misalignment; the 'correction shift' is off-by-1 for the UTF16 case. > > ----- > Why not found before? > - testcase issue, needle was UTF8 for UTF16 case > > Why only needle==2? > - Possibly because the mask for words has two bits, so tolerated off-by-one Looks good to me but @jatin-bhateja should have a look as well. ------------- Marked as reviewed by thartmann (Reviewer).
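Since the bug only showed up for specific shapes (UTF-16 data, needle length 2, haystack length 17), a brute-force reference search is a useful oracle for a test. The C++ sketch below is hypothetical and models only the expected result over UTF-16 code units, not the intrinsic. `make_utf16_case` fills the haystack and needle with characters above `0xFF`, on the assumption that a test needs this so the needle genuinely contains UTF-16 data rather than Latin-1-representable characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical brute-force reference for indexOf over UTF-16 code units.
// It models the expected answer only; it is not the String.indexOf intrinsic.
int index_of_ref(const std::u16string& haystack, const std::u16string& needle, int from) {
  const int len = static_cast<int>(haystack.size());
  const int n = static_cast<int>(needle.size());
  if (from < 0) from = 0;
  if (from > len) from = len;
  if (n == 0) return from;        // empty needle matches at the start position
  if (n > len - from) return -1;  // needle longer than the remaining haystack
  for (int i = from; i + n <= len; i++) {
    if (haystack.compare(static_cast<std::size_t>(i), static_cast<std::size_t>(n), needle) == 0) {
      return i;
    }
  }
  return -1;
}

// Builds a haystack of `len` code units filled with U+0101 (outside Latin-1)
// and places `needle` at position `pos`. Assumes pos + needle.size() <= len.
std::u16string make_utf16_case(std::size_t len, const std::u16string& needle, std::size_t pos) {
  std::u16string s(len, u'\u0101');
  s.replace(pos, needle.size(), needle);
  return s;
}
```

A Java test could compare `String.indexOf` against a reference like this over a range of haystack and needle lengths, including the len=17, needle=2 case reported here.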
PR Review: https://git.openjdk.org/jdk/pull/29242#pullrequestreview-3665370816 From krk at openjdk.org Thu Jan 15 13:38:08 2026 From: krk at openjdk.org (Kerem Kat) Date: Thu, 15 Jan 2026 13:38:08 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v2] In-Reply-To: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> Message-ID: > The check `vect->is_Vector() || vect->is_LoadVector()` doesn't handle `Proj` nodes that resolve to vector types, causing an assertion failure when such nodes flow through a `Phi` into `VectorBox`. Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: Simplify expand_vbox_node_helper by merging VectorBox Phi handling ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29200/files - new: https://git.openjdk.org/jdk/pull/29200/files/83df3ccf..45b02913 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29200&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29200&range=00-01 Stats: 30 lines in 1 file changed: 0 ins; 22 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/29200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29200/head:pull/29200 PR: https://git.openjdk.org/jdk/pull/29200 From krk at openjdk.org Thu Jan 15 13:38:11 2026 From: krk at openjdk.org (Kerem Kat) Date: Thu, 15 Jan 2026 13:38:11 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v2] In-Reply-To: References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> Message-ID: <8C3YalFoAtVSAToJEIWf0IT6-_qJQ8RmARChrYaNa5U=.f04f0ee2-6bed-4ed7-8c79-c9b87a31a072@github.com> On Wed, 14 Jan 2026 17:03:01 GMT, Vladimir Ivanov wrote: >> Kerem Kat has updated the pull request incrementally with one additional commit 
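since the last revision: >> >> Simplify expand_vbox_node_helper by merging VectorBox Phi handling ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29200/files - new: https://git.openjdk.org/jdk/pull/29200/files/83df3ccf..45b02913 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29200&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29200&range=00-01 Stats: 30 lines in 1 file changed: 0 ins; 22 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/29200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29200/head:pull/29200 PR: https://git.openjdk.org/jdk/pull/29200 From krk at openjdk.org Thu Jan 15 13:38:11 2026 From: krk at openjdk.org (Kerem Kat) Date: Thu, 15 Jan 2026 13:38:11 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v2] In-Reply-To: References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> Message-ID: <8C3YalFoAtVSAToJEIWf0IT6-_qJQ8RmARChrYaNa5U=.f04f0ee2-6bed-4ed7-8c79-c9b87a31a072@github.com> On Wed, 14 Jan 2026 17:03:01 GMT, Vladimir Ivanov wrote: >> Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplify expand_vbox_node_helper by merging VectorBox Phi handling > > src/hotspot/share/opto/vector.cpp line 357: > >> 355: // any of the inputs to VectorBoxNode are value-numbered they can only >> 356: // move up and are guaranteed to dominate. >> 357: if (vbox->is_Phi() && vect->bottom_type()->isa_vect()) { > > Does `vect->bottom_type()->isa_vect()` check become redundant? In other words, is it possible to observe a non-vector value here? It seems like the important bit is whether `vect` is a `Phi` or not. > > Another observation: `vbox->is_Phi() && vect->is_Phi()` and `vbox->is_Phi() && !vect->is_Phi()` cases can be commoned. It doesn't seem possible; I removed the redundant check and merged the if blocks to simplify. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29200#discussion_r2694408683 From rcastanedalo at openjdk.org Thu Jan 15 14:05:32 2026 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 15 Jan 2026 14:05:32 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v17] In-Reply-To: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> References: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> Message-ID: On Mon, 12 Jan 2026 15:05:29 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch is an alternative to #28764 but it does the analysis during IGVN instead. >> >> ## The current PR: >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return.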
>> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking in `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. >> >> I do not see a noticeable difference in C2 runtime with and without this patch. >> >> ## Future work: >> >> 1. Fold a memory `Phi`. >> >> This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. >> >> 2. Fold a pointer `Phi`. >> >> Currently, this PR takes the trivial approach and just gives up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: >> >> Point p1 = new Point; >> Point p2 = new Point; >> p1.x = v1; >> p2.x = v2; >> Point p = Phi(p1, p2); >> int a = p.x; >> >> Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. >> >> Another interesting case: >> >> Point p = Phi(p1, p2); >> p.x = v; >> p1.x = v1; >> int a = p.x; >> >> Then, theoretically, we can fold `a` to `Phi(v1, v)` if `p1` and `p2` are known not to alias. >> >> 3. Nested objects >> >> It can be observed that if an object is stored into memory that has not escaped, then the object can be considered not to have escaped. For example: >> >> Point p = new Point; >> PointHolder h = new PointHolder; >> h.p = p; >> int x = p.x; >> escape(h); >> >> Then, `p` can be considered not to have escaped until `escape(h)`.
To do this, the computation of `_aliases` in the constructor of `LocalEA` needs to be more comprehensive. See the comments in `LocalEA::check_escape_status`. >> >> Please... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: > > - Merge branch 'master' into loadfoldingigvn > - Early return when not a heap access > - Fix escape at store > - Fix outdated and unclear comments > - copyright year, return, comments, whitespace > - Merge branch 'master' into loadfoldingigvn > - ea of phis and nested objects > - Add test scenarios > - Add a flag to turn off the feature > - Much more comments, refactor the data into a separate class > - ... and 9 more: https://git.openjdk.org/jdk/compare/1d910232...c275e6e6 I evaluated the impact of this patch on C2 execution time across all [DaCapo 23.11-chopin](https://www.dacapobench.org/) benchmarks and multiple platforms, and it looks neutral. I measured C2's execution time using the HotSpot options `-Xbatch -XX:-TieredCompilation -XX:+CITime`. The impact of this patch on overall application throughput seems neutral across DaCapo (different versions), SPECjvm2008, and SPECjbb2015, and slightly positive on Renaissance (which is known to be more sensitive to EA improvements). 
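The core idea of the PR, walking backwards from a load past calls as long as the allocation has not yet escaped, can be illustrated on a straight-line trace. This C++ sketch is a hypothetical and heavily simplified model. It tracks one allocation and one field, represents the program as a list of events, and treats escape as an explicit event, whereas C2 walks the memory graph and decides escape status by checking whether the escaping nodes are transitive control inputs of the call or barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, heavily simplified model: a straight-line trace of events for
// ONE allocation and ONE field, ending just before the load we want to fold.
enum class Kind {
  Store,  // store to the tracked field of the tracked object
  Call,   // call or memory barrier that could touch escaped objects
  Escape  // the object becomes reachable from outside (e.g. published)
};

struct Event {
  Kind kind;
  int value;  // stored value; only meaningful for Kind::Store
};

// Returns the value the load observes if it can be proven, std::nullopt otherwise.
std::optional<int> fold_load(const std::vector<Event>& trace) {
  // Index of the first escape (trace.size() if the object never escapes).
  std::size_t escape_at = trace.size();
  for (std::size_t i = 0; i < trace.size(); i++) {
    if (trace[i].kind == Kind::Escape) {
      escape_at = i;
      break;
    }
  }
  // Walk backwards from the load, like a search for the previous store.
  for (std::size_t i = trace.size(); i-- > 0;) {
    const Event& e = trace[i];
    if (e.kind == Kind::Store) return e.value;
    if (e.kind == Kind::Call && escape_at < i) {
      return std::nullopt;  // already escaped: the call may have written the field
    }
    // A call before any escape cannot see the object, so keep walking past it.
  }
  return std::nullopt;  // no store found in the trace
}
```

The first trace in the test corresponds to a store followed by an unrelated call, so the load folds to the stored value; the second publishes the object before the call, so the call may have written the field and the walk stops there.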
------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3755015601 PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3755044970 From qamai at openjdk.org Thu Jan 15 14:10:10 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 15 Jan 2026 14:10:10 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v17] In-Reply-To: References: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> Message-ID: <8R9aEEw4OsMWlSkTKDGh4xrcE_hqh5NkirndE9PjZUI=.20edd9cd-288d-4942-9f9c-ccfa025789b0@github.com> On Wed, 14 Jan 2026 19:44:17 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: >> >> - Merge branch 'master' into loadfoldingigvn >> - Early return when not a heap access >> - Fix escape at store >> - Fix outdated and unclear comments >> - copyright year, return, comments, whitespace >> - Merge branch 'master' into loadfoldingigvn >> - ea of phis and nested objects >> - Add test scenarios >> - Add a flag to turn off the feature >> - Much more comments, refactor the data into a separate class >> - ... and 9 more: https://git.openjdk.org/jdk/compare/1e50eecd...c275e6e6 > > Please, file an RFE to address such scenarios then. @iwanowww Filed [JDK-8375442](https://bugs.openjdk.org/browse/JDK-8375442). @robcasloz Thanks a lot for the extremely comprehensive measurements! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3755070578 From adinn at openjdk.org Thu Jan 15 14:34:54 2026 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 15 Jan 2026 14:34:54 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression uses incorrect assertions [v2] In-Reply-To: References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: On Thu, 15 Jan 2026 10:55:50 GMT, Andrew Haley wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix off-by-one error discovered by Shawn > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 6280: > >> 6278: vs_st2_post(vs_front(vs_front(vb)), __ T8H, parsed); >> 6279: >> 6280: __ BIND(L_end); > > This is a substantial change, not a mere matter of "incorrect assertions". Perhaps this PR needs a more appropriate title. Also, the description in the JIRA and the opening comment in this PR should mention that the intrinsic can be simplified in response to the stricter preconditions maintained by the Java client. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29141#discussion_r2694623478 From rsunderbabu at openjdk.org Thu Jan 15 14:41:01 2026 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Thu, 15 Jan 2026 14:41:01 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v5] In-Reply-To: References: Message-ID: > Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. > MD5 > SHA1 > SHA256 > SHA3 > > Testing: > All flag combinations from CI > hotspot tiers 1 to 5 > PS: only for tier testings, mac-aarch was skipped due to resource constraints Ramkumar Sunderbabu has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: - redo sha3 changes - reverting SHA3 changes - Fix TestUseSHA3IntrinsicsOptionOnSupportedCPU - remove requires condition - initial commit ------------- Changes: https://git.openjdk.org/jdk/pull/28634/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28634&range=04 Stats: 10 lines in 4 files changed: 0 ins; 5 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28634/head:pull/28634 PR: https://git.openjdk.org/jdk/pull/28634 From jbhateja at openjdk.org Thu Jan 15 16:07:00 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 15 Jan 2026 16:07:00 GMT Subject: RFR: 8360271: String.indexOf intrinsics fail with +EnableX86ECoreOpts and -CompactStrings In-Reply-To: References: Message-ID: On Thu, 15 Jan 2026 04:24:24 GMT, Volodymyr Paprotski wrote: > Failure always for UU case, needle=2, len=17 > - (Note: `len=len-offset` in `library_call.cpp`, i.e., the stub does not see the same len as the test case) > > Following down the code layout: > > if needle==0 > return 0 > if needle>len > return -1 > if len<=16|32 && needle<=3|6 > optimized_short_cases > if len>16|32 > // big switch > switch(needle) { > default >10 > cases 2..10 // BUG IS HERE: len 17|34, needle 2|4, case=4 > } > else > // small switch > switch(needle) { > cases 7..10 > // others under optimized_short_cases > } > > Furthermore, the big switch case itself has two cases: > > if len-needle>31 > // works > // loop > else // len-needle<=31 > // BUG HERE > > The else case corrects mask misalignment; the 'correction shift' is off-by-1 for the UTF16 case. > > ----- > Why not found before? > - testcase issue, needle was UTF8 for UTF16 case > > Why only needle==2? > - Possibly because the mask for words has two bits, so tolerated off-by-one Marked as reviewed by jbhateja (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/29242#pullrequestreview-3666397738 From epeter at openjdk.org Thu Jan 15 16:21:38 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 Jan 2026 16:21:38 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v4] In-Reply-To: References: Message-ID: <-nGdkKB4aDkSwWlygwqkQ4tJ5s0DUwCfhyugVZtEDZk=.c4db9089-0dfa-4dca-a90e-d31639dd617a@github.com> > This is exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. > > Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) > > **Discussion** > > Observations: > - If the loop can be auto-vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. > - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. > - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow > - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow > - `linux_x64_oci_server`: Vector API leads to really nice speedups > - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks > - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. > - Compact Object Headers has some negative effect on some loop benchmarks. > - `linux_aarch64_server`: `reduceAddI`, `copyI` > - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` > - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`?
> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit > - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` > - Intrinsics can be much faster than auto-vectorized or Vector API code. > - `linux_aarch64_server`: `copyI` > - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. > - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact; I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). > > **Benchmark Plots** > > Units: nanoseconds per algorithm invocation. > > (Plots for `linux_x64_oci`, `windows_x64_oci`, and `macosx_x64_sandybridge` are not included in the archive.) > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > mask compress for Paul Since nobody has reviewed yet, I'm sneaking in an additional test/benchmark:

Benchmark                                                    (NUM_X_OBJECTS)  (SEED)  (SIZE)  Mode  Cnt       Score     Error  Units
VectorAlgorithms.dotProductF_VectorAPI_naive                           10000       0  640000  avgt   50   93668.922 ± 704.044  ns/op
VectorAlgorithms.dotProductF_VectorAPI_reduction_after_loop            10000       0  640000  avgt   50   83854.691 ± 467.207  ns/op
VectorAlgorithms.dotProductF_loop                                      10000       0  640000  avgt   50  550594.312 ± 740.470  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/28639#issuecomment-3755657476 From rsunderbabu at openjdk.org Thu Jan 15 16:23:13 2026 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Thu, 15 Jan 2026 16:23:13 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v5] In-Reply-To: References: Message-ID: <2_jOnrEh3DIVL4IF_dbOmvpDWkDgtVnHItTYk7RkX58=.7c7e83fd-0cc8-4701-bb5e-0e445be709a5@github.com> On Thu, 15 Jan 2026 14:41:01 GMT, Ramkumar Sunderbabu wrote: >> Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. >> MD5 >> SHA1 >> SHA256 >> SHA3 >> >> Testing: >> All flag combinations from CI >> hotspot tiers 1 to 5 >> PS: only for tier testings, mac-aarch was skipped due to resource constraints > > Ramkumar Sunderbabu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - redo sha3 changes > - reverting SHA3 changes > - Fix TestUseSHA3IntrinsicsOptionOnSupportedCPU > - remove requires condition > - initial commit During testing, TestUseSHA3IntrinsicsOptionOnSupportedCPU failed due to incorrect flag behaviour. I have raised JDK-8375443. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28634#issuecomment-3755669980 From vlivanov at openjdk.org Thu Jan 15 18:11:17 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 15 Jan 2026 18:11:17 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v4] In-Reply-To: <-nGdkKB4aDkSwWlygwqkQ4tJ5s0DUwCfhyugVZtEDZk=.c4db9089-0dfa-4dca-a90e-d31639dd617a@github.com> References: <-nGdkKB4aDkSwWlygwqkQ4tJ5s0DUwCfhyugVZtEDZk=.c4db9089-0dfa-4dca-a90e-d31639dd617a@github.com> Message-ID: On Thu, 15 Jan 2026 16:21:38 GMT, Emanuel Peter wrote: >> This is exploratory work.
I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. >> >> Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) >> >> **Discussion** >> >> Observations: >> - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. >> - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. >> - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow >> - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow >> - `linux_x64_oci_server`: Vector API leads to really nice speedups >> - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks >> - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. >> - Compact Object Headers has some negative effect on some loop benchmarks. >> - `linux_aarch64_server`: `reduceAddI`, `copyI` >> - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` >> - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? >> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit >> - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` >> - Intrinsics can be much faster than auto vectorized or Vector API code. >> - `linux_aarch64_server`: `copyI` >> - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. >> - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact; I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240).
>> >> **Benchmark Plots** >> >> Units: nanoseconds per algorithm invocation. >> >> `linux_x64_oci`: [plot: algo_linux_x64_oci_server] >> >> `windows_x64_oci`: [plot: algo_windows_x64_oci_server] >> >> `macosx_x64_sandybridge`: [plot not preserved] > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add dotProductF Very nice set of microbenchmarks, Emanuel! test/hotspot/jtreg/compiler/vectorization/TestVectorAlgorithms.java line 97: > 95: framework.addFlags("--add-modules=jdk.incubator.vector", "-XX:CompileCommand=inline,*VectorAlgorithmsImpl::*"); > 96: switch (args[0]) { > 97: case "vanilla" -> { /* no extra flags */ } It would be more flexible to allow arbitrary VM flags to be appended. test/hotspot/jtreg/compiler/vectorization/TestVectorAlgorithms.java line 218: > 216: > 217: // X4 oop setup. > 218: oopsX4 = new int[size]; Any particular reason to keep input data initialization duplicated between test and benchmark modes? test/micro/org/openjdk/bench/vm/compiler/VectorAlgorithms.java line 95: > 93: } > 94: > 95: @Setup(Level.Iteration) Resetting after each iteration may introduce too much noise. Also, it makes it harder to reproduce input-dependent variance. Maybe resetting inputs between forks is a good compromise.
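The `dotProductF` numbers earlier in this thread show `dotProductF_VectorAPI_reduction_after_loop` ahead of `dotProductF_VectorAPI_naive`. The benchmark sources are not quoted here, so the following plain-Java sketch only assumes what the names suggest: the "naive" variant reduces across lanes on every iteration, while the other keeps per-lane partial sums and reduces once after the loop. An array of `LANES` (a hypothetical width of 8) stands in for a `FloatVector`; no `jdk.incubator.vector` API is used, and `DotProductSketch` is a hypothetical name.

```java
import java.util.Random;

// Scalar simulation of two dot-product reduction strategies.
// Hypothetical sketch, not the VectorAlgorithms benchmark code.
final class DotProductSketch {
    static final int LANES = 8; // stands in for the vector length

    // "Naive": reduce the lane products of each chunk to a scalar inside the loop.
    static float naive(float[] a, float[] b) {
        float sum = 0f;
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            float chunk = 0f;
            for (int l = 0; l < LANES; l++) {
                chunk += a[i + l] * b[i + l]; // horizontal reduction every iteration
            }
            sum += chunk;                     // serial dependency on sum
        }
        for (; i < a.length; i++) {           // scalar tail
            sum += a[i] * b[i];
        }
        return sum;
    }

    // "Reduction after loop": keep LANES independent partial sums, reduce once.
    static float reductionAfterLoop(float[] a, float[] b) {
        float[] acc = new float[LANES];
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                acc[l] += a[i + l] * b[i + l]; // lane-wise accumulate, no cross-lane work
            }
        }
        float sum = 0f;
        for (int l = 0; l < LANES; l++) {
            sum += acc[l];                     // single horizontal reduction
        }
        for (; i < a.length; i++) {            // scalar tail
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        Random r = new Random(0);
        float[] a = new float[1003];
        float[] b = new float[1003];
        for (int i = 0; i < a.length; i++) {
            a[i] = r.nextFloat();
            b[i] = r.nextFloat();
        }
        System.out.println(naive(a, b) + " vs " + reductionAfterLoop(a, b));
    }
}
```

Because the two versions add the products in a different order, their `float` results can differ in the last bits; this is the same reason the Vector API leaves the order of floating-point `reduceLanes` unspecified.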
------------- PR Review: https://git.openjdk.org/jdk/pull/28639#pullrequestreview-3666881098 PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2695412861 PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2695439621 PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2695401704 From vlivanov at openjdk.org Thu Jan 15 18:19:00 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 15 Jan 2026 18:19:00 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v2] In-Reply-To: References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> Message-ID: On Thu, 15 Jan 2026 13:38:08 GMT, Kerem Kat wrote: >> The check `vect->is_Vector() || vect->is_LoadVector()` doesn't handle `Proj` nodes that resolve to vector types, causing an assertion failure when such nodes flow through a `Phi` into `VectorBox`. > > Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: > > Simplify expand_vbox_node_helper by merging VectorBox Phi handling src/hotspot/share/opto/vector.cpp line 335: > 333: // value-numbered to a single node if all inputs were identical. > 334: if (vbox->is_Phi()) { > 335: assert(!vect->is_Phi() || vbox->as_Phi()->region() == vect->as_Phi()->region(), ""); Isn't the assert too strong? I don't see why redundant phi elimination can't result in a dominating `Phi` node. So, the predicate to choose between `vect->in(i)` and `vect` is `vect->is_Phi() && vect->as_Phi()->region() == vbox->as_Phi()->region()`. 
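The predicate Vladimir suggests can be illustrated with a toy model. The sketch below is not HotSpot code: `PhiInputSelection` and `ToyNode` are hypothetical names, a node counts as a Phi when it has a region, and `inputs().get(i)` plays the role of the value on control path `i`. Instead of asserting that `vect` is a Phi on the same region as the `VectorBox` Phi, it only descends into `vect`'s inputs when that holds, and otherwise uses `vect` itself for the path.

```java
import java.util.List;

// Toy model of choosing between vect->in(i) and vect; not HotSpot code.
final class PhiInputSelection {
    // A node with a non-null region counts as a Phi; inputs().get(i) is its
    // value on control path i of that region.
    record ToyNode(String name, String region, List<ToyNode> inputs) {
        boolean isPhi() {
            return region != null;
        }
    }

    // Value to use for path i of the VectorBox Phi. Only descend into vect's inputs
    // when vect is a Phi on the same region; otherwise (for example a dominating Phi
    // left behind by redundant phi elimination) use vect itself.
    static ToyNode inputFor(ToyNode vboxPhi, ToyNode vect, int i) {
        if (vect.isPhi() && vect.region().equals(vboxPhi.region())) {
            return vect.inputs().get(i);
        }
        return vect;
    }

    public static void main(String[] args) {
        ToyNode a = new ToyNode("a", null, List.of());
        ToyNode b = new ToyNode("b", null, List.of());
        ToyNode vbox = new ToyNode("vbox", "R1", List.of(a, b));
        ToyNode sameRegion = new ToyNode("vect", "R1", List.of(a, b));
        ToyNode otherRegion = new ToyNode("vect", "R0", List.of(a, b));
        System.out.println(inputFor(vbox, sameRegion, 1).name());  // b
        System.out.println(inputFor(vbox, otherRegion, 1).name()); // vect
    }
}
```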
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29200#discussion_r2695464066 From fgao at openjdk.org Thu Jan 15 18:30:39 2026 From: fgao at openjdk.org (Fei Gao) Date: Thu, 15 Jan 2026 18:30:39 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Tue, 13 Jan 2026 11:27:53 GMT, Fei Gao wrote: >> In C2's loop optimization, for a counted loop, if any of these conditions (RCE, unrolling) is met, we switch to the >> `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running. For example, if a main loop is unrolled to `8x`, the main loop guard tests if the loop has less than `8` iterations and then decides which way to go. >> >> Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is going to be further multiplied. After the main loop is super-unrolled, the minimum trip guard test will be updated. Assuming one vector covers `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test if the remaining trip count is less than `8 * 4 = 32`. >> >> To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. Both trip guards, of the main loop and of the vectorized drain loop, jump to the scalar post loop. >>
>> The problem here is, if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed. Because the minimum trip guard test of the main loop fails, the execution will jump over both the main loop and the vectorized drain loop. For example, in the above case, if a loop still has `25` iterations after the pre-loop, we could run `3` rounds of the vectorized drain loop, but that never happens. It would be better if the minimum trip guard test of the main loop did not jump over the vectorized drain loop. >> >> This patch improves this by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change of control flow. >> >> The whole process is done by the function `insert_post_loop()`. >> >> We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop into the vectorized drain loop with mode `InsertVectorizedDrain`: >> >> 1. The fall-in control flow to the vectorized drain loop comes fr... > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Fix build failure after rebasing and address review comments > - Merge branch 'master' into optimize-atomic-post > - Fixed new test failures after rebasing and refined parts of the code to address review comments > - Merge branch 'master' into optimize-atomic-post > - Merge branch 'master' into optimize-atomic-post > - Clean up comments for consistency and add spacing for readability > - Fix some corner case failures and refined part of code > - Merge branch 'master' into optimize-atomic-post > - Refine ascii art, rename some variables and resolve conflicts > - Merge branch 'master' into optimize-atomic-post > - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 @eme64 Thanks for your detailed review.
Please feel free to correct me if I've misunderstood anything. ------------- PR Review: https://git.openjdk.org/jdk/pull/22629#pullrequestreview-3665930758 From fgao at openjdk.org Thu Jan 15 18:30:45 2026 From: fgao at openjdk.org (Fei Gao) Date: Thu, 15 Jan 2026 18:30:45 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Wed, 14 Jan 2026 14:08:37 GMT, Emanuel Peter wrote: >> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: >> >> - Fix build failure after rebasing and address review comments >> - Merge branch 'master' into optimize-atomic-post >> - Fixed new test failures after rebasing and refined parts of the code to address review comments >> - Merge branch 'master' into optimize-atomic-post >> - Merge branch 'master' into optimize-atomic-post >> - Clean up comments for consistency and add spacing for readability >> - Fix some corner case failures and refined part of code >> - Merge branch 'master' into optimize-atomic-post >> - Refine ascii art, rename some variables and resolve conflicts >> - Merge branch 'master' into optimize-atomic-post >> - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 > > src/hotspot/share/opto/loopTransform.cpp line 1337: > >> 1335: } else { >> 1336: if (get_ctrl(n) != back_ctrl) { return n; } >> 1337: } > > Is the `clone_up_backedge` name still correct for all cases? > It seems before we only cloned up nodes that belonged to `back_ctrl`. > But now we also clone from the pre-loop **exit path**. > > I never liked the `goo` anyway... > > And: why is it called `clone_up`? What "up" does it refer to? > > What about `clone_up(FromBackedge, ...` and `clone_up(FromPreLoopExit, ...`, using two enums?
Then we can be a bit more explicit about which case we are in, and add corresponding asserts for `back_ctrl` (non-null vs null). Yeah, I agree. We should rename it. How about `resolve_value_for_preheader`? > What about clone_up(FromBackedge, ... and clone_up(FromPreLoopExit, ..., using two enums? Then we can be a bit more explicit about which case we are in, and add corresponding asserts for back_ctrl (non-null vs null). That sounds good. Something like: `resolve_value_for_preheader(FromBackedge, ...)`, `resolve_value_for_preheader(FromPreLoopExit, ...)` > src/hotspot/share/opto/loopTransform.cpp line 1402: > >> 1400: Node* main_backedge_ctrl = main_head->back_control(); >> 1401: // For the post loop, we call clone_up_backedge_goo() to obtain the fall-out values >> 1402: // from the main loop, which serve as the fall-in values for the post loop. > > The naming of `clone_up_backedge_goo` is confusing me a bit: > We seem to clone down (main -> post) and we clone fall-out (exit) values, and not backedge values. Agreed, that's really confusing. I had considered renaming it before, but it would have required touching some unrelated files, so I decided against it. However, I now think the rename is necessary for clarity. > I'm wondering if/how ControlAroundStripMined relates to the post loop case? We use `ControlAroundStripMined` to clone the post loop. See the mainline code here: https://github.com/openjdk/jdk/blob/75420e9314c54adc5b45f9b274a87af54dd6b5a8/src/hotspot/share/opto/loopTransform.cpp#L1697 > if the branch is not taken, you assume it can only be the drain loop. Yes. > Could we assert mode == InsertVectorizedDrain down there? Sounds good. I can add it. > What about renaming pre_incr -> main_input or after_pre. That would remove the iv connotation, and be more parallel to drain_input or after_main. Sounds reasonable. > But a similar question here: what if we had a split-through-phi here?
Before split-through-phi, we would have had some input x, but afterwards we'd have op(x) here. Would that not mean that we should use x for the input to drain_input, but are getting op(x)? A similar argument applies here as well. We also use `clone_up_backedge_goo()` along this path to resolve the correct pre_incr value after the pre-loop. Even if a split-through-phi has occurred, the helper will walk and clone the input chain as needed to recover the appropriate value whose target control is the pre-exit control (`main_merge_region->in(1)`). > src/hotspot/share/opto/loopTransform.cpp line 1482: > >> 1480: pre_incr = clone_up_backedge_goo(nullptr, main_merge_region->in(1), pre_incr, visited, clones); >> 1481: } >> 1482: drain_input->set_req(1, pre_incr); > > Just a control question: above you did: > `drain_input = PhiNode::make(main_merge_region, iv_after_main);` > Does that not put `iv_after_main` at slot `1`, and now we overwrite it with `pre_incr`? Yes, this is what we're generating: `drain_input = merge (pre_incr, iv_after_main)` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2694628680 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2694641458 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2694662395 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2695142459 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2695189822 From fgao at openjdk.org Thu Jan 15 18:30:47 2026 From: fgao at openjdk.org (Fei Gao) Date: Thu, 15 Jan 2026 18:30:47 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Wed, 14 Jan 2026 14:22:58 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopTransform.cpp line 1413: >> >>> 1411: // we now need to make the fall-in values to the 
vectorized drain >>> 1412: // loop come from phis merging exit values from the pre loop and >>> 1413: // the main loop. >> >> Suggestion: >> >> // the main loop, see "drain_input". > > Would this be correct? It would allow the reader to search for "drain_input" and immediately find the right point to focus on in the ASCII art below :) Yes, that's correct. That looks better. I'll update it. >> src/hotspot/share/opto/loopTransform.cpp line 1460: >> >>> 1458: // TestVectorizedDrainLoop.java. >>> 1459: Node* drain_input = nullptr; >>> 1460: Node* iv_after_main = main_phi->in(LoopNode::LoopBackControl); >> >> Is the `phi` we are looking at here always only the `iv`/trip counter? >> - If always `iv`: how do we patch the other `phi`s? And is this loop not covering all phis? https://github.com/openjdk/jdk/pull/22629/files#diff-6a59f91cb710d682247df87c75faf602f0ff9f87e2855ead1b80719704fbedffL1770-L1781 >> - If we cover all phis: we should probably change the naming of the variables to indicate that it is any `phi` and not just the `iv`. >> >> What do you think? > > Also: how sure are you that the backedge `main_phi->in(LoopNode::LoopBackControl)` is the same as the value after main `iv_after_main`? > What if we did some split-through-phi action at some point? Example: > > x = ... > LOOP: > x = op(x); > // x now serves as exit value and backedge value > exit check; > goto LOOP; > > If we split `op` through the LOOP Phi, we get: > > x = ... > x = op(x); > LOOP: > // the exit value is the phi > exit check; > x = op(x); > // x after op is the backedge > goto LOOP; > > > I'm not sure this currently ever happens, but what if it did? > * If we cover all phis: we should probably change the naming of the variables to indicate that it is any `phi` and not just the `iv`. > > What do you think? Yes, we're looking at all variables that change as the loop iterates, not just the trip counter. How about renaming it to `main_exit_value` or `value_after_main_loop`?
> So maybe it should not be: `Node* iv_after_main = main_phi->in(LoopNode::LoopBackControl);` But instead: `Node* main_backedge = main_phi->in(LoopNode::LoopBackControl);` Because it is at that point not yet clear that it is really the `after_main` value, or if we need to clone it first. Yeah, that makes sense. We probably do need two separate variables here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2694674918 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2694752041 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2694968788 From fgao at openjdk.org Thu Jan 15 18:30:49 2026 From: fgao at openjdk.org (Fei Gao) Date: Thu, 15 Jan 2026 18:30:49 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Thu, 15 Jan 2026 15:04:30 GMT, Fei Gao wrote: >> Also: how sure are you that the backedge `main_phi->in(LoopNode::LoopBackControl)` is the same as the value after main `iv_after_main`? >> What if we did some split-through-phi action at some point? Example: >> >> x = ... >> LOOP: >> x = op(x); >> // x now serves as exit value and backedge value >> exit check; >> goto LOOP; >> >> If we split `op` through the LOOP Phi, we get: >> >> x = ... >> x = op(x); >> LOOP: >> // the exit value is the phi >> exit check; >> x = op(x); >> // x after op is the backedge >> goto LOOP; >> >> >> I'm not sure this currently ever happens, but what if it did? > >> * If we cover all phis: we should probably change the naming of the variables to indicate that it is any `phi` and not just the `iv`. >> >> What do you think? > > Yes, we're looking at all variables that change as the loop iterates, not just the trip counter. How about renaming it to `main_exit_value` or `value_after_main_loop`?
> Also: how sure are you that the backedge `main_phi->in(LoopNode::LoopBackControl)` is the same as the value after main `iv_after_main`? What if we did some split-through-phi action at some point? Example: > > ``` > x = ... > LOOP: > x = op(x); > // x now serves as exit value and backedge value > exit check; > goto LOOP; > ``` > > If we split `op` through the LOOP Phi, we get: > > ``` > x = ... > x = op(x); > LOOP: > // the exit value is the phi > exit check; > x = op(x); > // x after op is the backedge > goto LOOP; > ``` > > I'm not sure this currently ever happens, but what if it did? Great question, and yes, that's exactly why this logic is a bit more involved. If I understand correctly, in the second scenario (after a split-through-phi), the value we need after the main loop is not the original x directly. Instead, we also need to apply an additional op(x) so that the resulting value can serve as the input to the drain loop.

    x = ...
    y = op(x);
    MAIN_LOOP:
      // the exit value is the phi
      exit check;
      y = op(x);
      // x after op is the backedge
      goto MAIN_LOOP;
    MAIN_EXIT:
      y = op(x); // the newly created op(x) outside the main-loop body
    DRAIN_LOOP:
      // the exit value is the phi
      exit check;
      y = op(x);
      // x after op is the backedge
      goto DRAIN_LOOP;

Let's walk through the code step by step.

    Node* drain_input = nullptr;
    Node* iv_after_main = main_phi->in(LoopNode::LoopBackControl);

At this point, we do **not** assume that `iv_after_main` is already the correct "after-main" value.

    if (get_ctrl(iv_after_main) != main_backedge_ctrl) {
      drain_input = find_merge_phi_for_vectorized_drain(iv_after_main, main_merge_region);
    }

We first check whether `main_phi->in(LoopNode::LoopBackControl)` is pinned in the backedge block.

1. If `get_ctrl(iv_after_main) != main_backedge_ctrl`, then `main_phi->in(LoopNode::LoopBackControl)` is **not** in the backedge block.
It should be **either in the main loop body or in the main exit block.**
   - If it is still in the main loop body, we won't be able to find a valid merge phi via its uses, so **`drain_input` remains nullptr**.
   - If it is already in the main exit block, we can find an existing valid merge phi via its uses; this corresponds to the first scenario you described, so **`drain_input` won't be nullptr**.
2. If `get_ctrl(iv_after_main) == main_backedge_ctrl`, then `main_phi->in(LoopNode::LoopBackControl)` is pinned in the backedge block. In this case, especially in the presence of transformations like split-through-phi (your second scenario), it clearly cannot be treated as the correct "after-main" value, and **`drain_input` will also remain nullptr**.

    if (drain_input == nullptr) {
      iv_after_main = clone_up_backedge_goo(main_backedge_ctrl, main_merge_region->in(2), iv_after_main, visited, clones);
      drain_input = PhiNode::make(main_merge_region, iv_after_main);
    }

When `drain_input` is still nullptr, we enter this path and call `clone_up_backedge_goo()`. This helper recursively walks and clones the input chain of `main_phi->in(LoopNode::LoopBackControl)` as needed, producing a version of the value that is legal at the main-exit control (`main_merge_region->in(2)`), which is the newly created op(x) outside the main-loop body. At this point, the returned `iv_after_main` represents the correct value to use after the main loop, even if split-through-phi or similar transformations have occurred. We can then safely create (or later reuse) a merge phi based on this value. Does this address your concern?
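To make the trip-count example from the PR description concrete (one vector covers `8` iterations, super-unroll factor `4`, `25` iterations left after the pre-loop), here is a small counting model. It is a hypothetical sketch (`DrainLoopModel` is an illustrative name) that only counts how many iterations each loop runs; it does not model C2's graph, and it assumes the drain loop's guard simply checks for at least one full vector of remaining work.

```java
// Hypothetical counting model of the pre-main-drain-post loop split; not C2 code.
final class DrainLoopModel {
    record Counts(int mainIters, int drainIters, int postIters) {}

    // remaining:         iterations left after the pre-loop
    // vectorWidth:       scalar iterations covered by one drain-loop iteration
    // unroll:            super-unroll factor of the vectorized main loop
    // guardFallsToDrain: true models the patched flow (a failing main-loop guard reaches
    //                    the drain loop's guard); false models the old flow (it jumps
    //                    straight to the scalar post loop)
    static Counts run(int remaining, int vectorWidth, int unroll, boolean guardFallsToDrain) {
        int mainStride = vectorWidth * unroll;
        int main = 0;
        if (remaining >= mainStride) {
            main = remaining / mainStride;           // whole main-loop iterations
            remaining -= main * mainStride;
        } else if (!guardFallsToDrain) {
            return new Counts(0, 0, remaining);      // main and drain loops both skipped
        }
        int drain = 0;
        if (remaining >= vectorWidth) {
            drain = remaining / vectorWidth;         // whole vectors left for the drain loop
            remaining -= drain * vectorWidth;
        }
        return new Counts(main, drain, remaining);   // the rest runs in the scalar post loop
    }

    public static void main(String[] args) {
        System.out.println(run(25, 8, 4, false)); // Counts[mainIters=0, drainIters=0, postIters=25]
        System.out.println(run(25, 8, 4, true));  // Counts[mainIters=0, drainIters=3, postIters=1]
    }
}
```

With the old control flow all 25 iterations end up in the scalar post loop; when the main-loop guard falls through to the drain loop instead, 3 vector iterations and 1 scalar iteration remain. For larger trip counts (for example 70), both flows give the same split.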
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2695018811 From vlivanov at openjdk.org Thu Jan 15 18:35:53 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 15 Jan 2026 18:35:53 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v2] In-Reply-To: References: Message-ID: On Thu, 15 Jan 2026 09:55:13 GMT, Guanqiang Han wrote: >> Please review this change. Thanks! >> >> Description: >> >> This change fixes a crash in Deoptimization::uncommon_trap_inner when running with -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp. >> >> With -XX:-ProfileTraps, create_if_missing is set to false. >> https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2121-L2122 >> >> When create_if_missing is false, a new MDO cannot be created when m()->method_data() returns null, so get_method_data() may return null. >> https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L1911-L1912 >> >> and trap_mdo can be null as a result >> https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2134-L2136 >> >> The crash happens here because trap_mdo is null >> https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2157 >> >> Fix: >> >> The fix makes acquisition of extra_data_lock conditional on trap_mdo being non-null, preserving the required lock ordering (extra_data_lock before ttyLocker) when an MDO exists. >> >> Test: >> >> GHA > > Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: > > - split long line > - Merge remote-tracking branch 'upstream/master' into 8374807 > - fix 8374807 src/hotspot/share/runtime/deoptimization.cpp line 2161: > 2159: Mutex::_no_safepoint_check_flag); > 2160: > 2161: ttyLocker ttyl; Does the code still need `ttyLocker`? There's only one usage of `tty` and it prints all accumulated info all at once. `xtty` already annotates output with thread info. So, I'd assume that moving `trap_mdo->extra_data_lock()` locker to `trap_mdo` accesses should solve the problem as well. (I'm not sure whether a `ttyLocker` is needed or not to avoid interleaving during `tty->print_raw(st.freeze());`, but `ttyLocker` can be placed right before it.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29147#discussion_r2695517458 From vlivanov at openjdk.org Thu Jan 15 19:19:00 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 15 Jan 2026 19:19:00 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v17] In-Reply-To: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> References: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> Message-ID: On Mon, 12 Jan 2026 15:05:29 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch is an alternative to #28764 but it does the analysis during IGVN instead. >> >> ## The current PR: >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. 
Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. >> >> I do not see a noticeable difference in C2 runtime with and without this patch. >> >> ## Future work: >> >> 1. Fold a memory `Phi`. >> >> This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. >> >> 2. Fold a pointer `Phi`. >> >> Currently, this PR is doing the trivial approach, just give up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: >> >> Point p1 = new Point; >> Point p2 = new Point; >> p1.x = v1; >> p2.x = v2; >> Point p = Phi(p1, p2); >> int a = p.x; >> >> Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. >> >> Another interesting case: >> >> Point p = Phi(p1, p2); >> p.x = v; >> p1.x = v1; >> int a = p.x; >> >> Then, theoretically, we can fold `a` to `Phi(v1, v)` if `p1` and `p2` are known not to alias. >> >> 3. Nested objects >> >> It can be observed that if an object is stored into a memory that has not escaped, then it can be considered that the object has not escaped. For example: >> >> Point p = new Point; >> PointHolder h = new PointHolder; >> h.p = p; >> int x = p.x; >> escape(h); >> >> Then, `p` can be considered that it has not escaped until `escape(h)`. To do this, the computation of `_aliases` in the constructor of `LocalEA` needs to be more comprehensive. See the comments in `LocalEA::check_escape_status`. >> >> Please... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: > > - Merge branch 'master' into loadfoldingigvn > - Early return when not a heap access > - Fix escape at store > - Fix outdated and unclear comments > - copyright year, return, comments, whitespace > - Merge branch 'master' into loadfoldingigvn > - ea of phis and nested objects > - Add test scenarios > - Add a flag to turn off the feature > - Much more comments, refactor the data into a separate class > - ... and 9 more: https://git.openjdk.org/jdk/compare/7699c18e...c275e6e6 src/hotspot/share/opto/memnode.cpp line 714: > 712: bool is_known_instance = addr_t != nullptr && addr_t->is_known_instance_field(); > 713: LocalEA local_ea(phase->is_IterGVN(), base); > 714: TriBool has_not_escaped = is_known_instance ? TriBool(true) IMO `TriBool` doesn't hold its weight here. As an alternative, encapsulating caching logic inside `LocalEA` and unconditionally querying it for escape state would look cleaner and easier to reason about. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2695639546 From vlivanov at openjdk.org Thu Jan 15 19:34:06 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 15 Jan 2026 19:34:06 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v17] In-Reply-To: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> References: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> Message-ID: <2Fow0aJ6zpA2aTnACLAxW_d_O_OucOXKz08y-Lx7IBo=.73eeafd0-cf1d-4ce2-a438-89b51739f593@github.com> On Mon, 12 Jan 2026 15:05:29 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch is an alternative to #28764 but it does the analysis during IGVN instead. 
>> >> ## The current PR: >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. >> >> I do not see a noticeable difference in C2 runtime with and without this patch. >> >> ## Future work: >> >> 1. Fold a memory `Phi`. >> >> This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. >> >> 2. Fold a pointer `Phi`. >> >> Currently, this PR is doing the trivial approach, just give up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: >> >> Point p1 = new Point; >> Point p2 = new Point; >> p1.x = v1; >> p2.x = v2; >> Point p = Phi(p1, p2); >> int a = p.x; >> >> Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. >> >> Another interesting case: >> >> Point p = Phi(p1, p2); >> p.x = v; >> p1.x = v1; >> int a = p.x; >> >> Then, theoretically, we can fold `a` to `Phi(v1, v)` if `p1` and `p2` are known not to alias. >> >> 3. 
Nested objects >> >> It can be observed that if an object is stored into memory that has not escaped, then the object itself can be considered not to have escaped. For example: >> >> Point p = new Point; >> PointHolder h = new PointHolder; >> h.p = p; >> int x = p.x; >> escape(h); >> >> Then, `p` can be considered not to have escaped until `escape(h)`. To do this, the computation of `_aliases` in the constructor of `LocalEA` needs to be more comprehensive. See the comments in `LocalEA::check_escape_status`. >> >> Please... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: > > - Merge branch 'master' into loadfoldingigvn > - Early return when not a heap access > - Fix escape at store > - Fix outdated and unclear comments > - copyright year, return, comments, whitespace > - Merge branch 'master' into loadfoldingigvn > - ea of phis and nested objects > - Add test scenarios > - Add a flag to turn off the feature > - Much more comments, refactor the data into a separate class > - ... and 9 more: https://git.openjdk.org/jdk/compare/59b00669...c275e6e6 What amount of functional testing has been done? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3756522613 From duke at openjdk.org Thu Jan 15 19:48:37 2026 From: duke at openjdk.org (duke) Date: Thu, 15 Jan 2026 19:48:37 GMT Subject: Withdrawn: 8343689: AArch64: Optimize MulReduction implementation In-Reply-To: References: Message-ID: On Fri, 17 Jan 2025 19:35:44 GMT, Mikhail Ablakatov wrote: > Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, the existing ASIMD implementation is used.
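As a rough scalar model of the halving strategy described above (not the SVE/ASIMD code the patch generates), the Java sketch below treats an `int[]` as the lanes of a wide vector, multiplies the upper half into the lower half until four `int` lanes (128 bits) remain, and then finishes with a sequential tail standing in for the existing ASIMD path. The class and method names are hypothetical. For `int` the result matches a plain sequential product because wrapping two's-complement multiplication is associative and commutative; for `float`/`double` the reordering can change rounding.

```java
import java.util.Arrays;

public class HalvingMulReduction {
    // Scalar model of a MUL lane reduction: repeatedly multiply the upper half
    // of the lanes into the lower half until 4 int lanes (128 bits) remain,
    // then reduce those remaining lanes sequentially.
    static int reduceMul(int[] lanes) {
        if (lanes.length < 4 || Integer.bitCount(lanes.length) != 1) {
            throw new IllegalArgumentException("lane count must be a power of two >= 4");
        }
        int[] v = Arrays.copyOf(lanes, lanes.length);
        int n = v.length;
        while (n > 4) {
            int half = n / 2;
            for (int i = 0; i < half; i++) {
                v[i] *= v[i + half]; // one "vector multiply" of two half-width vectors
            }
            n = half;
        }
        int result = 1;
        for (int i = 0; i < n; i++) {
            result *= v[i]; // sequential tail over the final 128 bits
        }
        return result;
    }

    // Reference: plain left-to-right product.
    static int sequentialMul(int[] lanes) {
        int result = 1;
        for (int x : lanes) {
            result *= x;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] lanes = {1, 2, 3, 4, 5, 6, 7, 8}; // 8 x 32-bit lanes = a 256-bit vector
        System.out.println(reduceMul(lanes));     // 40320
        System.out.println(sequentialMul(lanes)); // 40320
    }
}
```

Each halving step corresponds to one multiply of two half-width vectors, so a 512-bit vector of `int` needs two steps (16 to 8 to 4 lanes) before it fits in a 128-bit register.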
> > Nothing changes for <= 128-bit long vectors, as for those the existing ASIMD implementation is still used directly. > > The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reduction micro-benchmarks. > > Benchmark results: > > Neoverse-V1 (SVE 256-bit)
>
> Benchmark                 (size)  Mode   master    PR         Units
> ByteMaxVector.MULLanes    1024    thrpt  5447.643  11455.535  ops/ms
> ShortMaxVector.MULLanes   1024    thrpt  3388.183  7144.301   ops/ms
> IntMaxVector.MULLanes     1024    thrpt  3010.974  4911.485   ops/ms
> LongMaxVector.MULLanes    1024    thrpt  1539.137  2562.835   ops/ms
> FloatMaxVector.MULLanes   1024    thrpt  1355.551  4158.128   ops/ms
> DoubleMaxVector.MULLanes  1024    thrpt  1715.854  3284.189   ops/ms
>
> Fujitsu A64FX (SVE 512-bit):
>
> Benchmark                 (size)  Mode   master    PR         Units
> ByteMaxVector.MULLanes    1024    thrpt  1091.692  2887.798   ops/ms
> ShortMaxVector.MULLanes   1024    thrpt  597.008   1863.338   ops/ms
> IntMaxVector.MULLanes     1024    thrpt  510.642   1348.651   ops/ms
> LongMaxVector.MULLanes    1024    thrpt  468.878   878.620    ops/ms
> FloatMaxVector.MULLanes   1024    thrpt  376.284   2237.564   ops/ms
> DoubleMaxVector.MULLanes  1024    thrpt  431.343   1646.792   ops/ms

This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/23181 From sviswanathan at openjdk.org Thu Jan 15 23:14:54 2026 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 15 Jan 2026 23:14:54 GMT Subject: RFR: 8360271: String.indexOf intrinsics fail with +EnableX86ECoreOpts and -CompactStrings In-Reply-To: References: Message-ID: On Thu, 15 Jan 2026 04:24:24 GMT, Volodymyr Paprotski wrote: > Failure always for UU case, needle=2, len=17 > - (Note: `len=len-offset` in `library_call.cpp`, ie.
stub does not see the same len as the test case) > > Following down the code layout: > > if len==0 > return 0 > if len>needle > return -1 > if len<=16|32 && needle<=3|6 > optimized_short_cases > if len>16|32 > // big switch > switch(needle) { > default >10 > cases 2..10 // BUG IS HERE: len 17|34, needle 2|4, case=4 > } > else > // small switch > switch(needle) { > cases 7..10 > // others under optimized_short_cases > } > > Furthermore.. big switch case itself has two cases.. > > if len-needle>31 > // works > // loop > else // len-needle<=31 > // BUG HERE > > The else case corrects mask misalignment; the 'correction shift' is off-by-1 for the UTF16 case. > > ----- > Why not found before? > - testcase issue, needle was UTF8 for UTF16 case > > Why only needle==2? > - Possibly because the mask for words has two bits, so tolerated off-by-one Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29242#pullrequestreview-3668028315 From vpaprotski at openjdk.org Thu Jan 15 23:14:55 2026 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Thu, 15 Jan 2026 23:14:55 GMT Subject: RFR: 8360271: String.indexOf intrinsics fail with +EnableX86ECoreOpts and -CompactStrings In-Reply-To: References: Message-ID: On Thu, 15 Jan 2026 04:24:24 GMT, Volodymyr Paprotski wrote: > Failure always for UU case, needle=2, len=17 > - (Note: `len=len-offset` in `library_call.cpp`, ie. stub does not see the same len as the test case) > > Following down the code layout: > > if len==0 > return 0 > if len>needle > return -1 > if len<=16|32 && needle<=3|6 > optimized_short_cases > if len>16|32 > // big switch > switch(needle) { > default >10 > cases 2..10 // BUG IS HERE: len 17|34, needle 2|4, case=4 > } > else > // small switch > switch(needle) { > cases 7..10 > // others under optimized_short_cases > } > > Furthermore.. big switch case itself has two cases.. 
> > if len-needle>31 > // works > // loop > else // len-needle<=31 > // BUG HERE > > The else case corrects mask misalignment; the 'correction shift' is off-by-1 for the UTF16 case. > > ----- > Why not found before? > - testcase issue, needle was UTF8 for UTF16 case > > Why only needle==2? > - Possibly because the mask for words has two bits, so tolerated off-by-one Thanks for the approvals! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29242#issuecomment-3757320799 From vpaprotski at openjdk.org Thu Jan 15 23:14:55 2026 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Thu, 15 Jan 2026 23:14:55 GMT Subject: Integrated: 8360271: String.indexOf intrinsics fail with +EnableX86ECoreOpts and -CompactStrings In-Reply-To: References: Message-ID: On Thu, 15 Jan 2026 04:24:24 GMT, Volodymyr Paprotski wrote: > Failure always for UU case, needle=2, len=17 > - (Note: `len=len-offset` in `library_call.cpp`, ie. stub does not see the same len as the test case) > > Following down the code layout: > > if len==0 > return 0 > if len>needle > return -1 > if len<=16|32 && needle<=3|6 > optimized_short_cases > if len>16|32 > // big switch > switch(needle) { > default >10 > cases 2..10 // BUG IS HERE: len 17|34, needle 2|4, case=4 > } > else > // small switch > switch(needle) { > cases 7..10 > // others under optimized_short_cases > } > > Furthermore.. big switch case itself has two cases.. > > if len-needle>31 > // works > // loop > else // len-needle<=31 > // BUG HERE > > The else case corrects mask misalignment; the 'correction shift' is off-by-1 for the UTF16 case. > > ----- > Why not found before? > - testcase issue, needle was UTF8 for UTF16 case > > Why only needle==2? > - Possibly because the mask for words has two bits, so tolerated off-by-one This pull request has now been integrated. 
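The report notes that the bug went unnoticed because the test's needle was Latin-1 in the UTF-16 (UU) case. A minimal sketch of a stricter check, assuming plain `String.indexOf(String)` as the entry point: both strings contain non-Latin-1 chars, so they stay UTF-16 even with compact strings enabled, and every result is compared against a naive reference search. The class and helper names are hypothetical. The 17-char haystack with a 2-char needle only approximates the reported failing shape (as noted above, the stub may see a different length when an offset is involved), and whether the intrinsic is actually exercised depends on warm-up and on flags such as `-XX:+EnableX86ECoreOpts -XX:-CompactStrings`.

```java
public class IndexOfUtf16Check {
    // Naive reference implementation of String.indexOf(String).
    static int naiveIndexOf(String haystack, String needle) {
        outer:
        for (int i = 0; i + needle.length() <= haystack.length(); i++) {
            for (int j = 0; j < needle.length(); j++) {
                if (haystack.charAt(i + j) != needle.charAt(j)) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Places a 2-char UTF-16 needle at every position of a 17-char UTF-16
    // haystack and counts disagreements with the reference search.
    static int countMismatches() {
        String needle = "\u0100\u0101"; // non-Latin-1 chars force UTF-16 storage
        int mismatches = 0;
        for (int pos = 0; pos + needle.length() <= 17; pos++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 17; i++) {
                sb.append('\u0102'); // filler, also non-Latin-1 and never matches
            }
            sb.replace(pos, pos + needle.length(), needle);
            String haystack = sb.toString();
            if (haystack.indexOf(needle) != naiveIndexOf(haystack, needle)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Repeat so the JIT has a chance to compile (and intrinsify) indexOf.
        int total = 0;
        for (int iter = 0; iter < 20_000; iter++) {
            total += countMismatches();
        }
        System.out.println("mismatches: " + total);
    }
}
```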
Changeset: 1d889b92 Author: Volodymyr Paprotski URL: https://git.openjdk.org/jdk/commit/1d889b92bde5dfcb1fbe6cddb389a77f92eb1ce7 Stats: 19 lines in 2 files changed: 11 ins; 0 del; 8 mod 8360271: String.indexOf intrinsics fail with +EnableX86ECoreOpts and -CompactStrings Reviewed-by: thartmann, jbhateja, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/29242 From ghan at openjdk.org Fri Jan 16 02:32:06 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Fri, 16 Jan 2026 02:32:06 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v3] In-Reply-To: References: Message-ID: > Please review this change. Thanks! > > Description: > > This change fixes a crash in Deoptimization::uncommon_trap_inner when running with -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp. > > With -XX:-ProfileTraps, create_if_missing is set to false. > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2121-L2122 > > When create_if_missing is false, a new MDO cannot be created when m()->method_data() returns null, so get_method_data() may return null. > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L1911-L1912 > > and trap_mdo can be null as a result > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2134-L2136 > > The crash happens here because trap_mdo is null > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2157 > > Fix: > > The fix makes acquisition of extra_data_lock conditional on trap_mdo being non-null, preserving the required lock ordering (extra_data_lock before ttyLocker) when an MDO exists.
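The shape of the fix (take the MDO lock only when an MDO exists, but always before the tty lock) can be modelled in Java with `ReentrantLock`. `MethodData`, `TTY_LOCK`, and `traceTrap` below are hypothetical stand-ins for HotSpot's `MethodData::extra_data_lock()` and `ttyLocker`; this sketches the lock-ordering idea only and is not the actual `deoptimization.cpp` change.

```java
import java.util.concurrent.locks.ReentrantLock;

public class ConditionalLockOrder {
    // Hypothetical stand-in for HotSpot's MethodData and its extra-data lock.
    static final class MethodData {
        final ReentrantLock extraDataLock = new ReentrantLock();
    }

    // Hypothetical stand-in for the global tty lock.
    static final ReentrantLock TTY_LOCK = new ReentrantLock();

    // trapMdo may be null, e.g. when no MDO is created on demand
    // (the -XX:-ProfileTraps case described above).
    static String traceTrap(MethodData trapMdo) {
        // Required order: extra-data lock (only if an MDO exists), then tty lock.
        if (trapMdo != null) {
            trapMdo.extraDataLock.lock();
        }
        try {
            TTY_LOCK.lock();
            try {
                return trapMdo == null ? "trap (no mdo)" : "trap (mdo)";
            } finally {
                TTY_LOCK.unlock();
            }
        } finally {
            if (trapMdo != null) {
                trapMdo.extraDataLock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(traceTrap(null));             // trap (no mdo)
        System.out.println(traceTrap(new MethodData())); // trap (mdo)
    }
}
```

Releasing in reverse order via nested `try`/`finally` keeps both locks balanced whether or not an MDO was present.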
> > Test: > > GHA Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - narrow lock scope - Merge remote-tracking branch 'upstream/master' into 8374807 - split long line - Merge remote-tracking branch 'upstream/master' into 8374807 - fix 8374807 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29147/files - new: https://git.openjdk.org/jdk/pull/29147/files/9445014e..cdc88af1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29147&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29147&range=01-02 Stats: 931 lines in 36 files changed: 487 ins; 116 del; 328 mod Patch: https://git.openjdk.org/jdk/pull/29147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29147/head:pull/29147 PR: https://git.openjdk.org/jdk/pull/29147 From xgong at openjdk.org Fri Jan 16 03:50:44 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 16 Jan 2026 03:50:44 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v3] In-Reply-To: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> References: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> Message-ID: On Fri, 9 Jan 2026 13:32:54 GMT, Emanuel Peter wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Update comments in type.cpp > > Nice work, thanks for taking the time for this, much appreciated! > > On the whole I'm super happy with this, but left a few extra comments :) Hi @eme64, regarding the comments, could you please take another look and check whether they are clear enough? Thanks so much for any feedback.
> src/hotspot/share/opto/vectornode.hpp line 1874: > >> 1872: // Convert a "BVectMask" into a platform-specific vector mask (either "NVectMask" >> 1873: // or "PVectMask"). >> 1874: class VectorLoadMaskNode : public VectorNode { > > I'd love to rename this. Because it is (as you say in the comments) a conversion, and not a "load" (memory op). > What about `VectorConvertBooleans2MaskNode`? > > And below, rename `VectorStoreMaskNode` to `VectorConvertMask2BooleansNode`. > > You may have an even better idea. Hi @eme64, do you have any insights on this? I'm wondering whether separating the renaming change into a different PR would be acceptable to you. Additionally, do you think it's better that we start a dedicated thread on the mailing list or GitHub to discuss this further and gather more feedback from others? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29130#issuecomment-3757989173 PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2696748738 From dlong at openjdk.org Fri Jan 16 04:01:20 2026 From: dlong at openjdk.org (Dean Long) Date: Fri, 16 Jan 2026 04:01:20 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v6] In-Reply-To: References: Message-ID: <_jevSayhH-Khj6mA5jxNQKSzYwEctDg0dDgFTHltHUg=.8e8356c9-7781-4845-bd1a-d9fc3b6f107d@github.com> On Thu, 15 Jan 2026 09:17:55 GMT, David Holmes wrote: >> I agree with the concern here. The buffering we need is local to this call site to keep the output coherent (collect everything and print once). >> Whether we need to buffer/accumulate output for coherence is scenario-dependent, rather than a property that should permanently classify a stream type as "buffered" vs. "unbuffered". >> @dean-long what's your view on this? > > No - sorry I forgot that you have to add override to all methods.
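The "collect everything and print once" approach can be sketched with a `StringBuilder` that is flushed under a single lock. This is a hypothetical Java model, not HotSpot's `stringStream`/`ttyLocker`: the `shared` parameter mirrors the idea of passing an existing buffer from the call site (one buffer, no extra copy), while a `null` argument makes the method buffer locally and flush once.

```java
public class CoherentOutput {
    // Hypothetical stand-in for the global tty lock.
    static final Object TTY = new Object();
    // Hypothetical stand-in for the tty device, so the output can be inspected.
    static final StringBuilder CONSOLE = new StringBuilder();

    // If the caller already buffers (shared != null), append into its buffer and
    // let the caller flush it. Otherwise buffer locally and emit the whole block
    // with one locked write, so lines from other threads cannot interleave.
    static void printDetails(StringBuilder shared, String... lines) {
        StringBuilder buf = (shared != null) ? shared : new StringBuilder();
        for (String line : lines) {
            buf.append(line).append('\n');
        }
        if (shared == null) {
            synchronized (TTY) {
                CONSOLE.append(buf);
            }
        }
    }

    public static void main(String[] args) {
        printDetails(null, "a", "b");            // flushed immediately as one block
        StringBuilder outer = new StringBuilder("header\n");
        printDetails(outer, "c");                // lands in the caller's buffer
        synchronized (TTY) {
            CONSOLE.append(outer);               // caller flushes once
        }
        System.out.print(CONSOLE);
    }
}
```

Always allocating a local buffer instead (ignoring `shared`) would still be correct; it just adds the second buffer and copy that the thread is weighing.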
My understanding is that we need at least one stringStream, and the code is trying to avoid having two stacked stringStreams, which would mean extra memory footprint and copying. So I agree the proposed is_buffered() is really a proxy for "do I have a stringStream?" because bufferedStream does not guarantee coherent output by itself. What if we just live with the inefficiency of having two stringStreams for now? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29186#discussion_r2696784127 From duke at openjdk.org Fri Jan 16 05:44:53 2026 From: duke at openjdk.org (Harshit470250) Date: Fri, 16 Jan 2026 05:44:53 GMT Subject: RFR: 8370920: [s390] C2: add instruction size in s390.ad file [v11] In-Reply-To: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> References: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> Message-ID: > This PR adds the size of the match rule nodes. > > There were a lot of nodes whose size was variable; for those nodes I have taken the maximum possible size. Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 21 additional commits since the last revision: - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ...
and 11 more: https://git.openjdk.org/jdk/compare/05a2a234...9676e39d ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28054/files - new: https://git.openjdk.org/jdk/pull/28054/files/3ca1be39..9676e39d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=09-10 Stats: 17770 lines in 255 files changed: 9586 ins; 4224 del; 3960 mod Patch: https://git.openjdk.org/jdk/pull/28054.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28054/head:pull/28054 PR: https://git.openjdk.org/jdk/pull/28054 From ghan at openjdk.org Fri Jan 16 05:45:41 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Fri, 16 Jan 2026 05:45:41 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v6] In-Reply-To: <_jevSayhH-Khj6mA5jxNQKSzYwEctDg0dDgFTHltHUg=.8e8356c9-7781-4845-bd1a-d9fc3b6f107d@github.com> References: <_jevSayhH-Khj6mA5jxNQKSzYwEctDg0dDgFTHltHUg=.8e8356c9-7781-4845-bd1a-d9fc3b6f107d@github.com> Message-ID: <4jqYbz9oL1Klo7KdXSdWPdNT9cD0LrpHPf6u4NfKlDg=.26a204e9-8f6f-4ff5-af92-671974b1a9c9@github.com> On Fri, 16 Jan 2026 03:57:52 GMT, Dean Long wrote: >> No - sorry I forgot that you have to add override to all methods. > > My understanding is that we need at least one stringStream, and the code is trying to avoid having two stacked stringStreams, which would mean extra memory footprint and copying. So I agree the proposed is_buffered() is really a proxy for "do I have a stringStream?" because bufferedStream does not guarantee coherent output by itself. What if we just live with the inefficiency of having two stringStreams for now? Thanks, ok, I'll drop the is_buffered() change. Would it be acceptable to go back to my earlier proposal: pass an explicit "buffering/coherent-output"
parameter at the call site so we can use an existing stringStream and avoid double buffering? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29186#discussion_r2697031873 From thartmann at openjdk.org Fri Jan 16 07:14:07 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 16 Jan 2026 07:14:07 GMT Subject: [jdk26] RFR: 8360271: String.indexOf intrinsics fail with +EnableX86ECoreOpts and -CompactStrings Message-ID: Hi all, This pull request contains a backport of commit [1d889b92](https://github.com/openjdk/jdk/commit/1d889b92bde5dfcb1fbe6cddb389a77f92eb1ce7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Volodymyr Paprotski on 15 Jan 2026 and was reviewed by Tobias Hartmann, Jatin Bhateja and Sandhya Viswanathan. Thanks! ------------- Commit messages: - Backport 1d889b92bde5dfcb1fbe6cddb389a77f92eb1ce7 Changes: https://git.openjdk.org/jdk/pull/29263/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29263&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360271 Stats: 19 lines in 2 files changed: 11 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/29263.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29263/head:pull/29263 PR: https://git.openjdk.org/jdk/pull/29263 From epeter at openjdk.org Fri Jan 16 07:52:41 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 07:52:41 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v4] In-Reply-To: References: <-nGdkKB4aDkSwWlygwqkQ4tJ5s0DUwCfhyugVZtEDZk=.c4db9089-0dfa-4dca-a90e-d31639dd617a@github.com> Message-ID: <5lhbA8iGLm8PeGmQgHaW06-AyfmAzzHUXfDjyV6RF5k=.531bbd83-61f6-4d66-99a4-e9ddbc59e142@github.com> On Thu, 15 Jan 2026 17:58:41 GMT, Vladimir Ivanov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add dotProductF > > 
test/hotspot/jtreg/compiler/vectorization/TestVectorAlgorithms.java line 97: > >> 95: framework.addFlags("--add-modules=jdk.incubator.vector", "-XX:CompileCommand=inline,*VectorAlgorithmsImpl::*"); >> 96: switch (args[0]) { >> 97: case "vanilla" -> { /* no extra flags */ } > > It would be more flexible to let arbitrary VM flags be appended. What exactly are you suggesting here? Are you suggesting that instead of doing: ` * @run driver ${test.main.class} noSuperWord` we could do ` * @run driver ${test.main.class} -XX:-OptimizeFill` And then just `framework.addFlags(args)`? > test/micro/org/openjdk/bench/vm/compiler/VectorAlgorithms.java line 95: > >> 93: } >> 94: >> 95: @Setup(Level.Iteration) > > Resetting after each iteration may introduce too much noise. Also, it makes it harder to reproduce input-dependent variance. Maybe resetting inputs between forks is a good compromise. I've considered the options here. Maybe I can add some comments in the benchmark later, once we've discussed the arguments. Let's consider the options: - `Level.Invocation`: this would definitively lead to too much noise, as we would do about equal if not more work in the `Setup` compared to the `Benchmark`. - `Level.Iteration`: In my case, I set the iteration time to `100ms`, so that is quite a bit of time, and dwarfs the time needed for `Setup`. So I think noise is not a big deal here. - `Level.Trial` would be once per fork, which would mean starting up a new VM, and re-compiling all the methods.
The runtime is basically uniformly distributed over the length of the array. That's why I run `50` iterations that are relatively short `100ms`, but not too short so that the `Setup` does not dominate too much. @iwanowww What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2697368748 PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2697357507 From epeter at openjdk.org Fri Jan 16 07:52:42 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 07:52:42 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v4] In-Reply-To: <5lhbA8iGLm8PeGmQgHaW06-AyfmAzzHUXfDjyV6RF5k=.531bbd83-61f6-4d66-99a4-e9ddbc59e142@github.com> References: <-nGdkKB4aDkSwWlygwqkQ4tJ5s0DUwCfhyugVZtEDZk=.c4db9089-0dfa-4dca-a90e-d31639dd617a@github.com> <5lhbA8iGLm8PeGmQgHaW06-AyfmAzzHUXfDjyV6RF5k=.531bbd83-61f6-4d66-99a4-e9ddbc59e142@github.com> Message-ID: On Fri, 16 Jan 2026 07:45:57 GMT, Emanuel Peter wrote: >> test/micro/org/openjdk/bench/vm/compiler/VectorAlgorithms.java line 95: >> >>> 93: } >>> 94: >>> 95: @Setup(Level.Iteration) >> >> Resetting after each iteration may introduce too much noise. Also, it makes it harder to reproduce input dependent variance. Maybe resetting inputs between forks is a good compromise. > > I've considered the options here. Maybe I can add some comments in the benchmark later, once we've discussed the arguments. > > Let's consider the options: > - `Level.Invocation`: this would definitively lead to too much noise, as we would do about equal if not more work in the `Setup` compared to the `Benchmark`. > - `Level.Iteration`: In my case, I set the iteration time to `100ms`, so that is quite a bit of time, and dwarfs the time needed for `Setup`. So I think noise is not a big deal here. > - `Level.Trial` would be once per fork, which would mean starting up a new VM, and re-compiling all the methods. 
It would also mean that we could get different profiling leading to different compilations (e.g. unstable-if). > > I think `Level.Iteration` strikes a good balance here. > > Note: I need to reset the data many times, because some benchmarks like `findI` may have drastically different runtime depending on the data. `findI` has an early exit, so if the exit is at the beginning of the array, runtime is low, and if it is at the end of the array the runtime is high. The runtime is basically uniformly distributed over the length of the array. That's why I run `50` iterations that are relatively short `100ms`, but not too short so that the `Setup` does not dominate too much. > > @iwanowww What do you think? >Maybe resetting inputs between forks is a good compromise. Given the 3 options for `Level`, `Iteration` is in the middle, so that would be the compromise ;) If I did go with per-fork `Setup`, I would have to have `50` forks, which means we would have to do warmup for each of the `50` forks. It would drive up the runtime quite a lot. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2697363204 From epeter at openjdk.org Fri Jan 16 08:09:05 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 08:09:05 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v4] In-Reply-To: References: <-nGdkKB4aDkSwWlygwqkQ4tJ5s0DUwCfhyugVZtEDZk=.c4db9089-0dfa-4dca-a90e-d31639dd617a@github.com> Message-ID: On Thu, 15 Jan 2026 18:08:27 GMT, Vladimir Ivanov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add dotProductF > > test/hotspot/jtreg/compiler/vectorization/TestVectorAlgorithms.java line 218: > >> 216: >> 217: // X4 oop setup. >> 218: oopsX4 = new int[size]; > > Any particular reason to keep input data initialization duplicated between test and benchmark modes? Yes. Because they are not exactly the same.
One is designed to test the implementation, the other to deliver reasonably stable and meaningful benchmarks. Some examples of the differences: - In the "test" environment, I have access to test libraries like `Generators.java`, which are better at generating edge-cases than regular `Random`. - In the benchmark, the size is a fixed parameter `SEED`. In the test, it is a randomly chosen value, so we can test better for alignment/drain/post loops. - `eI` can be chosen randomly in each iteration of the test. But for the benchmark it is better if we have an array of values to choose from, so that we can pick different values for each benchmark invocation. But there is still a lot of overlap. I could try to split it into a "shared" and "local" part, and stuff the "shared" part into `VectorAlgorithmsImpl.Data`. @iwanowww Do you think that is worth it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2697409014 From epeter at openjdk.org Fri Jan 16 08:20:27 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 08:20:27 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v5] In-Reply-To: References: Message-ID: <4oX30eH34qM7pVxT3HgnqRcprVQGiMBNcu_Yqq9VDas=.009cf016-10b0-4b3d-8d10-59ba9099c62e@github.com> > This is exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. > > Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) > > **Discussion** > > Observations: > - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. > - If the Vector API code can be fully intrinsified, we get fast code.
But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. > - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow > - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow > - `linux_x64_oci_server`: Vector API leads to really nice speedups > - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks > - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. > - Compact Object Headers has some negative effect on some loop benchmarks. > - `linux_aarch64_server`: `reduceAddI`, `copyI` > - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` > - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? > - `windows_x64_oci_server`: `reduceAddI` and some others a little bit > - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` > - Intrinsics can be much faster than auto-vectorized or Vector API code. > - `linux_aarch64_server`: `copyI` > - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. > - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact; I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). > > **Benchmark Plots** > > Units: nanoseconds per algorithm invocation. > > `linux_x64_oci` > algo_linux_x64_oci_server > > `windows_x64_oci` > algo_windows_x64_oci_server > > `macosx_x64_sandybridge` > algo_macosx_x64_sandybridge>> 1873: // or "PVectMask"). >>> 1874: class VectorLoadMaskNode : public VectorNode { >> >> I'd love to rename this. Because it is (as you say in the comments) a conversion, and not a "load" (memory op). >> What about `VectorConvertBooleans2MaskNode`? >> >> And below, rename `VectorStoreMaskNode` to `VectorConvertMask2BooleansNode`. >> >> You may have an even better idea. > > Hi @eme64, do you have any insights on this?
I'm wondering whether separating the renaming change into a different PR would be acceptable to you. Additionally, do you think it's better that we start a dedicated thread on the mailing list or GitHub to discuss this further and gather more feedback from others? I'm fine with a separate RFE. Why not file an RFE, and then we can discuss on JIRA and in the PR that we will create from it? That will keep the conversation accessible to all, and once the change is made, people can find the conversation that led up to the change more easily :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2697451228 From epeter at openjdk.org Fri Jan 16 08:31:35 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 08:31:35 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v3] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 05:41:26 GMT, Xiaohong Gong wrote: >> The VectorMask implementation in Vector API involves complex interactions between types, nodes, and platform-specific >> features, making the related code in HotSpot difficult to understand and review. >> >> This patch adds comprehensive comments for vector mask related types, nodes, and methods in C2 to improve code clarity and >> maintainability. >> >> Note: This patch only adds comments; no functional changes are made. > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Update comments in type.cpp src/hotspot/share/opto/vectornode.hpp line 1513: > 1511: // and incrementing by 1 up to "VLENGTH - 1". So far, the first input is an int > 1512: // constant 0. For example, a 128-bit vector with int (32-bit) elements produces > 1513: // a vector like "[0, 1, 2, 3]". Are you saying that `in1` has to be a constant with value zero? Actually, it seems the backend just ignores the input value... so why do we even have it?
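As a scalar model of the lane values described in that comment (start at 0, step 1, up to `VLENGTH - 1`, where the lane count is the vector width divided by the element width), the hypothetical `iota` helper below only mirrors the documented result; it says nothing about how C2 materializes the constant or what role `in1` plays.

```java
import java.util.Arrays;

public class IotaModel {
    // VLENGTH = vector bits / element bits; lane i holds the value i.
    static int[] iota(int vectorBits, int elementBits) {
        int vlength = vectorBits / elementBits;
        int[] lanes = new int[vlength];
        for (int i = 0; i < vlength; i++) {
            lanes[i] = i;
        }
        return lanes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(iota(128, 32))); // [0, 1, 2, 3]
        System.out.println(Arrays.toString(iota(256, 32))); // [0, 1, 2, 3, 4, 5, 6, 7]
    }
}
```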
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2697471218 From epeter at openjdk.org Fri Jan 16 08:48:49 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 08:48:49 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v3] In-Reply-To: References: Message-ID: <24LD4MUZsphnFJ8aK4Vg3PIUiP-tbEAK5ZaRlxozAlE=.6eada449-6395-4dd0-a335-f6c4362e7af7@github.com> On Wed, 14 Jan 2026 05:41:26 GMT, Xiaohong Gong wrote: >> The VectorMask implementation in Vector API involves complex interactions between types, nodes, and platform-specific >> features, making the related code in HotSpot difficult to understand and review. >> >> This patch adds comprehensive comments for vector mask related types, nodes, and methods in C2 to improve code clarity and >> maintainability. >> >> Note: This patch only adds comments; no functional changes are made. > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Update comments in type.cpp src/hotspot/share/opto/vectornode.hpp line 1912: > 1910: // Unsigned vector cast operations can only be used in Vector API unsigned > 1911: // extensions between integral types so far. E.g., extending byte to float > 1912: // is not supported now. Actually, we could totally implement it for the Auto Vectorizer. It would be methods like `Byte.toUnsignedInt`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2697519994 From epeter at openjdk.org Fri Jan 16 08:56:16 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 08:56:16 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v3] In-Reply-To: <9w7aZ_5AzZkyKYo4Rg8E8tRLidYkFtoKVYVOrPj4TkA=.aaaf1cc4-2b04-4324-a2ef-2936b519b254@github.com> References: <24LD4MUZsphnFJ8aK4Vg3PIUiP-tbEAK5ZaRlxozAlE=.6eada449-6395-4dd0-a335-f6c4362e7af7@github.com> <9w7aZ_5AzZkyKYo4Rg8E8tRLidYkFtoKVYVOrPj4TkA=.aaaf1cc4-2b04-4324-a2ef-2936b519b254@github.com> Message-ID: On Fri, 16 Jan 2026 08:50:02 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.hpp line 1912: >> >>> 1910: // Unsigned vector cast operations can only be used in Vector API unsigned >>> 1911: // extensions between integral types so far. E.g., extending byte to float >>> 1912: // is not supported now. >> >> Actually, we could totally implement it for the Auto Vectorizer. It would be methods like `Byte.toUnsignedInt`. > > I filed it here with an example: > [JDK-8375502](https://bugs.openjdk.org/browse/JDK-8375502) C2 SuperWord: implement unsigned casts > > If anybody is interested in taking on this task, feel free to reassign it to yourself :) I'd suggest to remove the comment about "Vector API only" in the code. Instead, I would explain that unsigned cast means zero-extension if dst is larger than src. 
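The suggested wording can be made concrete. When the destination type is wider than the source, a signed cast sign-extends and an unsigned cast zero-extends. The scalar loops below are a minimal sketch of both widenings from `byte` to `int`; the class and method names are hypothetical. The unsigned loop uses `Byte.toUnsignedInt`, the kind of method mentioned above as a candidate for auto-vectorization under JDK-8375502. The sketch makes no claim about whether C2 vectorizes either loop today.

```java
import java.util.Arrays;

// Hypothetical example: byte -> int widening, signed vs. unsigned.
final class UnsignedCastSketch {
    // Signed cast: the implicit (int) conversion sign-extends, so 0x80 becomes -128.
    static int[] signedWiden(byte[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    // Unsigned cast: Byte.toUnsignedInt zero-extends, so 0x80 becomes 128.
    static int[] unsignedWiden(byte[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = Byte.toUnsignedInt(src[i]); // equivalent to src[i] & 0xFF
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] input = {0, 1, 127, (byte) 0x80, (byte) 0xFF};
        System.out.println(Arrays.toString(signedWiden(input)));   // [0, 1, 127, -128, -1]
        System.out.println(Arrays.toString(unsignedWiden(input))); // [0, 1, 127, 128, 255]
    }
}
```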
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2697536340 From epeter at openjdk.org Fri Jan 16 08:56:16 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 08:56:16 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v3] In-Reply-To: <24LD4MUZsphnFJ8aK4Vg3PIUiP-tbEAK5ZaRlxozAlE=.6eada449-6395-4dd0-a335-f6c4362e7af7@github.com> References: <24LD4MUZsphnFJ8aK4Vg3PIUiP-tbEAK5ZaRlxozAlE=.6eada449-6395-4dd0-a335-f6c4362e7af7@github.com> Message-ID: <9w7aZ_5AzZkyKYo4Rg8E8tRLidYkFtoKVYVOrPj4TkA=.aaaf1cc4-2b04-4324-a2ef-2936b519b254@github.com> On Fri, 16 Jan 2026 08:45:56 GMT, Emanuel Peter wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Update comments in type.cpp > > src/hotspot/share/opto/vectornode.hpp line 1912: > >> 1910: // Unsigned vector cast operations can only be used in Vector API unsigned >> 1911: // extensions between integral types so far. E.g., extending byte to float >> 1912: // is not supported now. > > Actually, we could totally implement it for the Auto Vectorizer. It would be methods like `Byte.toUnsignedInt`. 
I filed it here with an example: [JDK-8375502](https://bugs.openjdk.org/browse/JDK-8375502) C2 SuperWord: implement unsigned casts If anybody is interested in taking on this task, feel free to reassign it to yourself :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2697533321 From mhaessig at openjdk.org Fri Jan 16 08:57:36 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 16 Jan 2026 08:57:36 GMT Subject: [jdk26] RFR: 8360271: String.indexOf intrinsics fail with +EnableX86ECoreOpts and -CompactStrings In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 07:08:06 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [1d889b92](https://github.com/openjdk/jdk/commit/1d889b92bde5dfcb1fbe6cddb389a77f92eb1ce7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Volodymyr Paprotski on 15 Jan 2026 and was reviewed by Tobias Hartmann, Jatin Bhateja and Sandhya Viswanathan. > > Thanks! Thank you for taking care of the backport. Looks good. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/29263#pullrequestreview-3669616756 From jbhateja at openjdk.org Fri Jan 16 09:03:48 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 16 Jan 2026 09:03:48 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v3] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 05:41:26 GMT, Xiaohong Gong wrote: >> The VectorMask implementation in Vector API involves complex interactions between types, nodes, and platform-specific >> features, making the related code in HotSpot difficult to understand and review. >> >> This patch adds comprehensive comments for vector mask related types, nodes, and methods in C2 to improve code clarity and >> maintainability. >> >> Note: This patch only adds comments; no functional changes are made.
> > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Update comments in type.cpp Thanks @XiaohongGong, looks good. I have also created https://bugs.openjdk.org/browse/JDK-8375498 so that we enable dumping of VectorIR created through vector inline expanders similar to auto-vectorization. src/hotspot/share/opto/vectornode.hpp line 57: > 55: // operations. While BVectMask primarily represents mask values loaded from or > 56: // stored to Java boolean memory, and is currently used in certain mask operations > 57: // (i.e. VectorMaskOpNode). You can also specify here that BVectMask is a specific representation tied to mask backing storage and VectorLoadMask and VectorStoreMask IR is needed to transform it to/from P/NVectMask ------------- PR Review: https://git.openjdk.org/jdk/pull/29130#pullrequestreview-3669612253 PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2697542602 From duke at openjdk.org Fri Jan 16 09:04:53 2026 From: duke at openjdk.org (duke) Date: Fri, 16 Jan 2026 09:04:53 GMT Subject: Withdrawn: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet In-Reply-To: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Thu, 26 Jun 2025 12:13:19 GMT, Samuel Chee wrote: > AtomicLong.compareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg: > > ;; cmpxchg { > 0x0000e708d144cf60: mov x8, x2 > 0x0000e708d144cf64: casal x8, x3, [x0] > 0x0000e708d144cf68: cmp x8, x2 > ;; 0x1F1F1F1F1F1F1F1F > 0x0000e708d144cf6c: mov x8, #0x1f1f1f1f1f1f1f1f > ;; } cmpxchg > 0x0000e708d144cf70: cset x8, ne // ne = any > 0x0000e708d144cf74: dmb ish > > > According to the Oracle Java Specification, AtomicLong.compareAndSet [1] has the same memory effects as specified by
VarHandle.compareAndSet which has the following effects: [2] > >> Atomically sets the value of a variable to the >> newValue with the memory semantics of setVolatile if >> the variable's current value, referred to as the witness >> value, == the expectedValue, as accessed with the memory >> semantics of getVolatile. > > > > Hence the release on the store due to setVolatile only occurs if the compare is successful. Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen. > > Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant) > > This is also reflected by C2 not having a dmb for the same respective method. > > [1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) > [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...) This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/26000 From epeter at openjdk.org Fri Jan 16 09:14:59 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 09:14:59 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Thu, 15 Jan 2026 16:06:21 GMT, Fei Gao wrote: >>> * If we cover all phis: we should probably change the naming of the variables to indicate that it is any `phi` and not just the `iv`. >>> >>> What do you think? >> >> Yes, we're looking at all variables that change as the loop iterates, not just the trip counter. How about renaming it to `main_exit_value` or `value_after_main_loop`?
> >> Also: how sure are you that the backedge `main_phi->in(LoopNode::LoopBackControl)` is the same as the value after main `iv_after_main`? What if we did some split-through-phi action at some point? Example: >> >> ``` >> x = ... >> LOOP: >> x = op(x); >> // x now serves as exit value and backedge value >> exit check; >> goto LOOP; >> ``` >> >> If we split `op` through the LOOP Phi, we get: >> >> ``` >> x = ... >> x = op(x); >> LOOP: >> // the exit value is the phi >> exit check; >> x = op(x); >> // x after op is the backedge >> goto LOOP; >> ``` >> >> I'm not sure this currently ever happens, but what if it did? > > Great question, and yes, that's exactly why this logic is a bit more involved. > > If I understand correctly, in the second scenario (after a split-through-phi), the value we need after the main loop is not the original x directly. Instead, we also need to apply an additional op(x) so that the resulting value can serve as the input to the drain loop. > > > x = ... > y = op(x); > MAIN_LOOP: > // the exit value is the phi > exit check; > y = op(x); > // x after op is the backedge > goto MAIN_LOOP; > > MAIN_EXIT: > y = op(x); // the newly created op(x) outside the main-loop body > > DRAIN_LOOP: > // the exit value is the phi > exit check; > y = op(x); > // x after op is the backedge > goto DRAIN_LOOP; > > > > Let's walk through the code step by step. > > > Node* drain_input = nullptr; > Node* iv_after_main = main_phi->in(LoopNode::LoopBackControl); > > At this point, we do **not** assume that `iv_after_main` is already the correct "after-main" value. > > if (get_ctrl(iv_after_main) != main_backedge_ctrl) { > drain_input = find_merge_phi_for_vectorized_drain(iv_after_main, main_merge_region); > } > > We first check whether `main_phi->in(LoopNode::LoopBackControl)` is pinned in the backedge block. > > 1. If `get_ctrl(iv_after_main) != main_backedge_ctrl`, then `main_phi->in(LoopNode::LoopBackControl)` is **not** in the backedge block.
It should be **either in the main loop body or in the main exit block.** > > - If it is still in the main loop body, we won't be able to find a valid merge phi via its uses, so **`drain_input` remains nullptr**. > > - If it is already in the main exit block, we can find an existing valid merge phi via its uses. This corresponds to the first scenario you described: **`drain_input` won't be nullptr**. > > 2. If `get_ctrl(iv_after_main) == main_backedge_ctrl`, then the `main_phi->in(LoopNode::LoopBackControl)` is pinned in the b... Ah, I see, so the magic is supposed to happen in `clone_up_backedge_goo`. Thanks for all the detailed explanations! I think I had missed the recursive approach. I had just relied on the old comment above `clone_up_backedge_goo` that only talks about `n` and its direct clone. It fails to mention any of the recursive part, that walks the whole chain. I think it would be worth spelling it out a bit more for `clone_up_backedge_goo`, and writing down why what it does is correct. I think that would help a lot, especially because you are now using it in a new way, and so we have to make sure things are correct, and it is easy for the reader to see why :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697596988 From xgong at openjdk.org Fri Jan 16 09:18:01 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 16 Jan 2026 09:18:01 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v3] In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 08:53:13 GMT, Jatin Bhateja wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Update comments in type.cpp > > src/hotspot/share/opto/vectornode.hpp line 57: > >> 55: // operations. While BVectMask primarily represents mask values loaded from or >> 56: // stored to Java boolean memory, and is currently used in certain mask operations >> 57: // (i.e.
VectorMaskOpNode). > > You can also specify here that BVectMask representation is specifically tied to mask backing storage and VectorLoadMask and VectorStoreMask IR is needed to transform it to/from P/NVectMask Thanks for looking at this PR. I added the mask type comment at the IR definition code in `vectornode.hpp` for these two IRs. Is that fine to you? Or we'd better comment here as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2697609026 From xgong at openjdk.org Fri Jan 16 09:18:03 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 16 Jan 2026 09:18:03 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v3] In-Reply-To: References: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> Message-ID: On Fri, 16 Jan 2026 08:20:40 GMT, Emanuel Peter wrote: >> Hi @eme64 , do you have any insights on this? I'm wondering whether separating the renaming change into a different PR would be acceptable to you. Additionally, do you think it's better that we start a dedicated thread on the mailing list or github to discuss this further and gather more feedback from others? > > I'm fine with a separate RFE. Why not file an RFE, and then we can discuss on JIRA, and the PR that we will create from it? That will keep the conversation accessible to all and once the change is made people can find the conversation that led up to the change more easily :) Sounds good. I've filed a JBS issue (https://bugs.openjdk.org/browse/JDK-8375509) to track this.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2697600461 From epeter at openjdk.org Fri Jan 16 09:22:16 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 09:22:16 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: <_7sefVQwDEqTNjRhKbwgUEAoMttSbsW1DIRnWoWJ2M4=.04e87f22-fd62-44ba-8f8c-b6faad29f83f@github.com> On Thu, 15 Jan 2026 16:50:51 GMT, Fei Gao wrote: >> src/hotspot/share/opto/loopTransform.cpp line 1482: >> >>> 1480: pre_incr = clone_up_backedge_goo(nullptr, main_merge_region->in(1), pre_incr, visited, clones); >>> 1481: } >>> 1482: drain_input->set_req(1, pre_incr); >> >> Just a control question: above you did: >> `drain_input = PhiNode::make(main_merge_region, iv_after_main);` >> Does that not put `iv_after_main` at slot `1`, and now we overwrite it with `pre_incr`? > > Yes, this is what we're generating: > `drain_input = merge (pre_incr, iv_after_main)` Ah, I think you are right. `PhiNode::make` does something I did not expect: it creates a Phi that has as many slots for merging as the region, and puts the specified value in ALL slots. 
So at first we get: `Phi(self, iv_after_main, iv_after_main)` And after `set_req`, we then have: `Phi(self, pre_incr, iv_after_main)` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697612719 From epeter at openjdk.org Fri Jan 16 09:22:12 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 09:22:12 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <_gdObRdkYkS7d3fQQ6bcms709TpeM1IQtuPJtI0fcyE=.073d0496-8dfc-48a9-aee8-b64b408a6e62@github.com> Message-ID: On Tue, 13 Jan 2026 15:10:29 GMT, Fei Gao wrote: >> @fg1417 I hope you had a good start into the new year. I'd love to make this PR a bit of a priority in the next weeks. Would you mind syncing with master and fixing merge conflicts? >> >> I'd review, run testing and look into running some benchmarks myself. > > Hi @eme64 the PR is ready for review and testing. Thanks! @fg1417 Thanks for all the responses :) I'll review some different parts of the patch now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3758915137 From jbhateja at openjdk.org Fri Jan 16 09:25:10 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 16 Jan 2026 09:25:10 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v5] In-Reply-To: <4oX30eH34qM7pVxT3HgnqRcprVQGiMBNcu_Yqq9VDas=.009cf016-10b0-4b3d-8d10-59ba9099c62e@github.com> References: <4oX30eH34qM7pVxT3HgnqRcprVQGiMBNcu_Yqq9VDas=.009cf016-10b0-4b3d-8d10-59ba9099c62e@github.com> Message-ID: On Fri, 16 Jan 2026 08:20:27 GMT, Emanuel Peter wrote: >> This is exploratory work. I wanted to use auto-vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. >> >> Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) >> >> **Discussion** >> >> Observations: >> - If the loop can be auto-vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. >> - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. >> - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow
>> >> Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) >> >> **Discussion** >> >> Observations: >> - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. >> - If the Vector API code can be fully intrinsified, we get fast code. But somtimes, the Vector API is horribly slow, much slower than scalar loop performance. >> - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow >> - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow >> - `linux_x64_oci_server`: Vector API leads to really nice speedups >> - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks >> - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. >> - Compact Object Headers has some negative effect on some loop benchmarks. >> - `linux_aarch64_server`: `reduceAddI`, `copyI` >> - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` >> - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? >> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit >> - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` >> - Intrinsics can be much faster than auto vectoirzed or Vector API code. >> - `linux_aarch64_server`: `copyI` >> - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. >> - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact, I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). >> >> **Benchmark Plots** >> >> Units: nanoseconds per algorithm invocation. 
>> >> `linux_x64_oci` >> algo_linux_x64_oci_server >> >> `windows_x64_oci` >> algo_windows_x64_oci_server >> >> `macosx_x64_sandybridge` >> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix small issue in benchmark and add comment test/micro/org/openjdk/bench/vm/compiler/VectorAlgorithmsImpl.java line 61: > 59: } > 60: return r; > 61: } You can add another flavor for vector API kernels where tail is implemented using masked operations. if (i < r.length) { VectorMask<Integer> mask = SPECIES_I.indexInRange(i, r.length); v.intoArray(r, i, mask); } Simply replicated the loop body guarded by Mask. https://github.com/openjdk/jdk/pull/28002/changes#diff-b5c49811dff21107eb8c8ab0578be4cd235c6f69bafd879a8e4b4620b974c25bR153-R159 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2697649279 From fgao at openjdk.org Fri Jan 16 09:33:15 2026 From: fgao at openjdk.org (Fei Gao) Date: Fri, 16 Jan 2026 09:33:15 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Wed, 14 Jan 2026 14:47:06 GMT, Emanuel Peter wrote: >> Fei Gao has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 13 commits: >> >> - Fix build failure after rebasing and address review comments >> - Merge branch 'master' into optimize-atomic-post >> - Fixed new test failures after rebasing and refined parts of the code to address review comments >> - Merge branch 'master' into optimize-atomic-post >> - Merge branch 'master' into optimize-atomic-post >> - Clean up comments for consistency and add spacing for readability >> - Fix some corner case failures and refined part of code >> - Merge branch 'master' into optimize-atomic-post >> - Refine ascii art, rename some variables and resolve conflicts >> - Merge branch 'master' into optimize-atomic-post >> - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 > > src/hotspot/share/opto/loopTransform.cpp line 1464: > >> 1462: // We try to look up target phi from all uses of node 'iv_after_main'. >> 1463: drain_input = find_merge_phi_for_vectorized_drain(iv_after_main, main_merge_region); >> 1464: } > > What is the if for here? Why do we need that condition? > Ah, I suppose if `iv_after_main` is not on the backedge, it is in the main-loop body, right? > Still, I don't see through yet ... can you clarify? As explained above, only when `main_phi->in(LoopNode::LoopBackControl)` is already in the main-exit block can we find an existing valid merge phi via its uses. Otherwise, `drain_input` will remain `nullptr` at this point, so this `if` check doesn't add much value. It should be safe to remove it. > src/hotspot/share/opto/loopTransform.cpp line 1491: > >> 1489: // Remove the new phi from the graph and use the hit >> 1490: _igvn.remove_dead_node(drain_input); >> 1491: drain_input = hit; > > Does this actually ever happen? Would we not have expected that `find_merge_phi_for_vectorized_drain` would have succeeded? Agreed. It's safe to remove this part of the code.
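The merge discussed in this thread, `drain_input = merge(pre_incr, iv_after_main)`, can be pictured at the Java level. The sketch below is a hand-written model, not C2 output. It splits a summation into four parts: a pre loop, a main loop behind a zero-trip guard, a drain loop, and a scalar post loop. The strides (8 for the main loop, 4 for the drain loop) and the 3-iteration pre loop are arbitrary assumptions. The point is the merge before the drain loop. When the main loop's guard fails, the drain loop starts from the pre loop's exit value instead of being skipped, which is how we read the fix for small trip counts.

```java
// Hand-written model of a pre / main / drain / post loop split; not C2 output.
// The strides and the pre-loop length are arbitrary assumptions.
final class DrainLoopSketch {
    static final int PRE_ITERS = 3;    // hypothetical number of pre-loop iterations
    static final int MAIN_STRIDE = 8;  // hypothetical main-loop stride
    static final int DRAIN_STRIDE = 4; // hypothetical drain-loop stride

    record Result(long sum, int mainIters, int drainIters) {}

    static Result sum(int[] a) {
        int n = a.length;
        long s = 0;

        // Pre loop.
        int i = 0;
        for (; i < Math.min(n, PRE_ITERS); i++) {
            s += a[i];
        }
        int preIncr = i; // iv value when the pre loop exits

        // Main loop, protected by a zero-trip guard.
        int mainIters = 0;
        int ivAfterMain = -1;
        boolean mainRan = n - preIncr >= MAIN_STRIDE;
        if (mainRan) {
            int m = preIncr;
            do {
                for (int k = 0; k < MAIN_STRIDE; k++) {
                    s += a[m + k];
                }
                m += MAIN_STRIDE;
                mainIters++;
            } while (n - m >= MAIN_STRIDE);
            ivAfterMain = m;
        }

        // Merge point: models drain_input = Phi(region, pre_incr, iv_after_main).
        int drainInput = mainRan ? ivAfterMain : preIncr;

        // Drain loop: reachable on both paths, so it also runs for small trip counts.
        int j = drainInput;
        int drainIters = 0;
        while (n - j >= DRAIN_STRIDE) {
            for (int k = 0; k < DRAIN_STRIDE; k++) {
                s += a[j + k];
            }
            j += DRAIN_STRIDE;
            drainIters++;
        }

        // Scalar post loop handles the remainder.
        for (; j < n; j++) {
            s += a[j];
        }
        return new Result(s, mainIters, drainIters);
    }
}
```

For example, with 8 elements the pre loop consumes 3 and only 5 remain. The main loop's guard therefore fails, yet the drain loop still runs once starting from index 3, and the post loop handles the last element.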
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697648087 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697613445 From jbhateja at openjdk.org Fri Jan 16 09:34:38 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 16 Jan 2026 09:34:38 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v3] In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 09:14:05 GMT, Xiaohong Gong wrote: >> src/hotspot/share/opto/vectornode.hpp line 57: >> >>> 55: // operations. While BVectMask primarily represents mask values loaded from or >>> 56: // stored to Java boolean memory, and is currently used in certain mask operations >>> 57: // (i.e. VectorMaskOpNode). >> >> You can also specify here that BVectMask representation is specifically tied to mask backing storage and VectorLoadMask and VectorStoreMask IR is needed to transform it to/from P/NVectMask > > Thanks for looking at this PR. I added the mask type comment at the IR definition code in `vectornode.hpp` for these two IRs. Is that fine to you? Or we'd better comment here as well. A mention here will complete the link between B and N/P representations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2697674507 From xgong at openjdk.org Fri Jan 16 09:34:38 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 16 Jan 2026 09:34:38 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v3] In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 09:28:49 GMT, Jatin Bhateja wrote: >> Thanks for looking at this PR. I added the mask type comment at the IR definition code in `vectornode.hpp` for these two IRs. Is that fine to you? Or we'd better comment here as well. > > A mention here will complete the link between B and N/P representations. OK. I will add it with the next commit. Thanks for your suggestion!
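The link between the B and N/P representations can be modeled in plain Java. This is a hypothetical sketch under stated assumptions, not HotSpot code. It assumes that a B-style mask stores one 0/1 byte per lane, which is the layout of a Java `boolean[]`. As the names suggest, an N-style mask is taken to live in a normal vector register with every bit of a lane either set or clear, modeled here as `int` lanes of -1 or 0. A P-style (predicate) mask is modeled as one bit per lane packed into a `long`, using the least-significant-bit-first order that `VectorMask.toLong()` documents. The conversion methods stand in for the roles the review assigns to `VectorLoadMask` (memory form to register form) and `VectorStoreMask` (register form back to memory form).

```java
// Hypothetical model of three mask representations; not HotSpot code.
final class MaskReprSketch {
    // "B" form: one byte per lane holding 0 or 1, as a Java boolean[] is stored.
    static byte[] toBytes(boolean[] lanes) {
        byte[] b = new byte[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            b[i] = (byte) (lanes[i] ? 1 : 0);
        }
        return b;
    }

    // "N" form (modeled): each int lane is all ones (-1) or all zeros (0).
    // Plays the role of a VectorLoadMask-style conversion from the byte form.
    static int[] bytesToLaneMask(byte[] b) {
        int[] n = new int[b.length];
        for (int i = 0; i < b.length; i++) {
            n[i] = (b[i] != 0) ? -1 : 0;
        }
        return n;
    }

    // "P" form (modeled): bit i of a long is set if lane i is set,
    // the same bit order VectorMask.toLong() documents.
    static long bytesToBits(byte[] b) {
        if (b.length > 64) {
            throw new IllegalArgumentException("at most 64 lanes fit in a long");
        }
        long bits = 0;
        for (int i = 0; i < b.length; i++) {
            if (b[i] != 0) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Back to the byte form, in the spirit of VectorStoreMask.
    static byte[] laneMaskToBytes(int[] n) {
        byte[] b = new byte[n.length];
        for (int i = 0; i < n.length; i++) {
            b[i] = (byte) (n[i] != 0 ? 1 : 0);
        }
        return b;
    }
}
```

For lanes `{true, false, true, true}`, the byte form is `[1, 0, 1, 1]`, the lane-mask form is `[-1, 0, -1, -1]`, and the packed bits are `0b1101`, which is 13.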
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2697685857 From xgong at openjdk.org Fri Jan 16 09:34:43 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 16 Jan 2026 09:34:43 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v3] In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 08:28:08 GMT, Emanuel Peter wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Update comments in type.cpp > > src/hotspot/share/opto/vectornode.hpp line 1513: > >> 1511: // and incrementing by 1 up to "VLENGTH - 1". So far, the first input is an int >> 1512: // constant 0. For example, a 128-bit vector with int (32-bit) elements produces >> 1513: // a vector like "[0, 1, 2, 3]". > > Are you saying that `in1` has to be a constant with value zero? Actually, it seems the backend just ignores the input value... so why do we even have it? Yes, `in1` is a constant zero and it can be ignored in backend codegen. I'm unsure why we still need it; it deserves an investigation. Maybe it's necessary for a floating node that has no control input for some phases? I'm not sure.
When I tried to remove it, I hit HotSpot crashes when running jtreg tests: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (jdk/src/hotspot/share/opto/node.hpp:418), pid=2345980, tid=2346012 # assert(i < _max) failed: oob: i=1, _max=1 The call stack is: Stack: [0x0000e66182a7f000,0x0000e66182c7d000], sp=0x0000e66182c775d0, free space=2017k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x138a0ac] Node::in(unsigned int) const [clone .part.0]+0x2c (node.hpp:418) V [libjvm.so+0x13a36b4] PhaseIdealLoop::get_early_ctrl(Node*)+0x400 (node.hpp:418) V [libjvm.so+0x13a48e8] PhaseIdealLoop::build_loop_early(VectorSet&, Node_List&, Node_Stack&)+0x658 (loopnode.cpp:250) V [libjvm.so+0x13b26e4] PhaseIdealLoop::build_and_optimize()+0x654 (loopnode.cpp:5134) V [libjvm.so+0xa35ba0] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x360 (loopnode.hpp:1233) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2697682558 From xgong at openjdk.org Fri Jan 16 09:34:45 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 16 Jan 2026 09:34:45 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v3] In-Reply-To: References: <24LD4MUZsphnFJ8aK4Vg3PIUiP-tbEAK5ZaRlxozAlE=.6eada449-6395-4dd0-a335-f6c4362e7af7@github.com> <9w7aZ_5AzZkyKYo4Rg8E8tRLidYkFtoKVYVOrPj4TkA=.aaaf1cc4-2b04-4324-a2ef-2936b519b254@github.com> Message-ID: On Fri, 16 Jan 2026 08:51:01 GMT, Emanuel Peter wrote: >> I filed it here with an example: >> [JDK-8375502](https://bugs.openjdk.org/browse/JDK-8375502) C2 SuperWord: implement unsigned casts >> >> If anybody is interested in taking on this task, feel free to reassign it to yourself :) > > I'd suggest to remove the comment about "Vector API only" in the code. Instead, I would explain that unsigned cast means zero-extension if dst is larger than src. Sounds good. I will change it with the next commit.
Thanks for the suggestion! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29130#discussion_r2697690358 From epeter at openjdk.org Fri Jan 16 09:40:20 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 09:40:20 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v5] In-Reply-To: References: <4oX30eH34qM7pVxT3HgnqRcprVQGiMBNcu_Yqq9VDas=.009cf016-10b0-4b3d-8d10-59ba9099c62e@github.com> Message-ID: On Fri, 16 Jan 2026 09:23:01 GMT, Jatin Bhateja wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix small issue in benchmark and add comment > > test/micro/org/openjdk/bench/vm/compiler/VectorAlgorithmsImpl.java line 61: > >> 59: } >> 60: return r; >> 61: } > > You can add another flavor for vector API kernels where tail is implemented using masked operations. > > > if (i < r.length) { > VectorMask<Integer> mask = SPECIES_I.indexInRange(i, r.length); > v.intoArray(r, i, mask); > } > > > Simply replicated the loop body guarded by Mask. > https://github.com/openjdk/jdk/pull/28002/changes#diff-b5c49811dff21107eb8c8ab0578be4cd235c6f69bafd879a8e4b4620b974c25bR153-R159 Good idea, I could! However, it would mean I would probably have to add this version for every benchmark. I'm wondering if that is worth it.
I think I won't add it now, but maybe in a follow-up RFE :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2697707533 From qamai at openjdk.org Fri Jan 16 09:49:34 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 16 Jan 2026 09:49:34 GMT Subject: RFR: 8374725: C2: assert(x_ctrl == get_late_ctrl_with_anti_dep(x->as_Load(), early_ctrl, x_ctrl)) failed: anti-dependences were already checked In-Reply-To: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> References: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> Message-ID: On Wed, 14 Jan 2026 13:45:09 GMT, Roland Westrelin wrote: > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` caches > intermediate results in `_dom_lca_tags` when the late control is > computed by `PhaseIdealLoop::get_late_ctrl()` for a node `n`: the code > iterates over all uses of `n` potentially calling > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` multiple > times. `_dom_lca_tags` is used to cache data that is specific to the > lca computation for `n`. `_dom_lca_tags` is set to a tag that depends > on `n` to mark the cached data as only valid during the lca > computation for `n`. > > `PhaseIdealLoop::try_sink_out_of_loop()` checks that all uses of a > node are out of loop with > `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` which also needs to > consider anti-dependences for `Load`s and also calls > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` through > `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`: this computes the > late control for a node and one particular out of loop > use. `_dom_lca_tags` values computed by an earlier > `PhaseIdealLoop::get_late_ctrl()` should be ignored (because it > computes the late control for a node and all its uses). 
To address > that issue, the tag that's used by > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` is made > different on each call from > `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` by incrementing > `_dom_lca_tags_round`. > > The issue here is that one `Load` node is input to a `Phi` twice. So > the `Phi` is considered twice as a use of the node along 2 different > paths. `PhaseIdealLoop::get_late_ctrl_with_anti_dep()` is called twice > from `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` but > `_dom_lca_tags_round` is not incremented between the 2 > calls. `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` when > called for the second `Phi` input uses incorrect cached data which, in > turn, causes an incorrect computation. > > The fix I propose is to make sure `_dom_lca_tags_round` is incremented > for every call to `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`. LGTM, it also brings the comment about `get_late_ctrl_with_anti_dep` to where the call is actually invoked, which makes much more sense. ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/29231#pullrequestreview-3669877710 From xgong at openjdk.org Fri Jan 16 10:00:26 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 16 Jan 2026 10:00:26 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector [v2] In-Reply-To: References: Message-ID: > ### Problem: > > Test `compiler/vectorapi/VectorMaskToLongTest.java` crashes intermittently (approximately once per 200+ runs) with stress VM options such as `-XX:+StressIGVN`: > > > // A fatal error has been detected by the Java Runtime Environment: > // > // Internal Error (jdk/src/hotspot/share/opto/type.hpp:2287), pid=69056, tid=28419 > // assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector > // ... 
>
>
> The crash occurs in the following code when calling `is_vect()` in the assertion added by JDK-8367292 [1]:
>
> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/src/hotspot/share/opto/vectornode.cpp#L1920-L1924
>
> ### Root Cause:
>
> The mask's type becomes TOP (unreachable) during compiler optimizations when the mask node is marked as dead before all its users are removed from the ideal graph. If `Ideal()` is subsequently called on a user node, it may access the TOP type, triggering the assertion.
>
> Here is the simplified ideal graph showing the crash scenario:
>
>
>     Con #top
>        |     ConI
>         \   /
>          \ /
>    VectorStoreMask
>           |
>    VectorMaskToLong # !jvms: IntMaxVector$IntMaxMask::toLong
>
>
> ### Detailed Scenario:
>
> Following is the method in the test case that hits the assertion:
>
> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L65-L70
>
> This method accepts a `VectorSpecies` parameter and calls the Vector API methods `VectorMask.fromLong()` and `toLong()`. It is called with species ranging from `ByteVector.SPECIES_MAX` to `DoubleVector.SPECIES_MAX`. During compilation, C2 speculatively generates fast paths for `toLong()` for all possible species.
>
> When compiling a specific test case such as:
> https://github.com/openjdk/jdk/blob/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L177-L179
>
> the compiler inlines the method and attempts to optimize away unreachable branches. The following graph shows the situation before the mask becomes `TOP`:
>
>
>                 VectorBox # DoubleMaxMask, generated by VectorMask.fromLong()
>                  /     \
>               AddP      \
>                 |        \
>             LoadNClass    \
> ConP #IntMaxMask |          |
>      \           |          |
>       \     DecodeNClass   |
>        \    /              |
>         \  /               |
>         CmpP ...
Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Check "top" and revert the assertion changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29057/files - new: https://git.openjdk.org/jdk/pull/29057/files/294e74e3..cf73b3ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29057&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29057&range=00-01 Stats: 9 lines in 1 file changed: 5 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/29057.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29057/head:pull/29057 PR: https://git.openjdk.org/jdk/pull/29057 From xgong at openjdk.org Fri Jan 16 10:15:18 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 16 Jan 2026 10:15:18 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector [v2] In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 09:11:25 GMT, Quan Anh Mai wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Check "top" and revert the assertion changes > > src/hotspot/share/opto/vectornode.cpp line 1923: > >> 1921: Node* mask = in1->in(1); >> 1922: const TypeVect* mask_vt = mask->bottom_type()->isa_vect(); >> 1923: if (mask_vt == nullptr) { > > It is better to filter the exact `Type::TOP` instance and assert that otherwise, this must be a `TypeVect`. Additionally, if the type of the input is `Type::TOP`, we can eagerly return `C->top()` to kill it. Hi @merykitty, @iwanowww, I've updated the change to check for a `TOP` input and reverted the assertion changes. Please take another look. Thanks! >Additionally, if the type of the input is Type::TOP, we can eagerly return C->top() to kill it. This makes sense to me. However, I would prefer to use `return nullptr` in this optimization. My concerns are: 1.
In `Ideal()` implementations of other nodes that handle `top` inputs, the common pattern is to detect `top` and simply return `nullptr`. This is reasonable because `Ideal()` is meant for optimization/transformation, and avoiding changes when a node has a top input is usually safer. In many cases, updating a node to top is done in `Value()`, which seems like a more appropriate place for that kind of change. 2. Conceptually, a node should only need to check whether its own inputs are `top`. In this case, we are required to look through and check `in(1)->in(1)`, which is less natural and makes the logic more fragile. 3. The `Ideal()` contract is to return either a new node or the node itself for the GVN phase. `C->top()` is effectively an existing "global" node, not a newly created one. Even though IGVN treats top as a special case, introducing this here feels risky and could increase the chance of HotSpot crashes. Using `return nullptr` avoids this extra risk. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29057#discussion_r2697855374 From dlong at openjdk.org Fri Jan 16 10:17:41 2026 From: dlong at openjdk.org (Dean Long) Date: Fri, 16 Jan 2026 10:17:41 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v6] In-Reply-To: <4jqYbz9oL1Klo7KdXSdWPdNT9cD0LrpHPf6u4NfKlDg=.26a204e9-8f6f-4ff5-af92-671974b1a9c9@github.com> References: <_jevSayhH-Khj6mA5jxNQKSzYwEctDg0dDgFTHltHUg=.8e8356c9-7781-4845-bd1a-d9fc3b6f107d@github.com> <4jqYbz9oL1Klo7KdXSdWPdNT9cD0LrpHPf6u4NfKlDg=.26a204e9-8f6f-4ff5-af92-671974b1a9c9@github.com> Message-ID: On Fri, 16 Jan 2026 05:43:40 GMT, Guanqiang Han wrote: >> My understanding is that we need at least one stringStream, and the code is trying to avoid having two stacked stringStreams, which would mean extra memory footprint and copying.
So I agree the proposed is_buffered() is really a proxy for "do I have a stringStream?" because bufferedStream does not guarantee coherent output by itself. What if we just live with the inefficiency of having two stringStreams for now? > > Thanks, ok, I'll drop the is_buffered() change. Would it be acceptable to go back to my earlier proposal: pass an explicit "buffering/coherent-output" parameter at the call site so we can use an existing stringStream and avoid double buffering? I would be OK with that, since we failed to come up with a better solution, but I suspect that we will run into this same issue again, and having to pass an extra flag through multiple layers every time does not feel elegant. Trying to hide the extra flag using something like a THREAD_LOCAL seems hacky as well. I think using a lambda expression would work but may be overkill. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29186#discussion_r2697862305 From chagedorn at openjdk.org Fri Jan 16 10:33:12 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 16 Jan 2026 10:33:12 GMT Subject: RFR: 8374725: C2: assert(x_ctrl == get_late_ctrl_with_anti_dep(x->as_Load(), early_ctrl, x_ctrl)) failed: anti-dependences were already checked In-Reply-To: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> References: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> Message-ID: On Wed, 14 Jan 2026 13:45:09 GMT, Roland Westrelin wrote: > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` caches > intermediate results in `_dom_lca_tags` when the late control is > computed by `PhaseIdealLoop::get_late_ctrl()` for a node `n`: the code > iterates over all uses of `n` potentially calling > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` multiple > times. `_dom_lca_tags` is used to cache data that is specific to the > lca computation for `n`.
`_dom_lca_tags` is set to a tag that depends > on `n` to mark the cached data as only valid during the lca > computation for `n`. > > `PhaseIdealLoop::try_sink_out_of_loop()` checks that all uses of a > node are out of loop with > `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` which also needs to > consider anti-dependences for `Load`s and also calls > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` through > `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`: this computes the > late control for a node and one particular out of loop > use. `_dom_lca_tags` values computed by an earlier > `PhaseIdealLoop::get_late_ctrl()` should be ignored (because it > computes the late control for a node and all its uses). To address > that issue, the tag that's used by > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` is made > different on each call from > `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` by incrementing > `_dom_lca_tags_round`. > > The issue here is that one `Load` node is input to a `Phi` twice. So > the `Phi` is considered twice as a use of the node along 2 different > paths. `PhaseIdealLoop::get_late_ctrl_with_anti_dep()` is called twice > from `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` but > `_dom_lca_tags_round` is not incremented between the 2 > calls. `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` when > called for the second `Phi` input uses incorrect cached data which, in > turn, causes an incorrect computation. > > The fix I propose is to make sure `_dom_lca_tags_round` is incremented > for every call to `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`. Looks good to me, too. Let's wait for the submitted testing by @merykitty to complete. ------------- Marked as reviewed by chagedorn (Reviewer). 
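A toy model may help make the caching problem described above concrete. The sketch below is not HotSpot code: `TaggedLca`, `make_tag()` and `lca()` are hypothetical names, it assumes a plain dominator tree given as `idom`/`depth` arrays, and it ignores the equal-depth subtleties that the real `dom_lca_for_get_late_ctrl_internal()` handles. Nodes that the accumulated LCA walks past are tagged; a later walk that climbs into a tagged node returns early. Two independent queries that reuse the same tag (as the two `get_late_ctrl_with_anti_dep()` calls for the two `Phi` inputs do) can therefore return a wrong answer until the round is bumped.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a tagged dominator-LCA cache.
// Field names only loosely mirror _dom_lca_tags / _dom_lca_tags_round.
struct TaggedLca {
  std::vector<int> idom;       // immediate dominator (root points to itself)
  std::vector<int> depth;      // depth in the dominator tree
  std::vector<uint64_t> tags;  // per-node cache tag, 0 = never tagged
  uint32_t round = 1;          // bumping it invalidates every cached tag

  TaggedLca(std::vector<int> idom_in, std::vector<int> depth_in)
      : idom(std::move(idom_in)), depth(std::move(depth_in)), tags(idom.size(), 0) {}

  // The tag combines the current round with the node being scheduled.
  uint64_t make_tag(uint32_t tag_node) const {
    return (static_cast<uint64_t>(round) << 32) | tag_node;
  }

  // LCA of 'acc' (the LCA accumulated so far) and 'use'. Nodes that 'acc'
  // walks past are tagged; if 'use' climbs into a node carrying the current
  // tag, 'acc' is assumed to dominate it already and is returned early.
  int lca(int acc, int use, uint64_t tag) {
    while (acc != use) {
      if (depth[acc] > depth[use]) {
        tags[acc] = tag;
        acc = idom[acc];
      } else if (depth[acc] < depth[use]) {
        if (tags[use] == tag) return acc;  // cache hit (only valid if tag is fresh)
        use = idom[use];
      } else {
        tags[acc] = tag;
        acc = idom[acc];
        use = idom[use];
      }
    }
    return acc;
  }
};
```

With a dominator tree `0 -> 1 -> 2` plus `0 -> 3`, a first query `lca(2, 0, tag)` tags nodes 2 and 1; a second, unrelated query `lca(3, 2, tag)` with the same tag then wrongly returns 3 instead of 0. In this model, incrementing `round` before each independent query plays the role that the proposed fix gives to incrementing `_dom_lca_tags_round` before every `get_late_ctrl_with_anti_dep()` call.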
PR Review: https://git.openjdk.org/jdk/pull/29231#pullrequestreview-3670118570 From epeter at openjdk.org Fri Jan 16 10:42:52 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 10:42:52 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Tue, 13 Jan 2026 11:27:53 GMT, Fei Gao wrote: >> In C2's loop optimization, for a counted loop, if we have any of these conditions (RCE, unrolling) met, we switch to the >> `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running. For example, if a main loop is unrolled to `8x`, the main loop guard tests if the loop has less than `8` iterations and then decide which way to go. >> >> Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is going to be further multiplied. After the main loop is super-unrolled, the minimum trip guard test will be updated. Assuming one vector can operate `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test if remaining trip is less than `8 * 4 = 32`. >> >> To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. And, both trip guards of the main loop and the vectorized drain loop jump to the scalar post loop. 
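As a rough illustration of the guard structure described above, the following sketch models how the iterations left after the pre-loop are split between the super-unrolled main loop, the vectorized drain loop and the scalar post loop. It is a counting model only, with hypothetical names (`Schedule`, `distribute()`), and it assumes each guard simply compares the remaining trip count against its loop's stride (`vlen * unroll` for the main loop, `vlen` for the drain loop); C2 actually builds these guards from the loop limit and stride.

```cpp
#include <cassert>

// Hypothetical counting model, not C2 code.
struct Schedule {
  int main_iters;   // iterations of the super-unrolled main loop
  int drain_iters;  // iterations of the vectorized drain loop
  int post_iters;   // iterations of the scalar post loop
};

// vlen: lanes per vector; unroll: super-unroll factor.
// old_flow == true:  a failing main-loop guard jumps straight to the post loop.
// old_flow == false: a failing main-loop guard falls through to the drain guard.
Schedule distribute(int remaining, int vlen, int unroll, bool old_flow) {
  Schedule s{0, 0, 0};
  const int main_stride = vlen * unroll;
  const bool main_guard_passes = remaining >= main_stride;
  if (main_guard_passes) {
    s.main_iters = remaining / main_stride;
    remaining -= s.main_iters * main_stride;
  }
  // The drain loop is only reachable if the main guard passed, or if the
  // control flow lets a failed main guard reach the drain guard.
  if (main_guard_passes || !old_flow) {
    if (remaining >= vlen) {  // the drain loop's own minimum trip guard
      s.drain_iters = remaining / vlen;
      remaining -= s.drain_iters * vlen;
    }
  }
  s.post_iters = remaining;  // the scalar post loop handles the rest
  return s;
}
```

With `vlen = 8` and `unroll = 4`, 25 remaining iterations skip both vector loops under the old control flow (`{0, 0, 25}`), whereas letting a failed main-loop guard fall through to the drain guard yields `{0, 3, 1}`.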
>> >> The problem here is that if the remaining trip count when exiting from the pre-loop is smaller than the main loop's stride but larger than the vector length, the vectorized drain loop is never executed: because the minimum trip guard test of the main loop fails, execution jumps over both the main loop and the vectorized drain loop. For example, in the above case, if a loop still has `25` iterations after the pre-loop, we could run `3` rounds of the vectorized drain loop, but that never happens. It would be better if the minimum trip guard test of the main loop did not jump over the vectorized drain loop. >> >> This patch improves this by modifying the control flow when the minimum trip guard test of the main loop fails. As a consequence, all data uses and control uses need to be updated to match the new control flow. >> >> The whole process is done by the function `insert_post_loop()`. >> >> We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop into the vectorized drain loop with mode `InsertVectorizedDrain`: >> >> 1. The fall-in control flow to the vectorized drain loop comes fr... > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Fix build failure after rebasing and address review comments > - Merge branch 'master' into optimize-atomic-post > - Fixed new test failures after rebasing and refined parts of the code to address review comments > - Merge branch 'master' into optimize-atomic-post > - Merge branch 'master' into optimize-atomic-post > - Clean up comments for consistency and add spacing for readability > - Fix some corner case failures and refined part of code > - Merge branch 'master' into optimize-atomic-post > - Refine ascii art, rename some variables and resolve conflicts > - Merge branch 'master' into optimize-atomic-post > - ...
and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 Alright, here is another batch of comments. I'm now at the beginning of `fix_data_uses_for_vectorized_drain`, I'll continue from here next time :) src/hotspot/share/opto/loopTransform.cpp line 1841: > 1839: } > 1840: } > 1841: A few questions: - Why not cache the `skip_assertion_predicates_with_halt` values? Do the values change over time? If there are lots of predicates, you will do a traversal over and over again. - Why do we need this special logic for the `drain` loop cloning? What makes it different to other cloning cases? - The new ctrl you set is either at the post-head entry, or after skipping the predicates. Why did you choose those? src/hotspot/share/opto/loopTransform.cpp line 1944: > 1942: > 1943: // Step 2: Find some key nodes which control the execution paths of the zero trip guard. > 1944: // Step 2.1: Find 'zero_ctrl' which will be the control input of the zero trip guard. Nit: "find" suggests that it already exists. That contradicts `zero_ctrl = new IfFalseNode(outer_main_end);` below a little. src/hotspot/share/opto/loopTransform.cpp line 1948: > 1946: if (mode == InsertVectorizedDrain) { > 1947: // For vectorized drain loop, 'zero_ctrl' should be the node merges exits > 1948: // from the main loop and the pre loop. Suggestion: // For vectorized drain loop, 'zero_ctrl' should be the node that merges exits // from the main loop and the pre loop. src/hotspot/share/opto/loopTransform.cpp line 1951: > 1949: zero_ctrl = main_exit->unique_ctrl_out_or_null(); > 1950: assert(zero_ctrl != nullptr && zero_ctrl->is_Region(), > 1951: "In the pre-main-post model, zero_ctrl must exist."); Suggestion: zero_ctrl = main_exit->unique_ctrl_out()->as_Region(); It would do the same assertions, and be a little more compact. But I suppose the asserts would be less nice if they are ever hit. Up to you.
src/hotspot/share/opto/loopnode.hpp line 1440: > 1438: // result control flow branches > 1439: // either to inner clone or outer > 1440: // strip mined loop. I have trouble understanding the comments here (not your fault, it was here already). I'm also wondering if this is only used for `post_loop`? If so, maybe we could rename it, and improve the comments here? src/hotspot/share/opto/loopnode.hpp line 1507: > 1505: // If 'back_ctrl' is null: (Specially for pre-loop exit in resolve_input_for_drain_or_post()) > 1506: // - Clone 'n' into 'preheader_ctrl' if its block does not strictly dominate 'preheader_ctrl'. > 1507: // - Otherwise, return 'n'. Personally, I think it would be better to avoid extensive documentation in both `hpp` and `cpp`. The chances that the documentation eventually goes out of sync is big. I'd suggest putting the documentation in the `cpp`, and either no documentation or only a summary in `hpp`. src/hotspot/share/opto/loopopts.cpp line 2377: > 2375: Node* hit = _igvn.hash_find_insert(use); > 2376: if (hit) > 2377: _igvn.replace_node(use, hit); Suggestion: if (hit != nullptr) { _igvn.replace_node(use, hit); } Styleguide does not want implicit zero/null checks. And with brackets is preferred :) src/hotspot/share/opto/loopopts.cpp line 2384: > 2382: // > 2383: // Let us look at the data path of the trip counter, as an example > 2384: // to understand the data uses: I love the ASCII art :) But I'm wondering if it might be better not to use `iv` names like `pre_incr`, just because the `iv` phi structure is simpler than others, and this hides the complexity of other phis. But I don't have the solution yet. src/hotspot/share/opto/loopopts.cpp line 2483: > 2481: for (DUIterator_Fast jmax, j = main_old->fast_outs(jmax); j < jmax; j++) { > 2482: worklist.push(main_old->fast_out(j)); > 2483: } What nodes might already be on the `worklist` before we get here? 
Why not rename `main_old` to `main_incr` or `main_backedge`, or something else that is a bit more specific? src/hotspot/share/opto/loopopts.cpp line 2485: > 2483: } > 2484: > 2485: Node_List visit_list; Suggestion: ResourceMark rm; Node_List visit_list; Can we do this, or do we run into issues? src/hotspot/share/opto/loopopts.cpp line 2489: > 2487: > 2488: while (worklist.size() != 0) { > 2489: Node* use = worklist.pop(); Can you add a comment above these lines and say what we are iterating over and what we do with each `use`? Only a single line, so the reader knows what to expect :) ------------- PR Review: https://git.openjdk.org/jdk/pull/22629#pullrequestreview-3669808361 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697692770 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697749773 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697751048 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697762503 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697721267 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697857135 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697881478 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697900940 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697934189 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697914118 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697940471 From epeter at openjdk.org Fri Jan 16 10:42:54 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 10:42:54 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: 
<-yc3O4xvk6Wy7-EouV2BoiK5iydinqSFez5WKCeCUdw=.b006cc2b-d490-4c66-a103-fa4459b35ecb@github.com> On Fri, 16 Jan 2026 09:46:34 GMT, Emanuel Peter wrote: >> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: >> >> - Fix build failure after rebasing and address review comments >> - Merge branch 'master' into optimize-atomic-post >> - Fixed new test failures after rebasing and refined parts of the code to address review comments >> - Merge branch 'master' into optimize-atomic-post >> - Merge branch 'master' into optimize-atomic-post >> - Clean up comments for consistency and add spacing for readability >> - Fix some corner case failures and refined part of code >> - Merge branch 'master' into optimize-atomic-post >> - Refine ascii art, rename some variables and resolve conflicts >> - Merge branch 'master' into optimize-atomic-post >> - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 > > src/hotspot/share/opto/loopTransform.cpp line 1944: > >> 1942: >> 1943: // Step 2: Find some key nodes which control the execution paths of the zero trip guard. >> 1944: // Step 2.1: Find 'zero_ctrl' which will be the control input of the zero trip guard. > > Nit: "find" suggests that it already exists. That contradicts `zero_ctrl = new IfFalseNode(outer_main_end);` below a little. Maybe "get" could be more neutral? > src/hotspot/share/opto/loopnode.hpp line 1440: > >> 1438: // result control flow branches >> 1439: // either to inner clone or outer >> 1440: // strip mined loop. > > I have trouble understanding the comments here (not your fault, it was here already). > I'm also wondering if this is only used for `post_loop`? If so, maybe we could rename it, and improve the comments here? 
At least in your code, it would read much better if it was called `InsertPost` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697768299 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2697734928 From dlunden at openjdk.org Fri Jan 16 10:57:47 2026 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 16 Jan 2026 10:57:47 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v17] In-Reply-To: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> References: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> Message-ID: On Mon, 12 Jan 2026 15:05:29 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch is an alternative to #28764 but it does the analysis during IGVN instead. >> >> ## The current PR: >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. >> >> I do not see a noticeable difference in C2 runtime with and without this patch. >> >> ## Future work: >> >> 1. Fold a memory `Phi`. 
>> >> This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. >> >> 2. Fold a pointer `Phi`. >> >> Currently, this PR is doing the trivial approach, just give up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: >> >> Point p1 = new Point; >> Point p2 = new Point; >> p1.x = v1; >> p2.x = v2; >> Point p = Phi(p1, p2); >> int a = p.x; >> >> Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. >> >> Another interesting case: >> >> Point p = Phi(p1, p2); >> p.x = v; >> p1.x = v1; >> int a = p.x; >> >> Then, theoretically, we can fold `a` to `Phi(v1, v)` if `p1` and `p2` are known not to alias. >> >> 3. Nested objects >> >> It can be observed that if an object is stored into a memory that has not escaped, then it can be considered that the object has not escaped. For example: >> >> Point p = new Point; >> PointHolder h = new PointHolder; >> h.p = p; >> int x = p.x; >> escape(h); >> >> Then, `p` can be considered that it has not escaped until `escape(h)`. To do this, the computation of `_aliases` in the constructor of `LocalEA` needs to be more comprehensive. See the comments in `LocalEA::check_escape_status`. >> >> Please... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: > > - Merge branch 'master' into loadfoldingigvn > - Early return when not a heap access > - Fix escape at store > - Fix outdated and unclear comments > - copyright year, return, comments, whitespace > - Merge branch 'master' into loadfoldingigvn > - ea of phis and nested objects > - Add test scenarios > - Add a flag to turn off the feature > - Much more comments, refactor the data into a separate class > - ... 
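The "has the allocation escaped at this call/barrier?" test from the PR description above can be sketched as a reachability check over control inputs. This is a toy model under stated assumptions, not the PR's `LocalEA` code: `CtrlGraph`, `is_transitive_ctrl_input()` and `not_escaped_at()` are hypothetical names, the graph is a plain adjacency list of control inputs (several inputs model a Region-like merge), and each escaping node is represented only by its control input.

```cpp
#include <cassert>
#include <vector>

// Hypothetical control-flow graph: ctrl_in[n] lists n's control inputs
// (one for ordinary nodes, several for a Region-like merge).
using CtrlGraph = std::vector<std::vector<int>>;

// True if 'candidate' is reachable from 'at' by following control inputs,
// i.e. 'candidate' is a transitive control input of 'at'.
bool is_transitive_ctrl_input(const CtrlGraph& g, int candidate, int at) {
  std::vector<bool> visited(g.size(), false);
  std::vector<int> worklist(g[at].begin(), g[at].end());
  while (!worklist.empty()) {
    int n = worklist.back();
    worklist.pop_back();
    if (n == candidate) return true;
    if (visited[n]) continue;
    visited[n] = true;
    for (int in : g[n]) worklist.push_back(in);
  }
  return false;
}

// escape_ctrls: the control input of every node that makes the allocation
// escape. The allocation has not escaped at 'barrier' if none of them lies
// on a control path leading into 'barrier', so a load may look past it.
bool not_escaped_at(const CtrlGraph& g, int barrier, const std::vector<int>& escape_ctrls) {
  for (int c : escape_ctrls) {
    if (is_transitive_ctrl_input(g, c, barrier)) return false;
  }
  return true;
}
```

For a straight-line graph `start -> alloc -> call -> after_call`, an escape controlled by `after_call` does not count at `call`, while an escape controlled by `alloc` does; the real analysis additionally has to deal with aliases, `Phi`s and nested objects, as the future-work list notes.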
and 9 more: https://git.openjdk.org/jdk/compare/2d116c1f...c275e6e6 I haven't started digging into the meat of this changeset yet, but here are some initial comments. Can you please create an RFE for the future work you mention in the PR description (if you have not done so already)? src/hotspot/share/opto/memnode.cpp line 2191: > 2189: const TypePtr *addr_t = phase->type(address)->isa_ptr(); > 2190: > 2191: if (can_reshape && (addr_t != nullptr)) { We can now remove this can_reshape check, right? If we want to keep it as documentation, better to use an assert. src/hotspot/share/opto/memnode.cpp line 2224: > 2222: // anything that is not a load of a field/array element (like > 2223: // barriers etc.) alone > 2224: if (in(0) != nullptr && !adr_type()->isa_rawptr() && can_reshape) { We can now remove this can_reshape check, right? If we want to keep it as documentation, better to use an assert. src/hotspot/share/opto/memnode.cpp line 2256: > 2254: // the alias index stuff. So instead, peek through Stores and IFF we can > 2255: // fold up, do so. > 2256: Node* prev_mem = find_previous_store(phase); Previously, we reached here even if `!can_reshape`. We no longer do so due to the additional check above. Is this correct? If so, can you add a brief comment explaining this? src/hotspot/share/opto/memnode.cpp line 2259: > 2257: if (prev_mem != nullptr && prev_mem->is_top()) { > 2258: return prev_mem; > 2259: } Please add a comment explaining this addition. src/hotspot/share/opto/memnode.cpp line 3861: > 3859: if (prev_mem != nullptr && prev_mem->is_top()) { > 3860: return prev_mem; > 3861: } Please add a comment explaining this addition. 
------------- PR Review: https://git.openjdk.org/jdk/pull/28812#pullrequestreview-3670244656 PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2698014261 PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2698019143 PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2698019025 PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2698021042 PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2698022539 From dlong at openjdk.org Fri Jan 16 11:15:17 2026 From: dlong at openjdk.org (Dean Long) Date: Fri, 16 Jan 2026 11:15:17 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v3] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 07:38:34 GMT, Emanuel Peter wrote: >> I found this bug with the Template Framework (Vector API Library extension): https://github.com/openjdk/jdk/pull/28873 >> >> In `VectorCastNode::opcode`, we have an assert that we **cannot** have an `unsigned cast from float`, it would be nonsense. >> https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/hotspot/share/opto/vectornode.cpp#L1490-L1491 >> >> When we intrinsify `VectorSupport.convert`, we get a constant from the VectorAPI, that determines if we have a signed or unsigned cast, and other constants that determine the `from` and `to` types. >> >> At runtime, the VectorAPI implementation can (I assume) never take a path of `unsigned cast from float`. >> But it seems that nothing prevents the VM from compiling such an (unreachable) path. >> >> Here is how I think it happens: >> - `AbstractVector::castShape` creates a `C` conversion (lane-wise conversion). So at runtime, we will take the `C` switch-case in `AbstractVector::convert0`. >> - Profiling can also make the `Z` (lane-wise reinterpret) switch-case live, because of other cast and reinterpret calls that use that code path. 
And we may not (yet) have proven that the `Z` switch-case is never taken. >> - So we end up compiling the `Z` path, and call `VectorSupport.convert` with types `float` and `long`. And since the `Z` path sees that the from size is smaller than the to size, we get a `UCAST` (zero extension). Hence, we try to intrinsify a vector unsigned cast from float, and hit the assert. >> >> https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java#L742-L743 >> >> That the `Z` path is unreachable for `castShape` seems to be an invariant that only the VectorAPI knows about, and the VM cannot directly know that and determine that it is dead code. Thus, **I propose that we just check for the condition when trying to intrinsify**, and refuse intrinsification if it is violated. >> >> **Update: ** instead of not intrinsifying, we chose a stronger path: we intrinsify it with a `HaltNode` that should never be encountered at runtime. The reproducer I have eventually even is able to fold away the `HaltNode`, so it turns out it is indeed a dead path. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > halt refactor by demand of reviewers Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29169#pullrequestreview-3670352719 From dlong at openjdk.org Fri Jan 16 11:15:20 2026 From: dlong at openjdk.org (Dean Long) Date: Fri, 16 Jan 2026 11:15:20 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v2] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 07:34:50 GMT, Emanuel Peter wrote: > Of course there is a small risk that I messed up something here, so please review carefully - we don't want to accidentally mess up HaltNodes and loose their asserting powers! 
Looks good to me, but I suspect we don't have great test coverage for HaltNodes, since they are never supposed to get executed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29169#issuecomment-3759537899 From epeter at openjdk.org Fri Jan 16 11:33:15 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 11:33:15 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v6] In-Reply-To: References: Message-ID: > This is exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. > > Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) > > **Discussion** > > Observations: > - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. > - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. > - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow > - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow > - `linux_x64_oci_server`: Vector API leads to really nice speedups > - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks > - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. > - Compact Object Headers has some negative effect on some loop benchmarks. > - `linux_aarch64_server`: `reduceAddI`, `copyI` > - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` > - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`?
> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit > - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` > - Intrinsics can be much faster than auto-vectorized or Vector API code. > - `linux_aarch64_server`: `copyI` > - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. > - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact; I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). > > **Benchmark Plots** > > Units: nanoseconds per algorithm invocation. > > `linux_x64_oci` > [plot: algo_linux_x64_oci_server] > > `windows_x64_oci` > [plot: algo_windows_x64_oci_server] > > `macosx_x64_sandybridge` > [plot: algo_macosx_x64_sandybridge] 2205: false /* don't emit code in product, it is just a waste of code space */); We could be more explicit here: other `HaltNode`s often get folded, while a `Halt` after an uncommon trap is not, and that is a frequent occurrence. ------------- PR Review: https://git.openjdk.org/jdk/pull/29169#pullrequestreview-3670491963 PR Review Comment: https://git.openjdk.org/jdk/pull/29169#discussion_r2698200941 From epeter at openjdk.org Fri Jan 16 11:57:48 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 11:57:48 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v4] In-Reply-To: References: Message-ID: <0ACkKqpv7QYOl_JgsGfTRxefXJiwKYjS5QZIXhYMEv8=.f4f9a28c-a8f8-4ee8-8694-9bacdf192c3e@github.com> > I found this bug with the Template Framework (Vector API Library extension): https://github.com/openjdk/jdk/pull/28873 > > In `VectorCastNode::opcode`, we have an assert that we **cannot** have an `unsigned cast from float`, it would be nonsense.
> https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/hotspot/share/opto/vectornode.cpp#L1490-L1491 > > When we intrinsify `VectorSupport.convert`, we get a constant from the VectorAPI that determines if we have a signed or unsigned cast, and other constants that determine the `from` and `to` types. > > At runtime, the VectorAPI implementation can (I assume) never take a path of `unsigned cast from float`. > But it seems that nothing prevents the VM from compiling such an (unreachable) path. > > Here is how I think it happens: > - `AbstractVector::castShape` creates a `C` conversion (lane-wise conversion). So at runtime, we will take the `C` switch-case in `AbstractVector::convert0`. > - Profiling can also make the `Z` (lane-wise reinterpret) switch-case live, because of other cast and reinterpret calls that use that code path. And we may not (yet) have proven that the `Z` switch-case is never taken. > - So we end up compiling the `Z` path, and call `VectorSupport.convert` with types `float` and `long`. And since the `Z` path sees that the `from` size is smaller than the `to` size, we get a `UCAST` (zero extension). Hence, we try to intrinsify a vector unsigned cast from float, and hit the assert. > > https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java#L742-L743 > > That the `Z` path is unreachable for `castShape` seems to be an invariant that only the VectorAPI knows about, and the VM cannot directly know that and determine that it is dead code. Thus, **I propose that we just check for the condition when trying to intrinsify**, and refuse intrinsification if it is violated. > > **Update:** instead of not intrinsifying, we chose a stronger path: we intrinsify it with a `HaltNode` that should never be encountered at runtime.
My reproducer is eventually even able to fold away the `HaltNode`, so it turns out it is indeed a dead path. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: improve comments for merykitty ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29169/files - new: https://git.openjdk.org/jdk/pull/29169/files/486f8930..c1b42f62 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29169&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29169&range=02-03 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29169.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29169/head:pull/29169 PR: https://git.openjdk.org/jdk/pull/29169 From epeter at openjdk.org Fri Jan 16 11:57:50 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 11:57:50 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v2] In-Reply-To: References: Message-ID: <6JUNvUPpzxj0wkTx3mIB5XIwupSEsdJcu9uutK8HERs=.80b5ac85-3e8f-47e7-813f-b7816c3e30bf@github.com> On Fri, 16 Jan 2026 11:11:52 GMT, Dean Long wrote: >> @dean-long @merykitty @rose00 I did the refactor. We could now consider doing a separate refactor for the non-parsing use-cases of `HaltNode`, but that's out of scope. >> >> Of course there is a small risk that I messed up something here, so please review carefully - we don't want to accidentally mess up `HaltNode`s and lose their asserting powers! > >> Of course there is a small risk that I messed up something here, so please review carefully - we don't want to accidentally mess up HaltNodes and lose their asserting powers! > > Looks good to me, but I suspect we don't have great test coverage for HaltNodes, since they are never supposed to get executed. @dean-long Thanks for the approval! @merykitty I added some more comments for you.
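The following plain-Java sketch shows why an "unsigned cast from float" has no meaning. It deliberately avoids the incubating Vector API, so it runs on a stock JDK without `--add-modules jdk.incubator.vector`. It is an illustration under my own assumptions, not code from the PR. A value conversion from `float`, which is roughly what a lane-wise `C` cast does, has a single signed meaning. Zero extension versus sign extension only becomes meaningful once the lane's raw bits are treated as an integer, which is roughly what a reinterpretation does.

```java
// Illustration only: value conversion vs. bit-level extension of a float lane.
final class CastVsReinterpret {
    public static void main(String[] args) {
        float f = -1.5f;

        // Value conversion (like a lane-wise cast): signed, rounds toward zero.
        long converted = (long) f;                          // -1

        // Reinterpretation operates on the raw IEEE 754 bits of the lane.
        int bits = Float.floatToRawIntBits(f);              // 0xBFC00000
        long zeroExtended = Integer.toUnsignedLong(bits);   // 0x00000000BFC00000
        long signExtended = bits;                           // 0xFFFFFFFFBFC00000

        System.out.println(converted);                       // -1
        System.out.println(Long.toHexString(zeroExtended));  // bfc00000
        System.out.println(Long.toHexString(signExtended));  // ffffffffbfc00000
    }
}
```

Zero and sign extension only diverge on the integer bit pattern. For the float value itself there is nothing "unsigned" to extend.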
------------- PR Comment: https://git.openjdk.org/jdk/pull/29169#issuecomment-3759697466 From qamai at openjdk.org Fri Jan 16 12:11:34 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 16 Jan 2026 12:11:34 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v18] In-Reply-To: References: Message-ID: > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > ## The current PR: > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking in `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can conclude that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. > > I do not see a noticeable difference in C2 runtime with and without this patch. > > ## Future work: > > 1. Fold a memory `Phi`. > > This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. > > 2. Fold a pointer `Phi`. > > Currently, this PR takes the trivial approach: it gives up if it does not encounter a store into that `Phi`. However, we can do better.
Consider this case: > > Point p1 = new Point; > Point p2 = new Point; > p1.x = v1; > p2.x = v2; > Point p = Phi(p1, p2); > int a = p.x; > > Then, `a` could be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. > > Another interesting case: > > Point p = Phi(p1, p2); > p.x = v; > p1.x = v1; > int a = p.x; > > Then, theoretically, we can fold `a` to `Phi(v1, v)` if `p1` and `p2` are known not to alias. > > 3. Nested objects > > If an object is stored into memory that has not escaped, then the object itself can be considered not to have escaped. For example: > > Point p = new Point; > PointHolder h = new PointHolder; > h.p = p; > int x = p.x; > escape(h); > > Then, `p` can be considered not to have escaped until `escape(h)`. To do this, the computation of `_aliases` in the constructor of `LocalEA` needs to be more comprehensive. See the comments in `LocalEA::check_escape_status`. > > Please take a look and leave your thoughts, thanks a lot.
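The two "fold a pointer `Phi`" cases can be made concrete at the Java source level. This is my own illustration rather than anything from the patch, and the class and method names are hypothetical. `original` and `original2` load through a `Phi`-like conditional. `folded` and `folded2` are the value `Phi`s the description says those loads could become. The folding is valid because `p1` and `p2` are fresh allocations that cannot alias.

```java
// Source-level analogue of folding a load through a pointer Phi (illustration only).
final class PointPhiFolding {
    static final class Point {
        int x;
    }

    // Case 1: p = Phi(p1, p2) over two fresh allocations, then a load p.x.
    static int original(boolean c, int v1, int v2) {
        Point p1 = new Point();
        Point p2 = new Point();
        p1.x = v1;
        p2.x = v2;
        Point p = c ? p1 : p2;  // Phi(p1, p2)
        return p.x;             // load through the Phi
    }

    // What the load could fold to: Phi(v1, v2), no memory access needed.
    static int folded(boolean c, int v1, int v2) {
        return c ? v1 : v2;
    }

    // Case 2: a store through the Phi, then a store to p1, then a load through the Phi.
    static int original2(boolean c, int v, int v1) {
        Point p1 = new Point();
        Point p2 = new Point();
        Point p = c ? p1 : p2;  // Phi(p1, p2)
        p.x = v;
        p1.x = v1;              // overwrites p.x only when p == p1
        return p.x;
    }

    // Folded form: Phi(v1, v), valid because p1 and p2 cannot alias.
    static int folded2(boolean c, int v, int v1) {
        return c ? v1 : v;
    }

    public static void main(String[] args) {
        for (boolean c : new boolean[] {true, false}) {
            if (original(c, 3, 4) != folded(c, 3, 4)) throw new AssertionError("case 1");
            if (original2(c, 5, 6) != folded2(c, 5, 6)) throw new AssertionError("case 2");
        }
        System.out.println("folded forms agree");
    }
}
```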
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Fix dead accesses, address reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/c275e6e6..97297f8d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=16-17 Stats: 27 lines in 1 file changed: 23 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From dbriemann at openjdk.org Fri Jan 16 12:13:27 2026 From: dbriemann at openjdk.org (David Briemann) Date: Fri, 16 Jan 2026 12:13:27 GMT Subject: RFR: 8375530: PPC64: incorrect quick verify_method_data_pointer check causes poor performance in debug build Message-ID: <7CAMgaI9MMOT9EaKu8uIdNaSU9kfUgnWrpjbqrO2yXw=.19d1849e-03e5-47d7-ac3c-8f77fd134a3c@github.com> Workaround for this fix is setting -XX:-VerifyDataPointer ------------- Commit messages: - 8375530: PPC64: incorrect quick verify_method_data_pointer check causes poor performance in debug build Changes: https://git.openjdk.org/jdk/pull/29279/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29279&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375530 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29279.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29279/head:pull/29279 PR: https://git.openjdk.org/jdk/pull/29279 From qamai at openjdk.org Fri Jan 16 12:16:34 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 16 Jan 2026 12:16:34 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v17] In-Reply-To: <2Fow0aJ6zpA2aTnACLAxW_d_O_OucOXKz08y-Lx7IBo=.73eeafd0-cf1d-4ce2-a438-89b51739f593@github.com> References: 
<2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> <2Fow0aJ6zpA2aTnACLAxW_d_O_OucOXKz08y-Lx7IBo=.73eeafd0-cf1d-4ce2-a438-89b51739f593@github.com> Message-ID: On Thu, 15 Jan 2026 19:30:12 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: >> >> - Merge branch 'master' into loadfoldingigvn >> - Early return when not a heap access >> - Fix escape at store >> - Fix outdated and unclear comments >> - copyright year, return, comments, whitespace >> - Merge branch 'master' into loadfoldingigvn >> - ea of phis and nested objects >> - Add test scenarios >> - Add a flag to turn off the feature >> - Much more comments, refactor the data into a separate class >> - ... and 9 more: https://git.openjdk.org/jdk/compare/19c31186...c275e6e6 > > What amount of functional testing has been done? @iwanowww I have run the latest version with tier1-tier4 and hs-comp-stress. @dlunde Thanks for your comments, I have addressed them. > src/hotspot/share/opto/memnode.cpp line 714: > >> 712: bool is_known_instance = addr_t != nullptr && addr_t->is_known_instance_field(); >> 713: LocalEA local_ea(phase->is_IterGVN(), base); >> 714: TriBool has_not_escaped = is_known_instance ? TriBool(true) > > IMO `TriBool` doesn't hold its weight here. As an alternative, encapsulating caching logic inside `LocalEA` and unconditionally querying it for escape state would look cleaner and easier to reason about. I gave it a try, but I don't find it compelling: the caching is a consequence of `find_previous_store` walking from a node to its user. So, if a node does not observe that `base` has escaped, its user should not observe it either. Moving this logic into `LocalEA` does not seem logical from that point of view.
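Below is a toy Java model of the escape check described for this PR. A load may walk past a call or barrier only if no escaping node has a control input that is a transitive control input of that call or barrier. This is a deliberately simplified sketch under stated assumptions. Every control node has a single control input, so regions and branches are ignored, and each escaping use is represented by its own control node. `CtrlNode` and `notEscapedAt` are hypothetical names, not part of `LocalEA`.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified control node: a single control input, null for the start node.
final class CtrlNode {
    final String name;
    final CtrlNode ctrlIn;
    CtrlNode(String name, CtrlNode ctrlIn) { this.name = name; this.ctrlIn = ctrlIn; }
}

final class LocalEscapeSketch {
    // All transitive control inputs of 'at' (not including 'at' itself).
    static Set<CtrlNode> transitiveControlInputs(CtrlNode at) {
        Set<CtrlNode> result = new HashSet<>();
        for (CtrlNode c = at.ctrlIn; c != null; c = c.ctrlIn) {
            result.add(c);
        }
        return result;
    }

    // Not escaped at 'barrier' if no escaping node has a control input
    // that is a transitive control input of 'barrier'.
    static boolean notEscapedAt(CtrlNode barrier, List<CtrlNode> escapingNodes) {
        Set<CtrlNode> inputs = transitiveControlInputs(barrier);
        for (CtrlNode e : escapingNodes) {
            if (inputs.contains(e.ctrlIn)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CtrlNode start = new CtrlNode("start", null);
        CtrlNode call1 = new CtrlNode("call1", start);   // unrelated call
        CtrlNode call2 = new CtrlNode("call2", call1);   // escape(p): the object escapes here
        CtrlNode call3 = new CtrlNode("call3", call2);   // a later call

        List<CtrlNode> escaping = List.of(call2);
        System.out.println(notEscapedAt(call1, escaping)); // true: a load may look past call1
        System.out.println(notEscapedAt(call3, escaping)); // false: call2 precedes call3
    }
}
```

Real C2 graphs have regions, multiple paths, and escaping nodes that are not control nodes. The model only captures the reachability question at the heart of the check.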
------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3759763066 PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2698287188 From krk at openjdk.org Fri Jan 16 12:29:48 2026 From: krk at openjdk.org (Kerem Kat) Date: Fri, 16 Jan 2026 12:29:48 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v2] In-Reply-To: References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> Message-ID: On Thu, 15 Jan 2026 18:16:13 GMT, Vladimir Ivanov wrote: >> Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplify expand_vbox_node_helper by merging VectorBox Phi handling > > src/hotspot/share/opto/vector.cpp line 335: > >> 333: // value-numbered to a single node if all inputs were identical. >> 334: if (vbox->is_Phi()) { >> 335: assert(!vect->is_Phi() || vbox->as_Phi()->region() == vect->as_Phi()->region(), ""); > > Isn't the assert too strong? I don't see why redundant phi elimination can't result in a dominating `Phi` node. > > So, the predicate to choose between `vect->in(i)` and `vect` is `vect->is_Phi() && vect->as_Phi()->region() == vbox->as_Phi()->region()`. Yes, I was going to fix that separately in https://github.com/krk/jdk/commit/43d0649aaafde05b1e516129e3249d8cb8aad5b9, for [JDK-8374903](https://bugs.openjdk.org/browse/JDK-8374903). I will merge the fixes and add the second issue to this PR.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29200#discussion_r2698331270 From jbhateja at openjdk.org Fri Jan 16 12:50:39 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 16 Jan 2026 12:50:39 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v5] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: <3uKUR2sP_jgcwbuDsrH8iIf2V9b-N8H6bv_tJPWNL20=.74a841cb-df9b-443a-8bbe-4ea4e7dd9996@github.com> On Thu, 15 Jan 2026 07:23:41 GMT, Emanuel Peter wrote: >> @jatin-bhateja What do you think? > > Someone filed the RFE: https://bugs.openjdk.org/browse/JDK-8375321 I also see problems with auto-vectorization of reduction kernels for other box types. I added an example to JDK-8375321. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2698390695 From krk at openjdk.org Fri Jan 16 12:51:21 2026 From: krk at openjdk.org (Kerem Kat) Date: Fri, 16 Jan 2026 12:51:21 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v3] In-Reply-To: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> Message-ID: > The check `vect->is_Vector() || vect->is_LoadVector()` doesn't handle `Proj` nodes that resolve to vector types, causing an assertion failure when such nodes flow through a `Phi` into `VectorBox`. Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: - Merge branch 'master' into fix-c2-checkCastPP - 8374903: C2 VectorAPI: assert(vbox->as_Phi()->region() == vect->as_Phi()->region()) failed - Simplify expand_vbox_node_helper by merging VectorBox Phi handling - 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29200/files - new: https://git.openjdk.org/jdk/pull/29200/files/45b02913..6b3695cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29200&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29200&range=01-02 Stats: 20122 lines in 332 files changed: 10948 ins; 4776 del; 4398 mod Patch: https://git.openjdk.org/jdk/pull/29200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29200/head:pull/29200 PR: https://git.openjdk.org/jdk/pull/29200 From mdoerr at openjdk.org Fri Jan 16 13:11:48 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 16 Jan 2026 13:11:48 GMT Subject: RFR: 8375530: PPC64: incorrect quick verify_method_data_pointer check causes poor performance in debug build In-Reply-To: <7CAMgaI9MMOT9EaKu8uIdNaSU9kfUgnWrpjbqrO2yXw=.19d1849e-03e5-47d7-ac3c-8f77fd134a3c@github.com> References: <7CAMgaI9MMOT9EaKu8uIdNaSU9kfUgnWrpjbqrO2yXw=.19d1849e-03e5-47d7-ac3c-8f77fd134a3c@github.com> Message-ID: On Fri, 16 Jan 2026 12:06:31 GMT, David Briemann wrote: > Workaround for this fix is setting -XX:-VerifyDataPointer Looks good and trivial. Thanks for fixing it! ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29279#pullrequestreview-3670789996 From dfenacci at openjdk.org Fri Jan 16 14:03:06 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 16 Jan 2026 14:03:06 GMT Subject: RFR: 8374725: C2: assert(x_ctrl == get_late_ctrl_with_anti_dep(x->as_Load(), early_ctrl, x_ctrl)) failed: anti-dependences were already checked In-Reply-To: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> References: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> Message-ID: On Wed, 14 Jan 2026 13:45:09 GMT, Roland Westrelin wrote: > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` caches > intermediate results in `_dom_lca_tags` when the late control is > computed by `PhaseIdealLoop::get_late_ctrl()` for a node `n`: the code > iterates over all uses of `n` potentially calling > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` multiple > times. `_dom_lca_tags` is used to cache data that is specific to the > lca computation for `n`. `_dom_lca_tags` is set to a tag that depends > on `n` to mark the cached data as only valid during the lca > computation for `n`. > > `PhaseIdealLoop::try_sink_out_of_loop()` checks that all uses of a > node are out of loop with > `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` which also needs to > consider anti-dependences for `Load`s and also calls > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` through > `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`: this computes the > late control for a node and one particular out of loop > use. `_dom_lca_tags` values computed by an earlier > `PhaseIdealLoop::get_late_ctrl()` should be ignored (because it > computes the late control for a node and all its uses). 
To address > that issue, the tag that's used by > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` is made > different on each call from > `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` by incrementing > `_dom_lca_tags_round`. > > The issue here is that one `Load` node is input to a `Phi` twice. So > the `Phi` is considered twice as a use of the node along 2 different > paths. `PhaseIdealLoop::get_late_ctrl_with_anti_dep()` is called twice > from `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` but > `_dom_lca_tags_round` is not incremented between the 2 > calls. `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` when > called for the second `Phi` input uses incorrect cached data which, in > turn, causes an incorrect computation. > > The fix I propose is to make sure `_dom_lca_tags_round` is incremented > for every call to `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`. Looks good to me too. Thanks a lot @rwestrel! (I just added a couple of very marginal nits) src/hotspot/share/opto/loopopts.cpp line 1959: > 1957: if (n->is_Load()) { > 1958: // We can't reuse tags in PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal() so make sure calls to > 1959: // get_late_ctrl_with_anti_dep() use their own tag Just to emphasise this change, we could write "make sure each call... its own tag". test/hotspot/jtreg/compiler/loopopts/TestSinkingLoadInputOfPhi.java line 40: > 38: static int iFld2 = 10; > 39: static void test() { > 40: long l1; It is a reduced fuzzer test but I happened to notice that `l1` doesn't seem to be used... ------------- Marked as reviewed by dfenacci (Committer).
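The fix relies on a common memoization technique. Cache entries are stamped with the current round, and bumping the round invalidates every entry in O(1). The hypothetical Java model below is loosely inspired by `_dom_lca_tags`/`_dom_lca_tags_round` but is not HotSpot code. It shows how two logically independent queries that share a round, like the two `Phi` inputs here, can read each other's stale results, and how a fresh round per query avoids that.

```java
// Hypothetical per-round memo table; not HotSpot code.
final class RoundTaggedCache {
    private final int[] tags;
    private final int[] values;
    private int round = 1; // tags start at 0, so nothing is valid initially

    RoundTaggedCache(int size) {
        tags = new int[size];
        values = new int[size];
    }

    // Invalidate all entries written in earlier rounds without clearing the arrays.
    void newRound() { round++; }

    boolean has(int key) { return tags[key] == round; }
    int get(int key) { return values[key]; }
    void put(int key, int value) { tags[key] = round; values[key] = value; }

    // Stand-in for a computation that memoizes intermediate results per round.
    static int compute(RoundTaggedCache cache, int key, int input) {
        if (cache.has(key)) {
            return cache.get(key);   // reuse what this round already computed
        }
        int result = input * 10;     // placeholder for the real work
        cache.put(key, result);
        return result;
    }

    public static void main(String[] args) {
        RoundTaggedCache cache = new RoundTaggedCache(16);

        // Bug pattern: two independent queries share one round.
        int a = compute(cache, 7, 1);
        int b = compute(cache, 7, 2);          // stale: returns 10, not 20
        System.out.println(a + " " + b);       // 10 10

        // Fixed pattern: a fresh round for every independent query.
        cache.newRound();
        int c = compute(cache, 7, 1);
        cache.newRound();
        int d = compute(cache, 7, 2);
        System.out.println(c + " " + d);       // 10 20
    }
}
```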
PR Review: https://git.openjdk.org/jdk/pull/29231#pullrequestreview-3670996932 PR Review Comment: https://git.openjdk.org/jdk/pull/29231#discussion_r2698610998 PR Review Comment: https://git.openjdk.org/jdk/pull/29231#discussion_r2698618802 From dbriemann at openjdk.org Fri Jan 16 14:04:49 2026 From: dbriemann at openjdk.org (David Briemann) Date: Fri, 16 Jan 2026 14:04:49 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove Message-ID: Adds the following mach nodes: match(Set dst (CMoveF (Binary cop (CmpF op1 op2)) (Binary src1 src2))); match(Set dst (CMoveD (Binary cop (CmpD op1 op2)) (Binary src1 src2))); match(Set dst (CMoveF (Binary cop (CmpD op1 op2)) (Binary src1 src2))); match(Set dst (CMoveD (Binary cop (CmpF op1 op2)) (Binary src1 src2))); ------------- Commit messages: - 8375536: PPC64: Implement special MachNodes for floating point CMove Changes: https://git.openjdk.org/jdk/pull/29281/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29281&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375536 Stats: 107 lines in 6 files changed: 100 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29281/head:pull/29281 PR: https://git.openjdk.org/jdk/pull/29281 From mhaessig at openjdk.org Fri Jan 16 14:47:18 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 16 Jan 2026 14:47:18 GMT Subject: RFR: 8347365: C2: Fix the handling of depends_only_on_test In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 03:30:35 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the handling of `depends_only_on_test` when the control graph is transformed. It has to do with the theoretical idea of `depends_only_on_test`, copied from the JBS issue description: > > To start with, what is `depends_only_on_test`?
Given a node `n` with the control input `c`, if `c` can be deduced from `c'` and `n->depends_only_on_test() == true`, then we can rewire the control input of `n` to `c'`. This means that `depends_only_on_test` does not mean that the node depends on a test; it means that the node depends on the test that is its control input. > > For example: > > if (y != 0) { > if (x > 0) { > if (y != 0) { > x / y; > } > } > } > > Then `x/y` `depends_only_on_test` because its control input is the test `y != 0`. Then, we can rewire the control input of the division to the outer `y != 0`, resulting in: > > if (y != 0) { > x / y; > if (x > 0) { > } > } > > On the other hand, consider this case: > > if (x > 0) { > if (y != 0) { > if (x > 0) { > x / y; > } > } > } > > Then `x/y` does not `depends_only_on_test` because its control input is the test `x > 0`, which is unrelated: if we rewired the division to the outer `x > 0` test, the division would float above the actual test `y != 0`. This means that `depends_only_on_test` is a dynamic property of a node, and not a static property of the division operation. It can change when we transform the graph and it can be different for different nodes of the same kind. > > More details can be found in the description of `Node::depends_only_on_test` and `Node::pin_node_under_control` in this change. > > Please take a look and leave your reviews, thanks a lot. Thank you for working on this, @merykitty! As far as I can understand it, the fix looks good. I do have a question and a few nitpicks below. Is this change already tested well enough by the regression tests from JDK-8331717 and JDK-8257822? If so, please add `noreg-sqe`. Or are there possibly additional cases that could be covered?
src/hotspot/share/opto/callnode.hpp line 123: > 121: virtual bool is_CFG() const { return true; } > 122: virtual uint hash() const { return NO_HASH; } // CFG nodes do not hash > 123: virtual bool depends_only_on_test() const { return false; } Why are you not replacing this with an implementation for depends_only_on_test_impl()? Same question for `RethrowNode`, `GotoNode`, and `RegionNode`. src/hotspot/share/opto/ifnode.cpp line 1577: > 1575: igvn->replace_input_of(s, 0, data_target); // Move child to data-target > 1576: if (prev_dom_not_imply_this && data_target != top) { > 1577: // If prev_dom_not_equivalent, s now depends on multiple tests with prev_dom being the I think you mean `prev_dom_not_imply_this` here? Also, it would be good to mention that this is to prevent floating above the dependent test. src/hotspot/share/opto/loopnode.hpp line 1684: > 1682: // Mark an IfNode as being dominated by a prior test, > 1683: // without actually altering the CFG (and hence IDOM info). > 1684: void dominated_by(IfProjNode* prevdom, IfNode* iff, bool flip = false, bool pin_array_access_nodes = false); Please also rename this `pin_array_access_nodes`, since it is just wired through to `rewire_safe_outputs_to_dominator()` src/hotspot/share/opto/loopopts.cpp line 1724: > 1722: if (!would_sink_below_pre_loop_exit(loop_ctrl, outside_ctrl)) { > 1723: if (n->depends_only_on_test()) { > 1724: // If this node depends_only_on_test, it will be rewire to a control input that is not the correct test Suggestion: // If this node depends_only_on_test, it will be rewired to a control input that is not the correct test The same applies to the other changes in this file. src/hotspot/share/opto/memnode.hpp line 316: > 314: > 315: private: > 316: // depends_only_on_test is almost always true, and needs to be almost always Suggestion: // depends_only_on_test_impl is almost always true, and needs to be almost always ------------- Changes requested by mhaessig (Committer). 
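Here is a source-level Java analogue of the two examples in the description. It is my own illustration rather than C2 code, and the method names are hypothetical. Moving the division under the outer `y != 0` test keeps it guarded. Moving it under the outer `x > 0` test, which is what rewiring a node that does not truly depend only on its control test amounts to, lets it execute with `y == 0`. The `ArithmeticException` simply makes that hazard observable at the source level.

```java
// Source-level analogue of rewiring a division's control input (illustration only).
final class DivisionPinning {
    // First example: the division is controlled by the inner y != 0 test.
    static int firstOriginal(int x, int y) {
        if (y != 0) {
            if (x > 0) {
                if (y != 0) {
                    return x / y;
                }
            }
        }
        return -1;
    }

    // Valid rewiring: the outer test is the same y != 0 condition, so the division stays guarded.
    static int hoistedToOuterY(int x, int y) {
        if (y != 0) {
            int q = x / y;
            if (x > 0) {
                if (y != 0) {
                    return q;
                }
            }
        }
        return -1;
    }

    // Second example: the division's control input is the inner x > 0 test.
    static int secondOriginal(int x, int y) {
        if (x > 0) {
            if (y != 0) {
                if (x > 0) {
                    return x / y;
                }
            }
        }
        return -1;
    }

    // Invalid rewiring to the outer x > 0 test: the division floats above y != 0.
    static int hoistedToOuterX(int x, int y) {
        if (x > 0) {
            int q = x / y; // may now execute with y == 0
            if (y != 0) {
                if (x > 0) {
                    return q;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstOriginal(6, 0) + " " + hoistedToOuterY(6, 0)); // -1 -1
        System.out.println(secondOriginal(6, 0));                              // -1
        try {
            hoistedToOuterX(6, 0);
        } catch (ArithmeticException e) {
            System.out.println("hoisted division executed with y == 0: " + e.getMessage());
        }
    }
}
```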
PR Review: https://git.openjdk.org/jdk/pull/29158#pullrequestreview-3670265735 PR Review Comment: https://git.openjdk.org/jdk/pull/29158#discussion_r2698496945 PR Review Comment: https://git.openjdk.org/jdk/pull/29158#discussion_r2698448653 PR Review Comment: https://git.openjdk.org/jdk/pull/29158#discussion_r2698029174 PR Review Comment: https://git.openjdk.org/jdk/pull/29158#discussion_r2698367280 PR Review Comment: https://git.openjdk.org/jdk/pull/29158#discussion_r2698305305 From mhaessig at openjdk.org Fri Jan 16 14:47:19 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 16 Jan 2026 14:47:19 GMT Subject: RFR: 8347365: C2: Fix the handling of depends_only_on_test In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 12:38:56 GMT, Manuel Hässig wrote: >> Hi, >> >> This PR fixes the handling of `depends_only_on_test` when the control graph is transformed. It has to do with the theoretical idea of `depends_only_on_test`, copied from the JBS issue description: >> >> To start with, what is `depends_only_on_test`? Given a node `n` with the control input `c`, if `c` can be deduced from `c'` and `n->depends_only_on_test() == true`, then we can rewire the control input of `n` to `c'`. This means that `depends_only_on_test` does not mean that the node depends on a test; it means that the node depends on the test that is its control input. >> >> For example: >> >> if (y != 0) { >> if (x > 0) { >> if (y != 0) { >> x / y; >> } >> } >> } >> >> Then `x/y` `depends_only_on_test` because its control input is the test `y != 0`.
Then, we can rewire the control input of the division to the outer `y != 0`, resulting in: >> >> if (y != 0) { >> x / y; >> if (x > 0) { >> } >> } >> >> On the other hand, consider this case: >> >> if (x > 0) { >> if (y != 0) { >> if (x > 0) { >> x / y; >> } >> } >> } >> >> Then `x/y` does not `depends_only_on_test` because its control input is the test `x > 0`, which is unrelated: if we rewired the division to the outer `x > 0` test, the division would float above the actual test `y != 0`. This means that `depends_only_on_test` is a dynamic property of a node, and not a static property of the division operation. It can change when we transform the graph and it can be different for different nodes of the same kind. >> >> More details can be found in the description of `Node::depends_only_on_test` and `Node::pin_node_under_control` in this change. >> >> Please take a look and leave your reviews, thanks a lot. > > src/hotspot/share/opto/loopopts.cpp line 1724: > >> 1722: if (!would_sink_below_pre_loop_exit(loop_ctrl, outside_ctrl)) { >> 1723: if (n->depends_only_on_test()) { >> 1724: // If this node depends_only_on_test, it will be rewire to a control input that is not the correct test > > Suggestion: > > // If this node depends_only_on_test, it will be rewired to a control input that is not the correct test > > The same applies to the other changes in this file. Also, "[...] it will be rewired to a control that is dominated by the test it depends on to prevent it from later floating above that test." or something along that line might be a bit more informative.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29158#discussion_r2698411124 From sviswanathan at openjdk.org Fri Jan 16 15:59:43 2026 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 16 Jan 2026 15:59:43 GMT Subject: [jdk26] RFR: 8360271: String.indexOf intrinsics fail with +EnableX86ECoreOpts and -CompactStrings In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 07:08:06 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [1d889b92](https://github.com/openjdk/jdk/commit/1d889b92bde5dfcb1fbe6cddb389a77f92eb1ce7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Volodymyr Paprotski on 15 Jan 2026 and was reviewed by Tobias Hartmann, Jatin Bhateja and Sandhya Viswanathan. > > Thanks! Marked as reviewed by sviswanathan (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29263#pullrequestreview-3671578823 From epeter at openjdk.org Fri Jan 16 16:03:37 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Jan 2026 16:03:37 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v7] In-Reply-To: References: Message-ID: > This is exploratory work. I wanted to use auto-vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. > > Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) > > **Discussion** > > Observations: > - If the loop can be auto-vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. > - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance.
> - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow > - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow > - `linux_x64_oci_server`: Vector API leads to really nice speedups > - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks > - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. > - Compact Object Headers has some negative effect on some loop benchmarks. > - `linux_aarch64_server`: `reduceAddI`, `copyI` > - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` > - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? > - `windows_x64_oci_server`: `reduceAddI` and some others a little bit > - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` > - Intrinsics can be much faster than auto-vectorized or Vector API code. > - `linux_aarch64_server`: `copyI` > - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. > - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact; I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). > > **Benchmark Plots** > > Units: nanoseconds per algorithm invocation. > > `linux_x64_oci` > [plot: algo_linux_x64_oci_server] > > `windows_x64_oci` > [plot: algo_windows_x64_oci_server] > > `macosx_x64_sandybridge` > [plot: algo_macosx_x64_sandybridge] 75: > 76: void reduceI(int opcode, Register dst, Register iSrc, VectorRegister vSrc, VectorRegister vTmp1, VectorRegister vTmp2); > 77: void cmovF(int cmpFlag, VectorSRegister dst, VectorSRegister op1, VectorSRegister op2, VectorSRegister src1, VectorSRegister src2, VectorSRegister tmp); The cpp file uses `cc`. Having it consistent would be better. A line break would be nice, too. src/hotspot/cpu/ppc/matcher_ppc.hpp line 70: > 68: > 69: // Suppress CMOVF for Power8 because there are no fast nodes.
> 70: static int float_cmove_cost() {return (PowerArchitecturePPC64 >= 9) ? 0 : ConditionalMoveLimit; } Whitespace before `return` would be better. ------------- PR Review: https://git.openjdk.org/jdk/pull/29281#pullrequestreview-3671544232 PR Review Comment: https://git.openjdk.org/jdk/pull/29281#discussion_r2699042546 PR Review Comment: https://git.openjdk.org/jdk/pull/29281#discussion_r2699051341 PR Review Comment: https://git.openjdk.org/jdk/pull/29281#discussion_r2699058439 PR Review Comment: https://git.openjdk.org/jdk/pull/29281#discussion_r2699064310 From chagedorn at openjdk.org Fri Jan 16 16:19:24 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 16 Jan 2026 16:19:24 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v28] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 23:20:13 GMT, Kangcheng Xu wrote: >> There are quite some failures with the same assert (probably all related). Can be triggered, for example, by running `compiler/predicates/assertion/TestAssertionPredicates.java#NoLoopPredicationXbatch` with `-XX:+UseSerialGC`: >> >> >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/opt/mach5/mesos/work_dir/slaves/da1065b5-7b94-4f0d-85e9-a3a252b9a32e-S11864/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/c6afc1de-b432-44d4-bd71-2c035e46dc9e/runs/88cff2b5-6582-4c32-8cb2-92c8c5d2feeb/workspace/open/src/hotspot/share/opto/loopnode.hpp:1450), pid=182310, tid=182326 >> # Error: assert(!has_ctrl(n)) failed >> .......... 
>> Current CompileTask: >> C2:300 95 b 4 compiler.predicates.assertion.TestAssertionPredicates::testTrySplitUpNonOpaqueExpressionNode (163 bytes) >> >> Stack: [0x00007f27d75cc000,0x00007f27d76cc000], sp=0x00007f27d76c6b00, free space=1002k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x156bff8] PhaseIdealLoop::get_loop(Node const*) const+0x68 (loopnode.hpp:1450) >> V [libjvm.so+0x15a07f7] IdealLoopTree::remove_safepoints(PhaseIdealLoop*, bool)+0x167 (loopnode.cpp:4672) >> V [libjvm.so+0x15b7dee] IdealLoopTree::counted_loop(PhaseIdealLoop*)+0x11e (loopnode.cpp:4700) >> V [libjvm.so+0x15b7d7a] IdealLoopTree::counted_loop(PhaseIdealLoop*)+0xaa (loopnode.cpp:4719) >> V [libjvm.so+0x15b7d7a] IdealLoopTree::counted_loop(PhaseIdealLoop*)+0xaa (loopnode.cpp:4719) >> V [libjvm.so+0x15bcc07] PhaseIdealLoop::build_and_optimize()+0xaf7 (loopnode.cpp:5285) >> V [libjvm.so+0xbb8130] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x4c0 (loopnode.hpp:1226) >> V [libjvm.so+0xbb1995] Compile::Optimize()+0x685 (compile.cpp:2466) >> V [libjvm.so+0xbb5173] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x2023 (compile.cpp:862) >> V [libjvm.so+0x9cc3e8] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x498 (c2compiler.cpp:147) >> V [libjvm.so+0xbc4660] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x780 (compileBroker.cpp:2345) >> V [libjvm.so+0xbc5ec0] CompileBroker::compiler_thread_loop()+0x530 (compileBroker.cpp:1989) >> V [libjvm.so+0x112635b] JavaThread::thread_main_inner()+0x13b (javaThread.cpp:776) >> V [libjvm.so+0x1bb30b6] Thread::call_run()+0xb6 (thread.cpp:242) >> V [libjvm.so+0x1808c98] thread_native_entry(Thread*)+0x118 (os_linux.cpp:860) >> >>... > > @chhagedorn Sorry I made a mistake with safepoint detection. Upon inspecting the original code, `_safepoint` should be set to `null` if `.opcode() != Op_SafePoint`. This logic is missing from my refactored code. 
How the test only fails with `-XX:+UseSerialGC` is beyond me. > >> I will check next week if I can extract a reproducer to share. > Yes, it is curious regarding the diff assert. I'd appreciate it if you could share more information. Thank you very much! Thanks for addressing the issue @tabjy! Unfortunately, the closed test triggering the DIFF assert uses a closed jar that cannot be shared and does not provide source information. I will try to have a look myself at the failure and report back what I found next week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3760795417 From qamai at openjdk.org Fri Jan 16 16:26:49 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 16 Jan 2026 16:26:49 GMT Subject: RFR: 8347365: C2: Fix the handling of depends_only_on_test [v2] In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 14:45:02 GMT, Manuel Hässig wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - more clarification >> - Refine comments > > Thank you for working on this, @merykitty! As far as I can understand it, the fix looks good. I do have a question and a few nitpicks below. > > Is this change already tested well enough by the regression tests from JDK-8331717 and JDK-8257822? If so, please add `noreg-sqe`. Or are there possibly additional cases that could be covered? @mhaessig Thanks for your reviews, I have addressed them. The regression tests from JDK-8331717 and JDK-8257822 are run on their own as well as with multiple stress flags. For the details, regarding JDK-8331717: previously, the zero divisor check of `2 / i4` was removed because it is dominated by an equivalent check of `1 / i4`; however, the division was incorrectly wired to the immediate dominator of the removed test, which is the range check `iArr[0]`. Now, the division will be correctly rewired from the zero divisor check of `2 / i4` to the zero divisor check of `1 / i4`.
As a result, loop predication will not bring the division out of the loop without also pulling the zero divisor check out with it. For JDK-8257822, `split_if` will pin the division, so it cannot be hoisted incorrectly. I tried to construct a test for this but did not succeed. Some ideas involve taking advantage of range check predication, or range check smearing, then having the resulting check be dominated by an equivalent check, leading to the `CastNode` floating above . But for the former, range check predication often means the operation is loop-variant even if its control input is above the node, while the latter looks at the dominating chain in the same way as when the `IfNode` is elided, so range check smearing would just use the dominating check anyway. Furthermore, the regression tests for the SIGFPE caused by division can serve as tests since they exercise that mechanism. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29158#issuecomment-3760824885 From mhaessig at openjdk.org Fri Jan 16 17:01:36 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 16 Jan 2026 17:01:36 GMT Subject: RFR: 8347365: C2: Fix the handling of depends_only_on_test [v2] In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 15:59:00 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/callnode.hpp line 123: >> >>> 121: virtual bool is_CFG() const { return true; } >>> 122: virtual uint hash() const { return NO_HASH; } // CFG nodes do not hash >>> 123: virtual bool depends_only_on_test() const { return false; } >> >> Why are you not replacing this with an implementation for depends_only_on_test_impl()? Same question for `RethrowNode`, `GotoNode`, and `RegionNode`. > > Because they are CFG nodes, `depends_only_on_test` returns `false` without the need to invoke `depends_only_on_test_impl`. Ahh, I failed to make that connection. Thanks for the clarification.
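The division-rewiring rule discussed in this thread (a division may move up to an equivalent dominating zero-divisor check, but not to an unrelated test) can be illustrated at the Java source level. The following is a minimal sketch with hypothetical class and method names, not code from the PR: it hand-applies the two rewirings from the PR's `depends_only_on_test` examples. C2 performs this rewiring on its ideal graph rather than on source code, so the sketch only shows the semantic constraint the compiler has to respect.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical illustration of the depends_only_on_test examples,
// hand-applied at the Java source level (C2 does this on its ideal graph).
final class DivisionRewiring {

    // Example 1: the division's control input is the inner `y != 0` test,
    // which is implied by the equivalent outer `y != 0` test.
    static int case1Original(int x, int y) {
        int r = 0;
        if (y != 0) {
            if (x > 0) {
                if (y != 0) {
                    r = x / y;
                }
            }
        }
        return r;
    }

    // Example 1 rewired to the outer `y != 0` test: still guarded, so safe.
    static int case1Rewired(int x, int y) {
        int r = 0;
        if (y != 0) {
            int q = x / y;
            if (x > 0) {
                r = q;
            }
        }
        return r;
    }

    // Example 2: the division's control input is the inner `x > 0` test,
    // which is unrelated to the `y != 0` test the division really needs.
    static int case2Original(int x, int y) {
        int r = 0;
        if (x > 0) {
            if (y != 0) {
                if (x > 0) {
                    r = x / y;
                }
            }
        }
        return r;
    }

    // Example 2 wrongly rewired to the outer `x > 0` test: the division now
    // floats above `y != 0` and throws ArithmeticException when y == 0.
    static int case2Rewired(int x, int y) {
        int r = 0;
        if (x > 0) {
            int q = x / y;
            if (y != 0) {
                r = q;
            }
        }
        return r;
    }

    // Returns true if f(x, y) completes normally, false if it throws ArithmeticException.
    static boolean completes(IntBinaryOperator f, int x, int y) {
        try {
            f.applyAsInt(x, y);
            return true;
        } catch (ArithmeticException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(completes(DivisionRewiring::case1Rewired, 5, 0));  // true
        System.out.println(completes(DivisionRewiring::case2Original, 5, 0)); // true
        System.out.println(completes(DivisionRewiring::case2Rewired, 5, 0));  // false
    }
}
```

In example 1 both versions agree for every input. In example 2 only the original is safe for `x > 0, y == 0`. That is why whether a node "depends only on test" is a property of the test it is currently wired to, not of the division itself.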
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29158#discussion_r2699270968 From mhaessig at openjdk.org Fri Jan 16 17:06:14 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 16 Jan 2026 17:06:14 GMT Subject: RFR: 8347365: C2: Fix the handling of depends_only_on_test [v2] In-Reply-To: References: Message-ID: <4ogH6orx-PHGcuFUD_0QwD-Cn4CamTSUFfB2yLUfLCo=.bb7bfb70-1514-4b8b-a940-e80fc36ca71c@github.com> On Fri, 16 Jan 2026 16:04:49 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the handling of `depends_only_on_test` when the control graph is transformed. It has to do with the theoretical idea of `depends_only_on_test`, copied from the JBS issue description: >> >> To start with, what is `depends_only_on_test`? Given a node `n` with the control input `c`, if `c` can be deduced from `c'` and `n->depends_only_on_test() == true`, then we can rewire the control input of `n` to `c'`. This means that `depends_only_on_test` does not mean that the node depends on a test; it means that the node depends on the test that is its control input. >> >> For example: >> >> if (y != 0) { >> if (x > 0) { >> if (y != 0) { >> x / y; >> } >> } >> } >> >> Then `x/y` `depends_only_on_test` because its control input is the test `y != 0`. Then, we can rewire the control input of the division to the outer `y != 0`, resulting in: >> >> if (y != 0) { >> x / y; >> if (x > 0) { >> } >> } >> >> On the other hand, consider this case: >> >> if (x > 0) { >> if (y != 0) { >> if (x > 0) { >> x / y; >> } >> } >> } >> >> Then `x/y` does not `depends_only_on_test` because its control input is the test `x > 0`, which is unrelated: if we rewire the division to the outer `x > 0` test, the division floats above the actual test `y != 0`. This means that `depends_only_on_test` is a dynamic property of a node, and not a static property of the division operation.
It can change when we transform the graph, and it can be different for different nodes of the same kind. >> >> More details can be found in the description of `Node::depends_only_on_test` and `Node::pin_node_under_control` in this change. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - more clarification > - Refine comments Thank you for addressing my comments and your thorough explanation of the testing situation. This does indeed seem like a situation where the test suite already exercises the new code paths. Nice work. Looks good. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/29158#pullrequestreview-3671837518 From mhaessig at openjdk.org Fri Jan 16 17:41:31 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 16 Jan 2026 17:41:31 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v6] In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: On Fri, 19 Dec 2025 08:54:08 GMT, Benoît Maillard wrote: >> src/hotspot/share/opto/phaseX.cpp line 1202: >> >>> 1200: tty->print_cr("%s", ss.as_string()); >>> 1201: >>> 1202: assert(false, "Missed Value optimization opportunity in PhaseIterGVN for %s", n->Name()); >> >> What if it gets called during CCP? Then it is not just a missed opportunity, but possibly a correctness problem. >> >> I wonder if we should have different assert messages here. We could even just pass a string into the method, either `IGVN` or `CCP`. >> >> What do you think? > > Good point, I didn't think of that. Passing a string into the method would be one solution. Another one would be to keep the `bool` return type for `verify_Value_for` and assert at the call site (just as it was before).
I think this feels a bit more natural than passing an assert message as a parameter. What do you think? Perhaps you could change the message to `... PhaseCCP ...` if --- in Java speak --- `this instanceof PhaseCCP` in addition to the comment you added. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2699400064 From vlivanov at openjdk.org Fri Jan 16 19:11:21 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 16 Jan 2026 19:11:21 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v3] In-Reply-To: References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> Message-ID: <_T_TkLOqVewDuublSElHLE5B43Ez9cy4yWgxl2zT1Z8=.9db4e14a-feb1-49d0-bafe-ad0a696b75c6@github.com> On Fri, 16 Jan 2026 12:51:21 GMT, Kerem Kat wrote: >> The check `vect->is_Vector() || vect->is_LoadVector()` doesn't handle `Proj` nodes that resolve to vector types, causing an assertion failure when such nodes flow through a `Phi` into `VectorBox`. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into fix-c2-checkCastPP > - 8374903: C2 VectorAPI: assert(vbox->as_Phi()->region() == vect->as_Phi()->region()) failed > - Simplify expand_vbox_node_helper by merging VectorBox Phi handling > - 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded Looks good. I'll submit it for testing.
------------- PR Comment: https://git.openjdk.org/jdk/pull/29200#issuecomment-3761411089 From vlivanov at openjdk.org Fri Jan 16 19:14:32 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 16 Jan 2026 19:14:32 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector [v2] In-Reply-To: References: Message-ID: <0maDYNpdQQVJIEWMt1PcO-MV1UzBNoJTphzLo078N4w=.de4645cb-acef-4627-b7a9-8813d016d307@github.com> On Fri, 16 Jan 2026 10:00:26 GMT, Xiaohong Gong wrote: >> ### Problem: >> >> Test `compiler/vectorapi/VectorMaskToLongTest.java` crashes intermittently (approximately once per 200+ runs) with stress VM options such as `-XX:+StressIGVN`: >> >> >> // A fatal error has been detected by the Java Runtime Environment: >> // >> // Internal Error (jdk/src/hotspot/share/opto/type.hpp:2287), pid=69056, tid=28419 >> // assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector >> // ... >> >> >> The crash occurs in the following code when calling `is_vect()` in the assertion added by JDK-8367292 [1]: >> >> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/src/hotspot/share/opto/vectornode.cpp#L1920-L1924 >> >> ### Root Cause: >> >> The mask's type becomes TOP (unreachable) during compiler optimizations when the mask node is marked as dead before all its users are removed from the ideal graph. If `Ideal()` is subsequently called on a user node, it may access the TOP type, triggering the assertion.
>> >> Here is the simplified ideal graph showing the crash scenario: >> >> >> Con #top >> | ConI >> \ / >> \ / >> VectorStoreMask >> | >> VectorMaskToLong # !jvms: IntMaxVector$IntMaxMask::toLong >> >> >> ### Detailed Scenario: >> >> Following is the method in the test case that hits the assertion: >> >> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L65-L70 >> >> This method accepts a `VectorSpecies` parameter and calls vector APIs `VectorMask.fromLong()` and `toLong()`. It is called with species ranging from `ByteVector.SPECIES_MAX` to `DoubleVector.SPECIES_MAX`. During compilation, C2 speculatively generates fast paths for `toLong()` for all possible species. >> >> When compiling a specific test case such as: >> https://github.com/openjdk/jdk/blob/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L177-L179 >> >> the compiler inlines the method and attempts to optimize away unreachable branches. The following graph shows the situation before the mask becomes `TOP`: >> >> >> VectorBox # DoubleMaxMask, generated by VectorMask.fromLong() >> / \ >> AddP \ >> | \ >> LoadNClass \ >> ConP #IntMaxMask | | >> \ | | >> \ DecodeN... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Check "top" and revert the assertion changes src/hotspot/share/opto/vectorIntrinsics.cpp line 625: > 623: } > 624: > 625: const TypeVect* mask_vt = TypeVect::makemask(elem_bt, num_elem); Doesn't the same reasoning apply to vector intrinsics? If `mask_vec` and `opd` aren't TOP, they should produce vector values. So, additional input validation should rule out the problematic scenario. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29057#discussion_r2699667388 From vlivanov at openjdk.org Fri Jan 16 19:25:08 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 16 Jan 2026 19:25:08 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v17] In-Reply-To: References: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> <2Fow0aJ6zpA2aTnACLAxW_d_O_OucOXKz08y-Lx7IBo=.73eeafd0-cf1d-4ce2-a438-89b51739f593@github.com> Message-ID: On Fri, 16 Jan 2026 12:11:05 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/memnode.cpp line 714: >> >>> 712: bool is_known_instance = addr_t != nullptr && addr_t->is_known_instance_field(); >>> 713: LocalEA local_ea(phase->is_IterGVN(), base); >>> 714: TriBool has_not_escaped = is_known_instance ? TriBool(true) >> >> IMO `TriBool` doesn't hold its weight here. As an alternative, encapsulating caching logic inside `LocalEA` and unconditionally querying it for escape state would look cleaner and be easier to reason about. > > I took a try, but I do not find it compelling: the caching is a consequence of `find_previous_store` walking from a node to its input. So, if a node does not observe that `base` has escaped, its input should not do so, either. Moving this logic to `LocalEA` does not seem logical from that POV. `LocalEA` is already stateful and there are asserts in place to ensure that the cached escape state agrees with the current control (`local_ea.not_escaped_controls().member(ctrl)`). The assert can be turned into a dynamic check inside `MemNode::LocalEA::check_escape_status()` to report the cached (non-escaping) state when the queried control dominates the cached one (and, hence, should be recorded in `LocalEA::_not_escaped_controls`). That should be equivalent to the current logic (modulo the dynamic check).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2699696483 From vlivanov at openjdk.org Fri Jan 16 19:29:00 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 16 Jan 2026 19:29:00 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v17] In-Reply-To: References: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> <2Fow0aJ6zpA2aTnACLAxW_d_O_OucOXKz08y-Lx7IBo=.73eeafd0-cf1d-4ce2-a438-89b51739f593@github.com> Message-ID: On Fri, 16 Jan 2026 12:12:31 GMT, Quan Anh Mai wrote: > I have run the latest version with tier1-tier4 and hs-comp-stress. IMO tier1-tier4 is a bare minimum. For anything non-trivial I'd recommend testing it up to tier6. But in this particular case, it makes sense to test it up to tier8 (or even tier10) at least once to avoid any surprises after integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3761474390 From vlivanov at openjdk.org Fri Jan 16 20:36:18 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 16 Jan 2026 20:36:18 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v2] In-Reply-To: References: Message-ID: On Thu, 15 Jan 2026 18:33:29 GMT, Vladimir Ivanov wrote: >> Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - split long line >> - Merge remote-tracking branch 'upstream/master' into 8374807 >> - fix 8374807 > > src/hotspot/share/runtime/deoptimization.cpp line 2161: > >> 2159: Mutex::_no_safepoint_check_flag); >> 2160: >> 2161: ttyLocker ttyl; > > Does the code still need `ttyLocker`?
> > There's only one usage of `tty` and it prints all accumulated info at once. `xtty` already annotates output with thread info. So, I'd assume that moving `trap_mdo->extra_data_lock()` locker to `trap_mdo` accesses should solve the problem as well. > > (I'm not sure whether a `ttyLocker` is needed or not to avoid interleaving during `tty->print_raw(st.freeze());`, but `ttyLocker` can be placed right before it.) I take my suggestion back. Sorry for the confusion. The code in question populates a complex XML structure, so locking is needed to ensure the resulting XML is well-formed. Your previous version looks fine to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29147#discussion_r2699884865 From kxu at openjdk.org Fri Jan 16 20:33:04 2026 From: kxu at openjdk.org (Kangcheng Xu) Date: Fri, 16 Jan 2026 20:33:04 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v28] In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 16:16:04 GMT, Christian Hagedorn wrote: >> @chhagedorn Sorry I made a mistake with safepoint detection. Upon inspecting the original code, `_safepoint` should be set to `null` if `.opcode() != Op_SafePoint`. This logic is missing from my refactored code. How the test only fails with `-XX:+UseSerialGC` is beyond me. >> >>> I will check next week if I can extract a reproducer to share. >> Yes, it is curious regarding the diff assert. I'd appreciate it if you could share more information. Thank you very much! > > Thanks for addressing the issue @tabjy! Unfortunately, the closed test triggering the DIFF assert uses a closed jar that cannot be shared and does not provide source information. I will try to have a look myself at the failure and report back what I found next week. Understood. Thank you for looking into it @chhagedorn! I really appreciate it!
------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3761710121 From vlivanov at openjdk.org Fri Jan 16 21:36:48 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 16 Jan 2026 21:36:48 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v4] In-Reply-To: <5lhbA8iGLm8PeGmQgHaW06-AyfmAzzHUXfDjyV6RF5k=.531bbd83-61f6-4d66-99a4-e9ddbc59e142@github.com> References: <-nGdkKB4aDkSwWlygwqkQ4tJ5s0DUwCfhyugVZtEDZk=.c4db9089-0dfa-4dca-a90e-d31639dd617a@github.com> <5lhbA8iGLm8PeGmQgHaW06-AyfmAzzHUXfDjyV6RF5k=.531bbd83-61f6-4d66-99a4-e9ddbc59e142@github.com> Message-ID: On Fri, 16 Jan 2026 07:50:26 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/vectorization/TestVectorAlgorithms.java line 97: >> >>> 95: framework.addFlags("--add-modules=jdk.incubator.vector", "-XX:CompileCommand=inline,*VectorAlgorithmsImpl::*"); >>> 96: switch (args[0]) { >>> 97: case "vanilla" -> { /* no extra flags */ } >> >> It would be more flexible to let arbitrary VM flags be appended. > > What exactly are you suggesting here? Are you suggesting that instead of doing: > > ` * @run driver ${test.main.class} noSuperWord` > we could do > ` * @run driver ${test.main.class} -XX:-OptimizeFill` > > And then just `framework.addFlags(args)`? Yes. Or introduce a property. >> test/hotspot/jtreg/compiler/vectorization/TestVectorAlgorithms.java line 218: >> >>> 216: >>> 217: // X4 oop setup. >>> 218: oopsX4 = new int[size]; >> >> Any particular reason to keep input data initialization duplicated between test and benchmark modes? > > Yes. Because they are not exactly the same. One is designed to test the implementation, the other to deliver reasonably stable and meaningful benchmarks. > > Example of some differences: > - In the "test" environment, I have access to test libraries like `Generators.java`, which are better at generating edge-cases than regular `Random`. > - In the benchmark, the size is a fixed parameter `SEED`.
In the test, it is a randomly chosen value, so we can test better for alignment/drain/post loops. > - `eI` can be chosen randomly in each iteration of the test. But for the benchmark it is better if we have an array of values to choose from, so that we can pick different values for each benchmark invocation. > > But there is still a lot of overlap. I could try to split it into a "shared" and "local" part, and stuff the "shared" part into `VectorAlgorithmsImpl.Data`. @iwanowww Do you think that is worth it? Ok, up to you. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2700028889 PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2700032354 From vlivanov at openjdk.org Fri Jan 16 21:36:49 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 16 Jan 2026 21:36:49 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v4] In-Reply-To: <-8X3rKDGWGBNYCWyn5GwB1HobhPkwXCRXwgMA1LsvkI=.75dc60a2-f2f0-4b8c-82ec-b87a50e17b1d@github.com> References: <-nGdkKB4aDkSwWlygwqkQ4tJ5s0DUwCfhyugVZtEDZk=.c4db9089-0dfa-4dca-a90e-d31639dd617a@github.com> <5lhbA8iGLm8PeGmQgHaW06-AyfmAzzHUXfDjyV6RF5k=.531bbd83-61f6-4d66-99a4-e9ddbc59e142@github.com> <-8X3rKDGWGBNYCWyn5GwB1HobhPkwXCRXwgMA1LsvkI=.75dc60a2-f2f0-4b8c-82ec-b87a50e17b1d@github.com> Message-ID: On Fri, 16 Jan 2026 08:16:55 GMT, Emanuel Peter wrote: >>>Maybe resetting inputs between forks is a good compromise. >> >> Given the 3 options for `Level`, `Iteration` is in the middle, so that would be the compromise ;) >> >> If I did go with per-fork `Setup`, I would have to have `50` forks, which means we would have to do warmup for each of the `50` forks. It would drive up the runtime quite a lot. > >>Also, it makes it harder to reproduce input dependent variance. > > I suppose my whole goal was to eliminate input dependent variance as far as possible.
Do you think it would be better to make input dependent variance measurable at the `Iteration` level? I fear that this will make the variance of the benchmark very large, and the results of a fork would be quite noisy. My personal preference is `Level.Trial`. When I work with a microbenchmark, the fewer "moving parts" it has the better. It's easy to spot fork-to-fork variance. Not so much with iteration variance where a single outlier can be caused by many factors. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2700027532 From duke at openjdk.org Fri Jan 16 22:50:12 2026 From: duke at openjdk.org (Shawn M Emery) Date: Fri, 16 Jan 2026 22:50:12 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v4] In-Reply-To: References: <-zQuk2uNHFvht7KASCP7OUrLploN8tVFFAldVmkzcuo=.0249f1ba-e2c9-4b33-b438-4fb7f2edf4c6@github.com> <28KSbLDo353fDhRsW-5aaLpYvQ9XGPnyOqO1YN_LnPs=.eaa3ddbf-9228-4fce-a973-dda88d5abf75@github.com> Message-ID: On Sun, 11 Jan 2026 09:33:43 GMT, Jatin Bhateja wrote: >> Just a note on LoopAlignment: there are multiple moving parts here. First, aligning the starting address of a loop to 64 bytes ([recommendation from Zen5 optimization guide](https://docs.amd.com/v/u/en-US/58455_1.00) section 2.8.3) ensures that small loop bodies are not split across a cache line. If that happens, there is a cold-entry penalty in the first iteration of the loop, where the front end has to read multiple L1I cache lines; once the loop is decoded and its uops are in the op-cache (AMD) or DSB (Intel), the uop stream for successive loop iterations is issued from the op-cache. Since the op-cache is shared between the 2 HW threads in an SMT configuration, noisy-neighbor scenarios or context switches may cause the cold-entry penalty to recur during the lifetime of the loop. >> >> So it's advisable to add alignment in this case; for other labels before loops, we already have OptoLoopAlignment in place.
> >> > Better to align loop starting address to OptoLoopAlignment >> >> For parity, should I do this for the other labels in the file as well? >> >> > I will run the micro benchmark on AMD Turin and report back by early next week. >> >> That would be great, thank you for doing this! > > Here are the scores on Turin.
>
> Baseline:
> Benchmark                                    (algorithm)  (keyLength)  (provider)   Mode  Cnt      Score  Error  Units
> KeyPairGeneratorBench.MLKEM.generateKeyPair  ML-KEM-512             0              thrpt    2  62235.790         ops/s
> KeyPairGeneratorBench.MLKEM.generateKeyPair  ML-KEM-768             0              thrpt    2  38238.390         ops/s
> KeyPairGeneratorBench.MLKEM.generateKeyPair  ML-KEM-1024            0              thrpt    2  24725.512         ops/s
>
> Withopt:
> Benchmark                                    (algorithm)  (keyLength)  (provider)   Mode  Cnt      Score  Error  Units
> KeyPairGeneratorBench.MLKEM.generateKeyPair  ML-KEM-512             0              thrpt    2  62483.697         ops/s
> KeyPairGeneratorBench.MLKEM.generateKeyPair  ML-KEM-768             0              thrpt    2  38464.272         ops/s
> KeyPairGeneratorBench.MLKEM.generateKeyPair  ML-KEM-1024            0              thrpt    2  24702.044         ops/s
>
> Baseline:
> Benchmark             (algorithm)  (provider)   Mode  Cnt      Score  Error  Units
> KEMBench.decapsulate  ML-KEM-512               thrpt    2  46416.479         ops/s
> KEMBench.decapsulate  ML-KEM-768               thrpt    2  28516.289         ops/s
> KEMBench.decapsulate  ML-KEM-1024              thrpt    2  19250.020         ops/s
> KEMBench.encapsulate  ML-KEM-512               thrpt    2  60374.724         ops/s
> KEMBench.encapsulate  ML-KEM-768               thrpt    2  36226.100         ops/s
> KEMBench.encapsulate  ML-KEM-1024              thrpt    2  23656.223         ops/s
>
> Withopt:
> Benchmark             (algorithm)  (provider)   Mode  Cnt      Score  Error  Units
> KEMBench.decapsulate  ML-KEM-512               thrpt    2  46730.153         ops/s
> KEMBench.decapsulate  ML-KEM-768               thrpt    2  28650.349         ops/s
> KEMBench.decapsulate  ML-KEM-1024              thrpt    2  19390.927         ops/s
> KEMBench.encapsulate  ML-KEM-512               thrpt    2  60238.211         ops/s
> KEMBench.encapsulate  ML-KEM-768               thrpt    2  36454.138         ops/s
> KEMBench.encapsulate  ML-KEM-1024              thrpt    2  23649.839         ops/s
>
> System was...

@jatin-bhateja are there any outstanding issues with this PR?
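To put the quoted Turin numbers in perspective, the sketch below computes the relative throughput change of the optimized build versus the baseline for each benchmark row. The scores are copied from the JMH output above. The class and method names are hypothetical. Every change it reports is below 1% in magnitude, and three rows (`generateKeyPair` ML-KEM-1024, `encapsulate` ML-KEM-512 and ML-KEM-1024) are slightly negative. Each score comes from only two iterations (`Cnt 2`) with no error estimate, so differences of this size should be read with caution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: relative change between the quoted baseline and optimized scores.
final class TurinScores {

    // Relative throughput change in percent; positive means the optimized build is faster.
    static double percentChange(double baseline, double optimized) {
        return (optimized - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // {baseline, optimized} throughput in ops/s, copied from the Turin results.
        Map<String, double[]> rows = new LinkedHashMap<>();
        rows.put("generateKeyPair ML-KEM-512",  new double[] {62235.790, 62483.697});
        rows.put("generateKeyPair ML-KEM-768",  new double[] {38238.390, 38464.272});
        rows.put("generateKeyPair ML-KEM-1024", new double[] {24725.512, 24702.044});
        rows.put("decapsulate ML-KEM-512",      new double[] {46416.479, 46730.153});
        rows.put("decapsulate ML-KEM-768",      new double[] {28516.289, 28650.349});
        rows.put("decapsulate ML-KEM-1024",     new double[] {19250.020, 19390.927});
        rows.put("encapsulate ML-KEM-512",      new double[] {60374.724, 60238.211});
        rows.put("encapsulate ML-KEM-768",      new double[] {36226.100, 36454.138});
        rows.put("encapsulate ML-KEM-1024",     new double[] {23656.223, 23649.839});

        for (Map.Entry<String, double[]> e : rows.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-28s %+6.2f%%%n", e.getKey(), percentChange(s[0], s[1]));
        }
    }
}
```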
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2700237409 From qamai at openjdk.org Sat Jan 17 00:05:21 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 17 Jan 2026 00:05:21 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v17] In-Reply-To: References: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> <2Fow0aJ6zpA2aTnACLAxW_d_O_OucOXKz08y-Lx7IBo=.73eeafd0-cf1d-4ce2-a438-89b51739f593@github.com> Message-ID: On Fri, 16 Jan 2026 19:25:39 GMT, Vladimir Ivanov wrote: >> @iwanowww I have run the latest version with tier1-tier4 and hs-comp-stress. >> @dlunde Thanks for your comments, I have addressed them. > >> I have run the latest version with tier1-tier4 and hs-comp-stress. > > IMO tier1-tier4 is a bare minimum. For anything non-trivial I'd recommend testing it up to tier6. But in this particular case, it makes sense to test it up to tier8 (or even tier10) at least once to avoid any surprises after integration. @iwanowww Thanks for your advice; I will perform more thorough testing once the reviewers are reasonably content with the implementation. >> I took a try, but I do not find it compelling: the caching is a consequence of `find_previous_store` walking from a node to its input. So, if a node does not observe that `base` has escaped, its input should not do so, either. Moving this logic to `LocalEA` does not seem logical from that POV. > > `LocalEA` is already stateful and there are asserts in place to ensure that the cached escape state agrees with the current control (`local_ea.not_escaped_controls().member(ctrl)`). The assert can be turned into a dynamic check inside `MemNode::LocalEA::check_escape_status()` to report the cached (non-escaping) state when the queried control dominates the cached one (and, hence, should be recorded in `LocalEA::_not_escaped_controls`). That should be equivalent to the current logic (modulo the dynamic check).
Do you mean adding an early return in `check_escape_control` when the queried control is a transitive input of the cached one like this: if (_not_escaped_controls.member(ctl)) { return NOT_ESCAPED; } I think it is correct to do so, but an assert that `_not_escaped_controls` does contain the `ctl` is a little bit stronger in terms of strictness. Moving this assert into `check_escape_status` will make it harder to reuse a `LocalEA` across multiple calls of `find_previous_store`. This is useful, for example, when the load is from a memory `Phi`, and we try to follow the `Phi` inputs to find the stored value along different paths of the merge. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3762253612 PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2700352410 From duke at openjdk.org Sat Jan 17 04:32:10 2026 From: duke at openjdk.org (Ruben) Date: Sat, 17 Jan 2026 04:32:10 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet [v6] In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: <7Qddv1kL7Xb9bnDSAevGwSGnEfkpxBQtf_9OKEYfoXE=.5a378a7f-682c-4695-bdb0-26537e6b836c@github.com> On Wed, 19 Nov 2025 12:52:37 GMT, Samuel Chee wrote: >> AtomicLong.compareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg: >> >> ;; cmpxchg { >> 0x0000e708d144cf60: mov x8, x2 >> 0x0000e708d144cf64: casal x8, x3, [x0] >> 0x0000e708d144cf68: cmp x8, x2 >> ;; 0x1F1F1F1F1F1F1F1F >> 0x0000e708d144cf6c: mov x8, #0x1f1f1f1f1f1f1f1f >> ;; } cmpxchg >> 0x0000e708d144cf70: cset x8, ne // ne = any >> 0x0000e708d144cf74: dmb ish >> >> >> According to the Oracle Java Specification, AtomicLong.compareAndSet [1] has the same memory effects as specified by VarHandle.compareAndSet which has the following effects: [2] >> >>> Atomically sets the value of a variable to the >>> newValue with the
memory semantics of setVolatile if >>> the variable's current value, referred to as the witness >>> value, == the expectedValue, as accessed with the memory >>> semantics of getVolatile. >> >> >> >> Hence the release on the store due to setVolatile only occurs if the compare is successful. Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen. >> >> Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant) >> >> This is also reflected by C2 not having a dmb for the same respective method. >> >> [1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) >> [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...) > > Samuel Chee has updated the pull request incrementally with one additional commit since the last revision: > > Add "/*with_barrier*/" comments I've opened another PR from the same branch: https://github.com/openjdk/jdk/pull/29287 ------------- PR Comment: https://git.openjdk.org/jdk/pull/26000#issuecomment-3762638821 From jbhateja at openjdk.org Sat Jan 17 05:58:30 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 17 Jan 2026 05:58:30 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v5] In-Reply-To: References: Message-ID: On Sun, 11 Jan 2026 06:59:20 GMT, Shawn M Emery wrote: >> This change allows use of the AVX512_VBMI instruction set to further optimize decompression/parsing of polynomial coefficients for ML-KEM. The speedup gained in the ML-KEM benchmarks for key generation is between 0.4 to 0.5%, encapsulation is 0.2 to 1.7%, and decapsulation is 0.3 to 2.0%. >> >> Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me. 
> > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Update to use OptoLoopAlignment for VBMILoop Marked as reviewed by jbhateja (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28815#pullrequestreview-3673509627 From jbhateja at openjdk.org Sat Jan 17 05:58:30 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 17 Jan 2026 05:58:30 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v4] In-Reply-To: References: <-zQuk2uNHFvht7KASCP7OUrLploN8tVFFAldVmkzcuo=.0249f1ba-e2c9-4b33-b438-4fb7f2edf4c6@github.com> <28KSbLDo353fDhRsW-5aaLpYvQ9XGPnyOqO1YN_LnPs=.eaa3ddbf-9228-4fce-a973-dda88d5abf75@github.com> Message-ID: On Fri, 16 Jan 2026 22:46:33 GMT, Shawn M Emery wrote: >>> > Better to align loop starting address to OptoLoopAlignment >>> >>> For parity, should I do this for the other labels in the file as well? >>> >>> > I will run the micro benchmark on AMD Turin and report back by early next week. >>> >>> That would be great, thank you for doing this! >> >> Here are the scores on Turin.
>>
>>
>> Baseline:
>> Benchmark                                    (algorithm)  (keyLength)  (provider)   Mode  Cnt      Score  Error  Units
>> KeyPairGeneratorBench.MLKEM.generateKeyPair   ML-KEM-512            0              thrpt    2  62235.790         ops/s
>> KeyPairGeneratorBench.MLKEM.generateKeyPair   ML-KEM-768            0              thrpt    2  38238.390         ops/s
>> KeyPairGeneratorBench.MLKEM.generateKeyPair  ML-KEM-1024            0              thrpt    2  24725.512         ops/s
>>
>> Withopt:
>> Benchmark                                    (algorithm)  (keyLength)  (provider)   Mode  Cnt      Score  Error  Units
>> KeyPairGeneratorBench.MLKEM.generateKeyPair   ML-KEM-512            0              thrpt    2  62483.697         ops/s
>> KeyPairGeneratorBench.MLKEM.generateKeyPair   ML-KEM-768            0              thrpt    2  38464.272         ops/s
>> KeyPairGeneratorBench.MLKEM.generateKeyPair  ML-KEM-1024            0              thrpt    2  24702.044         ops/s
>>
>>
>>
>> Baseline:
>> Benchmark             (algorithm)  (provider)   Mode  Cnt      Score  Error  Units
>> KEMBench.decapsulate   ML-KEM-512              thrpt    2  46416.479         ops/s
>> KEMBench.decapsulate   ML-KEM-768              thrpt    2  28516.289         ops/s
>> KEMBench.decapsulate  ML-KEM-1024              thrpt    2  19250.020         ops/s
>> KEMBench.encapsulate   ML-KEM-512              thrpt    2  60374.724         ops/s
>> KEMBench.encapsulate   ML-KEM-768              thrpt    2  36226.100         ops/s
>> KEMBench.encapsulate  ML-KEM-1024              thrpt    2  23656.223         ops/s
>>
>> Withopt:
>> Benchmark             (algorithm)  (provider)   Mode  Cnt      Score  Error  Units
>> KEMBench.decapsulate   ML-KEM-512              thrpt    2  46730.153         ops/s
>> KEMBench.decapsulate   ML-KEM-768              thrpt    2  28650.349         ops/s
>> KEMBench.decapsulate  ML-KEM-1024              thrpt    2  19390.927         ops/s
>> KEMBench.encapsulate   ML-KEM-512              thrpt    2  60238.211         ops/s
>> KEMBench.encapsulate   ML-KEM-768              thrpt    2  36454.138         ops/s
>> KEMBench.encapsulat...
>
> @jatin-bhateja are there any outstanding issues with this PR?

Thanks for the sharp details and analysis.
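For a quick reading of the Turin numbers quoted above, the relative throughput change per row is (withopt - baseline) / baseline * 100. The C++ sketch below is only an illustration: `pct_change`, `Row` and `kTurinRows` are hypothetical names, and the scores are copied from the quoted table. The truncated ML-KEM-1024 encapsulate row is left out rather than guessed.

```cpp
#include <cassert>
#include <cmath>

// Relative change in percent between two JMH throughput scores (ops/s).
// Positive means the optimized build completed more operations per second.
inline double pct_change(double baseline, double withopt) {
  return (withopt - baseline) / baseline * 100.0;
}

// Hypothetical container for the rows quoted from the AMD Turin run
// (Cnt = 2, no error column reported).
struct Row {
  const char* name;
  double baseline;
  double withopt;
};

constexpr Row kTurinRows[] = {
    {"generateKeyPair ML-KEM-512", 62235.790, 62483.697},
    {"generateKeyPair ML-KEM-768", 38238.390, 38464.272},
    {"generateKeyPair ML-KEM-1024", 24725.512, 24702.044},
    {"decapsulate ML-KEM-512", 46416.479, 46730.153},
    {"decapsulate ML-KEM-768", 28516.289, 28650.349},
    {"decapsulate ML-KEM-1024", 19250.020, 19390.927},
    {"encapsulate ML-KEM-512", 60374.724, 60238.211},
    {"encapsulate ML-KEM-768", 36226.100, 36454.138},
};
```

Applied to the eight complete rows, this gives changes between about -0.23% (encapsulate, ML-KEM-512) and +0.73% (decapsulate, ML-KEM-1024). With only two measurement iterations and no reported error, differences of this size may well be within run-to-run noise, so they are best read alongside the ranges given in the PR description.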
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2700672393 From duke at openjdk.org Sat Jan 17 06:08:33 2026 From: duke at openjdk.org (Shawn M Emery) Date: Sat, 17 Jan 2026 06:08:33 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v4] In-Reply-To: References: <-zQuk2uNHFvht7KASCP7OUrLploN8tVFFAldVmkzcuo=.0249f1ba-e2c9-4b33-b438-4fb7f2edf4c6@github.com> <28KSbLDo353fDhRsW-5aaLpYvQ9XGPnyOqO1YN_LnPs=.eaa3ddbf-9228-4fce-a973-dda88d5abf75@github.com> Message-ID: On Sat, 17 Jan 2026 05:54:32 GMT, Jatin Bhateja wrote: >> @jatin-bhateja are there any outstanding issues with this PR? > > Thanks for the sharp details and analysis. Thank you for your comments and review @jatin-bhateja. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2700677869 From duke at openjdk.org Sat Jan 17 06:28:25 2026 From: duke at openjdk.org (duke) Date: Sat, 17 Jan 2026 06:28:25 GMT Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v5] In-Reply-To: References: Message-ID: On Sun, 11 Jan 2026 06:59:20 GMT, Shawn M Emery wrote: >> This change allows use of the AVX512_VBMI instruction set to further optimize decompression/parsing of polynomial coefficients for ML-KEM. The speedup gained in the ML-KEM benchmarks for key generation is between 0.4 to 0.5%, encapsulation is 0.2 to 1.7%, and decapsulation is 0.3 to 2.0%. >> >> Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me. > > Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision: > > Update to use OptoLoopAlignment for VBMILoop @smemery Your change (at version f278a63fff4a9f268803a1e2e5fbad260d29d11c) is now ready to be sponsored by a Committer. 
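Returning to the 8360654 discussion earlier in this digest (dropping the trailing `dmb` after `casal` in C1's compareAndSet): the argument is that a failed compare-and-set performs no store, so there is nothing to release on that path. C++ `std::atomic` encodes the same idea, which may help as an analogy. This is not HotSpot code, and `volatile_style_cas` is a hypothetical helper; the relevant rule is that the standard forbids `memory_order_release` and `memory_order_acq_rel` as the failure ordering of `compare_exchange_strong`, because a failed exchange is only a load.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper (an analogy, not HotSpot code): a CAS whose success path
// behaves like a volatile read-modify-write and whose failure path is only a
// volatile-style load. The failure ordering may not be release/acq_rel,
// since a failed compare-exchange stores nothing.
inline bool volatile_style_cas(std::atomic<long>& v, long expected, long desired) {
  // 'expected' is a local copy; on failure it receives the witness value,
  // which this helper simply discards.
  return v.compare_exchange_strong(expected, desired,
                                   std::memory_order_seq_cst,   // success: RMW
                                   std::memory_order_seq_cst);  // failure: load only
}
```

In a single thread, `volatile_style_cas(x, 5, 7)` on `x == 5` succeeds and stores 7, and a second call that still expects 5 fails and leaves 7 in place. Whether a particular AArch64 sequence (such as `casal` without a trailing barrier) provides these orderings is the hardware-level question settled in that review; the C++ sketch does not answer it.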
------------- PR Comment: https://git.openjdk.org/jdk/pull/28815#issuecomment-3762780881 From duke at openjdk.org Sat Jan 17 11:11:21 2026 From: duke at openjdk.org (Shawn M Emery) Date: Sat, 17 Jan 2026 11:11:21 GMT Subject: Integrated: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI In-Reply-To: References: Message-ID: On Sun, 14 Dec 2025 04:56:39 GMT, Shawn M Emery wrote: > This change allows use of the AVX512_VBMI instruction set to further optimize decompression/parsing of polynomial coefficients for ML-KEM. The speedup gained in the ML-KEM benchmarks for key generation is between 0.4 to 0.5%, encapsulation is 0.2 to 1.7%, and decapsulation is 0.3 to 2.0%. > > Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me. This pull request has now been integrated. Changeset: a0e6f028 Author: Shawn M Emery Committer: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/a0e6f028a8952f61d9115f7bdf04b8a87f8ebba4 Stats: 90 lines in 1 file changed: 88 ins; 0 del; 2 mod 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI Co-authored-by: Sandhya Viswanathan Reviewed-by: jbhateja, vpaprotski ------------- PR: https://git.openjdk.org/jdk/pull/28815 From jbhateja at openjdk.org Sat Jan 17 11:42:32 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 17 Jan 2026 11:42:32 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v12] In-Reply-To: <8Z84JAkAC6yVFA_1j82FXuoqn1Gu5qQLBlgbcVDAuLQ=.ec5f98c0-42de-4395-a46e-bb2b0be3c12a@github.com> References: <8hStIcvp252Ik7raxZL5BvFKKkXTflorjyOD9Cyakvc=.c5d1b302-5c49-46b1-91ba-2feda2e6a746@github.com> <8Z84JAkAC6yVFA_1j82FXuoqn1Gu5qQLBlgbcVDAuLQ=.ec5f98c0-42de-4395-a46e-bb2b0be3c12a@github.com> Message-ID: On Fri, 19 Dec 2025 22:48:50 GMT, Paul Sandoz wrote: >> Hi @PaulSandoz , your comments have been addressed. 
Please let us know if there are other comments. >> Hi @eme64 , Kindly share your comments. > >> @jatin-bhateja Thanks for the ping! I'll put this on the list for review early in 2026 :) > > Same here! Hi @PaulSandoz , your comments have been addressed ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3763199964 From qamai at openjdk.org Sat Jan 17 13:17:29 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 17 Jan 2026 13:17:29 GMT Subject: RFR: 8374725: C2: assert(x_ctrl == get_late_ctrl_with_anti_dep(x->as_Load(), early_ctrl, x_ctrl)) failed: anti-dependences were already checked In-Reply-To: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> References: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> Message-ID: On Wed, 14 Jan 2026 13:45:09 GMT, Roland Westrelin wrote: > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` caches > intermediate results in `_dom_lca_tags` when the late control is > computed by `PhaseIdealLoop::get_late_ctrl()` for a node `n`: the code > iterates over all uses of `n` potentially calling > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` multiple > times. `_dom_lca_tags` is used to cache data that is specific to the > lca computation for `n`. `_dom_lca_tags` is set to a tag that depends > on `n` to mark the cached data as only valid during the lca > computation for `n`. > > `PhaseIdealLoop::try_sink_out_of_loop()` checks that all uses of a > node are out of loop with > `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` which also needs to > consider anti-dependences for `Load`s and also calls > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` through > `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`: this computes the > late control for a node and one particular out of loop > use. 
`_dom_lca_tags` values computed by an earlier > `PhaseIdealLoop::get_late_ctrl()` should be ignored (because it > computes the late control for a node and all its uses). To address > that issue, the tag that's used by > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` is made > different on each call from > `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` by incrementing > `_dom_lca_tags_round`. > > The issue here is that one `Load` node is input to a `Phi` twice. So > the `Phi` is considered twice as a use of the node along 2 different > paths. `PhaseIdealLoop::get_late_ctrl_with_anti_dep()` is called twice > from `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` but > `_dom_lca_tags_round` is not incremented between the 2 > calls. `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` when > called for the second `Phi` input uses incorrect cached data which, in > turn, causes an incorrect computation. > > The fix I propose is to make sure `_dom_lca_tags_round` is incremented > for every call to `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`. Test results look good ------------- PR Comment: https://git.openjdk.org/jdk/pull/29231#issuecomment-3763657794 From ghan at openjdk.org Sat Jan 17 13:31:43 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Sat, 17 Jan 2026 13:31:43 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v4] In-Reply-To: References: Message-ID: > Please review this change. Thanks! > > Description: > > This change fixes a crash in Deoptimization::uncommon_trap_inner when running with -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp. > > With -XX:-ProfileTraps, create_if_missing is set to false. 
> https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2121-L2122 > > When create_if_missing is false, a new MDO cannot be created when m()->method_data() returns null, so get_method_data() may return null. > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L1911-L1912 > > and trap_mdo can be null as a result > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2134-L2136 > > The crash happens here because trap_mdo is null > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2157 > > Fix: > > The fix makes acquisition of extra_data_lock conditional on trap_mdo being non-null, preserving the required lock ordering (extra_data_lock before ttyLocker) when an MDO exists. > > Test: > > GHA Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: - revert - Merge remote-tracking branch 'upstream/master' into 8374807 - narrow lock scope - Merge remote-tracking branch 'upstream/master' into 8374807 - split long line - Merge remote-tracking branch 'upstream/master' into 8374807 - fix 8374807 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29147/files - new: https://git.openjdk.org/jdk/pull/29147/files/cdc88af1..e065ae56 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29147&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29147&range=02-03 Stats: 4227 lines in 70 files changed: 2638 ins; 859 del; 730 mod Patch: https://git.openjdk.org/jdk/pull/29147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29147/head:pull/29147 PR: https://git.openjdk.org/jdk/pull/29147 From jkratochvil at openjdk.org Sat Jan 17 20:39:11 2026 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Sat, 17 Jan 2026 20:39:11 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache [v6] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 01:33:37 GMT, Chad Rakoczy wrote: >> ### Summary >> This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. >> >> ### Description >> The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. >> >> Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. 
This allows the sampler to maintain a profile of the most frequently executed methods. >> >> The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. >> >> The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). >> >> Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. >> >> ### Performance >> Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. >> >> ### Testing >> * CodeCache tests have been updated to cover the new `HotCodeHeap`. >> * Added ded... 
> > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix builds I see a FAIL for `test/jdk/jdk/jfr/event/compiler/TestCodeCacheFull.java` on linux64 fastdebug: Error occurred during initialization of VM HotCodeHeapSize requires HotCodeHeap enabled But GHA does not report it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27858#issuecomment-3764312804 From ghan at openjdk.org Sun Jan 18 01:47:26 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Sun, 18 Jan 2026 01:47:26 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v2] In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 20:32:34 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/runtime/deoptimization.cpp line 2161: >> >>> 2159: Mutex::_no_safepoint_check_flag); >>> 2160: >>> 2161: ttyLocker ttyl; >> >> Does the code still need `ttyLocker`? >> >> There's only one usage of `tty` and it prints all accumulated info all at once. `xtty` already annotates output with thread info. So, I'd assume that moving `trap_mdo->extra_data_lock()` locker to `trap_mdo` accesses should solve the problem as well. >> >> (I'm not sure whether a `ttyLocker` is needed or not to avoid interleaving during `tty->print_raw(st.freeze());`, but `ttyLocker` can be placed right before it.) > > I take my suggestion back. Sorry for the confusion. > > The code in question populates complex XML structure, so locking is needed to ensure the resulting XML is well-formed. > > Your previous version looks fine to me. Hi @iwanowww, I have restored the previous version. Could you please review it again? 
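The 8374807 fix quoted above takes `extra_data_lock` only when `trap_mdo` is non-null, while keeping the order "extra-data lock first, tty lock second". As an illustration of that shape only, here is a standard-C++ sketch; `MethodDataLike`, `tty_mutex` and `trace_uncommon_trap` are hypothetical stand-ins, not HotSpot's `MethodData`, `ttyLocker` or `Deoptimization::uncommon_trap_inner`. A default-constructed `std::unique_lock` owns no mutex, so the optional lock is simply left empty when there is no MDO.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-ins (not HotSpot types): a per-method profile object with
// its own lock, and a global lock that serializes trace output.
struct MethodDataLike {
  std::mutex extra_data_lock;
  int trap_count = 0;
};

std::mutex tty_mutex;

// Lock order: extra_data_lock (only if a profile object exists) before
// tty_mutex. When trap_mdo is null there is nothing to lock or update, so only
// the output lock is taken.
std::string trace_uncommon_trap(MethodDataLike* trap_mdo) {
  std::unique_lock<std::mutex> mdo_lock;  // owns nothing yet
  if (trap_mdo != nullptr) {
    mdo_lock = std::unique_lock<std::mutex>(trap_mdo->extra_data_lock);
  }
  std::lock_guard<std::mutex> tty(tty_mutex);  // always acquired second
  if (trap_mdo == nullptr) {
    return "trap (no MDO)";
  }
  trap_mdo->trap_count++;
  return "trap count=" + std::to_string(trap_mdo->trap_count);
}
```

Because `mdo_lock` is declared before `tty`, the destructors release the output lock first and the profile lock second, the reverse of acquisition.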
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29147#discussion_r2701935133 From ghan at openjdk.org Sun Jan 18 02:41:58 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Sun, 18 Jan 2026 02:41:58 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v7] In-Reply-To: References: Message-ID: > Please review this change. Thanks! > > **Description:** > > When -XX:+PrintDeoptimizationDetails is enabled, vframeArray.cpp prints interpreted frames under ttyLocker. > https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/runtime/vframeArray.cpp#L493-L503 > In the WizardMode && Verbose branch it calls Method::print_codes() / print_codes_on(). The bytecode printing path may acquire MDOExtraData_lock (rank nosafepoint-1). > https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/interpreter/bytecodeTracer.cpp#L597-L602 > Calling it while holding tty_lock violates the global lock ranking order and triggers a lock rank inversion assertion. > > **Fix:** > > Generate the bytecode dump before taking tty_lock, and print it afterwards under ttyLocker to keep the output coherent while preserving correct lock acquisition order. > To avoid double buffering when the target stream is already a thread-local stringStream, extend BytecodeTracer::print_method_codes with a buffered flag . The new call site uses buffered=false when dumping into the temporary stringStream. > > **Test:** > > GHA Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 10 additional commits since the last revision: - revert - Merge remote-tracking branch 'upstream/master' into 8374862 - fix a compile error - remove unnecessary blank line - correct copyright year - Add outputStream::is_buffered() - change variable name - Merge remote-tracking branch 'upstream/master' into 8374862 - fix a compile error - fix 8374862 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29186/files - new: https://git.openjdk.org/jdk/pull/29186/files/ea011598..35cb8b9f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29186&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29186&range=05-06 Stats: 22847 lines in 338 files changed: 12751 ins; 5268 del; 4828 mod Patch: https://git.openjdk.org/jdk/pull/29186.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29186/head:pull/29186 PR: https://git.openjdk.org/jdk/pull/29186 From ghan at openjdk.org Sun Jan 18 02:47:08 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Sun, 18 Jan 2026 02:47:08 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v8] In-Reply-To: References: Message-ID: > Please review this change. Thanks! > > **Description:** > > When -XX:+PrintDeoptimizationDetails is enabled, vframeArray.cpp prints interpreted frames under ttyLocker. > https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/runtime/vframeArray.cpp#L493-L503 > In the WizardMode && Verbose branch it calls Method::print_codes() / print_codes_on(). The bytecode printing path may acquire MDOExtraData_lock (rank nosafepoint-1). 
> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/interpreter/bytecodeTracer.cpp#L597-L602 > Calling it while holding tty_lock violates the global lock ranking order and triggers a lock rank inversion assertion. > > **Fix:** > > Generate the bytecode dump before taking tty_lock, and print it afterwards under ttyLocker to keep the output coherent while preserving correct lock acquisition order. > To avoid double buffering when the target stream is already a thread-local stringStream, extend BytecodeTracer::print_method_codes with a buffered flag . The new call site uses buffered=false when dumping into the temporary stringStream. > > **Test:** > > GHA Guanqiang Han has updated the pull request incrementally with one additional commit since the last revision: remove unused code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29186/files - new: https://git.openjdk.org/jdk/pull/29186/files/35cb8b9f..601ff0da Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29186&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29186&range=06-07 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29186.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29186/head:pull/29186 PR: https://git.openjdk.org/jdk/pull/29186 From ghan at openjdk.org Mon Jan 19 00:46:34 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Mon, 19 Jan 2026 00:46:34 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v6] In-Reply-To: References: Message-ID: On Thu, 15 Jan 2026 09:17:55 GMT, David Holmes wrote: >> I agree with the concern here. The buffering we need is local to this call site to keep the output coherent (collect everything and print once). 
>> Whether we need to buffer/accumulate output for coherence is scenario-dependent, rather than a property that should permanently classify a stream type as "buffered" vs. "unbuffered". >> @dean-long what's your view on this? > > No - sorry I forgot that you have to add override to all methods. Hi @dholmes-ora @dean-long, I have restored the previous version. Could you please review it again? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29186#discussion_r2702915180 From dholmes at openjdk.org Mon Jan 19 02:33:17 2026 From: dholmes at openjdk.org (David Holmes) Date: Mon, 19 Jan 2026 02:33:17 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v8] In-Reply-To: References: Message-ID: On Sun, 18 Jan 2026 02:47:08 GMT, Guanqiang Han wrote: >> Please review this change. Thanks! >> >> **Description:** >> >> When -XX:+PrintDeoptimizationDetails is enabled, vframeArray.cpp prints interpreted frames under ttyLocker. >> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/runtime/vframeArray.cpp#L493-L503 >> In the WizardMode && Verbose branch it calls Method::print_codes() / print_codes_on(). The bytecode printing path may acquire MDOExtraData_lock (rank nosafepoint-1). >> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/interpreter/bytecodeTracer.cpp#L597-L602 >> Calling it while holding tty_lock violates the global lock ranking order and triggers a lock rank inversion assertion. >> >> **Fix:** >> >> Generate the bytecode dump before taking tty_lock, and print it afterwards under ttyLocker to keep the output coherent while preserving correct lock acquisition order.
>> To avoid double buffering when the target stream is already a thread-local stringStream, extend BytecodeTracer::print_method_codes with a buffered flag . The new call site uses buffered=false when dumping into the temporary stringStream. >> >> **Test:** >> >> GHA > > Guanqiang Han has updated the pull request incrementally with one additional commit since the last revision: > > remove unused code Changes requested by dholmes (Reviewer). src/hotspot/share/interpreter/bytecodeTracer.cpp line 195: > 193: #endif > 194: > 195: void BytecodeTracer::print_method_codes(const methodHandle& method, int from, int to, outputStream* st, int flags, bool coherent_output) { I don't like the new `coherent_output` name sorry. The output will always be "coherent" - either via the passed in stringStream, or the internal StringStream. ------------- PR Review: https://git.openjdk.org/jdk/pull/29186#pullrequestreview-3676046573 PR Review Comment: https://git.openjdk.org/jdk/pull/29186#discussion_r2703023849 From xgong at openjdk.org Mon Jan 19 03:09:02 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 19 Jan 2026 03:09:02 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector [v2] In-Reply-To: <0maDYNpdQQVJIEWMt1PcO-MV1UzBNoJTphzLo078N4w=.de4645cb-acef-4627-b7a9-8813d016d307@github.com> References: <0maDYNpdQQVJIEWMt1PcO-MV1UzBNoJTphzLo078N4w=.de4645cb-acef-4627-b7a9-8813d016d307@github.com> Message-ID: On Fri, 16 Jan 2026 19:10:54 GMT, Vladimir Ivanov wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Check "top" and revert the assertion changes > > src/hotspot/share/opto/vectorIntrinsics.cpp line 625: > >> 623: } >> 624: >> 625: const TypeVect* mask_vt = TypeVect::makemask(elem_bt, num_elem); > > Doesn't the same reasoning apply to vector intrinsics? If `mask_vec` and `opd` aren't TOP, they should produce vector values. 
So, additional input validation should rule out the problematic scenario. Vector intrinsics look safer to me now. The APIs are inlined at an even earlier optimization stage, and the nodes are almost all newly created. Regarding `unbox_vector()`, it either returns a newly created `VectorUnboxNode` or a GVN-transformed `VectorUnbox` node. Currently there is no `TOP` input check for `VectorUnboxNode` itself during GVN. That might be an issue we need to revisit once we add such checks for vector nodes. I agree that additional input validation would be better, since we can then abort the API inlining as early as possible. The code may look like:

    Node* mask_vec = unbox_vector(mask, mask_box_type, elem_bt, num_elem);
    if (mask_vec == nullptr || gvn().type(mask_vec) == Type::TOP) {
      log_if_needed(" ** unbox failed mask=%s", NodeClassNames[argument(4)->Opcode()]);
      return false;
    }

This looks like a common issue for all APIs, so should we fix all the code after `unbox_vector`? I'm unsure whether I need to do that as part of the issue this PR addresses, or whether we could revisit the whole file in the future. Any suggestions? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29057#discussion_r2703069129 From xgong at openjdk.org Mon Jan 19 03:17:20 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 19 Jan 2026 03:17:20 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v4] In-Reply-To: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> References: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> Message-ID: On Fri, 9 Jan 2026 13:32:54 GMT, Emanuel Peter wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Update comments in vectornode.hpp > > Nice work, thanks for taking the time for this, much appreciated!
> > On the whole I'm super happy with this, but left a few extra comments :) Hi @eme64, @jatin-bhateja, I've pushed a new commit to address the remaining comments. Could you please take another look? Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29130#issuecomment-3766206495 From xgong at openjdk.org Mon Jan 19 03:17:19 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 19 Jan 2026 03:17:19 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v4] In-Reply-To: References: Message-ID: <58UIws2ScvjuFGH2fpt_9g_RS79mZIHBzYDkxhp9ZPQ=.670ba6ef-b428-486b-bcd5-623412c49f3b@github.com> > The VectorMask implementation in Vector API involves complex interactions between types, nodes, and platform-specific > features, making the related code in HotSpot difficult to understand and review. > > This patch adds comprehensive comments for vector mask related types, nodes, and methods in C2 to improve code clarity and > maintainability. > > Note: This patch only adds comments; no functional changes are made.
Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Update comments in vectornode.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29130/files - new: https://git.openjdk.org/jdk/pull/29130/files/083f5754..431f2b86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29130&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29130&range=02-03 Stats: 12 lines in 1 file changed: 3 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/29130.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29130/head:pull/29130 PR: https://git.openjdk.org/jdk/pull/29130 From ghan at openjdk.org Mon Jan 19 03:17:30 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Mon, 19 Jan 2026 03:17:30 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v8] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 02:29:24 GMT, David Holmes wrote: >> Guanqiang Han has updated the pull request incrementally with one additional commit since the last revision: >> >> remove unused code > > src/hotspot/share/interpreter/bytecodeTracer.cpp line 195: > >> 193: #endif >> 194: >> 195: void BytecodeTracer::print_method_codes(const methodHandle& method, int from, int to, outputStream* st, int flags, bool coherent_output) { > > I don't like the new `coherent_output` name sorry. The output will always be "coherent" - either via the passed in stringStream, or the internal StringStream. How about use_local_buffer? 
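The 8374862 change discussed in this thread renders the bytecode dump before taking `tty_lock` and prints it afterwards in one piece. A minimal standard-C++ sketch of that ordering follows. `profile_lock`, `output_lock`, `render_bytecodes` and `print_frame` are hypothetical names standing in for `MDOExtraData_lock`, `tty_lock` and the HotSpot printing code, and `std::ostringstream` plays the role of the temporary `stringStream`. The sketch does not model the `coherent_output` / `use_local_buffer` flag under discussion, which concerns whether the callee adds its own buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins (not HotSpot code).
std::mutex profile_lock;  // plays the role of MDOExtraData_lock
std::mutex output_lock;   // plays the role of tty_lock

// Rendering may need profile_lock, so it must run while output_lock is NOT held.
std::string render_bytecodes(const std::vector<std::string>& codes) {
  std::ostringstream buf;  // local buffer, private to this thread
  std::lock_guard<std::mutex> g(profile_lock);
  for (std::size_t i = 0; i < codes.size(); ++i) {
    buf << i << ": " << codes[i] << '\n';
  }
  return buf.str();
}

// Build the text first, then emit it in one piece under the output lock, so
// the lines stay together without nesting profile_lock inside output_lock.
void print_frame(std::ostream& out, const std::vector<std::string>& codes) {
  const std::string dump = render_bytecodes(codes);  // no output lock held here
  std::lock_guard<std::mutex> g(output_lock);
  out << "interpreted frame\n" << dump;
}
```

The design point is that `profile_lock` is never acquired while `output_lock` is held, which is exactly the nesting the lock-rank assertion in this thread complains about, while the single write under `output_lock` still keeps the frame header and its bytecodes together.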
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29186#discussion_r2703083278 From dholmes at openjdk.org Mon Jan 19 04:42:40 2026 From: dholmes at openjdk.org (David Holmes) Date: Mon, 19 Jan 2026 04:42:40 GMT Subject: RFR: 8370112: Remove VM_Version::supports_fast_class_init_checks() in platform-specific code Message-ID: Please review this simple enhancement to remove runtime checks for `VM_Version::supports_fast_class_init_checks()` in platform specific code, and replace them with asserts - as we know statically whether a given platform supports it. The changes are the same for each platform. Testing - building all platforms via GHA - tiers 1-3 (sanity) Thanks ------------- Commit messages: - Fixed overzealous code removal - 8370112: Remove VM_Version::supports_fast_class_init_checks() in platform-specific code Changes: https://git.openjdk.org/jdk/pull/29293/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29293&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370112 Stats: 112 lines in 10 files changed: 10 ins; 6 del; 96 mod Patch: https://git.openjdk.org/jdk/pull/29293.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29293/head:pull/29293 PR: https://git.openjdk.org/jdk/pull/29293 From dholmes at openjdk.org Mon Jan 19 04:48:28 2026 From: dholmes at openjdk.org (David Holmes) Date: Mon, 19 Jan 2026 04:48:28 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v8] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 03:14:01 GMT, Guanqiang Han wrote: >> src/hotspot/share/interpreter/bytecodeTracer.cpp line 195: >> >>> 193: #endif >>> 194: >>> 195: void BytecodeTracer::print_method_codes(const methodHandle& method, int from, int to, outputStream* st, int flags, bool coherent_output) { >> >> I don't like the new `coherent_output` name 
sorry. The output will always be "coherent" - either via the passed in stringStream, or the internal StringStream. > > How about use_local_buffer? That's workable I suppose - but your original was also workable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29186#discussion_r2703197288 From dholmes at openjdk.org Mon Jan 19 04:54:29 2026 From: dholmes at openjdk.org (David Holmes) Date: Mon, 19 Jan 2026 04:54:29 GMT Subject: RFR: 8370112: Remove VM_Version::supports_fast_class_init_checks() in platform-specific code In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 04:34:36 GMT, David Holmes wrote: > Please review this simple enhancement to remove runtime checks for `VM_Version::supports_fast_class_init_checks()` in platform specific code, and replace them with asserts - as we know statically whether a given platform supports it. > > The changes are the same for each platform. > > Testing > - building all platforms via GHA > - tiers 1-3 (sanity) > > Thanks Manually adding runtime as this was filed as an interpreter issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29293#issuecomment-3766388787 From fyang at openjdk.org Mon Jan 19 05:28:05 2026 From: fyang at openjdk.org (Fei Yang) Date: Mon, 19 Jan 2026 05:28:05 GMT Subject: RFR: 8370112: Remove VM_Version::supports_fast_class_init_checks() in platform-specific code In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 04:34:36 GMT, David Holmes wrote: > Please review this simple enhancement to remove runtime checks for `VM_Version::supports_fast_class_init_checks()` in platform specific code, and replace them with asserts - as we know statically whether a given platform supports it. > > The changes are the same for each platform. > > Testing > - building all platforms via GHA > - tiers 1-3 (sanity) > > Thanks Looks reasonable to me. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29293#pullrequestreview-3676303251 From dholmes at openjdk.org Mon Jan 19 05:48:36 2026 From: dholmes at openjdk.org (David Holmes) Date: Mon, 19 Jan 2026 05:48:36 GMT Subject: RFR: 8370112: Remove VM_Version::supports_fast_class_init_checks() in platform-specific code In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 05:24:41 GMT, Fei Yang wrote: >> Please review this simple enhancement to remove runtime checks for `VM_Version::supports_fast_class_init_checks()` in platform specific code, and replace them with asserts - as we know statically whether a given platform supports it. >> >> The changes are the same for each platform. >> >> Testing >> - building all platforms via GHA >> - tiers 1-3 (sanity) >> >> Thanks > > Looks reasonable to me. Thanks for the review @RealFYang ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29293#issuecomment-3766507632 From ghan at openjdk.org Mon Jan 19 05:49:22 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Mon, 19 Jan 2026 05:49:22 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v9] In-Reply-To: References: Message-ID: > Please review this change. Thanks! > > **Description:** > > When -XX:+PrintDeoptimizationDetails is enabled, vframeArray.cpp prints interpreted frames under ttyLocker. > https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/runtime/vframeArray.cpp#L493-L503 > In the WizardMode && Verbose branch it calls Method::print_codes() / print_codes_on(). The bytecode printing path may acquire MDOExtraData_lock (rank nosafepoint-1). 
> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/interpreter/bytecodeTracer.cpp#L597-L602 > Calling it while holding tty_lock violates the global lock ranking order and triggers a lock rank inversion assertion. > > **Fix:** > > Generate the bytecode dump before taking tty_lock, and print it afterwards under ttyLocker to keep the output coherent while preserving correct lock acquisition order. > To avoid double buffering when the target stream is already a thread-local stringStream, extend BytecodeTracer::print_method_codes with a buffered flag . The new call site uses buffered=false when dumping into the temporary stringStream. > > **Test:** > > GHA Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - change variable name - Merge remote-tracking branch 'upstream/master' into 8374862 - remove unused code - revert - Merge remote-tracking branch 'upstream/master' into 8374862 - fix a compile error - remove unnecessary blank line - correct copyright year - Add outputStream::is_buffered() - change variable name - ... 
and 3 more: https://git.openjdk.org/jdk/compare/76d3f6a3...a63c4613 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29186/files - new: https://git.openjdk.org/jdk/pull/29186/files/601ff0da..a63c4613 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29186&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29186&range=07-08 Stats: 132 lines in 9 files changed: 117 ins; 2 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/29186.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29186/head:pull/29186 PR: https://git.openjdk.org/jdk/pull/29186 From jbhateja at openjdk.org Mon Jan 19 07:23:47 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 19 Jan 2026 07:23:47 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v4] In-Reply-To: <58UIws2ScvjuFGH2fpt_9g_RS79mZIHBzYDkxhp9ZPQ=.670ba6ef-b428-486b-bcd5-623412c49f3b@github.com> References: <58UIws2ScvjuFGH2fpt_9g_RS79mZIHBzYDkxhp9ZPQ=.670ba6ef-b428-486b-bcd5-623412c49f3b@github.com> Message-ID: On Mon, 19 Jan 2026 03:17:19 GMT, Xiaohong Gong wrote: >> The VectorMask implementation in Vector API involves complex interactions between types, nodes, and platform-specific >> features, making the related code in HotSpot difficult to understand and review. >> >> This patch adds comprehensive comments for vector mask related types, nodes, and methods in C2 to improve code clarity and >> maintainability. >> >> Note: This patch only adds comments; no functional changes are made. > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Update comments in vectornode.hpp LGTM. Best Regards, Jatin ------------- Marked as reviewed by jbhateja (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29130#pullrequestreview-3676632274 From jbhateja at openjdk.org Mon Jan 19 07:24:21 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 19 Jan 2026 07:24:21 GMT Subject: RFR: 8375498: [VectorAPI] Dump primary vector IR details with -XX:+TraceNewVectors Message-ID: Patch to add support for dumping primary vector IR created by various VectorAPI inline expanders. Currently, auto-vectorization prints the newly created vector IR with -XX:+TraceNewVectors; a similar message is now emitted for VectorAPI. TraceNewVectors [AutoVectorization]: 1397 AddVI === _ 1395 1396 [[ ]] #vectorx TraceNewVectors [VectorAPI]: 1591 AddVI === _ 1545 1569 [[ ]] #vectory !jvms: IntVector::lanewiseTemplate @ bci:154 (line 798) Int256Vector::lanewise @ bci:3 (line 278) Int256Vector::lanewise @ bci:3 (line 43) IntVector::add @ bci:5 (line 1380) AddTestI::workload @ bci:34 (line 18) Thanks, Jatin ------------- Commit messages: - 8375498: [VectorAPI] Dump primary vector IR details with -XX:+TraceNewVectors Changes: https://git.openjdk.org/jdk/pull/29265/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29265&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375498 Stats: 36 lines in 1 file changed: 9 ins; 0 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/29265.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29265/head:pull/29265 PR: https://git.openjdk.org/jdk/pull/29265 From dholmes at openjdk.org Mon Jan 19 07:44:34 2026 From: dholmes at openjdk.org (David Holmes) Date: Mon, 19 Jan 2026 07:44:34 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v9] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 05:49:22 GMT, Guanqiang Han wrote: >> Please review this change. Thanks!
>> >> **Description:** >> >> When -XX:+PrintDeoptimizationDetails is enabled, vframeArray.cpp prints interpreted frames under ttyLocker. >> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/runtime/vframeArray.cpp#L493-L503 >> In the WizardMode && Verbose branch it calls Method::print_codes() / print_codes_on(). The bytecode printing path may acquire MDOExtraData_lock (rank nosafepoint-1). >> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/interpreter/bytecodeTracer.cpp#L597-L602 >> Calling it while holding tty_lock violates the global lock ranking order and triggers a lock rank inversion assertion. >> >> **Fix:** >> >> Generate the bytecode dump before taking tty_lock, and print it afterwards under ttyLocker to keep the output coherent while preserving correct lock acquisition order. >> To avoid double buffering when the target stream is already a thread-local stringStream, extend BytecodeTracer::print_method_codes with a buffered flag . The new call site uses buffered=false when dumping into the temporary stringStream. >> >> **Test:** >> >> GHA > > Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - change variable name > - Merge remote-tracking branch 'upstream/master' into 8374862 > - remove unused code > - revert > - Merge remote-tracking branch 'upstream/master' into 8374862 > - fix a compile error > - remove unnecessary blank line > - correct copyright year > - Add outputStream::is_buffered() > - change variable name > - ... and 3 more: https://git.openjdk.org/jdk/compare/3729b79b...a63c4613 LGTM. Thanks ------------- Marked as reviewed by dholmes (Reviewer). 
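A minimal, self-contained sketch of the reordering described in the fix, assuming a simplified ranked-lock rule (a thread may only acquire a lock whose rank is strictly lower than every lock it already holds). `RankedMutex`, the rank values, and `print_codes` are hypothetical stand-ins, not HotSpot's `Mutex`, `ttyLocker`, or `Method::print_codes_on()`. The unsafe variant requests the higher-ranked "MDO" lock while holding the "tty" lock; the fixed variant formats into a local buffer first and holds the tty lock only while writing the finished text.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical ranked lock, loosely modelled on HotSpot's lock-rank check:
// a thread may only acquire a lock ranked strictly lower than all locks it holds.
// The rank values used below are illustrative only.
class RankedMutex {
 public:
  RankedMutex(const char* name, int rank) : name_(name), rank_(rank) {}

  void lock() {
    for (const RankedMutex* held : held_locks()) {
      if (rank_ >= held->rank_) {
        throw std::logic_error(std::string("acquiring ") + name_ +
                               " out of order with " + held->name_);
      }
    }
    m_.lock();
    held_locks().push_back(this);
  }

  void unlock() {
    // Scoped lockers release in LIFO order.
    assert(!held_locks().empty() && held_locks().back() == this);
    held_locks().pop_back();
    m_.unlock();
  }

 private:
  static std::vector<const RankedMutex*>& held_locks() {
    thread_local std::vector<const RankedMutex*> held;
    return held;
  }

  const char* name_;
  int rank_;
  std::mutex m_;
};

RankedMutex tty_lock("tty_lock", 10);           // low rank
RankedMutex mdo_lock("MDOExtraData_lock", 20);  // higher rank

// Stand-in for bytecode printing that needs the MDO lock internally.
void print_codes(std::ostream& st) {
  std::lock_guard<RankedMutex> guard(mdo_lock);
  st << "0 aload_0\n1 areturn\n";
}

// Problematic order: MDO lock requested while tty_lock is held.
void print_frame_unsafe(std::ostream& out) {
  std::lock_guard<RankedMutex> tty(tty_lock);
  print_codes(out);  // the rank check fails here
}

// Fixed order: render into a local buffer, then print under tty_lock.
void print_frame_fixed(std::ostream& out) {
  std::ostringstream buf;
  print_codes(buf);  // MDO lock acquired and released with no tty_lock held
  std::lock_guard<RankedMutex> tty(tty_lock);
  out << buf.str();  // output stays contiguous; only tty_lock is held
}
```

This sketch always buffers. The flag discussed in the review exists so that callers whose stream is already a thread-local string buffer can skip the extra copy.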
PR Review: https://git.openjdk.org/jdk/pull/29186#pullrequestreview-3676699878 From dfenacci at openjdk.org Mon Jan 19 08:00:41 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 19 Jan 2026 08:00:41 GMT Subject: RFR: 8375271: [IR Framework] Rename IREncoding to ApplicableIRRules and driver/flag/test VM to Driver/Flag/Test VM [v2] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 12:34:33 GMT, Christian Hagedorn wrote: >> This patch is part of a series of patches split off from a prototype implementation for replacing the hotspot-pid file based transfer of IR dumps between the Test VM and the Driver VM with a socket based transfer (see [JDK-8375271](https://bugs.openjdk.org/browse/JDK-8375271) for more information). This should ease reviews. >> >> This patch is about naming updates: >> >> `IREncoding` is quite generic and does not really tell what it is about by looking at its name. I suggest to rename it to `ApplicableIRRules` to better reflect what it is about. I also capitalized the first letter of "driver/flag/test VM" to better indicate that these are proper names. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - Update Test VM > - Update test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java > > Co-authored-by: Manuel Hässig Looks good to me. Thanks @chhagedorn! test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/parser/VMInfoParser.java line 46: > 44: > 45: /** > 46: * Extract VMInfo from the applicableIRRules. Suggestion: * Extract VMInfo from applicableIRRules.
test/hotspot/jtreg/compiler/lib/ir_framework/flag/FlagVM.java line 99: > 97: > 98: /** > 99: * Emit Test VM flags to the dedicated Test VM flags file to parse them from the TestFramework "driver" VM again Suggestion: * Emit Test VM flags to the dedicated Test VM flags file to parse them from the TestFramework Driver VM again ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/29229#pullrequestreview-3676724955 PR Review Comment: https://git.openjdk.org/jdk/pull/29229#discussion_r2703600347 PR Review Comment: https://git.openjdk.org/jdk/pull/29229#discussion_r2703641426 From shade at openjdk.org Mon Jan 19 08:17:14 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Jan 2026 08:17:14 GMT Subject: RFR: 8370112: Remove VM_Version::supports_fast_class_init_checks() in platform-specific code In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 04:34:36 GMT, David Holmes wrote: > Please review this simple enhancement to remove runtime checks for `VM_Version::supports_fast_class_init_checks()` in platform specific code, and replace them with asserts - as we know statically whether a given platform supports it. > > The changes are the same for each platform. > > Testing > - building all platforms via GHA > - tiers 1-3 (sanity) > > Thanks Looks reasonable, and similar to C1/C2 does. src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 728: > 726: Label L_skip_barrier; > 727: > 728: { // Bypass the barrier for non-static methods Not entirely sure what extra `{ ... }` block is supposed to do here. Since you are changing these lines, maybe collapse it one indenting level down? 
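A rough sketch of the pattern behind 8370112: when a platform's answer to a capability query is fixed, a runtime check can become an assertion. `PlatformVersion` and the `clinit_barrier_needed_*` helpers are hypothetical names for illustration, not the HotSpot code. The `static_assert` shows that a constant capability can even be checked at compile time.

```cpp
#include <cassert>

// Hypothetical stand-in for a platform's VM_Version. On a platform that always
// supports fast class-initialization checks, the answer is a compile-time constant.
struct PlatformVersion {
  static constexpr bool supports_fast_class_init_checks() { return true; }
};

// Before: the helper re-checked the capability at runtime.
bool clinit_barrier_needed_before(bool is_static_method) {
  return PlatformVersion::supports_fast_class_init_checks() && is_static_method;
}

// After: the capability is asserted and only the per-method condition remains
// (non-static methods bypass the barrier).
bool clinit_barrier_needed_after(bool is_static_method) {
  assert(PlatformVersion::supports_fast_class_init_checks() &&
         "platform must support fast class init checks");
  return is_static_method;
}

// A constant capability can also be verified when the code is compiled.
static_assert(PlatformVersion::supports_fast_class_init_checks(),
              "expected fast class-init checks on this platform");
```

The assert documents the invariant in debug builds without leaving a capability branch on the release-build path.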
------------- PR Review: https://git.openjdk.org/jdk/pull/29293#pullrequestreview-3676807568 PR Review Comment: https://git.openjdk.org/jdk/pull/29293#discussion_r2703674670 From epeter at openjdk.org Mon Jan 19 08:17:48 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 Jan 2026 08:17:48 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v4] In-Reply-To: <58UIws2ScvjuFGH2fpt_9g_RS79mZIHBzYDkxhp9ZPQ=.670ba6ef-b428-486b-bcd5-623412c49f3b@github.com> References: <58UIws2ScvjuFGH2fpt_9g_RS79mZIHBzYDkxhp9ZPQ=.670ba6ef-b428-486b-bcd5-623412c49f3b@github.com> Message-ID: On Mon, 19 Jan 2026 03:17:19 GMT, Xiaohong Gong wrote: >> The VectorMask implementation in Vector API involves complex interactions between types, nodes, and platform-specific >> features, making the related code in HotSpot difficult to understand and review. >> >> This patch adds comprehensive comments for vector mask related types, nodes, and methods in C2 to improve code clarity and >> maintainability. >> >> Note: This patch only adds comments; no functional changes are made. > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Update comments in vectornode.hpp Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29130#pullrequestreview-3676825832 From epeter at openjdk.org Mon Jan 19 08:17:49 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 Jan 2026 08:17:49 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v4] In-Reply-To: References: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> Message-ID: On Mon, 19 Jan 2026 03:12:54 GMT, Xiaohong Gong wrote: >> Nice work, thanks for taking the time for this, much appreciated! 
>> >> On the whole I'm super happy with this, but left a few extra comments :) > > Hi @eme64 , @jatin-bhateja , I'v updated a new commit to address the remaining comments. Could you please take another look? Thanks a lot! @XiaohongGong Thanks for putting the effort into better documentation, really much appreciated ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29130#issuecomment-3767014368 From chagedorn at openjdk.org Mon Jan 19 08:25:49 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 19 Jan 2026 08:25:49 GMT Subject: RFR: 8375271: [IR Framework] Rename IREncoding to ApplicableIRRules and driver/flag/test VM to Driver/Flag/Test VM [v3] In-Reply-To: References: Message-ID: > This patch is part of a series of patches split of from a prototype implementation for replacing the hotspot-pid file based transfer of IR dumps between the Test VM and the Driver VM with a socket based transfer (see [JDK-8375271](https://bugs.openjdk.org/browse/JDK-8375271) for more information). This should ease reviews. > > This patch is about naming updates: > > `IREncoding` is quite generic and does not really tell what it is about by looking at its name. I suggest to rename it to `ApplicableIRRules` to better reflect what it is about. I also capitalized the first letter of "driver/flag/test VM" to better indicate that these are proper names. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/lib/ir_framework/flag/FlagVM.java Co-authored-by: Damon Fenacci ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29229/files - new: https://git.openjdk.org/jdk/pull/29229/files/17afa9ac..32a42462 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29229&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29229&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29229.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29229/head:pull/29229 PR: https://git.openjdk.org/jdk/pull/29229 From chagedorn at openjdk.org Mon Jan 19 08:25:51 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 19 Jan 2026 08:25:51 GMT Subject: RFR: 8375271: [IR Framework] Rename IREncoding to ApplicableIRRules and driver/flag/test VM to Driver/Flag/Test VM [v2] In-Reply-To: References: Message-ID: <7jcHJJl5NOpYgP7CsR0sWowXj3Q5OVWBFqrY-_94nSY=.69508500-40fb-44d7-afea-27eaec3ffec4@github.com> On Wed, 14 Jan 2026 12:34:33 GMT, Christian Hagedorn wrote: >> This patch is part of a series of patches split off from a prototype implementation for replacing the hotspot-pid file based transfer of IR dumps between the Test VM and the Driver VM with a socket based transfer (see [JDK-8375271](https://bugs.openjdk.org/browse/JDK-8375271) for more information). This should ease reviews. >> >> This patch is about naming updates: >> >> `IREncoding` is quite generic and does not really tell what it is about by looking at its name. I suggest to rename it to `ApplicableIRRules` to better reflect what it is about. I also capitalized the first letter of "driver/flag/test VM" to better indicate that these are proper names.
>> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - Update Test VM > - Update test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java > > Co-authored-by: Manuel Hässig Thanks Damon for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29229#issuecomment-3767033425 From epeter at openjdk.org Mon Jan 19 08:28:07 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 Jan 2026 08:28:07 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v4] In-Reply-To: References: <-nGdkKB4aDkSwWlygwqkQ4tJ5s0DUwCfhyugVZtEDZk=.c4db9089-0dfa-4dca-a90e-d31639dd617a@github.com> <5lhbA8iGLm8PeGmQgHaW06-AyfmAzzHUXfDjyV6RF5k=.531bbd83-61f6-4d66-99a4-e9ddbc59e142@github.com> Message-ID: On Fri, 16 Jan 2026 21:32:42 GMT, Vladimir Ivanov wrote: >> What exactly are you suggesting here? Are you suggesting that instead of doing: >> >> ` * @run driver ${test.main.class} noSuperWord` >> we could do >> ` * @run driver ${test.main.class} -XX:-OptimizeFill` >> >> And then just `framework.addFlags(args)`? > > Yes. Or introduce a property. @iwanowww What do you mean by `property`? >> Yes. Because they are not exactly the same. One is designed to test the implementation, the other to deliver reasonably stable and meaningful benchmarks. >> >> Example of some differences: >> - In the "test" environment, I have access to test libraries like `Generators.java`, which are better at generating edge-cases than regular `Random`. >> - In the benchmark, the size is a fixed parameter `SEED`. In the test, it is a randomly chosen value, so we can test better for alignment/drain/post loops. >> - `eI` can be chosen randomly in each iteration of the test. But for the benchmark it is better if we have an array of values to choose from, so that we can pick different values for each benchmark invocation. >> >> But there is still a lot of overlap.
I could try to split it into a "shared" and "local" part, and stuff the "shared" part into `VectorAlgorithmsImpl.Data`. @iwanowww Do you think that is worth it? > > Ok, up to you. I'm not very satisfied with the duplication either. Maybe a mind refreshed from the weekend can come up with something better... Btw: do you know any good way to share code between JMH benchmarks and regular tests? Because it is a shame that we have to duplicate `VectorAlgorithmsImpl.java`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2703710125 PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2703723987 From epeter at openjdk.org Mon Jan 19 08:28:08 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 Jan 2026 08:28:08 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v4] In-Reply-To: References: <-nGdkKB4aDkSwWlygwqkQ4tJ5s0DUwCfhyugVZtEDZk=.c4db9089-0dfa-4dca-a90e-d31639dd617a@github.com> <5lhbA8iGLm8PeGmQgHaW06-AyfmAzzHUXfDjyV6RF5k=.531bbd83-61f6-4d66-99a4-e9ddbc59e142@github.com> <-8X3rKDGWGBNYCWyn5GwB1HobhPkwXCRXwgMA1LsvkI=.75dc60a2-f2f0-4b8c-82ec-b87a50e17b1d@github.com> Message-ID: On Fri, 16 Jan 2026 21:32:08 GMT, Vladimir Ivanov wrote: >>>Also, it makes it harder to reproduce input dependent variance. >> >> I suppose my whole goal was to eliminate input dependent variance as far as possible. Do you think it would be better to make input dependent variance measurable at the `Iteration` level? I fear that this will make the variance of the benchmark very large, and the results of a fork would be quite noisy. > > My personal preference is `Level.Trial`. When I work with a microbenchmark, the fewer "moving parts" it has the better. It's easy to spot fork-to-fork variance. Not so much with iteration variance where a single outlier can be caused by many factors. @iwanowww Why is fork variance easier to spot than iteration variance? 
I suppose I can try doing the setup per fork. But that does drive up the runtime of the benchmark, because you need to do warmup for each fork. But I suppose you think that is worth it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2703718518 From shade at openjdk.org Mon Jan 19 08:35:06 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Jan 2026 08:35:06 GMT Subject: RFR: 8375530: PPC64: incorrect quick verify_method_data_pointer check causes poor performance in debug build In-Reply-To: <7CAMgaI9MMOT9EaKu8uIdNaSU9kfUgnWrpjbqrO2yXw=.19d1849e-03e5-47d7-ac3c-8f77fd134a3c@github.com> References: <7CAMgaI9MMOT9EaKu8uIdNaSU9kfUgnWrpjbqrO2yXw=.19d1849e-03e5-47d7-ac3c-8f77fd134a3c@github.com> Message-ID: On Fri, 16 Jan 2026 12:06:31 GMT, David Briemann wrote: > Workaround for this fix is setting -XX:-VerifyDataPointer So `add(R11_scratch1, R12_scratch2, R12_scratch2);` completely ignores previous computations in `R11`? Looks fine. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29279#pullrequestreview-3676883299 From ghan at openjdk.org Mon Jan 19 08:42:12 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Mon, 19 Jan 2026 08:42:12 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v9] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 07:41:05 GMT, David Holmes wrote: >> Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 13 additional commits since the last revision: >> >> - change variable name >> - Merge remote-tracking branch 'upstream/master' into 8374862 >> - remove unused code >> - revert >> - Merge remote-tracking branch 'upstream/master' into 8374862 >> - fix a compile error >> - remove unnecessary blank line >> - correct copyright year >> - Add outputStream::is_buffered() >> - change variable name >> - ... and 3 more: https://git.openjdk.org/jdk/compare/da37c031...a63c4613 > > LGTM. Thanks Hi @dholmes-ora, Thank you for reviewing! I've integrated the change; could you please sponsor it? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29186#issuecomment-3767104159 From duke at openjdk.org Mon Jan 19 08:42:14 2026 From: duke at openjdk.org (duke) Date: Mon, 19 Jan 2026 08:42:14 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v9] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 05:49:22 GMT, Guanqiang Han wrote: >> Please review this change. Thanks! >> >> **Description:** >> >> When -XX:+PrintDeoptimizationDetails is enabled, vframeArray.cpp prints interpreted frames under ttyLocker. >> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/runtime/vframeArray.cpp#L493-L503 >> In the WizardMode && Verbose branch it calls Method::print_codes() / print_codes_on(). The bytecode printing path may acquire MDOExtraData_lock (rank nosafepoint-1).
>> >> **Fix:** >> >> Generate the bytecode dump before taking tty_lock, and print it afterwards under ttyLocker to keep the output coherent while preserving correct lock acquisition order. >> To avoid double buffering when the target stream is already a thread-local stringStream, extend BytecodeTracer::print_method_codes with a buffered flag . The new call site uses buffered=false when dumping into the temporary stringStream. >> >> **Test:** >> >> GHA > > Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - change variable name > - Merge remote-tracking branch 'upstream/master' into 8374862 > - remove unused code > - revert > - Merge remote-tracking branch 'upstream/master' into 8374862 > - fix a compile error > - remove unnecessary blank line > - correct copyright year > - Add outputStream::is_buffered() > - change variable name > - ... and 3 more: https://git.openjdk.org/jdk/compare/da37c031...a63c4613 @hgqxjj Your change (at version a63c46131b6329de2df80600995c3ea1492dcd98) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29186#issuecomment-3767107046 From dbriemann at openjdk.org Mon Jan 19 08:54:43 2026 From: dbriemann at openjdk.org (David Briemann) Date: Mon, 19 Jan 2026 08:54:43 GMT Subject: RFR: 8375530: PPC64: incorrect quick verify_method_data_pointer check causes poor performance in debug build In-Reply-To: <7CAMgaI9MMOT9EaKu8uIdNaSU9kfUgnWrpjbqrO2yXw=.19d1849e-03e5-47d7-ac3c-8f77fd134a3c@github.com> References: <7CAMgaI9MMOT9EaKu8uIdNaSU9kfUgnWrpjbqrO2yXw=.19d1849e-03e5-47d7-ac3c-8f77fd134a3c@github.com> Message-ID: <_wBXwKhAuzQKyEpl2qGehXzNTq-AYM878tItIHsGSXc=.347a1563-14dd-433b-bc13-829134f7a3ed@github.com> On Fri, 16 Jan 2026 12:06:31 GMT, David Briemann wrote: > Workaround for this fix is setting -XX:-VerifyDataPointer Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29279#issuecomment-3767158791 From dbriemann at openjdk.org Mon Jan 19 08:58:44 2026 From: dbriemann at openjdk.org (David Briemann) Date: Mon, 19 Jan 2026 08:58:44 GMT Subject: Integrated: 8375530: PPC64: incorrect quick verify_method_data_pointer check causes poor performance in debug build In-Reply-To: <7CAMgaI9MMOT9EaKu8uIdNaSU9kfUgnWrpjbqrO2yXw=.19d1849e-03e5-47d7-ac3c-8f77fd134a3c@github.com> References: <7CAMgaI9MMOT9EaKu8uIdNaSU9kfUgnWrpjbqrO2yXw=.19d1849e-03e5-47d7-ac3c-8f77fd134a3c@github.com> Message-ID: <_FxVWGJs0tCiPerQVx18z82Mds-9acNwE4XNDd4ccn8=.f17f0881-ed7f-461c-a5ee-657217df3866@github.com> On Fri, 16 Jan 2026 12:06:31 GMT, David Briemann wrote: > Workaround for this fix is setting -XX:-VerifyDataPointer This pull request has now been integrated. 
Changeset: 30f39d88 Author: David Briemann URL: https://git.openjdk.org/jdk/commit/30f39d88e5af36bb6db458c03215e9fa6a31d6f3 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8375530: PPC64: incorrect quick verify_method_data_pointer check causes poor performance in debug build Reviewed-by: mdoerr, shade ------------- PR: https://git.openjdk.org/jdk/pull/29279 From jbhateja at openjdk.org Mon Jan 19 09:04:26 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 19 Jan 2026 09:04:26 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v9] In-Reply-To: References: Message-ID: > Patch optimizes Vector.slice operation with constant index using x86 ALIGNR instruction. > It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. > > The idea here is to add infrastructure support to enable intrinsification of fast path for selected vector APIs, else enable inlining of fall-back implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander of handling slow paths, which can easily be implemented library side (Java). > > Vector API jtreg tests pass at AVX level 2, remaining validation in progress.
> 
> Performance numbers:
> 
> System: 13th Gen Intel(R) Core(TM) i3-1315U
> 
> Baseline:
> Benchmark                                               (size)   Mode  Cnt      Score   Error   Units
> VectorSliceBenchmark.byteVectorSliceWithConstantIndex1    1024  thrpt    2   9444.444          ops/ms
> VectorSliceBenchmark.byteVectorSliceWithConstantIndex2    1024  thrpt    2  10009.319          ops/ms
> VectorSliceBenchmark.byteVectorSliceWithVariableIndex     1024  thrpt    2   9081.926          ops/ms
> VectorSliceBenchmark.intVectorSliceWithConstantIndex1     1024  thrpt    2   6085.825          ops/ms
> VectorSliceBenchmark.intVectorSliceWithConstantIndex2     1024  thrpt    2   6505.378          ops/ms
> VectorSliceBenchmark.intVectorSliceWithVariableIndex      1024  thrpt    2   6204.489          ops/ms
> VectorSliceBenchmark.longVectorSliceWithConstantIndex1    1024  thrpt    2   1651.334          ops/ms
> VectorSliceBenchmark.longVectorSliceWithConstantIndex2    1024  thrpt    2   1642.784          ops/ms
> VectorSliceBenchmark.longVectorSliceWithVariableIndex     1024  thrpt    2   1474.808          ops/ms
> VectorSliceBenchmark.shortVectorSliceWithConstantIndex1   1024  thrpt    2  10399.394          ops/ms
> VectorSliceBenchmark.shortVectorSliceWithConstantIndex2   1024  thrpt    2  10502.894          ops/ms
> VectorSliceBenchmark.shortVectorSliceWithVariableIndex    1024 ...

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:

 - Review comments resolutions
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8303762
 - Update callGenerator.hpp copyright year
 - Review comments resolution
 - Cleanups
 - Review comments resolutions
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8303762
 - Updating predicate checks
 - Fixes for failing regressions
 - Optimizing AVX2 backend and some re-factoring
 - ...
and 3 more: https://git.openjdk.org/jdk/compare/b7346c30...9da1f862 ------------- Changes: https://git.openjdk.org/jdk/pull/24104/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=08 Stats: 1347 lines in 29 files changed: 1243 ins; 1 del; 103 mod Patch: https://git.openjdk.org/jdk/pull/24104.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24104/head:pull/24104 PR: https://git.openjdk.org/jdk/pull/24104 From jbhateja at openjdk.org Mon Jan 19 09:04:30 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 19 Jan 2026 09:04:30 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v8] In-Reply-To: References: Message-ID: On Mon, 29 Sep 2025 06:03:15 GMT, Jatin Bhateja wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Update callGenerator.hpp copyright year > > Hi @erifan, thanks for your comments. I will address them soon; please keep reviewing in the meantime :-) > @jatin-bhateja I have no further comments, great work. After this PR is merged, I will complete the backend optimization of the aarch64 part based on it. Thanks! Thanks @erifan. I think the partial case is specific to the AArch64 backend, and its tests should accompany the relevant AArch64 changes.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24104#issuecomment-3767179812 From jbhateja at openjdk.org Mon Jan 19 09:04:34 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 19 Jan 2026 09:04:34 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v8] In-Reply-To: References: Message-ID: On Mon, 29 Sep 2025 03:24:32 GMT, Eric Fang wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Update callGenerator.hpp copyright year
>
> test/hotspot/jtreg/compiler/vectorapi/TestSliceOptValueTransforms.java line 101:
>
>> 99: .slice(0, ByteVector.fromArray(BSP, bsrc2, i))
>> 100: .intoArray(bdst, i);
>> 101: }
>
> Would you mind adding a correctness check for these tests, for the byte type, like:
>
>     @DontInline
>     static void verifyVectorSliceByte(int origin) {
>         for (int i = 0; i < BSP.loopBound(SIZE); i += BSP.length()) {
>             int index = i;
>             for (int j = i + origin; j < i + BSP.length(); j++) {
>                 Asserts.assertEquals(bsrc1[j], bdst[index++]);
>             }
>             for (int j = i; j < i + origin; j++) {
>                 Asserts.assertEquals(bsrc2[j], bdst[index++]);
>             }
>         }
>     }

There are enough functional correctness tests in the existing Vector API jtreg suite; this test specifically checks the newly added VectorSlice IR node and its associated Ideal transformations.
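For reference, the per-lane behavior that the suggested check above verifies can be modeled in plain Java, without the Vector API. This is a minimal sketch: the class and method names are hypothetical and are not part of the test or of `jdk.incubator.vector`. As we read the Vector API documentation for `slice(int, Vector)`, `v1.slice(origin, v2)` returns lanes `origin..VLEN-1` of `v1` followed by lanes `0..origin-1` of `v2`, with `origin` in `[0, VLEN]`. Conceptually, this is the same as concatenating `v2:v1` and shifting right by `origin` lanes, which is the shape of operation an `ALIGNR`-style byte shift performs when `origin` is a constant.

```java
import java.util.Arrays;

public class SliceModel {
    // Scalar model of v1.slice(origin, v2) for a single vector of vlen lanes:
    //   result[k] = v1[origin + k]          for 0 <= k < vlen - origin
    //   result[k] = v2[k - (vlen - origin)] for vlen - origin <= k < vlen
    static byte[] slice(byte[] v1, int origin, byte[] v2) {
        int vlen = v1.length;
        if (v2.length != vlen) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        if (origin < 0 || origin > vlen) {
            throw new IndexOutOfBoundsException("origin " + origin + " for " + vlen + " lanes");
        }
        byte[] result = new byte[vlen];
        // The upper lanes of v1 move down to the start of the result.
        System.arraycopy(v1, origin, result, 0, vlen - origin);
        // The first `origin` lanes of v2 fill the remaining upper lanes.
        System.arraycopy(v2, 0, result, vlen - origin, origin);
        return result;
    }

    public static void main(String[] args) {
        byte[] a = {0, 1, 2, 3, 4, 5, 6, 7};
        byte[] b = {10, 11, 12, 13, 14, 15, 16, 17};
        System.out.println(Arrays.toString(slice(a, 3, b)));
    }
}
```

Running `main` prints `[3, 4, 5, 6, 7, 10, 11, 12]`. With `origin == 0`, the result is simply `v1`, which matches the `.slice(0, ...)` call in the quoted test snippet.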
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2703827513 From bmaillard at openjdk.org Mon Jan 19 09:22:00 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 19 Jan 2026 09:22:00 GMT Subject: RFR: 8374180: C2 crash in PhaseCCP::verify_type - fatal error: Not monotonic [v5] In-Reply-To: References: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> Message-ID: On Thu, 15 Jan 2026 08:45:39 GMT, Quan Anh Mai wrote: >> Hi, >> >> The issue here is the inconsistency in computing the `_widen` field of the `TypeInt`. At the first step, the types of the operands are: >> >> t1 = int:0 >> t2 = int:-2..3, widen = 3 >> >> Since the type of the first operand is a constant zero, `AddNode::Value` returns the type of the second operand directly, as `x ^ 0 == x for all x`. In the second step, `t1` is widened to `0..2`. This triggers the real computation of the result. The algorithm then splits `t2` into `t21 = int:-2..-1` and `t22 = int:0..3`. The `Xor` of these with `t1` are `r1 = int:-4..-1` and `r2 = int:0..3`. As both have `_hi - _lo <= SMALL_TYPEINT_THRESHOLD == 3`, their `_widen`s are normalized to `0`. As a result, their `meet` also has `_widen == 0`. This value is smaller than that from the previous step, which was `3`, which leads to the failure. >> >> The root cause here is that, the `_widen` value of a node should be computed and normalized on the whole range of the node, not on its subranges, which may normalize it to `0` in more cases than what is expected. As a result, my proposed solution is to ignore the `_widen` value of the subranges, and pass the expected `_widen` value when composing the final result. >> >> Please take a look and leave your reviews, thanks a lot.
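The non-monotonic `_widen` sequence described above can be reproduced with a deliberately simplified Java model. `Range` below is a hypothetical stand-in for `TypeInt`, not C2 code. In this model:

- `make` normalizes `widen` to 0 when `hi - lo <= 3`, the `SMALL_TYPEINT_THRESHOLD` quoted above;
- `meet` takes the union of the two ranges and keeps the larger `widen`;
- the `Xor` bounds are found by brute force, which is only practical for tiny ranges.

For the "fixed" path, the sketch assumes the expected `widen` is the larger of the two inputs' values; the actual patch may compute it differently.

```java
public class WidenModel {
    static final int SMALL_TYPEINT_THRESHOLD = 3;

    // Hypothetical, simplified stand-in for C2's TypeInt: an inclusive
    // range [lo, hi] plus a widen counter.
    record Range(int lo, int hi, int widen) {
        static Range make(int lo, int hi, int widen) {
            // Small ranges have their widen normalized to 0.
            if ((long) hi - lo <= SMALL_TYPEINT_THRESHOLD) {
                widen = 0;
            }
            return new Range(lo, hi, widen);
        }

        // Union of both ranges; the larger widen is kept.
        Range meet(Range other) {
            return make(Math.min(lo, other.lo), Math.max(hi, other.hi),
                        Math.max(widen, other.widen));
        }
    }

    // Exact bounds of x ^ y over both ranges, by brute force (tiny ranges only).
    static int[] xorBounds(Range a, Range b) {
        int lo = Integer.MAX_VALUE;
        int hi = Integer.MIN_VALUE;
        for (int x = a.lo(); x <= a.hi(); x++) {
            for (int y = b.lo(); y <= b.hi(); y++) {
                lo = Math.min(lo, x ^ y);
                hi = Math.max(hi, x ^ y);
            }
        }
        return new int[] {lo, hi};
    }

    // Splits t2 (assumed to satisfy t2.lo() < 0 <= t2.hi()) into its negative and
    // non-negative parts, xors each part with t1, and meets the partial results.
    static Range xor(Range t1, Range t2, boolean keepInputWiden) {
        int[] r1 = xorBounds(t1, new Range(t2.lo(), -1, t2.widen()));
        int[] r2 = xorBounds(t1, new Range(0, t2.hi(), t2.widen()));
        // Each partial result is normalized on its own small subrange.
        Range composed = Range.make(r1[0], r1[1], t2.widen())
                .meet(Range.make(r2[0], r2[1], t2.widen()));
        if (!keepInputWiden) {
            return composed; // buggy behavior: widen can drop to 0
        }
        // Fix (as modeled here): apply the inputs' widen to the whole result.
        return Range.make(composed.lo(), composed.hi(),
                          Math.max(t1.widen(), t2.widen()));
    }

    public static void main(String[] args) {
        Range t2 = Range.make(-2, 3, 3);
        // Step 1: t1 is the constant 0, and x ^ 0 == x, so the result is t2 itself.
        Range step1 = t2;
        // Step 2: t1 has been widened to 0..2, so the real computation runs.
        Range t1 = Range.make(0, 2, 0);
        System.out.println("step 1 widen:       " + step1.widen());
        System.out.println("step 2 widen (bug): " + xor(t1, t2, false).widen());
        System.out.println("step 2 widen (fix): " + xor(t1, t2, true).widen());
    }
}
```

With `t1 = 0..2` and `t2 = -2..3, widen = 3`, both paths produce the range `-4..3`. The piecewise path gives it `widen` 0, which is lower than the 3 returned in the first step. The modeled fix keeps `widen` at 3.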
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > grammar Thanks for working on this @merykitty, I agree with the proposed solution and the implementation looks good to me. And apologies for the delay ;) ------------- Marked as reviewed by bmaillard (Committer). PR Review: https://git.openjdk.org/jdk/pull/28952#pullrequestreview-3677077803 From qamai at openjdk.org Mon Jan 19 10:01:01 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 19 Jan 2026 10:01:01 GMT Subject: RFR: 8374180: C2 crash in PhaseCCP::verify_type - fatal error: Not monotonic [v5] In-Reply-To: References: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> Message-ID: <4QQ06d9oQRmx65ogNNreKeo5A9BnPsJ64OnjW_sFYRM=.af86dcc7-b669-403a-9875-16e8b4164ed8@github.com> On Mon, 19 Jan 2026 09:18:40 GMT, Benoît Maillard wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> grammar > > Thanks for working on this @merykitty, I agree with the proposed solution and the implementation looks good to me. And apologies for the delay ;) @benoitmaillard Thanks a lot for your review! I need a Reviewer to approve this PR. @eme64, could you take a look if you have time, please? Thanks in advance. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28952#issuecomment-3767449893 From thartmann at openjdk.org Mon Jan 19 10:04:08 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 Jan 2026 10:04:08 GMT Subject: [jdk26] RFR: 8360271: String.indexOf intrinsics fail with +EnableX86ECoreOpts and -CompactStrings In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 07:08:06 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [1d889b92](https://github.com/openjdk/jdk/commit/1d889b92bde5dfcb1fbe6cddb389a77f92eb1ce7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
> > The commit being backported was authored by Volodymyr Paprotski on 15 Jan 2026 and was reviewed by Tobias Hartmann, Jatin Bhateja and Sandhya Viswanathan. > > Thanks! Thanks for the reviews! Waiting for approval. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29263#issuecomment-3767468632 From dfenacci at openjdk.org Mon Jan 19 10:09:50 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 19 Jan 2026 10:09:50 GMT Subject: RFR: 8375271: [IR Framework] Rename IREncoding to ApplicableIRRules and driver/flag/test VM to Driver/Flag/Test VM [v3] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 08:25:49 GMT, Christian Hagedorn wrote: >> This patch is part of a series of patches split off from a prototype implementation for replacing the hotspot-pid file based transfer of IR dumps between the Test VM and the Driver VM with a socket-based transfer (see [JDK-8375271](https://bugs.openjdk.org/browse/JDK-8375271) for more information). This should ease reviews. >> >> This patch is about naming updates: >> >> `IREncoding` is quite generic and does not really tell what it is about by looking at its name. I suggest to rename it to `ApplicableIRRules` to better reflect what it is about. I also capitalized the first letter of "driver/flag/test VM" to better indicate that these are proper names. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/lib/ir_framework/flag/FlagVM.java > > Co-authored-by: Damon Fenacci Marked as reviewed by dfenacci (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/29229#pullrequestreview-3677286608 From duke at openjdk.org Mon Jan 19 10:09:57 2026 From: duke at openjdk.org (George Wort) Date: Mon, 19 Jan 2026 10:09:57 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache [v6] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 01:33:37 GMT, Chad Rakoczy wrote: >> ### Summary >> This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. >> >> ### Description >> The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. >> >> Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. >> >> The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. >> >> The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. 
This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). >> >> Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. >> >> ### Performance >> Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. >> >> ### Testing >> * CodeCache tests have been updated to cover the new `HotCodeHeap`. >> * Added ded... > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix builds Hi, I've played around with this PR a bit and had a few thoughts. The way the grouping is currently set up means that if you run the program for long enough, you will keep adding profiled code onto the hot code heap, even if it doesn't really meet the definition of hot. This also means that if the program changes "phase" and the hot code changes, the hot code heap might already be full and you will be unable to compact the new hot code. Have you thought about adding some kind of refresh/reset when the hot code heap is full, to purge code that has not appeared in recent profiles? Other small configuration changes helped me try this out: adding a delay variable to avoid profiling the setup period of a program, and making the sampling period configurable.
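As a rough illustration of the selection step described in the PR summary, the sketch below ranks methods by sample count and takes them in descending order of hotness until the selected methods account for at least a target fraction of all samples. The value 0.8 mirrors the stated `HotCodeSampleRatio` default. Everything here is hypothetical and is not the HotSpot implementation: method identities are plain strings, and callee relocation, locking, and the actual move into the `HotCodeHeap` are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class HotCodeSelection {
    // Hypothetical selection step: rank methods by sample count (descending,
    // ties broken by name) and take them until the selected methods cover at
    // least `ratio` of all samples.
    static List<String> selectHot(Map<String, Long> samples, double ratio) {
        long total = 0;
        for (long count : samples.values()) {
            total += count;
        }
        if (total == 0) {
            return new ArrayList<>(); // nothing was sampled
        }
        List<Map.Entry<String, Long>> ranked = new ArrayList<>(samples.entrySet());
        ranked.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));

        List<String> selected = new ArrayList<>();
        long covered = 0;
        for (Map.Entry<String, Long> e : ranked) {
            if ((double) covered / total >= ratio) {
                break; // the hot set already covers enough samples
            }
            selected.add(e.getKey());
            covered += e.getValue();
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Long> samples = Map.of(
                "A.hot", 50L, "B.warm", 30L, "C.cool", 15L, "D.cold", 5L);
        System.out.println(selectHot(samples, 0.8));
    }
}
```

With sample counts of 50, 30, 15 and 5, a ratio of 0.8 selects the top two methods, which together cover exactly 80% of the samples. Because the selection only looks at the counts it is given, rerunning it on a fresh sampling window instead of accumulating across windows is one way to express the refresh idea raised in the reply above. That is an observation about this sketch, not part of the PR as described.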
------------- PR Comment: https://git.openjdk.org/jdk/pull/27858#issuecomment-3759597826 From mdoerr at openjdk.org Mon Jan 19 10:54:27 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 19 Jan 2026 10:54:27 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 13:56:13 GMT, David Briemann wrote: > Adds the following mach nodes: > match(Set dst (CMoveF (Binary cop (CmpF op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveD (Binary cop (CmpD op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveF (Binary cop (CmpD op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveD (Binary cop (CmpF op1 op2)) (Binary src1 src2))); You could also remove the "// Worst case is branch + move + stop, no stop without scheduler." comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29281#issuecomment-3767694320 From dholmes at openjdk.org Mon Jan 19 11:46:27 2026 From: dholmes at openjdk.org (David Holmes) Date: Mon, 19 Jan 2026 11:46:27 GMT Subject: RFR: 8370112: Remove VM_Version::supports_fast_class_init_checks() in platform-specific code In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 08:08:16 GMT, Aleksey Shipilev wrote: >> Please review this simple enhancement to remove runtime checks for `VM_Version::supports_fast_class_init_checks()` in platform specific code, and replace them with asserts - as we know statically whether a given platform supports it. >> >> The changes are the same for each platform. >> >> Testing >> - building all platforms via GHA >> - tiers 1-3 (sanity) >> >> Thanks > > src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 728: > >> 726: Label L_skip_barrier; >> 727: >> 728: { // Bypass the barrier for non-static methods > > Not entirely sure what extra `{ ... }` block is supposed to do here. Since you are changing these lines, maybe collapse it one indenting level down? 
I can't see what the block does either; it may be a relic from some older code. The same pattern is in most of the files. I can take it out. Thanks for looking at this @shipilev! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29293#discussion_r2704418453 From dholmes at openjdk.org Mon Jan 19 11:47:31 2026 From: dholmes at openjdk.org (David Holmes) Date: Mon, 19 Jan 2026 11:47:31 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v9] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 08:37:59 GMT, Guanqiang Han wrote: >> LGTM. Thanks > > Hi @dholmes-ora, thank you for reviewing! > I've integrated the change; could you please sponsor it? Thanks! @hgqxjj HotSpot changes require two reviews, so we will need to wait for @dean-long to have another look. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29186#issuecomment-3767918796 From epeter at openjdk.org Mon Jan 19 13:25:25 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 Jan 2026 13:25:25 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v3] In-Reply-To: References: Message-ID: <56S8PDf47ChUWcTf6ONUQQk66LzeZNkf9-uFYgCYQNw=.1e0f70ec-05f1-4a59-862d-ca6b1c4fdfc6@github.com> On Fri, 16 Jan 2026 11:42:27 GMT, Quan Anh Mai wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> halt refactor by demand of reviewers > > Marked as reviewed by qamai (Committer). @merykitty @dean-long Can I please get a re-approval after the last minor changes in documentation?
------------- PR Comment: https://git.openjdk.org/jdk/pull/29169#issuecomment-3768313390 From epeter at openjdk.org Mon Jan 19 13:39:07 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 Jan 2026 13:39:07 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v8] In-Reply-To: References: Message-ID: > This is exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. > > Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) > > **Discussion** > > Observations: > - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. > - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. > - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow > - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow > - `linux_x64_oci_server`: Vector API leads to really nice speedups > - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks > - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. > - Compact Object Headers has some negative effect on some loop benchmarks. > - `linux_aarch64_server`: `reduceAddI`, `copyI` > - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` > - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`?
> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit > - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` > - Intrinsics can be much faster than auto-vectorized or Vector API code. > - `linux_aarch64_server`: `copyI` > - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. > - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact; I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). > > **Benchmark Plots** > > Units: nanoseconds per algorithm invocation. > > `linux_x64_oci` > algo_linux_x64_oci_server > > `windows_x64_oci` > algo_windows_x64_oci_server > > `macosx_x64_sandybridge` > algo_macosx_x64_sandybridge >> >> `windows_x64_oci` >> algo_windows_x64_oci_server >> >> `macosx_x64_sandybridge` >> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add hashCodeB test and benchmark I also added a `hashCode` benchmark. One of the Vector API approaches looks faster than our intrinsics:

Benchmark                               (NUM_X_OBJECTS)  (SEED)  (SIZE)  Mode  Cnt       Score     Error  Units
VectorAlgorithms.hashCodeB_Arrays                 10000       0  640000  avgt   10   97211.277 ±  92.931  ns/op
VectorAlgorithms.hashCodeB_VectorAPI_v1           10000       0  640000  avgt   10  362260.946 ± 375.695  ns/op
VectorAlgorithms.hashCodeB_VectorAPI_v2           10000       0  640000  avgt   10   63640.184 ± 802.249  ns/op
VectorAlgorithms.hashCodeB_loop                   10000       0  640000  avgt   10  784368.577 ± 877.616  ns/op

Note: the `v2` solution that looks fastest is inspired by: https://www.dynatrace.com/news/blog/java-arrays-hashcode-byte-efficiency-techniques/ ------------- PR Comment: https://git.openjdk.org/jdk/pull/28639#issuecomment-3768381551 From thartmann at openjdk.org Mon Jan 19 13:51:36 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 Jan 2026 13:51:36 GMT Subject: RFR: 8375271: [IR Framework] Rename IREncoding to ApplicableIRRules and driver/flag/test VM to Driver/Flag/Test VM [v3] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 08:25:49 GMT, Christian Hagedorn wrote: >> This patch is part of a series of patches split off from a prototype implementation for replacing the hotspot-pid file based transfer of IR dumps between the Test VM and the Driver VM with a socket-based transfer (see [JDK-8375271](https://bugs.openjdk.org/browse/JDK-8375271) for more information). This should ease reviews. >> >> This patch is about naming updates: >> >> `IREncoding` is quite generic and does not really tell what it is about by looking at its name. I suggest to rename it to `ApplicableIRRules` to better reflect what it is about. I also capitalized the first letter of "driver/flag/test VM" to better indicate that these are proper names. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/lib/ir_framework/flag/FlagVM.java > > Co-authored-by: Damon Fenacci Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/29229#pullrequestreview-3678181430 From epeter at openjdk.org Mon Jan 19 13:52:39 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 Jan 2026 13:52:39 GMT Subject: RFR: 8374180: C2 crash in PhaseCCP::verify_type - fatal error: Not monotonic [v5] In-Reply-To: References: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> Message-ID: On Thu, 15 Jan 2026 08:45:39 GMT, Quan Anh Mai wrote: >> Hi, >> >> The issue here is the inconsistency in computing the `_widen` field of the `TypeInt`. At the first step, the types of the operands are: >> >> t1 = int:0 >> t2 = int:-2..3, widen = 3 >> >> Since the type of the first operand is a constant zero, `AddNode::Value` returns the type of the second operand directly, as `x ^ 0 == x for all x`. In the second step, `t1` is widened to `0..2`. This triggers the real computation of the result. The algorithm then splits `t2` into `t21 = int:-2..-1` and `t22 = int:0..3`. The `Xor` of these with `t1` are `r1 = int:-4..-1` and `r2 = int:0..3`. As both have `_hi - _lo <= SMALL_TYPEINT_THRESHOLD == 3`, their `_widen`s are normalized to `0`. As a result, their `meet` also has `_widen == 0`. This value is smaller than that from the previous step, which was `3`, which leads to the failure. >> >> The root cause here is that, the `_widen` value of a node should be computed and normalized on the whole range of the node, not on its subranges, which may normalize it to `0` in more cases than what is expected. As a result, my proposed solution is to ignore the `_widen` value of the subranges, and pass the expected `_widen` value when composing the final result. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > grammar Looks good to me, thanks for fixing it @merykitty ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28952#pullrequestreview-3678187915 From epeter at openjdk.org Mon Jan 19 13:51:37 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 Jan 2026 13:51:37 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v8] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 13:39:07 GMT, Emanuel Peter wrote: >> This is exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. >> >> Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) >> >> **Discussion** >> >> Observations: >> - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. >> - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. >> - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow >> - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow >> - `linux_x64_oci_server`: Vector API leads to really nice speedups >> - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks >> - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. >> - Compact Object Headers has some negative effect on some loop benchmarks. >> - `linux_aarch64_server`: `reduceAddI`, `copyI` >> - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` >> - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`?
>> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit >> - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` >> - Intrinsics can be much faster than auto-vectorized or Vector API code. >> - `linux_aarch64_server`: `copyI` >> - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. >> - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact; I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). >> >> **Benchmark Plots** >> >> Units: nanoseconds per algorithm invocation. >> >> `linux_x64_oci` >> algo_linux_x64_oci_server >> >> `windows_x64_oci` >> algo_windows_x64_oci_server >> >> `macosx_x64_sandybridge` >> > Emanuel Peter has updated the pull request incrementally with three additional commits since the last revision: > > - more hashCodeB > - wip hashCode > - v2 hashCode wip I'm adding Otmar Ertl as contributor because of the inspiration I took from his work on `hashCode`: https://www.dynatrace.com/news/blog/java-arrays-hashcode-byte-efficiency-techniques/ ------------- PR Comment: https://git.openjdk.org/jdk/pull/28639#issuecomment-3768428778 From epeter at openjdk.org Mon Jan 19 14:00:14 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 Jan 2026 14:00:14 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v9] In-Reply-To: References: Message-ID: > This is exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. > > Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) > > **Discussion** > > Observations: > - If the loop can be auto vectorized, that is the fastest.
If we cannot vectorize, we at least get reasonable scalar performance. > - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. > - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow > - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow > - `linux_x64_oci_server`: Vector API leads to really nice speedups > - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks > - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. > - Compact Object Headers has some negative effect on some loop benchmarks. > - `linux_aarch64_server`: `reduceAddI`, `copyI` > - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` > - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? > - `windows_x64_oci_server`: `reduceAddI` and some others a little bit > - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` > - Intrinsics can be much faster than auto-vectorized or Vector API code. > - `linux_aarch64_server`: `copyI` > - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. > - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact; I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). > > **Benchmark Plots** > > Units: nanoseconds per algorithm invocation. > > `linux_x64_oci` > algo_linux_x64_oci_server > > `windows_x64_oci` > algo_windows_x64_oci_server > > `macosx_x64_sandybridge` > algo_macosx_x64_sandybridge > Also, the description in the JIRA and the opening comment in this PR should mention that the intrinsic can be simplified in response to the stricter preconditions maintained by the Java client. Done.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29141#discussion_r2704886149 From chagedorn at openjdk.org Mon Jan 19 14:03:33 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 19 Jan 2026 14:03:33 GMT Subject: RFR: 8375271: [IR Framework] Rename IREncoding to ApplicableIRRules and driver/flag/test VM to Driver/Flag/Test VM [v3] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 08:25:49 GMT, Christian Hagedorn wrote: >> This patch is part of a series of patches split off from a prototype implementation for replacing the hotspot-pid file based transfer of IR dumps between the Test VM and the Driver VM with a socket-based transfer (see [JDK-8375271](https://bugs.openjdk.org/browse/JDK-8375271) for more information). This should ease reviews. >> >> This patch is about naming updates: >> >> `IREncoding` is quite generic and does not really tell what it is about by looking at its name. I suggest to rename it to `ApplicableIRRules` to better reflect what it is about. I also capitalized the first letter of "driver/flag/test VM" to better indicate that these are proper names. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/lib/ir_framework/flag/FlagVM.java > > Co-authored-by: Damon Fenacci Thanks Tobias for your review!
------------- PR Comment: https://git.openjdk.org/jdk/pull/29229#issuecomment-3768481205 From chagedorn at openjdk.org Mon Jan 19 14:08:03 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 19 Jan 2026 14:08:03 GMT Subject: Integrated: 8375271: [IR Framework] Rename IREncoding to ApplicableIRRules and driver/flag/test VM to Driver/Flag/Test VM In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 11:21:58 GMT, Christian Hagedorn wrote: > This patch is part of a series of patches split off from a prototype implementation for replacing the hotspot-pid file based transfer of IR dumps between the Test VM and the Driver VM with a socket-based transfer (see [JDK-8375271](https://bugs.openjdk.org/browse/JDK-8375271) for more information). This should ease reviews. > > This patch is about naming updates: > > `IREncoding` is quite generic and does not really tell what it is about by looking at its name. I suggest to rename it to `ApplicableIRRules` to better reflect what it is about. I also capitalized the first letter of "driver/flag/test VM" to better indicate that these are proper names. > > Thanks, > Christian This pull request has now been integrated.
Changeset: e7f1f16a Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/e7f1f16a88ce239f22f86e479a5e806f531fbe31 Stats: 498 lines in 28 files changed: 162 ins; 159 del; 177 mod 8375271: [IR Framework] Rename IREncoding to ApplicableIRRules and driver/flag/test VM to Driver/Flag/Test VM Reviewed-by: dfenacci, thartmann, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/29229 From hgreule at openjdk.org Mon Jan 19 14:08:30 2026 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 19 Jan 2026 14:08:30 GMT Subject: RFR: 8374180: C2 crash in PhaseCCP::verify_type - fatal error: Not monotonic [v5] In-Reply-To: References: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> Message-ID: On Thu, 15 Jan 2026 08:45:39 GMT, Quan Anh Mai wrote: >> Hi, >> >> The issue here is the inconsistency in computing the `_widen` field of the `TypeInt`. At the first step, the types of the operands are: >> >> t1 = int:0 >> t2 = int:-2..3, widen = 3 >> >> Since the type of the first operand is a constant zero, `AddNode::Value` returns the type of the second operand directly, as `x ^ 0 == x for all x`. In the second step, `t1` is widened to `0..2`. This triggers the real computation of the result. The algorithm then splits `t2` into `t21 = int:-2..-1` and `t22 = int:0..3`. The `Xor` of these with `t1` are `r1 = int:-4..-1` and `r2 = int:0..3`. As both have `_hi - _lo <= SMALL_TYPEINT_THRESHOLD == 3`, their `_widen`s are normalized to `0`. As a result, their `meet` also has `_widen == 0`. This value is smaller than that from the previous step, which was `3`, which leads to the failure. >> >> The root cause here is that the `_widen` value of a node should be computed and normalized on the whole range of the node, not on its subranges, which may normalize it to `0` in more cases than what is expected.
As a result, my proposed solution is to ignore the `_widen` value of the subranges, and pass the expected `_widen` value when composing the final result. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > grammar Nothing more from my side either :) ------------- Marked as reviewed by hgreule (Committer). PR Review: https://git.openjdk.org/jdk/pull/28952#pullrequestreview-3678249015 From duke at openjdk.org Mon Jan 19 14:24:23 2026 From: duke at openjdk.org (Otmar Ertl) Date: Mon, 19 Jan 2026 14:24:23 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v8] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 13:48:14 GMT, Emanuel Peter wrote: >> Emanuel Peter has updated the pull request incrementally with three additional commits since the last revision: >> >> - more hashCodeB >> - wip hashCode >> - v2 hashCode wip > > I'm adding Otmar Ertl as contributor because of the inspiration I took from his work on `hashCode`: > https://www.dynatrace.com/news/blog/java-arrays-hashcode-byte-efficiency-techniques/ > @eme64 Contributor `Ormar Ertl ` successfully added. @eme64 There is a typo in my name, should be `Otmar Ertl` ------------- PR Comment: https://git.openjdk.org/jdk/pull/28639#issuecomment-3768497185 From epeter at openjdk.org Mon Jan 19 14:24:24 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 Jan 2026 14:24:24 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v8] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 14:03:35 GMT, Otmar Ertl wrote: >> I'm adding Otmar Ertl as contributor because of the inspiration I took from his work on `hashCode`: >> https://www.dynatrace.com/news/blog/java-arrays-hashcode-byte-efficiency-techniques/ > >> @eme64 Contributor `Ormar Ertl ` successfully added. 
> > @eme64 There is a typo in my name, should be `Otmar Ertl` @oertl My fingers were faster than my eyes. Please forgive me for mistyping your name. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28639#issuecomment-3768580977 From qamai at openjdk.org Mon Jan 19 14:24:33 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 19 Jan 2026 14:24:33 GMT Subject: RFR: 8374180: C2 crash in PhaseCCP::verify_type - fatal error: Not monotonic [v5] In-Reply-To: References: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> Message-ID: On Mon, 19 Jan 2026 14:04:57 GMT, Hannes Greule wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> grammar > > Nothing more from my side either :) @SirYwell @benoitmaillard @eme64 Thank you very much for your approvals! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28952#issuecomment-3768563706 From qamai at openjdk.org Mon Jan 19 14:24:35 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 19 Jan 2026 14:24:35 GMT Subject: Integrated: 8374180: C2 crash in PhaseCCP::verify_type - fatal error: Not monotonic In-Reply-To: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> References: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> Message-ID: On Mon, 22 Dec 2025 12:18:06 GMT, Quan Anh Mai wrote: > Hi, > > The issue here is the inconsistency in computing the `_widen` field of the `TypeInt`. At the first step, the types of the operands are: > > t1 = int:0 > t2 = int:-2..3, widen = 3 > > Since the type of the first operand is a constant zero, `AddNode::Value` returns the type of the second operand directly, as `x ^ 0 == x for all x`. In the second step, `t1` is widened to `0..2`. This triggers the real computation of the result. The algorithm then splits `t2` into `t21 = int:-2..-1` and `t22 = int:0..3`.
The `Xor` of these with `t1` are `r1 = int:-4..-1` and `r2 = int:0..3`. As both have `_hi - _lo <= SMALL_TYPEINT_THRESHOLD == 3`, their `_widen`s are normalized to `0`. As a result, their `meet` also has `_widen == 0`. This value is smaller than that from the previous step, which was `3`, which leads to the failure. > > The root cause here is that the `_widen` value of a node should be computed and normalized on the whole range of the node, not on its subranges, which may normalize it to `0` in more cases than what is expected. As a result, my proposed solution is to ignore the `_widen` value of the subranges, and pass the expected `_widen` value when composing the final result. > > Please take a look and leave your reviews, thanks a lot. This pull request has now been integrated. Changeset: c44a99a7 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/c44a99a758f38ceea84e03905d2ffb9c1fd1987a Stats: 96 lines in 4 files changed: 80 ins; 1 del; 15 mod 8374180: C2 crash in PhaseCCP::verify_type - fatal error: Not monotonic Reviewed-by: hgreule, bmaillard, epeter ------------- PR: https://git.openjdk.org/jdk/pull/28952 From roland at openjdk.org Mon Jan 19 14:48:31 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 19 Jan 2026 14:48:31 GMT Subject: RFR: 8374725: C2: assert(x_ctrl == get_late_ctrl_with_anti_dep(x->as_Load(), early_ctrl, x_ctrl)) failed: anti-dependences were already checked [v2] In-Reply-To: References: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> Message-ID: On Fri, 16 Jan 2026 13:59:15 GMT, Damon Fenacci wrote: > Looks good to me too. Thanks a lot @rwestrel! (I just added a couple of very marginal nits) Thanks. I made the requested changes.
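The `_widen` problem behind 8374180, integrated above, can be reproduced in a few lines. This is a toy model, not HotSpot's `TypeInt`: `ToyTypeInt`, `make_normalized`, `meet` and the two `compose_*` helpers are hypothetical names. It takes from the thread that ranges with `hi - lo <= 3` get their widen normalized to 0, and additionally assumes that `meet` keeps the larger of the two widen values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy int range type with a widening counter (hypothetical, not HotSpot's TypeInt).
struct ToyTypeInt {
  int32_t lo, hi, widen;
};

// Mirrors SMALL_TYPEINT_THRESHOLD == 3 from the discussion.
constexpr int64_t kSmallThreshold = 3;

// Normalization: a small range loses its widen value (reset to 0).
ToyTypeInt make_normalized(int32_t lo, int32_t hi, int32_t widen) {
  if (int64_t(hi) - int64_t(lo) <= kSmallThreshold) widen = 0;
  return {lo, hi, widen};
}

// Assumed meet: union of the ranges, keeping the larger widen.
ToyTypeInt meet(const ToyTypeInt& a, const ToyTypeInt& b) {
  return make_normalized(std::min(a.lo, b.lo), std::max(a.hi, b.hi),
                         std::max(a.widen, b.widen));
}

// Buggy composition: the widen comes from the already-normalized subranges.
ToyTypeInt compose_from_subranges(const ToyTypeInt& r1, const ToyTypeInt& r2) {
  return meet(r1, r2);
}

// Fixed composition: ignore the subranges' widen and pass the expected widen
// of the whole input range instead.
ToyTypeInt compose_with_expected_widen(const ToyTypeInt& r1, const ToyTypeInt& r2,
                                       int32_t expected_widen) {
  return make_normalized(std::min(r1.lo, r2.lo), std::max(r1.hi, r2.hi),
                         expected_widen);
}
```

With the thread's numbers, `t2 = int:-2..3` keeps widen 3, while `r1 = -4..-1` and `r2 = 0..3` are each normalized to widen 0, so composing from them yields `-4..3` with widen 0 (smaller than the previous 3, hence "Not monotonic"); passing the whole input's widen keeps it at 3.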
------------- PR Comment: https://git.openjdk.org/jdk/pull/29231#issuecomment-3768684004 From chagedorn at openjdk.org Mon Jan 19 14:57:56 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 19 Jan 2026 14:57:56 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v5] In-Reply-To: <_qZm_vXhwEf_OcRMb72w4t7vk1XKxjxwc_8eO1SmJsk=.d5ed1803-78b5-403a-baea-bbc5567facc7@github.com> References: <_qZm_vXhwEf_OcRMb72w4t7vk1XKxjxwc_8eO1SmJsk=.d5ed1803-78b5-403a-baea-bbc5567facc7@github.com> Message-ID: On Mon, 12 Jan 2026 12:11:51 GMT, Roland Westrelin wrote: >> The base input of `AddP` is expected to only be set for heap accesses >> but I noticed some inconsistencies so I added an assert in the `AddP` >> constructor and fixed issues that it caught. AFAICT, the >> inconsistencies shouldn't create issues. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/macroArrayCopy.cpp > > Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> Sorry for the delay, testing looked good! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28769#issuecomment-3768723325 From roland at openjdk.org Mon Jan 19 15:32:07 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 19 Jan 2026 15:32:07 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v6] In-Reply-To: References: Message-ID: > The base input of `AddP` is expected to only be set for heap accesses > but I noticed some inconsistencies so I added an assert in the `AddP` > constructor and fixed issues that it caught. AFAICT, the > inconsistencies shouldn't create issues. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 15 additional commits since the last revision: - Merge branch 'master' into JDK-8373343 - Update src/hotspot/share/opto/macroArrayCopy.cpp Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> - more - more - review - Merge branch 'master' into JDK-8373343 - review - review - review - merge - ... and 5 more: https://git.openjdk.org/jdk/compare/370d2034...2f618436 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28769/files - new: https://git.openjdk.org/jdk/pull/28769/files/507b8f45..2f618436 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28769&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28769&range=04-05 Stats: 61075 lines in 1252 files changed: 30953 ins; 11926 del; 18196 mod Patch: https://git.openjdk.org/jdk/pull/28769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28769/head:pull/28769 PR: https://git.openjdk.org/jdk/pull/28769 From roland at openjdk.org Mon Jan 19 15:32:09 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 19 Jan 2026 15:32:09 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v3] In-Reply-To: <-GY9oBe-WRSh16Yi90rp79Xxi784nRtvdqBlMh4TiMs=.ba972df0-9298-4f94-8699-597601bd39ba@github.com> References: <-GY9oBe-WRSh16Yi90rp79Xxi784nRtvdqBlMh4TiMs=.ba972df0-9298-4f94-8699-597601bd39ba@github.com> Message-ID: On Sat, 20 Dec 2025 03:31:13 GMT, Dean Long wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Tests passed. I merged with latest. Can one of you @dean-long @chhagedorn @merykitty , re-approve? Thanks. 
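The invariant checked by 8373343 ("the base input of `AddP` is only set for heap accesses") can be pictured with a small toy model. `ToyAddP`, `ToyNode`, `AddressKind` and `make_off_heap` are hypothetical names, not HotSpot's `AddPNode`; the model assumes an absent base is represented by `nullptr`, where C2 itself uses a dedicated top node.

```cpp
#include <cassert>

// Toy model of the AddP base-input invariant (hypothetical, not HotSpot code).
enum class AddressKind { Heap, OffHeap };

struct ToyNode {
  const char* name;
};

struct ToyAddP {
  const ToyNode* base;     // nullptr models "no base" (C2 would use top)
  const ToyNode* address;  // the address being offset
  long offset;
  AddressKind kind;

  ToyAddP(const ToyNode* b, const ToyNode* a, long off, AddressKind k)
      : base(b), address(a), offset(off), kind(k) {
    // Constructor-time check in the spirit of the PR's new assert:
    // a base may only be present when the address points into the heap.
    assert((base == nullptr || kind == AddressKind::Heap) &&
           "AddP base input must only be set for heap addresses");
  }
};

// Builds an off-heap address the consistent way, without a base.
ToyAddP make_off_heap(const ToyNode* raw, long off) {
  return ToyAddP(nullptr, raw, off, AddressKind::OffHeap);
}
```

Placing the check in the constructor means that any call site building an inconsistent node fails immediately in a debug build, which matches how the PR describes finding the inconsistencies it then fixed.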
------------- PR Comment: https://git.openjdk.org/jdk/pull/28769#issuecomment-3768888493 From epeter at openjdk.org Mon Jan 19 15:35:18 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 Jan 2026 15:35:18 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v10] In-Reply-To: References: Message-ID: <6MVHXmojhitsx7Q291bDVHOhtWQ9PmFudaGpFj-kf5U=.799e395f-b1e4-4c7a-b593-174d05da72fc@github.com> > This is exploratory work. I wanted to use auto-vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. > > Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) > > **Discussion** > > Observations: > - If the loop can be auto-vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. > - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. > - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow > - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow > - `linux_x64_oci_server`: Vector API leads to really nice speedups > - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks > - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. > - Compact Object Headers have a negative effect on some loop benchmarks. > - `linux_aarch64_server`: `reduceAddI`, `copyI` > - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` > - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`?
> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit > - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` > - Intrinsics can be much faster than auto-vectorized or Vector API code. > - `linux_aarch64_server`: `copyI` > - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer from Compact Object Headers as well. > - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact; I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). > > **Benchmark Plots** > > Units: nanoseconds per algorithm invocation. > > `linux_x64_oci` > [plot: algo_linux_x64_oci_server] > > `windows_x64_oci` > [plot: algo_windows_x64_oci_server] > > `macosx_x64_sandybridge` > [plot: algo_macosx_x64_sandybridge] > @iwanowww Why is fork variance easier to spot than iteration variance? > > I suppose I can try doing the setup per fork. But that does drive up the runtime of the benchmark, because you need to do warmup for each fork. But I suppose you think that is worth it? I am now using forks. But it does indeed drive up the runtime. I need about a 5-sec warmup for some benchmarks until they really reach peak performance. Multiply that with multiple forks, and it gets slow. Not great, but I can live with that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2705224082 From epeter at openjdk.org Mon Jan 19 16:10:04 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 Jan 2026 16:10:04 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v12] In-Reply-To: References: Message-ID: On Thu, 15 Jan 2026 08:56:21 GMT, Jatin Bhateja wrote: >> Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. >> - Add necessary inline expander support.
>> - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. >> - Use existing Float16 vector IR and backend support. >> - Extend the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. >> >> The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc.). >> >> The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. >> >> [image: performance numbers] >> >> Initial RFP[1] was floated on the panama-dev mailing list. >> >> Kindly review the draft PR and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Adding testpoint for JDK-8373574 > - Review comments resolutions > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Fix incorrect argument passed to smokeTest > - Fix from Bhavana Kilambi for failing JTREG regressions on AARCH64 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Including test changes from Bhavana Kilambi (ARM) > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Optimizing tail handling > - ... and 18 more: https://git.openjdk.org/jdk/compare/499b5882...273b219e @jatin-bhateja I had a quick look at some of the changes. The patch is HUGE (80k+ lines), so it will take me a bit more time to review. I also realized that quite a few changes are not directly related. For example, you are renaming lots of existing files. I would prefer if those changes were done separately.
The issue is that at some point GitHub chokes, and it is no fun doing the review :/ src/hotspot/share/opto/vectorIntrinsics.cpp line 2895: > 2893: opd1 = gvn().transform(VectorNode::make(Op_AndV, opd1, wrap_mask_vec, opd1->bottom_type()->is_vect())); > 2894: operation = gvn().transform(VectorNode::make(Op_SelectFromTwoVector, opd1, opd2, opd3, vt)); > 2895: VectorNode::trace_new_vector(operation, "VectorAPI"); I thought you wanted to add that in a separate RFE? src/hotspot/share/prims/vectorSupport.cpp line 206: > 204: } > 205: > 206: int VectorSupport::vop2ideal(jint id, int lane_type) { I think it would be nice if there was a name for `LaneType`. It's just nicer to have types named. After all, the code here used to use `BasicType`, and it helps the user know what is expected here. src/hotspot/share/prims/vectorSupport.cpp line 244: > 242: case T_FLOAT: return Op_MulF; > 243: case T_DOUBLE: return Op_MulD; > 244: default: fatal("MUL: %s", lanetype2name(lane_type)); You should fix the indentation here as well, since you are already doing it everywhere else ;) src/hotspot/share/utilities/globalDefinitions.hpp line 719: > 717: > 718: inline bool is_java_primitive(BasicType t) { > 719: return (t != T_FLOAT16 && T_BOOLEAN <= t && t <= T_LONG); This change seems unnecessary, right? `T_FLOAT16` is outside the range, as far as I can see. src/hotspot/share/utilities/globalDefinitions.hpp line 741: > 739: inline bool is_custom_basic_type(BasicType t) { > 740: return (t == T_FLOAT16); > 741: } What exactly is the definition of a "custom" basic type? Is it defined somewhere? If not, it would be useful to define it here. I assume you chose the name because we expect more such "custom" basic types in the future? ------------- Changes requested by epeter (Reviewer). 
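Emanuel's remark that the new `t != T_FLOAT16` test in `is_java_primitive` looks redundant can be checked mechanically. In this sketch the values 4 to 11 for `T_BOOLEAN` through `T_LONG` follow the JVM `newarray` type codes, which HotSpot's `BasicType` reuses. The value of `T_FLOAT16` is a hypothetical placeholder, since the PR's actual value is not shown in this thread; the helper `definitions_agree` is likewise made up for illustration.

```cpp
#include <cassert>

// Primitive BasicType values 4..11 (matching the JVM newarray type codes).
// T_FLOAT16 = 20 is a hypothetical placeholder outside that range.
enum BasicType : int {
  T_BOOLEAN = 4, T_CHAR = 5, T_FLOAT = 6, T_DOUBLE = 7,
  T_BYTE = 8, T_SHORT = 9, T_INT = 10, T_LONG = 11,
  T_FLOAT16 = 20
};

// Variant with the extra exclusion proposed in the patch.
bool is_java_primitive_with_check(BasicType t) {
  return t != T_FLOAT16 && T_BOOLEAN <= t && t <= T_LONG;
}

// Original range-only variant.
bool is_java_primitive_range_only(BasicType t) {
  return T_BOOLEAN <= t && t <= T_LONG;
}

// Returns true if both variants agree for every value in [0, limit).
bool definitions_agree(int limit) {
  for (int v = 0; v < limit; ++v) {
    BasicType t = static_cast<BasicType>(v);
    if (is_java_primitive_with_check(t) != is_java_primitive_range_only(t)) {
      return false;
    }
  }
  return true;
}
```

The two variants differ only if `T_FLOAT16` were placed inside `[T_BOOLEAN, T_LONG]`; the range check alone already excludes any value outside it.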
PR Review: https://git.openjdk.org/jdk/pull/28002#pullrequestreview-3678699226 PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2705309070 PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2705323981 PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2705316085 PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2705333596 PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2705291064 From epeter at openjdk.org Mon Jan 19 16:19:02 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 Jan 2026 16:19:02 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v12] In-Reply-To: References: Message-ID: On Thu, 15 Jan 2026 08:56:21 GMT, Jatin Bhateja wrote: >> Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. >> - Add necessary inline expander support. >> - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. >> - Use existing Float16 vector IR and backend support. >> - Extend the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. >> >> The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc.). >> >> The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. >> >> [image: performance numbers] >> >> Initial RFP[1] was floated on the panama-dev mailing list. >> >> Kindly review the draft PR and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 28 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Adding testpoint for JDK-8373574 > - Review comments resolutions > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Fix incorrect argument passed to smokeTest > - Fix from Bhavana Kilambi for failing JTREG regressions on AARCH64 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Including test changes from Bhavana Kilambi (ARM) > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Optimizing tail handling > - ... and 18 more: https://git.openjdk.org/jdk/compare/499b5882...273b219e test/jdk/jdk/incubator/vector/IntVectorMaxTests.java line 68: > 66: static IntVector bcast_vec = IntVector.broadcast(SPECIES, (int)10); > 67: > 68: static void AssertEquals(int actual, int expected) { There are lots of changes in this file that do not seem to have anything to do with Float16. Please file them separately. It will make review much easier. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2705376899 From qamai at openjdk.org Mon Jan 19 17:08:25 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 19 Jan 2026 17:08:25 GMT Subject: RFR: 8375618: Incorrect assert in CastLLNode::Ideal Message-ID: Hi, This PR fixes the incorrect assert in `CastLLNode::Ideal`. The assert intends to verify that the output is either a proper subset of or the same as the input. It does so by checking the signed lower bound and signed upper bound of the `TypeLong` instances. This method is not correct now. Please kindly review, thanks a lot. 
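To see why comparing only signed bounds can misjudge whether one type properly contains another, consider a toy long type that carries information beyond its signed range, assuming (as in recent HotSpot versions) that a `TypeLong` is not fully described by `lo`/`hi`. Here the extra information is a single mask of known-zero bits. All names (`ToyTypeLong`, `properly_contains_toy`, `narrower_by_signed_bounds`) are hypothetical; this is neither the actual assert nor the `properly_contains` helper from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Toy long type: signed bounds plus a mask of bits known to be zero
// (bit i set means every value in the type has bit i == 0).
struct ToyTypeLong {
  int64_t lo, hi;
  uint64_t known_zeros;
};

// inner is a subset of outer if it is within the bounds and has at least
// all of outer's known-zero bits.
bool contains(const ToyTypeLong& outer, const ToyTypeLong& inner) {
  return outer.lo <= inner.lo && inner.hi <= outer.hi &&
         (outer.known_zeros & ~inner.known_zeros) == 0;
}

bool equals(const ToyTypeLong& a, const ToyTypeLong& b) {
  return a.lo == b.lo && a.hi == b.hi && a.known_zeros == b.known_zeros;
}

// Proper subset: contained, but not the same type.
bool properly_contains_toy(const ToyTypeLong& outer, const ToyTypeLong& inner) {
  return contains(outer, inner) && !equals(outer, inner);
}

// A check that only looks at signed bounds to decide "narrower".
bool narrower_by_signed_bounds(const ToyTypeLong& outer, const ToyTypeLong& inner) {
  return inner.lo > outer.lo || inner.hi < outer.hi;
}
```

Narrowing `[-10, 10]` to its even values (known-zero bit 0) gives a proper subset whose signed bounds are unchanged, so a bounds-only check would wrongly conclude that nothing was narrowed.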
------------- Commit messages: - Incorrect assertion in CastLLNode::Ideal Changes: https://git.openjdk.org/jdk/pull/29304/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29304&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375618 Stats: 82 lines in 4 files changed: 77 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/29304.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29304/head:pull/29304 PR: https://git.openjdk.org/jdk/pull/29304 From qamai at openjdk.org Mon Jan 19 17:11:56 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 19 Jan 2026 17:11:56 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v3] In-Reply-To: References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> Message-ID: On Fri, 16 Jan 2026 12:51:21 GMT, Kerem Kat wrote: >> The check `vect->is_Vector() || vect->is_LoadVector()` doesn't handle `Proj` nodes that resolve to vector types, causing an assertion failure when such nodes flow through a `Phi` into `VectorBox`. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into fix-c2-checkCastPP > - 8374903: C2 VectorAPI: assert(vbox->as_Phi()->region() == vect->as_Phi()->region()) failed > - Simplify expand_vbox_node_helper by merging VectorBox Phi handling > - 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded src/hotspot/share/opto/vector.cpp line 331: > 329: // Handle the case when the allocation input to VectorBoxNode is a Phi. 
> 330: // This is generated after the transformation in PhiNode::merge_through_phi: > 331: // Phi (VectorBox1 VectorBox2) => VectorBox (Phi1 Phi2) Should this be something like: Phi(VectorBox(vbox1, vect1), VectorBox(vbox2, vect2)) -> VectorBox(Phi(vbox1, vbox2), Phi(vect1, vect2)) I think it is a bit clearer, but it is fine either way. test/hotspot/jtreg/compiler/vectorapi/VectorBoxExpandPhi.java line 1: > 1: /* Can these 2 tests be merged into 1? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29200#discussion_r2705566555 PR Review Comment: https://git.openjdk.org/jdk/pull/29200#discussion_r2705570455 From liach at openjdk.org Mon Jan 19 17:48:43 2026 From: liach at openjdk.org (Chen Liang) Date: Mon, 19 Jan 2026 17:48:43 GMT Subject: RFR: 8375618: Incorrect assert in CastLLNode::Ideal In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 17:00:39 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the incorrect assert in `CastLLNode::Ideal`. The assert intends to verify that the output is either a proper subset of or the same as the input. It does so by checking the signed lower bound and signed upper bound of the `TypeLong` instances. This method is not correct now. > > Please kindly review, thanks a lot. Should properly_contains be a product or a debug-only function? It is currently only used for debug; I wonder what product uses it would see. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29304#issuecomment-3769525873 From dholmes at openjdk.org Mon Jan 19 20:45:57 2026 From: dholmes at openjdk.org (David Holmes) Date: Mon, 19 Jan 2026 20:45:57 GMT Subject: RFR: 8370112: Remove VM_Version::supports_fast_class_init_checks() in platform-specific code [v2] In-Reply-To: References: Message-ID: > Please review this simple enhancement to remove runtime checks for `VM_Version::supports_fast_class_init_checks()` in platform specific code, and replace them with asserts - as we know statically whether a given platform supports it.
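The change described for 8370112 follows a common pattern: when platform-specific code can only run on a platform that supports a feature, a runtime `if` on the capability is dead weight and can become an assert. The toy below is a sketch under that assumption; `ToyVMVersion`, `emit_clinit_barrier` and the `generate_entry_*` functions are hypothetical stand-ins, not the HotSpot code touched by the PR.

```cpp
#include <cassert>

// Toy stand-in for VM_Version: in this sketch the capability is a
// compile-time constant for the platform being built (hypothetical).
struct ToyVMVersion {
  static constexpr bool supports_fast_class_init_checks() { return true; }
};

int emitted_barriers = 0;

// Hypothetical emitter for a class-initialization barrier.
void emit_clinit_barrier() { ++emitted_barriers; }

// Before: a runtime check that can never be false on this platform.
void generate_entry_before() {
  if (ToyVMVersion::supports_fast_class_init_checks()) {
    emit_clinit_barrier();
  }
}

// After: state the platform guarantee as an assertion and emit unconditionally.
void generate_entry_after() {
  assert(ToyVMVersion::supports_fast_class_init_checks() &&
         "this platform must support fast class init checks");
  emit_clinit_barrier();
}
```

Both versions emit the same code on a supporting platform; the assert version documents the assumption and catches misuse in debug builds instead of silently skipping the barrier.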
> > The changes are the same for each platform. > > Testing > - building all platforms via GHA > - tiers 1-3 (sanity) > > Thanks David Holmes has updated the pull request incrementally with one additional commit since the last revision: Remove redundant block ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29293/files - new: https://git.openjdk.org/jdk/pull/29293/files/4d79690f..a721b886 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29293&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29293&range=00-01 Stats: 25 lines in 5 files changed: 0 ins; 5 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/29293.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29293/head:pull/29293 PR: https://git.openjdk.org/jdk/pull/29293 From eastigeevich at openjdk.org Mon Jan 19 22:22:38 2026 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 19 Jan 2026 22:22:38 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache [v6] In-Reply-To: References: Message-ID: On Sat, 17 Jan 2026 20:36:34 GMT, Jan Kratochvil wrote: > I see a FAIL for `test/jdk/jdk/jfr/event/compiler/TestCodeCacheFull.java` on linux64 fastdebug: > > ``` > Error occurred during initialization of VM > HotCodeHeapSize requires HotCodeHeap enabled > ``` > > But GHA does not report it. Thank you, @jankratochvil. We will take a look. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27858#issuecomment-3770292456 From eastigeevich at openjdk.org Mon Jan 19 23:11:03 2026 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 19 Jan 2026 23:11:03 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache [v6] In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 11:24:08 GMT, George Wort wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix builds > > Hi, > > I've played around with this PR a bit and had a few thoughts. 
> > The way the grouping is set up currently means that if you run the program for long enough you will keep adding profiled code onto the hot code heap, even if it doesn't really meet the definition of hot. This also means that if the program changes "phase", and the hot code changes, the hot code heap might already be full and you will be unable to compact the new hot code. Have you thought about adding some kind of refresh/reset when the hot code heap is full, to purge code that has not appeared in recent profiles? > > Other small configuration changes that helped me try this out, adding a delay variable to avoid profiling the setup period of a program, and making the sampling period configurable. Hi @GeorgeWort Thank you for the feedback. > The way the grouping is set up currently means that if you run the program for long enough you will keep adding profiled code onto the hot code heap, even if it doesn't really meet the definition of hot. This also means that if the program changes "phase", and the hot code changes, the hot code heap might already be full and you will be unable to compact the new hot code. We've only seen cases where an application, after running for long enough, gets a stable profile which stays almost unchanged forever. We currently rely on GC removing cold nmethods from HotCodeHeap. An application gradually switches between being idle and being active. So, possible cases: - Running <-> Idling. If no new nmethods appear, nothing should be removed from HotCodeHeap. If they appear, some others should become cold and be removed by GC. We will detect new nmethods and move them to HotCodeHeap. - Running Profile 1 <-> Running Profile 2. If both profiles cannot be kept in HotCodeHeap, we will have a problem. It will depend on how quickly switching between profiles happens.
Anyway, switching between profiles might cause performance issues: throwing away nmethods from HotCodeHeap, interpreting them again, recompiling them, relocating them to HotCodeHeap. > Have you thought about adding some kind of refresh/reset when the hot code heap is full, to purge code that has not appeared in recent profiles? We can consider relocating nmethods back to the normal heap, the non-profiled code heap. IMO we should do this instead of GC throwing them away. If after being moved to the normal heap they become cold, GC will remove them from CodeCache. If they become hot again, they will be relocated to HotCodeHeap. > > Other small configuration changes that helped me try this out, adding a delay variable to avoid profiling the setup period of a program, and making the sampling period configurable. Yes, this is useful. We can consider adding it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27858#issuecomment-3770383465 From dzhang at openjdk.org Tue Jan 20 01:12:36 2026 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 20 Jan 2026 01:12:36 GMT Subject: RFR: 8375657: RISC-V: Need to check size in SharedRuntime::is_wide_vector Message-ID: Hi, Can you help to review this patch? Thanks! Currently, we only check the `UseRVV` flag in `SharedRuntime::is_wide_vector` on RISC-V platforms. This is not optimal when no vector instructions are used by the nmethod. In this case, the input size parameter is zero. We should consider this case so that we choose the right stub in `SharedRuntime::get_poll_stub` when handling a safepoint.
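A toy model of the selection described in 8375657 may help. All names are hypothetical and `use_rvv` stands in for the `UseRVV` flag. The sketch assumes the fix takes the form `UseRVV && size > 0` and that the poll-stub choice distinguishes a return poll, a plain safepoint handler and a vector-saving safepoint handler as in `choose_poll_stub`; neither is quoted from the patch.

```cpp
#include <cassert>

// Before: wide vectors are reported whenever RVV is enabled, regardless of size.
bool is_wide_vector_before(bool use_rvv, int size) {
  (void)size;  // size is ignored in the old predicate
  return use_rvv;
}

// After (assumed form of the fix): also require that vectors are actually used.
bool is_wide_vector_after(bool use_rvv, int size) {
  return use_rvv && size > 0;
}

enum class PollStub { Return, Safepoint, VectorSafepoint };

// Toy version of the stub choice made when a safepoint poll is hit.
PollStub choose_poll_stub(bool at_return, bool has_wide_vectors) {
  if (at_return) return PollStub::Return;
  return has_wide_vectors ? PollStub::VectorSafepoint : PollStub::Safepoint;
}
```

In this model, an nmethod with a maximum vector size of 0 was previously routed to the vector-saving handler whenever RVV was enabled; with the size check it gets the plain safepoint handler, while nmethods that do use vectors are unaffected.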
------------- Commit messages: - 8375657: RISC-V: Need to check size in SharedRuntime::is_wide_vector Changes: https://git.openjdk.org/jdk/pull/29307/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29307&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375657 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29307.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29307/head:pull/29307 PR: https://git.openjdk.org/jdk/pull/29307 From fjiang at openjdk.org Tue Jan 20 01:21:37 2026 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 20 Jan 2026 01:21:37 GMT Subject: RFR: 8375657: RISC-V: Need to check size in SharedRuntime::is_wide_vector In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 01:05:43 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > Currently, we only check the `UseRVV` flag in `SharedRuntime::is_wide_vector` on RISC-V platforms. This is not optimal when no vector instructions are used by the nmethod. In this case, the input size parameter is zero. We should consider this case so that we choose the right stub in `SharedRuntime::get_poll_stub` when handling a safepoint. Thanks! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/29307#pullrequestreview-3679985930 From fyang at openjdk.org Tue Jan 20 01:31:01 2026 From: fyang at openjdk.org (Fei Yang) Date: Tue, 20 Jan 2026 01:31:01 GMT Subject: RFR: 8375657: RISC-V: Need to check size in SharedRuntime::is_wide_vector In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 01:05:43 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > Currently, we only check the `UseRVV` flag in `SharedRuntime::is_wide_vector` on RISC-V platforms. This is not optimal when no vector instructions are used by the nmethod. In this case, the input size parameter is zero.
We should consider this case so that we choose the right stub in `SharedRuntime::get_poll_stub` when handling a safepoint. Looks reasonable. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29307#pullrequestreview-3679996102 From erfang at openjdk.org Tue Jan 20 01:33:36 2026 From: erfang at openjdk.org (Eric Fang) Date: Tue, 20 Jan 2026 01:33:36 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v8] In-Reply-To: References: Message-ID: <6_ZqCg4GNfBg0NonFKzf24z0NCkuRKRZQ9T2VVXUg0E=.745f107f-985f-4e7a-ac51-63c86ac7034f@github.com> On Mon, 19 Jan 2026 08:57:23 GMT, Jatin Bhateja wrote: > > @jatin-bhateja I have no further comments, great work. After this PR is merged, I will complete the backend optimization of the aarch64 part based on it. Thanks! > > Thanks @erifan, I think the partial case is specific to the AARCH64 backend, and tests should accompany the relevant AARCH64 changes. Thanks, @jatin-bhateja. Makes sense. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24104#issuecomment-3770638063 From xgong at openjdk.org Tue Jan 20 01:45:59 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 20 Jan 2026 01:45:59 GMT Subject: Integrated: 8370666: VectorAPI: Add clear comments for vector relative code in c2 In-Reply-To: References: Message-ID: <-PoPCD5issQjkgIDEMGIC4vxIHe64_I-dPHPv1ra5Jg=.1888c7cd-0203-4a46-a9cd-05b48c7f5c5a@github.com> On Fri, 9 Jan 2026 01:36:50 GMT, Xiaohong Gong wrote: > The VectorMask implementation in Vector API involves complex interactions between types, nodes, and platform-specific > features, making the related code in HotSpot difficult to understand and review. > > This patch adds comprehensive comments for vector mask related types, nodes, and methods in C2 to improve code clarity and > maintainability. > > Note: This patch only adds comments; no functional changes are made. This pull request has now been integrated.
Changeset: 303de9a3 Author: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/303de9a3f2ba93f0bbe42044483a0b48c82b70cb Stats: 332 lines in 5 files changed: 171 ins; 118 del; 43 mod 8370666: VectorAPI: Add clear comments for vector relative code in c2 Reviewed-by: epeter, jbhateja, qamai ------------- PR: https://git.openjdk.org/jdk/pull/29130 From xgong at openjdk.org Tue Jan 20 01:45:57 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 20 Jan 2026 01:45:57 GMT Subject: RFR: 8370666: VectorAPI: Add clear comments for vector relative code in c2 [v4] In-Reply-To: References: <1OhWKmT-TY2mfTTwsLPo1H3swd8gx0DVx0dwO0ZT1_E=.dd0e8f33-e7a9-4fd4-b431-518fe055c158@github.com> Message-ID: On Mon, 19 Jan 2026 08:14:17 GMT, Emanuel Peter wrote: >> Hi @eme64, @jatin-bhateja, I've pushed a new commit to address the remaining comments. Could you please take another look? Thanks a lot! > > @XiaohongGong Thanks for putting the effort into better documentation, really much appreciated. Thanks so much for your reviews, @eme64, @jatin-bhateja and @merykitty! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29130#issuecomment-3770657234 From qamai at openjdk.org Tue Jan 20 02:50:59 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 20 Jan 2026 02:50:59 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic Message-ID: Hi, This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not subsets of each other. This makes it possible for a node that satisfies both cases to return the first value; if, upon being widened, it ceases to satisfy the first case but still satisfies the second, the method returns the second value, which is not a superset of the previous result. For example, given `r = CmpU(x, y)`.
At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, so `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, so `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which triggers an assertion failure. Please take a look and leave your reviews, thanks a lot. ------------- Commit messages: - CmpUNode::sub is not monotonic Changes: https://git.openjdk.org/jdk/pull/29308/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29308&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375653 Stats: 159 lines in 2 files changed: 64 ins; 77 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/29308.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29308/head:pull/29308 PR: https://git.openjdk.org/jdk/pull/29308 From qamai at openjdk.org Tue Jan 20 02:53:31 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 20 Jan 2026 02:53:31 GMT Subject: RFR: 8375618: Incorrect assert in CastLLNode::Ideal [v2] In-Reply-To: References: Message-ID: > Hi, > > This PR fixes the incorrect assert in `CastLLNode::Ideal`. The assert intends to verify that the output is either a proper subset of or the same as the input. It does so by checking the signed lower bound and signed upper bound of the `TypeLong` instances. This method is not correct now. > > Please kindly review, thanks a lot.
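The monotonicity problem in the CmpU report can be reproduced with a toy model in which a "type" is a small explicit set of `int` values and the comparison result is the set of possible outcomes of an unsigned compare. Everything below is an illustrative assumption, not the code in the PR: the two rules stand in for the "x is constant 0" and "x and y do not overlap" cases, `firstMatch` mimics returning whichever case fires first, and `intersectAll` shows one way to stay monotonic, by keeping only the outcomes every applicable rule allows.

```java
import java.util.EnumSet;
import java.util.Set;

// Toy model of CmpU typing; not C2 code. A "type" is an explicit set of ints.
final class CmpUModel {
    enum Outcome { LT, EQ, GT }

    // Rule A: if x is exactly {0}, then x <=u y for every y (like CC_LE).
    static EnumSet<Outcome> ruleXIsZero(Set<Integer> x) {
        return x.equals(Set.of(0))
                ? EnumSet.of(Outcome.LT, Outcome.EQ)
                : EnumSet.allOf(Outcome.class);
    }

    // Rule B: if x and y share no value, they can never be equal (like CC_NE).
    static EnumSet<Outcome> ruleDisjoint(Set<Integer> x, Set<Integer> y) {
        boolean disjoint = x.stream().noneMatch(y::contains);
        return disjoint
                ? EnumSet.of(Outcome.LT, Outcome.GT)
                : EnumSet.allOf(Outcome.class);
    }

    // Non-monotonic: returns the result of the first rule that applies.
    static EnumSet<Outcome> firstMatch(Set<Integer> x, Set<Integer> y) {
        if (x.equals(Set.of(0))) {
            return ruleXIsZero(x);
        }
        if (x.stream().noneMatch(y::contains)) {
            return ruleDisjoint(x, y);
        }
        return EnumSet.allOf(Outcome.class);
    }

    // Monotonic: each rule's result only grows as its inputs grow,
    // so the intersection of all rule results grows too.
    static EnumSet<Outcome> intersectAll(Set<Integer> x, Set<Integer> y) {
        EnumSet<Outcome> result = ruleXIsZero(x); // fresh set, safe to mutate
        result.retainAll(ruleDisjoint(x, y));
        return result;
    }

    // Ground truth: enumerate every pair with an unsigned comparison.
    static EnumSet<Outcome> exact(Set<Integer> x, Set<Integer> y) {
        EnumSet<Outcome> result = EnumSet.noneOf(Outcome.class);
        for (int a : x) {
            for (int b : y) {
                int c = Integer.compareUnsigned(a, b);
                result.add(c < 0 ? Outcome.LT : c == 0 ? Outcome.EQ : Outcome.GT);
            }
        }
        return result;
    }
}
```

With the values from the report, `firstMatch` yields `{LT, EQ}` for `x = {0}` and `{LT, GT}` once `x` widens to `{0, 2}`, mirroring the jump from `CC_LE` to `CC_NE`, while `intersectAll` moves from `{LT}` to `{LT, GT}` and always contains the exact answer.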
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Make properly_contains debug only ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29304/files - new: https://git.openjdk.org/jdk/pull/29304/files/4778bcba..948a0198 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29304&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29304&range=00-01 Stats: 10 lines in 2 files changed: 10 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29304.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29304/head:pull/29304 PR: https://git.openjdk.org/jdk/pull/29304 From qamai at openjdk.org Tue Jan 20 02:53:31 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 20 Jan 2026 02:53:31 GMT Subject: RFR: 8375618: Incorrect assert in CastLLNode::Ideal In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 17:45:14 GMT, Chen Liang wrote: >> Hi, >> >> This PR fixes the incorrect assert in `CastLLNode::Ideal`. The assert intends to verify that the output is either a proper subset of or the same as the input. It does so by checking the signed lower bound and signed upper bound of the `TypeLong` instances. This method is not correct now. >> >> Please kindly review, thanks a lot. > > Should properly_contains be a product or a debug-only function? It is currently only used in debug code; I wonder what product uses it would see. @liach Thanks for taking a look, that's a good idea!
------------- PR Comment: https://git.openjdk.org/jdk/pull/29304#issuecomment-3770784796 From shade at openjdk.org Tue Jan 20 05:34:00 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 20 Jan 2026 05:34:00 GMT Subject: RFR: 8370112: Remove VM_Version::supports_fast_class_init_checks() in platform-specific code [v2] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 20:45:57 GMT, David Holmes wrote: >> Please review this simple enhancement to remove runtime checks for `VM_Version::supports_fast_class_init_checks()` in platform specific code, and replace them with asserts - as we know statically whether a given platform supports it. >> >> The changes are the same for each platform. >> >> Testing >> - building all platforms via GHA >> - tiers 1-3 (sanity) >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Remove redundant block Looks fine to me! ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29293#pullrequestreview-3680406748 From dholmes at openjdk.org Tue Jan 20 06:24:55 2026 From: dholmes at openjdk.org (David Holmes) Date: Tue, 20 Jan 2026 06:24:55 GMT Subject: RFR: 8370112: Remove VM_Version::supports_fast_class_init_checks() in platform-specific code [v2] In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 05:30:24 GMT, Aleksey Shipilev wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove redundant block > > Looks fine to me! Thanks @shipilev ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29293#issuecomment-3771217015 From dholmes at openjdk.org Tue Jan 20 06:24:57 2026 From: dholmes at openjdk.org (David Holmes) Date: Tue, 20 Jan 2026 06:24:57 GMT Subject: Integrated: 8370112: Remove VM_Version::supports_fast_class_init_checks() in platform-specific code In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 04:34:36 GMT, David Holmes wrote: > Please review this simple enhancement to remove runtime checks for `VM_Version::supports_fast_class_init_checks()` in platform-specific code, and replace them with asserts - as we know statically whether a given platform supports it. > > The changes are the same for each platform. > > Testing > - building all platforms via GHA > - tiers 1-3 (sanity) > > Thanks This pull request has now been integrated. Changeset: ca6925ec Author: David Holmes URL: https://git.openjdk.org/jdk/commit/ca6925ec6bf44cf7d4704becc194389e4c87b74f Stats: 113 lines in 10 files changed: 10 ins; 11 del; 92 mod 8370112: Remove VM_Version::supports_fast_class_init_checks() in platform-specific code Reviewed-by: shade, fyang ------------- PR: https://git.openjdk.org/jdk/pull/29293 From xgong at openjdk.org Tue Jan 20 06:37:02 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 20 Jan 2026 06:37:02 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v5] In-Reply-To: References: <4oX30eH34qM7pVxT3HgnqRcprVQGiMBNcu_Yqq9VDas=.009cf016-10b0-4b3d-8d10-59ba9099c62e@github.com> Message-ID: On Fri, 16 Jan 2026 09:36:32 GMT, Emanuel Peter wrote: >> test/micro/org/openjdk/bench/vm/compiler/VectorAlgorithmsImpl.java line 61: >> >>> 59: } >>> 60: return r; >>> 61: } >> >> You can add another flavor for vector API kernels where the tail is implemented using masked operations. >> >> >> if (i < r.length) { >> VectorMask<Integer> mask = SPECIES_I.indexInRange(i, r.length); >> v.intoArray(r, i, mask); >> } >> >> >> Simply replicate the loop body, guarded by the mask.
>> https://github.com/openjdk/jdk/pull/28002/changes#diff-b5c49811dff21107eb8c8ab0578be4cd235c6f69bafd879a8e4b4620b974c25bR153-R159 > > Good idea, I could! > However, it would mean I would have to probably add this version for every benchmark. > I'm wondering if that is worth it. > I think I won't add it now, but maybe in a follow-up RFE :) Creating a mask for each loop iteration may affect Vector API benchmark results, especially on SVE. If the benchmark is intended to compare Vector API performance against auto-vectorization, using a tail-loop version would be preferable for a fairer comparison. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2706933392 From thartmann at openjdk.org Tue Jan 20 06:46:46 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 20 Jan 2026 06:46:46 GMT Subject: [jdk26] Integrated: 8360271: String.indexOf intrinsics fail with +EnableX86ECoreOpts and -CompactStrings In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 07:08:06 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [1d889b92](https://github.com/openjdk/jdk/commit/1d889b92bde5dfcb1fbe6cddb389a77f92eb1ce7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Volodymyr Paprotski on 15 Jan 2026 and was reviewed by Tobias Hartmann, Jatin Bhateja and Sandhya Viswanathan. > > Thanks! This pull request has now been integrated.
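To make the two tail strategies in the discussion concrete without depending on the incubator module, the sketch below models them with plain Java loops. `TailStrategies`, `VLEN` and the `double…` methods are hypothetical names: `VLEN` stands in for `SPECIES_I.length()`, `loopBound` mirrors what `VectorSpecies.loopBound(length)` computes, and the `boolean[]` plays the role of the mask produced by `indexInRange(i, length)`. It illustrates control structure only and says nothing about performance on any platform.

```java
// Scalar model of two tail-handling strategies; no Vector API involved.
final class TailStrategies {
    static final int VLEN = 8; // hypothetical lane count, like SPECIES_I.length()

    // Largest multiple of VLEN that is <= length (what loopBound computes).
    static int loopBound(int length) {
        return length - (length % VLEN);
    }

    // Strategy 1: full-width chunks, then a scalar tail loop.
    static int[] doubleWithScalarTail(int[] a) {
        int[] r = new int[a.length];
        int i = 0;
        for (; i < loopBound(a.length); i += VLEN) {
            for (int lane = 0; lane < VLEN; lane++) {
                r[i + lane] = a[i + lane] * 2;
            }
        }
        for (; i < a.length; i++) { // scalar tail
            r[i] = a[i] * 2;
        }
        return r;
    }

    // Strategy 2: full-width chunks, then one masked chunk for the remainder.
    static int[] doubleWithMaskedTail(int[] a) {
        int[] r = new int[a.length];
        int i = 0;
        for (; i < loopBound(a.length); i += VLEN) {
            for (int lane = 0; lane < VLEN; lane++) {
                r[i + lane] = a[i + lane] * 2;
            }
        }
        if (i < a.length) {
            // Model of indexInRange(i, a.length): a lane is active iff i + lane < length.
            boolean[] mask = new boolean[VLEN];
            for (int lane = 0; lane < VLEN; lane++) {
                mask[lane] = i + lane < a.length;
            }
            for (int lane = 0; lane < VLEN; lane++) {
                if (mask[lane]) {
                    r[i + lane] = a[i + lane] * 2;
                }
            }
        }
        return r;
    }
}
```

Both variants produce identical results for every length; the difference the reviewers are weighing is purely in how the remainder is executed, which is why a benchmark comparing against auto-vectorized loops may want both flavors.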
Changeset: 2f0d03d6 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/2f0d03d64e14fb2fd9d99e1885e584421aacca6b Stats: 19 lines in 2 files changed: 11 ins; 0 del; 8 mod 8360271: String.indexOf intrinsics fail with +EnableX86ECoreOpts and -CompactStrings Reviewed-by: mhaessig, sviswanathan Backport-of: 1d889b92bde5dfcb1fbe6cddb389a77f92eb1ce7 ------------- PR: https://git.openjdk.org/jdk/pull/29263 From epeter at openjdk.org Tue Jan 20 07:09:21 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 Jan 2026 07:09:21 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v5] In-Reply-To: References: <4oX30eH34qM7pVxT3HgnqRcprVQGiMBNcu_Yqq9VDas=.009cf016-10b0-4b3d-8d10-59ba9099c62e@github.com> Message-ID: On Tue, 20 Jan 2026 06:27:07 GMT, Xiaohong Gong wrote: >> Good idea, I could! >> However, it would mean I would have to probably add this version for every benchmark. >> I'm wondering if that is worth it. >> I think I won't add it now, but maybe in a follow-up RFE :) > > Creating a mask for each loop iteration may affect Vector API benchmark results, especially on SVE. If the benchmark is intended to compare Vector API performance against auto-vectorization, using a tail-loop version would be preferable for a fairer comparison. @XiaohongGong Thanks for the comment :) Right, we can do all of those things. We can have lots of variants to demonstrate different effects and trade-offs.
I might do that in a follow-up RFE :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2707034336 From xgong at openjdk.org Tue Jan 20 07:09:23 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 20 Jan 2026 07:09:23 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v10] In-Reply-To: <6MVHXmojhitsx7Q291bDVHOhtWQ9PmFudaGpFj-kf5U=.799e395f-b1e4-4c7a-b593-174d05da72fc@github.com> References: <6MVHXmojhitsx7Q291bDVHOhtWQ9PmFudaGpFj-kf5U=.799e395f-b1e4-4c7a-b593-174d05da72fc@github.com> Message-ID: On Mon, 19 Jan 2026 15:35:18 GMT, Emanuel Peter wrote: >> This is an exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. >> >> Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) >> >> **Discussion** >> >> Observations: >> - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. >> - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. >> - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow >> - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow >> - `linux_x64_oci_server`: Vector API leads to really nice speedups >> - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks >> - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. >> - Compact Object Headers has some negative effect on some loop benchmarks.
>> - `linux_aarch64_server`: `reduceAddI`, `copyI` >> - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` >> - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? >> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit >> - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` >> - Intrinsics can be much faster than auto vectorized or Vector API code. >> - `linux_aarch64_server`: `copyI` >> - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. >> - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact, I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). >> >> **Benchmark Plots** >> >> Units: nanoseconds per algorithm invocation. >> >> `linux_x64_oci` >> algo_linux_x64_oci_server >> >> `windows_x64_oci` >> algo_windows_x64_oci_server >> >> `macosx_x64_sandybridge` >> > Emanuel Peter has updated the pull request incrementally with four additional commits since the last revision: > > - Data refactor part 4 > - Data refactor part 3 > - Data refactor part 2 > - Data refactor part 1 test/micro/org/openjdk/bench/vm/compiler/VectorAlgorithmsImpl.java line 598: > 596: if (mask.anyTrue()) { > 597: var ml = mask.toLong(); > 598: return i + Long.numberOfTrailingZeros(ml); Can we use `mask.firstTrue()` here?
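Behind the `firstTrue()` question is a small identity: `VectorMask.toLong()` packs lane N into bit N, and `VectorMask.firstTrue()` returns the index of the first set lane (or the lane count if none is set), so inside an `anyTrue()` guard `Long.numberOfTrailingZeros(mask.toLong())` and `mask.firstTrue()` should agree. The model below checks that identity on a plain `boolean[]` standing in for the mask; `FirstTrueModel` and its methods are hypothetical helpers, and the sketch says nothing about which form C2 compiles better.

```java
// Model of a VectorMask as a boolean[]; lane N corresponds to bit N.
final class FirstTrueModel {
    // Like VectorMask.toLong(): pack up to 64 lanes, lane 0 in the lowest bit.
    static long toLong(boolean[] lanes) {
        if (lanes.length > 64) {
            throw new IllegalArgumentException("at most 64 lanes");
        }
        long bits = 0L;
        for (int lane = 0; lane < lanes.length; lane++) {
            if (lanes[lane]) {
                bits |= 1L << lane;
            }
        }
        return bits;
    }

    // Like VectorMask.firstTrue(): first set lane, or lanes.length if none.
    static int firstTrue(boolean[] lanes) {
        for (int lane = 0; lane < lanes.length; lane++) {
            if (lanes[lane]) {
                return lane;
            }
        }
        return lanes.length;
    }

    // The form used in the benchmark; meaningful only when some lane is set.
    static int viaTrailingZeros(boolean[] lanes) {
        return Long.numberOfTrailingZeros(toLong(lanes));
    }
}
```

The two forms diverge only when no lane is set (64 versus the lane count), which is exactly the case the `anyTrue()` guard excludes.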
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2706970674 From xgong at openjdk.org Tue Jan 20 07:17:32 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 20 Jan 2026 07:17:32 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v10] In-Reply-To: <6MVHXmojhitsx7Q291bDVHOhtWQ9PmFudaGpFj-kf5U=.799e395f-b1e4-4c7a-b593-174d05da72fc@github.com> References: <6MVHXmojhitsx7Q291bDVHOhtWQ9PmFudaGpFj-kf5U=.799e395f-b1e4-4c7a-b593-174d05da72fc@github.com> Message-ID: On Mon, 19 Jan 2026 15:35:18 GMT, Emanuel Peter wrote: >> This is an exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. >> >> Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) >> >> **Discussion** >> >> Observations: >> - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. >> - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. >> - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow >> - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow >> - `linux_x64_oci_server`: Vector API leads to really nice speedups >> - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks >> - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. >> - Compact Object Headers has some negative effect on some loop benchmarks.
>> - `linux_aarch64_server`: `reduceAddI`, `copyI` >> - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` >> - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? >> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit >> - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` >> - Intrinsics can be much faster than auto vectorized or Vector API code. >> - `linux_aarch64_server`: `copyI` >> - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. >> - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact, I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). >> >> **Benchmark Plots** >> >> Units: nanoseconds per algorithm invocation. >> >> `linux_x64_oci` >> algo_linux_x64_oci_server >> >> `windows_x64_oci` >> algo_windows_x64_oci_server >> >> `macosx_x64_sandybridge` >> > Emanuel Peter has updated the pull request incrementally with four additional commits since the last revision: > > - Data refactor part 4 > - Data refactor part 3 > - Data refactor part 2 > - Data refactor part 1 > If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. linux_aarch64_server: filterI, scanAddI, reduceAddIFieldsX4 are very slow Hi @eme64, I noticed that these benchmarks perform much worse on AArch64. May I ask whether the machine supports the SVE/SVE2 features? Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/28639#issuecomment-3771371623 From xgong at openjdk.org Tue Jan 20 07:17:34 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 20 Jan 2026 07:17:34 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v5] In-Reply-To: References: <4oX30eH34qM7pVxT3HgnqRcprVQGiMBNcu_Yqq9VDas=.009cf016-10b0-4b3d-8d10-59ba9099c62e@github.com> Message-ID: On Tue, 20 Jan 2026 07:05:39 GMT, Emanuel Peter wrote: >> Creating a mask for each loop iteration may affect Vector API benchmark results, especially on SVE. If the benchmark is intended to compare Vector API performance against auto-vectorization, using a tail-loop version would be preferable for a fairer comparison. > > @XiaohongGong Thanks for the comment :) > Right, we can do all of those things. We can have lots of variants to demonstrate different effects and trade-offs. > I might do that in a follow-up RFE :) Yeah, I agree with adding another version of the Vector API benchmark that uses a mask. That's fine with me. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2707056738 From epeter at openjdk.org Tue Jan 20 07:22:00 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 Jan 2026 07:22:00 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v11] In-Reply-To: References: Message-ID: <-avpi_0V0b9jLf29Sah1VB8ZiOWu9MSFl8X3vO22NFQ=.0c006266-c5eb-4077-b238-311e6190486f@github.com> > This is an exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. > > Note: for now they are all `int` based.
And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) > > **Discussion** > > Observations: > - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. > - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. > - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow > - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow > - `linux_x64_oci_server`: Vector API leads to really nice speedups > - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks > - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. > - Compact Object Headers has some negative effect on some loop benchmarks. > - `linux_aarch64_server`: `reduceAddI`, `copyI` > - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` > - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? > - `windows_x64_oci_server`: `reduceAddI` and some others a little bit > - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` > - Intrinsics can be much faster than auto vectorized or Vector API code. > - `linux_aarch64_server`: `copyI` > - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. > - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact, I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). > > **Benchmark Plots** > > Units: nanoseconds per algorithm invocation.
> > `linux_x64_oci` > algo_linux_x64_oci_server > > `windows_x64_oci` > algo_windows_x64_oci_server > > `macosx_x64_sandybridge` > algo_macosx_x64_sandybridge References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <-yc3O4xvk6Wy7-EouV2BoiK5iydinqSFez5WKCeCUdw=.b006cc2b-d490-4c66-a103-fa4459b35ecb@github.com> Message-ID: On Fri, 16 Jan 2026 09:42:57 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopnode.hpp line 1440: >> >>> 1438: // result control flow branches >>> 1439: // either to inner clone or outer >>> 1440: // strip mined loop. >> >> I have trouble understanding the comments here (not your fault, it was here already). >> I'm also wondering if this is only used for `post_loop`? If so, maybe we could rename it, and improve the comments here? > > At least in your code, it would read much better if it was called `InsertPost` > I have trouble understanding the comments here (not your fault, it was here already). I'm also wondering if this is only used for `post_loop`? If so, maybe we could rename it, and improve the comments here? `ControlAroundStripMined` is not used only for `post_loop`; it is also used by `do_peeling()` and `duplicate_loop_backedge()`. For that reason, I'm afraid renaming it to something like `InsertPost` wouldn't be appropriate. Based on my understanding, `ControlAroundStripMined` means that only the inner strip-mined loop is cloned, and a control-flow decision is inserted around the inner clone, allowing the exit control flow to branch either to the inner clone or to the outer strip-mined loop. How about refining the comments as follows: ... ControlAroundStripMined = 2, // Only clone the inner strip-mined loop and insert // control flow around it. Exit control flow // branches either to inner clone or to the outer // strip-mined loop.
InsertVectorizedDrain = 3 // Only clone the inner strip-mined vector loop and // insert control flow that branches either to the // cloned inner loop or to the scalar post loop. ... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2705287124 From shade at openjdk.org Tue Jan 20 11:29:10 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 20 Jan 2026 11:29:10 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code [v11] In-Reply-To: References: Message-ID: > We use CTW testing for making sure compilers behave well. But we compile the code that is not executed at all, and since our inlining heuristics often look back at profiles, we end up not actually inlining all too much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations. > > There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data. > > After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much that such runs are impractical in standard configurations, see data in RFE. We will enable some of that testing in special testing pipelines. > > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in a large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code.
This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allows the stress infra to inline some more on top of that. > > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'master' into JDK-8360557-ctw-inlining - JDK-8375046 fix - JDK-8375694 POC fix - Merge branch 'master' into JDK-8360557-ctw-inlining - Debug - Merge branch 'master' into JDK-8360557-ctw-inlining - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26068/files - new: https://git.openjdk.org/jdk/pull/26068/files/fd0f506e..6513fc52 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=09-10 Stats: 9823 lines in 371 files changed: 5558 ins; 1784 del; 2481 mod Patch: https://git.openjdk.org/jdk/pull/26068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068 PR: https://git.openjdk.org/jdk/pull/26068 From jbhateja at openjdk.org Tue Jan 20 11:56:16 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 20 Jan 2026 11:56:16 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v13] In-Reply-To: References: Message-ID: 
> Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support. > - Extend the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. > > The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reductions, etc.). > > The following are the performance numbers for selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > [image] > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback. > > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28002/files - new: https://git.openjdk.org/jdk/pull/28002/files/273b219e..fe7075ee Webrevs: - full: Webrev is not available because diff is too large - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28002&range=11-12 Stats: 85 lines in 4 files changed: 0 ins; 3 del; 82 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From jbhateja at openjdk.org Tue Jan 20 11:56:19 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 20 Jan 2026 11:56:19 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v12] In-Reply-To: References: Message-ID: On Mon, 19 Jan
2026 15:49:49 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 >> - Adding testpoint for JDK-8373574 >> - Review comments resolutions >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 >> - Fix incorrect argument passed to smokeTest >> - Fix from Bhavana Kilambi for failing JTREG regressions on AARCH64 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 >> - Including test changes from Bhavana Kilambi (ARM) >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 >> - Optimizing tail handling >> - ... and 18 more: https://git.openjdk.org/jdk/compare/499b5882...273b219e > > src/hotspot/share/utilities/globalDefinitions.hpp line 741: > >> 739: inline bool is_custom_basic_type(BasicType t) { >> 740: return (t == T_FLOAT16); >> 741: } > > What exactly is the definition of a "custom" basic type? Is it defined somewhere? > If not, it would be useful to define it here. > > I assume you chose the name because we expect more such "custom" basic types in the future? You are right: the primary basic types are driven by the JVM specification, whereas T_FLOAT16 is a non-standard basic type. > test/jdk/jdk/incubator/vector/IntVectorMaxTests.java line 68: > >> 66: static IntVector bcast_vec = IntVector.broadcast(SPECIES, (int)10); >> 67: >> 68: static void AssertEquals(int actual, int expected) { > > There are lots of changes in this file that do not seem to have anything to do with Float16. Please file them separately. It will make review much easier. I have added an assertion wrapper so that float16 values (short) are converted to float before calling the actual Assert.* routines, so that all possible NaN bit patterns are handled. Since the tests are generated from a common template, these changes show up here as well.
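The NaN issue behind the assertion wrapper can be shown directly: two binary16 values stored as `short` can both be NaN yet have different bit patterns, so comparing the raw `short`s reports a mismatch. The sketch below is hypothetical (`Float16Asserts` and its methods are not from the test suite) and assumes JDK 20 or later for `Float.float16ToFloat`. It widens both values to `float` and compares them with `Float.compare`, which treats all NaNs as equal but, note, also distinguishes `0.0f` from `-0.0f`.

```java
// Hypothetical sketch of a float16-aware assertion; requires JDK 20+.
final class Float16Asserts {
    static boolean sameFloat16Value(short actual, short expected) {
        float a = Float.float16ToFloat(actual);
        float e = Float.float16ToFloat(expected);
        // Float.compare canonicalizes NaNs, so any two NaN payloads compare equal.
        return Float.compare(a, e) == 0;
    }

    static void assertEquals(short actual, short expected) {
        if (!sameFloat16Value(actual, expected)) {
            throw new AssertionError("expected float16 0x"
                    + Integer.toHexString(expected & 0xFFFF)
                    + " but was 0x" + Integer.toHexString(actual & 0xFFFF));
        }
    }
}
```

For example, `0x7E00` and `0x7C01` are both binary16 NaNs; they differ as `short`s but pass `assertEquals`, while `0x3C00` (1.0) and `0x4000` (2.0) still fail it.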
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2708024220 PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2708023788 From bmaillard at openjdk.org Tue Jan 20 12:32:18 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Tue, 20 Jan 2026 12:32:18 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v6] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 13:55:53 GMT, Roland Westrelin wrote: >> I think we should use the following test, which is quite concise and only takes a few seconds to execute thanks to setting `memlimit` to `100M`. >> >> ```java >> /** >> * @test >> * @key stress randomness >> * @bug 8370519 >> * @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations >> * @run main/othervm -XX:CompileCommand=compileonly,${test.main.class}::* -XX:-TieredCompilation -Xbatch >> * -XX:+UnlockDiagnosticVMOptions -XX:+IgnoreUnrecognizedVMOptions >> * -XX:+StressLoopPeeling -XX:+VerifyLoopOptimizations >> * -XX:CompileCommand=memlimit,${test.main.class}::*,100M~crash >> * -XX:StressSeed=3106998670 ${test.main.class} >> * @run main ${test.main.class} >> */ >> >> package compiler.c2; >> >> public class TestVerifyLoopOptimizationsHighMemUsage { >> >> static int b = 400; >> static long c; >> static boolean d; >> >> static long lMeth(int e) { >> int f, g, h, k[] = new int[b]; >> long l[] = new long[b]; >> boolean m[] = new boolean[b]; >> for (f = 5; f < 330; ++f) >> for (g = 1; g < 5; ++g) >> for (h = 2; h > 1; h -= 3) >> switch (f * 5 + 54) { >> case 156: >> case 354: >> case 98: >> case 173: >> case 120: >> case 374: >> case 140: >> case 57: >> case 106: >> case 306: >> case 87: >> case 399: >> k[1] = (int)c; >> case 51: >> case 287: >> case 148: >> case 70: >> case 74: >> case 59: >> m[h] = d; >> } >> long n = p(l); >> return n; >> } >> >> public static long p(long[] a) { >> long o = 0; >> for (int j = 0; j < a.length; j++) >> o += j;
>> return o; >> } >> >> public static void main(String[] args) { >> for (int i = 0; i < 10; i++) >> lMeth(9); >> } >> } > > @benoitmaillard how feasible/time-consuming would it be to find a more robust test case? Do you agree with my concern above? Apologies for the delay @rwestrel, this somehow slipped under the radar after Christmas. I agree 100% with your concerns. I think it's worth a try, so I will launch another run of creduce and come back to you with the results (this should go a bit faster this time). ------------- PR Comment: https://git.openjdk.org/jdk/pull/28581#issuecomment-3772652281 From chagedorn at openjdk.org Tue Jan 20 12:40:26 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 20 Jan 2026 12:40:26 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v6] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 15:32:07 GMT, Roland Westrelin wrote: >> The base input of `AddP` is expected to only be set for heap accesses >> but I noticed some inconsistencies so I added an assert in the `AddP` >> constructor and fixed issues that it caught. AFAICT, the >> inconsistencies shouldn't create issues. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - Merge branch 'master' into JDK-8373343 > - Update src/hotspot/share/opto/macroArrayCopy.cpp > > Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> > - more > - more > - review > - Merge branch 'master' into JDK-8373343 > - review > - review > - review > - merge > - ... and 5 more: https://git.openjdk.org/jdk/compare/40060a1b...2f618436 Marked as reviewed by chagedorn (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/28769#pullrequestreview-3681978641 From roland at openjdk.org Tue Jan 20 12:46:29 2026 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 20 Jan 2026 12:46:29 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v3] In-Reply-To: <-GY9oBe-WRSh16Yi90rp79Xxi784nRtvdqBlMh4TiMs=.ba972df0-9298-4f94-8699-597601bd39ba@github.com> References: <-GY9oBe-WRSh16Yi90rp79Xxi784nRtvdqBlMh4TiMs=.ba972df0-9298-4f94-8699-597601bd39ba@github.com> Message-ID: <-QGYQkvP71rFI0oQB-6GS7kyfmOTBp882G16GXXRuWs=.b608c29b-ee05-4a1f-9143-99df5a31238f@github.com> On Sat, 20 Dec 2025 03:31:13 GMT, Dean Long wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Tests passed. @dean-long @chhagedorn @merykitty thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/28769#issuecomment-3772706108 From epeter at openjdk.org Tue Jan 20 13:52:53 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 Jan 2026 13:52:53 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Tue, 20 Jan 2026 11:12:35 GMT, Fei Gao wrote: >> src/hotspot/share/opto/loopTransform.cpp line 1841: >> >>> 1839: } >>> 1840: } >>> 1841: >> >> A few questions: >> - Why not cache the `skip_assertion_predicates_with_halt` values? Do the values change over time? If there are lots of predicates, you will do a traversal over and over again. >> - Why do we need this special logic for the `drain` loop cloning? What makes it different to other cloning cases? >> - The new ctrl you set is either at the post-head entry, or after skipping the predicates. Why did you choose those? > > Thanks for the questions. Great points.
> >> Why not cache the `skip_assertion_predicates_with_halt` values? Do the values change over time? If there are lots of predicates, you will do a traversal over and over again. > > You're right. There's no strong reason not to cache the result of `skip_assertion_predicates_with_halt()` here. We can do that to avoid repeated traversals when there are many predicates. I'll refine the code accordingly. > >> Why do we need this special logic for the drain loop cloning? What makes it different to other cloning cases? > > Drain-loop cloning is significantly more complex than other cloning cases. > > When cloning a `post loop`, the loop structure is relatively simple and does not involve a `pre-loop` or a `minimum-trip guard`. The control-flow shape typically looks like: > > ... > / \ > ... multiple predicates > / \ > EntryControl > / \ > main loop head > > In this case, some nodes in the main loop may take any control node between the predicate `IfTrue` nodes and the loop `EntryControl` as their control input, meaning they can float along that control chain. When such nodes are cloned, their control inputs need to be fixed to the corresponding nodes in the cloned loop. > > This is already handled by `initialize_assertion_predicates_for_post_loop()`, which correctly fixes control inputs for `TemplateAssertionPredicates`. See https://github.com/openjdk/jdk/blob/c5f288e2ae2ebe6ee4a0d39d91348f746bd0e353/src/hotspot/share/opto/predicates.cpp#L157 > > However, when inserting a `drain loop`, the control-flow structure is more involved: > > main zero-trip guard > / \ > IfFalse IfTrue > / \ > ... multiple predicates > / \ > EntryControl > / \ > main loop head > > > Here, the main loop is preceded by both a `pre-loop` and a `minimum-trip guard`. As a result, control inputs of nodes inside the main loop may originate anywhere from the `IfTrue` of the `minimum-trip guard` down to the loop `EntryControl`.
These cases are outside the scope of `initialize_assertion_predicates_for_post_loop()`, which only handles `TemplateAssertionPredicates`. > > In practice, I've also observed nodes attached to `InitializedAssertionPredicates`, which are not covered by that helper either. > > This is why we need separate, more ... Thanks for the explanations! They sound reasonable to me. Eventually, though, it would be good if @chhagedorn or @rwestrel looked at this, since they are more familiar with this code. One more question here: could it be that one node that you now conservatively pin further down actually already has a use in a predicate further up, and now we'd create a `bad graph` cycle? > The worklist should be empty before we reach this point. Then we should add an assert, both for correctness and for the benefit of the reader :) >> src/hotspot/share/opto/loopopts.cpp line 2485: >> >>> 2483: } >>> 2484: >>> 2485: Node_List visit_list; >> >> Suggestion: >> >> ResourceMark rm; >> Node_List visit_list; >> >> Can we do this, or do we run into issues? > > No, we can't. I think we discussed this before. Since `old_new` can grow within this scope, we can't use `ResourceMark` here. Oh dear, we probably discussed this before. Ok, that's a shame, but fine. If it causes issues down the road, we can decide what to do about it then.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2708426565 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2708439901 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2708436382 From epeter at openjdk.org Tue Jan 20 13:52:55 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 Jan 2026 13:52:55 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <-yc3O4xvk6Wy7-EouV2BoiK5iydinqSFez5WKCeCUdw=.b006cc2b-d490-4c66-a103-fa4459b35ecb@github.com> Message-ID: <2PGhdGpByhnFkLyZCdgARpLPqqkTnXVSNkTCeM0GZDQ=.818a6be0-e16e-4a98-93b8-2316b10dc703@github.com> On Mon, 19 Jan 2026 15:48:48 GMT, Fei Gao wrote: >> At least in your code, it would read much better if it was called `InsertPost` > >> I have trouble understanding the comments here (not your fault, it was here already). I'm also wondering if this is only used for `post_loop`? If so, maybe we could rename it, and improve the comments here? > > `ControlAroundStripMined` is not used only for `post_loop`; it is also used by `do_peeling()` and `duplicate_loop_backedge()`. For that reason, I'm afraid renaming it to something like `InsertPost` wouldn't be appropriate. > > Based on my understanding, `ControlAroundStripMined` means that only the inner strip-mined loop is cloned, and a control-flow decision is inserted around the inner clone, allowing the exit control flow either to the inner clone or to the outer strip-mined loop. > > How about refining the comments as follows: > > ... > ControlAroundStripMined = 2, // Only clone the inner strip-mined loop and insert > // control flow around it. Exit control flow > // branches either to inner clone or to the outer > // strip-mined loop.
> InsertVectorizedDrain = 3 // Only clone the inner strip-mined vector loop and > // insert control flow that branches either to the > // cloned inner loop or to the scalar post loop. > ... Ok, that sounds good :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2708430638 From epeter at openjdk.org Tue Jan 20 14:05:18 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 Jan 2026 14:05:18 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <_gdObRdkYkS7d3fQQ6bcms709TpeM1IQtuPJtI0fcyE=.073d0496-8dfc-48a9-aee8-b64b408a6e62@github.com> Message-ID: On Tue, 13 Jan 2026 15:10:29 GMT, Fei Gao wrote: >> @fg1417 I hope you had a good start into the new year. I'd love to make this PR a bit of a priority in the next weeks. Would you mind synching with master and fixing merge conflicts? >> >> I'd review, run testing and look into running some benchmarks myself. > > Hi @eme64 the PR is ready for review and testing. Thanks! @fg1417 Thanks for your responses! I realized that at this pace it will take me a while longer to dig through everything. I still have to pick up at `fix_data_uses_for_vectorized_drain`. But for now, I'll run some testing and benchmarking, and continue the review in parallel as I have time. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3773031454 From ghan at openjdk.org Tue Jan 20 14:56:54 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Tue, 20 Jan 2026 14:56:54 GMT Subject: RFR: 8375598: VM crashes with "assert((labs(val) & 0xFFFFFFFF00000000) == 0 || dest == (address)-1) failed: must be 32bit offset or -1" when using too high value for NonNMethodCodeHeapSize Message-ID: Please review this change. Thanks! **Description:** On x86/x64, near calls/jumps use 32-bit signed PC-relative displacements.
With SegmentedCodeCache enabled, a very large NonNMethodCodeHeapSize can inflate the derived ReservedCodeCacheSize, causing the code cache span to exceed the reach of 32-bit relative branches. This may later lead to relocation failures (e.g. "must be 32bit offset") when installing nmethods. https://github.com/openjdk/jdk/blob/037040129e82958bd023e0b24d962627e8653710/src/hotspot/cpu/x86/nativeInst_x86.hpp#L433-L440 **Fix:** Add an x86-specific validation in CodeCache::initialize_heaps() after final segment alignment. If the computed code cache size exceeds max_jint bytes, abort VM initialization with a clear error message that includes the segment sizes, instead of failing later during compilation/relocation. **Test:** GHA ------------- Commit messages: - fix 8375598 Changes: https://git.openjdk.org/jdk/pull/29324/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29324&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375598 Stats: 23 lines in 2 files changed: 20 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/29324.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29324/head:pull/29324 PR: https://git.openjdk.org/jdk/pull/29324 From aph at openjdk.org Tue Jan 20 15:01:59 2026 From: aph at openjdk.org (Andrew Haley) Date: Tue, 20 Jan 2026 15:01:59 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v2] In-Reply-To: <3HfqoF6XNWDXq8P95PQ78B1_QquFMPDTkcuXPbmybNs=.cc8fd652-9949-4a0d-bf18-76cad5aac332@github.com> References: <3HfqoF6XNWDXq8P95PQ78B1_QquFMPDTkcuXPbmybNs=.cc8fd652-9949-4a0d-bf18-76cad5aac332@github.com> Message-ID: On Tue, 20 Jan 2026 10:01:31 GMT, Eric Fang wrote: >> This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance. >> >> Changes: >> -------- >> >> 1. C2 mid-end: >> - Added UMinReductionVNode and UMaxReductionVNode >> >> 2. 
AArch64 Backend: >> - Added uminp/umaxp/sve_uminv/sve_umaxv instructions >> - Updated match rules for all vector sizes and element types >> - Both NEON and SVE implementation are supported >> >> 3. Test: >> - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java >> - Added assembly tests in aarch64-asmtest.py for new instructions >> - Added a JTReg test file VectorUMinMaxReductionTest.java >> >> Different configurations were tested on aarch64 and x86 machines, and all tests passed. >> >> Test results of JMH benchmarks from the panama-vector project: >> -------- >> >> On a Nvidia Grace machine with 128-bit SVE: >> >> Benchmark Unit Before Error After Error Uplift >> Byte128Vector.UMAXLanes ops/ms 411.60 42.18 25226.51 33.92 61.29 >> Byte128Vector.UMAXMaskedLanes ops/ms 558.56 85.12 25182.90 28.74 45.09 >> Byte128Vector.UMINLanes ops/ms 645.58 780.76 28396.29 103.11 43.99 >> Byte128Vector.UMINMaskedLanes ops/ms 621.09 718.27 26122.62 42.68 42.06 >> Byte64Vector.UMAXLanes ops/ms 296.33 34.44 14357.74 15.95 48.45 >> Byte64Vector.UMAXMaskedLanes ops/ms 376.54 44.01 14269.24 21.41 37.90 >> Byte64Vector.UMINLanes ops/ms 373.45 426.51 15425.36 66.20 41.31 >> Byte64Vector.UMINMaskedLanes ops/ms 353.32 346.87 14201.37 13.79 40.19 >> Int128Vector.UMAXLanes ops/ms 174.79 192.51 9906.07 286.93 56.67 >> Int128Vector.UMAXMaskedLanes ops/ms 157.23 206.68 10246.77 11.44 65.17 >> Int64Vector.UMAXLanes ops/ms 95.30 126.49 4719.30 98.57 49.52 >> Int64Vector.UMAXMaskedLanes ops/ms 88.19 87.44 4693.18 19.76 53.22 >> Long128Vector.UMAXLanes ops/ms 80.62 97.82 5064.01 35.52 62.82 >> Long128Vector.UMAXMaskedLanes ops/ms 78.15 102.91 5028.24 8.74 64.34 >> Long64Vector.UMAXLanes ops/ms 47.56 62.01 46.76 52.28 0.98 >> Long64Vector.UMAXMaskedLanes ops/ms 45.44 46.76 45.79 42.91 1.01 >> Short128Vector.UMAXLanes ops/ms 316.65 410.30 14814.82 23.65 46.79 >> ... > > Eric Fang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: > > - Rebase commit 56d7b52 > - Merge branch 'master' into JDK-8372980-umin-umax-intrinsic > - 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations > > This patch adds intrinsic support for UMIN and UMAX reduction operations > in the Vector API on AArch64, enabling direct hardware instruction mapping > for better performance. > > Changes: > -------- > > 1. C2 mid-end: > - Added UMinReductionVNode and UMaxReductionVNode > > 2. AArch64 Backend: > - Added uminp/umaxp/sve_uminv/sve_umaxv instructions > - Updated match rules for all vector sizes and element types > - Both NEON and SVE implementation are supported > > 3. Test: > - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java > - Added assembly tests in aarch64-asmtest.py for new instructions > - Added a JTReg test file VectorUMinMaxReductionTest.java > > Different configurations were tested on aarch64 and x86 machines, and > all tests passed. > > Test results of JMH benchmarks from the panama-vector project: > -------- > > On a Nvidia Grace machine with 128-bit SVE: > ``` > Benchmark Unit Before Error After Error Uplift > Byte128Vector.UMAXLanes ops/ms 411.60 42.18 25226.51 33.92 61.29 > Byte128Vector.UMAXMaskedLanes ops/ms 558.56 85.12 25182.90 28.74 45.09 > Byte128Vector.UMINLanes ops/ms 645.58 780.76 28396.29 103.11 43.99 > Byte128Vector.UMINMaskedLanes ops/ms 621.09 718.27 26122.62 42.68 42.06 > Byte64Vector.UMAXLanes ops/ms 296.33 34.44 14357.74 15.95 48.45 > Byte64Vector.UMAXMaskedLanes ops/ms 376.54 44.01 14269.24 21.41 37.90 > Byte64Vector.UMINLanes ops/ms 373.45 426.51 15425.36 66.20 41.31 > Byte64Vector.UMINMaskedLanes ops/ms 353.32 346.87 14201.37 13.79 40.19 > Int128Vector.UMAXLanes ops/ms 174.79 192.51 9906.07 286.93 56.67 > Int128Vector.UMAXMaskedLanes ops/ms 157.23 206.68 10246.77 11.44 65.17 > Int64Vector.UMAXLanes ops/ms 95.30 126.49 4719.30 98.57 49.52 > Int64Vector.UMAXMaskedLanes ops/ms 88.19 87.44 
4693.18 19.76 53.22 > Long128Vector.UMAXLanes ops/ms 80.62 97.82 5064.01 35.52 62.82 > Long128Vector.UMAXMaskedLanes ops/ms 78.15 102.91 5028.24 8.74 64.34 > Long64Vector.UMAXLanes ops/ms 47.56 62.01 46.76 52.28 0.98 > Long64Vector.UMAXMaskedLanes ops/ms 45.44 46.76 45.79 42.91 1.01 > Short128Vector.UMAXLanes ops/ms 316.65 4... I found the selection logic getting more complex and harder to follow. If you refactor things a little by making signed/unsigned a parameter to the assembly instruction, you can do this. While this approach makes the assembler a bit more fiddly, `neon_reduce_minmax_integral()` is better. See what you think. diff --git a/src/hotspot/cpu/aarch64/assembler_aarch64.hpp b/src/hotspot/cpu/aarch64/assembler_aarch64.hpp index fc6e58b801c..35f29db1675 100644 --- a/src/hotspot/cpu/aarch64/assembler_aarch64.hpp +++ b/src/hotspot/cpu/aarch64/assembler_aarch64.hpp @@ -2625,17 +2625,27 @@ template #undef INSN // Advanced SIMD three different -#define INSN(NAME, opc, opc2, acceptT2D) \ - void NAME(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, FloatRegister Vm) { \ + +#define neon3different(U, opc, acceptT2D, Vd, T, Vn, Vm) \ + { \ guarantee(T != T1Q && T != T1D, "incorrect arrangement"); \ if (!acceptT2D) guarantee(T != T2D, "incorrect arrangement"); \ - if (opc2 == 0b101101) guarantee(T != T8B && T != T16B, "incorrect arrangement"); \ + if (opc == 0b101101) guarantee(T != T8B && T != T16B, "incorrect arrangement"); \ starti; \ - f(0, 31), f((int)T & 1, 30), f(opc, 29), f(0b01110, 28, 24); \ - f((int)T >> 1, 23, 22), f(1, 21), rf(Vm, 16), f(opc2, 15, 10); \ + f(0, 31), f((int)T & 1, 30), f(U, 29), f(0b01110, 28, 24); \ + f((int)T >> 1, 23, 22), f(1, 21), rf(Vm, 16), f(opc, 15, 10); \ rf(Vn, 5), rf(Vd, 0); \ } +#define INSN(NAME, U, opc, acceptT2D) \ + void NAME(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, FloatRegister Vm) { \ + neon3different(U, opc, acceptT2D, Vd, T, Vn, Vm); \ + } +#define INSN2(NAME, opc, acceptT2D) \ +void 
NAME(int U, FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, FloatRegister Vm) { \ + neon3different(U, opc, acceptT2D, Vd, T, Vn, Vm); \ + } + INSN(addv, 0, 0b100001, true); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S, T2D INSN(subv, 1, 0b100001, true); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S, T2D INSN(sqaddv, 0, 0b000011, true); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S, T2D @@ -2663,21 +2673,33 @@ template INSN(sqdmulh,0, 0b101101, false); // accepted arrangements: T4H, T8H, T2S, T4S INSN(shsubv, 0, 0b001001, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S + INSN2(maxp, 0b101001, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S + INSN2(minp, 0b101011, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S + #undef INSN +#undef INSN2 // Advanced SIMD across lanes -#define INSN(NAME, opc, opc2, accepted) \ - void NAME(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn) { \ +#define neonAcrossLanes(U, opc, accepted, Vd, T, Vn) \ + { \ guarantee(T != T1Q && T != T1D, "incorrect arrangement"); \ if (accepted < 3) guarantee(T != T2D, "incorrect arrangement"); \ if (accepted < 2) guarantee(T != T2S, "incorrect arrangement"); \ if (accepted < 1) guarantee(T == T8B || T == T16B, "incorrect arrangement"); \ starti; \ - f(0, 31), f((int)T & 1, 30), f(opc, 29), f(0b01110, 28, 24); \ - f((int)T >> 1, 23, 22), f(opc2, 21, 10); \ + f(0, 31), f((int)T & 1, 30), f(U, 29), f(0b01110, 28, 24); \ + f((int)T >> 1, 23, 22), f(opc, 21, 10); \ rf(Vn, 5), rf(Vd, 0); \ } +#define INSN(NAME, U, opc, accepted) \ + void NAME(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn) { \ + neonAcrossLanes(U, opc, accepted, Vd, T, Vn); \ + } +#define INSN2(NAME, opc, accepted) \ + void NAME(int U, FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn) { \ + neonAcrossLanes(U, opc, accepted, Vd, T, Vn); \ + } INSN(absr, 0, 0b100000101110, 3); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S, T2D 
INSN(negr, 1, 0b100000101110, 3); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S, T2D INSN(notr, 1, 0b100000010110, 0); // accepted arrangements: T8B, T16B @@ -2692,7 +2714,11 @@ template INSN(uaddlp, 1, 0b100000001010, 2); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S INSN(uaddlv, 1, 0b110000001110, 1); // accepted arrangements: T8B, T16B, T4H, T8H, T4S + INSN2(maxv, 0b110000101010, 1); // accepted arrangements: T8B, T16B, T4H, T8H, T4S + INSN2(minv, 0b110001101010, 1); // accepted arrangements: T8B, T16B, T4H, T8H, T4S + #undef INSN +#undef INSN2 #define INSN(NAME, opc) \ void NAME(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn) { \ diff --git a/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp index 3431b4f700a..b6e5e04e1e8 100644 --- a/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp +++ b/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp @@ -1973,10 +1973,22 @@ void C2_MacroAssembler::neon_reduce_minmax_integral(int opc, Register dst, Basic assert(bt == T_BYTE || bt == T_SHORT || bt == T_INT || bt == T_LONG, "unsupported"); assert_different_registers(dst, isrc); bool isQ = vector_length_in_bytes == 16; - bool is_min = (opc == Op_MinReductionV || opc == Op_UMinReductionV); - bool is_unsigned = (opc == Op_UMinReductionV || opc == Op_UMaxReductionV); - Assembler::Condition cond = is_min ? (is_unsigned ? Assembler::LO : Assembler::LT) - : (is_unsigned ? 
Assembler::HI : Assembler::GT); + + bool is_min; + int is_unsigned; + Condition cond; + switch(opc) { + case Op_MinReductionV: + is_min = true; is_unsigned = false; cond = LT; break; + case Op_MaxReductionV: + is_min = false; is_unsigned = false; cond = GT; break; + case Op_UMinReductionV: + is_min = true; is_unsigned = true; cond = LO; break; + case Op_UMaxReductionV: + is_min = false; is_unsigned = true; cond = HI; break; + default: + ShouldNotReachHere(); + } BLOCK_COMMENT("neon_reduce_minmax_integral {"); if (bt == T_LONG) { @@ -1993,18 +2005,12 @@ void C2_MacroAssembler::neon_reduce_minmax_integral(int opc, Register dst, Basic if (size == T2S) { // For T2S (2x32-bit elements), use pairwise instructions because // uminv/umaxv/sminv/smaxv don't support arrangement 2S. - if (is_unsigned) { - is_min ? uminp(vtmp, size, vsrc, vsrc) : umaxp(vtmp, size, vsrc, vsrc); - } else { - is_min ? sminp(vtmp, size, vsrc, vsrc) : smaxp(vtmp, size, vsrc, vsrc); - } + is_min ? minp(is_unsigned, vtmp, size, vsrc, vsrc) + : maxp(is_unsigned, vtmp, size, vsrc, vsrc); } else { // For other sizes, use reduction to scalar instructions. - if (is_unsigned) { - is_min ? uminv(vtmp, size, vsrc) : umaxv(vtmp, size, vsrc); - } else { - is_min ? sminv(vtmp, size, vsrc) : smaxv(vtmp, size, vsrc); - } + is_min ? 
minv(is_unsigned, vtmp, size, vsrc) + : maxv(is_unsigned, vtmp, size, vsrc); } if (bt == T_INT) { umov(dst, vtmp, S, 0); ------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3773303625 From mchevalier at openjdk.org Tue Jan 20 15:54:53 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 20 Jan 2026 15:54:53 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order [v2] In-Reply-To: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> Message-ID: > As it says: randomize the order in which late inlines are processed, and slightly factor the code with macro nodes. > > I didn't add shuffle to the `GrowableArray` class since it seems like a rather specialized method (and it seems it'd be a controversial change to a widely used class), it would make `GrowableArray` depend on `Compile` for random number generation (or require a callback, for instance, giving a non-trivial signature and usage), and I couldn't find any other shuffling of such an object. > There is also shuffling of `UniqueNodeList` (for `StressIGVN`), but it seems hard to unify: access and length are not written quite the same, so it would probably not be simpler than duplicating the implementation (which is simple, so it's fine in my opinion).
> > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Randomize insertion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29110/files - new: https://git.openjdk.org/jdk/pull/29110/files/38569db4..f45e368e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29110&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29110&range=00-01 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29110.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29110/head:pull/29110 PR: https://git.openjdk.org/jdk/pull/29110 From adinn at openjdk.org Tue Jan 20 16:39:43 2026 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 20 Jan 2026 16:39:43 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression can be simplified on aarch64 [v3] In-Reply-To: References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: On Mon, 19 Jan 2026 14:01:56 GMT, Ferenc Rakoczi wrote: >> The preconditions for the aarch64 and the AVX-512 intrinsic implementations of the implKyber12To16() method of com.sun.crypto.provider.ML_KEM are different and the AVX-512 one has stricter preconditions on the input, which was not recorded in the assert() before calling the function (although they were satisfied by all calling code). Now the assert() is corrected, and with these preconditions, the aarch64 implementation is simplified. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > improve comment This looks good now, thank you. I'm a little unhappy that the initial test did not detect the reads and writes that overflowed the end of, respectively, the input and output arrays. That may indeed be fixed now but it would have been nicer if the test had been able to catch the error.
However, I understand that it is hard to achieve that when driving the VM from Java. So, let's hope we don't need any more changes or, if we do, we do our best to ensure (by eyeball) that we don't overshoot the end of the arrays. ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29141#pullrequestreview-3683152734 From fgao at openjdk.org Tue Jan 20 17:08:26 2026 From: fgao at openjdk.org (Fei Gao) Date: Tue, 20 Jan 2026 17:08:26 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: <8huN5sDf2y95Hq2iuaMXN7aLeSik_gUnHcSpcc82Exw=.38fd6510-3fcd-4a28-a1c3-29eb18f51724@github.com> On Tue, 20 Jan 2026 13:45:44 GMT, Emanuel Peter wrote: > One more question here: could it be that one node that you now conservatively pin further down actually already has a use in a predicate further up, and now we'd create a `bad graph` cycle? If a node has a `use` that is attached to a predicate further up, then that `use` would also be pinned down to the loop `entry control`. Since we also fix the control of the `use`, which is itself a cloned node, I would expect that we wouldn't end up creating a bad control-flow cycle. Does that make sense?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2709269918 From dlong at openjdk.org Tue Jan 20 17:25:31 2026 From: dlong at openjdk.org (Dean Long) Date: Tue, 20 Jan 2026 17:25:31 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v4] In-Reply-To: <0ACkKqpv7QYOl_JgsGfTRxefXJiwKYjS5QZIXhYMEv8=.f4f9a28c-a8f8-4ee8-8694-9bacdf192c3e@github.com> References: <0ACkKqpv7QYOl_JgsGfTRxefXJiwKYjS5QZIXhYMEv8=.f4f9a28c-a8f8-4ee8-8694-9bacdf192c3e@github.com> Message-ID: On Fri, 16 Jan 2026 11:57:48 GMT, Emanuel Peter wrote: >> I found this bug with the Template Framework (Vector API Library extension): https://github.com/openjdk/jdk/pull/28873 >> >> In `VectorCastNode::opcode`, we have an assert that we **cannot** have an `unsigned cast from float`, it would be nonsense. >> https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/hotspot/share/opto/vectornode.cpp#L1490-L1491 >> >> When we intrinsify `VectorSupport.convert`, we get a constant from the VectorAPI, that determines if we have a signed or unsigned cast, and other constants that determine the `from` and `to` types. >> >> At runtime, the VectorAPI implementation can (I assume) never take a path of `unsigned cast from float`. >> But it seems that nothing prevents the VM from compiling such an (unreachable) path. >> >> Here is how I think it happens: >> - `AbstractVector::castShape` creates a `C` conversion (lane-wise conversion). So at runtime, we will take the `C` switch-case in `AbstractVector::convert0`. >> - Profiling can also make the `Z` (lane-wise reinterpret) switch-case live, because of other cast and reinterpret calls that use that code path. And we may not (yet) have proven that the `Z` switch-case is never taken. >> - So we end up compiling the `Z` path, and call `VectorSupport.convert` with types `float` and `long`. 
And since the `Z` path sees that the from size is smaller than the to size, we get a `UCAST` (zero extension). Hence, we try to intrinsify a vector unsigned cast from float, and hit the assert. >> >> https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java#L742-L743 >> >> That the `Z` path is unreachable for `castShape` seems to be an invariant that only the VectorAPI knows about, and the VM cannot directly know that and determine that it is dead code. Thus, **I propose that we just check for the condition when trying to intrinsify**, and refuse intrinsification if it is violated. >> >> **Update: ** instead of not intrinsifying, we chose a stronger path: we intrinsify it with a `HaltNode` that should never be encountered at runtime. The reproducer I have eventually even is able to fold away the `HaltNode`, so it turns out it is indeed a dead path. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > improve comments for merykitty Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29169#pullrequestreview-3683374277 From epeter at openjdk.org Tue Jan 20 18:33:44 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 Jan 2026 18:33:44 GMT Subject: RFR: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float [v2] In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 11:11:52 GMT, Dean Long wrote: >> @dean-long @merykitty @rose00 I did the refactor. We could now consider doing a separate refactor for the non-parsing use-cases of `HaltNode`, but that's out of scope. >> >> Of course there is a small risk that I messed up something here, so please review carefully - we don't want to accidentally mess up `HaltNode`s and lose their asserting powers!
> >> Of course there is a small risk that I messed up something here, so please review carefully - we don't want to accidentally mess up HaltNodes and lose their asserting powers! > > Looks good to me, but I suspect we don't have great test coverage for HaltNodes, since they are never supposed to get executed. @dean-long @merykitty Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29169#issuecomment-3774349716 From epeter at openjdk.org Tue Jan 20 18:33:46 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 Jan 2026 18:33:46 GMT Subject: Integrated: 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 13:34:28 GMT, Emanuel Peter wrote: > I found this bug with the Template Framework (Vector API Library extension): https://github.com/openjdk/jdk/pull/28873 > > In `VectorCastNode::opcode`, we have an assert that we **cannot** have an `unsigned cast from float`, it would be nonsense. > https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/hotspot/share/opto/vectornode.cpp#L1490-L1491 > > When we intrinsify `VectorSupport.convert`, we get a constant from the VectorAPI, that determines if we have a signed or unsigned cast, and other constants that determine the `from` and `to` types. > > At runtime, the VectorAPI implementation can (I assume) never take a path of `unsigned cast from float`. > But it seems that nothing prevents the VM from compiling such an (unreachable) path. > > Here is how I think it happens: > - `AbstractVector::castShape` creates a `C` conversion (lane-wise conversion). So at runtime, we will take the `C` switch-case in `AbstractVector::convert0`. > - Profiling can also make the `Z` (lane-wise reinterpret) switch-case live, because of other cast and reinterpret calls that use that code path. And we may not (yet) have proven that the `Z` switch-case is never taken.
> - So we end up compiling the `Z` path, and call `VectorSupport.convert` with types `float` and `long`. And since the `Z` path sees that the from size is smaller than the to size, we get a `UCAST` (zero extension). Hence, we try to intrinsify a vector unsigned cast from float, and hit the assert. > > https://github.com/openjdk/jdk/blob/2fbe47559e9ba45306bd08c3636647f865a75abd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java#L742-L743 > > That the `Z` path is unreachable for `castShape` seems to be an invariant that only the VectorAPI knows about, and the VM cannot directly know that and determine that it is dead code. Thus, **I propose that we just check for the condition when trying to intrinsify**, and refuse intrinsification if it is violated. > > **Update: ** instead of not intrinsifying, we chose a stronger path: we intrinsify it with a `HaltNode` that should never be encountered at runtime. The reproducer I have eventually even is able to fold away the `HaltNode`, so it turns out it is indeed a dead path. This pull request has now been integrated. 
Changeset: 42439eb6 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/42439eb60c4488711f182d0d6ee5165b4972b99d Stats: 129 lines in 5 files changed: 119 ins; 2 del; 8 mod 8374889: C2 VectorAPI: must handle impossible combination of signed cast from float Reviewed-by: dlong, qamai ------------- PR: https://git.openjdk.org/jdk/pull/29169 From aph at openjdk.org Tue Jan 20 19:28:04 2026 From: aph at openjdk.org (Andrew Haley) Date: Tue, 20 Jan 2026 19:28:04 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v2] In-Reply-To: <3HfqoF6XNWDXq8P95PQ78B1_QquFMPDTkcuXPbmybNs=.cc8fd652-9949-4a0d-bf18-76cad5aac332@github.com> References: <3HfqoF6XNWDXq8P95PQ78B1_QquFMPDTkcuXPbmybNs=.cc8fd652-9949-4a0d-bf18-76cad5aac332@github.com> Message-ID: On Tue, 20 Jan 2026 10:01:31 GMT, Eric Fang wrote: >> This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance. >> >> Changes: >> -------- >> >> 1. C2 mid-end: >> - Added UMinReductionVNode and UMaxReductionVNode >> >> 2. AArch64 Backend: >> - Added uminp/umaxp/sve_uminv/sve_umaxv instructions >> - Updated match rules for all vector sizes and element types >> - Both NEON and SVE implementation are supported >> >> 3. Test: >> - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java >> - Added assembly tests in aarch64-asmtest.py for new instructions >> - Added a JTReg test file VectorUMinMaxReductionTest.java >> >> Different configurations were tested on aarch64 and x86 machines, and all tests passed. 
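For reference, the UMIN/UMAX reductions discussed in this thread differ from the signed MIN/MAX reductions only in how lanes are compared: each lane is treated as an unsigned value, so a byte lane holding `-1` counts as 255. The sketch below is plain scalar Java written to illustrate those semantics. It assumes byte and int lanes, and its helpers (`umaxReduce`, `uminReduce`) are hypothetical: they are not code from this PR, from the Vector API, or from the AArch64 backend.

```java
// Scalar reference sketch of unsigned max/min lane reductions.
// Hypothetical helpers for illustration only; no Vector API calls are used.
public class UnsignedReductionSketch {

    // Unsigned max over byte lanes: widen each lane with Byte.toUnsignedInt,
    // so (byte) -1 is treated as 255, the largest unsigned byte value.
    static byte umaxReduce(byte[] a) {
        int acc = 0;                              // identity for unsigned max
        for (byte b : a) {
            acc = Math.max(acc, Byte.toUnsignedInt(b));
        }
        return (byte) acc;
    }

    // Unsigned min over byte lanes.
    static byte uminReduce(byte[] a) {
        int acc = 0xFF;                           // identity for unsigned min
        for (byte b : a) {
            acc = Math.min(acc, Byte.toUnsignedInt(b));
        }
        return (byte) acc;
    }

    // Unsigned max over int lanes, using Integer.compareUnsigned.
    static int umaxReduce(int[] a) {
        int acc = 0;                              // 0 is the smallest unsigned int
        for (int v : a) {
            if (Integer.compareUnsigned(v, acc) > 0) {
                acc = v;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        byte[] bytes = {1, -1, 100, -128};        // unsigned: 1, 255, 100, 128
        System.out.println(umaxReduce(bytes));    // -1, i.e. 0xFF
        System.out.println(uminReduce(bytes));    // 1
        int[] ints = {7, -2, 42};
        System.out.println(Integer.toUnsignedString(umaxReduce(ints))); // 4294967294
    }
}
```

A signed max reduction over the same byte array would return 100. Here the unsigned max is 0xFF, which is why these reductions need their own nodes (`UMinReductionVNode`/`UMaxReductionVNode`) and their own unsigned instructions rather than reusing the signed ones.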
>> >> Test results of JMH benchmarks from the panama-vector project: >> -------- >> >> On a Nvidia Grace machine with 128-bit SVE: >> >> Benchmark Unit Before Error After Error Uplift >> Byte128Vector.UMAXLanes ops/ms 411.60 42.18 25226.51 33.92 61.29 >> Byte128Vector.UMAXMaskedLanes ops/ms 558.56 85.12 25182.90 28.74 45.09 >> Byte128Vector.UMINLanes ops/ms 645.58 780.76 28396.29 103.11 43.99 >> Byte128Vector.UMINMaskedLanes ops/ms 621.09 718.27 26122.62 42.68 42.06 >> Byte64Vector.UMAXLanes ops/ms 296.33 34.44 14357.74 15.95 48.45 >> Byte64Vector.UMAXMaskedLanes ops/ms 376.54 44.01 14269.24 21.41 37.90 >> Byte64Vector.UMINLanes ops/ms 373.45 426.51 15425.36 66.20 41.31 >> Byte64Vector.UMINMaskedLanes ops/ms 353.32 346.87 14201.37 13.79 40.19 >> Int128Vector.UMAXLanes ops/ms 174.79 192.51 9906.07 286.93 56.67 >> Int128Vector.UMAXMaskedLanes ops/ms 157.23 206.68 10246.77 11.44 65.17 >> Int64Vector.UMAXLanes ops/ms 95.30 126.49 4719.30 98.57 49.52 >> Int64Vector.UMAXMaskedLanes ops/ms 88.19 87.44 4693.18 19.76 53.22 >> Long128Vector.UMAXLanes ops/ms 80.62 97.82 5064.01 35.52 62.82 >> Long128Vector.UMAXMaskedLanes ops/ms 78.15 102.91 5028.24 8.74 64.34 >> Long64Vector.UMAXLanes ops/ms 47.56 62.01 46.76 52.28 0.98 >> Long64Vector.UMAXMaskedLanes ops/ms 45.44 46.76 45.79 42.91 1.01 >> Short128Vector.UMAXLanes ops/ms 316.65 410.30 14814.82 23.65 46.79 >> ... > > Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Rebase commit 56d7b52 > - Merge branch 'master' into JDK-8372980-umin-umax-intrinsic > - 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations > > This patch adds intrinsic support for UMIN and UMAX reduction operations > in the Vector API on AArch64, enabling direct hardware instruction mapping > for better performance. > > Changes: > -------- > > 1. C2 mid-end: > - Added UMinReductionVNode and UMaxReductionVNode > > 2. 
AArch64 Backend: > - Added uminp/umaxp/sve_uminv/sve_umaxv instructions > - Updated match rules for all vector sizes and element types > - Both NEON and SVE implementation are supported > > 3. Test: > - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java > - Added assembly tests in aarch64-asmtest.py for new instructions > - Added a JTReg test file VectorUMinMaxReductionTest.java > > Different configurations were tested on aarch64 and x86 machines, and > all tests passed. > > Test results of JMH benchmarks from the panama-vector project: > -------- > > On a Nvidia Grace machine with 128-bit SVE: > ``` > Benchmark Unit Before Error After Error Uplift > Byte128Vector.UMAXLanes ops/ms 411.60 42.18 25226.51 33.92 61.29 > Byte128Vector.UMAXMaskedLanes ops/ms 558.56 85.12 25182.90 28.74 45.09 > Byte128Vector.UMINLanes ops/ms 645.58 780.76 28396.29 103.11 43.99 > Byte128Vector.UMINMaskedLanes ops/ms 621.09 718.27 26122.62 42.68 42.06 > Byte64Vector.UMAXLanes ops/ms 296.33 34.44 14357.74 15.95 48.45 > Byte64Vector.UMAXMaskedLanes ops/ms 376.54 44.01 14269.24 21.41 37.90 > Byte64Vector.UMINLanes ops/ms 373.45 426.51 15425.36 66.20 41.31 > Byte64Vector.UMINMaskedLanes ops/ms 353.32 346.87 14201.37 13.79 40.19 > Int128Vector.UMAXLanes ops/ms 174.79 192.51 9906.07 286.93 56.67 > Int128Vector.UMAXMaskedLanes ops/ms 157.23 206.68 10246.77 11.44 65.17 > Int64Vector.UMAXLanes ops/ms 95.30 126.49 4719.30 98.57 49.52 > Int64Vector.UMAXMaskedLanes ops/ms 88.19 87.44 4693.18 19.76 53.22 > Long128Vector.UMAXLanes ops/ms 80.62 97.82 5064.01 35.52 62.82 > Long128Vector.UMAXMaskedLanes ops/ms 78.15 102.91 5028.24 8.74 64.34 > Long64Vector.UMAXLanes ops/ms 47.56 62.01 46.76 52.28 0.98 > Long64Vector.UMAXMaskedLanes ops/ms 45.44 46.76 45.79 42.91 1.01 > Short128Vector.UMAXLanes ops/ms 316.65 4... I'm sorry, I _completely_ overthought that one. All you need are definitions for `min[vp]` and `max[vp]` in C2_Macroassembler. Like so: `void minv(bool is_unsigned, ...) 
{ if (is_unsigned) { uminv(... } else { sminv(... } }` No need to mess with class `Assembler`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3774562567 From vlivanov at openjdk.org Tue Jan 20 19:32:30 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 20 Jan 2026 19:32:30 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector [v2] In-Reply-To: References: <0maDYNpdQQVJIEWMt1PcO-MV1UzBNoJTphzLo078N4w=.de4645cb-acef-4627-b7a9-8813d016d307@github.com> Message-ID: On Mon, 19 Jan 2026 03:05:24 GMT, Xiaohong Gong wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 625: >> >>> 623: } >>> 624: >>> 625: const TypeVect* mask_vt = TypeVect::makemask(elem_bt, num_elem); >> >> Doesn't the same reasoning apply to vector intrinsics? If `mask_vec` and `opd` aren't TOP, they should produce vector values. So, additional input validation should rule out the problematic scenario. > > Vector intrinsics looks safer to me now. The APIs are inlined in an even earlier optimization stage, and the nodes are almost new created ones. Regarding to `unbox_vector()`, it either returns a new created `VectorUnboxNode` or a GVN transformed node of `VectorUnbox`. Currently there is not `TOP` input check for `VectorUnboxNode` itself during GVN. > It might be an issue that we need to revisit once we add such checks for vector nodes. > > I agree with that additional input validation should be better. We can abort the API inlining as early as possible. Code may look like: > > Node* mask_vec = unbox_vector(mask, mask_box_type, elem_bt, num_elem); > if (mask_vec == nullptr || gvn().type(mask_vec) == Type::TOP) { > log_if_needed(" ** unbox failed mask=%s", > NodeClassNames[argument(4)->Opcode()]); > return false; > } > > That looks a common issue for all APIs that we'd better fix for all code after `unbox_vector` ? I'm unsure whether I have to do this regarding to the issue this PR reported. 
Or maybe we could revisit the whole file in future. Any suggestions? I'd suggest to shape it as follows: Node* GraphKit::unbox_vector(Node* v, const TypeInstPtr* vbox_type, BasicType elem_bt, int num_elem) { ... Node* unbox = gvn().transform(new VectorUnboxNode(C, vt, v, merged_memory())); if (!gvn().type(unbox)->isa_vect()) { assert(gvn().type(unbox) == Type::TOP, "sanity"); return nullptr; // not a vector } return unbox; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29057#discussion_r2709783849 From vlivanov at openjdk.org Tue Jan 20 19:43:23 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 20 Jan 2026 19:43:23 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v4] In-Reply-To: References: Message-ID: On Sat, 17 Jan 2026 13:31:43 GMT, Guanqiang Han wrote: >> Please review this change. Thanks! >> >> Description: >> >> This change fixes a crash in Deoptimization::uncommon_trap_inner when running with -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp. >> >> With -XX:-ProfileTraps, create_if_missing is set to false. >> https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2121-L2122 >> >> When create_if_missing is false, new mdo can not be created when m()->method_data() return null, so get_method_data() may return null . 
>> https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L1911-L1912 >> >> and trap_mdo can be null as a result >> https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2134-L2136 >> >> The crash happens here because trap_mdo is null >> https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2157 >> >> Fix: >> >> The fix makes acquisition of extra_data_lock conditional on trap_mdo being non-null, preserving the required lock ordering (extra_data_lock before ttyLocker) when an MDO exists. >> >> Test: >> >> GHA > > Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - revert > - Merge remote-tracking branch 'upstream/master' into 8374807 > - narrow lock scope > - Merge remote-tracking branch 'upstream/master' into 8374807 > - split long line > - Merge remote-tracking branch 'upstream/master' into 8374807 > - fix 8374807 Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29147#pullrequestreview-3683976061 From vlivanov at openjdk.org Tue Jan 20 19:50:35 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 20 Jan 2026 19:50:35 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v3] In-Reply-To: References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> Message-ID: <0_wYDA2lNvTyIDv7ist5heu-hs4J8pmEKT1mqRyiBBk=.438156e1-24fd-4352-8a61-9cf85efacb25@github.com> On Fri, 16 Jan 2026 12:51:21 GMT, Kerem Kat wrote: >> The check `vect->is_Vector() || vect->is_LoadVector()` doesn't handle `Proj` nodes that resolve to vector types, causing an assertion failure when such nodes flow through a `Phi` into `VectorBox`. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into fix-c2-checkCastPP > - 8374903: C2 VectorAPI: assert(vbox->as_Phi()->region() == vect->as_Phi()->region()) failed > - Simplify expand_vbox_node_helper by merging VectorBox Phi handling > - 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded Test results (hs-tier1 - hs-tier4) are clean. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29200#issuecomment-3774647567 From vlivanov at openjdk.org Tue Jan 20 19:52:21 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 20 Jan 2026 19:52:21 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v11] In-Reply-To: <-avpi_0V0b9jLf29Sah1VB8ZiOWu9MSFl8X3vO22NFQ=.0c006266-c5eb-4077-b238-311e6190486f@github.com> References: <-avpi_0V0b9jLf29Sah1VB8ZiOWu9MSFl8X3vO22NFQ=.0c006266-c5eb-4077-b238-311e6190486f@github.com> Message-ID: On Tue, 20 Jan 2026 07:22:00 GMT, Emanuel Peter wrote: >> This is an exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. >> >> Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) >> >> **Discussion** >> >> Observations: >> - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. >> - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. >> - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow >> - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow >> - `linux_x64_oci_server`: Vector API leads to really nice speedups >> - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks >> - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. >> - Compact Object Headers has some negative effect on some loop benchmarks.
>> - `linux_aarch64_server`: `reduceAddI`, `copyI` >> - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` >> - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? >> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit >> - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` >> - Intrinsics can be much faster than auto vectorized or Vector API code. >> - `linux_aarch64_server`: `copyI` >> - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. >> - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact, I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). >> >> **Benchmark Plots** >> >> Units: nanoseconds per algorithm invocation. >> >> Note: the `aarch64` machines all only have `NEON` support. Performance may be much better on `SVE`, I have not benchmarked that yet. >> >> `linux_x64_oci` >> [plot: algo_linux_x64_oci_server] >> >> `windows_x64_oci` >> [plot: algo_windows_x64_oci_server] > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > use firstTrue for XiaohongGong Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28639#pullrequestreview-3684004831 From vlivanov at openjdk.org Tue Jan 20 20:05:06 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 20 Jan 2026 20:05:06 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v17] In-Reply-To: References: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> <2Fow0aJ6zpA2aTnACLAxW_d_O_OucOXKz08y-Lx7IBo=.73eeafd0-cf1d-4ce2-a438-89b51739f593@github.com> Message-ID: On Fri, 16 Jan 2026 23:57:25 GMT, Quan Anh Mai wrote: > Moving this assert into check_escape_status will make it harder to reuse a LocalEA across multiple calls of find_previous_store.
This is useful, for example, when the load is from a memory Phi, and we try to follow the Phi inputs to find the stored value along different paths of the merge. Do I get it right that it's something for a future enhancement? Because I don't see where multiple `find_previous_store` can share a `LocalEA` now. In the future, we can look into enhancing `LocalEA` with incremental analysis capabilities, but for now I'm more concerned with the noise introduced by `has_not_escaped` caching in `find_previous_store`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2709881456 From dzhang at openjdk.org Tue Jan 20 23:52:43 2026 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 20 Jan 2026 23:52:43 GMT Subject: Integrated: 8375657: RISC-V: Need to check size in SharedRuntime::is_wide_vector In-Reply-To: References: Message-ID: <-yTYGasEy8uSbEr4kzcfnF7bfz8o9xRHS0_w-Y1aJ_E=.e39b516b-c172-4852-af98-dd13c5d2a118@github.com> On Tue, 20 Jan 2026 01:05:43 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > Currently, we only check `UseRVV` flag in `SharedRuntime::is_wide_vector` on RISC-V platforms. This is not optimal when no vector instructions are used by the nmethod. In this case, the input size parameter is zero. We should consider this case so that we choose the right stub in `SharedRuntime::get_poll_stub` when handling safepoint. This pull request has now been integrated.
Changeset: ca3e6236 Author: Dingli Zhang URL: https://git.openjdk.org/jdk/commit/ca3e6236a28794156cc2acf697755229c47735a8 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8375657: RISC-V: Need to check size in SharedRuntime::is_wide_vector Reviewed-by: fjiang, fyang ------------- PR: https://git.openjdk.org/jdk/pull/29307 From dzhang at openjdk.org Tue Jan 20 23:52:43 2026 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 20 Jan 2026 23:52:43 GMT Subject: RFR: 8375657: RISC-V: Need to check size in SharedRuntime::is_wide_vector In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 01:05:43 GMT, Dingli Zhang wrote: > Hi, > Can you help to review this patch? Thanks! > > Currently, we only check `UseRVV` flag in `SharedRuntime::is_wide_vector` on RISC-V platforms. This is not optimal when no vector instructions are used by the nmethod. In this case, the input size parameter is zero. We should consider this case so that we choose the right stub in `SharedRuntime::get_poll_stub` when handling safepoint. Thanks all for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29307#issuecomment-3775471047 From ghan at openjdk.org Wed Jan 21 00:00:50 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Wed, 21 Jan 2026 00:00:50 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v4] In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 19:40:09 GMT, Vladimir Ivanov wrote: >> Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: >> >> - revert >> - Merge remote-tracking branch 'upstream/master' into 8374807 >> - narrow lock scope >> - Merge remote-tracking branch 'upstream/master' into 8374807 >> - split long line >> - Merge remote-tracking branch 'upstream/master' into 8374807 >> - fix 8374807 > > Looks good. Hi @iwanowww Thanks for the review! Does it need a second reviewer? If not, I'll integrate it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29147#issuecomment-3775498647 From jiefu at openjdk.org Wed Jan 21 01:24:28 2026 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 21 Jan 2026 01:24:28 GMT Subject: RFR: 8375787: compiler/vectorapi/TestCastShapeBadOpc.java fails with release VMs Message-ID: Add `-XX:+UnlockDiagnosticVMOptions` for release VMs. ------------- Commit messages: - 8375787: compiler/vectorapi/TestCastShapeBadOpc.java fails with release VMs Changes: https://git.openjdk.org/jdk/pull/29336/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29336&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375787 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29336.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29336/head:pull/29336 PR: https://git.openjdk.org/jdk/pull/29336 From xgong at openjdk.org Wed Jan 21 01:42:01 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 21 Jan 2026 01:42:01 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector [v2] In-Reply-To: References: <0maDYNpdQQVJIEWMt1PcO-MV1UzBNoJTphzLo078N4w=.de4645cb-acef-4627-b7a9-8813d016d307@github.com> Message-ID: On Tue, 20 Jan 2026 19:28:56 GMT, Vladimir Ivanov wrote: >> Vector intrinsics looks safer to me now. The APIs are inlined in an even earlier optimization stage, and the nodes are almost new created ones.
Regarding to `unbox_vector()`, it either returns a new created `VectorUnboxNode` or a GVN transformed node of `VectorUnbox`. Currently there is not `TOP` input check for `VectorUnboxNode` itself during GVN. >> It might be an issue that we need to revisit once we add such checks for vector nodes. >> >> I agree with that additional input validation should be better. We can abort the API inlining as early as possible. Code may look like: >> >> Node* mask_vec = unbox_vector(mask, mask_box_type, elem_bt, num_elem); >> if (mask_vec == nullptr || gvn().type(mask_vec) == Type::TOP) { >> log_if_needed(" ** unbox failed mask=%s", >> NodeClassNames[argument(4)->Opcode()]); >> return false; >> } >> >> That looks a common issue for all APIs that we'd better fix for all code after `unbox_vector` ? I'm unsure whether I have to do this regarding to the issue this PR reported. Or maybe we could revisit the whole file in future. Any suggestions? > > I'd suggest to shape it as follows: > > Node* GraphKit::unbox_vector(Node* v, const TypeInstPtr* vbox_type, BasicType elem_bt, int num_elem) { > ... > Node* unbox = gvn().transform(new VectorUnboxNode(C, vt, v, merged_memory())); > if (!gvn().type(unbox)->isa_vect()) { > assert(gvn().type(unbox) == Type::TOP, "sanity"); > return nullptr; // not a vector > } > return unbox; > } Sounds reasonable. I will fix it with next commit. Thanks for your suggestion! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29057#discussion_r2710648393 From syan at openjdk.org Wed Jan 21 02:52:02 2026 From: syan at openjdk.org (SendaoYan) Date: Wed, 21 Jan 2026 02:52:02 GMT Subject: RFR: 8375787: compiler/vectorapi/TestCastShapeBadOpc.java fails with release VMs In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 01:14:54 GMT, Jie Fu wrote: > Add `-XX:+UnlockDiagnosticVMOptions` for release VMs. LGTM ------------- Marked as reviewed by syan (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/29336#pullrequestreview-3685130164 From qamai at openjdk.org Wed Jan 21 02:57:17 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 21 Jan 2026 02:57:17 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v19] In-Reply-To: References: Message-ID: <-4y1DobU4R4eTjnjpv56qGRCd-8wWSiE4LO1mVnnmZ4=.b8985412-7ec8-45f2-9e61-53ff0bf0c532@github.com> > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > ## The current PR: > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. > > I do not see a noticeable difference in C2 runtime with and without this patch. > > ## Future work: > > 1. Fold a memory `Phi`. > > This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. > > 2. Fold a pointer `Phi`. > > Currently, this PR is doing the trivial approach, just give up if we don't encounter a store into that `Phi`. However, we can do better. 
Consider this case: > > Point p1 = new Point; > Point p2 = new Point; > p1.x = v1; > p2.x = v2; > Point p = Phi(p1, p2); > int a = p.x; > > Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. > > Another interesting case: > > Point p = Phi(p1, p2); > p.x = v; > p1.x = v1; > int a = p.x; > > Then, theoretically, we can fold `a` to `Phi(v1, v)` if `p1` and `p2` are known not to alias. > > 3. Nested objects > > It can be observed that if an object is stored into a memory that has not escaped, then it can be considered that the object has not escaped. For example: > > Point p = new Point; > PointHolder h = new PointHolder; > h.p = p; > int x = p.x; > escape(h); > > Then, `p` can be considered that it has not escaped until `escape(h)`. To do this, the computation of `_aliases` in the constructor of `LocalEA` needs to be more comprehensive. See the comments in `LocalEA::check_escape_status`. > > Please take a look and leave your thoughts, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision: - Remove the TriBool - Merge branch 'master' into loadfoldingigvn - Fix dead accesses, address reviews - Merge branch 'master' into loadfoldingigvn - Early return when not a heap access - Fix escape at store - Fix outdated and unclear comments - copyright year, return, comments, whitespace - Merge branch 'master' into loadfoldingigvn - ea of phis and nested objects - ... 
and 12 more: https://git.openjdk.org/jdk/compare/8ca5795b...ac82c2ea ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/97297f8d..ac82c2ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=17-18 Stats: 44318 lines in 787 files changed: 27459 ins; 10068 del; 6791 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From qamai at openjdk.org Wed Jan 21 02:57:17 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 21 Jan 2026 02:57:17 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v17] In-Reply-To: References: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> <2Fow0aJ6zpA2aTnACLAxW_d_O_OucOXKz08y-Lx7IBo=.73eeafd0-cf1d-4ce2-a438-89b51739f593@github.com> Message-ID: On Tue, 20 Jan 2026 20:01:34 GMT, Vladimir Ivanov wrote: >> Do you mean adding an early return in `check_escape_control` when the queried control is a transitive input of the cached one like this: >> >> if (_not_escaped_controls.member(ctl)) { >> return NOT_ESCAPED; >> } >> >> I think it is correct to do so, but an assert that `_not_escaped_controls` does contain the `ctl` is a little bit stronger in terms of strictness. Moving this assert into `check_escape_status` will make it harder to reuse a `LocalEA` across multiple calls of `find_previous_store`. This is useful, for example, when the load is from a memory `Phi`, and we try to follow the `Phi` inputs to find the stored value along different paths of the merge. > >> Moving this assert into check_escape_status will make it harder to reuse a LocalEA across multiple calls of find_previous_store. 
This is useful, for example, when the load is from a memory Phi, and we try to follow the Phi inputs to find the stored value along different paths of the merge. > > Do I get it right that it's something for a future enhancement? Because I don't see where multiple `find_previous_store` can share a `LocalEA` now. In the future, we can look into enhancing `LocalEA` with incremental analysis capabilities, but for now I'm more concerned with the noise introduced by `has_not_escaped` caching in `find_previous_store`. Yes, it is for a future enhancement. I have refactored this part and removed the use of the `TriBool`. I hope it is less noisy now. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2710768460 From lmesnik at openjdk.org Wed Jan 21 05:14:52 2026 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 21 Jan 2026 05:14:52 GMT Subject: RFR: 8375787: compiler/vectorapi/TestCastShapeBadOpc.java fails with release VMs In-Reply-To: References: Message-ID: <1MMlEIWUypZJMxvf11E4Lwbj59jyIhWqnbDoj8kv2GI=.ee434a9d-ce02-4ac6-bd16-b0173a19655d@github.com> On Wed, 21 Jan 2026 01:14:54 GMT, Jie Fu wrote: > Add `-XX:+UnlockDiagnosticVMOptions` for release VMs. Marked as reviewed by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29336#pullrequestreview-3685373678 From jiefu at openjdk.org Wed Jan 21 06:03:45 2026 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 21 Jan 2026 06:03:45 GMT Subject: RFR: 8375787: compiler/vectorapi/TestCastShapeBadOpc.java fails with release VMs In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 02:48:55 GMT, SendaoYan wrote: >> Add `-XX:+UnlockDiagnosticVMOptions` for release VMs. > > LGTM Thanks @sendaoYan and @lmesnik for the review. Will push it later since it's just a test bug.
------------- PR Comment: https://git.openjdk.org/jdk/pull/29336#issuecomment-3776344026 From fyang at openjdk.org Wed Jan 21 06:17:41 2026 From: fyang at openjdk.org (Fei Yang) Date: Wed, 21 Jan 2026 06:17:41 GMT Subject: RFR: 8375787: compiler/vectorapi/TestCastShapeBadOpc.java fails with release VMs In-Reply-To: References: Message-ID: <1z8sGN1_mxEixlAxZGlG7ByYFtlKnWF7-pXGlcyYbLk=.354adcda-888f-4b5a-9b76-905dab9eceaf@github.com> On Wed, 21 Jan 2026 01:14:54 GMT, Jie Fu wrote: > Add `-XX:+UnlockDiagnosticVMOptions` for release VMs. Looks OK. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29336#pullrequestreview-3685508315 From epeter at openjdk.org Wed Jan 21 06:27:23 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jan 2026 06:27:23 GMT Subject: RFR: 8375787: compiler/vectorapi/TestCastShapeBadOpc.java fails with release VMs In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 01:14:54 GMT, Jie Fu wrote: > Add `-XX:+UnlockDiagnosticVMOptions` for release VMs. @DamonFool Thanks for the fix. I would say this is trivial and can be integrated before the 24h usual waiting period. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29336#pullrequestreview-3685535458 From jiefu at openjdk.org Wed Jan 21 06:38:14 2026 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 21 Jan 2026 06:38:14 GMT Subject: Integrated: 8375787: compiler/vectorapi/TestCastShapeBadOpc.java fails with release VMs In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 01:14:54 GMT, Jie Fu wrote: > Add `-XX:+UnlockDiagnosticVMOptions` for release VMs. This pull request has now been integrated. 
Changeset: 560a92a6 Author: Jie Fu URL: https://git.openjdk.org/jdk/commit/560a92a6327221c90596bcd17a87722e4910472a Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8375787: compiler/vectorapi/TestCastShapeBadOpc.java fails with release VMs Reviewed-by: syan, lmesnik, fyang, epeter ------------- PR: https://git.openjdk.org/jdk/pull/29336 From jbhateja at openjdk.org Wed Jan 21 07:03:22 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 21 Jan 2026 07:03:22 GMT Subject: RFR: 8375498: [VectorAPI] Dump primary vector IR details with -XX:+TraceNewVectors In-Reply-To: References: Message-ID: <9VUTjaTyDMs_VgsvLC5jWD3DoxT_VqHAOGGCSW6fadk=.c0cc258a-b0fe-4656-b412-f00161760fca@github.com> On Fri, 16 Jan 2026 08:27:54 GMT, Jatin Bhateja wrote: > Patch to add support for dumping primary vector IR created by various VectorAPI inline expanders. > Currently auto-vectorization prints the newly create vector IR with -XX:+TraceNewVectors, similar message is > now emitted for VectorAPI. > > > TraceNewVectors [AutoVectorization]: 1397 AddVI === _ 1395 1396 [[ ]] #vectorx > > TraceNewVectors [VectorAPI]: 1591 AddVI === _ 1545 1569 [[ ]] #vectory !jvms: IntVector::lanewiseTemplate @ bci:154 (line 798) Int256Vector::lanewise @ bci:3 (line 278) Int256Vector::lanewise @ bci:3 (line 43) IntVector::add @ bci:5 (line 1380) AddTestI::workload @ bci:34 (line 18) > > > Thanks, > Jatin Hi @eme64, @XiaohongGong, Let me know if you have any comments here. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29265#issuecomment-3776493943 From jbhateja at openjdk.org Wed Jan 21 07:04:04 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 21 Jan 2026 07:04:04 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v13] In-Reply-To: <1ElN5XvEXAYGINpCIB2smhDrzekGyiXmG6o8-jnxDxk=.83a69a64-2894-40af-a2ee-9c35448c88b2@github.com> References: <8hStIcvp252Ik7raxZL5BvFKKkXTflorjyOD9Cyakvc=.c5d1b302-5c49-46b1-91ba-2feda2e6a746@github.com> <1ElN5XvEXAYGINpCIB2smhDrzekGyiXmG6o8-jnxDxk=.83a69a64-2894-40af-a2ee-9c35448c88b2@github.com> Message-ID: On Fri, 19 Dec 2025 07:01:31 GMT, Emanuel Peter wrote: >> Hi @PaulSandoz , your comments have been addressed. Please let us know if there are other comments. >> Hi @eme64 , Kindly share your comments. > > @jatin-bhateja Thanks for the ping! I'll put this on the list for review early in 2026 :) Hi @eme64 , Your comments have been addressed ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3776496085 From epeter at openjdk.org Wed Jan 21 08:05:30 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jan 2026 08:05:30 GMT Subject: RFR: 8375498: [VectorAPI] Dump primary vector IR details with -XX:+TraceNewVectors In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 08:27:54 GMT, Jatin Bhateja wrote: > Patch to add support for dumping primary vector IR created by various VectorAPI inline expanders. > Currently auto-vectorization prints the newly create vector IR with -XX:+TraceNewVectors, similar message is > now emitted for VectorAPI. 
> > > TraceNewVectors [AutoVectorization]: 1397 AddVI === _ 1395 1396 [[ ]] #vectorx > > TraceNewVectors [VectorAPI]: 1591 AddVI === _ 1545 1569 [[ ]] #vectory !jvms: IntVector::lanewiseTemplate @ bci:154 (line 798) Int256Vector::lanewise @ bci:3 (line 278) Int256Vector::lanewise @ bci:3 (line 43) IntVector::add @ bci:5 (line 1380) AddTestI::workload @ bci:34 (line 18) > > > Thanks, > Jatin @jatin-bhateja Nice, thanks for working on this! I've been missing this feature for a while :) Question: how did you verify that you cover all cases? I think some are missing, I found some just by scrolling down in the file a bit: https://github.com/openjdk/jdk/pull/29265/files#diff-33d0866101d899687e04303fb2232574f2cb796ce060528a243ebdc9903b01b1R3012-R3018 Could we have some way to verify that all vector nodes are traced in some way? It is just so easy to forget some. We can also file a separate RFE for that. I'll still approve it because it already is a step in the right direction :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29265#pullrequestreview-3685824145 From xgong at openjdk.org Wed Jan 21 08:45:13 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 21 Jan 2026 08:45:13 GMT Subject: RFR: 8375498: [VectorAPI] Dump primary vector IR details with -XX:+TraceNewVectors In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 08:27:54 GMT, Jatin Bhateja wrote: > Patch to add support for dumping primary vector IR created by various VectorAPI inline expanders. > Currently auto-vectorization prints the newly create vector IR with -XX:+TraceNewVectors, similar message is > now emitted for VectorAPI. 
> > > TraceNewVectors [AutoVectorization]: 1397 AddVI === _ 1395 1396 [[ ]] #vectorx > > TraceNewVectors [VectorAPI]: 1591 AddVI === _ 1545 1569 [[ ]] #vectory !jvms: IntVector::lanewiseTemplate @ bci:154 (line 798) Int256Vector::lanewise @ bci:3 (line 278) Int256Vector::lanewise @ bci:3 (line 43) IntVector::add @ bci:5 (line 1380) AddTestI::workload @ bci:34 (line 18) > > > Thanks, > Jatin src/hotspot/share/opto/vectorIntrinsics.cpp line 465: > 463: default: fatal("unsupported arity: %d", n); > 464: } > 465: trace_vector(operation); So why not add this under line 475? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29265#discussion_r2711504241 From xgong at openjdk.org Wed Jan 21 08:56:19 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 21 Jan 2026 08:56:19 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector [v3] In-Reply-To: References: Message-ID: > ### Problem: > > Test `compiler/vectorapi/VectorMaskToLongTest.java` crashes intermittently (approximately once per 200+ runs) with stress VM options such as `-XX:+StressIGVN`: > > > // A fatal error has been detected by the Java Runtime Environment: > // > // Internal Error (jdk/src/hotspot/share/opto/type.hpp:2287), pid=69056, tid=28419 > // assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector > // ... > > > The crash occurs in the following code when calling `is_vect()` in the assertion added by JDK-8367292 [1]: > > https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/src/hotspot/share/opto/vectornode.cpp#L1920-L1924 > > ### Root Cause: > > The mask's type becomes TOP (unreachable) during compiler optimizations when the mask node is marked as dead before all its users are removed from the ideal graph. If `Ideal()` is subsequently called on a user node, it may access the TOP type, triggering the assertion. > > Here is the simplified ideal graph showing the crash scenario:
> > Here is the simplified ideal graph showing the crash scenario: > > > Con #top > | ConI > \ / > \ / > VectorStoreMask > | > VectorMaskToLong # !jvms: IntMaxVector$IntMaxMask::toLong > > > ### Detailed Scenario: > > Following is the method in the test case that hits the assertion: > > https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L65-L70 > > This method accepts a `VectorSpecies` parameter and calls vector APIs `VectorMask.fromLong()` and `toLong()`. It is called with species ranging from `ByteVector.SPECIES_MAX` to `DoubleVector.SPECIES_MAX`. During compilation, C2 speculatively generates fast paths for `toLong()` for all possible species. > > When compiling a specific test case such as: > https://github.com/openjdk/jdk/blob/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L177-L179 > > the compiler inlines the method and attempts to optimize away unreachable branches. The following graph shows the situation before the mask becomes `TOP`: > > > VectorBox # DoubleMaxMask, generated by VectorMask.fromLong() > / \ > AddP \ > | \ > LoadNClass \ > ConP #IntMaxMask | | > \ | | > \ DecodeNClass | > \ / | > \ / | > CmpP ... 
Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Ensure it is vector type for vector unbox result ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29057/files - new: https://git.openjdk.org/jdk/pull/29057/files/cf73b3ad..328140eb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29057&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29057&range=01-02 Stats: 8 lines in 1 file changed: 4 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29057.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29057/head:pull/29057 PR: https://git.openjdk.org/jdk/pull/29057 From xgong at openjdk.org Wed Jan 21 08:56:20 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 21 Jan 2026 08:56:20 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector [v2] In-Reply-To: References: <0maDYNpdQQVJIEWMt1PcO-MV1UzBNoJTphzLo078N4w=.de4645cb-acef-4627-b7a9-8813d016d307@github.com> Message-ID: On Wed, 21 Jan 2026 01:38:50 GMT, Xiaohong Gong wrote: >> I'd suggest to shape it as follows: >> >> Node* GraphKit::unbox_vector(Node* v, const TypeInstPtr* vbox_type, BasicType elem_bt, int num_elem) { >> ... >> Node* unbox = gvn().transform(new VectorUnboxNode(C, vt, v, merged_memory())); >> if (!gvn().type(unbox)->isa_vect()) { >> assert(gvn().type(unbox) == Type::TOP, "sanity"); >> return nullptr; // not a vector >> } >> return unbox; >> } > > Sounds reasonable. I will fix it with next commit. Thanks for your suggestion! Hi, I just updated a commit to check the result type of vector unbox in latest commit. Please help take another look. Thanks a lot! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29057#discussion_r2711553611 From dfenacci at openjdk.org Wed Jan 21 09:05:12 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 21 Jan 2026 09:05:12 GMT Subject: RFR: 8374725: C2: assert(x_ctrl == get_late_ctrl_with_anti_dep(x->as_Load(), early_ctrl, x_ctrl)) failed: anti-dependences were already checked [v2] In-Reply-To: References: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> Message-ID: On Mon, 19 Jan 2026 13:59:25 GMT, Roland Westrelin wrote: >> `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` caches >> intermediate results in `_dom_lca_tags` when the late control is >> computed by `PhaseIdealLoop::get_late_ctrl()` for a node `n`: the code >> iterates over all uses of `n` potentially calling >> `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` multiple >> times. `_dom_lca_tags` is used to cache data that is specific to the >> lca computation for `n`. `_dom_lca_tags` is set to a tag that depends >> on `n` to mark the cached data as only valid during the lca >> computation for `n`. >> >> `PhaseIdealLoop::try_sink_out_of_loop()` checks that all uses of a >> node are out of loop with >> `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` which also needs to >> consider anti-dependences for `Load`s and also calls >> `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` through >> `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`: this computes the >> late control for a node and one particular out of loop >> use. `_dom_lca_tags` values computed by an earlier >> `PhaseIdealLoop::get_late_ctrl()` should be ignored (because it >> computes the late control for a node and all its uses). To address >> that issue, the tag that's used by >> `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` is made >> different on each call from >> `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` by incrementing >> `_dom_lca_tags_round`. 
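A minimal Java sketch of the round-tagged caching idea in Roland's description above. Everything in it is hypothetical (`RoundTaggedCache`, `newRound` and `lookup` are not HotSpot names, and the real data structures differ): an entry cached for node `n` is trusted only when its tag equals the current round, so bumping the round invalidates all earlier entries at once. The second lookup in `main` shows the failure mode: without a new round between two independent queries, the second query silently reuses data computed for the first.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch of round-tagged memoization, loosely modeled on the
// tag/round scheme described in the review (not the HotSpot implementation).
final class RoundTaggedCache {
    private final int[] tags;   // round in which cache[i] was computed
    private final int[] cache;  // memoized per-node results
    private int round = 0;      // current round; bumping it invalidates all entries
    int misses = 0;             // number of recomputations, for illustration

    RoundTaggedCache(int nodes) {
        tags = new int[nodes];
        cache = new int[nodes];
        Arrays.fill(tags, -1);  // no entry belongs to any valid round yet
    }

    // Start an independent query: entries from older rounds become stale.
    void newRound() {
        round++;
    }

    // Returns the cached value for node n if it was computed in the current
    // round; otherwise recomputes it with the query-specific function.
    int lookup(int n, IntUnaryOperator compute) {
        if (tags[n] == round) {
            return cache[n];
        }
        misses++;
        cache[n] = compute.applyAsInt(n);
        tags[n] = round;
        return cache[n];
    }

    public static void main(String[] args) {
        RoundTaggedCache c = new RoundTaggedCache(4);
        c.newRound();
        int first = c.lookup(2, n -> n * 10);   // query A: computes 20
        int stale = c.lookup(2, n -> n * 100);  // query B, same round: reuses 20
        c.newRound();
        int fresh = c.lookup(2, n -> n * 100);  // query B after a new round: 200
        System.out.println(first + " " + stale + " " + fresh); // prints "20 20 200"
    }
}
```

In this analogy, the fix amounts to calling `newRound()` before every independent query, including a second query for the same `Load` that appears twice as a `Phi` input.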
>> >> The issue here is that one `Load` node is input to a `Phi` twice. So >> the `Phi` is considered twice as a use of the node along 2 different >> paths. `PhaseIdealLoop::get_late_ctrl_with_anti_dep()` is called twice >> from `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` but >> `_dom_lca_tags_round` is not incremented between the 2 >> calls. `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` when >> called for the second `Phi` input uses incorrect cached data which, in >> turn, causes an incorrect computation. >> >> The fix I propose is to make sure `_dom_lca_tags_round` is incremented >> for every call to `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by dfenacci (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29231#pullrequestreview-3686064975 From roland at openjdk.org Wed Jan 21 09:14:24 2026 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 Jan 2026 09:14:24 GMT Subject: RFR: 8374725: C2: assert(x_ctrl == get_late_ctrl_with_anti_dep(x->as_Load(), early_ctrl, x_ctrl)) failed: anti-dependences were already checked [v2] In-Reply-To: References: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> Message-ID: On Wed, 21 Jan 2026 09:01:14 GMT, Damon Fenacci wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Marked as reviewed by dfenacci (Committer). 
@dafedafe @chhagedorn @merykitty thanks for the reviews and testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/29231#issuecomment-3776981810 From epeter at openjdk.org Wed Jan 21 09:17:41 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jan 2026 09:17:41 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v4] In-Reply-To: References: Message-ID: On Sat, 17 Jan 2026 13:31:43 GMT, Guanqiang Han wrote: >> Please review this change. Thanks! >> >> Description: >> >> This change fixes a crash in Deoptimization::uncommon_trap_inner when running with -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp. >> >> With -XX:-ProfileTraps, create_if_missing is set to false. >> https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2121-L2122 >> >> When create_if_missing is false, a new MDO cannot be created if m()->method_data() returns null, so get_method_data() may return null: >> https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L1911-L1912 >> >> and, as a result, trap_mdo can be null: >> https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2134-L2136 >> >> The crash happens here because trap_mdo is null: >> https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2157 >> >> Fix: >> >> The fix makes acquisition of extra_data_lock conditional on trap_mdo being non-null, preserving the required lock ordering (extra_data_lock before ttyLocker) when an MDO exists. >> >> Test: >> >> GHA > > Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - revert > - Merge remote-tracking branch 'upstream/master' into 8374807 > - narrow lock scope > - Merge remote-tracking branch 'upstream/master' into 8374807 > - split long line > - Merge remote-tracking branch 'upstream/master' into 8374807 > - fix 8374807 @hgqxjj Thanks for working on this! We generally require 2 reviews for hotspot changes, unless both the author and single reviewer say it is "trivial". This here is not trivial, I think ;) I have some comments in the test, otherwise the fix looks reasonable. test/hotspot/jtreg/compiler/uncommontrap/TestPrintDiagnosticsWithoutProfileTraps.java line 32: > 30: * @run main/othervm -XX:+TraceDeoptimization -XX:-ProfileTraps > 31: * -XX:-TieredCompilation -Xcomp > 32: * compiler.uncommontrap.TestPrintDiagnosticsWithoutProfileTraps `TraceDeoptimization` is a diagnostic flag. This test will cause issues without `-XX:+UnlockDiagnosticVMOptions`, right? And `ProfileTraps` is debug. So won't this need `-XX:+IgnoreUnrecognizedVMOptions`? @iwanowww Did you already run testing on this patch or should I run some? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29147#pullrequestreview-3686113323 PR Review Comment: https://git.openjdk.org/jdk/pull/29147#discussion_r2711636953 From epeter at openjdk.org Wed Jan 21 09:27:09 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jan 2026 09:27:09 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v13] In-Reply-To: References: <8hStIcvp252Ik7raxZL5BvFKKkXTflorjyOD9Cyakvc=.c5d1b302-5c49-46b1-91ba-2feda2e6a746@github.com> <1ElN5XvEXAYGINpCIB2smhDrzekGyiXmG6o8-jnxDxk=.83a69a64-2894-40af-a2ee-9c35448c88b2@github.com> Message-ID: On Wed, 21 Jan 2026 07:01:39 GMT, Jatin Bhateja wrote: >> @jatin-bhateja Thanks for the ping! I'll put this on the list for review early in 2026 :) > > Hi @eme64 , Your comments have been addressed @jatin-bhateja This patch is really really large. There are lots of renamings that could be done in a separate patch first (as a subtask). It would make reviewing easier, allowing focus on the substantial work. See discussion here: https://github.com/openjdk/jdk/pull/28002#discussion_r2705376899 ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3777034545 From epeter at openjdk.org Wed Jan 21 09:27:12 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jan 2026 09:27:12 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v12] In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 11:51:20 GMT, Jatin Bhateja wrote: >> test/jdk/jdk/incubator/vector/IntVectorMaxTests.java line 68: >> >>> 66: static IntVector bcast_vec = IntVector.broadcast(SPECIES, (int)10); >>> 67: >>> 68: static void AssertEquals(int actual, int expected) { >> >> There are lots of changes in this file that do not seem to have anything to do with Float16. Please file them separately. It will make review much easier. 
> > I have added an assertion wrapper so that float16 values (short) can be converted to float before calling actual Assert.* routines to handle all possible NaN bit patterns. Since the tests are generate from common template hence these changes appear. Can we not do those changes in a separate change, please? It will make it easier to review the rest of the PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2711675095 From xgong at openjdk.org Wed Jan 21 09:30:26 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 21 Jan 2026 09:30:26 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v11] In-Reply-To: <-avpi_0V0b9jLf29Sah1VB8ZiOWu9MSFl8X3vO22NFQ=.0c006266-c5eb-4077-b238-311e6190486f@github.com> References: <-avpi_0V0b9jLf29Sah1VB8ZiOWu9MSFl8X3vO22NFQ=.0c006266-c5eb-4077-b238-311e6190486f@github.com> Message-ID: On Tue, 20 Jan 2026 07:22:00 GMT, Emanuel Peter wrote: >> This is an exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. >> >> Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) >> >> **Discussion** >> >> Observations: >> - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. >> - If the Vector API code can be fully intrinsified, we get fast code. But somtimes, the Vector API is horribly slow, much slower than scalar loop performance. 
>> - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow >> - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow >> - `linux_x64_oci_server`: Vector API leads to really nice speedups >> - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks >> - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. >> - Compact Object Headers has some negative effect on some loop benchmarks. >> - `linux_aarch64_server`: `reduceAddI`, `copyI` >> - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` >> - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? >> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit >> - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` >> - Intrinsics can be much faster than auto vectoirzed or Vector API code. >> - `linux_aarch64_server`: `copyI` >> - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. >> - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact, I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). >> >> **Benchmark Plots** >> >> Units: nanoseconds per algorithm invocation. >> >> Note: the `aarch64` machines all only have `NEON` support. Performance may be much better on `SVE`, I have not benchmarked that yet. >> >> `linux_x64_oci` >> algo_linux_x64_oci_server >> >> `windows_x64_oci` >> algo_windows_x64_oci_server > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > use firstTrue for XiaohongGong Many thanks to your update! Almost looks good to me. Some IR checks missed SVE/NEON support, but I can add them with a followed-up PR. 
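The loop header quoted in this review from VectorAlgorithmsImpl.java (lines 484-486) is a 4-way unrolled scalar reduction. The following is a minimal sketch of that shape; the loop body, the tail loop and the `UnrolledSum`/`sum4` names are assumptions, since the review quotes only the header. Because `int` addition wraps modulo 2^32 and is associative, regrouping the adds this way cannot change the result, unlike floating-point addition.

```java
// Hypothetical sketch: 4-way unrolled int sum with a scalar tail loop.
final class UnrolledSum {
    static int sum4(int[] a) {
        int sum = 0;
        int i = 0;
        // Main loop: runs while at least four elements remain.
        for (; i < a.length - 3; i += 4) {
            sum += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        }
        // Tail loop: handles the remaining 0 to 3 elements.
        for (; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sum4(new int[] {1, 2, 3, 4, 5, 6, 7})); // prints 28
        System.out.println(sum4(new int[] {Integer.MAX_VALUE, 1}));  // wraps to Integer.MIN_VALUE
    }
}
```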
test/hotspot/jtreg/compiler/vectorization/TestVectorAlgorithms.java line 429: > 427: IRNode.ADD_VI, "> 0", > 428: IRNode.STORE_VECTOR, "> 0"}, > 429: applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true"}, Is this better? Suggestion: applyIfCPUFeatureOr = {"avx512", "true", "sve", "true"}, test/hotspot/jtreg/compiler/vectorization/VectorAlgorithmsImpl.java line 486: > 484: int sum = 0; > 485: int i = 0; > 486: for (; i < a.length - 3; i+=4) { Suggestion: for (; i < a.length - 3; i += 4) { test/hotspot/jtreg/compiler/vectorization/VectorAlgorithmsImpl.java line 695: > 693: for (; i < SPECIES_I.loopBound(oops.length); i += SPECIES_I.length()) { > 694: var oopv = IntVector.fromArray(SPECIES_I, oops, i); > 695: var mask = oopv.compare(VectorOperators.NE, nulls); Is this better? `nulls` can be removed although it may have no difference for compilation result? Suggestion: var mask = oopv.compare(VectorOperators.NE, 0); ------------- PR Comment: https://git.openjdk.org/jdk/pull/28639#issuecomment-3777052243 PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2711611052 PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2711647328 PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2711671984 From erfang at openjdk.org Wed Jan 21 09:42:13 2026 From: erfang at openjdk.org (Eric Fang) Date: Wed, 21 Jan 2026 09:42:13 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v2] In-Reply-To: References: <3HfqoF6XNWDXq8P95PQ78B1_QquFMPDTkcuXPbmybNs=.cc8fd652-9949-4a0d-bf18-76cad5aac332@github.com> Message-ID: On Tue, 20 Jan 2026 19:23:38 GMT, Andrew Haley wrote: > I'm sorry, I _completely_ overthought that one. All you need are definitions for `min[vp]` and `max[vp]` in C2_Macroassembler. > > Like so: > > `void minv(bool is_unsigned, ...) { if (is_unsigned) { uminv(... } else { sminv(... } }` > > No need to mess with class `Assembler`. 
Make sense, I'll do the modification in next commit soon, thanks for your review! @theRealAph ------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3777104550 From epeter at openjdk.org Wed Jan 21 09:50:16 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jan 2026 09:50:16 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v11] In-Reply-To: References: <-avpi_0V0b9jLf29Sah1VB8ZiOWu9MSFl8X3vO22NFQ=.0c006266-c5eb-4077-b238-311e6190486f@github.com> Message-ID: On Wed, 21 Jan 2026 09:05:10 GMT, Xiaohong Gong wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> use firstTrue for XiaohongGong > > test/hotspot/jtreg/compiler/vectorization/TestVectorAlgorithms.java line 429: > >> 427: IRNode.ADD_VI, "> 0", >> 428: IRNode.STORE_VECTOR, "> 0"}, >> 429: applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true"}, > > Is this better? > > Suggestion: > > applyIfCPUFeatureOr = {"avx512", "true", "sve", "true"}, Are you getting failures with `sse4.1` or `asimd`? Or what is the reason for weakening the conditions here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2711772360 From epeter at openjdk.org Wed Jan 21 10:01:08 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jan 2026 10:01:08 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v12] In-Reply-To: References: Message-ID: > This is an exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. > > Note: for now they are all `int` based. 
And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) > > **Discussion** > > Observations: > - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. > - If the Vector API code can be fully intrinsified, we get fast code. But somtimes, the Vector API is horribly slow, much slower than scalar loop performance. > - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow > - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow > - `linux_x64_oci_server`: Vector API leads to really nice speedups > - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks > - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. > - Compact Object Headers has some negative effect on some loop benchmarks. > - `linux_aarch64_server`: `reduceAddI`, `copyI` > - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` > - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? > - `windows_x64_oci_server`: `reduceAddI` and some others a little bit > - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` > - Intrinsics can be much faster than auto vectoirzed or Vector API code. > - `linux_aarch64_server`: `copyI` > - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. > - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact, I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). > > **Benchmark Plots** > > Units: nanoseconds per algorithm invocation. > > Note: the `aarch64` machines all only have `NEON` support. Performance may be much better on `SVE`, I have not benchmarked that yet. 
> > `linux_x64_oci` > algo_linux_x64_oci_server > > `windows_x64_oci` > algo_windows_x64_oci_server> 464: } >> 465: trace_vector(operation); > > So why not adding this under line-475? Only added trace_vector calls for primary vector IR for an inline expander for now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29265#discussion_r2711832664 From epeter at openjdk.org Wed Jan 21 10:09:55 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jan 2026 10:09:55 GMT Subject: RFR: 8375498: [VectorAPI] Dump primary vector IR details with -XX:+TraceNewVectors [v2] In-Reply-To: <9gG4XLzdFYYAiZocLBepqB4uQrZugCVb6j_pOBloKjI=.323d5617-d06e-49f9-9559-03f1356904b2@github.com> References: <9gG4XLzdFYYAiZocLBepqB4uQrZugCVb6j_pOBloKjI=.323d5617-d06e-49f9-9559-03f1356904b2@github.com> Message-ID: On Wed, 21 Jan 2026 10:00:12 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 465: >> >>> 463: default: fatal("unsupported arity: %d", n); >>> 464: } >>> 465: trace_vector(operation); >> >> So why not adding this under line-475? > > Only added trace_vector calls for primary vector IR for an inline expander for now. But the `VectorBlendNode` node would also be useful if we have a masked operation but don't use a predicated instruction, don't you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29265#discussion_r2711854763 From erfang at openjdk.org Wed Jan 21 10:15:24 2026 From: erfang at openjdk.org (Eric Fang) Date: Wed, 21 Jan 2026 10:15:24 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v3] In-Reply-To: References: Message-ID: > This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance. > > Changes: > -------- > > 1. C2 mid-end: > - Added UMinReductionVNode and UMaxReductionVNode > > 2. 
AArch64 Backend: > - Added uminp/umaxp/sve_uminv/sve_umaxv instructions > - Updated match rules for all vector sizes and element types > - Both NEON and SVE implementations are supported > > 3. Test: > - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java > - Added assembly tests in aarch64-asmtest.py for new instructions > - Added a JTReg test file VectorUMinMaxReductionTest.java > > Different configurations were tested on aarch64 and x86 machines, and all tests passed. > > Test results of JMH benchmarks from the panama-vector project: > -------- > > On an Nvidia Grace machine with 128-bit SVE:
>
> Benchmark                       Unit     Before   Error     After   Error  Uplift (After/Before)
> Byte128Vector.UMAXLanes         ops/ms   411.60   42.18  25226.51   33.92   61.29
> Byte128Vector.UMAXMaskedLanes   ops/ms   558.56   85.12  25182.90   28.74   45.09
> Byte128Vector.UMINLanes         ops/ms   645.58  780.76  28396.29  103.11   43.99
> Byte128Vector.UMINMaskedLanes   ops/ms   621.09  718.27  26122.62   42.68   42.06
> Byte64Vector.UMAXLanes          ops/ms   296.33   34.44  14357.74   15.95   48.45
> Byte64Vector.UMAXMaskedLanes    ops/ms   376.54   44.01  14269.24   21.41   37.90
> Byte64Vector.UMINLanes          ops/ms   373.45  426.51  15425.36   66.20   41.31
> Byte64Vector.UMINMaskedLanes    ops/ms   353.32  346.87  14201.37   13.79   40.19
> Int128Vector.UMAXLanes          ops/ms   174.79  192.51   9906.07  286.93   56.67
> Int128Vector.UMAXMaskedLanes    ops/ms   157.23  206.68  10246.77   11.44   65.17
> Int64Vector.UMAXLanes           ops/ms    95.30  126.49   4719.30   98.57   49.52
> Int64Vector.UMAXMaskedLanes     ops/ms    88.19   87.44   4693.18   19.76   53.22
> Long128Vector.UMAXLanes         ops/ms    80.62   97.82   5064.01   35.52   62.82
> Long128Vector.UMAXMaskedLanes   ops/ms    78.15  102.91   5028.24    8.74   64.34
> Long64Vector.UMAXLanes          ops/ms    47.56   62.01     46.76   52.28    0.98
> Long64Vector.UMAXMaskedLanes    ops/ms    45.44   46.76     45.79   42.91    1.01
> Short128Vector.UMAXLanes        ops/ms   316.65  410.30  14814.82   23.65   46.79
> Short128Vector.UMAXMaskedLanes  ops/ms   308.90  351.78  15155.26   31.03   49.06
> Sh...
Eric Fang has updated the pull request incrementally with one additional commit since the last revision: Extract some helper functions for better readability ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28693/files - new: https://git.openjdk.org/jdk/pull/28693/files/481c3ee6..fc3dee3d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28693&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28693&range=01-02 Stats: 120 lines in 2 files changed: 95 ins; 10 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/28693.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28693/head:pull/28693 PR: https://git.openjdk.org/jdk/pull/28693 From erfang at openjdk.org Wed Jan 21 10:15:27 2026 From: erfang at openjdk.org (Eric Fang) Date: Wed, 21 Jan 2026 10:15:27 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v2] In-Reply-To: References: <3HfqoF6XNWDXq8P95PQ78B1_QquFMPDTkcuXPbmybNs=.cc8fd652-9949-4a0d-bf18-76cad5aac332@github.com> Message-ID: On Tue, 20 Jan 2026 19:23:38 GMT, Andrew Haley wrote: >> Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - Rebase commit 56d7b52 >> - Merge branch 'master' into JDK-8372980-umin-umax-intrinsic >> - 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations >> >> This patch adds intrinsic support for UMIN and UMAX reduction operations >> in the Vector API on AArch64, enabling direct hardware instruction mapping >> for better performance. >> >> Changes: >> -------- >> >> 1. C2 mid-end: >> - Added UMinReductionVNode and UMaxReductionVNode >> >> 2. AArch64 Backend: >> - Added uminp/umaxp/sve_uminv/sve_umaxv instructions >> - Updated match rules for all vector sizes and element types >> - Both NEON and SVE implementation are supported >> >> 3. 
Test: >> - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java >> - Added assembly tests in aarch64-asmtest.py for new instructions >> - Added a JTReg test file VectorUMinMaxReductionTest.java >> >> Different configurations were tested on aarch64 and x86 machines, and >> all tests passed. >> >> Test results of JMH benchmarks from the panama-vector project: >> -------- >> >> On an Nvidia Grace machine with 128-bit SVE:
>> ```
>> Benchmark                       Unit     Before   Error     After   Error  Uplift (After/Before)
>> Byte128Vector.UMAXLanes         ops/ms   411.60   42.18  25226.51   33.92   61.29
>> Byte128Vector.UMAXMaskedLanes   ops/ms   558.56   85.12  25182.90   28.74   45.09
>> Byte128Vector.UMINLanes         ops/ms   645.58  780.76  28396.29  103.11   43.99
>> Byte128Vector.UMINMaskedLanes   ops/ms   621.09  718.27  26122.62   42.68   42.06
>> Byte64Vector.UMAXLanes          ops/ms   296.33   34.44  14357.74   15.95   48.45
>> Byte64Vector.UMAXMaskedLanes    ops/ms   376.54   44.01  14269.24   21.41   37.90
>> Byte64Vector.UMINLanes          ops/ms   373.45  426.51  15425.36   66.20   41.31
>> Byte64Vector.UMINMaskedLanes    ops/ms   353.32  346.87  14201.37   13.79   40.19
>> Int128Vector.UMAXLanes          ops/ms   174.79  192.51   9906.07  286.93   56.67
>> Int128Vector.UMAXMaskedLanes    ops/ms   157.23  206.68  10246.77   11.44   65.17
>> Int64Vector.UMAXLanes           ops/ms    95.30  126.49   4719.30   98.57   49.52
>> Int64Vector.UMAXMaskedLanes     ops/ms    88.19   87.44   4693.18   19.76   53.22
>> Long128Vector.UMAXLanes         ops/ms    80.62   97.82   5064.01   35.52   62.82
>> Long128Vector.UMAXMaskedLanes   ops/ms    78.15  102.91   5028.24    8.74   64.34
>> Long64Vector.UMAXLanes          ops/ms    47.56   62.01     46.76   52.28    0.98
>> Long64V...
> > I'm sorry, I _completely_ overthought that one. All you need are definitions for `min[vp]` and `max[vp]` in C2_MacroAssembler. > > Like so: > > `void minv(bool is_unsigned, ...) { if (is_unsigned) { uminv(... } else { sminv(... } }` > > No need to mess with class `Assembler`.
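As a reference point for what the UMIN/UMAX lane reductions compute, here is a plain-Java scalar sketch. It is not the JDK's implementation, and the class and method names are hypothetical. The `isUnsigned` flag mirrors, at the Java level, the single-helper dispatch suggested above for `C2_MacroAssembler` (one `minv` choosing between the unsigned and signed instruction). Unlike floating-point add, integer min/max is associative and commutative, so any lane-combining order gives the same result.

```java
// Hypothetical scalar reference for signed/unsigned min/max lane reductions
// over int lanes. Illustrative only; not how C2 or the Vector API implement them.
public final class MinMaxReduction {

    // One helper that picks unsigned or signed comparison, in the spirit of a
    // minv(is_unsigned, ...) wrapper choosing between uminv and sminv.
    static int min(boolean isUnsigned, int a, int b) {
        if (isUnsigned) {
            return Integer.compareUnsigned(a, b) <= 0 ? a : b;
        }
        return Math.min(a, b);
    }

    static int max(boolean isUnsigned, int a, int b) {
        if (isUnsigned) {
            return Integer.compareUnsigned(a, b) >= 0 ? a : b;
        }
        return Math.max(a, b);
    }

    // Sequential reductions; because min/max are associative, a pairwise
    // (tree-shaped) combination of lanes would produce the same value.
    static int reduceMin(boolean isUnsigned, int[] lanes) {
        int acc = isUnsigned ? -1 : Integer.MAX_VALUE; // -1 == 0xFFFFFFFF, the unsigned maximum
        for (int v : lanes) {
            acc = min(isUnsigned, acc, v);
        }
        return acc;
    }

    static int reduceMax(boolean isUnsigned, int[] lanes) {
        int acc = isUnsigned ? 0 : Integer.MIN_VALUE; // 0 is the unsigned minimum
        for (int v : lanes) {
            acc = max(isUnsigned, acc, v);
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] lanes = {3, -1, 7, 0};
        System.out.println(reduceMax(false, lanes)); // 7
        System.out.println(reduceMax(true, lanes));  // -1 (0xFFFFFFFF is the largest unsigned value)
        System.out.println(reduceMin(false, lanes)); // -1
        System.out.println(reduceMin(true, lanes));  // 0
    }
}
```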
@theRealAph I have made the change; please help take another look, thanks~ ------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3777257496 From jbhateja at openjdk.org Wed Jan 21 10:33:09 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 21 Jan 2026 10:33:09 GMT Subject: RFR: 8375498: [VectorAPI] Dump primary vector IR details with -XX:+TraceNewVectors [v3] In-Reply-To: References: Message-ID: > Patch to add support for dumping primary vector IR created by various VectorAPI inline expanders. > Currently auto-vectorization prints the newly created vector IR with -XX:+TraceNewVectors; a similar message is > now emitted for VectorAPI. > > > TraceNewVectors [AutoVectorization]: 1397 AddVI === _ 1395 1396 [[ ]] #vectorx > > TraceNewVectors [VectorAPI]: 1591 AddVI === _ 1545 1569 [[ ]] #vectory !jvms: IntVector::lanewiseTemplate @ bci:154 (line 798) Int256Vector::lanewise @ bci:3 (line 278) Int256Vector::lanewise @ bci:3 (line 43) IntVector::add @ bci:5 (line 1380) AddTestI::workload @ bci:34 (line 18) > > > Thanks, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding more tracing calls ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29265/files - new: https://git.openjdk.org/jdk/pull/29265/files/5927b475..397269b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29265&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29265&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29265.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29265/head:pull/29265 PR: https://git.openjdk.org/jdk/pull/29265 From jbhateja at openjdk.org Wed Jan 21 10:33:11 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 21 Jan 2026 10:33:11 GMT Subject: RFR: 8375498: [VectorAPI] Dump primary vector IR details with -XX:+TraceNewVectors [v3] In-Reply-To: References:
<9gG4XLzdFYYAiZocLBepqB4uQrZugCVb6j_pOBloKjI=.323d5617-d06e-49f9-9559-03f1356904b2@github.com> Message-ID: On Wed, 21 Jan 2026 10:05:57 GMT, Emanuel Peter wrote: >> Only added trace_vector calls for primary vector IR for an inline expander for now. > > But the `VectorBlendNode` node would also be useful if we have a masked operation but don't use a predicated instruction, don't you think? No objection here, included! Again, we can always increase the coverage. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29265#discussion_r2711936759 From jbhateja at openjdk.org Wed Jan 21 10:43:09 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 21 Jan 2026 10:43:09 GMT Subject: RFR: 8375498: [VectorAPI] Dump primary vector IR details with -XX:+TraceNewVectors [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 08:02:15 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding more tracing calls > > @jatin-bhateja Nice, thanks for working on this! I've been missing this feature for a while :) > > Question: how did you verify that you cover all cases? I think some are missing, I found some just by scrolling down in the file a bit: > https://github.com/openjdk/jdk/pull/29265/files#diff-33d0866101d899687e04303fb2232574f2cb796ce060528a243ebdc9903b01b1R3012-R3018 > > Could we have some way to verify that all vector nodes are traced in some way? It is just so easy to forget some. We can also file a separate RFE for that. > > I'll still approve it because it already is a step in the right direction :) @eme64, please verify and re-approve.
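Emanuel's point about `VectorBlendNode` concerns masked operations on hardware without predicated instructions, where a masked lanewise operation can be computed as the unmasked operation followed by a blend. The scalar sketch below models that idea for `int` lanes. It assumes the Vector API's masked-lanewise semantics, in which lanes unset in the mask keep the first operand's value. The class and method names are hypothetical, and this is not the C2 code. The lowering is only safe for operations that cannot fault in unset lanes; integer division, for example, would presumably need extra care.

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;

// Hypothetical scalar model of "masked op == unmasked op + blend".
// A sketch of the idea only; not the C2 implementation.
public final class MaskedAsBlend {

    // blend(a, b, m): lane i takes b[i] where m[i] is set, otherwise a[i].
    static int[] blend(int[] a, int[] b, boolean[] m) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = m[i] ? b[i] : a[i];
        }
        return r;
    }

    // Unmasked lanewise op applied to every lane.
    static int[] lanewise(IntBinaryOperator op, int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = op.applyAsInt(a[i], b[i]);
        }
        return r;
    }

    // Masked op without predication: compute all lanes, then keep a[i] in unset lanes.
    static int[] maskedLanewise(IntBinaryOperator op, int[] a, int[] b, boolean[] m) {
        return blend(a, lanewise(op, a, b), m);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {10, 20, 30, 40};
        boolean[] m = {true, false, true, false};
        System.out.println(Arrays.toString(maskedLanewise(Integer::sum, a, b, m))); // [11, 2, 33, 4]
    }
}
```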
------------- PR Comment: https://git.openjdk.org/jdk/pull/29265#issuecomment-3777391790 From epeter at openjdk.org Wed Jan 21 10:43:15 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jan 2026 10:43:15 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: <5o-TOPc_uS09RV2X544hVOK-06mYnbCX6M9ng80UMRM=.67450359-509e-47a3-85ba-b973551b160d@github.com> On Tue, 13 Jan 2026 11:27:53 GMT, Fei Gao wrote: >> In C2's loop optimization, for a counted loop, if we have any of these conditions (RCE, unrolling) met, we switch to the >> `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running. For example, if a main loop is unrolled to `8x`, the main loop guard tests if the loop has less than `8` iterations and then decides which way to go. >> >> Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is going to be further multiplied. After the main loop is super-unrolled, the minimum trip guard test will be updated. Assuming one vector can process `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test if the remaining trip count is less than `8 * 4 = 32`. >> >> To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. And, both trip guards of the main loop and the vectorized drain loop jump to the scalar post loop.
>> >> The problem here is, if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed. Because the minimum trip guard test of the main loop fails, the execution will jump over both the main loop and the vectorized drain loop. For example, in the above case, if a loop still has `25` iterations after the pre-loop, we could run `3` rounds of the vectorized drain loop, but that path is never taken. It would be better if the minimum trip guard test of the main loop does not jump over the vectorized drain loop. >> >> This patch improves this by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change of control flow. >> >> The whole process is done by the function `insert_post_loop()`. >> >> We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop into the vectorized drain loop with mode `InsertVectorizedDrain`: >> >> 1. The fall-in control flow to the vectorized drain loop comes fr... > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Fix build failure after rebasing and address review comments > - Merge branch 'master' into optimize-atomic-post > - Fixed new test failures after rebasing and refined parts of the code to address review comments > - Merge branch 'master' into optimize-atomic-post > - Merge branch 'master' into optimize-atomic-post > - Clean up comments for consistency and add spacing for readability > - Fix some corner case failures and refined part of code > - Merge branch 'master' into optimize-atomic-post > - Refine ascii art, rename some variables and resolve conflicts > - Merge branch 'master' into optimize-atomic-post > - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 Some comments about the benchmark.
BTW: my regular testing has passed. Now I'll look at some basic benchmarks. test/micro/org/openjdk/bench/vm/compiler/VectorThroughputForIterationCount.java line 136: > 134: // effects of this patch unobservable. > 135: @Param({"true", "false"}) > 136: public static boolean ENABLE_LARGE_LOOP_WARMUP; It would be nice to have some more comments here: - for which benchmarks would the effect of "this patch" not be observable? Also: referring to "this patch" will require a future reader to trace things back in the "git blame" history, that's a bit unfortunate. - Generally, it would now be nice to have a summary of which types of benchmarks show what kind of results, and why do we have all the variants. test/micro/org/openjdk/bench/vm/compiler/VectorThroughputForIterationCount.java line 282: > 280: byteadd(aB, bB, rB, START_IDX, offsets[r]+ITERATION_COUNT); > 281: } > 282: } Why do you name them `drain`? I feel the name is a bit too specific to "this patch". Do you have a better name? Maybe a name that separates them from `bench011B_aligned_memoryBound`? 
------------- PR Review: https://git.openjdk.org/jdk/pull/22629#pullrequestreview-3686181773 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2711958455 PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2711965275 From epeter at openjdk.org Wed Jan 21 10:43:16 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jan 2026 10:43:16 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: <5o-TOPc_uS09RV2X544hVOK-06mYnbCX6M9ng80UMRM=.67450359-509e-47a3-85ba-b973551b160d@github.com> References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <5o-TOPc_uS09RV2X544hVOK-06mYnbCX6M9ng80UMRM=.67450359-509e-47a3-85ba-b973551b160d@github.com> Message-ID: On Wed, 21 Jan 2026 10:34:03 GMT, Emanuel Peter wrote: >> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: >> >> - Fix build failure after rebasing and address review comments >> - Merge branch 'master' into optimize-atomic-post >> - Fixed new test failures after rebasing and refined parts of the code to address review comments >> - Merge branch 'master' into optimize-atomic-post >> - Merge branch 'master' into optimize-atomic-post >> - Clean up comments for consistency and add spacing for readability >> - Fix some corner case failures and refined part of code >> - Merge branch 'master' into optimize-atomic-post >> - Refine ascii art, rename some variables and resolve conflicts >> - Merge branch 'master' into optimize-atomic-post >> - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 > > test/micro/org/openjdk/bench/vm/compiler/VectorThroughputForIterationCount.java line 136: > >> 134: // effects of this patch unobservable. 
>> 135: @Param({"true", "false"}) >> 136: public static boolean ENABLE_LARGE_LOOP_WARMUP; > > It would be nice to have some more comments here: > - for which benchmarks would the effect of "this patch" not be observable? Also: referring to "this patch" will require a future reader to trace things back in the "git blame" history, that's a bit unfortunate. > - Generally, it would now be nice to have a summary of which types of benchmarks show what kind of results, and why do we have all the variants. I'm asking for more comments because I fear the benchmark is becoming harder to use, with all the extra options and benchmark variants. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2711970327 From epeter at openjdk.org Wed Jan 21 10:52:48 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jan 2026 10:52:48 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: <8huN5sDf2y95Hq2iuaMXN7aLeSik_gUnHcSpcc82Exw=.38fd6510-3fcd-4a28-a1c3-29eb18f51724@github.com> References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <8huN5sDf2y95Hq2iuaMXN7aLeSik_gUnHcSpcc82Exw=.38fd6510-3fcd-4a28-a1c3-29eb18f51724@github.com> Message-ID: <0CusqXskKxfU0Cqxr0s1Mrnuu_L-bAtz-I7ehpKyERA=.fb47c428-a70a-410a-a2b1-8998e313988c@github.com> On Tue, 20 Jan 2026 17:04:27 GMT, Fei Gao wrote: >> Thanks for the explanations! They sound reasonable to me. Though eventually it would be good if @chhagedorn or @rwestrel looked at this, they are more familiar with this code. >> >> One more question here: could it be that one node that you now conservatively pin further down actually already has a use in a predicate further up, and now we'd create a `bad graph` cycle? > >> One more question here: could it be that one node that you now conservatively pin further down actually already has a use in a predicate further up, and now we'd create a `bad graph` cycle? 
> > If a node has a `use` that is attached to a predicate further up, then that `use` would also be pinned down to the loop `entry control`. Since we also fix the control of the `use`, which is itself a cloned node, I would expect that we wouldn't end up creating a bad control-flow cycle. Does that make sense? But what if such a predicate uses the node as an input? Then the node is pinned below its use. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2712018943 From epeter at openjdk.org Wed Jan 21 10:56:20 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jan 2026 10:56:20 GMT Subject: RFR: 8375498: [VectorAPI] Dump primary vector IR details with -XX:+TraceNewVectors [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 10:33:09 GMT, Jatin Bhateja wrote: >> Patch to add support for dumping primary vector IR created by various VectorAPI inline expanders. >> Currently auto-vectorization prints the newly created vector IR with -XX:+TraceNewVectors; a similar message is >> now emitted for VectorAPI. >> >> >> TraceNewVectors [AutoVectorization]: 1397 AddVI === _ 1395 1396 [[ ]] #vectorx >> >> TraceNewVectors [VectorAPI]: 1591 AddVI === _ 1545 1569 [[ ]] #vectory !jvms: IntVector::lanewiseTemplate @ bci:154 (line 798) Int256Vector::lanewise @ bci:3 (line 278) Int256Vector::lanewise @ bci:3 (line 43) IntVector::add @ bci:5 (line 1380) AddTestI::workload @ bci:34 (line 18) >> >> >> Thanks, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding more tracing calls Still looks good :) ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/29265#pullrequestreview-3686567465 From jbhateja at openjdk.org Wed Jan 21 11:23:02 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 21 Jan 2026 11:23:02 GMT Subject: RFR: 8375498: [VectorAPI] Dump primary vector IR details with -XX:+TraceNewVectors [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 10:33:09 GMT, Jatin Bhateja wrote: >> Patch to add support for dumping primary vector IR created by various VectorAPI inline expanders. >> Currently auto-vectorization prints the newly created vector IR with -XX:+TraceNewVectors; a similar message is >> now emitted for VectorAPI. >> >> >> TraceNewVectors [AutoVectorization]: 1397 AddVI === _ 1395 1396 [[ ]] #vectorx >> >> TraceNewVectors [VectorAPI]: 1591 AddVI === _ 1545 1569 [[ ]] #vectory !jvms: IntVector::lanewiseTemplate @ bci:154 (line 798) Int256Vector::lanewise @ bci:3 (line 278) Int256Vector::lanewise @ bci:3 (line 43) IntVector::add @ bci:5 (line 1380) AddTestI::workload @ bci:34 (line 18) >> >> >> Thanks, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding more tracing calls Thanks @XiaohongGong and @eme64 for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29265#issuecomment-3777551413 From jbhateja at openjdk.org Wed Jan 21 11:23:06 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 21 Jan 2026 11:23:06 GMT Subject: Integrated: 8375498: [VectorAPI] Dump primary vector IR details with -XX:+TraceNewVectors In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 08:27:54 GMT, Jatin Bhateja wrote: > Patch to add support for dumping primary vector IR created by various VectorAPI inline expanders. > Currently auto-vectorization prints the newly created vector IR with -XX:+TraceNewVectors; a similar message is > now emitted for VectorAPI.
> > > TraceNewVectors [AutoVectorization]: 1397 AddVI === _ 1395 1396 [[ ]] #vectorx > > TraceNewVectors [VectorAPI]: 1591 AddVI === _ 1545 1569 [[ ]] #vectory !jvms: IntVector::lanewiseTemplate @ bci:154 (line 798) Int256Vector::lanewise @ bci:3 (line 278) Int256Vector::lanewise @ bci:3 (line 43) IntVector::add @ bci:5 (line 1380) AddTestI::workload @ bci:34 (line 18) > > > Thanks, > Jatin This pull request has now been integrated. Changeset: 983ae96f Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/983ae96f60c935aa52f482d21ae6a0d947679541 Stats: 45 lines in 1 file changed: 9 ins; 0 del; 36 mod 8375498: [VectorAPI] Dump primary vector IR details with -XX:+TraceNewVectors Reviewed-by: epeter ------------- PR: https://git.openjdk.org/jdk/pull/29265 From duke at openjdk.org Wed Jan 21 12:22:23 2026 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 21 Jan 2026 12:22:23 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression can be simplified on aarch64 In-Reply-To: References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: On Mon, 12 Jan 2026 10:37:31 GMT, Andrew Dinn wrote: >> The preconditions for the aarch64 and the AVX-512 intrinsic implementations of the implKyber12To16() method of com.sun.crypto.provider.ML_KEM are different and the AVX-512 one has stricter preconditions on the input, which was not recorded in the assert() before calling the function (although they were satisfied by all calling code). Now the assert() is corrected, and with these preconditions, the aarch64 implementation is simplified. > > Changes look good. What testing have you run? Thanks! @adinn would you be so kind as to /sponsor the integration? 
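For context on the kind of work `implKyber12To16()` does: in ML-KEM (FIPS 203 `ByteDecode_12`), every 3 input bytes encode two little-endian 12-bit coefficients, which are widened to 16-bit values. The scalar sketch below shows that unpacking. The class and method names are hypothetical. The sketch does not reproduce the JDK method's actual signature, its index arguments, or the stricter input preconditions discussed above.

```java
// Hypothetical scalar sketch of 12-bit -> 16-bit unpacking as in ML-KEM
// ByteDecode_12; not the JDK's ML_KEM.implKyber12To16 signature or code.
public final class Unpack12 {

    static short[] unpack12To16(byte[] in) {
        if (in.length % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        short[] out = new short[in.length / 3 * 2];
        for (int i = 0, j = 0; i < in.length; i += 3, j += 2) {
            int b0 = in[i] & 0xFF;       // & 0xFF undoes Java's byte sign extension
            int b1 = in[i + 1] & 0xFF;
            int b2 = in[i + 2] & 0xFF;
            out[j]     = (short) (b0 | ((b1 & 0x0F) << 8)); // first 12-bit value
            out[j + 1] = (short) ((b1 >>> 4) | (b2 << 4));  // second 12-bit value
        }
        return out;
    }

    public static void main(String[] args) {
        // 0xABC and 0x123 packed little-endian: bytes 0xBC, 0x3A, 0x12
        byte[] packed = {(byte) 0xBC, (byte) 0x3A, (byte) 0x12};
        short[] v = unpack12To16(packed);
        System.out.printf("%03X %03X%n", v[0], v[1]); // ABC 123
    }
}
```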
------------- PR Comment: https://git.openjdk.org/jdk/pull/29141#issuecomment-3777775558 From duke at openjdk.org Wed Jan 21 12:22:24 2026 From: duke at openjdk.org (duke) Date: Wed, 21 Jan 2026 12:22:24 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression can be simplified on aarch64 [v3] In-Reply-To: References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: On Mon, 19 Jan 2026 14:01:56 GMT, Ferenc Rakoczi wrote: >> The preconditions for the aarch64 and the AVX-512 intrinsic implementations of the implKyber12To16() method of com.sun.crypto.provider.ML_KEM are different and the AVX-512 one has stricter preconditions on the input, which was not recorded in the assert() before calling the function (although they were satisfied by all calling code). Now the assert() is corrected, and with these preconditions, the aarch64 implementation is simplified. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > improve comment @ferakocz Your change (at version da86c0bbdba0ec17891621c391cb8cc142dca93f) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29141#issuecomment-3777794081 From dbriemann at openjdk.org Wed Jan 21 13:11:52 2026 From: dbriemann at openjdk.org (David Briemann) Date: Wed, 21 Jan 2026 13:11:52 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v2] In-Reply-To: References: Message-ID: > Adds the following mach nodes: > match(Set dst (CMoveF (Binary cop (CmpF op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveD (Binary cop (CmpD op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveF (Binary cop (CmpD op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveD (Binary cop (CmpF op1 op2)) (Binary src1 src2))); David Briemann has updated the pull request incrementally with one additional commit since the last revision: address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29281/files - new: https://git.openjdk.org/jdk/pull/29281/files/497d0733..6553a246 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29281&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29281&range=00-01 Stats: 15 lines in 4 files changed: 3 ins; 9 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/29281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29281/head:pull/29281 PR: https://git.openjdk.org/jdk/pull/29281 From epeter at openjdk.org Wed Jan 21 13:28:03 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jan 2026 13:28:03 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Tue, 13 Jan 2026 11:27:53 GMT, Fei Gao wrote: >> In C2's loop optimization, for a counted loop, if we have any of these conditions (RCE, unrolling) met, we switch to the >> `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops. 
Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running. For example, if a main loop is unrolled to `8x`, the main loop guard tests if the loop has less than `8` iterations and then decides which way to go. >> >> Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is going to be further multiplied. After the main loop is super-unrolled, the minimum trip guard test will be updated. Assuming one vector can process `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test if the remaining trip count is less than `8 * 4 = 32`. >> >> To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. And, both trip guards of the main loop and the vectorized drain loop jump to the scalar post loop. >> >> The problem here is, if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed. Because the minimum trip guard test of the main loop fails, the execution will jump over both the main loop and the vectorized drain loop. For example, in the above case, if a loop still has `25` iterations after the pre-loop, we could run `3` rounds of the vectorized drain loop, but that path is never taken. It would be better if the minimum trip guard test of the main loop does not jump over the vectorized drain loop. >> >> This patch improves this by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change of control flow.
>> >> The whole process is done by the function `insert_post_loop()`. >> >> We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop into the vectorized drain loop with mode `InsertVectorizedDrain`: >> >> 1. The fall-in control flow to the vectorized drain loop comes fr... > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Fix build failure after rebasing and address review comments > - Merge branch 'master' into optimize-atomic-post > - Fixed new test failures after rebasing and refined parts of the code to address review comments > - Merge branch 'master' into optimize-atomic-post > - Merge branch 'master' into optimize-atomic-post > - Clean up comments for consistency and add spacing for readability > - Fix some corner case failures and refined part of code > - Merge branch 'master' into optimize-atomic-post > - Refine ascii art, rename some variables and resolve conflicts > - Merge branch 'master' into optimize-atomic-post > - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 I just ran the `bench001B_aligned_computeBound` benchmark on my `AVX512` machine, and realized that (as I think you tried to say) the PR here has no effect on it: [benchmark plot] That's a bit of a bummer :/ I'd have to do some more digging to confirm what you said: that this is because of profiling, i.e. that we don't actually unroll the loop enough and don't insert the drain loop, right?
------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3778121818 From mdoerr at openjdk.org Wed Jan 21 13:32:03 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 21 Jan 2026 13:32:03 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v2] In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 13:11:52 GMT, David Briemann wrote: >> Adds the following mach nodes: >> match(Set dst (CMoveF (Binary cop (CmpF op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveD (Binary cop (CmpD op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveF (Binary cop (CmpD op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveD (Binary cop (CmpF op1 op2)) (Binary src1 src2))); > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > address review comments LGTM. Thanks! ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29281#pullrequestreview-3687255429 From epeter at openjdk.org Wed Jan 21 13:33:49 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jan 2026 13:33:49 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Tue, 13 Jan 2026 11:27:53 GMT, Fei Gao wrote: >> In C2's loop optimization, for a counted loop, if we have any of these conditions (RCE, unrolling) met, we switch to the >> `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running. 
For example, if a main loop is unrolled `8x`, the main loop guard tests whether the loop has fewer than `8` iterations left and then decides which way to go. >> >> Usually, the vectorized main loop is super-unrolled after vectorization, which multiplies the main loop's stride further. After the main loop is super-unrolled, the minimum trip guard test is updated accordingly. Assuming one vector covers `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop tests whether the remaining trip count is less than `8 * 4 = 32`. >> >> To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted vectorized drain loop also has a minimum trip guard, and when they fail, both the main loop's and the vectorized drain loop's trip guards jump to the scalar post loop. >> >> The problem is that if the remaining trip count on exit from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop is never executed: because the minimum trip guard test of the main loop fails, execution jumps over both the main loop and the vectorized drain loop. For example, in the case above, if a loop still has `25` iterations after the pre-loop, we could run `3` rounds of the vectorized drain loop, but the current control flow makes that impossible. It would be better if the minimum trip guard test of the main loop did not jump over the vectorized drain loop. >> >> This patch improves this by modifying the control flow taken when the minimum trip guard test of the main loop fails. We then need to update all data uses and control uses to match the new control flow. >> >> The whole process is done by the function `insert_post_loop()`. >> >> We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to vectorized drain loop with mode `InsertVectorizedDrain`: >> >> 1.
The fall-in control flow to the vectorized drain loop comes fr... > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Fix build failure after rebasing and address review comments > - Merge branch 'master' into optimize-atomic-post > - Fixed new test failures after rebasing and refined parts of the code to address review comments > - Merge branch 'master' into optimize-atomic-post > - Merge branch 'master' into optimize-atomic-post > - Clean up comments for consistency and add spacing for readability > - Fix some corner case failures and refined part of code > - Merge branch 'master' into optimize-atomic-post > - Refine ascii art, rename some variables and resolve conflicts > - Merge branch 'master' into optimize-atomic-post > - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 It's a bummer because I had initially hoped that this PR would address (at least a part of) the performance regression that vectorization can cause, see https://github.com/openjdk/jdk/pull/27315 (benchmark plot not preserved). You can see that for very small iteration counts, it is faster to disable the auto vectorizer. Some regressions have been filed, for example https://bugs.openjdk.org/browse/JDK-8368245, so it seems we have to investigate options around that separately. That said, it is still worth going ahead with this PR, even if for now we only see a performance impact on benchmarks with a special large-iteration warm-up. But I am worried that if the warm-up uses small iteration counts, we don't generate good vectorized code, and then don't get good performance when the loop later runs with larger iteration counts. I'll have to do some more investigation and experiments.
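The trip-count arithmetic from the PR description (8 lanes per vector, super-unrolled 4x, 25 iterations left after the pre-loop) can be replayed with a small model. This is a hypothetical sketch, not C2 code. It assumes the iterations left after the pre-loop are split greedily into whole super-unrolled main-loop rounds, then whole vector-length drain-loop rounds, then a scalar remainder, and it ignores alignment and profiling effects.

```java
/**
 * Hypothetical model (not C2 code) of how the iterations left after the
 * pre-loop are distributed over the vector main loop, the vectorized drain
 * loop and the scalar post loop.
 */
final class DrainLoopModel {
    /** Number of original loop iterations executed by each loop. */
    record Split(int vectorMain, int vectorDrain, int scalarPost) {}

    /** Old control flow: a failing main-loop guard jumps straight to the scalar post loop. */
    static Split before(int remaining, int vlen, int superUnroll) {
        int mainStride = vlen * superUnroll;
        if (remaining < mainStride) {
            return new Split(0, 0, remaining);
        }
        return greedy(remaining, vlen, mainStride);
    }

    /** New control flow: a failing main-loop guard falls through to the drain-loop guard. */
    static Split after(int remaining, int vlen, int superUnroll) {
        return greedy(remaining, vlen, vlen * superUnroll);
    }

    private static Split greedy(int remaining, int vlen, int mainStride) {
        int mainIters = (remaining / mainStride) * mainStride; // whole super-unrolled rounds
        int left = remaining - mainIters;
        int drainIters = (left / vlen) * vlen;                 // whole vector rounds
        return new Split(mainIters, drainIters, left - drainIters); // scalar remainder
    }

    public static void main(String[] args) {
        // 8 lanes per vector, super-unrolled 4x, 25 iterations left after the pre-loop.
        System.out.println("before: " + before(25, 8, 4)); // Split[vectorMain=0, vectorDrain=0, scalarPost=25]
        System.out.println("after:  " + after(25, 8, 4));  // Split[vectorMain=0, vectorDrain=24, scalarPost=1]
    }
}
```

In this model only trip counts below the main loop's stride behave differently: with 25 iterations left, the old flow runs all of them in the scalar post loop, while the new flow runs three vector rounds (24 iterations) in the drain loop and a single scalar iteration.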
------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3778148309 From chagedorn at openjdk.org Wed Jan 21 13:39:16 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 Jan 2026 13:39:16 GMT Subject: RFR: 8375618: Incorrect assert in CastLLNode::Ideal [v2] In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 02:53:31 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the incorrect assert in `CastLLNode::Ideal`. The assert intends to verify that the output is either a proper subset of or the same as the input. It does so by checking the signed lower bound and signed upper bound of the `TypeLong` instances. This method is not correct now. >> >> Please kindly review, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Make properly_contains debug only The fix looks good to me, thanks! src/hotspot/share/opto/type.cpp line 1820: > 1818: > 1819: #ifdef ASSERT > 1820: bool TypeInt::properly_contains(const TypeInt* t) const { Is `strictly_contains` easier to understand? But from a set theory perspective, "proper" and "strict" are both correct. ------------- Marked as reviewed by chagedorn (Reviewer). 
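Why comparing only the signed bounds cannot decide containment can be shown with a toy model. This is a hypothetical simplification, not HotSpot's `TypeLong`: it assumes a type is the set of values lying inside both a signed range and an unsigned range, and it ignores the known-bits component that `TypeLong` also tracks.

```java
/**
 * Hypothetical, simplified model of a long type: the set of values inside
 * both a signed range [lo, hi] and an unsigned range [ulo, uhi].
 * Not HotSpot's TypeLong, which additionally tracks known bits.
 */
final class BoundsModel {
    final long lo, hi;   // signed bounds
    final long ulo, uhi; // unsigned bounds

    BoundsModel(long lo, long hi, long ulo, long uhi) {
        this.lo = lo;
        this.hi = hi;
        this.ulo = ulo;
        this.uhi = uhi;
    }

    boolean contains(long v) {
        return lo <= v && v <= hi
                && Long.compareUnsigned(ulo, v) <= 0
                && Long.compareUnsigned(v, uhi) <= 0;
    }

    /** A check that only looks at the signed bounds. */
    boolean sameSignedBounds(BoundsModel other) {
        return lo == other.lo && hi == other.hi;
    }

    public static void main(String[] args) {
        // Signed range [-1, 1] with the full unsigned range: {-1, 0, 1}.
        BoundsModel wide = new BoundsModel(-1, 1, 0, -1L);
        // Same signed range, but unsigned range [1, 2^64 - 1] excludes 0: {-1, 1}.
        BoundsModel narrow = new BoundsModel(-1, 1, 1, -1L);
        System.out.println(wide.sameSignedBounds(narrow));                // true
        System.out.println(wide.contains(0) + " " + narrow.contains(0)); // true false
    }
}
```

`wide` and `narrow` have identical signed bounds, yet `narrow` is a strict subset of `wide`, so a containment check in such a model has to take every component of the type into account.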
PR Review: https://git.openjdk.org/jdk/pull/29304#pullrequestreview-3687284998 PR Review Comment: https://git.openjdk.org/jdk/pull/29304#discussion_r2712605987 From adinn at openjdk.org Wed Jan 21 13:39:51 2026 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 21 Jan 2026 13:39:51 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression can be simplified on aarch64 [v3] In-Reply-To: References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: On Mon, 19 Jan 2026 14:01:56 GMT, Ferenc Rakoczi wrote: >> The preconditions for the aarch64 and the AVX-512 intrinsic implementations of the implKyber12To16() method of com.sun.crypto.provider.ML_KEM are different and the AVX-512 one has stricter preconditions on the input, which was not recorded in the assert() before calling the function (although they were satisfied by all calling code). Now the assert() is corrected, and with these preconditions, the aarch64 implementation is simplified. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > improve comment I believe this needs a second sign-off. @theRealAph can you do the honours? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29141#issuecomment-3778177332 From ghan at openjdk.org Wed Jan 21 13:55:52 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Wed, 21 Jan 2026 13:55:52 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v4] In-Reply-To: References: Message-ID: <17qmKHGiPNSNpHn2By-gtJjPpwFFDj98wiY-bQG16nY=.4df59ed3-8120-496b-9a1e-06b80c9149ae@github.com> On Wed, 21 Jan 2026 09:11:54 GMT, Emanuel Peter wrote: >> Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: >> >> - revert >> - Merge remote-tracking branch 'upstream/master' into 8374807 >> - narrow lock scope >> - Merge remote-tracking branch 'upstream/master' into 8374807 >> - split long line >> - Merge remote-tracking branch 'upstream/master' into 8374807 >> - fix 8374807 > > test/hotspot/jtreg/compiler/uncommontrap/TestPrintDiagnosticsWithoutProfileTraps.java line 32: > >> 30: * @run main/othervm -XX:+TraceDeoptimization -XX:-ProfileTraps >> 31: * -XX:-TieredCompilation -Xcomp >> 32: * compiler.uncommontrap.TestPrintDiagnosticsWithoutProfileTraps > > `TraceDeoptimization` is a diagnostic flag. This test will cause issues without `-XX:+UnlockDiagnosticVMOptions`, right? And `ProfileTraps` is debug. So won't this need `-XX:+IgnoreUnrecognizedVMOptions`? > > @iwanowww Did you already run testing on this patch or should I run some? Hi @eme64, thanks for the comments. I think we don't need to add -XX:+UnlockDiagnosticVMOptions or -XX:+IgnoreUnrecognizedVMOptions here, because the test is guarded by @requires vm.debug. In debug builds, UnlockDiagnosticVMOptions is enabled by default (trueInDebug), so diagnostic flags like TraceDeoptimization are already unlocked, and ProfileTraps (a develop flag) is also available/recognized.
https://github.com/openjdk/jdk/blob/983ae96f60c935aa52f482d21ae6a0d947679541/src/hotspot/share/runtime/globals.hpp#L172-L173 https://github.com/openjdk/jdk/blob/983ae96f60c935aa52f482d21ae6a0d947679541/src/hotspot/share/runtime/globals.hpp#L1188-L1189 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29147#discussion_r2712675296 From aph at openjdk.org Wed Jan 21 14:15:24 2026 From: aph at openjdk.org (Andrew Haley) Date: Wed, 21 Jan 2026 14:15:24 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 10:15:24 GMT, Eric Fang wrote: >> This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance. >> >> Changes: >> -------- >> >> 1. C2 mid-end: >> - Added UMinReductionVNode and UMaxReductionVNode >> >> 2. AArch64 Backend: >> - Added uminp/umaxp/sve_uminv/sve_umaxv instructions >> - Updated match rules for all vector sizes and element types >> - Both NEON and SVE implementation are supported >> >> 3. Test: >> - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java >> - Added assembly tests in aarch64-asmtest.py for new instructions >> - Added a JTReg test file VectorUMinMaxReductionTest.java >> >> Different configurations were tested on aarch64 and x86 machines, and all tests passed. 
>>
>> Test results of JMH benchmarks from the panama-vector project:
>> --------
>>
>> On a Nvidia Grace machine with 128-bit SVE:
>>
>> Benchmark                      Unit    Before   Error     After   Error  Uplift
>> Byte128Vector.UMAXLanes        ops/ms  411.60   42.18  25226.51   33.92   61.29
>> Byte128Vector.UMAXMaskedLanes  ops/ms  558.56   85.12  25182.90   28.74   45.09
>> Byte128Vector.UMINLanes        ops/ms  645.58  780.76  28396.29  103.11   43.99
>> Byte128Vector.UMINMaskedLanes  ops/ms  621.09  718.27  26122.62   42.68   42.06
>> Byte64Vector.UMAXLanes         ops/ms  296.33   34.44  14357.74   15.95   48.45
>> Byte64Vector.UMAXMaskedLanes   ops/ms  376.54   44.01  14269.24   21.41   37.90
>> Byte64Vector.UMINLanes         ops/ms  373.45  426.51  15425.36   66.20   41.31
>> Byte64Vector.UMINMaskedLanes   ops/ms  353.32  346.87  14201.37   13.79   40.19
>> Int128Vector.UMAXLanes         ops/ms  174.79  192.51   9906.07  286.93   56.67
>> Int128Vector.UMAXMaskedLanes   ops/ms  157.23  206.68  10246.77   11.44   65.17
>> Int64Vector.UMAXLanes          ops/ms   95.30  126.49   4719.30   98.57   49.52
>> Int64Vector.UMAXMaskedLanes    ops/ms   88.19   87.44   4693.18   19.76   53.22
>> Long128Vector.UMAXLanes        ops/ms   80.62   97.82   5064.01   35.52   62.82
>> Long128Vector.UMAXMaskedLanes  ops/ms   78.15  102.91   5028.24    8.74   64.34
>> Long64Vector.UMAXLanes         ops/ms   47.56   62.01     46.76   52.28    0.98
>> Long64Vector.UMAXMaskedLanes   ops/ms   45.44   46.76     45.79   42.91    1.01
>> Short128Vector.UMAXLanes       ops/ms  316.65  410.30  14814.82   23.65   46.79
>> ...
> > Eric Fang has updated the pull request incrementally with one additional commit since the last revision: > > Extract some helper functions for better readability src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 1965: > 1963: // Helper function to decode min/max reduction operation properties > 1964: static void decode_minmax_reduction_opc(int opc, bool& is_min, bool& is_unsigned, > 1965: Assembler::Condition& cond) { Suggestion: Condition cond) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28693#discussion_r2712746847 From qamai at openjdk.org Wed Jan 21 14:25:59 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 21 Jan 2026 14:25:59 GMT Subject: RFR: 8375618: Incorrect assert in CastLLNode::Ideal [v3] In-Reply-To: References: Message-ID: > Hi, > > This PR fixes the incorrect assert in `CastLLNode::Ideal`. The assert intends to verify that the output is either a proper subset of or the same as the input. It does so by checking the signed lower bound and signed upper bound of the `TypeLong` instances. This method is not correct now. > > Please kindly review, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Rename for understandability ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29304/files - new: https://git.openjdk.org/jdk/pull/29304/files/948a0198..812892e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29304&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29304&range=01-02 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/29304.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29304/head:pull/29304 PR: https://git.openjdk.org/jdk/pull/29304 From qamai at openjdk.org Wed Jan 21 14:26:02 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 21 Jan 2026 14:26:02 GMT Subject: RFR: 8375618: Incorrect assert in CastLLNode::Ideal [v2] In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 13:35:14 GMT, Christian Hagedorn wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Make properly_contains debug only > > src/hotspot/share/opto/type.cpp line 1820: > >> 1818: >> 1819: #ifdef ASSERT >> 1820: bool TypeInt::properly_contains(const TypeInt* t) const { > > Is `strictly_contains` easier to understand? But from a set theory perspective, "proper" and "strict" are both correct. That's definitely a more intuitive name, thanks for the suggestion :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29304#discussion_r2712784033 From chagedorn at openjdk.org Wed Jan 21 14:49:20 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 Jan 2026 14:49:20 GMT Subject: RFR: 8375618: Incorrect assert in CastLLNode::Ideal [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 14:25:59 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the incorrect assert in `CastLLNode::Ideal`. The assert intends to verify that the output is either a proper subset of or the same as the input. 
It does so by checking the signed lower bound and signed upper bound of the `TypeLong` instances. This method is not correct now. >> >> Please kindly review, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Rename for understandability Thanks for the update! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29304#pullrequestreview-3687642614 From mhaessig at openjdk.org Wed Jan 21 16:09:24 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 21 Jan 2026 16:09:24 GMT Subject: RFR: 8359335: Template-Framework Library: Primitive Types subtyping Message-ID: <69kmR5e7B-PnE8Lxa_XWShTmwDJmh2fhqD5nNsjpJDI=.1e47ddad-4054-45a3-bb0a-494b6baedc15@github.com> This PR implements the subtype relation for primitive types in the Template Framework Library according to JLS §4.10.1. Further, this PR extends the primitive type example to test the addition.
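The subtype relation from JLS §4.10.1 can be sketched as a table of direct supertypes plus its reflexive-transitive closure. This is a hypothetical illustration, not the Template Framework Library's implementation; the string-based type names are an assumption made for brevity. The demo also shows why integer operands under a `float` remainder template matter (the point raised in the review comment on `Operations.java`): `%` on `float` yields NaN for a zero divisor, while `%` on `int` throws `ArithmeticException`.

```java
/**
 * Hypothetical sketch (not the Template Framework's code) of the primitive
 * subtype relation from JLS §4.10.1, using type names as strings.
 */
final class PrimitiveSubtyping {
    // Direct supertypes; boolean is unrelated to the numeric types.
    private static final java.util.Map<String, java.util.List<String>> DIRECT_SUPERTYPES =
            java.util.Map.of(
                    "byte", java.util.List.of("short"),
                    "short", java.util.List.of("int"),
                    "char", java.util.List.of("int"),
                    "int", java.util.List.of("long"),
                    "long", java.util.List.of("float"),
                    "float", java.util.List.of("double"),
                    "double", java.util.List.of(),
                    "boolean", java.util.List.of());

    /** Reflexive, transitive closure of the direct supertype relation. */
    static boolean isSubtype(String sub, String sup) {
        if (sub.equals(sup)) {
            return true;
        }
        for (String next : DIRECT_SUPERTYPES.getOrDefault(sub, java.util.List.of())) {
            if (isSubtype(next, sup)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSubtype("int", "float"));  // true
        System.out.println(isSubtype("char", "short")); // false
        // A float remainder by zero is NaN, but an int remainder by zero throws.
        float fzero = 0f;
        System.out.println(7f % fzero);                 // NaN
        try {
            int izero = 0;
            System.out.println(7 % izero);
        } catch (ArithmeticException e) {
            System.out.println("int % 0 throws ArithmeticException");
        }
    }
}
```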
Testing: - [ ] Github Actions - [ ] tier1, tier2 ------------- Commit messages: - Implement subtyping for primitive types in templates Changes: https://git.openjdk.org/jdk/pull/29349/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29349&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359335 Stats: 38 lines in 3 files changed: 31 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/29349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29349/head:pull/29349 PR: https://git.openjdk.org/jdk/pull/29349 From mhaessig at openjdk.org Wed Jan 21 16:09:27 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 21 Jan 2026 16:09:27 GMT Subject: RFR: 8359335: Template-Framework Library: Primitive Types subtyping In-Reply-To: <69kmR5e7B-PnE8Lxa_XWShTmwDJmh2fhqD5nNsjpJDI=.1e47ddad-4054-45a3-bb0a-494b6baedc15@github.com> References: <69kmR5e7B-PnE8Lxa_XWShTmwDJmh2fhqD5nNsjpJDI=.1e47ddad-4054-45a3-bb0a-494b6baedc15@github.com> Message-ID: On Wed, 21 Jan 2026 15:56:36 GMT, Manuel Hässig wrote: > This PR implements the subtype relation for primitive types in the Template Framework Library according to JLS §4.10.1. Further, this PR extends the primitive type example to test the addition. > > Testing: > - [ ] Github Actions > - [ ] tier1, tier2 test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 133: > 131: ops.add(Expression.make(type, "(", type, " * ", type, ")")); > 132: ops.add(Expression.make(type, "(", type, " / ", type, ")", WITH_ARITHMETIC_EXCEPTION)); > 133: ops.add(Expression.make(type, "(", type, " % ", type, ")", WITH_ARITHMETIC_EXCEPTION)); Because all integer primitives are subtypes of the floating point primitives, the new subtype relation can lead to a situation where a fuzzer generates a float modulo expression and then generates two integer expressions for the left- and right-hand sides.
Hence, the modulo operation that is actually executed is an integer expression that may throw an `ArithmeticException` if the divisor is zero. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29349#discussion_r2713239125 From epeter at openjdk.org Wed Jan 21 16:54:50 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jan 2026 16:54:50 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v4] In-Reply-To: <17qmKHGiPNSNpHn2By-gtJjPpwFFDj98wiY-bQG16nY=.4df59ed3-8120-496b-9a1e-06b80c9149ae@github.com> References: <17qmKHGiPNSNpHn2By-gtJjPpwFFDj98wiY-bQG16nY=.4df59ed3-8120-496b-9a1e-06b80c9149ae@github.com> Message-ID: On Wed, 21 Jan 2026 13:52:52 GMT, Guanqiang Han wrote: >> test/hotspot/jtreg/compiler/uncommontrap/TestPrintDiagnosticsWithoutProfileTraps.java line 32: >> >>> 30: * @run main/othervm -XX:+TraceDeoptimization -XX:-ProfileTraps >>> 31: * -XX:-TieredCompilation -Xcomp >>> 32: * compiler.uncommontrap.TestPrintDiagnosticsWithoutProfileTraps >> >> `TraceDeoptimization` is a diagnostic flag. This test will cause issues without `-XX:+UnlockDiagnosticVMOptions`, right? And `ProfileTraps` is debug. So won't this need `-XX:+IgnoreUnrecognizedVMOptions`? >> >> @iwanowww Did you already run testing on this patch or should I run some? > > Hi @eme64, thanks for the comments. > > I think we don't need to add -XX:+UnlockDiagnosticVMOptions or -XX:+IgnoreUnrecognizedVMOptions here, because the test is guarded by @requires vm.debug. In debug builds, UnlockDiagnosticVMOptions is enabled by default (trueInDebug), so diagnostic flags like TraceDeoptimization are already unlocked, and ProfileTraps (a develop flag) is also available/recognized.
> > https://github.com/openjdk/jdk/blob/983ae96f60c935aa52f482d21ae6a0d947679541/src/hotspot/share/runtime/globals.hpp#L172-L173 > > https://github.com/openjdk/jdk/blob/983ae96f60c935aa52f482d21ae6a0d947679541/src/hotspot/share/runtime/globals.hpp#L1188-L1189 @hgqxjj Hmm I see. I'm not a fan of using `@requires vm.debug`, because it means that your test is not run in product, and sometimes bugs only show up in product. It's also not great that we have to run a whole JVM startup with `-Xcomp`, compiling EVERYTHING, and blocking for every compilation. We should only use `-Xcomp` in combination with a fairly restricted `compileonly`. Otherwise, we just waste a lot of compute resources. @iwanowww What is your opinion on this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29147#discussion_r2713463904 From dlong at openjdk.org Thu Jan 22 00:19:06 2026 From: dlong at openjdk.org (Dean Long) Date: Thu, 22 Jan 2026 00:19:06 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic In-Reply-To: References: Message-ID: <6cXuSAkm2-aVq75Pd1V0an-v8NApBXrz878R1k7H1fc=.d59f5069-887b-4879-9293-3fbdb94febb9@github.com> On Tue, 20 Jan 2026 02:42:41 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and neither return value is a subset of the other. This makes it possible for a node that satisfies both cases to get the first value; if, after being widened, it no longer satisfies the first case but still satisfies the second, the method returns the second value, which is not a superset of the previous result. > > For example, given `r = CmpU(x, y)`. > > At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, then `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`.
> > At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, then `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which triggers an assertion failure. > > Please take a look and leave your reviews, thanks a lot. Would it make sense to have stand-alone C++ tests for these and maybe other interesting cases? Maybe using the gtest framework? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29308#issuecomment-3781800297 From ghan at openjdk.org Thu Jan 22 01:56:52 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Thu, 22 Jan 2026 01:56:52 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v4] In-Reply-To: References: <17qmKHGiPNSNpHn2By-gtJjPpwFFDj98wiY-bQG16nY=.4df59ed3-8120-496b-9a1e-06b80c9149ae@github.com> Message-ID: <5YSsP2h52Kp9WWb4YrthjtYSziRKM5FM3s5NHThrPTg=.8be8a6fc-e841-4e7b-b9d5-e3910c9bc2e6@github.com> On Wed, 21 Jan 2026 16:51:48 GMT, Emanuel Peter wrote: >> Hi @eme64, thanks for the comments. >> >> I think we don't need to add -XX:+UnlockDiagnosticVMOptions or -XX:+IgnoreUnrecognizedVMOptions here, because the test is guarded by @requires vm.debug. In debug builds, UnlockDiagnosticVMOptions is enabled by default (trueInDebug), so diagnostic flags like TraceDeoptimization are already unlocked, and ProfileTraps (a develop flag) is also available/recognized. >> >> https://github.com/openjdk/jdk/blob/983ae96f60c935aa52f482d21ae6a0d947679541/src/hotspot/share/runtime/globals.hpp#L172-L173 >> >> https://github.com/openjdk/jdk/blob/983ae96f60c935aa52f482d21ae6a0d947679541/src/hotspot/share/runtime/globals.hpp#L1188-L1189 > > @hgqxjj Hmm I see. I'm not a fan of using `@requires vm.debug`, because it means that your test is not run in product, and sometimes bugs only show up in product.
> > It's also not great that we have to run a whole JVM startup with `-Xcomp`, compiling EVERYTHING, and blocking for every compilation. We should only use `-Xcomp` in combination with a fairly restricted `compileonly`. Otherwise, we just waste a lot of compute resources. > > @iwanowww What is your opinion on this? @eme64 Thanks for the feedback. On @requires vm.debug: I'd like to keep it for this reproducer. ProfileTraps is the key knob here: the failure requires ProfileTraps=false (create_if_missing = ProfileTraps, so get_method_data(..., false) may return NULL). Since ProfileTraps is a develop_pd flag and not settable on product builds, this reproducer has to run on a non-product VM (i.e., a debug VM). On -Xcomp: agreed. I'll keep it but restrict it with -XX:CompileCommand=compileonly,... so we only compile the relevant method(s). If that sounds reasonable, I'll proceed with just the compileonly tightening. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29147#discussion_r2715016925 From kvn at openjdk.org Thu Jan 22 02:00:51 2026 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 22 Jan 2026 02:00:51 GMT Subject: RFR: 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 10:18:53 GMT, David Briemann wrote: >> Aligning upwards instead of downwards not only solves the crash in large huge page scenarios but also ensures that the cache sizes are at least as big as they were set. > > Thanks for the reviews! @dbriemann, this change invalidated the assumption that ReservedCodeCacheSize can't change if specified on the command line. You replaced `align_down` with `align_up` but did not check that `cache_size` may increase after that. It also causes issues with AOT, because the CodeCache size varies between phases: we use a different number of compiler threads and, as a result, a different NonNMethod section size.
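The effect described here can be seen with plain arithmetic. Rounding each segment size up to a page boundary can make the sum of the segments exceed the requested total, which rounding down never does. This is a hypothetical illustration only: the segment sizes and the 2 MiB alignment are made up, and it does not reproduce HotSpot's actual code-heap sizing logic.

```java
/**
 * Hypothetical illustration (made-up sizes, not HotSpot's code heap sizing):
 * aligning each segment up can grow the total beyond the requested size.
 */
final class CodeHeapAlignment {
    static final long MIB = 1024 * 1024;

    // Both helpers assume a power-of-two alignment.
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & -alignment;
    }

    static long alignDown(long size, long alignment) {
        return size & -alignment;
    }

    static long sumAligned(long[] segments, long alignment, boolean up) {
        long sum = 0;
        for (long s : segments) {
            sum += up ? alignUp(s, alignment) : alignDown(s, alignment);
        }
        return sum;
    }

    public static void main(String[] args) {
        long page = 2 * MIB;
        // 3 MiB + 22.5 MiB + 22.5 MiB = 48 MiB requested in total.
        long[] segments = {3 * MIB, 22 * MIB + MIB / 2, 22 * MIB + MIB / 2};
        System.out.println(sumAligned(segments, page, true) / MIB);  // 52
        System.out.println(sumAligned(segments, page, false) / MIB); // 46
    }
}
```

With `align_up`-style rounding the total grows from 48 MiB to 52 MiB, so a total that was fixed on the command line is no longer guaranteed to be preserved unless the result is checked or rebalanced afterwards.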
------------- PR Comment: https://git.openjdk.org/jdk/pull/28658#issuecomment-3782072965 From xgong at openjdk.org Thu Jan 22 02:11:32 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 22 Jan 2026 02:11:32 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v11] In-Reply-To: References: <-avpi_0V0b9jLf29Sah1VB8ZiOWu9MSFl8X3vO22NFQ=.0c006266-c5eb-4077-b238-311e6190486f@github.com> Message-ID: On Wed, 21 Jan 2026 09:47:22 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/vectorization/TestVectorAlgorithms.java line 429: >> >>> 427: IRNode.ADD_VI, "> 0", >>> 428: IRNode.STORE_VECTOR, "> 0"}, >>> 429: applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true"}, >> >> Is this better? >> >> Suggestion: >> >> applyIfCPUFeatureOr = {"avx512", "true", "sve", "true"}, > > Are you getting failures with `sse4.1` or `asimd`? Or what is the reason for weakening the conditions here? There are no failures with these two features, because we also check `MaxVectorSize`. I was thinking that only `avx512` and `sve` CPUs would match the condition `MaxVectorSize >= 64`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2715040988 From xgong at openjdk.org Thu Jan 22 02:26:33 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 22 Jan 2026 02:26:33 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v12] In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 10:01:08 GMT, Emanuel Peter wrote: >> This is an exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. >> >> Note: for now they are all `int` based.
And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;)
>>
>> **Discussion**
>>
>> Observations:
>> - If the loop can be auto-vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance.
>> - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance.
>>   - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow
>>   - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow
>>   - `linux_x64_oci_server`: Vector API leads to really nice speedups
>>   - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks
>>   - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit.
>> - Compact Object Headers has some negative effect on some loop benchmarks.
>>   - `linux_aarch64_server`: `reduceAddI`, `copyI`
>>   - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI`
>>   - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`?
>>   - `windows_x64_oci_server`: `reduceAddI` and some others a little bit
>>   - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI`
>> - Intrinsics can be much faster than auto-vectorized or Vector API code.
>>   - `linux_aarch64_server`: `copyI`
>>   - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well.
>> - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact; I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240).
>>
>> **Benchmark Plots**
>>
>> Units: nanoseconds per algorithm invocation.
>>
>> Note: the `aarch64` machines all only have `NEON` support. Performance may be much better on `SVE`, I have not benchmarked that yet.
>>
>> `linux_x64_oci`
>> [plot: algo_linux_x64_oci_server]
>>
>> `windows_x64_oci`
>> [plot: algo_windows_x64_oci_server]
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> updates for review

I ran the new tests on my ARM NEON machine with `-XX:MaxVectorSize=8`, and the following tests crashed with the same error:

    compiler/vectorization/TestVectorAlgorithms.java#noOptimizeFill
    compiler/vectorization/TestVectorAlgorithms.java#noSuperWord
    compiler/vectorization/TestVectorAlgorithms.java#vanilla

Here is the log:

    Standard Output
    ---------------
    CompileCommand: inline *VectorAlgorithmsImpl*.* bool inline = true
    TestVM main() called - about to run tests in class compiler.vectorization.TestVectorAlgorithms
    For random generator using seed: 5121565769469166450
    To re-run test with same seed value please add "-Djdk.test.lib.random.seed=5121565769469166450" to command line.
    300 Phi === 103 1050 302 [[ 399 299 ]] #rawptr:BotPTR !jvms: IntVector::lanewiseTemplate @ bci:154 (line 798) Int64Vector::lanewise @ bci:3 (line 278) Int64Vector::lanewise @ bci:3 (line 43) IntVector::lanewise @ bci:43 (line 944) IntVector::add @ bci:5 (line 1406) VectorAlgorithmsImpl::findMinIndexI_VectorAPI @ bci:96 (line 563)
    300 Phi === 103 1050 302 [[ 399 299 ]] #rawptr:BotPTR !jvms: IntVector::lanewiseTemplate @ bci:154 (line 798) Int64Vector::lanewise @ bci:3 (line 278) Int64Vector::lanewise @ bci:3 (line 43) IntVector::lanewise @ bci:43 (line 944) IntVector::add @ bci:5 (line 1406) VectorAlgorithmsImpl::findMinIndexI_VectorAPI @ bci:96 (line 563)
    98 safePoint === 101 0 401 0 0 99 905 402 403 404 282 0 0 0 0 908 909 912 [[ 100 575 675 ]] !jvms: VectorAlgorithmsImpl::findMinIndexI_VectorAPI @ bci:113 (line 558)
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    # Internal Error (jdk-src/src/hotspot/share/opto/buildOopMap.cpp:371), pid=145228, tid=145250
    # assert(false) failed: there should be an oop in OopMap instead of a live raw oop at safepoint
    #
    # JRE version: OpenJDK Runtime Environment (27.0) (fastdebug build 27-internal-git-362f4c7acc8)
    # Java VM: OpenJDK 64-Bit Server VM (fastdebug 27-internal-git-362f4c7acc8, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
    # Problematic frame:
    # V [libjvm.so+0x72ae50] OopFlow::build_oop_map(Node*, int, PhaseRegAlloc*, int*)+0xf80
    #

And the VM options:

    -ea -esa -Xmx768m -XX:UseSVE=0 -XX:MaxVectorSize=8 --add-modules=jdk.incubator.vector -XX:CompileCommand=inline,*VectorAlgorithmsImpl*::* -XX:-BackgroundCompilation -XX:CompileCommand=quiet

Could you please take a look? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28639#issuecomment-3782145169 From xgong at openjdk.org Thu Jan 22 02:29:38 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 22 Jan 2026 02:29:38 GMT Subject: RFR: 8375498: [VectorAPI] Dump primary vector IR details with -XX:+TraceNewVectors [v3] In-Reply-To: References: <9gG4XLzdFYYAiZocLBepqB4uQrZugCVb6j_pOBloKjI=.323d5617-d06e-49f9-9559-03f1356904b2@github.com> Message-ID: <7CZd-55-hI444w7Ff__SX72DsyyhmYn0esx5ICdSyXU=.727aff99-3fa4-4c2a-9636-dda4ffd06761@github.com> On Wed, 21 Jan 2026 10:28:16 GMT, Jatin Bhateja wrote: >> But the `VectorBlendNode` node would also be useful if we have a masked operation but don't use a predicated instruction, don't you think? > > Without any contention here, included!, again we can always increase the coverage.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29265#discussion_r2715069027 From erfang at openjdk.org Thu Jan 22 03:13:59 2026 From: erfang at openjdk.org (Eric Fang) Date: Thu, 22 Jan 2026 03:13:59 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 14:10:39 GMT, Andrew Haley wrote: >> Eric Fang has updated the pull request incrementally with one additional commit since the last revision: >> >> Extract some helper functions for better readability > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 1965: > >> 1963: // Helper function to decode min/max reduction operation properties >> 1964: static void decode_minmax_reduction_opc(int opc, bool& is_min, bool& is_unsigned, >> 1965: Assembler::Condition& cond) { > > Suggestion: > > Condition cond) { Since this function is only used in this file and does not emit any instructions, I made it a **file-scope static** function. And since we don't declare `using Assembler::Condition;` in this file, we have to use `Assembler::Condition&` here, or we get the following error: jdk/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp:1965:41: error: 'Condition' has not been declared 1965 | Condition& cond) { As for the `&`, this is a reference parameter. To remove `Assembler::`, we can either: 1. Declare `using Assembler::Condition;` in this file, or 2. Make this function a private method of `C2_MacroAssembler`. WDYT?
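A small self-contained sketch of the file-scope spelling, using a stand-in `Assembler` class (hypothetical: only a trimmed nested enum is modeled) and hypothetical opcode values. One detail worth checking against the C++ rules: a using-declaration that names a class member, such as `using Assembler::Condition;`, is only permitted inside a class, so at file scope the usual spelling is an alias declaration, `using Condition = Assembler::Condition;`. Inside a member function of `C2_MacroAssembler`, which derives from `Assembler` through `MacroAssembler`, plain `Condition` is already found by member name lookup, which is why option 2 also works.

```cpp
#include <cassert>

// Stand-in for HotSpot's AArch64 Assembler, reduced to a nested enum so
// that this example compiles on its own (enumerators trimmed).
class Assembler {
 public:
  enum Condition { LT, GT, LO, HI };
};

// File-scope alias: the rest of the file can now write `Condition`.
// Note: `using Assembler::Condition;` would be rejected here, because a
// using-declaration that names a class member may only appear in a class.
using Condition = Assembler::Condition;

// Hypothetical opcode values standing in for the real reduction opcodes.
enum ReductionOpc { MinReduction, MaxReduction, UMinReduction, UMaxReduction };

// Output parameters stay references, since the helper writes through them.
static void decode_minmax_reduction_opc(int opc, bool& is_min, bool& is_unsigned,
                                        Condition& cond) {
  assert(opc >= MinReduction && opc <= UMaxReduction);
  is_min = (opc == MinReduction || opc == UMinReduction);
  is_unsigned = (opc == UMinReduction || opc == UMaxReduction);
  // Assumed mapping, for illustration only: signed LT/GT, unsigned LO/HI.
  if (is_unsigned) {
    cond = is_min ? Assembler::LO : Assembler::HI;
  } else {
    cond = is_min ? Assembler::LT : Assembler::GT;
  }
}
```

The alias keeps the helper file-local and `static`, so it does not have to become a member of `C2_MacroAssembler` just to shorten the type name.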
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28693#discussion_r2715142364 From qamai at openjdk.org Thu Jan 22 03:48:24 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 22 Jan 2026 03:48:24 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic In-Reply-To: <6cXuSAkm2-aVq75Pd1V0an-v8NApBXrz878R1k7H1fc=.d59f5069-887b-4879-9293-3fbdb94febb9@github.com> References: <6cXuSAkm2-aVq75Pd1V0an-v8NApBXrz878R1k7H1fc=.d59f5069-887b-4879-9293-3fbdb94febb9@github.com> Message-ID: On Thu, 22 Jan 2026 00:16:43 GMT, Dean Long wrote: >> Hi, >> >> This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not a subset of each other. This leads to the possibilities that a node satisfying both cases will return the first value, but if upon being widen it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. >> >> For example, given `r = CmpU(x, y)`. >> >> At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, then `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. >> >> At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, then `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which leads to an assertion. >> >> Please take a look and leave your reviews, thanks a lot. > > Would it make sense to have stand-alone C++ tests for these and maybe other interesting cases? Maybe using the gtest framework? @dean-long I guess it is possible, but the implementation is simple enough that the test will just be a repetition of the implementation. 
Furthermore, doing so also requires some non-trivial refactoring since `CmpUNode::sub` inspects the structure around the node, not just the `Type` of its inputs. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29308#issuecomment-3782362892 From dlong at openjdk.org Thu Jan 22 04:22:05 2026 From: dlong at openjdk.org (Dean Long) Date: Thu, 22 Jan 2026 04:22:05 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 02:42:41 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not a subset of each other. This leads to the possibilities that a node satisfying both cases will return the first value, but if upon being widen it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. > > For example, given `r = CmpU(x, y)`. > > At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, then `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. > > At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, then `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which leads to an assertion. > > Please take a look and leave your reviews, thanks a lot. src/hotspot/share/opto/subnode.cpp line 767: > 765: return TypeInt::CC_GT; > 766: } else if (r0->is_con() && r1->is_con()) { > 767: assert(r0->get_con() == r1->get_con(), "must be equal"); Please explain why this assert must be true. Does it need a comment? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2715242235 From dlong at openjdk.org Thu Jan 22 04:36:26 2026 From: dlong at openjdk.org (Dean Long) Date: Thu, 22 Jan 2026 04:36:26 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic In-Reply-To: References: Message-ID: <5EfYl-SSl6pWQ-qbk9yY5HtyuH8MjDWJinn7TBrz2Y4=.d98f06b2-f761-40c9-aa2e-587a38072481@github.com> On Tue, 20 Jan 2026 02:42:41 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not a subset of each other. This leads to the possibilities that a node satisfying both cases will return the first value, but if upon being widen it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. > > For example, given `r = CmpU(x, y)`. > > At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, then `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. > > At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, then `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which leads to an assertion. > > Please take a look and leave your reviews, thanks a lot. src/hotspot/share/opto/subnode.cpp line 758: > 756: // (This is a gross hack, since the sub method never > 757: // looks at the structure of the node in any other case.) > 758: if (r0->_lo >= 0 && r1->_lo >= 0 && is_index_range_check()) { Do we still need this after improvements like JDK-8356813? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2715266363 From dlong at openjdk.org Thu Jan 22 04:42:47 2026 From: dlong at openjdk.org (Dean Long) Date: Thu, 22 Jan 2026 04:42:47 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic In-Reply-To: <6cXuSAkm2-aVq75Pd1V0an-v8NApBXrz878R1k7H1fc=.d59f5069-887b-4879-9293-3fbdb94febb9@github.com> References: <6cXuSAkm2-aVq75Pd1V0an-v8NApBXrz878R1k7H1fc=.d59f5069-887b-4879-9293-3fbdb94febb9@github.com> Message-ID: On Thu, 22 Jan 2026 00:16:43 GMT, Dean Long wrote: >> Hi, >> >> This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not a subset of each other. This leads to the possibilities that a node satisfying both cases will return the first value, but if upon being widen it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. >> >> For example, given `r = CmpU(x, y)`. >> >> At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, then `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. >> >> At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, then `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which leads to an assertion. >> >> Please take a look and leave your reviews, thanks a lot. > > Would it make sense to have stand-alone C++ tests for these and maybe other interesting cases? Maybe using the gtest framework? > @dean-long I guess it is possible, but the implementation is simple enough that the test will just be a repetition of the implementation. 
Furthermore, doing so also requires some non-trivial refactoring since `CmpUNode::sub` inspects the structure around the node, not just the `Type` of its inputs. What do you think? Good point. I'm hoping we can get rid of node structure inspection, then we could make these static functions that only deal with the type. Even if it turns out that the is_index_range_check() code is still needed, we could make CmpULNode::sub static and refactor CmpUNode::sub to have a static helper that can be tested stand-alone. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29308#issuecomment-3782469169 From qamai at openjdk.org Thu Jan 22 05:21:05 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 22 Jan 2026 05:21:05 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v2] In-Reply-To: References: Message-ID: <8__NB6VFheHPlPKxBK1P_-AnewGZaiuT96paSGI39hg=.f7ae4fc6-ccd4-4945-8a50-21cba8d38c66@github.com> > Hi, > > This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not a subset of each other. This leads to the possibilities that a node satisfying both cases will return the first value, but if upon being widen it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. > > For example, given `r = CmpU(x, y)`. > > At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, then `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. > > At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, then `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which leads to an assertion. > > Please take a look and leave your reviews, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: A little more detailed explanation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29308/files - new: https://git.openjdk.org/jdk/pull/29308/files/d3c80f3d..089ff911 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29308&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29308&range=00-01 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29308.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29308/head:pull/29308 PR: https://git.openjdk.org/jdk/pull/29308 From qamai at openjdk.org Thu Jan 22 05:21:07 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 22 Jan 2026 05:21:07 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic In-Reply-To: References: <6cXuSAkm2-aVq75Pd1V0an-v8NApBXrz878R1k7H1fc=.d59f5069-887b-4879-9293-3fbdb94febb9@github.com> Message-ID: On Thu, 22 Jan 2026 04:40:17 GMT, Dean Long wrote: >> Would it make sense to have stand-alone C++ tests for these and maybe other interesting cases? Maybe using the gtest framework? > >> @dean-long I guess it is possible, but the implementation is simple enough that the test will just be a repetition of the implementation. Furthermore, doing so also requires some non-trivial refactoring since `CmpUNode::sub` inspects the structure around the node, not just the `Type` of its inputs. What do you think? > > Good point. I'm hoping we can get rid of node structure inspection, then we could make these static functions that only deal with the type. Even if it turns out that the is_index_range_check() code is still needed, we could make CmpULNode::sub static and refactor CmpUNode::sub to have a static helper that can be tested stand-alone. @dean-long Thanks for your comments, I have addressed them. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29308#issuecomment-3782539304 From qamai at openjdk.org Thu Jan 22 05:21:12 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 22 Jan 2026 05:21:12 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v2] In-Reply-To: <5EfYl-SSl6pWQ-qbk9yY5HtyuH8MjDWJinn7TBrz2Y4=.d98f06b2-f761-40c9-aa2e-587a38072481@github.com> References: <5EfYl-SSl6pWQ-qbk9yY5HtyuH8MjDWJinn7TBrz2Y4=.d98f06b2-f761-40c9-aa2e-587a38072481@github.com> Message-ID: On Thu, 22 Jan 2026 04:33:45 GMT, Dean Long wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> A little more detailed explanation > > src/hotspot/share/opto/subnode.cpp line 758: > >> 756: // (This is a gross hack, since the sub method never >> 757: // looks at the structure of the node in any other case.) >> 758: if (r0->_lo >= 0 && r1->_lo >= 0 && is_index_range_check()) { > > Do we still need this after improvements like JDK-8356813? Yes. That only solves the issue when the divisor is a constant; this also helps when the divisor is not a constant. > src/hotspot/share/opto/subnode.cpp line 767: > >> 765: return TypeInt::CC_GT; >> 766: } else if (r0->is_con() && r1->is_con()) { >> 767: assert(r0->get_con() == r1->get_con(), "must be equal"); > > Please explain why this assert must be true. Does it need a comment? Done! The reason is that here we have `r0->_ulo == r0->_uhi` and `r1->_ulo == r1->_uhi`. So if `r0->_ulo != r1->_ulo`, one of the previous branches would have been taken instead.
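To illustrate why overlapping early-return cases break monotonicity, and why two constants that pass both strict less-than/greater-than checks must be equal, here is a small self-contained C++ model. It is hypothetical: `UType` stands in for an int type seen through its unsigned bounds and known bits, and the result is a set of possible outcomes rather than a `TypeInt`. It is not the code in the patch, and it ignores the `is_index_range_check()` special case. Each rule only removes outcomes it can prove impossible, and the answer is the intersection of all rules.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of an int type through an unsigned lens:
// unsigned bounds plus bits known to be 0 or 1. Not HotSpot's TypeInt.
struct UType {
  std::uint32_t ulo, uhi;  // unsigned bounds, ulo <= uhi
  std::uint32_t zeros;     // bits known to be 0 in every value
  std::uint32_t ones;      // bits known to be 1 in every value
};

static UType make_con(std::uint32_t v) { return UType{v, v, ~v, v}; }

// Possible outcomes of an unsigned compare, as a bit set.
// {LT, EQ} plays the role of CC_LE, {LT, GT} of CC_NE, ALL of CC.
enum : unsigned { LT = 1u, EQ = 2u, GT = 4u, ALL = LT | EQ | GT };

// Each rule only removes outcomes it can prove impossible, and the result
// is the intersection over all rules. Widening an input can only stop a
// rule from firing, never make one fire, so the result can only grow.
static unsigned cmpu(const UType& a, const UType& b) {
  assert(a.ulo <= a.uhi && b.ulo <= b.uhi);
  unsigned res = ALL;
  if (a.uhi < b.ulo)  res &= LT;       // every a is below every b
  if (a.ulo > b.uhi)  res &= GT;       // every a is above every b
  if (a.uhi <= b.ulo) res &= LT | EQ;  // a <= b for all values
  if (a.ulo >= b.uhi) res &= GT | EQ;  // a >= b for all values
  // A bit known 0 on one side and known 1 on the other: never equal.
  if ((a.zeros & b.ones) != 0 || (a.ones & b.zeros) != 0) {
    res &= ~static_cast<unsigned>(EQ);
  }
  // For two constants that are neither strictly below nor above each
  // other, the <= and >= rules both fire, leaving exactly EQ.
  return res;
}
```

With the example from the PR description, this model yields {LT} for `x = {0}`, `y = {1, -1}`, and {LT, GT} (the analogue of `CC_NE`) once `x` widens to `{0, 2}`. The widened result contains the earlier one, which is the property a chain of overlapping early returns can lose.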
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2715335794 PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2715334761 From duke at openjdk.org Thu Jan 22 06:29:43 2026 From: duke at openjdk.org (duke) Date: Thu, 22 Jan 2026 06:29:43 GMT Subject: Withdrawn: 8371419: IGV: Add view to visualise dominator tree and dominator information In-Reply-To: <6Vojoez34k5CHSLTQ-sSxRERAHEraT-OV9epmtS1s2E=.462bea29-750b-455e-a20e-4a223a601374@github.com> References: <6Vojoez34k5CHSLTQ-sSxRERAHEraT-OV9epmtS1s2E=.462bea29-750b-455e-a20e-4a223a601374@github.com> Message-ID: <8oAZ_OxHbq5DWTYbUIjQ3WLbYJZAVkuQLGuCk49WQq0=.4a975ad2-8eaf-422c-9e09-f6ceef063115@github.com> On Thu, 13 Nov 2025 11:59:20 GMT, Damon Fenacci wrote: > This change introduces a dominator tree view in IGV's CFG panel, enabling users to toggle between the control flow graph and the dominator tree. This makes dominator relationships easier to inspect than the current stdout-based output (`-XX:+PrintDominators`). > > ## Motivation > * Today, dominator information is difficult to access (e.g. via `-XX:+PrintDominators`, which is hard to read and correlate with the graph). > * IGV already computes dominators for some phases but does not visualize them. > * Comparing dominator trees across graphs/phases was not supported. > > ## What's New > 1. Toggle in the CFG view (toolbar button) to switch between: > * Control Flow Graph (CFG) > * Dominator Tree > 2. Dominator edge coloring to indicate provenance: > * Blue: dominator info provided by C2 (from the GCM phase onward for now; a follow-up RFE will handle loop optimization dominator information) > * Red: dominator info computed by IGV (pre-GCM) > 3. Graph comparison enhancements: > * Compare dominator trees between graphs (new) > * Compare CFG differences between graphs (previously missing) > 4.
Node annotations: > * `idom`: immediate dominator > * `dom_depth`: dominator depth > * `block`: numeric block ID for all nodes in a block > > The resulting main view looks like this: > [Screenshot 2025-11-13 at 15 04 12] > > ## Testing > * Tier 1-3 > * Manual testing in IGV This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/28293 From chagedorn at openjdk.org Thu Jan 22 06:39:24 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 22 Jan 2026 06:39:24 GMT Subject: RFR: 8374725: C2: assert(x_ctrl == get_late_ctrl_with_anti_dep(x->as_Load(), early_ctrl, x_ctrl)) failed: anti-dependences were already checked [v2] In-Reply-To: References: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> Message-ID: On Mon, 19 Jan 2026 13:59:25 GMT, Roland Westrelin wrote: >> `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` caches >> intermediate results in `_dom_lca_tags` when the late control is >> computed by `PhaseIdealLoop::get_late_ctrl()` for a node `n`: the code >> iterates over all uses of `n` potentially calling >> `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` multiple >> times. `_dom_lca_tags` is used to cache data that is specific to the >> lca computation for `n`. `_dom_lca_tags` is set to a tag that depends >> on `n` to mark the cached data as only valid during the lca >> computation for `n`. >> >> `PhaseIdealLoop::try_sink_out_of_loop()` checks that all uses of a >> node are out of loop with >> `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` which also needs to >> consider anti-dependences for `Load`s and also calls >> `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` through >> `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`: this computes the >> late control for a node and one particular out of loop >> use.
`_dom_lca_tags` values computed by an earlier >> `PhaseIdealLoop::get_late_ctrl()` should be ignored (because it >> computes the late control for a node and all its uses). To address >> that issue, the tag that's used by >> `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` is made >> different on each call from >> `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` by incrementing >> `_dom_lca_tags_round`. >> >> The issue here is that one `Load` node is input to a `Phi` twice. So >> the `Phi` is considered twice as a use of the node along 2 different >> paths. `PhaseIdealLoop::get_late_ctrl_with_anti_dep()` is called twice >> from `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` but >> `_dom_lca_tags_round` is not incremented between the 2 >> calls. `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` when >> called for the second `Phi` input uses incorrect cached data which, in >> turn, causes an incorrect computation. >> >> The fix I propose is to make sure `_dom_lca_tags_round` is incremented >> for every call to `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`. 
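A minimal, hypothetical sketch of the round-tag caching idea described above. It is not HotSpot's data structure, and the name `RoundTaggedCache` is invented for illustration. Each cached entry records the round in which it was written, so bumping the round invalidates every entry in O(1) without clearing the arrays. The bug corresponds to issuing two independent queries (one per `Phi` input) without bumping the round in between, so the second query reuses data left by the first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration of a round-tagged cache; not HotSpot code.
// An entry is valid only while its tag equals the current round, so
// bumping the round invalidates every entry without clearing the arrays.
class RoundTaggedCache {
 public:
  explicit RoundTaggedCache(std::size_t n) : tags_(n, 0), values_(n, 0) {}

  // Call before each independent computation (e.g. each query).
  void new_round() { ++round_; }

  // Returns true and sets `out` only if entry `i` was stored this round.
  bool lookup(std::size_t i, int& out) const {
    assert(i < tags_.size());
    if (tags_[i] != round_) return false;  // stale or never written
    out = values_[i];
    return true;
  }

  void store(std::size_t i, int v) {
    assert(i < tags_.size());
    tags_[i] = round_;
    values_[i] = v;
  }

 private:
  std::vector<unsigned> tags_;
  std::vector<int> values_;
  unsigned round_ = 1;  // tags start at 0, so nothing is valid initially
};
```

Two lookups in the same round see each other's entries; calling `new_round()` before each independent query, analogous to incrementing `_dom_lca_tags_round` before every `get_late_ctrl_with_anti_dep()` call, makes the stale entry miss.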
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review test/hotspot/jtreg/compiler/loopopts/TestSinkingLoadInputOfPhi.java line 40: > 38: static int iFld2 = 10; > 39: static void test() { > 40: int iArr[] = new int[iFld2]; Suggestion: int iArr[] = new int[iFld2]; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29231#discussion_r2715505021 From hgreule at openjdk.org Thu Jan 22 07:03:55 2026 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 22 Jan 2026 07:03:55 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic In-Reply-To: References: <6cXuSAkm2-aVq75Pd1V0an-v8NApBXrz878R1k7H1fc=.d59f5069-887b-4879-9293-3fbdb94febb9@github.com> Message-ID: <_2UwWc9jX4tAh4lRmdqupR57IZpuyM-5rj8kwaMsXPw=.5431dd1c-b45f-48d1-9048-600a7aebb766@github.com> On Thu, 22 Jan 2026 04:40:17 GMT, Dean Long wrote: > Even if it turns out that the is_index_range_check() code is still needed, we could make CmpULNode::sub static and refactor CmpUNode::sub to have a static helper that can be tested stand-alone. With memory segments, I'd argue we should rather look into also improving CmpULNode in the same way (there are also more opportunities that could be handled there, see https://bugs.openjdk.org/browse/JDK-8286679). But the purely type-based code could be extracted and probably even shared between CmpU and CmpUL. Given this is a P2 bug, it might make sense to rather integrate this simple fix and do additional refactorings later. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29308#issuecomment-3782857416 From epeter at openjdk.org Thu Jan 22 07:07:06 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 Jan 2026 07:07:06 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v2] In-Reply-To: <8__NB6VFheHPlPKxBK1P_-AnewGZaiuT96paSGI39hg=.f7ae4fc6-ccd4-4945-8a50-21cba8d38c66@github.com> References: <8__NB6VFheHPlPKxBK1P_-AnewGZaiuT96paSGI39hg=.f7ae4fc6-ccd4-4945-8a50-21cba8d38c66@github.com> Message-ID: On Thu, 22 Jan 2026 05:21:05 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not a subset of each other. This leads to the possibilities that a node satisfying both cases will return the first value, but if upon being widen it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. >> >> For example, given `r = CmpU(x, y)`. >> >> At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, then `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. >> >> At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, then `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which leads to an assertion. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > A little more detailed explanation @merykitty Thanks for the fix! And: it is so much nicer to do unsigned ops now that we actually have unsigned types :) It would indeed be nice if eventually we refactored the code such that gtests are possible. Is that not generally the plan? 
I was also wondering: do we already have some good IR tests for `compareUnsigned`? It would be a shame if this led to a performance regression just because we don't have the coverage. And: you should probably not just test `Integer.compareUnsigned`, but also `Long.compareUnsigned`, right? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29308#pullrequestreview-3690927068 From qamai at openjdk.org Thu Jan 22 07:49:32 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 22 Jan 2026 07:49:32 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v3] In-Reply-To: References: Message-ID: > Hi, > > This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not a subset of each other. This leads to the possibilities that a node satisfying both cases will return the first value, but if upon being widen it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. > > For example, given `r = CmpU(x, y)`. > > At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, then `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. > > At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, then `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which leads to an assertion. > > Please take a look and leave your reviews, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Test Long::compareUnsigned ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29308/files - new: https://git.openjdk.org/jdk/pull/29308/files/089ff911..195a8ee1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29308&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29308&range=01-02 Stats: 19 lines in 1 file changed: 16 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/29308.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29308/head:pull/29308 PR: https://git.openjdk.org/jdk/pull/29308 From qamai at openjdk.org Thu Jan 22 07:49:33 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 22 Jan 2026 07:49:33 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v2] In-Reply-To: References: <8__NB6VFheHPlPKxBK1P_-AnewGZaiuT96paSGI39hg=.f7ae4fc6-ccd4-4945-8a50-21cba8d38c66@github.com> Message-ID: On Thu, 22 Jan 2026 07:04:35 GMT, Emanuel Peter wrote: > It would indeed be nice if eventually we refactored the code such that gtests are possible. Is that not generally the plan? It is actually possible to do that with the `RangeInference` infrastructure. But unfortunately, it is unavailable in jdk26, and given it is simple and very similar to the existing `CmpINode::sub`, I think such need is less important. > I was also wondering: do we already have some good IR tests for `compareUnsigned`? It would be a shame if this led to a performance regression just because we don't have the coverage. Given this is a strict improvement, I don't think there can be any performance regression. If you look at the old code, it just tried to compute `_ulo` and `_uhi` and used them like we are having here. > And: you should probably not just test `Integer.compareUnsigned`, but also `Long.compareUnsigned`, right? Good catch, I have added another test for `Long`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29308#issuecomment-3783006696 From dlong at openjdk.org Thu Jan 22 07:57:58 2026 From: dlong at openjdk.org (Dean Long) Date: Thu, 22 Jan 2026 07:57:58 GMT Subject: RFR: 8375618: Incorrect assert in CastLLNode::Ideal [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 14:25:59 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the incorrect assert in `CastLLNode::Ideal`. The assert intends to verify that the output is either a proper subset of or the same as the input. It does so by checking the signed lower bound and signed upper bound of the `TypeLong` instances. This method is not correct now. >> >> Please kindly review, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Rename for understandability Looks good. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29304#pullrequestreview-3691083946 From jbhateja at openjdk.org Thu Jan 22 08:02:05 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 22 Jan 2026 08:02:05 GMT Subject: RFR: 8375498: [VectorAPI] Dump primary vector IR details with -XX:+TraceNewVectors [v3] In-Reply-To: <7CZd-55-hI444w7Ff__SX72DsyyhmYn0esx5ICdSyXU=.727aff99-3fa4-4c2a-9636-dda4ffd06761@github.com> References: <9gG4XLzdFYYAiZocLBepqB4uQrZugCVb6j_pOBloKjI=.323d5617-d06e-49f9-9559-03f1356904b2@github.com> <7CZd-55-hI444w7Ff__SX72DsyyhmYn0esx5ICdSyXU=.727aff99-3fa4-4c2a-9636-dda4ffd06761@github.com> Message-ID: On Thu, 22 Jan 2026 02:26:36 GMT, Xiaohong Gong wrote: >> Without any contention here, included!, again we can always increase the coverage and fine tune the tracings. > > My idea is we'd better also trace the predicated version of IR which has another mask input. Sure, will accommodate it in subsequent patch, thanks for your very valid suggestion! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29265#discussion_r2715738723 From chagedorn at openjdk.org Thu Jan 22 08:05:07 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 22 Jan 2026 08:05:07 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v29] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 16:38:25 GMT, Kangcheng Xu wrote: >> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. >> >> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. >> >> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix safepoint detection With your latest safepoint fix, the closed test is no longer failing! So, that seems to have fixed the issue. I'm currently running the DIFF patch with your fix on top up to tier7. Looking good so far. I'm also running some testing again for this patch only up to tier7. Will report back once it's complete (probably takes a while). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3783056595 From jbhateja at openjdk.org Thu Jan 22 08:17:55 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 22 Jan 2026 08:17:55 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v10] In-Reply-To: References: Message-ID: > This patch optimizes the `Vector.slice` operation with a constant index using the x86 ALIGNR instruction.
> It also adds a new hybrid call generator to facilitate lazy intrinsification, or else perform procedural inlining to avoid call overhead and boxing penalties when the fallback implementation operates on vectors. The existing Vector API-based slice implementation is now the fallback code that gets inlined if intrinsification fails. > > The idea here is to add infrastructure support to enable intrinsification of the fast path for selected Vector APIs, or else to enable inlining of the fallback implementation if it is based on Vector APIs. Existing call generators like PredictedCallGenerator, used to handle bimorphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and is invoked during incremental inlining. It also relieves the inline expander from handling slow paths, which can easily be implemented on the library side (Java). > > Vector API jtreg tests pass at AVX level 2; remaining validation is in progress.
>
> Performance numbers:
>
>
> System : 13th Gen Intel(R) Core(TM) i3-1315U
>
> Baseline:
> Benchmark                                                (size)   Mode  Cnt      Score  Error  Units
> VectorSliceBenchmark.byteVectorSliceWithConstantIndex1     1024  thrpt    2   9444.444         ops/ms
> VectorSliceBenchmark.byteVectorSliceWithConstantIndex2     1024  thrpt    2  10009.319         ops/ms
> VectorSliceBenchmark.byteVectorSliceWithVariableIndex      1024  thrpt    2   9081.926         ops/ms
> VectorSliceBenchmark.intVectorSliceWithConstantIndex1      1024  thrpt    2   6085.825         ops/ms
> VectorSliceBenchmark.intVectorSliceWithConstantIndex2      1024  thrpt    2   6505.378         ops/ms
> VectorSliceBenchmark.intVectorSliceWithVariableIndex       1024  thrpt    2   6204.489         ops/ms
> VectorSliceBenchmark.longVectorSliceWithConstantIndex1     1024  thrpt    2   1651.334         ops/ms
> VectorSliceBenchmark.longVectorSliceWithConstantIndex2     1024  thrpt    2   1642.784         ops/ms
> VectorSliceBenchmark.longVectorSliceWithVariableIndex      1024  thrpt    2   1474.808         ops/ms
> VectorSliceBenchmark.shortVectorSliceWithConstantIndex1    1024  thrpt    2  10399.394         ops/ms
> VectorSliceBenchmark.shortVectorSliceWithConstantIndex2    1024  thrpt    2  10502.894         ops/ms
> VectorSliceBenchmark.shortVectorSliceWithVariableIndex     1024 ...

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits:

- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8303762
- Review comments resolutions
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8303762
- Update callGenerator.hpp copyright year
- Review comments resolution
- Cleanups
- Review comments resolutions
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8303762
- Updating predicate checks
- Fixes for failing regressions
- ...
and 4 more: https://git.openjdk.org/jdk/compare/0f4d7750...4b807102 ------------- Changes: https://git.openjdk.org/jdk/pull/24104/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=09 Stats: 1350 lines in 29 files changed: 1247 ins; 2 del; 101 mod Patch: https://git.openjdk.org/jdk/pull/24104.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24104/head:pull/24104 PR: https://git.openjdk.org/jdk/pull/24104 From jbhateja at openjdk.org Thu Jan 22 08:17:57 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 22 Jan 2026 08:17:57 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v2] In-Reply-To: <1Vs8Ud-yh7FtFJN9sddNXDVM6Mc0ue9oi_oa0w5pRzU=.022172f3-1622-4d05-888b-c7afc66a5254@github.com> References: <1Vs8Ud-yh7FtFJN9sddNXDVM6Mc0ue9oi_oa0w5pRzU=.022172f3-1622-4d05-888b-c7afc66a5254@github.com> Message-ID: On Mon, 11 Aug 2025 03:07:13 GMT, Xiaohong Gong wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Updating predicate checks > > Thanks for your work @jatin-bhateja! This PR also helps AArch64, where we plan to do the same intrinsification on our side. Hi @XiaohongGong, @eme64, let me know if you have comments / suggestions here.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24104#issuecomment-3783097561 From roland at openjdk.org Thu Jan 22 08:28:41 2026 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 22 Jan 2026 08:28:41 GMT Subject: RFR: 8374725: C2: assert(x_ctrl == get_late_ctrl_with_anti_dep(x->as_Load(), early_ctrl, x_ctrl)) failed: anti-dependences were already checked [v3] In-Reply-To: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> References: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> Message-ID: > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` caches > intermediate results in `_dom_lca_tags` when the late control is > computed by `PhaseIdealLoop::get_late_ctrl()` for a node `n`: the code > iterates over all uses of `n` potentially calling > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` multiple > times. `_dom_lca_tags` is used to cache data that is specific to the > lca computation for `n`. `_dom_lca_tags` is set to a tag that depends > on `n` to mark the cached data as only valid during the lca > computation for `n`. > > `PhaseIdealLoop::try_sink_out_of_loop()` checks that all uses of a > node are out of loop with > `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` which also needs to > consider anti-dependences for `Load`s and also calls > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` through > `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`: this computes the > late control for a node and one particular out of loop > use. `_dom_lca_tags` values computed by an earlier > `PhaseIdealLoop::get_late_ctrl()` should be ignored (because it > computes the late control for a node and all its uses). To address > that issue, the tag that's used by > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` is made > different on each call from > `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` by incrementing > `_dom_lca_tags_round`. 
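The tagging scheme described above can be illustrated with an abstract model: a memo table whose entries are stamped with the tag that was current when they were written, and are trusted only while that tag is still current. This is a hypothetical sketch, not `PhaseIdealLoop` code; `TaggedCacheSketch`, `lateCtrlFor` and the integer "path results" are invented for illustration. It reproduces the shape of the bug: the same use (a `Phi` reached through two inputs) is queried twice under one tag, and the second query is answered from data cached by the first.

```java
import java.util.HashMap;
import java.util.Map;

public class TaggedCacheSketch {
    // Hypothetical model: per-node cached result plus the tag it was stored under.
    private final Map<String, Integer> cachedValue = new HashMap<>();
    private final Map<String, Long> cachedTag = new HashMap<>();
    private long round = 0;

    // Start a new computation: entries stamped with an older tag become invalid.
    long newTag() {
        return ++round;
    }

    // The answer depends on the path through which 'use' is reached, but the
    // cache is keyed by 'use' alone, so it is only safe within a single tag.
    int lateCtrlFor(String use, int pathResult, long tag) {
        Long stamped = cachedTag.get(use);
        if (stamped != null && stamped == tag) {
            return cachedValue.get(use); // cache hit under the current tag
        }
        cachedTag.put(use, tag);
        cachedValue.put(use, pathResult);
        return pathResult;
    }

    public static void main(String[] args) {
        TaggedCacheSketch cache = new TaggedCacheSketch();
        // Buggy pattern: the same Phi use is queried for two inputs under one tag.
        long shared = cache.newTag();
        int first = cache.lateCtrlFor("Phi", 3, shared);
        int second = cache.lateCtrlFor("Phi", 7, shared);         // stale: returns 3
        // Fixed pattern: a fresh tag for every query.
        int fresh = cache.lateCtrlFor("Phi", 7, cache.newTag()); // correct: returns 7
        System.out.println(first + " " + second + " " + fresh);  // 3 3 7
    }
}
```

Bumping the round per query is the model's analogue of incrementing `_dom_lca_tags_round` for every `get_late_ctrl_with_anti_dep()` call.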
> > The issue here is that one `Load` node is input to a `Phi` twice. So > the `Phi` is considered twice as a use of the node along 2 different > paths. `PhaseIdealLoop::get_late_ctrl_with_anti_dep()` is called twice > from `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` but > `_dom_lca_tags_round` is not incremented between the 2 > calls. `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` when > called for the second `Phi` input uses incorrect cached data which, in > turn, causes an incorrect computation. > > The fix I propose is to make sure `_dom_lca_tags_round` is incremented > for every call to `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/loopopts/TestSinkingLoadInputOfPhi.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29231/files - new: https://git.openjdk.org/jdk/pull/29231/files/16bc98a6..3768ded8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29231&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29231&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29231.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29231/head:pull/29231 PR: https://git.openjdk.org/jdk/pull/29231 From jbhateja at openjdk.org Thu Jan 22 08:31:30 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 22 Jan 2026 08:31:30 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v11] In-Reply-To: References: Message-ID: > Patch optimizes Vector.slice operation with constant index using x86 ALIGNR instruction. > It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors.
The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. > > The idea here is to add infrastructure support to enable intrinsification of the fast path for selected vector APIs, or else to enable inlining of the fallback implementation if it is based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bimorphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and is called during incremental inlining optimization. It also relieves the inline expander from handling slow paths, which can easily be implemented on the library side (Java). > > Vector API jtreg tests pass at AVX level 2; remaining validation is in progress. > > Performance numbers: > > > System : 13th Gen Intel(R) Core(TM) i3-1315U > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms > VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms > VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms > VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms > VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms > VectorSliceBenchmark.shortVectorSliceWithVariableIndex 1024 ...
Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8303762 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24104/files - new: https://git.openjdk.org/jdk/pull/24104/files/4b807102..2c7eb96d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=09-10 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24104.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24104/head:pull/24104 PR: https://git.openjdk.org/jdk/pull/24104 From qamai at openjdk.org Thu Jan 22 08:35:24 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 22 Jan 2026 08:35:24 GMT Subject: RFR: 8375618: Incorrect assert in CastLLNode::Ideal [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 14:47:05 GMT, Christian Hagedorn wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename for understandability > > Thanks for the update! @chhagedorn @dean-long Thanks a lot for your reviews and suggestions ------------- PR Comment: https://git.openjdk.org/jdk/pull/29304#issuecomment-3783163302 From qamai at openjdk.org Thu Jan 22 08:35:26 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 22 Jan 2026 08:35:26 GMT Subject: Integrated: 8375618: Incorrect assert in CastLLNode::Ideal In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 17:00:39 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the incorrect assert in `CastLLNode::Ideal`. The assert intends to verify that the output is either a proper subset of or the same as the input. 
It does so by checking the signed lower bound and signed upper bound of the `TypeLong` instances. This approach is no longer correct. > > Please kindly review, thanks a lot. This pull request has now been integrated. Changeset: 92236ead Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/92236ead1dea813cf456855f0aa6b73c16e9dc70 Stats: 90 lines in 4 files changed: 87 ins; 0 del; 3 mod 8375618: Incorrect assert in CastLLNode::Ideal Reviewed-by: chagedorn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/29304 From qamai at openjdk.org Thu Jan 22 08:58:28 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 22 Jan 2026 08:58:28 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector [v3] In-Reply-To: References: Message-ID: <9-3KsZroBE_aYgrCI0-aEPqDZVPo_VusSQ6RQBpqbnk=.e4095135-169e-4389-b402-b433cf4f4e82@github.com> On Wed, 21 Jan 2026 08:56:19 GMT, Xiaohong Gong wrote: >> ### Problem: >> >> Test `compiler/vectorapi/VectorMaskToLongTest.java` crashes intermittently (approximately once per 200+ runs) with stress VM options such as `-XX:+StressIGVN`: >> >> >> // A fatal error has been detected by the Java Runtime Environment: >> // >> // Internal Error (jdk/src/hotspot/share/opto/type.hpp:2287), pid=69056, tid=28419 >> // assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector >> // ... >> >> >> The crash occurs in the following code when calling `is_vect()` in the assertion added by JDK-8367292 [1]: >> >> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/src/hotspot/share/opto/vectornode.cpp#L1920-L1924 >> >> ### Root Cause: >> >> The mask's type becomes TOP (unreachable) during compiler optimizations when the mask node is marked as dead before all its users are removed from the ideal graph. If `Ideal()` is subsequently called on a user node, it may access the TOP type, triggering the assertion.
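The failure mode in the root-cause paragraph, a user node whose input has already become TOP, is commonly handled by checking for the dead input before asserting on its type, in the spirit of C2's convention that `Ideal()` returns null when it makes no change. The sketch below is a hypothetical Java model of that guard, not HotSpot code and not necessarily the fix adopted in this PR: `Top`, `Vect`, `asVect` and `idealUser` are invented names, with `asVect` playing the role of an `is_vect()`-style accessor that asserts on non-vector types.

```java
public class TopGuardSketch {
    // Hypothetical stand-ins for compiler types; not HotSpot's Type hierarchy.
    interface Type {}
    record Top() implements Type {}                   // value is dead / unreachable
    record Vect(int lengthInBytes) implements Type {} // a vector type
    record Scalar() implements Type {}

    // Like an is_vect()-style accessor: insists that the type is a vector.
    static Vect asVect(Type t) {
        if (t instanceof Vect v) {
            return v;
        }
        throw new AssertionError("Not a Vector: " + t);
    }

    // Ideal()-style rewrite of a user node. Returns null ("no change") when the
    // input has already been replaced by TOP, instead of reaching the assertion.
    static Integer idealUser(Type maskInputType) {
        if (maskInputType instanceof Top) {
            return null; // dead input: leave the node for later cleanup
        }
        return asVect(maskInputType).lengthInBytes();
    }

    public static void main(String[] args) {
        System.out.println(idealUser(new Vect(16))); // 16
        System.out.println(idealUser(new Top()));    // null
    }
}
```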
>> >> Here is the simplified ideal graph showing the crash scenario: >> >> >> Con #top >> | ConI >> \ / >> \ / >> VectorStoreMask >> | >> VectorMaskToLong # !jvms: IntMaxVector$IntMaxMask::toLong >> >> >> ### Detailed Scenario: >> >> Following is the method in the test case that hits the assertion: >> >> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L65-L70 >> >> This method accepts a `VectorSpecies` parameter and calls vector APIs `VectorMask.fromLong()` and `toLong()`. It is called with species ranging from `ByteVector.SPECIES_MAX` to `DoubleVector.SPECIES_MAX`. During compilation, C2 speculatively generates fast paths for `toLong()` for all possible species. >> >> When compiling a specific test case such as: >> https://github.com/openjdk/jdk/blob/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L177-L179 >> >> the compiler inlines the method and attempts to optimize away unreachable branches. The following graph shows the situation before the mask becomes `TOP`: >> >> >> VectorBox # DoubleMaxMask, generated by VectorMask.fromLong() >> / \ >> AddP \ >> | \ >> LoadNClass \ >> ConP #IntMaxMask | | >> \ | | >> \ DecodeN... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Ensure it is vector type for vector unbox result LGTM ------------- Marked as reviewed by qamai (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/29057#pullrequestreview-3691309586 From xgong at openjdk.org Thu Jan 22 08:58:29 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 22 Jan 2026 08:58:29 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 08:56:19 GMT, Xiaohong Gong wrote: >> ### Problem: >> >> Test `compiler/vectorapi/VectorMaskToLongTest.java` crashes intermittently (approximately once per 200+ runs) with stress VM options such as `-XX:+StressIGVN`: >> >> >> // A fatal error has been detected by the Java Runtime Environment: >> // >> // Internal Error (jdk/src/hotspot/share/opto/type.hpp:2287), pid=69056, tid=28419 >> // assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector >> // ... >> >> >> The crash occurs in the following code when calling `is_vect()` in the assertion added by JDK-8367292 [1]: >> >> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/src/hotspot/share/opto/vectornode.cpp#L1920-L1924 >> >> ### Root Cause: >> >> The mask's type becomes TOP (unreachable) during compiler optimizations when the mask node is marked as dead before all its users are removed from the ideal graph. If `Ideal()` is subsequently called on a user node, it may access the TOP type, triggering the assertion. >> >> Here is the simplified ideal graph showing the crash scenario: >> >> >> Con #top >> | ConI >> \ / >> \ / >> VectorStoreMask >> | >> VectorMaskToLong # !jvms: IntMaxVector$IntMaxMask::toLong >> >> >> ### Detailed Scenario: >> >> Following is the method in the test case that hits the assertion: >> >> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L65-L70 >> >> This method accepts a `VectorSpecies` parameter and calls vector APIs `VectorMask.fromLong()` and `toLong()`.
It is called with species ranging from `ByteVector.SPECIES_MAX` to `DoubleVector.SPECIES_MAX`. During compilation, C2 speculatively generates fast paths for `toLong()` for all possible species. >> >> When compiling a specific test case such as: >> https://github.com/openjdk/jdk/blob/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L177-L179 >> >> the compiler inlines the method and attempts to optimize away unreachable branches. The following graph shows the situation before the mask becomes `TOP`: >> >> >> VectorBox # DoubleMaxMask, generated by VectorMask.fromLong() >> / \ >> AddP \ >> | \ >> LoadNClass \ >> ConP #IntMaxMask | | >> \ | | >> \ DecodeN... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Ensure it is vector type for vector unbox result Hi @iwanowww , @merykitty , do you have any further insights on this change? I'd really appreciate it if you could take another look. Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29057#issuecomment-3783257698 From dlong at openjdk.org Thu Jan 22 09:07:09 2026 From: dlong at openjdk.org (Dean Long) Date: Thu, 22 Jan 2026 09:07:09 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v6] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 15:32:07 GMT, Roland Westrelin wrote: >> The base input of `AddP` is expected to only be set for heap accesses >> but I noticed some inconsistencies so I added an assert in the `AddP` >> constructor and fixed issues that it caught. AFAICT, the >> inconsistencies shouldn't create issues. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 15 additional commits since the last revision: > > - Merge branch 'master' into JDK-8373343 > - Update src/hotspot/share/opto/macroArrayCopy.cpp > > Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> > - more > - more > - review > - Merge branch 'master' into JDK-8373343 > - review > - review > - review > - merge > - ... and 5 more: https://git.openjdk.org/jdk/compare/1343aafa...2f618436 Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28769#pullrequestreview-3691365766 From epeter at openjdk.org Thu Jan 22 09:13:35 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 Jan 2026 09:13:35 GMT Subject: RFR: 8359335: Template-Framework Library: Primitive Types subtyping In-Reply-To: <69kmR5e7B-PnE8Lxa_XWShTmwDJmh2fhqD5nNsjpJDI=.1e47ddad-4054-45a3-bb0a-494b6baedc15@github.com> References: <69kmR5e7B-PnE8Lxa_XWShTmwDJmh2fhqD5nNsjpJDI=.1e47ddad-4054-45a3-bb0a-494b6baedc15@github.com> Message-ID: On Wed, 21 Jan 2026 15:56:36 GMT, Manuel Hässig wrote: > This PR implements the subtype relation for primitive types in the Template Framework Library according to JLS §4.10.1. Further, this PR extends the primitive type example to test the addition. > > Testing: > - [x] Github Actions > - [x] tier1, tier2 Points to discuss: - Sampling in expressions: - subtype or exact type? Modulo by zero throws no exception for float, but for the subtype int it does.
- sampling probabilities: it would not be great if implicit conversions skewed the probability of operations/types ------------- PR Comment: https://git.openjdk.org/jdk/pull/29349#issuecomment-3783328956 From xgong at openjdk.org Thu Jan 22 09:45:30 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 22 Jan 2026 09:45:30 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v11] In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 08:31:30 GMT, Jatin Bhateja wrote: >> Patch optimizes Vector.slice operation with constant index using x86 ALIGNR instruction. >> It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. >> >> The idea here is to add infrastructure support to enable intrinsification of the fast path for selected vector APIs, or else to enable inlining of the fallback implementation if it is based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bimorphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and is called during incremental inlining optimization. It also relieves the inline expander from handling slow paths, which can easily be implemented on the library side (Java). >> >> Vector API jtreg tests pass at AVX level 2; remaining validation is in progress.
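Regarding the 8359335 discussion above (primitive subtyping and the modulo question), here is a small standalone model of the relation. It is not the Template Framework's API; the `Prim` enum, `isSubtype` and `remainderByZero` are hypothetical. JLS §4.10.1 defines the direct supertype relation as double >1 float, float >1 long, long >1 int, int >1 char, int >1 short and short >1 byte; subtyping is its reflexive, transitive closure, and boolean is related only to itself. The `remainderByZero` helper shows the behavioral difference raised in the review: a float remainder by zero yields NaN, while an int remainder by zero throws `ArithmeticException`.

```java
import java.util.Map;

public class PrimitiveSubtyping {
    enum Prim { BYTE, SHORT, CHAR, INT, LONG, FLOAT, DOUBLE, BOOLEAN }

    // Direct supertype relation from JLS 4.10.1, stored as subtype -> direct supertype.
    // boolean has no entry: it is a subtype only of itself.
    private static final Map<Prim, Prim> DIRECT_SUPER = Map.of(
            Prim.BYTE, Prim.SHORT,
            Prim.SHORT, Prim.INT,
            Prim.CHAR, Prim.INT,
            Prim.INT, Prim.LONG,
            Prim.LONG, Prim.FLOAT,
            Prim.FLOAT, Prim.DOUBLE);

    // Reflexive, transitive closure: each type has at most one direct supertype
    // here, so walking the chain upward is enough.
    static boolean isSubtype(Prim sub, Prim sup) {
        for (Prim p = sub; p != null; p = DIRECT_SUPER.get(p)) {
            if (p == sup) {
                return true;
            }
        }
        return false;
    }

    // Remainder by zero: NaN for float, ArithmeticException for int.
    static String remainderByZero() {
        float f = 5.0f % 0.0f;
        int zero = 0; // not a constant variable, so the int remainder is evaluated at run time
        try {
            int i = 5 % zero;
            return "float: " + f + ", int: " + i;
        } catch (ArithmeticException e) {
            return "float: " + f + ", int: ArithmeticException";
        }
    }

    public static void main(String[] args) {
        System.out.println(isSubtype(Prim.BYTE, Prim.DOUBLE)); // true
        System.out.println(isSubtype(Prim.CHAR, Prim.SHORT));  // false
        System.out.println(remainderByZero());                 // float: NaN, int: ArithmeticException
    }
}
```

Note that char and short are unrelated in both directions, so sampling "any subtype of int" can produce either.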
>> >> Performance numbers: >> >> >> System : 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms >> VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms >> VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms >> VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms >> VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms >> VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms >> VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms >> VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms >> VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms >> VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms >> VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms >> VectorSliceB... > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8303762 Overall, looks good to me; I've just left a few minor comments. src/hotspot/share/opto/vectorIntrinsics.cpp line 1712: > 1710: log_if_needed(" ** vector slice from non-constant index not supported"); > 1711: return false; > 1712: } Would it be better to move this check up to an earlier line? Maybe right after line 1704, or merged into line 1689. src/hotspot/share/opto/vectornode.cpp line 2440: > 2438: > 2439: Node* VectorSliceNode::Identity(PhaseGVN* phase) { > 2440: if (origin()->is_Con()) { `origin` must be a constant now?
src/hotspot/share/opto/vectornode.cpp line 2443: > 2441: jint index = origin()->get_int(); > 2442: uint vlen = vect_type()->length_in_bytes(); > 2443: if (vlen == (uint)index) { Suggestion: if (vlen == (uint) index) { src/hotspot/share/opto/vectornode.hpp line 1697: > 1695: class VectorSliceNode : public VectorNode { > 1696: public: > 1697: VectorSliceNode(Node* vec1, Node* vec2, Node* origin, const TypeVect* vt) Do we need an assertion that `origin` is always a constant? src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java line 2173: > 2171: FloatVector slice(int origin, Vector<Float> v1); > 2172: > 2173: Revert this newly added blank line? ------------- PR Review: https://git.openjdk.org/jdk/pull/24104#pullrequestreview-3691494636 PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2716077178 PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2716083174 PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2716080787 PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2716090510 PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2716094647 From mdoerr at openjdk.org Thu Jan 22 10:04:23 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 22 Jan 2026 10:04:23 GMT Subject: RFR: 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 01:57:14 GMT, Vladimir Kozlov wrote: >> Thanks for the reviews! > > @dbriemann, this change invalidated the assumption that ReservedCodeCacheSize can't change if specified on the command line. You replaced `align_down` with `align_up` but did not check that `cache_size` may increase after that. > > It is also causing an issue with AOT, because the CodeCache size varies between different phases: we use a different number of compiler threads and, as a result, a different NonNmethod section size.
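The concern about replacing `align_down` with `align_up` can be seen with simple arithmetic: if a fixed total is split into parts and each part is rounded up to a large-page boundary independently, the parts can sum to more than the total, while rounding down never can. The sketch below uses hypothetical sizes (a 240 MB total, 2 MB alignment, and a 5.5 / 117.25 / 117.25 MB split) chosen only for illustration; it does not reproduce how HotSpot actually sizes its code heap segments. The helpers use the usual power-of-two formulation of HotSpot's `align_up`/`align_down`.

```java
public class AlignSketch {
    static final long M = 1024L * 1024L;

    // Power-of-two alignment helpers (alignment must be a power of two).
    static long alignDown(long size, long alignment) {
        return size & -alignment;
    }

    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & -alignment;
    }

    // Align every part independently and add the results.
    static long sumAligned(long[] parts, long alignment, boolean up) {
        long sum = 0;
        for (long part : parts) {
            sum += up ? alignUp(part, alignment) : alignDown(part, alignment);
        }
        return sum;
    }

    public static void main(String[] args) {
        long alignment = 2 * M; // hypothetical large-page size
        // Hypothetical split of a fixed 240 MB total: 5.5 + 117.25 + 117.25 MB.
        long[] parts = {5 * M + M / 2, 117 * M + M / 4, 117 * M + M / 4};
        System.out.println(sumAligned(parts, alignment, true) / M);  // 242: exceeds 240
        System.out.println(sumAligned(parts, alignment, false) / M); // 236: fits in 240
    }
}
```

Rounding up therefore needs a compensating step (for example, taking the excess from a segment the user did not size explicitly) if the user-specified total must be preserved.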
@vnkozlov: If `ReservedCodeCacheSize` is specified on the command line, but `NonProfiledCodeHeapSize` or `ProfiledCodeHeapSize` is not specified explicitly, we could subtract from one of them. What do you think about that? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28658#issuecomment-3783537403 From epeter at openjdk.org Thu Jan 22 10:12:00 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 Jan 2026 10:12:00 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v4] In-Reply-To: <5YSsP2h52Kp9WWb4YrthjtYSziRKM5FM3s5NHThrPTg=.8be8a6fc-e841-4e7b-b9d5-e3910c9bc2e6@github.com> References: <17qmKHGiPNSNpHn2By-gtJjPpwFFDj98wiY-bQG16nY=.4df59ed3-8120-496b-9a1e-06b80c9149ae@github.com> <5YSsP2h52Kp9WWb4YrthjtYSziRKM5FM3s5NHThrPTg=.8be8a6fc-e841-4e7b-b9d5-e3910c9bc2e6@github.com> Message-ID: On Thu, 22 Jan 2026 01:54:16 GMT, Guanqiang Han wrote: >> @hgqxjj Hmm I see. I'm not a fan of using `@requires vm.debug`, because it means that your test is not run in product, and sometimes bugs only show up in product. >> >> It's also not great that we have to run a whole JVM startup with `-Xcomp`, compiling EVERYTHING, and blocking for every compilation. We should only use `-Xcomp` in combination with a fairly restricted `compileonly`. Otherwise, we just waste a lot of compute resources. >> >> @iwanowww What is your opinion on this? > > @eme64 Thanks for the feedback. > > On @requires vm.debug: I'd like to keep it for this reproducer. ProfileTraps is the key knob here: the failure requires ProfileTraps=false (create_if_missing = ProfileTraps, so get_method_data(..., false) may return NULL). Since ProfileTraps is a develop_pd flag and not settable on product builds, this reproducer has to run on a non-product VM (i.e., a debug VM). > > On -Xcomp: agreed. I'll keep it but restrict it with -XX:CompileCommand=compileonly,...
so we only compile the relevant method(s). > > If that sounds reasonable, I'll proceed with just the compileonly tightening. Generally, it would also be nicer to extract a reproducer into a `test` method, and only compile that one. That way, the code shape leading to the crash is preserved. Would that be possible? Otherwise, we risk that someone changes the code shape (maybe in the core libs), and the test would not reproduce any more. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29147#discussion_r2716206558 From roland at openjdk.org Thu Jan 22 10:40:52 2026 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 22 Jan 2026 10:40:52 GMT Subject: Integrated: 8373343: C2: verify AddP base input only set for heap addresses In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:31:55 GMT, Roland Westrelin wrote: > The base input of `AddP` is expected to only be set for heap accesses > but I noticed some inconsistencies so I added an assert in the `AddP` > constructor and fixed issues that it caught. AFAICT, the > inconsistencies shouldn't create issues. This pull request has now been integrated.
Changeset: 6e9256cb Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/6e9256cb613c9a3594546a45975a81def2efcf46 Stats: 107 lines in 16 files changed: 29 ins; 9 del; 69 mod 8373343: C2: verify AddP base input only set for heap addresses Reviewed-by: dlong, chagedorn, qamai ------------- PR: https://git.openjdk.org/jdk/pull/28769 From chagedorn at openjdk.org Thu Jan 22 11:45:49 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 22 Jan 2026 11:45:49 GMT Subject: RFR: 8374725: C2: assert(x_ctrl == get_late_ctrl_with_anti_dep(x->as_Load(), early_ctrl, x_ctrl)) failed: anti-dependences were already checked [v3] In-Reply-To: References: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> Message-ID: <_b6ufdaOJMIyed1F-P4Yq_yq07RIdjEPuQ5KW0eGiVA=.f76121f7-7d99-4ec4-8043-276477af6f97@github.com> On Thu, 22 Jan 2026 08:28:41 GMT, Roland Westrelin wrote: >> `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` caches >> intermediate results in `_dom_lca_tags` when the late control is >> computed by `PhaseIdealLoop::get_late_ctrl()` for a node `n`: the code >> iterates over all uses of `n` potentially calling >> `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` multiple >> times. `_dom_lca_tags` is used to cache data that is specific to the >> lca computation for `n`. `_dom_lca_tags` is set to a tag that depends >> on `n` to mark the cached data as only valid during the lca >> computation for `n`. >> >> `PhaseIdealLoop::try_sink_out_of_loop()` checks that all uses of a >> node are out of loop with >> `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` which also needs to >> consider anti-dependences for `Load`s and also calls >> `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` through >> `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`: this computes the >> late control for a node and one particular out of loop >> use. 
`_dom_lca_tags` values computed by an earlier >> `PhaseIdealLoop::get_late_ctrl()` should be ignored (because it >> computes the late control for a node and all its uses). To address >> that issue, the tag that's used by >> `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` is made >> different on each call from >> `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` by incrementing >> `_dom_lca_tags_round`. >> >> The issue here is that one `Load` node is input to a `Phi` twice. So >> the `Phi` is considered twice as a use of the node along 2 different >> paths. `PhaseIdealLoop::get_late_ctrl_with_anti_dep()` is called twice >> from `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` but >> `_dom_lca_tags_round` is not incremented between the 2 >> calls. `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` when >> called for the second `Phi` input uses incorrect cached data which, in >> turn, causes an incorrect computation. >> >> The fix I propose is to make sure `_dom_lca_tags_round` is incremented >> for every call to `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/loopopts/TestSinkingLoadInputOfPhi.java > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29231#pullrequestreview-3692040918 From krk at openjdk.org Thu Jan 22 11:46:31 2026 From: krk at openjdk.org (Kerem Kat) Date: Thu, 22 Jan 2026 11:46:31 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v9] In-Reply-To: References: Message-ID: > Do not try to replace `fallthrough_memproj` when it is null, fixes crash. > > Test case is simplified from the ticket. Verified that the case crashes without the fix. Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: - Merge branch 'master' into fix-c2-segfault-unlocknode - Merge branch 'master' into fix-c2-segfault-unlocknode - Merge branch 'master' into fix-c2-segfault-unlocknode - Merge branch 'master' into fix-c2-segfault-unlocknode - Merge branch 'master' into fix-c2-segfault-unlocknode - address comments - fix rename - rename test file - Merge branch 'master' into fix-c2-segfault-unlocknode - fix test spacing - ... and 4 more: https://git.openjdk.org/jdk/compare/7ef55f65...d29208cf ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28432/files - new: https://git.openjdk.org/jdk/pull/28432/files/8713f16d..d29208cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=07-08 Stats: 66031 lines in 1205 files changed: 34686 ins; 11711 del; 19634 mod Patch: https://git.openjdk.org/jdk/pull/28432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28432/head:pull/28432 PR: https://git.openjdk.org/jdk/pull/28432 From roland at openjdk.org Thu Jan 22 12:12:43 2026 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 22 Jan 2026 12:12:43 GMT Subject: RFR: 8374725: C2: assert(x_ctrl == get_late_ctrl_with_anti_dep(x->as_Load(), early_ctrl, x_ctrl)) failed: anti-dependences were already checked [v3] In-Reply-To: <_b6ufdaOJMIyed1F-P4Yq_yq07RIdjEPuQ5KW0eGiVA=.f76121f7-7d99-4ec4-8043-276477af6f97@github.com> References: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> <_b6ufdaOJMIyed1F-P4Yq_yq07RIdjEPuQ5KW0eGiVA=.f76121f7-7d99-4ec4-8043-276477af6f97@github.com> Message-ID: On Thu, 22 Jan 2026 11:42:56 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update 
test/hotspot/jtreg/compiler/loopopts/TestSinkingLoadInputOfPhi.java >> >> Co-authored-by: Christian Hagedorn > > Marked as reviewed by chagedorn (Reviewer). @chhagedorn thanks for re-approving. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29231#issuecomment-3784053749 From roland at openjdk.org Thu Jan 22 12:12:45 2026 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 22 Jan 2026 12:12:45 GMT Subject: Integrated: 8374725: C2: assert(x_ctrl == get_late_ctrl_with_anti_dep(x->as_Load(), early_ctrl, x_ctrl)) failed: anti-dependences were already checked In-Reply-To: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> References: <9TKe1AdZQVd7YPtBmDUtWg60WdWTfx-NMQr9QtE40T8=.ecd86331-b9a4-4ba2-8825-b30afc0ef767@github.com> Message-ID: On Wed, 14 Jan 2026 13:45:09 GMT, Roland Westrelin wrote: > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` caches > intermediate results in `_dom_lca_tags` when the late control is > computed by `PhaseIdealLoop::get_late_ctrl()` for a node `n`: the code > iterates over all uses of `n` potentially calling > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` multiple > times. `_dom_lca_tags` is used to cache data that is specific to the > lca computation for `n`. `_dom_lca_tags` is set to a tag that depends > on `n` to mark the cached data as only valid during the lca > computation for `n`. > > `PhaseIdealLoop::try_sink_out_of_loop()` checks that all uses of a > node are out of loop with > `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` which also needs to > consider anti-dependences for `Load`s and also calls > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` through > `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`: this computes the > late control for a node and one particular out of loop > use. 
`_dom_lca_tags` values computed by an earlier > `PhaseIdealLoop::get_late_ctrl()` should be ignored (because it > computes the late control for a node and all its uses). To address > that issue, the tag that's used by > `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` is made > different on each call from > `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` by incrementing > `_dom_lca_tags_round`. > > The issue here is that one `Load` node is input to a `Phi` twice. So > the `Phi` is considered twice as a use of the node along 2 different > paths. `PhaseIdealLoop::get_late_ctrl_with_anti_dep()` is called twice > from `PhaseIdealLoop::ctrl_of_all_uses_out_of_loop()` but > `_dom_lca_tags_round` is not incremented between the 2 > calls. `PhaseIdealLoop::dom_lca_for_get_late_ctrl_internal()` when > called for the second `Phi` input uses incorrect cached data which, in > turn, causes an incorrect computation. > > The fix I propose is to make sure `_dom_lca_tags_round` is incremented > for every call to `PhaseIdealLoop::get_late_ctrl_with_anti_dep()`. This pull request has now been integrated. Changeset: 0d1d4d07 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/0d1d4d07b9fa2368f471f30e176d446698500115 Stats: 73 lines in 2 files changed: 68 ins; 5 del; 0 mod 8374725: C2: assert(x_ctrl == get_late_ctrl_with_anti_dep(x->as_Load(), early_ctrl, x_ctrl)) failed: anti-dependences were already checked Reviewed-by: chagedorn, qamai, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/29231 From chagedorn at openjdk.org Thu Jan 22 12:22:13 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 22 Jan 2026 12:22:13 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v3] In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 07:49:32 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the issue that `CmpUNode::sub` is not monotonic. 
The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not a subset of each other. This leads to the possibilities that a node satisfying both cases will return the first value, but if upon being widen it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. >> >> For example, given `r = CmpU(x, y)`. >> >> At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, then `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. >> >> At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, then `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which leads to an assertion. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Test Long::compareUnsigned Otherwise, the fix looks good and is nicely solved with unsigned types! I agree with @eme64 that we should have better coverage either with IR tests and/or gtests. Since gtests are out of question for JDK 26, I suggest to have some basic IR test coverage for your improvement using unsigned types. We can then still come back in JDK 27 with gtests. test/hotspot/jtreg/compiler/ccp/TestCmpUMonotonicity.java line 29: > 27: * @bug 8375653 > 28: * @summary Test that CmpUNode::sub conforms monotonicity > 29: * @run main/othervm -Xcomp -XX:CompileCommand=compileonly,${test.main.class}::test ${test.main.class} Was hard to spot but this will not compile anything since we match on `test` instead of `test*`: Suggestion: * @run main/othervm -Xcomp -XX:CompileCommand=compileonly,${test.main.class}::test* ${test.main.class} ------------- Changes requested by chagedorn (Reviewer). 
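The non-monotonicity described in this PR can be reproduced with a small toy model. The sketch below is not C2 code: it assumes a "type" is just a finite set of `int` values, models the old case analysis with a hypothetical `caseBased` function (first matching case wins, and "do not overlap" is simplified to set disjointness), and compares it with a hypothetical `exact` function that enumerates every pair with `Integer.compareUnsigned`. Widening `x` from `{0}` to `{0, 2}` makes `caseBased` lose `EQ`, which is the violation reported above; `exact` can only gain outcomes. The actual fix reasons with unsigned types, which this sketch does not try to model.

```java
import java.util.EnumSet;
import java.util.Set;

// Toy model (not C2 code): a "type" is a small set of int values, and the
// result of an unsigned compare is the set of outcomes it may produce.
public class CmpUMonotonicity {
    enum Cmp { LT, EQ, GT }

    // Hypothetical case analysis modelled on the PR description: the first
    // matching case wins, and the cases are not mutually exclusive.
    static EnumSet<Cmp> caseBased(Set<Integer> x, Set<Integer> y) {
        if (x.equals(Set.of(0))) {
            return EnumSet.of(Cmp.LT, Cmp.EQ);   // like CC_LE: 0 <=u anything
        }
        boolean disjoint = x.stream().noneMatch(y::contains);
        if (disjoint) {
            return EnumSet.of(Cmp.LT, Cmp.GT);   // like CC_NE: never equal
        }
        return EnumSet.allOf(Cmp.class);
    }

    // Exact result: enumerate every pair. Widening x or y can only add
    // outcomes, so this function is monotonic by construction.
    static EnumSet<Cmp> exact(Set<Integer> x, Set<Integer> y) {
        EnumSet<Cmp> r = EnumSet.noneOf(Cmp.class);
        for (int a : x) {
            for (int b : y) {
                int c = Integer.compareUnsigned(a, b);
                r.add(c < 0 ? Cmp.LT : (c == 0 ? Cmp.EQ : Cmp.GT));
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Set<Integer> y = Set.of(1, -1);
        Set<Integer> narrow = Set.of(0);    // first iteration: type(x) = {0}
        Set<Integer> wide = Set.of(0, 2);   // second iteration: type(x) = {0, 2}

        EnumSet<Cmp> before = caseBased(narrow, y);
        EnumSet<Cmp> after = caseBased(wide, y);
        System.out.println("case-based: " + before + " -> " + after
                + ", monotonic = " + after.containsAll(before));

        EnumSet<Cmp> exactBefore = exact(narrow, y);
        EnumSet<Cmp> exactAfter = exact(wide, y);
        System.out.println("exact:      " + exactBefore + " -> " + exactAfter
                + ", monotonic = " + exactAfter.containsAll(exactBefore));
    }
}
```

Running it prints `case-based: [LT, EQ] -> [LT, GT], monotonic = false` followed by `exact:      [LT] -> [LT, GT], monotonic = true`.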
PR Review: https://git.openjdk.org/jdk/pull/29308#pullrequestreview-3692119300 PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2716625343 From chagedorn at openjdk.org Thu Jan 22 12:22:15 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 22 Jan 2026 12:22:15 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v3] In-Reply-To: References: Message-ID: <-jJSCbOmmFkS6P3JKrEMn2298oTlTguiQWNPORIG8-8=.a758a8c3-0e9e-4e8f-b6a2-0ff920c5eb31@github.com> On Thu, 22 Jan 2026 12:04:24 GMT, Christian Hagedorn wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Test Long::compareUnsigned > > test/hotspot/jtreg/compiler/ccp/TestCmpUMonotonicity.java line 29: > >> 27: * @bug 8375653 >> 28: * @summary Test that CmpUNode::sub conforms monotonicity >> 29: * @run main/othervm -Xcomp -XX:CompileCommand=compileonly,${test.main.class}::test ${test.main.class} > > Was hard to spot but this will not compile anything since we match on `test` instead of `test*`: > > Suggestion: > > * @run main/othervm -Xcomp -XX:CompileCommand=compileonly,${test.main.class}::test* ${test.main.class} Can you also add a run without `Xcomp`? Is it required or would it also work with `Xbatch`? The reason I'm asking is because we use 20000 iterations below in the loop in `main()` (maybe we can also run with fewer iterations to trigger the issue). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2716641069 From qamai at openjdk.org Thu Jan 22 13:46:51 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 22 Jan 2026 13:46:51 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v4] In-Reply-To: References: Message-ID: <_UG0YScCOLECIwsGoUWPdKvonLEAp1mwO7gEAJpH8wA=.56e38d55-f01d-4073-816c-6bba5ae5f2f6@github.com> > Hi, > > This PR fixes the issue that `CmpUNode::sub` is not monotonic. 
The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not a subset of each other. This leads to the possibilities that a node satisfying both cases will return the first value, but if upon being widen it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. > > For example, given `r = CmpU(x, y)`. > > At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, then `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. > > At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, then `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which leads to an assertion. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: - Missing patterns - Add IR tests, fix correctness tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29308/files - new: https://git.openjdk.org/jdk/pull/29308/files/195a8ee1..125f2d80 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29308&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29308&range=02-03 Stats: 155 lines in 2 files changed: 154 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29308.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29308/head:pull/29308 PR: https://git.openjdk.org/jdk/pull/29308 From qamai at openjdk.org Thu Jan 22 13:46:52 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 22 Jan 2026 13:46:52 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v3] In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 12:19:01 GMT, Christian Hagedorn wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional 
commit since the last revision: >> >> Test Long::compareUnsigned > > Otherwise, the fix looks good and is nicely solved with unsigned types! > > I agree with @eme64 that we should have better coverage either with IR tests and/or gtests. Since gtests are out of the question for JDK 26, I suggest to have some basic IR test coverage for your improvement using unsigned types. We can then still come back in JDK 27 with gtests. @chhagedorn @eme64 I have added an IR test to verify that the folding happens as expected. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29308#issuecomment-3784479722 From qamai at openjdk.org Thu Jan 22 13:46:55 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 22 Jan 2026 13:46:55 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v3] In-Reply-To: <-jJSCbOmmFkS6P3JKrEMn2298oTlTguiQWNPORIG8-8=.a758a8c3-0e9e-4e8f-b6a2-0ff920c5eb31@github.com> References: <-jJSCbOmmFkS6P3JKrEMn2298oTlTguiQWNPORIG8-8=.a758a8c3-0e9e-4e8f-b6a2-0ff920c5eb31@github.com> Message-ID: On Thu, 22 Jan 2026 12:09:24 GMT, Christian Hagedorn wrote: >> test/hotspot/jtreg/compiler/ccp/TestCmpUMonotonicity.java line 29: >> >>> 27: * @bug 8375653 >>> 28: * @summary Test that CmpUNode::sub conforms monotonicity >>> 29: * @run main/othervm -Xcomp -XX:CompileCommand=compileonly,${test.main.class}::test ${test.main.class} >> >> Was hard to spot but this will not compile anything since we match on `test` instead of `test*`: >> >> Suggestion: >> >> * @run main/othervm -Xcomp -XX:CompileCommand=compileonly,${test.main.class}::test* ${test.main.class} > > Can you also add a run without `Xcomp`? Is it required or would it also work with `Xbatch`? The reason I'm asking is because we use 20000 iterations below in the loop in `main()` (maybe we can also run with fewer iterations to trigger the issue). `Xbatch` works fine, so I changed to `Xbatch`. Running without any flags can also trigger the failure. I kept both run configurations. What do you think?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2716971527 From krk at openjdk.org Thu Jan 22 14:12:12 2026 From: krk at openjdk.org (Kerem Kat) Date: Thu, 22 Jan 2026 14:12:12 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v4] In-Reply-To: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> Message-ID: <1E6C99XanWuE9QF_-SfyFzxgD4asf7WUO7RyR1pYn7M=.47f1b163-1b67-4f54-acd4-d061779ad95d@github.com> > The check `vect->is_Vector() || vect->is_LoadVector()` doesn't handle `Proj` nodes that resolve to vector types, causing an assertion failure when such nodes flow through a `Phi` into `VectorBox`. Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'master' into fix-c2-checkCastPP - Merge branch 'master' into fix-c2-checkCastPP - 8374903: C2 VectorAPI: assert(vbox->as_Phi()->region() == vect->as_Phi()->region()) failed - Simplify expand_vbox_node_helper by merging VectorBox Phi handling - 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29200/files - new: https://git.openjdk.org/jdk/pull/29200/files/6b3695cc..9670be04 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29200&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29200&range=02-03 Stats: 12402 lines in 338 files changed: 7735 ins; 1772 del; 2895 mod Patch: https://git.openjdk.org/jdk/pull/29200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29200/head:pull/29200 PR: https://git.openjdk.org/jdk/pull/29200 From krk at openjdk.org Thu Jan 22 14:12:15 2026 From: krk at openjdk.org (Kerem Kat) Date: Thu, 22 Jan 2026 14:12:15 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v3] In-Reply-To: <0_wYDA2lNvTyIDv7ist5heu-hs4J8pmEKT1mqRyiBBk=.438156e1-24fd-4352-8a61-9cf85efacb25@github.com> References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> <0_wYDA2lNvTyIDv7ist5heu-hs4J8pmEKT1mqRyiBBk=.438156e1-24fd-4352-8a61-9cf85efacb25@github.com> Message-ID: On Tue, 20 Jan 2026 19:47:00 GMT, Vladimir Ivanov wrote: >> Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into fix-c2-checkCastPP >> - 8374903: C2 VectorAPI: assert(vbox->as_Phi()->region() == vect->as_Phi()->region()) failed >> - Simplify expand_vbox_node_helper by merging VectorBox Phi handling >> - 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded > > Test results (hs-tier1 - hs-tier4) are clean. Thanks for testing @iwanowww! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29200#issuecomment-3784610578 From krk at openjdk.org Thu Jan 22 14:12:18 2026 From: krk at openjdk.org (Kerem Kat) Date: Thu, 22 Jan 2026 14:12:18 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v3] In-Reply-To: References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> Message-ID: On Mon, 19 Jan 2026 17:06:03 GMT, Quan Anh Mai wrote: >> Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into fix-c2-checkCastPP >> - 8374903: C2 VectorAPI: assert(vbox->as_Phi()->region() == vect->as_Phi()->region()) failed >> - Simplify expand_vbox_node_helper by merging VectorBox Phi handling >> - 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded > > src/hotspot/share/opto/vector.cpp line 331: > >> 329: // Handle the case when the allocation input to VectorBoxNode is a Phi. 
>> 330: // This is generated after the transformation in PhiNode::merge_through_phi: >> 331: // Phi (VectorBox1 VectorBox2) => VectorBox (Phi1 Phi2) > > Should this be something like: > > Phi(VectorBox(vbox1, vect1), VectorBox(vbox2, vect2)) -> VectorBox(Phi(vbox1, vbox2), Phi(vect1, vect2)) > > I think it is a bit clearer, but it is fine either way. Thanks, leaving as is for now. > test/hotspot/jtreg/compiler/vectorapi/VectorBoxExpandPhi.java line 1: > >> 1: /* > > Can these 2 tests be merged into 1? They are isolated tests, I would prefer to keep them separate unless there is a strong reason to merge them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29200#discussion_r2717070137 PR Review Comment: https://git.openjdk.org/jdk/pull/29200#discussion_r2717072541 From epeter at openjdk.org Thu Jan 22 14:32:19 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 Jan 2026 14:32:19 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v4] In-Reply-To: <_UG0YScCOLECIwsGoUWPdKvonLEAp1mwO7gEAJpH8wA=.56e38d55-f01d-4073-816c-6bba5ae5f2f6@github.com> References: <_UG0YScCOLECIwsGoUWPdKvonLEAp1mwO7gEAJpH8wA=.56e38d55-f01d-4073-816c-6bba5ae5f2f6@github.com> Message-ID: On Thu, 22 Jan 2026 13:46:51 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not a subset of each other. This leads to the possibilities that a node satisfying both cases will return the first value, but if upon being widen it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. >> >> For example, given `r = CmpU(x, y)`. >> >> At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, then `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. 
>> >> At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, then `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which leads to an assertion. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - Missing patterns > - Add IR tests, fix correctness tests Nice, thanks for adding some IR tests! test/hotspot/jtreg/compiler/c2/gvn/CmpUNodeValueTests.java line 59: > 57: Random r = Utils.getRandomInstance(); > 58: long x = r.nextLong(); > 59: long y = r.nextLong(); If you use `Generators.java` you may be more likely to hit interesting edge cases (zero, max_uint, etc). test/hotspot/jtreg/compiler/c2/gvn/CmpUNodeValueTests.java line 86: > 84: > 85: @Test > 86: @IR(applyIfPlatformOr = {"x64", "true", "aarch64", "true"}, failOn = {IRNode.CMP_U, IRNode.CALL}) Just as a "control test", can you add a comparison that does not fold away, and so we should find the nodes? Otherwise, the risk is that we do `failOn` for the wrong nodes and don't notice. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29308#pullrequestreview-3692707141 PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2717124928 PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2717134137 From epeter at openjdk.org Thu Jan 22 14:32:21 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 Jan 2026 14:32:21 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v4] In-Reply-To: References: <_UG0YScCOLECIwsGoUWPdKvonLEAp1mwO7gEAJpH8wA=.56e38d55-f01d-4073-816c-6bba5ae5f2f6@github.com> Message-ID: On Thu, 22 Jan 2026 14:26:18 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - Missing patterns >> - Add IR tests, fix correctness tests > > test/hotspot/jtreg/compiler/c2/gvn/CmpUNodeValueTests.java line 86: > >> 84: >> 85: @Test >> 86: @IR(applyIfPlatformOr = {"x64", "true", "aarch64", "true"}, failOn = {IRNode.CMP_U, IRNode.CALL}) > > Just as a "control test", can you add a comparison that does not fold away, and so we should find the nodes? > Otherwise, the risk is that we do `failOn` for the wrong nodes and don't notice. Below, it may be more helpful to annotate the tests with what you expect the test folds to: `LT`, `GE`, `NE`, ... It would also make it easier to quickly spot if we have covered all cases. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2717141764 From dbriemann at openjdk.org Thu Jan 22 14:38:30 2026 From: dbriemann at openjdk.org (David Briemann) Date: Thu, 22 Jan 2026 14:38:30 GMT Subject: RFR: 8376113: PPC64: Implement special MachNodes for floating point Min / Max Message-ID: Add mach nodes MinF, MaxF, MinD, MaxD for PPC. 
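For reference when reviewing MinF/MaxF/MinD/MaxD nodes: Java's `Math.min`/`Math.max` on `float` and `double` return NaN if either argument is NaN, and treat `-0.0` as strictly smaller than `+0.0` (as specified in the `java.lang.Math` javadoc). The sketch below contrasts those rules with a hypothetical compare-and-select `naiveMin`, which gets both cases wrong. It says nothing about which PPC64 instructions the patch actually uses.

```java
// Java's Math.min/Math.max rules for float and double that any
// MinF/MaxF/MinD/MaxD lowering has to preserve. naiveMin is hypothetical.
public class JavaFpMinMax {
    // A plain compare-and-select: NOT a correct implementation of Math.min.
    static float naiveMin(float a, float b) {
        return a < b ? a : b;
    }

    public static void main(String[] args) {
        // NaN propagates: if either argument is NaN, the result is NaN.
        System.out.println(Math.min(Float.NaN, 1.0f));   // NaN
        System.out.println(naiveMin(Float.NaN, 1.0f));   // 1.0 (NaN < 1.0f is false)

        // Signed zeros: -0.0 is treated as strictly smaller than +0.0.
        System.out.println(Math.min(-0.0f, 0.0f));       // -0.0
        System.out.println(naiveMin(-0.0f, 0.0f));       // 0.0 (-0.0f < 0.0f is false)
        System.out.println(Math.max(-0.0, 0.0));         // 0.0 (double variant)
    }
}
```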
------------- Commit messages: - 8376113: PPC64: Implement special MachNodes for floating point Min / Max Changes: https://git.openjdk.org/jdk/pull/29361/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29361&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8376113 Stats: 57 lines in 3 files changed: 57 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29361.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29361/head:pull/29361 PR: https://git.openjdk.org/jdk/pull/29361 From mdoerr at openjdk.org Thu Jan 22 14:41:37 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 22 Jan 2026 14:41:37 GMT Subject: RFR: 8376113: PPC64: Implement special MachNodes for floating point Min / Max In-Reply-To: References: Message-ID: <98ZR76NJV5IlZ5ndDUHKACMOqugOmxaZFKHEDTH37IA=.ccfe6100-f788-4e01-bb95-1204fc5871b7@github.com> On Thu, 22 Jan 2026 14:27:12 GMT, David Briemann wrote: > Add mach nodes MinF, MaxF, MinD, MaxD for PPC. Please also test it on Power8 (or with -XX:PowerArchitecturePPC64=8). I think we need to add a check to `Matcher::match_rule_supported` like in x86.ad. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29361#issuecomment-3784734862 From qamai at openjdk.org Thu Jan 22 15:01:41 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 22 Jan 2026 15:01:41 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v4] In-Reply-To: References: <_UG0YScCOLECIwsGoUWPdKvonLEAp1mwO7gEAJpH8wA=.56e38d55-f01d-4073-816c-6bba5ae5f2f6@github.com> Message-ID: On Thu, 22 Jan 2026 14:27:56 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/gvn/CmpUNodeValueTests.java line 86: >> >>> 84: >>> 85: @Test >>> 86: @IR(applyIfPlatformOr = {"x64", "true", "aarch64", "true"}, failOn = {IRNode.CMP_U, IRNode.CALL}) >> >> Just as a "control test", can you add a comparison that does not fold away, and so we should find the nodes? >> Otherwise, the risk is that we do `failOn` for the wrong nodes and don't notice. 
> > Below, it may be more helpful to annotate the tests with what you expect the test folds to: `LT`, `GE`, `NE`, ... It would also make it easier to quickly spot if we have covered all cases. Done! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2717274021 From qamai at openjdk.org Thu Jan 22 15:01:36 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 22 Jan 2026 15:01:36 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v5] In-Reply-To: References: Message-ID: > Hi, > > This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not a subset of each other. This leads to the possibilities that a node satisfying both cases will return the first value, but if upon being widen it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. > > For example, given `r = CmpU(x, y)`. > > At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, then `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. > > At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, then `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which leads to an assertion. > > Please take a look and leave your reviews, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Improve IR tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29308/files - new: https://git.openjdk.org/jdk/pull/29308/files/125f2d80..625bed1c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29308&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29308&range=03-04 Stats: 115 lines in 1 file changed: 60 ins; 3 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/29308.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29308/head:pull/29308 PR: https://git.openjdk.org/jdk/pull/29308 From qamai at openjdk.org Thu Jan 22 15:01:40 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 22 Jan 2026 15:01:40 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v4] In-Reply-To: References: <_UG0YScCOLECIwsGoUWPdKvonLEAp1mwO7gEAJpH8wA=.56e38d55-f01d-4073-816c-6bba5ae5f2f6@github.com> Message-ID: On Thu, 22 Jan 2026 14:24:06 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - Missing patterns >> - Add IR tests, fix correctness tests > > test/hotspot/jtreg/compiler/c2/gvn/CmpUNodeValueTests.java line 59: > >> 57: Random r = Utils.getRandomInstance(); >> 58: long x = r.nextLong(); >> 59: long y = r.nextLong(); > > If you use `Generators.java` you may be more likely to hit interesting edge cases (zero, max_uint, etc). Done! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2717273444 From dbriemann at openjdk.org Thu Jan 22 15:01:06 2026 From: dbriemann at openjdk.org (David Briemann) Date: Thu, 22 Jan 2026 15:01:06 GMT Subject: RFR: 8376113: PPC64: Implement special MachNodes for floating point Min / Max [v2] In-Reply-To: References: Message-ID: <_DHsk8iBTmk0lS6aqGi5pBFhQDt0S6KNCjpeuVxp81U=.16b63b88-cf47-4904-a273-8b16aae0e538@github.com> > Add mach nodes MinF, MaxF, MinD, MaxD for PPC. 
David Briemann has updated the pull request incrementally with one additional commit since the last revision: add match_rule_supported check for MinF, MaxF, MinD, MaxD and PPC >= 9 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29361/files - new: https://git.openjdk.org/jdk/pull/29361/files/495c3d4c..37d41b89 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29361&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29361&range=00-01 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29361.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29361/head:pull/29361 PR: https://git.openjdk.org/jdk/pull/29361 From dbriemann at openjdk.org Thu Jan 22 15:06:58 2026 From: dbriemann at openjdk.org (David Briemann) Date: Thu, 22 Jan 2026 15:06:58 GMT Subject: RFR: 8376113: PPC64: Implement special MachNodes for floating point Min / Max In-Reply-To: <98ZR76NJV5IlZ5ndDUHKACMOqugOmxaZFKHEDTH37IA=.ccfe6100-f788-4e01-bb95-1204fc5871b7@github.com> References: <98ZR76NJV5IlZ5ndDUHKACMOqugOmxaZFKHEDTH37IA=.ccfe6100-f788-4e01-bb95-1204fc5871b7@github.com> Message-ID: On Thu, 22 Jan 2026 14:39:27 GMT, Martin Doerr wrote: > Please also test it on Power8 (or with -XX:PowerArchitecturePPC64=8). I think we need to add a check to `Matcher::match_rule_supported` like in x86.ad. Thanks for catching this. Fixed it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29361#issuecomment-3784896993 From mdoerr at openjdk.org Thu Jan 22 15:22:30 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 22 Jan 2026 15:22:30 GMT Subject: RFR: 8376113: PPC64: Implement special MachNodes for floating point Min / Max [v2] In-Reply-To: <_DHsk8iBTmk0lS6aqGi5pBFhQDt0S6KNCjpeuVxp81U=.16b63b88-cf47-4904-a273-8b16aae0e538@github.com> References: <_DHsk8iBTmk0lS6aqGi5pBFhQDt0S6KNCjpeuVxp81U=.16b63b88-cf47-4904-a273-8b16aae0e538@github.com> Message-ID: On Thu, 22 Jan 2026 15:01:06 GMT, David Briemann wrote: >> Add mach nodes MinF, MaxF, MinD, MaxD for PPC. > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > add match_rule_supported check for MinF, MaxF, MinD, MaxD and PPC >= 9 LGTM. Thanks! ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29361#pullrequestreview-3693021345 From roland at openjdk.org Thu Jan 22 15:26:42 2026 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 22 Jan 2026 15:26:42 GMT Subject: RFR: 8347365: C2: Fix the handling of depends_only_on_test [v2] In-Reply-To: References: Message-ID: On Fri, 16 Jan 2026 16:04:49 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the handling of `depends_only_on_test` when the control graph is transformed. It has to do with the theoretical idea of `depends_only_on_test`, copy from the JBS issue description: >> >> To start with, what is `depends_only_on_test`? Given a node `n` with the control input `c`, if `c` can be deduced from `c'` and `n->depends_only_on_test() == true`, then we can rewire the control input of `n` to `c'`. This means that `depends_only_on_test` does not mean that the node depends on a test, it means that the node depends on the test that is its control input. 
>> >> For example: >> >> if (y != 0) { >> if (x > 0) { >> if (y != 0) { >> x / y; >> } >> } >> } >> >> Then `x/y` `depends_only_on_test` because its control input is the test `y != 0`. Then, we can rewire the control input of the division to the outer `y != 0`, resulting in: >> >> if (y != 0) { >> x / y; >> if (x > 0) { >> } >> } >> >> On the other hand, consider this case: >> >> if (x > 0) { >> if (y != 0) { >> if (x > 0) { >> x / y; >> } >> } >> } >> >> Then `x/y` does not `depends_only_on_test` because its control input is the test `x > 0` which is unrelated, we can see that if we rewire the division to the outer `x > 0` test, the division floats above the actual test `y != 0`. This means that `depends_only_on_test` is a dynamic property of a node, and not a static property of the division operation. It can change when we transform the graph and it can be different for different nodes of the same kind. >> >> More details can be found in the description of `Node::depends_only_on_test` and `Node::pin_node_under_control` in this change. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - more clarification > - Refine comments As I understand, this change removes logic that's overly conservative but doesn't address any correctness issue (i.e. there's no crash or incorrect execution that this fixes). Given the new logic is less conservative, there should be cases where the code optimizes better with this change. Would it make sense to add IR test cases to catch regressions? src/hotspot/share/opto/cfgnode.hpp line 463: > 461: static Node* up_one_dom(Node* curr, bool linear_only = false); > 462: bool is_zero_trip_guard() const; > 463: Node* dominated_by(Node* prev_dom, PhaseIterGVN* igvn, bool prev_dom_not_imply_this); I find `prev_dom_not_imply_this` confusing. I'm actually not sure I understand why you named it that way. 
src/hotspot/share/opto/memnode.cpp line 1030: > 1028: LoadNode* LoadNode::pin_array_access_node() const { > 1029: const TypePtr* adr_type = this->adr_type(); > 1030: if (adr_type != nullptr && adr_type->isa_aryptr()) { As I understand you got rid of the check for an array access. Are there non array loads that we would like pinned? Are there loads that are control dependent on a condition and are non array loads that now get pinned when they don't need to be? src/hotspot/share/opto/split_if.cpp line 719: > 717: > 718: _igvn.remove_dead_node(region); > 719: if (iff->Opcode() == Op_RangeCheck) { Do we really want to pin array accesses when a condition other than a `RangeCheck` is eliminated? It seems overly conservative to me. The risk is we would pin nodes that don't need to be pinned. ------------- PR Review: https://git.openjdk.org/jdk/pull/29158#pullrequestreview-3692952131 PR Review Comment: https://git.openjdk.org/jdk/pull/29158#discussion_r2717306115 PR Review Comment: https://git.openjdk.org/jdk/pull/29158#discussion_r2717354428 PR Review Comment: https://git.openjdk.org/jdk/pull/29158#discussion_r2717373046 From qamai at openjdk.org Thu Jan 22 15:48:06 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 22 Jan 2026 15:48:06 GMT Subject: RFR: 8347365: C2: Fix the handling of depends_only_on_test [v2] In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 15:04:52 GMT, Roland Westrelin wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - more clarification >> - Refine comments > > src/hotspot/share/opto/cfgnode.hpp line 463: > >> 461: static Node* up_one_dom(Node* curr, bool linear_only = false); >> 462: bool is_zero_trip_guard() const; >> 463: Node* dominated_by(Node* prev_dom, PhaseIterGVN* igvn, bool prev_dom_not_imply_this); > > I find `prev_dom_not_imply_this` confusing. I'm actually not sure I understand why you named it that way. This method tries to remove `this` and rewires all nodes that `depends_only_on_test` to `prev_dom`. `prev_dom_not_imply_this` indicates whether the test at `this` cannot be implied by the test at `prev_dom` alone. For example: if (x > 0) { if (y != 0) { if (x != -1) { r = y / (x + 1); } } } `x != -1` can be implied solely from `x > 0`. As a result, when moving the division to the `IfProj` of the test `x > 0`, the division still `depends_only_on_test`. On the other hand, if (x != y) { if (y == -1) { if (x != -1) { r = y / (x + 1); } } } We can remove the test `x != -1`, since it can be implied from `x != y` and `y == -1`. But now `x != -1` cannot be implied solely from `y == -1`. As a result, the division must cease to `depends_only_on_test`. This is the meaning of the parameter name `prev_dom_not_imply_this`.
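The two implications in Quan Anh Mai's reply can be sanity-checked by brute force. This is only a sketch: it exhaustively checks a small hypothetical range of `x` and `y` values (a bounded check, not a proof), and `implies` is an illustrative helper, not part of C2. It confirms that `x > 0` alone implies `x != -1`, whereas `y == -1` alone does not (for example `x = y = -1`), which is the case where the rewired division can no longer `depends_only_on_test`.

```java
// Brute-force sanity check (not a proof, and not C2 code) of the implications
// used in the dominated_by() example, over a small hypothetical domain.
public class ImplicationCheck {
    interface Cond { boolean test(int x, int y); }

    static final int LO = -4, HI = 4;   // hypothetical bounds; includes -1

    // True if every (x, y) in the domain that satisfies 'premise'
    // also satisfies 'conclusion'.
    static boolean implies(Cond premise, Cond conclusion) {
        for (int x = LO; x <= HI; x++) {
            for (int y = LO; y <= HI; y++) {
                if (premise.test(x, y) && !conclusion.test(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Cond removedTest = (x, y) -> x != -1;   // the test at `this`

        // First example: prev_dom is `x > 0`, which alone implies x != -1.
        System.out.println(implies((x, y) -> x > 0, removedTest));             // true
        // Second example: prev_dom is `y == -1`, which alone does not...
        System.out.println(implies((x, y) -> y == -1, removedTest));           // false
        // ...it only does together with the outer test `x != y`.
        System.out.println(implies((x, y) -> x != y && y == -1, removedTest)); // true
    }
}
```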
This method tries to remove `this` and rewires all nodes that `depends_only_on_test` to `prev_dom`. `prev_dom_not_imply_this` indicates whether the test at `this` cannot be implied from the test at `prev_dom` alone. For example:

    if (x > 0) {
      if (y != 0) {
        if (x != -1) {
          r = y / (x + 1);
        }
      }
    }

`x != -1` can be implied solely from `x > 0`. As a result, when the division is moved to the `IfProj` of the test `x > 0`, it still `depends_only_on_test`. On the other hand:

    if (x != y) {
      if (y == -1) {
        if (x != -1) {
          r = y / (x + 1);
        }
      }
    }

We can remove the test `x != -1`, since it can be implied from `x != y` and `y == -1` together. But `x != -1` cannot be implied solely from `y == -1`, so the division must stop being `depends_only_on_test`. This is the meaning of the parameter name `prev_dom_not_imply_this`.

> src/hotspot/share/opto/memnode.cpp line 1030:
>
>> 1028: LoadNode* LoadNode::pin_array_access_node() const {
>> 1029:   const TypePtr* adr_type = this->adr_type();
>> 1030:   if (adr_type != nullptr && adr_type->isa_aryptr()) {
>
> As I understand you got rid of the check for an array access. Are there non array loads that we would like pinned? Are there loads that are control dependent on a condition and are non array loads that now gets pinned when they don't need to?

Any node that `depends_only_on_test` must be pinned here, because it no longer `depends_only_on_test` once the test is removed. If we later hoist that node because we find an equivalent test, the optimization is incorrect. This was the cause of the previous issues with `DivNode`s: they were not pinned when they could no longer `depends_only_on_test`.

> src/hotspot/share/opto/split_if.cpp line 719:
>
>> 717:
>> 718:   _igvn.remove_dead_node(region);
>> 719:   if (iff->Opcode() == Op_RangeCheck) {
>
> Do we want really want to pin array accesses when a condition other that a `RangeCheck` is eliminated? It seems overly conservative to me. The risk is we would pin nodes that don't need to.
As above, all nodes that `depends_only_on_test` must be pinned.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29158#discussion_r2717464707
PR Review Comment: https://git.openjdk.org/jdk/pull/29158#discussion_r2717471714
PR Review Comment: https://git.openjdk.org/jdk/pull/29158#discussion_r2717473056

From qamai at openjdk.org Thu Jan 22 15:50:24 2026
From: qamai at openjdk.org (Quan Anh Mai)
Date: Thu, 22 Jan 2026 15:50:24 GMT
Subject: RFR: 8347365: C2: Fix the handling of depends_only_on_test [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 22 Jan 2026 15:23:55 GMT, Roland Westrelin wrote:

>> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - more clarification
>> - Refine comments
>
> As I understand, this change removes logic that's overly conservative but doesn't address any correctness issue (i.e. there's no crash or incorrect execution that this fixes). Given the new logic is less conservative, there should be cases where the code optimizes better with this change. Would it make sense to add IR test cases to catch regressions?

@rwestrel Thanks for taking a look. This PR actually makes C2 strictly more conservative, because it pins more nodes when an `IfNode` is elided. However, this conservativeness is necessary: any optimization that relies on such a node not being pinned is incorrect, as in [JDK-8331717](https://bugs.openjdk.org/browse/JDK-8331717) or [JDK-8257822](https://bugs.openjdk.org/browse/JDK-8257822).
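The implication argument in the two `prev_dom_not_imply_this` examples can be checked mechanically. The following is only an illustrative sketch, not C2 code: the helper `implies`, the predicate functions and the search bound are hypothetical. Because it searches exhaustively over a small range of integers, it illustrates the claims rather than proving them.

```cpp
#include <cassert>

// A condition over (x, y), e.g. the test of one If in the examples (illustration only).
using Pred = bool (*)(int, int);

// Returns true if no (x, y) with |x|, |y| <= bound satisfies `premise`
// but violates `goal`. A bounded search illustrates, but does not prove,
// an implication.
bool implies(Pred premise, Pred goal, int bound = 8) {
  for (int x = -bound; x <= bound; ++x) {
    for (int y = -bound; y <= bound; ++y) {
      if (premise(x, y) && !goal(x, y)) {
        return false;  // counterexample found
      }
    }
  }
  return true;
}

// Conditions taken from the two examples.
bool x_pos(int x, int) { return x > 0; }
bool x_ne_m1(int x, int) { return x != -1; }
bool y_eq_m1(int, int y) { return y == -1; }
bool x_ne_y_and_y_eq_m1(int x, int y) { return x != y && y == -1; }
```

With these definitions, `implies(x_pos, x_ne_m1)` holds. `implies(y_eq_m1, x_ne_m1)` does not, since x = y = -1 is a counterexample. `implies(x_ne_y_and_y_eq_m1, x_ne_m1)` holds again. This mirrors why the division stays `depends_only_on_test` in the first example but must be pinned in the second.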
-------------

PR Comment: https://git.openjdk.org/jdk/pull/29158#issuecomment-3785135042

From kxu at openjdk.org Thu Jan 22 16:08:15 2026
From: kxu at openjdk.org (Kangcheng Xu)
Date: Thu, 22 Jan 2026 16:08:15 GMT
Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v29]
In-Reply-To:
References:
Message-ID: <07tFkCFGwOkP8trgxsIrhw1uj3B1579R2zx2e-COvB0=.6b9d2f95-d8fc-4177-b4c7-e8f1038e22ba@github.com>

On Thu, 22 Jan 2026 08:02:20 GMT, Christian Hagedorn wrote:

>> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision:
>>
>> fix safepoint detection
>
> With your latest safepoint fix, the closed test is no longer failing! So, that seems to have fixed the issue. I'm currently running the DIFF patch with your fix on top up to tier7. Looking good so far. I'm also running some testing again for this patch only up to tier7. Will report back once it's complete (probably takes a while).

@chhagedorn Thank you so much! For what it's worth I also updated the old-vs-new branch with the safepoint fix.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3785229653

From fgao at openjdk.org Thu Jan 22 16:30:28 2026
From: fgao at openjdk.org (Fei Gao)
Date: Thu, 22 Jan 2026 16:30:28 GMT
Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4]
In-Reply-To:
References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com>
Message-ID:

On Wed, 21 Jan 2026 13:23:56 GMT, Emanuel Peter wrote:

> I'd have to do some more digging to confirm what you said: that this is because of profiling, i.e. that we don't actually unroll the loop enough and don't insert the drain loop, right?

Thanks for your testing. Yes, that's what I meant.
> It's a bummer because I had initially hoped that this PR would address (at least a part of) the performance regression that vectorization can cause, see #27315
> You can see that for very small iteration counts, it is faster to disable the auto vectorizer. There were some regressions filed, like this one: https://bugs.openjdk.org/browse/JDK-8368245

Did you obtain the scalar vs. vector performance results by overriding `-XX:AutoVectorizationOverrideProfitability=0/2`, or by comparing runs without and with [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751)?

For these benchmarks with small iteration counts, what are the main differences between the generated scalar and vectorized code? For example, when `NUM_ACCESS_ELEMENTS` is `15`, what code does C2 generate for `copy_byte_loop()`?

I'm asking because I'm a bit unclear about the vectorization behavior here. As mentioned earlier, AFAIK, fixed small-trip-count loops are typically not auto-vectorized due to profiling. Is vectorization happening in this case because the benchmark uses nested loops? In particular, does the inner loop become vectorized after sufficient unrolling driven by the outer loop?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3785341550

From roland at openjdk.org Thu Jan 22 16:33:21 2026
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 22 Jan 2026 16:33:21 GMT
Subject: RFR: 8350330: C2: PhaseIdealLoop::add_parse_predicate() should mirror GraphKit::add_parse_predicate()
Message-ID:

`PhaseIdealLoop::add_parse_predicate()` was intended to mirror `GraphKit::add_parse_predicate()`, but it doesn't. The latter checks `too_many_traps` per bci, but the `PhaseIdealLoop` version doesn't. As demonstrated by the test case, a method can get compiled with a predicate, take a trap, and get recompiled with the same predicate many times (up to ~100).
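A toy simulation shows why a missing per-bci check can lead to repeated recompilation. It is a sketch under stated assumptions, not HotSpot code. `TrapHistory`, `too_many_traps` and `compilations_until_stable` are hypothetical. Traps are assumed to be recorded per bci, and the method-wide limit of 100 was chosen only to match the "~100" mentioned above.

```cpp
#include <cassert>
#include <map>

// Toy per-method trap history (hypothetical; not HotSpot's MethodData).
struct TrapHistory {
  std::map<int, int> traps_at_bci;  // bci -> predicate traps seen at that bci
  int total = 0;                    // predicate traps seen in the whole method
};

// Models the "too many traps?" decision. With per_bci, a trap already seen at
// this bci is enough to give up on the predicate; otherwise only the
// method-wide limit applies.
bool too_many_traps(const TrapHistory& h, int bci, bool per_bci, int method_limit) {
  if (per_bci) {
    auto it = h.traps_at_bci.find(bci);
    if (it != h.traps_at_bci.end() && it->second > 0) {
      return true;
    }
  }
  return h.total >= method_limit;
}

// Simulates: compile -> predicate fails at run time -> trap -> recompile,
// until a compilation no longer emits the predicate. Returns the number of
// compilations.
int compilations_until_stable(bool per_bci, int method_limit, int bci = 42) {
  TrapHistory h;
  int compiles = 0;
  for (;;) {
    ++compiles;
    if (too_many_traps(h, bci, per_bci, method_limit)) {
      return compiles;  // compiled without the predicate: no further traps
    }
    // The predicate is emitted again, fails, and the method deoptimizes.
    ++h.traps_at_bci[bci];
    ++h.total;
  }
}
```

Under these assumptions, `compilations_until_stable(true, 100)` returns 2, because the predicate is dropped after the first trap at that bci. `compilations_until_stable(false, 100)` returns 101, because only the method-wide limit eventually stops the compile/trap cycle.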
-------------

Commit messages:
 - test & fix

Changes: https://git.openjdk.org/jdk/pull/29367/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29367&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8350330
Stats: 123 lines in 3 files changed: 100 ins; 20 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/29367.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/29367/head:pull/29367

PR: https://git.openjdk.org/jdk/pull/29367

From fgao at openjdk.org Thu Jan 22 16:33:12 2026
From: fgao at openjdk.org (Fei Gao)
Date: Thu, 22 Jan 2026 16:33:12 GMT
Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4]
In-Reply-To: <0CusqXskKxfU0Cqxr0s1Mrnuu_L-bAtz-I7ehpKyERA=.fb47c428-a70a-410a-a2b1-8998e313988c@github.com>
References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <8huN5sDf2y95Hq2iuaMXN7aLeSik_gUnHcSpcc82Exw=.38fd6510-3fcd-4a28-a1c3-29eb18f51724@github.com> <0CusqXskKxfU0Cqxr0s1Mrnuu_L-bAtz-I7ehpKyERA=.fb47c428-a70a-410a-a2b1-8998e313988c@github.com>
Message-ID:

On Wed, 21 Jan 2026 10:50:44 GMT, Emanuel Peter wrote:

>>> One more question here: could it be that one node that you now conservatively pin further down actually already has a use in a predicate further up, and now we'd create a `bad graph` cycle?
>>
>> If a node has a `use` that is attached to a predicate further up, then that `use` would also be pinned down to the loop `entry control`. Since we also fix the control of the `use`, which is itself a cloned node, I would expect that we wouldn't end up creating a bad control-flow cycle. Does that make sense?
>
> But what if such a predicate uses the node as an input? Then the node is pinned below its use.

I haven't observed such cases so far, and I'm not sure whether this scenario can actually occur. That said, I understand your concern.
I believe it should be possible to address this by offloading part of the logic into an extended version of `initialize_assertion_predicates_for_post_loop()`. This would allow us to identify the precise corresponding taken node for each predicate instance, rather than sinking all predicates together. I'll try this approach, and if it works, I'll include the fix in the next commit.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2717234335

From fgao at openjdk.org Thu Jan 22 16:33:15 2026
From: fgao at openjdk.org (Fei Gao)
Date: Thu, 22 Jan 2026 16:33:15 GMT
Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4]
In-Reply-To:
References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <5o-TOPc_uS09RV2X544hVOK-06mYnbCX6M9ng80UMRM=.67450359-509e-47a3-85ba-b973551b160d@github.com>
Message-ID:

On Wed, 21 Jan 2026 10:37:24 GMT, Emanuel Peter wrote:

>> test/micro/org/openjdk/bench/vm/compiler/VectorThroughputForIterationCount.java line 136:
>>
>>> 134:     // effects of this patch unobservable.
>>> 135:     @Param({"true", "false"})
>>> 136:     public static boolean ENABLE_LARGE_LOOP_WARMUP;
>>
>> It would be nice to have some more comments here:
>> - for which benchmarks would the effect of "this patch" not be observable? Also: referring to "this patch" will require a future reader to trace things back in the "git blame" history, that's a bit unfortunate.
>> - Generally, it would now be nice to have a summary of which types of benchmarks show what kind of results, and why do we have all the variants.
>
> I'm asking for more comments because I fear the benchmark is becoming harder to use, with all the extra options and benchmark variants.

Really great suggestions. I'll refine the comments as follows:

    // When enabled, run an additional warm-up phase using a large loop iteration
    // count to encourage C2 to generate vectorized and unrolled loop bodies.
    //
    // Rationale:
    // Some benchmarks in this suite use small, fixed trip-count loops. During
    // early profiling, C2 may treat such loops as trivial, avoid vectorization,
    // or optimize them away entirely. In those cases, changes that affect loop
    // vectorization behavior, such as the improvement introduced by JDK-8307084,
    // may not be observable in the generated code.
    //
    // As a result, this benchmark suite contains two main classes of
    // microbenchmarks:
    // 1) bench_xx_computeBound / bench_xx_memoryBound
    //    These measure the performance of C2-generated code for the given
    //    workload without relying on a special warm-up phase.
    // 2) bench03xx_staticTripCount / bench03xx_dynamicTripCount
    //    These benchmarks are sensitive to early profiling. Enabling a
    //    large-loop warm-up forces the optimizer to observe the loop at scale,
    //    making vectorized code generation more likely and allowing such
    //    effects to be measured.
    //
    // Usage guidance:
    // - Enable for microbenchmarks that rely on observing vectorization or
    //   unrolling effects, especially when loop trip counts are small or
    //   constant (e.g., bench03xx_staticTripCount and bench03xx_dynamicTripCount,
    //   introduced by JDK-8307084).
    // - Disable for general regression testing and for other microbenchmarks.
    @Param({"true", "false"})
    public static boolean ENABLE_LARGE_LOOP_WARMUP;

WDYT?
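For small trip counts such as the 15-element case discussed earlier, it helps to see how iterations are split between the unrolled vector main loop, a single-vector drain loop and the scalar post loop. The sketch below is a simplified model that assumes a vector width of 4 and an unroll factor of 4. It ignores C2's pre-loop, alignment and profiling decisions, and `split_iterations` is a hypothetical helper.

```cpp
#include <cassert>

// How many elements each loop of a simplified loop nest handles.
struct LoopSplit {
  int main_elems;   // unrolled vector main loop
  int drain_elems;  // vector drain loop, one vector per iteration
  int post_elems;   // scalar post loop
};

// Split n elements between the loops. This is a model only: it ignores the
// pre-loop, alignment and any profitability decisions made by the compiler.
LoopSplit split_iterations(int n, int vector_width, int unroll, bool has_drain) {
  LoopSplit s{0, 0, 0};
  const int chunk = vector_width * unroll;  // elements per main-loop iteration
  s.main_elems = (n / chunk) * chunk;
  int rest = n - s.main_elems;
  if (has_drain) {
    s.drain_elems = (rest / vector_width) * vector_width;
    rest -= s.drain_elems;
  }
  s.post_elems = rest;
  return s;
}
```

With `n = 15`, the main loop (16 elements per iteration) never runs. With a drain loop, 12 elements are still handled by vector code and only 3 by the scalar post loop. Without one, all 15 fall through to the post loop.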
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2717360633

From fgao at openjdk.org Thu Jan 22 16:33:22 2026
From: fgao at openjdk.org (Fei Gao)
Date: Thu, 22 Jan 2026 16:33:22 GMT
Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4]
In-Reply-To: <5o-TOPc_uS09RV2X544hVOK-06mYnbCX6M9ng80UMRM=.67450359-509e-47a3-85ba-b973551b160d@github.com>
References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <5o-TOPc_uS09RV2X544hVOK-06mYnbCX6M9ng80UMRM=.67450359-509e-47a3-85ba-b973551b160d@github.com>
Message-ID:

On Wed, 21 Jan 2026 10:35:59 GMT, Emanuel Peter wrote:

>> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:
>>
>> - Fix build failure after rebasing and address review comments
>> - Merge branch 'master' into optimize-atomic-post
>> - Fixed new test failures after rebasing and refined parts of the code to address review comments
>> - Merge branch 'master' into optimize-atomic-post
>> - Merge branch 'master' into optimize-atomic-post
>> - Clean up comments for consistency and add spacing for readability
>> - Fix some corner case failures and refined part of code
>> - Merge branch 'master' into optimize-atomic-post
>> - Refine ascii art, rename some variables and resolve conflicts
>> - Merge branch 'master' into optimize-atomic-post
>> - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504
>
> test/micro/org/openjdk/bench/vm/compiler/VectorThroughputForIterationCount.java line 282:
>
>> 280:             byteadd(aB, bB, rB, START_IDX, offsets[r]+ITERATION_COUNT);
>> 281:         }
>> 282:     }
>
> Why do you name them `drain`? I feel the name is a bit too specific to "this patch". Do you have a better name?
> Maybe a name that separates them from `bench011B_aligned_memoryBound`?
As discussed in the refined comments above, I'll rename them to `bench03xx_staticTripCount` and `bench03xx_dynamicTripCount`. What do you think?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2717371113

From kvn at openjdk.org Thu Jan 22 17:08:05 2026
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 22 Jan 2026 17:08:05 GMT
Subject: RFR: 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled
In-Reply-To:
References:
Message-ID: <-1hi-QnXIfgxANEAYgSPw_rfkKsQUN62iy5yWwx-k3k=.cf98ff9c-0e3f-497e-af22-ba40d6fcccbf@github.com>

On Thu, 22 Jan 2026 10:01:17 GMT, Martin Doerr wrote:

>> @dbriemann, this change invalidated assumption that ReservedCodeCacheSize can't change if specified on command line. You replaced `align_down` with `align_up` but did not check that `cache_size` may increase after that.
>>
>> It also causing issue with AOT because CodeCache size varies between different phases because we use different number of compiler threads and as result different NonNmethod section size.
>
> @vnkozlov: If `ReservedCodeCacheSize` is specified on the command line, but `NonProfiledCodeHeapSize` or `ProfiledCodeHeapSize` is not specified explicitly, we could subtract from one of them. What do you think about that?

@TheRealMDoerr The current code at https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/codeCache.cpp#L228-L262 does exactly that. But the new code from these changes at lines L305-L308 invalidates it by aligning the sizes up and recalculating the code cache size. The previous code aligned the sizes down but did not recalculate the code cache size. As a result, the sum of the section sizes could be smaller than the code cache size, which is fine apart from this bug. I am not against aligning up, but we need to do it before the code cache size checks and be smarter about how we align. We can adjust the default ReservedCodeCacheSize so that aligning up will not change it.
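The arithmetic behind the regression can be reproduced with hypothetical numbers. The sketch below is not the actual `CodeCache::initialize_heaps()` code. The segment sizes, the 2M alignment (standing in for a large page size) and the helper `align_segments` are assumptions. Aligning each segment up can make the sum exceed a user-specified total. One way to keep the total fixed, along the lines of the suggestion to subtract from one of the segments, is to give the excess back from the non-profiled segment.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t M = 1024 * 1024;

// Round `size` up to a multiple of `alignment`, which must be a power of two.
constexpr std::size_t align_up(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical stand-in for the three code heap segments.
struct Segments {
  std::size_t non_nmethod;
  std::size_t profiled;
  std::size_t non_profiled;
  std::size_t total() const { return non_nmethod + profiled + non_profiled; }
};

// Aligns every segment up to `min_size`. If `total_fixed` is true (the total
// was given on the command line), the non-profiled segment absorbs the
// difference so that the total stays at `cache_size`. Returns false if that
// is impossible.
bool align_segments(Segments& s, std::size_t cache_size, std::size_t min_size,
                    bool total_fixed) {
  s.non_nmethod = align_up(s.non_nmethod, min_size);
  s.profiled = align_up(s.profiled, min_size);
  s.non_profiled = align_up(s.non_profiled, min_size);
  if (!total_fixed) {
    return true;  // the total simply grows to s.total()
  }
  const std::size_t others = s.non_nmethod + s.profiled;
  if (others >= cache_size) {
    return false;  // nothing left for the non-profiled segment
  }
  s.non_profiled = cache_size - others;
  return true;
}
```

Take segments of 5M, 117M and 118M with a 240M total. Aligning up alone grows the total to 242M. Keeping the total fixed instead shrinks the non-profiled segment to 116M. That value is still a multiple of 2M, because every term in the subtraction is.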
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28658#issuecomment-3785549463

From aph at openjdk.org Thu Jan 22 17:31:36 2026
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 22 Jan 2026 17:31:36 GMT
Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v3]
In-Reply-To:
References:
Message-ID: <0Sb85mfgAvhtvGsNtdZvcwvaaOYvfDyTS-2TmnRW95Q=.6a523d10-6f54-4af0-b7f9-bef49e995fa3@github.com>

On Wed, 21 Jan 2026 10:15:24 GMT, Eric Fang wrote:

>> This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance.
>>
>> Changes:
>> --------
>>
>> 1. C2 mid-end:
>> - Added UMinReductionVNode and UMaxReductionVNode
>>
>> 2. AArch64 Backend:
>> - Added uminp/umaxp/sve_uminv/sve_umaxv instructions
>> - Updated match rules for all vector sizes and element types
>> - Both NEON and SVE implementation are supported
>>
>> 3. Test:
>> - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java
>> - Added assembly tests in aarch64-asmtest.py for new instructions
>> - Added a JTReg test file VectorUMinMaxReductionTest.java
>>
>> Different configurations were tested on aarch64 and x86 machines, and all tests passed.
>>
>> Test results of JMH benchmarks from the panama-vector project:
>> --------
>>
>> On a Nvidia Grace machine with 128-bit SVE:
>>
>> Benchmark                     Unit     Before    Error      After    Error  Uplift
>> Byte128Vector.UMAXLanes       ops/ms   411.60    42.18   25226.51    33.92   61.29
>> Byte128Vector.UMAXMaskedLanes ops/ms   558.56    85.12   25182.90    28.74   45.09
>> Byte128Vector.UMINLanes       ops/ms   645.58   780.76   28396.29   103.11   43.99
>> Byte128Vector.UMINMaskedLanes ops/ms   621.09   718.27   26122.62    42.68   42.06
>> Byte64Vector.UMAXLanes        ops/ms   296.33    34.44   14357.74    15.95   48.45
>> Byte64Vector.UMAXMaskedLanes  ops/ms   376.54    44.01   14269.24    21.41   37.90
>> Byte64Vector.UMINLanes        ops/ms   373.45   426.51   15425.36    66.20   41.31
>> Byte64Vector.UMINMaskedLanes  ops/ms   353.32   346.87   14201.37    13.79   40.19
>> Int128Vector.UMAXLanes        ops/ms   174.79   192.51    9906.07   286.93   56.67
>> Int128Vector.UMAXMaskedLanes  ops/ms   157.23   206.68   10246.77    11.44   65.17
>> Int64Vector.UMAXLanes         ops/ms    95.30   126.49    4719.30    98.57   49.52
>> Int64Vector.UMAXMaskedLanes   ops/ms    88.19    87.44    4693.18    19.76   53.22
>> Long128Vector.UMAXLanes       ops/ms    80.62    97.82    5064.01    35.52   62.82
>> Long128Vector.UMAXMaskedLanes ops/ms    78.15   102.91    5028.24     8.74   64.34
>> Long64Vector.UMAXLanes        ops/ms    47.56    62.01      46.76    52.28    0.98
>> Long64Vector.UMAXMaskedLanes  ops/ms    45.44    46.76      45.79    42.91    1.01
>> Short128Vector.UMAXLanes      ops/ms   316.65   410.30   14814.82    23.65   46.79
>> ...
>
> Eric Fang has updated the pull request incrementally with one additional commit since the last revision:
>
> Extract some helper functions for better readability

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.hpp line 50:

> 48:   void sve_maxv(bool is_unsigned, FloatRegister dst, SIMD_RegVariant size,
> 49:                 PRegister pg, FloatRegister src);
> 50:

Using separate definitions here is adding unnecessary complexity. I'd do something like this in the header file:

  // Typedefs used to disambiguate overloaded member functions.
  typedef void (Assembler::*neon_reduction2)
    (FloatRegister, Assembler::SIMD_Arrangement, FloatRegister);
  typedef void (Assembler::*sve_reduction3)
    (FloatRegister, Assembler::SIMD_RegVariant, PRegister, FloatRegister);

  // Helper functions for min/max reduction operations

  void neon_minp(bool is_unsigned, FloatRegister dst, SIMD_Arrangement size,
                 FloatRegister src1, FloatRegister src2) {
    auto m = is_unsigned ? &Assembler::uminp : &Assembler::sminp;
    (this->*m)(dst, size, src1, src2);
  }

  void neon_maxp(bool is_unsigned, FloatRegister dst, SIMD_Arrangement size,
                 FloatRegister src1, FloatRegister src2) {
    auto m = is_unsigned ? &Assembler::umaxp : &Assembler::smaxp;
    (this->*m)(dst, size, src1, src2);
  }

  void neon_minv(bool is_unsigned, FloatRegister dst, SIMD_Arrangement size,
                 FloatRegister src) {
    auto m = is_unsigned ? (neon_reduction2)&Assembler::uminv : &Assembler::sminv;
    (this->*m)(dst, size, src);
  }

  void neon_maxv(bool is_unsigned, FloatRegister dst, SIMD_Arrangement size,
                 FloatRegister src) {
    auto m = is_unsigned ? (neon_reduction2)&Assembler::umaxv : &Assembler::smaxv;
    (this->*m)(dst, size, src);
  }

  void sve_minv(bool is_unsigned, FloatRegister dst, SIMD_RegVariant size,
                PRegister pg, FloatRegister src) {
    auto m = is_unsigned ? (sve_reduction3)&Assembler::sve_uminv : &Assembler::sve_sminv;
    (this->*m)(dst, size, pg, src);
  }

  void sve_maxv(bool is_unsigned, FloatRegister dst, SIMD_RegVariant size,
                PRegister pg, FloatRegister src) {
    auto m = is_unsigned ? (sve_reduction3)&Assembler::sve_umaxv : &Assembler::sve_smaxv;
    (this->*m)(dst, size, pg, src);
  }

To some extent it's a matter of taste, but please try not to use much more repetitive and boilerplate code than you need to.
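The typedef-plus-cast idiom in the suggestion above can be tried in isolation. The sketch below uses a hypothetical toy class `Asm` rather than HotSpot's `Assembler`. Taking the address of an overloaded member function is ambiguous on its own, so the cast to a pointer-to-member type selects the intended overload. After the cast, both arms of `?:` have the same type.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for an assembler (hypothetical; not HotSpot's Assembler).
class Asm {
 public:
  std::string last;  // records which "instruction" was emitted
  // uminv is overloaded, so `&Asm::uminv` on its own is ambiguous.
  void uminv(int, int) { last = "uminv"; }
  void uminv(int, int, int) { last = "uminv/3"; }
  // sminv has a single signature.
  void sminv(int, int) { last = "sminv"; }
};

// Pointer-to-member type naming the two-operand form.
typedef void (Asm::*reduction2)(int, int);

// Selects the signed or unsigned form at run time. The cast picks the
// two-operand overload of uminv; both arms of ?: then have type reduction2.
void minv(Asm& masm, bool is_unsigned, int dst, int src) {
  reduction2 m = is_unsigned ? (reduction2)&Asm::uminv : &Asm::sminv;
  (masm.*m)(dst, src);
}
```

Calling `minv(a, true, 0, 1)` dispatches to the two-operand `uminv`, and `minv(a, false, 0, 1)` dispatches to `sminv`. Either way, the call site stays a single line.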
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28693#discussion_r2717902945

From mdoerr at openjdk.org Thu Jan 22 17:33:06 2026
From: mdoerr at openjdk.org (Martin Doerr)
Date: Thu, 22 Jan 2026 17:33:06 GMT
Subject: RFR: 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled
In-Reply-To:
References:
Message-ID:

On Thu, 22 Jan 2026 10:01:17 GMT, Martin Doerr wrote:

>> @dbriemann, this change invalidated assumption that ReservedCodeCacheSize can't change if specified on command line. You replaced `align_down` with `align_up` but did not check that `cache_size` may increase after that.
>>
>> It also causing issue with AOT because CodeCache size varies between different phases because we use different number of compiler threads and as result different NonNmethod section size.
>
> @vnkozlov: If `ReservedCodeCacheSize` is specified on the command line, but `NonProfiledCodeHeapSize` or `ProfiledCodeHeapSize` is not specified explicitly, we could subtract from one of them. What do you think about that?

> @TheRealMDoerr current code at https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/codeCache.cpp#L228-L262 is doing exactly that. But new code from these changes at lines L305-L308 invalidates that by aligned up sizes and recalculates new code cache size. Previous code aligned down sizes but did not recalculate code cache size and, as result, sum of sections sizes could be smaller then it. Which is fine except this bug. I am not against aligning up but we need to do it before code cache size checks and be smarter how we align it. We can adjust default ReservedCodeCacheSize so that aligning up will not change it.

I mean subtracting it after aligning.
Something like:

diff --git a/src/hotspot/share/code/codeCache.cpp b/src/hotspot/share/code/codeCache.cpp
index 95a2fb908de..e03efdac3ae 100644
--- a/src/hotspot/share/code/codeCache.cpp
+++ b/src/hotspot/share/code/codeCache.cpp
@@ -305,7 +305,13 @@ void CodeCache::initialize_heaps() {
   non_nmethod.size = align_up(non_nmethod.size, min_size);
   profiled.size = align_up(profiled.size, min_size);
   non_profiled.size = align_up(non_profiled.size, min_size);
-  cache_size = non_nmethod.size + profiled.size + non_profiled.size;
+  if (FLAG_IS_CMDLINE(ReservedCodeCacheSize) && !FLAG_IS_CMDLINE(NonProfiledCodeHeapSize)) {
+    non_profiled.size = cache_size - non_nmethod.size - profiled.size;
+  } else if (FLAG_IS_CMDLINE(ReservedCodeCacheSize) && !FLAG_IS_CMDLINE(ProfiledCodeHeapSize)) {
+    profiled.size = cache_size - non_nmethod.size - non_profiled.size;
+  } else {
+    cache_size = non_nmethod.size + profiled.size + non_profiled.size;
+  }
 
   FLAG_SET_ERGO(NonNMethodCodeHeapSize, non_nmethod.size);
   FLAG_SET_ERGO(ProfiledCodeHeapSize, profiled.size);

I think align_down is not ideal when large pages are used and the sizes are tuned by the user. We should have them at least as large as specified.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28658#issuecomment-3785704328

From aph at openjdk.org Thu Jan 22 17:37:48 2026
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 22 Jan 2026 17:37:48 GMT
Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 21 Jan 2026 10:15:24 GMT, Eric Fang wrote:

>> This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance.
>>
>> Changes:
>> --------
>>
>> 1. C2 mid-end:
>> - Added UMinReductionVNode and UMaxReductionVNode
>>
>> 2. AArch64 Backend:
>> - Added uminp/umaxp/sve_uminv/sve_umaxv instructions
>> - Updated match rules for all vector sizes and element types
>> - Both NEON and SVE implementation are supported
>>
>> 3. Test:
>> - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java
>> - Added assembly tests in aarch64-asmtest.py for new instructions
>> - Added a JTReg test file VectorUMinMaxReductionTest.java
>>
>> Different configurations were tested on aarch64 and x86 machines, and all tests passed.
> > Eric Fang has updated the pull request incrementally with one additional commit since the last revision: > > Extract some helper functions for better readability src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 1980: > 1978: } > 1979: > 1980: // neon minp: pairwise minimum operation This comment, and the ones like it, is unnecessary. src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2176: > 2174: bool is_unsigned; > 2175: Condition cond; > 2176: decode_minmax_reduction_opc(opc, is_min, is_unsigned, cond); Suggestion: decode_minmax_reduction_opc(opc, &is_min, &is_unsigned, &cond); In this case, passing address arguments is easier for the reader to understand because they are visible at the call site as well as the declaration. It is obvious at a glance that some arguments are passed by address. We do use C++ reference parameters in cases where it helps to do so, but I don't think it does here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28693#discussion_r2717929066 PR Review Comment: https://git.openjdk.org/jdk/pull/28693#discussion_r2717920250 From aph at openjdk.org Thu Jan 22 17:37:43 2026 From: aph at openjdk.org (Andrew Haley) Date: Thu, 22 Jan 2026 17:37:43 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v3] In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 03:11:04 GMT, Eric Fang wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 1965: >> >>> 1963: // Helper function to decode min/max reduction operation properties >>> 1964: static void decode_minmax_reduction_opc(int opc, bool& is_min, bool& is_unsigned, >>> 1965: Assembler::Condition& cond) { >> >> Suggestion: >> >> Condition cond) { > > Considering that this function is only used by this file and does not call any instructions, I made it a **file-scope static** function. 
> And as we don't declare `using Assembler::Condition;` in this file, so we have to use `Assembler::Condition&` here, or we'll get the following error:
>
>     jdk/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp:1965:41: error: ‘Condition’ has not been declared
>     1965 | Condition& cond) {
>
> As for `&`, this is a reference parameter.
>
> To remove `Assembler::`, we can
> 1. Declare `using Assembler::Condition;` in this file.
> 2. Make this function as a private method of `C2_MacroAssembler`.
>
> WDYT ?

I can't see any positive reason not to use Option 2.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28693#discussion_r2717926249

From aph at openjdk.org Thu Jan 22 17:43:22 2026
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 22 Jan 2026 17:43:22 GMT
Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 21 Jan 2026 10:15:24 GMT, Eric Fang wrote:

>> This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance.
>>
>> Changes:
>> --------
>>
>> 1. C2 mid-end:
>> - Added UMinReductionVNode and UMaxReductionVNode
>>
>> 2. AArch64 Backend:
>> - Added uminp/umaxp/sve_uminv/sve_umaxv instructions
>> - Updated match rules for all vector sizes and element types
>> - Both NEON and SVE implementation are supported
>>
>> 3. Test:
>> - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java
>> - Added assembly tests in aarch64-asmtest.py for new instructions
>> - Added a JTReg test file VectorUMinMaxReductionTest.java
>>
>> Different configurations were tested on aarch64 and x86 machines, and all tests passed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3785770324 From kvn at openjdk.org Thu Jan 22 17:46:04 2026 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 22 Jan 2026 17:46:04 GMT Subject: RFR: 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 12:59:03 GMT, David Briemann wrote: > Aligning upwards instead of downwards not only solves the crash in large huge page scenarios but also ensures that the cache sizes are at least as big as they were set. Then sections sizes will not be aligned any more after that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28658#issuecomment-3785785888 From mdoerr at openjdk.org Thu Jan 22 17:58:33 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 22 Jan 2026 17:58:33 GMT Subject: RFR: 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 17:43:13 GMT, Vladimir Kozlov wrote: > Then sections sizes will not be aligned any more after that. If all 4 values are multiples of `min_size`, all results of subtraction and addition will still be a multiple of it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28658#issuecomment-3785856945 From vlivanov at openjdk.org Thu Jan 22 18:57:17 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 22 Jan 2026 18:57:17 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v4] In-Reply-To: References: <17qmKHGiPNSNpHn2By-gtJjPpwFFDj98wiY-bQG16nY=.4df59ed3-8120-496b-9a1e-06b80c9149ae@github.com> <5YSsP2h52Kp9WWb4YrthjtYSziRKM5FM3s5NHThrPTg=.8be8a6fc-e841-4e7b-b9d5-e3910c9bc2e6@github.com> Message-ID: On Thu, 22 Jan 2026 10:09:46 GMT, Emanuel Peter wrote: >> @eme64 Thanks for the feedback. 
>> >> On @requires vm.debug: I'd like to keep it for this reproducer. ProfileTraps is the key knob here: the failure requires ProfileTraps=false (create_if_missing = ProfileTraps, so get_method_data(..., false) may return NULL). Since ProfileTraps is a develop_pd flag and not settable on product builds, this reproducer has to run on a non-product VM (i.e., a debug VM). >> >> On -Xcomp: agreed. I'll keep it but restrict it with -XX:CompileCommand=compileonly,... so we only compile the relevant method(s). >> >> If that sounds reasonable, I'll proceed with just the compileonly tightening. > > Generally, it would also be nicer to extract a reproducer into a `test` method, and only compile that one. That way, the code shape leading to the crash is preserved. Would that be possible? Otherwise, we risk that someone changes the code shape (maybe in the core libs), and the test would not reproduce any more. I agree that `@requires vm.debug` is well justified here since it tests debug-only functionality. And `-XX:CompileCommand=compileonly,...` is a good improvement. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29147#discussion_r2718178197 From vlivanov at openjdk.org Thu Jan 22 18:57:18 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 22 Jan 2026 18:57:18 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v4] In-Reply-To: References: <17qmKHGiPNSNpHn2By-gtJjPpwFFDj98wiY-bQG16nY=.4df59ed3-8120-496b-9a1e-06b80c9149ae@github.com> <5YSsP2h52Kp9WWb4YrthjtYSziRKM5FM3s5NHThrPTg=.8be8a6fc-e841-4e7b-b9d5-e3910c9bc2e6@github.com> Message-ID: On Thu, 22 Jan 2026 18:53:46 GMT, Vladimir Ivanov wrote: >> Generally, it would also be nicer to extract a reproducer into a `test` method, and only compile that one. That way, the code shape leading to the crash is preserved. Would that be possible?
Otherwise, we risk that someone changes the code shape (maybe in the core libs), and the test would not reproduce any more. > > I agree that `@requires vm.debug` is well justified here since it tests debug-only functionality. > > And `-XX:CompileCommand=compileonly,...` is a good improvement. > Did you already run testing on this patch or should I run some? @eme64 no, I haven't performed any testing yet. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29147#discussion_r2718179941 From dlong at openjdk.org Thu Jan 22 21:05:19 2026 From: dlong at openjdk.org (Dean Long) Date: Thu, 22 Jan 2026 21:05:19 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v9] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 05:49:22 GMT, Guanqiang Han wrote: >> Please review this change. Thanks! >> >> **Description:** >> >> When -XX:+PrintDeoptimizationDetails is enabled, vframeArray.cpp prints interpreted frames under ttyLocker. >> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/runtime/vframeArray.cpp#L493-L503 >> In the WizardMode && Verbose branch it calls Method::print_codes() / print_codes_on(). The bytecode printing path may acquire MDOExtraData_lock (rank nosafepoint-1). >> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/interpreter/bytecodeTracer.cpp#L597-L602 >> Calling it while holding tty_lock violates the global lock ranking order and triggers a lock rank inversion assertion. >> >> **Fix:** >> >> Generate the bytecode dump before taking tty_lock, and print it afterwards under ttyLocker to keep the output coherent while preserving correct lock acquisition order. 
>> To avoid double buffering when the target stream is already a thread-local stringStream, extend BytecodeTracer::print_method_codes with a buffered flag . The new call site uses buffered=false when dumping into the temporary stringStream. >> >> **Test:** >> >> GHA > > Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - change variable name > - Merge remote-tracking branch 'upstream/master' into 8374862 > - remove unused code > - revert > - Merge remote-tracking branch 'upstream/master' into 8374862 > - fix a compile error > - remove unnecessary blank line > - correct copyright year > - Add outputStream::is_buffered() > - change variable name > - ... and 3 more: https://git.openjdk.org/jdk/compare/f0ffac1f...a63c4613 Looks OK, but I'm seeing some weird failures and crashes during testing. They should be unrelated, but let me look into it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29186#issuecomment-3786669654 From ghan at openjdk.org Fri Jan 23 01:41:11 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Fri, 23 Jan 2026 01:41:11 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v5] In-Reply-To: References: Message-ID: > Please review this change. Thanks! > > Description: > > This change fixes a crash in Deoptimization::uncommon_trap_inner when running with -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp. > > With -XX:-ProfileTraps, create_if_missing is set to false. 
> https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2121-L2122 > > When create_if_missing is false, new mdo can not be created when m()->method_data() return null, so get_method_data() may return null . > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L1911-L1912 > > and trap_mdo can be null as a result > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2134-L2136 > > The crash happens here because trap_mdo is null > https://github.com/openjdk/jdk/blob/74faf033127ab3a5e28be75b91e662c589f81084/src/hotspot/share/runtime/deoptimization.cpp#L2157 > > Fix: > > The fix makes acquisition of extra_data_lock conditional on trap_mdo being non-null, preserving the required lock ordering (extra_data_lock before ttyLocker) when an MDO exists. > > Test: > > GHA Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: - Extract a reproducer into a test method - Merge remote-tracking branch 'upstream/master' into 8374807 - revert - Merge remote-tracking branch 'upstream/master' into 8374807 - narrow lock scope - Merge remote-tracking branch 'upstream/master' into 8374807 - split long line - Merge remote-tracking branch 'upstream/master' into 8374807 - fix 8374807 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29147/files - new: https://git.openjdk.org/jdk/pull/29147/files/e065ae56..3c86ec1b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29147&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29147&range=03-04 Stats: 10756 lines in 339 files changed: 6642 ins; 1394 del; 2720 mod Patch: https://git.openjdk.org/jdk/pull/29147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29147/head:pull/29147 PR: https://git.openjdk.org/jdk/pull/29147 From ghan at openjdk.org Fri Jan 23 03:16:40 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Fri, 23 Jan 2026 03:16:40 GMT Subject: RFR: 8374807: Crash in MethodData::extra_data_lock()+0x0 when running -XX:+TraceDeoptimization -XX:-ProfileTraps -XX:-TieredCompilation -Xcomp -version [v4] In-Reply-To: References: <17qmKHGiPNSNpHn2By-gtJjPpwFFDj98wiY-bQG16nY=.4df59ed3-8120-496b-9a1e-06b80c9149ae@github.com> <5YSsP2h52Kp9WWb4YrthjtYSziRKM5FM3s5NHThrPTg=.8be8a6fc-e841-4e7b-b9d5-e3910c9bc2e6@github.com> Message-ID: <3lRq3TXzF4XjYLDNlbHfqamzIck_A9WSfXRhG7yA3Ww=.cf8ba7c6-29cf-403d-aea3-58a00a43a496@github.com> On Thu, 22 Jan 2026 10:09:46 GMT, Emanuel Peter wrote: >> @eme64 Thanks for the feedback. >> >> On @requires vm.debug: I'd like to keep it for this reproducer. ProfileTraps is the key knob here: the failure requires ProfileTraps=false (create_if_missing = ProfileTraps, so get_method_data(..., false) may return NULL).
Since ProfileTraps is a develop_pd flag and not settable on product builds, this reproducer has to run on a non-product VM (i.e., a debug VM). >> >> On -Xcomp: agreed. I'll keep it but restrict it with -XX:CompileCommand=compileonly,... so we only compile the relevant method(s). >> >> If that sounds reasonable, I'll proceed with just the compileonly tightening. > > Generally, it would also be nicer to extract a reproducer into a `test` method, and only compile that one. That way, the code shape leading to the crash is preserved. Would that be possible? Otherwise, we risk that someone changes the code shape (maybe in the core libs), and the test would not reproduce any more. Hi @eme64 @iwanowww, I've updated the regression test to keep the reproducer self-contained in a dedicated test() method and restricted compilation to that method only. Could you please take another look? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29147#discussion_r2719423962 From jbhateja at openjdk.org Fri Jan 23 04:24:57 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 23 Jan 2026 04:24:57 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v14] In-Reply-To: References: Message-ID: > Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support. > - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. > > The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc).
> > The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > [image: chart of the performance numbers described above] > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback. > > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Refactoring and cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28002/files - new: https://git.openjdk.org/jdk/pull/28002/files/fe7075ee..72d15568 Webrevs: - full: Webrev is not available because diff is too large - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28002&range=12-13 Stats: 1415 lines in 52 files changed: 441 ins; 259 del; 715 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From jbhateja at openjdk.org Fri Jan 23 05:01:46 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 23 Jan 2026 05:01:46 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v14] In-Reply-To: References: <8hStIcvp252Ik7raxZL5BvFKKkXTflorjyOD9Cyakvc=.c5d1b302-5c49-46b1-91ba-2feda2e6a746@github.com> <1ElN5XvEXAYGINpCIB2smhDrzekGyiXmG6o8-jnxDxk=.83a69a64-2894-40af-a2ee-9c35448c88b2@github.com> Message-ID: <-cin72VWnqAukd5JbCMV9BfsGR1tWcWh_aGpdrlhHlM=.2d22e5da-35d2-4ffc-8ce5-86c90d662fd7@github.com> On Wed, 21 Jan 2026 07:01:39 GMT, Jatin Bhateja wrote: >> @jatin-bhateja Thanks for the ping! I'll put this on the list for review early in 2026 :) > > Hi @eme64 , Your comments have been addressed > @jatin-bhateja This patch is really really large. There are lots of renamings that could be done in a separate patch first (as a subtask).
It would make reviewing easier, allowing focus on the substantial work. See discussion here: [#28002 (comment)](https://github.com/openjdk/jdk/pull/28002#discussion_r2705376899)

Hi @eme64, I have done some cleanups. The following is a summary of the changes included in the patch:

1. Changes to introduce a new (custom) BasicType, T_FLOAT16.
   - Global definition.
   - Skip its handling wherever applicable.
2. Changes to pass laneType (BasicType) to intrinsic entry points instead of element classes.
   - Mainly inline expander interface changes.
3. Changes in the abstract and concrete vector class generation templates.
4. Changes to the naming of vector classes to avoid names like Float1664...
5. Changes in LaneType to add a new carrier type field.
6. Changes in inline expanders to selectively enable intrinsification for operations for which auto-vectorization and backend support are already in place.
7. Changes in test generation templates.
   b. Assert wrappers to convert Float16 (short) values to float before invoking TestNG asserts.
   c. Scalar operation wrappers to selectively invoke Float16 math routines that are not part of the Java SE math libraries.
8. New IR verification test.
9. New micro-benchmark.
10. AArch64 test failure: patch and test fixed by Bhavana Kilambi.

Of the above, change 7b accounts for 40000+ LOC.

Q. Why do we need wrapper assertions?
A. To handle all possible NaN representations (SNaN and QNaN). Since Float16 uses a short carrier type, the values must be promoted to float before invoking TestNG assertions; the assertion wrappers perform this conversion.

I think all the tasks are related, and since most of the source and test files are generated by scripts, we should not go by the size of the patch, and instead review the template files.
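The wrapper assertions described in the Q/A above can be sketched roughly as follows. This is a hypothetical illustration, not the generated test-template code from the PR: it assumes JDK 20 or later for `Float.float16ToFloat` and `Float.floatToFloat16`, and the class and method names are made up. Promoting both bit patterns to `float` and comparing with `Float.compare` makes any two NaN encodings (quiet or signaling, any payload) compare equal, while raw `short` equality would not.

```java
// Hypothetical assertion wrapper for Float16 values carried in shorts.
// Requires JDK 20+ for Float.float16ToFloat / Float.floatToFloat16.
final class Float16Asserts {
    // Promote the 16-bit carrier to float.
    static float toFloat(short f16) {
        return Float.float16ToFloat(f16);
    }

    // Compare two Float16 bit patterns as values. Float.compare treats all NaNs
    // as equal (it canonicalizes NaN bits) but still distinguishes +0.0 from -0.0.
    static void assertEquals(short actual, short expected) {
        float a = toFloat(actual);
        float e = toFloat(expected);
        if (Float.compare(a, e) != 0) {
            throw new AssertionError("expected " + e + " but was " + a);
        }
    }

    public static void main(String[] args) {
        short quietNaN = (short) 0x7E00;     // exponent all ones, top mantissa bit set
        short signalingNaN = (short) 0x7C01; // exponent all ones, top mantissa bit clear
        System.out.println(quietNaN == signalingNaN); // false: raw bits differ
        assertEquals(quietNaN, signalingNaN);         // passes: both are NaN
        assertEquals(Float.floatToFloat16(1.5f), (short) 0x3E00); // 1.5 in binary16
        System.out.println("ok");
    }
}
```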
------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3788233245 From jkarthikeyan at openjdk.org Fri Jan 23 06:19:20 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 23 Jan 2026 06:19:20 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v15] In-Reply-To: References: Message-ID: > Hi all, > This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>
>
>                                               Baseline                   Patch
> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score    Error  Units  Improvement
> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ±  3.535  ns/op  (3.56x)
> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ±  1.166  ns/op  (4.15x)
> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ±  1.055  ns/op  (8.25x)
>
>
> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated! Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 19 commits: - Fix whitespace - Update tests after merge, apply changes from review - Merge from master - Update tests, cleanup logic - Merge branch 'master' into vectorize-subword - Check for AVX2 for byte/long conversions - Whitespace and benchmark tweak - Address more comments, make test and benchmark more exhaustive - Merge from master - Fix copyright after merge - ... and 9 more: https://git.openjdk.org/jdk/compare/de6f35ef...13378368 ------------- Changes: https://git.openjdk.org/jdk/pull/23413/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=14 Stats: 817 lines in 16 files changed: 721 ins; 15 del; 81 mod Patch: https://git.openjdk.org/jdk/pull/23413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23413/head:pull/23413 PR: https://git.openjdk.org/jdk/pull/23413 From jkarthikeyan at openjdk.org Fri Jan 23 06:28:08 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 23 Jan 2026 06:28:08 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v13] In-Reply-To: <3bpbyYd0EThEBJwkRk4FkijqvN_4YHm_QuOIiYw5234=.b4d12b58-0483-4ceb-878d-be6c972f8f85@github.com> References: <3bpbyYd0EThEBJwkRk4FkijqvN_4YHm_QuOIiYw5234=.b4d12b58-0483-4ceb-878d-be6c972f8f85@github.com> Message-ID: On Thu, 15 May 2025 13:15:05 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Check for AVX2 for byte/long conversions > >> This is a good point, while testing I experimented with patterns like this: >> >> ```java >> private static short[] testSubwordVector(short[] out, int[] in) { >> for (int i = 0; i < 512; i++) { >> out[i] = (short) (((short) in[i]) + (short) in[i]); >> } >> >> return out; >> } >> ``` >> >> The IR it produces looks like: `StoreC(AddI(RShiftI(LShiftI(LoadI, 16), 16)`. The same thing happens for sign extension as well. 
I didn't investigate too deeply, but I think the shifts prevent this pattern from vectorizing. The shifts are needed in the scalar IR since we don't have a `AddS` node, but in the future, when translating the IR to the vector graph we could convert the shift pattern into a `VectorCastX2Y` node as well. > > I suppose there are 2 options here, when vectorizing: > - Cast between `short <-> int`, do the add in `int`. > - Somehow detect that this is an "`AddS`" in the type analysis phase of SuperWord. And then hack the graph so that we do not need the shifts. This would be more complicated, but might give us better results in the end. > > Is there already an RFE for this? If not, would you mind filing one? Hi @eme64, I apologize for the long delay, but I have some more time to work on this! I've merged from master, updated the tests, and addressed the previous code review concerns. I would love to know what you think! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-3788515264 From jkarthikeyan at openjdk.org Fri Jan 23 06:28:19 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 23 Jan 2026 06:28:19 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v14] In-Reply-To: References: Message-ID: On Wed, 27 Aug 2025 10:16:23 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Update tests, cleanup logic >> - Merge branch 'master' into vectorize-subword >> - Check for AVX2 for byte/long conversions >> - Whitespace and benchmark tweak >> - Address more comments, make test and benchmark more exhaustive >> - Merge from master >> - Fix copyright after merge >> - Fix copyright >> - Merge >> - Implement patch with VectorCastNode::implemented >> - ... 
and 6 more: https://git.openjdk.org/jdk/compare/8fcbb110...aabaafba > > src/hotspot/share/opto/superword.cpp line 2422: > >> 2420: // Opcode is only required to disambiguate half float, so we pass -1 as it can't be encountered here. >> 2421: return (is_subword_type(def_bt) || is_subword_type(use_bt)) && VectorCastNode::implemented(-1, pack_size, def_bt, use_bt); >> 2422: } > > Not sure if we discussed this before: should we not move this to `VectorCastNode`, rather than having it in `SuperWord`? I've moved the function to `VectorCastNode`, I think it's a better fit there because the other cast functions are located there as well. > src/hotspot/share/opto/superwordVTransformBuilder.cpp line 197: > >> 195: >> 196: // If the use and def types are different, emit a cast node >> 197: if (use_bt != def_bt && !p0->is_Convert() && SuperWord::is_supported_subword_cast(def_bt, use_bt, pack->size())) { > > Is `SuperWord::is_supported_subword_cast(def_bt, use_bt, pack->size())` really a true condition that you need to check here (and if false we can continue in the "else"), or should it be rather an assert? I believe this condition is indeed needed, since for example `& 1` produces a TypeInt with a basic type of boolean, which would be otherwise harmless with the current logic but would trip the assert. 
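The `(short) (((short) in[i]) + (short) in[i])` pattern discussed earlier in this thread can be reasoned about with a small scalar check. The sketch below is illustrative only (hypothetical names, not C2 or test code): because a narrowing store keeps only the low 16 bits, narrowing each operand before an `int` add gives the same stored `short` as adding at full width and narrowing once, which is why the sign-extending shift pairs on the inputs do not affect the stored result. The `intToShort` loop is an assumed int-to-short narrowing shape of the kind the `VectorSubword` benchmarks measure; the benchmark's actual source is not shown here.

```java
import java.util.Random;

// Illustrative check (not C2 code): when an int add feeds a short store, only the
// low 16 bits of each operand matter, so the sign-extending (short) casts on the
// inputs, which C2 models as RShiftI(LShiftI(x, 16), 16), do not change the result.
final class SubwordTruncation {
    // Shape from the review discussion: narrow each operand, add in int, narrow again.
    static short narrowThenAdd(int a, int b) {
        return (short) (((short) a) + (short) b);
    }

    // Add at full int width (wrapping on overflow) and narrow once at the store.
    static short addThenNarrow(int a, int b) {
        return (short) (a + b);
    }

    // Assumed int -> short narrowing loop shape (hypothetical, not the benchmark source).
    static void intToShort(int[] in, short[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = (short) in[i];
        }
    }

    public static void main(String[] args) {
        Random r = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            int a = r.nextInt();
            int b = r.nextInt();
            if (narrowThenAdd(a, b) != addThenNarrow(a, b)) {
                throw new AssertionError(a + " " + b);
            }
        }
        System.out.println("equivalent on all sampled inputs");
    }
}
```

Both forms agree because addition commutes with reduction modulo 2^16, including when the full-width `int` add overflows.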
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2719777514 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2719780695 From jkarthikeyan at openjdk.org Fri Jan 23 06:32:41 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 23 Jan 2026 06:32:41 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v14] In-Reply-To: References: Message-ID: <70BeLtxksHdakB7fHCSHc0F1KcFTCsSEGH429J73lGU=.08529d22-c524-4416-99be-89cebdec2111@github.com> On Wed, 27 Aug 2025 10:20:37 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Update tests, cleanup logic >> - Merge branch 'master' into vectorize-subword >> - Check for AVX2 for byte/long conversions >> - Whitespace and benchmark tweak >> - Address more comments, make test and benchmark more exhaustive >> - Merge from master >> - Fix copyright after merge >> - Fix copyright >> - Merge >> - Implement patch with VectorCastNode::implemented >> - ... and 6 more: https://git.openjdk.org/jdk/compare/8fcbb110...aabaafba > > test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java line 513: > >> 511: @Test >> 512: @IR(applyIfCPUFeature = { "avx", "true" }, >> 513: applyIfOr = {"AlignVector", "false", "UseCompactObjectHeaders", "false"}, > > Do you think these would be supported with `asimd` as well? > If you just cannot test with it feel free to file an RFE and then I can find someone to take care of it (e.g. as a starter bug). I did a bit of testing on aarch64 hardware and have updated the tests to include `asimd` where supported. 
> test/hotspot/jtreg/compiler/vectorization/TestSubwordTruncation.java line 76: > >> 74: >> 75: @Test >> 76: @IR(applyIfCPUFeature = { "avx2", "true" }, counts = { IRNode.VECTOR_CAST_I2S, IRNode.VECTOR_SIZE_ANY, ">0" }) > > Do you think we can make the vector size more precise here? This is a good idea, I've replaced the `any` sizes with `min(int, )` which should continue to be portable across architectures. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2719788905 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2719788217 From jbhateja at openjdk.org Fri Jan 23 06:34:00 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 23 Jan 2026 06:34:00 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v12] In-Reply-To: References: Message-ID: > Patch optimizes Vector. slice operation with constant index using x86 ALIGNR instruction. > It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. > > Idea here is to add infrastructure support to enable intrinsification of fast path for selected vector APIs, else enable inlining of fall-back implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander to handle slow paths, which can easily be implemented library side (Java). > > Vector API jtreg tests pass at AVX level 2, remaining validation in progress. 
> > Performance numbers: > > > System : 13th Gen Intel(R) Core(TM) i3-1315U > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms > VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms > VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms > VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms > VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms > VectorSliceBenchmark.shortVectorSliceWithVariableIndex 1024 ... 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24104/files - new: https://git.openjdk.org/jdk/pull/24104/files/2c7eb96d..ae242926 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=10-11 Stats: 23 lines in 10 files changed: 4 ins; 12 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24104.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24104/head:pull/24104 PR: https://git.openjdk.org/jdk/pull/24104 From jbhateja at openjdk.org Fri Jan 23 06:34:02 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 23 Jan 2026 06:34:02 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v11] In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 09:42:06 GMT, Xiaohong Gong wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8303762 > > Overall, looks good to me; I've just left a few minor comments. Hi @XiaohongGong, your comments have been addressed. Hi @sviswa7, can you kindly review the x86 part?
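As background for the slice discussion, a scalar model of the `slice(origin, v1)` lane semantics on byte lanes is sketched below. It is a hypothetical illustration, not the library or intrinsic code: the two vectors are treated as one background of `2 * VLENGTH` lanes, and output lane `N` comes from lane `origin + N` of the first vector if that lane exists, otherwise from lane `origin + N - VLENGTH` of the second. With a constant origin, this concatenate-and-extract shape is conceptually similar to the byte-aligned shift that VPALIGNR performs (which, for 256-bit and wider registers, operates within each 128-bit lane).

```java
import java.util.Arrays;

// Hypothetical scalar model (not the Vector API) of slice(origin, v1) on byte lanes:
// view v0:v1 as one 2*VLENGTH background and take VLENGTH lanes starting at origin.
final class SliceModel {
    static byte[] slice(byte[] v0, int origin, byte[] v1) {
        int vlen = v0.length;
        if (v1.length != vlen) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        if (origin < 0 || origin > vlen) {
            throw new IndexOutOfBoundsException("origin " + origin);
        }
        byte[] out = new byte[vlen];
        for (int n = 0; n < vlen; n++) {
            int src = origin + n;
            // Lanes past the end of v0 continue into v1.
            out[n] = src < vlen ? v0[src] : v1[src - vlen];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] a = {0, 1, 2, 3, 4, 5, 6, 7};
        byte[] b = {8, 9, 10, 11, 12, 13, 14, 15};
        System.out.println(Arrays.toString(slice(a, 3, b))); // [3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```

An origin of 0 returns the first vector unchanged and an origin of `VLENGTH` returns the second, so only origins strictly between the two actually mix lanes.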
------------- PR Comment: https://git.openjdk.org/jdk/pull/24104#issuecomment-3788537143 From chagedorn at openjdk.org Fri Jan 23 07:40:03 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 Jan 2026 07:40:03 GMT Subject: RFR: 8350330: C2: PhaseIdealLoop::add_parse_predicate() should mirror GraphKit::add_parse_predicate() In-Reply-To: References: Message-ID: <8jET6W_ZVuz7gdnA7fscABp054UMADSpU51eRxIZ_YE=.ace7ab11-eb33-4a04-8da2-d03d4b3e2adb@github.com> On Thu, 22 Jan 2026 16:22:34 GMT, Roland Westrelin wrote: > `PhaseIdealLoop::add_parse_predicate()` was intended to mirror > `GraphKit::add_parse_predicate()` but it doesn't. That last one checks > `too_many_traps` per bci but the `PhaseIdealLoop` version doesn't. As > demonstrated by the test case, a method can get compiled with a > predicate, take a trap, and get recompiled with the same predicate > many times (up to ~100). That looks good to me, thanks for fixing this inconsistency! test/hotspot/jtreg/compiler/longcountedloops/TestLoopNestTooManyTraps.java line 34: > 32: * -XX:-BackgroundCompilation -XX:-ShortRunningLongLoop -XX:-UseOnStackReplacement > 33: * -XX:CompileOnly=*TestLoopNestTooManyTraps::test1 -XX:LoopMaxUnroll=0 > 34: * compiler.longcountedloops.TestLoopNestTooManyTraps Nice test! Would it make sense for this special test to also have a non-flag run? ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29367#pullrequestreview-3696092503 PR Review Comment: https://git.openjdk.org/jdk/pull/29367#discussion_r2719968874 From chagedorn at openjdk.org Fri Jan 23 07:51:57 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 Jan 2026 07:51:57 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v5] In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 15:01:36 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not a subset of each other.
The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not a subset of each other. This leads to the possibilities that a node satisfying both cases will return the first value, but if upon being widen it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. >> >> For example, given `r = CmpU(x, y)`. >> >> At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, then `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. >> >> At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, then `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which leads to an assertion. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Improve IR tests Update looks good, thanks! test/hotspot/jtreg/compiler/c2/gvn/CmpUNodeValueTests.java line 36: > 34: * @run driver ${test.main.class} > 35: */ > 36: public class CmpUNodeValueTests { Thanks for adding an IR test! test/hotspot/jtreg/compiler/ccp/TestCmpUMonotonicity.java line 35: > 33: public class TestCmpUMonotonicity { > 34: public static void main(String[] args) { > 35: for (int i = 0; i < 20000; i++) { But do you really need 20000 iterations to trigger the issue? We end up executing the inner loop 20000*50 times. But it's not much work in the loop, so probably does not matter too much. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29308#pullrequestreview-3696101563 PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2719978794 PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2719977624 From qamai at openjdk.org Fri Jan 23 08:24:32 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 23 Jan 2026 08:24:32 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v5] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 07:40:52 GMT, Christian Hagedorn wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve IR tests > > test/hotspot/jtreg/compiler/ccp/TestCmpUMonotonicity.java line 35: > >> 33: public class TestCmpUMonotonicity { >> 34: public static void main(String[] args) { >> 35: for (int i = 0; i < 20000; i++) { > > But do you really need 20000 iterations to trigger the issue? We end up executing the inner loop 20000*50 times. But it's not much work in the loop, so probably does not matter too much. Thanks for your approval, the inner loop update statement is `i *= step` instead of `i += step`, so the inner loop runs only 7 iterations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2720143519 From mchevalier at openjdk.org Fri Jan 23 08:47:35 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 23 Jan 2026 08:47:35 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v5] In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 15:01:36 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not a subset of each other. 
This leads to the possibility that a node satisfying both cases will return the first value, but if, upon being widened, it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. >> >> For example, given `r = CmpU(x, y)`. >> >> At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, so `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. >> >> At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, so `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which triggers an assertion failure. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Improve IR tests Seems good, sound, and as precise as we can be. I just have a cosmetic detail. src/hotspot/share/opto/subnode.cpp line 749: > 747: // If both inputs are constants, compare them. > 748: const Type* CmpUNode::sub(const Type* t1, const Type* t2) const { > 749: const TypeInt *r0 = t1->is_int(); We prefer `TypeInt* r0`. Same below. ------------- Marked as reviewed by mchevalier (Committer). PR Review: https://git.openjdk.org/jdk/pull/29308#pullrequestreview-3696381918 PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2720237259 From qamai at openjdk.org Fri Jan 23 08:51:52 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 23 Jan 2026 08:51:52 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v6] In-Reply-To: References: Message-ID: > Hi, > > This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and neither return value is a subset of the other.
This leads to the possibility that a node satisfying both cases will return the first value, but if, upon being widened, it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. > > For example, given `r = CmpU(x, y)`. > > At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, so `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. > > At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, so `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which triggers an assertion failure. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29308/files - new: https://git.openjdk.org/jdk/pull/29308/files/625bed1c..053c5757 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29308&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29308&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29308.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29308/head:pull/29308 PR: https://git.openjdk.org/jdk/pull/29308 From qamai at openjdk.org Fri Jan 23 08:51:54 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 23 Jan 2026 08:51:54 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v5] In-Reply-To: <2mYbrhdVUbsfoM098J1Z9Vm_omjC9XYbF_CFQMnRNaI=.f8202f23-d181-4bce-b510-dab4eee372dc@github.com> References: Message-ID: <2mYbrhdVUbsfoM098J1Z9Vm_omjC9XYbF_CFQMnRNaI=.f8202f23-d181-4bce-b510-dab4eee372dc@github.com> On Fri, 23 Jan 2026 08:40:48 GMT, Marc Chevalier wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve IR tests > > src/hotspot/share/opto/subnode.cpp line 749: > >> 747: // If both inputs are
constants, compare them. >> 748: const Type* CmpUNode::sub(const Type* t1, const Type* t2) const { >> 749: const TypeInt *r0 = t1->is_int(); > > We prefer `TypeInt* r0`. Same below. Thanks, I missed that, fixed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29308#discussion_r2720273073 From mchevalier at openjdk.org Fri Jan 23 08:59:49 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 23 Jan 2026 08:59:49 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v6] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 08:51:52 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and neither return value is a subset of the other. This leads to the possibility that a node satisfying both cases will return the first value, but if, upon being widened, it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. >> >> For example, given `r = CmpU(x, y)`. >> >> At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, so `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. >> >> At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, so `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which triggers an assertion failure. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > style Marked as reviewed by mchevalier (Committer).
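To make the monotonicity argument concrete, here is a minimal Java sketch. It is not C2's `CmpUNode::sub`. It models a node type as an explicit, non-empty set of `int` values and a comparison result as a set of outcomes. The class name, the method names, and the two rules are hypothetical simplifications of the cases described above ("`x` is the constant `0`" and "`x` and `y` do not overlap"). Chaining the rules first-match-wins reproduces the non-monotonic behavior from the example. Intersecting the results of every applicable rule stays monotonic: for non-empty sets, each rule's precondition can stop holding when the inputs widen, but it can never start holding.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of CmpU type inference; NOT HotSpot's CmpUNode::sub.
// A "type" is a non-empty set of possible int values; a result is a set of outcomes.
final class CmpUSketch {
    enum Cmp { LT, EQ, GT }

    // Sound reference: every outcome an unsigned compare can actually produce.
    static Set<Cmp> exact(int[] xs, int[] ys) {
        Set<Cmp> r = EnumSet.noneOf(Cmp.class);
        for (int a : xs) {
            for (int b : ys) {
                int c = Integer.compareUnsigned(a, b);
                r.add(c < 0 ? Cmp.LT : (c == 0 ? Cmp.EQ : Cmp.GT));
            }
        }
        return r;
    }

    static boolean isConstantZero(int[] xs) {
        return xs.length == 1 && xs[0] == 0;
    }

    static boolean disjoint(int[] xs, int[] ys) {
        for (int a : xs) {
            for (int b : ys) {
                if (a == b) return false;
            }
        }
        return true;
    }

    // First matching rule wins: sound, but NOT monotonic.
    static Set<Cmp> firstMatch(int[] xs, int[] ys) {
        if (isConstantZero(xs)) return EnumSet.of(Cmp.LT, Cmp.EQ); // like CC_LE
        if (disjoint(xs, ys)) return EnumSet.of(Cmp.LT, Cmp.GT);   // like CC_NE
        return EnumSet.allOf(Cmp.class);
    }

    // Intersect the results of every applicable rule: sound and monotonic.
    static Set<Cmp> meetAll(int[] xs, int[] ys) {
        Set<Cmp> r = EnumSet.allOf(Cmp.class);
        if (isConstantZero(xs)) r.retainAll(EnumSet.of(Cmp.LT, Cmp.EQ));
        if (disjoint(xs, ys)) r.retainAll(EnumSet.of(Cmp.LT, Cmp.GT));
        return r;
    }
}
```

With the values from the example, `firstMatch({0}, {1, -1})` is `{LT, EQ}` but `firstMatch({0, 2}, {-1, 1})` is `{LT, GT}`, which does not contain it. `meetAll` instead yields `{LT}` and then `{LT, GT}`, so the result only grows as the inputs widen. The actual patch may resolve the overlap differently.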
------------- PR Review: https://git.openjdk.org/jdk/pull/29308#pullrequestreview-3696457971 From qamai at openjdk.org Fri Jan 23 09:11:02 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 23 Jan 2026 09:11:02 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v4] In-Reply-To: <1E6C99XanWuE9QF_-SfyFzxgD4asf7WUO7RyR1pYn7M=.47f1b163-1b67-4f54-acd4-d061779ad95d@github.com> References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> <1E6C99XanWuE9QF_-SfyFzxgD4asf7WUO7RyR1pYn7M=.47f1b163-1b67-4f54-acd4-d061779ad95d@github.com> Message-ID: On Thu, 22 Jan 2026 14:12:12 GMT, Kerem Kat wrote: >> The check `vect->is_Vector() || vect->is_LoadVector()` doesn't handle `Proj` nodes that resolve to vector types, causing an assertion failure when such nodes flow through a `Phi` into `VectorBox`. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into fix-c2-checkCastPP > - Merge branch 'master' into fix-c2-checkCastPP > - 8374903: C2 VectorAPI: assert(vbox->as_Phi()->region() == vect->as_Phi()->region()) failed > - Simplify expand_vbox_node_helper by merging VectorBox Phi handling > - 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded Marked as reviewed by qamai (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/29200#pullrequestreview-3696500310 From epeter at openjdk.org Fri Jan 23 09:12:27 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jan 2026 09:12:27 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v15] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 06:19:20 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported, as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine: >> >> >> Baseline Patch >> Benchmark (SIZE) Mode Cnt Score Error Units Score Error Units Improvement >> VectorSubword.intToByte 1024 avgt 12 200.049 ± 19.787 ns/op 56.228 ± 3.535 ns/op (3.56x) >> VectorSubword.intToShort 1024 avgt 12 179.826 ± 1.539 ns/op 43.332 ± 1.166 ns/op (4.15x) >> VectorSubword.shortToByte 1024 avgt 12 245.580 ± 6.150 ns/op 29.757 ± 1.055 ns/op (8.25x) >> >> >> I've also added some IR tests and they pass on my Linux x64 machine. Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 19 commits: > > - Fix whitespace > - Update tests after merge, apply changes from review > - Merge from master > - Update tests, cleanup logic > - Merge branch 'master' into vectorize-subword > - Check for AVX2 for byte/long conversions > - Whitespace and benchmark tweak > - Address more comments, make test and benchmark more exhaustive > - Merge from master > - Fix copyright after merge > - ... and 9 more: https://git.openjdk.org/jdk/compare/de6f35ef...13378368 @jaskarth Wow, judging by the number of IR rules you were able to adjust, I just realized how big the impact of this PR is. Very exciting! I left quite a few comments below, but only 3 are about the VM code, so we are not far from the finish line :) The rest is more about tracking future work. If you don't have the time to file the issues, just let me know, and I can file some RFEs for tracking :) One major improvement for the future would be to track down the cases where we now cast from subword->int, then do int-ops, and cast int->subword. This costs us a factor of 2 or 4 in vector length and introduces more ops we probably don't always need. But optimizing this could be quite a big task, so it is not a high priority. But we should file an issue for it for sure :) src/hotspot/share/opto/superwordVTransformBuilder.cpp line 264: > 262: if (use_bt != def_bt && !p0->is_Convert() && VectorCastNode::is_supported_subword_cast(def_bt, use_bt, pack->size())) { > 263: VTransformNode* in = get_vtnode(pack_in->at(0)); > 264: VTransformNode* cast = new (_vtransform.arena()) VTransformCastVectorNode(_vtransform, pack->size(), def_bt, use_bt); I just noticed: above, we already handle a cast case, but use `VTransformElementWiseVectorNode`: https://github.com/openjdk/jdk/pull/23413/files#diff-cd8469676c3f287680696b4dbd87fd02b765f2c9a249bd485c55613b15843435L213-L217 I'm not happy with using `VTransformElementWiseVectorNode` for some casts and `VTransformCastVectorNode` for others.
So I see 2 options: - Use `VTransformCastVectorNode` for both, and refactor the code I linked. - Somehow try to remove `VTransformCastVectorNode`, and use `VTransformElementWiseVectorNode` here. Do you think that would be possible? src/hotspot/share/opto/vtransform.cpp line 1313: > 1311: } > 1312: > 1313: if (current_red->in_req(2)->isa_Vector() == nullptr && current_red->in_req(2)->isa_CastVector() == nullptr) { Making `VTransformCastVectorNode` a subtype of `VTransformVectorNode` would make this change unnecessary. src/hotspot/share/opto/vtransform.hpp line 981: > 979: }; > 980: > 981: class VTransformCastVectorNode : public VTransformNode { I do wonder if we really need this one, or if we could just use the element-wise operator. If it's too much work or even impossible: can we at least make it a subtype of `VTransformVectorNode`, analogous to how the `VTransformReinterpretVectorNode` does it? test/hotspot/jtreg/compiler/c2/TestMinMaxSubword.java line 65: > 63: > 64: @Test > 65: @IR(applyIfCPUFeature = { "avx", "true" }, counts = { IRNode.VECTOR_CAST_I2S, IRNode.VECTOR_SIZE_ANY, ">0" }) I think you could get a more precise vector size here as well, using `IRNode.VECTOR_SIZE + "min(max_int, max_short)"` as you did in the other test :) test/hotspot/jtreg/compiler/loopopts/superword/TestReductions.java line 464: > 462: applyIf = {"AutoVectorizationOverrideProfitability", "> 0"}) > 463: @IR(failOn = IRNode.LOAD_VECTOR_B, > 464: applyIf = {"AutoVectorizationOverrideProfitability", "= 0"}) Wow, I think I had not noticed this before! This is actually a great win already. Though we could still do better by staying in byte rather than casting to int.
I have now filed [JDK-8376176](https://bugs.openjdk.org/browse/JDK-8376176): C2 SuperWord: implement/improve subword reductions test/hotspot/jtreg/compiler/vectorization/TestRotateByteAndShortVector.java line 122: > 120: @IR(counts = { IRNode.LOAD_VECTOR_B, IRNode.VECTOR_SIZE + "min(max_int, max_byte)", "> 0", > 121: IRNode.ROTATE_LEFT_V, "> 0" }, > 122: applyIfCPUFeature = {"avx512f", "true"}) We could also improve things here, right? Or is there a reason why we need to cast from and to int? Do you agree that we should file an RFE to track this? test/hotspot/jtreg/compiler/vectorization/TestSubwordTruncation.java line 77: > 75: @Test > 76: @IR(counts = { IRNode.LOAD_VECTOR_S, IRNode.VECTOR_SIZE + "min(max_int, max_short)", "> 0" }, > 77: applyIfCPUFeatureOr = { "avx2", "true", "asimd", "true" }) And how about here? Could we optimize and remove the casts? test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java line 125: > 123: @Test > 124: @IR(failOn = {IRNode.STORE_VECTOR}) > 125: // Subword vector casts with char do not work currently, see JDK-8349562. Ah, you had already filed something about unsigned casts! I think this is now a possible duplicate of [JDK-8375502](https://bugs.openjdk.org/browse/JDK-8375502): C2 SuperWord: implement unsigned casts. But the issues are linked, so just leave the comment as is :) test/hotspot/jtreg/compiler/vectorization/runner/BasicShortOpTest.java line 216: > 214: > 215: @Test > 216: @IR(applyIfCPUFeature = { "avx", "true" }, counts = { IRNode.VECTOR_CAST_I2S, IRNode.VECTOR_SIZE_ANY, ">0" }) Can we make the size more precise, please? :) I suspect we might eventually be able to implement this with a short min rather than an int min? ------------- Changes requested by epeter (Reviewer).
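For readers less familiar with subword vectorization, the scalar loops this PR targets are plain narrowing conversions. Java also performs `byte` and `short` arithmetic in `int`. The sketch below is hypothetical: the method names only loosely mirror the `VectorSubword` benchmark names, and the 256-bit register width is an assumption (for example, AVX2). It is meant to show where the "factor of 2 or 4" in vector length comes from when a loop widens subwords to `int` and narrows them back.

```java
// Hypothetical scalar kernels in the style of the VectorSubword benchmarks.
final class SubwordCastSketch {
    // int -> byte narrowing: keeps only the low 8 bits of each element.
    static void intToByte(int[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    // short -> byte narrowing: keeps only the low 8 bits of each element.
    static void shortToByte(short[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    // Java promotes the byte operand to int, so this is an int addition
    // followed by an int -> byte narrowing store.
    static void addOne(byte[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) (src[i] + 1);
        }
    }

    // Number of lanes of a given element width that fit into one vector register.
    static int lanes(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }
}
```

Assuming 256-bit vectors, `lanes(256, 8)` gives 32 byte lanes but `lanes(256, 32)` gives only 8 int lanes. Doing byte work in `int` lanes therefore processes 4x fewer elements per vector operation, and 2x fewer for `short`.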
PR Review: https://git.openjdk.org/jdk/pull/23413#pullrequestreview-3696382185 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2720237565 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2720256416 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2720248998 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2720264113 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2720294642 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2720302974 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2720306727 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2720320456 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2720326664 From epeter at openjdk.org Fri Jan 23 09:12:29 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jan 2026 09:12:29 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v15] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 08:57:15 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: >> >> - Fix whitespace >> - Update tests after merge, apply changes from review >> - Merge from master >> - Update tests, cleanup logic >> - Merge branch 'master' into vectorize-subword >> - Check for AVX2 for byte/long conversions >> - Whitespace and benchmark tweak >> - Address more comments, make test and benchmark more exhaustive >> - Merge from master >> - Fix copyright after merge >> - ... 
and 9 more: https://git.openjdk.org/jdk/compare/de6f35ef...13378368 > > test/hotspot/jtreg/compiler/vectorization/TestRotateByteAndShortVector.java line 122: > >> 120: @IR(counts = { IRNode.LOAD_VECTOR_B, IRNode.VECTOR_SIZE + "min(max_int, max_byte)", "> 0", >> 121: IRNode.ROTATE_LEFT_V, "> 0" }, >> 122: applyIfCPUFeature = {"avx512f", "true"}) > > We could also improve things here, right? Or is there a reason why we need to cast from and to int? > > Do you agree that we should file an RFE to track this? Can you add the current issue/bug/RFE number to the JTREG `@bug` annotations in all the files, please? :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2720310350 From epeter at openjdk.org Fri Jan 23 09:23:50 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jan 2026 09:23:50 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v15] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 06:19:20 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported, as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine: >> >> >> Baseline Patch >> Benchmark (SIZE) Mode Cnt Score Error Units Score Error Units Improvement >> VectorSubword.intToByte 1024 avgt 12 200.049 ± 19.787 ns/op 56.228 ± 3.535 ns/op (3.56x) >> VectorSubword.intToShort 1024 avgt 12 179.826 ± 1.539 ns/op 43.332 ± 1.166 ns/op (4.15x) >> VectorSubword.shortToByte 1024 avgt 12 245.580 ± 6.150 ns/op 29.757 ± 1.055 ns/op (8.25x) >> >> >> I've also added some IR tests and they pass on my Linux x64 machine. Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 19 commits: >> >> - Fix whitespace >> - Update tests after merge, apply changes from review >> - Merge from master >> - Update tests, cleanup logic >> - Merge branch 'master' into vectorize-subword >> - Check for AVX2 for byte/long conversions >> - Whitespace and benchmark tweak >> - Address more comments, make test and benchmark more exhaustive >> - Merge from master >> - Fix copyright after merge >> - ... and 9 more: https://git.openjdk.org/jdk/compare/de6f35ef...13378368 > > test/hotspot/jtreg/compiler/loopopts/superword/TestReductions.java line 464: > >> 462: applyIf = {"AutoVectorizationOverrideProfitability", "> 0"}) >> 463: @IR(failOn = IRNode.LOAD_VECTOR_B, >> 464: applyIf = {"AutoVectorizationOverrideProfitability", "= 0"}) > > Wow, I think I had not noticed this before! This is actually a great win already. Though we could still do better by not casting to int, and rather staying in byte. > > I now filed > [JDK-8376176](https://bugs.openjdk.org/browse/JDK-8376176): C2 SuperWord: implement/improve subword reductions Related: [JDK-8376179](https://bugs.openjdk.org/browse/JDK-8376179): C2 SuperWord: improve subword vectorization, avoid cast to-and-from int > test/hotspot/jtreg/compiler/vectorization/TestSubwordTruncation.java line 77: > >> 75: @Test >> 76: @IR(counts = { IRNode.LOAD_VECTOR_S, IRNode.VECTOR_SIZE + "min(max_int, max_short)", "> 0" }, >> 77: applyIfCPUFeatureOr = { "avx2", "true", "asimd", "true" }) > > And how about here? Could we optimize and remove the casts? 
Filed: [JDK-8376179](https://bugs.openjdk.org/browse/JDK-8376179): C2 SuperWord: improve subword vectorization, avoid cast to-and-from int ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2720383260 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2720381664 From epeter at openjdk.org Fri Jan 23 09:23:55 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jan 2026 09:23:55 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v15] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 08:59:41 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/vectorization/TestRotateByteAndShortVector.java line 122: >> >>> 120: @IR(counts = { IRNode.LOAD_VECTOR_B, IRNode.VECTOR_SIZE + "min(max_int, max_byte)", "> 0", >>> 121: IRNode.ROTATE_LEFT_V, "> 0" }, >>> 122: applyIfCPUFeature = {"avx512f", "true"}) >> >> We could also improve things here, right? Or is there a reason why we need to cast from and to int? >> >> Do you agree that we should file an RFE to track this? > > Can you add the current issue/bug/RFE number to all the files JTREG `@bug` annotations, please :) Filed: [JDK-8376179](https://bugs.openjdk.org/browse/JDK-8376179): C2 SuperWord: improve subword vectorization, avoid cast to-and-from int ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2720385001 From qamai at openjdk.org Fri Jan 23 09:41:22 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 23 Jan 2026 09:41:22 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v6] In-Reply-To: References: Message-ID: <3BOWwtWeFEe1w23f0W3Ppk8KTHDO0i4S4fqxw0ogj2s=.0b6ac660-f864-4712-a0ea-36ef64bbc619@github.com> On Fri, 23 Jan 2026 08:57:05 GMT, Marc Chevalier wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> style > > Marked as reviewed by mchevalier (Committer). @marc-chevalier Thanks for your approval. 
@eme64 @chhagedorn May I have your reapproval, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29308#issuecomment-3789320056 From epeter at openjdk.org Fri Jan 23 10:18:23 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jan 2026 10:18:23 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v14] In-Reply-To: <-cin72VWnqAukd5JbCMV9BfsGR1tWcWh_aGpdrlhHlM=.2d22e5da-35d2-4ffc-8ce5-86c90d662fd7@github.com> References: <8hStIcvp252Ik7raxZL5BvFKKkXTflorjyOD9Cyakvc=.c5d1b302-5c49-46b1-91ba-2feda2e6a746@github.com> <1ElN5XvEXAYGINpCIB2smhDrzekGyiXmG6o8-jnxDxk=.83a69a64-2894-40af-a2ee-9c35448c88b2@github.com> <-cin72VWnqAukd5JbCMV9BfsGR1tWcWh_aGpdrlhHlM=.2d22e5da-35d2-4ffc-8ce5-86c90d662fd7@github.com> Message-ID: <0xRSPGoZtGTXyFJEH5pvABR-qa2UZjNnQ2mXCxbYP4U=.ac7ab717-3adf-4910-a333-dd15b3fedd32@github.com> On Fri, 23 Jan 2026 04:57:04 GMT, Jatin Bhateja wrote: >> Hi @eme64, your comments have been addressed. > >> @jatin-bhateja This patch is really, really large. There are lots of renamings that could be done in a separate patch first (as a subtask). It would make reviewing easier, allowing focus on the substantial work. See discussion here: [#28002 (comment)](https://github.com/openjdk/jdk/pull/28002#discussion_r2705376899) > > Hi @eme64, > > I have done some cleanups; the following is a summary of the changes included in the patch: > > ``` > 1 Changes to introduce a new (custom) BasicType T_FLOAT16 > - Global Definition. > - Skip over handling wherever applicable. > 2 Changes to pass laneType (BasicType) to the intrinsic entry point instead of element classes. > - Inline expander interface changes mainly. > 3 Changes in abstract and concrete vector class generation templates. > 4 Changing the nomenclature of Vector classes to avoid Float1664... sort of names... > 5 Changes in the LaneType to add a new carrier type field.
> > 6 Changes in inline expanders to selectively enable intrinsification for operations for which we have > auto-vectorization and backend support in place. > 7 Changes in test generation templates. > b. Assert wrappers to convert float16 (short) values to float before invoking TestNG asserts. > c. Scalar operation wrappers to selectively invoke Float16 math routines that are not > part of Java SE math libraries. > > 8 New IR verification test. > 9 New micro-benchmark. > 10 AARCH64 test failure - patch + test fixed by Bhavana Kilambi. > > > Of the above, change 7b accounts for 40000+ LOC. > > Q. Why do we need wrapper assertions? > A. To handle all possible NaN representations (SNaN and QNaN): since float16 uses a short carrier type, we need to promote the values to float before invoking TestNG assertions. This conversion is done by the assertion wrappers. > > All the tasks are related, and most of the source and test files are generated using scripts, so we should not go by the size of the patch but review the template files. @jatin-bhateja Thanks for your response. And thanks for the list of changes included in the patch :) It seems to me that many of these subtasks you mention could be done as separate tasks prior to the Float16Vector and auto-vectorizer work: 1 Changes to introduce a new (custom) BasicType T_FLOAT16 - Global Definition. - Skip over handling wherever applicable. And 2 Changes to pass laneType (BasicType) to the intrinsic entry point instead of element classes. - Inline expander interface changes mainly. And, of the following, at least the changes that don't involve Float16Vector: 3 Changes in abstract and concrete vector class generation templates. 4 Changing the nomenclature of Vector classes to avoid Float1664... sort of names... Probably also this: 5 Changes in the LaneType to add a new carrier type field.
And maybe also this, as long as it is not Float16-specific: 6 Changes in inline expanders to selectively enable intrinsification for operations for which we have auto-vectorization and backend support in place. For `7`, probably only the `7b` part, since `7a` is about Float16Vector. 7 Changes in test generation templates. b. Assert wrappers to convert float16 (short) values to float before invoking TestNG asserts. c. Scalar operation wrappers to selectively invoke Float16 math routines that are not part of Java SE math libraries. Parts 8, 9, 10 seem Float16Vector-specific, so those should stay here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3789507594 From epeter at openjdk.org Fri Jan 23 10:18:26 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jan 2026 10:18:26 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v14] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 04:24:57 GMT, Jatin Bhateja wrote: >> Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. >> - Add necessary inline expander support. >> - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. >> - Use existing Float16 vector IR and backend support. >> - Extend the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. >> >> The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reductions, etc.). >> >> The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. >> >> [image not preserved in the archive] >> >> Initial RFP[1] was floated on the panama-dev mailing list. >> >> Kindly review the draft PR and share your feedback.
>> >> Best Regards, >> Jatin >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Refactoring and cleanups The goal of separating these is that reviewing is much easier, and so we can reach a higher confidence in the quality of the code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3789510739 From dlong at openjdk.org Fri Jan 23 10:21:38 2026 From: dlong at openjdk.org (Dean Long) Date: Fri, 23 Jan 2026 10:21:38 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v9] In-Reply-To: References: Message-ID: <31ROwZLk7vP70S_r-9eLgW0z6zb0Ur6Vfi9qQcE_n_Y=.aee0d7f4-f776-4086-9b1e-77a63bfe3bfc@github.com> On Mon, 19 Jan 2026 05:49:22 GMT, Guanqiang Han wrote: >> Please review this change. Thanks! >> >> **Description:** >> >> When -XX:+PrintDeoptimizationDetails is enabled, vframeArray.cpp prints interpreted frames under ttyLocker. >> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/runtime/vframeArray.cpp#L493-L503 >> In the WizardMode && Verbose branch it calls Method::print_codes() / print_codes_on(). The bytecode printing path may acquire MDOExtraData_lock (rank nosafepoint-1). >> https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/interpreter/bytecodeTracer.cpp#L597-L602 >> Calling it while holding tty_lock violates the global lock ranking order and triggers a lock rank inversion assertion. >> >> **Fix:** >> >> Generate the bytecode dump before taking tty_lock, and print it afterwards under ttyLocker to keep the output coherent while preserving correct lock acquisition order. 
>> To avoid double buffering when the target stream is already a thread-local stringStream, extend BytecodeTracer::print_method_codes with a `buffered` flag. The new call site uses buffered=false when dumping into the temporary stringStream. >> >> **Test:** >> >> GHA > > Guanqiang Han has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - change variable name > - Merge remote-tracking branch 'upstream/master' into 8374862 > - remove unused code > - revert > - Merge remote-tracking branch 'upstream/master' into 8374862 > - fix a compile error > - remove unnecessary blank line > - correct copyright year > - Add outputStream::is_buffered() > - change variable name > - ... and 3 more: https://git.openjdk.org/jdk/compare/72b38ac9...a63c4613 I retested after merging with the latest master branch and it passed. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29186#pullrequestreview-3696811333 From epeter at openjdk.org Fri Jan 23 10:23:57 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jan 2026 10:23:57 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <5o-TOPc_uS09RV2X544hVOK-06mYnbCX6M9ng80UMRM=.67450359-509e-47a3-85ba-b973551b160d@github.com> Message-ID: On Thu, 22 Jan 2026 15:18:33 GMT, Fei Gao wrote: >> I'm asking for more comments because I fear the benchmark is becoming harder to use, with all the extra options and benchmark variants. > > Really great suggestions.
> > I'll refine the comments as like: > > // When enabled, run an additional warm-up phase using a large loop iteration > // count to encourage C2 to generate vectorized and unrolled loop bodies. > // > // Rationale: > // Some benchmarks in this suite use small, fixed trip-count loops. During > // early profiling, C2 may treat such loops as trivial, avoid vectorization, > // or optimize them away entirely. In those cases, changes that affect loop > // vectorization behavior, such as the improvement introduced by JDK-8307084, > // may not be observable in the generated code. > // > // As a result, this benchmark suite contains two main classes of > // microbenchmarks: > // 1) bench_xx_computeBound / bench_xx_memoryBound > // These measure the performance of C2-generated code for the given > // workload without relying on a special warm-up phase. > // 2) bench03xx_staticTripCount / bench03xx_dynamicTripCount > // These benchmarks are sensitive to early profiling. Enabling a > // large-loop warm-up forces the optimizer to observe the loop at scale, > // making vectorized code generation more likely and allowing such > // effects to be measured. > // > // Usage guidance: > // - Enable for microbenchmarks that rely on observing vectorization or > // unrolling effects, especially when loop trip counts are small or > // constant (e.g., bench03xx_staticTripCount and bench03xx_dynamicTripCount, > // introduced by JDK-8307084). > // - Disable for general regression testing and for other microbenchmarks. > @Param({"true", "false"}) > public static boolean ENABLE_LARGE_LOOP_WARMUP; > > WDYT? That sounds quite good, actually :) But I do wonder: why should we not also have the "large loop warmup" for the other benchmarks? Are we sure that they would not also be affected? Or what exactly is the explanation that we cannot see the impact of this patch on those benchmarks? 
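The trip-count effect that JDK-8307084 targets, and that these small-trip-count benchmarks try to expose, can be made concrete with a little arithmetic. The sketch below is a hypothetical Java model, not C2 code: `DrainLoopModel`, `split`, and the `guardSkipsDrain` switch are illustrative names. It assumes 8 iterations per vector operation and a super-unroll factor of 4, the same numbers the PR description uses for its example of 25 iterations remaining after the pre-loop.

```java
/**
 * Hypothetical model (not C2 code): how the iterations left after the pre-loop
 * are split among the super-unrolled vector main loop, the vectorized drain
 * loop, and the scalar post loop.
 */
public class DrainLoopModel {
    /** Scalar iterations covered by each loop. */
    record Split(int main, int drain, int post) {}

    /**
     * @param remaining       iterations left after the pre-loop
     * @param lanes           iterations covered by one vector operation
     * @param superUnroll     super-unrolling factor of the vector main loop
     * @param guardSkipsDrain true models the old control flow, where a failing
     *                        main-loop guard jumps directly to the scalar post loop
     */
    static Split split(int remaining, int lanes, int superUnroll, boolean guardSkipsDrain) {
        int mainStride = lanes * superUnroll;
        if (remaining < mainStride && guardSkipsDrain) {
            // Old behavior: both vector loops are bypassed.
            return new Split(0, 0, remaining);
        }
        int main = (remaining / mainStride) * mainStride;   // whole main-loop trips
        int left = remaining - main;
        int drain = (left / lanes) * lanes;                  // whole drain-loop trips
        return new Split(main, drain, left - drain);         // the rest runs scalar
    }

    public static void main(String[] args) {
        for (boolean oldFlow : new boolean[] {true, false}) {
            Split s = split(25, 8, 4, oldFlow);
            System.out.printf("%s: main=%d drain=%d post=%d%n",
                    oldFlow ? "before" : "after", s.main(), s.drain(), s.post());
        }
        // before: main=0 drain=0 post=25
        // after: main=0 drain=24 post=1
    }
}
```

With the old control flow all 25 iterations run in the scalar post loop; once the drain loop is reachable, 24 of them run as 3 vector iterations and only 1 runs scalar. This is the kind of difference a benchmark can only observe if the loop actually gets vectorized during warm-up.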
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2720609553 From ghan at openjdk.org Fri Jan 23 10:29:42 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Fri, 23 Jan 2026 10:29:42 GMT Subject: RFR: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) [v9] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 11:44:24 GMT, David Holmes wrote: >> Hi @dholmes-ora, Thank you for reviewing! >> I've integrated the change. Could you please sponsor it? Thanks! > > @hgqxjj hotspot changes require two reviews, so we will need to wait for @dean-long to have another look. @dholmes-ora @dean-long Thanks for the review. I've already integrated this PR. Could one of you please sponsor it? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29186#issuecomment-3789556787 From epeter at openjdk.org Fri Jan 23 10:29:49 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jan 2026 10:29:49 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Tue, 13 Jan 2026 11:27:53 GMT, Fei Gao wrote: >> In C2's loop optimization, for a counted loop, if we have any of these conditions (RCE, unrolling) met, we switch to the >> `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running.
For example, if a main loop is unrolled to `8x`, the main loop guard tests if the loop has less than `8` iterations and then decide which way to go. >> >> Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is going to be further multiplied. After the main loop is super-unrolled, the minimum trip guard test will be updated. Assuming one vector can operate `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test if remaining trip is less than `8 * 4 = 32`. >> >> To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. And, both trip guards of the main loop and the vectorized drain loop jump to the scalar post loop. >> >> The problem here is, if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed. Because the minimum trip guard test of main loop fails, the execution will jump over both the main loop and the vectorized drain loop. For example, in the above case, a loop still has `25` iterations after the pre-loop, we may run `3` rounds of the vectorized drain loop but it's impossible. It would be better if the minimum trip guard test of the main loop does not jump over the vectorized drain loop. >> >> This patch is to improve it by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change of control flow. >> >> The whole process is done by the function `insert_post_loop()`. >> >> We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to vectorized drain loop with mode `InsertVectorizedDrain`: >> >> 1. 
The fall-in control flow to the vectorized drain loop comes fr... > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Fix build failure after rebasing and address review comments > - Merge branch 'master' into optimize-atomic-post > - Fixed new test failures after rebasing and refined parts of the code to address review comments > - Merge branch 'master' into optimize-atomic-post > - Merge branch 'master' into optimize-atomic-post > - Clean up comments for consistency and add spacing for readability > - Fix some corner case failures and refined part of code > - Merge branch 'master' into optimize-atomic-post > - Refine ascii art, rename some variables and resolve conflicts > - Merge branch 'master' into optimize-atomic-post > - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 These are all excellent questions. I think I need to spend some time figuring them out. But all of that should not block us here from this PR. This PR may just not turn out to be the lowest-hanging-fruit, but I think it will still be a valuable contribution in the long-run :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3789556043 From epeter at openjdk.org Fri Jan 23 10:43:43 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jan 2026 10:43:43 GMT Subject: RFR: 8359335: Template-Framework Library: Primitive Types subtyping In-Reply-To: <69kmR5e7B-PnE8Lxa_XWShTmwDJmh2fhqD5nNsjpJDI=.1e47ddad-4054-45a3-bb0a-494b6baedc15@github.com> References: <69kmR5e7B-PnE8Lxa_XWShTmwDJmh2fhqD5nNsjpJDI=.1e47ddad-4054-45a3-bb0a-494b6baedc15@github.com> Message-ID: <8vAQozxKS_iXqsyRVjAEC5F-TAs71whcvEkBWL36ECs=.43db0285-cfd3-4f69-867c-51ff50238ef4@github.com> On Wed, 21 Jan 2026 15:56:36 GMT, Manuel Hässig wrote: > This PR implements the subtype relation for primitive types in the Template Framework Library according to JLS §4.10.1.
Further, this PR extends the primitive type example to test the addition. > > Testing: > - [x] Github Actions > - [x] tier1, tier2 Ok, so this is what I remember from our conversation yesterday: - I think the subtyping is a valuable contribution, to be able to more accurately model things. - But I now think that it is not great to allow sampling Expressions for subtypes, at least not in all cases: - It means that we can do an implicit cast. - That leads to issues with context-sensitive operators like `%` as you have discovered. That operator is a bit strange, because the supertype (float) cannot throw an exception, but the subtype (int) can. - Implicit casts also mean the fuzzer has no control over the probabilities of these casts. And we may want to (at some point) figure out a way to control the probability of operations. For example, we may want to lower or increase the probability of casts, because we think that more interesting things happen with/without casts. - There may also be cases in the future where we do actively want to allow sampling Expressions for subtypes. For example, we may soon want to model Objects (and Valhalla value classes), and then we might want to use operations to model things like method calls, where both the `this` and the arguments can be subtypes. - So I suspect that we will have to eventually support both kinds of operations: those that only work on exact types and those that allow subtypes. TLDR: for now, it would be nice to restrict `Expression` nesting to exact type matches, rather than `isSubtypeOf`, but we may allow both cases in the future. What do you think?
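To make the subtyping discussion concrete: JLS §4.10.1 defines the direct supertype relation among primitive types as double >1 float, float >1 long, long >1 int, int >1 char, int >1 short, and short >1 byte. Subtyping is the reflexive, transitive closure of that relation, and boolean takes no part in it. The Java sketch below encodes the relation and then shows why an implicit widening matters for `%`. The `Prim` enum and this `isSubtypeOf` are hypothetical stand-ins, not the Template Framework Library API. A float remainder with a zero divisor yields NaN, while an int remainder with a zero divisor throws `ArithmeticException`; an explicit `(float)` cast on an operand selects the floating-point variant.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of JLS 4.10.1 primitive subtyping (not the framework's API). */
public class PrimitiveSubtyping {
    enum Prim { BOOLEAN, BYTE, SHORT, CHAR, INT, LONG, FLOAT, DOUBLE }

    /** Direct supertypes per JLS 4.10.1; boolean and double have none. */
    static Set<Prim> directSupertypes(Prim p) {
        return switch (p) {
            case BYTE -> EnumSet.of(Prim.SHORT);
            case SHORT, CHAR -> EnumSet.of(Prim.INT);
            case INT -> EnumSet.of(Prim.LONG);
            case LONG -> EnumSet.of(Prim.FLOAT);
            case FLOAT -> EnumSet.of(Prim.DOUBLE);
            case DOUBLE, BOOLEAN -> EnumSet.noneOf(Prim.class);
        };
    }

    /** Reflexive, transitive closure of the direct-supertype relation. */
    static boolean isSubtypeOf(Prim sub, Prim sup) {
        if (sub == sup) {
            return true;
        }
        for (Prim next : directSupertypes(sub)) {
            if (isSubtypeOf(next, sup)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSubtypeOf(Prim.INT, Prim.FLOAT));   // true
        System.out.println(isSubtypeOf(Prim.CHAR, Prim.SHORT));  // false
        System.out.println(isSubtypeOf(Prim.BYTE, Prim.CHAR));   // false

        // If an int expression is sampled where a float is expected, `%` changes meaning:
        int x = 5;
        int zero = 0;
        System.out.println(x % (float) zero);  // NaN: float remainder does not throw
        try {
            System.out.println(x % zero);      // int remainder by zero
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException");
        }
    }
}
```

Note that byte and char are unrelated under this relation, even though both widen to int, so a sampler that follows `isSubtypeOf` would never put a byte expression where a char is expected.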
------------- PR Comment: https://git.openjdk.org/jdk/pull/29349#issuecomment-3789608450 From epeter at openjdk.org Fri Jan 23 11:12:14 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jan 2026 11:12:14 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v12] In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 02:24:03 GMT, Xiaohong Gong wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> updates for review > > I ran the new tests on my ARM NEON machine with `-XX:MaxVectorSize=8`, and following tests crashed with the same error: > > compiler/vectorization/TestVectorAlgorithms.java#noOptimizeFill > compiler/vectorization/TestVectorAlgorithms.java#noSuperWord > compiler/vectorization/TestVectorAlgorithms.java#vanilla > > > Here is the log: > > Standard Output > --------------- > CompileCommand: inline *VectorAlgorithmsImpl*.* bool inline = true > TestVM main() called - about to run tests in class compiler.vectorization.TestVectorAlgorithms > For random generator using seed: 5121565769469166450 > To re-run test with same seed value please add "-Djdk.test.lib.random.seed=5121565769469166450" to command line. 
> 300 Phi === 103 1050 302 [[ 399 299 ]] #rawptr:BotPTR !jvms: IntVector::lanewiseTemplate @ bci:154 (line 798) Int64Vector::lanewise @ bci:3 (line 278) Int64Vector::lanewise @ bci:3 (line 43) IntVector::lanewise @ bci:43 (line 944) IntVector::add @ bci:5 (line 1406) VectorAlgorithmsImpl::findMinIndexI_VectorAPI @ bci:96 (line 563) > 300 Phi === 103 1050 302 [[ 399 299 ]] #rawptr:BotPTR !jvms: IntVector::lanewiseTemplate @ bci:154 (line 798) Int64Vector::lanewise @ bci:3 (line 278) Int64Vector::lanewise @ bci:3 (line 43) IntVector::lanewise @ bci:43 (line 944) IntVector::add @ bci:5 (line 1406) VectorAlgorithmsImpl::findMinIndexI_VectorAPI @ bci:96 (line 563) > 98 safePoint === 101 0 401 0 0 99 905 402 403 404 282 0 0 0 0 908 909 912 [[ 100 575 675 ]] !jvms: VectorAlgorithmsImpl::findMinIndexI_VectorAPI @ bci:113 (line 558) > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (jdk-src/src/hotspot/share/opto/buildOopMap.cpp:371), pid=145228, tid=145250 > # assert(false) failed: there should be an oop in OopMap instead of a live raw oop at safepoint > # > # JRE version: OpenJDK Runtime Environment (27.0) (fastdebug build 27-internal-git-362f4c7acc8) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 27-internal-git-362f4c7acc8, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > # Problematic frame: > # V [libjvm.so+0x72ae50] OopFlow::build_oop_map(Node*, int, PhaseRegAlloc*, int*)+0xf80 > # > > > And the VM options: > > -ea -esa -Xmx768m -XX:UseSVE=0 -XX:MaxVectorSize=8 --add-modules=jdk.incubator.vector -XX:CompileCommand=inline,*VectorAlgorithmsImpl*::* -XX:-BackgroundCompilation -XX:CompileCommand=quiet > > Could you please take a look? Thanks! @XiaohongGong The failure is of course unrelated, since we have no VM changes here. 
A bit scary that a random "demo benchmark" triggers a bug :/ I could reproduce it as well, extracted a stand-alone test, and filed: https://bugs.openjdk.org/browse/JDK-8376189 ------------- PR Comment: https://git.openjdk.org/jdk/pull/28639#issuecomment-3789751176 From ghan at openjdk.org Fri Jan 23 11:40:58 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Fri, 23 Jan 2026 11:40:58 GMT Subject: Integrated: 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 08:42:07 GMT, Guanqiang Han wrote: > Please review this change. Thanks! > > **Description:** > > When -XX:+PrintDeoptimizationDetails is enabled, vframeArray.cpp prints interpreted frames under ttyLocker. > https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/runtime/vframeArray.cpp#L493-L503 > In the WizardMode && Verbose branch it calls Method::print_codes() / print_codes_on(). The bytecode printing path may acquire MDOExtraData_lock (rank nosafepoint-1). > https://github.com/openjdk/jdk/blob/578204f8c49f06be8b9c4855359ca61c9e107678/src/hotspot/share/interpreter/bytecodeTracer.cpp#L597-L602 > Calling it while holding tty_lock violates the global lock ranking order and triggers a lock rank inversion assertion. > > **Fix:** > > Generate the bytecode dump before taking tty_lock, and print it afterwards under ttyLocker to keep the output coherent while preserving correct lock acquisition order. > To avoid double buffering when the target stream is already a thread-local stringStream, extend BytecodeTracer::print_method_codes with a buffered flag . The new call site uses buffered=false when dumping into the temporary stringStream. > > **Test:** > > GHA This pull request has now been integrated. 
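The integrated fix follows a common pattern for lock-rank problems: first do the work that needs the higher-ranked lock, writing into a private buffer, and only then take the lowest-ranked output lock to emit the result. The Java sketch below models that pattern. It is not HotSpot's `Mutex` or `ttyLocker` API; `RankedLockDemo`, the rank values, and the stand-in bytecode text are hypothetical. The rule it enforces (a newly acquired lock must rank below every lock already held) is an assumption consistent with the assertion message, where taking `MDOExtraData_lock` (nosafepoint-1) while holding `tty_lock` is reported as out of order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of rank-ordered locks; not HotSpot's Mutex or ttyLocker. */
public class RankedLockDemo {
    static final class RankedLock {
        final String name;
        final int rank;
        final ReentrantLock lock = new ReentrantLock();

        RankedLock(String name, int rank) {
            this.name = name;
            this.rank = rank;
        }
    }

    // Illustrative ranks: the output lock ranks lowest, so nothing may be taken under it.
    static final RankedLock TTY = new RankedLock("tty_lock", 0);
    static final RankedLock MDO_EXTRA = new RankedLock("MDOExtraData_lock", 5);

    // Locks held by the current thread, most recently acquired last.
    private static final ThreadLocal<Deque<RankedLock>> HELD =
            ThreadLocal.withInitial(ArrayDeque::new);

    static void acquire(RankedLock l) {
        RankedLock last = HELD.get().peekLast();
        if (last != null && l.rank >= last.rank) {
            throw new IllegalStateException("Attempting to acquire lock " + l.name
                    + " out of order with lock " + last.name);
        }
        l.lock.lock();
        HELD.get().addLast(l);
    }

    static void release(RankedLock l) {
        HELD.get().removeLast();  // assumes LIFO release order
        l.lock.unlock();
    }

    /** Stand-in for bytecode printing, which needs MDO_EXTRA. */
    static void printCodes(StringBuilder out) {
        acquire(MDO_EXTRA);
        try {
            out.append("0: aload_0\n1: areturn\n");
        } finally {
            release(MDO_EXTRA);
        }
    }

    /** Original order: the dump is produced while TTY is held, so this throws. */
    static void printFrameUnderTty(StringBuilder tty) {
        acquire(TTY);
        try {
            printCodes(tty);
        } finally {
            release(TTY);
        }
    }

    /** Fixed order: produce the dump first, then emit it while holding TTY. */
    static void printFrameBuffered(StringBuilder tty) {
        StringBuilder buf = new StringBuilder();
        printCodes(buf);
        acquire(TTY);
        try {
            tty.append(buf);
        } finally {
            release(TTY);
        }
    }
}
```

Buffering keeps the emitted output contiguous, because it is appended in one step under the output lock. The cost is an extra copy, which is presumably why the patch avoids double buffering when the target is already a string stream.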
Changeset: 6f6966b2 Author: Guanqiang Han Committer: Dean Long URL: https://git.openjdk.org/jdk/commit/6f6966b28b2c5a18b001be49f5db429c667d7a8f Stats: 71 lines in 6 files changed: 54 ins; 2 del; 15 mod 8374862: assert(false) failed: Attempting to acquire lock MDOExtraData_lock/nosafepoint-1 out of order with lock tty_lock/tty -- possible deadlock (running with -XX:+Verbose -XX:+WizardMode -XX:+PrintDeoptimizationDetails) Reviewed-by: dholmes, dlong ------------- PR: https://git.openjdk.org/jdk/pull/29186 From qamai at openjdk.org Fri Jan 23 11:44:31 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 23 Jan 2026 11:44:31 GMT Subject: RFR: 8350330: C2: PhaseIdealLoop::add_parse_predicate() should mirror GraphKit::add_parse_predicate() In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 16:22:34 GMT, Roland Westrelin wrote: > `PhaseIdealLoop::add_parse_predicate()` was intended to mirror > `GraphKit::add_parse_predicate()` but it doesn't. That last one checks > `too_many_traps` per bci but the `PhaseIdealLoop` version doesn't. As > demonstrated by the test case, a method can get compiled with a > predicate, take a trap, and get recompiled with the same predicate > many times (up to ~100). LGTM, too. ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/29367#pullrequestreview-3697117490 From mhaessig at openjdk.org Fri Jan 23 12:37:24 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 23 Jan 2026 12:37:24 GMT Subject: RFR: 8359335: Template-Framework Library: Primitive Types subtyping In-Reply-To: <69kmR5e7B-PnE8Lxa_XWShTmwDJmh2fhqD5nNsjpJDI=.1e47ddad-4054-45a3-bb0a-494b6baedc15@github.com> References: <69kmR5e7B-PnE8Lxa_XWShTmwDJmh2fhqD5nNsjpJDI=.1e47ddad-4054-45a3-bb0a-494b6baedc15@github.com> Message-ID: On Wed, 21 Jan 2026 15:56:36 GMT, Manuel Hässig wrote: > This PR implements the subtype relation for primitive types in the Template Framework Library according to JLS §4.10.1.
Further, this PR extends the primitive type example to test the addition. > > Testing: > - [x] Github Actions > - [x] tier1, tier2 Thank you for writing this up. > it would be nice to restrict Expression nesting to exact type matches, rather than isSubtypeOf This breaks the assumptions of current template framework tests, but that should be easy enough to fix. Generally, I think that we will not get around annotating float division operators with the possibility for an `ArithmeticException`. Saying that we want to nest on exact types would only push that out into the future. Also, I do not really see the problem with the extra try-catch since C2 should be able to prove it is not necessary when all involved types are floats. Thus, my proposal is slightly different: - Add the possibility for `Expression` to nest with exact or subtypes. - Make the `ExpressionFuzzer` nest on exact types to preserve the current behavior and mitigate the sampling problem you laid out above. - Keep the exception annotation on the float division operations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29349#issuecomment-3790039695 From chagedorn at openjdk.org Fri Jan 23 13:22:45 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 Jan 2026 13:22:45 GMT Subject: RFR: 8375272: [IR Framework] Miscellaneous clean-ups Message-ID: This patch applies various unrelated clean-ups split out from a first prototype for [JDK-8375270](https://bugs.openjdk.org/browse/JDK-8375270). This should ease reviews. I left some comments in the PR to further explain some details. 
Thanks, Christian ------------- Commit messages: - fix tests - fix tests - fix build - cleanups Changes: https://git.openjdk.org/jdk/pull/29362/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29362&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375272 Stats: 162 lines in 15 files changed: 81 ins; 12 del; 69 mod Patch: https://git.openjdk.org/jdk/pull/29362.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29362/head:pull/29362 PR: https://git.openjdk.org/jdk/pull/29362 From chagedorn at openjdk.org Fri Jan 23 13:22:49 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 Jan 2026 13:22:49 GMT Subject: RFR: 8375272: [IR Framework] Miscellaneous clean-ups In-Reply-To: References: Message-ID: <_r32auB1BHeHwsFMJy1mIf8FoJ3WXnYCChXSaaItois=.3211fe73-0e7a-4f5b-87bb-e26ff70b0eca@github.com> On Thu, 22 Jan 2026 14:51:08 GMT, Christian Hagedorn wrote: > This patch applies various unrelated clean-ups split out from a first prototype for [JDK-8375270](https://bugs.openjdk.org/browse/JDK-8375270). This should ease reviews. I left some comments in the PR to further explain some details. > > Thanks, > Christian test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 186: > 184: private int defaultWarmup = -1; > 185: private boolean testClassesOnBootClassPath; > 186: private boolean allowNotCompilable = false; For consistency. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irmethod/NotCompilableIRMethod.java line 27: > 25: > 26: import compiler.lib.ir_framework.IR; > 27: import compiler.lib.ir_framework.Test; Unused + added import for Javadocs. Same below. test/hotspot/jtreg/compiler/lib/ir_framework/driver/network/testvm/java/IRRuleIds.java line 37: > 35: * Class to hold the indices of the applicable {@link IR @IR} rules of an {@link IRMethod}. > 36: */ > 37: public class IRRuleIds implements Iterable { Replaced usages of `int[] irRuleIds` with this class. 
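The review comment above describes replacing raw `int[] irRuleIds` arrays with a small dedicated class. A sketch of that kind of wrapper follows. It is a hypothetical illustration, not the class from the patch: the name `RuleIds` and its methods are assumptions. Implementing `Iterable<Integer>` keeps enhanced `for` loops at call sites working, and the defensive copy makes the wrapper immutable.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical wrapper for the indices of applicable @IR rules (not the framework's class). */
public final class RuleIds implements Iterable<Integer> {
    private final int[] ids;

    public RuleIds(int... ids) {
        this.ids = ids.clone();  // defensive copy: callers cannot mutate our state
    }

    public boolean isEmpty() {
        return ids.length == 0;
    }

    public int count() {
        return ids.length;
    }

    @Override
    public Iterator<Integer> iterator() {
        // IntStream.iterator() returns a PrimitiveIterator.OfInt, which is an Iterator<Integer>.
        return Arrays.stream(ids).iterator();
    }

    @Override
    public String toString() {
        return Arrays.toString(ids);
    }
}
```

A call site such as `for (int id : new RuleIds(1, 3)) { ... }` reads the same as a loop over the old array, while the type now documents what the integers mean.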
test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestCheckedTests.java line 188: > 186: super(s); > 187: } > 188: } Converted to an inner class to make the IDE stop complaining about duplicated classes in tests. test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestSetupTests.java line 364: > 362: > 363: > 364: static class BadCheckedTestException extends RuntimeException { Same here: Converted to an inner class to make the IDE stop complaining about duplicated classes in tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29362#discussion_r2717254052 PR Review Comment: https://git.openjdk.org/jdk/pull/29362#discussion_r2717257890 PR Review Comment: https://git.openjdk.org/jdk/pull/29362#discussion_r2721157095 PR Review Comment: https://git.openjdk.org/jdk/pull/29362#discussion_r2717251432 PR Review Comment: https://git.openjdk.org/jdk/pull/29362#discussion_r2717252123 From dlunden at openjdk.org Fri Jan 23 13:30:55 2026 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 23 Jan 2026 13:30:55 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v19] In-Reply-To: <-4y1DobU4R4eTjnjpv56qGRCd-8wWSiE4LO1mVnnmZ4=.b8985412-7ec8-45f2-9e61-53ff0bf0c532@github.com> References: <-4y1DobU4R4eTjnjpv56qGRCd-8wWSiE4LO1mVnnmZ4=.b8985412-7ec8-45f2-9e61-53ff0bf0c532@github.com> Message-ID: On Wed, 21 Jan 2026 02:57:17 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch is an alternative to #28764 but it does the analysis during IGVN instead. >> >> ## The current PR: >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return.
>> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. >> >> I do not see a noticeable difference in C2 runtime with and without this patch. >> >> ## Future work: >> >> 1. Fold a memory `Phi`. >> >> This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. >> >> 2. Fold a pointer `Phi`. >> >> Currently, this PR is doing the trivial approach, just give up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: >> >> Point p1 = new Point; >> Point p2 = new Point; >> p1.x = v1; >> p2.x = v2; >> Point p = Phi(p1, p2); >> int a = p.x; >> >> Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. >> >> Another interesting case: >> >> Point p = Phi(p1, p2); >> p.x = v; >> p1.x = v1; >> int a = p.x; >> >> Then, theoretically, we can fold `a` to `Phi(v1, v)` if `p1` and `p2` are known not to alias. >> >> 3. Nested objects >> >> It can be observed that if an object is stored into a memory that has not escaped, then it can be considered that the object has not escaped. For example: >> >> Point p = new Point; >> PointHolder h = new PointHolder; >> h.p = p; >> int x = p.x; >> escape(h); >> >> Then, `p` can be considered that it has not escaped until `escape(h)`. 
To do this, the computation of `_aliases` in the constructor of `LocalEA` needs to be more comprehensive. See the comments in `LocalEA::check_escape_status`. >> >> Please... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision: > > - Remove the TriBool > - Merge branch 'master' into loadfoldingigvn > - Fix dead accesses, address reviews > - Merge branch 'master' into loadfoldingigvn > - Early return when not a heap access > - Fix escape at store > - Fix outdated and unclear comments > - copyright year, return, comments, whitespace > - Merge branch 'master' into loadfoldingigvn > - ea of phis and nested objects > - ... and 12 more: https://git.openjdk.org/jdk/compare/93ce8b1a...ac82c2ea I, @anton-seoane , @sarannat , and @robcasloz had a joint look at this changeset earlier today, and here are some comments. Our main proposal is that we split this changeset into two: one preparatory and then one adding the actual local escape analysis. The reason is that you have, in addition to your changes, refactored and documented much of the code in `MemNode::find_previous_store`. This is good, of course, but makes reviewing the diff more complicated. Can you first make a separate PR just with the refactoring and documentation additions for the current mainline `MemNode::find_previous_store`? src/hotspot/share/opto/memnode.cpp line 1426: > 1424: // Secondly, from the set of allocations that may alias base, collect all nodes that may alias > 1425: // them, they may alias base as well. Actually, there may be cases that a may alias b and b may > 1426: // alias c but a may not alias c, but we are conservative here. We did not get an intuition for why this downwards pass is required in addition to the prior upwards pass. 
Do you have a test case that illustrates a situation where this is needed? src/hotspot/share/opto/memnode.cpp line 3879: > 3877: if (result == this) { > 3878: // the store may also apply to zero-bits in an earlier object > 3879: Node* prev_mem = find_previous_store(phase); It seems your changes to `find_previous_store` could also improve the analysis here for store nodes. Do you have an example illustrating this (or could you add one)? ------------- PR Review: https://git.openjdk.org/jdk/pull/28812#pullrequestreview-3697453169 PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2721175608 PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2721185492 From epeter at openjdk.org Fri Jan 23 13:33:45 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jan 2026 13:33:45 GMT Subject: RFR: 8359335: Template-Framework Library: Primitive Types subtyping In-Reply-To: References: <69kmR5e7B-PnE8Lxa_XWShTmwDJmh2fhqD5nNsjpJDI=.1e47ddad-4054-45a3-bb0a-494b6baedc15@github.com> Message-ID: On Fri, 23 Jan 2026 12:34:27 GMT, Manuel Hässig wrote: > > it would be nice to restrict Expression nesting to exact type matches, rather than isSubtypeOf > > This breaks the assumptions of current template framework tests, but that should be easy enough to fix. I wasn't aware. Do you have an example? > Generally, I think that we will not get around annotating float division operators with the possibility for an `ArithmeticException`. Saying that we want to nest on exact types would only push that out into the future. Also, I do not really see the problem with the extra try-catch since C2 should be able to prove it is not necessary when all involved types are floats. It's only a small point for me, so I could live with the annotation. But it is a small loss. If we ever don't want operations with exceptions, we'd have to filter them out, meaning we cannot have the float modulo.
Ah: we could also add explicit float/double casts to the modulo operator arguments. That would at least force away any exception, and ensure we are choosing the float/double modulo, rather than int modulo. And why can't we get around adding the exception? Is there some fundamental restriction, or just the fear that we will continue to hit subtle bugs, and it's just not worth it? > Thus, my proposal is slightly different: > > * Add the possibility for `Expression` to nest with exact or subtypes. > * Make the `ExpressionFuzzer` nest on exact types to preserve the current behavior and mitigate the sampling problem you laid out above. I'm fine with that :) > * Keep the exception annotation on the float division operations. This we can keep discussing ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/29349#issuecomment-3790251512 From mchevalier at openjdk.org Fri Jan 23 13:35:43 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 23 Jan 2026 13:35:43 GMT Subject: RFR: 8375272: [IR Framework] Miscellaneous clean-ups In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 14:51:08 GMT, Christian Hagedorn wrote: > This patch applies various unrelated clean-ups split out from a first prototype for [JDK-8375270](https://bugs.openjdk.org/browse/JDK-8375270). This should ease reviews. I left some comments in the PR to further explain some details. > > Thanks, > Christian Overall, looks good. I'll look more in-depth, but some early comments: - We have quite a few unused imports in tests and other... Nice to clean up! - I like the additional const in C++. Const all the things! - Some files (e.g. test/hotspot/jtreg/compiler/lib/ir_framework/CompilePhase.java) are missing their copyright update.
------------- PR Review: https://git.openjdk.org/jdk/pull/29362#pullrequestreview-3697494312 From dlunden at openjdk.org Fri Jan 23 13:36:56 2026 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 23 Jan 2026 13:36:56 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v17] In-Reply-To: References: <2hymR0S4xgySuZlpKY7h-xNVvJYo51YM05FfEEkgBYo=.20f04939-8170-4840-ac51-4dca7b01b3e4@github.com> Message-ID: On Fri, 16 Jan 2026 10:51:39 GMT, Daniel Lundén wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: >> >> - Merge branch 'master' into loadfoldingigvn >> - Early return when not a heap access >> - Fix escape at store >> - Fix outdated and unclear comments >> - copyright year, return, comments, whitespace >> - Merge branch 'master' into loadfoldingigvn >> - ea of phis and nested objects >> - Add test scenarios >> - Add a flag to turn off the feature >> - Much more comments, refactor the data into a separate class >> - ... and 9 more: https://git.openjdk.org/jdk/compare/863b0237...c275e6e6 > > src/hotspot/share/opto/memnode.cpp line 2256: > >> 2254: // the alias index stuff. So instead, peek through Stores and IFF we can >> 2255: // fold up, do so. >> 2256: Node* prev_mem = find_previous_store(phase); > > Previously, we reached here even if `!can_reshape`. We no longer do so due to the additional check above. Is this correct? If so, can you add a brief comment explaining this? I see you added a comment here ("This performs complex analysis that requires a complete graph."), but I'm still worried that we disable optimizations that we previously performed when !can_reshape (as there was no can_reshape guard here before).
Have you verified that adding the can_reshape guard here does not cause any regressions? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2721215644 From dfenacci at openjdk.org Fri Jan 23 14:11:45 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 23 Jan 2026 14:11:45 GMT Subject: RFR: 8375272: [IR Framework] Miscellaneous clean-ups In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 14:51:08 GMT, Christian Hagedorn wrote: > This patch applies various unrelated clean-ups split out from a first prototype for [JDK-8375270](https://bugs.openjdk.org/browse/JDK-8375270). This should ease reviews. I left some comments in the PR to further explain some details. > > Thanks, > Christian Overall looks good. Thanks for the cleanup @chhagedorn! Just a couple of nits. src/hotspot/share/opto/compile.cpp line 868: > 866: #ifndef PRODUCT > 867: if (should_print_ideal()) { > 868: print_ideal_ir("PrintIdeal"); I guess this is for consistency as well, right? test/hotspot/jtreg/compiler/lib/ir_framework/test/TestVM.java line 833: > 831: */ > 832: private void runTests() { > 833: TreeMap durations = (PRINT_TIMES || VERBOSE) ? new TreeMap<>() : null; Do we still need `VERBOSE` here? 
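For the escape-analysis change under review (8373495), the PR description states the core test as follows. If every node that makes the allocation escape has a control input that is not a transitive control input of the call or barrier being walked past, then the allocation has not escaped there. The sketch below models only that reachability question on a toy control graph of named nodes. It is a hypothetical illustration, not C2's `LocalEA` code, and it ignores aliasing, Phis, and memory state entirely.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the "has the allocation escaped at this call?" check (not C2 code). */
public class LocalEscapeModel {
    /** Nodes reachable backwards from `node` via control inputs (`node` itself only if on a cycle). */
    static Set<String> transitiveControlInputs(String node, Map<String, List<String>> controlInputs) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(controlInputs.getOrDefault(node, List.of()));
        while (!work.isEmpty()) {
            String n = work.pop();
            if (seen.add(n)) {
                work.addAll(controlInputs.getOrDefault(n, List.of()));
            }
        }
        return seen;
    }

    /**
     * @param call           the call or barrier the load's memory walk wants to skip
     * @param escapeControls control inputs of every node that makes the allocation escape
     * @return true if no escape point can execute before `call`
     */
    static boolean notEscapedAt(String call, List<String> escapeControls,
                                Map<String, List<String>> controlInputs) {
        Set<String> before = transitiveControlInputs(call, controlInputs);
        for (String c : escapeControls) {
            if (before.contains(c)) {
                return false;  // an escape may already have happened on the way to `call`
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Straight-line control: Start -> A -> call1 -> B -> call2,
        // with the object escaping at a node controlled by B (after call1).
        Map<String, List<String>> cfg = Map.of(
                "A", List.of("Start"),
                "call1", List.of("A"),
                "B", List.of("call1"),
                "call2", List.of("B"));
        System.out.println(notEscapedAt("call1", List.of("B"), cfg));  // true
        System.out.println(notEscapedAt("call2", List.of("B"), cfg));  // false
    }
}
```

In this toy graph, a load whose memory walk reaches `call1` could look past it for a matching store, while a walk reaching `call2` must stop. On a loop, a back edge makes later escape points reachable, so the check conservatively reports an escape.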
------------- PR Review: https://git.openjdk.org/jdk/pull/29362#pullrequestreview-3697591566 PR Review Comment: https://git.openjdk.org/jdk/pull/29362#discussion_r2721330432 PR Review Comment: https://git.openjdk.org/jdk/pull/29362#discussion_r2721288609 From rcastanedalo at openjdk.org Fri Jan 23 14:18:00 2026 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 23 Jan 2026 14:18:00 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v19] In-Reply-To: <-4y1DobU4R4eTjnjpv56qGRCd-8wWSiE4LO1mVnnmZ4=.b8985412-7ec8-45f2-9e61-53ff0bf0c532@github.com> References: <-4y1DobU4R4eTjnjpv56qGRCd-8wWSiE4LO1mVnnmZ4=.b8985412-7ec8-45f2-9e61-53ff0bf0c532@github.com> Message-ID: On Wed, 21 Jan 2026 02:57:17 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch is an alternative to #28764 but it does the analysis during IGVN instead. >> >> ## The current PR: >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. >> >> I do not see a noticeable difference in C2 runtime with and without this patch. >> >> ## Future work: >> >> 1. Fold a memory `Phi`.
>> >> This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. >> >> 2. Fold a pointer `Phi`. >> >> Currently, this PR is doing the trivial approach, just give up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: >> >> Point p1 = new Point; >> Point p2 = new Point; >> p1.x = v1; >> p2.x = v2; >> Point p = Phi(p1, p2); >> int a = p.x; >> >> Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. >> >> Another interesting case: >> >> Point p = Phi(p1, p2); >> p.x = v; >> p1.x = v1; >> int a = p.x; >> >> Then, theoretically, we can fold `a` to `Phi(v1, v)` if `p1` and `p2` are known not to alias. >> >> 3. Nested objects >> >> It can be observed that if an object is stored into a memory that has not escaped, then it can be considered that the object has not escaped. For example: >> >> Point p = new Point; >> PointHolder h = new PointHolder; >> h.p = p; >> int x = p.x; >> escape(h); >> >> Then, `p` can be considered that it has not escaped until `escape(h)`. To do this, the computation of `_aliases` in the constructor of `LocalEA` needs to be more comprehensive. See the comments in `LocalEA::check_escape_status`. >> >> Please... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision: > > - Remove the TriBool > - Merge branch 'master' into loadfoldingigvn > - Fix dead accesses, address reviews > - Merge branch 'master' into loadfoldingigvn > - Early return when not a heap access > - Fix escape at store > - Fix outdated and unclear comments > - copyright year, return, comments, whitespace > - Merge branch 'master' into loadfoldingigvn > - ea of phis and nested objects > - ... 
and 12 more: https://git.openjdk.org/jdk/compare/d18e3a33...ac82c2ea test/hotspot/jtreg/compiler/escapeAnalysis/TestLoadFolding.java line 1: > 1: /* All new tests trigger escaping by passing an allocated object into a non-inlined method (`escape`) or by storing the allocated object into an input `PointHolder` parameter. It would be good to add a few positive and negative tests that exercise escaping by storing the allocated object into a static field. test/hotspot/jtreg/compiler/escapeAnalysis/TestLoadFolding.java line 129: > 127: @IR(applyIf = {"DoLocalEscapeAnalysis", "true"}, counts = {IRNode.ALLOC, "1"}) > 128: public Point test105(int begin, int end) { > 129: // Fold the load that is a part of a cycle This comment, and the fact that this test is part of `runPositiveTests()`, seem to imply that we expect the load to be folded, but the current version of this changeset cannot do that, right? (due to the limitations listed in the PR description). If so, please update the comment, and consider moving this test and other tests that currently illustrate "future work" scenarios (`test106` and `test112`-`test115` I believe) into a separate `@Run` method (e.g. `runUnhandledTests` or similar). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2721375326 PR Review Comment: https://git.openjdk.org/jdk/pull/28812#discussion_r2721350190 From epeter at openjdk.org Fri Jan 23 14:42:27 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jan 2026 14:42:27 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v13] In-Reply-To: References: Message-ID: > This is an exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. > > Note: for now they are all `int` based. 
And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) > > **Discussion** > > Observations: > - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. > - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. > - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow > - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow > - `linux_x64_oci_server`: Vector API leads to really nice speedups > - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks > - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. > - Compact Object Headers has some negative effect on some loop benchmarks. > - `linux_aarch64_server`: `reduceAddI`, `copyI` > - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` > - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? > - `windows_x64_oci_server`: `reduceAddI` and some others a little bit > - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` > - Intrinsics can be much faster than auto vectorized or Vector API code. > - `linux_aarch64_server`: `copyI` > - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. > - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact, I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). > > **Benchmark Plots** > > Units: nanoseconds per algorithm invocation. > > Note: the `aarch64` machines all only have `NEON` support. Performance may be much better on `SVE`, I have not benchmarked that yet.
> > [Benchmark plots for `linux_x64_oci` and `windows_x64_oci`; images not preserved in this archive] > I guess this is for consistency as well, right? Yes, exactly, that was the only reason. > test/hotspot/jtreg/compiler/lib/ir_framework/test/TestVM.java line 833: > >> 831: */ >> 832: private void runTests() { >> 833: TreeMap durations = (PRINT_TIMES || VERBOSE) ? new TreeMap<>() : null; > > Do we still need `VERBOSE` here? Good catch! In the complete prototype fix, I removed `durations`. But for now it's correct to remove `VERBOSE` since `PRINT_TIMES` is already set when `VERBOSE` is also set. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29362#discussion_r2721508551 PR Review Comment: https://git.openjdk.org/jdk/pull/29362#discussion_r2721504122 From chagedorn at openjdk.org Fri Jan 23 15:28:54 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 Jan 2026 15:28:54 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v29] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 16:38:25 GMT, Kangcheng Xu wrote: >> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. >> >> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. >> >> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix safepoint detection Thanks for doing that! I've run it on the old state up to tier7 which looked good.
Unfortunately, there was some intermittent closed test failure again with the current non-DIFF-patch with latest mainline. I will investigate further and try to extract a reproducer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3790777110 From qamai at openjdk.org Fri Jan 23 15:55:52 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 23 Jan 2026 15:55:52 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v6] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 14:02:35 GMT, Daniel Lundén wrote: >> I have made further changes that I believe have made the change pretty rigorous, I don't think I can see any flaw in the reasoning that allows mis-analysis now. > > Looks interesting @merykitty! I will also review this. @dlunde Thanks for your suggestion, that's a good idea, I have created #29390, after we are finished with that we can come back to this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3790920753 From qamai at openjdk.org Fri Jan 23 16:00:28 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 23 Jan 2026 16:00:28 GMT Subject: RFR: 8376220: C2: Refactor the logic to in MemNode::find_previous_store Message-ID: Hi, This patch refactors the logic in `MemNode::find_previous_store` and makes a small improvement to `MemNode::detect_ptr_independence`. An IR test accompanies the improvement. Please take a look and share your thoughts, thanks a lot.
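The 8373495 description quoted earlier in this digest says that a load from an object that has not yet escaped may be folded past calls and memory barriers while walking `MemNode::find_previous_store`, the function this refactoring touches. A minimal source-level sketch of that pattern follows. The class, field, and method names are hypothetical, and whether C2 actually folds the load depends on the proposed patch and on `opaqueCall()` not being inlined (for example via `-XX:CompileCommand=dontinline`). The second method is the static-field escape case that was suggested as an additional negative test.

```java
final class LoadFoldingSketch {
    static final class Point {
        int x;
        int y;
    }

    static int sink;
    static Point global;

    // Stand-in for a call that C2 does not inline (assumption: the compiler
    // is told to keep it out of line, e.g. with a CompileCommand).
    static void opaqueCall() {
        sink++;
    }

    // Positive case: p escapes only at the return. At the load of p.x the
    // object has not escaped yet, so opaqueCall() cannot have written p.x,
    // and the load is a candidate for folding to v.
    static Point escapesAtReturn(int v) {
        Point p = new Point();
        p.x = v;
        opaqueCall();
        p.y = p.x + 1; // load of p.x after the call
        return p;      // the only escape point
    }

    // Negative case: p escapes through a static field before the call, so
    // the callee could observe or modify p.x and the load must not be folded.
    static int escapesBeforeCall(int v) {
        Point p = new Point();
        p.x = v;
        global = p;    // escape before the call
        opaqueCall();
        return p.x;
    }
}
```

Both methods compute the same values with or without the optimization; the sketch only marks where the escape point sits relative to the call.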
------------- Commit messages: - Refactor the logic in MemNode::find_previous_store Changes: https://git.openjdk.org/jdk/pull/29390/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29390&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8376220 Stats: 170 lines in 2 files changed: 133 ins; 10 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/29390.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29390/head:pull/29390 PR: https://git.openjdk.org/jdk/pull/29390 From mhaessig at openjdk.org Fri Jan 23 16:07:26 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 23 Jan 2026 16:07:26 GMT Subject: RFR: 8359335: Template-Framework Library: Primitive Types subtyping In-Reply-To: References: <69kmR5e7B-PnE8Lxa_XWShTmwDJmh2fhqD5nNsjpJDI=.1e47ddad-4054-45a3-bb0a-494b6baedc15@github.com> Message-ID: <-RSZ6hVzLKo-Lb5Ik8GyfGwJ009zFM29mpH8ghb3KdU=.966790e2-1017-4186-b3d1-609aa3a37cf6@github.com> On Fri, 23 Jan 2026 13:31:19 GMT, Emanuel Peter wrote: > I wasn't aware. Do you have an example? `testlibrary_tests/template_framework/tests/TestExpression.java` will fail with `Template rendering mismatch`. I tried it before I added the exception to the float division operators. > just the fear that we will continue to hit subtle bugs, and it's just not worth it? Mainly this > we could also add explicit float/double casts to the modulo operator arguments. That would at least force away any exception, and ensure we are choosing the float/double modulo, rather than int modulo. I like this. Then the question is, whether we want both or only the one with or the one without the annotation.
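The suggestion to add explicit float/double casts to the modulo operands relies on plain Java semantics. Integer `%` throws `ArithmeticException` when the divisor is zero. Floating-point `%` never throws and returns NaN instead. The casts also select the floating-point remainder rather than the integer one. A minimal sketch, with hypothetical class and method names (this is not Template Framework code):

```java
final class ModuloCastSketch {
    // Integer remainder: throws ArithmeticException for a zero divisor.
    static int intMod(int a, int b) {
        return a % b;
    }

    // Explicit casts force the float remainder: never throws, NaN for b == 0.
    static float floatMod(int a, int b) {
        return (float) a % (float) b;
    }

    // Same for double; the result takes the sign of the dividend.
    static double doubleMod(int a, int b) {
        return (double) a % (double) b;
    }
}
```

For example, `intMod(7, 0)` throws, while `floatMod(7, 0)` is NaN and `doubleMod(-7, 3)` is `-1.0`.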
------------- PR Comment: https://git.openjdk.org/jdk/pull/29349#issuecomment-3790974578 From dfenacci at openjdk.org Fri Jan 23 16:08:33 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 23 Jan 2026 16:08:33 GMT Subject: RFR: 8375272: [IR Framework] Miscellaneous clean-ups [v2] In-Reply-To: <6rGK6c2xEi0Y8Rce1f8kwPqbOxKBf2W7-ToVPO5lXy8=.d33314ea-4c5c-49b7-b8a4-990410db1ffe@github.com> References: <6rGK6c2xEi0Y8Rce1f8kwPqbOxKBf2W7-ToVPO5lXy8=.d33314ea-4c5c-49b7-b8a4-990410db1ffe@github.com> Message-ID: On Fri, 23 Jan 2026 14:52:23 GMT, Christian Hagedorn wrote: >> This patch applies various unrelated clean-ups split out from a first prototype for [JDK-8375270](https://bugs.openjdk.org/browse/JDK-8375270). This should ease reviews. I left some comments in the PR to further explain some details. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > review Marc & Damon Marked as reviewed by dfenacci (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29362#pullrequestreview-3698307204 From roland at openjdk.org Fri Jan 23 16:26:21 2026 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 23 Jan 2026 16:26:21 GMT Subject: RFR: 8347365: C2: Fix the handling of depends_only_on_test [v2] In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 15:23:55 GMT, Roland Westrelin wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - more clarification >> - Refine comments > > As I understand, this change removes logic that's overly conservative but doesn't address any correctness issue (i.e. there's no crash or incorrect execution that this fixes). Given the new logic is less conservative, there should be cases where the code optimizes better with this change. Would it make sense to add IR test cases to catch regressions? 
> @rwestrel Thanks for taking a look, this PR makes it strictly more conservative by pinning more nodes when an `IfNode` is elided. However, this conservativeness is necessary, any optimization that can arise from a node not pinned is incorrect, such as [JDK-8331717](https://bugs.openjdk.org/browse/JDK-8331717) or [JDK-8257822](https://bugs.openjdk.org/browse/JDK-8257822). But those bugs are fixed. Are there known bugs that this PR fixes? The way I understand it, you're replacing some existing logic that addresses this kind of issues with some new logic that works some other way but addresses the same issues. Also, with attached test case, current c2 code compiles the method with 2 `DivI` nodes while, with your patch, it can common the 2 identical `DivI` nodes and we end up with one. That feels less conservative. So I'm wondering in what way it is more conservative. (FYI, I agree with the general direction of this patch) [TestRedundantDivs.java](https://github.com/user-attachments/files/24824874/TestRedundantDivs.java) ------------- PR Comment: https://git.openjdk.org/jdk/pull/29158#issuecomment-3791067096 From jvernee at openjdk.org Fri Jan 23 16:44:05 2026 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 23 Jan 2026 16:44:05 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v7] In-Reply-To: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> References: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> Message-ID: On Thu, 18 Dec 2025 00:03:10 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. 
>> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Move the test to a core library purposed directory Marked as reviewed by jvernee (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28540#pullrequestreview-3698507443 From qamai at openjdk.org Fri Jan 23 17:05:01 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 23 Jan 2026 17:05:01 GMT Subject: RFR: 8376220: C2: Refactor the logic to in MemNode::find_previous_store [v2] In-Reply-To: References: Message-ID: > Hi, > > This patch refactors the logic in `MemNode::find_previous_store` and makes a small improvement to `MemNode::detect_ptr_independence`. An IR test accompanies the improvement. > > Please take a look and share your thoughts, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Fix null access ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29390/files - new: https://git.openjdk.org/jdk/pull/29390/files/4ce0cf4f..ac27cabf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29390&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29390&range=00-01 Stats: 8 lines in 1 file changed: 0 ins; 6 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29390.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29390/head:pull/29390 PR: https://git.openjdk.org/jdk/pull/29390 From qamai at openjdk.org Fri Jan 23 17:41:33 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 23 Jan 2026 17:41:33 GMT Subject: RFR: 8347365: C2: Fix the handling of depends_only_on_test [v2] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 16:22:25 GMT, Roland Westrelin wrote: >> As I understand, this change removes logic that's overly conservative but doesn't address any correctness issue (i.e. there's no crash or incorrect execution that this fixes). Given the new logic is less conservative, there should be cases where the code optimizes better with this change. Would it make sense to add IR test cases to catch regressions? > >> @rwestrel Thanks for taking a look, this PR makes it strictly more conservative by pinning more nodes when an `IfNode` is elided. However, this conservativeness is necessary, any optimization that can arise from a node not pinned is incorrect, such as [JDK-8331717](https://bugs.openjdk.org/browse/JDK-8331717) or [JDK-8257822](https://bugs.openjdk.org/browse/JDK-8257822). > > But those bugs are fixed. Are there known bugs that this PR fixes? > The way I understand it, you're replacing some existing logic that addresses this kind of issues with some new logic that works some other way but addresses the same issues. 
> Also, with attached test case, current c2 code compiles the method with 2 `DivI` nodes while, with your patch, it can common the 2 identical `DivI` nodes and we end up with one. That feels less conservative. So I'm wondering in what way it is more conservative. > > (FYI, I agree with the general direction of this patch) > > [TestRedundantDivs.java](https://github.com/user-attachments/files/24824874/TestRedundantDivs.java) @rwestrel Those bugs are fixed, but the fixes did not address the core issue, and I believe there is a chance a similar issue would surface in the future. As a result, this PR tries to fix those bugs again in a more systematic manner. In essence, the current fixes make it so that a `DivNode` acts similarly to a pinned node, albeit still returns `true` on `depends_only_on_test`. This PR makes it so that `DivNode`s act properly as a `depends_only_on_test` node. So you are right, for division and modulo, this PR relaxes their constraints, and I will add an IR test to verify this relaxation. For all other nodes, this PR makes them more conservative. This is necessary, as they act the same as `DivNode`s before those fixes, and the reason we only see the failures with `DivNode`s is that they throw a signal when their dependency is violated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29158#issuecomment-3791446210 From qamai at openjdk.org Fri Jan 23 18:08:43 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 23 Jan 2026 18:08:43 GMT Subject: RFR: 8347365: C2: Fix the handling of depends_only_on_test [v3] In-Reply-To: References: Message-ID: > Hi, > > This PR fixes the handling of `depends_only_on_test` when the control graph is transformed. It has to do with the theoretical idea of `depends_only_on_test`, copy from the JBS issue description: > > To start with, what is `depends_only_on_test`? 
Given a node `n` with the control input `c`, if `c` can be deduced from `c'` and `n->depends_only_on_test() == true`, then we can rewire the control input of `n` to `c'`. This means that `depends_only_on_test` does not mean that the node depends on a test, it means that the node depends on the test that is its control input. > > For example: > > if (y != 0) { > if (x > 0) { > if (y != 0) { > x / y; > } > } > } > > Then `x/y` `depends_only_on_test` because its control input is the test `y != 0`. Then, we can rewire the control input of the division to the outer `y != 0`, resulting in: > > if (y != 0) { > x / y; > if (x > 0) { > } > } > > On the other hand, consider this case: > > if (x > 0) { > if (y != 0) { > if (x > 0) { > x / y; > } > } > } > > Then `x/y` does not `depends_only_on_test` because its control input is the test `x > 0` which is unrelated, we can see that if we rewire the division to the outer `x > 0` test, the division floats above the actual test `y != 0`. This means that `depends_only_on_test` is a dynamic property of a node, and not a static property of the division operation. It can change when we transform the graph and it can be different for different nodes of the same kind. > > More details can be found in the description of `Node::depends_only_on_test` and `Node::pin_node_under_control` in this change. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'master' into dependsonlyontest - Add IR test - more clarification - Refine comments - Fix depends_only_on_test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29158/files - new: https://git.openjdk.org/jdk/pull/29158/files/899d89f3..511c3f84 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29158&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29158&range=01-02 Stats: 54484 lines in 1043 files changed: 33969 ins; 11266 del; 9249 mod Patch: https://git.openjdk.org/jdk/pull/29158.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29158/head:pull/29158 PR: https://git.openjdk.org/jdk/pull/29158 From liach at openjdk.org Fri Jan 23 18:26:09 2026 From: liach at openjdk.org (Chen Liang) Date: Fri, 23 Jan 2026 18:26:09 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v12] In-Reply-To: References: Message-ID: > Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 23 additional commits since the last revision: - Design detail updates, thanks to jorn - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache - Missed IR test review, rearrange benches - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache - Stage - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache - Review - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache - Bugs and verify loader leak - Try to avoid loader leak - ... and 13 more: https://git.openjdk.org/jdk/compare/4ac48240...77ea5565 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28585/files - new: https://git.openjdk.org/jdk/pull/28585/files/1d5461db..77ea5565 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=10-11 Stats: 93166 lines in 3834 files changed: 46542 ins; 16001 del; 30623 mod Patch: https://git.openjdk.org/jdk/pull/28585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585 PR: https://git.openjdk.org/jdk/pull/28585 From jvernee at openjdk.org Fri Jan 23 19:22:03 2026 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 23 Jan 2026 19:22:03 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v12] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 18:26:09 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 23 additional commits since the last revision: > > - Design detail updates, thanks to jorn > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Missed IR test review, rearrange benches > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Stage > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Review > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Bugs and verify loader leak > - Try to avoid loader leak > - ... and 13 more: https://git.openjdk.org/jdk/compare/d8405cdd...77ea5565 src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2043: > 2041: // > 2042: // Condition 1 and 2 indicates this access descriptor may see a VarHandle > 2043: // different from the captured VarHandle. Condition 3 requires the Suggestion: // Condition 1 and 2 indicates this access descriptor may see a VarHandle // different from the captured VarHandle. I don't see how this follows from condition 1 and 2 holding. A failure in either condition means we may see multiple VH instances. I suggest: Suggestion: // If either condition 1 or 2 doesn't hold, we may see different var handle instances using the // same shared AccessDescriptor. In those cases we only cache the adaptation for one of the // var handle instances (the first one). Other instances will always use the slow path. src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2049: > 2047: // such as compareAndSet can appear at two sites, where each site > 2048: // has its own constant VarHandle. Such a usage pattern hurts adaption, > 2049: // but is perfectly dealt by the getMethodType_V constant folding branch. I think this information should be put on the template string instead, since it's mostly referencing that code. I think it's enough to say that we 'skip potentially costly adaptation' by going through the `getMethodType_V` branch.
From the text as written, it's not clear why such usage patterns hurt adaptation. I think it would be more accurate to say that: "in such cases, only one of the two var handles will have its adaptation cached". src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2055: > 2053: // invocation type of the underlying MemberName, or MH for indirect VH), > 2054: // perform a foldable lookup with a hash table, and hope C2 inline it > 2055: // all. Such an optimization applies for general MethodHandle.asType. Suggestion: // all. Such an optimization applies to general MethodHandle.asType. src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2055: > 2053: // invocation type of the underlying MemberName, or MH for indirect VH), > 2054: // perform a foldable lookup with a hash table, and hope C2 inline it > 2055: // all. Such an optimization applies for general MethodHandle.asType. This reads as "put a ... to another ... perform", which doesn't sound grammatically correct. Maybe: Suggestion: // In the long run, we wish to cache each specific-type invoker that converts // from one fixed type (symbolicMethodTypeInvoker) to another (the // invocation type of the underlying MemberName, or MH for indirect VH), // using a foldable lookup with a hash table, and hope C2 inline it // all. Such an optimization applies for general MethodHandle.asType.
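For context on 8160821, an "inexact" VarHandle invocation is one whose call-site type differs from the access mode's type. The call-site type is determined by the argument types and any cast at the call, and when it differs, the access has to be adapted (conceptually via `asType`). A minimal sketch of exact and inexact call sites follows, using hypothetical names. It shows only the Java-level pattern, not the per-`AccessDescriptor` caching discussed in the review.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class InexactVarHandleSketch {
    static final class Holder {
        int value;
        Holder(int value) { this.value = value; }
    }

    // VarHandle for the int field Holder.value; get's access type is (Holder)int.
    static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(Holder.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Exact call site: the symbolic type (Holder)int matches the access type.
    static int exactGet(Holder h) {
        return (int) VALUE.get(h);
    }

    // Inexact on the return side: the cast makes the call-site type
    // (Holder)long, so the result needs an int -> long widening adaptation.
    static long inexactGet(Holder h) {
        return (long) VALUE.get(h);
    }

    // Inexact on the argument side: the receiver is statically an Object,
    // so the call-site type (Object)int needs a reference cast to Holder.
    static int inexactReceiver(Object o) {
        return (int) VALUE.get(o);
    }
}
```

According to the review discussion, if two different VarHandle instances share one access site, only the first one's adaptation would be cached, and the other would keep taking the slow path.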
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2722389574 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2722416361 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2722376203 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2722514070 From krk at openjdk.org Fri Jan 23 19:31:49 2026 From: krk at openjdk.org (Kerem Kat) Date: Fri, 23 Jan 2026 19:31:49 GMT Subject: RFR: 8356184: C2 MemorySegment: long RangeCheck with ConvI2L(iv + invar) prevents RCE Message-ID: `MemorySegment` bounds checks use long arithmetic, but when accessing with an int loop variable plus an int invariant offset, the pattern `ConvI2L(iv + invar)` was not recognized by Range Check Elimination. This prevented RCE and consequently blocked vectorization for common `MemorySegment` access patterns. The fix teaches `is_scaled_iv_plus_offset` to recognize linear int expressions inside `ConvI2L`. A new `short_offset` flag signals that the offset is part of int arithmetic (not added separately in long), requiring the range to be clamped at `max_jint + 1` to correctly handle potential int overflow. This also removes pre-existing dead code where an `exp_bt != bt` check was intended to bail out on such patterns but never actually executed. With this change, `MemorySegment` loops using int invariant offsets now benefit from RCE and vectorization, matching the behavior already supported for long invariant offsets. 
void process(MemorySegment segment, int offset, int size) {
    for (int i = 0; i < size; i++) {
        long addr = i + offset; // ConvI2L(AddI(iv, offset)) was not recognized
        segment.set(JAVA_BYTE, addr, (byte) 0);
    }
}
------------- Commit messages: - 8356184: C2 MemorySegment: long RangeCheck with ConvI2L(iv + invar) prevents RCE Changes: https://git.openjdk.org/jdk/pull/29392/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29392&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356184 Stats: 112 lines in 4 files changed: 82 ins; 5 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/29392.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29392/head:pull/29392 PR: https://git.openjdk.org/jdk/pull/29392 From vlivanov at openjdk.org Fri Jan 23 19:59:25 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 23 Jan 2026 19:59:25 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v28] In-Reply-To: References: Message-ID: > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through the compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet.
In particular, since `ReachabilityFence` is a CFG-only node, nothing explicitly forbids memory operations from floating past `Reference.reachabilityFence()` and potentially reaching other safepoints that the current analysis treats as non-interfering. Representing `ReachabilityFence` as a memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but the performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend the referent's live range beyond the `ReachabilityFence` nodes associated with it. That would meet the performance criteria, but I prefer to implement it as a follow-up fix. > > Another known issue relates to reachability fences on constant oops. If such a constant is GCed (most likely due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: - Merge remote-tracking branch 'upstream/master' into 8290892.rf - IR test cases - cleanups - Merge branch 'master' into 8290892.rf - Revise RF redunancy & auto-boxed primitives handling Cleanups - updates - update - updates - Merge branch 'master' into 8290892.rf - cleanups - ...
and 27 more: https://git.openjdk.org/jdk/compare/983ae96f...be42a719 ------------- Changes: https://git.openjdk.org/jdk/pull/25315/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=27 Stats: 1519 lines in 39 files changed: 1457 ins; 20 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From liach at openjdk.org Fri Jan 23 20:08:58 2026 From: liach at openjdk.org (Chen Liang) Date: Fri, 23 Jan 2026 20:08:58 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v13] In-Reply-To: References: Message-ID: > Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Wording update, thanks Jorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28585/files - new: https://git.openjdk.org/jdk/pull/28585/files/77ea5565..5ba75d45 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=11-12 Stats: 16 lines in 2 files changed: 1 ins; 3 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/28585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585 PR: https://git.openjdk.org/jdk/pull/28585 From vlivanov at openjdk.org Fri Jan 23 21:41:31 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 23 Jan 2026 21:41:31 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 08:56:19 GMT, Xiaohong Gong wrote: >> 
### Problem:
>>
>> Test `compiler/vectorapi/VectorMaskToLongTest.java` crashes intermittently (approximately once per 200+ runs) with stress VM options such as `-XX:+StressIGVN`:
>>
>>     // A fatal error has been detected by the Java Runtime Environment:
>>     //
>>     // Internal Error (jdk/src/hotspot/share/opto/type.hpp:2287), pid=69056, tid=28419
>>     // assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector
>>     // ...
>>
>> The crash occurs in the following code when calling `is_vect()` in the assertion added by JDK-8367292 [1]:
>>
>> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/src/hotspot/share/opto/vectornode.cpp#L1920-L1924
>>
>> ### Root Cause:
>>
>> The mask's type becomes TOP (unreachable) during compiler optimizations when the mask node is marked as dead before all its users are removed from the ideal graph. If `Ideal()` is subsequently called on a user node, it may access the TOP type, triggering the assertion.
>>
>> Here is the simplified ideal graph showing the crash scenario:
>>
>>     Con #top
>>        |     ConI
>>         \   /
>>          \ /
>>     VectorStoreMask
>>          |
>>     VectorMaskToLong # !jvms: IntMaxVector$IntMaxMask::toLong
>>
>> ### Detailed Scenario:
>>
>> Following is the method in the test case that hits the assertion:
>>
>> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L65-L70
>>
>> This method accepts a `VectorSpecies` parameter and calls vector APIs `VectorMask.fromLong()` and `toLong()`. It is called with species ranging from `ByteVector.SPECIES_MAX` to `DoubleVector.SPECIES_MAX`. During compilation, C2 speculatively generates fast paths for `toLong()` for all possible species.
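The round trip that the test method exercises can be modeled on plain arrays. The sketch below follows the documented `VectorMask.fromLong()`/`toLong()` semantics, assuming a mask is represented as a `boolean[]` of length VLENGTH. `MaskBitsSketch` and its methods are hypothetical and involve neither C2 nor the Vector API.

```java
public class MaskBitsSketch {
    // Models VectorMask.fromLong(species, bits): lane n is set iff bit min(63, n) of bits is 1.
    static boolean[] fromLong(int vlength, long bits) {
        boolean[] lanes = new boolean[vlength];
        for (int n = 0; n < vlength; n++) {
            lanes[n] = ((bits >>> Math.min(63, n)) & 1L) != 0;
        }
        return lanes;
    }

    // Models VectorMask.toLong(): lane n is packed into bit n; more than 64 lanes is unsupported.
    static long toLong(boolean[] lanes) {
        if (lanes.length > 64) {
            throw new UnsupportedOperationException("more than 64 lanes");
        }
        long bits = 0L;
        for (int n = 0; n < lanes.length; n++) {
            if (lanes[n]) {
                bits |= 1L << n;
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // With 4 lanes only the low 4 bits survive the round trip.
        System.out.println(Long.toBinaryString(toLong(fromLong(4, 0b1101_0110L)))); // 110
    }
}
```

Only the low VLENGTH bits survive the round trip, so the expected value differs from species to species.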
>> >> When compiling a specific test case such as: >> https://github.com/openjdk/jdk/blob/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L177-L179 >> >> the compiler inlines the method and attempts to optimize away unreachable branches. The following graph shows the situation before the mask becomes `TOP`: >> >> >> VectorBox # DoubleMaxMask, generated by VectorMask.fromLong() >> / \ >> AddP \ >> | \ >> LoadNClass \ >> ConP #IntMaxMask | | >> \ | | >> \ DecodeN... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Ensure it is vector type for vector unbox result Testing passed (hs-tier1 - hs-tier4). ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29057#pullrequestreview-3699793380 From jrose at openjdk.org Fri Jan 23 21:49:50 2026 From: jrose at openjdk.org (John R Rose) Date: Fri, 23 Jan 2026 21:49:50 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v7] In-Reply-To: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> References: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> Message-ID: On Thu, 18 Dec 2025 00:03:10 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. 
This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Move the test to a core library purposed directory Good work. ------------- Marked as reviewed by jrose (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28540#pullrequestreview-3699825444 From jrose at openjdk.org Fri Jan 23 22:13:18 2026 From: jrose at openjdk.org (John R Rose) Date: Fri, 23 Jan 2026 22:13:18 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v13] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 20:08:58 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Wording update, thanks Jorn make/jdk/src/classes/build/tools/methodhandle/VarHandleGuardMethodGenerator.java line 130: > 128: > 129: // The void bypass is necessary if a (name + return-dropping type) combination has multiple call sites, each > 130: // having its own constant VarHandle. In that case, the AccessMode::adaptedMethodHandle adaption mechanism wrong word: s/adaption/adaptation/ https://grammarist.com/usage/adaption-adaptation/ (Wiktionary implies they are the same, but they apply to different areas of discourse. 
The grammarist article indicates that "adaptation" is the more correct word here. Also, the word "adaption" seems to be mostly obsolete.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2723023261 From snatarajan at openjdk.org Fri Jan 23 22:17:44 2026 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Fri, 23 Jan 2026 22:17:44 GMT Subject: RFR: 8366861: Phase AFTER_LOOP_OPTS printed even though the method has no loops Message-ID: **Issue** When a test program with no loop is run with flag `-XX:PrintPhaseLevel=3,` the output prints `PHASE_AFTER_LOOP_OPTS `but does not print `PHASE_BEFORE_LOOP_OPTS` **Solution** This simple fix introduces a variable `_print_phase_loop_opts` that caches the value of `has_loops()` before loop opts pass to make sure that `PHASE_AFTER_LOOP_OPTS` gets printed every time `PHASE_BEFORE_LOOP_OPTS` is printed. Before fix output image After fix output image **Testing** Github Actions, Tier 1-3 ------------- Commit messages: - fix - initial fix Changes: https://git.openjdk.org/jdk/pull/29387/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29387&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8366861 Stats: 6 lines in 2 files changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/29387.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29387/head:pull/29387 PR: https://git.openjdk.org/jdk/pull/29387 From jrose at openjdk.org Fri Jan 23 23:05:30 2026 From: jrose at openjdk.org (John R Rose) Date: Fri, 23 Jan 2026 23:05:30 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v13] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 20:08:58 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. 
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Wording update, thanks Jorn

src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2022:

> 2020: // In a class file, the JVM creates one access descriptor for one (name, type) combination.
> 2021: // Many call sites in one class can have the same (name, type) combination.
> 2022: // In this case, they share the same access descriptor.

I love it when, as part of maintenance, informative comments like these are added. Thanks! Please add a comment something like this as well:

    // Note: The integers type and mode are proxies for the AccessType and
    // AccessMode enumerations, and the access type simply summarizes something
    // about the shape of the access mode. The crucial type here, of the (name, type)
    // combination, is the MethodType that decorates the access shape with specific
    // strong types for the handle operation inputs and outputs.

I think it was a small faux pas, some time ago, to choose the term `AccessType` instead of `AccessKind`, simply because the term "type" is already disastrously overloaded in our system. But that's water under the bridge. Now we have one more "type" floating around in this neighborhood.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2723131985 From jrose at openjdk.org Sat Jan 24 01:16:48 2026 From: jrose at openjdk.org (John R Rose) Date: Sat, 24 Jan 2026 01:16:48 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v13] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 20:08:58 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH.
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Wording update, thanks Jorn src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2014: > 2012: // Exists for the adaption mechanism of AccessDescriptor > 2013: // Each VH should report its explicitly (receiver, coordinates) and > 2014: // implicitly (static declaring class) used class to MethodHandle.isReachableFrom Perhaps add a comment: "Classes which define this abstract method should themselves be final or locally sealed, to make it possible to ensure that all relevant classes are taken into account." ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2723365226 From jrose at openjdk.org Sat Jan 24 01:21:04 2026 From: jrose at openjdk.org (John R Rose) Date: Sat, 24 Jan 2026 01:21:04 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v13] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 20:08:58 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Wording update, thanks Jorn Good work. I?m assuming you will address my previous comments. src/java.base/share/classes/java/lang/invoke/SegmentVarHandle.java line 69: > 67: @Override > 68: boolean isReachableFrom(ClassLoader cl) { > 69: return true; Give a comment explaining why this is correct. 
Something like: The segment is neither an instance nor an array, so it is effectively common to all class loaders. Compare this to a class on the boot class loader, which also uses only types that are common reachable from all other class loaders. (Or something like that.) ------------- Marked as reviewed by jrose (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28585#pullrequestreview-3700343726 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2723372435 From duke at openjdk.org Sat Jan 24 03:23:06 2026 From: duke at openjdk.org (duke) Date: Sat, 24 Jan 2026 03:23:06 GMT Subject: Withdrawn: 8371768: AArch64: test/hotspot/jtreg/compiler/loopopts/superword/TestReductions.java fails on SVE after JDK-8340093 In-Reply-To: References: Message-ID: <2-MoBWd1gVfC4SZBfCb7pPP6_3YECb0wJ-QEVbqRvsU=.209aaa60-19c7-49c8-b59b-410b9cc30634@github.com> On Thu, 20 Nov 2025 11:02:54 GMT, Aleksey Shipilev wrote: > Looks like the test should be more resilient with UseSVE > 0, which _can_ vectorise. It does not look all that reliable to me to failOn when vectorization actually happens. So I dropped some non-arch-specific rules, and amended AArch64-specific rules for UseSVE. > > Testing: > - [x] Linux AArch64 server fastdebug, affected test on machine with UseSVE=1 by default > - [x] Linux AArch64 server fastdebug, affected test on machine with UseSVE=0 overridden This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/28423 From qamai at openjdk.org Sat Jan 24 17:52:49 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 24 Jan 2026 17:52:49 GMT Subject: RFR: 8376220: C2: Refactor the logic to in MemNode::find_previous_store [v3] In-Reply-To: References: Message-ID: <2oJbWws_9cew3VkGG86QsfeJEvRWVrLOEbyXN4hI2Ew=.715e292e-70f2-439c-be0f-954c76e2973b@github.com> > Hi, > > This patch refactors the logic in `MemNode::find_previous_store` and makes a small improvement to `MemNode::detect_ptr_independence`. 
An IR test accompanies the improvement. > > Please take a look and share your thoughts, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Fix test failures ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29390/files - new: https://git.openjdk.org/jdk/pull/29390/files/ac27cabf..4a88d34b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29390&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29390&range=01-02 Stats: 35 lines in 2 files changed: 14 ins; 1 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/29390.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29390/head:pull/29390 PR: https://git.openjdk.org/jdk/pull/29390 From hgreule at openjdk.org Sat Jan 24 18:28:01 2026 From: hgreule at openjdk.org (Hannes Greule) Date: Sat, 24 Jan 2026 18:28:01 GMT Subject: RFR: 8376220: C2: Refactor the logic to in MemNode::find_previous_store [v3] In-Reply-To: <2oJbWws_9cew3VkGG86QsfeJEvRWVrLOEbyXN4hI2Ew=.715e292e-70f2-439c-be0f-954c76e2973b@github.com> References: <2oJbWws_9cew3VkGG86QsfeJEvRWVrLOEbyXN4hI2Ew=.715e292e-70f2-439c-be0f-954c76e2973b@github.com> Message-ID: On Sat, 24 Jan 2026 17:52:49 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch refactors the logic in `MemNode::find_previous_store` and makes a small improvement to `MemNode::detect_ptr_independence`. An IR test accompanies the improvement. >> >> Please take a look and share your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Fix test failures test/hotspot/jtreg/compiler/c2/gvn/TestFindStore.java line 31: > 29: * @test > 30: * @bug 8360192 > 31: * @summary Tests that count bits nodes are handled correctly. I assume this is a copy-paste leftover? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2724467917 From qamai at openjdk.org Sat Jan 24 18:43:16 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 24 Jan 2026 18:43:16 GMT Subject: RFR: 8376220: C2: Refactor the logic to in MemNode::find_previous_store [v3] In-Reply-To: References: <2oJbWws_9cew3VkGG86QsfeJEvRWVrLOEbyXN4hI2Ew=.715e292e-70f2-439c-be0f-954c76e2973b@github.com> Message-ID: On Sat, 24 Jan 2026 18:25:01 GMT, Hannes Greule wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix test failures > > test/hotspot/jtreg/compiler/c2/gvn/TestFindStore.java line 31: > >> 29: * @test >> 30: * @bug 8360192 >> 31: * @summary Tests that count bits nodes are handled correctly. > > I assume this is a copy-paste leftover? Nice catch, I have fixed that :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2724481190 From qamai at openjdk.org Sat Jan 24 18:43:15 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 24 Jan 2026 18:43:15 GMT Subject: RFR: 8376220: C2: Refactor the logic to in MemNode::find_previous_store [v4] In-Reply-To: References: Message-ID: <_DCQEBinOHkFUYvFf7boqdWG9VD4aaRaU0SwO2hct-w=.0c474ed2-23ab-4263-a89b-6ac4a94d7f14@github.com> > Hi, > > This patch refactors the logic in `MemNode::find_previous_store` and makes a small improvement to `MemNode::detect_ptr_independence`. An IR test accompanies the improvement. > > Please take a look and share your thoughts, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Test description ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29390/files - new: https://git.openjdk.org/jdk/pull/29390/files/4a88d34b..89000ae8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29390&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29390&range=02-03 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29390.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29390/head:pull/29390 PR: https://git.openjdk.org/jdk/pull/29390 From qamai at openjdk.org Sat Jan 24 18:53:53 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 24 Jan 2026 18:53:53 GMT Subject: RFR: 8376220: C2: Refactor the logic to in MemNode::find_previous_store [v4] In-Reply-To: <_DCQEBinOHkFUYvFf7boqdWG9VD4aaRaU0SwO2hct-w=.0c474ed2-23ab-4263-a89b-6ac4a94d7f14@github.com> References: <_DCQEBinOHkFUYvFf7boqdWG9VD4aaRaU0SwO2hct-w=.0c474ed2-23ab-4263-a89b-6ac4a94d7f14@github.com> Message-ID: On Sat, 24 Jan 2026 18:43:15 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch refactors the logic in `MemNode::find_previous_store` and makes a small improvement to `MemNode::detect_ptr_independence`. An IR test accompanies the improvement. >> >> Please take a look and share your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Test description src/hotspot/share/opto/memnode.cpp line 1236: > 1234: // LoadVector/StoreVector needs additional check to ensure the types match. > 1235: if (st->is_StoreVector()) { > 1236: // Some kind of masked access or gather/scatter This condition is insufficient to determine if `this` inspects the same memory as `st`. Luckily, `LoadVectorMasked`, `LoadVectorGather`, and `LoadVectorGatherMasked` all have `store_Opcode()` being `-1`, preventing any folding with them. 
On the other hand, `LoadVector` has `store_Opcode()` being `Op_StoreVector`, so the only case here turns out to be correct. However, it is better to be precise here.

src/hotspot/share/opto/memnode.cpp line 3565:

> 3563: val->in(MemNode::Memory )->eqv_uncast(mem) &&
> 3564: val->as_Load()->store_Opcode() == Opcode()) {
> 3565: if (!is_StoreVector()) {

This condition here is also insufficient. But again, similar to above, only `LoadVector` has a valid `store_Opcode()`, and it is `Op_StoreVector`. Furthermore, we mistakenly check `mem->is_LoadVector()`, which is always `false`, so there is no chance of mis-optimization.

src/hotspot/share/opto/memnode.cpp line 3567:

> 3565: if (!is_StoreVector()) {
> 3566: result = mem;
> 3567: } else if (Opcode() == Op_StoreVector && val->Opcode() == Op_LoadVector &&

It should be possible to merge the check here and the check below. However, `LoadVectorNode` does not expose `indices` and `mask` the same way `StoreVectorNode` does. In addition, we need to verify the correctness thoroughly when changing `store_Opcode()` of `LoadVectorMask`, `LoadVectorGather`, and `LoadVectorGatherMasked` so as not to introduce any mis-optimization. As a result, I think it should be left to another PR.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2724484199 PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2724485057 PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2724491193 From qamai at openjdk.org Sat Jan 24 19:28:52 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 24 Jan 2026 19:28:52 GMT Subject: RFR: 8376220: C2: Refactor the logic to in MemNode::find_previous_store [v5] In-Reply-To: References: Message-ID: <_DqULj-g1bNhAlaskqNJyTnW7r1a7-ykDp1z-gGvutE=.880fbf28-f484-4ec3-b585-a9301eb2de87@github.com> > Hi, > > This patch refactors the logic in `MemNode::find_previous_store` and makes a small improvement to `MemNode::detect_ptr_independence`.
An IR test accompanies the improvement. > > Please take a look and share your thoughts, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Add test store the loaded vector ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29390/files - new: https://git.openjdk.org/jdk/pull/29390/files/89000ae8..f48c006c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29390&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29390&range=03-04 Stats: 10 lines in 1 file changed: 8 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29390.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29390/head:pull/29390 PR: https://git.openjdk.org/jdk/pull/29390 From jbhateja at openjdk.org Sun Jan 25 07:05:40 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 25 Jan 2026 07:05:40 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v15] In-Reply-To: References: Message-ID: > Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support. > - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. > > The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). > > The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > [image: performance numbers] > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback.
> > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Refactoring and cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28002/files - new: https://git.openjdk.org/jdk/pull/28002/files/72d15568..0891bc70 Webrevs: - full: Webrev is not available because diff is too large - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28002&range=13-14 Stats: 822 lines in 48 files changed: 14 ins; 306 del; 502 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From jbhateja at openjdk.org Sun Jan 25 08:00:30 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 25 Jan 2026 08:00:30 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v16] In-Reply-To: References: Message-ID: > Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support. > - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. > > The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). > > The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > [image: performance numbers] > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback.
> > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Refactoring vectorIntrinsics ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28002/files - new: https://git.openjdk.org/jdk/pull/28002/files/0891bc70..aeba2e68 Webrevs: - full: Webrev is not available because diff is too large - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28002&range=14-15 Stats: 150 lines in 1 file changed: 74 ins; 14 del; 62 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From xgong at openjdk.org Mon Jan 26 01:54:08 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 26 Jan 2026 01:54:08 GMT Subject: RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector [v3] In-Reply-To: References: Message-ID: <4I_7Umnn_F7i-jYn_y2pEh5zErsc1Tvz5iEIZSziOC0=.0322f1cd-8570-4a12-b9a5-b700072eabdd@github.com> On Fri, 23 Jan 2026 21:38:46 GMT, Vladimir Ivanov wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Ensure it is vector type for vector unbox result > > Testing passed (hs-tier1 - hs-tier4). Thanks so much for your review @iwanowww and @merykitty ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29057#issuecomment-3797591827 From xgong at openjdk.org Mon Jan 26 01:54:09 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 26 Jan 2026 01:54:09 GMT Subject: Integrated: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 09:18:32 GMT, Xiaohong Gong wrote:
> ### Problem:
>
> Test `compiler/vectorapi/VectorMaskToLongTest.java` crashes intermittently (approximately once per 200+ runs) with stress VM options such as `-XX:+StressIGVN`:
>
>     // A fatal error has been detected by the Java Runtime Environment:
>     //
>     // Internal Error (jdk/src/hotspot/share/opto/type.hpp:2287), pid=69056, tid=28419
>     // assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector
>     // ...
>
> The crash occurs in the following code when calling `is_vect()` in the assertion added by JDK-8367292 [1]:
>
> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/src/hotspot/share/opto/vectornode.cpp#L1920-L1924
>
> ### Root Cause:
>
> The mask's type becomes TOP (unreachable) during compiler optimizations when the mask node is marked as dead before all its users are removed from the ideal graph. If `Ideal()` is subsequently called on a user node, it may access the TOP type, triggering the assertion.
>
> Here is the simplified ideal graph showing the crash scenario:
>
>     Con #top
>        |     ConI
>         \   /
>          \ /
>     VectorStoreMask
>          |
>     VectorMaskToLong # !jvms: IntMaxVector$IntMaxMask::toLong
>
> ### Detailed Scenario:
>
> Following is the method in the test case that hits the assertion:
>
> https://github.com/openjdk/jdk/blob/2cb228e142369ec73d768d8a69653a984b1c5908/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L65-L70
>
> This method accepts a `VectorSpecies` parameter and calls vector APIs `VectorMask.fromLong()` and `toLong()`.
It is called with species ranging from `ByteVector.SPECIES_MAX` to `DoubleVector.SPECIES_MAX`. During compilation, C2 speculatively generates fast paths for `toLong()` for all possible species. > > When compiling a specific test case such as: > https://github.com/openjdk/jdk/blob/6eaabed55ca4670d8c317f0a4323ccea4dd0b9ca/test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java#L177-L179 > > the compiler inlines the method and attempts to optimize away unreachable branches. The following graph shows the situation before the mask becomes `TOP`: > > [IR graph, truncated in this archive: `VectorBox` # DoubleMaxMask (generated by VectorMask.fromLong()) feeds `AddP` -> `LoadNClass` -> `DecodeNClass`; `CmpP` compares `ConP #IntMaxMask` with the `DecodeNClass` result; a second edge from `VectorBox` leads to nodes cut off after `CmpP ...`] This pull request has now been integrated. Changeset: 38b66b12 Author: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/38b66b12581a3745a37589e32aa0fc880d27b4d4 Stats: 14 lines in 2 files changed: 13 ins; 0 del; 1 mod 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector Reviewed-by: qamai, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/29057 From erfang at openjdk.org Mon Jan 26 02:02:01 2026 From: erfang at openjdk.org (Eric Fang) Date: Mon, 26 Jan 2026 02:02:01 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v12] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 06:34:00 GMT, Jatin Bhateja wrote: >> Patch optimizes Vector.slice operation with constant index using x86 ALIGNR instruction. >> It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails.
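To make the semantics concrete, here is a minimal scalar sketch of what `slice(origin, v1)` computes, assuming the documented behavior of taking `VLENGTH` consecutive lanes from the concatenation of the receiver and `v1`, starting at lane `origin`. `SliceModel` and its `int[]` lanes are hypothetical stand-ins; this is neither the JDK implementation nor the hybrid call generator. For context, x86 `PALIGNR` concatenates two registers and extracts a byte window whose shift count is an immediate operand, which is presumably why the patch targets constant indices.

```java
import java.util.Arrays;

// Hypothetical scalar model of Vector.slice(origin, v1); not JDK code.
// Lanes are plain int arrays of equal length (VLENGTH).
class SliceModel {
    // Returns lanes [origin, origin + VLENGTH) of the concatenation a ++ b.
    static int[] slice(int[] a, int origin, int[] b) {
        int vlen = a.length;
        // This model only accepts equal-length inputs and origins in [0, VLENGTH].
        if (b.length != vlen || origin < 0 || origin > vlen) {
            throw new IndexOutOfBoundsException("origin " + origin);
        }
        int[] r = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int j = i + origin;
            // Lanes past the end of 'a' are taken from the start of 'b'.
            r[i] = (j < vlen) ? a[j] : b[j - vlen];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 2, 3};
        int[] b = {4, 5, 6, 7};
        System.out.println(Arrays.toString(slice(a, 1, b))); // [1, 2, 3, 4]
    }
}
```

With a constant `origin`, the source of every result lane is fixed at compile time, so one shift-and-extract over the concatenated registers can produce the whole result; with a variable origin the mapping is only known at run time and is presumably handled by the inlined fallback.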
>> >> The idea here is to add infrastructure support to enable intrinsification of the fast path for selected vector APIs, else enable inlining of the fallback implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander from handling slow paths, which can easily be implemented on the library side (in Java). >> >> Vector API jtreg tests pass at AVX level 2, remaining validation in progress. >> >> Performance numbers: >> >> >> System : 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline:
>> Benchmark                                                 (size)   Mode  Cnt      Score  Error   Units
>> VectorSliceBenchmark.byteVectorSliceWithConstantIndex1      1024  thrpt    2   9444.444         ops/ms
>> VectorSliceBenchmark.byteVectorSliceWithConstantIndex2      1024  thrpt    2  10009.319         ops/ms
>> VectorSliceBenchmark.byteVectorSliceWithVariableIndex       1024  thrpt    2   9081.926         ops/ms
>> VectorSliceBenchmark.intVectorSliceWithConstantIndex1       1024  thrpt    2   6085.825         ops/ms
>> VectorSliceBenchmark.intVectorSliceWithConstantIndex2       1024  thrpt    2   6505.378         ops/ms
>> VectorSliceBenchmark.intVectorSliceWithVariableIndex        1024  thrpt    2   6204.489         ops/ms
>> VectorSliceBenchmark.longVectorSliceWithConstantIndex1      1024  thrpt    2   1651.334         ops/ms
>> VectorSliceBenchmark.longVectorSliceWithConstantIndex2      1024  thrpt    2   1642.784         ops/ms
>> VectorSliceBenchmark.longVectorSliceWithVariableIndex       1024  thrpt    2   1474.808         ops/ms
>> VectorSliceBenchmark.shortVectorSliceWithConstantIndex1     1024  thrpt    2  10399.394         ops/ms
>> VectorSliceBenchmark.shortVectorSliceWithConstantIndex2     1024  thrpt    2  10502.894         ops/ms
>> VectorSliceB... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions Marked as reviewed by erfang (Author).
------------- PR Review: https://git.openjdk.org/jdk/pull/24104#pullrequestreview-3704448350 From xgong at openjdk.org Mon Jan 26 02:37:00 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 26 Jan 2026 02:37:00 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v12] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 06:34:00 GMT, Jatin Bhateja wrote: >> Patch optimizes Vector.slice operation with constant index using x86 ALIGNR instruction. >> It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. >> >> The idea here is to add infrastructure support to enable intrinsification of the fast path for selected vector APIs, else enable inlining of the fallback implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander from handling slow paths, which can easily be implemented on the library side (in Java). >> >> Vector API jtreg tests pass at AVX level 2, remaining validation in progress.
>> >> Performance numbers: >> >> >> System : 13th Gen Intel(R) Core(TM) i3-1315U >> >> Baseline:
>> Benchmark                                                 (size)   Mode  Cnt      Score  Error   Units
>> VectorSliceBenchmark.byteVectorSliceWithConstantIndex1      1024  thrpt    2   9444.444         ops/ms
>> VectorSliceBenchmark.byteVectorSliceWithConstantIndex2      1024  thrpt    2  10009.319         ops/ms
>> VectorSliceBenchmark.byteVectorSliceWithVariableIndex       1024  thrpt    2   9081.926         ops/ms
>> VectorSliceBenchmark.intVectorSliceWithConstantIndex1       1024  thrpt    2   6085.825         ops/ms
>> VectorSliceBenchmark.intVectorSliceWithConstantIndex2       1024  thrpt    2   6505.378         ops/ms
>> VectorSliceBenchmark.intVectorSliceWithVariableIndex        1024  thrpt    2   6204.489         ops/ms
>> VectorSliceBenchmark.longVectorSliceWithConstantIndex1      1024  thrpt    2   1651.334         ops/ms
>> VectorSliceBenchmark.longVectorSliceWithConstantIndex2      1024  thrpt    2   1642.784         ops/ms
>> VectorSliceBenchmark.longVectorSliceWithVariableIndex       1024  thrpt    2   1474.808         ops/ms
>> VectorSliceBenchmark.shortVectorSliceWithConstantIndex1     1024  thrpt    2  10399.394         ops/ms
>> VectorSliceBenchmark.shortVectorSliceWithConstantIndex2     1024  thrpt    2  10502.894         ops/ms
>> VectorSliceB... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions LGTM! Thanks for the update! ------------- Marked as reviewed by xgong (Committer). PR Review: https://git.openjdk.org/jdk/pull/24104#pullrequestreview-3704485459 From xgong at openjdk.org Mon Jan 26 03:13:36 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 26 Jan 2026 03:13:36 GMT Subject: [jdk26] RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector Message-ID: Hi all, This pull request contains a backport of commit [38b66b12](https://github.com/openjdk/jdk/commit/38b66b12581a3745a37589e32aa0fc880d27b4d4) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
The commit being backported was authored by Xiaohong Gong on 26 Jan 2026 and was reviewed by Quan Anh Mai and Vladimir Ivanov. Thanks! ------------- Commit messages: - Backport 38b66b12581a3745a37589e32aa0fc880d27b4d4 Changes: https://git.openjdk.org/jdk/pull/29404/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29404&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374043 Stats: 14 lines in 2 files changed: 13 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29404/head:pull/29404 PR: https://git.openjdk.org/jdk/pull/29404 From xgong at openjdk.org Mon Jan 26 03:41:01 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 26 Jan 2026 03:41:01 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v14] In-Reply-To: <0pq5n_IV0b3ROupfjhldjhO4ER3qJ4dW88xczvqjfyY=.df1b2b89-bf28-429b-9c3f-75dec0d9b58a@github.com> References: <0pq5n_IV0b3ROupfjhldjhO4ER3qJ4dW88xczvqjfyY=.df1b2b89-bf28-429b-9c3f-75dec0d9b58a@github.com> Message-ID: On Fri, 23 Jan 2026 14:49:34 GMT, Emanuel Peter wrote: >> This is exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. >> >> Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) >> >> **Discussion** >> >> Observations: >> - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. >> - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance.
>> - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow >> - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow >> - `linux_x64_oci_server`: Vector API leads to really nice speedups >> - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks >> - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. >> - Compact Object Headers has some negative effect on some loop benchmarks. >> - `linux_aarch64_server`: `reduceAddI`, `copyI` >> - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` >> - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? >> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit >> - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` >> - Intrinsics can be much faster than auto-vectorized or Vector API code. >> - `linux_aarch64_server`: `copyI` >> - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. >> - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact; I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). >> >> **Benchmark Plots** >> >> Units: nanoseconds per algorithm invocation. >> >> Note: the `aarch64` machines all only have `NEON` support. Performance may be much better on `SVE`; I have not benchmarked that yet. >> >> `linux_x64_oci` >> [plot not preserved: algo_linux_x64_oci_server] >> >> `windows_x64_oci` >> [plot not preserved: algo_windows_x64_oci_server] > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 47 additional commits since the last revision: > > - Merge branch 'master' into JDK-8373026-vector-algorithms > - some bug-fixes and lowerCaseB test/benchmark > - updates for review > - use firstTrue for XiaohongGong > - Data refactor part 4 > - Data refactor part 3 > - Data refactor part 2 > - Data refactor part 1 > - fix flag handling for Vladimir > - more hashCodeB > - ... and 37 more: https://git.openjdk.org/jdk/compare/8da7301a...3abe02a0 LGTM! ------------- Marked as reviewed by xgong (Committer). PR Review: https://git.openjdk.org/jdk/pull/28639#pullrequestreview-3704553603 From xgong at openjdk.org Mon Jan 26 03:41:03 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 26 Jan 2026 03:41:03 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v12] In-Reply-To: References: Message-ID: <8EM3xfQMr8eji0HVhvB02g5Tzogr9sM3VYcpMOg2l6M=.fd326dbf-f7a3-47af-bbf9-a78610102d97@github.com> On Thu, 22 Jan 2026 02:24:03 GMT, Xiaohong Gong wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> updates for review > > I ran the new tests on my ARM NEON machine with `-XX:MaxVectorSize=8`, and following tests crashed with the same error: > > compiler/vectorization/TestVectorAlgorithms.java#noOptimizeFill > compiler/vectorization/TestVectorAlgorithms.java#noSuperWord > compiler/vectorization/TestVectorAlgorithms.java#vanilla > > > Here is the log: > > Standard Output > --------------- > CompileCommand: inline *VectorAlgorithmsImpl*.* bool inline = true > TestVM main() called - about to run tests in class compiler.vectorization.TestVectorAlgorithms > For random generator using seed: 5121565769469166450 > To re-run test with same seed value please add "-Djdk.test.lib.random.seed=5121565769469166450" to command line. 
> 300 Phi === 103 1050 302 [[ 399 299 ]] #rawptr:BotPTR !jvms: IntVector::lanewiseTemplate @ bci:154 (line 798) Int64Vector::lanewise @ bci:3 (line 278) Int64Vector::lanewise @ bci:3 (line 43) IntVector::lanewise @ bci:43 (line 944) IntVector::add @ bci:5 (line 1406) VectorAlgorithmsImpl::findMinIndexI_VectorAPI @ bci:96 (line 563) > 300 Phi === 103 1050 302 [[ 399 299 ]] #rawptr:BotPTR !jvms: IntVector::lanewiseTemplate @ bci:154 (line 798) Int64Vector::lanewise @ bci:3 (line 278) Int64Vector::lanewise @ bci:3 (line 43) IntVector::lanewise @ bci:43 (line 944) IntVector::add @ bci:5 (line 1406) VectorAlgorithmsImpl::findMinIndexI_VectorAPI @ bci:96 (line 563) > 98 safePoint === 101 0 401 0 0 99 905 402 403 404 282 0 0 0 0 908 909 912 [[ 100 575 675 ]] !jvms: VectorAlgorithmsImpl::findMinIndexI_VectorAPI @ bci:113 (line 558) > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (jdk-src/src/hotspot/share/opto/buildOopMap.cpp:371), pid=145228, tid=145250 > # assert(false) failed: there should be an oop in OopMap instead of a live raw oop at safepoint > # > # JRE version: OpenJDK Runtime Environment (27.0) (fastdebug build 27-internal-git-362f4c7acc8) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 27-internal-git-362f4c7acc8, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > # Problematic frame: > # V [libjvm.so+0x72ae50] OopFlow::build_oop_map(Node*, int, PhaseRegAlloc*, int*)+0xf80 > # > > > And the VM options: > > -ea -esa -Xmx768m -XX:UseSVE=0 -XX:MaxVectorSize=8 --add-modules=jdk.incubator.vector -XX:CompileCommand=inline,*VectorAlgorithmsImpl*::* -XX:-BackgroundCompilation -XX:CompileCommand=quiet > > Could you please take a look? Thanks! > @XiaohongGong The failure is of course unrelated, since we have no VM changes here. 
A bit scary that a random "demo benchmark" triggers a bug :/ > > I could reproduce it as well, extracted a stand-alone test, and filed: https://bugs.openjdk.org/browse/JDK-8376189 > > Thanks for reporting it @XiaohongGong ! Good to know, thanks for testing it. That's weird to me as well; I'll take a look when I have time. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28639#issuecomment-3797774486 From thartmann at openjdk.org Mon Jan 26 07:03:52 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 26 Jan 2026 07:03:52 GMT Subject: [jdk26] RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector In-Reply-To: References: Message-ID: <0qAhTutafKtQguSO47IfErfpz40DMktvQ09Ggl8y3w0=.e041f0b9-81e0-4e5e-b426-9515c25578e1@github.com> On Mon, 26 Jan 2026 03:04:31 GMT, Xiaohong Gong wrote: > Hi all, > > This pull request contains a backport of commit [38b66b12](https://github.com/openjdk/jdk/commit/38b66b12581a3745a37589e32aa0fc880d27b4d4) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Xiaohong Gong on 26 Jan 2026 and was reviewed by Quan Anh Mai and Vladimir Ivanov. > > Thanks!
This is a P3 bug and since we are already at RDP 2 for JDK 26 only P1-P2 bugs (with approval) are allowed at this point: https://openjdk.org/jeps/3 ------------- PR Comment: https://git.openjdk.org/jdk/pull/29404#issuecomment-3798149572 From xgong at openjdk.org Mon Jan 26 07:49:27 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 26 Jan 2026 07:49:27 GMT Subject: [jdk26] RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector In-Reply-To: <0qAhTutafKtQguSO47IfErfpz40DMktvQ09Ggl8y3w0=.e041f0b9-81e0-4e5e-b426-9515c25578e1@github.com> References: <0qAhTutafKtQguSO47IfErfpz40DMktvQ09Ggl8y3w0=.e041f0b9-81e0-4e5e-b426-9515c25578e1@github.com> Message-ID: On Mon, 26 Jan 2026 07:00:54 GMT, Tobias Hartmann wrote: > This is a P3 bug and since we are already at RDP 2 for JDK 26 only P1-P2 bugs (with approval) are allowed at this point: https://openjdk.org/jeps/3 Thanks for the reminder! > This should/could go into JDK 26u instead or we need to raise priority and request approval for JDK 26. Is this a regression? Yes, this is a regression. But it fails very rarely on an existing test. So do you think it's fine to backport to JDK 26u instead?
------------- PR Comment: https://git.openjdk.org/jdk/pull/29404#issuecomment-3798271591 From chagedorn at openjdk.org Mon Jan 26 07:49:29 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 Jan 2026 07:49:29 GMT Subject: RFR: 8366861: Phase AFTER_LOOP_OPTS printed even though the method has no loops In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 12:41:25 GMT, Saranya Natarajan wrote: > **Issue** > When a test program with no loops is run with the flag `-XX:PrintPhaseLevel=3`, the output prints `PHASE_AFTER_LOOP_OPTS` but does not print `PHASE_BEFORE_LOOP_OPTS`. > > **Solution** > This simple fix introduces a variable `_print_phase_loop_opts` that caches the value of `has_loops()` before the loop opts pass to make sure that `PHASE_AFTER_LOOP_OPTS` gets printed every time `PHASE_BEFORE_LOOP_OPTS` is printed. > > Before fix output > [image not preserved] > > After fix output > [image not preserved] > > > **Testing** > GitHub Actions, Tier 1-3 Looks good, thanks for cleaning that up! src/hotspot/share/opto/compile.hpp line 358: > 356: bool _print_inlining; // True if we should print inlining for this compilation > 357: bool _print_intrinsics; // True if we should print intrinsics for this compilation > 358: bool _print_phase_loop_opts; // True if we should before and after print phase loop opts Suggestion: bool _print_phase_loop_opts; // True if we should print before and after loop opts phase ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/29387#pullrequestreview-3704918904 PR Review Comment: https://git.openjdk.org/jdk/pull/29387#discussion_r2726614451 From shade at openjdk.org Mon Jan 26 07:53:51 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 26 Jan 2026 07:53:51 GMT Subject: RFR: 8371768: AArch64: test/hotspot/jtreg/compiler/loopopts/superword/TestReductions.java fails on SVE after JDK-8340093 [v2] In-Reply-To: References: Message-ID: <3Yhj0JCIxMlAVDIDtxRI2gIqKNYcCWZypv9LQduCCkU=.181fceb9-288e-480f-9976-f179102c8ae4@github.com> On Fri, 28 Nov 2025 12:27:07 GMT, Aleksey Shipilev wrote: >> Looks like the test should be more resilient with UseSVE > 0, which _can_ vectorise. It does not look all that reliable to me to failOn when vectorization actually happens. So I dropped some non-arch-specific rules, and amended AArch64-specific rules for UseSVE. >> >> Testing: >> - [x] Linux AArch64 server fastdebug, affected test on machine with UseSVE=1 by default >> - [x] Linux AArch64 server fastdebug, affected test on machine with UseSVE=0 overridden > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8371768-testbug-reduction > - A bit of mop up > - UseSVE works Right, forgot about this one. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28423#issuecomment-3798280918 From thartmann at openjdk.org Mon Jan 26 07:54:47 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 26 Jan 2026 07:54:47 GMT Subject: [jdk26] RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 03:04:31 GMT, Xiaohong Gong wrote: > Hi all, > > This pull request contains a backport of commit [38b66b12](https://github.com/openjdk/jdk/commit/38b66b12581a3745a37589e32aa0fc880d27b4d4) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Xiaohong Gong on 26 Jan 2026 and was reviewed by Quan Anh Mai and Vladimir Ivanov. > > Thanks! Yes, I'd say let's backport to JDK 26u instead. Please add a caused-by link to the JBS issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29404#issuecomment-3798283047 From xgong at openjdk.org Mon Jan 26 08:01:01 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 26 Jan 2026 08:01:01 GMT Subject: [jdk26] RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 07:51:20 GMT, Tobias Hartmann wrote: > Yes, I'd say let's backport to JDK 26u instead. Please add a caused-by link to the JBS issue. This is a regression triggered by an assertion added by https://bugs.openjdk.org/browse/JDK-8367292, but the root cause behind is actually not introduced by it. As it has been added as `relates-to` link in the JBS, do I still need an additional "caused-by" link? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29404#issuecomment-3798303074 From thartmann at openjdk.org Mon Jan 26 08:14:45 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 26 Jan 2026 08:14:45 GMT Subject: [jdk26] RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 03:04:31 GMT, Xiaohong Gong wrote: > Hi all, > > This pull request contains a backport of commit [38b66b12](https://github.com/openjdk/jdk/commit/38b66b12581a3745a37589e32aa0fc880d27b4d4) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Xiaohong Gong on 26 Jan 2026 and was reviewed by Quan Anh Mai and Vladimir Ivanov. > > Thanks! Okay, thanks for the details. A relates-to link is fine then. I suggest to close this PR and re-target JDK 26u. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29404#issuecomment-3798340160 From xgong at openjdk.org Mon Jan 26 08:14:45 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 26 Jan 2026 08:14:45 GMT Subject: [jdk26] RFR: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 08:11:15 GMT, Tobias Hartmann wrote: > Okay, thanks for the details. A relates-to link is fine then. I suggest to close this PR and re-target JDK 26u. Thanks! Sure, I will close it. Thanks for your input! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29404#issuecomment-3798343299 From xgong at openjdk.org Mon Jan 26 08:14:46 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 26 Jan 2026 08:14:46 GMT Subject: [jdk26] Withdrawn: 8374043: C2: assert(_base >= VectorMask && _base <= VectorZ) failed: Not a Vector In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 03:04:31 GMT, Xiaohong Gong wrote: > Hi all, > > This pull request contains a backport of commit [38b66b12](https://github.com/openjdk/jdk/commit/38b66b12581a3745a37589e32aa0fc880d27b4d4) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Xiaohong Gong on 26 Jan 2026 and was reviewed by Quan Anh Mai and Vladimir Ivanov. > > Thanks! This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/29404 From dfenacci at openjdk.org Mon Jan 26 08:17:51 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 26 Jan 2026 08:17:51 GMT Subject: RFR: 8366861: Phase AFTER_LOOP_OPTS printed even though the method has no loops In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 12:41:25 GMT, Saranya Natarajan wrote: > **Issue** > When a test program with no loops is run with the flag `-XX:PrintPhaseLevel=3`, the output prints `PHASE_AFTER_LOOP_OPTS` but does not print `PHASE_BEFORE_LOOP_OPTS`. > > **Solution** > This simple fix introduces a variable `_print_phase_loop_opts` that caches the value of `has_loops()` before the loop opts pass to make sure that `PHASE_AFTER_LOOP_OPTS` gets printed every time `PHASE_BEFORE_LOOP_OPTS` is printed. > > Before fix output > [image not preserved] > > After fix output > [image not preserved] > > > **Testing** > GitHub Actions, Tier 1-3 Looks good to me. Thanks for the cleanup @sarannat. ------------- Marked as reviewed by dfenacci (Committer).
PR Review: https://git.openjdk.org/jdk/pull/29387#pullrequestreview-3704993547 From erfang at openjdk.org Mon Jan 26 09:26:35 2026 From: erfang at openjdk.org (Eric Fang) Date: Mon, 26 Jan 2026 09:26:35 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v4] In-Reply-To: References: Message-ID: > This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance. > > Changes: > -------- > > 1. C2 mid-end: > - Added UMinReductionVNode and UMaxReductionVNode > > 2. AArch64 Backend: > - Added uminp/umaxp/sve_uminv/sve_umaxv instructions > - Updated match rules for all vector sizes and element types > - Both NEON and SVE implementation are supported > > 3. Test: > - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java > - Added assembly tests in aarch64-asmtest.py for new instructions > - Added a JTReg test file VectorUMinMaxReductionTest.java > > Different configurations were tested on aarch64 and x86 machines, and all tests passed. 
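As a reference for the semantics being intrinsified, the following scalar sketch computes masked unsigned min/max reductions over `int` lanes. It assumes that inactive lanes do not contribute and that each reduction starts from its identity element (the largest unsigned value for min, zero for max). `UnsignedReductions` is a hypothetical helper, not the `UMinReductionVNode`/`UMaxReductionVNode` IR or the NEON/SVE code; `Integer.compareUnsigned` provides the unsigned ordering.

```java
import java.util.Arrays;

// Hypothetical scalar reference for masked unsigned min/max lane reductions.
class UnsignedReductions {
    // Unsigned minimum over the active lanes. The accumulator starts at the
    // identity for unsigned min: -1, i.e. 0xFFFFFFFF, the largest unsigned int.
    static int uminReduce(int[] lanes, boolean[] mask) {
        int acc = -1;
        for (int i = 0; i < lanes.length; i++) {
            if (mask[i] && Integer.compareUnsigned(lanes[i], acc) < 0) {
                acc = lanes[i];
            }
        }
        return acc;
    }

    // Unsigned maximum over the active lanes; the identity is 0.
    static int umaxReduce(int[] lanes, boolean[] mask) {
        int acc = 0;
        for (int i = 0; i < lanes.length; i++) {
            if (mask[i] && Integer.compareUnsigned(lanes[i], acc) > 0) {
                acc = lanes[i];
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] lanes = {3, -1, 7, Integer.MIN_VALUE};
        boolean[] all = new boolean[lanes.length];
        Arrays.fill(all, true);
        boolean[] some = {false, true, true, false};
        System.out.println(Integer.toUnsignedString(uminReduce(lanes, all)));  // 3
        System.out.println(Integer.toUnsignedString(umaxReduce(lanes, all)));  // 4294967295
        System.out.println(Integer.toUnsignedString(uminReduce(lanes, some))); // 7
    }
}
```

For these lanes a signed minimum would be `Integer.MIN_VALUE`, whereas the unsigned minimum is 3, because `Integer.MIN_VALUE` reads as 2^31 when treated as unsigned. That difference is why the backend maps these reductions to unsigned instructions such as `uminp`/`umaxp` rather than their signed counterparts.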
> > Test results of JMH benchmarks from the panama-vector project: > -------- > > On an NVIDIA Grace machine with 128-bit SVE:
>
> Benchmark                       Unit      Before     Error     After     Error  Uplift (x)
> Byte128Vector.UMAXLanes         ops/ms    411.60     42.18  25226.51     33.92       61.29
> Byte128Vector.UMAXMaskedLanes   ops/ms    558.56     85.12  25182.90     28.74       45.09
> Byte128Vector.UMINLanes         ops/ms    645.58    780.76  28396.29    103.11       43.99
> Byte128Vector.UMINMaskedLanes   ops/ms    621.09    718.27  26122.62     42.68       42.06
> Byte64Vector.UMAXLanes          ops/ms    296.33     34.44  14357.74     15.95       48.45
> Byte64Vector.UMAXMaskedLanes    ops/ms    376.54     44.01  14269.24     21.41       37.90
> Byte64Vector.UMINLanes          ops/ms    373.45    426.51  15425.36     66.20       41.31
> Byte64Vector.UMINMaskedLanes    ops/ms    353.32    346.87  14201.37     13.79       40.19
> Int128Vector.UMAXLanes          ops/ms    174.79    192.51   9906.07    286.93       56.67
> Int128Vector.UMAXMaskedLanes    ops/ms    157.23    206.68  10246.77     11.44       65.17
> Int64Vector.UMAXLanes           ops/ms     95.30    126.49   4719.30     98.57       49.52
> Int64Vector.UMAXMaskedLanes     ops/ms     88.19     87.44   4693.18     19.76       53.22
> Long128Vector.UMAXLanes         ops/ms     80.62     97.82   5064.01     35.52       62.82
> Long128Vector.UMAXMaskedLanes   ops/ms     78.15    102.91   5028.24      8.74       64.34
> Long64Vector.UMAXLanes          ops/ms     47.56     62.01     46.76     52.28        0.98
> Long64Vector.UMAXMaskedLanes    ops/ms     45.44     46.76     45.79     42.91        1.01
> Short128Vector.UMAXLanes        ops/ms    316.65    410.30  14814.82     23.65       46.79
> Short128Vector.UMAXMaskedLanes  ops/ms    308.90    351.78  15155.26     31.03       49.06
> Sh...
Eric Fang has updated the pull request incrementally with one additional commit since the last revision: Move helper functions into c2_MacroAssembler_aarch64.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28693/files - new: https://git.openjdk.org/jdk/pull/28693/files/fc3dee3d..10d74f13 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28693&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28693&range=02-03 Stats: 104 lines in 2 files changed: 26 ins; 64 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/28693.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28693/head:pull/28693 PR: https://git.openjdk.org/jdk/pull/28693 From erfang at openjdk.org Mon Jan 26 09:29:46 2026 From: erfang at openjdk.org (Eric Fang) Date: Mon, 26 Jan 2026 09:29:46 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v3] In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 17:40:31 GMT, Andrew Haley wrote: >> Eric Fang has updated the pull request incrementally with one additional commit since the last revision: >> >> Extract some helper functions for better readability > > Please add the `TypeNNVector` JMH test files to this PR. Hi @theRealAph I have addressed all of your comments, thanks for your suggestions. > Please add the TypeNNVector JMH test files to this PR. These micro-benchmarks are already in the `panama-vector` project, do we need to sync that file separately into OpenJDK? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3798625373 From aph at openjdk.org Mon Jan 26 09:51:59 2026 From: aph at openjdk.org (Andrew Haley) Date: Mon, 26 Jan 2026 09:51:59 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v3] In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 17:40:31 GMT, Andrew Haley wrote: >> Eric Fang has updated the pull request incrementally with one additional commit since the last revision: >> >> Extract some helper functions for better readability > > Please add the `TypeNNVector` JMH test files to this PR. > Hi @theRealAph I have addressed all of your comments, thanks for your suggestions. > > > Please add the TypeNNVector JMH test files to this PR. > > These micro-benchmarks are already in the `panama-vector` project, do we need to sync that file separately into OpenJDK? Ideally, yes. It's a bit much to expect a reviewer to check out another repo. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3798709384 From chagedorn at openjdk.org Mon Jan 26 10:31:31 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 Jan 2026 10:31:31 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v6] In-Reply-To: References: Message-ID: <5quTb8IlMkg0XMl6Y5Y-IWLHgGy3LKrU5bi655OI8GU=.1b3b5762-e2e4-4cc8-aa9e-0ff4962bdd2c@github.com> On Fri, 23 Jan 2026 08:51:52 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not a subset of each other. This leads to the possibilities that a node satisfying both cases will return the first value, but if upon being widen it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. 
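The monotonicity problem can be reproduced in a toy model, assuming types are small explicit sets of `int` values and a comparison result is the set of outcomes it may produce, so that `CC_LE` corresponds to {LT, EQ} and `CC_NE` to {LT, GT}. The class and method names are hypothetical; this is not C2's `TypeInt` or the code in the patch. `subFirstMatch` returns the result of the first rule whose condition holds. `subIntersect` is one way to restore monotonicity in this model, not necessarily the approach taken by the fix: each rule's condition, once true, stays true for narrower inputs, so widening an input can only drop rules, and intersecting fewer result sets can only yield a superset.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical toy model of an unsigned-compare transfer function; not C2 code.
class CmpUMonotonicity {
    enum Outcome { LT, EQ, GT }

    // A "type" here is an explicit set of possible int values.
    static boolean isConstantZero(int[] t) { return t.length == 1 && t[0] == 0; }

    static boolean disjoint(int[] a, int[] b) {
        for (int x : a) for (int y : b) if (x == y) return false;
        return true;
    }

    // Returns the result of the first applicable rule.
    static Set<Outcome> subFirstMatch(int[] x, int[] y) {
        if (isConstantZero(x)) return EnumSet.of(Outcome.LT, Outcome.EQ); // like CC_LE
        if (disjoint(x, y))    return EnumSet.of(Outcome.LT, Outcome.GT); // like CC_NE
        return EnumSet.allOf(Outcome.class);                              // no information
    }

    // Intersects the results of every applicable rule.
    static Set<Outcome> subIntersect(int[] x, int[] y) {
        Set<Outcome> r = EnumSet.allOf(Outcome.class);
        if (isConstantZero(x)) r.retainAll(EnumSet.of(Outcome.LT, Outcome.EQ));
        if (disjoint(x, y))    r.retainAll(EnumSet.of(Outcome.LT, Outcome.GT));
        return r;
    }

    public static void main(String[] args) {
        int[] x1 = {0}, y1 = {1, -1};
        int[] x2 = {0, 2}, y2 = {-1, 1}; // x widened, y unchanged
        Set<Outcome> a = subFirstMatch(x1, y1), b = subFirstMatch(x2, y2);
        System.out.println(a + " -> " + b + " monotonic=" + b.containsAll(a));
        // [LT, EQ] -> [LT, GT] monotonic=false
        Set<Outcome> c = subIntersect(x1, y1), d = subIntersect(x2, y2);
        System.out.println(c + " -> " + d + " monotonic=" + d.containsAll(c));
        // [LT] -> [LT, GT] monotonic=true
    }
}
```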
>> >> For example, given `r = CmpU(x, y)`. >> >> At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, then `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. >> >> At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, then `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which leads to an assertion. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > style Still good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29308#pullrequestreview-3705449497 From qamai at openjdk.org Mon Jan 26 11:21:11 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 26 Jan 2026 11:21:11 GMT Subject: RFR: 8375653: C2: CmpUNode::sub is not monotonic [v6] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 08:51:52 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not a subset of each other. This leads to the possibilities that a node satisfying both cases will return the first value, but if upon being widen it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. >> >> For example, given `r = CmpU(x, y)`. >> >> At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, then `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. >> >> At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, then `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. 
This is not a superset of `TypeInt::CC_LE`, which leads to an assertion failure. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > style Thanks a lot for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29308#issuecomment-3799058900 From qamai at openjdk.org Mon Jan 26 11:21:13 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 26 Jan 2026 11:21:13 GMT Subject: Integrated: 8375653: C2: CmpUNode::sub is not monotonic In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 02:42:41 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the issue that `CmpUNode::sub` is not monotonic. The root cause is that it returns different values for several cases, but the cases are not mutually exclusive and the return values are not subsets of each other. This leads to the possibility that a node satisfying both cases will return the first value, but if, upon being widened, it ceases to satisfy the first case but still satisfies the second case, the method will return the second value, which is not a superset of the previous result. > > For example, given `r = CmpU(x, y)`. > > At the first iteration, `type(x) = {0}` and `type(y) = {1, -1}`, then `CmpUNode::sub` returns `TypeInt::CC_LE` since it sees that `x` is the constant `0`. > > At the second iteration, `type(x) = {0, 2}` and `type(y) = {-1, 1}`, then `CmpUNode::sub` returns `TypeInt::CC_NE` since it sees that `x` and `y` do not overlap. This is not a superset of `TypeInt::CC_LE`, which leads to an assertion failure. > > Please take a look and leave your reviews, thanks a lot. This pull request has now been integrated.
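To make the monotonicity requirement concrete, here is a toy Java model (not HotSpot code) of the example above. A "type" is modeled as a finite set of ints and a comparison result as a subset of {-1, 0, 1}; the two cases become two separate rules. `firstMatch` mimics returning the result of the first applicable case, while `intersectAll` is one possible monotonic alternative, assumed here purely for illustration: it intersects the answers of all rules, each rule answering the full set when it does not apply. It is not necessarily what the integrated patch does, and all class and method names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model, not HotSpot code: a "type" is a finite set of int values and a
// comparison result is a subset of {-1, 0, 1} (less, equal, greater).
final class CmpUMonotonicity {
    static final Set<Integer> CC = Set.of(-1, 0, 1);
    static final Set<Integer> CC_LE = Set.of(-1, 0);
    static final Set<Integer> CC_NE = Set.of(-1, 1);

    // Rule A: 0 is unsigned-less-or-equal to every value.
    static Set<Integer> ruleZero(Set<Integer> x, Set<Integer> y) {
        return x.equals(Set.of(0)) ? CC_LE : CC;
    }

    // Rule B: if x and y share no value, they cannot be equal.
    static Set<Integer> ruleDisjoint(Set<Integer> x, Set<Integer> y) {
        for (int v : x) {
            if (y.contains(v)) {
                return CC;
            }
        }
        return CC_NE;
    }

    // Returns the result of the first rule that applies (not monotonic).
    static Set<Integer> firstMatch(Set<Integer> x, Set<Integer> y) {
        Set<Integer> a = ruleZero(x, y);
        return a.equals(CC) ? ruleDisjoint(x, y) : a;
    }

    // Intersects the results of all rules (monotonic if each rule is).
    static Set<Integer> intersectAll(Set<Integer> x, Set<Integer> y) {
        Set<Integer> result = new TreeSet<>(ruleZero(x, y));
        result.retainAll(ruleDisjoint(x, y));
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> x1 = Set.of(0), y1 = Set.of(1, -1);
        Set<Integer> x2 = Set.of(0, 2), y2 = Set.of(-1, 1); // widened inputs

        // Monotonic means: the result for wider inputs contains the earlier result.
        System.out.println(firstMatch(x2, y2).containsAll(firstMatch(x1, y1)));     // false
        System.out.println(intersectAll(x2, y2).containsAll(intersectAll(x1, y1))); // true
    }
}
```

Each rule on its own can only fall back to the full set as its inputs widen, so the intersection of their answers can only grow, which is what monotonicity requires. With the first-match order, the result jumps from {-1, 0} to {-1, 1} instead.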
Changeset: 30675faa Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/30675faa67d1bbb4acc729a841493bb8311416af Stats: 389 lines in 3 files changed: 293 ins; 77 del; 19 mod 8375653: C2: CmpUNode::sub is not monotonic Reviewed-by: chagedorn, mchevalier ------------- PR: https://git.openjdk.org/jdk/pull/29308 From jbhateja at openjdk.org Mon Jan 26 12:17:02 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 26 Jan 2026 12:17:02 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v17] In-Reply-To: References: Message-ID: > Add a new Float16Vector type and corresponding concrete vector classes, in addition to the existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support. > - Extend the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. > > The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reductions, etc.). > > The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback. > > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 34 commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Clanups - Refactoring vectorIntrinsics - Refactoring and cleanups - Refactoring and cleanups - Review comments resolutions - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Adding testpoint for JDK-8373574 - Review comments resolutions - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - ... and 24 more: https://git.openjdk.org/jdk/compare/0f1b96a5...ce5768fa ------------- Changes: https://git.openjdk.org/jdk/pull/28002/files Webrev: Webrev is not available because diff is too large Stats: 519675 lines in 224 files changed: 284942 ins; 233000 del; 1733 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From qamai at openjdk.org Mon Jan 26 12:19:15 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 26 Jan 2026 12:19:15 GMT Subject: [jdk26] RFR: 8375653: C2: CmpUNode::sub is not monotonic Message-ID: <7LhAjmvBXDhH1TvnAU7qzB8HwdWpXS9GYfUTKjp4Lgs=.7683ff64-ce00-4e45-b234-552d17058658@github.com> Hi all, This pull request contains a backport of commit [30675faa](https://github.com/openjdk/jdk/commit/30675faa67d1bbb4acc729a841493bb8311416af) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Quan Anh Mai on 26 Jan 2026 and was reviewed by Christian Hagedorn and Marc Chevalier. Thanks! 
------------- Commit messages: - Backport 30675faa67d1bbb4acc729a841493bb8311416af Changes: https://git.openjdk.org/jdk/pull/29412/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29412&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375653 Stats: 389 lines in 3 files changed: 293 ins; 77 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/29412.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29412/head:pull/29412 PR: https://git.openjdk.org/jdk/pull/29412 From chagedorn at openjdk.org Mon Jan 26 12:48:58 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 Jan 2026 12:48:58 GMT Subject: [jdk26] RFR: 8375653: C2: CmpUNode::sub is not monotonic In-Reply-To: <7LhAjmvBXDhH1TvnAU7qzB8HwdWpXS9GYfUTKjp4Lgs=.7683ff64-ce00-4e45-b234-552d17058658@github.com> References: <7LhAjmvBXDhH1TvnAU7qzB8HwdWpXS9GYfUTKjp4Lgs=.7683ff64-ce00-4e45-b234-552d17058658@github.com> Message-ID: On Mon, 26 Jan 2026 12:09:38 GMT, Quan Anh Mai wrote: > Hi all, > > This pull request contains a backport of commit [30675faa](https://github.com/openjdk/jdk/commit/30675faa67d1bbb4acc729a841493bb8311416af) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Quan Anh Mai on 26 Jan 2026 and was reviewed by Christian Hagedorn and Marc Chevalier. > > Thanks! Looks good! You need to request approval before integration (see [JEP 3](https://openjdk.org/jeps/3#rdp-2)). ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29412#pullrequestreview-3705892324 From mchevalier at openjdk.org Mon Jan 26 14:33:23 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 26 Jan 2026 14:33:23 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order [v2] In-Reply-To: References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> Message-ID: On Tue, 20 Jan 2026 15:54:53 GMT, Marc Chevalier wrote: >> As it says: randomize the order in which the late inlines are processed, and slightly factor it with macro nodes. >> >> I didn't add a shuffle to the `GrowableArray` class since it seems a rather specialized method (and it'd probably be a controversial change to a widely used class), it would make `GrowableArray` depend on `Compile` for random number generation (or require a callback, for instance, giving a non-trivial signature and usage), and I couldn't find any other shuffling of such an object. >> There is also shuffling of `UniqueNodeList` (for `StressIGVN`), but it seems hard to unify: access and length are not written quite the same, and it would probably not be simpler than duplicating the implementation (which is simple, so it's fine in my opinion). >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Randomize insertion I did some randomization on insertion. It still seems fine in testing. Using this flag in tests causes a lot of failures and crashes, but they are similar to those before the shuffling. Some issues have been filed. I suggest we keep the shuffling for the stress flag, and create a follow-up issue to remove `AlwaysIncrementalInline`, as it is (or should be) subsumed by the compile command `delayinline`. Is that fine for everyone?
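For readers less familiar with the shuffling being discussed, a minimal Java sketch of a seeded in-place Fisher-Yates shuffle, plus insertion at a random position, over a list that stands in for the worklist of pending late inlines. `java.util.Random` with a fixed seed is an assumption standing in for the compiler's own seeded random source; the class and method names are hypothetical, and this is not the HotSpot implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: a seeded Fisher-Yates shuffle and a randomized insert
// over a list that stands in for the worklist of pending late inlines.
final class WorklistShuffle {

    // Shuffles 'list' in place; every permutation is equally likely.
    static <T> void shuffle(List<T> list, Random rnd) {
        for (int i = list.size() - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1); // uniform index in [0, i]
            T tmp = list.get(i);
            list.set(i, list.get(j));
            list.set(j, tmp);
        }
    }

    // Inserts 'elem' at a uniformly random position, including the end.
    static <T> void insertAtRandom(List<T> list, T elem, Random rnd) {
        list.add(rnd.nextInt(list.size() + 1), elem);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed: same order on every run
        List<String> lateInlines = new ArrayList<>(List.of("a", "b", "c", "d"));
        shuffle(lateInlines, rnd);
        insertAtRandom(lateInlines, "e", rnd);
        System.out.println(lateInlines);
    }
}
```

In plain Java, `Collections.shuffle(List, Random)` does the same job; the loop is spelled out here to show the algorithm. Seeding matters for a stress flag: rerunning with the same seed (for example via `-XX:StressSeed`) reproduces the same processing order, which is what makes a failure found this way debuggable.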
------------- PR Comment: https://git.openjdk.org/jdk/pull/29110#issuecomment-3799898012 From mchevalier at openjdk.org Mon Jan 26 14:49:21 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 26 Jan 2026 14:49:21 GMT Subject: RFR: 8373898: RepeatCompilation does not repeat compilation after bailout Message-ID: <8dPAwPZVqlsfdXKFWOB6MUgJN5wJQqWSYTH_WUnwOpw=.f1a1fe00-5f97-4f6a-a90b-7ddee2cb8b91@github.com> Repeat compilation happens here: https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/compiler/compileBroker.cpp#L2347-L2355 and in `C2Compiler::compile_method`, which does https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/opto/c2compiler.cpp#L136-L147 In case of failure, `_failure_reason` in `ci_env`/`env` is populated, so the big loop in `compile_method` does not iterate at all and, in particular, no compilation happens. Bailouts in `Compile::Compile` and `Compile::Optimize` check this `_failure_reason`, but also the `Compile` object's own one. The latter is fine since the `Compile` object doesn't survive across compilations, and thus its field is fresh at each iteration in `compile_method`. A direct solution is to reset `ci_env._failure_reason` at each repetition of the compilation. We could also create a fresh `ci_env` as hinted in the JBS issue, but `ci_env` undergoes a bit of setup. It might be OK to skip this setup for the purpose of repeated compilation, but it's not clear to me. Since repeated compilations do not install code (and thus cannot mark the task as successful), I copy the original failure reason and restore it at the end, to get consistent values between `ci_env.failing()` and `task->is_success()`. Using IGV output, we can see that we get as many recompilations as requested, even when some are (artificially) bailing out.
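The failure mode and the fix can be modeled in a few lines of Java. This is a simplified sketch, not HotSpot code: `Env` stands in for `ciEnv` and keeps only a sticky failure reason, each attempt returns early if the environment is already failing (like the loop in `C2Compiler::compile_method`), every attempt bails out, and the original compilation is assumed to run before the repetitions. With `resetBetweenRepeats` set, the reason is cleared before each repetition and the original one is restored at the end, as described above. All names are hypothetical.

```java
// Hypothetical model of RepeatCompilation with a sticky failure reason.
final class RepeatCompilationModel {

    // Stand-in for ciEnv: only the failure reason matters here.
    static final class Env {
        String failureReason; // null means "not failing"

        boolean failing() {
            return failureReason != null;
        }

        void recordFailure(String reason) {
            if (failureReason == null) {
                failureReason = reason; // keep the first reason
            }
        }
    }

    // One compilation attempt that always bails out. Returns false if the
    // attempt was skipped because the environment was already failing.
    static boolean compileOnce(Env env) {
        if (env.failing()) {
            return false;
        }
        env.recordFailure("simulated bailout");
        return true;
    }

    // Runs the original compilation followed by 'repeats' repetitions and
    // returns how many attempts actually ran.
    static int compileWithRepeats(Env env, int repeats, boolean resetBetweenRepeats) {
        int attempts = compileOnce(env) ? 1 : 0;
        String originalReason = env.failureReason;
        for (int i = 0; i < repeats; i++) {
            if (resetBetweenRepeats) {
                env.failureReason = null; // give this repetition a clean slate
            }
            if (compileOnce(env)) {
                attempts++;
            }
        }
        if (resetBetweenRepeats) {
            env.failureReason = originalReason; // report the original outcome
        }
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println(compileWithRepeats(new Env(), 3, false)); // 1
        System.out.println(compileWithRepeats(new Env(), 3, true));  // 4
    }
}
```

Without the reset, only the original attempt runs; with it, all repetitions run and the environment still ends up reporting the original failure.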
Thanks, Marc ------------- Commit messages: - Restore first message - Reset failing reason Changes: https://git.openjdk.org/jdk/pull/29419/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29419&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373898 Stats: 12 lines in 1 file changed: 6 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/29419.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29419/head:pull/29419 PR: https://git.openjdk.org/jdk/pull/29419 From chagedorn at openjdk.org Mon Jan 26 15:27:44 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 Jan 2026 15:27:44 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v29] In-Reply-To: References: Message-ID: <9--VECh_vf6TAJd1myv5voviDWgv_9_7nP-4oloR4uQ=.9d71c07f-398b-4519-89da-e33cd810eecb@github.com> On Wed, 14 Jan 2026 16:38:25 GMT, Kangcheng Xu wrote: >> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. >> >> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. >> >> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix safepoint detection I had a look at the failure, and it seems that we miss a Loop Limit Check Predicate when we stress with `StressLongCountedLoop`. With the attached `Test.java` at the bottom, the following happens: #### Mainline before your patch We try to convert to a counted loop.
We pass most of the checks, then emit a Loop Limit Check Predicate for "i < z", but then we bail out when `StressLongCountedLoop` is set: https://github.com/openjdk/jdk/blob/99b4e05d502b68844699faa025e0d5bd51135d8f/src/hotspot/share/opto/loopnode.cpp#L2424-L2430 When we try again next time, we find that we do not need the Loop Limit Check Predicate because `init_plus_stride_could_overflow` is false: https://github.com/openjdk/jdk/blob/99b4e05d502b68844699faa025e0d5bd51135d8f/src/hotspot/share/opto/loopnode.cpp#L2325-L2327 `init_t->hi_as_long()` will be `MAX_INT` (coming from the `ConvI2L` which we created with `StressLongCountedLoop`) and `max_signed_integer(iv_bt)` is `MAX_LONG`. So, we conclude that no Loop Limit Check Predicate is needed; we already added it in the previous attempt before converting the int counted loop to a long counted loop. #### With your patch We have now changed the logic to only emit the Loop Limit Check Predicate if we actually convert the loop to a counted loop. In the first iteration, we fail due to the delay with `StressLongCountedLoop`, so we do not emit the Loop Limit Check Predicate. In the second try, we find, as in mainline, that no Loop Limit Check Predicate is required, and we end up emitting no Loop Limit Check Predicate at all, whereas mainline still emitted one. We should fix that. I think you could try to extract the Loop Limit Check Predicate creation from `CountedLoopConverter::convert()` into a separate method and call it in case we bail out here:

    if (converter.is_counted_loop()) {
    #ifdef ASSERT
      // Stress by converting int counted loops to long counted loops
      if (converter.should_stress_long_counted_loop() && converter.stress_long_counted_loop()) {
        return false;
      }
    #endif

    Test.java

    // Run with:
    // $ java -XX:StressLongCountedLoop=2000000 -XX:CompileOnly=Test::test* -Xcomp Test.java
    public class Test {
        static int x, y, z;

        public static void main(String[] args) {
            test();
        }

        static void test() {
            int i = x; // Any int
            do {
                x += y;
                i++; // Could overflow and thus we need a Loop Limit Check Predicate "i < z"
            } while (i < z);
        }
    }
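The arithmetic behind the missing predicate can be checked in isolation. The following is a simplified Java model, assuming a positive stride and mirroring the reasoning in the comment above rather than the exact HotSpot code: `couldOverflow` (a hypothetical name) asks whether `init + stride` can exceed the maximum of the induction variable's type. With `init` bounded by `MAX_INT` (as after the `ConvI2L` introduced by `StressLongCountedLoop`), the answer is yes for an `int` induction variable but no for a `long` one, which is why the second attempt concludes that no Loop Limit Check Predicate is needed. The last lines show the `int` wrap-around that the predicate guards against in `Test.java`.

```java
// Simplified model (positive stride only) of the question "can init + stride
// overflow the induction variable's type?"; not the exact HotSpot code.
final class InitPlusStrideOverflow {

    // initHi: upper bound of the loop's init value.
    // ivMax:  maximum value of the induction variable's type (int or long).
    static boolean couldOverflow(long initHi, long stride, long ivMax) {
        // For stride > 0, init + stride overflows iff init > ivMax - stride.
        return stride > 0 && initHi > ivMax - stride;
    }

    public static void main(String[] args) {
        long initHi = Integer.MAX_VALUE; // init comes from an int, e.g. via ConvI2L

        // int induction variable: MAX_INT + 1 wraps, so a predicate is needed.
        System.out.println(couldOverflow(initHi, 1, Integer.MAX_VALUE)); // true

        // long induction variable: MAX_INT + 1 fits, so no predicate is emitted.
        System.out.println(couldOverflow(initHi, 1, Long.MAX_VALUE));    // false

        // The int wrap-around that the predicate guards against in Test.java.
        int i = Integer.MAX_VALUE;
        i++;
        System.out.println(i); // -2147483648
    }
}
```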
------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3800170491 From snatarajan at openjdk.org Mon Jan 26 15:32:29 2026 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Mon, 26 Jan 2026 15:32:29 GMT Subject: RFR: 8366861: Phase AFTER_LOOP_OPTS printed even though the method has no loops [v2] In-Reply-To: References: Message-ID: > **Issue** > When a test program with no loops is run with the flag `-XX:PrintPhaseLevel=3`, the output prints `PHASE_AFTER_LOOP_OPTS` but does not print `PHASE_BEFORE_LOOP_OPTS`. > > **Solution** > This simple fix introduces a variable `_print_phase_loop_opts` that caches the value of `has_loops()` before the loop opts pass, to make sure that `PHASE_AFTER_LOOP_OPTS` gets printed every time `PHASE_BEFORE_LOOP_OPTS` is printed. > > Before fix output: [image] > > After fix output: [image] > > > **Testing** > GitHub Actions, Tier 1-3 Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: fixing comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29387/files - new: https://git.openjdk.org/jdk/pull/29387/files/aec3078a..ef89a737 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29387&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29387&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29387.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29387/head:pull/29387 PR: https://git.openjdk.org/jdk/pull/29387 From epeter at openjdk.org Mon Jan 26 15:37:50 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 Jan 2026 15:37:50 GMT Subject: RFR: 8359335: Template-Framework Library: Primitive Types subtyping In-Reply-To: <-RSZ6hVzLKo-Lb5Ik8GyfGwJ009zFM29mpH8ghb3KdU=.966790e2-1017-4186-b3d1-609aa3a37cf6@github.com> References: <69kmR5e7B-PnE8Lxa_XWShTmwDJmh2fhqD5nNsjpJDI=.1e47ddad-4054-45a3-bb0a-494b6baedc15@github.com>
<-RSZ6hVzLKo-Lb5Ik8GyfGwJ009zFM29mpH8ghb3KdU=.966790e2-1017-4186-b3d1-609aa3a37cf6@github.com> Message-ID: On Fri, 23 Jan 2026 16:04:24 GMT, Manuel Hässig wrote: > > I wasn't aware. Do you have an example? > > `testlibrary_tests/template_framework/tests/TestExpression.java` will fail with `Template rendering mismatch`. I tried it before I added the exception to the float division operators. Ah ok. But that is because we'd be changing the semantics of the API, so then we'd have to adjust testing too. Not surprising. As you said above: easy to fix. > > just the fear that we will continue to hit subtle bugs, and it's just not worth it? > > Mainly this. I don't care enough about a single operator. Personally, I'd like to have one version that does not have the annotation. But it can be the one with explicit casts to float/double for the arguments. Just so we can generate a float mod without catching exceptions. > > we could also add explicit float/double casts to the modulo operator arguments. That would at least force away any exception, and ensure we are choosing the float/double modulo, rather than int modulo. > > I like this. Then the question is whether we want both, or only the one with or the one without the annotation. Might as well add both, and some comments.
I would agree to that :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/29349#issuecomment-3800229996 From epeter at openjdk.org Mon Jan 26 16:04:33 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 Jan 2026 16:04:33 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v17] In-Reply-To: <-cin72VWnqAukd5JbCMV9BfsGR1tWcWh_aGpdrlhHlM=.2d22e5da-35d2-4ffc-8ce5-86c90d662fd7@github.com> References: <8hStIcvp252Ik7raxZL5BvFKKkXTflorjyOD9Cyakvc=.c5d1b302-5c49-46b1-91ba-2feda2e6a746@github.com> <1ElN5XvEXAYGINpCIB2smhDrzekGyiXmG6o8-jnxDxk=.83a69a64-2894-40af-a2ee-9c35448c88b2@github.com> <-cin72VWnqAukd5JbCMV9BfsGR1tWcWh_aGpdrlhHlM=.2d22e5da-35d2-4ffc-8ce5-86c90d662fd7@github.com> Message-ID: On Fri, 23 Jan 2026 04:57:04 GMT, Jatin Bhateja wrote: >> Hi @eme64, your comments have been addressed. > >> @jatin-bhateja This patch is really really large. There are lots of renamings that could be done in a separate patch first (as a subtask). It would make reviewing easier, allowing focus on the substantial work. See discussion here: [#28002 (comment)](https://github.com/openjdk/jdk/pull/28002#discussion_r2705376899) > > Hi @eme64, > > I have done some cleanups; the following is a summary of the changes included with the patch: > > ``` > 1 Changes to introduce a new (custom) BasicType T_FLOAT16 > - Global definition. > - Skip over handling wherever applicable. > 2 Changes to pass laneType (BasicType) to intrinsic entry points instead of element classes. > - Mainly inline expander interface changes. > 3 Changes in abstract and concrete vector class generation templates. > 4 Changing the nomenclature of Vector classes to avoid Float1664... sort of names. > 5 Changes in the LaneType to add a new carrier type field. > 6 Changes in inline expanders to selectively enable intrinsification for operations for which we have > auto-vectorization and backend support in place.
> 7 Changes in test generation templates. > b. Assert wrappers to convert float16 (short) values to float before invoking TestNG asserts. > c. Scalar operation wrappers to selectively invoke Float16 math routines that are not > part of the Java SE math libraries. > > 8 New IR verification test. > 9 New micro-benchmark. > 10 AARCH64 test failure - patch + test fixed by Bhavana Kilambi. > > > Of the above, change 7b alone accounts for 40000+ LOC. > > Q. Why do we need wrapper assertions? > A. To handle all possible NaN representations (SNaN and QNaN): since float16 uses a short carrier type, we need to promote the values to float before invoking TestNG assertions. This conversion is accomplished by the assertion wrappers. > > All the tasks are related, and most of the source/test code is generated using scripts, so we should not go by the size of the patch but review the template files. @jatin-bhateja I was wondering: what prompted the decision to add a new `BasicType` for `Float16`? Would each additional numeric type get a new `BasicType`? How many do we anticipate? Currently, we are using `T_SHORT` for `Float16`, right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3800362594 From rcastanedalo at openjdk.org Mon Jan 26 16:06:50 2026 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 26 Jan 2026 16:06:50 GMT Subject: RFR: 8376220: C2: Refactor the logic to in MemNode::find_previous_store [v5] In-Reply-To: <_DqULj-g1bNhAlaskqNJyTnW7r1a7-ykDp1z-gGvutE=.880fbf28-f484-4ec3-b585-a9301eb2de87@github.com> References: <_DqULj-g1bNhAlaskqNJyTnW7r1a7-ykDp1z-gGvutE=.880fbf28-f484-4ec3-b585-a9301eb2de87@github.com> Message-ID: On Sat, 24 Jan 2026 19:28:52 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch refactors the logic in `MemNode::find_previous_store` and makes a small improvement to `MemNode::detect_ptr_independence`. An IR test accompanies the improvement. >>
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Add test store the loaded vector Thanks for extracting this refactoring into an independent changeset! This is is going to simplify significantly the review process of the subsequent load folding changes. I have a few comments, questions, and suggestions. src/hotspot/share/opto/memnode.cpp line 564: > 562: return true; > 563: } > 564: Thanks for accompanying this changeset with some test cases! Could you add a few negative ones where the memory accesses cannot be folded (e.g. one where `c1` and `c2` in `TestFindStore.java` are of the exact same class, one when one is a subclass of the other, one that exercises the raw-to-oop casting you mention above, etc.)? src/hotspot/share/opto/memnode.cpp line 709: > 707: } else if (adr_type->base() == TypePtr::AnyPtr) { > 708: // Give up on a very wide access > 709: return nullptr; What kind of memory access is ruled out here? Could you add a test case for it? In mainline, this condition will imply `adr_maybe_raw` and impose an additional constraint on raw accesses (base equality), but not lead necessarily to `find_previous_store` giving up, right? src/hotspot/share/opto/memnode.cpp line 740: > 738: } > 739: > 740: // If the bases are the same and the offsets are the same, it seems that this is the exact Suggestion: // (b) If the bases are the same and the offsets are the same, it seems that this is the exact In general, I find the original comments referring to steps (a), (b), (c), etc. useful and would prefer if they were left in besides return and continue statements below. src/hotspot/share/opto/memnode.cpp line 741: > 739: > 740: // If the bases are the same and the offsets are the same, it seems that this is the exact > 741: // store we are looking for, the caller will check if the type of the store matches Could you detail in the comment where does the caller check type matching? 
src/hotspot/share/opto/memnode.cpp line 785: > 783: if (detect_ptr_independence(base, alloc, st_base, AllocateNode::Ideal_allocation(st_base), phase)) { > 784: // detect_ptr_independence == true means that it can prove that base and st_base cannot > 785: // have the same runtime value I see how this comment can be useful in the original local EA changeset, but in the context of this separate changeset it seems redundant, since it basically restates what the comment two lines above says. src/hotspot/share/opto/memnode.cpp line 1910: > 1908: ctrl = ctrl->in(0); > 1909: set_req(MemNode::Control,ctrl); > 1910: return this; Is there a reason to return early in this changeset, or is it something that only makes sense in the context of the subsequent local EA changes? Same for the early return below and the IGVN recording at the end of the function. test/hotspot/jtreg/compiler/c2/gvn/TestFindStore.java line 1: > 1: /* Thanks for adding these test cases! Out of curiosity, I ran some tests with `MemNode::find_previous_store` disabled entirely and found that we have very little "optimization check coverage" (tests with IR checks verifying that folding happens) for this logic -- only a couple of seemingly unrelated tests fail. It would be great if you could extend this test file with more positive and negative basic tests so that we have stronger confidence in 1) the correctness of this refactoring and the subsequent local EA changes and 2) that they do not accidentally inhibit some existing optimization. Interesting cases are combinations of overlapping and non-overlapping, regular and mismatched memory accesses, array copies, etc. What do you think? ------------- Changes requested by rcastanedalo (Reviewer).
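As an illustration of why the negative cases matter (plain Java, not an IR Framework test, and not taken from `TestFindStore.java`): when two references have the exact same class, the load below may only be folded to the first stored value if the compiler can prove that the references point to distinct objects; otherwise the intervening store may have overwritten it. Class and method names are hypothetical.

```java
// Hypothetical example: store, store, load through two references of the same class.
final class AliasingExample {
    static final class C {
        int f;
    }

    // Folding the load of c1.f to 1 is only correct if c1 != c2.
    static int storeStoreLoad(C c1, C c2) {
        c1.f = 1;
        c2.f = 2;
        return c1.f;
    }

    public static void main(String[] args) {
        C a = new C();
        C b = new C();
        System.out.println(storeStoreLoad(a, b)); // 1: distinct objects
        System.out.println(storeStoreLoad(a, a)); // 2: same object
    }
}
```

A positive counterpart would use a freshly allocated object for one of the references, where independence is provable.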
PR Review: https://git.openjdk.org/jdk/pull/29390#pullrequestreview-3706631775 PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2728104271 PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2728118435 PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2728140161 PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2728121897 PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2728128693 PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2728159254 PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2728201738 From rcastanedalo at openjdk.org Mon Jan 26 16:06:53 2026 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 26 Jan 2026 16:06:53 GMT Subject: RFR: 8376220: C2: Refactor the logic to in MemNode::find_previous_store [v4] In-Reply-To: References: <_DCQEBinOHkFUYvFf7boqdWG9VD4aaRaU0SwO2hct-w=.0c474ed2-23ab-4263-a89b-6ac4a94d7f14@github.com> Message-ID: On Sat, 24 Jan 2026 18:43:28 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Test description > > src/hotspot/share/opto/memnode.cpp line 1236: > >> 1234: // LoadVector/StoreVector needs additional check to ensure the types match. >> 1235: if (st->is_StoreVector()) { >> 1236: // Some kind of masked access or gather/scatter > > This condition is insufficient to determine if `this` inspects the same memory as `st`. Luckily, `LoadVectorMasked`, `LoadVectorGather`, and `LoadVectorGatherMasked` all have `store_Opcode()` being `-1`, preventing any folding with them. On the other hand, `LoadVector` has `store_Opcode()` being `Op_StoreVector`, so the only case here turns out to be correct. However, it is better to be precise here. Could you summarize this motivation in a code comment?
Is the failure that motivated these additional checks triggered by the additional capabilities of `MemNode::detect_ptr_independence`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2728151027 From thartmann at openjdk.org Mon Jan 26 16:14:58 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 26 Jan 2026 16:14:58 GMT Subject: RFR: 8375272: [IR Framework] Miscellaneous clean-ups [v2] In-Reply-To: <6rGK6c2xEi0Y8Rce1f8kwPqbOxKBf2W7-ToVPO5lXy8=.d33314ea-4c5c-49b7-b8a4-990410db1ffe@github.com> References: <6rGK6c2xEi0Y8Rce1f8kwPqbOxKBf2W7-ToVPO5lXy8=.d33314ea-4c5c-49b7-b8a4-990410db1ffe@github.com> Message-ID: On Fri, 23 Jan 2026 14:52:23 GMT, Christian Hagedorn wrote: >> This patch applies various unrelated clean-ups split out from a first prototype for [JDK-8375270](https://bugs.openjdk.org/browse/JDK-8375270). This should ease reviews. I left some comments in the PR to further explain some details. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > review Marc & Damon That looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29362#pullrequestreview-3706798099 From dlunden at openjdk.org Mon Jan 26 16:16:21 2026 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 26 Jan 2026 16:16:21 GMT Subject: RFR: 8376220: C2: Refactor the logic to in MemNode::find_previous_store [v5] In-Reply-To: <_DqULj-g1bNhAlaskqNJyTnW7r1a7-ykDp1z-gGvutE=.880fbf28-f484-4ec3-b585-a9301eb2de87@github.com> References: <_DqULj-g1bNhAlaskqNJyTnW7r1a7-ykDp1z-gGvutE=.880fbf28-f484-4ec3-b585-a9301eb2de87@github.com> Message-ID: On Sat, 24 Jan 2026 19:28:52 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch refactors the logic in `MemNode::find_previous_store` and makes a small improvement to `MemNode::detect_ptr_independence`. An IR test accompanies the improvement.
>> >> Please take a look and share your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Add test store the loaded vector Great to have a separate refactoring PR, thanks @merykitty! In addition to adding more tests as @robcasloz suggests, a good stress test is to create an instrumented version of the changeset that runs both the old and new versions and verifies at runtime that there are no regressions (e.g., an optimization that is inhibited by the new changeset by mistake). Then, you can run this instrumented version on a large set of standard tests. It is a bit of work, but I've found it useful on many occasions in the past. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29390#issuecomment-3800427095 From chagedorn at openjdk.org Mon Jan 26 16:19:56 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 Jan 2026 16:19:56 GMT Subject: RFR: 8373898: RepeatCompilation does not repeat compilation after bailout In-Reply-To: <8dPAwPZVqlsfdXKFWOB6MUgJN5wJQqWSYTH_WUnwOpw=.f1a1fe00-5f97-4f6a-a90b-7ddee2cb8b91@github.com> References: <8dPAwPZVqlsfdXKFWOB6MUgJN5wJQqWSYTH_WUnwOpw=.f1a1fe00-5f97-4f6a-a90b-7ddee2cb8b91@github.com> Message-ID: On Mon, 26 Jan 2026 14:39:11 GMT, Marc Chevalier wrote: > Repeat compilation happens here: > > https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/compiler/compileBroker.cpp#L2347-L2355 > > and in `C2Compiler::compile_method`, which does > > https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/opto/c2compiler.cpp#L136-L147 > > In case of failure, `_failure_reason` in `ci_env`/`env` is populated, so the big loop in `compile_method` does not iterate at all and, in particular, no compilation happens. Bailouts in `Compile::Compile` and `Compile::Optimize` check this `_failure_reason`, but also the `Compile` object's own one. The latter is fine since the `Compile` object doesn't survive across compilations, and thus its field is fresh at each iteration in `compile_method`.
This is fine since the `Compile` object doesn't survive across compilation, and thus the field in it is fresh at each iteration in `compile_method`. > > A direct solution is to reset `ci_env._failure_reason` at each iteration of compilation repeat. We could also create a fresh `ci_env` like hinted in the JBS issue, but `ci_env` undergoes a bit of setup. It might be ok not to do this setup for the purpose of repeated compilation, but it's not clear to me. > > Since repeated compilation do not install code (and thus, cannot mark the task as successful), to get consistent values between `ci_env.failing()` and `task->is_success()`, I copy the original failing reason and restore it at the end. > > Using IGV output, we can see that we get as many re-compilation as requested, even when some are (artificially) bailing out. > > Thanks, > Marc That looks reasonable to me. Not sure if it's worth but did you try to come up with a regression test for that? It would probably just mean matching the failure reason for `RepeatCompilation + 1` many times when run with `PrintCompilation`. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29419#pullrequestreview-3706821656 From mchevalier at openjdk.org Mon Jan 26 16:22:08 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 26 Jan 2026 16:22:08 GMT Subject: RFR: 8375272: [IR Framework] Miscellaneous clean-ups [v2] In-Reply-To: <6rGK6c2xEi0Y8Rce1f8kwPqbOxKBf2W7-ToVPO5lXy8=.d33314ea-4c5c-49b7-b8a4-990410db1ffe@github.com> References: <6rGK6c2xEi0Y8Rce1f8kwPqbOxKBf2W7-ToVPO5lXy8=.d33314ea-4c5c-49b7-b8a4-990410db1ffe@github.com> Message-ID: On Fri, 23 Jan 2026 14:52:23 GMT, Christian Hagedorn wrote: >> This patch applies various unrelated clean-ups split out from a first prototype for [JDK-8375270](https://bugs.openjdk.org/browse/JDK-8375270). This should ease reviews. I left some comments in the PR to further explain some details. 
>> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > review Marc & Damon Marked as reviewed by mchevalier (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29362#pullrequestreview-3706832880 From chagedorn at openjdk.org Mon Jan 26 16:22:10 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 Jan 2026 16:22:10 GMT Subject: RFR: 8375272: [IR Framework] Miscellaneous clean-ups [v2] In-Reply-To: <6rGK6c2xEi0Y8Rce1f8kwPqbOxKBf2W7-ToVPO5lXy8=.d33314ea-4c5c-49b7-b8a4-990410db1ffe@github.com> References: <6rGK6c2xEi0Y8Rce1f8kwPqbOxKBf2W7-ToVPO5lXy8=.d33314ea-4c5c-49b7-b8a4-990410db1ffe@github.com> Message-ID: On Fri, 23 Jan 2026 14:52:23 GMT, Christian Hagedorn wrote: >> This patch applies various unrelated clean-ups split out from a first prototype for [JDK-8375270](https://bugs.openjdk.org/browse/JDK-8375270). This should ease reviews. I left some comments in the PR to further explain some details. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > review Marc & Damon Thanks for your review Tobias! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29362#issuecomment-3800445732 From chagedorn at openjdk.org Mon Jan 26 16:26:49 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 Jan 2026 16:26:49 GMT Subject: Integrated: 8375272: [IR Framework] Miscellaneous clean-ups In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 14:51:08 GMT, Christian Hagedorn wrote: > This patch applies various unrelated clean-ups split out from a first prototype for [JDK-8375270](https://bugs.openjdk.org/browse/JDK-8375270). This should ease reviews. I left some comments in the PR to further explain some details. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: bbae38e5 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/bbae38e510efd8877daca5118f45893bb87f6eaa Stats: 165 lines in 15 files changed: 81 ins; 12 del; 72 mod 8375272: [IR Framework] Miscellaneous clean-ups Reviewed-by: mchevalier, dfenacci, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/29362 From epeter at openjdk.org Mon Jan 26 16:36:39 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 Jan 2026 16:36:39 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v17] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 12:17:02 GMT, Jatin Bhateja wrote: >> Add a new Float16lVector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. >> - Add necessary inline expander support. >> - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. >> - Use existing Float16 vector IR and backend support. >> - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. >> >> The idea here is to first be at par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). >> >> The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. >> >> image >> >> Initial RFP[1] was floated on the panama-dev mailing list. >> >> Kindly review the draft PR and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 34 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Clanups > - Refactoring vectorIntrinsics > - Refactoring and cleanups > - Refactoring and cleanups > - Review comments resolutions > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Adding testpoint for JDK-8373574 > - Review comments resolutions > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - ... and 24 more: https://git.openjdk.org/jdk/compare/0f1b96a5...ce5768fa I asked some people internally, and they seem to be _very_ opposed to a new BasicType, because it touches code all across the JVM, as I can also see in your patch. Apparently, they have tried to avoid adding new BasicTypes, and mostly managed to, except for the new `T_FLAT_ELEMENT`. Using `T_SHORT` for `Float16` would be strongly preferred. I think it may be good to ask @fparain @rose00 @iwanowww @vnkozlov if they have opinions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3800525049 From thartmann at openjdk.org Mon Jan 26 16:37:45 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 26 Jan 2026 16:37:45 GMT Subject: [jdk26] RFR: 8375653: C2: CmpUNode::sub is not monotonic In-Reply-To: <7LhAjmvBXDhH1TvnAU7qzB8HwdWpXS9GYfUTKjp4Lgs=.7683ff64-ce00-4e45-b234-552d17058658@github.com> References: <7LhAjmvBXDhH1TvnAU7qzB8HwdWpXS9GYfUTKjp4Lgs=.7683ff64-ce00-4e45-b234-552d17058658@github.com> Message-ID: On Mon, 26 Jan 2026 12:09:38 GMT, Quan Anh Mai wrote: > Hi all, > > This pull request contains a backport of commit [30675faa](https://github.com/openjdk/jdk/commit/30675faa67d1bbb4acc729a841493bb8311416af) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Quan Anh Mai on 26 Jan 2026 and was reviewed by Christian Hagedorn and Marc Chevalier. > > Thanks! Marked as reviewed by thartmann (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/29412#pullrequestreview-3706902392 From psandoz at openjdk.org Mon Jan 26 16:51:30 2026 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 26 Jan 2026 16:51:30 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v17] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 12:17:02 GMT, Jatin Bhateja wrote: >> Add a new Float16lVector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. >> - Add necessary inline expander support. >> - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. >> - Use existing Float16 vector IR and backend support. >> - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. >> >> The idea here is to first be at par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). >> >> The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. >> >> image >> >> Initial RFP[1] was floated on the panama-dev mailing list. >> >> Kindly review the draft PR and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 34 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Clanups > - Refactoring vectorIntrinsics > - Refactoring and cleanups > - Refactoring and cleanups > - Review comments resolutions > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Adding testpoint for JDK-8373574 > - Review comments resolutions > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - ... and 24 more: https://git.openjdk.org/jdk/compare/0f1b96a5...ce5768fa The underlying motivation was to avoid passing two parameters to the vector intrinsics that can get out of sync. Currently, we cannot use `Float16.class` like we can `Integer.class` that describes the vector element type to the intrinsic. Could we use an internal class that acts as a proxy until we can replace it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3800594113 From epeter at openjdk.org Mon Jan 26 16:54:45 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 Jan 2026 16:54:45 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v5] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Mon, 29 Dec 2025 17:39:42 GMT, Bhavana Kilambi wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. 
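Why strict ordering matters can be shown with plain `float` arithmetic, used here only as a stand-in for `Float16`: floating-point addition is not associative, so a pairwise (tree) order, which is one order a vector unit could use, may give a different result than the sequential scalar loop. The class and method names below are illustrative, not taken from the patch.

```java
final class ReductionOrder {
    // Strictly ordered reduction: (((0 + a[0]) + a[1]) + a[2]) + ...
    static float sequentialSum(float[] a) {
        float acc = 0.0f;
        for (float v : a) {
            acc += v; // each step depends on the previous accumulator value
        }
        return acc;
    }

    // Pairwise (tree) reduction over a[lo, hi); assumes hi - lo >= 1.
    static float pairwiseSum(float[] a, int lo, int hi) {
        if (hi - lo == 1) {
            return a[lo];
        }
        int mid = (lo + hi) >>> 1;
        return pairwiseSum(a, lo, mid) + pairwiseSum(a, mid, hi);
    }
}
```

For `{0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE}`, the sequential order absorbs `Float.MIN_NORMAL` into `Float.MAX_VALUE` and ends at `0.0f`, while the pairwise order keeps it and returns `Float.MIN_NORMAL`. Because every step of `sequentialSum` waits on the previous accumulator, a strictly ordered reduction stays a serial dependency chain however the loads are vectorized.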
The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. >> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... 
> > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Address review comments for the JTREG test and microbenchmark > - Merge branch 'master' > - Address review comments > - Fix build failures on Mac > - Address review comments > - Merge 'master' > - 8366444: Add support for add/mul reduction operations for Float16 > > This patch adds mid-end support for vectorized add/mul reduction > operations for half floats. It also includes backend aarch64 support for > these operations. Only vectorization support through autovectorization > is added as VectorAPI currently does not support Float16 vector species. > > Both add and mul reduction vectorized through autovectorization mandate > the implementation to be strictly ordered. The following is how each of > these reductions is implemented for different aarch64 targets - > > For AddReduction : > On Neon only targets (UseSVE = 0): Generates scalarized additions > using the scalar "fadd" instruction for both 8B and 16B vector lengths. > This is because Neon does not provide a direct instruction for computing > strictly ordered floating point add reduction. > > On SVE targets (UseSVE > 0): Generates the "fadda" instruction which > computes add reduction for floating point in strict order. > > For MulReduction : > Both Neon and SVE do not provide a direct instruction for computing > strictly ordered floating point multiply reduction. For vector lengths > of 8B and 16B, a scalarized sequence of scalar "fmul" instructions is > generated and multiply reduction for vector lengths > 16B is not > supported. 
> > Below is the performance of the two newly added microbenchmarks in > Float16OperationsBenchmark.java tested on three different aarch64 > machines and with varying MaxVectorSize - > > Note: On all machines, the score (ops/ms) is compared with the master > branch without this patch which generates a sequence of loads ("ldrsh") > to load the FP16 value into an FPR and a scalar "fadd/fmul" to > add/multiply the loaded value to the running sum/product. The ratios > given below are the ratios between the throughput with this patch and > the throughput without this patch. > Ratio > 1 indicates the performance with this patch is better than the > master branch. > > N1 (UseSVE = 0... I had another quick look. And I was wondering: In my experience, float/double reductions that just add/mul up values (aka simple reductions) generally have no speedups when vectorized. The reason is that no matter if they are scalar or vector, the bottleneck is the latency along the reduction chain. So why do you measure speedups here for `Float16`? Do you have a good explanation? Because memory bandwidth should be even less the problem here, so the effect of latency along the chain has an even bigger weight. What do you think? src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 1929: > 1927: ext(vtmp, T8B, vsrc, vsrc, 6); > 1928: faddh(dst, dst, vtmp); > 1929: if (isQ) { I don't think the `if` should be indented here, right? src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 1940: > 1938: } > 1939: BLOCK_COMMENT("} neon_reduce_add_fp16"); > 1940: } Given that the reduction order is sequential: why do you see any speedup in your benchmarks, comparing scalar to vector performance? How do you explain it? 
I'm just curious ;) ------------- PR Review: https://git.openjdk.org/jdk/pull/27526#pullrequestreview-3706944699 PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2728374969 PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2728381603 From epeter at openjdk.org Mon Jan 26 16:54:46 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 Jan 2026 16:54:46 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v5] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Mon, 26 Jan 2026 16:45:43 GMT, Emanuel Peter wrote: >> Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Address review comments for the JTREG test and microbenchmark >> - Merge branch 'master' >> - Address review comments >> - Fix build failures on Mac >> - Address review comments >> - Merge 'master' >> - 8366444: Add support for add/mul reduction operations for Float16 >> >> This patch adds mid-end support for vectorized add/mul reduction >> operations for half floats. It also includes backend aarch64 support for >> these operations. Only vectorization support through autovectorization >> is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate >> the implementation to be strictly ordered. The following is how each of >> these reductions is implemented for different aarch64 targets - >> >> For AddReduction : >> On Neon only targets (UseSVE = 0): Generates scalarized additions >> using the scalar "fadd" instruction for both 8B and 16B vector lengths. 
>> This is because Neon does not provide a direct instruction for computing >> strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the "fadda" instruction which >> computes add reduction for floating point in strict order. >> >> For MulReduction : >> Both Neon and SVE do not provide a direct instruction for computing >> strictly ordered floating point multiply reduction. For vector lengths >> of 8B and 16B, a scalarized sequence of scalar "fmul" instructions is >> generated and multiply reduction for vector lengths > 16B is not >> supported. >> >> Below is the performance of the two newly added microbenchmarks in >> Float16OperationsBenchmark.java tested on three different aarch64 >> machines and with varying MaxVectorSize - >> >> Note: On all machines, the score (ops/ms) is compared with the master >> branch without this patch which generates a sequence of loads ("ldrsh") >> to load the FP16 value into an FPR and a scalar "fadd/fmul" to >> add/multiply the loaded value to the running sum/product. The ratios >> given below are the ratios between the throughput with this patch and >> the throughput without this patch. >> Ratio > 1 indicate... > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 1940: > >> 1938: } >> 1939: BLOCK_COMMENT("} neon_reduce_add_fp16"); >> 1940: } > > Given that the reduction order is sequential: why do you see any speedup in your benchmarks, comparing scalar to vector performance? How do you explain it? I'm just curious ;) Also: why not allow a vector with only 2 elements? Is there some restriction here? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2728407382 From liach at openjdk.org Mon Jan 26 17:16:13 2026 From: liach at openjdk.org (Chen Liang) Date: Mon, 26 Jan 2026 17:16:13 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v4] In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 03:27:06 GMT, Quan Anh Mai wrote: >> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: >> >> - Move test, fix merge garbage >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const >> - Typo >> - assert >> - refactorings >> - Typo >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const >> - Cleanup >> - identity hash support in C2 >> - ... and 2 more: https://git.openjdk.org/jdk/compare/96fd00b6...67a3954f > > src/hotspot/share/ci/ciArray.cpp line 93: > >> 91: // Returns T_ILLEGAL if there is no element at the given index. >> 92: ciConstant ciArray::element_value(int index) { >> 93: assert(index >= 0, "out-of-bounds index: %d", index); > > IIUC, this is because you use `-1` as the offset for hashcode, so you need to make sure we are accessing a real element here, or the cache access will return something dubious. I think it is then more uniform to save the value at the cache using the offset instead of the element index. I think I will not touch ciArray, but will instead add a comment on the ConstantValue `_off` variable saying that it is just a key.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2728496636 From mchevalier at openjdk.org Mon Jan 26 17:17:23 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 26 Jan 2026 17:17:23 GMT Subject: RFR: 8373898: RepeatCompilation does not repeat compilation after bailout In-Reply-To: <8dPAwPZVqlsfdXKFWOB6MUgJN5wJQqWSYTH_WUnwOpw=.f1a1fe00-5f97-4f6a-a90b-7ddee2cb8b91@github.com> References: <8dPAwPZVqlsfdXKFWOB6MUgJN5wJQqWSYTH_WUnwOpw=.f1a1fe00-5f97-4f6a-a90b-7ddee2cb8b91@github.com> Message-ID: On Mon, 26 Jan 2026 14:39:11 GMT, Marc Chevalier wrote: > Repeat compilation happens here: > > https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/compiler/compileBroker.cpp#L2347-L2355 > > and in `C2Compiler::compile_method` which does > > https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/opto/c2compiler.cpp#L136-L147 > > In case of failure `_failure_reason` in `ci_env`/`env` is populated, and so the big loop in `compile_method` makes no iteration, and in particular, no compilation happens. Bailouts in `Compile::Compile` and `Compile::Optimize` checks this `_failure_reason`, but also the `Compile` object's one. This is fine since the `Compile` object doesn't survive across compilation, and thus the field in it is fresh at each iteration in `compile_method`. > > A direct solution is to reset `ci_env._failure_reason` at each iteration of compilation repeat. We could also create a fresh `ci_env` like hinted in the JBS issue, but `ci_env` undergoes a bit of setup. It might be ok not to do this setup for the purpose of repeated compilation, but it's not clear to me. > > Since repeated compilation do not install code (and thus, cannot mark the task as successful), to get consistent values between `ci_env.failing()` and `task->is_success()`, I copy the original failing reason and restore it at the end. 
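The reset-and-restore idea described above can be modeled in a few lines of Java. This is a hypothetical model, not HotSpot code: `Env` stands in for `ciEnv`, `compileOnce` mimics a `compile_method` call that does nothing once a failure reason is recorded, and the first call plays the role of the real compilation that installs code.

```java
import java.util.function.IntPredicate;

final class RepeatCompilationModel {
    // Hypothetical stand-in for ciEnv: only the failure reason is modeled.
    static final class Env {
        String failureReason; // null means "not failing"
        int attempts;         // compilations that actually ran
        boolean failing() { return failureReason != null; }
    }

    // Mimics one compile_method call: once a failure is recorded, nothing runs.
    static void compileOnce(Env env, boolean bailOut) {
        if (env.failing()) {
            return;
        }
        env.attempts++;
        if (bailOut) {
            env.failureReason = "artificial bailout";
        }
    }

    // Real compilation (attempt 0) followed by `repeat` non-installing repeats.
    static void compileWithRepeats(Env env, int repeat, IntPredicate bailsOut) {
        compileOnce(env, bailsOut.test(0));
        String original = env.failureReason; // outcome of the real compilation
        for (int i = 1; i <= repeat; i++) {
            env.failureReason = null;         // reset so the repeat actually runs
            compileOnce(env, bailsOut.test(i));
        }
        env.failureReason = original;         // keep failing() consistent with the real result
    }
}
```

Without the reset line, a bailout in the real compilation would leave `attempts` at 1 however large `repeat` is, which mirrors the reported symptom.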
> > Using IGV output, we can see that we get as many re-compilation as requested, even when some are (artificially) bailing out. > > Thanks, > Marc It's actually not that easy to reliably cause a bailout and observe it. PrintCompilation doesn't report bailouts, for instance. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29419#issuecomment-3800718564 From duke at openjdk.org Mon Jan 26 17:24:56 2026 From: duke at openjdk.org (Ryan Hallock) Date: Mon, 26 Jan 2026 17:24:56 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v4] In-Reply-To: References: Message-ID: <_aldMUgRlZczkrjTXtd7k1EVzaM3OzLoLJEPWAd-k6Q=.80ebeb4b-8181-4646-ac1a-73aeba1dbbdc@github.com> On Mon, 15 Dec 2025 19:49:52 GMT, Chen Liang wrote: >> Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Move test, fix merge garbage > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const > - Typo > - assert > - refactorings > - Typo > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const > - Cleanup > - identity hash support in C2 > - ... and 2 more: https://git.openjdk.org/jdk/compare/f449918c...67a3954f Would this PR allow the removal of the stable hash in Enum#hashCode? Maybe that should be a follow-up.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3800754288 From krk at openjdk.org Mon Jan 26 17:31:38 2026 From: krk at openjdk.org (Kerem Kat) Date: Mon, 26 Jan 2026 17:31:38 GMT Subject: RFR: 8356184: C2 MemorySegment: long RangeCheck with ConvI2L(iv + invar) prevents RCE [v2] In-Reply-To: References: Message-ID: <83UkxSGXfp5AcTeOZ6snL_Woj0aEL3Saiok_UzYD7hc=.f167532d-5d98-4d88-b06b-56a9c7934114@github.com> > `MemorySegment` bounds checks use long arithmetic, but when accessing with an int loop variable plus an int invariant offset, the pattern `ConvI2L(iv + invar)` was not recognized by Range Check Elimination. This prevented RCE and consequently blocked vectorization for common `MemorySegment` access patterns. > > The fix teaches `is_scaled_iv_plus_offset` to recognize linear int expressions inside `ConvI2L`. A new `short_offset` flag signals that the offset is part of int arithmetic (not added separately in long), requiring the range to be clamped at `max_jint + 1` to correctly handle potential int overflow. This also removes pre-existing dead code where an `exp_bt != bt` check was intended to bail out on such patterns but never actually executed. > > With this change, `MemorySegment` loops using int invariant offsets now benefit from RCE and vectorization, matching the behavior already supported for long invariant offsets. 
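The overflow that the `max_jint + 1` clamp guards against is visible at the Java level. Widening after an int addition (the `ConvI2L(AddI(iv, invar))` shape) and adding in long arithmetic agree only while `i + offset` stays within int range. A minimal sketch with illustrative method names:

```java
final class IntOffsetOverflow {
    // Mirrors ConvI2L(AddI(iv, invar)): add in int (may wrap), then widen to long.
    static long widenAfterAdd(int i, int offset) {
        return (long) (i + offset);
    }

    // Mirrors AddL(ConvI2L(iv), ConvI2L(invar)): widen first, then add in long.
    static long addAfterWiden(int i, int offset) {
        return (long) i + (long) offset;
    }
}
```

With `i = 1` and `offset = Integer.MAX_VALUE`, the first form wraps to `Integer.MIN_VALUE` while the second yields `2147483648L`, so a range derived from the long form would be wrong for the int form unless it is clamped.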
> > > void process(MemorySegment segment, int offset, int size) { > for (int i = 0; i < size; i++) { > long addr = i + offset; // ConvI2L(AddI(iv, offset)) was not recognized > segment.set(JAVA_BYTE, addr, (byte) 0); > } > } Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: Remove IR rules from TestMemorySegment methods where vectorization depends on backing store type ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29392/files - new: https://git.openjdk.org/jdk/pull/29392/files/2cb35a3b..23eff0db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29392&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29392&range=00-01 Stats: 14 lines in 1 file changed: 0 ins; 6 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/29392.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29392/head:pull/29392 PR: https://git.openjdk.org/jdk/pull/29392 From chagedorn at openjdk.org Mon Jan 26 17:32:56 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 Jan 2026 17:32:56 GMT Subject: RFR: 8376174: [IR Framework] Refactor Test VM socket communication In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 17:11:03 GMT, Christian Hagedorn wrote: > This is the next patch in the series to replace the hotspot-pid-file-based IR dumps with socket communication (see [JDK-8375271](https://bugs.openjdk.org/browse/JDK-8375271)). > > This patch cleans up the current socket communication implementation without changing the semantics. We still process the messages in the same way but I refactored the logic into separate classes instead of having everything in the `TestFrameworkSocket` class. > > This patch addresses the following (also see GitHub code comments): > - Changes to `TestFrameworkSocket`: > - Split logic to send messages from Test VM into separate `TestVmSocket` class. > - Move message tags from `TestFrameworkSocket` to new `MessageTag` class. 
> - Introduce new `JavaMessages` class to wrap the sent messages. This will later be further expanded. Note that I choose the name "Java" to later distinguish between messages sent by C2. > - Introduce new `TestVmMessageReader` to parse the received messages. This will later be further expanded. > - Introduce new `TestVMData` class to hold all interesting information received from the Test VM. This class will also be further expanded later. > > Thanks, > Christian test/hotspot/jtreg/compiler/lib/ir_framework/driver/TestVMProcess.java line 183: > 181: * represent the Applicable IR Rules used for IR matching later. > 182: */ > 183: private void processSocketOutput(TestFrameworkSocket socket) { Moved to `TestVmData::processOutput()`. test/hotspot/jtreg/compiler/lib/ir_framework/shared/TestFrameworkSocket.java line 47: > 45: public static final String DEFAULT_REGEX_TAG = "[DEFAULT_REGEX]"; > 46: public static final String PRINT_TIMES_TAG = "[PRINT_TIMES]"; > 47: public static final String NOT_COMPILABLE_TAG = "[NOT_COMPILABLE]"; Dropped `DEFAULT_REGEX_TAG` and `NOT_COMPILABLE_TAG`: I'm not sure they add much benefit, since the messages themselves already explain what they are about. Additionally, some messages were already sent without tags before. So, I think it's fine to reduce the number of tags. test/hotspot/jtreg/compiler/lib/ir_framework/test/TestVM.java line 162: > 160: public static void main(String[] args) { > 161: try { > 162: TestVmSocket.connect(); I now just connect unconditionally; I think the overhead is negligible. When IR matching, we will always use the socket. This makes the handling easier. test/hotspot/jtreg/compiler/lib/ir_framework/test/TestVM.java line 863: > 861: } > 862: if (testFilterPresent) { > 863: TestVmSocket.send(MessageTag.TEST_LIST + "Run " + test.toString()); Appending the message tag to the actual message before calling `send()` does not look very nice. But I will refactor this again later, so this is only a temporary state.
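One possible shape for keeping the tag separate from the payload, instead of concatenating it at the call site, is sketched below. Everything here is hypothetical (only the `[PRINT_TIMES]` string appears in the excerpt above); the real `MessageTag`, `TestVmSocket`, and `TestVmMessageReader` classes may look quite different.

```java
final class TaggedMessages {
    // Hypothetical tag set; the actual prefix used for TEST_LIST is an assumption.
    enum Tag {
        TEST_LIST("[TEST_LIST]"),
        PRINT_TIMES("[PRINT_TIMES]");

        final String prefix;

        Tag(String prefix) {
            this.prefix = prefix;
        }
    }

    // A decoded message; tag is null for untagged messages.
    record Message(Tag tag, String body) {}

    // The tag is a separate argument, so callers never build "tag + body" themselves.
    static String encode(Tag tag, String body) {
        return tag.prefix + body;
    }

    // Reader side: strip a known prefix, or keep the line as an untagged message.
    static Message decode(String line) {
        for (Tag t : Tag.values()) {
            if (line.startsWith(t.prefix)) {
                return new Message(t, line.substring(t.prefix.length()));
            }
        }
        return new Message(null, line);
    }
}
```

Keeping encoding and decoding next to the tag definitions means the sender and the reader cannot disagree about the prefix format.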
But I will later refactor this again, so this is only a temporary state. test/hotspot/jtreg/compiler/lib/ir_framework/test/network/TestVmSocket.java line 63: > 61: } > 62: > 63: public static void connect() { Now that we unconditionally connect, we can have the old exception code from `TestFrameworkSocket::write()` here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29426#discussion_r2728495108 PR Review Comment: https://git.openjdk.org/jdk/pull/29426#discussion_r2728500824 PR Review Comment: https://git.openjdk.org/jdk/pull/29426#discussion_r2728513598 PR Review Comment: https://git.openjdk.org/jdk/pull/29426#discussion_r2728530905 PR Review Comment: https://git.openjdk.org/jdk/pull/29426#discussion_r2728521230 From chagedorn at openjdk.org Mon Jan 26 17:32:52 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 Jan 2026 17:32:52 GMT Subject: RFR: 8376174: [IR Framework] Refactor Test VM socket communication Message-ID: This is the next patch in the series to replace the hotspot-pid-file-based IR dumps with socket communication (see [JDK-8375271](https://bugs.openjdk.org/browse/JDK-8375271)). This patch cleans up the current socket communication implementation without changing the semantics. We still process the messages in the same way but I refactored the logic into separate classes instead of having everything in the `TestFrameworkSocket` class. This patch addresses the following (also see GitHub code comments): - Changes to `TestFrameworkSocket`: - Split logic to send messages from Test VM into separate `TestVmSocket` class. - Move message tags from `TestFrameworkSocket` to new `MessageTag` class. - Introduce new `JavaMessages` class to wrap the sent messages. This will later be further expanded. Note that I choose the name "Java" to later distinguish between messages sent by C2. - Introduce new `TestVmMessageReader` to parse the received messages. This will later be further expanded. 
- Introduce new `TestVMData` class to hold all interesting information received from the Test VM. This class will also be further expanded later. Thanks, Christian ------------- Commit messages: - update comment - 8376174: [IR Framework] Refactor Test VM socket communication Changes: https://git.openjdk.org/jdk/pull/29426/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29426&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8376174 Stats: 585 lines in 14 files changed: 380 ins; 142 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/29426.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29426/head:pull/29426 PR: https://git.openjdk.org/jdk/pull/29426 From liach at openjdk.org Mon Jan 26 17:40:19 2026 From: liach at openjdk.org (Chen Liang) Date: Mon, 26 Jan 2026 17:40:19 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v4] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 19:49:52 GMT, Chen Liang wrote: >> Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Move test, fix merge garbage > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const > - Typo > - assert > - refactorings > - Typo > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const > - Cleanup > - identity hash support in C2 > - ... 
and 2 more: https://git.openjdk.org/jdk/compare/8652dc25...67a3954f No, this only covers the case where a constant enum is passed. If an arbitrary enum constant is passed, we still go through some arithmetic, which is slower. But this difference is probably small enough to warrant the removal. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3800819255 From rrich at openjdk.org Mon Jan 26 17:41:26 2026 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 26 Jan 2026 17:41:26 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v2] In-Reply-To: References: Message-ID: <_znDzqadbEQ-gSZ_0rTo1fW654LmvPqXYFbkV4uahwk=.465ce48d-1fcc-42c7-bd07-cd5e2ea86422@github.com> On Wed, 21 Jan 2026 13:11:52 GMT, David Briemann wrote: >> Adds the following mach nodes: >> match(Set dst (CMoveF (Binary cop (CmpF op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveD (Binary cop (CmpD op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveF (Binary cop (CmpD op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveD (Binary cop (CmpF op1 op2)) (Binary src1 src2))); > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > address review comments Hi David, nice work! ...with just a few rough edges ;) I haven't yet finished the review but I still wanted to send you the comments I collected so far. Cheers, Richard. src/hotspot/cpu/ppc/c2_MacroAssembler_ppc.cpp line 669: > 667: // Works for single and double precision floats. > 668: // dst = (op1 cmp(cc) op2) ? src1 : src2; > 669: void C2_MacroAssembler::cmovF(int cc, VectorSRegister dst, VectorSRegister op1, VectorSRegister op2, I've had a pretty hard time understanding the usage of `cc`. I found that its value comes from `operand cmpOp` in the ppc.ad file. The `cmpOp` values are meant to be used for encoding the `BO` and `BI` fields of instructions that have them (with `bc`, aka Branch Conditional, as a prominent example).
The encoding is rather difficult to understand. Luckily, the instructions used here don't have `BO` or `BI` fields. I'd suggest using `BoolTest::mask` directly and mapping its values to the appropriate instructions (swapping operands if necessary). I think you get the `BoolTest::mask` by replacing `$cop$$cmpcode` with `$cop$$constant` (see also `to_assembler_cond` on aarch64). I'd expect this to make the implementation a lot easier to understand. (I'm too embarrassed to tell how long it took me to understand this version ;)) src/hotspot/cpu/ppc/c2_MacroAssembler_ppc.cpp line 674: > 672: VectorSRegister second = src2; > 673: int exchange = (~cc) & 8; > 674: if (exchange) { hotspot-style.md suggests "Avoid implicit conversions to bool". ------------- PR Review: https://git.openjdk.org/jdk/pull/29281#pullrequestreview-3698655184 PR Review Comment: https://git.openjdk.org/jdk/pull/29281#discussion_r2728567004 PR Review Comment: https://git.openjdk.org/jdk/pull/29281#discussion_r2722123267 From sviswanathan at openjdk.org Mon Jan 26 17:46:42 2026 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 26 Jan 2026 17:46:42 GMT Subject: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v12] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 02:34:30 GMT, Xiaohong Gong wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > LGTM! Thanks for the update! > Hi @XiaohongGong, your comments have been addressed. Hi @sviswa7, could you kindly review the x86 part? Thanks @jatin-bhateja. I will take a look next week.
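Richard's suggestion in the PPC64 CMove review (dispatch on `BoolTest::mask` directly and swap the compare operands where necessary) can be illustrated with a small Java reference model of `dst = (op1 cmp op2) ? src1 : src2`. This is a sketch under stated assumptions, not HotSpot code: `CMoveModel` and its `Cond` enum are hypothetical stand-ins for the node and for `BoolTest::mask`, the comparisons use ordinary Java semantics, and C2's own handling of unordered `CmpF`/`CmpD` results is not modeled. It shows why rewriting `gt`/`ge` as `lt`/`le` by swapping the compare operands is safe with NaN inputs, while negating the condition and swapping the sources is not.

```java
// Reference model (not HotSpot code) of dst = (op1 cmp op2) ? src1 : src2.
final class CMoveModel {
    // Hypothetical stand-in for C2's BoolTest::mask.
    enum Cond { EQ, NE, LT, LE, GT, GE }

    // Direct evaluation with ordinary Java comparison semantics.
    static double direct(Cond c, double op1, double op2, double src1, double src2) {
        boolean taken = switch (c) {
            case EQ -> op1 == op2;
            case NE -> op1 != op2;
            case LT -> op1 < op2;
            case LE -> op1 <= op2;
            case GT -> op1 > op2;
            case GE -> op1 >= op2;
        };
        return taken ? src1 : src2;
    }

    // GT/GE rewritten as LT/LE by swapping the compare operands,
    // so only EQ, NE, LT and LE need a "native" form.
    static double normalized(Cond c, double op1, double op2, double src1, double src2) {
        if (c == Cond.GT) return direct(Cond.LT, op2, op1, src1, src2);
        if (c == Cond.GE) return direct(Cond.LE, op2, op1, src1, src2);
        return direct(c, op1, op2, src1, src2);
    }

    // Incorrect alternative: negate the condition and swap the sources.
    // !(a <= b) differs from (a > b) when either input is NaN.
    static double negated(Cond c, double op1, double op2, double src1, double src2) {
        if (c == Cond.GT) return direct(Cond.LE, op1, op2, src2, src1);
        if (c == Cond.GE) return direct(Cond.LT, op1, op2, src2, src1);
        return direct(c, op1, op2, src1, src2);
    }

    public static void main(String[] args) {
        double[] vals = { -1.0, 0.0, 2.5, Double.NaN };
        for (Cond c : Cond.values())
            for (double a : vals)
                for (double b : vals)
                    if (direct(c, a, b, 1.0, 2.0) != normalized(c, a, b, 1.0, 2.0))
                        throw new AssertionError(c + " " + a + " " + b);
        // NaN > 0.0 is false, so the correct result is src2 (2.0);
        // the negated form wrongly picks src1 (1.0).
        System.out.println(direct(Cond.GT, Double.NaN, 0.0, 1.0, 2.0));  // 2.0
        System.out.println(negated(Cond.GT, Double.NaN, 0.0, 1.0, 2.0)); // 1.0
    }
}
```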
------------- PR Comment: https://git.openjdk.org/jdk/pull/24104#issuecomment-3800844163 From qamai at openjdk.org Mon Jan 26 18:27:02 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 26 Jan 2026 18:27:02 GMT Subject: [jdk26] RFR: 8375653: C2: CmpUNode::sub is not monotonic [v2] In-Reply-To: <7LhAjmvBXDhH1TvnAU7qzB8HwdWpXS9GYfUTKjp4Lgs=.7683ff64-ce00-4e45-b234-552d17058658@github.com> References: <7LhAjmvBXDhH1TvnAU7qzB8HwdWpXS9GYfUTKjp4Lgs=.7683ff64-ce00-4e45-b234-552d17058658@github.com> Message-ID: <0aTmdsHwgUi6fQOr6J-D-9a73aOAodOVEjqy7QBsQJY=.a6fa9a07-5cc6-4738-a5cb-f425b1098b34@github.com> > Hi all, > > This pull request contains a backport of commit [30675faa](https://github.com/openjdk/jdk/commit/30675faa67d1bbb4acc729a841493bb8311416af) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Quan Anh Mai on 26 Jan 2026 and was reviewed by Christian Hagedorn and Marc Chevalier. > > Thanks! Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Fix test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29412/files - new: https://git.openjdk.org/jdk/pull/29412/files/806a2feb..40fc3ff2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29412&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29412&range=00-01 Stats: 8 lines in 1 file changed: 0 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/29412.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29412/head:pull/29412 PR: https://git.openjdk.org/jdk/pull/29412 From liach at openjdk.org Mon Jan 26 18:32:47 2026 From: liach at openjdk.org (Chen Liang) Date: Mon, 26 Jan 2026 18:32:47 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v7] In-Reply-To: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> References: 
<56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> Message-ID: On Thu, 18 Dec 2025 00:03:10 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would greatly benefit some of our classes that already use a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg, who requested Optional folding, for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Move the test to a core library purposed directory Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3801082528 From liach at openjdk.org Mon Jan 26 18:33:18 2026 From: liach at openjdk.org (Chen Liang) Date: Mon, 26 Jan 2026 18:33:18 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v5] In-Reply-To: References: Message-ID: > Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word.
We can add an explicit bypass to load an existing hash eagerly instead. Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: - Copyright year, code style improvements - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const - Move test, fix merge garbage - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const - Typo - assert - refactorings - Typo - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const - ... and 4 more: https://git.openjdk.org/jdk/compare/a4c5fb85...e33f01e0 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28589/files - new: https://git.openjdk.org/jdk/pull/28589/files/67a3954f..e33f01e0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28589&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28589&range=03-04 Stats: 95520 lines in 3917 files changed: 47645 ins; 16634 del; 31241 mod Patch: https://git.openjdk.org/jdk/pull/28589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28589/head:pull/28589 PR: https://git.openjdk.org/jdk/pull/28589 From liach at openjdk.org Mon Jan 26 18:33:21 2026 From: liach at openjdk.org (Chen Liang) Date: Mon, 26 Jan 2026 18:33:21 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v4] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 17:13:08 GMT, Chen Liang wrote: >> src/hotspot/share/ci/ciArray.cpp line 93: >> >>> 91: // Returns T_ILLEGAL if there is no element at the given index. 
>>> 92: ciConstant ciArray::element_value(int index) { >>> 93: assert(index >= 0, "out-of-bounds index: %d", index); >> >> IIUC, this is because you use `-1` as the offset for hashcode, so you need to make sure we are accessing a real element here, or the cache access will return something dubious. I think it is then more uniform to save the value at the cache using the offset instead of the element index. > > I think I will not touch ciArray but rather add a comment on the ConstantValue `_off` variable saying that it is just a key. I decided to rename ConstantValue's `off` to `key`, which should reduce confusion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2728758977 From liach at openjdk.org Mon Jan 26 18:33:22 2026 From: liach at openjdk.org (Chen Liang) Date: Mon, 26 Jan 2026 18:33:22 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v4] In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 03:19:14 GMT, Quan Anh Mai wrote: >> src/hotspot/share/ci/ciObject.hpp line 76: >> >>> 74: }; >>> 75: >>> 76: const int IDENTITY_HASH_OFFSET = -1; >> >> `const` is fine, but `constexpr` is often preferred. Also, is `static` needed here? Another nitpick is that constants are usually not in uppercase in C++, as macros are often in uppercase. > > It is also useful to note what this value is. It is not clear at first glance why the offset is -1 here. I see GC code now uses `static constexpr int lowerCamelCase = ...`, so I will use that here too.
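The premise of 8372845 is that an object's identity hash never changes once it has been assigned, so an already-existing hash of a constant object can be loaded eagerly and treated as a constant. The Java sketch below only demonstrates that language-level invariant with a constant key in an identity-keyed table; it does not exercise or verify any C2 folding, and `IdentityHashDemo`/`KEY` are hypothetical names.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class IdentityHashDemo {
    // Hypothetical constant key: a static final object is the kind of value
    // the JIT can treat as a compile-time constant.
    static final Object KEY = new Object();

    static final Map<Object, String> TABLE = new IdentityHashMap<>();
    static {
        TABLE.put(KEY, "value");
    }

    // IdentityHashMap locates KEY via System.identityHashCode(KEY); if that
    // call were folded for a constant KEY, the hash would not need to be
    // loaded from the object header at run time.
    static String lookup() {
        return TABLE.get(KEY);
    }

    public static void main(String[] args) {
        int first = System.identityHashCode(KEY);
        // Once assigned, the identity hash never changes...
        for (int i = 0; i < 1_000; i++) {
            if (System.identityHashCode(KEY) != first) throw new AssertionError();
        }
        // ...and Object.hashCode() returns it when not overridden.
        if (KEY.hashCode() != first) throw new AssertionError();
        System.out.println(lookup()); // value
    }
}
```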
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2728758052 From liach at openjdk.org Mon Jan 26 18:35:22 2026 From: liach at openjdk.org (Chen Liang) Date: Mon, 26 Jan 2026 18:35:22 GMT Subject: Integrated: 8372696: Allow boot classes to explicitly opt-in for final field trusting In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 16:16:05 GMT, Chen Liang wrote: > Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. > > They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. > > We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would greatly benefit some of our classes that already use a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. > > Paging @minborg, who requested Optional folding, for review. > > I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. This pull request has now been integrated.
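The distinction in the 8372696 description (a `@Stable` field holding its default value is treated as non-constant, whereas a trusted final field can be folded whatever its value) can be summarized with a small Java model. This is a simplified sketch of the rule as stated in the PR description, not HotSpot's `ciField` logic; `FoldRuleModel`, `FieldKind` and `canFold` are hypothetical names, and only a few default values are modeled.

```java
final class FoldRuleModel {
    // Hypothetical classification of a field as seen by the compiler.
    enum FieldKind { UNTRUSTED_FINAL, STABLE, TRUSTED_FINAL }

    // Simplified default-value check: only null, false and int zero are modeled.
    static boolean isDefault(Object value) {
        return value == null || Boolean.FALSE.equals(value) || Integer.valueOf(0).equals(value);
    }

    // Could the current field value be treated as a compile-time constant?
    static boolean canFold(FieldKind kind, Object value) {
        return switch (kind) {
            case UNTRUSTED_FINAL -> false;     // value may still change, e.g. via reflection
            case STABLE -> !isDefault(value);  // a default value reads as "not yet set"
            case TRUSTED_FINAL -> true;        // null, 0 and false fold too
        };
    }

    public static void main(String[] args) {
        // An Optional.empty()-style instance stores null in its value field.
        System.out.println(canFold(FieldKind.STABLE, null));        // false
        System.out.println(canFold(FieldKind.TRUSTED_FINAL, null)); // true
    }
}
```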
Changeset: 3220c4cb Author: Chen Liang URL: https://git.openjdk.org/jdk/commit/3220c4cb431a2c4eb8bb2d60f0d5046e40af69bd Stats: 168 lines in 13 files changed: 154 ins; 13 del; 1 mod 8372696: Allow boot classes to explicitly opt-in for final field trusting Reviewed-by: jvernee, jrose, alanb ------------- PR: https://git.openjdk.org/jdk/pull/28540 From chagedorn at openjdk.org Mon Jan 26 19:33:56 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 Jan 2026 19:33:56 GMT Subject: RFR: 8366861: Phase AFTER_LOOP_OPTS printed even though the method has no loops [v2] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 15:32:29 GMT, Saranya Natarajan wrote: >> **Issue** >> When a test program with no loops is run with the flag `-XX:PrintPhaseLevel=3`, the output prints `PHASE_AFTER_LOOP_OPTS` but does not print `PHASE_BEFORE_LOOP_OPTS`. >> >> **Solution** >> This simple fix introduces a variable `_print_phase_loop_opts` that caches the value of `has_loops()` before the loop opts pass, to make sure that `PHASE_AFTER_LOOP_OPTS` gets printed every time `PHASE_BEFORE_LOOP_OPTS` is printed. >> >> Before fix output: >> [image] >> >> After fix output: >> [image] >> >> >> **Testing** >> GitHub Actions, Tier 1-3 > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > fixing comment Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29387#pullrequestreview-3707594288 From liach at openjdk.org Mon Jan 26 19:37:06 2026 From: liach at openjdk.org (Chen Liang) Date: Mon, 26 Jan 2026 19:37:06 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v6] In-Reply-To: References: Message-ID: > Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674).
Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Use UpperCamelCase ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28589/files - new: https://git.openjdk.org/jdk/pull/28589/files/e33f01e0..77ec309b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28589&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28589&range=04-05 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28589/head:pull/28589 PR: https://git.openjdk.org/jdk/pull/28589 From liach at openjdk.org Mon Jan 26 23:44:44 2026 From: liach at openjdk.org (Chen Liang) Date: Mon, 26 Jan 2026 23:44:44 GMT Subject: RFR: 8376422: Run compiler/corelibs/OptionalFold.java with tiered compilation Message-ID: This new test is failing with `JTREG=VM_OPTIONS=-XX:-TieredCompilation`. Force enable tiered compilation for this test for now, as this is failing in the CI. ------------- Commit messages: - 8376422 Changes: https://git.openjdk.org/jdk/pull/29435/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29435&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8376422 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29435.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29435/head:pull/29435 PR: https://git.openjdk.org/jdk/pull/29435 From dholmes at openjdk.org Tue Jan 27 00:14:09 2026 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Jan 2026 00:14:09 GMT Subject: RFR: 8376422: Run compiler/corelibs/OptionalFold.java with tiered compilation In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 23:35:27 GMT, Chen Liang wrote: > This new test is failing with `JTREG=VM_OPTIONS=-XX:-TieredCompilation`. 
Force enable tiered compilation for this test for now, as this is failing in the CI. Seems reasonable to fix CI failures. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29435#pullrequestreview-3708584355 From liach at openjdk.org Tue Jan 27 00:18:00 2026 From: liach at openjdk.org (Chen Liang) Date: Tue, 27 Jan 2026 00:18:00 GMT Subject: RFR: 8376422: Run compiler/corelibs/OptionalFold.java with tiered compilation In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 23:35:27 GMT, Chen Liang wrote: > This new test is failing with `JTREG=VM_OPTIONS=-XX:-TieredCompilation`. Force enable tiered compilation for this test for now, as this is failing in the CI. Thanks for this swift review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29435#issuecomment-3802408322 From liach at openjdk.org Tue Jan 27 00:18:00 2026 From: liach at openjdk.org (Chen Liang) Date: Tue, 27 Jan 2026 00:18:00 GMT Subject: Integrated: 8376422: Run compiler/corelibs/OptionalFold.java with tiered compilation In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 23:35:27 GMT, Chen Liang wrote: > This new test is failing with `JTREG=VM_OPTIONS=-XX:-TieredCompilation`. Force enable tiered compilation for this test for now, as this is failing in the CI. This pull request has now been integrated. 
Changeset: fdcc122a Author: Chen Liang URL: https://git.openjdk.org/jdk/commit/fdcc122a9db2f6fdeb014e9e731cd3992bb3d0f3 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8376422: Run compiler/corelibs/OptionalFold.java with tiered compilation Reviewed-by: dholmes ------------- PR: https://git.openjdk.org/jdk/pull/29435 From erfang at openjdk.org Tue Jan 27 02:12:39 2026 From: erfang at openjdk.org (Eric Fang) Date: Tue, 27 Jan 2026 02:12:39 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v3] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 09:49:20 GMT, Andrew Haley wrote: > > Hi @theRealAph, I have addressed all of your comments, thanks for your suggestions. > > > Please add the TypeNNVector JMH test files to this PR. > > > > > > These micro-benchmarks are already in the `panama-vector` project; do we need to sync that file separately into OpenJDK? > > Ideally, yes. It's a bit much to expect a reviewer to check out another repo. Yeah, I agree. VectorAPI is still in the incubator phase, so some code is first merged into the `panama-vector` project and then periodically synced to `OpenJDK`. Therefore, when the relevant JMH benchmark already exists in `panama-vector`, we sometimes don't add duplicates in PRs targeting `OpenJDK`, but this makes code review somewhat inconvenient. But could we use a separate PR for this sync? Otherwise, we might import dozens or even hundreds of files into this PR, which I think would be difficult to review. Perhaps I should ask @PaulSandoz for his opinion on this issue. I'd really appreciate hearing your thoughts on it.
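As background for the 8372980 discussion, the scalar semantics that an unsigned min/max lane reduction has to reproduce can be sketched in plain Java. This is illustrative only, not the Vector API implementation or the JMH benchmark code, and `UnsignedReduce` is a hypothetical name; it uses only `Integer.compareUnsigned` and the usual trick of flipping the sign bit to map unsigned order onto signed order. The starting values (`-1`, i.e. `0xFFFFFFFF`, for unsigned min and `0` for unsigned max) are the neutral elements of the two operations.

```java
final class UnsignedReduce {
    // Unsigned min over all lanes; the neutral element is 0xFFFFFFFF (-1 signed).
    static int uminReduce(int[] lanes) {
        int acc = -1;
        for (int x : lanes) {
            if (Integer.compareUnsigned(x, acc) < 0) acc = x;
        }
        return acc;
    }

    // Unsigned max over all lanes; the neutral element is 0. XOR with
    // Integer.MIN_VALUE flips the sign bit, turning unsigned order into
    // signed order, so Math.max can be reused and the flip undone afterwards.
    static int umaxReduce(int[] lanes) {
        int acc = 0;
        for (int x : lanes) {
            acc = Math.max(acc ^ Integer.MIN_VALUE, x ^ Integer.MIN_VALUE) ^ Integer.MIN_VALUE;
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] lanes = { 5, -1, 0x7FFFFFFF, 3 };
        // -1 is 0xFFFFFFFF, the largest unsigned 32-bit value.
        System.out.println(Integer.toUnsignedString(uminReduce(lanes))); // 3
        System.out.println(Integer.toUnsignedString(umaxReduce(lanes))); // 4294967295
        // A signed min over the same lanes gives a different answer.
        System.out.println(Math.min(Math.min(5, -1), Math.min(0x7FFFFFFF, 3))); // -1
    }
}
```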
------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3802689436 From erfang at openjdk.org Tue Jan 27 02:35:50 2026 From: erfang at openjdk.org (Eric Fang) Date: Tue, 27 Jan 2026 02:35:50 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v8] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 08:14:10 GMT, Eric Fang wrote: >> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be of an integer type, so for floating-point masks, the compiler will automatically cast the mask to the corresponding integer mask type before doing the mask operation. This kind of cast is very common. >> >> If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code; otherwise, code will be generated to extend or narrow the mask. This IR node is not free regardless of whether it generates code, because it may block some optimizations. For example: >> 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`: the middle `VectorMaskCast` prevents the following optimization: `(VectorStoreMask (VectorLoadMask x)) => (x)` >> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. >> >> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. >> >> The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`s, like the existing `Node::uncast(bool)` function. The function returns the first node that is not a `VectorMaskCastNode`.
>> >> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode`s and skipping them does not affect the correctness of the optimization. This function can then be called to eliminate the `VectorMaskCastNode` chain. >> >> Current optimizations related to `VectorMaskCastNode` include: >> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. >> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. >> >> This PR does the following optimizations: >> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` to `(VectorMaskCast (VectorMaskCast ... (VectorMaskCast x))) => (x)`. This is correct as long as the types of the head and tail `VectorMaskCastNode` are consistent. >> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vect... > > Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Update copyright year to 2026 > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Convert the check condition for vector length into an assertion > > Also refined the tests. > - Refine code comments > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Add MaxVectorSize IR test condition for VectorStoreMaskIdentityTest.java > - Refine the test code and comments > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Don't read and write the same memory in the JMH benchmarks > - ... and 2 more: https://git.openjdk.org/jdk/compare/6eaabed5...9c38a6d9 Hi, could someone help review this PR?
Thanks~ ------------- PR Comment: https://git.openjdk.org/jdk/pull/28313#issuecomment-3802742570 From jkarthikeyan at openjdk.org Tue Jan 27 03:12:29 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 27 Jan 2026 03:12:29 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v16] In-Reply-To: References: Message-ID: > Hi all, > This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported, as I wanted to reuse the existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done in a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>
>                                               Baseline                 Patch
> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units   Score   Error  Units  Improvement
> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op  56.228 ± 3.535  ns/op  (3.56x)
> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op  43.332 ± 1.166  ns/op  (4.15x)
> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op  29.757 ± 1.055  ns/op  (8.25x)
>
> I've also added some IR tests and they pass on my Linux x64 machine. Thoughts and reviews would be appreciated!
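The three benchmarked kernels (`intToByte`, `intToShort`, `shortToByte`) presumably correspond to narrowing-conversion loops of roughly the following shape; this is an assumption about the benchmark, not its actual source, and `NarrowingLoops` is a hypothetical name. In each loop the load pack and the store pack have different basic types, which is the situation in which SuperWord previously gave up and which this patch handles by asking the backend whether the subword cast is supported.

```java
final class NarrowingLoops {
    // int -> byte: each element keeps only the low 8 bits of the int.
    static void intToByte(int[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    // int -> short: keeps the low 16 bits.
    static void intToShort(int[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) src[i];
        }
    }

    // short -> byte: keeps the low 8 bits of each short.
    static void shortToByte(short[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    public static void main(String[] args) {
        int[] ints = { 1, 255, 256, -129, 0x12345678 };
        byte[] bytes = new byte[ints.length];
        intToByte(ints, bytes);
        System.out.println(java.util.Arrays.toString(bytes)); // [1, -1, 0, 127, 120]
    }
}
```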
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Apply changes from review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23413/files - new: https://git.openjdk.org/jdk/pull/23413/files/13378368..9e74df5f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=14-15 Stats: 65 lines in 14 files changed: 6 ins; 36 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/23413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23413/head:pull/23413 PR: https://git.openjdk.org/jdk/pull/23413 From jkarthikeyan at openjdk.org Tue Jan 27 03:12:35 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 27 Jan 2026 03:12:35 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v15] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 08:40:53 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: >> >> - Fix whitespace >> - Update tests after merge, apply changes from review >> - Merge from master >> - Update tests, cleanup logic >> - Merge branch 'master' into vectorize-subword >> - Check for AVX2 for byte/long conversions >> - Whitespace and benchmark tweak >> - Address more comments, make test and benchmark more exhaustive >> - Merge from master >> - Fix copyright after merge >> - ... 
and 9 more: https://git.openjdk.org/jdk/compare/de6f35ef...13378368 > > src/hotspot/share/opto/superwordVTransformBuilder.cpp line 264: > >> 262: if (use_bt != def_bt && !p0->is_Convert() && VectorCastNode::is_supported_subword_cast(def_bt, use_bt, pack->size())) { >> 263: VTransformNode* in = get_vtnode(pack_in->at(0)); >> 264: VTransformNode* cast = new (_vtransform.arena()) VTransformCastVectorNode(_vtransform, pack->size(), def_bt, use_bt); > > I just noticed: above, we already handle a cast case, but use `VTransformElementWiseVectorNode`: > https://github.com/openjdk/jdk/pull/23413/files#diff-cd8469676c3f287680696b4dbd87fd02b765f2c9a249bd485c55613b15843435L213-L217 > > I'm not happy with using `VTransformElementWiseVectorNode` for some casts and `VTransformCastVectorNode` for others. So I see 2 options: > - Use `VTransformCastVectorNode` for both, refactor the code I linded. > - Somehow try to remove `VTransformCastVectorNode`, and use `VTransformElementWiseVectorNode` here. Do you think that would be possible? This is a great suggestion! I didn't realize we already had `VTransformElementWiseVectorNode` which works in this case when provided the vector opcode. That means `VTransformCastVectorNode` isn't needed, which reduces the size of the patch significantly. > test/hotspot/jtreg/compiler/c2/TestMinMaxSubword.java line 65: > >> 63: >> 64: @Test >> 65: @IR(applyIfCPUFeature = { "avx", "true" }, counts = { IRNode.VECTOR_CAST_I2S, IRNode.VECTOR_SIZE_ANY, ">0" }) > > I think you could get more precise vector size here as well, using `IRNode.VECTOR_SIZE + "min(max_int, max_short)"` as you did in the other test :) Thanks for the heads up, I missed these when updating the tests! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2730046520 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2730047128 From jkarthikeyan at openjdk.org Tue Jan 27 03:12:38 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 27 Jan 2026 03:12:38 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v15] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 09:20:10 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/vectorization/TestSubwordTruncation.java line 77: >> >>> 75: @Test >>> 76: @IR(counts = { IRNode.LOAD_VECTOR_S, IRNode.VECTOR_SIZE + "min(max_int, max_short)", "> 0" }, >>> 77: applyIfCPUFeatureOr = { "avx2", "true", "asimd", "true" }) >> >> And how about here? Could we optimize and remove the casts? > > Filed: > [JDK-8376179](https://bugs.openjdk.org/browse/JDK-8376179): C2 SuperWord: improve subword vectorization, avoid cast to-and-from int We wouldn't be able to remove the casts in this case because `CountLeadingZeros` [doesn't support truncation](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L2506), so the cast to and from int is required for correctness. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2730054527 From jkarthikeyan at openjdk.org Tue Jan 27 03:13:10 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 27 Jan 2026 03:13:10 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v15] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 09:19:52 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 19 commits: >> >> - Fix whitespace >> - Update tests after merge, apply changes from review >> - Merge from master >> - Update tests, cleanup logic >> - Merge branch 'master' into vectorize-subword >> - Check for AVX2 for byte/long conversions >> - Whitespace and benchmark tweak >> - Address more comments, make test and benchmark more exhaustive >> - Merge from master >> - Fix copyright after merge >> - ... and 9 more: https://git.openjdk.org/jdk/compare/de6f35ef...13378368 > > Ok, I filed this as another follow-up: > [JDK-8376179](https://bugs.openjdk.org/browse/JDK-8376179): C2 SuperWord: improve subword vectorization, avoid cast to-and-from int Hi @eme64, thanks a lot for the review! I've pushed an update that should address the review comments and update the bug annotations and copyright years. About the cast to and from int: it should only be required where the node doesn't support truncation. Right now it looks like reductions also cast to int even when it's not required, such as with `AndReduction`. I can investigate further in a follow-up patch to see where the int vectors are generated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-3802833523 From duke at openjdk.org Tue Jan 27 05:31:59 2026 From: duke at openjdk.org (Harshit470250) Date: Tue, 27 Jan 2026 05:31:59 GMT Subject: RFR: 8370920: [s390] C2: add instruction size in s390.ad file [v12] In-Reply-To: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> References: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> Message-ID: > This PR adds the size of the match rule nodes. > > There were a lot of nodes for which the size was variable; for those nodes I have taken the maximum possible size. Harshit470250 has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision: - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ... and 12 more: https://git.openjdk.org/jdk/compare/52c8a4cb...4e742431 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28054/files - new: https://git.openjdk.org/jdk/pull/28054/files/9676e39d..4e742431 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=10-11 Stats: 25509 lines in 695 files changed: 12136 ins; 4385 del; 8988 mod Patch: https://git.openjdk.org/jdk/pull/28054.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28054/head:pull/28054 PR: https://git.openjdk.org/jdk/pull/28054 From galder at openjdk.org Tue Jan 27 05:33:49 2026 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 27 Jan 2026 05:33:49 GMT Subject: RFR: 8375640: MinMaxIdentity test fails on some machines after 8373134 Message-ID: Adjust test expectations to deal with environments where `Min/Max[F|D]` are not produced. To fix this, I've followed the same patterns used in `compiler.c2.irTests.TestMinMaxIdentities`. I have tested this on x86_64 with `-XX:UseAVX=0`.
------------- Commit messages: - Fix format - Fix IR expectations for floating points Changes: https://git.openjdk.org/jdk/pull/29438/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29438&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375640 Stats: 15 lines in 1 file changed: 13 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29438/head:pull/29438 PR: https://git.openjdk.org/jdk/pull/29438 From thartmann at openjdk.org Tue Jan 27 06:03:04 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 27 Jan 2026 06:03:04 GMT Subject: [jdk26] RFR: 8375653: C2: CmpUNode::sub is not monotonic [v2] In-Reply-To: <0aTmdsHwgUi6fQOr6J-D-9a73aOAodOVEjqy7QBsQJY=.a6fa9a07-5cc6-4738-a5cb-f425b1098b34@github.com> References: <7LhAjmvBXDhH1TvnAU7qzB8HwdWpXS9GYfUTKjp4Lgs=.7683ff64-ce00-4e45-b234-552d17058658@github.com> <0aTmdsHwgUi6fQOr6J-D-9a73aOAodOVEjqy7QBsQJY=.a6fa9a07-5cc6-4738-a5cb-f425b1098b34@github.com> Message-ID: On Mon, 26 Jan 2026 18:27:02 GMT, Quan Anh Mai wrote: >> Hi all, >> >> This pull request contains a backport of commit [30675faa](https://github.com/openjdk/jdk/commit/30675faa67d1bbb4acc729a841493bb8311416af) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Quan Anh Mai on 26 Jan 2026 and was reviewed by Christian Hagedorn and Marc Chevalier. >> >> Thanks! > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Fix test Still looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29412#pullrequestreview-3709263169 From qamai at openjdk.org Tue Jan 27 06:03:05 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 27 Jan 2026 06:03:05 GMT Subject: [jdk26] RFR: 8375653: C2: CmpUNode::sub is not monotonic [v2] In-Reply-To: <0aTmdsHwgUi6fQOr6J-D-9a73aOAodOVEjqy7QBsQJY=.a6fa9a07-5cc6-4738-a5cb-f425b1098b34@github.com> References: <7LhAjmvBXDhH1TvnAU7qzB8HwdWpXS9GYfUTKjp4Lgs=.7683ff64-ce00-4e45-b234-552d17058658@github.com> <0aTmdsHwgUi6fQOr6J-D-9a73aOAodOVEjqy7QBsQJY=.a6fa9a07-5cc6-4738-a5cb-f425b1098b34@github.com> Message-ID: On Mon, 26 Jan 2026 18:27:02 GMT, Quan Anh Mai wrote: >> Hi all, >> >> This pull request contains a backport of commit [30675faa](https://github.com/openjdk/jdk/commit/30675faa67d1bbb4acc729a841493bb8311416af) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Quan Anh Mai on 26 Jan 2026 and was reviewed by Christian Hagedorn and Marc Chevalier. >> >> Thanks! > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Fix test I had to change the unit test a little bit because the inference that `x | y` is `< 0` if `x < 0` or `y < 0` is only introduced in JDK27. Please reapprove this PR. 
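The `x | y < 0` inference mentioned in the comment rests on a two's-complement fact: an `int` is negative exactly when bit 31 is set, and bit 31 of `x | y` is set exactly when it is set in `x` or in `y`. A minimal Java check of that fact over some boundary values (the class and method names are illustrative only, not taken from the JDK test):

```java
final class SignOfOr {
    // Bit 31 of (x | y) is the OR of the two sign bits, so (x | y) < 0
    // holds exactly when x < 0 or y < 0.
    static boolean orIsNegative(int x, int y) {
        return (x | y) < 0;
    }

    static boolean eitherIsNegative(int x, int y) {
        return x < 0 || y < 0;
    }

    // Compares both predicates on every pair drawn from a set of edge values.
    static boolean holdsOnSamples() {
        int[] samples = {0, 1, -1, 42, -42, Integer.MAX_VALUE, Integer.MIN_VALUE,
                         0x40000000, 0x80000001};
        for (int x : samples) {
            for (int y : samples) {
                if (orIsNegative(x, y) != eitherIsNegative(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Per the comment, C2 only uses this inference starting with JDK 27, which is why the unit test in the JDK 26 backport had to be adjusted.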
------------- PR Comment: https://git.openjdk.org/jdk/pull/29412#issuecomment-3803269411 From chagedorn at openjdk.org Tue Jan 27 06:51:02 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 27 Jan 2026 06:51:02 GMT Subject: RFR: 8373898: RepeatCompilation does not repeat compilation after bailout In-Reply-To: <8dPAwPZVqlsfdXKFWOB6MUgJN5wJQqWSYTH_WUnwOpw=.f1a1fe00-5f97-4f6a-a90b-7ddee2cb8b91@github.com> References: <8dPAwPZVqlsfdXKFWOB6MUgJN5wJQqWSYTH_WUnwOpw=.f1a1fe00-5f97-4f6a-a90b-7ddee2cb8b91@github.com> Message-ID: On Mon, 26 Jan 2026 14:39:11 GMT, Marc Chevalier wrote: > Repeat compilation happens here: > > https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/compiler/compileBroker.cpp#L2347-L2355 > > and in `C2Compiler::compile_method`, which does > > https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/opto/c2compiler.cpp#L136-L147 > > In case of failure, `_failure_reason` in `ci_env`/`env` is populated, so the big loop in `compile_method` makes no iteration and, in particular, no compilation happens. Bailouts in `Compile::Compile` and `Compile::Optimize` check this `_failure_reason`, but also the `Compile` object's own one. This is fine since the `Compile` object doesn't survive across compilations, and thus the field in it is fresh at each iteration in `compile_method`. > > A direct solution is to reset `ci_env._failure_reason` at each iteration of the compilation repeat. We could also create a fresh `ci_env` as hinted in the JBS issue, but `ci_env` undergoes a bit of setup. It might be OK to skip this setup for the purpose of repeated compilation, but it's not clear to me. > > Since repeated compilations do not install code (and thus cannot mark the task as successful), to get consistent values between `ci_env.failing()` and `task->is_success()`, I copy the original failure reason and restore it at the end.
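To make the failure-reason bookkeeping in Marc's description concrete, here is a toy Java model. It is not HotSpot code: `Env`, `compile` and `bailoutAt` are invented for illustration. It assumes, as the description says, that an attempt is skipped while the environment is already failing, that the fix resets the failure reason before each repeat, and that the first attempt's reason is restored at the end:

```java
final class RepeatCompilationModel {
    // Hypothetical stand-in for ciEnv: a null reason means "not failing".
    static final class Env {
        String failureReason;

        boolean failing() {
            return failureReason != null;
        }
    }

    /**
     * Runs one real attempt followed by repeats; bailoutAt[i] says whether
     * attempt i bails out. Returns how many attempts actually compiled.
     */
    static int compile(Env env, boolean[] bailoutAt, boolean resetBetweenAttempts) {
        int attempts = 0;
        String firstReason = null;
        for (int i = 0; i < bailoutAt.length; i++) {
            if (i > 0 && resetBetweenAttempts) {
                env.failureReason = null;        // give each repeat a clean slate
            }
            if (env.failing()) {
                continue;                        // a stale failure suppresses the attempt
            }
            attempts++;
            if (bailoutAt[i]) {
                env.failureReason = "bailout in attempt " + i;
            }
            if (i == 0) {
                firstReason = env.failureReason; // outcome of the attempt whose result counts
            }
        }
        if (resetBetweenAttempts) {
            env.failureReason = firstReason;     // failing() matches the first attempt again
        }
        return attempts;
    }
}
```

Without the reset, a bailout in the first attempt suppresses every repeat; with it, all attempts run and `failing()` still reports the first attempt's outcome.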
> > Using IGV output, we can see that we get as many recompilations as requested, even when some are (artificially) bailing out. > > Thanks, > Marc You're right, `PrintCompilation` does not repeatedly print the compilation (should it?). I think it's okay to leave out a regression test, but thanks for having another look! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29419#issuecomment-3803414601 From epeter at openjdk.org Tue Jan 27 06:55:16 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 Jan 2026 06:55:16 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v6] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 19:37:06 GMT, Chen Liang wrote: >> Folding the identity hash to a constant when the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, the identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Use UpperCamelCase test/hotspot/jtreg/compiler/intrinsics/object/IdentityHashCodeFold.java line 37: > 35: * @summary Verify constant folding is possible for identity hash code > 36: * @library /test/lib / > 37: * @requires vm.compiler2.enabled Drive-by comment: Do you really need this restriction? IR rules are only executed if C2 is available anyway.
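As an illustration of the call sites the identity-hash folding description has in mind, the sketch below uses a `static final` key; the class name and map contents are hypothetical. Folding is sound because an object's identity hash never changes once assigned, and `IdentityHashMap` hashes its keys with `System.identityHashCode`. Whether C2 actually folds a given call depends on the patch under review (per its description, on loading an already-installed hash eagerly), so the comments only say it may:

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class IdentityHashFold {
    // A constant receiver: System.identityHashCode(KEY) returns the same value
    // on every call, which is what makes folding it to a constant legal.
    static final Object KEY = new Object();

    static final Map<Object, String> TABLE = new IdentityHashMap<>();

    static {
        TABLE.put(KEY, "classified");
    }

    // With the proposed change, C2 may fold this call once KEY's hash exists.
    static int keyHash() {
        return System.identityHashCode(KEY);
    }

    // IdentityHashMap computes its bucket from System.identityHashCode(key),
    // so a constant key could in principle benefit from the same folding.
    static String lookup() {
        return TABLE.get(KEY);
    }
}
```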
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2730518926 From epeter at openjdk.org Tue Jan 27 06:58:07 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 Jan 2026 06:58:07 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v15] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 03:06:32 GMT, Jasmine Karthikeyan wrote: >> Filed: >> [JDK-8376179](https://bugs.openjdk.org/browse/JDK-8376179): C2 SuperWord: improve subword vectorization, avoid cast to-and-from int > > We wouldn't be able to remove the casts in this case because `CountLeadingZeros` [doesn't support truncation](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L2506), so the cast to and from int is required for correctness. Ah ok, got it. Makes sense :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2730528557 From epeter at openjdk.org Tue Jan 27 07:06:14 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 Jan 2026 07:06:14 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v16] In-Reply-To: References: Message-ID: <2GYPPi0aIsR_1Gil-qSv5JOHUAvAef8Nv9OpfSehvb4=.bca3fc78-7d93-4889-b2b2-6532677cd72a@github.com> On Tue, 27 Jan 2026 03:12:29 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. 
Currently, only narrowing casts are supported, as I wanted to reuse the existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>>
>> Benchmark                  (SIZE)  Mode  Cnt  Baseline (ns/op)   Patch (ns/op)    Improvement
>> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787   56.228 ± 3.535   3.56x
>> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539   43.332 ± 1.166   4.15x
>> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150   29.757 ± 1.055   8.25x
>>
>> I've also added some IR tests and they pass on my Linux x64 machine. Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Apply changes from review Excellent, it now looks good to me :) I'll run some internal testing. Please wait for the results before integrating. But you need to get a second review anyway. ------------- Marked as reviewed by epeter (Reviewer).
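For reference, the scalar loops below sketch the shapes named in the benchmark table (`intToByte`, `intToShort`, `shortToByte`); the loop bodies are assumed, not copied from the benchmark source. The two `clz` helpers illustrate Jasmine's point that `CountLeadingZeros` does not support truncation: Java counts leading zeros on the sign-extended `int`, so the result differs from what a 16-bit lane-wide count would give, and the cast to and from `int` is needed for correctness:

```java
final class SubwordCasts {
    // int -> byte narrowing: each element keeps its low 8 bits.
    static void intToByte(int[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    // int -> short narrowing: each element keeps its low 16 bits.
    static void intToShort(int[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) src[i];
        }
    }

    // short -> byte narrowing: the subword-to-subword case.
    static void shortToByte(short[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    // Java semantics: the short is sign-extended to int before counting.
    static int clzAsInt(short s) {
        return Integer.numberOfLeadingZeros(s);
    }

    // What a count over a 16-bit lane would compute instead.
    static int clz16(short s) {
        return Integer.numberOfLeadingZeros(s & 0xFFFF) - 16;
    }
}
```

For example, `(byte) 300` keeps only the low byte `0x2C`, so `intToByte` maps 300 to 44, while `clzAsInt((short) 1)` is 31 but `clz16((short) 1)` is 15.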
PR Review: https://git.openjdk.org/jdk/pull/23413#pullrequestreview-3709446938 From dfenacci at openjdk.org Tue Jan 27 07:14:01 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 27 Jan 2026 07:14:01 GMT Subject: RFR: 8366861: Phase AFTER_LOOP_OPTS printed even though the method has no loops [v2] In-Reply-To: References: Message-ID: <9UIVK41afVK_PsozeC9TBKEv11YtQybtHpZPMPGHmjQ=.144338e2-5d44-4413-9219-a364a74f4f0c@github.com> On Mon, 26 Jan 2026 15:32:29 GMT, Saranya Natarajan wrote: >> **Issue** >> When a test program with no loops is run with the flag `-XX:PrintPhaseLevel=3`, the output prints `PHASE_AFTER_LOOP_OPTS` but does not print `PHASE_BEFORE_LOOP_OPTS`. >> >> **Solution** >> This simple fix introduces a variable `_print_phase_loop_opts` that caches the value of `has_loops()` before the loop opts pass, to make sure that `PHASE_AFTER_LOOP_OPTS` gets printed every time `PHASE_BEFORE_LOOP_OPTS` is printed. >> >> Before fix output >> image >> >> After fix output >> image >> >> >> **Testing** >> GitHub Actions, Tier 1-3 > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > fixing comment Marked as reviewed by dfenacci (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/29387#pullrequestreview-3709478856 From liach at openjdk.org Tue Jan 27 07:15:12 2026 From: liach at openjdk.org (Chen Liang) Date: Tue, 27 Jan 2026 07:15:12 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v6] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 06:51:54 GMT, Emanuel Peter wrote: >> Chen Liang has updated the pull request incrementally with one additional commit since the last revision: >> >> Use UpperCamelCase > > test/hotspot/jtreg/compiler/intrinsics/object/IdentityHashCodeFold.java line 37: > >> 35: * @summary Verify constant folding is possible for identity hash code >> 36: * @library /test/lib / >> 37: * @requires vm.compiler2.enabled > > Drive-by comment: > Do you really need this restriction? IR rules are only executed if C2 is available anyway. Well, the plain test isn't that meaningful without C2. https://github.com/openjdk/jdk/pull/28589#discussion_r2615925589 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2730577795 From chagedorn at openjdk.org Tue Jan 27 07:43:25 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 27 Jan 2026 07:43:25 GMT Subject: [jdk26] RFR: 8375653: C2: CmpUNode::sub is not monotonic [v2] In-Reply-To: <0aTmdsHwgUi6fQOr6J-D-9a73aOAodOVEjqy7QBsQJY=.a6fa9a07-5cc6-4738-a5cb-f425b1098b34@github.com> References: <7LhAjmvBXDhH1TvnAU7qzB8HwdWpXS9GYfUTKjp4Lgs=.7683ff64-ce00-4e45-b234-552d17058658@github.com> <0aTmdsHwgUi6fQOr6J-D-9a73aOAodOVEjqy7QBsQJY=.a6fa9a07-5cc6-4738-a5cb-f425b1098b34@github.com> Message-ID: On Mon, 26 Jan 2026 18:27:02 GMT, Quan Anh Mai wrote: >> Hi all, >> >> This pull request contains a backport of commit [30675faa](https://github.com/openjdk/jdk/commit/30675faa67d1bbb4acc729a841493bb8311416af) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. 
>> >> The commit being backported was authored by Quan Anh Mai on 26 Jan 2026 and was reviewed by Christian Hagedorn and Marc Chevalier. >> >> Thanks! > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Fix test Still good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29412#pullrequestreview-3709580310 From epeter at openjdk.org Tue Jan 27 07:50:09 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 Jan 2026 07:50:09 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v3] In-Reply-To: References: <6ip4JrJ4WiYEe6d2FA_WQ5dDjxAk2RPaPbwth4jNeJM=.43d7879d-89a4-434c-80ea-371c92581686@github.com> <0b81mH1_Y6r905N2HmehXBbSFdzLpJIfuXHNfijpHBs=.c870b13e-a52f-4c00-b771-91cf9205cb4a@github.com> Message-ID: On Mon, 15 Dec 2025 19:44:47 GMT, Chen Liang wrote: >> I don't argue that there's always a chance to catch a bug, but unit tests on C2 IR are mostly trivial, so the actual chance to spot a unique problem is quite low. And the price is execution time. > > I kept the C2 limit (note this is a build restriction instead of a flag restriction), but updated to use test.main.class. Right. Vladimir already argued against it. I forgot. I still believe it is best practice not to limit tests unless they are very expensive. But I'll accept the majority vote. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2730682963 From epeter at openjdk.org Tue Jan 27 07:50:12 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 Jan 2026 07:50:12 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v6] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 07:12:28 GMT, Chen Liang wrote: >> test/hotspot/jtreg/compiler/intrinsics/object/IdentityHashCodeFold.java line 37: >> >>> 35: * @summary Verify constant folding is possible for identity hash code >>> 36: * @library /test/lib / >>> 37: * @requires vm.compiler2.enabled >> >> Drive-by comment: >> Do you really need this restriction? IR rules are only executed if C2 is available anyway. > > Well, the plain test isn't that meaningful without C2. https://github.com/openjdk/jdk/pull/28589#discussion_r2615925589 Right. Vladimir already argued against it. I forgot. I still believe it is best practice not to limit tests unless they are very expensive. But I'll accept the majority vote. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2730681732 From jbhateja at openjdk.org Tue Jan 27 08:11:29 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 27 Jan 2026 08:11:29 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v17] In-Reply-To: <-cin72VWnqAukd5JbCMV9BfsGR1tWcWh_aGpdrlhHlM=.2d22e5da-35d2-4ffc-8ce5-86c90d662fd7@github.com> References: <8hStIcvp252Ik7raxZL5BvFKKkXTflorjyOD9Cyakvc=.c5d1b302-5c49-46b1-91ba-2feda2e6a746@github.com> <1ElN5XvEXAYGINpCIB2smhDrzekGyiXmG6o8-jnxDxk=.83a69a64-2894-40af-a2ee-9c35448c88b2@github.com> <-cin72VWnqAukd5JbCMV9BfsGR1tWcWh_aGpdrlhHlM=.2d22e5da-35d2-4ffc-8ce5-86c90d662fd7@github.com> Message-ID: <5a11-TYC046h3ro4YgCI4oSBFfyjUWlWA2I3pOMTF-k=.bfff183a-d825-45ba-bc75-d515337119be@github.com> On Fri, 23 Jan 2026 04:57:04 GMT, Jatin Bhateja wrote: >> Hi @eme64 , Your comments have been addressed > >> @jatin-bhateja This patch is really really large. There are lots of renamings that could be done in a separate patch first (as a subtask). It would make reviewing easier, allowing focus on the substantial work. See discussion here: [#28002 (comment)](https://github.com/openjdk/jdk/pull/28002#discussion_r2705376899) > > Hi @eme64 , > > I have done some cleanups, following is the summary of changes included with the patch:- > > ``` > 1 Changes to introduce a new (custom) BasicType T_FLOAT16 > - Global definition. > - Skip over handling wherever applicable. > 2 Changes to pass laneType (BasicType) to the intrinsic entry point instead of element classes. > - Inline expander interface changes mainly. > 3 Changes in abstract and concrete vector class generation templates. > 4 Changing the nomenclature of Vector classes to avoid Float1664... sort of names... > 5 Changes in the LaneType to add a new carrier type field.
> 6 Changes in inline expanders to selectively enable intrinsification for operations for which we have > auto-vectorization and backend support in place. > 7 Changes in test generation templates. > b. Assert wrappers to convert float16 (short) values to float before invoking TestNG asserts. > c. Scalar operation wrappers to selectively invoke Float16 math routines which are not > part of the Java SE math libraries. > > 8 New IR verification test. > 9 New micro-benchmark. > 10 AARCH64 test failure - patch + test fixed by Bhavana Kilambi. > > > Of the above, change 7b accounts for 40000+ LOC. > > Q. Why do we need wrapper assertions? > A. To handle all possible NaN representations (SNaN and QNaN): since float16 uses a short carrier type, we need to promote the values to float before invoking TestNG assertions. This conversion is accomplished by the assertion wrappers. > > All the tasks are related, and most of the source/test files are generated using scripts, so we should not go by the size of the patch but review the template files. > @jatin-bhateja I was wondering: what prompted the decision to add a new `BasicType` for `Float16`? Would each additional numeric type get a new `BasicType`? How many do we anticipate? > > Currently, we are using `T_SHORT` for `Float16`, right? Hi @eme64, currently in JDK mainline we pass the element class as the lane type. The problem with passing Float16.class is that it's part of an incubating module and we cannot declare a symbol for it in vmSymbols; thus, if we pass Float16.class as the element type, we would need fragile name-based checks over the element type to infer Float16 operations in inline expanders. To circumvent this problem, we started passing an additional integer argument, the vector operation type (VECTOR_TYPE_FP16 / VECTOR_TYPE_PRIM), to the intrinsic entry point.
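A sketch of the kind of assertion wrapper described in the Q&A on wrapper assertions, assuming JDK 20+ for `Float.float16ToFloat`; the class and method names are hypothetical and the PR's generated wrappers may differ. Two binary16 NaN encodings (a quiet NaN, `0x7E00`, and a signaling NaN, `0x7C01`) differ as raw `short` values but compare equal once promoted to `float` and compared with `Float.compare`, which treats all NaNs as one value while still distinguishing +0.0 from -0.0:

```java
final class Float16Compare {
    // Promotes two binary16 bit patterns (held in a short carrier) to float and
    // compares them with Float.compare: all NaNs compare equal to each other,
    // but +0.0 and -0.0 are distinguished. Requires JDK 20+.
    static boolean sameValue(short a, short b) {
        return Float.compare(Float.float16ToFloat(a), Float.float16ToFloat(b)) == 0;
    }

    // Raw comparison of the carrier bits, for contrast.
    static boolean sameBits(short a, short b) {
        return a == b;
    }
}
```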
Paul suggested in his [prior comment](https://github.com/openjdk/jdk/pull/28002#issuecomment-3529452461) adding a new BasicType for Float16 and, instead of passing the element class and the vector operation type, passing just the BasicType, since it's already used in the LaneType. The [enum definitions of all the primitive basic types used in LaneType](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java#L153) follow the JVM specification and are in sync with the [BasicType definition on the VM side](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/globalDefinitions.hpp#L671). The VM also defines some custom basic types like T_METADATA and T_NARROWKLASS. If we just create a new basic type on the Java side, there is a chance that its value may conflict with existing custom basic types on the VM side. One solution is to keep the basic type assignments in sync for primitive types only and not modify any VM-side code, since the current scope of T_FLOAT16 is limited to the intrinsic entry point. Adding a new custom BasicType on the VM side is not a change in just one place; it is cumbersome and not desirable given that BasicType is used all across VM code. Thus, there are the following options:
1/ Create a new basic type T_FLOAT16 on the Java side, add it to LaneType, pass only basic types as the element type to the intrinsic entry point, and maintain an efficient interface.
2/ Pass Float16.class as the element type to Float16Vector operations and do a fragile and inefficient name-based lookup in the inline expander to infer Float16 vector IR.
3/ Extend both BasicType definitions on the Java side and the VM side and keep them in sync, but this is not desirable given that the VM makes extensive use of BasicType.
4/ Pass short.class as the element type and pass another argument, the vector operation kind, to the intrinsic entry point to differentiate between ShortVector and Float16Vector operations.
5/ Paul's suggestion: create a proxy class in the java.base module for the Float16 type.
I am inclined to go with solution 1; let me know if you have other solutions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3803701515 From epeter at openjdk.org Tue Jan 27 08:13:40 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 Jan 2026 08:13:40 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v8] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 08:14:10 GMT, Eric Fang wrote: >> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. >> >> If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate any code; otherwise, code will be generated to extend or narrow the mask. This IR node is not free, whether or not it generates code, because it may block some optimizations. For example: >> 1.
>> >> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. >> >> The general idea of this PR is introducing an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The funtion returns the first non `VectorMaskCastNode`. >> >> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. >> >> Current optimizations related to `VectorMaskCastNode` include: >> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. >> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. >> >> This PR does the following optimizations: >> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct. >> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vect... > > Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Update copyright year to 2026 > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Convert the check condition for vector length into an assertion > > Also refined the tests. 
> - Refine code comments > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Add MaxVectorSize IR test condition for VectorStoreMaskIdentityTest.java > - Refine the test code and comments > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Don't read and write the same memory in the JMH benchmarks > - ... and 2 more: https://git.openjdk.org/jdk/compare/6eaabed5...9c38a6d9 src/hotspot/share/opto/vectornode.cpp line 1063: > 1061: return n; > 1062: } > 1063: This has a clear parallel in `Node::uncast`. But there, we may recursively uncast. Your pattern: `(VectorStoreMask (VectorMaskCast* (VectorLoadMask x))) => (x)` We could also have a chain of casts here. Can you explain why you chose only to do a single step here? test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastIdentityTest.java line 121: > 119: VectorMask mInt128 = mFloat128.cast(IntVector.SPECIES_128); > 120: return mInt128.not().trueCount(); > 121: } Why can't the casts be eliminated here? Can you please add a comment to the test? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2730746738 PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2730754926 From epeter at openjdk.org Tue Jan 27 08:13:42 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 Jan 2026 08:13:42 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v8] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 08:09:47 GMT, Emanuel Peter wrote: >> Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Update copyright year to 2026 >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - Convert the check condition for vector length into an assertion >> >> Also refined the tests. 
>> - Refine code comments >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - Add MaxVectorSize IR test condition for VectorStoreMaskIdentityTest.java >> - Refine the test code and comments >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - Don't read and write the same memory in the JMH benchmarks >> - ... and 2 more: https://git.openjdk.org/jdk/compare/6eaabed5...9c38a6d9 > > test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastIdentityTest.java line 121: > >> 119: VectorMask mInt128 = mFloat128.cast(IntVector.SPECIES_128); >> 120: return mInt128.not().trueCount(); >> 121: } > > Why can't the casts be eliminated here? Can you please add a comment to the test? There used to be a comment, would that one still be accurate? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2730756241 From erfang at openjdk.org Tue Jan 27 08:50:26 2026 From: erfang at openjdk.org (Eric Fang) Date: Tue, 27 Jan 2026 08:50:26 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v8] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 08:14:10 GMT, Eric Fang wrote: >> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. >> >> If the vector element size is not changed, the `VectorMaskCastNode` don't generate code, otherwise code will be generated to extend or narrow the mask. This IR node is not free no matter it generates code or not because it may block some optimizations. For example: >> 1. 
`(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`: the middle `VectorMaskCast` prevents the optimization `(VectorStoreMask (VectorLoadMask x)) => (x)`. >> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. >> >> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. >> >> The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first node that is not a `VectorMaskCastNode`. >> >> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. >> >> Current optimizations related to `VectorMaskCastNode` include: >> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. >> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. >> >> This PR does the following optimizations: >> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` to `(VectorMaskCast (VectorMaskCast ... (VectorMaskCast x))) => (x)`, because as long as the types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct. >> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vect... > > Eric Fang has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 12 commits: > > - Update copyright year to 2026 > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Convert the check condition for vector length into an assertion > > Also refined the tests. > - Refine code comments > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Add MaxVectorSize IR test condition for VectorStoreMaskIdentityTest.java > - Refine the test code and comments > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Don't read and write the same memory in the JMH benchmarks > - ... and 2 more: https://git.openjdk.org/jdk/compare/6eaabed5...9c38a6d9 Thanks for your review ! @eme64 ------------- PR Review: https://git.openjdk.org/jdk/pull/28313#pullrequestreview-3709848369 From erfang at openjdk.org Tue Jan 27 08:50:28 2026 From: erfang at openjdk.org (Eric Fang) Date: Tue, 27 Jan 2026 08:50:28 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v8] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 08:07:10 GMT, Emanuel Peter wrote: >> Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Update copyright year to 2026 >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - Convert the check condition for vector length into an assertion >> >> Also refined the tests. >> - Refine code comments >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - Add MaxVectorSize IR test condition for VectorStoreMaskIdentityTest.java >> - Refine the test code and comments >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - Don't read and write the same memory in the JMH benchmarks >> - ... 
and 2 more: https://git.openjdk.org/jdk/compare/6eaabed5...9c38a6d9 > > src/hotspot/share/opto/vectornode.cpp line 1063: > >> 1061: return n; >> 1062: } >> 1063: > > This has a clear parallel in `Node::uncast`. But there, we may recursively uncast. > > Your pattern: > `(VectorStoreMask (VectorMaskCast* (VectorLoadMask x))) => (x)` > > We could also have a chain of casts here. > Can you explain why you chose only to do a single step here? I'm not sure I fully understood your point. This function can recursively uncast a chain of consecutive `VectorMaskCastNode`, so the pattern you mentioned above can be optimized to `(x)` even when there are multiple `VectorMaskCastNode` in between. I?m not sure I get what you mean. Could you elaborate on it a bit? Thanks~ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2730885339 From erfang at openjdk.org Tue Jan 27 08:50:30 2026 From: erfang at openjdk.org (Eric Fang) Date: Tue, 27 Jan 2026 08:50:30 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v8] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 08:10:14 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastIdentityTest.java line 121: >> >>> 119: VectorMask mInt128 = mFloat128.cast(IntVector.SPECIES_128); >>> 120: return mInt128.not().trueCount(); >>> 121: } >> >> Why can't the casts be eliminated here? Can you please add a comment to the test? > > There used to be a comment, would that one still be accurate? Yeah, the comment is still there, see line 92 of this file. 
I refactored this file a bit, now it looks like this: // comment for testXXXCastToSameType testOneCastToSameType() testTwoCastToSameType() // comment for testXXXCastToDifferentType testOneCastToDifferentType() testTwoCastToDifferentType() ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2730907001 From epeter at openjdk.org Tue Jan 27 08:58:25 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 Jan 2026 08:58:25 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v8] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 08:42:14 GMT, Eric Fang wrote: >> src/hotspot/share/opto/vectornode.cpp line 1063: >> >>> 1061: return n; >>> 1062: } >>> 1063: >> >> This has a clear parallel in `Node::uncast`. But there, we may recursively uncast. >> >> Your pattern: >> `(VectorStoreMask (VectorMaskCast* (VectorLoadMask x))) => (x)` >> >> We could also have a chain of casts here. >> Can you explain why you chose only to do a single step here? > > I'm not sure I fully understood your point. This function can recursively uncast a chain of consecutive `VectorMaskCastNode`, so the pattern you mentioned above can be optimized to `(x)` even when there are multiple `VectorMaskCastNode` in between. > > I'm not sure I get what you mean. Could you elaborate on it a bit? Thanks~ Woops, my bad. I read `if` instead of `while`. Sorry, you are right! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2730938160 From epeter at openjdk.org Tue Jan 27 08:58:26 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 Jan 2026 08:58:26 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v8] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 08:46:53 GMT, Eric Fang wrote: >> There used to be a comment, would that one still be accurate? > > Yeah, the comment is still there, see line 92 of this file.
I refactored this file a bit, now it looks like this: > > // comment for testXXXCastToSameType > testOneCastToSameType() > testTwoCastToSameType() > > // comment for testXXXCastToDifferentType > testOneCastToDifferentType() > testTwoCastToDifferentType() Honestly, I prefer having comments next to the IR rule. If the IR rule fails, you instantly understand the assumptions with the comment. The IR rule could fail because: but or additional enhancements. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2730945272 From erfang at openjdk.org Tue Jan 27 09:01:30 2026 From: erfang at openjdk.org (Eric Fang) Date: Tue, 27 Jan 2026 09:01:30 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v8] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 08:55:48 GMT, Emanuel Peter wrote: >> Yeah, the comment is still there, see line 92 of this file. I refactored this file a bit, now it looks like this: >> >> // comment for testXXXCastToSameType >> testOneCastToSameType() >> testTwoCastToSameType() >> >> // comment for testXXXCastToDifferentType >> testOneCastToDifferentType() >> testTwoCastToDifferentType() > > Honestly, I prefer having comments next to the IR rule. > If the IR rule fails, you instantly understand the assumptions with the comment. > The IR rule could fail because: but or additional enhancements. Makes sense, I'll add a comment for each test, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2730957043 From erfang at openjdk.org Tue Jan 27 09:05:43 2026 From: erfang at openjdk.org (Eric Fang) Date: Tue, 27 Jan 2026 09:05:43 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v8] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 08:54:04 GMT, Emanuel Peter wrote: >> I'm not sure I fully understood your point.
This function can recursively uncast a chain of consecutive `VectorMaskCastNode`, so the pattern you mentioned above can be optimized to `(x)` even when there are multiple `VectorMaskCastNode` in between. >> >> I'm not sure I get what you mean. Could you elaborate on it a bit? Thanks~ > > Woops, my bad. I read `if` instead of `while`. Sorry, you are right! No problem, now I understand that this function behaves as we expect. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2730972580 From aph at openjdk.org Tue Jan 27 10:16:04 2026 From: aph at openjdk.org (Andrew Haley) Date: Tue, 27 Jan 2026 10:16:04 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v3] In-Reply-To: References: Message-ID: <0KaSJxnKpyK1_1JvW3V4uAJ0rLJCs7KZBN1EN-ZtyEc=.c4374870-b25b-4d72-a887-5cfe3d770949@github.com> On Tue, 27 Jan 2026 02:10:03 GMT, Eric Fang wrote: > But I wonder could we use a separate PR for this sync? Otherwise, we might import dozens or even hundreds of files into this PR, which I think would be difficult to review. Why? A commit into mainline should be accompanied by its tests. That's a base (and IMO obvious) requirement. Perhaps I should ask @PaulSandoz for his opinion on this issue. I'd really appreciate hearing your thoughts on it. I can think of no reason not to grab all of them, but that must happen first. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3804295417 From jbhateja at openjdk.org Tue Jan 27 12:30:31 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 27 Jan 2026 12:30:31 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v17] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 16:48:11 GMT, Paul Sandoz wrote: > The underlying motivation was to avoid passing two parameters to the vector intrinsics that can get out of sync.
Currently, we cannot use `Float16.class` like we can `Integer.class` that describes the vector element type to the intrinsic. Could we use an internal class that acts as a proxy until we can replace it? Hi @PaulSandoz , We will still need to create T_FLOAT16 basic type and associate it with Float16 LaneType, why not directly pass these basic types to intrinsic entry point ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3804950143 From mbaesken at openjdk.org Tue Jan 27 14:09:07 2026 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 27 Jan 2026 14:09:07 GMT Subject: RFR: 8376402: Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code Message-ID: Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code but seems they still end up in the product build JVM, at least when using standard build settings. (This can be observed when enabling link time gc and verbose info printing - in this case the methods are eliminated). 
linuxx86_64 product build without those methods ls -alL images/jdk/lib/server/libjvm.so size 2.859.5144 unchanged product build : ls -alL images/jdk/lib/server/libjvm.so size 2.859.9464 (so we see a little size difference) ------------- Commit messages: - JDK-8376402 Changes: https://git.openjdk.org/jdk/pull/29449/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29449&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8376402 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29449.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29449/head:pull/29449 PR: https://git.openjdk.org/jdk/pull/29449 From azafari at openjdk.org Tue Jan 27 14:14:47 2026 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 27 Jan 2026 14:14:47 GMT Subject: RFR: 8376402: Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 13:58:35 GMT, Matthias Baesken wrote: > Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code but seems they still end up in the product build JVM, at least when using standard build settings. > (This can be observed when enabling link time gc and verbose info printing - in this case the methods are eliminated). > > linuxx86_64 > product build without those methods > > ls -alL images/jdk/lib/server/libjvm.so > size 2.859.5144 > > > unchanged product build : > > ls -alL images/jdk/lib/server/libjvm.so > size 2.859.9464 > > > (so we see a little size difference) Thank you for fixing this. It would be good to show the output of linking with verbose to see the difference. Nit: Copyright year to be updated. 
------------- PR Review: https://git.openjdk.org/jdk/pull/29449#pullrequestreview-3711439425 From rcastanedalo at openjdk.org Tue Jan 27 14:25:56 2026 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 27 Jan 2026 14:25:56 GMT Subject: RFR: 8375038: C2: Enforce that Ideal() returns the root of the subgraph if any change was made by checking the node hash In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 14:47:58 GMT, Benoît Maillard wrote: > This PR introduces an assert in `PhaseIterGVN` to check that `Ideal` actually returns something if the node was modified. > > ## Context > > In the description of `Node::Ideal` in `node.cpp`, we have: > >> If ANY change is made, it must return the root of the reshaped graph - even if the root is the same Node > > It is crucial that such changes do not go unnoticed and that they can propagate to other nodes. Current documentation also states: > >> Running with `-XX:VerifyIterativeGVN=1` checks >> these invariants, although its too slow to have on by default. If you are >> hacking an Ideal call, be sure to test with `-XX:VerifyIterativeGVN=1` > > However, `-XX:VerifyIterativeGVN=1` ends up verifying that the `_in` and `_out` arrays are consistent, but does not verify the return value. > > This PR aims to enforce the return value invariant. It should also make regression testing of bugs caused by wrongly returning nullptr in `Ideal` easier, such as [JDK-8373251](https://bugs.openjdk.org/browse/JDK-8373251).
> > ## Proposed Change > > In summary, this PR brings the following set of changes: > - Add a new flag bit to `-XX:VerifyIterativeGVN` for verifying return of `Ideal` calls > - Add an assert on the hash of nodes before and after `Ideal` in `PhaseIterGVN::transform_old` > - Fix `Ideal` optimizations that would cause harness errors with testing on tier1 > - Update the comments in the code to clarify the invariant and how to enforce it > > After consideration, I took the decision to only check the hash if the node is not dead. It seems there are many cases where the control node is dead, and we propagate the information to all users with `kill_dead_code`, and end up returning `nullptr`. This is basically a mechanism to "speed up" the propagation (it would also happen normally via the usual IGVN worklist). This somehow contradicts the "must return the root of the reshaped graph" invariant, but it seems to be a common practice. > > In addition to that, I have decided to implement this as part of a new flag bit to `-XX:VerifyIterativeGVN` instead of an existing one, because there is a risk that it causes new failures in existing usages of the flag. > > This PR is meant to introduce the new check and fix the most "obvious" failures that the new flag would introduce in common scenarios, such as when running with `-version` on tier1. Since there are known issues caused by bad return values of `Ideal` (such as [JDK-8373251](https://bugs.openjdk.org/browse/JDK-8373251)), I will fix other failures in follow-up PRs....
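A minimal Java model of the check described above, as an illustration only: `ToyNode`, `IdealFn` and `IdealChecker` are hypothetical names, not HotSpot code, and `hash()` is a toy stand-in for a node hash over the opcode and input identities. Like the proposed assert, it snapshots the hash before the transformation and flags the case where the hash changed but `null` was returned. The dead-node exemption mentioned in the description is deliberately not modeled.

```java
// Hypothetical toy node (not HotSpot code): an opcode plus rewirable inputs.
final class ToyNode {
    final String op;
    final ToyNode[] in;

    ToyNode(String op, ToyNode... in) {
        this.op = op;
        this.in = in;
    }

    // Toy structural hash over the opcode and the identities of the inputs.
    int hash() {
        int h = op.hashCode();
        for (ToyNode n : in) {
            h = 31 * h + System.identityHashCode(n);
        }
        return h;
    }
}

// Hypothetical stand-in for an Ideal() transformation:
// returns the root of the reshaped subgraph if anything changed, else null.
interface IdealFn {
    ToyNode ideal(ToyNode n);
}

final class IdealChecker {
    // Runs one transformation step and enforces the invariant:
    // if the node's hash changed, the transformation must not return null.
    static ToyNode checkedIdeal(ToyNode n, IdealFn fn) {
        int before = n.hash();
        ToyNode result = fn.ideal(n);
        int after = n.hash();
        if (result == null && before != after) {
            throw new IllegalStateException("Ideal modified " + n.op + " but returned null");
        }
        return result;
    }
}
```

A well-behaved transformation that swaps inputs and returns the node passes, returning `null` without touching the node passes, and modifying the node while returning `null` is reported.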
src/hotspot/share/opto/node.cpp line 1157: > 1155: // can help with validating these invariants, although they are too slow to have on by default: > 1156: // - '-XX:VerifyIterativeGVN=1' checks the def-use info > 1157: // - '-XX:VerifyIterativeGVN=100000' cheks the return value Suggestion: // - '-XX:VerifyIterativeGVN=100000' checks the return value ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29421#discussion_r2732251370 From mbaesken at openjdk.org Tue Jan 27 14:29:52 2026 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 27 Jan 2026 14:29:52 GMT Subject: RFR: 8376402: Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code [v2] In-Reply-To: References: Message-ID: > Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code but seems they still end up in the product build JVM, at least when using standard build settings. > (This can be observed when enabling link time gc and verbose info printing - in this case the methods are eliminated). 
> > linuxx86_64 > product build without those methods > > ls -alL images/jdk/lib/server/libjvm.so > size 2.859.5144 > > > unchanged product build : > > ls -alL images/jdk/lib/server/libjvm.so > size 2.859.9464 > > > (so we see a little size difference) Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: Adjust COPYRIGHT year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29449/files - new: https://git.openjdk.org/jdk/pull/29449/files/661f85be..5f268ea7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29449&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29449&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29449.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29449/head:pull/29449 PR: https://git.openjdk.org/jdk/pull/29449 From mbaesken at openjdk.org Tue Jan 27 14:39:03 2026 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 27 Jan 2026 14:39:03 GMT Subject: RFR: 8376402: Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code [v2] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 14:29:52 GMT, Matthias Baesken wrote: >> Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code but seems they still end up in the product build JVM, at least when using standard build settings. >> (This can be observed when enabling link time gc and verbose info printing - in this case the methods are eliminated). 
>> >> linuxx86_64 >> product build without those methods >> >> ls -alL images/jdk/lib/server/libjvm.so >> size 2.859.5144 >> >> >> unchanged product build : >> >> ls -alL images/jdk/lib/server/libjvm.so >> size 2.859.9464 >> >> >> (so we see a little size difference) > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust COPYRIGHT year With `-Wl,--gc-sections -Wl,--print-gc-sections` I see (because of print-gc-sections) this output in the linker output of libjvm /usr/lib64/gcc/x86_64-suse-linux/14/../../../../x86_64-suse-linux/bin/ld: removing unused section '.text._ZN12Dependencies16print_statisticsEv' in file '(/build/hotspot/variant-server/libjvm/objs/dependencies.o' /usr/lib64/gcc/x86_64-suse-linux/14/../../../../x86_64-suse-linux/bin/ld: removing unused section '.text._ZN28AbstractClassHierarchyWalker16print_statisticsEv' in file '/build/hotspot/variant-server/libjvm/objs/dependencies.o' I could also add the whole output of the libjvm linkage but it gets really huge and a number of removals need more analysis . ------------- PR Comment: https://git.openjdk.org/jdk/pull/29449#issuecomment-3805573289 From chagedorn at openjdk.org Tue Jan 27 14:53:03 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 27 Jan 2026 14:53:03 GMT Subject: RFR: 8376402: Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code [v2] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 14:29:52 GMT, Matthias Baesken wrote: >> Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code but seems they still end up in the product build JVM, at least when using standard build settings. >> (This can be observed when enabling link time gc and verbose info printing - in this case the methods are eliminated). 
>> >> linuxx86_64 >> product build without those methods >> >> ls -alL images/jdk/lib/server/libjvm.so >> size 2.859.5144 >> >> >> unchanged product build : >> >> ls -alL images/jdk/lib/server/libjvm.so >> size 2.859.9464 >> >> >> (so we see a little size difference) > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust COPYRIGHT year Thanks for cleaning that up! src/hotspot/share/code/dependencies.cpp line 2280: > 2278: } > 2279: > 2280: #ifndef PRODUCT You should probably also make the declarations of the two methods inside the classes `NOT_PRODUCT`. ------------- Changes requested by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29449#pullrequestreview-3711638156 PR Review Comment: https://git.openjdk.org/jdk/pull/29449#discussion_r2732371231 From epeter at openjdk.org Tue Jan 27 15:05:48 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 Jan 2026 15:05:48 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v6] In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: On Fri, 16 Jan 2026 17:37:09 GMT, Manuel Hässig wrote: >> Good point, I didn't think of that. Passing a string into the method would be one solution. Another one would be to keep the `bool` return type for `verify_Value_for` and assert at the call site (just as it was before). I think this feels a bit more natural than passing an assert message as parameter. What do you think? > > Perhaps you could change the message to `... PhaseCCP ...` if --- in Java speak --- `this instanceof PhaseCCP` in addition to the comment you added. Ah, I was supposed to answer here?
I think something like passing a `PhaseIterGVN` or `PhaseCCP` string would be a good solution :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2732454805 From mbaesken at openjdk.org Tue Jan 27 15:37:19 2026 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 27 Jan 2026 15:37:19 GMT Subject: RFR: 8376402: Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code [v3] In-Reply-To: References: Message-ID: > Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code but seems they still end up in the product build JVM, at least when using standard build settings. > (This can be observed when enabling link time gc and verbose info printing - in this case the methods are eliminated). > > linuxx86_64 > product build without those methods > > ls -alL images/jdk/lib/server/libjvm.so > size 2.859.5144 > > > unchanged product build : > > ls -alL images/jdk/lib/server/libjvm.so > size 2.859.9464 > > > (so we see a little size difference) Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: Adjust decls too ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29449/files - new: https://git.openjdk.org/jdk/pull/29449/files/5f268ea7..23bec285 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29449&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29449&range=01-02 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/29449.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29449/head:pull/29449 PR: https://git.openjdk.org/jdk/pull/29449 From roland at openjdk.org Tue Jan 27 15:52:43 2026 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 27 Jan 2026 15:52:43 GMT Subject: RFR: 8350330: C2: PhaseIdealLoop::add_parse_predicate() should mirror GraphKit::add_parse_predicate() [v2] 
In-Reply-To: References: Message-ID: > `PhaseIdealLoop::add_parse_predicate()` was intended to mirror > `GraphKit::add_parse_predicate()` but it doesn't. That last one checks > `too_many_traps` per bci but the `PhaseIdealLoop` version doesn't. As > demonstrated by the test case, a method can get compiled with a > predicate, take a trap, and get recompiled with the same predicate > many times (up to ~100). Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8350330 - test & fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29367/files - new: https://git.openjdk.org/jdk/pull/29367/files/b1ca0bc4..44832b53 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29367&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29367&range=00-01 Stats: 16077 lines in 421 files changed: 6992 ins; 2071 del; 7014 mod Patch: https://git.openjdk.org/jdk/pull/29367.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29367/head:pull/29367 PR: https://git.openjdk.org/jdk/pull/29367 From roland at openjdk.org Tue Jan 27 15:52:45 2026 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 27 Jan 2026 15:52:45 GMT Subject: RFR: 8350330: C2: PhaseIdealLoop::add_parse_predicate() should mirror GraphKit::add_parse_predicate() [v2] In-Reply-To: <8jET6W_ZVuz7gdnA7fscABp054UMADSpU51eRxIZ_YE=.ace7ab11-eb33-4a04-8da2-d03d4b3e2adb@github.com> References: <8jET6W_ZVuz7gdnA7fscABp054UMADSpU51eRxIZ_YE=.ace7ab11-eb33-4a04-8da2-d03d4b3e2adb@github.com> Message-ID: On Fri, 23 Jan 2026 07:37:19 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8350330 >> - test & fix > > test/hotspot/jtreg/compiler/longcountedloops/TestLoopNestTooManyTraps.java line 34: > >> 32: * -XX:-BackgroundCompilation -XX:-ShortRunningLongLoop -XX:-UseOnStackReplacement >> 33: * -XX:CompileOnly=*TestLoopNestTooManyTraps::test1 -XX:LoopMaxUnroll=0 >> 34: * compiler.longcountedloops.TestLoopNestTooManyTraps > > Nice test! Would it make sense for this special test to also have a non-flag run? With no command line flag, the test fails because deoptimization doesn't happen when it expects. So the test would have to be tweaked so it can run in 2 modes (one with the command line flags and one without where some checks have to be relaxed). It doesn't feel like it's worth the effort. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29367#discussion_r2732651457 From chagedorn at openjdk.org Tue Jan 27 16:02:51 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 27 Jan 2026 16:02:51 GMT Subject: RFR: 8376402: Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code [v3] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 15:37:19 GMT, Matthias Baesken wrote: >> Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code but seems they still end up in the product build JVM, at least when using standard build settings. >> (This can be observed when enabling link time gc and verbose info printing - in this case the methods are eliminated). 
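To make the per-bci point from the PR description concrete, here is a simplified Java model (hypothetical names, not the HotSpot API). It assumes each compiled version traps on the predicate right away, and it compares a method-wide trap limit (set to 100 here only to echo the "~100" above) with a check that refuses to re-add a predicate once its bci has already trapped.

```java
// Hypothetical model, not HotSpot code: per-bytecode-index trap history.
final class TrapProfile {
    private final java.util.Map<Integer, Integer> trapsPerBci = new java.util.HashMap<>();
    private final int perMethodLimit;
    private int totalTraps;

    TrapProfile(int perMethodLimit) {
        this.perMethodLimit = perMethodLimit;
    }

    void recordTrap(int bci) {
        trapsPerBci.merge(bci, 1, Integer::sum);
        totalTraps++;
    }

    // Method-wide check only: the predicate keeps being emitted until the limit is hit.
    boolean tooManyTrapsMethodWide() {
        return totalTraps >= perMethodLimit;
    }

    // Per-bci check: once this bci has trapped, do not emit the predicate again.
    boolean tooManyTrapsAt(int bci) {
        return trapsPerBci.getOrDefault(bci, 0) > 0 || tooManyTrapsMethodWide();
    }
}

final class PredicateSim {
    // Counts how many compilations emit a predicate at bci 10 when every
    // compiled version immediately fails that predicate and deoptimizes.
    static int compilationsWithPredicate(boolean perBci, int limit) {
        TrapProfile profile = new TrapProfile(limit);
        int compiles = 0;
        while (true) {
            boolean skip = perBci ? profile.tooManyTrapsAt(10) : profile.tooManyTrapsMethodWide();
            if (skip) {
                break;
            }
            compiles++;
            profile.recordTrap(10);
        }
        return compiles;
    }
}
```

With the per-bci check the predicate is emitted once; with only the method-wide check it is re-emitted until the limit is reached.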
>> >> linuxx86_64 >> product build without those methods >> >> ls -alL images/jdk/lib/server/libjvm.so >> size 2.859.5144 >> >> >> unchanged product build : >> >> ls -alL images/jdk/lib/server/libjvm.so >> size 2.859.9464 >> >> >> (so we see a little size difference) > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust decls too Thanks for the update, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29449#pullrequestreview-3712057005 From chagedorn at openjdk.org Tue Jan 27 16:04:20 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 27 Jan 2026 16:04:20 GMT Subject: RFR: 8350330: C2: PhaseIdealLoop::add_parse_predicate() should mirror GraphKit::add_parse_predicate() [v2] In-Reply-To: References: <8jET6W_ZVuz7gdnA7fscABp054UMADSpU51eRxIZ_YE=.ace7ab11-eb33-4a04-8da2-d03d4b3e2adb@github.com> Message-ID: On Tue, 27 Jan 2026 15:44:43 GMT, Roland Westrelin wrote: >> test/hotspot/jtreg/compiler/longcountedloops/TestLoopNestTooManyTraps.java line 34: >> >>> 32: * -XX:-BackgroundCompilation -XX:-ShortRunningLongLoop -XX:-UseOnStackReplacement >>> 33: * -XX:CompileOnly=*TestLoopNestTooManyTraps::test1 -XX:LoopMaxUnroll=0 >>> 34: * compiler.longcountedloops.TestLoopNestTooManyTraps >> >> Nice test! Would it make sense for this special test to also have a non-flag run? > > With no command line flag, the test fails because deoptimization doesn't happen when it expects. So the test would have to be tweaked so it can run in 2 modes (one with the command line flags and one without where some checks have to be relaxed). It doesn't feel like it's worth the effort. Makes sense, then let's leave it as it is. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29367#discussion_r2732730632 From epeter at openjdk.org Tue Jan 27 16:06:05 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 Jan 2026 16:06:05 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v15] In-Reply-To: References: Message-ID: <0Y-lUGt3Co6upV2G_agCAH35r-J1oM-8xbVOxWeVoxc=.7af84968-4994-4fbd-8fe7-59965de04d11@github.com> On Tue, 27 Jan 2026 03:11:05 GMT, Jasmine Karthikeyan wrote: >> Ok, I filed this as another follow-up: >> [JDK-8376179](https://bugs.openjdk.org/browse/JDK-8376179): C2 SuperWord: improve subword vectorization, avoid cast to-and-from int > > Hi @eme64, thanks a lot for the review! I've pushed an update that should address the review comments and update the bug annotations and copyright years. About the cast to and from int, the only places where that should be required is when the node doesn't support truncation. Right now it looks like reductions also cast to int even when they're not required, such as with `AndReduction`. I can do some further investigation in a followup patch to see where the int vectors are generated. @jaskarth Something is going a bit strange with my testing script. Could you merge with latest master, maybe that helps? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-3806048519 From epeter at openjdk.org Tue Jan 27 16:08:00 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 Jan 2026 16:08:00 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v15] In-Reply-To: References: Message-ID: > This is an exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. > > Note: for now they are all `int` based. 
And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) > > **Discussion** > > Observations: > - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. > - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. > - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow > - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow > - `linux_x64_oci_server`: Vector API leads to really nice speedups > - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks > - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. > - Compact Object Headers has some negative effect on some loop benchmarks. > - `linux_aarch64_server`: `reduceAddI`, `copyI` > - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` > - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? > - `windows_x64_oci_server`: `reduceAddI` and some others a little bit > - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` > - Intrinsics can be much faster than auto-vectorized or Vector API code. > - `linux_aarch64_server`: `copyI` > - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. > - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact, I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). > > **Benchmark Plots** > > Units: nanoseconds per algorithm invocation. > > Note: the `aarch64` machines all only have `NEON` support. Performance may be much better on `SVE`, I have not benchmarked that yet.
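For readers unfamiliar with the benchmark names, the following Java sketch gives plain scalar versions of four of them. The semantics are assumptions inferred from the names only (these are not the PR's actual kernels): `reduceAddI` is a sum, `scanAddI` an inclusive prefix sum, `filterI` a compaction of elements passing a threshold, and `findMinIndex` the index of the first minimum. Note that `scanAddI` and `filterI` carry state across iterations (`acc` and `j`) beyond a plain reduction, so they are not simple element-wise loops.

```java
// Hypothetical scalar reference kernels; semantics are assumed from the benchmark names.
final class ScalarKernels {
    // "reduceAddI": sum of all elements (int overflow wraps around).
    static int reduceAddI(int[] a) {
        int sum = 0;
        for (int x : a) {
            sum += x;
        }
        return sum;
    }

    // "scanAddI": inclusive prefix sum, r[i] = a[0] + ... + a[i].
    static int[] scanAddI(int[] a) {
        int[] r = new int[a.length];
        int acc = 0;
        for (int i = 0; i < a.length; i++) {
            acc += a[i];
            r[i] = acc;
        }
        return r;
    }

    // "filterI": copies elements >= threshold to the front of `out`
    // and returns how many were kept.
    static int filterI(int[] a, int threshold, int[] out) {
        int j = 0;
        for (int x : a) {
            if (x >= threshold) {
                out[j++] = x;
            }
        }
        return j;
    }

    // "findMinIndex": index of the first minimal element, or -1 for an empty array.
    static int findMinIndex(int[] a) {
        int best = -1;
        for (int i = 0; i < a.length; i++) {
            if (best < 0 || a[i] < a[best]) {
                best = i;
            }
        }
        return best;
    }
}
```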
> > `linux_x64_oci` > algo_linux_x64_oci_server > > `windows_x64_oci` > algo_windows_x64_oci_server From psandoz at openjdk.org Tue Jan 27 18:28:32 2026 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 27 Jan 2026 18:28:32 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v17] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 12:27:46 GMT, Jatin Bhateja wrote: > We will still need to create T_FLOAT16 basic type and associate it with Float16 LaneType, why not directly pass these basic types to intrinsic entry point ? The strong feedback from HotSpot folks, which i agree with, is adding a new enum value to `BasicType` is not the way to go - it is too disruptive and does not scale. Sorry if i misled you earlier on, it was my intention in feedback to propose something that was limited in scope to vector support. The thought about a proxy class was motivated by a question i had - what would we do if `Float16.class` was already present in `java.base`? and answers to that might motivate what we do now in preparation for when that happens. Regardless i think we need to separate out the Vector API's direct dependence on BasicType and its values. Instead we should define our own constants for the vector element types, and provide mapping of those to BasicType values which might result in "erasure" to the carrier type. We should adjust/adapt LaneType accordingly. Does that make sense to you? 
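One possible shape for the separation Paul describes, sketched in Java with entirely hypothetical names (nothing here exists in `jdk.incubator.vector` or HotSpot). The API defines its own element-type constants, and each constant records its carrier `BasicType`, named here by string only for illustration. Because `FLOAT16` maps to the same 16-bit carrier as `SHORT`, an intrinsic that receives only the element-type constant can derive the carrier, so the two cannot get out of sync. Treating `T_SHORT` as the Float16 carrier is an assumption based on Float16 being 16 bits wide.

```java
// Hypothetical API-private element-type constants (not JDK code).
// Carrier BasicType names are plain strings, for illustration only.
enum VectorElementType {
    BYTE(Byte.SIZE, "T_BYTE"),
    SHORT(Short.SIZE, "T_SHORT"),
    INT(Integer.SIZE, "T_INT"),
    LONG(Long.SIZE, "T_LONG"),
    FLOAT(Float.SIZE, "T_FLOAT"),
    DOUBLE(Double.SIZE, "T_DOUBLE"),
    // No dedicated BasicType: "erased" to an assumed 16-bit carrier.
    FLOAT16(16, "T_SHORT");

    final int bitSize;
    final String carrierBasicType;

    VectorElementType(int bitSize, String carrierBasicType) {
        this.bitSize = bitSize;
        this.carrierBasicType = carrierBasicType;
    }

    // True when the carrier alone cannot identify the element type,
    // so only this constant distinguishes it from its carrier.
    boolean isErased() {
        return !carrierBasicType.equals("T_" + name());
    }
}
```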
------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3806791602 From psandoz at openjdk.org Tue Jan 27 18:41:13 2026 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 27 Jan 2026 18:41:13 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v4] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 09:26:35 GMT, Eric Fang wrote: >> This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance. >> >> Changes: >> -------- >> >> 1. C2 mid-end: >> - Added UMinReductionVNode and UMaxReductionVNode >> >> 2. AArch64 Backend: >> - Added uminp/umaxp/sve_uminv/sve_umaxv instructions >> - Updated match rules for all vector sizes and element types >> - Both NEON and SVE implementation are supported >> >> 3. Test: >> - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java >> - Added assembly tests in aarch64-asmtest.py for new instructions >> - Added a JTReg test file VectorUMinMaxReductionTest.java >> >> Different configurations were tested on aarch64 and x86 machines, and all tests passed. 
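As a reference for what the UMIN/UMAX reductions must compute, the hypothetical scalar helpers below (illustrative names, not from the patch) reduce with unsigned comparisons. For `int` lanes they use `Integer.compareUnsigned`, and for `byte` lanes they compare zero-extended values. The identities are the largest unsigned value (`-1`, i.e. `0xFFFFFFFF`) for min and `0` for max. The input `{5, -1, 3}` shows why a signed reduction would be wrong: its unsigned max is `-1`, not `5`.

```java
// Hypothetical scalar reference for unsigned min/max reductions (not the patch's code).
final class UnsignedReductions {
    // Unsigned-min reduction over int lanes; identity is 0xFFFFFFFF (-1).
    static int uminReduce(int[] a) {
        int r = -1;
        for (int x : a) {
            if (Integer.compareUnsigned(x, r) < 0) {
                r = x;
            }
        }
        return r;
    }

    // Unsigned-max reduction over int lanes; identity is 0.
    static int umaxReduce(int[] a) {
        int r = 0;
        for (int x : a) {
            if (Integer.compareUnsigned(x, r) > 0) {
                r = x;
            }
        }
        return r;
    }

    // Unsigned-max reduction over byte lanes, comparing zero-extended values.
    static byte umaxReduce(byte[] a) {
        int r = 0;
        for (byte x : a) {
            r = Math.max(r, Byte.toUnsignedInt(x));
        }
        return (byte) r;
    }
}
```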
>> >> Test results of JMH benchmarks from the panama-vector project: >> -------- >> >> On an Nvidia Grace machine with 128-bit SVE:
>>
>> Benchmark                      Unit    Before   Error     After   Error  Uplift
>> Byte128Vector.UMAXLanes        ops/ms  411.60   42.18  25226.51   33.92   61.29
>> Byte128Vector.UMAXMaskedLanes  ops/ms  558.56   85.12  25182.90   28.74   45.09
>> Byte128Vector.UMINLanes        ops/ms  645.58  780.76  28396.29  103.11   43.99
>> Byte128Vector.UMINMaskedLanes  ops/ms  621.09  718.27  26122.62   42.68   42.06
>> Byte64Vector.UMAXLanes         ops/ms  296.33   34.44  14357.74   15.95   48.45
>> Byte64Vector.UMAXMaskedLanes   ops/ms  376.54   44.01  14269.24   21.41   37.90
>> Byte64Vector.UMINLanes         ops/ms  373.45  426.51  15425.36   66.20   41.31
>> Byte64Vector.UMINMaskedLanes   ops/ms  353.32  346.87  14201.37   13.79   40.19
>> Int128Vector.UMAXLanes         ops/ms  174.79  192.51   9906.07  286.93   56.67
>> Int128Vector.UMAXMaskedLanes   ops/ms  157.23  206.68  10246.77   11.44   65.17
>> Int64Vector.UMAXLanes          ops/ms   95.30  126.49   4719.30   98.57   49.52
>> Int64Vector.UMAXMaskedLanes    ops/ms   88.19   87.44   4693.18   19.76   53.22
>> Long128Vector.UMAXLanes        ops/ms   80.62   97.82   5064.01   35.52   62.82
>> Long128Vector.UMAXMaskedLanes  ops/ms   78.15  102.91   5028.24    8.74   64.34
>> Long64Vector.UMAXLanes         ops/ms   47.56   62.01     46.76   52.28    0.98
>> Long64Vector.UMAXMaskedLanes   ops/ms   45.44   46.76     45.79   42.91    1.01
>> Short128Vector.UMAXLanes       ops/ms  316.65  410.30  14814.82   23.65   46.79
>> ...
> > Eric Fang has updated the pull request incrementally with one additional commit since the last revision: > > Move helper functions into c2_MacroAssembler_aarch64.hpp The general way code flows right now, but not often, is from jdk/master to panama-vector/vectorIntrinsics, since most of the development work is in the mainline (exceptions to that are the float16 and Valhalla alignment work which are large efforts). I am very reluctant to include all the auto-generated micro benchmarks in mainline.
There is a huge number of them and I am not certain they provide as much value as they did, now that we have the IR test framework. In many cases, given the simplicity of what they measure, they were designed to ensure C2 generates the right instructions. The IR test framework is better at determining that by testing that the right IR nodes are generated - and they get run as part of the existing HotSpot test suite. The IR test framework is of course no substitute, in general, for performance tests. A better focus for Vector API performance tests is, I think, Emanuel's work [here](https://github.com/openjdk/jdk/pull/28639/) and use-cases/algorithms that can be implemented concisely. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3806851359 From vlivanov at openjdk.org Tue Jan 27 19:08:48 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 27 Jan 2026 19:08:48 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v16] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 16:09:36 GMT, Quan Anh Mai wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply changes from review > > src/hotspot/share/opto/vectornode.cpp line 1540: > >> 1538: >> 1539: // Opcode is only required to disambiguate half float, so we pass -1 as it can't be encountered here. >> 1540: return (is_subword_type(def_bt) || is_subword_type(use_bt)) && VectorCastNode::implemented(-1, pack_size, def_bt, use_bt); > > Why do we specifically require it to be a subword? If you mean this is only called with one of the two being a subword, then can we use an assert instead? Moreover, there are test cases for int<->long conversions (testIntToLong/testLongToInt). How do they work?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2733436829 From vlivanov at openjdk.org Tue Jan 27 22:18:33 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 27 Jan 2026 22:18:33 GMT Subject: RFR: 8373633: C2: Use interface receiver type to improve CHA decisions [v2] In-Reply-To: References: Message-ID: > Strength-reducing an interface call to a virtual call for interfaces with > unique implementors can use receiver type information to narrow the context. > > C2 tracks interface types and receiver type information can be used to reveal > an interface with a unique implementor which can't be derived from the call > site itself. > > Since C2 effectively accumulates a union interface type from multiple subtype checks, iterating over individual components of a type may reveal a candidate for a strength-reduction. The only prerequisite is that a candidate has to be a subtype of the declared interface. > > Testing: hs-tier1 - hs-tier5 Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - addtional case - Merge branch 'master' into cha.intf.recv - Use receiver type to improve CHA decisions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28811/files - new: https://git.openjdk.org/jdk/pull/28811/files/2551b03c..70ceb4bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28811&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28811&range=00-01 Stats: 95946 lines in 3901 files changed: 48141 ins; 16836 del; 30969 mod Patch: https://git.openjdk.org/jdk/pull/28811.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28811/head:pull/28811 PR: https://git.openjdk.org/jdk/pull/28811 From vlivanov at openjdk.org Tue Jan 27 22:18:39 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 27 Jan 2026 22:18:39 GMT Subject: RFR: 8373633: C2: Use interface receiver type to improve CHA decisions [v2] In-Reply-To: <4Yf-QKYfX3oICfh5VbDj7h5C3GKkExeu8llKaLWx4L8=.ccd07642-5d3b-4c02-96a4-a0495f041482@github.com> References: <4Yf-QKYfX3oICfh5VbDj7h5C3GKkExeu8llKaLWx4L8=.ccd07642-5d3b-4c02-96a4-a0495f041482@github.com> Message-ID: On Mon, 15 Dec 2025 16:25:00 GMT, Roland Westrelin wrote: >> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - addtional case >> - Merge branch 'master' into cha.intf.recv >> - Use receiver type to improve CHA decisions > > src/hotspot/share/opto/callGenerator.cpp line 529: > >> 527: allow_inline, >> 528: _prof_factor, >> 529: nullptr /*receiver_type*/, > > Is there no benefit to passing `receiver_type` here? Good point. After thinking more about it, I see some corner cases when it may be useful. Fixed. 
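A Java-level illustration of the kind of call site this CHA change targets may help. In the hypothetical classes below, `s.area()` is compiled against `Shape`, which has two implementors, so CHA on the declared type alone cannot devirtualize it. Inside the `instanceof` branch, however, the receiver is also known to be a `RoundShape`, a subinterface of `Shape` whose only implementor is `Circle`. This matches the PR's stated prerequisite that the candidate be a subtype of the declared interface. Whether C2 actually strength-reduces this particular call depends on the details of the patch, so treat it as a sketch of the code shape, not a test of the optimization.

```java
// Hypothetical classes: an interface call whose receiver type, not the call
// site's declared type, reveals an interface with a unique implementor.
final class ChaSketch {

    interface Shape { double area(); }

    // Sub-interface of Shape with exactly one implementor (Circle).
    interface RoundShape extends Shape {}

    static final class Circle implements RoundShape {
        private final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    static final class Square implements Shape {
        private final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static double roundArea(Shape s) {
        if (s instanceof RoundShape) {
            // invokeinterface Shape.area: Shape has two implementors, but the
            // receiver is known to be a RoundShape, whose only implementor is Circle.
            return s.area();
        }
        return -1.0;
    }
}
```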
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28811#discussion_r2733978428 From vlivanov at openjdk.org Tue Jan 27 22:18:42 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 27 Jan 2026 22:18:42 GMT Subject: RFR: 8373633: C2: Use interface receiver type to improve CHA decisions [v2] In-Reply-To: References: Message-ID: On Mon, 22 Dec 2025 16:32:15 GMT, Damon Fenacci wrote: >> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - addtional case >> - Merge branch 'master' into cha.intf.recv >> - Use receiver type to improve CHA decisions > > src/hotspot/share/opto/doCall.cpp line 340: > >> 338: // number of implementors for decl_interface is 0 or 1. If >> 339: // it's 0 then no class implements decl_interface and there's >> 340: // no point in inlining. > > Does the above comment still hold? Or did you remove it because it is not relevant anymore? IMO that part of the comment becomes misleading, since the code computes "context interface" now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28811#discussion_r2733981471 From duke at openjdk.org Tue Jan 27 23:34:26 2026 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 27 Jan 2026 23:34:26 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache [v6] In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 10:03:39 GMT, George Wort wrote: > Won't the GC only remove an nmethod if it's inaccessible rather than it just being cold though? Or does the GC sweep methods that haven't been called for a while? The GC should remove nmethods even if they are still accessible. 
There is a check to determine if an nmethod should be removed based on the `is_cold` heuristic ([link](https://github.com/openjdk/jdk/blob/fa1b1d677ac492dfdd3110b9303a4c2b009046c8/src/hotspot/share/code/nmethod.cpp#L2736-L2741)) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27858#issuecomment-3808110703 From duke at openjdk.org Wed Jan 28 00:39:13 2026 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 28 Jan 2026 00:39:13 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache [v6] In-Reply-To: References: Message-ID: <9m7_4Kf6Vea6NuHshrpu882clUZ62dg4eUEaEz6ZfNM=.32b82a85-21ac-4a9b-ae89-ed66a4bcd4e1@github.com> On Mon, 19 Jan 2026 23:06:56 GMT, Evgeny Astigeevich wrote: > We can consider relocating nmethods back to the normal heap, the non-profiled code heap. IMO we should do this instead of GC throwing them away. If after being moved to the normal heap they become cold, GC will remove them from CodeCache. If they become hot again, they will be relocated to HotCodeHeap. I think we should only consider relocating nmethods to the non-profiled code heap if the HotCodeHeap is full. If the GC determines an nmethod from the HotCodeHeap is cold, it would also have determined it were cold in the non-profiled heap. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27858#issuecomment-3808304020 From erfang at openjdk.org Wed Jan 28 01:49:52 2026 From: erfang at openjdk.org (Eric Fang) Date: Wed, 28 Jan 2026 01:49:52 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v9] In-Reply-To: References: Message-ID: > `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. 
For example, some vector mask operations like `trueCount` require the input mask to be of an integer type, so for floating-point masks, the compiler will automatically cast the mask to the corresponding integer-type mask before doing the mask operation. This kind of cast is very common. > > If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code; otherwise, code will be generated to extend or narrow the mask. This IR node is not free, whether or not it generates code, because it may block some optimizations. For example: > 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`: the middle `VectorMaskCast` prevents the following optimization: `(VectorStoreMask (VectorLoadMask x)) => (x)` > 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. > > In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. > > The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`s, like the existing `Node::uncast(bool)` function. The function returns the first node in the chain that is not a `VectorMaskCastNode`. > > The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode`s and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. > > Current optimizations related to `VectorMaskCastNode` include: > 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. > 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. > > This PR does the following optimizations: > 1.
Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` to `(VectorMaskCast (VectorMaskCast ... (VectorMaskCast x))) => (x)`, because the optimization is correct as long as the types of the head and tail `VectorMaskCastNode` are consistent. > 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vector, it remains unchanged as long as th... Eric Fang has updated the pull request incrementally with one additional commit since the last revision: Add clearer comments to VectorMaskCastIdentityTest.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28313/files - new: https://git.openjdk.org/jdk/pull/28313/files/9c38a6d9..f53d330f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=07-08 Stats: 14 lines in 1 file changed: 8 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28313.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28313/head:pull/28313 PR: https://git.openjdk.org/jdk/pull/28313 From erfang at openjdk.org Wed Jan 28 01:52:38 2026 From: erfang at openjdk.org (Eric Fang) Date: Wed, 28 Jan 2026 01:52:38 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v6] In-Reply-To: References: Message-ID: <_61TOKMrW72QC8X-YzGXJ2Ws5kUz5fhQxB9-Q9XHuKk=.3a271efc-1115-426a-aa94-678b5ba053f5@github.com> On Fri, 26 Dec 2025 11:10:07 GMT, Emanuel Peter wrote: >> Eric Fang has updated the pull request incrementally with one additional commit since the last revision: >> >> Refine code comments > > I'll review this again in early January, once I'm back from Christmas/New Year break ;) @eme64 I have addressed your comments; would you mind taking another look at this PR?
Thanks~ ------------- PR Comment: https://git.openjdk.org/jdk/pull/28313#issuecomment-3808501474 From dlong at openjdk.org Wed Jan 28 02:25:23 2026 From: dlong at openjdk.org (Dean Long) Date: Wed, 28 Jan 2026 02:25:23 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v6] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 19:37:06 GMT, Chen Liang wrote: >> Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Use UpperCamelCase src/hotspot/share/ci/ciObject.hpp line 118: > 116: > 117: // Access to the constant value cache > 118: // Key must be nonnegative. Negative keys are reserved. Suggestion: // Keys representing an array index or field offset are nonnegative. Negative keys are reserved for special values such as identity hash code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2734576071 From dlong at openjdk.org Wed Jan 28 02:56:57 2026 From: dlong at openjdk.org (Dean Long) Date: Wed, 28 Jan 2026 02:56:57 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v6] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 19:37:06 GMT, Chen Liang wrote: >> Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. 
> > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Use UpperCamelCase src/hotspot/share/oops/oop.inline.hpp line 443: > 441: intptr_t oopDesc::fast_identity_hash_or_no_hash() { > 442: // Note: The mark must be read into local variable to avoid concurrent updates. > 443: markWord mrk = mark_acquire(); Suggestion: markWord mrk = mark(); For the fast case, it seems safe to use mark() like oopDesc::identity_hash() does. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2734638145 From erfang at openjdk.org Wed Jan 28 03:12:03 2026 From: erfang at openjdk.org (Eric Fang) Date: Wed, 28 Jan 2026 03:12:03 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v4] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 18:38:40 GMT, Paul Sandoz wrote: >> Eric Fang has updated the pull request incrementally with one additional commit since the last revision: >> >> Move helper functions into c2_MacroAssembler_aarch64.hpp > > The general way code flows right now, but not often, is from jdk/master to panama-vector/vectorIntrinsics, since most of the development work is in the mainline (exceptions to that are the float16 and Valhalla alignment work which are large efforts). > > I am very reluctant to include all the auto-generated micro benchmarks in mainline. There is a huge number of them and I am not certain they provide as much value as they did, now that we have the IR test framework. In many cases, given the simplicity of what they measure, they were designed to ensure C2 generates the right instructions. The IR test framework is better at determining that by testing that the right IR nodes are generated - and they get run as part of the existing HotSpot test suite. > > The IR test framework is of course no substitute, in general, for performance tests.
> A better focus for Vector API performance tests is, I think, Emanuel's work [here](https://github.com/openjdk/jdk/pull/28639/) and use-cases/algorithms that can be implemented concisely. @PaulSandoz thanks for your insight; this really makes sense to me. Hi @theRealAph, I've added a number of IR tests to this PR, and there are also numerous related tests in `test/jdk/jdk/incubator/vector/`, like [UMINByte128VectorTests()](https://github.com/openjdk/jdk/blob/fa1b1d677ac492dfdd3110b9303a4c2b009046c8/test/jdk/jdk/incubator/vector/Byte128VectorTests.java#L3283), which are sufficient to ensure the quality of this PR. I share your feeling that it's inconvenient to review a JMH test that isn't in the mainline. I should have included the JMH link in the commit message, which is here: [Byte128Vector.UMINLanes](https://github.com/openjdk/panama-vector/blob/2181a35d64762bb3ac3d7fb66212c2559b6b72b5/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Byte128Vector.java#L1542). Is that fine with you? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3808702684 From dlong at openjdk.org Wed Jan 28 03:14:45 2026 From: dlong at openjdk.org (Dean Long) Date: Wed, 28 Jan 2026 03:14:45 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v6] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 19:37:06 GMT, Chen Liang wrote: >> Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead.
> > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Use UpperCamelCase src/hotspot/share/opto/library_call.cpp line 4791: > 4789: const TypeInstPtr* t = _gvn.type(obj)->isa_instptr(); > 4790: if (t != nullptr && t->const_oop() != nullptr) { > 4791: assert(!is_virtual, "no devirtualization for constant receiver?"); Don't we also need to check for `is_static`, to distinguish between `Object.hashCode` and `System.identityHashCode`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2734670836 From xgong at openjdk.org Wed Jan 28 03:35:12 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 28 Jan 2026 03:35:12 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v15] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 16:08:00 GMT, Emanuel Peter wrote: >> This is exploratory work. I wanted to use auto-vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. >> >> Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) >> >> **Discussion** >> >> Observations: >> - If the loop can be auto-vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. >> - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance.
>> - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow >> - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow >> - `linux_x64_oci_server`: Vector API leads to really nice speedups >> - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks >> - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. >> - Compact Object Headers has some negative effect on some loop benchmarks. >> - `linux_aarch64_server`: `reduceAddI`, `copyI` >> - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` >> - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? >> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit >> - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` >> - Intrinsics can be much faster than auto-vectorized or Vector API code. >> - `linux_aarch64_server`: `copyI` >> - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. >> - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact; I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). >> >> **Benchmark Plots** >> >> Units: nanoseconds per algorithm invocation. >> >> Note: the `aarch64` machines all only have `NEON` support. Performance may be much better on `SVE`; I have not benchmarked that yet.
>> >> `linux_x64_oci`: [plot: algo_linux_x64_oci_server] >> >> `windows_x64_oci`: [plot: algo_windows_x64_oci_server] > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix IR rule that failed on neon test/hotspot/jtreg/compiler/vectorization/TestVectorAlgorithms.java line 252: > 250: public Object fillI_VectorAPI(int[] r) { > 251: return VectorAlgorithmsImpl.fillI_VectorAPI(r); > 252: } So the IR check only passes because we rely on `fillI_VectorAPI` being inlined, correct? I'm wondering whether it's possible that the method is not inlined as expected and could the IR check fail if so? Does it deserve a `ForceInline` annotation on methods in the `Impl` file? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2734706459 From liach at openjdk.org Wed Jan 28 03:46:01 2026 From: liach at openjdk.org (Chen Liang) Date: Wed, 28 Jan 2026 03:46:01 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v6] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 03:11:52 GMT, Dean Long wrote: >> Chen Liang has updated the pull request incrementally with one additional commit since the last revision: >> >> Use UpperCamelCase > > src/hotspot/share/opto/library_call.cpp line 4791: > >> 4789: const TypeInstPtr* t = _gvn.type(obj)->isa_instptr(); >> 4790: if (t != nullptr && t->const_oop() != nullptr) { >> 4791: assert(!is_virtual, "no devirtualization for constant receiver?"); > > Don't we also need to check for `is_static`, to distinguish between `Object.hashCode` and `System.identityHashCode`? I think once we are not virtual, the native Object::hashCode behaves like System::identityHashCode. The only difference is the null check, but I think there's a null check in the beginning so we should be safe.
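The distinction discussed above can be checked at the Java level. For a class that does not override `hashCode()`, `Object.hashCode()` and `System.identityHashCode(Object)` return the same value. They differ only for `null`: `System.identityHashCode(null)` returns 0, while invoking `hashCode()` on a null reference throws `NullPointerException`. The class below is a small hypothetical demo of that documented contract, not part of the patch.

```java
// Hypothetical demo of the Object.hashCode / System.identityHashCode contract.
final class IdentityHashDemo {

    // Does not override hashCode(), so it inherits Object.hashCode().
    static final class Plain {}

    // For non-overriding classes both paths yield the identity hash.
    static boolean matchesIdentityHash(Object o) {
        return o.hashCode() == System.identityHashCode(o);
    }

    // The virtual call needs a non-null receiver; the static one does not.
    static boolean hashCodeOnNullThrows() {
        Object o = null;
        try {
            o.hashCode();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```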
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2734726026 From jkarthikeyan at openjdk.org Wed Jan 28 04:33:03 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 28 Jan 2026 04:33:03 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v16] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 16:09:36 GMT, Quan Anh Mai wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply changes from review > > src/hotspot/share/opto/vectornode.cpp line 1540: > >> 1538: >> 1539: // Opcode is only required to disambiguate half float, so we pass -1 as it can't be encountered here. >> 1540: return (is_subword_type(def_bt) || is_subword_type(use_bt)) && VectorCastNode::implemented(-1, pack_size, def_bt, use_bt); > > Why do we specifically require it to be a subword? If you mean this is only called with one of the two being a subword, then can we use an assert instead? @merykitty Since this function is called from places that may not always have a subword type, changing it to an assert causes spurious assertion failures. @iwanowww Scalar int<->long conversions are modeled with `ConvI2L` and `ConvL2I`; the existing superword mechanism is able to vectorize the conversion without needing to dynamically add new nodes.
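For readers following the subword-cast discussion, the loops below are hypothetical Java examples of the conversions involved: the narrowing casts `int -> byte`, `int -> short` and `short -> byte` that the patch aims to let SuperWord vectorize on supported hardware, and an `int -> long` widening, which C2 models with `ConvI2L` as described in the reply above. The method names only mirror the benchmark names quoted in the PR; the actual benchmark bodies are not shown in this thread. Java narrowing casts keep the low-order bits, which is what any vectorized version has to reproduce.

```java
// Hypothetical loops mirroring the conversions discussed in the PR.
final class SubwordCastLoops {

    // Narrowing int -> byte: keeps the low 8 bits of each element.
    static void intToByte(int[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    // Narrowing int -> short: keeps the low 16 bits.
    static void intToShort(int[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) src[i];
        }
    }

    // Narrowing short -> byte: subword to subword.
    static void shortToByte(short[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    // Widening int -> long: one ConvI2L per element in C2's IR.
    static void intToLong(int[] src, long[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
    }
}
```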
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2734803318 From jkarthikeyan at openjdk.org Wed Jan 28 04:33:00 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 28 Jan 2026 04:33:00 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v15] In-Reply-To: <0Y-lUGt3Co6upV2G_agCAH35r-J1oM-8xbVOxWeVoxc=.7af84968-4994-4fbd-8fe7-59965de04d11@github.com> References: <0Y-lUGt3Co6upV2G_agCAH35r-J1oM-8xbVOxWeVoxc=.7af84968-4994-4fbd-8fe7-59965de04d11@github.com> Message-ID: <5Aj7cNpkNPSVkQ_ElMG_SOTlssaomF0YcJJJ15_UpQc=.d8ac8eb7-cae1-49b2-907b-8e6c1e9274dd@github.com> On Tue, 27 Jan 2026 16:03:07 GMT, Emanuel Peter wrote: >> Hi @eme64, thanks a lot for the review! I've pushed an update that should address the review comments and update the bug annotations and copyright years. About the cast to and from int, the only place where that should be required is when the node doesn't support truncation. Right now it looks like reductions also cast to int even when they're not required, such as with `AndReduction`. I can do some further investigation in a follow-up patch to see where the int vectors are generated. > > @jaskarth Something strange is going on with my testing script. Could you merge with the latest master? Maybe that helps. Thanks for the ping @eme64, I've merged from master. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-3808878311 From jkarthikeyan at openjdk.org Wed Jan 28 04:32:59 2026 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 28 Jan 2026 04:32:59 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v17] In-Reply-To: References: Message-ID: > Hi all, > This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization.
This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported, as I wanted to reuse the existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine: >
>                                                Baseline                 Patch
> Benchmark                 (SIZE)  Mode  Cnt    Score     Error  Units    Score    Error  Units  Improvement
> VectorSubword.intToByte     1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ± 3.535  ns/op      (3.56x)
> VectorSubword.intToShort    1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ± 1.166  ns/op      (4.15x)
> VectorSubword.shortToByte   1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ± 1.055  ns/op      (8.25x)
>
> I've also added some IR tests and they pass on my Linux x64 machine. Thoughts and reviews would be appreciated! Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: - Merge branch 'master' into vectorize-subword - Apply changes from review - Fix whitespace - Update tests after merge, apply changes from review - Merge from master - Update tests, cleanup logic - Merge branch 'master' into vectorize-subword - Check for AVX2 for byte/long conversions - Whitespace and benchmark tweak - Address more comments, make test and benchmark more exhaustive - ...
and 11 more: https://git.openjdk.org/jdk/compare/fa1b1d67...641a3abc ------------- Changes: https://git.openjdk.org/jdk/pull/23413/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=16 Stats: 796 lines in 15 files changed: 690 ins; 14 del; 92 mod Patch: https://git.openjdk.org/jdk/pull/23413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23413/head:pull/23413 PR: https://git.openjdk.org/jdk/pull/23413 From qamai at openjdk.org Wed Jan 28 04:54:14 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 28 Jan 2026 04:54:14 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v16] In-Reply-To: References: Message-ID: <5nv3GFy_QMTZNF-gbO1RgyN9RH7gExYTiAb47KCJlW8=.52905d8b-31f8-4c75-879c-8dda077bd231@github.com> On Wed, 28 Jan 2026 04:28:52 GMT, Jasmine Karthikeyan wrote: >> src/hotspot/share/opto/vectornode.cpp line 1540: >> >>> 1538: >>> 1539: // Opcode is only required to disambiguate half float, so we pass -1 as it can't be encountered here. >>> 1540: return (is_subword_type(def_bt) || is_subword_type(use_bt)) && VectorCastNode::implemented(-1, pack_size, def_bt, use_bt); >> >> Why do we specifically require it to be a subword? If you mean this is only called with one of the two being a subword, then can we use an assert instead? > > @merykitty Since this function is called from places that may not always have a subword type, changing it to an assert causes spurious assertion failures. > > @iwanowww Scalar int<->long conversions are modeled with `ConvI2L` and `ConvL2I`; the existing superword mechanism is able to vectorize the conversion without needing to dynamically add new nodes. Can you give me an example where this is the case? And in that case, can we remove the `is_subword_type` condition and check the availability of a cast regardless? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2734846588 From epeter at openjdk.org Wed Jan 28 07:18:15 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 Jan 2026 07:18:15 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v15] In-Reply-To: <5Aj7cNpkNPSVkQ_ElMG_SOTlssaomF0YcJJJ15_UpQc=.d8ac8eb7-cae1-49b2-907b-8e6c1e9274dd@github.com> References: <0Y-lUGt3Co6upV2G_agCAH35r-J1oM-8xbVOxWeVoxc=.7af84968-4994-4fbd-8fe7-59965de04d11@github.com> <5Aj7cNpkNPSVkQ_ElMG_SOTlssaomF0YcJJJ15_UpQc=.d8ac8eb7-cae1-49b2-907b-8e6c1e9274dd@github.com> Message-ID: On Wed, 28 Jan 2026 04:27:31 GMT, Jasmine Karthikeyan wrote: >> @jaskarth Something strange is going on with my testing script. Could you merge with the latest master? Maybe that helps. > > Thanks for the ping @eme64, I've merged from master.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2734846588 From epeter at openjdk.org Wed Jan 28 07:18:15 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 Jan 2026 07:18:15 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v15] In-Reply-To: <5Aj7cNpkNPSVkQ_ElMG_SOTlssaomF0YcJJJ15_UpQc=.d8ac8eb7-cae1-49b2-907b-8e6c1e9274dd@github.com> References: <0Y-lUGt3Co6upV2G_agCAH35r-J1oM-8xbVOxWeVoxc=.7af84968-4994-4fbd-8fe7-59965de04d11@github.com> <5Aj7cNpkNPSVkQ_ElMG_SOTlssaomF0YcJJJ15_UpQc=.d8ac8eb7-cae1-49b2-907b-8e6c1e9274dd@github.com> Message-ID: On Wed, 28 Jan 2026 04:27:31 GMT, Jasmine Karthikeyan wrote: >> @jaskarth Something is going a bit strange with my testing script. Could you merge with latest master, maybe that helps? > > Thanks for the ping @eme64, I've merged from master. @jaskarth Thanks, now the script succeeded and tests are launched :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-3809422234 From chagedorn at openjdk.org Wed Jan 28 07:20:10 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 28 Jan 2026 07:20:10 GMT Subject: RFR: 8374622: StressIncrementalInlining should also randomize the processing order [v2] In-Reply-To: References: <4t8vV1C6ARKkVmpI_7yMbXxv_ITJnNOk7b3GFJ5q9NM=.2d323fa6-c849-4143-895c-74a86afba457@github.com> Message-ID: On Tue, 20 Jan 2026 15:54:53 GMT, Marc Chevalier wrote: >> As it says: randomize the order the late inlines are processed, and slightly factor it with macro nodes. >> >> I didn't add shuffle to the `GrowableArray` class since it seems a bit method specialized method (and it seems it'd be a controversial change to a widely use class), it would make `GrowableArray` depends on `Compile` for random number generation (or require a callback, for instance, giving a non-trivial signature and usage), I couldn't find other shuffling of such an object. 
>> There is also shuffling of `UniqueNodeList` (for `StressIGVN`), but it seems hard to unify: access and length are not written quite the same, so it would probably be no simpler than duplicating the implementation (which is simple, so it's fine in my opinion). >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Randomize insertion We've also discussed it offline, I agree with your suggestion. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29110#pullrequestreview-3714946353 From epeter at openjdk.org Wed Jan 28 07:23:16 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 Jan 2026 07:23:16 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v15] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 03:32:30 GMT, Xiaohong Gong wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix IR rule that failed on neon > > test/hotspot/jtreg/compiler/vectorization/TestVectorAlgorithms.java line 252: > >> 250: public Object fillI_VectorAPI(int[] r) { >> 251: return VectorAlgorithmsImpl.fillI_VectorAPI(r); >> 252: } > > So the IR check only passes because we rely on `fillI_VectorAPI` being inlined, correct? I'm wondering whether it's possible that the method is not inlined as expected and could the IR check fail if so? Does it deserve a `ForceInline` annotation on methods in the `Impl` file? That's why both the test and the benchmark have the flag `-XX:CompileCommand=inline,*VectorAlgorithmsImpl*::*` so we can avoid the annotations ;) If I had to do annotation, then it would be `@ForceInline` for the test, and `@CompilerControl(CompilerControl.Mode.INLINE)` for the benchmark. So that would mean the `Impl` files would not be identical.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2735192882 From epeter at openjdk.org Wed Jan 28 07:27:19 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 Jan 2026 07:27:19 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v11] In-Reply-To: References: <-avpi_0V0b9jLf29Sah1VB8ZiOWu9MSFl8X3vO22NFQ=.0c006266-c5eb-4077-b238-311e6190486f@github.com> Message-ID: On Tue, 20 Jan 2026 19:48:42 GMT, Vladimir Ivanov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> use firstTrue for XiaohongGong > > Looks good. @iwanowww Can I get your re-approval for integration, please? :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28639#issuecomment-3809481679 From xgong at openjdk.org Wed Jan 28 08:01:34 2026 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 28 Jan 2026 08:01:34 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v15] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 07:20:39 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/vectorization/TestVectorAlgorithms.java line 252: >> >>> 250: public Object fillI_VectorAPI(int[] r) { >>> 251: return VectorAlgorithmsImpl.fillI_VectorAPI(r); >>> 252: } >> >> So the IR check only passes because we rely on `fillI_VectorAPI` being inlined, correct? I'm wondering whether it's possible that the method is not inlined as expected and could the IR check fail if so? Does it deserve a `ForceInline` annotation on methods in the `Impl` file? > > That's why both the test and the benchmark have the flag `-XX:CompileCommand=inline,*VectorAlgorithmsImpl*::*` so we can avoid the annotations ;) > > If I had to do annotation, then it would be `@ForceInline` for the test, and `@CompilerControl(CompilerControl.Mode.INLINE)` for the benchmark. So that would mean the `Impl` files would not be identical. Got it. 
I didn't notice the VM options inside the test. Thanks so much for your explanation! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2735323357 From roland at openjdk.org Wed Jan 28 08:06:08 2026 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 28 Jan 2026 08:06:08 GMT Subject: RFR: 8350330: C2: PhaseIdealLoop::add_parse_predicate() should mirror GraphKit::add_parse_predicate() [v2] In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 11:41:55 GMT, Quan Anh Mai wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8350330 >> - test & fix > > LGTM, too. @merykitty @chhagedorn thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29367#issuecomment-3809627664 From roland at openjdk.org Wed Jan 28 08:06:09 2026 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 28 Jan 2026 08:06:09 GMT Subject: Integrated: 8350330: C2: PhaseIdealLoop::add_parse_predicate() should mirror GraphKit::add_parse_predicate() In-Reply-To: References: Message-ID: <5rrdFE7mk19Za3OdV2jNEcvJZDWd_04ZYXqkKDxhr5s=.b7911080-958e-4259-af77-66db6e4595b6@github.com> On Thu, 22 Jan 2026 16:22:34 GMT, Roland Westrelin wrote: > `PhaseIdealLoop::add_parse_predicate()` was intended to mirror > `GraphKit::add_parse_predicate()` but it doesn't. The latter checks > `too_many_traps` per bci but the `PhaseIdealLoop` version doesn't. As > demonstrated by the test case, a method can get compiled with a > predicate, take a trap, and get recompiled with the same predicate > many times (up to ~100). This pull request has now been integrated.
Changeset: b2cd3b0d Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/b2cd3b0d48bdabacfd421dee9b9f87a003e0e09d Stats: 123 lines in 3 files changed: 100 ins; 20 del; 3 mod 8350330: C2: PhaseIdealLoop::add_parse_predicate() should mirror GraphKit::add_parse_predicate() Reviewed-by: chagedorn, qamai ------------- PR: https://git.openjdk.org/jdk/pull/29367 From qamai at openjdk.org Wed Jan 28 08:19:11 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 28 Jan 2026 08:19:11 GMT Subject: [jdk26] RFR: 8375653: C2: CmpUNode::sub is not monotonic [v2] In-Reply-To: <0aTmdsHwgUi6fQOr6J-D-9a73aOAodOVEjqy7QBsQJY=.a6fa9a07-5cc6-4738-a5cb-f425b1098b34@github.com> References: <7LhAjmvBXDhH1TvnAU7qzB8HwdWpXS9GYfUTKjp4Lgs=.7683ff64-ce00-4e45-b234-552d17058658@github.com> <0aTmdsHwgUi6fQOr6J-D-9a73aOAodOVEjqy7QBsQJY=.a6fa9a07-5cc6-4738-a5cb-f425b1098b34@github.com> Message-ID: On Mon, 26 Jan 2026 18:27:02 GMT, Quan Anh Mai wrote: >> Hi all, >> >> This pull request contains a backport of commit [30675faa](https://github.com/openjdk/jdk/commit/30675faa67d1bbb4acc729a841493bb8311416af) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Quan Anh Mai on 26 Jan 2026 and was reviewed by Christian Hagedorn and Marc Chevalier. >> >> Thanks! > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Fix test Thanks a lot for your approvals! The tests passed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29412#issuecomment-3809684741 From qamai at openjdk.org Wed Jan 28 08:19:13 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 28 Jan 2026 08:19:13 GMT Subject: [jdk26] Integrated: 8375653: C2: CmpUNode::sub is not monotonic In-Reply-To: <7LhAjmvBXDhH1TvnAU7qzB8HwdWpXS9GYfUTKjp4Lgs=.7683ff64-ce00-4e45-b234-552d17058658@github.com> References: <7LhAjmvBXDhH1TvnAU7qzB8HwdWpXS9GYfUTKjp4Lgs=.7683ff64-ce00-4e45-b234-552d17058658@github.com> Message-ID: On Mon, 26 Jan 2026 12:09:38 GMT, Quan Anh Mai wrote: > Hi all, > > This pull request contains a backport of commit [30675faa](https://github.com/openjdk/jdk/commit/30675faa67d1bbb4acc729a841493bb8311416af) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Quan Anh Mai on 26 Jan 2026 and was reviewed by Christian Hagedorn and Marc Chevalier. > > Thanks! This pull request has now been integrated. Changeset: 52340411 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/523404112e9efaee357470a33d6fcb08a22029da Stats: 387 lines in 3 files changed: 291 ins; 77 del; 19 mod 8375653: C2: CmpUNode::sub is not monotonic Reviewed-by: chagedorn, thartmann Backport-of: 30675faa67d1bbb4acc729a841493bb8311416af ------------- PR: https://git.openjdk.org/jdk/pull/29412 From qamai at openjdk.org Wed Jan 28 08:19:20 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 28 Jan 2026 08:19:20 GMT Subject: RFR: 8373999: C2: apply KnownBits and unsigned bounds to Add / Sub operations [v6] In-Reply-To: References: Message-ID: > Hi, > > This PR improves the implementation of `AddNode/SubNode::Value` by taking advantage of the additional information in `TypeInt`. The implementation has some pretty non-trivial logic. Fortunately, the test infrastructure is already there. > > Please take a look and leave your reviews, thanks a lot. 
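As background for the known-bits part of this change, here is one standard way to propagate known bits through an addition. It is modeled on the approach used in LLVM's `KnownBits`, is not taken from this PR, and may differ from C2's implementation. It relies on the fact that each carry bit is a monotone function of the input bits:

- a carry that is still 0 when every unknown bit is set to 1 is always 0;
- a carry that is already 1 when every unknown bit is set to 0 is always 1.

`KnownBitsAdd` and its `Known` record are hypothetical names.

```java
// Hypothetical sketch (not C2 code) of known-bits propagation for a 32-bit
// wrapping addition, like Java int arithmetic.
final class KnownBitsAdd {
    // zeros: bits known to be 0; ones: bits known to be 1. A bit set in
    // neither mask is unknown; no bit may be set in both.
    record Known(int zeros, int ones) {
        static Known constant(int v) {
            return new Known(~v, v);
        }
    }

    static Known add(Known a, Known b) {
        int aMax = ~a.zeros();              // a with every unknown bit set to 1
        int bMax = ~b.zeros();
        int sumMax = aMax + bMax;
        int sumMin = a.ones() + b.ones();   // every unknown bit set to 0

        // The carry into each bit position is sum ^ a ^ b.
        int carriesMax = sumMax ^ aMax ^ bMax;
        int carriesMin = sumMin ^ a.ones() ^ b.ones();
        // By monotonicity: 0 in the all-ones case means always 0,
        // 1 in the all-zeros case means always 1.
        int carryKnown = ~carriesMax | carriesMin;

        // A sum bit is known when both input bits and its incoming carry are known.
        int known = (a.zeros() | a.ones()) & (b.zeros() | b.ones()) & carryKnown;
        return new Known(~sumMax & known, sumMin & known);
    }

    public static void main(String[] args) {
        // a is 4 or 5 (bit 0 unknown), b is exactly 1, so the sum is 5 or 6:
        // bit 2 is known to be 1, bits 0 and 1 are unknown, the rest are 0.
        Known s = add(new Known(~5, 4), Known.constant(1));
        System.out.printf("zeros=%08x ones=%08x%n", s.zeros(), s.ones()); // zeros=fffffff8 ones=00000004
    }
}
```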
Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Fix merge conflict, address review - Merge branch 'master' into addsub - Improve comments - copyright year - Merge branch 'master' into addsub - Merge branch 'master' into addsub - include order - Improve Add/SubNode::Value with unsigned bounds and known bits ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28897/files - new: https://git.openjdk.org/jdk/pull/28897/files/ae17b24e..1b3ea38d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28897&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28897&range=04-05 Stats: 81638 lines in 2598 files changed: 39493 ins; 14781 del; 27364 mod Patch: https://git.openjdk.org/jdk/pull/28897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28897/head:pull/28897 PR: https://git.openjdk.org/jdk/pull/28897 From qamai at openjdk.org Wed Jan 28 08:19:21 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 28 Jan 2026 08:19:21 GMT Subject: RFR: 8373999: C2: apply KnownBits and unsigned bounds to Add / Sub operations [v4] In-Reply-To: <4m46JmkQxxNKZoYMVsKjSJZ2Gl8HWMJgg1G7f8Co5Bk=.48ecd3e9-ceae-488e-92db-cf19de82815d@github.com> References: <4m46JmkQxxNKZoYMVsKjSJZ2Gl8HWMJgg1G7f8Co5Bk=.48ecd3e9-ceae-488e-92db-cf19de82815d@github.com> Message-ID: <9OJNKEnZgk4HJK9lFuBz_s4P4x_qKNjHcBUw1OqL7-s=.4e7bec69-4c21-4f08-9d79-d59493459acc@github.com> On Thu, 15 Jan 2026 08:35:12 GMT, Benoît Maillard wrote: >> Great work! I went through all the calculations, and tried to reproduce them independently. It all looks sound to me. I only have a few comments, mostly about notation. > >> @benoitmaillard Thanks a lot for your reviews! I have addressed your comments.
I think this PR should wait for #28952, so it would be great if you or anyone could take a look there. > > I will try to take a look today if I have time, worst case tomorrow. Thank you for addressing my comments! @benoitmaillard I have merged the branch with master and included your suggestions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28897#issuecomment-3809700643 From bmaillard at openjdk.org Wed Jan 28 08:24:09 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 28 Jan 2026 08:24:09 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v8] In-Reply-To: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: > This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and to identify similar or related failures during bug triaging. > > In summary, this PR brings the following changes: > - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. > - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question.
> > ### Example outputs > #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) > Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >
> Before > > > Missed Ideal optimization (can_reshape=false): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) > 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) > The result after Ideal: > dist dump > --------------------------------------------- > 1 337 ConL === 0 [[ 338 ]] #long:-9 > 1 336 URShiftL === _ 298 22 [[ 338 ]] > 0 338 AndL === _ 336 337 [[ ]] > > > Missed Ideal optimization (can_reshape=true): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298... Benoît Maillard has updated the pull request incrementally with three additional commits since the last revision: - Use a string for selecting the assert message - Revert "DEBUG: try to pass assert message with char*" This reverts commit 79ef113242a3ada4345fd60932d5815536d04c45.
- DEBUG: try to pass assert message with char* ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28295/files - new: https://git.openjdk.org/jdk/pull/28295/files/d22ce771..a71b8233 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=06-07 Stats: 9 lines in 2 files changed: 4 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28295/head:pull/28295 PR: https://git.openjdk.org/jdk/pull/28295 From bmaillard at openjdk.org Wed Jan 28 08:27:01 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 28 Jan 2026 08:27:01 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v9] In-Reply-To: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: > This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and to identify similar or related failures during bug triaging. > > In summary, this PR brings the following changes: > - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. > - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question.
> - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. > > ### Example outputs > #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) > Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >
> Before > > > Missed Ideal optimization (can_reshape=false): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) > 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) > The result after Ideal: > dist dump > --------------------------------------------- > 1 337 ConL === 0 [[ 338 ]] #long:-9 > 1 336 URShiftL === _ 298 22 [[ 338 ]] > 0 338 AndL === _ 336 337 [[ ]] > > > Missed Ideal optimization (can_reshape=true): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298... Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: Remove unused declaration ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28295/files - new: https://git.openjdk.org/jdk/pull/28295/files/a71b8233..f535b56e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28295/head:pull/28295 PR: https://git.openjdk.org/jdk/pull/28295 From bmaillard at openjdk.org Wed Jan 28 08:31:54 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 28 Jan 2026 08:31:54 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v9] In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID:
<5iARDbaCfyFwrDVwC6p-OZFfaJOiMbK6b6mABzuZHlk=.d203b093-6523-40d1-8d57-e3f0dc4433a5@github.com> On Fri, 14 Nov 2025 07:10:28 GMT, Christian Hagedorn wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unused declaration > > Good idea, looks good to me! I've carried out the requested changes, and I think we are ready to move forward with this PR. I would need your (re-)approval @chhagedorn @eme64 ------------- PR Comment: https://git.openjdk.org/jdk/pull/28295#issuecomment-3809761275 From epeter at openjdk.org Wed Jan 28 08:41:23 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 Jan 2026 08:41:23 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v9] In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: <9eLBVN0xV07IbyO99CQ79dyVYUme_JpewQqgTo3eFtg=.f9e352cd-724c-4150-a2bb-a0aa4236c4e7@github.com> On Wed, 28 Jan 2026 08:27:01 GMT, Benoît Maillard wrote: >> This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and to identify similar or related failures during bug triaging. >> >> In summary, this PR brings the following changes: >> - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. >> - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question.
>> - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. >> >> ### Example outputs >> #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) >> Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >>
>> Before >> >> >> Missed Ideal optimization (can_reshape=false): >> The node was replaced by Ideal. >> Old node: >> dist dump >> --------------------------------------------- >> 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 >> 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) >> 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) >> The result after Ideal: >> dist dump >> --------------------------------------------- >> 1 337 ConL === 0 [[ 338 ]] #long:-9 >> 1 336 URShiftL === _ 298 22 [[ 338 ]] >> 0 338 AndL === _ 336 337 [[ ]] >> >> >> Missed Ideal optimization (can_reshape=true): >> The node was replaced by Ideal. >> Old node: >> dist dump >> --------------------------------------------- >> 1 22 ConI === 0 [[ 70 81 70 290 81 76... > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused declaration src/hotspot/share/opto/phaseX.cpp line 1206: > 1204: } else if (strcmp(phase, "PhaseCCP") == 0) { > 1205: assert(false, "PhaseCCP not at fixpoint: analysis result may be unsound for %s", n->Name()); > 1206: } The `strcmp` is a little nasty, but I don't have a better solution right now. But I think we should convert the `else if` condition into an `assert` in the `else` branch. Imagine someone calls the method with a string we don't match here: would we just silently pass?
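The point raised here, that an unmatched phase name should not pass silently, can be illustrated with a Java analogue: a `switch` on the phase name whose `default` branch fails loudly. This is only a sketch. `PhaseFailureReport`, `failureMessage`, and the `PhaseIterGVN` message are hypothetical, and only the `PhaseCCP` message text comes from the quoted diff. In the C++ under review, the equivalent is a final `else` branch that asserts.

```java
// Hypothetical Java analogue (not HotSpot code): build a failure message for a
// known phase, and fail loudly for any phase name that nobody handles.
final class PhaseFailureReport {
    static String failureMessage(String phase, String nodeName) {
        return switch (phase) {
            // Illustrative wording only; the real IterGVN message is not shown in the thread.
            case "PhaseIterGVN" -> "PhaseIterGVN not at fixpoint for " + nodeName;
            case "PhaseCCP" -> "PhaseCCP not at fixpoint: analysis result may be unsound for " + nodeName;
            // An unexpected caller is a bug: surface it instead of silently passing.
            default -> throw new AssertionError("unexpected phase name: " + phase);
        };
    }

    public static void main(String[] args) {
        System.out.println(failureMessage("PhaseCCP", "AddI"));
        try {
            failureMessage("PhaseTypo", "AddI");
        } catch (AssertionError e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```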
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2735464563 From bmaillard at openjdk.org Wed Jan 28 08:59:04 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 28 Jan 2026 08:59:04 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v10] In-Reply-To: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: > This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and to identify similar or related failures during bug triaging. > > In summary, this PR brings the following changes: > - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. > - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. > - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. > > ### Example outputs > #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) > Before the change, we would get two missed optimizations (the second one is only a consequence of the first one).
After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >
> Before > > > Missed Ideal optimization (can_reshape=false): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) > 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) > The result after Ideal: > dist dump > --------------------------------------------- > 1 337 ConL === 0 [[ 338 ]] #long:-9 > 1 336 URShiftL === _ 298 22 [[ 338 ]] > 0 338 AndL === _ 336 337 [[ ]] > > > Missed Ideal optimization (can_reshape=true): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298... Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: Add default case for assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28295/files - new: https://git.openjdk.org/jdk/pull/28295/files/f535b56e..387b0c19 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=08-09 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28295/head:pull/28295 PR: https://git.openjdk.org/jdk/pull/28295 From bmaillard at openjdk.org Wed Jan 28 08:59:06 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 28 Jan 2026 08:59:06 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v9] In-Reply-To: <9eLBVN0xV07IbyO99CQ79dyVYUme_JpewQqgTo3eFtg=.f9e352cd-724c-4150-a2bb-a0aa4236c4e7@github.com> References:
<5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> <9eLBVN0xV07IbyO99CQ79dyVYUme_JpewQqgTo3eFtg=.f9e352cd-724c-4150-a2bb-a0aa4236c4e7@github.com> Message-ID: <-rssqxnycfY9oZ6SgyG8JyzBctLFgg2QQs7Iydxx9Qo=.7e513c45-f132-445f-9d11-7d1018520dfb@github.com> On Wed, 28 Jan 2026 08:37:40 GMT, Emanuel Peter wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unused declaration > > src/hotspot/share/opto/phaseX.cpp line 1206: > >> 1204: } else if (strcmp(phase, "PhaseCCP") == 0) { >> 1205: assert(false, "PhaseCCP not at fixpoint: analysis result may be unsound for %s", n->Name()); >> 1206: } > > The `strcmp` is a little nasty, but I don't have a better solution right now. > But I think we should convert the `else if` condition into an `assert` in the `else` branch. > Imagine someone calls the method with a string we don't match here: would we just silently pass? Right, of course we need to assert in that case as well. Made the change (not sure that's exactly what you meant by "convert the `else if` condition into an `assert`", though). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2735528005 From snatarajan at openjdk.org Wed Jan 28 09:38:51 2026 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 28 Jan 2026 09:38:51 GMT Subject: RFR: 8366861: Phase AFTER_LOOP_OPTS printed even though the method has no loops [v2] In-Reply-To: References: Message-ID: <5G4AHHEZwxVgbE5Q-E32_GbJRe15EsQtpNYsmpyRj7c=.e5a20875-41dd-4c52-8abb-59f084a91642@github.com> On Mon, 26 Jan 2026 19:16:32 GMT, Christian Hagedorn wrote: >> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: >> >> fixing comment > > Marked as reviewed by chagedorn (Reviewer).
Thank you for the reviews @chhagedorn and @dafedafe ------------- PR Comment: https://git.openjdk.org/jdk/pull/29387#issuecomment-3810104184 From jbhateja at openjdk.org Wed Jan 28 09:39:42 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 28 Jan 2026 09:39:42 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v18] In-Reply-To: References: Message-ID: <23D_3Ap5t-E3oXX8yiulKT0bhbumyT4N8ucWDjNMZPE=.054aef2e-8280-448e-88b8-cbfcf91fbfaf@github.com> > Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support. > - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. > > The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). > > The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > [image] > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback.
> > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding new lane type constants for intrinsic entries, removing basictype extension changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28002/files - new: https://git.openjdk.org/jdk/pull/28002/files/ce5768fa..68145fd9 Webrevs: - full: Webrev is not available because diff is too large - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28002&range=16-17 Stats: 1162 lines in 58 files changed: 25 ins; 26 del; 1111 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From jbhateja at openjdk.org Wed Jan 28 09:39:44 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 28 Jan 2026 09:39:44 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v17] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 18:25:25 GMT, Paul Sandoz wrote: > > We will still need to create a T_FLOAT16 basic type and associate it with the Float16 LaneType, so why not directly pass these basic types to the intrinsic entry point? > > The strong feedback from HotSpot folks, which I agree with, is that adding a new enum value to `BasicType` is not the way to go - it is too disruptive and does not scale.
Instead we should define our own constants for the vector element types, and provide mapping of those to BasicType values which might result in "erasure" to the carrier type. We should adjust/adapt LaneType accordingly. Does that make sense to you? Hi @PaulSandoz, yes, this looks good to me; I have modified the patch accordingly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3810098116 From snatarajan at openjdk.org Wed Jan 28 09:41:51 2026 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 28 Jan 2026 09:41:51 GMT Subject: Integrated: 8366861: Phase AFTER_LOOP_OPTS printed even though the method has no loops In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 12:41:25 GMT, Saranya Natarajan wrote: > **Issue** > When a test program with no loops is run with the flag `-XX:PrintPhaseLevel=3`, the output prints `PHASE_AFTER_LOOP_OPTS` but does not print `PHASE_BEFORE_LOOP_OPTS`. > > **Solution** > This simple fix introduces a variable `_print_phase_loop_opts` that caches the value of `has_loops()` before the loop opts pass, to make sure that `PHASE_AFTER_LOOP_OPTS` gets printed every time `PHASE_BEFORE_LOOP_OPTS` is printed. > > Before fix output > [image] > > After fix output > [image] > > > **Testing** > GitHub Actions, Tier 1-3 This pull request has now been integrated.
Changeset: 6afc0d8f Author: Saranya Natarajan URL: https://git.openjdk.org/jdk/commit/6afc0d8f39390d474ce8ba16533c30b4c7770388 Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod 8366861: Phase AFTER_LOOP_OPTS printed even though the method has no loops Reviewed-by: chagedorn, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/29387 From rrich at openjdk.org Wed Jan 28 09:48:23 2026 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 28 Jan 2026 09:48:23 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v2] In-Reply-To: References: Message-ID: <2EsMN-Ho8F7NzytDTPvYh4cbewkLcb3YQq5oTbuFcfc=.32a85ea0-e332-44cc-9f10-04d88138eb82@github.com> On Wed, 21 Jan 2026 13:11:52 GMT, David Briemann wrote: >> Adds the following mach nodes: >> match(Set dst (CMoveF (Binary cop (CmpF op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveD (Binary cop (CmpD op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveF (Binary cop (CmpD op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveD (Binary cop (CmpF op1 op2)) (Binary src1 src2))); > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > address review comments src/hotspot/cpu/ppc/c2_MacroAssembler_ppc.cpp line 684: > 682: xscmpeqdp(tmp, op1, op2); > 683: xxsel(dst, first, second, tmp); > 684: break; This looks incorrect to me. Also when I compare it with the disassembly of the C version[1]: double cmovf_eq(double op1, double op2, double src1, double src2) { return op1 == op2 ? src1 : src2; } cmovf_eq(double, double, double, double): xscmpeqdp 1,2,1 xxsel 1,4,3,1 blr `cc` would be 0xA for `==` (looking at `operand cmpOp`[2]), right? 0xA is 0b1010. `exchange` would be 0 but I think `src1` and `src2` need to be exchanged. Assume `op1` and `op2` are indeed equal in `op1 == op2 ? src1 : src2`. `tmp.dword[0]` will be set to 0xFFFF_FFFF_FFFF_FFFF by the `xscmpeqdp`. 
`xxsel` evaluates `(src1 & ~tmp) | (src2 & tmp)` so for the correct result `src1` and `src2` need to be swapped as also seen in the disassembly above. [1] [disassembly of the C version](https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1DIApACYAQuYukl9ZATwDKjdAGFUtAK4sGEgOykrgAyeAyYAHI%2BAEaYxCAArKQADqgKhE4MHt6%2BASlpGQKh4VEssfFJdpgOmUIETMQE2T5%2BXIFVNQJ1DQTFkTFxibb1jc25bcM9faXliQCUtqhexMjsHOYAzGHI3lgA1CYbbgoE%2BIIAdAiH2CYaAIKb27uYB0cn%2BKiX17cP9%2BhL0Xoe2QLFQADcqAB9TAARwg/y8gJeqGSXFIewRSL2KLM6MxQIUKzRGIBBJWZjmB38Vnuezpe2ImAIywY2NRrwAIocOWyzK8AGJ7QnILh7EBC8mHGkPfxc%2B4/fEvEHgqHheGk5GovEa3naxFkkV6rHCilU6X0hlMlls0VgMDc3kCiUisXOsxSn4mWWev465UQyHATDq/Wa4mK3Uk0PO8M6k2Ur3m%2BmM5nEVko0XfDY8nFO4Wi8Umj3y73y33R/1Q2gEENYjNGoE4hsvfPNt0J6k/C0p60Z15uR2HQX511FjbSr0cjgLWicBK8PwcLSkVCcNzWaxCpYrF6bHikAiaacLADWIA2CXOAA4AJwANhvGivd7MN/8t4SgVnHEkC6PK84XgFBADQDyPBY4FgJA0BYZI6DichKBguD6HiZJkmQO9JHoYACGILwGBPPg6AIOJgIgaJ/2iMIGgAT04fdqOYYhaIAeWibRqkPbheBgthBFYhhaHopdeCwaIvGANwxFoYCeNILAWEMYBxFEhS8EZGowUwOTl0wVRqi8UiGN4MJSO/ZdaDwaJiDojwsH/PC8BYEzSG04hojSTAOUwJSjCsoxwL4AxgAUAA1PBMAA d1Y5JGFc/hBBEMR2CkGRBEUFR1DU3Q0QMQLTEsax9Gs4DIAWFFHAEOSAFpWI2PYapYZBki8blUiiuIuGXcE4mIPAsDKiAFg6Kq/AgVwxlaIIGHQaYBniNFUnSMapr0ZbCgYeaykGNFRtqEYmk8Fo9H2rpDu22Y9sOtbrqmMJ%2Bh2xaRu3VYJBnOc/zU1cOD2Dq4iwoFgGQZA9jwgiTz2CBcEIEgDjMDYuDmXhuK0OZT3PDRzn8B9/ASK8NA2O8r38fw8s4X9SEXHrANsECwNEiCYEQEAlgIVqCEQiBkPg4gIlYNZVGfGqsL2YHQYgcHCORoJ8CIfr0D0RLhFEcQ0uVzK1H/XLSCi2zkhMj6OHnKn/x%2B1ijI57EqD%2B1BOuIQGXnFsH8MIqGPFg3n4cRmXUePUgzw2DZziD0Ow/D/QKa%2BmmOCA%2Bm/fRyOODMaPeB%2BlHwIWdz0mcSQgA) [2] [`operand cmpOp`](https://github.com/openjdk/jdk/blob/4ae4ffd5a3114aa2a3832818ee30dc38d9aa2b72/src/hotspot/cpu/ppc/ppc.ad#L4870) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29281#discussion_r2735766494 From aph at openjdk.org Wed Jan 28 09:57:04 2026 From: aph at openjdk.org (Andrew Haley) Date: Wed, 28 Jan 2026 09:57:04 GMT Subject: RFR: 8372980: 
[VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v4] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 18:38:40 GMT, Paul Sandoz wrote: > I am very reluctant to include all the auto-generated micro benchmarks in mainline. There is a huge number of them I understand. However, when a performance claim is made for a PR then the proof surely must be included in that PR. This is essential for others to be able easily to verify a claim in their environment. I do not think this should be up for debate because it's a matter of scientific verifiability. There is no reason not to include the _specific benchmark_ that is the basis for a performance claim. Apart from anything else, it will make life much easier for future maintainers reading the history. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3810207659 From aph at openjdk.org Wed Jan 28 10:05:52 2026 From: aph at openjdk.org (Andrew Haley) Date: Wed, 28 Jan 2026 10:05:52 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v4] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 18:38:40 GMT, Paul Sandoz wrote: > The IR test framework is better at determining that by testing the right IR nodes are generated - and they get run as part of the existing HotSpot test suite. But as a reviewer I'm not looking at the IR at all, but at the performance. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3810262815 From mhaessig at openjdk.org Wed Jan 28 10:11:11 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 28 Jan 2026 10:11:11 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v9] In-Reply-To: <-rssqxnycfY9oZ6SgyG8JyzBctLFgg2QQs7Iydxx9Qo=.7e513c45-f132-445f-9d11-7d1018520dfb@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> <9eLBVN0xV07IbyO99CQ79dyVYUme_JpewQqgTo3eFtg=.f9e352cd-724c-4150-a2bb-a0aa4236c4e7@github.com> <-rssqxnycfY9oZ6SgyG8JyzBctLFgg2QQs7Iydxx9Qo=.7e513c45-f132-445f-9d11-7d1018520dfb@github.com> Message-ID: <6sXuSoCq_GGQd8cWTWidhgxynFNYPGASb2Q3RMTXz-4=.0103b2f5-4486-4c84-abff-d1c65c61ee94@github.com> On Wed, 28 Jan 2026 08:53:21 GMT, Benoît Maillard wrote: >> src/hotspot/share/opto/phaseX.cpp line 1206: >> >>> 1204: } else if (strcmp(phase, "PhaseCCP") == 0) { >>> 1205: assert(false, "PhaseCCP not at fixpoint: analysis result may be unsound for %s", n->Name()); >>> 1206: } >> >> The `strcmp` is a little nasty, but I don't have a better solution right now. >> But I think we should convert the `else if` condition into an `assert` in the `else` branch. >> Imagine someone calls the method with a string we don't match here: would we just silently pass? > > Right, of course we need to assert in that case as well. Made the change (not sure that's exactly what you meant by "convert the `else if` condition into an `assert`" though). Defining a small enum, say `GVNVerificationPhase`, would already be cleaner and safer, with less overhead.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2735875499 From mhaessig at openjdk.org Wed Jan 28 10:11:12 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 28 Jan 2026 10:11:12 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v9] In-Reply-To: <6sXuSoCq_GGQd8cWTWidhgxynFNYPGASb2Q3RMTXz-4=.0103b2f5-4486-4c84-abff-d1c65c61ee94@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> <9eLBVN0xV07IbyO99CQ79dyVYUme_JpewQqgTo3eFtg=.f9e352cd-724c-4150-a2bb-a0aa4236c4e7@github.com> <-rssqxnycfY9oZ6SgyG8JyzBctLFgg2QQs7Iydxx9Qo=.7e513c45-f132-445f-9d11-7d1018520dfb@github.com> <6sXuSoCq_GGQd8cWTWidhgxynFNYPGASb2Q3RMTXz-4=.0103b2f5-4486-4c84-abff-d1c65c61ee94@github.com> Message-ID: On Wed, 28 Jan 2026 10:06:51 GMT, Manuel Hässig wrote: >> Right, of course we need to assert in that case as well. Made the change (not sure that's exactly what you meant by "convert the `else if` condition into an `assert`" though). > > Defining a small enum, say `GVNVerificationPhase`, would already be cleaner and safer, with less overhead. You could even make that a debug-only field of `PhaseIterGVN` and override it for `PhaseCCP`.
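As a rough illustration of the enum-based alternative discussed here, the following C++ sketch replaces the `strcmp` dispatch with an `enum class` tag that is stored in the GVN object and overridden by the CCP subclass. Only the name `GVNVerificationPhase` comes from the review; `IterGVNSketch`, `CCPSketch` and `not_at_fixpoint_message` are hypothetical stand-ins rather than HotSpot code, and the `PhaseIterGVN` message text is invented for the example.

```cpp
#include <cassert>
#include <string>

// Phase tag as proposed in the review; replaces strcmp() on a phase-name string.
enum class GVNVerificationPhase { IterGVN, CCP };

// Hypothetical stand-in for PhaseIterGVN. The tag is a field set by the
// constructor (debug-only in the suggestion; unconditional here for brevity).
class IterGVNSketch {
 public:
  IterGVNSketch() : _verification_phase(GVNVerificationPhase::IterGVN) {}
  GVNVerificationPhase verification_phase() const { return _verification_phase; }

 protected:
  GVNVerificationPhase _verification_phase;
};

// Hypothetical stand-in for PhaseCCP, which overrides the inherited tag.
class CCPSketch : public IterGVNSketch {
 public:
  CCPSketch() { _verification_phase = GVNVerificationPhase::CCP; }
};

// The switch covers every enumerator and has no default, so an unhandled
// phase is a compiler warning instead of a silent string mismatch.
std::string not_at_fixpoint_message(GVNVerificationPhase phase) {
  switch (phase) {
    case GVNVerificationPhase::IterGVN:
      return "PhaseIterGVN not at fixpoint";  // illustrative text only
    case GVNVerificationPhase::CCP:
      return "PhaseCCP not at fixpoint: analysis result may be unsound";
  }
  assert(false && "unknown GVNVerificationPhase");
  return std::string();
}
```

With GCC and Clang, `-Wswitch` (enabled by `-Wall`) reports a switch over an enum that misses an enumerator, which covers the "would we just silently pass?" concern without any string comparison at run time.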
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2735882522 From epeter at openjdk.org Wed Jan 28 10:19:48 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 Jan 2026 10:19:48 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v9] In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> <9eLBVN0xV07IbyO99CQ79dyVYUme_JpewQqgTo3eFtg=.f9e352cd-724c-4150-a2bb-a0aa4236c4e7@github.com> <-rssqxnycfY9oZ6SgyG8JyzBctLFgg2QQs7Iydxx9Qo=.7e513c45-f132-445f-9d11-7d1018520dfb@github.com> <6sXuSoCq_GGQd8cWTWidhgxynFNYPGASb2Q3RMTXz-4=.0103b2f5-4486-4c84-abff-d1c65c61ee94@github.com> Message-ID: On Wed, 28 Jan 2026 10:08:25 GMT, Manuel Hässig wrote: >> Defining a small enum, say `GVNVerificationPhase`, would already be cleaner and safer, with less overhead. > > You could even make that a debug-only field of `PhaseIterGVN` and override it for `PhaseCCP`. Enum is a good idea, probably the best. My suggestion was to take the `else if` condition `strcmp(phase, "PhaseCCP") == 0` and convert it into an `assert(strcmp(phase, "PhaseCCP") == 0, "else case")`:
    } else {
      assert(strcmp(phase, "PhaseCCP") == 0, "else case");
      assert(false, "PhaseCCP not at fixpoint: analysis result may be unsound for %s", n->Name());
    }

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2735920762 From bmaillard at openjdk.org Wed Jan 28 10:25:25 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 28 Jan 2026 10:25:25 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v6] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:42:42 GMT, Roland Westrelin wrote:
>> For this failure memory stats are:
>>
>> Total Usage: 1095525816
>> --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 ---
>> Phase                  Total      ra     node      comp    type states reglive regsplit regmask superword   cienv ha other
>> none                 5976032  331560  5402064    197512   33712  10200       0        0     984         0       0  0     0
>> parse                2716464   65456  1145480    196408 1112752      0       0        0       0         0  196368  0     0
>> optimizer              98184       0    32728         0   65456      0       0        0       0         0       0  0     0
>> connectionGraph        32728       0        0     32728       0      0       0        0       0         0       0  0     0
>> iterGVN                32728       0    32728         0       0      0       0        0       0         0       0  0     0
>> idealLoop          918189632       0 38687056 872824784  392776      0       0        0       0         0 6285016  0     0
>> idealLoopVerify      2228144       0        0   2228144       0      0       0        0       0         0       0  0     0
>> macroExpand            32728       0    32728         0       0      0       0        0       0         0       0  0     0
>> graphReshape           32728       0    32728         0       0      0       0        0       0         0       0  0     0
>> matcher             20135944 3369848  9033208   7536400   65456 131032       0        0       0         0       0  0     0
>> postselect_cleanup    294872  294872        0         0       0      0       0        0       0         0       0  0     0
>> scheduler             752944  196488   556456         0       0      0       0        0       0         0       0  0     0
>> regalloc              388736  388736        0         0       0      0       0        0       0         0       0  0     0
>> ...
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > package declaration I was able to come up with this test, which is a bit more than 2 times faster than the original one on my machine. Its `memlimit` is set to `600M`, which is enough to make the old version fail.
With the new one, the test passes even with a `memlimit` of `200M`, so this should be a good enough margin. While looking into this I have also found out that some programs have an unexpectedly high usage of `output` (as was the case in the test case that I initially suggested). I am trying to get a good reproducer and will most likely file a follow-up.

/**
 * @test
 * @key stress randomness
 * @bug 8370519
 * @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations
 * @run main/othervm -XX:CompileCommand=compileonly,${test.main.class}::* -XX:-TieredCompilation -Xbatch
 *                   -XX:+UnlockDiagnosticVMOptions -XX:+IgnoreUnrecognizedVMOptions
 *                   -XX:+StressLoopPeeling -XX:+VerifyLoopOptimizations
 *                   -XX:CompileCommand=memlimit,${test.main.class}::*,600M~crash
 *                   -XX:StressSeed=3106998670 ${test.main.class}
 * @run main ${test.main.class}
 */

package compiler.c2;

public class TestVerifyLoopOptimizationsHighMemUsage {
    public static final int N = 400;
    public static long instanceCount = -13L;
    public static volatile short sFld = -16143;
    public static int iFld = -159;
    public static float fArrFld[] = new float[N];

    public static long lMeth(int i1) {
        int i2 = 11, i3 = 37085, i4 = 177, i5 = 190, i6 = -234, i7 = 13060, iArr[] = new int[N];
        float f = 1.179F;
        double d = 2.9685;
        long lArr[] = new long[N];
        for (i2 = 15; i2 < 330; ++i2)
            for (i4 = 1; i4 < 5; ++i4) {
                fArrFld[i4 + 1] = (++i1);
                for (i6 = 2; i6 > 1; i6 -= 3)
                    switch ((i2 * 5) + 54) {
                    case 156:
                        if (i4 != 0)
                            ;
                    case 168: case 342: case 283: case 281: case 328: case 322: case 228:
                    case 114: case 207: case 209: case 354: case 108:
                        i1 <<= i1;
                    case 398: case 144: case 218: case 116: case 296: case 198: case 173:
                    case 105: case 120: case 248: case 140: case 352:
                        try {
                        } catch (ArithmeticException a_e) {
                        }
                    case 404:
                        i5 += (i6 ^ instanceCount);
                    case 370: case 211: case 231:
                        try {
                        } catch (ArithmeticException a_e) {
                        }
                    case 251: case 179:
                        f += (((i6 * sFld) + i4) - iFld);
                    }
            }
        long meth_res = i1 + i2 + i3 + i4 + i5 + i6 + i7 + Float.floatToIntBits(f)
                + Double.doubleToLongBits(d) + +checkSum(iArr) + checkSum(lArr);
        return meth_res;
    }

    public static long checkSum(int[] a) {
        long sum = 0;
        for (int j = 0; j < a.length; j++)
            sum += (a[j] / (j + 1) + a[j] % (j + 1));
        return sum;
    }

    public static long checkSum(long[] a) {
        long sum = 0;
        for (int j = 0; j < a.length; j++)
            sum += (a[j] / (j + 1) + a[j] % (j + 1));
        return sum;
    }

    public static void main(String[] strArr) {
        for (int i = 0; i < 10; i++)
            lMeth(-159);
    }
}

------------- PR Comment: https://git.openjdk.org/jdk/pull/28581#issuecomment-3810407567 From mhaessig at openjdk.org Wed Jan 28 10:31:25 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 28 Jan 2026 10:31:25 GMT Subject: RFR: 8359335: Template-Framework Library: Primitive Types subtyping [v2] In-Reply-To: <69kmR5e7B-PnE8Lxa_XWShTmwDJmh2fhqD5nNsjpJDI=.1e47ddad-4054-45a3-bb0a-494b6baedc15@github.com> References: <69kmR5e7B-PnE8Lxa_XWShTmwDJmh2fhqD5nNsjpJDI=.1e47ddad-4054-45a3-bb0a-494b6baedc15@github.com> Message-ID: > This PR implements the subtype relation for primitive types in the Template Framework Library according to JLS §4.10.1. Further, this PR extends the primitive type example to test the addition. > > Testing: > - [x] Github Actions > - [x] tier1, tier2 Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Add floating point division operations using explicit casts.
- Introduce control over how expressions are nested - Merge branch 'master' into JDK-8359335-template-subtyping - Implement subtyping for primitive types in templates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29349/files - new: https://git.openjdk.org/jdk/pull/29349/files/a0ebfef2..9259b86f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29349&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29349&range=00-01 Stats: 25594 lines in 704 files changed: 12171 ins; 4381 del; 9042 mod Patch: https://git.openjdk.org/jdk/pull/29349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29349/head:pull/29349 PR: https://git.openjdk.org/jdk/pull/29349 From epeter at openjdk.org Wed Jan 28 10:51:44 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 Jan 2026 10:51:44 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v16] In-Reply-To: References: Message-ID: > This is an exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. > > Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) > > **Discussion** > > Observations: > - If the loop can be auto-vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. > - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance.
> - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow > - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow > - `linux_x64_oci_server`: Vector API leads to really nice speedups > - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks > - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. > - Compact Object Headers has some negative effect on some loop benchmarks. > - `linux_aarch64_server`: `reduceAddI`, `copyI` > - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` > - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? > - `windows_x64_oci_server`: `reduceAddI` and some others a little bit > - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` > - Intrinsics can be much faster than auto-vectorized or Vector API code. > - `linux_aarch64_server`: `copyI` > - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. > - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact; I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). > > **Benchmark Plots** > > Units: nanoseconds per algorithm invocation. > > Note: the `aarch64` machines all only have `NEON` support. Performance may be much better on `SVE`, I have not benchmarked that yet. > > `linux_x64_oci` > algo_linux_x64_oci_server > > `windows_x64_oci` > algo_windows_x64_oci_server > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix IR rule that failed on neon Looks good to me. Best Regards, Jatin test/hotspot/jtreg/compiler/vectorization/VectorAlgorithmsImpl.java line 680: > 678: > 679: // X4: ints simulate 4-byte oops.
> 680: // oops: if non-zero (= non-null), every entry simpulates a 4-byte oop, pointing into mem. Suggestion: // oops: if non-zero (= non-null), every entry simulates a 4-byte oop, pointing into mem. test/micro/org/openjdk/bench/vm/compiler/VectorAlgorithmsImpl.java line 680: > 678: > 679: // X4: ints simulate 4-byte oops. > 680: // oops: if non-zero (= non-null), every entry simpulates a 4-byte oop, pointing into mem. Suggestion: // oops: if non-zero (= non-null), every entry simulates a 4-byte oop, pointing into mem. ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28639#pullrequestreview-3715922754 PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2735981281 PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2735979740 From epeter at openjdk.org Wed Jan 28 10:51:48 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 Jan 2026 10:51:48 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v15] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 10:46:30 GMT, Jatin Bhateja wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix IR rule that failed on neon > > Looks good to me. > > Best Regards, > Jatin @jatin-bhateja Thanks for the approval and the typo fixes :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28639#issuecomment-3810560000 From jbhateja at openjdk.org Wed Jan 28 10:56:59 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 28 Jan 2026 10:56:59 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v16] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 10:51:44 GMT, Emanuel Peter wrote: >> This is an exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. 
We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. >> >> Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) >> >> **Discussion** >> >> Observations: >> - If the loop can be auto-vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. >> - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. >> - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow >> - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow >> - `linux_x64_oci_server`: Vector API leads to really nice speedups >> - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks >> - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. >> - Compact Object Headers has some negative effect on some loop benchmarks. >> - `linux_aarch64_server`: `reduceAddI`, `copyI` >> - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` >> - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? >> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit >> - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` >> - Intrinsics can be much faster than auto-vectorized or Vector API code. >> - `linux_aarch64_server`: `copyI` >> - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. >> - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact; I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). >> >> **Benchmark Plots** >> >> Units: nanoseconds per algorithm invocation.
>> >> Note: the `aarch64` machines all only have `NEON` support. Performance may be much better on `SVE`, I have not benchmarked that yet. >> >> `linux_x64_oci` >> algo_linux_x64_oci_server >> >> `windows_x64_oci` >> algo_windows_x64_oci_server > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - Jatins typo fix part 2 > > Co-authored-by: Jatin Bhateja > - Jatins typo fix part 1 > > Co-authored-by: Jatin Bhateja Marked as reviewed by jbhateja (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28639#pullrequestreview-3716034107 From ghan at openjdk.org Wed Jan 28 11:01:36 2026 From: ghan at openjdk.org (Guanqiang Han) Date: Wed, 28 Jan 2026 11:01:36 GMT Subject: RFR: 8375598: VM crashes with "assert((labs(val) & 0xFFFFFFFF00000000) == 0 || dest == (address)-1) failed: must be 32bit offset or -1" when using too high value for NonNMethodCodeHeapSize In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 14:50:28 GMT, Guanqiang Han wrote: > Please review this change. Thanks! > > **Description:** > > On x86/x64, near calls/jumps use 32-bit signed PC-relative displacements. With SegmentedCodeCache enabled, a very large NonNMethodCodeHeapSize can inflate the derived ReservedCodeCacheSize, causing the code cache span to > exceed the reach of 32-bit relative branches. This may later lead to relocation failures (e.g. "must be 32bit offset") when installing nmethods. > https://github.com/openjdk/jdk/blob/037040129e82958bd023e0b24d962627e8653710/src/hotspot/cpu/x86/nativeInst_x86.hpp#L433-L440 > > **Fix:** > > Add an x86-specific validation in CodeCache::initialize_heaps() after final segment alignment. If the computed code cache size exceeds max_jint bytes, abort VM initialization with a clear error message that includes the segment sizes, instead of failing later during compilation/relocation. > > **Test:** > > GHA Hi @chhagedorn and @TheRealMDoerr, Sorry for the ping ? 
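The arithmetic behind the `max_jint` limit can be sketched in C++ as follows. This is an illustration under stated assumptions, not the actual `CodeCache::initialize_heaps()` change: it models only a 5-byte x86 `CALL rel32` (opcode `E8`, displacement measured from the end of the instruction), and all helper names are hypothetical. If the whole reserved code cache spans at most `INT32_MAX` bytes, even the most distant call/target pair inside it stays within the signed 32-bit displacement range.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// An x86 near CALL rel32 (opcode E8) is 5 bytes long; its displacement is
// relative to the address of the next instruction.
constexpr int64_t kCallRel32Length = 5;

// Displacement a CALL located at call_site would need to reach target.
int64_t rel32_displacement(int64_t call_site, int64_t target) {
  return target - (call_site + kCallRel32Length);
}

// True if the displacement fits in a signed 32-bit immediate.
bool fits_rel32(int64_t disp) {
  return disp >= std::numeric_limits<int32_t>::min() &&
         disp <= std::numeric_limits<int32_t>::max();
}

// The two extreme pairs inside a code cache of `size` bytes: a call at the
// very start targeting the last byte, and a call in the last 5 bytes
// targeting the first byte.
bool extremes_reachable(int64_t size) {
  return fits_rel32(rel32_displacement(0, size - 1)) &&
         fits_rel32(rel32_displacement(size - kCallRel32Length, 0));
}

// Sketch of an up-front check: reject code cache sizes above max_jint bytes
// before any nmethod is installed, instead of failing during relocation.
bool code_cache_size_ok(uint64_t reserved_bytes) {
  return reserved_bytes <=
         static_cast<uint64_t>(std::numeric_limits<int32_t>::max());
}
```

For example, a 240 MB code cache passes the check, while a 3 GB one (as can result from an oversized `NonNMethodCodeHeapSize`) is rejected, matching the point at which the extreme call/target pairs stop fitting in a `rel32` displacement.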
could you please take a look at this PR when you have a moment? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29324#issuecomment-3810609762 From shade at openjdk.org Wed Jan 28 11:08:05 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 28 Jan 2026 11:08:05 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code [v12] In-Reply-To: References: Message-ID: > We use CTW testing for making sure compilers behave well. But we compile the code that is not executed at all, and since our inlining heuristics often looks back at profiles, we end up not actually inlining all too much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations. > > There is an intrinsic tradeoff with accepting more inilned methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data. > > After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much, they are impractical to run in standard configurations, see data in RFE. We will enable some of that testing in special testing pipelines. > > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. 
Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it: it provides the base case of inlining most reasonable methods, and then allows the stress infra to inline some more on top of that. > > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Roll in good fix - Revert bad fix - Merge branch 'master' into JDK-8360557-ctw-inlining - Coopt InlineColdMethods to drive inlining decisions when Xcomp is enabled - Merge branch 'master' into JDK-8360557-ctw-inlining - JDK-8375046 fix - JDK-8375694 POC fix - Merge branch 'master' into JDK-8360557-ctw-inlining - Debug - Merge branch 'master' into JDK-8360557-ctw-inlining - ...
and 1 more: https://git.openjdk.org/jdk/compare/c151a68c...f1a06e35 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26068/files - new: https://git.openjdk.org/jdk/pull/26068/files/6513fc52..f1a06e35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=10-11 Stats: 21678 lines in 638 files changed: 9471 ins; 3867 del; 8340 mod Patch: https://git.openjdk.org/jdk/pull/26068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068 PR: https://git.openjdk.org/jdk/pull/26068 From mdoerr at openjdk.org Wed Jan 28 11:30:16 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 28 Jan 2026 11:30:16 GMT Subject: RFR: 8375598: VM crashes with "assert((labs(val) & 0xFFFFFFFF00000000) == 0 || dest == (address)-1) failed: must be 32bit offset or -1" when using too high value for NonNMethodCodeHeapSize In-Reply-To: References: Message-ID: <5NU0BCQERIyL10bTRsfqmwRE7SuLNFa_mqHpdjgaIoA=.cacd5f7e-4274-429a-80c2-cfedfc6964de@github.com> On Tue, 20 Jan 2026 14:50:28 GMT, Guanqiang Han wrote: > Please review this change. Thanks! > > **Description:** > > On x86/x64, near calls/jumps use 32-bit signed PC-relative displacements. With SegmentedCodeCache enabled, a very large NonNMethodCodeHeapSize can inflate the derived ReservedCodeCacheSize, causing the code cache span to > exceed the reach of 32-bit relative branches. This may later lead to relocation failures (e.g. "must be 32bit offset") when installing nmethods. > https://github.com/openjdk/jdk/blob/037040129e82958bd023e0b24d962627e8653710/src/hotspot/cpu/x86/nativeInst_x86.hpp#L433-L440 > > **Fix:** > > Add an x86-specific validation in CodeCache::initialize_heaps() after final segment alignment. If the computed code cache size exceeds max_jint bytes, abort VM initialization with a clear error message that includes the segment sizes, instead of failing later during compilation/relocation. 
> > **Test:** > > GHA Please note that other problems related to modified `ReservedCodeCacheSize` have been reported, too: https://github.com/openjdk/jdk/pull/28658#issuecomment-3782072965 I think we should find a solution which solves all the problems. We should be able to avoid enlarging `ReservedCodeCacheSize`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29324#issuecomment-3810739340 From mhaessig at openjdk.org Wed Jan 28 11:40:44 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 28 Jan 2026 11:40:44 GMT Subject: RFR: 8359335: Template-Framework Library: Primitive Types subtyping [v2] In-Reply-To: References: <69kmR5e7B-PnE8Lxa_XWShTmwDJmh2fhqD5nNsjpJDI=.1e47ddad-4054-45a3-bb0a-494b6baedc15@github.com> Message-ID: On Wed, 28 Jan 2026 10:31:25 GMT, Manuel Hässig wrote: >> This PR implements the subtype relation for primitive types in the Template Framework Library according to JLS §4.10.1. Further, this PR extends the primitive type example to test the addition. >> >> Testing: >> - [x] Github Actions >> - [x] tier1, tier2 > > Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Add floating point division operations using explicit casts.
> - Introduce control over how expressions are nested > - Merge branch 'master' into JDK-8359335-template-subtyping > - Implement subtyping for primitive types in templates I updated the PR with: - configurable nesting mode for `Expression.nestRandomly()` - set the nesting mode for the `ExpressionFuzzer` to `EXACT` - added both variants of floating point divisions ------------- PR Comment: https://git.openjdk.org/jdk/pull/29349#issuecomment-3810797032 From mdoerr at openjdk.org Wed Jan 28 11:53:48 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 28 Jan 2026 11:53:48 GMT Subject: RFR: 8375640: MinMaxIdentity test fails on some machines after 8373134 In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 05:24:12 GMT, Galder Zamarreño wrote: > Adjust test expectations to deal with environments where `Min/Max[F|D]` are not produced. To fix this, I've followed same patterns used in `compiler.c2.irTests.TestMinMaxIdentities`. > > I have tested this on x86_64 with `-XX:UseAVX=0`. LGTM. Thanks for fixing it! ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29438#pullrequestreview-3716286383 From epeter at openjdk.org Wed Jan 28 13:15:45 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 Jan 2026 13:15:45 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Tue, 13 Jan 2026 11:27:53 GMT, Fei Gao wrote: >> In C2's loop optimization, for a counted loop, if we have any of these conditions (RCE, unrolling) met, we switch to the >> `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling).
If yes, the execution jumps over the loop code to avoid loop over-running. For example, if a main loop is unrolled `8x`, the main loop guard tests whether the loop has fewer than `8` iterations and then decides which way to go. >> >> Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is further multiplied. After the main loop is super-unrolled, the minimum trip guard test is updated accordingly. Assuming one vector covers `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test whether the remaining trip count is less than `8 * 4 = 32`. >> >> To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted loop also has a minimum trip guard. Both trip guards, of the main loop and of the vectorized drain loop, jump to the scalar post loop. >> >> The problem here is that, if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed. Because the minimum trip guard test of the main loop fails, the execution jumps over both the main loop and the vectorized drain loop. For example, in the above case, if a loop still has `25` iterations after the pre-loop, we could run `3` rounds of the vectorized drain loop, but the current control flow makes that impossible. It would be better if the minimum trip guard test of the main loop did not jump over the vectorized drain loop. >> >> This patch improves this by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change of control flow. >> >> The whole process is done by the function `insert_post_loop()`. >> >> We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`.
When we're cloning the vector main loop to vectorized drain loop with mode `InsertVectorizedDrain`: >> >> 1. The fall-in control flow to the vectorized drain loop comes fr... > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Fix build failure after rebasing and address review comments > - Merge branch 'master' into optimize-atomic-post > - Fixed new test failures after rebasing and refined parts of the code to address review comments > - Merge branch 'master' into optimize-atomic-post > - Merge branch 'master' into optimize-atomic-post > - Clean up comments for consistency and add spacing for readability > - Fix some corner case failures and refined part of code > - Merge branch 'master' into optimize-atomic-post > - Refine ascii art, rename some variables and resolve conflicts > - Merge branch 'master' into optimize-atomic-post > - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 I did some quick benchmarking / investigation this morning, using a byte-copy benchmark / test. First, I mapped out the unrolling and super-unrolling factors we get, depending on the loop size. Profiling obviously plays a role here. [image: unrolling and super-unrolling factors by loop size] And then some performance numbers: [image: performance numbers] This confirms a few things for me: - Profiling matters: if you do warmup with a small loop, you will get a smaller unrolling factor and smaller vector length. - The drain-loop is only inserted if warmup happens with many iterations. In my case, it took at least `size = 626`. That is because only at that point do we get full vector length on my machine with 512 bit vectors. - Up to `size=3`, all my versions compile the same code, because we don't vectorize. - From `size=4..7` we start vectorizing. There seems to be some small overhead from alignment.
Not sure if that is because we don't spend the same number of iterations in the main loop or because of the additional instructions for the alignment calculation itself. Scalar performance is the best, but not by much. - For `size=8..32` we see that the unaligned vectorized version is the best overall. We see the characteristic "saw-tooth", dropping at `k*8+1` (we spend at least 1 iteration in the pre-loop). I suspect that alignment just has too much overhead in this range (alignment computation & often spending more iterations in pre/post loops). In the higher range, the scalar performance is the slowest, and that trend would continue on. Some open questions: - How should we choose the unrolling factor? - Should it really be based on profiling? - We currently have no way to recover from a small unrolling if suddenly we process large arrays. Should we have some loop predicate that checks for small iteration counts, and would lead to recompilation if it was ever triggered? - Should we have a drain loop for smaller unrolling factors? - Should we disable automatic alignment for small loops? More relevant to this PR directly: - We will only be able to measure the impact of this PR if we do warmup with a large iteration count, and then measure performance with a small to medium iteration count. - If you warmup with a small iteration count, you don't get any drain loop. - If you warmup with a large iteration count and measure with a large iteration count, then we always enter the main loop first, and so this change makes no difference (no need to access drain loop without entering main loop). - What this means: test-coverage for the "warmup with large iteration count but then run small iteration count" is probably very low. - I think we should invest some effort in a loop stress mode that allows smaller unrolling factors, then vectorization and drain loop insertion. It would ensure better test coverage. Ok, I needed to do this research to get a better understanding.
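The saw-tooth described above can be reproduced with a simple iteration-count model. This is a hypothetical simplification, not C2's actual loop code: it assumes 8 lanes per vector iteration, exactly one scalar pre-loop iteration (no alignment), no super-unrolling, and a scalar post-loop for the remainder. Under those assumptions the scalar work (pre + post) falls to a single iteration whenever `size == k*8 + 1`.

```java
// Hypothetical iteration-count model, not C2's actual loop structure.
// Assumptions: 8 lanes per vector iteration, exactly one scalar pre-loop
// iteration, no super-unrolling, and a scalar post-loop for the remainder.
public final class SawToothModel {
    static final int LANES = 8;

    /** Returns {preIterations, vectorIterations, postIterations} for 'size' elements. */
    static int[] split(int size) {
        if (size <= 0) {
            return new int[] {0, 0, 0};
        }
        int pre = 1;                        // assumed: the pre-loop always runs once
        int remaining = size - pre;
        int vector = remaining / LANES;     // full 8-lane main-loop iterations
        int post = remaining % LANES;       // leftover scalar iterations
        return new int[] {pre, vector, post};
    }

    /** Scalar work (pre + post); it drops to 1 whenever size == k * 8 + 1. */
    static int scalarIterations(int size) {
        int[] s = split(size);
        return s[0] + s[2];
    }

    public static void main(String[] args) {
        for (int size = 8; size <= 32; size++) {
            System.out.println("size=" + size + " scalar=" + scalarIterations(size));
        }
    }
}
```

Real timings also depend on the alignment code, the profiling-driven unrolling factor and the vector width actually chosen, so this only illustrates the shape of the curve.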
You probably already knew most/all of this ;) @fg1417 What do you think about a stress mode that allows smaller unrolling factors to vectorize, and then smaller unrolling factors already lead to drain loop insertion? That could really improve our test coverage for all the graph surgery you are doing in this patch. It would probably be smarter to have the stress mode first, but I'd also understand if you wanted to get this work here finished and we do it in a later RFE. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3811234712 From epeter at openjdk.org Wed Jan 28 13:26:48 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 Jan 2026 13:26:48 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v15] In-Reply-To: <5Aj7cNpkNPSVkQ_ElMG_SOTlssaomF0YcJJJ15_UpQc=.d8ac8eb7-cae1-49b2-907b-8e6c1e9274dd@github.com> References: <0Y-lUGt3Co6upV2G_agCAH35r-J1oM-8xbVOxWeVoxc=.7af84968-4994-4fbd-8fe7-59965de04d11@github.com> <5Aj7cNpkNPSVkQ_ElMG_SOTlssaomF0YcJJJ15_UpQc=.d8ac8eb7-cae1-49b2-907b-8e6c1e9274dd@github.com> Message-ID: On Wed, 28 Jan 2026 04:27:31 GMT, Jasmine Karthikeyan wrote: >> @jaskarth Something is going a bit strange with my testing script. Could you merge with latest master, maybe that helps? > > Thanks for the ping @eme64, I've merged from master. @jaskarth I'm seeing a first failure on `-XX:UseAVX=0`. - `compiler/loopopts/superword/TestReductions.java#vanilla` - `compiler/loopopts/superword/TestReductions.java#force-vectorization` For both, about `30` tests have IR failures.
One example: Failed IR Rules (30) of Methods (30) ------------------------------------ 1) Method "private static byte compiler.loopopts.superword.TestReductions.byteAndBig()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", "true"}, counts={"_#V#LOAD_VECTOR_B#_", "_@min(max_int, max_byte)", "> 0", "_#AND_REDUCTION_V#_", "> 0", "_#V#AND_VI#_", "> 0"}, applyIfPlatform={}, failOn={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"AutoVectorizationOverrideProfitability", "> 0"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(AndReductionV.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! * Constraint 3: "(\\d+(\\s){2}(AndV.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! This should be easy to reproduce on any `x64` machine with the `-XX:UseAVX=0` flag. Maybe some instruction is not implemented for vectorization, and you need to require at least `avx` instead of `sse4.1`?
------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-3811292013 From krk at openjdk.org Wed Jan 28 13:56:26 2026 From: krk at openjdk.org (Kerem Kat) Date: Wed, 28 Jan 2026 13:56:26 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v3] In-Reply-To: <0_wYDA2lNvTyIDv7ist5heu-hs4J8pmEKT1mqRyiBBk=.438156e1-24fd-4352-8a61-9cf85efacb25@github.com> References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> <0_wYDA2lNvTyIDv7ist5heu-hs4J8pmEKT1mqRyiBBk=.438156e1-24fd-4352-8a61-9cf85efacb25@github.com> Message-ID: On Tue, 20 Jan 2026 19:47:00 GMT, Vladimir Ivanov wrote: >> Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into fix-c2-checkCastPP >> - 8374903: C2 VectorAPI: assert(vbox->as_Phi()->region() == vect->as_Phi()->region()) failed >> - Simplify expand_vbox_node_helper by merging VectorBox Phi handling >> - 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded > > Test results (hs-tier1 - hs-tier4) are clean. Hi @iwanowww, is there anything remaining that is blocking approval? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29200#issuecomment-3811440318 From epeter at openjdk.org Wed Jan 28 15:14:21 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 Jan 2026 15:14:21 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Thu, 22 Jan 2026 16:27:51 GMT, Fei Gao wrote: >> I just ran the `bench001B_aligned_computeBound` benchmark on my `AVX512` machine, and realized that (as I think you tried to say) the PR here has no effect on it: >> >> [image: `bench001B_aligned_computeBound` results] >> >> That's a bit of a bummer :/ >> >> I'd have to do some more digging to confirm what you said: that this is because of profiling, i.e. that we don't actually unroll the loop enough and don't insert the drain loop, right? > >> I'd have to do some more digging to confirm what you said: that this is because of profiling, i.e. that we don't actually unroll the loop enough and don't insert the drain loop, right? > > Thanks for your testing. Yes, that's what I meant. > >> It's a bummer because I had initially hoped that this PR would address (at least a part of) the performance regression that vectorization can cause, see #27315 > You can see that for very small iteration counts, it is faster to disable the auto vectorizer. > There were some regressions filed, like this one: https://bugs.openjdk.org/browse/JDK-8368245 > > Did you obtain the scalar vs. vector performance results by overriding > `-XX:AutoVectorizationOverrideProfitability=0/2`, or by comparing runs without and with [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751)? > > For these benchmarks with small iteration counts, what are the main differences between the generated scalar and vectorized code?
For example, when `NUM_ACCESS_ELEMENTS` is `15`, what code does C2 generate for [`copy_byte_loop()`](https://github.com/eme64/jdk/blob/716aab07845d8e52455ee0f7daea54cacf3662e9/test/micro/org/openjdk/bench/vm/compiler/VectorBulkOperationsArray.java#L265)? > > I'm asking because I'm a bit unclear about the vectorization behavior here. As mentioned earlier, AFAIK, fixed small-trip-count loops are typically not auto-vectorized due to profiling. Is vectorization happening in this case because the benchmark uses nested loops? In particular, does the inner loop become vectorized after sufficient unrolling driven by the outer loop? @fg1417 I'm trying to see the bigger picture now, and locate this PR in it. Let's think about what would be the optimal loop configuration that could handle any iteration count with good performance. **Let's assume we have masked operations available.** As far as I know, they are not fast enough for use in the main loop. But they would be profitable for pre/post loops. Maybe we could get rid of the drain loop, but I'm not sure about that. pre-loop (masked N-vector, simulates 1-N iterations) main-loop (N-vectorized and super unrolled) drain-loop (N-vectorized) post-loop (masked N-vector, simulates 1-N iterations) It may even be possible that pre/post loops don't need to be loops if they can simulate 1-N iterations. We'd have to do quite a bit of work to get to this "masked pre/post" loop trick. I think we'd probably have to take the traditional approach of running the auto vectorizer on a single iteration loop, and widening scalars to vectors, i.e. "unrolling" during vectorization. That way, we could then also figure out a way to generate the pre/post loops that use masked operations, enabling only 1-N lanes.
We can't really generate the pre/post loops from our "unroll first, then SuperWord" approach, because the scalar unrolling already scrambles the eggs, and later we don't know which lane came from which iteration, so enabling 1-N lanes becomes difficult to impossible. This approach with masked pre/post loops would mean we spend at most one iteration in the pre-loop and one in the post-loop, and maybe 0-8 iterations in the drain loop. The rest would be spent in the main loop. This means that for any iteration count, we'd have very efficient code. That's my prediction anyway; experiments could show that I'm missing something here. At this point, the drain loop would only be beneficial if it is cheaper to spend iterations in the drain loop rather than the masked post loop. I don't know if/when that is the case. ------------------ **If we don't have masked operations.** Now we need to do smart things to not spend too much time in the scalar pre/post loops. Some ideas: - Only do auto-alignment (using the pre-loop) if we have a large iteration count, where the cost of a few extra pre-loop iterations is lower compared to the cost of unaligned accesses of the many main-loop iterations. - We must be able to go directly from pre-loop to drain-loop, for small/medium iteration count loops (what this PR does here). - We may need multiple drain loops of different vector sizes. I'm not sure we'd need all sizes (2, 4, 8, 16, 32). Maybe we'd be ok with half of them (4, 16)? That way, we'd spend at most 4 iterations in any drain loop or post loop. Not sure where exactly the tradeoff line lies (code size vs iteration counts). pre-loop (only align for large iteration count) main-loop (N-vectors with super unrolling) drain-loops (4/16/N-vectors) post-loop If we want to have multiple vectorized drain-loops with different vector sizes, it would also be helpful to take the widening approach rather than the current "first unroll then SuperWord".
----------------------- **So how does this PR fit those future plans?** At what point would the auto vectorizer run? - A first approach would be to run after pre/main/post. That could work for the no-masked pattern. Then we can directly generate the main-loop as well as the (multiple) drain-loops. This would probably require a refactor of the graph surgery, right? I'm not sure, maybe there are still a lot of parallel parts to this PR. - A second approach would be to run auto vectorization on the single iteration loop (before pre/main/post). That would allow us to directly generate all loops, including masked pre/post loops. This would be an immense refactor in loop-opts. But all of these plans need good ways to do the graph surgery, and this PR is setting up some ways of doing that. So that is very valuable going forward. It is very important that we document things well, so that future refactors would be easier ;) --------------------------- TLDR: - @fg1417 I think this PR is very valuable and a step in the right direction. We have to make sure to document things well, so that future work around this code is possible :) - Let me know what you think about the ideas above. No guarantees that they will happen very soon. I'll have some internal conversations about it as well. But I may need the widening approach to make if-conversion more feasible (my next project). I'll try to keep reviewing in the next days/weeks.
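The routing change discussed in this thread (letting a failed main-loop guard fall through to the vectorized drain loop instead of jumping straight to the scalar post-loop) can be illustrated with a small iteration-count model. This is a hypothetical sketch, not C2 code: it assumes a vector covers 8 iterations, the main loop is super-unrolled 4x (stride 32), and the drain loop has stride 8. It counts main and drain work as loop rounds and post-loop work as scalar iterations, and reproduces the example from the PR description: with 25 iterations left after the pre-loop, the old routing runs all 25 in the post-loop, while the new routing runs 3 drain rounds and 1 post iteration.

```java
// Hypothetical model of how iterations left after the pre-loop are divided
// among the main loop, the vectorized drain loop, and the scalar post-loop.
public final class DrainLoopModel {
    /** main and drain are loop rounds; post is scalar iterations. */
    record Split(int main, int drain, int post) {}

    /**
     * @param remaining    iterations left after the pre-loop
     * @param lanes        iterations covered by one vector (e.g. 8)
     * @param unroll       super-unrolling factor of the main loop (e.g. 4)
     * @param drainFromPre false: a failed main-loop guard also skips the drain loop (old routing);
     *                     true: a failed main-loop guard falls through to the drain loop (new routing)
     */
    static Split split(int remaining, int lanes, int unroll, boolean drainFromPre) {
        int mainStride = lanes * unroll;
        int main = 0;
        int drain = 0;
        if (remaining >= mainStride) {          // main-loop minimum trip guard passes
            main = remaining / mainStride;
            remaining -= main * mainStride;
        }
        if (main > 0 || drainFromPre) {         // is the drain loop reachable?
            drain = remaining / lanes;
            remaining -= drain * lanes;
        }
        return new Split(main, drain, remaining);
    }

    public static void main(String[] args) {
        System.out.println("old: " + split(25, 8, 4, false)); // Split[main=0, drain=0, post=25]
        System.out.println("new: " + split(25, 8, 4, true));  // Split[main=0, drain=3, post=1]
    }
}
```

As noted in the discussion, the difference only shows up when a warmup with a large iteration count has produced a super-unrolled main loop plus drain loop, and the loop is later run with a small or medium trip count.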
------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3811843337 From azafari at openjdk.org Wed Jan 28 15:16:44 2026 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 28 Jan 2026 15:16:44 GMT Subject: RFR: 8376402: Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code [v3] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 15:37:19 GMT, Matthias Baesken wrote: >> Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code but seems they still end up in the product build JVM, at least when using standard build settings. >> (This can be observed when enabling link time gc and verbose info printing - in this case the methods are eliminated). >> >> linuxx86_64 >> product build without those methods >> >> ls -alL images/jdk/lib/server/libjvm.so >> size 2.859.5144 >> >> >> unchanged product build : >> >> ls -alL images/jdk/lib/server/libjvm.so >> size 2.859.9464 >> >> >> (so we see a little size difference) > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust decls too Thanks for the fixes. Looks good. ------------- Marked as reviewed by azafari (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29449#pullrequestreview-3717258213 From mbaesken at openjdk.org Wed Jan 28 16:34:34 2026 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 28 Jan 2026 16:34:34 GMT Subject: RFR: 8376402: Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code [v3] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 15:37:19 GMT, Matthias Baesken wrote: >> Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code but seems they still end up in the product build JVM, at least when using standard build settings.
>> (This can be observed when enabling link time gc and verbose info printing - in this case the methods are eliminated). >> >> linuxx86_64 >> product build without those methods >> >> ls -alL images/jdk/lib/server/libjvm.so >> size 2.859.5144 >> >> >> unchanged product build : >> >> ls -alL images/jdk/lib/server/libjvm.so >> size 2.859.9464 >> >> >> (so we see a little size difference) > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust decls too Thanks for the reviews ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29449#issuecomment-3812325053 From mbaesken at openjdk.org Wed Jan 28 16:34:36 2026 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 28 Jan 2026 16:34:36 GMT Subject: Integrated: 8376402: Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 13:58:35 GMT, Matthias Baesken wrote: > Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code but seems they still end up in the product build JVM, at least when using standard build settings. > (This can be observed when enabling link time gc and verbose info printing - in this case the methods are eliminated). > > linuxx86_64 > product build without those methods > > ls -alL images/jdk/lib/server/libjvm.so > size 2.859.5144 > > > unchanged product build : > > ls -alL images/jdk/lib/server/libjvm.so > size 2.859.9464 > > > (so we see a little size difference) This pull request has now been integrated. 
Changeset: 0e2e66be Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/0e2e66be2423335002a53d887df35d2348a3ec9f Stats: 6 lines in 2 files changed: 2 ins; 0 del; 4 mod 8376402: Dependencies::print_statistics() and AbstractClassHierarchyWalker::print_statistics() are not called from PRODUCT code Reviewed-by: azafari, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/29449 From mdoerr at openjdk.org Wed Jan 28 16:41:32 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 28 Jan 2026 16:41:32 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v2] In-Reply-To: <2EsMN-Ho8F7NzytDTPvYh4cbewkLcb3YQq5oTbuFcfc=.32a85ea0-e332-44cc-9f10-04d88138eb82@github.com> References: <2EsMN-Ho8F7NzytDTPvYh4cbewkLcb3YQq5oTbuFcfc=.32a85ea0-e332-44cc-9f10-04d88138eb82@github.com> Message-ID: <-TysMtkU_jZckaZB-X_nEFxM2t8C31OCRyz9WvbtiZ4=.80aaea68-bfb5-435e-8542-b50e013c48cb@github.com> On Wed, 28 Jan 2026 09:45:14 GMT, Richard Reingruber wrote: >> David Briemann has updated the pull request incrementally with one additional commit since the last revision: >> >> address review comments > > src/hotspot/cpu/ppc/c2_MacroAssembler_ppc.cpp line 684: > >> 682: xscmpeqdp(tmp, op1, op2); >> 683: xxsel(dst, first, second, tmp); >> 684: break; > > This looks incorrect to me. > > Also when I compare it with the disassembly of the C version[1]: > > > double cmovf_eq(double op1, double op2, double src1, double src2) { > return op1 == op2 ? src1 : src2; > } > > cmovf_eq(double, double, double, double): > xscmpeqdp 1,2,1 > xxsel 1,4,3,1 > blr > > > `cc` would be 0xA for `==` (looking at `operand cmpOp`[2]), right? > 0xA is 0b1010. `exchange` would be 0 but I think `src1` and `src2` need to be exchanged. > > Assume `op1` and `op2` are indeed equal in `op1 == op2 ? src1 : src2`. `tmp.dword[0]` will be set to 0xFFFF_FFFF_FFFF_FFFF by the `xscmpeqdp`. 
> > `xxsel` evaluates `(src1 & ~tmp) | (src2 & tmp)` so for the correct result `src1` and `src2` need to be swapped as also seen in the disassembly above. > > [1] [disassembly of the C version](https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1DIApACYAQuYukl9ZATwDKjdAGFUtAK4sGEgOykrgAyeAyYAHI%2BAEaYxCAArKQADqgKhE4MHt6%2BASlpGQKh4VEssfFJdpgOmUIETMQE2T5%2BXIFVNQJ1DQTFkTFxibb1jc25bcM9faXliQCUtqhexMjsHOYAzGHI3lgA1CYbbgoE%2BIIAdAiH2CYaAIKb27uYB0cn%2BKiX17cP9%2BhL0Xoe2QLFQADcqAB9TAARwg/y8gJeqGSXFIewRSL2KLM6MxQIUKzRGIBBJWZjmB38Vnuezpe2ImAIywY2NRrwAIocOWyzK8AGJ7QnILh7EBC8mHGkPfxc%2B4/fEvEHgqHheGk5GovEa3naxFkkV6rHCilU6X0hlMlls0VgMDc3kCiUisXOsxSn4mWWev465UQyHATDq/Wa4mK3Uk0PO8M6k2Ur3m%2BmM5nEVko0XfDY8nFO4Wi8Umj3y73y33R/1Q2gEENYjNGoE4hsvfPNt0J6k/C0p60Z15uR2HQX511FjbSr0cjgLWicBK8PwcLSkVCcNzWaxCpYrF6bHikAiaacLADWIA2CXOAA4AJwANhvGivd7MN/8t4SgVnHEkC6PK84XgFBADQDyPBY4FgJA0BYZI6DichKBguD6HiZJkmQO9JHoYACGILwGBPPg6AIOJgIgaJ/2iMIGgAT04fdqOYYhaIAeWibRqkPbheBgthBFYhhaHopdeCwaIvGANwxFoYCeNILAWEMYBxFEhS8EZGowUwOTl0wVRqi8UiGN4MJSO/ZdaDwaJiDojwsH/PC8BYEzSG04hojSTAOUwJSjCsoxwL4AxgAUAA1PBM AAd1Y5JGFc/hBBEMR2CkGRBEUFR1DU3Q0QMQLTEsax9Gs4DIAWFFHAEOSAFpWI2PYapYZBki8blUiiuIuGXcE4mIPAsDKiAFg6Kq/AgVwxlaIIGHQaYBniNFUnSMapr0ZbCgYeaykGNFRtqEYmk8Fo9H2rpDu22Y9sOtbrqmMJ%2Bh2xaRu3VYJBnOc/zU1cOD2Dq4iwoFgGQZA9jwgiTz2CBcEIEgDjMDYuDmXhuK0OZT3PDRzn8B9/ASK8NA2O8r38fw8s4X9SEXHrANsECwNEiCYEQEAlgIVqCEQiBkPg4gIlYNZVGfGqsL2YHQYgcHCORoJ8CIfr0D0RLhFEcQ0uVzK1H/XLSCi2zkhMj6OHnKn/x%2B1ijI57EqD%2B1BOuIQGXnFsH8MIqGPFg3n4cRmXUePUgzw2DZziD0Ow/D/QKa%2BmmOCA%2Bm/fRyOODMaPeB%2BlHwIWdz0mcSQgA) > > [2] [`operand cmpOp`](https://github.com/openjdk/jdk/blob/4ae4ffd5a3114aa2a3832818ee30dc38d9aa2b72/src/hotspot/... Please note that C2 generates code which is equivalent to your C version: xscmpeqdp vs0,vs1,vs2 xxsel vs1,vs4,vs3,vs0 It's correct because the condition is already inverted by C2 (cc == 2). But, the code is very confusing. 
I agree that it should be improved. Btw., what the spec suggests also seems to use the wrong order: xscmpeqdp can be used to implement the C/C++/Java conditional operation, RESULT = (x=y) ? a:b. xscmpeqdp fEQ,fX,fY xxsel fRESULT,fA,fB,fEQ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29281#discussion_r2737516069 From psandoz at openjdk.org Wed Jan 28 16:45:25 2026 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 28 Jan 2026 16:45:25 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v17] In-Reply-To: References: Message-ID: <6-fyVKW-u3b6Bmm1FAhe7gMzlxOqr9wVwzk-FWXvR8s=.db3b9aa0-87f2-4861-b282-d198fc5ae543@github.com> On Tue, 27 Jan 2026 18:25:25 GMT, Paul Sandoz wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 >> - Clanups >> - Refactoring vectorIntrinsics >> - Refactoring and cleanups >> - Refactoring and cleanups >> - Review comments resolutions >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 >> - Adding testpoint for JDK-8373574 >> - Review comments resolutions >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 >> - ... and 24 more: https://git.openjdk.org/jdk/compare/0f1b96a5...ce5768fa > >> We will still need to create T_FLOAT16 basic type and associate it with Float16 LaneType, why not directly pass these basic types to intrinsic entry point ? > > The strong feedback from HotSpot folks, which I agree with, is that adding a new enum value to `BasicType` is not the way to go - it is too disruptive and does not scale. Sorry if I misled you earlier on; my intention in the feedback was to propose something that was limited in scope to vector support.
> > The thought about a proxy class was motivated by a question I had - what would we do if `Float16.class` was already present in `java.base`? And answers to that might motivate what we do now in preparation for when that happens. Regardless, I think we need to separate out the Vector API's direct dependence on BasicType and its values. Instead we should define our own constants for the vector element types, and provide a mapping of those to BasicType values, which might result in "erasure" to the carrier type. We should adjust/adapt LaneType accordingly. Does that make sense to you? > Hi @PaulSandoz, yes, this looks good to me, I have modified the patch accordingly. Thanks, I think this is much better, more localized. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3812429459 From bmaillard at openjdk.org Wed Jan 28 16:50:40 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 28 Jan 2026 16:50:40 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v11] In-Reply-To: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: > This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and identifying similar or related failures during bug triaging. > > In summary, this PR brings the following changes: > - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger.
> - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. > - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. > > ### Example outputs > #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) > Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >
> Before > > > Missed Ideal optimization (can_reshape=false): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) > 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) > The result after Ideal: > dist dump > --------------------------------------------- > 1 337 ConL === 0 [[ 338 ]] #long:-9 > 1 336 URShiftL === _ 298 22 [[ 338 ]] > 0 338 AndL === _ 336 337 [[ ]] > > > Missed Ideal optimization (can_reshape=true): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298... Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision: - Use phase enum to pick assert message - Refactor is_iterGVN ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28295/files - new: https://git.openjdk.org/jdk/pull/28295/files/387b0c19..94421ab6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=09-10 Stats: 19 lines in 2 files changed: 7 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/28295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28295/head:pull/28295 PR: https://git.openjdk.org/jdk/pull/28295 From bmaillard at openjdk.org Wed Jan 28 17:05:55 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 28 Jan 2026 17:05:55 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v12] In-Reply-To:
<5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: > This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to debugging such failures quickly and identifying similar or related failures during bug triaging. > > In summary, this PR brings the following changes: > - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. > - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. > - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. > > ### Example outputs > #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) > Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >
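The "assert at the first failure" behavior described above can be modeled in a few lines. This is a hypothetical sketch, not HotSpot code: `FakeNode`, `VerifyFailure`, `verify_optimize_first_failure` and the message text are illustrative names, and the three booleans stand in for actually re-running `Ideal`, `Identity` and `Value` on each node.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a C2 node: a name plus flags saying whether
// re-running Ideal, Identity or Value would still change it.
struct FakeNode {
  std::string name;
  bool ideal_would_change;
  bool identity_would_change;
  bool value_would_change;
};

struct VerifyFailure {
  std::string reason;  // "Ideal", "Identity" or "Value"
  std::string node;    // node name, so it can be put into the assert message
};

// Stop at the first node with a missed optimization instead of scanning the
// whole graph: later reports are often just consequences of the first one.
std::optional<VerifyFailure> verify_optimize_first_failure(const std::vector<FakeNode>& nodes) {
  for (const FakeNode& n : nodes) {
    if (n.ideal_would_change)    return VerifyFailure{"Ideal", n.name};
    if (n.identity_would_change) return VerifyFailure{"Identity", n.name};
    if (n.value_would_change)    return VerifyFailure{"Value", n.name};
  }
  return std::nullopt;
}

// Illustrative message format only; the real assert text may differ.
std::string failure_message(const VerifyFailure& f) {
  return "Missed " + f.reason + " optimization opportunity for " + f.node;
}
```

Carrying the reason and node name in the failure itself is what lets a triager match related failures by message alone, without rerunning the test.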
    > Before > > > Missed Ideal optimization (can_reshape=false): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) > 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) > The result after Ideal: > dist dump > --------------------------------------------- > 1 337 ConL === 0 [[ 338 ]] #long:-9 > 1 336 URShiftL === _ 298 22 [[ 338 ]] > 0 338 AndL === _ 336 337 [[ ]] > > > Missed Ideal optimization (can_reshape=true): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298... Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: Change condition structure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28295/files - new: https://git.openjdk.org/jdk/pull/28295/files/94421ab6..99855fe6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=10-11 Stats: 4 lines in 1 file changed: 1 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28295/head:pull/28295 PR: https://git.openjdk.org/jdk/pull/28295 From bmaillard at openjdk.org Wed Jan 28 17:11:23 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 28 Jan 2026 17:11:23 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v9] In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> 
<9eLBVN0xV07IbyO99CQ79dyVYUme_JpewQqgTo3eFtg=.f9e352cd-724c-4150-a2bb-a0aa4236c4e7@github.com> <-rssqxnycfY9oZ6SgyG8JyzBctLFgg2QQs7Iydxx9Qo=.7e513c45-f132-445f-9d11-7d1018520dfb@github.com> <6sXuSoCq_GGQd8cWTWidhgxynFNYPGASb2Q3RMTXz-4=.0103b2f5-4486-4c84-abff-d1c65c61ee94@github.com> Message-ID: On Wed, 28 Jan 2026 10:08:25 GMT, Manuel Hässig wrote: >> Defining a small, say `GVNVerificationPhase`, enum would already be cleaner, safer and less overhead. > > You could even make that a debug-only field of `PhaseIterGVN` and override it for `PhaseCCP`. Following a discussion with @mhaessig, I decided to replace the `_iterGVN` boolean field with an enum field, and refactor `is_IterGVN` using this new field. ```c++ enum class PhaseValuesType { gvn, iter_gvn, ccp }; ``` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2737650119 From bmaillard at openjdk.org Wed Jan 28 17:14:22 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 28 Jan 2026 17:14:22 GMT Subject: RFR: 8375038: C2: Enforce that Ideal() returns the root of the subgraph if any change was made by checking the node hash [v2] In-Reply-To: References: Message-ID: > This PR introduces an assert in `PhaseIterGVN` to check that `Ideal` actually returns something if the node was modified. > > ## Context > > In the description of `Node::Ideal` in `node.cpp`, we have: > >> If ANY change is made, it must return the root of the reshaped graph - even if the root is the same Node > > It is crucial that such changes do not go unnoticed and that they can propagate to other nodes. Current documentation also states: > >> Running with `-XX:VerifyIterativeGVN=1` checks >> these invariants, although its too slow to have on by default. If you are >> hacking an Ideal call, be sure to test with `-XX:VerifyIterativeGVN=1` > > However, `-XX:VerifyIterativeGVN=1` ends up verifying that the `_in` and `_out` arrays are consistent, but does not verify the return value.
> > This PR aims to enforce the return value invariant. It should also make regression testing of bugs caused by wrongly returning nullptr in `Ideal` easier, such as [JDK-8373251](https://bugs.openjdk.org/browse/JDK-8373251). > > ## Proposed Change > > In summary, this PR brings the following set of changes: > - Add a new flag bit to `-XX:VerifyIterativeGVN` for verifying the return value of `Ideal` calls > - Add an assert on the hash of nodes before and after `Ideal` in `PhaseIterGVN::transform_old` > - Fix `Ideal` optimizations that would cause harness errors with testing on tier1 > - Update the comments in the code to clarify the invariant and how to enforce it > > After consideration, I took the decision to only check the hash if the node is not dead. It seems there are many cases where the control node is dead, and we propagate the information to all users with `kill_dead_code`, and end up returning `nullptr`. This is basically a mechanism to "speed up" the propagation (it would also happen normally via the usual IGVN worklist). This somehow contradicts the "must return the root of the reshaped graph" invariant, but it seems to be a common practice. > > In addition to that, I have decided to implement this as part of a new flag bit to `-XX:VerifyIterativeGVN` instead of an existing one, because there is a risk that it causes new failures in existing usages of the flag. > > This PR is meant to introduce the new check and fix the most "obvious" failures that the new flag would introduce in common scenarios, such as when running with `-version` on tier1. Since there are known issues caused by bad return values of `Ideal` (such as [JDK-8373251](https://bugs.openjdk.org/browse/JDK-8373251)), I will fix other failures in follow-up PRs....
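The hash-based check can be modeled in miniature. This is a hypothetical sketch: `MiniNode`, its `hash()` and `ideal_contract_holds` are illustrative stand-ins, not C2's `Node::hash()` or `PhaseIterGVN::transform_old`. The idea it shows is the one in the description: if a node's hash (opcode plus inputs) differs after `Ideal`, the node was changed, so returning `nullptr` breaks the contract; dead nodes are exempt.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical miniature node: an opcode plus input edges.
struct MiniNode {
  int opcode;
  std::vector<MiniNode*> in;
  bool dead = false;

  // Mixes the opcode and the identities of the inputs, similar in spirit to a
  // GVN hash: rewiring an input changes the result.
  std::size_t hash() const {
    std::size_t h = std::hash<int>{}(opcode);
    for (MiniNode* input : in) {
      h = h * 31 + std::hash<MiniNode*>{}(input);
    }
    return h;
  }
};

// An "Ideal" transformation: may modify the node in place and returns the root
// of the reshaped graph, or nullptr for "no progress".
using IdealFn = std::function<MiniNode*(MiniNode*)>;

// Checks the contract: if the node's hash changed, Ideal must not return nullptr.
// Dead nodes are skipped, following the PR description.
bool ideal_contract_holds(MiniNode* n, const IdealFn& ideal) {
  const std::size_t hash_before = n->hash();
  MiniNode* result = ideal(n);
  if (n->dead) {
    return true;
  }
  const bool changed = n->hash() != hash_before;
  return !(changed && result == nullptr);
}
```

A hash comparison can in principle miss a change whose new hash happens to collide with the old one, so this is a cheap detector rather than a proof of correctness.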
Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/node.cpp Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29421/files - new: https://git.openjdk.org/jdk/pull/29421/files/3528807f..76dbb85f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29421&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29421&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29421.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29421/head:pull/29421 PR: https://git.openjdk.org/jdk/pull/29421 From bmaillard at openjdk.org Wed Jan 28 17:14:24 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 28 Jan 2026 17:14:24 GMT Subject: RFR: 8375038: C2: Enforce that Ideal() returns the root of the subgraph if any change was made by checking the node hash [v2] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 14:22:21 GMT, Roberto Castañeda Lozano wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/node.cpp >> >> Co-authored-by: Roberto Castañeda Lozano > > src/hotspot/share/opto/node.cpp line 1157: > >> 1155: // can help with validating these invariants, although they are too slow to have on by default: >> 1156: // - '-XX:VerifyIterativeGVN=1' checks the def-use info >> 1157: // - '-XX:VerifyIterativeGVN=100000' cheks the return value > > Suggestion: > > // - '-XX:VerifyIterativeGVN=100000' checks the return value Thanks!
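The `=1` and `=100000` values in the comment above suggest that each decimal digit of `-XX:VerifyIterativeGVN` switches one check on or off. Assuming that encoding, testing for a given check is simple digit arithmetic. This sketch does not reproduce HotSpot's actual decoding code, and the constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical digit positions, read off the node.cpp comment above:
// '=1' (ones digit) checks def-use info, '=100000' (sixth digit) checks the Ideal return value.
constexpr unsigned kDefUseDigit = 0;
constexpr unsigned kIdealReturnDigit = 5;

// Returns true if the decimal digit at 'position' (0 = ones) of 'flag' is non-zero.
bool verify_digit_enabled(std::uint32_t flag, unsigned position) {
  for (unsigned i = 0; i < position; ++i) {
    flag /= 10;
  }
  return flag % 10 != 0;
}
```

Under this assumption, several checks combine by setting several digits, e.g. `100001` enables both the def-use and the return-value checks.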
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29421#discussion_r2737660301 From psandoz at openjdk.org Wed Jan 28 17:23:18 2026 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 28 Jan 2026 17:23:18 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v4] In-Reply-To: References: Message-ID: <2oY0tq0dzrbhoMHy9v68f39P5VAgld8bAsE_rrd6m5U=.229ab632-1e4a-4d43-94e5-491b4c1448e3@github.com> On Wed, 28 Jan 2026 10:02:53 GMT, Andrew Haley wrote: > > The IR test framework is better at determining that by testing the right IR nodes are generated - and they get run as part of the existing HotSpot test suite. > > But as a reviewer I'm not looking at the IR at all, but at the performance. That's a good point. Where I have concerns is introducing a very large set of vector micro benchmarks in bulk or over time into the mainline under the `test/micro` directory. Further, I am not very happy with the way we generate the vector API benchmarks by leaning on the unit test harness (which I am also not so happy about). A better approach might be to generate a benchmark on demand for an operation so it can be verified if needed. I think to do this properly we need to invest some resources, which are limited, at least from my side, and so would require some adjustment in priorities. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3812696618 From shade at openjdk.org Wed Jan 28 18:38:15 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 28 Jan 2026 18:38:15 GMT Subject: RFR: 8376604: C2: EA should assert is_oop_field for AddP with oop outs Message-ID: Split out of [JDK-8376514](https://bugs.openjdk.org/browse/JDK-8376514) in Valhalla: I think we need to verify more thoroughly that if we report is_oop_field = false for AddP, then there are no nodes that we feed into oops.
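The proposed post-condition can be stated as a small consistency check. This is a hypothetical model: `UseKind`, `AddPUses` and the helper names are invented for illustration and do not mirror `ConnectionGraph` code. The only point is that a "not an oop field" answer must agree with how the address is actually used.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of the users of an AddP address.
enum class UseKind { load_oop, store_oop, load_prim, store_prim, other };

struct AddPUses {
  std::vector<UseKind> outs;
};

// Does any user load or store an oop through this address?
bool feeds_oop_access(const AddPUses& addp) {
  for (UseKind k : addp.outs) {
    if (k == UseKind::load_oop || k == UseKind::store_oop) {
      return true;
    }
  }
  return false;
}

// Post-condition: answering "not an oop field" is only consistent
// if no user accesses an oop through the address.
bool is_oop_field_answer_consistent(const AddPUses& addp, bool is_oop_field) {
  return is_oop_field || !feeds_oop_access(addp);
}
```

Checking once at the end of the computation, rather than in each branch, is what lets a single assert catch every path that returns a wrong `false`.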
We handle it pretty well in various branches in the method already, and we "just" need to check it at the end. Valhalla catches fire on that post-condition check, tracked in [JDK-8376514](https://bugs.openjdk.org/browse/JDK-8376514). Cleans up the code a bit as well. Additional testing: - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` - [x] Linux x86_64 server fastdebug, `hotspot_compiler` ------------- Commit messages: - Adjust assert message - More verification Changes: https://git.openjdk.org/jdk/pull/29468/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29468&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8376604 Stats: 20 lines in 2 files changed: 10 ins; 6 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/29468.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29468/head:pull/29468 PR: https://git.openjdk.org/jdk/pull/29468 From rrich at openjdk.org Wed Jan 28 20:42:53 2026 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 28 Jan 2026 20:42:53 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v2] In-Reply-To: <-TysMtkU_jZckaZB-X_nEFxM2t8C31OCRyz9WvbtiZ4=.80aaea68-bfb5-435e-8542-b50e013c48cb@github.com> References: <2EsMN-Ho8F7NzytDTPvYh4cbewkLcb3YQq5oTbuFcfc=.32a85ea0-e332-44cc-9f10-04d88138eb82@github.com> <-TysMtkU_jZckaZB-X_nEFxM2t8C31OCRyz9WvbtiZ4=.80aaea68-bfb5-435e-8542-b50e013c48cb@github.com> Message-ID: On Wed, 28 Jan 2026 16:38:32 GMT, Martin Doerr wrote: > Please note that C2 generates code which is equivalent to your C version: Could you please provide the Java source code for that? > Btw. what the Spec suggests also seems to use the wrong order: Where can that be found? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29281#discussion_r2738521065 From vlivanov at openjdk.org Wed Jan 28 22:33:51 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 28 Jan 2026 22:33:51 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v29] In-Reply-To: References: Message-ID: > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. 
If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: - updates - VerifyIterativeGVN ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25315/files - new: https://git.openjdk.org/jdk/pull/25315/files/be42a719..410cc2af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=27-28 Stats: 46 lines in 4 files changed: 24 ins; 16 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From vlivanov at openjdk.org Wed Jan 28 22:33:58 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 28 Jan 2026 22:33:58 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v27] In-Reply-To: <9l8uhFDyx0QQeTcUw0TclysqlxwvdndXDbRL8rjX8GQ=.0429ab53-df55-4580-914e-a8a4242f00ac@github.com> References: <9l8uhFDyx0QQeTcUw0TclysqlxwvdndXDbRL8rjX8GQ=.0429ab53-df55-4580-914e-a8a4242f00ac@github.com> Message-ID: <0CQxB2IVdHDrvS1gacEuZR7PzYztiSvPeMJLY_qBq-E=.ec8af214-4fc1-4c8a-a7f8-ed8ed649c010@github.com> On Mon, 8 Dec 2025 09:45:43 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request 
incrementally with one additional commit since the last revision: >> >> IR test cases > > src/hotspot/share/opto/phaseX.cpp line 2051: > >> 2049: // java -XX:VerifyIterativeGVN=1000 -Xcomp -XX:+StressReachabilityFences >> 2050: return false; >> 2051: } > > Is this still true? Good catch. I double-checked that IGVN verification doesn't fail anymore. Removed. > src/hotspot/share/opto/reachability.cpp line 73: > >> 71: * >> 72: * After loop opts are over, it becomes possible to reliably enumerate all interfering safe points and >> 73: * to ensure that the referent is present in their oop maps. > > For the sake of someone else who has to fix a bug here, or considers changing the design, can you please: > - Give a more concrete example, e.g.: load of a native memory address is hoisted, ... explain that this means we move it over the SafePoint in the backedge, and why this is problematic. > - State your assumption / invariants that should hold after loop opts, that guarantee that it is safe to now attach to SP instead of RF. This one makes me a bit nervous, because it is another implicit assumption in C2, but I suppose we just have to live with that. But at least we can document it well ;) Updated the comment. > src/hotspot/share/opto/reachability.cpp line 166: > >> 164: lpt->_reachability_fences = new Node_List(); >> 165: } >> 166: lpt->_reachability_fences->push(new_rf); > > This code is duplicated elsewhere. Consider refactoring it with a `lpt->reachability_fences_push` method that automatically allocates the new `Node_List`. Done. > src/hotspot/share/opto/reachability.cpp line 216: > >> 214: // ResourceMark rm; // NB! not safe because insert_rf may trigger _idom reallocation >> 215: Unique_Node_List redundant_rfs; >> 216: GrowableArray> worklist; > > Tech debt alarm ;) > > We should probably move `_idom` to a different arena then, right? Agree. Will update it once #28581 lands in the repo.
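The `reachability_fences_push` refactoring suggested in the line 166 thread could look roughly like the sketch below. It is hypothetical: `IdealLoopTreeSketch` and `NodeSketch` stand in for C2's loop-tree and node classes, and `std::unique_ptr`/`std::vector` replace the arena-allocated `Node_List` that HotSpot actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a reachability fence node.
struct NodeSketch {
  int idx;
};

// Hypothetical loop-tree entry. The fence list is allocated lazily on the first
// push, so loops without reachability fences never allocate it.
class IdealLoopTreeSketch {
 public:
  void reachability_fences_push(NodeSketch* rf) {
    if (_reachability_fences == nullptr) {
      _reachability_fences = std::make_unique<std::vector<NodeSketch*>>();
    }
    _reachability_fences->push_back(rf);
  }

  bool has_reachability_fences() const { return _reachability_fences != nullptr; }

  std::size_t reachability_fence_count() const {
    return has_reachability_fences() ? _reachability_fences->size() : 0;
  }

 private:
  std::unique_ptr<std::vector<NodeSketch*>> _reachability_fences;
};
```

Each duplicated call site then shrinks to a single `lpt->reachability_fences_push(new_rf)`, and the null check lives in one place.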
> src/hotspot/share/opto/reachability.cpp line 473: > >> 471: if (extra_edge != nullptr) { >> 472: sfpt->add_req(extra_edge); // Add valid_length_test_input edge back >> 473: } > > Could it be that you have two meanings for "extra edge" here? Or does the top comment: > >> Turn extra safepoint edges into reachability fences > > match with this? > >> sfpt->add_req(extra_edge); // Add valid_length_test_input edge back > > Again: mixing up these edges really feels like tech debt. We should fix that soon. I expanded the comment to clarify what happens there. I hope it makes it clearer. The "extra edge" term is broader than "reachability edge", and the ambiguity comes from the fact that non-debug edges are untyped, but there are multiple unrelated cases where edges are appended to safepoints. > test/hotspot/jtreg/compiler/c2/TestReachabilityFenceFlags.java line 48: > >> 46: * -XX:+StressReachabilityFences -XX:+OptimizeReachabilityFences -XX:+PreserveReachabilityFencesOnConstants >> 47: * compiler.c2.TestReachabilityFenceFlags >> 48: */ > > Consider dropping `-Xcomp -XX:-TieredCompilation`, because we will run with that at some point in our CI anyway. That would give more options for different kinds of compilation, right? It is intended as a smoke test for different RF-related control flags. It runs a very limited amount of testing (-Xcomp and C2-only with an empty main method) irrespective of execution mode.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2738887078 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2738887240 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2738887640 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2738887791 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2738888183 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2738888293 From vlivanov at openjdk.org Wed Jan 28 22:38:37 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 28 Jan 2026 22:38:37 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v27] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 09:23:55 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> IR test cases > > test/hotspot/jtreg/compiler/c2/TestReachabilityFence.java line 200: > >> 198: @IR(counts = {IRNode.REACHABILITY_FENCE, "2"}, phase = CompilePhase.AFTER_LOOP_OPTS) >> 199: @IR(counts = {IRNode.REACHABILITY_FENCE, "0"}, phase = CompilePhase.EXPAND_REACHABILITY_FENCES) >> 200: @IR(counts = {IRNode.REACHABILITY_FENCE, "1"}, phase = CompilePhase.FINAL_CODE) > > Can you add a small comment here, why we go from 2 -> 0 -> 1 ? Is it because we eliminate one of the two RF? Which one is supposed to be eliminated? There's one referent and one interfering safepoint. So, both RFs are transformed into a single reachability edge and it is expanded into a single RF at the end. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2738903535 From vlivanov at openjdk.org Wed Jan 28 22:46:05 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 28 Jan 2026 22:46:05 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v30] In-Reply-To: References: Message-ID: > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. 
If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25315/files - new: https://git.openjdk.org/jdk/pull/25315/files/410cc2af..cf715d51 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=28-29 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From vlivanov at openjdk.org Wed Jan 28 22:46:09 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 28 Jan 2026 22:46:09 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v29] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 22:33:51 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. >> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. 
C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." 
>> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: > > - updates > - VerifyIterativeGVN Thanks for the thorough review, Emanuel! Much appreciated! Any other feedback/reviews, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3814300124 From vlivanov at openjdk.org Wed Jan 28 22:46:13 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 28 Jan 2026 22:46:13 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v4] In-Reply-To: <1E6C99XanWuE9QF_-SfyFzxgD4asf7WUO7RyR1pYn7M=.47f1b163-1b67-4f54-acd4-d061779ad95d@github.com> References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> <1E6C99XanWuE9QF_-SfyFzxgD4asf7WUO7RyR1pYn7M=.47f1b163-1b67-4f54-acd4-d061779ad95d@github.com> Message-ID: On Thu, 22 Jan 2026 14:12:12 GMT, Kerem Kat wrote: >> The check `vect->is_Vector() || vect->is_LoadVector()` doesn't handle `Proj` nodes that resolve to vector types, causing an assertion failure when such nodes flow through a `Phi` into `VectorBox`. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into fix-c2-checkCastPP > - Merge branch 'master' into fix-c2-checkCastPP > - 8374903: C2 VectorAPI: assert(vbox->as_Phi()->region() == vect->as_Phi()->region()) failed > - Simplify expand_vbox_node_helper by merging VectorBox Phi handling > - 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded Looks good. 
(Sorry for the delay, I thought I already approved it.) ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29200#pullrequestreview-3719518486 From vlivanov at openjdk.org Wed Jan 28 22:47:16 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 28 Jan 2026 22:47:16 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v16] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 10:51:44 GMT, Emanuel Peter wrote: >> This is exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. >> >> Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) >> >> **Discussion** >> >> Observations: >> - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. >> - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. >> - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow >> - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow >> - `linux_x64_oci_server`: Vector API leads to really nice speedups >> - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks >> - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. >> - Compact Object Headers has some negative effect on some loop benchmarks. >> - `linux_aarch64_server`: `reduceAddI`, `copyI` >> - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` >> - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`?
>> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit >> - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` >> - Intrinsics can be much faster than auto vectorized or Vector API code. >> - `linux_aarch64_server`: `copyI` >> - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. >> - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact; I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). >> >> **Benchmark Plots** >> >> Units: nanoseconds per algorithm invocation. >> >> Note: the `aarch64` machines all only have `NEON` support. Performance may be much better on `SVE`; I have not benchmarked that yet. >> >> `linux_x64_oci` >> algo_linux_x64_oci_server >> >> `windows_x64_oci` >> algo_windows_x64_oci_server > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - Jatins typo fix part 2 > > Co-authored-by: Jatin Bhateja > - Jatins typo fix part 1 > > Co-authored-by: Jatin Bhateja Marked as reviewed by vlivanov (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28639#pullrequestreview-3719520780 From psandoz at openjdk.org Wed Jan 28 23:12:12 2026 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 28 Jan 2026 23:12:12 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v16] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 10:51:44 GMT, Emanuel Peter wrote: >> This is exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. >> >> Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;)
And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) >> >> **Discussion** >> >> Observations: >> - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. >> - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. >> - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow >> - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow >> - `linux_x64_oci_server`: Vector API leads to really nice speedups >> - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks >> - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. >> - Compact Object Headers has some negative effect on some loop benchmarks. >> - `linux_aarch64_server`: `reduceAddI`, `copyI` >> - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` >> - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? >> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit >> - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` >> - Intrinsics can be much faster than auto vectorized or Vector API code. >> - `linux_aarch64_server`: `copyI` >> - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. >> - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact, I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). >> >> **Benchmark Plots** >> >> Units: nanoseconds per algorithm invocation. >> >> Note: the `aarch64` machines all only have `NEON` support. Performance may be much better on `SVE`, I have not benchmarked that yet.
>> >> `linux_x64_oci` >> algo_linux_x64_oci_server >> >> `windows_x64_oci` >> algo_windows_x64_oci_server > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - Jatins typo fix part 2 > > Co-authored-by: Jatin Bhateja > - Jatins typo fix part 1 > > Co-authored-by: Jatin Bhateja Very good, I just focused on the benchmark code. In addition to highlighting gaps in platforms, it may also highlight gaps in machine code generation, even where performance is already good, so it can be improved further. test/micro/org/openjdk/bench/vm/compiler/VectorAlgorithmsImpl.java line 534: > 532: for (; i < SPECIES_I512.loopBound(a.length); i += SPECIES_I512.length()) { > 533: IntVector v = IntVector.fromArray(SPECIES_I512, a, i); > 534: v = v.add(v.rearrange(shf1), mask1); These are lane shifting operations, so another variant can use `compress` with the masks as input. Another could use `slice`; ideally the rearrange and slice variants would generate comparable code, or the compress and slice variants would. ------------- Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28639#pullrequestreview-3719592647 PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2738982413 From dlong at openjdk.org Wed Jan 28 23:40:56 2026 From: dlong at openjdk.org (Dean Long) Date: Wed, 28 Jan 2026 23:40:56 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v9] In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 11:46:31 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 14 additional commits since the last revision: > > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - address comments > - fix rename > - rename test file > - Merge branch 'master' into fix-c2-segfault-unlocknode > - fix test spacing > - ... and 4 more: https://git.openjdk.org/jdk/compare/129feafa...d29208cf Looks good. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28432#pullrequestreview-3719688418 From duke at openjdk.org Thu Jan 29 00:09:26 2026 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 29 Jan 2026 00:09:26 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache [v7] In-Reply-To: References: Message-ID: > ### Summary > This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. > > ### Description > The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. > > Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. > > The `HotCodeGrouper` uses the sampling data to select methods for grouping. 
Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. > > The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). > > Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. > > ### Performance > Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. > > ### Testing > * CodeCache tests have been updated to cover the new `HotCodeHeap`. > * Added dedicated tests for the `HotCodeGrouper` > ... 
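The ranking-and-threshold step described above can be modeled in a few lines of plain Java. This is a hedged sketch, not the HotSpot implementation: `HotCodeSelectionSketch` and `selectHot` are hypothetical names, method names stand in for nmethods, callee relocation and `HotCodeHeap` capacity checks are left out, and "exceeds" is read as strictly greater than `HotCodeSampleRatio`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical model of ranking sampled methods and picking a hot set. */
final class HotCodeSelectionSketch {
    /**
     * Walks methods from hottest to coldest and selects them until the selected
     * methods account for more than `ratio` of all samples (cf. HotCodeSampleRatio).
     */
    static List<String> selectHot(Map<String, Integer> samples, double ratio) {
        long total = 0;
        for (int count : samples.values()) {
            total += count;
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(samples.entrySet());
        ranked.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue()); // hottest first
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey()); // deterministic ties
        });
        List<String> selected = new ArrayList<>();
        long covered = 0;
        for (Map.Entry<String, Integer> e : ranked) {
            if ((double) covered / total > ratio) {
                break; // the hot set already exceeds the threshold
            }
            selected.add(e.getKey());
            covered += e.getValue();
        }
        return selected;
    }
}
```

With samples `{a=50, b=30, c=15, d=5}` and a ratio of `0.8`, the sketch selects `[a, b, c]`: after `a` and `b` the hot set covers exactly 80% of the samples, which does not yet exceed the threshold, so `c` is added as well.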
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Fix test failure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27858/files - new: https://git.openjdk.org/jdk/pull/27858/files/9999bf7b..4ce14eef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/27858.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27858/head:pull/27858 PR: https://git.openjdk.org/jdk/pull/27858 From duke at openjdk.org Thu Jan 29 00:24:33 2026 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 29 Jan 2026 00:24:33 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache [v8] In-Reply-To: References: Message-ID: > ### Summary > This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. > > ### Description > The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. > > Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. > > The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. 
The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. > > The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). > > Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. > > ### Performance > Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. > > ### Testing > * CodeCache tests have been updated to cover the new `HotCodeHeap`. > * Added dedicated tests for the `HotCodeGrouper` > ... Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 30 additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into JDK-8326205 - Fix test failure - Fix builds - Fix merge - Merge remote-tracking branch 'origin/master' into JDK-8326205 - Add check for full HotCodeHeap - Add HotCodeGrouperMoveFunction test - Add StessHotCodeGrouper test - Update blob checks - Merge fix - ... and 20 more: https://git.openjdk.org/jdk/compare/d5e16db1...b5a5c71b ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27858/files - new: https://git.openjdk.org/jdk/pull/27858/files/4ce14eef..b5a5c71b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=06-07 Stats: 47318 lines in 1062 files changed: 23617 ins; 9713 del; 13988 mod Patch: https://git.openjdk.org/jdk/pull/27858.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27858/head:pull/27858 PR: https://git.openjdk.org/jdk/pull/27858 From erfang at openjdk.org Thu Jan 29 01:41:06 2026 From: erfang at openjdk.org (Eric Fang) Date: Thu, 29 Jan 2026 01:41:06 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v4] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 10:02:53 GMT, Andrew Haley wrote: >> The general way code flows right now, but not often, is from jdk/master to panama-vector/vectorIntrinsics, since most of the development work is in the mainline (exceptions to that are the float16 and Valhalla alignment work which are large efforts). >> >> I am very reluctant to include all the auto-generated micro benchmarks in mainline. There is a huge number of them and I am not certain they provide as much value as they did, now that we have the IR test framework. In many cases, given the simplicity of what they measure, they were designed to ensure C2 generates the right instructions.
The IR test framework is better at determining that by testing that the right IR nodes are generated - and they get run as part of the existing HotSpot test suite. >> >> The IR test framework is of course no substitute, in general, for performance tests. A better focus for Vector API performance tests is, I think, Emanuel's work [here](https://github.com/openjdk/jdk/pull/28639/) and use-cases/algorithms that can be implemented concisely. > >> The IR test framework is better at determining that by testing that the right IR nodes are generated - and they get run as part of the existing HotSpot test suite. > > But as a reviewer I'm not looking at the IR at all, but at the performance. Hi @theRealAph @PaulSandoz, thanks for your insight! How to synchronize the JMH micro benchmarks between Panama and the mainline may be a more general issue that requires further investigation, design, and resources. As for how to move this PR forward, my idea is to write a new micro benchmark in this PR to demonstrate the optimization effect of this patch. Would that be acceptable to you? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3814866131 From dlong at openjdk.org Thu Jan 29 06:50:32 2026 From: dlong at openjdk.org (Dean Long) Date: Thu, 29 Jan 2026 06:50:32 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v6] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 03:43:15 GMT, Chen Liang wrote: >> src/hotspot/share/opto/library_call.cpp line 4791: >> >>> 4789: const TypeInstPtr* t = _gvn.type(obj)->isa_instptr(); >>> 4790: if (t != nullptr && t->const_oop() != nullptr) { >>> 4791: assert(!is_virtual, "no devirtualization for constant receiver?"); >> >> Don't we also need to check for `is_static`, to distinguish between `Object.hashCode` and `System.identityHashCode`? > > I think once we are not virtual, the native Object::hashCode behaves like System::identityHashCode.
The only difference is the null check, but I think there's a null check in the beginning so we should be safe. OK, so if we get here we are guaranteed to be calling Object::hashCode and not a devirtualized MySubClass::hashCode? I guess the intrinsic lookup would fail if the callee was a subclass. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2740204610 From jbhateja at openjdk.org Thu Jan 29 07:25:03 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 29 Jan 2026 07:25:03 GMT Subject: RFR: 8376187: [VectorAPI] Define new lane type constants and pass them to intrinsic entries Message-ID: As per [discussions ](https://github.com/openjdk/jdk/pull/28002#issuecomment-3789507594) on the JDK-8370691 pull request, splitting out a portion of PR#28002 into a separate patch in preparation for Float16 vector API support. This patch adds new lane type constants and passes them to vector intrinsic entry points. All existing Vector API jtreg tests pass with the patch. Kindly review and share your feedback.
Best Regards, Jatin ------------- Commit messages: - [VectorAPI] Define new lane type constants and pass them to intrinsic entries Changes: https://git.openjdk.org/jdk/pull/29481/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29481&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8376187 Stats: 1744 lines in 52 files changed: 192 ins; 79 del; 1473 mod Patch: https://git.openjdk.org/jdk/pull/29481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29481/head:pull/29481 PR: https://git.openjdk.org/jdk/pull/29481 From epeter at openjdk.org Thu Jan 29 07:55:23 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jan 2026 07:55:23 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v9] In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> <9eLBVN0xV07IbyO99CQ79dyVYUme_JpewQqgTo3eFtg=.f9e352cd-724c-4150-a2bb-a0aa4236c4e7@github.com> <-rssqxnycfY9oZ6SgyG8JyzBctLFgg2QQs7Iydxx9Qo=.7e513c45-f132-445f-9d11-7d1018520dfb@github.com> <6sXuSoCq_GGQd8cWTWidhgxynFNYPGASb2Q3RMTXz-4=.0103b2f5-4486-4c84-abff-d1c65c61ee94@github.com> Message-ID: On Wed, 28 Jan 2026 17:08:37 GMT, Benoît Maillard wrote: >> You could even make that a debug only field of `PhaseIterGVN` and override it for `PhaseCCP`. > > Following a discussion with @mhaessig, I decided to replace the `_iterGVN` boolean field by an enum field, and refactor `is_IterGVN` using this new field. > > ```c++ > enum class PhaseValuesType { > gvn, > iter_gvn, > ccp > }; Yet another option: You just pass the `Phase`, and check if it is PhaseIterGVN etc.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2740385322 From qamai at openjdk.org Thu Jan 29 08:00:31 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 29 Jan 2026 08:00:31 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Tue, 13 Jan 2026 11:27:53 GMT, Fei Gao wrote: >> In C2's loop optimization, for a counted loop, if any of these conditions (RCE, unrolling) are met, we switch to the >> `pre-main-post-loop` model. Then a counted loop can be split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If so, execution jumps over the loop code to avoid over-running the loop. For example, if a main loop is unrolled `8x`, the main loop guard tests whether the loop has fewer than `8` iterations and then decides which way to go. >> >> Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is further multiplied. After the main loop is super-unrolled, the minimum trip guard test is updated. Assuming one vector covers `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop tests whether the remaining trip count is less than `8 * 4 = 32`. >> >> To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. Both the main loop's and the vectorized drain loop's trip guards jump to the scalar post loop.
>> >> The problem here is that if the remaining trip count when exiting the pre-loop is relatively small but larger than the vector length, the vectorized drain loop is never executed. Because the minimum trip guard test of the main loop fails, execution jumps over both the main loop and the vectorized drain loop. For example, in the above case, if a loop still has `25` iterations after the pre-loop, we could run `3` rounds of the vectorized drain loop, but that never happens. It would be better if the minimum trip guard test of the main loop did not jump over the vectorized drain loop. >> >> This patch improves this by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change of control flow. >> >> The whole process is done by the function `insert_post_loop()`. >> >> We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to the vectorized drain loop with mode `InsertVectorizedDrain`: >> >> 1. The fall-in control flow to the vectorized drain loop comes fr...
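The effect of the guard change on the `25`-iteration example can be checked with a small iteration-count model. This is a hypothetical Java sketch, not C2 code: `DrainLoopModel` and `distribute` are illustrative names, it assumes the pre-loop has already run and that each guard simply compares the remaining trip count against the next loop's stride, and it ignores alignment and the actual graph shape.

```java
/**
 * Hypothetical iteration-count model of the main / vectorized drain / scalar post
 * loops after the pre-loop has run. vl = lanes per vector, unroll = super-unroll factor.
 */
final class DrainLoopModel {
    /**
     * Returns {mainIters, drainIters, scalarIters} for `remaining` trips.
     * If mainGuardFallsIntoDrain is false, a failed main-loop guard jumps straight
     * to the scalar post loop (the problem described above); if true, it falls
     * into the vectorized drain loop's own guard instead (the proposed flow).
     */
    static int[] distribute(int remaining, int vl, int unroll, boolean mainGuardFallsIntoDrain) {
        int mainStride = vl * unroll;           // e.g. 8 * 4 = 32
        int main = 0;
        int drain = 0;
        boolean mainGuardPasses = remaining >= mainStride;
        if (mainGuardPasses) {
            main = remaining / mainStride;      // super-unrolled vector main loop
            remaining -= main * mainStride;
        }
        if (mainGuardPasses || mainGuardFallsIntoDrain) {
            drain = remaining / vl;             // drain loop runs while remaining >= vl
            remaining -= drain * vl;
        }
        return new int[] {main, drain, remaining}; // leftover trips run in the scalar post loop
    }
}
```

`distribute(25, 8, 4, false)` yields `{0, 0, 25}` (everything runs in the scalar post loop), while `distribute(25, 8, 4, true)` yields `{0, 3, 1}`, i.e. three drain-loop iterations covering 24 elements plus one scalar iteration.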
and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 I'm a novice in loop optimizations, and this is just an unfounded comments: I feel that this kind of graph surgery is hard to verify and it tends to be fragile at the presence of numerous optimizations happen concurrently. Another inconsistency I feel is that while you do normal unrolling, the post loop is already in place, when you do super unrolling, you have to pull out a vectorized drain loop from thin air. As a result, I think it would be more reliable to generate the pre-main-post1-post2 loop structure from the beginning, and eliminate each of them if they are unnecessary. This also helps the cases where we want the drain loop and the main loop to operate on vectors of different sizes, or to have a drain loop even if the main loop does not super unroll. For example, if the main loop operates on vectors of 64 bytes, then you will want to have a drain loop that operates on vectors of 8 bytes before going into scalar, even if the main loop does not super unroll. Please let me know if I misunderstand anything, thanks a lot. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3816067434 From qamai at openjdk.org Thu Jan 29 08:13:59 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 29 Jan 2026 08:13:59 GMT Subject: RFR: 8376187: [VectorAPI] Define new lane type constants and pass them to intrinsic entries In-Reply-To: References: Message-ID: On Thu, 29 Jan 2026 07:16:35 GMT, Jatin Bhateja wrote: > As per [discussions ](https://github.com/openjdk/jdk/pull/28002#issuecomment-3789507594) on JDK-8370691 pull request, splitting out portion of PR#28002 into a separate patch in preparation of Float16 vector API support. > > Patch add new lane type constants and pass them to vector intrinsic entry points. > > All existing Vector API jtreg test are passing with the patch. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin src/hotspot/share/prims/vectorSupport.hpp line 140: > 138: }; > 139: > 140: enum { Please use a scoped enum instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2740433980 From epeter at openjdk.org Thu Jan 29 08:29:33 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jan 2026 08:29:33 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v16] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 23:07:49 GMT, Paul Sandoz wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Jatins typo fix part 2 >> >> Co-authored-by: Jatin Bhateja >> - Jatins typo fix part 1 >> >> Co-authored-by: Jatin Bhateja > > test/micro/org/openjdk/bench/vm/compiler/VectorAlgorithmsImpl.java line 534: > >> 532: for (; i < SPECIES_I512.loopBound(a.length); i += SPECIES_I512.length()) { >> 533: IntVector v = IntVector.fromArray(SPECIES_I512, a, i); >> 534: v = v.add(v.rearrange(shf1), mask1); > > These are lane shifting operations, so another variant can use `compress` with the masks as input. Another could use `slice`, ideally the rearrange and slice variants would generate comparable code, or the compress and slice would. Ah good ideas! I'll note that down for a follow-up RFE! 
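The masked `add` of a lane-shifted copy in the quoted `scanAddI` kernel follows the log-step (Hillis-Steele) prefix-sum pattern. The plain-Java model below sketches that pattern. It assumes that `shf1`, `mask1` and their siblings shift lanes up by 1, 2, 4, ... and disable the lanes that have no source lane; that is an assumption about the benchmark, not a quote of it, and the scalar carry between chunks is likewise illustrative. `LaneShiftScanSketch`, `scanVector` and `scanAdd` are hypothetical names. The inner copy loop is the lane-shifting step that a `rearrange`, `slice` or `compress` variant would each express in vector form.

```java
import java.util.Arrays;

/** Plain-Java model of a log-step (Hillis-Steele) inclusive prefix sum. */
final class LaneShiftScanSketch {
    /** Prefix sum within one "vector" of lanes using shifts of 1, 2, 4, ... */
    static int[] scanVector(int[] lanes) {
        int[] v = lanes.clone();
        for (int shift = 1; shift < v.length; shift <<= 1) {
            int[] shifted = new int[v.length];   // lanes below `shift` have no source: stay 0
            for (int j = shift; j < v.length; j++) {
                shifted[j] = v[j - shift];       // lane-shifting step
            }
            for (int j = 0; j < v.length; j++) {
                v[j] += shifted[j];              // adding 0 in masked-off lanes == masked add
            }
        }
        return v;
    }

    /** Inclusive prefix sum of `a`: vector-width chunks plus a scalar tail loop. */
    static int[] scanAdd(int[] a, int width) {
        int[] r = new int[a.length];
        int carry = 0;                           // running total of all previous chunks
        int i = 0;
        for (; i + width <= a.length; i += width) {
            int[] v = scanVector(Arrays.copyOfRange(a, i, i + width));
            for (int j = 0; j < width; j++) {
                r[i + j] = v[j] + carry;
            }
            carry = r[i + width - 1];
        }
        for (; i < a.length; i++) {              // scalar tail
            carry += a[i];
            r[i] = carry;
        }
        return r;
    }
}
```

For example, `scanAdd(new int[] {1, 2, ..., 10}, 4)` processes two 4-lane chunks and a 2-element scalar tail and returns `{1, 3, 6, 10, 15, 21, 28, 36, 45, 55}`.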
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2740490494 From epeter at openjdk.org Thu Jan 29 08:45:02 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jan 2026 08:45:02 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v8] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 14:03:35 GMT, Otmar Ertl wrote: >> I'm adding Otmar Ertl as contributor because of the inspiration I took from his work on `hashCode`: >> https://www.dynatrace.com/news/blog/java-arrays-hashcode-byte-efficiency-techniques/ > >> @eme64 Contributor `Ormar Ertl ` successfully added. > > @eme64 There is a typo in my name, should be `Otmar Ertl` @oertl @XiaohongGong @PaulSandoz @iwanowww @jatin-bhateja Thank you all for your contributions/suggestions/reviews, much appreciated! I filed an umbrella task for follow-up RFEs: [JDK-8376655](https://bugs.openjdk.org/browse/JDK-8376655): [VectorAlgorithms] Umbrella: add more SuperWord and VectorAPI benchmarks and tests ------------- PR Comment: https://git.openjdk.org/jdk/pull/28639#issuecomment-3816260798 From epeter at openjdk.org Thu Jan 29 08:45:07 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jan 2026 08:45:07 GMT Subject: Integrated: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 15:11:33 GMT, Emanuel Peter wrote: > This is an exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. > > Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) > > **Discussion** > > Observations: > - If the loop can be auto vectorized, that is the fastest. 
If we cannot vectorize, we at least get reasonable scalar performance. > - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. > - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow > - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow > - `linux_x64_oci_server`: Vector API leads to really nice speedups > - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks > - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. > - Compact Object Headers has some negative effect on some loop benchmarks. > - `linux_aarch64_server`: `reduceAddI`, `copyI` > - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` > - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? > - `windows_x64_oci_server`: `reduceAddI` and some others a little bit > - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` > - Intrinsics can be much faster than auto vectorized or Vector API code. > - `linux_aarch64_server`: `copyI` > - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. > - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact, I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). > > **Benchmark Plots** > > Units: nanoseconds per algorithm invocation. > > Note: the `aarch64` machines all only have `NEON` support. Performance may be much better on `SVE`, I have not benchmarked that yet.
> > `linux_x64_oci` > algo_linux_x64_oci_server > > `windows_x64_oci` > algo_windows_x64_oci_server References: <8dPAwPZVqlsfdXKFWOB6MUgJN5wJQqWSYTH_WUnwOpw=.f1a1fe00-5f97-4f6a-a90b-7ddee2cb8b91@github.com> Message-ID: On Mon, 26 Jan 2026 14:39:11 GMT, Marc Chevalier wrote: > Repeat compilation happens here: > > https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/compiler/compileBroker.cpp#L2347-L2355 > > and in `C2Compiler::compile_method` which does > > https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/opto/c2compiler.cpp#L136-L147 > > In case of failure `_failure_reason` in `ci_env`/`env` is populated, and so the big loop in `compile_method` makes no iteration, and in particular, no compilation happens. Bailouts in `Compile::Compile` and `Compile::Optimize` check this `_failure_reason`, but also the `Compile` object's one. This is fine since the `Compile` object doesn't survive across compilations, and thus the field in it is fresh at each iteration in `compile_method`. > > A direct solution is to reset `ci_env._failure_reason` at each iteration of compilation repeat. We could also create a fresh `ci_env` like hinted in the JBS issue, but `ci_env` undergoes a bit of setup. It might be ok not to do this setup for the purpose of repeated compilation, but it's not clear to me. > > Since repeated compilations do not install code (and thus, cannot mark the task as successful), to get consistent values between `ci_env.failing()` and `task->is_success()`, I copy the original failing reason and restore it at the end. > > Using IGV output, we can see that we get as many re-compilations as requested, even when some are (artificially) bailing out. > > Thanks, > Marc Marked as reviewed by bmaillard (Committer).
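The failure mode and the fix can be modeled in a few lines of Java. Everything here is hypothetical: `Env` stands in for `ciEnv` and only tracks a failure reason, a set failure reason suppresses further attempts (mirroring the description of the loop in `compile_method` making no iteration), and the sketch assumes the real compilation runs before the repeats, which may not match the actual ordering in `CompileBroker`.

```java
import java.util.function.IntPredicate;

/** Hypothetical model of repeated compilations sharing one environment. */
final class RepeatCompilationSketch {
    /** Stand-in for ciEnv: only tracks a failure reason (null = not failing). */
    static final class Env {
        String failureReason;

        boolean failing() {
            return failureReason != null;
        }
    }

    /** One compilation attempt; returns 1 if it actually ran, 0 if it was skipped. */
    static int attempt(Env env, int k, IntPredicate bailsOut) {
        if (env.failing()) {
            return 0; // a stale failure reason makes the compile loop skip all work
        }
        if (bailsOut.test(k)) {
            env.failureReason = "bailout in attempt " + k;
        }
        return 1;
    }

    /**
     * Runs the real compilation (attempt 0) followed by `repeats` repeated ones and
     * returns how many attempts actually ran. With resetBetweenRepeats, the failure
     * reason is cleared before each repeat and the original one is restored at the
     * end, so failing() reflects only the real compilation.
     */
    static int compileWithRepeats(Env env, int repeats, IntPredicate bailsOut, boolean resetBetweenRepeats) {
        int ran = attempt(env, 0, bailsOut);
        String original = env.failureReason;
        for (int k = 1; k <= repeats; k++) {
            if (resetBetweenRepeats) {
                env.failureReason = null;
            }
            ran += attempt(env, k, bailsOut);
        }
        if (resetBetweenRepeats) {
            env.failureReason = original;
        }
        return ran;
    }
}
```

With three repeats where only the first repeat bails out, the non-resetting variant runs 2 attempts and leaves the environment failing, while the resetting variant runs all 4 attempts and restores the original, non-failing state.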
------------- PR Review: https://git.openjdk.org/jdk/pull/29419#pullrequestreview-3721711590 From mdoerr at openjdk.org Thu Jan 29 10:01:45 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 29 Jan 2026 10:01:45 GMT Subject: RFR: 8375640: MinMaxIdentity test fails on some machines after 8373134 In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 05:24:12 GMT, Galder Zamarreño wrote: > Adjust test expectations to deal with environments where `Min/Max[F|D]` are not produced. To fix this, I've followed the same patterns used in `compiler.c2.irTests.TestMinMaxIdentities`. > > I have tested this on x86_64 with `-XX:UseAVX=0`. Is anybody willing to provide a 2nd review? Maybe @offamitkumar? Otherwise, we could treat it as trivial. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29438#issuecomment-3816632059 From mchevalier at openjdk.org Thu Jan 29 10:07:14 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 29 Jan 2026 10:07:14 GMT Subject: RFR: 8373898: RepeatCompilation does not repeat compilation after bailout [v2] In-Reply-To: <8dPAwPZVqlsfdXKFWOB6MUgJN5wJQqWSYTH_WUnwOpw=.f1a1fe00-5f97-4f6a-a90b-7ddee2cb8b91@github.com> References: <8dPAwPZVqlsfdXKFWOB6MUgJN5wJQqWSYTH_WUnwOpw=.f1a1fe00-5f97-4f6a-a90b-7ddee2cb8b91@github.com> Message-ID: > Repeat compilation happens here: > > https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/compiler/compileBroker.cpp#L2347-L2355 > > and in `C2Compiler::compile_method` which does > > https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/opto/c2compiler.cpp#L136-L147 > > In case of failure `_failure_reason` in `ci_env`/`env` is populated, and so the big loop in `compile_method` makes no iteration, and in particular, no compilation happens. Bailouts in `Compile::Compile` and `Compile::Optimize` check this `_failure_reason`, but also the `Compile` object's one.
This is fine since the `Compile` object doesn't survive across compilations, and thus the field in it is fresh at each iteration in `compile_method`. > > A direct solution is to reset `ci_env._failure_reason` at each iteration of compilation repeat. We could also create a fresh `ci_env` like hinted in the JBS issue, but `ci_env` undergoes a bit of setup. It might be ok not to do this setup for the purpose of repeated compilation, but it's not clear to me. > > Since repeated compilations do not install code (and thus, cannot mark the task as successful), to get consistent values between `ci_env.failing()` and `task->is_success()`, I copy the original failing reason and restore it at the end. > > Using IGV output, we can see that we get as many re-compilations as requested, even when some are (artificially) bailing out. > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29419/files - new: https://git.openjdk.org/jdk/pull/29419/files/c8d4648a..b604d39b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29419&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29419&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29419.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29419/head:pull/29419 PR: https://git.openjdk.org/jdk/pull/29419 From bmaillard at openjdk.org Thu Jan 29 10:07:15 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 29 Jan 2026 10:07:15 GMT Subject: RFR: 8373898: RepeatCompilation does not repeat compilation after bailout [v2] In-Reply-To: References: <8dPAwPZVqlsfdXKFWOB6MUgJN5wJQqWSYTH_WUnwOpw=.f1a1fe00-5f97-4f6a-a90b-7ddee2cb8b91@github.com> Message-ID: On Thu, 29 Jan 2026 10:03:43 GMT, Marc Chevalier wrote: >> Repeat compilation happens here: >> >> 
https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/compiler/compileBroker.cpp#L2347-L2355 >> >> and in `C2Compiler::compile_method` which does >> >> https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/opto/c2compiler.cpp#L136-L147 >> >> In case of failure `_failure_reason` in `ci_env`/`env` is populated, and so the big loop in `compile_method` makes no iteration, and in particular, no compilation happens. Bailouts in `Compile::Compile` and `Compile::Optimize` checks this `_failure_reason`, but also the `Compile` object's one. This is fine since the `Compile` object doesn't survive across compilation, and thus the field in it is fresh at each iteration in `compile_method`. >> >> A direct solution is to reset `ci_env._failure_reason` at each iteration of compilation repeat. We could also create a fresh `ci_env` like hinted in the JBS issue, but `ci_env` undergoes a bit of setup. It might be ok not to do this setup for the purpose of repeated compilation, but it's not clear to me. >> >> Since repeated compilation do not install code (and thus, cannot mark the task as successful), to get consistent values between `ci_env.failing()` and `task->is_success()`, I copy the original failing reason and restore it at the end. >> >> Using IGV output, we can see that we get as many re-compilation as requested, even when some are (artificially) bailing out. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Copyright Marked as reviewed by bmaillard (Committer). 
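The reset-and-restore idea described in the 8373898 thread can be illustrated with a small standalone C++ model. This is only a sketch under stated assumptions: `Env`, `CompileFn`, and `compile_with_repeat` are hypothetical stand-ins, not HotSpot's `ciEnv`, `CompileBroker`, or `C2Compiler` APIs. The only behavior modeled is that a compilation attempt is skipped while a failure reason is recorded.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for ciEnv: only the sticky failure reason is modeled.
struct Env {
  std::string failure_reason;  // empty means "not failing"

  bool failing() const { return !failure_reason.empty(); }

  // Sticky bailout: the first recorded reason wins.
  void record_failure(const std::string& reason) {
    if (!failing()) {
      failure_reason = reason;
    }
  }
};

// Hypothetical compilation step; it may call env.record_failure() to bail out.
using CompileFn = std::function<void(Env&)>;

// Runs the real compilation once, then `repeat` extra compilations.
// Returns how many compilation attempts actually ran.
int compile_with_repeat(Env& env, const CompileFn& compile, int repeat) {
  int attempts = 0;
  // Mirrors "the loop makes no iteration" when the env is already failing.
  auto attempt = [&]() {
    if (!env.failing()) {
      ++attempts;
      compile(env);
    }
  };

  attempt();                                        // the real compilation
  const std::string original = env.failure_reason;  // remember its outcome

  for (int i = 0; i < repeat; ++i) {
    env.failure_reason.clear();  // without this, one bailout suppresses all repeats
    attempt();
  }

  env.failure_reason = original;  // restore so failing() reflects the real compilation
  return attempts;
}
```

If the `clear()` call is removed, a compilation that bails out once prevents every repeat from running, which matches the symptom reported in the bug. Restoring `original` at the end keeps `failing()` consistent with the outcome of the one compilation whose result actually counts.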
-------------

PR Review: https://git.openjdk.org/jdk/pull/29419#pullrequestreview-3721751877

From galder at openjdk.org Thu Jan 29 10:18:30 2026
From: galder at openjdk.org (Galder Zamarreño)
Date: Thu, 29 Jan 2026 10:18:30 GMT
Subject: RFR: 8375640: MinMaxIdentity test fails on some machines after 8373134
In-Reply-To:
References:
Message-ID:

On Tue, 27 Jan 2026 05:24:12 GMT, Galder Zamarreño wrote:

> Adjust test expectations to deal with environments where `Min/Max[F|D]` are not produced. To fix this, I've followed the same patterns used in `compiler.c2.irTests.TestMinMaxIdentities`.
>
> I have tested this on x86_64 with `-XX:UseAVX=0`.

Yeah, sure, happy with a second review. Skara marked it with 1 review; that's why I thought this was ready.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29438#issuecomment-3816708547

From mchevalier at openjdk.org Thu Jan 29 10:23:43 2026
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 29 Jan 2026 10:23:43 GMT
Subject: RFR: 8376325: [IR Framework] Detect and report overloads [v2]
In-Reply-To:
References:
Message-ID:

> The IR framework should not only forbid overloads between test methods, but also overloads of a test method, even if the other overloads are not test methods themselves. Indeed, the compiler directive file designates methods only by the class name and method name, without the parameters. Something like:
>
>     {
>         match : "ir_framework.tests.BadOverloadedMethod::sameName",
>         log : true,
>         PrintIdeal : true,
>     }
>
> This means that the same printing directive would apply to overloads, and would make the output confusing if these non-test methods are compiled. While test methods are necessarily compiled by the framework, the framework doesn't prevent other methods from being compiled (a normal output of the test VM shows a lot of compilations).
>
> One could emit compiler directives that take arguments into account, but it is not clear that this is useful. Also, there is a simpler solution: disallow overloading test methods altogether. This way, if we change our minds and need overloads later, we can still allow them without breaking existing tests. With this change, one can get the new error message:
>
>     - Cannot overload @Test methods, but method public void ir_framework.tests.BadOverloadedMethod.sameName(double) has 2 overloads:
>         - public void ir_framework.tests.BadOverloadedMethod.sameName(boolean)
>         - public void ir_framework.tests.BadOverloadedMethod.sameName()
>
> which should explain well enough what is happening. A small aesthetic problem is that if all three methods (in the previous example) are test methods, one gets an error for each of them. I considered it acceptable.
>
> This change needed adjusting some tests. I've also made them a bit more robust and easier to maintain by using a map instead, so I didn't have to sift through a hundred array indices.
>
> Let's also emphasize that this change doesn't mean that overloads are entirely forbidden: they are fine as long as they don't involve a test method.
>
> Tested on tier1,tier2,tier3,hs-precheckin-comp,hs-comp-stress.
>
> Thanks,
> Marc

Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:

  Copyright

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/29483/files
  - new: https://git.openjdk.org/jdk/pull/29483/files/6a19fc93..7d61034c

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=29483&range=01
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29483&range=00-01

  Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/29483.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29483/head:pull/29483

PR: https://git.openjdk.org/jdk/pull/29483

From mhaessig at openjdk.org Thu Jan 29 10:30:10 2026
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 29 Jan 2026 10:30:10 GMT
Subject: RFR: 8375640: MinMaxIdentity test fails on some machines after 8373134
In-Reply-To:
References:
Message-ID:

On Tue, 27 Jan 2026 05:24:12 GMT, Galder Zamarreño wrote:

> Adjust test expectations to deal with environments where `Min/Max[F|D]` are not produced. To fix this, I've followed the same patterns used in `compiler.c2.irTests.TestMinMaxIdentities`.
>
> I have tested this on x86_64 with `-XX:UseAVX=0`.

The changes look good; I'll just quickly run some sanity testing on our side.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29438#issuecomment-3816781216

From aph at openjdk.org Thu Jan 29 10:32:29 2026
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 29 Jan 2026 10:32:29 GMT
Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 28 Jan 2026 10:02:53 GMT, Andrew Haley wrote:

>> The general way code flows right now, though not often, is from jdk/master to panama-vector/vectorIntrinsics, since most of the development work is in the mainline (exceptions to that are the float16 and Valhalla alignment work, which are large efforts).
>>
>> I am very reluctant to include all the auto-generated micro benchmarks in mainline. There is a huge number of them, and I am not certain they provide as much value as they did, now that we have the IR test framework. In many cases, given the simplicity of what they measure, they were designed to ensure C2 generates the right instructions. The IR test framework is better at determining that by testing that the right IR nodes are generated, and those tests run as part of the existing HotSpot test suite.
>>
>> The IR test framework is of course no substitute, in general, for performance tests. A better focus for Vector API performance tests is, I think, Emanuel's work [here](https://github.com/openjdk/jdk/pull/28639/) and use cases/algorithms that can be implemented concisely.
>
>> The IR test framework is better at determining that by testing that the right IR nodes are generated, and those tests run as part of the existing HotSpot test suite.
>
> But as a reviewer I'm not looking at the IR at all, but at the performance.

> Hi @theRealAph @PaulSandoz, thanks for your insight! How to synchronize the JMH micro benchmarks between Panama and the mainline may be a more general issue that requires further investigation, design, and resources. As for how to move this PR forward, my idea is to write a new micro benchmark in this PR to demonstrate the optimization effect of this patch. Would that be acceptable to you?

Sure.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3816786645

From mhaessig at openjdk.org Thu Jan 29 10:33:34 2026
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 29 Jan 2026 10:33:34 GMT
Subject: RFR: 8375640: MinMaxIdentity test fails on some machines after 8373134
In-Reply-To:
References:
Message-ID: <2iurEAXQt6htWV8rnlpzU7abIBklEO9e_s3VkeAvUXI=.fed2f415-4672-42b3-94bb-8d1d02f57b7b@github.com>

On Tue, 27 Jan 2026 05:24:12 GMT, Galder Zamarreño wrote:

> Adjust test expectations to deal with environments where `Min/Max[F|D]` are not produced. To fix this, I've followed the same patterns used in `compiler.c2.irTests.TestMinMaxIdentities`.
>
> I have tested this on x86_64 with `-XX:UseAVX=0`.

test/hotspot/jtreg/compiler/igvn/TestMinMaxIdentity.java line 2:

> 1: /*
> 2:  * Copyright (c) 2025 IBM Corporation. All rights reserved.

Not sure if you need to update the copyright year over at IBM.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29438#discussion_r2740970899

From qamai at openjdk.org Thu Jan 29 11:02:51 2026
From: qamai at openjdk.org (Quan Anh Mai)
Date: Thu, 29 Jan 2026 11:02:51 GMT
Subject: RFR: 8376604: C2: EA should assert is_oop_field for AddP with oop outs
In-Reply-To:
References:
Message-ID: <6PipsDnFG7dhTYX6PZp3UbQycxNV1lLKhXQbgUMYqqY=.cb844298-730c-4711-8064-52709083e1ba@github.com>

On Wed, 28 Jan 2026 18:30:15 GMT, Aleksey Shipilev wrote:

> Split out of [JDK-8376514](https://bugs.openjdk.org/browse/JDK-8376514) in Valhalla: I think we need to verify more thoroughly that if we report `is_oop_field = false` for AddP, then there are no nodes that we feed into oops. We handle it pretty well in various branches in the method already, and we "just" need to check it at the end. Valhalla catches fire on that post-condition check, tracked in [JDK-8376514](https://bugs.openjdk.org/browse/JDK-8376514).
>
> Cleans up the code a bit as well.
>
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `applications/ctw/modules`
>  - [x] Linux x86_64 server fastdebug, `hotspot_compiler`

LGTM

-------------

Marked as reviewed by qamai (Committer).

PR Review: https://git.openjdk.org/jdk/pull/29468#pullrequestreview-3722019313

From qamai at openjdk.org Thu Jan 29 11:22:29 2026
From: qamai at openjdk.org (Quan Anh Mai)
Date: Thu, 29 Jan 2026 11:22:29 GMT
Subject: RFR: 8376187: [VectorAPI] Define new lane type constants and pass them to intrinsic entries [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 29 Jan 2026 09:43:30 GMT, Jatin Bhateja wrote:

>> As per [discussions](https://github.com/openjdk/jdk/pull/28002#issuecomment-3789507594) on the JDK-8370691 pull request, this splits out a portion of PR #28002 into a separate patch in preparation for Float16 Vector API support.
>>
>> The patch adds new lane type constants and passes them to vector intrinsic entry points.
>>
>> All existing Vector API jtreg tests pass with the patch.
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review comments resolution

src/hotspot/share/prims/vectorSupport.cpp line 202:

> 200: }
> 201:
> 202: int VectorSupport::vop2ideal(jint id, BasicType bt) {

Previously, this method accepted a `BasicType`; now it accepts an untyped `int`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2741153639

From qamai at openjdk.org Thu Jan 29 11:22:31 2026
From: qamai at openjdk.org (Quan Anh Mai)
Date: Thu, 29 Jan 2026 11:22:31 GMT
Subject: RFR: 8376187: [VectorAPI] Define new lane type constants and pass them to intrinsic entries [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 29 Jan 2026 09:07:54 GMT, Quan Anh Mai wrote:

>> It's contained in the VectorSupport class, which makes it implicitly scoped for external uses without being a named (scoped) enum.
>
> I mean an `enum class`. With this, we just pass `int` around, which is not recommended.

I don't see this being resolved. My suggestion is to use a scoped enum so that we have a strongly typed value, instead of using an unscoped enum and passing `int` all over the place.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2741147870

From rrich at openjdk.org Thu Jan 29 11:23:41 2026
From: rrich at openjdk.org (Richard Reingruber)
Date: Thu, 29 Jan 2026 11:23:41 GMT
Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v2]
In-Reply-To:
References: <2EsMN-Ho8F7NzytDTPvYh4cbewkLcb3YQq5oTbuFcfc=.32a85ea0-e332-44cc-9f10-04d88138eb82@github.com> <-TysMtkU_jZckaZB-X_nEFxM2t8C31OCRyz9WvbtiZ4=.80aaea68-bfb5-435e-8542-b50e013c48cb@github.com>
Message-ID:

On Thu, 29 Jan 2026 09:46:49 GMT, Martin Doerr wrote:

>>> Please note that C2 generates code which is equivalent to your C version:
>>
>> Could you please provide the Java source code for that?
>>
>>> Btw. what the Spec suggests also seems to use the wrong order:
>>
>> Where can that be found?
>
>> > Please note that C2 generates code which is equivalent to your C version:
>>
>> Could you please provide the Java source code for that?
>
>
> class test {
>     static double cmovf_eq(double op1, double op2, double src1, double src2) {
>         return op1 == op2 ? src1 : src2;
>     }
>
>     static void main(String[] args) {
>         double result = 0.0;
>         for (int i = 0; i < 100_000; ++i) {
>             result += cmovf_eq((double) (i / 2), (double) ((i + 1) / 2), 1.0, 2.0);
>         }
>         System.out.println("result = " + result);
>     }
> }
>
>
>> > Btw. what the Spec suggests also seems to use the wrong order:
>>
>> Where can that be found?
>
> PowerISA™ Version 3.1C

Thanks, I was looking at 3.1B, which doesn't have the programming note.

Thanks also for the example. The trick to getting the cmove is that the condition mustn't be true or false all the time.

We should also have OptoAssembly that shows the condition and the registers.
For testing I've added

    format %{ "cmovD_cmpD $dst, $op1 $cop $op2, $src1, $src2\n\t" %}

My test method

    static double dontinline_cmovf(double op1, double op2, double src1, double src2) {
        return op1 == op2 ? src1 : src2;
    }

produces this OptoAssembly

    ============================= C2-compiled nmethod ==============================
    #r090 F0:F0 : parm 0: double <- note: register names are incorrect
    #r088 F0:F0 : parm 2: double
    #r086 F0:F0 : parm 4: double
    #r084 F0:F0 : parm 6: double
    #r283 R1+76: old out preserve
    #r282 R1+72: old out preserve
    // ...
    028 cmovD_cmpD F1, F1 ne F2, F3, F4

testing...

    res = dontinline_cmovf(1d, 2d, 3d, 4d);
    System.out.println(op1 + " == " + op2 + " ? " + src1 + " : " + src2);
    System.out.println("-> " + res);

    1.0 == 2.0 ? 3.0 : 4.0
    -> 4.0

Very weird: the condition `==` is flipped to `ne`, but `src1` and `src2` are not swapped, since 4.0 is in `F4`, which is `src2` according to the OptoAssembly above.

I would have expected

    028 cmovD_cmpD F1, F1 ne F2, F4, F3

Is there an explanation?
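The observed result fits the CMove convention Martin Doerr describes in his reply in this thread: the operand matched as `dst` is kept when the condition is false, and `src` is taken when it is true, so the rule effectively computes "dst = (op1 cmp(cc) op2) ? src2 : src1". Under that assumption, negating `==` into `ne` while leaving `3.0` in the first slot still produces the Java result. The following standalone C++ model checks this. The helpers `cmove` and `lowered_eq_select` are hypothetical and are not C2 code. The model also covers NaN, where `!(a == b)` and `a != b` agree.

```cpp
#include <cassert>
#include <cmath>

// Model of the CMove convention (assumption based on the thread):
// the value matched as `dst` survives when the condition is false,
// and the value matched as `src` is selected when the condition is true.
double cmove(bool cond, double dst_if_false, double src_if_true) {
  return cond ? src_if_true : dst_if_false;
}

// Java: `op1 == op2 ? a : b`, lowered with the condition negated to `ne`.
// The `==` result `a` becomes the false-case input and `b` the true-case
// input, so no operand swap is needed once the condition is flipped.
double lowered_eq_select(double op1, double op2, double a, double b) {
  return cmove(op1 != op2, a, b);
}
```

With the inputs from the test above, `lowered_eq_select(1.0, 2.0, 3.0, 4.0)` selects `4.0`, which matches the `-> 4.0` that was printed.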
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29281#discussion_r2741160233

From chagedorn at openjdk.org Thu Jan 29 11:29:02 2026
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 29 Jan 2026 11:29:02 GMT
Subject: RFR: 8373898: RepeatCompilation does not repeat compilation after bailout [v2]
In-Reply-To:
References: <8dPAwPZVqlsfdXKFWOB6MUgJN5wJQqWSYTH_WUnwOpw=.f1a1fe00-5f97-4f6a-a90b-7ddee2cb8b91@github.com>
Message-ID:

On Thu, 29 Jan 2026 10:07:14 GMT, Marc Chevalier wrote:

>> Repeat compilation happens here:
>>
>> https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/compiler/compileBroker.cpp#L2347-L2355
>>
>> and in `C2Compiler::compile_method` which does
>>
>> https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/opto/c2compiler.cpp#L136-L147
>>
>> In case of failure, `_failure_reason` in `ci_env`/`env` is populated, and so the big loop in `compile_method` makes no iteration, and in particular, no compilation happens. Bailouts in `Compile::Compile` and `Compile::Optimize` check this `_failure_reason`, but also the `Compile` object's own. This is fine since the `Compile` object doesn't survive across compilations, and thus the field in it is fresh at each iteration in `compile_method`.
>>
>> A direct solution is to reset `ci_env._failure_reason` at each iteration of the compilation repeat. We could also create a fresh `ci_env` as hinted in the JBS issue, but `ci_env` undergoes a bit of setup. It might be OK not to do this setup for the purpose of repeated compilation, but it's not clear to me.
>>
>> Since repeated compilations do not install code (and thus cannot mark the task as successful), to get consistent values between `ci_env.failing()` and `task->is_success()`, I copy the original failure reason and restore it at the end.
>>
>> Using IGV output, we can see that we get as many recompilations as requested, even when some are (artificially) bailing out.
>>
>> Thanks,
>> Marc
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>
>   Copyright

Marked as reviewed by chagedorn (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/29419#pullrequestreview-3722128349

From mchevalier at openjdk.org Thu Jan 29 11:34:22 2026
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 29 Jan 2026 11:34:22 GMT
Subject: RFR: 8373898: RepeatCompilation does not repeat compilation after bailout [v2]
In-Reply-To:
References: <8dPAwPZVqlsfdXKFWOB6MUgJN5wJQqWSYTH_WUnwOpw=.f1a1fe00-5f97-4f6a-a90b-7ddee2cb8b91@github.com>
Message-ID: <8PDOZpcMiivpJ94SXhPnl0qf5eZsPHua9wzmUa6jeAM=.190d5356-fe5c-4159-a6fd-6b3d3c7876f7@github.com>

On Thu, 29 Jan 2026 10:07:14 GMT, Marc Chevalier wrote:

>> Repeat compilation happens here:
>>
>> https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/compiler/compileBroker.cpp#L2347-L2355
>>
>> and in `C2Compiler::compile_method` which does
>>
>> https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/opto/c2compiler.cpp#L136-L147
>>
>> In case of failure, `_failure_reason` in `ci_env`/`env` is populated, and so the big loop in `compile_method` makes no iteration, and in particular, no compilation happens. Bailouts in `Compile::Compile` and `Compile::Optimize` check this `_failure_reason`, but also the `Compile` object's own. This is fine since the `Compile` object doesn't survive across compilations, and thus the field in it is fresh at each iteration in `compile_method`.
>>
>> A direct solution is to reset `ci_env._failure_reason` at each iteration of the compilation repeat. We could also create a fresh `ci_env` as hinted in the JBS issue, but `ci_env` undergoes a bit of setup. It might be OK not to do this setup for the purpose of repeated compilation, but it's not clear to me.
>>
>> Since repeated compilations do not install code (and thus cannot mark the task as successful), to get consistent values between `ci_env.failing()` and `task->is_success()`, I copy the original failure reason and restore it at the end.
>>
>> Using IGV output, we can see that we get as many recompilations as requested, even when some are (artificially) bailing out.
>>
>> Thanks,
>> Marc
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>
>   Copyright

Thanks for the review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29419#issuecomment-3817072590

From mchevalier at openjdk.org Thu Jan 29 11:34:23 2026
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 29 Jan 2026 11:34:23 GMT
Subject: Integrated: 8373898: RepeatCompilation does not repeat compilation after bailout
In-Reply-To: <8dPAwPZVqlsfdXKFWOB6MUgJN5wJQqWSYTH_WUnwOpw=.f1a1fe00-5f97-4f6a-a90b-7ddee2cb8b91@github.com>
References: <8dPAwPZVqlsfdXKFWOB6MUgJN5wJQqWSYTH_WUnwOpw=.f1a1fe00-5f97-4f6a-a90b-7ddee2cb8b91@github.com>
Message-ID:

On Mon, 26 Jan 2026 14:39:11 GMT, Marc Chevalier wrote:

> Repeat compilation happens here:
>
> https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/compiler/compileBroker.cpp#L2347-L2355
>
> and in `C2Compiler::compile_method` which does
>
> https://github.com/openjdk/jdk/blob/b59f49a1c3e370f794291a1f948e67d2651ece11/src/hotspot/share/opto/c2compiler.cpp#L136-L147
>
> In case of failure, `_failure_reason` in `ci_env`/`env` is populated, and so the big loop in `compile_method` makes no iteration, and in particular, no compilation happens. Bailouts in `Compile::Compile` and `Compile::Optimize` check this `_failure_reason`, but also the `Compile` object's own. This is fine since the `Compile` object doesn't survive across compilations, and thus the field in it is fresh at each iteration in `compile_method`.
>
> A direct solution is to reset `ci_env._failure_reason` at each iteration of the compilation repeat. We could also create a fresh `ci_env` as hinted in the JBS issue, but `ci_env` undergoes a bit of setup. It might be OK not to do this setup for the purpose of repeated compilation, but it's not clear to me.
>
> Since repeated compilations do not install code (and thus cannot mark the task as successful), to get consistent values between `ci_env.failing()` and `task->is_success()`, I copy the original failure reason and restore it at the end.
>
> Using IGV output, we can see that we get as many recompilations as requested, even when some are (artificially) bailing out.
>
> Thanks,
> Marc

This pull request has now been integrated.

Changeset: f96974db
Author:    Marc Chevalier
URL:       https://git.openjdk.org/jdk/commit/f96974dbbd824db8d7b2bbf28f5d3b49bb005fb3
Stats:     13 lines in 1 file changed: 6 ins; 0 del; 7 mod

8373898: RepeatCompilation does not repeat compilation after bailout

Reviewed-by: chagedorn, bmaillard

-------------

PR: https://git.openjdk.org/jdk/pull/29419

From jbhateja at openjdk.org Thu Jan 29 11:39:35 2026
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 29 Jan 2026 11:39:35 GMT
Subject: RFR: 8376187: [VectorAPI] Define new lane type constants and pass them to intrinsic entries [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 29 Jan 2026 11:19:18 GMT, Quan Anh Mai wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Review comments resolution
>
> src/hotspot/share/prims/vectorSupport.cpp line 202:
>
>> 200: }
>> 201:
>> 202: int VectorSupport::vop2ideal(jint id, BasicType bt) {
>
> Previously, this method accepted a `BasicType`; now it accepts an untyped `int`.
Correct, we are passing an integer laneType from the Java side to intrinsic entry points.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2741207425

From qamai at openjdk.org Thu Jan 29 11:43:35 2026
From: qamai at openjdk.org (Quan Anh Mai)
Date: Thu, 29 Jan 2026 11:43:35 GMT
Subject: RFR: 8373633: C2: Use interface receiver type to improve CHA decisions [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 27 Jan 2026 22:18:33 GMT, Vladimir Ivanov wrote:

>> Strength-reducing an interface call to a virtual call for interfaces with
>> unique implementors can use receiver type information to narrow the context.
>>
>> C2 tracks interface types, and receiver type information can be used to reveal
>> an interface with a unique implementor which can't be derived from the call
>> site itself.
>>
>> Since C2 effectively accumulates a union interface type from multiple subtype checks, iterating over individual components of a type may reveal a candidate for strength reduction. The only prerequisite is that a candidate has to be a subtype of the declared interface.
>>
>> Testing: hs-tier1 - hs-tier5
>
> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
>  - addtional case
>  - Merge branch 'master' into cha.intf.recv
>  - Use receiver type to improve CHA decisions

src/hotspot/share/opto/type.cpp line 4593:

> 4591: // For an interface instance reports one of most specific superinterfaces with a unique implementor.
> 4592: ciInstanceKlass* TypeInstPtr::has_unique_implementor(ciInstanceKlass* context_intf) const {
> 4593:   if (is_interface() && context_intf->is_interface()) {

I feel like you are making it unnecessarily complicated. `_interfaces` is the set of interfaces this `TypeInstPtr` must satisfy. As a result, a check like this would be sufficient (not real code):

    for (ciInstanceKlass* intf : _interfaces) {
      ciInstanceKlass* candidate = intf->unique_implementor();
      if (candidate != nullptr) {
        return candidate;
      }
    }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28811#discussion_r2741217681

From qamai at openjdk.org Thu Jan 29 11:43:36 2026
From: qamai at openjdk.org (Quan Anh Mai)
Date: Thu, 29 Jan 2026 11:43:36 GMT
Subject: RFR: 8373633: C2: Use interface receiver type to improve CHA decisions [v2]
In-Reply-To:
References:
Message-ID: <6Ya3MBac7hXnBWBhEa1cqbdtW33T7C9phyqq-WsvhFo=.f5afd90e-e104-420a-9536-a8f54a38dd66@github.com>

On Thu, 29 Jan 2026 11:37:55 GMT, Quan Anh Mai wrote:

>> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>>  - addtional case
>>  - Merge branch 'master' into cha.intf.recv
>>  - Use receiver type to improve CHA decisions
>
> src/hotspot/share/opto/type.cpp line 4593:
>
>> 4591: // For an interface instance reports one of most specific superinterfaces with a unique implementor.
>> 4592: ciInstanceKlass* TypeInstPtr::has_unique_implementor(ciInstanceKlass* context_intf) const {
>> 4593:   if (is_interface() && context_intf->is_interface()) {
>
> I feel like you are making it unnecessarily complicated. `_interfaces` is the set of interfaces this `TypeInstPtr` must satisfy. As a result, a check like this would be sufficient (not real code):
>
>     for (ciInstanceKlass* intf : _interfaces) {
>       ciInstanceKlass* candidate = intf->unique_implementor();
>       if (candidate != nullptr) {
>         return candidate;
>       }
>     }

Even better, since these `_interfaces` are trusted, we don't need to emit a runtime check for the type.
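The search proposed above can be modeled outside HotSpot in a few lines of C++. The model also applies the prerequisite from the PR description, namely that a candidate must be a subtype of the declared interface. This is a hypothetical sketch: `Hierarchy`, `find_unique_implementor`, and the string-keyed maps stand in for `ciInstanceKlass` and the real class hierarchy, and the transitive superinterface sets are assumed to be precomputed.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of a class hierarchy (not ciInstanceKlass).
struct Hierarchy {
  // interface -> concrete classes implementing it (directly or via a subinterface)
  std::map<std::string, std::set<std::string>> implementors;
  // interface -> all of its superinterfaces (assumed precomputed and transitive)
  std::map<std::string, std::set<std::string>> supers;

  bool is_subtype(const std::string& sub, const std::string& super) const {
    if (sub == super) {
      return true;
    }
    auto it = supers.find(sub);
    return it != supers.end() && it->second.count(super) > 0;
  }

  std::optional<std::string> unique_implementor(const std::string& intf) const {
    auto it = implementors.find(intf);
    if (it != implementors.end() && it->second.size() == 1) {
      return *it->second.begin();
    }
    return std::nullopt;
  }
};

// Scans the interfaces known for the receiver type. The first one that is a
// subtype of the call site's declared interface and has exactly one
// implementor yields the strength-reduction target.
std::optional<std::string> find_unique_implementor(
    const Hierarchy& h,
    const std::vector<std::string>& receiver_interfaces,
    const std::string& declared) {
  for (const std::string& intf : receiver_interfaces) {
    if (!h.is_subtype(intf, declared)) {
      continue;  // prerequisite from the PR: must be a subtype of the declared interface
    }
    if (std::optional<std::string> impl = h.unique_implementor(intf)) {
      return impl;
    }
  }
  return std::nullopt;
}
```

When several receiver interfaces qualify, this sketch returns the first one in iteration order. It makes no attempt to pick the most specific one, which the comment on `has_unique_implementor` mentions.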
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28811#discussion_r2741224374

From qamai at openjdk.org Thu Jan 29 11:45:33 2026
From: qamai at openjdk.org (Quan Anh Mai)
Date: Thu, 29 Jan 2026 11:45:33 GMT
Subject: RFR: 8376187: [VectorAPI] Define new lane type constants and pass them to intrinsic entries [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 29 Jan 2026 11:34:51 GMT, Jatin Bhateja wrote:

>> src/hotspot/share/prims/vectorSupport.cpp line 202:
>>
>>> 200: }
>>> 201:
>>> 202: int VectorSupport::vop2ideal(jint id, BasicType bt) {
>>
>> Previously, this method accepted a `BasicType`; now it accepts an untyped `int`.
>
> Correct, we are passing an integer laneType from the Java side to intrinsic entry points.

Please use a separate named type instead of `int`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2741235103

From jbhateja at openjdk.org Thu Jan 29 11:50:35 2026
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 29 Jan 2026 11:50:35 GMT
Subject: RFR: 8376187: [VectorAPI] Define new lane type constants and pass them to intrinsic entries [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 29 Jan 2026 11:43:06 GMT, Quan Anh Mai wrote:

>> Correct, we are passing an integer laneType from the Java side to intrinsic entry points.
>
> Please use a separate named type instead of `int`.

It is indeed an integral value, which is passed from the Java side and cast to BasicType.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2741250148

From chagedorn at openjdk.org Thu Jan 29 11:54:33 2026
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 29 Jan 2026 11:54:33 GMT
Subject: RFR: 8376325: [IR Framework] Detect and report overloads [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 29 Jan 2026 10:23:43 GMT, Marc Chevalier wrote:

>> The IR framework should not only forbid overloads between test methods, but also overloads of a test method, even if the other overloads are not test methods themselves. Indeed, the compiler directive file designates methods only by the class name and method name, without the parameters. Something like:
>>
>>     {
>>         match : "ir_framework.tests.BadOverloadedMethod::sameName",
>>         log : true,
>>         PrintIdeal : true,
>>     }
>>
>> This means that the same printing directive would apply to overloads, and would make the output confusing if these non-test methods are compiled. While test methods are necessarily compiled by the framework, the framework doesn't prevent other methods from being compiled (a normal output of the test VM shows a lot of compilations).
>>
>> One could emit compiler directives that take arguments into account, but it is not clear that this is useful. Also, there is a simpler solution: disallow overloading test methods altogether. This way, if we change our minds and need overloads later, we can still allow them without breaking existing tests. With this change, one can get the new error message:
>>
>>     - Cannot overload @Test methods, but method public void ir_framework.tests.BadOverloadedMethod.sameName(double) has 2 overloads:
>>         - public void ir_framework.tests.BadOverloadedMethod.sameName(boolean)
>>         - public void ir_framework.tests.BadOverloadedMethod.sameName()
>>
>> which should explain well enough what is happening. A small aesthetic problem is that if all three methods (in the previous example) are test methods, one gets an error for each of them. I considered it acceptable.
>>
>> This change needed adjusting some tests. I've also made them a bit more robust and easier to maintain by using a map instead, so I didn't have to sift through a hundred array indices.
>>
>> Let's also emphasize that this change doesn't mean that overloads are entirely forbidden: they are fine as long as they don't involve a test method.
>>
>> Tested on tier1,tier2,tier3,hs-precheckin-comp,hs-comp-stress.
>>
>> Thanks,
>> Marc
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>
>   Copyright

Nice catch, and thanks for cleaning the tests up! Looks good to me.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/29483#pullrequestreview-3722235070

From qamai at openjdk.org Thu Jan 29 11:56:51 2026
From: qamai at openjdk.org (Quan Anh Mai)
Date: Thu, 29 Jan 2026 11:56:51 GMT
Subject: RFR: 8376187: [VectorAPI] Define new lane type constants and pass them to intrinsic entries [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 29 Jan 2026 11:47:33 GMT, Jatin Bhateja wrote:

>> Please use a separate named type instead of `int`.
>
> It is indeed an integral value, which is passed from the Java side and cast to BasicType.

So please cast it in the intrinsic functions in `vectorIntrinsics.cpp` and pass the `LaneType` into this function.

    static bool is_primitive_lane_type(int laneType) {
      return laneType >= VectorSupport::LT_FLOAT && laneType <= VectorSupport::LT_LONG;
    }

This function could return a `LaneType` for you. Also, when do we pass in something that is not a valid value? Should this be an `assert`?
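The suggestion above can be sketched in standalone C++: convert the raw integer once, at the intrinsic boundary, into a scoped enum, and pass that enum around afterwards. Everything here is hypothetical. The enumerator names and numeric values are illustrative and were chosen only so that the primitive lane types form one contiguous range from `FLOAT` to `LONG`, as `is_primitive_lane_type` assumes. `to_lane_type` and `is_floating_point` are not HotSpot functions.

```cpp
#include <cassert>
#include <optional>

// Hypothetical scoped enum: names and values are illustrative only.
enum class LaneType : int { FLOAT = 0, DOUBLE, BYTE, SHORT, INT, LONG };

// Convert the raw integer from the Java side exactly once. An out-of-range
// value yields nullopt; HotSpot code might assert here instead.
std::optional<LaneType> to_lane_type(int raw) {
  if (raw >= static_cast<int>(LaneType::FLOAT) &&
      raw <= static_cast<int>(LaneType::LONG)) {
    return static_cast<LaneType>(raw);
  }
  return std::nullopt;
}

// Downstream helpers take the strong type, so passing a plain int no longer
// compiles without an explicit conversion.
bool is_floating_point(LaneType lt) {
  return lt == LaneType::FLOAT || lt == LaneType::DOUBLE;
}
```

Keeping the range check in a single conversion function also leaves one obvious place to turn the invalid-value case into an assertion, which is the question raised at the end of the comment.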
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2741273046 From mdoerr at openjdk.org Thu Jan 29 11:57:49 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 29 Jan 2026 11:57:49 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v2] In-Reply-To: References: <2EsMN-Ho8F7NzytDTPvYh4cbewkLcb3YQq5oTbuFcfc=.32a85ea0-e332-44cc-9f10-04d88138eb82@github.com> <-TysMtkU_jZckaZB-X_nEFxM2t8C31OCRyz9WvbtiZ4=.80aaea68-bfb5-435e-8542-b50e013c48cb@github.com> Message-ID: On Thu, 29 Jan 2026 11:21:04 GMT, Richard Reingruber wrote: >>> > Please note that C2 generates code which is equivalent to your C version: >>> >>> Could you please provide the Java source code for that? >> >> >> class test { >> static double cmovf_eq(double op1, double op2, double src1, double src2) { >> return op1 == op2 ? src1 : src2; >> } >> >> static void main(String[] args) { >> double result = 0.0; >> for (int i = 0; i < 100_000; ++i) { >> result += cmovf_eq((double) (i / 2), (double) ((i + 1) / 2), 1.0, 2.0); >> } >> System.out.println("result = " + result); >> } >> } >> >> >>> > Btw. what the Spec suggests also seems to use the wrong order: >>> >>> Where can that be found? >> >> PowerISA? Version3.1C > > Thanks, I was looking at 3.1B which doesn't have the programming note. > > Thanks also for the example. The trick to get the cmove is that the condition mustn't be true or false all the time. > > We should also have OptoAssembly that shows the condition and the registers. > For testing I've added > > format %{ "cmovD_cmpD $dst, $op1 $cop $op2, $src1, $src2\n\t" %} > > My testmethod > > static double dontinline_cmovf(double op1, double op2, double src1, double src2) { > return op1 == op2 ? 
src1 : src2; > } > > produces this OptoAssembly > > ============================= C2-compiled nmethod ============================== > #r090 F0:F0 : parm 0: double <- note: register names are incorrect > #r088 F0:F0 : parm 2: double > #r086 F0:F0 : parm 4: double > #r084 F0:F0 : parm 6: double > #r283 R1+76: old out preserve > #r282 R1+72: old out preserve > // ... > 028 cmovD_cmpD F1, F1 ne F2, F3, F4 > > testing... > > res = dontinline_cmovf(1d, 2d, 3d, 4d); > System.out.println(op1 + " == " + op2 + " ? " + src1 + " : " + src2); > System.out.println("-> " + res); > > 1.0 == 2.0 ? 3.0 : 4.0 > -> 4.0 > > Very weird: the condition `==` is flipped to `ne` but `src1` and `src2` are not swapped since 4.0 is in `F4` which is `src2` according to the OptoAssembly above. > > I would have expected > > > 028 cmovD_cmpD F1, F1 ne F2, F4, F3 > > Is there an explanation? The condition is adjusted for `CMove` with match rules like `match(Set dst (CMoveD (Binary cop cr) (Binary dst src)));`. If the condition is `false`, the value `dst` is kept. If the condition is `true`, the value `src` is used. So, the comment for PPC64 should actually say "dst = (op1 cmp(cc) op2) ? src2 : src1;". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29281#discussion_r2741276653 From epeter at openjdk.org Thu Jan 29 12:12:24 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jan 2026 12:12:24 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v12] In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: On Wed, 28 Jan 2026 17:05:55 GMT, Benoît Maillard wrote: >> This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations.
Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and identifying similar or related failures during bug triaging. >> >> In summary, this PR brings the following changes: >> - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. >> - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. >> - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. >> >> ### Example outputs >> #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) >> Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >>
>> Before >> >> >> Missed Ideal optimization (can_reshape=false): >> The node was replaced by Ideal. >> Old node: >> dist dump >> --------------------------------------------- >> 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 >> 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) >> 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) >> The result after Ideal: >> dist dump >> --------------------------------------------- >> 1 337 ConL === 0 [[ 338 ]] #long:-9 >> 1 336 URShiftL === _ 298 22 [[ 338 ]] >> 0 338 AndL === _ 336 337 [[ ]] >> >> >> Missed Ideal optimization (can_reshape=true): >> The node was replaced by Ideal. >> Old node: >> dist dump >> --------------------------------------------- >> 1 22 ConI === 0 [[ 70 81 70 290 81 76... > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Change condition structure Updates look good :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28295#pullrequestreview-3722305047 From qamai at openjdk.org Thu Jan 29 12:25:25 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 29 Jan 2026 12:25:25 GMT Subject: RFR: 8373633: C2: Use interface receiver type to improve CHA decisions [v2] In-Reply-To: <6Ya3MBac7hXnBWBhEa1cqbdtW33T7C9phyqq-WsvhFo=.f5afd90e-e104-420a-9536-a8f54a38dd66@github.com> References: <6Ya3MBac7hXnBWBhEa1cqbdtW33T7C9phyqq-WsvhFo=.f5afd90e-e104-420a-9536-a8f54a38dd66@github.com> Message-ID: <5HKr1M1uRVSk4ZINg8AgVegHEe5uFq7GHL6CtGCpFGs=.0169d67f-2dfd-4eee-b897-3f9115d30b54@github.com> On Thu, 29 Jan 2026 11:39:56 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/type.cpp line 4593: >> >>> 4591: // For an interface instance reports one of most specific superinterfaces with a unique implementor.
>>> 4592: ciInstanceKlass* TypeInstPtr::has_unique_implementor(ciInstanceKlass* context_intf) const { >>> 4593: if (is_interface() && context_intf->is_interface()) { >> >> I feel like you are making it unnecessarily complicated. `_interface` is the set of interfaces this `TypeInstPtr` must satisfy. As a result, a check like this would be sufficient (not real code): >> >> for (ciInstanceKlass* intf : _interfaces) { >> ciInstanceKlass* candidate = intf->unique_implementor(); >> if (candidate != nullptr) { >> return candidate; >> } >> } > > Even better, since these `_interfaces` are trusted, we don't need to emit a runtime check for the type. Come to think of it, we can use `interface->unique_implementor()` similar to how we use `ik->unique_concrete_subklass()` in `TypeOopPtr::make_from_klass_common`, that is to tighten the `TypeOopPtr` at the time of creation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28811#discussion_r2741378544 From rrich at openjdk.org Thu Jan 29 12:26:30 2026 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 29 Jan 2026 12:26:30 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v2] In-Reply-To: References: <2EsMN-Ho8F7NzytDTPvYh4cbewkLcb3YQq5oTbuFcfc=.32a85ea0-e332-44cc-9f10-04d88138eb82@github.com> <-TysMtkU_jZckaZB-X_nEFxM2t8C31OCRyz9WvbtiZ4=.80aaea68-bfb5-435e-8542-b50e013c48cb@github.com> Message-ID: On Thu, 29 Jan 2026 11:55:03 GMT, Martin Doerr wrote: >> Thanks, I was looking at 3.1B which doesn't have the programming note. >> >> Thanks also for the example. The trick to get the cmove is that the condition mustn't be true or false all the time. >> >> We should also have OptoAssembly that shows the condition and the registers. >> For testing I've added >> >> format %{ "cmovD_cmpD $dst, $op1 $cop $op2, $src1, $src2\n\t" %} >> >> My testmethod >> >> static double dontinline_cmovf(double op1, double op2, double src1, double src2) { >> return op1 == op2 ? 
src1 : src2; >> } >> >> produces this OptoAssembly >> >> ============================= C2-compiled nmethod ============================== >> #r090 F0:F0 : parm 0: double <- note: register names are incorrect >> #r088 F0:F0 : parm 2: double >> #r086 F0:F0 : parm 4: double >> #r084 F0:F0 : parm 6: double >> #r283 R1+76: old out preserve >> #r282 R1+72: old out preserve >> // ... >> 028 cmovD_cmpD F1, F1 ne F2, F3, F4 >> >> testing... >> >> res = dontinline_cmovf(1d, 2d, 3d, 4d); >> System.out.println(op1 + " == " + op2 + " ? " + src1 + " : " + src2); >> System.out.println("-> " + res); >> >> 1.0 == 2.0 ? 3.0 : 4.0 >> -> 4.0 >> >> Very weird: the condition `==` is flipped to `ne` but `src1` and `src2` are not swapped since 4.0 is in `F4` which is `src2` according to the OptoAssembly above. >> >> I would have expected >> >> >> 028 cmovD_cmpD F1, F1 ne F2, F4, F3 >> >> Is there an explanation? > > The condition is adjusted for `CMove` with match rules like `match(Set dst (CMoveD (Binary cop cr) (Binary dst src)));`. If the condition is `false`, the value `dst` is kept. If the condition is `true`, the value `src` is used. So, the comment for PPC64 should actually say "dst = (op1 cmp(cc) op2) ? src2 : src1;". Are you saying the condition is only adjusted if `dst` and `src1` are identical? Probably not... I guess this originates from L44 and L45 below: the left tree is associated with false and the right with true. ```c++ 38 CMoveNode( Node *bol, Node *left, Node *right, const Type *t ) : TypeNode(t,4) 39 { 40 init_class_id(Class_CMove); 41 // all inputs are nullified in Node::Node(int) 42 // init_req(Control,nullptr); 43 init_req(Condition,bol); 44 init_req(IfFalse,left); 45 init_req(IfTrue,right); 46 } ``` So we cannot think of a CMoveNode as a ternary operator. What about adapting the match rules by swapping src1 and src2 and implementing `op1 cond op2 ? src1 : src2`? This would be less confusing than implementing `op1 cond op2 ?
src2 : src1` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29281#discussion_r2741383030 From krk at openjdk.org Thu Jan 29 12:32:01 2026 From: krk at openjdk.org (Kerem Kat) Date: Thu, 29 Jan 2026 12:32:01 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v4] In-Reply-To: <1E6C99XanWuE9QF_-SfyFzxgD4asf7WUO7RyR1pYn7M=.47f1b163-1b67-4f54-acd4-d061779ad95d@github.com> References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> <1E6C99XanWuE9QF_-SfyFzxgD4asf7WUO7RyR1pYn7M=.47f1b163-1b67-4f54-acd4-d061779ad95d@github.com> Message-ID: <_6t2YyyitDGwAWuIvqM8aSc3s1QNWt7WTKEVH6g7T3w=.94cafdc2-179d-4262-86be-2531b1f4eb38@github.com> On Thu, 22 Jan 2026 14:12:12 GMT, Kerem Kat wrote: >> The check `vect->is_Vector() || vect->is_LoadVector()` doesn't handle `Proj` nodes that resolve to vector types, causing an assertion failure when such nodes flow through a `Phi` into `VectorBox`. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into fix-c2-checkCastPP > - Merge branch 'master' into fix-c2-checkCastPP > - 8374903: C2 VectorAPI: assert(vbox->as_Phi()->region() == vect->as_Phi()->region()) failed > - Simplify expand_vbox_node_helper by merging VectorBox Phi handling > - 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29200#issuecomment-3817330260 From duke at openjdk.org Thu Jan 29 12:32:02 2026 From: duke at openjdk.org (duke) Date: Thu, 29 Jan 2026 12:32:02 GMT Subject: RFR: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded [v4] In-Reply-To: <1E6C99XanWuE9QF_-SfyFzxgD4asf7WUO7RyR1pYn7M=.47f1b163-1b67-4f54-acd4-d061779ad95d@github.com> References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> <1E6C99XanWuE9QF_-SfyFzxgD4asf7WUO7RyR1pYn7M=.47f1b163-1b67-4f54-acd4-d061779ad95d@github.com> Message-ID: On Thu, 22 Jan 2026 14:12:12 GMT, Kerem Kat wrote: >> The check `vect->is_Vector() || vect->is_LoadVector()` doesn't handle `Proj` nodes that resolve to vector types, causing an assertion failure when such nodes flow through a `Phi` into `VectorBox`. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into fix-c2-checkCastPP > - Merge branch 'master' into fix-c2-checkCastPP > - 8374903: C2 VectorAPI: assert(vbox->as_Phi()->region() == vect->as_Phi()->region()) failed > - Simplify expand_vbox_node_helper by merging VectorBox Phi handling > - 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded @krk Your change (at version 9670be04724f1671ab4fcb8029fdb82b0451727a) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29200#issuecomment-3817345610 From krk at openjdk.org Thu Jan 29 12:32:15 2026 From: krk at openjdk.org (Kerem Kat) Date: Thu, 29 Jan 2026 12:32:15 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v9] In-Reply-To: References: Message-ID: <6pKcFcVq-ZuKZsmGEYwT6p24SeR_TikYgaTGefsdGks=.00e819c2-a4f3-438e-81ae-2dec402bef62@github.com> On Thu, 22 Jan 2026 11:46:31 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - address comments > - fix rename > - rename test file > - Merge branch 'master' into fix-c2-segfault-unlocknode > - fix test spacing > - ... and 4 more: https://git.openjdk.org/jdk/compare/41268dbb...d29208cf Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28432#issuecomment-3817332298 From duke at openjdk.org Thu Jan 29 12:32:16 2026 From: duke at openjdk.org (duke) Date: Thu, 29 Jan 2026 12:32:16 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v9] In-Reply-To: References: Message-ID: On Thu, 22 Jan 2026 11:46:31 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. 
> > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - Merge branch 'master' into fix-c2-segfault-unlocknode > - address comments > - fix rename > - rename test file > - Merge branch 'master' into fix-c2-segfault-unlocknode > - fix test spacing > - ... and 4 more: https://git.openjdk.org/jdk/compare/41268dbb...d29208cf @krk Your change (at version d29208cfea651cb01d792c5bbc9e5b35a010f209) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28432#issuecomment-3817347146 From krk at openjdk.org Thu Jan 29 12:46:44 2026 From: krk at openjdk.org (Kerem Kat) Date: Thu, 29 Jan 2026 12:46:44 GMT Subject: Integrated: 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded In-Reply-To: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> References: <5KlCjwyY6FFDvf0mn-FdcNkUUjerGWSyPSiWHhPN_1E=.1246331c-0995-4c48-a06d-acda89f90eab@github.com> Message-ID: On Tue, 13 Jan 2026 17:35:35 GMT, Kerem Kat wrote: > The check `vect->is_Vector() || vect->is_LoadVector()` doesn't handle `Proj` nodes that resolve to vector types, causing an assertion failure when such nodes flow through a `Phi` into `VectorBox`. This pull request has now been integrated. 
Changeset: e85d5d7a Author: Kerem Kat Committer: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/e85d5d7a16024f6a3eda14f1e08f72e07ae38dd0 Stats: 134 lines in 3 files changed: 104 ins; 22 del; 8 mod 8375010: C2 VectorAPI: assert(vbox->is_CheckCastPP()) failed: should be expanded 8374903: C2 VectorAPI: assert(vbox->as_Phi()->region() == vect->as_Phi()->region()) failed Reviewed-by: qamai, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/29200 From duke at openjdk.org Thu Jan 29 12:55:33 2026 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 29 Jan 2026 12:55:33 GMT Subject: Integrated: 8374755: ML-KEM's 12-bit decompression can be simplified on aarch64 In-Reply-To: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: On Fri, 9 Jan 2026 14:41:07 GMT, Ferenc Rakoczi wrote: > The preconditions for the aarch64 and the AVX-512 intrinsic implementations of the implKyber12To16() method of com.sun.crypto.provider.ML_KEM are different and the AVX-512 one has stricter preconditions on the input, which was not recorded in the assert() before calling the function (although they were satisfied by all calling code). Now the assert() is corrected, and with these preconditions, the aarch64 implementation is simplified. This pull request has now been integrated. 
Changeset: 99119597 Author: Ferenc Rakoczi Committer: Weijun Wang URL: https://git.openjdk.org/jdk/commit/99119597aa95c1139ae2259bed5ec885a7c01269 Stats: 91 lines in 2 files changed: 4 ins; 73 del; 14 mod 8374755: ML-KEM's 12-bit decompression can be simplified on aarch64 Reviewed-by: adinn ------------- PR: https://git.openjdk.org/jdk/pull/29141 From jbhateja at openjdk.org Thu Jan 29 12:58:57 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 29 Jan 2026 12:58:57 GMT Subject: RFR: 8376187: [VectorAPI] Define new lane type constants and pass them to intrinsic entries [v3] In-Reply-To: References: Message-ID: > As per [discussions ](https://github.com/openjdk/jdk/pull/28002#issuecomment-3789507594) on JDK-8370691 pull request, splitting out portion of PR#28002 into a separate patch in preparation of Float16 vector API support. > > Patch add new lane type constants and pass them to vector intrinsic entry points. > > All existing Vector API jtreg test are passing with the patch. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29481/files - new: https://git.openjdk.org/jdk/pull/29481/files/11021544..d81035fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29481&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29481&range=01-02 Stats: 103 lines in 3 files changed: 23 ins; 0 del; 80 mod Patch: https://git.openjdk.org/jdk/pull/29481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29481/head:pull/29481 PR: https://git.openjdk.org/jdk/pull/29481 From krk at openjdk.org Thu Jan 29 12:59:49 2026 From: krk at openjdk.org (Kerem Kat) Date: Thu, 29 Jan 2026 12:59:49 GMT Subject: RFR: 8356184: C2 MemorySegment: long RangeCheck with ConvI2L(iv + invar) prevents RCE [v3] In-Reply-To: References: Message-ID: > `MemorySegment` bounds checks use long arithmetic, but when accessing with an int loop variable plus an int invariant offset, the pattern `ConvI2L(iv + invar)` was not recognized by Range Check Elimination. This prevented RCE and consequently blocked vectorization for common `MemorySegment` access patterns. > > The fix teaches `is_scaled_iv_plus_offset` to recognize linear int expressions inside `ConvI2L`. A new `short_offset` flag signals that the offset is part of int arithmetic (not added separately in long), requiring the range to be clamped at `max_jint + 1` to correctly handle potential int overflow. This also removes pre-existing dead code where an `exp_bt != bt` check was intended to bail out on such patterns but never actually executed. > > With this change, `MemorySegment` loops using int invariant offsets now benefit from RCE and vectorization, matching the behavior already supported for long invariant offsets. 
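The `short_offset` case needs a clamped range because widening after the addition, `ConvI2L(iv + invar)`, is not the same as adding after widening. Once `i + offset` overflows `int`, it is the wrapped 32-bit value that gets widened. The `long addr = i + offset;` line in the loop of this PR description is exactly the first form. The small Java illustration below shows the two expressions diverging at the `int` boundary. The class and method names are hypothetical and serve only to label the two IR shapes.

```java
// Illustration only: the names below are hypothetical labels for two IR shapes.
class ConvI2LOverflowDemo {
    // Models ConvI2L(AddI(i, offset)): the addition wraps in 32 bits, then widens.
    static long widenAfterAdd(int i, int offset) {
        return (long) (i + offset);
    }

    // Models AddL(ConvI2L(i), ConvI2L(offset)): both operands widen first, no wrap.
    static long addAfterWiden(int i, int offset) {
        return (long) i + offset;
    }

    public static void main(String[] args) {
        int i = Integer.MAX_VALUE;
        int offset = 1;
        System.out.println(widenAfterAdd(i, offset)); // -2147483648
        System.out.println(addAfterWiden(i, offset)); // 2147483648
        // Below the overflow point the two forms agree.
        System.out.println(widenAfterAdd(10, 5) == addAfterWiden(10, 5)); // true
    }
}
```

A range proof that treats `ConvI2L(iv + invar)` as a linear long expression therefore holds only while `iv + invar` stays inside the `int` range. This is why the description bounds that case at `max_jint + 1`.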
> > > void process(MemorySegment segment, int offset, int size) { > for (int i = 0; i < size; i++) { > long addr = i + offset; // ConvI2L(AddI(iv, offset)) was not recognized > segment.set(JAVA_BYTE, addr, (byte) 0); > } > } Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into fix-c2-convi2l-8356184 - Remove IR rules from TestMemorySegment methods where vectorization depends on backing store type - 8356184: C2 MemorySegment: long RangeCheck with ConvI2L(iv + invar) prevents RCE ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29392/files - new: https://git.openjdk.org/jdk/pull/29392/files/23eff0db..92fc2787 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29392&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29392&range=01-02 Stats: 8916 lines in 291 files changed: 5442 ins; 1782 del; 1692 mod Patch: https://git.openjdk.org/jdk/pull/29392.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29392/head:pull/29392 PR: https://git.openjdk.org/jdk/pull/29392 From mhaessig at openjdk.org Thu Jan 29 13:11:33 2026 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 29 Jan 2026 13:11:33 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v12] In-Reply-To: <6mZpoC4Y1Nwxm0wieJB1OQFTjkN0wq8yc4FIwkRLLJY=.9c14bf40-b0d5-4b94-9d51-9940ad7b02bc@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: <6mZpoC4Y1Nwxm0wieJB1OQFTjkN0wq8yc4FIwkRLLJY=.9c14bf40-b0d5-4b94-9d51-9940ad7b02bc@github.com> On Wed, 28 Jan 2026 17:05:55 GMT, Benoît Maillard wrote: >> This PR introduces changes in the detection of missing IGVN optimizations.
As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and identifying similar or related failures during bug triaging. >> >> In summary, this PR brings the following changes: >> - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. >> - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. >> - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. >> >> ### Example outputs >> #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) >> Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >>
>> Before >> >> >> Missed Ideal optimization (can_reshape=false): >> The node was replaced by Ideal. >> Old node: >> dist dump >> --------------------------------------------- >> 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 >> 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) >> 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) >> The result after Ideal: >> dist dump >> --------------------------------------------- >> 1 337 ConL === 0 [[ 338 ]] #long:-9 >> 1 336 URShiftL === _ 298 22 [[ 338 ]] >> 0 338 AndL === _ 336 337 [[ ]] >> >> >> Missed Ideal optimization (can_reshape=true): >> The node was replaced by Ideal. >> Old node: >> dist dump >> --------------------------------------------- >> 1 22 ConI === 0 [[ 70 81 70 290 81 76... > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Change condition structure I still have a few nits. Apart from those this looks good. src/hotspot/share/opto/phaseX.cpp line 1184: > 1182: // false conservatively, and later it can determine that it is indeed true. Loops with > 1183: // Region heads can lead to giving up, whereas LoopNodes can be skipped easier, and > 1184: // so the traversal becomes more powerful. This is difficult to remidy, we would have Suggestion: // so the traversal becomes more powerful.
This is difficult to remedy, we would have Small typo I just noticed src/hotspot/share/opto/phaseX.cpp line 1207: > 1205: assert(_phase == PhaseValuesType::ccp, "Unexpected phase identifier"); > 1206: assert(false, "PhaseCCP not at fixpoint: analysis result may be unsound for %s", n->Name()); > 1207: } Suggestion: switch (_phase) { case PhaseValuesType::iter_gvn: assert(false, "Missed Value optimization opportunity in PhaseIterGVN for %s",n->Name()); break; case PhaseValuesType::ccp: assert(false, "PhaseCCP not at fixpoint: analysis result may be unsound for %s", n->Name()); break; default: assert(false, "Unexpected phase"); break; } With an enum, a switch case seems safer. src/hotspot/share/opto/phaseX.hpp line 265: > 263: } > 264: NOT_PRODUCT(~PhaseValues();) > 265: PhaseIterGVN* is_IterGVN() { return (_phase >= PhaseValuesType::iter_gvn) ? (PhaseIterGVN*)this : nullptr; } Suggestion: PhaseIterGVN* is_IterGVN() { return (_phase == PhaseValuesType::iter_gvn || _phase == PhaseValuesType::ccp) ? static_cast<PhaseIterGVN*>(this) : nullptr; } I find relying on the enum order quite error prone. Also, we should probably prefer a static cast. ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28295#pullrequestreview-3722259656 PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2741297466 PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2741306347 PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2741288073 From krk at openjdk.org Thu Jan 29 13:15:05 2026 From: krk at openjdk.org (Kerem Kat) Date: Thu, 29 Jan 2026 13:15:05 GMT Subject: Integrated: 8370502: C2: segfault while adding node to IGVN worklist In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 14:18:39 GMT, Kerem Kat wrote: > Do not try to replace `fallthrough_memproj` when it is null, fixes crash. > > Test case is simplified from the ticket. Verified that the case crashes without the fix.
This pull request has now been integrated. Changeset: 7c6c34e1 Author: Kerem Kat Committer: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/7c6c34e150cf01cec5d166f6cbb8a649c75b0627 Stats: 68 lines in 2 files changed: 56 ins; 5 del; 7 mod 8370502: C2: segfault while adding node to IGVN worklist Reviewed-by: mhaessig, dlong ------------- PR: https://git.openjdk.org/jdk/pull/28432 From bmaillard at openjdk.org Thu Jan 29 13:21:25 2026 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Thu, 29 Jan 2026 13:21:25 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v13] In-Reply-To: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: > This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and identifying similar or related failures during bug triaging. > > In summary, this PR brings the following changes: > - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. > - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. > - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging.
> > ### Example outputs > #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) > Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >
> Before > > > Missed Ideal optimization (can_reshape=false): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) > 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) > The result after Ideal: > dist dump > --------------------------------------------- > 1 337 ConL === 0 [[ 338 ]] #long:-9 > 1 336 URShiftL === _ 298 22 [[ 338 ]] > 0 338 AndL === _ 336 337 [[ ]] > > > Missed Ideal optimization (can_reshape=true): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298... Benoît Maillard has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/share/opto/phaseX.cpp Co-authored-by: Manuel Hässig - Update src/hotspot/share/opto/phaseX.cpp Co-authored-by: Manuel Hässig - Update src/hotspot/share/opto/phaseX.hpp Co-authored-by: Manuel Hässig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28295/files - new: https://git.openjdk.org/jdk/pull/28295/files/99855fe6..5cafd41a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=11-12 Stats: 12 lines in 2 files changed: 5 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/28295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28295/head:pull/28295 PR: https://git.openjdk.org/jdk/pull/28295 From bmaillard at openjdk.org Thu Jan 29 13:39:08 2026 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Thu, 29 Jan 2026 13:39:08 GMT Subject: RFR: 8371536: C2:
VerifyIterativeGVN should assert on first detected failure [v14] In-Reply-To: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: <6Zc2lXJt5a8X-3WIKrOvd2VFnDHngbJ44wcMnPrQKFE=.c1231dc2-5785-4a43-b539-515ede347791@github.com> > This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and identifying similar or related failures during bug triaging. > > In summary, this PR brings the following changes: > - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. > - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. > - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. > > ### Example outputs > #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) > Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >
    > Before > > > Missed Ideal optimization (can_reshape=false): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) > 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) > The result after Ideal: > dist dump > --------------------------------------------- > 1 337 ConL === 0 [[ 338 ]] #long:-9 > 1 336 URShiftL === _ 298 22 [[ 338 ]] > 0 338 AndL === _ 336 337 [[ ]] > > > Missed Ideal optimization (can_reshape=true): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298... Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: Move static_cast to .cpp to avoid incomplete type ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28295/files - new: https://git.openjdk.org/jdk/pull/28295/files/5cafd41a..c25efddc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=12-13 Stats: 5 lines in 2 files changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28295/head:pull/28295 PR: https://git.openjdk.org/jdk/pull/28295 From bmaillard at openjdk.org Thu Jan 29 13:39:11 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 29 Jan 2026 13:39:11 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v12] In-Reply-To: <6mZpoC4Y1Nwxm0wieJB1OQFTjkN0wq8yc4FIwkRLLJY=.9c14bf40-b0d5-4b94-9d51-9940ad7b02bc@github.com> 
References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> <6mZpoC4Y1Nwxm0wieJB1OQFTjkN0wq8yc4FIwkRLLJY=.9c14bf40-b0d5-4b94-9d51-9940ad7b02bc@github.com> Message-ID: On Thu, 29 Jan 2026 11:58:17 GMT, Manuel Hässig wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> Change condition structure > > src/hotspot/share/opto/phaseX.hpp line 265: > >> 263: } >> 264: NOT_PRODUCT(~PhaseValues();) >> 265: PhaseIterGVN* is_IterGVN() { return (_phase >= PhaseValuesType::iter_gvn) ? (PhaseIterGVN*)this : nullptr; } > > Suggestion: > > PhaseIterGVN* is_IterGVN() { return (_phase == PhaseValuesType::iter_gvn || _phase == PhaseValuesType::ccp) ? static_cast<PhaseIterGVN*>(this) : nullptr; } > > I find relying on the enum order quite error-prone. Also, we should probably prefer a static cast. We actually can't do the static cast here, as the target type is incomplete (we are in the header file). I moved it to `.cpp`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2741671007 From mhaessig at openjdk.org Thu Jan 29 13:50:08 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 29 Jan 2026 13:50:08 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v14] In-Reply-To: <6Zc2lXJt5a8X-3WIKrOvd2VFnDHngbJ44wcMnPrQKFE=.c1231dc2-5785-4a43-b539-515ede347791@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> <6Zc2lXJt5a8X-3WIKrOvd2VFnDHngbJ44wcMnPrQKFE=.c1231dc2-5785-4a43-b539-515ede347791@github.com> Message-ID: On Thu, 29 Jan 2026 13:39:08 GMT, Benoît Maillard wrote: >> This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. 
Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and to identify similar or related failures during bug triaging. >> >> In summary, this PR brings the following changes: >> - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. >> - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. >> - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. >> >> ### Example outputs >> #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) >> Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >>
    >> Before >> >> >> Missed Ideal optimization (can_reshape=false): >> The node was replaced by Ideal. >> Old node: >> dist dump >> --------------------------------------------- >> 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 >> 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) >> 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) >> The result after Ideal: >> dist dump >> --------------------------------------------- >> 1 337 ConL === 0 [[ 338 ]] #long:-9 >> 1 336 URShiftL === _ 298 22 [[ 338 ]] >> 0 338 AndL === _ 336 337 [[ ]] >> >> >> Missed Ideal optimization (can_reshape=true): >> The node was replaced by Ideal. >> Old node: >> dist dump >> --------------------------------------------- >> 1 22 ConI === 0 [[ 70 81 70 290 81 76... > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Move static_cast to .cpp to avoid incomplete type Thank you for addressing my comment. Looks good now. ------------- Marked as reviewed by mhaessig (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/28295#pullrequestreview-3722837319 From dbriemann at openjdk.org Thu Jan 29 14:01:56 2026 From: dbriemann at openjdk.org (David Briemann) Date: Thu, 29 Jan 2026 14:01:56 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v3] In-Reply-To: References: Message-ID: > Adds the following mach nodes: > match(Set dst (CMoveF (Binary cop (CmpF op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveD (Binary cop (CmpD op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveF (Binary cop (CmpD op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveD (Binary cop (CmpF op1 op2)) (Binary src1 src2))); David Briemann has updated the pull request incrementally with one additional commit since the last revision: add comments, clean up handling of operands ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29281/files - new: https://git.openjdk.org/jdk/pull/29281/files/6553a246..2aeea6ba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29281&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29281&range=01-02 Stats: 22 lines in 3 files changed: 5 ins; 6 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/29281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29281/head:pull/29281 PR: https://git.openjdk.org/jdk/pull/29281 From adinn at openjdk.org Thu Jan 29 14:33:52 2026 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 29 Jan 2026 14:33:52 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression can be simplified on aarch64 [v3] In-Reply-To: References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: On Mon, 19 Jan 2026 14:01:56 GMT, Ferenc Rakoczi wrote: >> The preconditions for the aarch64 and the AVX-512 intrinsic implementations of the implKyber12To16() method of com.sun.crypto.provider.ML_KEM are different and the AVX-512 one has stricter preconditions on the input, which was not recorded in the 
assert() before calling the function (although they were satisfied by all calling code). Now the assert() is corrected, and with these preconditions, the aarch64 implementation is simplified. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > improve comment @wangweij Thanks for sponsoring. While that probably counts implicitly as a review, I think you are supposed to actually click the 'reviewed' button to acknowledge that you have reviewed the code. n.b. Although the PR says only one review is required, a Hotspot change actually needs two reviews with at least one (big R) 'Reviewer'. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29141#issuecomment-3818054364 From dbriemann at openjdk.org Thu Jan 29 14:52:08 2026 From: dbriemann at openjdk.org (David Briemann) Date: Thu, 29 Jan 2026 14:52:08 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v4] In-Reply-To: References: Message-ID: > Adds the following mach nodes: > match(Set dst (CMoveF (Binary cop (CmpF op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveD (Binary cop (CmpD op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveF (Binary cop (CmpD op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveD (Binary cop (CmpF op1 op2)) (Binary src1 src2))); David Briemann has updated the pull request incrementally with one additional commit since the last revision: add format strings, further clarify inversion of results ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29281/files - new: https://git.openjdk.org/jdk/pull/29281/files/2aeea6ba..476f2677 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29281&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29281&range=02-03 Stats: 20 lines in 1 file changed: 4 ins; 4 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/29281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29281/head:pull/29281 PR: 
https://git.openjdk.org/jdk/pull/29281 From weijun at openjdk.org Thu Jan 29 15:02:51 2026 From: weijun at openjdk.org (Weijun Wang) Date: Thu, 29 Jan 2026 15:02:51 GMT Subject: RFR: 8374755: ML-KEM's 12-bit decompression can be simplified on aarch64 [v3] In-Reply-To: References: <2XkSKe1vGfj4EzcrRnkK99q8QjauNLaBgNPvMRJrhbQ=.ccee73f0-8cbf-4119-bdee-32c6784b25d1@github.com> Message-ID: On Mon, 19 Jan 2026 14:01:56 GMT, Ferenc Rakoczi wrote: >> The preconditions for the aarch64 and the AVX-512 intrinsic implementations of the implKyber12To16() method of com.sun.crypto.provider.ML_KEM are different and the AVX-512 one has stricter preconditions on the input, which was not recorded in the assert() before calling the function (although they were satisfied by all calling code). Now the assert() is corrected, and with these preconditions, the aarch64 implementation is simplified. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > improve comment Didn't notice the reviewer count. Maybe the Skara bot doesn't enforce it; otherwise, it should not have added the `ready` and `sponsor` labels. I typed `/sponsor` mainly because I trust Ferenc's code and the reviews from you and Shawn. I don't think a sponsor has to review the code change themselves. 
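For context on the ML-KEM thread above: 12-bit decompression unpacks two 12-bit coefficients from every three input bytes. Below is a minimal scalar sketch of that unpacking. It assumes the little-endian bit layout of the ML-KEM ByteDecode step with d = 12. The class and method names are hypothetical, and this is not the JDK's `implKyber12To16()` intrinsic or its signature. The length check at the top stands in for the kind of input precondition discussed in the thread; the actual preconditions asserted in the PR are not shown here.

```java
import java.util.Arrays;

// Hypothetical scalar reference for 12-bit -> 16-bit unpacking (not the JDK intrinsic).
final class Kyber12To16Sketch {

    /** Decodes n 12-bit coefficients (n even) from the first 3*n/2 bytes of in. */
    static short[] decode12(byte[] in, int n) {
        // Precondition check, done once up front instead of per element.
        if (n < 0 || (n & 1) != 0 || in.length < 3 * (n / 2)) {
            throw new IllegalArgumentException("need an even n and at least 3*n/2 input bytes");
        }
        short[] out = new short[n];
        for (int i = 0, j = 0; i < n; i += 2, j += 3) {
            int b0 = in[j] & 0xFF;
            int b1 = in[j + 1] & 0xFF;
            int b2 = in[j + 2] & 0xFF;
            out[i] = (short) (b0 | ((b1 & 0x0F) << 8));   // low byte + low nibble of b1
            out[i + 1] = (short) ((b1 >>> 4) | (b2 << 4)); // high nibble of b1 + b2
        }
        return out;
    }

    public static void main(String[] args) {
        // 0x123 and 0xABC packed little-endian into three bytes.
        byte[] packed = {(byte) 0x23, (byte) 0xC1, (byte) 0xAB};
        System.out.println(Arrays.toString(decode12(packed, 2))); // prints [291, 2748]
    }
}
```

Every decoded value fits in 12 bits, at most 4095, so storing it in a 16-bit `short` never produces a negative value.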
------------- PR Comment: https://git.openjdk.org/jdk/pull/29141#issuecomment-3818252537 From dbriemann at openjdk.org Thu Jan 29 16:03:46 2026 From: dbriemann at openjdk.org (David Briemann) Date: Thu, 29 Jan 2026 16:03:46 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v5] In-Reply-To: References: Message-ID: <_3bZcGjeZKvRWXSPCiLZZzrCdlcKc4l0Orw25208VyM=.1330ef64-1048-4c45-b62e-09ea47c7ef2a@github.com> > Adds the following mach nodes: > match(Set dst (CMoveF (Binary cop (CmpF op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveD (Binary cop (CmpD op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveF (Binary cop (CmpD op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveD (Binary cop (CmpF op1 op2)) (Binary src1 src2))); David Briemann has updated the pull request incrementally with one additional commit since the last revision: negating comparisons does not always work, invert results instead ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29281/files - new: https://git.openjdk.org/jdk/pull/29281/files/476f2677..3d1aa822 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29281&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29281&range=03-04 Stats: 11 lines in 1 file changed: 5 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/29281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29281/head:pull/29281 PR: https://git.openjdk.org/jdk/pull/29281 From epeter at openjdk.org Thu Jan 29 16:22:14 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jan 2026 16:22:14 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Thu, 29 Jan 2026 07:57:53 GMT, Quan Anh Mai wrote: >> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 13 commits: >> >> - Fix build failure after rebasing and address review comments >> - Merge branch 'master' into optimize-atomic-post >> - Fixed new test failures after rebasing and refined parts of the code to address review comments >> - Merge branch 'master' into optimize-atomic-post >> - Merge branch 'master' into optimize-atomic-post >> - Clean up comments for consistency and add spacing for readability >> - Fix some corner case failures and refined part of code >> - Merge branch 'master' into optimize-atomic-post >> - Refine ascii art, rename some variables and resolve conflicts >> - Merge branch 'master' into optimize-atomic-post >> - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 > > I'm a novice in loop optimizations, and this is just an unfounded comment: > > I feel that this kind of graph surgery is hard to verify and tends to be fragile in the presence of numerous optimizations happening concurrently. Another inconsistency I see is that with normal unrolling the post loop is already in place, whereas with super unrolling you have to pull a vectorized drain loop out of thin air. > > As a result, I think it would be more reliable to generate the pre-main-post1-post2 loop structure from the beginning, and eliminate each of them if they are unnecessary. This also helps the cases where we want the drain loop and the main loop to operate on vectors of different sizes, or to have a drain loop even if the main loop does not super unroll. For example, if the main loop operates on vectors of 64 bytes, then you will want to have a drain loop that operates on vectors of 8 bytes before going into scalar, even if the main loop does not super unroll. > > Please let me know if I misunderstand anything, thanks a lot. @merykitty @fg1417 I did some more reflecting and also had an offline conversation with @chhagedorn.
> generate the pre-main-post1-post2 loop structure from the beginning We _could_ do that. But at the cost of adding the `drain` loops (possibly multiple) to all loops, even those that we won't succeed in vectorizing. That could drive up compile time and memory usage noticeably. And I think most loops are never vectorized. Besides, this does not prevent us from doing graph surgery. We will still have to build the graph structure with the "main-bypass to drain". So I fear we will need the same amount of complexity either way. Current approach: - Clone pre-loop up - Clone post-loop down - Clone drain-loop in-between This requires "3 algorithms". Suppose we instead did: - Clone pre-loop up - Clone drain-loop down - Clone post-loop down from drain-loop Could we do this with only "2 algorithms", without the "in-between", and instead twice "down"? Maybe? But then we'd still always pay the price of the drain loop, even if it then gets folded away. Not great. --------------------------------------- I also don't think that this patch blocks future progress. One possible future: - pre/main/post - directly run auto vectorizer on single iteration main loop - decide on unrolling factor for main loop and possibly multiple drain loops - clone the main loop for all drain loops - apply the `VTransform` to the main loop and the drain loops, using different "vectorized unrolling factors". The only thing that would still be difficult to do here: to apply the `VTransform` to pre/post loop, so that we could do masked vector ops to simulate multiple iterations. Applying the `VTransform` to the just cloned drain loops works because we know they have the same structure still, but that may not apply to pre/post loops. Maybe if we don't do any IGVN between pre/main/post and auto vectorization, we could still know that pre/post loops have the same shape as the main loop?
The alternative that @merykitty mentioned: > generate the pre-main-post1-post2 loop structure from the beginning Here, we would know that all loops have the same shape, so applying `VTransform` to all loops should work, but probably only if we don't run IGVN between the loop cloning and auto vectorization, right? Running the auto vectorizer on all loops individually might also be an option, but it would cost much more compile time. ---------------------------------- @chhagedorn and I agreed that it is a bit sad that we can only see the performance impact of this patch with this special "warmup with large iteration count, measure with small iteration count". The real-world impact is going to be very limited at this point. So we have to be quite confident that this patch is correct. Some small follow-up bugs are of course ok. Long-term, the contribution of this patch could prove valuable, especially if we can use it to produce multiple drain loops. Do you think that would be possible, @fg1417? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3818729840 From epeter at openjdk.org Thu Jan 29 16:29:47 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jan 2026 16:29:47 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Tue, 13 Jan 2026 11:27:53 GMT, Fei Gao wrote: >> In C2's loop optimization, for a counted loop, if any of these conditions (RCE, unrolling) is met, we switch to the >> `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running.
For example, if a main loop is unrolled `8x`, the main loop guard tests whether the loop has fewer than `8` iterations and then decides which way to go. >> >> Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is further multiplied. After the main loop is super-unrolled, the minimum trip guard test is updated accordingly. Assuming one vector covers `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test whether the remaining trip count is less than `8 * 4 = 32`. >> >> To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard, and both trip guards of the main loop and the vectorized drain loop jump to the scalar post loop. >> >> The problem is that if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed. Because the minimum trip guard test of the main loop fails, execution jumps over both the main loop and the vectorized drain loop. For example, in the above case, if a loop still has `25` iterations after the pre-loop, we could run `3` rounds of the vectorized drain loop, but the current control flow makes that impossible. It would be better if the minimum trip guard test of the main loop did not jump over the vectorized drain loop. >> >> This patch improves this by modifying the control flow when the minimum trip guard test of the main loop fails. Consequently, we need to update all data uses and control uses to reflect the change in control flow. >> >> The whole process is done by the function `insert_post_loop()`. >> >> We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to the vectorized drain loop with mode `InsertVectorizedDrain`: >> >> 1. 
The fall-in control flow to the vectorized drain loop comes fr... > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Fix build failure after rebasing and address review comments > - Merge branch 'master' into optimize-atomic-post > - Fixed new test failures after rebasing and refined parts of the code to address review comments > - Merge branch 'master' into optimize-atomic-post > - Merge branch 'master' into optimize-atomic-post > - Clean up comments for consistency and add spacing for readability > - Fix some corner case failures and refined part of code > - Merge branch 'master' into optimize-atomic-post > - Refine ascii art, rename some variables and resolve conflicts > - Merge branch 'master' into optimize-atomic-post > - ... and 3 more: https://git.openjdk.org/jdk/compare/a8552243...ab1de504 FYI: I filed this RFE, which should be relatively low-hanging fruit: [JDK-8376728](https://bugs.openjdk.org/browse/JDK-8376728): C2 SuperWord: disable automatic alignment for small iteration counts ------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3818778637 From qamai at openjdk.org Thu Jan 29 16:29:49 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 29 Jan 2026 16:29:49 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: <-GqrRj0gaXDW-pdZzrvhmLiVR2WDINDoSfM2cnzvFvg=.6b7916dc-eab7-41de-9b7e-37a15a769c78@github.com> On Thu, 29 Jan 2026 16:19:38 GMT, Emanuel Peter wrote: > But then we'd still always pay the price of the drain loop, even if it then gets folded away. Not great. May I ask why? We only need to clone the post loop after we decide to vectorize, at which point it becomes the drain loop; otherwise there is no need for the clone.
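This applies the trip-count example from the PR description (25 iterations left after the pre-loop, 8 lanes per vector, super-unroll factor 4). Below is a simplified Java model of how the remaining iterations are divided among the main loop, the vectorized drain loop, and the scalar post loop, before and after the control-flow change. It is an illustration under stated assumptions, not C2 code: it ignores alignment, the pre-loop itself, and the exact shape of the guards, and the class, method, and parameter names are hypothetical.

```java
import java.util.Arrays;

// Simplified model of how leftover iterations are split after the pre-loop; not C2 code.
final class DrainLoopSplit {

    /**
     * Returns {mainIters, drainIters, postIters}, counted in scalar iterations.
     * mainGuardFallsToDrain == false models the old control flow (a failing main
     * guard jumps straight to the scalar post loop); true models the patched flow.
     */
    static int[] split(int remaining, int lanes, int superUnroll, boolean mainGuardFallsToDrain) {
        int mainStride = lanes * superUnroll;
        int left = remaining;
        int mainIters = 0;
        int drainIters = 0;
        boolean mainGuardPasses = left >= mainStride;
        if (mainGuardPasses) {
            mainIters = left / mainStride * mainStride; // whole super-unrolled iterations
            left -= mainIters;
        }
        if (mainGuardPasses || mainGuardFallsToDrain) {
            drainIters = left / lanes * lanes;          // whole single-vector iterations
            left -= drainIters;
        }
        return new int[] {mainIters, drainIters, left}; // the rest runs in the scalar post loop
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split(25, 8, 4, false))); // [0, 0, 25]
        System.out.println(Arrays.toString(split(25, 8, 4, true)));  // [0, 24, 1]
    }
}
```

With the old flow, all 25 iterations run in the scalar post loop. With the patched flow, the drain loop covers 24 of them in 3 vector iterations, which matches the `3` rounds mentioned in the description.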
------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3818785449 From mdoerr at openjdk.org Thu Jan 29 17:20:59 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 29 Jan 2026 17:20:59 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v5] In-Reply-To: <_3bZcGjeZKvRWXSPCiLZZzrCdlcKc4l0Orw25208VyM=.1330ef64-1048-4c45-b62e-09ea47c7ef2a@github.com> References: <_3bZcGjeZKvRWXSPCiLZZzrCdlcKc4l0Orw25208VyM=.1330ef64-1048-4c45-b62e-09ea47c7ef2a@github.com> Message-ID: On Thu, 29 Jan 2026 16:03:46 GMT, David Briemann wrote: >> Adds the following mach nodes: >> match(Set dst (CMoveF (Binary cop (CmpF op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveD (Binary cop (CmpD op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveF (Binary cop (CmpD op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveD (Binary cop (CmpF op1 op2)) (Binary src1 src2))); > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > negating comparisons does not always work, invert results instead Marked as reviewed by mdoerr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29281#pullrequestreview-3724055323 From mdoerr at openjdk.org Thu Jan 29 17:21:01 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 29 Jan 2026 17:21:01 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v2] In-Reply-To: References: <2EsMN-Ho8F7NzytDTPvYh4cbewkLcb3YQq5oTbuFcfc=.32a85ea0-e332-44cc-9f10-04d88138eb82@github.com> <-TysMtkU_jZckaZB-X_nEFxM2t8C31OCRyz9WvbtiZ4=.80aaea68-bfb5-435e-8542-b50e013c48cb@github.com> Message-ID: On Thu, 29 Jan 2026 12:23:47 GMT, Richard Reingruber wrote: >> The condition is adjusted for `CMove` with match rules like `match(Set dst (CMoveD (Binary cop cr) (Binary dst src)));`. If the condition is `false`, the value `dst` is kept. If the condition is `true`, the value `src` is used. 
So, the comment for PPC64 should actually say "dst = (op1 cmp(cc) op2) ? src2 : src1;". > > Are you saying the condition is only adjusted if `dst` and `src1` are identical? Probably not... > > I guess this originates from L44 and L45 below. The left tree is associated with false and the right with true. > > ```c++ > 38 CMoveNode( Node *bol, Node *left, Node *right, const Type *t ) : TypeNode(t,4) > 39 { > 40 init_class_id(Class_CMove); > 41 // all inputs are nullified in Node::Node(int) > 42 // init_req(Control,nullptr); > 43 init_req(Condition,bol); > 44 init_req(IfFalse,left); > 45 init_req(IfTrue,right); > 46 } > > > So we cannot think of a CMoveNode as a ternary operator. > > What about adapting the match rules by swapping src1 and src2 and implementing `op1 cond op2 ? src1 : src2`? > This would be less confusing than implementing `op1 cond op2 ? src2 : src1` Thanks for pointing this out! The new code is much more readable and avoids a lot of confusion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29281#discussion_r2742707964 From rrich at openjdk.org Thu Jan 29 17:51:10 2026 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 29 Jan 2026 17:51:10 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v5] In-Reply-To: <_3bZcGjeZKvRWXSPCiLZZzrCdlcKc4l0Orw25208VyM=.1330ef64-1048-4c45-b62e-09ea47c7ef2a@github.com> References: <_3bZcGjeZKvRWXSPCiLZZzrCdlcKc4l0Orw25208VyM=.1330ef64-1048-4c45-b62e-09ea47c7ef2a@github.com> Message-ID: On Thu, 29 Jan 2026 16:03:46 GMT, David Briemann wrote: >> Adds the following mach nodes: >> match(Set dst (CMoveF (Binary cop (CmpF op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveD (Binary cop (CmpD op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveF (Binary cop (CmpD op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveD (Binary cop (CmpF op1 op2)) (Binary src1 src2))); > > David Briemann has updated the pull request incrementally with one additional commit since the
last revision: > > negating comparisons does not always work, invert results instead Very nice and clean now. Thanks for putting the work into it. Cheers, Richard. ------------- Marked as reviewed by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29281#pullrequestreview-3724178852 From shade at openjdk.org Thu Jan 29 18:05:29 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 29 Jan 2026 18:05:29 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code [v13] In-Reply-To: References: Message-ID: <6IVTRd5ca8lQpk1YmC2PUEdbmSR-KnFmz-XLkF5dtNY=.89f819fb-e93e-4860-af32-8e5e4cec988a@github.com> > We use CTW testing to make sure compilers behave well. But we compile code that is never executed, and since our inlining heuristics often look back at profiles, we end up not actually inlining all too much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations. > > There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods, we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data. > > After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much that they are impractical to run in standard configurations; see data in the RFE. We will enable some of that testing in special testing pipelines. > > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in the RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself.
For small JARs in a large corpus, this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it: it provides the base case of inlining most reasonable methods, and then allows the stress infra to inline some more on top of that. > > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: - Potential fix for dead loop - Roll back some dead weight ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26068/files - new: https://git.openjdk.org/jdk/pull/26068/files/f1a06e35..2ded3cee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=11-12 Stats: 16 lines in 4 files changed: 11 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068 PR: https://git.openjdk.org/jdk/pull/26068 From aturbanov at openjdk.org Thu Jan 29 18:10:24 2026 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Thu, 29 Jan 2026 18:10:24 GMT Subject: RFR: 8374307: Fix deoptimization storm caused by Action_none in GraphKit::uncommon_trap [v3] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 17:28:14 GMT, Boris Ulasevich wrote: >> We observed a deoptimization storm caused by GraphKit::uncommon_trap generator logic.
GraphKit::uncommon_trap considers the too_many_recompiles metric. If the threshold is exceeded, it replaces Deoptimization::Action_reinterpret with Deoptimization::Action_none (see code snippet below). >> >> This replacement changes the uncommon_trap logic: once execution hits a trap, the VM performs deoptimization but does not recompile the method anymore. In an "unlucky" case, when the code path that reaches this uncommon_trap becomes frequently executed, a deoptimization storm occurs (thousands of deoptimizations per second), causing a significant performance drop. >> >> The original problematic method, which triggered repeated recompilations, is a high-performance compressed binary serialization algorithm with heavy use of conditional branches driven by bitmasks. See a standalone [synthetic benchmark](https://bugs.openjdk.org/secure/attachment/118045/UnstableIf.java) to reproduce the issue. >> >> The issue arises when the method exceeds a global recompilation threshold before its per-reason trap counters stabilize. >> >> Current thresholds: >> - Recompilation Limit (too_many_recompiles): >> Condition: decompile_count() >= (PerMethodRecompilationCutoff / 2) + 1 >> Default: 201 (derived from default PerMethodRecompilationCutoff = 400). >> - Specific Trap Limits (too_many_traps): >> Checks if the trap count for a specific reason exceeds: >> PerMethodTrapLimit (Default: 100) - for Reason_unstable_if, Reason_unstable_fused_if, etc. >> PerMethodSpecTrapLimit (Default: 5000) - for Reason_speculate_class_check, Reason_speculate_null_check, etc. >> >> With the given defaults, if the only reason for the method recompilation is unstable_if, the system stabilizes after 100 traps (PerMethodTrapLimit). However, if the method experiences traps and recompilations for different reasons, the total number of recompilations can exceed 200 before hitting the limit for unstable_if traps. This triggers Action_none and causes the deopt storm.
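The interaction between the two limits above can be illustrated with a toy Java model. The model relies on hypothetical assumptions: every trap causes exactly one recompilation, and trap reasons rotate evenly among `reasons` kinds, only one of which is `unstable_if`. It uses the defaults and the `decompile_count() >= (PerMethodRecompilationCutoff / 2) + 1` condition quoted in the description. This is not HotSpot code, and it glosses over the exact comparison HotSpot uses for the per-reason trap limit.

```java
// Toy model of the two limits quoted in the PR description; not HotSpot code.
final class DeoptThresholdModel {
    static final int PER_METHOD_RECOMPILATION_CUTOFF = 400; // quoted default
    static final int PER_METHOD_TRAP_LIMIT = 100;           // quoted default for unstable_if

    // decompile_count() >= (PerMethodRecompilationCutoff / 2) + 1, i.e. 201 by default
    static boolean tooManyRecompiles(int decompileCount) {
        return decompileCount >= PER_METHOD_RECOMPILATION_CUTOFF / 2 + 1;
    }

    // Simplification: the per-reason limit counts as reached once the count hits it.
    static boolean tooManyUnstableIfTraps(int unstableIfTraps) {
        return unstableIfTraps >= PER_METHOD_TRAP_LIMIT;
    }

    /**
     * Assumes one recompilation per trap, with trap reasons rotating among
     * `reasons` kinds, only one of which is unstable_if. Returns true if the
     * global recompile cutoff is reached before unstable_if hits its own limit.
     */
    static boolean cutoffReachedFirst(int reasons) {
        if (reasons < 1) {
            throw new IllegalArgumentException("reasons must be >= 1");
        }
        int decompiles = 0;
        int unstableIf = 0;
        while (true) {
            if (tooManyUnstableIfTraps(unstableIf)) {
                return false; // treated as stable: no more unstable_if speculation
            }
            if (tooManyRecompiles(decompiles)) {
                return true;  // Action_reinterpret would be replaced by Action_none
            }
            if (decompiles % reasons == 0) {
                unstableIf++; // this trap happened to be an unstable_if trap
            }
            decompiles++;
        }
    }

    public static void main(String[] args) {
        System.out.println(cutoffReachedFirst(1)); // false: only unstable_if, stable at 100 traps
        System.out.println(cutoffReachedFirst(3)); // true: only 67 unstable_if traps at 201 recompiles
    }
}
```

With a single trap reason, the `unstable_if` limit is reached after 100 traps, long before the cutoff of 201 recompilations. With three reasons in rotation, only 67 of the first 201 traps are `unstable_if` traps, so the cutoff is hit first. This is the situation the description identifies as the trigger for `Action_none` and the deoptimization storm.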
>> >> The proposal is a minimal change in GraphKit::uncommon_trap: apply the same `too_many_recompiles` threshold inside `Parse::path_is_suitable_for_uncommon_trap` - this ensures that on the final recompilation C2 gets a hint not to speculate on untaken branches anymore. >> >> As an alternative solution, we can revisit GraphKit::uncommon_trap. This "Temporary fix" has persisted in the codebase for 17 years, so it is probably time to change it as well. Any comments are we... > > Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > using too_many_traps_or_recompiles. adding DeoptStorm jtreg test test/hotspot/jtreg/compiler/uncommontrap/DeoptStorm.java line 57: > 55: "-Xlog:deoptimization=debug", > 56: className, "dummy"}; > 57: ProcessBuilder pb = ProcessTools.createLimitedTestJavaProcessBuilder(procArgs); Suggestion: ProcessBuilder pb = ProcessTools.createLimitedTestJavaProcessBuilder(procArgs); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28966#discussion_r2742895741 From psandoz at openjdk.org Thu Jan 29 18:15:43 2026 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 29 Jan 2026 18:15:43 GMT Subject: RFR: 8376187: [VectorAPI] Define new lane type constants and pass them to intrinsic entries [v3] In-Reply-To: References: Message-ID: On Thu, 29 Jan 2026 12:58:57 GMT, Jatin Bhateja wrote: >> As per [discussions ](https://github.com/openjdk/jdk/pull/28002#issuecomment-3789507594) on JDK-8370691 pull request, splitting out a portion of PR#28002 into a separate patch in preparation for Float16 vector API support. >> >> The patch adds new lane type constants and passes them to vector intrinsic entry points. >> >> All existing Vector API jtreg tests are passing with the patch. >> >> Kindly review and share your feedback.
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions src/hotspot/share/prims/vectorSupport.hpp line 146: > 144: LT_SHORT = 9, > 145: LT_INT = 10, > 146: LT_LONG = 11 Are the values designed to be in sync with the `BasicType` values where the lane type and the basic type are the same? If so, we should call this out via explicit assignment. Otherwise, I think we should adjust the values (which may require some adjustment elsewhere, e.g., VectorOperators.java). src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java line 152: > 150: public static final int MODE_BITS_COERCED_LONG_TO_MASK = 1; > 151: > 152: // BasicType codes, for primitives only: This comment needs to be updated. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Byte128Vector.java line 544: > 542: public byte laneHelper(int i) { > 543: return (byte) VectorSupport.extract( > 544: VCLASS, LT_BYTE, VLENGTH, Can we declare a static final field `LANE_TYPE_ORDINAL` (or `LANE_TYPE_ID`, see comment on `LaneType` as the naming is important) and use that consistently like we already do for `ETYPE`? src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LaneType.java line 88: > 86: final String printName; > 87: final char typeChar; // one of "BSILFD" > 88: final int laneType; // lg(size/8) | (kind=='F'?4:kind=='I'?8) We need to change the name of this field to more clearly distinguish between it and the class name. If we can change the values of `LT_*` and align them with the enum ordinal values then we can call it `laneTypeOrdinal` and consistently use that, then we don't likely need the `LT_*` constants. If the values need to align with `BasicType` values then it might be better called `laneTypeIdentifier` or `laneTypeId`.
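As a quick check on the alignment question above: the encoding in the quoted `LaneType.java` field comment reproduces HotSpot's `BasicType` numbering for the six numeric primitives (T_FLOAT = 6, T_DOUBLE = 7, T_BYTE = 8, T_SHORT = 9, T_INT = 10, T_LONG = 11), which is consistent with `LT_SHORT = 9`, `LT_INT = 10` and `LT_LONG = 11` in `vectorSupport.hpp`. The sketch below only evaluates that formula; the class name is hypothetical, and because the comment's ternary has no final branch, it assumes 0 for kinds other than 'F' and 'I'.

```java
// Evaluates the encoding from the LaneType.java field comment:
//   lg(size/8) | (kind=='F'?4:kind=='I'?8)
// Hypothetical helper; 0 is assumed for the ternary's missing final branch.
class LaneTypeCodes {
    static int code(char kind, int bitSize) {
        int lgBytes = Integer.numberOfTrailingZeros(bitSize / 8); // log2 of the size in bytes
        int kindBits = (kind == 'F') ? 4 : (kind == 'I') ? 8 : 0;
        return lgBytes | kindBits;
    }

    public static void main(String[] args) {
        String[] names = {"byte", "short", "int", "long", "float", "double"};
        char[] kinds = {'I', 'I', 'I', 'I', 'F', 'F'};
        int[] bitSizes = {8, 16, 32, 64, 32, 64};
        for (int i = 0; i < names.length; i++) {
            // Prints 8, 9, 10, 11, 6, 7: the BasicType codes T_BYTE..T_LONG, T_FLOAT, T_DOUBLE.
            System.out.println(names[i] + " -> " + code(kinds[i], bitSizes[i]));
        }
    }
}
```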
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LaneType.java line 197: > 195: /*package-private*/ > 196: @ForceInline > 197: static LaneType ofBasicType(int bt) { The method name and argument need updating. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2742857369 PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2742850134 PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2742840498 PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2742894994 PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2742873650 From vlivanov at openjdk.org Thu Jan 29 21:27:20 2026 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 29 Jan 2026 21:27:20 GMT Subject: RFR: 8373633: C2: Use interface receiver type to improve CHA decisions [v2] In-Reply-To: <5HKr1M1uRVSk4ZINg8AgVegHEe5uFq7GHL6CtGCpFGs=.0169d67f-2dfd-4eee-b897-3f9115d30b54@github.com> References: <6Ya3MBac7hXnBWBhEa1cqbdtW33T7C9phyqq-WsvhFo=.f5afd90e-e104-420a-9536-a8f54a38dd66@github.com> <5HKr1M1uRVSk4ZINg8AgVegHEe5uFq7GHL6CtGCpFGs=.0169d67f-2dfd-4eee-b897-3f9115d30b54@github.com> Message-ID: On Thu, 29 Jan 2026 12:22:33 GMT, Quan Anh Mai wrote: >> Even better, since these `_interfaces` are trusted, we don't need to emit a runtime check for the type. > > Come to think of it, we can use `interface->unique_implementor()` similar to how we use `ik->unique_concrete_subklass()` in `TypeOopPtr::make_from_klass_common`, that is to tighten the `TypeOopPtr` at the time of creation. What makes it more complicated is the requirement to return a subtype of declared interface. `_interfaces` contain a closure of a set of interfaces. A dependency on an unrelated interface (in the context of the call site) is not enough to ensure correctness of interface->virtual strength reduction. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28811#discussion_r2743614851 From duke at openjdk.org Thu Jan 29 22:04:48 2026 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 29 Jan 2026 22:04:48 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache [v9] In-Reply-To: References: Message-ID: <4B04_CHU7c80isClKUBlLeoD_V9eVAEUENUC0FZp_Xo=.f3b2e176-3106-47dc-b7a7-1bef56859b0f@github.com> > ### Summary > This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. > > ### Description > The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. > > Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. > > The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. 
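The selection step described above can be sketched as a greedy pass over the sample counts. This is an illustrative Java model only: `selectHot` and the method names in the usage are hypothetical, it ignores callees, relocation, and locking, and it stops as soon as the selected methods cover at least `sampleRatio` of all samples (the exact boundary comparison used by the grouper is an assumption).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative model of the hot-method selection step; names are hypothetical.
class HotMethodSelection {
    // Picks methods in descending order of sample count until the picked set
    // accounts for at least `sampleRatio` of all samples.
    static List<String> selectHot(Map<String, Integer> samples, double sampleRatio) {
        long total = 0;
        for (int count : samples.values()) {
            total += count;
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(samples.entrySet());
        ranked.sort((x, y) -> Integer.compare(y.getValue(), x.getValue())); // hottest first

        List<String> selected = new ArrayList<>();
        long covered = 0;
        for (Map.Entry<String, Integer> entry : ranked) {
            if (total == 0 || (double) covered / total >= sampleRatio) {
                break; // the hot set already covers the requested fraction
            }
            selected.add(entry.getKey());
            covered += entry.getValue();
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Integer> samples = Map.of("a", 50, "b", 25, "c", 15, "d", 10);
        System.out.println(selectHot(samples, 0.8)); // [a, b, c]
    }
}
```

With the default ratio of 0.8, three methods are selected here, since the two hottest cover only 75% of the samples.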
> > The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). > > Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. > > ### Performance > See https://github.com/openjdk/jdk/pull/28934#issuecomment-3820151693 for performance results. > > This was also tested internally with a real workload and showed up to a 20% latency reduction with a large CodeCache (512 MB). > > ### Testing > * CodeCache tests have been updated to cover... 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Add more configuration flags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27858/files - new: https://git.openjdk.org/jdk/pull/27858/files/b5a5c71b..6933c0a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=07-08 Stats: 25 lines in 3 files changed: 16 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/27858.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27858/head:pull/27858 PR: https://git.openjdk.org/jdk/pull/27858 From duke at openjdk.org Thu Jan 29 22:04:56 2026 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 29 Jan 2026 22:04:56 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache [v8] In-Reply-To: References: Message-ID: On Thu, 29 Jan 2026 00:24:33 GMT, Chad Rakoczy wrote: >> ### Summary >> This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. >> >> ### Description >> The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. >> >> Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. >> >> The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. 
The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. >> >> The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). >> >> Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. >> >> ### Performance >> See https://github.com/openjdk/jdk/pull/28934#issuecomment-3820151693 for performance results. >> >> This was also tested internally with a real workload and showed up to a 20% latency reduction with a large CodeCache (512 MB). >> >> ### Testing >> * ... > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 30 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into JDK-8326205 > - Fix test failure > - Fix builds > - Fix merge > - Merge remote-tracking branch 'origin/master' into JDK-8326205 > - Add check for full HotCodeHeap > - Add HotCodeGrouperMoveFunction test > - Add StessHotCodeGrouper test > - Update blob checks > - Merge fix > - ... and 20 more: https://git.openjdk.org/jdk/compare/1e1e132d...b5a5c71b I created an agent that fragments the code cache to better demonstrate the benefits of this feature: https://github.com/openjdk/jdk/pull/28934 ------------- PR Comment: https://git.openjdk.org/jdk/pull/27858#issuecomment-3820647704 From kvn at openjdk.org Thu Jan 29 23:23:11 2026 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 29 Jan 2026 23:23:11 GMT Subject: RFR: 8376604: C2: EA should assert is_oop_field for AddP with oop outs In-Reply-To: References: Message-ID: <-NaEIiwynTsmvpaBQKFW-37Z7RSk3nNEFtG7pc-h8L0=.63d02e6c-3eec-4162-8e64-ade7df2ae723@github.com> On Wed, 28 Jan 2026 18:30:15 GMT, Aleksey Shipilev wrote: > Split out of [JDK-8376514](https://bugs.openjdk.org/browse/JDK-8376514) in Valhalla: I think we need to verify more thoroughly that if we reply is_oop_field = false for AddP, then there are no nodes that we feed into oops. We handle it pretty well in various branches in the method already, and we "just" need to check it at the end. Valhalla catches fire on that post-condition check, tracked in [JDK-8376514](https://bugs.openjdk.org/browse/JDK-8376514). > > Cleans up the code a bit as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules`, 20x, no failures > - [x] Linux x86_64 server fastdebug, `hotspot_compiler` LGTM ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29468#pullrequestreview-3725490397 From dlong at openjdk.org Fri Jan 30 01:56:18 2026 From: dlong at openjdk.org (Dean Long) Date: Fri, 30 Jan 2026 01:56:18 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v5] In-Reply-To: References: Message-ID: <5gMW8_l-dEDamvRlro5___9bwOZ-XcMGS1q7hL-E3pI=.8a5dee77-63b0-4744-903f-92255da8d4f6@github.com> On Wed, 7 Jan 2026 09:42:22 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is extracted from #28570 , there are 2 issues here: >> >> - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. >> - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage. >> >> Please kindly review, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into intrinsicsadrtype > - copyright year > - Merge branch 'master' into intrinsicsadrtype > - consolidate the memory effect into a function > - Use MemBar instead of widening the intrinsic memory > - Fix Shenandoah > - Fix memory around intrinsics nodes So after looking at this PR I have learned that C2 can control reordering of memory operations in at least 3 ways: anti-dependencies, memory slices, or membars. Are there any rules of thumb on which is best to use? Using a membar seems the most conservative but probably allows fewer optimizations. By the way, I see that LibraryCallKit::inline_encodeISOArray and the corresponding Java method do pretty much the same thing as compress. So I tried adding a test for it in TestAntiDependency.java. But to my surprise, it passes, even without the fixes in this PR. I would expect it to fail, because the existing code uses TypeAryPtr::BYTES, so how does it prevent the movement of a char[] store in the test? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28789#issuecomment-3821336640 From qamai at openjdk.org Fri Jan 30 02:08:12 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 30 Jan 2026 02:08:12 GMT Subject: RFR: 8373633: C2: Use interface receiver type to improve CHA decisions [v2] In-Reply-To: References: <6Ya3MBac7hXnBWBhEa1cqbdtW33T7C9phyqq-WsvhFo=.f5afd90e-e104-420a-9536-a8f54a38dd66@github.com> <5HKr1M1uRVSk4ZINg8AgVegHEe5uFq7GHL6CtGCpFGs=.0169d67f-2dfd-4eee-b897-3f9115d30b54@github.com> Message-ID: On Thu, 29 Jan 2026 21:24:05 GMT, Vladimir Ivanov wrote: >> Come to think of it, we can use `interface->unique_implementor()` similar to how we use `ik->unique_concrete_subklass()` in `TypeOopPtr::make_from_klass_common`, that is to tighten the `TypeOopPtr` at the time of creation. > > What makes it more complicated is the requirement to return a subtype of declared interface.
`_interfaces` contain a closure of a set of interfaces. A dependency on an unrelated interface (in the context of the call site) is not enough to ensure correctness of interface->virtual strength reduction. Can you use `ciInstanceKlass::unique_implementor` during the creation of the `TypeInstPtr`, will this work? diff --git a/src/hotspot/share/opto/type.cpp b/src/hotspot/share/opto/type.cpp index eb825b81a93..7e395fca743 100644 --- a/src/hotspot/share/opto/type.cpp +++ b/src/hotspot/share/opto/type.cpp @@ -3702,6 +3702,13 @@ const TypeOopPtr* TypeOopPtr::make_from_klass_common(ciKlass* klass, bool klass_ deps->assert_abstract_with_unique_concrete_subtype(ik, sub); klass = ik = sub; klass_is_exact = sub->is_final(); + } else if (ik->is_interface() && interface_handling == trust_interfaces) { + sub = ik->unique_implementor(); + if (sub != nullptr) { + deps->assert_unique_implementor(ik, sub); + klass = ik = sub; + klass_is_exact = sub->is_final(); + } } } if (!klass_is_exact && try_for_exact && deps != nullptr && ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28811#discussion_r2744306281 From qamai at openjdk.org Fri Jan 30 02:11:40 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 30 Jan 2026 02:11:40 GMT Subject: RFR: 8373633: C2: Use interface receiver type to improve CHA decisions [v2] In-Reply-To: References: <6Ya3MBac7hXnBWBhEa1cqbdtW33T7C9phyqq-WsvhFo=.f5afd90e-e104-420a-9536-a8f54a38dd66@github.com> <5HKr1M1uRVSk4ZINg8AgVegHEe5uFq7GHL6CtGCpFGs=.0169d67f-2dfd-4eee-b897-3f9115d30b54@github.com> Message-ID: On Fri, 30 Jan 2026 02:05:57 GMT, Quan Anh Mai wrote: >> What makes it more complicated is the requirement to return a subtype of declared interface. `_interfaces` contain a closure of a set of interfaces. A dependency on an unrelated interface (in the context of the call site) is not enough to ensure correctness of interface->virtual strength reduction. 
> > Can you use `ciInstanceKlass::unique_implementor` during the creation of the `TypeInstPtr`, will this work? > > diff --git a/src/hotspot/share/opto/type.cpp b/src/hotspot/share/opto/type.cpp > index eb825b81a93..7e395fca743 100644 > --- a/src/hotspot/share/opto/type.cpp > +++ b/src/hotspot/share/opto/type.cpp > @@ -3702,6 +3702,13 @@ const TypeOopPtr* TypeOopPtr::make_from_klass_common(ciKlass* klass, bool klass_ > deps->assert_abstract_with_unique_concrete_subtype(ik, sub); > klass = ik = sub; > klass_is_exact = sub->is_final(); > + } else if (ik->is_interface() && interface_handling == trust_interfaces) { > + sub = ik->unique_implementor(); > + if (sub != nullptr) { > + deps->assert_unique_implementor(ik, sub); > + klass = ik = sub; > + klass_is_exact = sub->is_final(); > + } > } > } > if (!klass_is_exact && try_for_exact && deps != nullptr && I would imagine it has this effect: interface I1 {} interface I2 { int get(); } class K1 implements I1, I2 {} class K2 implements I2 {} Object v; I1 i = (I1) v; --> Then it can be inferred that i is an instance of type K1 int x = i.I2::get(); --> There is no need to do devirtualization here ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28811#discussion_r2744312726 From dlong at openjdk.org Fri Jan 30 02:15:07 2026 From: dlong at openjdk.org (Dean Long) Date: Fri, 30 Jan 2026 02:15:07 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v5] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 09:42:22 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is extracted from #28570 , there are 2 issues here: >> >> - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. 
Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. >> - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage. >> >> Please kindly review, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into intrinsicsadrtype > - copyright year > - Merge branch 'master' into intrinsicsadrtype > - consolidate the memory effect into a function > - Use MemBar instead of widening the intrinsic memory > - Fix Shenandoah > - Fix memory around intrinsics nodes Dumb question: why are these intrinsic nodes not implemented as MemNodes? > For nodes such as StrInflatedCopyNode, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case Why is that? > so we should fix it by making the nodes kill all the memory they consume. Why can't we use MergeMem and memory slices/aliases like regular load and store? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28789#issuecomment-3821385710 From qamai at openjdk.org Fri Jan 30 02:42:51 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 30 Jan 2026 02:42:51 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v6] In-Reply-To: References: Message-ID: <60IhfWbS5WwHkue0ET5NIOEhoJ349u3jK54PkMSSme4=.0f1485b0-2682-40d2-8e8d-ede4e50a0221@github.com> > Hi, > > This is extracted from #28570 , there are 2 issues here: > > - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. > - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage. > > Please kindly review, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains nine commits: - Add test case for StringCoding::implEncodeAsciiArray - Merge branch 'master' into intrinsicsadrtype - Merge branch 'master' into intrinsicsadrtype - copyright year - Merge branch 'master' into intrinsicsadrtype - consolidate the memory effect into a function - Use MemBar instead of widening the intrinsic memory - Fix Shenandoah - Fix memory around intrinsics nodes ------------- Changes: https://git.openjdk.org/jdk/pull/28789/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28789&range=05 Stats: 341 lines in 6 files changed: 227 ins; 13 del; 101 mod Patch: https://git.openjdk.org/jdk/pull/28789.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28789/head:pull/28789 PR: https://git.openjdk.org/jdk/pull/28789 From dlong at openjdk.org Fri Jan 30 02:42:53 2026 From: dlong at openjdk.org (Dean Long) Date: Fri, 30 Jan 2026 02:42:53 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v5] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 09:42:22 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is extracted from #28570 , there are 2 issues here: >> >> - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. >> - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage. >> >> Please kindly review, thanks a lot. 
> > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge branch 'master' into intrinsicsadrtype > - copyright year > - Merge branch 'master' into intrinsicsadrtype > - consolidate the memory effect into a function > - Use MemBar instead of widening the intrinsic memory > - Fix Shenandoah > - Fix memory around intrinsics nodes This may be unrelated, but I checked to see if we treat Op_EncodeISOArray the same as Op_StrCompressedCopy everywhere. In two places in `ConnectionGraph::split_unique_types`, we treat them differently. For both, we look at in(MemNode::Memory), but for Op_EncodeISOArray we also look at use->in(3). I don't understand this code well enough to decide if this is a missing optimization or a correctness issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28789#issuecomment-3821462990 From qamai at openjdk.org Fri Jan 30 02:55:43 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 30 Jan 2026 02:55:43 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v5] In-Reply-To: References: Message-ID: On Fri, 30 Jan 2026 02:38:57 GMT, Dean Long wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
For both we look at in(MemNode::Memory), but for Op_EncodeISOArray we also look at use->in(3). I don't understand this code well enough to decide if this a missing optimization or a correctness issue. @dean-long Thanks for taking a look. > So I tried adding a test for it in TestAntiDependency.java. But to my surprise, it passes, even without the fixes in this PR I have added a test for this method. If it does not fail then adding `-XX:+StressGCM -XX:+StressLCM` may help. > Dumb question: why are these intrinsic nodes not implemented as MemNodes? I think it is because only `LoadNode` and `StoreNode` are `MemNode`, even `LoadStoreNode` does not extend `MemNode`. > > For nodes such as StrInflatedCopyNode, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case > > Why is that? During `PhaseIdealLoop::get_late_ctrl`, we only check the anti-dependency when a node returns `true` for `is_Load()`: if (n->is_Load() && LCA != early) { LCA = get_late_ctrl_with_anti_dep(n->as_Load(), early, LCA); } During `PhaseCFG::schedule_late`, we only check the anti-dependency when a node has the flag `Flag_needs_anti_dependence_check` set. bool Node::needs_anti_dependence_check() const { if (req() < 2 || (_flags & Flag_needs_anti_dependence_check) == 0) { return false; } return in(1)->bottom_type()->has_memory(); } We may fix these places, but since it is a really rare occurrence that a node consumes some memory and produces some but the latter is different from the former, so it is more reasonable to fix the graph at these nodes. >> so we should fix it by making the nodes kill all the memory they consume. > > Why can't we use MergeMem and memory slices/aliases like regular load and store? Thanks to Roland's suggestion, now it only kills the 2 slices it concerns with and not the whole memory state. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28789#issuecomment-3821512600 From qamai at openjdk.org Fri Jan 30 03:00:55 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 30 Jan 2026 03:00:55 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v5] In-Reply-To: References: Message-ID: On Fri, 30 Jan 2026 02:38:57 GMT, Dean Long wrote: > This may be unrelated, but I checked to see if we treat `Op_EncodeISOArray` the same as `Op_StrCompressedCopy` everywhere. In two places in `ConnectionGraph::split_unique_types`, we treat them differently. For both, we look at `in(MemNode::Memory)`, but for `Op_EncodeISOArray` we also look at `use->in(3)`. I don't understand this code well enough to decide if this is a missing optimization or a correctness issue. I believe it is because before this change, `EncodeISOArray` did not consume the memory of the destination like `StrCompressed`, so it may miss being pushed on the worklist. As a result, checking for `in(3)` ensures the node is visited. After this change, `EncodeISOArray` correctly consumes the memory of its destination, so that check becomes unnecessary. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28789#issuecomment-3821535212 From mhaessig at openjdk.org Fri Jan 30 07:20:50 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 30 Jan 2026 07:20:50 GMT Subject: RFR: 8375640: MinMaxIdentity test fails on some machines after 8373134 In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 05:24:12 GMT, Galder Zamarreño wrote: > Adjust test expectations to deal with environments where `Min/Max[F|D]` are not produced. To fix this, I've followed the same patterns used in `compiler.c2.irTests.TestMinMaxIdentities`. > > I have tested this on x86_64 with `-XX:UseAVX=0`. Testing passed. ------------- Marked as reviewed by mhaessig (Committer).
PR Review: https://git.openjdk.org/jdk/pull/29438#pullrequestreview-3726755307 From jbhateja at openjdk.org Fri Jan 30 07:35:43 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 30 Jan 2026 07:35:43 GMT Subject: RFR: 8376187: [VectorAPI] Define new lane type constants and pass them to intrinsic entries [v4] In-Reply-To: References: Message-ID: > As per [discussions ](https://github.com/openjdk/jdk/pull/28002#issuecomment-3789507594) on JDK-8370691 pull request, splitting out a portion of PR#28002 into a separate patch in preparation for Float16 vector API support. > > The patch adds new lane type constants and passes them to vector intrinsic entry points. > > All existing Vector API jtreg tests are passing with the patch. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29481/files - new: https://git.openjdk.org/jdk/pull/29481/files/d81035fd..ff73dc3d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29481&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29481&range=02-03 Stats: 858 lines in 48 files changed: 290 ins; 13 del; 555 mod Patch: https://git.openjdk.org/jdk/pull/29481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29481/head:pull/29481 PR: https://git.openjdk.org/jdk/pull/29481 From jbhateja at openjdk.org Fri Jan 30 07:35:46 2026 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 30 Jan 2026 07:35:46 GMT Subject: RFR: 8376187: [VectorAPI] Define new lane type constants and pass them to intrinsic entries [v3] In-Reply-To: References: Message-ID: On Thu, 29 Jan 2026 18:08:03 GMT, Paul Sandoz wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > >
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LaneType.java line 88: > >> 86: final String printName; >> 87: final char typeChar; // one of "BSILFD" >> 88: final int laneType; // lg(size/8) | (kind=='F'?4:kind=='I'?8) > > We need to rename this field so that it is more clearly distinguished from the class name. > > If we can change the values of `LT_*` and align them with the enum ordinal values, then we can call it `laneTypeOrdinal` and use that consistently, and we likely don't need the `LT_*` constants. If the values need to align with `BasicType` values, then it might be better called `laneTypeIdentifier` or `laneTypeId`. Thanks @PaulSandoz, I have incorporated your comments. It is still useful to keep the new LT_* constants, because it's better to pass named constants to intrinsic entries than numeric values. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2745013458 From dbriemann at openjdk.org Fri Jan 30 08:26:23 2026 From: dbriemann at openjdk.org (David Briemann) Date: Fri, 30 Jan 2026 08:26:23 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v5] In-Reply-To: References: <_3bZcGjeZKvRWXSPCiLZZzrCdlcKc4l0Orw25208VyM=.1330ef64-1048-4c45-b62e-09ea47c7ef2a@github.com> Message-ID: On Thu, 29 Jan 2026 17:48:49 GMT, Richard Reingruber wrote: > Very nice and clean now. Thanks for putting the work into it. Cheers, Richard. Thank you @reinrich for your very thorough review. I didn't find a way to use $$cop$constant on PPC but the code now should be much better to understand.
------------- PR Comment: https://git.openjdk.org/jdk/pull/29281#issuecomment-3822471638 From shade at openjdk.org Fri Jan 30 08:34:42 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 30 Jan 2026 08:34:42 GMT Subject: RFR: 8376604: C2: EA should assert is_oop_field for AddP with oop outs In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 18:30:15 GMT, Aleksey Shipilev wrote: > Split out of [JDK-8376514](https://bugs.openjdk.org/browse/JDK-8376514) in Valhalla: I think we need to verify more thoroughly that if we reply is_oop_field = false for AddP, then there are no nodes that we feed into oops. We handle it pretty well in various branches in the method already, and we "just" need to check it at the end. Valhalla catches fire on that post-condition check, tracked in [JDK-8376514](https://bugs.openjdk.org/browse/JDK-8376514). > > Cleans up the code a bit as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules`, 20x, no failures > - [x] Linux x86_64 server fastdebug, `hotspot_compiler` Thank you! Here goes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29468#issuecomment-3822501155 From shade at openjdk.org Fri Jan 30 08:34:44 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 30 Jan 2026 08:34:44 GMT Subject: Integrated: 8376604: C2: EA should assert is_oop_field for AddP with oop outs In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 18:30:15 GMT, Aleksey Shipilev wrote: > Split out of [JDK-8376514](https://bugs.openjdk.org/browse/JDK-8376514) in Valhalla: I think we need to verify more thoroughly that if we reply is_oop_field = false for AddP, then there are no nodes that we feed into oops. We handle it pretty well in various branches in the method already, and we "just" need to check it at the end. Valhalla catches fire on that post-condition check, tracked in [JDK-8376514](https://bugs.openjdk.org/browse/JDK-8376514). > > Cleans up the code a bit as well. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules`, 20x, no failures > - [x] Linux x86_64 server fastdebug, `hotspot_compiler` This pull request has now been integrated. Changeset: e6437264 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/e6437264d5e6d4aad23430b7dbdf574a12b8f57b Stats: 20 lines in 2 files changed: 10 ins; 6 del; 4 mod 8376604: C2: EA should assert is_oop_field for AddP with oop outs Reviewed-by: qamai, kvn ------------- PR: https://git.openjdk.org/jdk/pull/29468 From jsikstro at openjdk.org Fri Jan 30 08:38:23 2026 From: jsikstro at openjdk.org (Joel Sikström) Date: Fri, 30 Jan 2026 08:38:23 GMT Subject: RFR: 8376777: Consistent use of nonstatic instead of non_static in ci files Message-ID: Hello, While working towards more consistent names for nonstatic fields in Valhalla (see [JDK-8376652](https://bugs.openjdk.org/browse/JDK-8376652)), I noticed a few places in the ci files that use non_static instead of nonstatic. I suggest we be consistent here and use nonstatic, which is the spelling favored in HotSpot.
Testing: * GHA * Oracle's tier1 ------------- Commit messages: - 8376777: Consistent use of nonstatic instead of non_static in ci files Changes: https://git.openjdk.org/jdk/pull/29500/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29500&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8376777 Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/29500.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29500/head:pull/29500 PR: https://git.openjdk.org/jdk/pull/29500 From epeter at openjdk.org Fri Jan 30 08:48:31 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 30 Jan 2026 08:48:31 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: <-GqrRj0gaXDW-pdZzrvhmLiVR2WDINDoSfM2cnzvFvg=.6b7916dc-eab7-41de-9b7e-37a15a769c78@github.com> References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <-GqrRj0gaXDW-pdZzrvhmLiVR2WDINDoSfM2cnzvFvg=.6b7916dc-eab7-41de-9b7e-37a15a769c78@github.com> Message-ID: <_x35U-e0JnJ4enqN2xdnFXunAup2AIDw8_yrht9unt0=.5c772df5-0bd9-4df4-9df9-7d1e1e0df89b@github.com> On Thu, 29 Jan 2026 16:27:21 GMT, Quan Anh Mai wrote: > > But then we'd still always pay the price of the drain loop, even if it then gets folded away. Not great. > > May I ask why? We only need to clone the post loop after we decide to vectorize. At which point it becomes the drain loop, otherwise there is no need for the clone. Ah, I see. So you would do pre/main/post, immediately run the vectorizer on main, if we are about to succeed, clone the post-loop down (possibly multiple times), so we then have some drain loops and a post loop. Then apply the VTransform to the main loop and the drain loops, but with different "vectorization unrolling factors". @merykitty Is that what you are suggesting? 
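To make the pre/main/drain/post loop shape in this discussion concrete, here is a minimal scalar Java sketch for an array sum. It is an illustration under stated assumptions, not how C2 emits these loops: `VLEN` (lanes per vector step) and `UNROLL` (super-unroll factor of the main loop) are hypothetical constants, plain inner loops stand in for vector operations, and the pre-loop "aligns" on the index rather than on an address.

```java
import java.util.Arrays;

/** Scalar model of a pre/main/drain/post loop split (illustrative, not C2 output). */
final class LoopShapes {
    static final int VLEN = 4;   // assumed lanes per "vector" step
    static final int UNROLL = 4; // assumed super-unroll factor of the main loop

    /** Sums a[from, to); counts[0..3] receive the elements handled by pre/main/drain/post. */
    static long sum(int[] a, int from, int to, int[] counts) {
        long s = 0;
        int i = from;
        // Pre-loop: scalar steps until i is a multiple of VLEN (stand-in for alignment).
        while (i < to && i % VLEN != 0) {
            s += a[i++];
            counts[0]++;
        }
        // Main loop: UNROLL vector steps of VLEN lanes per iteration.
        while (i + VLEN * UNROLL <= to) {
            for (int k = 0; k < VLEN * UNROLL; k++) {
                s += a[i + k];
            }
            i += VLEN * UNROLL;
            counts[1] += VLEN * UNROLL;
        }
        // Drain loop: single vector steps over what the unrolled main loop left behind.
        while (i + VLEN <= to) {
            for (int k = 0; k < VLEN; k++) {
                s += a[i + k];
            }
            i += VLEN;
            counts[2] += VLEN;
        }
        // Post loop: scalar tail of fewer than VLEN elements.
        while (i < to) {
            s += a[i++];
            counts[3]++;
        }
        return s;
    }

    public static void main(String[] args) {
        int[] a = new int[40];
        for (int j = 0; j < a.length; j++) {
            a[j] = j;
        }
        int[] counts = new int[4];
        long s = sum(a, 3, 40, counts);
        System.out.println(s + " " + Arrays.toString(counts)); // 777 [1, 32, 4, 0]
    }
}
```

With these assumed constants, summing `a[3..40)` sends 1 element to the pre-loop, 32 to the main loop, 4 to the drain loop, and none to the post loop. Without a drain loop, up to `VLEN * UNROLL - 1 = 15` trailing elements would fall through to the scalar post loop. With one, at most `VLEN - 1` do.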
------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3822563551 From qamai at openjdk.org Fri Jan 30 09:00:43 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 30 Jan 2026 09:00:43 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4] In-Reply-To: <_x35U-e0JnJ4enqN2xdnFXunAup2AIDw8_yrht9unt0=.5c772df5-0bd9-4df4-9df9-7d1e1e0df89b@github.com> References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> <-GqrRj0gaXDW-pdZzrvhmLiVR2WDINDoSfM2cnzvFvg=.6b7916dc-eab7-41de-9b7e-37a15a769c78@github.com> <_x35U-e0JnJ4enqN2xdnFXunAup2AIDw8_yrht9unt0=.5c772df5-0bd9-4df4-9df9-7d1e1e0df89b@github.com> Message-ID: On Fri, 30 Jan 2026 08:45:45 GMT, Emanuel Peter wrote: >>> But then we'd still always pay the price of the drain loop, even if it then gets folded away. Not great. >> >> May I ask why? We only need to clone the post loop after we decide to vectorize. At which point it becomes the drain loop, otherwise there is no need for the clone. > >> > But then we'd still always pay the price of the drain loop, even if it then gets folded away. Not great. >> >> May I ask why? We only need to clone the post loop after we decide to vectorize. At which point it becomes the drain loop, otherwise there is no need for the clone. > > Ah, I see. So you would do pre/main/post, immediately run the vectorizer on main, if we are about to succeed, clone the post-loop down (possibly multiple times), so we then have some drain loops and a post loop. Then apply the VTransform to the main loop and the drain loops, but with different "vectorization unrolling factors". @merykitty Is that what you are suggesting? @eme64 Can we clone the post loop after vectorization as well? My thought is that we vectorize and super unroll the main loop. 
At that point, the maximum iteration count of the post loop may become large enough that we clone the post loop and vectorize the first copy. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3822607084 From mhaessig at openjdk.org Fri Jan 30 09:04:56 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 30 Jan 2026 09:04:56 GMT Subject: Integrated: 8376781: Problemlist compiler/longcountedloops/TestLoopNestTooManyTraps.java Message-ID: <6nkA9QI_4JFIuMSy3jqW3szESCUDAK9jfAImBJjHJxc=.3cad8bcf-cb0d-4372-8daa-c3b6ca88cdff@github.com> This PR problemlists compiler/longcountedloops/TestLoopNestTooManyTraps.java to reduce noise in the CI until a fix lands. ------------- Commit messages: - Problemlist compiler/longcountedloops/TestLoopNestTooManyTraps.java Changes: https://git.openjdk.org/jdk/pull/29501/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29501&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8376781 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29501/head:pull/29501 PR: https://git.openjdk.org/jdk/pull/29501 From thartmann at openjdk.org Fri Jan 30 09:04:57 2026 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 30 Jan 2026 09:04:57 GMT Subject: Integrated: 8376781: Problemlist compiler/longcountedloops/TestLoopNestTooManyTraps.java In-Reply-To: <6nkA9QI_4JFIuMSy3jqW3szESCUDAK9jfAImBJjHJxc=.3cad8bcf-cb0d-4372-8daa-c3b6ca88cdff@github.com> References: <6nkA9QI_4JFIuMSy3jqW3szESCUDAK9jfAImBJjHJxc=.3cad8bcf-cb0d-4372-8daa-c3b6ca88cdff@github.com> Message-ID: On Fri, 30 Jan 2026 08:52:53 GMT, Manuel Hässig wrote: > This PR problemlists compiler/longcountedloops/TestLoopNestTooManyTraps.java to reduce noise in the CI until a fix lands. Good and trivial. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/29501#pullrequestreview-3727092064 From chagedorn at openjdk.org Fri Jan 30 09:04:58 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 30 Jan 2026 09:04:58 GMT Subject: Integrated: 8376781: Problemlist compiler/longcountedloops/TestLoopNestTooManyTraps.java In-Reply-To: <6nkA9QI_4JFIuMSy3jqW3szESCUDAK9jfAImBJjHJxc=.3cad8bcf-cb0d-4372-8daa-c3b6ca88cdff@github.com> References: <6nkA9QI_4JFIuMSy3jqW3szESCUDAK9jfAImBJjHJxc=.3cad8bcf-cb0d-4372-8daa-c3b6ca88cdff@github.com> Message-ID: On Fri, 30 Jan 2026 08:52:53 GMT, Manuel Hässig wrote: > This PR problemlists compiler/longcountedloops/TestLoopNestTooManyTraps.java to reduce noise in the CI until a fix lands. Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29501#pullrequestreview-3727094631 From mhaessig at openjdk.org Fri Jan 30 09:05:00 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 30 Jan 2026 09:05:00 GMT Subject: Integrated: 8376781: Problemlist compiler/longcountedloops/TestLoopNestTooManyTraps.java In-Reply-To: <6nkA9QI_4JFIuMSy3jqW3szESCUDAK9jfAImBJjHJxc=.3cad8bcf-cb0d-4372-8daa-c3b6ca88cdff@github.com> References: <6nkA9QI_4JFIuMSy3jqW3szESCUDAK9jfAImBJjHJxc=.3cad8bcf-cb0d-4372-8daa-c3b6ca88cdff@github.com> Message-ID: On Fri, 30 Jan 2026 08:52:53 GMT, Manuel Hässig wrote: > This PR problemlists compiler/longcountedloops/TestLoopNestTooManyTraps.java to reduce noise in the CI until a fix lands. Thank you for the quick reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/29501#issuecomment-3822600750 From mhaessig at openjdk.org Fri Jan 30 09:05:01 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 30 Jan 2026 09:05:01 GMT Subject: Integrated: 8376781: Problemlist compiler/longcountedloops/TestLoopNestTooManyTraps.java In-Reply-To: <6nkA9QI_4JFIuMSy3jqW3szESCUDAK9jfAImBJjHJxc=.3cad8bcf-cb0d-4372-8daa-c3b6ca88cdff@github.com> References: <6nkA9QI_4JFIuMSy3jqW3szESCUDAK9jfAImBJjHJxc=.3cad8bcf-cb0d-4372-8daa-c3b6ca88cdff@github.com> Message-ID: <_u1HJu-QteMLbU1t59U3bNSCTt-xftzUKbijlK_FdHU=.b1522546-c471-485c-b70b-2d6d4b4a0603@github.com> On Fri, 30 Jan 2026 08:52:53 GMT, Manuel Hässig wrote: > This PR problemlists compiler/longcountedloops/TestLoopNestTooManyTraps.java to reduce noise in the CI until a fix lands. This pull request has now been integrated. Changeset: 42370e22 Author: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/42370e22c5bc4ebd40fd500a2e6e9e07f0b8bcd8 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8376781: Problemlist compiler/longcountedloops/TestLoopNestTooManyTraps.java Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/29501 From rrich at openjdk.org Fri Jan 30 09:21:32 2026 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 30 Jan 2026 09:21:32 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v5] In-Reply-To: References: <_3bZcGjeZKvRWXSPCiLZZzrCdlcKc4l0Orw25208VyM=.1330ef64-1048-4c45-b62e-09ea47c7ef2a@github.com> Message-ID: On Fri, 30 Jan 2026 08:22:47 GMT, David Briemann wrote: > > Very nice and clean now. Thanks for putting the work into it. Cheers, Richard. > > Thank you @reinrich for your very thorough review. I didn't find a way to use $$cop$constant on PPC but the code now should be much better to understand. I think it doesn't work. Not even on aarch64.
It translates to a virtual call of `MachOper::constant()`, but adlc does not seem to override the method for `Bool` operands. Anyway, thanks for trying :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/29281#issuecomment-3822692292 From mchevalier at openjdk.org Fri Jan 30 09:22:44 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 30 Jan 2026 09:22:44 GMT Subject: RFR: 8375038: C2: Enforce that Ideal() returns the root of the subgraph if any change was made by checking the node hash [v2] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 17:14:22 GMT, Benoît Maillard wrote: >> This PR introduces an assert in `PhaseIterGVN` to check that `Ideal` actually returns something if the node was modified. >> >> ## Context >> >> In the description of `Node::Ideal` in `node.cpp`, we have: >> >>> If ANY change is made, it must return the root of the reshaped graph - even if the root is the same Node >> >> It is crucial that such changes do not go unnoticed and that they can propagate to other nodes. The current documentation also states: >> >>> Running with `-XX:VerifyIterativeGVN=1` checks >>> these invariants, although its too slow to have on by default. If you are >>> hacking an Ideal call, be sure to test with `-XX:VerifyIterativeGVN=1` >> >> However, `-XX:VerifyIterativeGVN=1` verifies that the `_in` and `_out` arrays are consistent, but does not verify the return value. >> >> This PR aims to enforce the return value invariant. It should also make regression testing of bugs caused by wrongly returning nullptr in `Ideal` easier, such as [JDK-8373251](https://bugs.openjdk.org/browse/JDK-8373251).
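The invariant and the proposed hash check can be illustrated with a toy model. The sketch below is hypothetical Java, not HotSpot C++: `ToyNode`, `IdealRule` and `IdealCheck` are made-up names, and `hash()` only loosely mirrors a GVN hash (opcode plus input identities). It records the hash before a rule runs and fails if the node changed while the rule returned `null`. According to the description, the real check is skipped for dead nodes; this toy has no notion of dead nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy IR node (hypothetical, not HotSpot's Node): opcode, unique id, input edges. */
final class ToyNode {
    private static int nextId = 0;
    final int id = nextId++;
    final String op;
    final List<ToyNode> in;

    ToyNode(String op, ToyNode... inputs) {
        this.op = op;
        this.in = new ArrayList<>(Arrays.asList(inputs));
    }

    /** Hash over the opcode and the identities of the inputs, in the spirit of a GVN hash. */
    int hash() {
        int h = op.hashCode();
        for (ToyNode n : in) {
            h = 31 * h + n.id;
        }
        return h;
    }
}

/** A toy "Ideal" rule: returns the (possibly unchanged) root if it changed anything, else null. */
interface IdealRule {
    ToyNode ideal(ToyNode n);
}

final class IdealCheck {
    /** Applies a rule and fails if it mutated the node while returning null ("no progress"). */
    static ToyNode transform(ToyNode n, IdealRule rule) {
        int before = n.hash();
        ToyNode result = rule.ideal(n);
        if (result == null && n.hash() != before) {
            throw new IllegalStateException(
                    "Ideal() changed " + n.op + "#" + n.id + " but returned null");
        }
        return result;
    }

    public static void main(String[] args) {
        ToyNode x = new ToyNode("ConI");
        ToyNode y = new ToyNode("ConI");
        ToyNode add = new ToyNode("AddI", x, y);
        // Correct rule: swaps the inputs and reports progress by returning the node itself.
        IdealRule good = n -> { Collections.swap(n.in, 0, 1); return n; };
        // Buggy rule: performs the same mutation but claims nothing changed.
        IdealRule bad = n -> { Collections.swap(n.in, 0, 1); return null; };
        System.out.println(transform(add, good) == add); // true
        try {
            transform(add, bad);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Returning the same node counts as reporting a change, which matches the quoted `node.cpp` rule ("even if the root is the same Node").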
>> >> ## Proposed Change >> >> In summary, this PR brings the following set of changes: >> - Add a new flag bit to `-XX:VerifyIterativeGVN` for verifying the return value of `Ideal` calls >> - Add an assert on the hash of nodes before and after `Ideal` in `PhaseIterGVN::transform_old` >> - Fix `Ideal` optimizations that would cause harness errors with testing on tier1 >> - Update the comments in the code to clarify the invariant and how to enforce it >> >> After consideration, I decided to only check the hash if the node is not dead. It seems there are many cases where the control node is dead, we propagate the information to all users with `kill_dead_code`, and end up returning `nullptr`. This is basically a mechanism to "speed up" the propagation (it would also happen normally via the usual IGVN worklist). This somewhat contradicts the "must return the root of the reshaped graph" invariant, but it seems to be common practice. >> >> In addition, I have decided to implement this as part of a new flag bit to `-XX:VerifyIterativeGVN` instead of an existing one, because there is a risk that it causes new failures in existing usages of the flag. >> >> This PR is meant to introduce the new check and fix the most "obvious" failures that the new flag would introduce in common scenarios, such as when running with `-version` on tier1. Since there are known issues caused by bad return values of `Ideal` (such as [JDK-8373251](https://bugs.openjdk.org/browse/... > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/node.cpp > > Co-authored-by: Roberto Castañeda Lozano Is it going to be added to testing? Maybe we can at least add it to https://github.com/openjdk/jdk/blob/42370e22c5bc4ebd40fd500a2e6e9e07f0b8bcd8/test/hotspot/jtreg/compiler/c2/TestVerifyIterativeGVN.java#L24-L37 There are starting to be a lot of sub-flags in this flag.
Would it be meaningful to merge the new one with https://github.com/openjdk/jdk/blob/42370e22c5bc4ebd40fd500a2e6e9e07f0b8bcd8/src/hotspot/share/opto/c2_globals.hpp#L702 since it's also about verifying whether something changed? That would mean fixing everything before merging this, alas. And if I'm fine with merged flags that enable more things than may be useful, I can also spam the `1`s for `VerifyIterativeGVN`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29421#issuecomment-3822695081 From mdoerr at openjdk.org Fri Jan 30 09:38:30 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 30 Jan 2026 09:38:30 GMT Subject: RFR: 8375640: MinMaxIdentity test fails on some machines after 8373134 In-Reply-To: References: Message-ID: On Thu, 29 Jan 2026 10:14:38 GMT, Galder Zamarreño wrote: >> Adjust test expectations to deal with environments where `Min/Max[F|D]` are not produced. To fix this, I've followed same patterns used in `compiler.c2.irTests.TestMinMaxIdentities`. >> >> I have tested this on x86_64 with `-XX:UseAVX=0`. > > Yeah, sure, happy with a second review. Skara marked it with 1 review; that's why I thought this was ready. @galderz: Do you want to update the Copyright year or should we ship it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29438#issuecomment-3822766213 From epeter at openjdk.org Fri Jan 30 10:03:56 2026 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 30 Jan 2026 10:03:56 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v17] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 04:32:59 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization.
This patch changes the behavior: it asks the backend whether a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted wherever packs requiring casts are detected. Currently, only narrowing casts are supported, since I wanted to reuse the existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done in a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>>
>>                                               Baseline                  Patch
>> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score   Error  Units  Improvement
>> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ± 3.535  ns/op  (3.56x)
>> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ± 1.166  ns/op  (4.15x)
>> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ± 1.055  ns/op  (8.25x)
>>
>> I've also added some IR tests, and they pass on my Linux x64 machine. Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: > > - Merge branch 'master' into vectorize-subword > - Apply changes from review > - Fix whitespace > - Update tests after merge, apply changes from review > - Merge from master > - Update tests, cleanup logic > - Merge branch 'master' into vectorize-subword > - Check for AVX2 for byte/long conversions > - Whitespace and benchmark tweak > - Address more comments, make test and benchmark more exhaustive > - ... and 11 more: https://git.openjdk.org/jdk/compare/fa1b1d67...641a3abc But this is just a very minor failure, so the review can continue alongside it.
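For context, here are Java loops of the shape this patch targets. Each element is loaded as one type and stored after a narrowing cast to a subword type, so the load pack and the store pack have different basic types, which is the case in which SuperWord previously bailed out. The method names mirror the benchmark names in the table, but the loop bodies are an assumption, not the benchmark source. Whatever vector code is emitted must preserve Java's narrowing semantics, which keep only the low-order bits.

```java
import java.util.Arrays;

/** Scalar loops with subword narrowing casts (bodies assumed, not the PR's benchmark source). */
final class SubwordNarrowing {
    static void intToByte(int[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];  // keeps the low 8 bits
        }
    }

    static void intToShort(int[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) src[i]; // keeps the low 16 bits
        }
    }

    static void shortToByte(short[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];  // keeps the low 8 bits
        }
    }

    public static void main(String[] args) {
        int[] src = {300, -1, 128};
        byte[] dst = new byte[src.length];
        intToByte(src, dst);
        System.out.println(Arrays.toString(dst)); // [44, -1, -128]
    }
}
```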
------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-3822869995 From shade at openjdk.org Fri Jan 30 12:28:25 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 30 Jan 2026 12:28:25 GMT Subject: RFR: 8375694: C2: Dead loop constructed with CastPP in late inlining Message-ID: Deeper CTW testing ([JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557)) often catches fire in the same place. @rwestrel was able to come up with the local reproducer and the prospective fix. This looks like a remaining compiler problem that blocks enabling wider CTW testing, so I took the patch for polishing and more testing. To quote Roland: "Late inlining is happening in a dead part of the graph. Finding dead subgraphs is expensive so there are some heuristics to avoid the work when possible. I think they need to be tweaked. Right now it's assumed safe when a Phi references the result of a Call. But with late inlining the call can go away and we can't tell what it's hiding. " So the fix is to exempt these cases from dead loop checks. 
Additional testing: - [x] Linux x86_64 server fastdebug, new test fails without the fix, passes with it - [x] Linux x86_64 server fastdebug, `hotspot_compiler` - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/29504/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29504&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375694 Stats: 84 lines in 4 files changed: 83 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29504/head:pull/29504 PR: https://git.openjdk.org/jdk/pull/29504 From shade at openjdk.org Fri Jan 30 12:35:28 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 30 Jan 2026 12:35:28 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code [v14] In-Reply-To: References: Message-ID: > We use CTW testing to make sure compilers behave well. But we compile code that is not executed at all, and since our inlining heuristics often look back at profiles, we end up not actually inlining very much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations. > > There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating up the improvements we have recently committed to mainline. And it still finds bugs. See the RFE for sample data. > > After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much that such runs are impractical in standard configurations; see data in RFE. We will enable some of that testing in special testing pipelines.
> > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allow stress infra to inline some more on top of that. > > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - JDK-8375694 fix - Fix ------------- Changes: https://git.openjdk.org/jdk/pull/26068/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=13 Stats: 60 lines in 7 files changed: 32 ins; 3 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/26068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068 PR: https://git.openjdk.org/jdk/pull/26068 From dfenacci at openjdk.org Fri Jan 30 12:39:01 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 30 Jan 2026 12:39:01 GMT Subject: RFR: 8376325: [IR Framework] Detect and report overloads [v2] In-Reply-To: References: Message-ID: On Thu, 29 Jan 2026 10:23:43 GMT, Marc Chevalier wrote: >> The IR framework should forbid not only overloads between test methods, but any overload of a test method, even if the other overloads are not test methods themselves. Indeed, the compiler directive file designates methods only by class name and method name, without the parameters. Something like: >> >> { >> match : "ir_framework.tests.BadOverloadedMethod::sameName", >> log : true, >> PrintIdeal : true, >> } >> >> This means that the same printing directive would apply to overloads, making the output confusing if these non-test methods are compiled. While test methods are necessarily compiled by the framework, the framework doesn't prevent other methods from being compiled (a normal output of the test VM shows a lot of compilations). >> >> One could emit compiler directives that take arguments into account, but it is not clear that this is useful. Also, there is a simpler solution: disallow overloading test methods entirely. This way, if we change our minds and need overloads later, we can still allow them without breaking existing tests.
With this change, one can get the new error message: >> >> - Cannot overload @Test methods, but method public void ir_framework.tests.BadOverloadedMethod.sameName(double) has 2 overloads: >> - public void ir_framework.tests.BadOverloadedMethod.sameName(boolean) >> - public void ir_framework.tests.BadOverloadedMethod.sameName() >> >> which should explain well enough what is happening. A small esthetic problem is that if all three methods (in the previous example) are test methods, one gets an error for each of them. I considered that acceptable. >> >> This change required adjusting some tests. I've also made them a bit more robust and easier to maintain by using a map instead, so I didn't have to sift through a hundred array indices. >> >> Let's also emphasize that this change doesn't mean that overloads are entirely forbidden: they are fine as long as they don't involve a test method. >> >> Tested on tier1,tier2,tier3,hs-precheckin-comp,hs-comp-stress. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Copyright This makes total sense! Thanks for looking into it @marc-chevalier! test/hotspot/jtreg/compiler/lib/ir_framework/test/TestVM.java line 597: > 595: ); > 596: TestFormat.check(!testMethodMap.containsKey(m.getName()), > 597: "Cannot overload two @Test methods: " + m + ", " + testMethodMap.get(m.getName())); Is this check redundant now? Or maybe we could exclude it from the first check if we want to keep the more precise message?
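The rule itself is easy to express with reflection: group a class's declared methods by name and flag every test method whose name is shared with another method. The sketch below is an illustration only, not the IR framework's implementation. `@FakeTest` is a local stand-in for the framework's `@Test` annotation, `OverloadSample` is a made-up class, and the error text only approximates the message quoted above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Local stand-in for the IR framework's @Test annotation (hypothetical). */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface FakeTest {}

final class OverloadCheck {
    /** One error per @FakeTest method whose name is shared with any other declared method. */
    static List<String> findOverloadedTests(Class<?> clazz) {
        Map<String, List<Method>> byName = new HashMap<>();
        for (Method m : clazz.getDeclaredMethods()) {
            byName.computeIfAbsent(m.getName(), k -> new ArrayList<>()).add(m);
        }
        List<String> errors = new ArrayList<>();
        for (Method m : clazz.getDeclaredMethods()) {
            if (!m.isAnnotationPresent(FakeTest.class)) {
                continue; // overloads among non-test methods are fine
            }
            List<Method> sameName = byName.get(m.getName());
            if (sameName.size() > 1) {
                errors.add("Cannot overload @Test methods, but method " + m
                        + " has " + (sameName.size() - 1) + " overload(s)");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        findOverloadedTests(OverloadSample.class).forEach(System.out::println);
    }
}

/** Sample class: sameName(double) is a test method with a non-test overload. */
class OverloadSample {
    @FakeTest
    public void sameName(double d) {}

    public void sameName() {}

    @FakeTest
    public void unique() {}
}
```

Grouping by `getName()` alone, ignoring parameter types, reflects the motivation in the description: a directive `match` names only the class and the method, so same-named methods are indistinguishable to it.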
------------- PR Review: https://git.openjdk.org/jdk/pull/29483#pullrequestreview-3727645698 PR Review Comment: https://git.openjdk.org/jdk/pull/29483#discussion_r2745731898 From mchevalier at openjdk.org Fri Jan 30 12:49:24 2026 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 30 Jan 2026 12:49:24 GMT Subject: RFR: 8376325: [IR Framework] Detect and report overloads [v2] In-Reply-To: References: Message-ID: On Fri, 30 Jan 2026 11:03:00 GMT, Damon Fenacci wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Copyright > > test/hotspot/jtreg/compiler/lib/ir_framework/test/TestVM.java line 597: > >> 595: ); >> 596: TestFormat.check(!testMethodMap.containsKey(m.getName()), >> 597: "Cannot overload two @Test methods: " + m + ", " + testMethodMap.get(m.getName())); > > Is this check redundant now? Or maybe we could exclude it from the first check if we want to keep the more precise message? It should be redundant. I considered removing it, but it checks its property very differently: by checking that `testMethodMap` doesn't have this key yet, which is nice since we are about to insert something at that key. I preferred to keep it more as an assert: I expect the first check to fire, but if this one does, it likely means that either the first check is wrong or we are not filling `testMethodMap` correctly. I'm OK with removing it, but I would rather not put it first, because I fear it might be confusing to see first "you can't overload a test method with a test method", and then "you can't overload a test method at all" (I'd be thinking "well, tell me that in the first place then!"). But no strong opinion. What about @chhagedorn?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29483#discussion_r2746115946 From mdoerr at openjdk.org Fri Jan 30 13:18:52 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 30 Jan 2026 13:18:52 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v5] In-Reply-To: <_3bZcGjeZKvRWXSPCiLZZzrCdlcKc4l0Orw25208VyM=.1330ef64-1048-4c45-b62e-09ea47c7ef2a@github.com> References: <_3bZcGjeZKvRWXSPCiLZZzrCdlcKc4l0Orw25208VyM=.1330ef64-1048-4c45-b62e-09ea47c7ef2a@github.com> Message-ID: <8ThsvXqolFi0j74Qk_KB6JR-Bs3XM9HqAjNHTVjMKEU=.949eab77-5af5-42a7-9187-9ff0ce37f7ab@github.com> On Thu, 29 Jan 2026 16:03:46 GMT, David Briemann wrote: >> Adds the following mach nodes: >> match(Set dst (CMoveF (Binary cop (CmpF op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveD (Binary cop (CmpD op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveF (Binary cop (CmpD op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveD (Binary cop (CmpF op1 op2)) (Binary src1 src2))); > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > negating comparisons does not always work, invert results instead I think comments regarding your latest commit would be helpful. Seems to be related to the treatment of unordered results (comparison with NaN). 
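A small Java illustration of why negating the comparison is not equivalent to inverting its result: every ordered comparison involving NaN is false, so `!(a < b)` and `a >= b` disagree when either operand is NaN. This models the semantics at the Java level only. The class and method names are hypothetical, and nothing here describes how the PPC64 match rules are actually written.

```java
/** (a < b) ? x : y written three ways; only two of them agree when NaN is involved. */
final class NanSelect {
    /** Reference semantics of the conditional move: (a < b) ? x : y. */
    static float reference(float a, float b, float x, float y) {
        return (a < b) ? x : y;
    }

    /** Negates the comparison operator (< becomes >=) and swaps the arms: wrong for NaN. */
    static float negatedComparison(float a, float b, float x, float y) {
        return (a >= b) ? y : x;
    }

    /** Keeps the original comparison and inverts its boolean result: always matches. */
    static float invertedResult(float a, float b, float x, float y) {
        boolean notLess = !(a < b);
        return notLess ? y : x;
    }

    public static void main(String[] args) {
        float nan = Float.NaN;
        System.out.println(reference(nan, 1f, 10f, 20f));         // 20.0
        System.out.println(negatedComparison(nan, 1f, 10f, 20f)); // 10.0
        System.out.println(invertedResult(nan, 1f, 10f, 20f));    // 20.0
    }
}
```

For ordered (non-NaN) inputs all three agree. The difference only appears in the unordered case, which is where the extra comments requested above would help.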
------------- PR Comment: https://git.openjdk.org/jdk/pull/29281#issuecomment-3823698310 From roland at openjdk.org Fri Jan 30 13:35:16 2026 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 30 Jan 2026 13:35:16 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v7] In-Reply-To: References: Message-ID: > For this failure memory stats are:
>
> Total Usage: 1095525816
> --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 ---
> Phase                  Total      ra     node      comp    type states reglive regsplit regmask superword   cienv ha other
> none                 5976032  331560  5402064    197512   33712  10200       0        0     984         0       0  0     0
> parse                2716464   65456  1145480    196408 1112752      0       0        0       0         0  196368  0     0
> optimizer              98184       0    32728         0   65456      0       0        0       0         0       0  0     0
> connectionGraph        32728       0        0     32728       0      0       0        0       0         0       0  0     0
> iterGVN                32728       0    32728         0       0      0       0        0       0         0       0  0     0
> idealLoop          918189632       0 38687056 872824784  392776      0       0        0       0         0 6285016  0     0
> idealLoopVerify      2228144       0        0   2228144       0      0       0        0       0         0       0  0     0
> macroExpand            32728       0    32728         0       0      0       0        0       0         0       0  0     0
> graphReshape           32728       0    32728         0       0      0       0        0       0         0       0  0     0
> matcher             20135944 3369848  9033208   7536400   65456 131032       0        0       0         0       0  0     0
> postselect_cleanup    294872  294872        0         0       0      0       0        0       0         0       0  0     0
> scheduler             752944  196488   556456         0       0      0       0        0       0         0       0  0     0
> regalloc              388736  388736        0         0       0      0       0        0       0         0       0  0     0
> ctorChaitin           160032 ...
Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 19 additional commits since the last revision: - new test - Merge branch 'master' into JDK-8370519 - Benoit's test case - Merge branch 'master' into JDK-8370519 - package declaration - review - Merge branch 'master' into JDK-8370519 - Update test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java Co-authored-by: Emanuel Peter - Update test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java Co-authored-by: Benoît Maillard - Update src/hotspot/share/opto/loopnode.hpp Co-authored-by: Emanuel Peter - ... and 9 more: https://git.openjdk.org/jdk/compare/536b4f02...1284ae3c ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28581/files - new: https://git.openjdk.org/jdk/pull/28581/files/1c040156..1284ae3c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=05-06 Stats: 112132 lines in 4283 files changed: 57826 ins; 20012 del; 34294 mod Patch: https://git.openjdk.org/jdk/pull/28581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28581/head:pull/28581 PR: https://git.openjdk.org/jdk/pull/28581 From roland at openjdk.org Fri Jan 30 13:35:17 2026 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 30 Jan 2026 13:35:17 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v6] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 10:22:01 GMT, Benoît Maillard wrote: > I was able to come up with this test, which is a bit more than 2 times faster than the original one on my machine. Its `memlimit` is set to `600M`, which is enough to make the old version fail. With the new one, the test passes even with a `memlimit` of `200M`, so this should be a good enough margin. Great. The new test looks good to me. I replaced the existing test with that one. Thanks for taking the time to do that.
> While looking into this I have also found out that some programs have an unexpectedly high usage of `output` (as was the case in the test case that I initially suggested). I am trying to get a good reproducer and will most likely file a follow-up. Can you post links to the bugs? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28581#issuecomment-3823765937 From dbriemann at openjdk.org Fri Jan 30 14:04:32 2026 From: dbriemann at openjdk.org (David Briemann) Date: Fri, 30 Jan 2026 14:04:32 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v6] In-Reply-To: References: Message-ID: > Adds the following mach nodes: > match(Set dst (CMoveF (Binary cop (CmpF op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveD (Binary cop (CmpD op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveF (Binary cop (CmpD op1 op2)) (Binary src1 src2))); > match(Set dst (CMoveD (Binary cop (CmpF op1 op2)) (Binary src1 src2))); David Briemann has updated the pull request incrementally with one additional commit since the last revision: add helpful comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29281/files - new: https://git.openjdk.org/jdk/pull/29281/files/3d1aa822..8fad3756 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29281&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29281&range=04-05 Stats: 9 lines in 1 file changed: 6 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29281/head:pull/29281 PR: https://git.openjdk.org/jdk/pull/29281 From rrich at openjdk.org Fri Jan 30 14:08:04 2026 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 30 Jan 2026 14:08:04 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v5] In-Reply-To: <8ThsvXqolFi0j74Qk_KB6JR-Bs3XM9HqAjNHTVjMKEU=.949eab77-5af5-42a7-9187-9ff0ce37f7ab@github.com> References: 
<_3bZcGjeZKvRWXSPCiLZZzrCdlcKc4l0Orw25208VyM=.1330ef64-1048-4c45-b62e-09ea47c7ef2a@github.com> <8ThsvXqolFi0j74Qk_KB6JR-Bs3XM9HqAjNHTVjMKEU=.949eab77-5af5-42a7-9187-9ff0ce37f7ab@github.com> Message-ID: On Fri, 30 Jan 2026 13:16:05 GMT, Martin Doerr wrote: > I think comments regarding your latest commit would be helpful. Seems to be related to the treatment of unordered results (comparison with NaN). But wasn't the problem there the following: We want to implement `<` with `>=`: op1 < op2 ? src1 : src2 <=> !(op1 >= op2) ? src1 : src2 <=> op1 >= op2 ? src2 : src1 But the implementation was op2 >= op1 ? src1 : src2 Which evaluates to src1 instead of src2 for op1 == op2. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29281#issuecomment-3823933980 From mdoerr at openjdk.org Fri Jan 30 14:13:17 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 30 Jan 2026 14:13:17 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v5] In-Reply-To: References: <_3bZcGjeZKvRWXSPCiLZZzrCdlcKc4l0Orw25208VyM=.1330ef64-1048-4c45-b62e-09ea47c7ef2a@github.com> <8ThsvXqolFi0j74Qk_KB6JR-Bs3XM9HqAjNHTVjMKEU=.949eab77-5af5-42a7-9187-9ff0ce37f7ab@github.com> Message-ID: On Fri, 30 Jan 2026 14:04:24 GMT, Richard Reingruber wrote: > > I think comments regarding your latest commit would be helpful. Seems to be related to the treatment of unordered results (comparison with NaN). > > But wasn't the problem there the following: We want to implement `<` with `>=`: > > ``` > op1 < op2 ? src1 : src2 > <=> !(op1 >= op2) ? src1 : src2 > <=> op1 >= op2 ? src2 : src1 > ``` > > But the implementation was > > ``` > op2 >= op1 ? src1 : src2 > ``` > > Which evaluates to src1 instead of src2 for op1 == op2. 
That was also wrong, but an additional problem was the wrong handling of NaN: C2 requires unordered to get treated like less ------------- PR Comment: https://git.openjdk.org/jdk/pull/29281#issuecomment-3823972228 From mdoerr at openjdk.org Fri Jan 30 14:16:28 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 30 Jan 2026 14:16:28 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v6] In-Reply-To: References: Message-ID: <_cTZNjjgUZZTMjdngXE8ikuGSQe9D9YNFMOBIM_s24c=.965fb8b9-2ca2-402e-bab9-10fddd0a55a9@github.com> On Fri, 30 Jan 2026 14:04:32 GMT, David Briemann wrote: >> Adds the following mach nodes: >> match(Set dst (CMoveF (Binary cop (CmpF op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveD (Binary cop (CmpD op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveF (Binary cop (CmpD op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveD (Binary cop (CmpF op1 op2)) (Binary src1 src2))); > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > add helpful comments Thanks! ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29281#pullrequestreview-3728613026 From galder at openjdk.org Fri Jan 30 14:17:35 2026 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 30 Jan 2026 14:17:35 GMT Subject: RFR: 8375640: MinMaxIdentity test fails on some machines after 8373134 In-Reply-To: References: Message-ID: <-nZ2F0homyk9w0zf8LjCJClpuDmkv69fe_VbHtFPAuo=.cdbb0c10-3036-42db-babc-e317aaeaa138@github.com> On Fri, 30 Jan 2026 09:36:00 GMT, Martin Doerr wrote: >> Yeah sure happy with a second review. Skara marked it with 1 review, that's why I thought this was ready > > @galderz: Do you want to update the Copyright year or should we ship it?
@TheRealMDoerr I'm trying to find out what to do about the copyright year ------------- PR Comment: https://git.openjdk.org/jdk/pull/29438#issuecomment-3823998837 From shade at openjdk.org Fri Jan 30 14:37:29 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 30 Jan 2026 14:37:29 GMT Subject: RFR: 8375694: C2: Dead loop constructed with CastPP in late inlining [v2] In-Reply-To: References: Message-ID: > Deeper CTW testing ([JDK-8360557](https://bugs.openjdk.org/browse/JDK-8360557)) often catches fire in the same place. @rwestrel was able to come up with the local reproducer and the prospective fix. This looks like a remaining compiler problem that blocks enabling wider CTW testing, so I took the patch for polishing and more testing. > > To quote Roland: "Late inlining is happening in a dead part of the graph. Finding dead subgraphs is expensive so there are some heuristics to avoid the work when possible. I think they need to be tweaked. Right now it's assumed safe when a Phi references the result of a Call. But with late inlining the call can go away and we can't tell what it's hiding. " > > So the fix is to exempt these cases from dead loop checks. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, new test fails without the fix, passes with it > - [x] Linux x86_64 server fastdebug, `hotspot_compiler` > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [ ] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Test is really debug-only, needs a develop option and fails in verification code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29504/files - new: https://git.openjdk.org/jdk/pull/29504/files/691e5e61..15dc5785 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29504&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29504&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29504/head:pull/29504 PR: https://git.openjdk.org/jdk/pull/29504 From mhaessig at openjdk.org Fri Jan 30 14:40:06 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 30 Jan 2026 14:40:06 GMT Subject: RFR: 8376707: Template-Framework Library: Primitive Types Abbreviation Methods Message-ID: <4WW4vBYE8tnCVfUwTVzV7GS9G7p08Gc0nC0KWJmf8kY=.0339adaf-3577-4413-baf3-45fac9833b1c@github.com> This PR adds two convenience methods for abbreviating `PrimitiveType`s in the Template Framework.
Testing: - [x] Run the Template Framework tests - [ ] GitHub Actions ------------- Commit messages: - Add methods for abbreviating primitive types Changes: https://git.openjdk.org/jdk/pull/29506/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29506&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8376707 Stats: 74 lines in 2 files changed: 73 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29506/head:pull/29506 PR: https://git.openjdk.org/jdk/pull/29506 From mhaessig at openjdk.org Fri Jan 30 14:40:46 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 30 Jan 2026 14:40:46 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v7] In-Reply-To: References: Message-ID: On Fri, 30 Jan 2026 13:35:16 GMT, Roland Westrelin wrote: >> For this failure memory stats are: >> >> >> Total Usage: 1095525816 >> --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- >> Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other >> none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 >> parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 >> optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 >> connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 >> iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 >> idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 >> macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 >> postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 >> scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 >> regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 >> ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: > > - new test > - Merge branch 'master' into JDK-8370519 > - Benoit's test case > - Merge branch 'master' into JDK-8370519 > - package declaration > - review > - Merge branch 'master' into JDK-8370519 > - Update test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java > > Co-authored-by: Emanuel Peter > - Update test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java > > Co-authored-by: Benoît Maillard > - Update src/hotspot/share/opto/loopnode.hpp > > Co-authored-by: Emanuel Peter > - ... and 9 more: https://git.openjdk.org/jdk/compare/a221fd63...1284ae3c The new test looks good. I'm just giving this another spin on the CI. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28581#issuecomment-3824099099 From rrich at openjdk.org Fri Jan 30 15:03:43 2026 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 30 Jan 2026 15:03:43 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v5] In-Reply-To: References: <_3bZcGjeZKvRWXSPCiLZZzrCdlcKc4l0Orw25208VyM=.1330ef64-1048-4c45-b62e-09ea47c7ef2a@github.com> <8ThsvXqolFi0j74Qk_KB6JR-Bs3XM9HqAjNHTVjMKEU=.949eab77-5af5-42a7-9187-9ff0ce37f7ab@github.com> Message-ID: <2PcG2z4LTy-5Nf5bP5kZh3NbqUP8GZJTUTDj0IkD67g=.b3749e44-5ab3-4826-aa36-5ee82ce46c8d@github.com> On Fri, 30 Jan 2026 14:10:46 GMT, Martin Doerr wrote: > That was also wrong, but an additional problem was the wrong handling of NaN: C2 requires unordered to get treated like less The requirement is to implement `op1 cond op2 ? src1 : src2`. For NaN this means the result has to be `src2` except for `!=`. And that's it, isn't it? It might have helped to implement `op1 < op2` as `op2 > op1` using xscmpgtdp. The chosen implementation is also good.
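Richard's requirement and the operand-swap alternative can be checked with a small Java model. The class and method names are hypothetical, and the model covers only the Java-level result, not the `xscmpgtdp`-based code. With a NaN operand, every condition except `!=` selects `src2`. Mirroring `op1 < op2` into `op2 > op1` keeps that behaviour, because both forms are false for unordered values.

```java
public class UnorderedSelectDemo {
    // Java semantics of "op1 cond op2 ? src1 : src2"; the String condition is a
    // hypothetical stand-in for a C2 condition code.
    static double select(String cond, double op1, double op2, double src1, double src2) {
        boolean taken = switch (cond) {
            case "<"  -> op1 < op2;
            case "<=" -> op1 <= op2;
            case ">"  -> op1 > op2;
            case ">=" -> op1 >= op2;
            case "==" -> op1 == op2;
            case "!=" -> op1 != op2;
            default -> throw new IllegalArgumentException("unknown condition: " + cond);
        };
        return taken ? src1 : src2;
    }

    // Mirroring instead of negating: op1 < op2 is rewritten as op2 > op1.
    // Both are false when either operand is NaN, so the result always matches select("<", ...).
    static double ltViaSwappedGt(double op1, double op2, double src1, double src2) {
        return op2 > op1 ? src1 : src2;
    }

    public static void main(String[] args) {
        for (String cond : new String[] {"<", "<=", ">", ">=", "==", "!="}) {
            // With a NaN operand only "!=" selects src1 (1.0); all others print 2.0.
            System.out.println(cond + " -> " + select(cond, Double.NaN, 1.0, 1.0, 2.0));
        }
    }
}
```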
------------- PR Comment: https://git.openjdk.org/jdk/pull/29281#issuecomment-3824201312 From rrich at openjdk.org Fri Jan 30 15:03:41 2026 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 30 Jan 2026 15:03:41 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v6] In-Reply-To: References: Message-ID: On Fri, 30 Jan 2026 14:04:32 GMT, David Briemann wrote: >> Adds the following mach nodes: >> match(Set dst (CMoveF (Binary cop (CmpF op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveD (Binary cop (CmpD op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveF (Binary cop (CmpD op1 op2)) (Binary src1 src2))); >> match(Set dst (CMoveD (Binary cop (CmpF op1 op2)) (Binary src1 src2))); > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > add helpful comments Marked as reviewed by rrich (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29281#pullrequestreview-3728832582 From roland at openjdk.org Fri Jan 30 15:04:25 2026 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 30 Jan 2026 15:04:25 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v8] In-Reply-To: References: Message-ID: > For this failure memory stats are: > > > Total Usage: 1095525816 > --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- > Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other > none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 > parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 > optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 > connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 > iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 > idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 > macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > graphReshape 32728 0 
32728 0 0 0 0 0 0 0 0 0 0 > matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 > postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 > scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 > regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 > ctorChaitin 160032 ... Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/share/memory/arena.hpp Co-authored-by: Manuel Hässig - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Manuel Hässig - Update src/hotspot/share/opto/loopnode.hpp Co-authored-by: Manuel Hässig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28581/files - new: https://git.openjdk.org/jdk/pull/28581/files/1284ae3c..ce29db29 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=06-07 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28581/head:pull/28581 PR: https://git.openjdk.org/jdk/pull/28581 From mhaessig at openjdk.org Fri Jan 30 15:04:31 2026 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 30 Jan 2026 15:04:31 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v7] In-Reply-To: References: Message-ID: On Fri, 30 Jan 2026 13:35:16 GMT, Roland Westrelin wrote: >> For this failure memory stats are: >> >> >> Total Usage: 1095525816 >> --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- >> Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other >> none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 >> parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 >> optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 >> connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 >> iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0
0 >> idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 >> idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 >> macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 >> postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 >> scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 >> regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 >> ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: > > - new test > - Merge branch 'master' into JDK-8370519 > - Benoit's test case > - Merge branch 'master' into JDK-8370519 > - package declaration > - review > - Merge branch 'master' into JDK-8370519 > - Update test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java > > Co-authored-by: Emanuel Peter > - Update test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java > > Co-authored-by: Benoît Maillard > - Update src/hotspot/share/opto/loopnode.hpp > > Co-authored-by: Emanuel Peter > - ... and 9 more: https://git.openjdk.org/jdk/compare/78de1ad5...1284ae3c Well, the CI tells me that the copyright years need updating. src/hotspot/share/memory/arena.hpp line 2: > 1: /* > 2: * Copyright (c) 2017, 2025, Oracle and/or its affiliates. All rights reserved. Suggestion: * Copyright (c) 2017, 2026, Oracle and/or its affiliates. All rights reserved. src/hotspot/share/opto/loopnode.cpp line 2: > 1: /* > 2: * Copyright (c) 1998, 2025, Oracle and/or its affiliates. All rights reserved. Suggestion: * Copyright (c) 1998, 2026, Oracle and/or its affiliates. All rights reserved. src/hotspot/share/opto/loopnode.hpp line 2: > 1: /* > 2: * Copyright (c) 1998, 2025, Oracle and/or its affiliates. All rights reserved.
Suggestion: * Copyright (c) 1998, 2026, Oracle and/or its affiliates. All rights reserved. ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28581#pullrequestreview-3728810358 PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2746683181 PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2746682501 PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2746680565 From mdoerr at openjdk.org Fri Jan 30 15:06:55 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 30 Jan 2026 15:06:55 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v5] In-Reply-To: <2PcG2z4LTy-5Nf5bP5kZh3NbqUP8GZJTUTDj0IkD67g=.b3749e44-5ab3-4826-aa36-5ee82ce46c8d@github.com> References: <_3bZcGjeZKvRWXSPCiLZZzrCdlcKc4l0Orw25208VyM=.1330ef64-1048-4c45-b62e-09ea47c7ef2a@github.com> <8ThsvXqolFi0j74Qk_KB6JR-Bs3XM9HqAjNHTVjMKEU=.949eab77-5af5-42a7-9187-9ff0ce37f7ab@github.com> <2PcG2z4LTy-5Nf5bP5kZh3NbqUP8GZJTUTDj0IkD67g=.b3749e44-5ab3-4826-aa36-5ee82ce46c8d@github.com> Message-ID: <-SH0h5d-xLCuERrLmyYBMmPV19ngoZyDI0jZJrdWGpI=.c76e5e2f-787b-478e-9981-c95ad491c280@github.com> On Fri, 30 Jan 2026 15:01:11 GMT, Richard Reingruber wrote: > > That was also wrong, but an additional problem was the wrong handling of NaN: C2 requires unordered to get treated like less > > The requirement is to implement `op1 cond op2 ? src1 : src2`. For NaN this means the result has to be `src2` except for `!=`. And that's it, isn't it? > > It might have helped to implement `op1 < op2` as `op2 > op1` using xscmpgtdp. The chosen implementation is also good. We had tried. The following example can show that it's wrong:

    class test {
        static double cmovf_ge(double op1, double op2, double src1, double src2) {
            return op1 >= op2 ? src1 : src2;
        }

        static void main(String[] args) {
            double result = 0.0;
            System.out.println(cmovf_ge(Double.NaN, Double.NaN, 1.0, 2.0));
            for (int i = 0; i < 100_000; ++i) {
                result += cmovf_ge((double) (i / 2), (double) ((i + 1) / 2), 1.0, 2.0);
            }
            System.out.println("result = " + result);
            System.out.println(cmovf_ge(Double.NaN, Double.NaN, 1.0, 2.0));
        }
    }

------------- PR Comment: https://git.openjdk.org/jdk/pull/29281#issuecomment-3824211983 From bmaillard at openjdk.org Fri Jan 30 16:13:27 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Fri, 30 Jan 2026 16:13:27 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v8] In-Reply-To: References: Message-ID: On Fri, 30 Jan 2026 15:04:25 GMT, Roland Westrelin wrote: >> For this failure memory stats are: >> >> >> Total Usage: 1095525816 >> --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- >> Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other >> none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 >> parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 >> optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 >> connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 >> iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 >> idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 >> macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 >> postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 >> scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 >> regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 >> ...
> > Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: > > - Update src/hotspot/share/memory/arena.hpp > > Co-authored-by: Manuel Hässig > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Manuel Hässig > - Update src/hotspot/share/opto/loopnode.hpp > > Co-authored-by: Manuel Hässig Marked as reviewed by bmaillard (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28581#pullrequestreview-3729170387 From bmaillard at openjdk.org Fri Jan 30 16:13:29 2026 From: bmaillard at openjdk.org (Benoît Maillard) Date: Fri, 30 Jan 2026 16:13:29 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v6] In-Reply-To: References: Message-ID: On Fri, 30 Jan 2026 13:30:25 GMT, Roland Westrelin wrote: > Can you post links to the bugs? Thanks. I haven't filed it yet. I observed something suspicious once, but at the moment I am not able to reproduce it anymore. I will take another look, and I will post here or tag you in the issue if there is any update @rwestrel. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28581#issuecomment-3824502330 From galder at openjdk.org Fri Jan 30 16:55:04 2026 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 30 Jan 2026 16:55:04 GMT Subject: RFR: 8375640: MinMaxIdentity test fails on some machines after 8373134 [v2] In-Reply-To: References: Message-ID: > Adjust test expectations to deal with environments where `Min/Max[F|D]` are not produced. To fix this, I've followed the same patterns used in `compiler.c2.irTests.TestMinMaxIdentities`. > > I have tested this on x86_64 with `-XX:UseAVX=0`.
Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Update copyright year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29438/files - new: https://git.openjdk.org/jdk/pull/29438/files/6804b519..710a33b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29438&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29438&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29438/head:pull/29438 PR: https://git.openjdk.org/jdk/pull/29438 From mdoerr at openjdk.org Fri Jan 30 16:55:06 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 30 Jan 2026 16:55:06 GMT Subject: RFR: 8375640: MinMaxIdentity test fails on some machines after 8373134 [v2] In-Reply-To: References: Message-ID: On Fri, 30 Jan 2026 16:52:21 GMT, Galder Zamarreño wrote: >> Adjust test expectations to deal with environments where `Min/Max[F|D]` are not produced. To fix this, I've followed the same patterns used in `compiler.c2.irTests.TestMinMaxIdentities`. >> >> I have tested this on x86_64 with `-XX:UseAVX=0`. > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright year Marked as reviewed by mdoerr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29438#pullrequestreview-3729392557 From galder at openjdk.org Fri Jan 30 16:55:08 2026 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 30 Jan 2026 16:55:08 GMT Subject: RFR: 8375640: MinMaxIdentity test fails on some machines after 8373134 In-Reply-To: References: Message-ID: On Fri, 30 Jan 2026 09:36:00 GMT, Martin Doerr wrote: >> Yeah sure happy with a second review. Skara marked it with 1 review, that's why I thought this was ready > > @galderz: Do you want to update the Copyright year or should we ship it?
@TheRealMDoerr @mhaessig I've updated the copyright year ------------- PR Comment: https://git.openjdk.org/jdk/pull/29438#issuecomment-3824694556 From psandoz at openjdk.org Sat Jan 31 00:03:08 2026 From: psandoz at openjdk.org (Paul Sandoz) Date: Sat, 31 Jan 2026 00:03:08 GMT Subject: RFR: 8376187: [VectorAPI] Define new lane type constants and pass them to intrinsic entries [v4] In-Reply-To: References: Message-ID: <-fsfUEvFpvmAsupQFgx1CBkH9vr_efE5-qYeUzy5VFQ=.4abb05e0-1f82-4d6c-8bc4-ca4bc6fc5e80@github.com> On Fri, 30 Jan 2026 07:35:43 GMT, Jatin Bhateja wrote: >> As per [discussions ](https://github.com/openjdk/jdk/pull/28002#issuecomment-3789507594) on JDK-8370691 pull request, splitting out portion of PR#28002 into a separate patch in preparation of Float16 vector API support. >> >> Patch add new lane type constants and pass them to vector intrinsic entry points. >> >> All existing Vector API jtreg test are passing with the patch. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions This is looking good. Just one last comment on the location of `LANE_TYPE_ORDINAL`. src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java line 152: > 150: public static final int MODE_BITS_COERCED_LONG_TO_MASK = 1; > 151: > 152: // Lane type codes for vector: Suggest you change the comment to indicate the values correspond to `jdk.incubator.vector.LaneType` ordinals e.g., jdk.incubator.vector.LaneType.FLOAT.ordinal() == LT_FLOAT etc. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractSpecies.java line 152: > 150: int laneTypeOrdinal() { > 151: return laneType.ordinal(); > 152: } Is this needed? Won't all concrete sub types override this? 
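The review in this message relies on three points: the `LT_*` codes should match `LaneType` ordinals, `assert(ofLaneTypeOrdinal(LT_FLOAT) == FLOAT)` checks that at startup, and ordinals below 16 fit into fixed-width `domran` fields. The Java sketch below illustrates all three. It is hypothetical throughout: the enum members and their order, the `LT_*` values, and the 4-bit shift widths are assumptions, not the real `jdk.incubator.vector` definitions.

```java
public class LaneTypeOrdinalDemo {
    // Hypothetical, simplified stand-in for jdk.incubator.vector.LaneType.
    enum LaneType { FLOAT, DOUBLE, BYTE, SHORT, INT, LONG }

    // Integer codes meant to mirror LaneType ordinals (values assumed for illustration).
    static final int LT_FLOAT = 0, LT_DOUBLE = 1, LT_BYTE = 2,
                     LT_SHORT = 3, LT_INT = 4, LT_LONG = 5;

    // Assumed 4-bit fields; the real VO_DOM_SHIFT / VO_RAN_SHIFT values may differ.
    static final int VO_DOM_SHIFT = 4, VO_RAN_SHIFT = 8, FIELD_MASK = 0xF;

    static LaneType ofLaneTypeOrdinal(int ordinal) {
        return LaneType.values()[ordinal];
    }

    // Fails fast if an LT_* code drifts away from the enum ordinal it must match.
    static void checkCodes() {
        if (ofLaneTypeOrdinal(LT_FLOAT) != LaneType.FLOAT
                || ofLaneTypeOrdinal(LT_DOUBLE) != LaneType.DOUBLE
                || ofLaneTypeOrdinal(LT_BYTE) != LaneType.BYTE
                || ofLaneTypeOrdinal(LT_SHORT) != LaneType.SHORT
                || ofLaneTypeOrdinal(LT_INT) != LaneType.INT
                || ofLaneTypeOrdinal(LT_LONG) != LaneType.LONG) {
            throw new AssertionError("LT_* codes out of sync with LaneType ordinals");
        }
    }

    // Packing is lossless only while every ordinal fits in a 4-bit field (< 16).
    static int packDomRan(LaneType dom, LaneType ran) {
        if (LaneType.values().length > 16) {
            throw new AssertionError("ordinal does not fit in 4 bits");
        }
        return (dom.ordinal() << VO_DOM_SHIFT) + (ran.ordinal() << VO_RAN_SHIFT);
    }

    static LaneType domOf(int domran) {
        return ofLaneTypeOrdinal((domran >> VO_DOM_SHIFT) & FIELD_MASK);
    }

    static LaneType ranOf(int domran) {
        return ofLaneTypeOrdinal((domran >> VO_RAN_SHIFT) & FIELD_MASK);
    }

    public static void main(String[] args) {
        checkCodes();
        int domran = packDomRan(LaneType.INT, LaneType.DOUBLE);
        System.out.println(domOf(domran) + " -> " + ranOf(domran)); // INT -> DOUBLE
    }
}
```

A check like this turns a silent mismatch between Java-side constants and the intrinsic side into an immediate failure. The width check encodes the "maximum ordinal below 16" condition.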
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Byte128Vector.java line 60: > 58: > 59: static final int LANE_TYPE_ORDINAL = LT_BYTE; > 60: You can move this up to `ByteVector` and then reuse it to replace `byte.class`, so it is used consistently. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LaneType.java line 270: > 268: > 269: static { > 270: assert(ofLaneTypeOrdinal(LT_FLOAT) == FLOAT); Very good! src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java line 821: > 819: convert(String name, char kind, Class dom, Class ran, int opCode, int flags) { > 820: int domran = ((LaneType.of(dom).ordinal() << VO_DOM_SHIFT) + > 821: (LaneType.of(ran).ordinal() << VO_RAN_SHIFT)); As I understand it, this is still correct because the maximum ordinal value is less than 16 (as was already the case for the basic type). ------------- PR Review: https://git.openjdk.org/jdk/pull/29481#pullrequestreview-3730928806 PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2748410039 PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2748387527 PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2748483577 PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2748392412 PR Review Comment: https://git.openjdk.org/jdk/pull/29481#discussion_r2748427970 From rrich at openjdk.org Sat Jan 31 12:30:06 2026 From: rrich at openjdk.org (Richard Reingruber) Date: Sat, 31 Jan 2026 12:30:06 GMT Subject: RFR: 8375536: PPC64: Implement special MachNodes for floating point CMove [v5] In-Reply-To: <2PcG2z4LTy-5Nf5bP5kZh3NbqUP8GZJTUTDj0IkD67g=.b3749e44-5ab3-4826-aa36-5ee82ce46c8d@github.com> References: <_3bZcGjeZKvRWXSPCiLZZzrCdlcKc4l0Orw25208VyM=.1330ef64-1048-4c45-b62e-09ea47c7ef2a@github.com> <8ThsvXqolFi0j74Qk_KB6JR-Bs3XM9HqAjNHTVjMKEU=.949eab77-5af5-42a7-9187-9ff0ce37f7ab@github.com> <2PcG2z4LTy-5Nf5bP5kZh3NbqUP8GZJTUTDj0IkD67g=.b3749e44-5ab3-4826-aa36-5ee82ce46c8d@github.com>
Message-ID: On Fri, 30 Jan 2026 15:01:11 GMT, Richard Reingruber wrote: > > That was also wrong, but an additional problem was the wrong handling of NaN: C2 requires unordered to get treated like less > > The requirement is to implement `op1 cond op2 ? src1 : src2`. For NaN this means the result has to be `src2` except for `!=`. And that's it, isn't it? > > It might have helped to implement `op1 < op2` as `op2 > op1` using xscmpgtdp. The chosen implementation is also good. I think I finally got it. The semantics to implement are the semantics of CmpF3Node/CmpD3Node. Would you agree with the extended comment for cmovF?

// Works for single and double precision floats.
// dst = (op1 cmp(cc) op2) ? src1 : src2;
// Unordered semantics are the same as for CmpF3Node/CmpD3Node which implement the fcmpl/dcmpl bytecodes.
// Comparing unordered values has the same result as when op1 is less than op2.
// So dst = src1 for <, <=, != and dst = src2 for >, >=, ==.
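The rule in the proposed comment can be checked with a small Java model. The names are hypothetical. The model emulates the documented result of the `fcmpl` bytecode and does not call anything in HotSpot. A three-way compare returns -1 when an operand is unordered, so any condition applied to that result behaves as if `op1 < op2`.

```java
public class Fcmpl3Demo {
    // Emulates the documented result of the fcmpl bytecode: 1, 0 or -1,
    // and -1 when either operand is NaN ("unordered is treated like less").
    static int fcmpl(float a, float b) {
        if (Float.isNaN(a) || Float.isNaN(b)) {
            return -1;
        }
        return a > b ? 1 : (a == b ? 0 : -1);
    }

    // dst = (op1 cmp(cc) op2) ? src1 : src2, with cc tested against the three-way
    // result; the String cc is a hypothetical stand-in for a condition code.
    static float cmovF(String cc, float op1, float op2, float src1, float src2) {
        int c = fcmpl(op1, op2);
        boolean taken = switch (cc) {
            case "<"  -> c < 0;
            case "<=" -> c <= 0;
            case ">"  -> c > 0;
            case ">=" -> c >= 0;
            case "==" -> c == 0;
            case "!=" -> c != 0;
            default -> throw new IllegalArgumentException("unknown condition: " + cc);
        };
        return taken ? src1 : src2;
    }

    public static void main(String[] args) {
        // Prints 1.0 for <, <= and != and 2.0 for >, >= and ==.
        for (String cc : new String[] {"<", "<=", "!=", ">", ">=", "=="}) {
            System.out.println(cc + " -> " + cmovF(cc, Float.NaN, 1f, 1f, 2f));
        }
    }
}
```

At the Java source level, `Float.NaN < 1f` is still false. In the compilation examples of the JVM Specification, the compiler uses `fcmpg`/`dcmpg` for `<` and `<=`, so source-level comparisons with NaN come out false. The "unordered is less" rule therefore describes the three-way node, not the source expression.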
The pull request contains seven additional commits since the last revision: - Add comments, tests - Merge branch 'master' into findpreviousstore - Add test store the loaded vector - Test description - Fix test failures - Fix null access - Refactor the logic in MemNode::find_previous_store ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29390/files - new: https://git.openjdk.org/jdk/pull/29390/files/f48c006c..1d0f1f9d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29390&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29390&range=04-05 Stats: 8480 lines in 207 files changed: 1870 ins; 1506 del; 5104 mod Patch: https://git.openjdk.org/jdk/pull/29390.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29390/head:pull/29390 PR: https://git.openjdk.org/jdk/pull/29390 From qamai at openjdk.org Sat Jan 31 15:36:02 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 31 Jan 2026 15:36:02 GMT Subject: RFR: 8376220: C2: Refactor the logic to in MemNode::find_previous_store [v5] In-Reply-To: References: <_DqULj-g1bNhAlaskqNJyTnW7r1a7-ykDp1z-gGvutE=.880fbf28-f484-4ec3-b585-a9301eb2de87@github.com> Message-ID: On Mon, 26 Jan 2026 15:35:54 GMT, Roberto Castañeda Lozano wrote: > Thanks for accompanying this changeset with some test cases! Could you add a few negative ones where the memory accesses cannot be folded (e.g. one where c1 and c2 in TestFindStore.java are of the exact same class, Done, I have added tests for multiple cases that this method exercises. Do you think that is sufficient? > one that exercises the raw-to-oop casting you mention above `MemNode::find_previous_store` does not uncast its operands, so it should not encounter such cases, but other users of this method do, which may result in comparing the `Proj` of an `AllocateNode` with an oop. If you remove the condition that both are oops, the JVM fails tier 1 a lot (or even fails to build).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2749680824 From qamai at openjdk.org Sat Jan 31 15:51:05 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 31 Jan 2026 15:51:05 GMT Subject: RFR: 8376220: C2: Refactor the logic to in MemNode::find_previous_store [v5] In-Reply-To: References: <_DqULj-g1bNhAlaskqNJyTnW7r1a7-ykDp1z-gGvutE=.880fbf28-f484-4ec3-b585-a9301eb2de87@github.com> Message-ID: <7xXKEuZizPa2fqcykZTlQcsCF3yyyQwl5y07Y8DHJow=.4e41ec9d-273d-4581-bcbb-d08c0ef8d148@github.com> On Mon, 26 Jan 2026 16:03:16 GMT, Roberto Castañeda Lozano wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Add test store the loaded vector > > Thanks for extracting this refactoring into an independent changeset! This is going to significantly simplify the review process of the subsequent load folding changes. I have a few comments, questions, and suggestions. @robcasloz Thanks for the reviews, I have addressed them. @dlunde That's a good idea. I compared the old and the new implementations of `MemNode::find_previous_store`, verifying that either the old implementation returns `null`, both return the same value, or the refactored implementation returns `top`. I ran tier1-tier7 and encountered no violations. > src/hotspot/share/opto/memnode.cpp line 709: > >> 707: } else if (adr_type->base() == TypePtr::AnyPtr) { >> 708: // Give up on a very wide access >> 709: return nullptr; > > What kind of memory access is ruled out here? Could you add a test case for it? In mainline, this condition will imply `adr_maybe_raw` and impose an additional constraint on raw accesses (base equality), but not necessarily lead to `find_previous_store` giving up, right? I didn't encounter any, but theoretically, the only way for a `MemNode` to have this kind of `adr_type` is if the base turns out to be `null`.
I think it is necessary to exclude this case early, because such an address will upset `Compile::get_alias_index`. In mainline, I think it is just too rare or impossible, so we do not encounter a crash. > src/hotspot/share/opto/memnode.cpp line 740: > >> 738: } >> 739: >> 740: // If the bases are the same and the offsets are the same, it seems that this is the exact > > Suggestion: > > // (b) If the bases are the same and the offsets are the same, it seems that this is the exact > > > In general, I find the original comments referring to steps (a), (b), (c), etc. useful and would prefer if they were left in next to the return and continue statements below. I have added them back. > src/hotspot/share/opto/memnode.cpp line 741: > >> 739: >> 740: // If the bases are the same and the offsets are the same, it seems that this is the exact >> 741: // store we are looking for, the caller will check if the type of the store matches > > Could you detail in the comment where the caller checks type matching? Done. > src/hotspot/share/opto/memnode.cpp line 785: > >> 783: if (detect_ptr_independence(base, alloc, st_base, AllocateNode::Ideal_allocation(st_base), phase)) { >> 784: // detect_ptr_independence == true means that it can prove that base and st_base cannot >> 785: // have the same runtime value > > I see how this comment can be useful in the original local EA changeset, but in the context of this separate changeset it seems redundant since it is basically restating what the comment two lines above says. Done, I have removed them. > src/hotspot/share/opto/memnode.cpp line 1910: > >> 1908: ctrl = ctrl->in(0); >> 1909: set_req(MemNode::Control,ctrl); >> 1910: return this; > > Is there a reason to return early in this changeset, or is it something that only makes sense in the context of the subsequent local EA changes? Same for the early return below and the IGVN recording at the end of the function. I think it is more readable.
The variable `progress` is defined at the start, set only here and at the following early return, then used all the way at the end of the function. It is better for comprehension to return right after making changes to the node. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29390#issuecomment-3828740258 PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2749686450 PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2749686855 PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2749686555 PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2749686726 PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2749688624 From qamai at openjdk.org Sat Jan 31 15:51:07 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 31 Jan 2026 15:51:07 GMT Subject: RFR: 8376220: C2: Refactor the logic to in MemNode::find_previous_store [v4] In-Reply-To: References: <_DCQEBinOHkFUYvFf7boqdWG9VD4aaRaU0SwO2hct-w=.0c474ed2-23ab-4263-a89b-6ac4a94d7f14@github.com> Message-ID: On Mon, 26 Jan 2026 15:47:31 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/share/opto/memnode.cpp line 1236: >> >>> 1234: // LoadVector/StoreVector needs additional check to ensure the types match. >>> 1235: if (st->is_StoreVector()) { >>> 1236: // Some kind of masked access or gather/scatter >> >> This condition is insufficient to determine if `this` inspects the same memory as `st`. Luckily, `LoadVectorMasked`, `LoadVectorGather`, and `LoadVectorGatherMasked` all have `store_Opcode()` being `-1`, preventing any folding with them. On the other hand, `LoadVector` has `store_Opcode()` being `Op_StoreVector`, so the only case here turns out to be correct. However, it is better to be precise here. > > Could you summarize this motivation in a code comment? > Is the failure that motivated these additional checks triggered by the additional capabilities of `MemNode::detect_ptr_independence`?
I have done that. The actual motivation is that this piece of code forgets that `this` can also be a `Store`, which crashes the VM at `as_LoadVector()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2749689602 From qamai at openjdk.org Sat Jan 31 15:51:09 2026 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 31 Jan 2026 15:51:09 GMT Subject: RFR: 8376220: C2: Refactor the logic to in MemNode::find_previous_store [v5] In-Reply-To: <7xXKEuZizPa2fqcykZTlQcsCF3yyyQwl5y07Y8DHJow=.4e41ec9d-273d-4581-bcbb-d08c0ef8d148@github.com> References: <_DqULj-g1bNhAlaskqNJyTnW7r1a7-ykDp1z-gGvutE=.880fbf28-f484-4ec3-b585-a9301eb2de87@github.com> <7xXKEuZizPa2fqcykZTlQcsCF3yyyQwl5y07Y8DHJow=.4e41ec9d-273d-4581-bcbb-d08c0ef8d148@github.com> Message-ID: On Sat, 31 Jan 2026 15:43:57 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/memnode.cpp line 1910: >> >>> 1908: ctrl = ctrl->in(0); >>> 1909: set_req(MemNode::Control,ctrl); >>> 1910: return this; >> >> Is there a reason to return early in this changeset, or is it something that only makes sense in the context of the subsequent local EA changes? Same for the early return below and the IGVN recording at the end of the function. > > I think it is more readable. The variable `progress` is defined at the start, set only here and at the following early return, then used all the way at the end of the function. It is better for comprehension to return right after making changes to the node. As for recording the node for IGVN: some transformations in this method only happen during IGVN, and there is no guarantee that this node will be processed by IGVN again. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29390#discussion_r2749691126
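The vector folding check discussed above for `memnode.cpp` line 1236 can be modeled roughly as follows. This is a hypothetical Java sketch, not HotSpot code: `Kind`, `storeOpcode()`, `Access`, and `canFold` are illustrative names, and the vector length stands in for the vector type. It encodes the facts stated in the review (masked and gather loads have no matching store opcode, the `-1` case, while a plain `LoadVector` matches `StoreVector`) and checks whether the current node is a load before applying load-only logic, instead of unconditionally treating it as one. Returning `false` when the current node is a store is a conservative simplification made here; the actual patch may handle that case differently.

```java
/** Hypothetical model of the LoadVector/StoreVector folding guard; not HotSpot code. */
class VectorFoldGuard {
    enum Kind {
        LOAD_VECTOR, LOAD_VECTOR_MASKED, LOAD_VECTOR_GATHER, LOAD_VECTOR_GATHER_MASKED,
        STORE_VECTOR, STORE_VECTOR_MASKED;

        boolean isLoad() {
            return name().startsWith("LOAD_");
        }

        /** Store kind whose value this load may take directly, or null (the "-1" case). */
        Kind storeOpcode() {
            return this == LOAD_VECTOR ? STORE_VECTOR : null;
        }
    }

    /** A memory access; the length is a stand-in for the full vector type. */
    record Access(Kind kind, int length) {}

    /**
     * Decides whether `self` may reuse the value written by the earlier store `st`.
     * `self` can be a load or a store, so its kind is checked before any load-only logic.
     */
    static boolean canFold(Access self, Access st) {
        if (!self.kind().isLoad()) {
            return false; // simplification: never treat a store as a load
        }
        Kind expected = self.kind().storeOpcode();
        if (expected == null || st.kind() != expected) {
            return false; // masked/gather loads, or a store of a different kind
        }
        return self.length() == st.length(); // the vector types must match, too
    }

    public static void main(String[] args) {
        Access store = new Access(Kind.STORE_VECTOR, 8);
        System.out.println(canFold(new Access(Kind.LOAD_VECTOR, 8), store));        // true
        System.out.println(canFold(new Access(Kind.LOAD_VECTOR_MASKED, 8), store)); // false
        System.out.println(canFold(new Access(Kind.STORE_VECTOR, 8), store));       // false
    }
}
```

The explicit `isLoad()` check is the point of the sketch: asking a node for load-specific information without first confirming it is a load is the kind of mistake the review describes for `as_LoadVector()`.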