From sparasa at openjdk.org Tue Jul 1 00:01:59 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 1 Jul 2025 00:01:59 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt [v2] In-Reply-To: References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: <27l1noh4qLvBGFOqhDNxmv-Ikyuc8AOQNRgIT4RtbZM=.5c199ba5-a2a7-4e98-9459-68ed4c55b73f@github.com> On Fri, 27 Jun 2025 01:43:16 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470. >> >> 1. Check for +0, -0, +INF, -INF, and NaN before any other input values. >> 2. If these special values are found, return immediately with minimal modifications to the result register. >> 3. Performance testing shows the modified intrinsic improves throughput by 65.1% over the original intrinsic on average for the special values while throughput drops by 5.5% for the normal value range (-INF, -2^(-1022)], [2^(-1022), INF). >> >> The commands to run all relevant micro-benchmarks are posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"` >> >> The results of all tests posted below were captured with an [Intel? Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. >> >> Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. 
Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks. When comparing against _baseline2_, the modified intrinsic significantly still outperforms for the inputs (**-INF < x < INF**) that require heavy compute. However, the special value inputs that trigger fast path returns still perform better with _baseline2_. >> >> | Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) | >> | :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: | >> | [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 | >> | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 | >> | [0] | 344990 | 627561 | +81.91 | >> | [-0] ... > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Ensure ABS_MASK is a 128-bit memory sized location and only use equal enum for UCOMISD checks I did independent testing by running the correctness tests and performance benchmarks. The change looks good to me. Thanks, Vamsi ------------- Marked as reviewed by sparasa (Author). PR Review: https://git.openjdk.org/jdk/pull/25962#pullrequestreview-2973095482 From haosun at openjdk.org Tue Jul 1 02:54:47 2025 From: haosun at openjdk.org (Hao Sun) Date: Tue, 1 Jul 2025 02:54:47 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v4] In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 13:25:09 GMT, Mikhail Ablakatov wrote: >> Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used. >> >> Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is used directly still. 
>> >> The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reducion micro-benchmarks. >> >> Benchmarks results: >> >> Neoverse-V1 (SVE 256-bit) >> >> Benchmark (size) Mode master PR Units >> ByteMaxVector.MULLanes 1024 thrpt 5447.643 11455.535 ops/ms >> ShortMaxVector.MULLanes 1024 thrpt 3388.183 7144.301 ops/ms >> IntMaxVector.MULLanes 1024 thrpt 3010.974 4911.485 ops/ms >> LongMaxVector.MULLanes 1024 thrpt 1539.137 2562.835 ops/ms >> FloatMaxVector.MULLanes 1024 thrpt 1355.551 4158.128 ops/ms >> DoubleMaxVector.MULLanes 1024 thrpt 1715.854 3284.189 ops/ms >> >> >> Fujitsu A64FX (SVE 512-bit): >> >> Benchmark (size) Mode master PR Units >> ByteMaxVector.MULLanes 1024 thrpt 1091.692 2887.798 ops/ms >> ShortMaxVector.MULLanes 1024 thrpt 597.008 1863.338 ops/ms >> IntMaxVector.MULLanes 1024 thrpt 510.642 1348.651 ops/ms >> LongMaxVector.MULLanes 1024 thrpt 468.878 878.620 ops/ms >> FloatMaxVector.MULLanes 1024 thrpt 376.284 2237.564 ops/ms >> DoubleMaxVector.MULLanes 1024 thrpt 431.343 1646.792 ops/ms > > Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - cleanup: address nits, rename several symbols > - cleanup: remove unreferenced definitions > - Address review comments. 
> > - fixup: disable FP mul reduction auto-vectorization for all targets > - fixup: add a tmp vReg to reduce_mul_integral_gt128b and > reduce_non_strict_order_mul_fp_gt128bto keep vsrc unmodified > - cleanup: replace a complex lambda in the above methods with a loop > - cleanup: rename symbols to follow the existing naming convention > - cleanup: add asserts to SVE only instructions > - split mul FP reduction instructions into strictly-ordered (default) > and explicitly non strictly-ordered > - remove redundant conditions in TestVectorFPReduction.java > > Benchmarks results: > > Neoverse-V1 (SVE 256-bit) > > | Benchmark | Before | After | Units | Diff | > |---------------------------|----------|----------|--------|-------| > | ByteMaxVector.MULLanes | 619.156 | 9884.578 | ops/ms | 1496% | > | DoubleMaxVector.MULLanes | 184.693 | 2712.051 | ops/ms | 1368% | > | FloatMaxVector.MULLanes | 277.818 | 3388.038 | ops/ms | 1119% | > | IntMaxVector.MULLanes | 371.225 | 4765.434 | ops/ms | 1183% | > | LongMaxVector.MULLanes | 205.149 | 2672.975 | ops/ms | 1203% | > | ShortMaxVector.MULLanes | 472.804 | 5122.917 | ops/ms | 984% | > - Merge branch 'master' into 8343689-rebase > - fixup: don't modify the value in vsrc > > Fix reduce_mul_integral_gt128b() so it doesn't modify vsrc. With this > change, the result of recursive folding is held in vtmp1. To be able to > pass this intermediate result to reduce_mul_integral_le128b(), we would > have to use another temporary FloatRegister, as vtmp1 would essentially > act as vsrc. It's possible to get around this however: > reduce_mul_integral_le128b() is modified so it's possible to pass > matching vsrc and vtmp2 arguments. By doing this, we save ourselves a > temporary register in rules that match to reduce_mul_integral_gt128b(). 
> - cleanup: revert an unnecessary change to reduce_mul_fp_le128b() formatting
> - Use EXT instead of COMPACT to split a vector into two halves
>
> Benchmarks results:
>
> Neoverse-V1 (SVE 256-bit)
>
> Benchmark (size) Mode master PR Units
> ByteMaxVector.MULLanes 1024 thrpt 5447.643 11455.535 ops/ms
> Short...

src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3729:

> 3727: #undef INSN
> 3728:
> 3729: // SVE aliases

In the initial commit, an asm test for `sve_(mov|movs|not|nots)` was added to `test/hotspot/gtest/aarch64/aarch64-asmtest.py`. Since the definition is removed in this commit, the corresponding asm test should be removed as well. Otherwise, the JDK build fails on AArch64. See the error log in the GHA test:
https://github.com/mikabl-arm/jdk/actions/runs/15974069085/job/45051902618

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2176310497

From xgong at openjdk.org Tue Jul 1 06:04:29 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Tue, 1 Jul 2025 06:04:29 GMT
Subject: RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors
Message-ID:

### Background

On AArch64, the minimum vector length supported is 64-bit for basic types, except for `byte` and `boolean` (32-bit and 16-bit respectively, to match special Vector API features). This limitation prevents intrinsification of vector type conversions between `short` and wider types (e.g. `long/double`) in the Vector API when the entire vector length is within 128 bits, resulting in degraded performance for such conversions.

For example, type conversions between `ShortVector.SPECIES_128` and `LongVector.SPECIES_128` are not supported on AArch64 NEON and SVE architectures with a 128-bit max vector size. This occurs because the compiler would need to generate a vector with 2 short elements, resulting in a 32-bit vector size.

To intrinsify such type conversion APIs, we need to relax the min vector length constraint from 64-bit to 32-bit for short vectors.
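The lane arithmetic behind the 32-bit figure can be sketched as follows. This is only an illustration, not HotSpot code: the helper names `lanes` and `cast_input_bits` are hypothetical, and the sketch assumes that a cast between two species only involves as many source lanes as the destination species can hold.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: number of lanes in a vector of the given size.
constexpr int lanes(int vector_bits, int element_bits) {
  return vector_bits / element_bits;
}

// Hypothetical helper: size in bits of the source-typed vector that actually
// takes part in a cast. Only min(source lanes, destination lanes) elements
// are converted, each of them src_element_bits wide.
constexpr int cast_input_bits(int src_vector_bits, int src_element_bits,
                              int dst_vector_bits, int dst_element_bits) {
  return std::min(lanes(src_vector_bits, src_element_bits),
                  lanes(dst_vector_bits, dst_element_bits)) * src_element_bits;
}

// A 128-bit short vector has 8 lanes, a 128-bit long vector has 2 lanes,
// so a short -> long cast works on 2 shorts, i.e. a 32-bit vector.
static_assert(cast_input_bits(128, 16, 128, 64) == 32,
              "2 short lanes occupy 32 bits");
```

With the previous 64-bit minimum, a 2-lane `short` vector like this could not be represented, which matches the explanation of why the conversion was not intrinsified.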
### Impact Analysis #### 1. Vector types Vectors only with `short` element types will be affected, as we just supported 32-bit `short` vectors in this change. #### 2. Vector API No impact on Vector API or the vector-specific nodes. The minimum vector shape at API level remains 64-bit. It's not possible to generate a final vector IR with 32-bit vector size. Type conversions may generate intermediate 32-bit vectors, but they will be resized or cast to vectors with at least 64-bit length. #### 3. Auto-vectorization Enables vectorization of cases containing only 2 `short` lanes, with significant performance improvements. Since we have supported 32-bit vectors for `byte` type for a long time, extending this to `short` did not introduce additional risks. #### 4. Codegen of vector nodes NEON doesn't support 32-bit SIMD instructions, so we use 64-bit instructions instead. For lanewise operations, this is safe because the higher half bits can be ignored. Details: - Lanewise vector operations are unaffected as explained above. - NEON supports vector load/store instructions with 32-bit vector size, which we already use in relevant IRs (shared by SVE). - Cross-lane operations like reduction may be affected, potentially causing incorrect results for `min/max/mul/and` reductions. The min vector size for such operations should remain 64-bit. We've added assertions in match rules. Since it's currently not possible to generate such reductions (Vector API minimum is 64-bit, and SLP doesn't support subword type reductions), we maintain the status quo. However, adding an explicit vector size check in `match_rule_supported_vector()` would be beneficial. - Missing codegen support for type conversions with 32-bit input or output vector size should be added. ### Main changes: - Support 2 shorts vector types. 
The supported min vector element count for each basic type is: - `T_BOOLEAN`: 2 - `T_BYTE`: 4 - `T_CHAR`: 4 - `T_SHORT`: 2 (new supported) - `T_INT`/`T_FLOAT`/`T_LONG`/`T_DOUBLE`: 2 - Add codegen support for `Vector[U]Cast` with 32-bit input or output vector size. `VectorReinterpret` has already considered the 32-bit vector size cases. - Unsupport reductions with less than 8 bytes vector size explicitly. - Add additional IR tests for Vector API type conversions. - Add JMH benchmark for auto-vectorization with two 16-bit lanes. ### Test Tested hotspot/jdk/langtools - all tests passed. ### Performance Following shows the performance improvement of relative VectorAPI JMHs on a NVIDIA Grace (128-bit SVE2) machine: Benchmark SIZE Mode Unit Before After Gain VectorFPtoIntCastOperations.microDouble128ToShort128 512 thrpt ops/ms 731.529 26278.599 35.92 VectorFPtoIntCastOperations.microDouble128ToShort128 1024 thrpt ops/ms 366.461 10595.767 28.91 VectorFPtoIntCastOperations.microFloat64ToShort64 512 thrpt ops/ms 315.791 14327.682 45.37 VectorFPtoIntCastOperations.microFloat64ToShort64 1024 thrpt ops/ms 158.485 7261.847 45.82 VectorZeroExtend.short2Long 128 thrpt ops/ms 1447.243 898666.972 620.95 And here is the performance improvement of the added JMH on Grace: Benchmark LEN Mode Unit Before After Gain VectorTwoShorts.addVec2S 64 avgt ns/op 20.948 12.683 1.65 VectorTwoShorts.addVec2S 128 avgt ns/op 40.073 22.703 1.76 VectorTwoShorts.addVec2S 512 avgt ns/op 157.447 83.691 1.88 VectorTwoShorts.addVec2S 1024 avgt ns/op 313.022 165.085 1.89 VectorTwoShorts.mulVec2S 64 avgt ns/op 20.981 12.647 1.65 VectorTwoShorts.mulVec2S 128 avgt ns/op 40.279 22.637 1.77 VectorTwoShorts.mulVec2S 512 avgt ns/op 158.642 83.371 1.90 VectorTwoShorts.mulVec2S 1024 avgt ns/op 314.788 165.205 1.90 VectorTwoShorts.reverseBytesVec2S 64 avgt ns/op 17.739 9.106 1.94 VectorTwoShorts.reverseBytesVec2S 128 avgt ns/op 32.591 15.632 2.08 VectorTwoShorts.reverseBytesVec2S 512 avgt ns/op 126.154 55.284 2.28 
VectorTwoShorts.reverseBytesVec2S 1024 avgt ns/op 254.592 107.457 2.36 We can observe the similar uplift on an AArch64 N1 (NEON) machine. ------------- Commit messages: - 8359419: AArch64: Relax min vector length to 32-bit for short vectors Changes: https://git.openjdk.org/jdk/pull/26057/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26057&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359419 Stats: 306 lines in 8 files changed: 196 ins; 9 del; 101 mod Patch: https://git.openjdk.org/jdk/pull/26057.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26057/head:pull/26057 PR: https://git.openjdk.org/jdk/pull/26057 From xgong at openjdk.org Tue Jul 1 06:09:44 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 1 Jul 2025 06:09:44 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API [v2] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 09:16:48 GMT, Xiaohong Gong wrote: >> JDK-8318650 introduced hotspot intrinsification of subword gather load APIs for X86 platforms [1]. However, the current implementation is not optimal for AArch64 SVE platform, which natively supports vector instructions for subword gather load operations using an int vector for indices (see [2][3]). >> >> Two key areas require improvement: >> 1. At the Java level, vector indices generated for range validation could be reused for the subsequent gather load operation on architectures with native vector instructions like AArch64 SVE. However, the current implementation prevents compiler reuse of these index vectors due to divergent control flow, potentially impacting performance. >> 2. At the compiler IR level, the additional `offset` input for `LoadVectorGather`/`LoadVectorGatherMasked` with subword types increases IR complexity and complicates backend implementation. Furthermore, generating `add` instructions before each memory access negatively impacts performance. 
>> >> This patch refactors the implementation at both the Java level and compiler mid-end to improve efficiency and maintainability across different architectures. >> >> Main changes: >> 1. Java-side API refactoring: >> - Explicitly passes generated index vectors to hotspot, eliminating duplicate index vectors for gather load instructions on >> architectures like AArch64. >> 2. C2 compiler IR refactoring: >> - Refactors `LoadVectorGather`/`LoadVectorGatherMasked` IR for subword types by removing the memory offset input and incorporating it into the memory base `addr` at the IR level. This simplifies backend implementation, reduces add operations, and unifies the IR across all types. >> 3. Backend changes: >> - Streamlines X86 implementation of subword gather operations following the removal of the offset input from the IR level. >> >> Performance: >> The performance of the relative JMH improves up to 27% on a X86 AVX512 system. Please see the data below: >> >> Benchmark Mode Cnt Unit SIZE Before After Gain >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 64 53682.012 52650.325 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 256 14484.252 14255.156 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 1024 3664.900 3595.615 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 4096 908.31... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Address review comments > - Merge 'jdk:master' into JDK-8355563 > - 8355563: VectorAPI: Refactor current implementation of subword gather load API Ping again! Thanks in advance! 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-3021961883

From dfenacci at openjdk.org Tue Jul 1 06:25:42 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Tue, 1 Jul 2025 06:25:42 GMT
Subject: RFR: 8360641: TestCompilerCounts fails after 8354727 [v3]
In-Reply-To: References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com> Message-ID:

On Mon, 30 Jun 2025 08:58:07 GMT, Manuel Hässig wrote:

>> After integrating #25872 the calculation of the `CICompilerCount` ergonomic became dependent on the size of `NonNMethodCodeHeapSize`, which itself is an ergonomic based on the available memory. Thus, depending on the system, the test `compiler/arguments/TestCompilerCounts.java` failed, i.e. locally this failed, but not on CI servers.
>>
>> This PR changes the test to reflect the changes introduced in #25872.
>>
>> Testing:
>> - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15932906313)
>> - [x] tier1,tier2 plus Oracle internal testing
>
> Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision:
>
> - Remove superfluous newline
> - Add copyright

Looks good to me. Thanks!

-------------

Marked as reviewed by dfenacci (Committer).
PR Review: https://git.openjdk.org/jdk/pull/26024#pullrequestreview-2973682287

From xgong at openjdk.org Tue Jul 1 06:27:43 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Tue, 1 Jul 2025 06:27:43 GMT
Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v3]
In-Reply-To: References: Message-ID:

On Mon, 30 Jun 2025 12:05:08 GMT, Mikhail Ablakatov wrote:

>> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2087:
>>
>>> 2085: assert(vector_length_in_bytes > FloatRegister::neon_vl, "ASIMD impl should be used instead");
>>> 2086: assert(vector_length_in_bytes <= FloatRegister::sve_vl_max, "unsupported vector length");
>>> 2087: assert(is_power_of_2(vector_length_in_bytes), "unsupported vector length");
>>
>> Better to compare with `MaxVectorSize`.
>>
>> I suggest using `assert(length_in_bytes == MaxVectorSize, "invalid vector length");` and putting this assertion in `aarch64_vector.ad` file, i.e. inside the matching rule.
>
> Why is it better that way? Currently the assertions check that we end up here if there are computations that can be done only using SVE (length > neon && length <= sve). What would happen if a user operates 256b VectorAPI vectors on a 512b SVE platform?

Those would be valid operations with a partial vector size. For such cases, we generate a mask at the IR level, and a `VectorBlend` is generated for this reduction case; otherwise the result would be incorrect. So, in theory, the vector size seen here should always be equal to `MaxVectorSize`.
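Why the blend is needed for a partial-size multiply reduction can be shown with a small scalar model. This is a sketch under stated assumptions, not the C2 implementation: `reduce_mul_all_lanes` and `blend_with_identity` are hypothetical names, the "register" is a plain `std::vector`, and the lanes beyond the active part are assumed to hold arbitrary leftover values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of a full-width multiply reduction: every lane of the
// hardware register contributes to the product.
long reduce_mul_all_lanes(const std::vector<long>& reg) {
  long acc = 1;
  for (long v : reg) {
    acc *= v;
  }
  return acc;
}

// Scalar model of the blend step: lanes at index >= active_lanes are
// replaced by 1, the identity of multiplication, so they cannot change
// the result of the full-width reduction.
std::vector<long> blend_with_identity(const std::vector<long>& reg,
                                      std::size_t active_lanes) {
  std::vector<long> out(reg);
  for (std::size_t i = active_lanes; i < out.size(); ++i) {
    out[i] = 1;
  }
  return out;
}
```

For a register holding `{2, 3, 4, 5, 7, 7, 7, 7}` with only the first four lanes active, reducing the blended register gives 120, while reducing the raw register also multiplies in the leftover 7s.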
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2176524365 From xgong at openjdk.org Tue Jul 1 06:27:44 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 1 Jul 2025 06:27:44 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v4] In-Reply-To: References: <2jvFY4hq9FPdk9e4Zg6LRPdRVhDTGgxofL-we8c-mns=.4e6ce509-67a4-4e46-a661-2b0951f88731@github.com> Message-ID: On Mon, 30 Jun 2025 12:20:19 GMT, Mikhail Ablakatov wrote: >> I have the same concern about the order issue with @eme64. >> Should we only enable this only for VectorAPI case, which doesn't require strict-order? > > FP reductions have been disabled for auto-vectorization, please see the following comment: https://github.com/openjdk/jdk/pull/23181/files#diff-edf6d70f65d81dc12a483088e0610f4e059bd40697f242aedfed5c2da7475f1aR130 . You can also check https://github.com/openjdk/jdk/pull/23181#issuecomment-3018988067 to see how the patch affects auto-vectorization performance. The only benchmarks that saw a performance uplift on a 256b SVE platform is `VectorReduction2.WithSuperword.intMulBig` (which is fine since it's an integer benchmark). Yes, these operations are disabled for SLP. But maybe we could add an assertion to check the restrict flag in the match rules. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2176528442 From epeter at openjdk.org Tue Jul 1 06:30:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Jul 2025 06:30:44 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt [v2] In-Reply-To: References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: On Fri, 27 Jun 2025 01:43:16 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470. >> >> 1. 
Check for +0, -0, +INF, -INF, and NaN before any other input values. >> 2. If these special values are found, return immediately with minimal modifications to the result register. >> 3. Performance testing shows the modified intrinsic improves throughput by 65.1% over the original intrinsic on average for the special values while throughput drops by 5.5% for the normal value range (-INF, -2^(-1022)], [2^(-1022), INF). >> >> The commands to run all relevant micro-benchmarks are posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"` >> >> The results of all tests posted below were captured with an [Intel? Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. >> >> Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks. When comparing against _baseline2_, the modified intrinsic significantly still outperforms for the inputs (**-INF < x < INF**) that require heavy compute. However, the special value inputs that trigger fast path returns still perform better with _baseline2_. 
>> >> | Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) | >> | :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: | >> | [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 | >> | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 | >> | [0] | 344990 | 627561 | +81.91 | >> | [-0] ... > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Ensure ABS_MASK is a 128-bit memory sized location and only use equal enum for UCOMISD checks Did not review the patch in detail, but looks reasonable. Tests are passing on my end with commit 3 / v01. @missa-prime Thanks for taking care of this! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25962#pullrequestreview-2973696349 From epeter at openjdk.org Tue Jul 1 06:38:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Jul 2025 06:38:45 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API [v2] In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 06:07:03 GMT, Xiaohong Gong wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - Address review comments >> - Merge 'jdk:master' into JDK-8355563 >> - 8355563: VectorAPI: Refactor current implementation of subword gather load API > > Ping again! Thanks in advance! @XiaohongGong I'm a little busy at the moment, and soon going on a summer vacation, so I cannot promise a full review soon. Feel free to ask someone else to have a look. I quickly looked through your new benchmark results you published after integration of https://github.com/openjdk/jdk/pull/25539. There seem to still be a few cases where `Gain < 1`. 
Especially:

GatherOperationsBenchmark.microShortGather512_MASK 256 thrpt 30 ops/ms 11587.465 10674.598 0.92
GatherOperationsBenchmark.microShortGather512_MASK 1024 thrpt 30 ops/ms 2902.731 2629.739 0.90
GatherOperationsBenchmark.microShortGather512_MASK 4096 thrpt 30 ops/ms 741.546 671.124 0.90

and

GatherOperationsBenchmark.microShortGather256_MASK 256 thrpt 30 ops/ms 11339.217 10951.141 0.96
GatherOperationsBenchmark.microShortGather256_MASK 1024 thrpt 30 ops/ms 2840.081 2718.823 0.95
GatherOperationsBenchmark.microShortGather256_MASK 4096 thrpt 30 ops/ms 725.334 696.343 0.96

and

GatherOperationsBenchmark.microByteGather512_MASK 64 thrpt 30 ops/ms 50588.210 48220.741 0.95

Do you know what happens in those cases?

That said: https://github.com/openjdk/jdk/pull/25539 seems to have been quite the success; there are far fewer regressions now than before.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-3022057434

From xgong at openjdk.org Tue Jul 1 06:43:44 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Tue, 1 Jul 2025 06:43:44 GMT
Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API [v2]
In-Reply-To: References: Message-ID: <7-WqNSzjPLOsHJ4DHogxqbiInl8TIz5sxIEXbIfo2OQ=.912568b8-830d-47cc-a837-46af6be618f3@github.com>

On Tue, 1 Jul 2025 06:07:03 GMT, Xiaohong Gong wrote:

>> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
>>
>> - Address review comments
>> - Merge 'jdk:master' into JDK-8355563
>> - 8355563: VectorAPI: Refactor current implementation of subword gather load API
>
> Ping again! Thanks in advance!

> @XiaohongGong I'm a little busy at the moment, and soon going on a summer vacation, so I cannot promise a full review soon. Feel free to ask someone else to have a look.
>
> I quickly looked through your new benchmark results you published after integration of #25539.
There seem to still be a few cases where `Gain < 1`. Especially:
>
> ```
> GatherOperationsBenchmark.microShortGather512_MASK 256 thrpt 30 ops/ms 11587.465 10674.598 0.92
> GatherOperationsBenchmark.microShortGather512_MASK 1024 thrpt 30 ops/ms 2902.731 2629.739 0.90
> GatherOperationsBenchmark.microShortGather512_MASK 4096 thrpt 30 ops/ms 741.546 671.124 0.90
> ```
>
> and
>
> ```
> GatherOperationsBenchmark.microShortGather256_MASK 256 thrpt 30 ops/ms 11339.217 10951.141 0.96
> GatherOperationsBenchmark.microShortGather256_MASK 1024 thrpt 30 ops/ms 2840.081 2718.823 0.95
> GatherOperationsBenchmark.microShortGather256_MASK 4096 thrpt 30 ops/ms 725.334 696.343 0.96
> ```
>
> and
>
> ```
> GatherOperationsBenchmark.microByteGather512_MASK 64 thrpt 30 ops/ms 50588.210 48220.741 0.95
> ```
>
> Do you know what happens in those cases?

Thanks for your input! Yes, I spent some time analyzing these small regressions. They seem to be caused by hardware effects such as cache misses or code alignment. When I tried a larger loop alignment, such as 32, the performance improved and the regressions disappeared. Since I'm not very familiar with x86 architectures, I'm not sure of the exact cause. Any suggestions on that?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-3022088710

From jbhateja at openjdk.org Tue Jul 1 06:45:42 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 1 Jul 2025 06:45:42 GMT
Subject: RFR: 8359965: Enable paired pushp and popp instruction usage for APX enabled CPUs
In-Reply-To: References: Message-ID:

On Thu, 19 Jun 2025 05:15:40 GMT, Srinivas Vamsi Parasa wrote:

> The goal of this PR is to enhance the existing x86 assembly stubs using PUSH and POP instructions with paired PUSHP/POPP instructions which are part of Intel APX technology.
>
> In Intel APX, the PUSHP and POPP instructions are modern, compact replacements for the legacy PUSH and POP, designed to work seamlessly with the expanded set of 32 general-purpose registers (R0-R31). Unlike their predecessors, they use the new APX (REX2-based) encoding, enabling more uniform and efficient instruction formats. These instructions improve code density, simplify register access, and are optimized for performance on APX-enabled CPUs.
>
> Pairing PUSHP and POPP in Intel APX provides CPU-level benefits such as more efficient instruction decoding, better stack pointer tracking, and improved register dependency management. Their uniform encoding allows for streamlined execution, reduced pipeline stalls, and potential micro-op fusion, all of which enhance performance and power efficiency. This pairing helps the processor optimize speculative execution and register lifetimes, making code faster and more scalable on modern architectures.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 800:

> 798: void MacroAssembler::push(Register src, bool is_pair) {
> 799:   if (is_pair && VM_Version::supports_apx_f()) {
> 800:     pushp(src);

What does is_pair signify here? You are just pushing one register. Do you intend to use has_matching_pop?

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 807:

> 805:
> 806: void MacroAssembler::pop(Register dst, bool is_pair) {
> 807:   if (is_pair && VM_Version::supports_apx_f()) {

Same as above, new argument suggestion: please use has_matching_push.
I understand your purpose here is to delegate the responsibility for balancing PPX pairs to the user.
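The suggested parameter names can be tried out in isolation with a stand-in assembler. This is a minimal sketch, not the HotSpot code: `FakeAssembler` is a hypothetical type that records mnemonics instead of emitting machine code, registers are plain integers, and its `apx_supported` field is an assumed substitute for `VM_Version::supports_apx_f()`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the real assembler: it records mnemonics as
// strings so the paired/unpaired selection can be inspected.
struct FakeAssembler {
  bool apx_supported;                // assumed substitute for VM_Version::supports_apx_f()
  std::vector<std::string> emitted;  // recorded mnemonics, in emission order

  // The caller promises that a matching pop follows later.
  void push(int reg, bool has_matching_pop) {
    const char* op = (has_matching_pop && apx_supported) ? "pushp" : "push";
    emitted.push_back(std::string(op) + " r" + std::to_string(reg));
  }

  // The caller promises that a matching push was emitted earlier.
  void pop(int reg, bool has_matching_push) {
    const char* op = (has_matching_push && apx_supported) ? "popp" : "pop";
    emitted.push_back(std::string(op) + " r" + std::to_string(reg));
  }
};
```

With `apx_supported` set, `push(12, true)` followed by `pop(12, true)` records `pushp r12` and `popp r12`; with the flag cleared, or with `false` passed for the pairing argument, the legacy mnemonics are recorded. Nothing in this model checks the balance itself, so that burden stays with the caller, as the comment above notes.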
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25889#discussion_r2176508727
PR Review Comment: https://git.openjdk.org/jdk/pull/25889#discussion_r2176511119

From jbhateja at openjdk.org Tue Jul 1 06:45:42 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 1 Jul 2025 06:45:42 GMT
Subject: RFR: 8359965: Enable paired pushp and popp instruction usage for APX enabled CPUs
In-Reply-To: References: Message-ID:

On Tue, 1 Jul 2025 06:11:29 GMT, Jatin Bhateja wrote:

>> The goal of this PR is to enhance the existing x86 assembly stubs using PUSH and POP instructions with paired PUSHP/POPP instructions which are part of Intel APX technology.
>>
>> In Intel APX, the PUSHP and POPP instructions are modern, compact replacements for the legacy PUSH and POP, designed to work seamlessly with the expanded set of 32 general-purpose registers (R0-R31). Unlike their predecessors, they use the new APX (REX2-based) encoding, enabling more uniform and efficient instruction formats. These instructions improve code density, simplify register access, and are optimized for performance on APX-enabled CPUs.
>>
>> Pairing PUSHP and POPP in Intel APX provides CPU-level benefits such as more efficient instruction decoding, better stack pointer tracking, and improved register dependency management. Their uniform encoding allows for streamlined execution, reduced pipeline stalls, and potential micro-op fusion, all of which enhance performance and power efficiency. This pairing helps the processor optimize speculative execution and register lifetimes, making code faster and more scalable on modern architectures.
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 807:
>
>> 805:
>> 806: void MacroAssembler::pop(Register dst, bool is_pair) {
>> 807:   if (is_pair && VM_Version::supports_apx_f()) {
>
> Same as above, new argument suggestion: please use has_matching_push.
> I understand your purpose here is to delegate the responsibility of balancing of PPX pair to the user.

For a cleaner interface, I think we can also maintain a RAII style APXPushPopPairTracker in the stub snippets using push/pop instruction sequence and wrap the actual assembler call underneath. The idea here is to catch the balancing error upfront as PPX is purely a performance hint. Instructions with this hint have the same functional semantics as those without. PPX hints set by the compiler that violate the balancing rule may turn off the PPX optimization, but they will not affect program semantics.

class APXPushPopPairTracker {
private:
  int _counter;  // PUSHP hints not yet matched by a POPP

public:
  APXPushPopPairTracker() : _counter(0) {
  }

  // Checked when the tracker goes out of scope at the end of the stub snippet.
  ~APXPushPopPairTracker() {
    assert(_counter == 0, "Push/pop pair mismatch");
  }

  void push(Register reg, bool has_matching_pop) {
    if (has_matching_pop && VM_Version::supports_apx_f()) {
      Assembler::pushp(reg);
      incrementCounter();
    } else {
      Assembler::push(reg);
    }
  }

  void pop(Register reg, bool has_matching_push) {
    if (has_matching_push && VM_Version::supports_apx_f()) {
      Assembler::popp(reg);
      decrementCounter();
    } else {
      Assembler::pop(reg);
    }
  }

  void incrementCounter() {
    _counter++;
  }

  void decrementCounter() {
    _counter--;
  }
};

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25889#discussion_r2176549150

From jbhateja at openjdk.org Tue Jul 1 06:48:42 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 1 Jul 2025 06:48:42 GMT
Subject: RFR: 8359965: Enable paired pushp and popp instruction usage for APX enabled CPUs
In-Reply-To:
References:
Message-ID:

On Tue, 1 Jul 2025 06:11:29 GMT, Jatin Bhateja wrote:

>> The goal of this PR is to enhance the existing x86 assembly stubs using PUSH and POP instructions with paired PUSHP/POPP instructions which are part of Intel APX technology.
>>
>> In Intel APX, the PUSHP and POPP instructions are modern, compact replacements for the legacy PUSH and POP, designed to work seamlessly with the expanded set of 32 general-purpose registers (R0-R31).
Unlike their predecessors, they use the new APX (REX2-based) encoding, enabling more uniform and efficient instruction formats. These instructions improve code density, simplify register access, and are optimized for performance on APX-enabled CPUs. >> >> Pairing PUSHP and POPP in Intel APX provides CPU-level benefits such as more efficient instruction decoding, better stack pointer tracking, and improved register dependency management. Their uniform encoding allows for streamlined execution, reduced pipeline stalls, and potential micro-op fusion, all of which enhance performance and power efficiency. This pairing helps the processor optimize speculative execution and register lifetimes, making code faster and more scalable on modern architectures. > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 807: > >> 805: >> 806: void MacroAssembler::pop(Register dst, bool is_pair) { >> 807: if (is_pair && VM_Version::supports_apx_f()) { > > Same as above, new argument suggestion: please use has_matching_push. > I understand your purpose here is to delegate the responsibility of balancing of PPX pair to the user. For a cleaner interface, I think we can also maintain a RAII style APXPushPopPairTracker in the stub snippets using push/pop instruction sequence and wrap the actual assembler call underneath. The idea here is to catch the balancing error upfront as PPX is purely a performance hint. Instructions with this hint have the same functional semantics as those without. PPX hints set by the compiler that violate the balancing rule may turn off the PPX optimization, but they will not affect program semantics.. 

class APXPushPopPairTracker {
private:
  int _counter;  // PUSHP hints not yet matched by a POPP

public:
  APXPushPopPairTracker() : _counter(0) {
  }

  // Checked when the tracker goes out of scope at the end of the stub snippet.
  ~APXPushPopPairTracker() {
    assert(_counter == 0, "Push/pop pair mismatch");
  }

  void push(Register reg, bool has_matching_pop) {
    if (has_matching_pop && VM_Version::supports_apx_f()) {
      Assembler::pushp(reg);
      incrementCounter();
    } else {
      Assembler::push(reg);
    }
  }

  void pop(Register reg, bool has_matching_push) {
    if (has_matching_push && VM_Version::supports_apx_f()) {
      Assembler::popp(reg);
      decrementCounter();
    } else {
      Assembler::pop(reg);
    }
  }

  void incrementCounter() {
    _counter++;
  }

  void decrementCounter() {
    _counter--;
  }
};

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25889#discussion_r2176564840

From mhaessig at openjdk.org Tue Jul 1 06:50:46 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 1 Jul 2025 06:50:46 GMT
Subject: RFR: 8361092: Remove trailing spaces in x86 ad files
In-Reply-To:
References:
Message-ID:

On Mon, 30 Jun 2025 15:34:18 GMT, Manuel Hässig wrote:

> This PR fixes some trailing spaces in `x86_64.ad`.
>
> Testing:
> - [ ] Github Actions

Thank you for your reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26048#issuecomment-3022106129

From mhaessig at openjdk.org Tue Jul 1 06:50:47 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 1 Jul 2025 06:50:47 GMT
Subject: Integrated: 8361092: Remove trailing spaces in x86 ad files
In-Reply-To:
References:
Message-ID:

On Mon, 30 Jun 2025 15:34:18 GMT, Manuel Hässig wrote:

> This PR fixes some trailing spaces in `x86_64.ad`.
>
> Testing:
> - [ ] Github Actions

This pull request has now been integrated.

Changeset: b32ccf2c
Author:    Manuel Hässig
URL:       https://git.openjdk.org/jdk/commit/b32ccf2cb23e0180187f4238140583a923fc27c4
Stats:     3 lines in 1 file changed: 0 ins; 0 del; 3 mod

8361092: Remove trailing spaces in x86 ad files

Reviewed-by: kvn, sviswanathan

-------------

PR: https://git.openjdk.org/jdk/pull/26048

From mhaessig at openjdk.org Tue Jul 1 06:52:32 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 1 Jul 2025 06:52:32 GMT
Subject: RFR: 8360641: TestCompilerCounts fails after 8354727 [v4]
In-Reply-To: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
Message-ID:

> After integrating #25872 the calculation of the `CICompilerCount` ergonomic became dependent on the size of `NonNMethodCodeHeapSize`, which itself is an ergonomic based on the available memory. Thus, depending on the system, the test `compiler/arguments/TestCompilerCounts.java` failed, i.e. locally this failed, but not on CI servers.
>
> This PR changes the test to reflect the changes introduced in #25872.
>
> Testing:
> - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15932906313)
> - [x] tier1,tier2 plus Oracle internal testing

Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:

  Fix whitespace

  Co-authored-by: Andrey Turbanov

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26024/files
  - new: https://git.openjdk.org/jdk/pull/26024/files/8beb5898..71767802

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=26024&range=03
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26024&range=02-03

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/26024.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26024/head:pull/26024

PR: https://git.openjdk.org/jdk/pull/26024

From mhaessig at openjdk.org Tue Jul 1 06:52:33 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 1 Jul 2025 06:52:33 GMT
Subject: RFR: 8360641: TestCompilerCounts fails after 8354727 [v3]
In-Reply-To:
References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
Message-ID:

On Mon, 30 Jun 2025 19:48:44 GMT, Andrey Turbanov wrote:

>> Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Remove superfluous newline
>> - Add copyright
>
> test/hotspot/jtreg/compiler/arguments/TestCompilerCounts.java line 159:
>
>> 157: // Tiered modes
>> 158: int tieredCount = heuristicCount(cpus, Compilation.Tiered, debug);
>> 159: pass(tieredCount, opt, "-XX:NonNMethodCodeHeapSize=" + NonNMethodCodeHeapSize);
>
> Suggestion:
>
> pass(tieredCount, opt, "-XX:NonNMethodCodeHeapSize=" + NonNMethodCodeHeapSize);

Good catch, thank you.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26024#discussion_r2176568786 From epeter at openjdk.org Tue Jul 1 06:55:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Jul 2025 06:55:41 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API [v2] In-Reply-To: <7-WqNSzjPLOsHJ4DHogxqbiInl8TIz5sxIEXbIfo2OQ=.912568b8-830d-47cc-a837-46af6be618f3@github.com> References: <7-WqNSzjPLOsHJ4DHogxqbiInl8TIz5sxIEXbIfo2OQ=.912568b8-830d-47cc-a837-46af6be618f3@github.com> Message-ID: On Tue, 1 Jul 2025 06:41:32 GMT, Xiaohong Gong wrote: >> Ping again! Thanks in advance! > >> @XiaohongGong I'm a little busy at the moment, and soon going on a summer vacation, so I cannot promise a full review soon. Feel free to ask someone else to have a look. >> >> I quickly looked through your new benchmark results you published after integration of #25539. There seem to still be a few cases where `Gain < 1`. Especially: >> >> ``` >> GatherOperationsBenchmark.microShortGather512_MASK 256 thrpt 30 ops/ms 11587.465 10674.598 0.92 >> GatherOperationsBenchmark.microShortGather512_MASK 1024 thrpt 30 ops/ms 2902.731 2629.739 0.90 >> GatherOperationsBenchmark.microShortGather512_MASK 4096 thrpt 30 ops/ms 741.546 671.124 0.90 >> ``` >> >> and >> >> ``` >> GatherOperationsBenchmark.microShortGather256_MASK 256 thrpt 30 ops/ms 11339.217 10951.141 0.96 >> GatherOperationsBenchmark.microShortGather256_MASK 1024 thrpt 30 ops/ms 2840.081 2718.823 0.95 >> GatherOperationsBenchmark.microShortGather256_MASK 4096 thrpt 30 ops/ms 725.334 696.343 0.96 >> ``` >> >> and >> >> ``` >> GatherOperationsBenchmark.microByteGather512_MASK 64 thrpt 30 ops/ms 50588.210 48220.741 0.95 >> ``` >> >> Do you know what happens in those cases? > > Thanks for your input! Yes, I spent some time making an analysis on these little regressions. Seems there are the architecture HW influences like the cache miss or code alignment. 
I tried with a larger loop alignment like 32, and the performance will be improved and regressions are gone. Since I'm not quite familiar with X86 architectures, I'm not sure of the exact point. Any suggestions on that? @XiaohongGong Maybe someone from Intel (@jatin-bhateja @sviswa7) can help you with the x86 specific issues. You could always use hardware counters to measure cache misses. Also if the vectors are not cache-line aligned, there may be split loads or stores. Also that can be measured with hardware counters. Maybe the benchmark needs to be improved somehow, to account for issues with alignment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-3022132271 From dfenacci at openjdk.org Tue Jul 1 06:58:40 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 1 Jul 2025 06:58:40 GMT Subject: RFR: 8360783: CTW: Skip deoptimization between tiers [v2] In-Reply-To: References: Message-ID: On Fri, 27 Jun 2025 08:38:31 GMT, Aleksey Shipilev wrote: >> When profiling CTW runs, I noticed we spend a lot of time dealing with deoptimization. We do this excessively, deoptimizing before compilation on every tier. This is excessive: Hotspot honors compilation requests on subsequent levels without the need for explicit deoptimization. Not doing deopt between tiers greatly improves CTW performance. >> >> A taste of improvements, about 15% less CPU spent: >> >> >> $ time make test TEST=applications/ctw/modules >> >> # Current >> real 5m1.616s >> user 79m41.398s >> sys 14m39.607s >> >> # Patched >> real 3m55.411s >> user 69m19.227s >> sys 5m24.323s >> >> >> The compilation still works as expected, progressing through tiers 1..4: >> >> >> $ JAVA_OPTIONS="-XX:+PrintCompilation -XX:CICompilerCount=2" ./ctw.sh modules:jdk.compiler | tee out >> ... 
>> $ grep sun.tools.serialver.resources.serialver_de::getContents out
>> 101783 55033 b 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes)
>> 101785 55036 b 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes)
>> 101786 55033 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used
>> 101786 55038 b 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes)
>> 101787 55036 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used
>> 101792 55040 b 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes)
>> 101797 55038 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used
>> 101798 55040 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: marked for deoptimization
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
> Update test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Compiler.java
>
> Co-authored-by: Tobias Hartmann

Thanks @shipilev! I really welcome any change that makes CTW a bit faster. Looks good to me.

-------------

Marked as reviewed by dfenacci (Committer).

PR Review: https://git.openjdk.org/jdk/pull/26013#pullrequestreview-2973784592

From xgong at openjdk.org Tue Jul 1 07:02:48 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Tue, 1 Jul 2025 07:02:48 GMT
Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v4]
In-Reply-To:
References:
Message-ID: <3sWLk_sAMLtcvRUjXk9hYe-K2MBQl9fH2Qg0MF7lwDk=.b8867d51-e822-43c0-93ab-58228c6eb1d5@github.com>

On Mon, 30 Jun 2025 13:25:09 GMT, Mikhail Ablakatov wrote:

>> Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register.
After that point, existing ASIMD implementation is used.
>>
>> Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is used directly still.
>>
>> The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reduction micro-benchmarks.
>>
>> Benchmarks results:
>>
>> Neoverse-V1 (SVE 256-bit)
>>
>> Benchmark                 (size)   Mode    master         PR   Units
>> ByteMaxVector.MULLanes      1024  thrpt  5447.643  11455.535  ops/ms
>> ShortMaxVector.MULLanes     1024  thrpt  3388.183   7144.301  ops/ms
>> IntMaxVector.MULLanes       1024  thrpt  3010.974   4911.485  ops/ms
>> LongMaxVector.MULLanes      1024  thrpt  1539.137   2562.835  ops/ms
>> FloatMaxVector.MULLanes     1024  thrpt  1355.551   4158.128  ops/ms
>> DoubleMaxVector.MULLanes    1024  thrpt  1715.854   3284.189  ops/ms
>>
>>
>> Fujitsu A64FX (SVE 512-bit):
>>
>> Benchmark                 (size)   Mode    master         PR   Units
>> ByteMaxVector.MULLanes      1024  thrpt  1091.692   2887.798  ops/ms
>> ShortMaxVector.MULLanes     1024  thrpt   597.008   1863.338  ops/ms
>> IntMaxVector.MULLanes       1024  thrpt   510.642   1348.651  ops/ms
>> LongMaxVector.MULLanes      1024  thrpt   468.878    878.620  ops/ms
>> FloatMaxVector.MULLanes     1024  thrpt   376.284   2237.564  ops/ms
>> DoubleMaxVector.MULLanes    1024  thrpt   431.343   1646.792  ops/ms
>
> Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
>
> - cleanup: address nits, rename several symbols
> - cleanup: remove unreferenced definitions
> - Address review comments.
>
>    - fixup: disable FP mul reduction auto-vectorization for all targets
>    - fixup: add a tmp vReg to reduce_mul_integral_gt128b and
>      reduce_non_strict_order_mul_fp_gt128b to keep vsrc unmodified
>    - cleanup: replace a complex lambda in the above methods with a loop
>    - cleanup: rename symbols to follow the existing naming convention
>    - cleanup: add asserts to SVE only instructions
>    - split mul FP reduction instructions into strictly-ordered (default)
>      and explicitly non strictly-ordered
>    - remove redundant conditions in TestVectorFPReduction.java
>
>    Benchmarks results:
>
>    Neoverse-V1 (SVE 256-bit)
>
>    | Benchmark                 | Before   | After    | Units  | Diff  |
>    |---------------------------|----------|----------|--------|-------|
>    | ByteMaxVector.MULLanes    | 619.156  | 9884.578 | ops/ms | 1496% |
>    | DoubleMaxVector.MULLanes  | 184.693  | 2712.051 | ops/ms | 1368% |
>    | FloatMaxVector.MULLanes   | 277.818  | 3388.038 | ops/ms | 1119% |
>    | IntMaxVector.MULLanes     | 371.225  | 4765.434 | ops/ms | 1183% |
>    | LongMaxVector.MULLanes    | 205.149  | 2672.975 | ops/ms | 1203% |
>    | ShortMaxVector.MULLanes   | 472.804  | 5122.917 | ops/ms | 984%  |
> - Merge branch 'master' into 8343689-rebase
> - fixup: don't modify the value in vsrc
>
>    Fix reduce_mul_integral_gt128b() so it doesn't modify vsrc. With this
>    change, the result of recursive folding is held in vtmp1. To be able to
>    pass this intermediate result to reduce_mul_integral_le128b(), we would
>    have to use another temporary FloatRegister, as vtmp1 would essentially
>    act as vsrc. It's possible to get around this however:
>    reduce_mul_integral_le128b() is modified so it's possible to pass
>    matching vsrc and vtmp2 arguments. By doing this, we save ourselves a
>    temporary register in rules that match to reduce_mul_integral_gt128b().
> - cleanup: revert an unnecessary change to reduce_mul_fp_le128b() formatting
> - Use EXT instead of COMPACT to split a vector into two halves
>
>    Benchmarks results:
>
>    Neoverse-V1 (SVE 256-bit)
>
>    Benchmark (size) Mode master PR Units
>    ByteMaxVector.MULLanes 1024 thrpt 5447.643 11455.535 ops/ms
>    Short...

src/hotspot/cpu/aarch64/aarch64_vector.ad line 3536:

> 3534:
> 3535: instruct reduce_mulF_gt128b(vRegF dst, vRegF fsrc, vReg vsrc, vReg tmp) %{
> 3536:   predicate(Matcher::vector_length_in_bytes(n->in(2)) > 16 && n->as_Reduction()->requires_strict_order());

Are there any cases that can match this rule?

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2097:

> 2095: sve_movprfx(vtmp1, vsrc); // copy
> 2096: sve_ext(vtmp1, vtmp1, vector_length_in_bytes / 2); // swap halves
> 2097: sve_mul(vtmp1, elemType_to_regVariant(bt), pgtmp, vsrc); // multiply halves

> sve_mul(vtmp1, elemType_to_regVariant(bt), pgtmp, vsrc);

Can we use `ptrue` instead of `pgtmp` here? The higher bits would also be computed, but they have no influence on the final result, right?

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2106:

> 2104: sve_mul(vtmp1, elemType_to_regVariant(bt), pgtmp, vtmp2); // multiply halves
> 2105: vector_length_in_bytes = vector_length_in_bytes / 2;
> 2106: vector_length = vector_length / 2;

I guess you want to update `pgtmp` with the new `vector_length`? But it seems the code is missing. Anyway, maybe it's not necessary to generate a predicate, as I commented above.
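
To make the `ptrue` question above concrete, here is a small scalar model of the fold-halves reduction. It is only a sketch under stated assumptions: lanes are modeled as `uint32_t` with wrapping multiplication, the `EXT` step is modeled as a rotation of the whole register, and `mul_reduce_by_halves` and its `all_lanes_active` flag are hypothetical names, not HotSpot code. With `all_lanes_active == false` only the lower half of the active lanes is written (the `pgtmp` case); with `true` every lane is written (the `ptrue` case).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model: repeatedly multiply the lower half of the active lanes by the
// upper half until `stop_lanes` lanes remain, then finish sequentially (the
// sequential tail stands in for the existing 128-bit path).
// Assumes lanes.size() is a power of two and 1 <= stop_lanes <= lanes.size().
inline uint32_t mul_reduce_by_halves(std::vector<uint32_t> lanes,
                                     std::size_t stop_lanes,
                                     bool all_lanes_active) {
  const std::size_t n = lanes.size();
  std::size_t active = n;
  while (active > stop_lanes) {
    const std::size_t half = active / 2;
    const std::vector<uint32_t> old = lanes;            // values before this step
    const std::size_t limit = all_lanes_active ? n : half;
    for (std::size_t i = 0; i < limit; i++) {
      // Rotating by `half` lanes pairs lane i with lane i + half.
      lanes[i] = old[i] * old[(i + half) % n];
    }
    active = half;                                       // only the lower half matters now
  }
  uint32_t result = 1;
  for (std::size_t i = 0; i < active; i++) {
    result *= lanes[i];
  }
  return result;
}
```

In this model the lanes above `half` are never read by a later step, so both settings of the flag return the same product. Whether that carries over to the real instruction sequence depends on details not modeled here.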
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2176590314 PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2176584327 PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2176587011 From xgong at openjdk.org Tue Jul 1 07:10:41 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 1 Jul 2025 07:10:41 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API [v2] In-Reply-To: <7-WqNSzjPLOsHJ4DHogxqbiInl8TIz5sxIEXbIfo2OQ=.912568b8-830d-47cc-a837-46af6be618f3@github.com> References: <7-WqNSzjPLOsHJ4DHogxqbiInl8TIz5sxIEXbIfo2OQ=.912568b8-830d-47cc-a837-46af6be618f3@github.com> Message-ID: On Tue, 1 Jul 2025 06:41:32 GMT, Xiaohong Gong wrote: >> Ping again! Thanks in advance! > >> @XiaohongGong I'm a little busy at the moment, and soon going on a summer vacation, so I cannot promise a full review soon. Feel free to ask someone else to have a look. >> >> I quickly looked through your new benchmark results you published after integration of #25539. There seem to still be a few cases where `Gain < 1`. Especially: >> >> ``` >> GatherOperationsBenchmark.microShortGather512_MASK 256 thrpt 30 ops/ms 11587.465 10674.598 0.92 >> GatherOperationsBenchmark.microShortGather512_MASK 1024 thrpt 30 ops/ms 2902.731 2629.739 0.90 >> GatherOperationsBenchmark.microShortGather512_MASK 4096 thrpt 30 ops/ms 741.546 671.124 0.90 >> ``` >> >> and >> >> ``` >> GatherOperationsBenchmark.microShortGather256_MASK 256 thrpt 30 ops/ms 11339.217 10951.141 0.96 >> GatherOperationsBenchmark.microShortGather256_MASK 1024 thrpt 30 ops/ms 2840.081 2718.823 0.95 >> GatherOperationsBenchmark.microShortGather256_MASK 4096 thrpt 30 ops/ms 725.334 696.343 0.96 >> ``` >> >> and >> >> ``` >> GatherOperationsBenchmark.microByteGather512_MASK 64 thrpt 30 ops/ms 50588.210 48220.741 0.95 >> ``` >> >> Do you know what happens in those cases? > > Thanks for your input! 
Yes, I spent some time making an analysis on these little regressions. Seems there are the architecture HW influences like the cache miss or code alignment. I tried with a larger loop alignment like 32, and the performance will be improved and regressions are gone. Since I'm not quite familiar with X86 architectures, I'm not sure of the exact point. Any suggestions on that?

> @XiaohongGong Maybe someone from Intel (@jatin-bhateja @sviswa7) can help you with the x86 specific issues. You could always use hardware counters to measure cache misses. Also if the vectors are not cache-line aligned, there may be split loads or stores. Also that can be measured with hardware counters. Maybe the benchmark needs to be improved somehow, to account for issues with alignment.

I also tried to measure cache misses with perf on my x86 machine, and I noticed that the number of cache misses increased. The generated code layout of the test/benchmark changed with my changes on the Java side, so I guess the alignment may be different from before. To verify this, I used the VM option `-XX:OptoLoopAlignment=32`, and the performance improved a lot compared with the version without my change. So I think the patch itself may be acceptable even though we noticed minor regressions.
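
As a side note on the split loads mentioned in the quoted reply: whether a single vector access straddles two cache lines can be checked with simple address arithmetic. The helper below is a minimal sketch, not taken from the benchmark; `crosses_cache_line` is a hypothetical name, and the 64-byte default line size is an assumption that does not hold on every CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if an access of `size` bytes starting at `addr` touches more than one
// cache line. `line_size` defaults to 64 bytes, which is an assumption.
constexpr bool crosses_cache_line(std::uintptr_t addr, std::size_t size,
                                  std::size_t line_size = 64) {
  return size != 0 && (addr / line_size) != ((addr + size - 1) / line_size);
}
```

For example, a 64-byte (512-bit) load is split unless its address is a multiple of 64, while a 32-byte (256-bit) load at offset 16 within a line is not.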

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-3022195040

From bmaillard at openjdk.org Tue Jul 1 07:11:42 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Tue, 1 Jul 2025 07:11:42 GMT
Subject: RFR: 8359602: Ideal optimizations depending on input type are missed because of missing notification mechanism from CCP [v2]
In-Reply-To: <0MJe_8nA-ILWqoVG-9rzuq5Pe9xX-FG2LN3k9Cy8nqU=.d724c6cf-cb02-45c4-95a4-5bd1fef7462b@github.com>
References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> <3cLLB7fms3S4WgqOVeb7D_ZDRFsJ_-ca3qfALlmzFeU=.1002ac91-1e35-4499-9d88-6d1f76c955d0@github.com> <0MJe_8nA-ILWqoVG-9rzuq5Pe9xX-FG2LN3k9Cy8nqU=.d724c6cf-cb02-45c4-95a4-5bd1fef7462b@github.com>
Message-ID:

On Mon, 30 Jun 2025 13:52:01 GMT, Emanuel Peter wrote:

> @benoitmaillard Very nice work, and great description :) Thank you!
>
> Did you check if this allows enabling any of the other disabled verifications from [JDK-8347273](https://bugs.openjdk.org/browse/JDK-8347273)?
>
> That may be a lot of work. Not sure if it is worth checking all of them now. @TobiHartmann how much should he invest in this now? An alternative is just tackling all the other cases later. What do you think?

I have started to take a look at this and it seems that there are a lot of cases to check indeed.

> @benoitmaillard One more open question for me: `raise_bottom_type` only sets the node internal `_type`. But in IGVN, we do not read from `_type` but `phase->type(in(2))`. Do you know when the `phase->type(in(2))` value changes? Is that also during CCP? Before or after the `_type` is modified?

Yes, good point, I should have mentioned this somewhere. The `phase->type(in(2))` call uses the type array from `PhaseValues`. The type array entry is actually modified earlier, in `PhaseCCP::analyze`, right after the `Value` call.
You can see the `set_type` call [here](https://github.com/benoitmaillard/jdk/blob/75de51dff6d9cc3e9764737b29b9358992b488b7/src/hotspot/share/opto/phaseX.cpp#L2765). When this happens, users are added to the (local) worklist but again it does not change our issue as only value optimizations occur in that context. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26017#issuecomment-3022192988 From shade at openjdk.org Tue Jul 1 07:41:40 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 1 Jul 2025 07:41:40 GMT Subject: RFR: 8360783: CTW: Skip deoptimization between tiers [v2] In-Reply-To: References: Message-ID: On Fri, 27 Jun 2025 08:38:31 GMT, Aleksey Shipilev wrote: >> When profiling CTW runs, I noticed we spend a lot of time dealing with deoptimization. We do this excessively, deoptimizing before compilation on every tier. This is excessive: Hotspot honors compilation requests on subsequent levels without the need for explicit deoptimization. Not doing deopt between tiers greatly improves CTW performance. >> >> A taste of improvements, about 15% less CPU spent: >> >> >> $ time make test TEST=applications/ctw/modules >> >> # Current >> real 5m1.616s >> user 79m41.398s >> sys 14m39.607s >> >> # Patched >> real 3m55.411s >> user 69m19.227s >> sys 5m24.323s >> >> >> The compilation still works as expected, progressing through tiers 1..4: >> >> >> $ JAVA_OPTIONS="-XX:+PrintCompilation -XX:CICompilerCount=2" ./ctw.sh modules:jdk.compiler | tee out >> ... 
>> $ grep sun.tools.serialver.resources.serialver_de::getContents out >> 101783 55033 b 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101785 55036 b 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101786 55033 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101786 55038 b 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101787 55036 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101792 55040 b 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101797 55038 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101798 55040 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: marked for deoptimization > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Compiler.java > > Co-authored-by: Tobias Hartmann Thanks! Here goes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26013#issuecomment-3022327111 From thartmann at openjdk.org Tue Jul 1 07:47:40 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 1 Jul 2025 07:47:40 GMT Subject: RFR: 8360783: CTW: Skip deoptimization between tiers [v2] In-Reply-To: References: Message-ID: On Fri, 27 Jun 2025 08:38:31 GMT, Aleksey Shipilev wrote: >> When profiling CTW runs, I noticed we spend a lot of time dealing with deoptimization. We do this excessively, deoptimizing before compilation on every tier. This is excessive: Hotspot honors compilation requests on subsequent levels without the need for explicit deoptimization. Not doing deopt between tiers greatly improves CTW performance. 
>> >> A taste of improvements, about 15% less CPU spent: >> >> >> $ time make test TEST=applications/ctw/modules >> >> # Current >> real 5m1.616s >> user 79m41.398s >> sys 14m39.607s >> >> # Patched >> real 3m55.411s >> user 69m19.227s >> sys 5m24.323s >> >> >> The compilation still works as expected, progressing through tiers 1..4: >> >> >> $ JAVA_OPTIONS="-XX:+PrintCompilation -XX:CICompilerCount=2" ./ctw.sh modules:jdk.compiler | tee out >> ... >> $ grep sun.tools.serialver.resources.serialver_de::getContents out >> 101783 55033 b 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101785 55036 b 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101786 55033 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101786 55038 b 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101787 55036 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101792 55040 b 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101797 55038 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101798 55040 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: marked for deoptimization > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Compiler.java > > Co-authored-by: Tobias Hartmann Marked as reviewed by thartmann (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/26013#pullrequestreview-2973981844 From shade at openjdk.org Tue Jul 1 08:02:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 1 Jul 2025 08:02:45 GMT Subject: RFR: 8360783: CTW: Skip deoptimization between tiers [v2] In-Reply-To: References: Message-ID: On Fri, 27 Jun 2025 08:38:31 GMT, Aleksey Shipilev wrote: >> When profiling CTW runs, I noticed we spend a lot of time dealing with deoptimization. We do this excessively, deoptimizing before compilation on every tier. This is excessive: Hotspot honors compilation requests on subsequent levels without the need for explicit deoptimization. Not doing deopt between tiers greatly improves CTW performance. >> >> A taste of improvements, about 15% less CPU spent: >> >> >> $ time make test TEST=applications/ctw/modules >> >> # Current >> real 5m1.616s >> user 79m41.398s >> sys 14m39.607s >> >> # Patched >> real 3m55.411s >> user 69m19.227s >> sys 5m24.323s >> >> >> The compilation still works as expected, progressing through tiers 1..4: >> >> >> $ JAVA_OPTIONS="-XX:+PrintCompilation -XX:CICompilerCount=2" ./ctw.sh modules:jdk.compiler | tee out >> ... 
>> $ grep sun.tools.serialver.resources.serialver_de::getContents out >> 101783 55033 b 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101785 55036 b 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101786 55033 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101786 55038 b 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101787 55036 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101792 55040 b 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101797 55038 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101798 55040 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: marked for deoptimization > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Compiler.java > > Co-authored-by: Tobias Hartmann Aw. Thanks! Here goes again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26013#issuecomment-3022410574 From shade at openjdk.org Tue Jul 1 08:02:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 1 Jul 2025 08:02:45 GMT Subject: Integrated: 8360783: CTW: Skip deoptimization between tiers In-Reply-To: References: Message-ID: On Fri, 27 Jun 2025 08:19:34 GMT, Aleksey Shipilev wrote: > When profiling CTW runs, I noticed we spend a lot of time dealing with deoptimization. We do this excessively, deoptimizing before compilation on every tier. This is excessive: Hotspot honors compilation requests on subsequent levels without the need for explicit deoptimization. Not doing deopt between tiers greatly improves CTW performance. 
> > A taste of improvements, about 15% less CPU spent: > > > $ time make test TEST=applications/ctw/modules > > # Current > real 5m1.616s > user 79m41.398s > sys 14m39.607s > > # Patched > real 3m55.411s > user 69m19.227s > sys 5m24.323s > > > The compilation still works as expected, progressing through tiers 1..4: > > > $ JAVA_OPTIONS="-XX:+PrintCompilation -XX:CICompilerCount=2" ./ctw.sh modules:jdk.compiler | tee out > ... > $ grep sun.tools.serialver.resources.serialver_de::getContents out > 101783 55033 b 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) > 101785 55036 b 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) > 101786 55033 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used > 101786 55038 b 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) > 101787 55036 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used > 101792 55040 b 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) > 101797 55038 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used > 101798 55040 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: marked for deoptimization This pull request has now been integrated. 
Changeset: cd6caedd Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/cd6caedd0a3c9ebd4c8c57e64f62b60161c5cd7c Stats: 8 lines in 1 file changed: 6 ins; 1 del; 1 mod 8360783: CTW: Skip deoptimization between tiers Reviewed-by: thartmann, mhaessig, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/26013 From eastigeevich at openjdk.org Tue Jul 1 08:08:49 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 1 Jul 2025 08:08:49 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v32] In-Reply-To: References: Message-ID: <0TjXtL5ABEBUwmu1VlJ9kNDs95zi8HGA-S2A0BU9GeY=.2fa893f4-96c4-4761-91b9-3b6250212c7a@github.com> On Thu, 26 Jun 2025 16:20:44 GMT, Chad Rakoczy wrote: >> src/hotspot/cpu/aarch64/nativeInst_aarch64.cpp line 90: >> >>> 88: // Patch the constant in the call's trampoline stub. >>> 89: address trampoline_stub_addr = get_trampoline(); >>> 90: if (trampoline_stub_addr != nullptr && dest != trampoline_stub_addr) { >> >> I think you will not need the checks if you rewrite the code as follows: >> ```c++ >> address addr_call = ...; >> assert(); >> >> if (!Assembler::reachable_from_branch_at(addr_call, dest)) { >> address trampoline_stub_addr = get_trampoline(); >> assert (trampoline_stub_addr != nullptr, "we need a trampoline"); >> assert (! is_NativeCallTrampolineStub_at(dest), "chained trampolines"); >> nativeCallTrampolineStub_at(trampoline_stub_addr)->set_destination(dest); >> dest = trampoline_stub_addr; >> } >> set_destination(dest); >> ICache::invalidate_range(addr_call, instruction_size); >> >> >> If `dest` is a trampoline in the current nmethod, it is always reachable. So you will not go into setting trampoline's target to itself. Also we will call `get_trampoline`, which involves `CodeCache::find_blob` and ` a traversal of relocations, only if we need a trampoline. > > I would need to check the assumptions that other callers make about this function. 
In the current state it updates the trampoline regardless if the branch is reachable or not. With your change it would require the caller to also update the trampoline to make sure it is not stale. @theRealAph When we don't need a trampoline (a call site is a direct call), we update the trampoline to have the same destination as the call site. I have not found places in Hotspot relying on this. Do you remember why we are doing this? Is it Ok not to update trampolines in the case of reachable destinations? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2176748370 From aph at openjdk.org Tue Jul 1 08:13:41 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 1 Jul 2025 08:13:41 GMT Subject: RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 05:59:15 GMT, Xiaohong Gong wrote: > ### Background > On AArch64, the minimum vector length supported is 64-bit for basic types, except for `byte` and `boolean` (32-bit and 16-bit respectively to match special Vector API features). This limitation prevents intrinsification of vector type conversions between `short` and wider types (e.g. `long/double`) in Vector API when the entire vector length is within 128 bits, resulting in degraded performance for such conversions. > > For example, type conversions between `ShortVector.SPECIES_128` and `LongVector.SPECIES_128` are not supported on AArch64 NEON and SVE architectures with 128-bit max vector size. This occurs because the compiler would need to generate a vector with 2 short elements, resulting in a 32-bit vector size. > > To intrinsify such type conversion APIs, we need to relax the min vector length constraint from 64-bit to 32-bit for short vectors. > > ### Impact Analysis > #### 1. Vector types > Vectors only with `short` element types will be affected, as we just supported 32-bit `short` vectors in this change. > > #### 2. 
Vector API > No impact on Vector API or the vector-specific nodes. The minimum vector shape at API level remains 64-bit. It's not possible to generate a final vector IR with 32-bit vector size. Type conversions may generate intermediate 32-bit vectors, but they will be resized or cast to vectors with at least 64-bit length. > > #### 3. Auto-vectorization > Enables vectorization of cases containing only 2 `short` lanes, with significant performance improvements. Since we have supported 32-bit vectors for `byte` type for a long time, extending this to `short` did not introduce additional risks. > > #### 4. Codegen of vector nodes > NEON doesn't support 32-bit SIMD instructions, so we use 64-bit instructions instead. For lanewise operations, this is safe because the higher half bits can be ignored. > > Details: > - Lanewise vector operations are unaffected as explained above. > - NEON supports vector load/store instructions with 32-bit vector size, which we already use in relevant IRs (shared by SVE). > - Cross-lane operations like reduction may be affected, potentially causing incorrect results for `min/max/mul/and` reductions. The min vector size for such operations should remain 64-bit. We've added assertions in match rules. Since it's currently not possible to generate such reductions (Vector API minimum is 64-bit, and SLP doesn't support subword type reductions), we maintain the status quo. However, adding an explicit vector size check in `match_rule_s... src/hotspot/cpu/aarch64/aarch64.ad line 2371: > 2369: switch(bt) { > 2370: case T_BOOLEAN: > 2371: // It needs to load/store a vector mask with only 2 elements Suggestion: // Load/store a vector mask with only 2 elements Same with the other cases. src/hotspot/cpu/aarch64/aarch64.ad line 2386: > 2384: break; > 2385: default: > 2386: // Limit the min vector length to 64-bit normally. Suggestion: // Limit the min vector length to 64-bit. 
src/hotspot/cpu/aarch64/aarch64_vector.ad line 199: > 197: case Op_MaxReductionV: > 198: // Reductions with less than 8 bytes vector length are > 199: // not supported for now. Suggestion: // not supported. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26057#discussion_r2176759967 PR Review Comment: https://git.openjdk.org/jdk/pull/26057#discussion_r2176761846 PR Review Comment: https://git.openjdk.org/jdk/pull/26057#discussion_r2176762709 From aph at openjdk.org Tue Jul 1 08:30:48 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 1 Jul 2025 08:30:48 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v32] In-Reply-To: <0TjXtL5ABEBUwmu1VlJ9kNDs95zi8HGA-S2A0BU9GeY=.2fa893f4-96c4-4761-91b9-3b6250212c7a@github.com> References: <0TjXtL5ABEBUwmu1VlJ9kNDs95zi8HGA-S2A0BU9GeY=.2fa893f4-96c4-4761-91b9-3b6250212c7a@github.com> Message-ID: On Tue, 1 Jul 2025 08:05:50 GMT, Evgeny Astigeevich wrote: > @theRealAph When we don't need a trampoline (a call site is a direct call), we update the trampoline to have the same destination as the call site. Yes, that's fundamental to the design. > I have not found places in Hotspot relying on this. Do you remember why we are doing this? Is it Ok not to update trampolines in the case of reachable destinations? No. We always keep the trampoline up to date so that we don't have to deal with a race condition when patching trampoline calls. 
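The invariant stated in the reply above (the trampoline is refreshed on every patch, whether or not the destination is directly reachable) can be illustrated with a toy model. This is a hedged sketch, not HotSpot code: `CallSite`, `patch`, `resolve`, `kTrampoline`, and `kBranchRange` are hypothetical names, and addresses are plain integers.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical model of a call site with a trampoline stub; not HotSpot code.
struct CallSite {
    std::int64_t pc = 0;                               // address of the branch instruction
    std::atomic<std::int64_t> trampoline_dest{0};      // constant stored in the stub
    std::atomic<std::int64_t> branch_target{0};        // where the branch itself points
    static constexpr std::int64_t kTrampoline = -1;    // sentinel: branch goes via the stub
    static constexpr std::int64_t kBranchRange = 128;  // assumed reach of a direct branch

    bool reachable(std::int64_t dest) const {
        return std::llabs(dest - pc) < kBranchRange;
    }

    void patch(std::int64_t dest) {
        // Refresh the stub first, even when the destination is directly reachable,
        // so the stub never holds a stale destination.
        trampoline_dest.store(dest, std::memory_order_release);
        if (reachable(dest)) {
            branch_target.store(dest, std::memory_order_release);
        } else {
            branch_target.store(kTrampoline, std::memory_order_release);
        }
    }

    std::int64_t resolve() const {
        std::int64_t t = branch_target.load(std::memory_order_acquire);
        return t == kTrampoline ? trampoline_dest.load(std::memory_order_acquire) : t;
    }
};
```

In this model a caller that goes through the stub and a caller that branches directly both arrive at the most recently patched destination, which is the property the reply says the design depends on.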
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2176812614 From aph at openjdk.org Tue Jul 1 08:34:48 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 1 Jul 2025 08:34:48 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v32] In-Reply-To: References: <0TjXtL5ABEBUwmu1VlJ9kNDs95zi8HGA-S2A0BU9GeY=.2fa893f4-96c4-4761-91b9-3b6250212c7a@github.com> Message-ID: On Tue, 1 Jul 2025 08:28:00 GMT, Andrew Haley wrote: >> @theRealAph When we don't need a trampoline (a call site is a direct call), we update the trampoline to have the same destination as the call site. I have not found places in Hotspot relying on this. >> Do you remember why we are doing this? Is it Ok not to update trampolines in the case of reachable destinations? > >> @theRealAph When we don't need a trampoline (a call site is a direct call), we update the trampoline to have the same destination as the call site. > > Yes, that's fundamental to the design. > >> I have not found places in Hotspot relying on this. Do you remember why we are doing this? Is it Ok not to update trampolines in the case of reachable destinations? > > No. We always keep the trampoline up to date so that we don't have to deal with a race condition when patching trampoline calls. Please read the comments which begin: `AArch64 OpenJDK uses four different types of calls:` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2176825935 From xgong at openjdk.org Tue Jul 1 08:35:42 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 1 Jul 2025 08:35:42 GMT Subject: RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 08:10:16 GMT, Andrew Haley wrote: >> ### Background >> On AArch64, the minimum vector length supported is 64-bit for basic types, except for `byte` and `boolean` (32-bit and 16-bit respectively to match special Vector API features). 
This limitation prevents intrinsification of vector type conversions between `short` and wider types (e.g. `long/double`) in Vector API when the entire vector length is within 128 bits, resulting in degraded performance for such conversions. >> >> For example, type conversions between `ShortVector.SPECIES_128` and `LongVector.SPECIES_128` are not supported on AArch64 NEON and SVE architectures with 128-bit max vector size. This occurs because the compiler would need to generate a vector with 2 short elements, resulting in a 32-bit vector size. >> >> To intrinsify such type conversion APIs, we need to relax the min vector length constraint from 64-bit to 32-bit for short vectors. >> >> ### Impact Analysis >> #### 1. Vector types >> Vectors only with `short` element types will be affected, as we just supported 32-bit `short` vectors in this change. >> >> #### 2. Vector API >> No impact on Vector API or the vector-specific nodes. The minimum vector shape at API level remains 64-bit. It's not possible to generate a final vector IR with 32-bit vector size. Type conversions may generate intermediate 32-bit vectors, but they will be resized or cast to vectors with at least 64-bit length. >> >> #### 3. Auto-vectorization >> Enables vectorization of cases containing only 2 `short` lanes, with significant performance improvements. Since we have supported 32-bit vectors for `byte` type for a long time, extending this to `short` did not introduce additional risks. >> >> #### 4. Codegen of vector nodes >> NEON doesn't support 32-bit SIMD instructions, so we use 64-bit instructions instead. For lanewise operations, this is safe because the higher half bits can be ignored. >> >> Details: >> - Lanewise vector operations are unaffected as explained above. >> - NEON supports vector load/store instructions with 32-bit vector size, which we already use in relevant IRs (shared by SVE). 
>> - Cross-lane operations like reduction may be affected, potentially causing incorrect results for `min/max/mul/and` reductions. The min vector size for such operations should remain 64-bit. We've added assertions in match rules. Since it's currently not possible to generate such reductions (Vector API minimum is 64-bit, and SLP doesn't support subword type reductions), we maintain the status quo. However, addin... > > src/hotspot/cpu/aarch64/aarch64.ad line 2371: > >> 2369: switch(bt) { >> 2370: case T_BOOLEAN: >> 2371: // It needs to load/store a vector mask with only 2 elements > > Suggestion: > > // Load/store a vector mask with only 2 elements > > Same with the other cases. Thanks so much for your comment. I will fix them soon. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26057#discussion_r2176831961 From aph at openjdk.org Tue Jul 1 08:40:55 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 1 Jul 2025 08:40:55 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v32] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 22:32:24 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change only slightly modifies existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [ ] Linux x64 fastdebug all >> - [ ] Linux aarch64 fastdebug all >> - [ ] ... 
> > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Update how call sites are fixed > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix pointer printing > - Use set_destination_mt_safe > - Print address as pointer > - Use new _metadata_size instead of _jvmci_data_size > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Only check branch distance for aarch64 and riscv > - Move far branch fix to fix_relocation_after_move > - ... and 80 more: https://git.openjdk.org/jdk/compare/f799cf18...70e4164e src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 84: > 82: if (NativeCall::is_call_at(addr())) { > 83: NativeCall* call = nativeCall_at(addr()); > 84: if (be_safe) { Why is this change necessary? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2176847208 From aph at openjdk.org Tue Jul 1 08:44:51 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 1 Jul 2025 08:44:51 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v32] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 22:32:24 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change only slightly modifies existing code paths and therefore does not benefit much from existing tests. 
New tests were created to test the new functionality >> >> Additional Testing: >> - [ ] Linux x64 fastdebug all >> - [ ] Linux aarch64 fastdebug all >> - [ ] ... > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Update how call sites are fixed > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix pointer printing > - Use set_destination_mt_safe > - Print address as pointer > - Use new _metadata_size instead of _jvmci_data_size > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Only check branch distance for aarch64 and riscv > - Move far branch fix to fix_relocation_after_move > - ... and 80 more: https://git.openjdk.org/jdk/compare/f799cf18...70e4164e src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 117: > 115: } > 116: > 117: void poll_Relocation::fix_relocation_after_move(const CodeBuffer* src, CodeBuffer* dest, bool is_nmethod_relocation) { Suggestion: void poll_Relocation::fix_relocation_after_move(const CodeBuffer* src, CodeBuffer* dest, bool) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2176861287 From mhaessig at openjdk.org Tue Jul 1 09:11:32 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 1 Jul 2025 09:11:32 GMT Subject: RFR: 8308094: Add a compilation timeout flag to catch long running compilations [v2] In-Reply-To: References: Message-ID: <_Ye19u_7PlqlsoRSuR0dNeAGbeuHyN_oqD1ZS4q9Nvk=.b94fd29d-d43e-4561-9926-7f5a46434d8e@github.com> > This PR adds `-XX:CompileTaskTimeout` on Linux to limit the amount of time a compilation task can run. The goal of this is initially to be able to find and investigate long-running compilations. > > The timeout is implemented using a POSIX timer that sends a `SIGALRM` to the compiler thread the compile task is running on. 
Each compiler thread registers a signal handler that triggers an assert upon receiving `SIGALRM`. This is currently only implemented for Linux, because it relies on `SIGEV_THREAD_ID` to get the signal delivered to the same thread that timed out. > > Since `SIGALRM` is now used, the test `runtime/signal/TestSigalrm.java` now requires `vm.flagless` so it will not interfere with the compiler thread signal handlers. > > Testing: > - [ ] GitHub Actions > - [x] tier1, tier2 on all platforms > - [x] tier3, tier4 and Oracle internal testing on Linux fastdebug > - [x] tier1 through tier4 with `-XX:CompileTaskTimeout=60000` (one minute timeout) to see what fails (`compiler/codegen/TestAntiDependenciesHighMemUsage2.java`, `compiler/loopopts/TestMaxLoopOptsCountReached.java`, and `compiler/c2/TestScalarReplacementMaxLiveNodes.java` fail) Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8308094-timeout - Fix SIGALRM test - Add timeout functionality to compiler threads ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26023/files - new: https://git.openjdk.org/jdk/pull/26023/files/09e0e58c..5840cc2e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26023&range=00-01 Stats: 4936 lines in 244 files changed: 2913 ins; 773 del; 1250 mod Patch: https://git.openjdk.org/jdk/pull/26023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26023/head:pull/26023 PR: https://git.openjdk.org/jdk/pull/26023 From duke at openjdk.org Tue Jul 1 09:20:43 2025 From: duke at openjdk.org (duke) Date: Tue, 1 Jul 2025 09:20:43 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt [v2] In-Reply-To: References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: On Fri, 27 Jun 2025 01:43:16 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470. >> >> 1. Check for +0, -0, +INF, -INF, and NaN before any other input values. >> 2. If these special values are found, return immediately with minimal modifications to the result register. >> 3. Performance testing shows the modified intrinsic improves throughput by 65.1% over the original intrinsic on average for the special values while throughput drops by 5.5% for the normal value range (-INF, -2^(-1022)], [2^(-1022), INF). >> >> The commands to run all relevant micro-benchmarks are posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"` >> >> The results of all tests posted below were captured with an [Intel®
Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. >> >> Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks. When comparing against _baseline2_, the modified intrinsic significantly still outperforms for the inputs (**-INF < x < INF**) that require heavy compute. However, the special value inputs that trigger fast path returns still perform better with _baseline2_. >> >> | Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) | >> | :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: | >> | [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 | >> | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 | >> | [0] | 344990 | 627561 | +81.91 | >> | [-0] ... > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Ensure ABS_MASK is a 128-bit memory sized location and only use equal enum for UCOMISD checks @missa-prime Your change (at version 615169d8aa679c665ac4c5ad30ea011505e503b7) is now ready to be sponsored by a Committer. 
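The special-value fast path from steps 1 and 2 of the quoted description can be sketched in portable C++. This is only an illustration of the control flow, assuming a hypothetical wrapper named `cbrt_with_fast_path`; the actual change is an x86_64 assembly stub, and `std::cbrt` here merely stands in for the heavy-compute path.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical wrapper, not the x86_64 intrinsic: special inputs are returned
// unchanged before the general computation is attempted.
double cbrt_with_fast_path(double x) {
    // +0, -0, +INF, -INF and NaN all map to themselves under cbrt,
    // so the input can be handed back as the result.
    if (x == 0.0 || std::isinf(x) || std::isnan(x)) {
        return x;
    }
    return std::cbrt(x);  // stand-in for the heavy-compute path
}
```

Checking the special values first costs a few extra comparisons on every ordinary input, which is consistent with the small regression the description reports for the normal value range.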
------------- PR Comment: https://git.openjdk.org/jdk/pull/25962#issuecomment-3022902863 From mhaessig at openjdk.org Tue Jul 1 09:34:50 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 1 Jul 2025 09:34:50 GMT Subject: RFR: 8361086: JVMCIGlobals::check_jvmci_flags_are_consistent has incorrect format string In-Reply-To: References: Message-ID: <5kdHAQ86j5eDq6OgIb6Bn7HFWxgc24W8ywubudeGa-Q=.5d8b392a-de5c-49d7-a3f2-3ade541c6643@github.com> On Mon, 30 Jun 2025 16:14:08 GMT, Kim Barrett wrote: > Please review this trivial fix of a format string. The value being printed is > TieredStopAtLevel, which is of type intx, so "%zd" should be used instead of "%d". > > Testing: mach5 tier1 Looks good and trivial to me. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/26051#pullrequestreview-2974517185 From yzheng at openjdk.org Tue Jul 1 09:38:48 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 1 Jul 2025 09:38:48 GMT Subject: RFR: 8361086: JVMCIGlobals::check_jvmci_flags_are_consistent has incorrect format string In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 16:14:08 GMT, Kim Barrett wrote: > Please review this trivial fix of a format string. The value being printed is > TieredStopAtLevel, which is of type intx, so "%zd" should be used instead of "%d". > > Testing: mach5 tier1 LGTM ------------- Marked as reviewed by yzheng (Committer). PR Review: https://git.openjdk.org/jdk/pull/26051#pullrequestreview-2974540441 From eastigeevich at openjdk.org Tue Jul 1 09:51:55 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 1 Jul 2025 09:51:55 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v32] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 22:32:24 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). 
It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change only slightly modifies existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [ ] Linux x64 fastdebug all >> - [ ] Linux aarch64 fastdebug all >> - [ ] ... > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Update how call sites are fixed > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix pointer printing > - Use set_destination_mt_safe > - Print address as pointer > - Use new _metadata_size instead of _jvmci_data_size > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Only check branch distance for aarch64 and riscv > - Move far branch fix to fix_relocation_after_move > - ... 
and 80 more: https://git.openjdk.org/jdk/compare/f799cf18...70e4164e src/hotspot/share/code/nmethod.cpp line 1547: > 1545: CodeBuffer dst(nm_copy); > 1546: while (iter.next()) { > 1547: iter.reloc()->fix_relocation_after_move(&src, &dst, true); What if, instead of a bool parameter, we introduce a function `fix_relocation_after_copy`:
```c++
virtual void Relocation::fix_relocation_after_copy(const CodeBuffer* src, CodeBuffer* dest) {
  fix_relocation_after_move(src, dest);
}

void CallRelocation::fix_relocation_after_copy(const CodeBuffer* src, CodeBuffer* dest) {
  address orig_addr = old_addr_for(addr(), src, dest);
  address callee = pd_call_destination(orig_addr);

  if (src->contains(callee)) {
    // If the original call is to an address in the src CodeBuffer (such as a stub call)
    // the updated call should be to the corresponding address in dest CodeBuffer
    ptrdiff_t offset = callee - orig_addr;
    callee = addr() + offset;
  }

  pd_set_call_destination(callee);
}
```
With this change we don't need to modify `relocInfo_*.cpp` files. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2177056209 From jbhateja at openjdk.org Tue Jul 1 10:13:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 1 Jul 2025 10:13:22 GMT Subject: RFR: 8361037: [ubsan] compiler/c2/irTests/TestFloat16ScalarOperations division by 0 Message-ID: Floating point division by zero is undefined per the C and C++ standards, but is defined by Clang (and by ISO/IEC/IEEE 60559 / IEEE 754) as producing either an infinity or NaN value. Java semantics, defined in section 15.17.2 "Division Operator" of JLS-24, are well-defined for these constant-folding scenarios. This patch fixes the division-by-zero error reported after integration of [JDK-8352635](https://bugs.openjdk.org/browse/JDK-8352635). Kindly review and share your feedback.
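As a hedged illustration of the idea described above (not the actual `divnode.cpp` change), a constant folder can produce the Java-defined result for a zero divisor without executing a native division by zero. The helper name `fold_div` is hypothetical, and plain `float` is used instead of Float16 to keep the sketch self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper, not HotSpot code: folds a / b to the IEEE 754 result
// that Java requires, while never dividing by zero in native code.
float fold_div(float a, float b) {
    if (b == 0.0f) {
        if (std::isnan(a) || a == 0.0f) {
            // NaN / 0 and 0 / 0 are NaN.
            return std::numeric_limits<float>::quiet_NaN();
        }
        // Nonzero (finite or infinite) dividend: signed infinity,
        // negative when exactly one operand has its sign bit set.
        bool negative = std::signbit(a) != std::signbit(b);
        float inf = std::numeric_limits<float>::infinity();
        return negative ? -inf : inf;
    }
    return a / b;  // divisor is nonzero, so the native division is well-defined
}
```

Handling every zero-divisor case in the guard, including an infinite dividend, means the fall-through division is only reached with a nonzero divisor.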
Best Regards, Jatin ------------- Commit messages: - 8361037: [ubsan] compiler/c2/irTests/TestFloat16ScalarOperations division by 0 Changes: https://git.openjdk.org/jdk/pull/26062/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26062&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361037 Stats: 11 lines in 1 file changed: 11 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26062.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26062/head:pull/26062 PR: https://git.openjdk.org/jdk/pull/26062 From eastigeevich at openjdk.org Tue Jul 1 10:19:57 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 1 Jul 2025 10:19:57 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v32] In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 09:49:08 GMT, Evgeny Astigeevich wrote: >> Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: >> >> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final >> - Update how call sites are fixed >> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final >> - Fix pointer printing >> - Use set_destination_mt_safe >> - Print address as pointer >> - Use new _metadata_size instead of _jvmci_data_size >> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final >> - Only check branch distance for aarch64 and riscv >> - Move far branch fix to fix_relocation_after_move >> - ... 
and 80 more: https://git.openjdk.org/jdk/compare/f799cf18...70e4164e > > src/hotspot/share/code/nmethod.cpp line 1547: > >> 1545: CodeBuffer dst(nm_copy); >> 1546: while (iter.next()) { >> 1547: iter.reloc()->fix_relocation_after_move(&src, &dst, true); > > What if, instead of a bool parameter we introduce a function `fix_relocation_after_copy`: > ```c++ > virtual void Relocation::fix_relocation_after_copy(const CodeBuffer* src, CodeBuffer* dest) { > fix_relocation_after_move(src, dest); > } > > void CallRelocation::fix_relocation_after_copy(const CodeBuffer* src, CodeBuffer* dest) { > address orig_addr = old_addr_for(addr(), src, dest); > address callee = pd_call_destination(orig_addr); > > if (src->contains(callee)) { > // If the original call is to an address in the src CodeBuffer (such as a stub call) > // the updated call should be to the corresponding address in dest CodeBuffer > ptrdiff_t offset = callee - orig_addr; > callee = addr() + offset; > } > > pd_set_call_destination(callee); > } > > > With this change we don't need to modify `relocInfo_*.cpp` files. IMO, we might consider moving `pd_set_call_destination` to `CallRelocation` because only CallRelocation uses it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2177119955 From shade at openjdk.org Tue Jul 1 10:53:25 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 1 Jul 2025 10:53:25 GMT Subject: RFR: 8361180: Disable CompiledDirectCall verification with -VerifyInlineCaches Message-ID: Missed the spot when doing [JDK-8360867](https://bugs.openjdk.org/browse/JDK-8360867). There is a path from GC that calls into IC verification when cleaning the caches. See `nmethod::cleanup_inline_caches_impl`. It does verification per callsite, and does the whole thing during parallel GC cleanup, which is STW at least in G1. This gets expensive for CTW scenarios. 
We should wrap that under the same flag introduced by [JDK-8360867](https://bugs.openjdk.org/browse/JDK-8360867). Motivational improvements: $ time CONF=linux-x86_64-server-fastdebug make test TEST=applications/ctw/modules/ # Current mainline real 3m59.274s user 68m9.663s sys 5m19.026s # This PR real 3m49.118s user 65m37.962s sys 5m15.441s ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/26063/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26063&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361180 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26063.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26063/head:pull/26063 PR: https://git.openjdk.org/jdk/pull/26063 From mhaessig at openjdk.org Tue Jul 1 11:23:42 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 1 Jul 2025 11:23:42 GMT Subject: RFR: 8361037: [ubsan] compiler/c2/irTests/TestFloat16ScalarOperations division by 0 In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 10:08:20 GMT, Jatin Bhateja wrote: > Floating point division by zero is undefined per the C and C++ standards, but is defined by Clang (and by ISO/IEC/IEEE 60559 / IEEE 754) as producing either an infinity or NaN value. > > While Java semantics defined in section 15.17.2 "Division Operator" of JLS-24 are well-defined for these constant-folding scenarios > > This bug fix patch fixes division by 0 error reported after integration of [JDK-8352635.](https://bugs.openjdk.org/browse/JDK-8352635) > Kindly review and share your feedback. > > Best Regards, > Jatin Hi, @jatin-bhateja. Thank you for providing this fix. I took a look at it and have a question. Otherwise, this looks good. src/hotspot/share/opto/divnode.cpp line 833: > 831: } > 832: > 833: if (g_isfinite(t1->getf()) && t2->getf() == 0.0) { Is the `g_isfinite` for `t1` really needed? 
If the dividend is infinite then the result is also an infinity with the appropriate sign. Does this not result in `INF / 0.0` being calculated below? This would also be undefined by the C++ standard, would it not? Since as far as I know not all s390 models implement IEEE754, perhaps it would be better to remove the `g_isfinite` to prevent the native `INF / 0.0` below. ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/26062#pullrequestreview-2974972341 PR Review Comment: https://git.openjdk.org/jdk/pull/26062#discussion_r2177311121 From eastigeevich at openjdk.org Tue Jul 1 11:26:52 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 1 Jul 2025 11:26:52 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v32] In-Reply-To: References: Message-ID: <73AnlXOv0T8K25DgsNdH1PkBjcBXz0f3bBYZx44LpAw=.439f5383-ffd1-44e8-9e11-4b5af9b6a278@github.com> On Wed, 25 Jun 2025 22:32:24 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change only slightly modifies existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [ ] Linux x64 fastdebug all >> - [ ] Linux aarch64 fastdebug all >> - [ ] ... > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 90 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Update how call sites are fixed > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix pointer printing > - Use set_destination_mt_safe > - Print address as pointer > - Use new _metadata_size instead of _jvmci_data_size > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Only check branch distance for aarch64 and riscv > - Move far branch fix to fix_relocation_after_move > - ... and 80 more: https://git.openjdk.org/jdk/compare/f799cf18...70e4164e src/hotspot/share/code/nmethod.cpp line 1653: > 1651: } > 1652: } > 1653: } Do we need this code? Shouldn't missing trampolined be caught during fixing call sites? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2177325220 From eastigeevich at openjdk.org Tue Jul 1 11:40:54 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 1 Jul 2025 11:40:54 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v32] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 22:32:24 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change only slightly modifies existing code paths and therefore does not benefit much from existing tests. 
New tests were created to test the new functionality
>>
>> Additional Testing:
>> - [ ] Linux x64 fastdebug all
>> - [ ] Linux aarch64 fastdebug all
>> - [ ] ...
>
> Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits:
>
> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final
> - Update how call sites are fixed
> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final
> - Fix pointer printing
> - Use set_destination_mt_safe
> - Print address as pointer
> - Use new _metadata_size instead of _jvmci_data_size
> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final
> - Only check branch distance for aarch64 and riscv
> - Move far branch fix to fix_relocation_after_move
> - ... and 80 more: https://git.openjdk.org/jdk/compare/f799cf18...70e4164e

src/hotspot/share/code/nmethod.hpp line 172:

> 170:   friend class DeoptimizationScope;
> 171:
> 172: #define ImmutableDataReferencesCounterSize (int)sizeof(int)

Macros defining an expression need to be enclosed in parentheses.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2177369434

From epeter at openjdk.org Tue Jul 1 11:56:43 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 1 Jul 2025 11:56:43 GMT
Subject: RFR: 8359602: Ideal optimizations depending on input type are missed because of missing notification mechanism from CCP [v3]
In-Reply-To:
References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com>
Message-ID:

On Mon, 30 Jun 2025 15:42:01 GMT, Benoît Maillard wrote:

>> This PR prevents some missed ideal optimizations in IGVN by notifying users of type refinements made during CCP, addressing a missed optimization that caused a verification failure with `-XX:VerifyIterativeGVN=1110`.
>> >> ### Context >> During the compilation of the input program (obtained from the fuzzer, then simplified and added as a test) by C2, we end up with node `591 ModI` that takes `138 Phi` as its divisor input. An existing `Ideal` optimization is to get rid of the control input of a `ModINode` when we can prove that the divisor is never `0`. >> >> In this specific case, the type of the `PhiNode` gets refined during CCP, but the refinement fails to propagate to its users for the IGVN phase and the ideal optimization for the `ModINode` never happens. This results in a missed optimization and hits an assert in the verification phase of IGVN (when using `-XX:VerifyIterativeGVN=1110`). >> >> ![IGV screenshot](https://github.com/user-attachments/assets/5dee1ae6-9146-4115-922d-df33b7ccbd37) >> >> ### Detailed Analysis >> >> In `PhaseCCP::analyze`, we call `Value` for the `PhiNode`, which >> results in a type refinement: the range gets restricted to `int:-13957..-1191`. >> >> ```c++ >> // Pull from worklist; compute new value; push changes out. >> // This loop is the meat of CCP. >> while (worklist.size() != 0) { >> Node* n = fetch_next_node(worklist); >> DEBUG_ONLY(worklist_verify.push(n);) >> if (n->is_SafePoint()) { >> // Make sure safepoints are processed by PhaseCCP::transform even if they are >> // not reachable from the bottom. Otherwise, infinite loops would be removed. >> _root_and_safepoints.push(n); >> } >> const Type* new_type = n->Value(this); >> if (new_type != type(n)) { >> DEBUG_ONLY(verify_type(n, new_type, type(n));) >> dump_type_and_node(n, new_type); >> set_type(n, new_type); >> push_child_nodes_to_worklist(worklist, n); >> } >> if (KillPathsReachableByDeadTypeNode && n->is_Type() && new_type == Type::TOP) { >> // Keep track of Type nodes to kill CFG paths that use Type >> // nodes that become dead. 
>> _maybe_top_type_nodes.push(n); >> } >> } >> DEBUG_ONLY(verify_analyze(worklist_verify);) >> >> >> At the end of `PhaseCCP::analyze`, we obtain the following types in the side table: >> - `int` for node `591` (`ModINode`) >> - `int:-13957..-1191` for node `138` (`PhiNode`) >> >> If we call `find_node(138)->bottom_type()`, we get: >> - `int` for both nodes >> >> The... > > Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: > > Fix bad test class name Nice work @benoitmaillard ! src/hotspot/share/opto/phaseX.cpp line 3124: > 3122: n->raise_bottom_type(t); > 3123: _worklist.push(n); // n re-enters the hash table via the worklist > 3124: add_users_to_worklist(n); // if ideal or identity optimizations depend on the input type, users need to be notified Suggestion: add_users_to_worklist(n); // if Ideal or Identity optimizations depend on the input type, users need to be notified I would make them upper-case, just like the method names. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26017#pullrequestreview-2975094882 PR Review Comment: https://git.openjdk.org/jdk/pull/26017#discussion_r2177396474 From epeter at openjdk.org Tue Jul 1 11:56:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Jul 2025 11:56:44 GMT Subject: RFR: 8359602: Ideal optimizations depending on input type are missed because of missing notification mechanism from CCP [v2] In-Reply-To: References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> <3cLLB7fms3S4WgqOVeb7D_ZDRFsJ_-ca3qfALlmzFeU=.1002ac91-1e35-4499-9d88-6d1f76c955d0@github.com> <0MJe_8nA-ILWqoVG-9rzuq5Pe9xX-FG2LN3k9Cy8nqU=.d724c6cf-cb02-45c4-95a4-5bd1fef7462b@github.com> Message-ID: On Tue, 1 Jul 2025 07:07:40 GMT, Beno?t Maillard wrote: > > @benoitmaillard One more open question for me: `raise_bottom_type` only sets the node internal `_type`. 
But in IGVN, we do not read from `_type` but `phase->type(in(2))`. Do you know when the `phase->type(in(2))` value changes? Is that also during CCP? Before or after the `_type` is modified?
>
> Yes, good point, I should have mentioned this somewhere. The `phase->type(in(2))` call uses the type array from `PhaseValues`. The type array entry is actually modified earlier, in `PhaseCCP::analyze`, right after the `Value` call. You can see the `set_type` call [here](https://github.com/benoitmaillard/jdk/blob/75de51dff6d9cc3e9764737b29b9358992b488b7/src/hotspot/share/opto/phaseX.cpp#L2765). When this happens, users are added to the (local) worklist but again it does not change our issue as only value optimizations occur in that context.

Thanks for the explanation! So it seems that `CCP` and `IGVN` share the type array, right? Ah yes, it is the `Compile::_types`:

    // Shared type array for GVN, IGVN and CCP. It maps node idx -> Type*.
    Type_Array* _types;

If the value behind `phase->type(in(2))` (the type array entry) is modified in `PhaseCCP::analyze`, right after the `Value` call, then why not do the notification there? If we did that, we would do more notification than what you now proposed (to do the notification in `PhaseCCP::transform_once` on the nodes that have a type that is different than the `bottom_type`). Are we possibly missing any important case with your approach now? Probably not, I would argue: with your approach we still notify for all live nodes that have a modified type, or are replaced with a constant. If we notified after every type update in `PhaseCCP::analyze`, we might notify for nodes multiple times, and we would also notify for nodes that are dead after CCP - both are unnecessary overheads.
Alright, I just wanted to think this through - but it seems your approach is good :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26017#issuecomment-3023637471

From bmaillard at openjdk.org Tue Jul 1 12:00:18 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Tue, 1 Jul 2025 12:00:18 GMT
Subject: RFR: 8361144: Strengthen the Ideal Verification in PhaseIterGVN::verify_Ideal_for by comparing the hash of a node before and after Ideal
Message-ID: <9YpmCSNKHrTmq54eLusmkTHoEFFUTvm6OiqjdiGNFv0=.f8123888-bd26-42cc-938a-ec756a0da90d@github.com>

This PR adds a node hash comparison after calling `Ideal` in `PhaseIterGVN::verify_Ideal_for` to introduce an additional layer of verification for missed optimizations.

Previously, we relied on the return value of `Ideal`, which is expected to be `nullptr` if no transformation was done. By also checking the node's hash before and after `Ideal`, we can catch inconsistencies in the implementation or unintended modifications to the graph. Both of these may indicate missed or incomplete optimizations.

### Testing
- [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8361144)
- [x] tier1-3, plus some internal testing

Thank you for reviewing!
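
A minimal sketch of the check described above, written against a toy node type rather than HotSpot's `Node`. Everything here (`ToyNode`, `verify_ideal`, the three sample transformations) is hypothetical and only illustrates the idea: a transformation that returns `nullptr` is supposed to have left the node untouched, so a changed hash signals an inconsistency.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a C2 node: an opcode plus integer inputs.
struct ToyNode {
  int opcode;
  std::vector<int> inputs;

  // Simple hash over opcode and inputs, loosely analogous to a GVN hash.
  std::size_t hash() const {
    std::size_t h = static_cast<std::size_t>(opcode);
    for (int in : inputs) {
      h = h * 31 + static_cast<std::size_t>(in);
    }
    return h;
  }
};

// An Ideal-style transformation: non-null result means "I made progress".
using IdealFn = ToyNode* (*)(ToyNode&);

// Returns true if verification found a problem.
bool verify_ideal(ToyNode& n, IdealFn ideal) {
  const std::size_t hash_before = n.hash();
  ToyNode* result = ideal(n);
  if (result != nullptr) {
    return true;  // Ideal still had work to do: a missed optimization.
  }
  // Ideal claimed "no change"; the hash must therefore be unchanged.
  return n.hash() != hash_before;
}

// Sample transformations (all hypothetical).
ToyNode* well_behaved(ToyNode&) { return nullptr; }
ToyNode* silent_mutation(ToyNode& n) { n.inputs[0] += 1; return nullptr; }
ToyNode* reports_progress(ToyNode& n) { n.inputs[0] = 0; return &n; }
```

Only `silent_mutation` needs the hash comparison to be caught; `reports_progress` is already flagged by the return-value check alone.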
------------- Commit messages: - 8361144: remove unintentional line break - 8361144: move hash check after return value check and use same format as unique counter check - 8361144: add check for node hash after verifying ideal Changes: https://git.openjdk.org/jdk/pull/26064/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26064&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361144 Stats: 11 lines in 1 file changed: 10 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26064.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26064/head:pull/26064 PR: https://git.openjdk.org/jdk/pull/26064 From epeter at openjdk.org Tue Jul 1 12:02:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Jul 2025 12:02:16 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v28] In-Reply-To: References: <7r3C8BAViyHKVVJjv4w0YxfIUkfk9PmY0OEt73V_aRI=.baf51fc4-d996-44d0-a1f5-10cf6dc4de8d@github.com> Message-ID: On Thu, 12 Jun 2025 15:40:49 GMT, Roland Westrelin wrote: >> @rwestrel Let me know if you want us to run some extra testing. Christian said that you might be planning to wait until the JDK26 fork, and merge then, and then we can run testing. Up to you :) > > @eme64 in case you forgot about that one, it's ready for another round of reviews. @rwestrel I'm quite busy right now. I will soon go on vacation and travel, and I have a presentation to prepare in the next weeks. I hope I can come back to this in early August though. Feel free to ask someone else for a review, I don't want to hold this up. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-3023679612 From eastigeevich at openjdk.org Tue Jul 1 12:07:58 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 1 Jul 2025 12:07:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v32] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 22:32:24 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change only slightly modifies existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [ ] Linux x64 fastdebug all >> - [ ] Linux aarch64 fastdebug all >> - [ ] ... > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Update how call sites are fixed > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix pointer printing > - Use set_destination_mt_safe > - Print address as pointer > - Use new _metadata_size instead of _jvmci_data_size > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Only check branch distance for aarch64 and riscv > - Move far branch fix to fix_relocation_after_move > - ... 
and 80 more: https://git.openjdk.org/jdk/compare/f799cf18...70e4164e test/hotspot/jtreg/vmTestbase/nsk/jvmti/NMethodRelocation/nmethodrelocation.java line 37: > 35: import jdk.test.whitebox.code.BlobType; > 36: > 37: public class nmethodrelocation extends DebugeeClass { Why is the class name not following the Java code conventions? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2177424604 From mbaesken at openjdk.org Tue Jul 1 12:28:39 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 1 Jul 2025 12:28:39 GMT Subject: RFR: 8361037: [ubsan] compiler/c2/irTests/TestFloat16ScalarOperations division by 0 In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 10:08:20 GMT, Jatin Bhateja wrote: > Floating point division by zero is undefined per the C and C++ standards, but is defined by Clang (and by ISO/IEC/IEEE 60559 / IEEE 754) as producing either an infinity or NaN value. > > While Java semantics defined in section 15.17.2 "Division Operator" of JLS-24 are well-defined for these constant-folding scenarios > > This bug fix patch fixes division by 0 error reported after integration of [JDK-8352635.](https://bugs.openjdk.org/browse/JDK-8352635) > Kindly review and share your feedback. > > Best Regards, > Jatin With your patch included, the test compiler/c2/irTests/TestFloat16ScalarOperations.java now passes on macOS aarch64 with ubsan enabled. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26062#issuecomment-3023799985 From shade at openjdk.org Tue Jul 1 12:33:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 1 Jul 2025 12:33:51 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code Message-ID: We use CTW testing for making sure compilers behave well. But we compile the code that is not executed at all, and since our inlining heuristics often looks back at profiles, we end up not actually inlining all too much! 
This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations.

There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data.

After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much, they are impractical to run in standard configurations, see data in RFE. We will enable some of that testing in special testing pipelines.

Additional testing:
 - [ ] GHA
 - [x] Linux x86_64 server fastdebug, `applications/ctw/modules`
 - [ ] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways)

-------------

Commit messages:
 - Revert separate patch
 - Final
 - Proper option name and bump the limits
 - Fix

Changes: https://git.openjdk.org/jdk/pull/26068/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8360557
  Stats: 15 lines in 3 files changed: 15 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/26068.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068

PR: https://git.openjdk.org/jdk/pull/26068

From shade at openjdk.org Tue Jul 1 12:46:38 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 1 Jul 2025 12:46:38 GMT
Subject: RFR: 8360557: CTW: Inline cold methods to reach more code
In-Reply-To:
References:
Message-ID:

On Tue, 1 Jul 2025 12:26:44 GMT, Aleksey Shipilev wrote:

> We use CTW testing for making sure compilers behave well.
But we compile the code that is not executed at all, and since our inlining heuristics often looks back at profiles, we end up not actually inlining all too much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations. > > There is an intrinsic tradeoff with accepting more inilned methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data. > > After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much, they are impractical to run in standard configurations, see data in RFE. We will enable some of that testing in special testing pipelines. > > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allow stress infra to inline some more on top of that. 
> > Additional testing: > - [ ] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) We are on par for CTW testing time, comparing to the state a week back: # Before CTW perf improvements real 5m0.528s user 79m5.193s sys 14m16.678s # Current mainline real 3m59.274s user 68m9.663s sys 5m19.026s # This PR real 4m56.248s user 89m48.364s sys 5m24.091s ------------- PR Comment: https://git.openjdk.org/jdk/pull/26068#issuecomment-3023863192 From mbaesken at openjdk.org Tue Jul 1 12:48:19 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 1 Jul 2025 12:48:19 GMT Subject: RFR: 8361040: compiler/codegen/TestRedundantLea.java#StringInflate fails with failed IR rules In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 14:44:03 GMT, Manuel H?ssig wrote: > `TestRedundantLea.java#StringInflate` failed on Alpine Linux because fewer `DecodeHeapOop_not_null`s than expected are generated even though the expected reduction is still present. This PR fixes this. > > Unfortunately, this fix makes the test less precise. I filed [JDK-8361045](https://bugs.openjdk.org/browse/JDK-8361045) to fix this when the IR-framework allows for it. > > Testing: > - [x] Github Actions > - [x] tier1, tier2 plus Oracle internal testing > - [x] `TestRedundantLea.java` on Alpine Linux With your patch included, the issue is gone on our Linux Alpine test machine. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/26046#issuecomment-3023856713

From mhaessig at openjdk.org Tue Jul 1 12:48:19 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 1 Jul 2025 12:48:19 GMT
Subject: RFR: 8361040: compiler/codegen/TestRedundantLea.java#StringInflate fails with failed IR rules
In-Reply-To:
References:
Message-ID:

On Tue, 1 Jul 2025 12:42:05 GMT, Matthias Baesken wrote:

>> `TestRedundantLea.java#StringInflate` failed on Alpine Linux because fewer `DecodeHeapOop_not_null`s than expected are generated even though the expected reduction is still present. This PR fixes this.
>>
>> Unfortunately, this fix makes the test less precise. I filed [JDK-8361045](https://bugs.openjdk.org/browse/JDK-8361045) to fix this when the IR-framework allows for it.
>>
>> Testing:
>> - [x] Github Actions
>> - [x] tier1, tier2 plus Oracle internal testing
>> - [x] `TestRedundantLea.java` on Alpine Linux
>
> With your patch included, the issue is gone on our Linux Alpine test machine.

@MBaesken, thank you for testing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26046#issuecomment-3023862806

From mhaessig at openjdk.org Tue Jul 1 12:48:19 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 1 Jul 2025 12:48:19 GMT
Subject: RFR: 8361040: compiler/codegen/TestRedundantLea.java#StringInflate fails with failed IR rules
Message-ID:

`TestRedundantLea.java#StringInflate` failed on Alpine Linux because fewer `DecodeHeapOop_not_null`s than expected are generated even though the expected reduction is still present. This PR fixes this.

Unfortunately, this fix makes the test less precise. I filed [JDK-8361045](https://bugs.openjdk.org/browse/JDK-8361045) to fix this when the IR-framework allows for it.
Testing: - [x] Github Actions - [x] tier1, tier2 plus Oracle internal testing - [x] `TestRedundantLea.java` on Alpine Linux ------------- Commit messages: - Fix test Changes: https://git.openjdk.org/jdk/pull/26046/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26046&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361040 Stats: 12 lines in 1 file changed: 2 ins; 6 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26046.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26046/head:pull/26046 PR: https://git.openjdk.org/jdk/pull/26046 From bmaillard at openjdk.org Tue Jul 1 12:58:29 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Tue, 1 Jul 2025 12:58:29 GMT Subject: RFR: 8359602: Ideal optimizations depending on input type are missed because of missing notification mechanism from CCP [v4] In-Reply-To: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> Message-ID: > This PR prevents some missed ideal optimizations in IGVN by notifying users of type refinements made during CCP, addressing a missed optimization that caused a verification failure with `-XX:VerifyIterativeGVN=1110`. > > ### Context > During the compilation of the input program (obtained from the fuzzer, then simplified and added as a test) by C2, we end up with node `591 ModI` that takes `138 Phi` as its divisor input. An existing `Ideal` optimization is to get rid of the control input of a `ModINode` when we can prove that the divisor is never `0`. > > In this specific case, the type of the `PhiNode` gets refined during CCP, but the refinement fails to propagate to its users for the IGVN phase and the ideal optimization for the `ModINode` never happens. This results in a missed optimization and hits an assert in the verification phase of IGVN (when using `-XX:VerifyIterativeGVN=1110`). 
> > ![IGV screenshot](https://github.com/user-attachments/assets/5dee1ae6-9146-4115-922d-df33b7ccbd37) > > ### Detailed Analysis > > In `PhaseCCP::analyze`, we call `Value` for the `PhiNode`, which > results in a type refinement: the range gets restricted to `int:-13957..-1191`. > > ```c++ > // Pull from worklist; compute new value; push changes out. > // This loop is the meat of CCP. > while (worklist.size() != 0) { > Node* n = fetch_next_node(worklist); > DEBUG_ONLY(worklist_verify.push(n);) > if (n->is_SafePoint()) { > // Make sure safepoints are processed by PhaseCCP::transform even if they are > // not reachable from the bottom. Otherwise, infinite loops would be removed. > _root_and_safepoints.push(n); > } > const Type* new_type = n->Value(this); > if (new_type != type(n)) { > DEBUG_ONLY(verify_type(n, new_type, type(n));) > dump_type_and_node(n, new_type); > set_type(n, new_type); > push_child_nodes_to_worklist(worklist, n); > } > if (KillPathsReachableByDeadTypeNode && n->is_Type() && new_type == Type::TOP) { > // Keep track of Type nodes to kill CFG paths that use Type > // nodes that become dead. > _maybe_top_type_nodes.push(n); > } > } > DEBUG_ONLY(verify_analyze(worklist_verify);) > > > At the end of `PhaseCCP::analyze`, we obtain the following types in the side table: > - `int` for node `591` (`ModINode`) > - `int:-13957..-1191` for node `138` (`PhiNode`) > > If we call `find_node(138)->bottom_type()`, we get: > - `int` for both nodes > > There is no progress on the type of `ModINode` during CCP, because `ModINode::Value` > is not able to... 
Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:

  8359602: update case for consistency

  Co-authored-by: Emanuel Peter

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26017/files
  - new: https://git.openjdk.org/jdk/pull/26017/files/75de51df..005b2825

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26017&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26017&range=02-03

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/26017.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26017/head:pull/26017

PR: https://git.openjdk.org/jdk/pull/26017

From bmaillard at openjdk.org Tue Jul 1 13:18:40 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Tue, 1 Jul 2025 13:18:40 GMT
Subject: RFR: 8359602: Ideal optimizations depending on input type are missed because of missing notification mechanism from CCP [v2]
In-Reply-To:
References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> <3cLLB7fms3S4WgqOVeb7D_ZDRFsJ_-ca3qfALlmzFeU=.1002ac91-1e35-4499-9d88-6d1f76c955d0@github.com> <0MJe_8nA-ILWqoVG-9rzuq5Pe9xX-FG2LN3k9Cy8nqU=.d724c6cf-cb02-45c4-95a4-5bd1fef7462b@github.com>
Message-ID:

On Tue, 1 Jul 2025 07:07:40 GMT, Benoît Maillard wrote:

>> @benoitmaillard Very nice work, and great description :)
>>
>>>Did you check if this allows enabling any of the other disabled verifications from [JDK-8347273](https://bugs.openjdk.org/browse/JDK-8347273)?
>>
>> That may be a lot of work. Not sure if it is worth checking all of them now. @TobiHartmann how much should he invest in this now? An alternative is just tackling all the other cases later. What do you think?
>>
>> @benoitmaillard One more open question for me: `raise_bottom_type` only sets the node internal `_type`. But in IGVN, we do not read from `_type` but `phase->type(in(2))`.
Do you know when the `phase->type(in(2))` value changes? Is that also during CCP? Before or after the `_type` is modified? > >> @benoitmaillard Very nice work, and great description :) > > Thank you! @eme64 > >> > Did you check if this allows enabling any of the other disabled verifications from [JDK-8347273](https://bugs.openjdk.org/browse/JDK-8347273)? >> >> That may be a lot of work. Not sure if it is worth checking all of them now. @TobiHartmann how much should he invest in this now? An alternative is just tackling all the other cases later. What do you think? > > I have started to take a look at this and it seems that there are a lot of cases to check indeed. > >> @benoitmaillard One more open question for me: `raise_bottom_type` only sets the node internal `_type`. But in IGVN, we do not read from `_type` but `phase->type(in(2))`. Do you know when the `phase->type(in(2))` value changes? Is that also during CCP? Before or after the `_type` is modified? > > Yes, good point, I should I have mentioned this somewhere. The `phase->type(in(2))` call uses the type array from `PhaseValues`. The type array entry is actually modified earlier, in `PhaseCCP::analyze`, right after the `Value` call. You can see the `set_type` call [here](https://github.com/benoitmaillard/jdk/blob/75de51dff6d9cc3e9764737b29b9358992b488b7/src/hotspot/share/opto/phaseX.cpp#L2765). When this happens, users are added to the (local) worklist but again it does not change our issue as only value optimizations occur in that context. > > > @benoitmaillard One more open question for me: `raise_bottom_type` only sets the node internal `_type`. But in IGVN, we do not read from `_type` but `phase->type(in(2))`. Do you know when the `phase->type(in(2))` value changes? Is that also during CCP? Before or after the `_type` is modified? > > > > > > Yes, good point, I should I have mentioned this somewhere. The `phase->type(in(2))` call uses the type array from `PhaseValues`. 
The type array entry is actually modified earlier, in `PhaseCCP::analyze`, right after the `Value` call. You can see the `set_type` call [here](https://github.com/benoitmaillard/jdk/blob/75de51dff6d9cc3e9764737b29b9358992b488b7/src/hotspot/share/opto/phaseX.cpp#L2765). When this happens, users are added to the (local) worklist but again it does not change our issue as only value optimizations occur in that context. > > Thanks for the explanation! So it seems that `CCP` and `IGVN` share the type array, right? Ah yes, it is the `Compile::_types`: > > ``` > 461 // Shared type array for GVN, IGVN and CCP. It maps node idx -> Type*. > 462 Type_Array* _types; > ``` > > If the value behind `phase->type(in(2))` (the type array entry) is modified in `PhaseCCP::analyze`, right after the `Value` call, then why not do the notification there? If we did that, we would do more notification than what you now proposed (to do the notification in `PhaseCCP::transform_once` on the nodes that have a type that is different than the `bottom_type`). Are we possibly missing any important case with your approach now? Probably not, I would argue: with your approach we still notify for all live nodes that have a modified type, or are replaced with a constant. If we notified after every type update in `PhaseCCP::analyze`, we might notify for nodes multiple times, and we would also notify for nodes that are dead after CCP - both are unnecessary overheads. Alright, I just wanted to think this through - but it seems your approach is good :) I also considered doing it there in `PhaseCCP::analyze`, but I reached the same conclusion. Thanks for your help! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26017#issuecomment-3023978823 From snatarajan at openjdk.org Tue Jul 1 13:27:47 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 1 Jul 2025 13:27:47 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v7] In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 16:24:03 GMT, Vladimir Kozlov wrote: >> Saranya Natarajan has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> merge with master >> Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8325478 > > src/hotspot/share/opto/compile.cpp line 2533: > >> 2531: { >> 2532: TracePhase tp(_t_macroExpand); >> 2533: print_method(PHASE_BEFORE_MACRO_EXPANSION, 3); > > Should we move it before `mex.expand_macro_nodes()` call? Moving this would break the assumption of needing a `BEFORE_MACRO_ELIMINATION` as explained in the above reply. One way to go about this would be to include a `BEFORE_MACRO_ELIMINATION` phase and remove the `PHASE_BEFORE_MACRO_EXPANSION` phase as this is only place where it is used. Would this be a reasonable fix ? > src/hotspot/share/opto/phasetype.hpp line 94: > >> 92: flags(AFTER_LOOP_OPTS, "After Loop Optimizations") \ >> 93: flags(AFTER_MERGE_STORES, "After Merge Stores") \ >> 94: flags(AFTER_MACRO_ELIMINATION_STEP, "After Macro Elimination Step") \ > > What is the reason to not have `BEFORE_MACRO_ELIMINATION`? The two main reasons for not having a `BEFORE_MACRO_ELIMINATION` are as follows: - There is a dump in line 2426 (`print_method(PHASE_ITER_GVN_AFTER_EA, 2)`) before we call `mexp.eliminate_macro_nodes` which performs the functionality of having a `BEFORE_MACRO_ELIMINATION` for phase dump. 
- There is dump in line 2533 (`print_method(PHASE_BEFORE_MACRO_EXPANSION, 3)`) before eliminating macro nodes which performs the similar function. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2177603003 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2177602894 From jbhateja at openjdk.org Tue Jul 1 13:28:21 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 1 Jul 2025 13:28:21 GMT Subject: RFR: 8361037: [ubsan] compiler/c2/irTests/TestFloat16ScalarOperations division by 0 [v2] In-Reply-To: References: Message-ID: > Floating point division by zero is undefined per the C and C++ standards, but is defined by Clang (and by ISO/IEC/IEEE 60559 / IEEE 754) as producing either an infinity or NaN value. > > While Java semantics defined in section 15.17.2 "Division Operator" of JLS-24 are well-defined for these constant-folding scenarios > > This bug fix patch fixes division by 0 error reported after integration of [JDK-8352635.](https://bugs.openjdk.org/browse/JDK-8352635) > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26062/files - new: https://git.openjdk.org/jdk/pull/26062/files/bf78fbe6..d39c76f4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26062&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26062&range=00-01 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26062.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26062/head:pull/26062 PR: https://git.openjdk.org/jdk/pull/26062 From jbhateja at openjdk.org Tue Jul 1 13:28:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 1 Jul 2025 13:28:22 GMT Subject: RFR: 8361037: [ubsan] compiler/c2/irTests/TestFloat16ScalarOperations division by 0 [v2] In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 11:19:04 GMT, Manuel Hässig wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution > > src/hotspot/share/opto/divnode.cpp line 833: > >> 831: } >> 832: >> 833: if (g_isfinite(t1->getf()) && t2->getf() == 0.0) { > > Is the `g_isfinite` for `t1` really needed? If the dividend is infinite then the result is also an infinity with the appropriate sign. Does this not result in `INF / 0.0` being calculated below? This would also be undefined by the C++ standard, would it not? Since as far as I know not all s390 models implement IEEE754, perhaps it would be better to remove the `g_isfinite` to prevent the native `INF / 0.0` below. As per C++ standard section 7.6.5 (expr.mul), behavior is undefined only if the second operand is 0.0.
In all other situations, we can expect a standard-compliant C++ compiler to generate code following IEEE 754 semantics, irrespective of target floating point model, but Java semantics expect to return a NaN value if either of the operands is a NaN. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26062#discussion_r2177604366 From jbhateja at openjdk.org Tue Jul 1 13:36:20 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 1 Jul 2025 13:36:20 GMT Subject: RFR: 8361037: [ubsan] compiler/c2/irTests/TestFloat16ScalarOperations division by 0 [v3] In-Reply-To: References: Message-ID: <6cWhCvx8g-Gx4VoBHW1wA7atsa_Eq5wBhkDolUbP_X0=.31f8e688-7401-4f81-9b50-46b1997e96b5@github.com> > Floating point division by zero is undefined per the C and C++ standards, but is defined by Clang (and by ISO/IEC/IEEE 60559 / IEEE 754) as producing either an infinity or NaN value. > > While Java semantics defined in section 15.17.2 "Division Operator" of JLS-24 are well-defined for these constant-folding scenarios > > This bug fix patch fixes division by 0 error reported after integration of [JDK-8352635.](https://bugs.openjdk.org/browse/JDK-8352635) > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26062/files - new: https://git.openjdk.org/jdk/pull/26062/files/d39c76f4..0038654e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26062&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26062&range=01-02 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26062.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26062/head:pull/26062 PR: https://git.openjdk.org/jdk/pull/26062 From galder at openjdk.org Tue Jul 1 13:47:38 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 1 Jul 2025 13:47:38 GMT Subject: RFR: 8361144: Strenghten the Ideal Verification in PhaseIterGVN::verify_Ideal_for by comparing the hash of a node before and after Ideal In-Reply-To: <9YpmCSNKHrTmq54eLusmkTHoEFFUTvm6OiqjdiGNFv0=.f8123888-bd26-42cc-938a-ec756a0da90d@github.com> References: <9YpmCSNKHrTmq54eLusmkTHoEFFUTvm6OiqjdiGNFv0=.f8123888-bd26-42cc-938a-ec756a0da90d@github.com> Message-ID: On Tue, 1 Jul 2025 11:35:06 GMT, Benoît Maillard wrote: > This PR adds a node hash comparison after calling `Ideal` in `PhaseIterGVN::verify_Ideal_for` to introduce an additional layer of verification for missed optimizations. Previously, we relied on the return value of `Ideal`, which is expected to be `nullptr` if no transformation was done. > > By also checking the node's hash before and after `Ideal`, we could catch inconsistencies in the implementation or unintended modifications to the graph. Both of these may indicate missed or incomplete optimizations. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8361144) > - [x] tier1-3, plus some internal testing > > Thank you for reviewing! Have you considered adding a test for this?
Is that feasible? ------------- PR Review: https://git.openjdk.org/jdk/pull/26064#pullrequestreview-2975520753 From eastigeevich at openjdk.org Tue Jul 1 15:33:53 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 1 Jul 2025 15:33:53 GMT Subject: RFR: 8360936: Test compiler/onSpinWait/TestOnSpinWaitAArch64.java fails after JDK-8359435 Message-ID: Test compiler/onSpinWait/TestOnSpinWaitAArch64.java needs debug info to identify a position of spin wait instructions in generated code. Release builds might not generate needed debug info. This PR adds a requirement for the test to be run on debug builds only. Tested: - Fastdebug: test passed - Slowdebug: test passed. - Release: test skipped. ------------- Commit messages: - 8360936: Test compiler/onSpinWait/TestOnSpinWaitAArch64.java fails after JDK-8359435 Changes: https://git.openjdk.org/jdk/pull/26072/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26072&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360936 Stats: 3 lines in 2 files changed: 1 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26072.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26072/head:pull/26072 PR: https://git.openjdk.org/jdk/pull/26072 From missa at openjdk.org Tue Jul 1 15:37:47 2025 From: missa at openjdk.org (Mohamed Issa) Date: Tue, 1 Jul 2025 15:37:47 GMT Subject: Integrated: 8358179: Performance regression in Math.cbrt In-Reply-To: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: <12bHfivFgRF2s-Sr0SZY6DIywI30LQ63uedYzsncO0A=.ba272456-15df-493b-8247-e38a67796968@github.com> On Tue, 24 Jun 2025 22:33:56 GMT, Mohamed Issa wrote: > The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470. > > 1. 
Check for +0, -0, +INF, -INF, and NaN before any other input values. > 2. If these special values are found, return immediately with minimal modifications to the result register. > 3. Performance testing shows the modified intrinsic improves throughput by 65.1% over the original intrinsic on average for the special values while throughput drops by 5.5% for the normal value range (-INF, -2^(-1022)], [2^(-1022), INF). > > The commands to run all relevant micro-benchmarks are posted below. > > `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` > `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"` > > The results of all tests posted below were captured with an [Intel® Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. > > Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks. When comparing against _baseline2_, the modified intrinsic significantly still outperforms for the inputs (**-INF < x < INF**) that require heavy compute. However, the special value inputs that trigger fast path returns still perform better with _baseline2_.
> > | Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) | > | :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: | > | [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 | > | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 | > | [0] | 344990 | 627561 | +81.91 | > | [-0] | 291... This pull request has now been integrated. Changeset: 38f59f84 Author: Mohamed Issa Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/38f59f84c98dfd974eec0c05541b2138b149def7 Stats: 50 lines in 1 file changed: 11 ins; 36 del; 3 mod 8358179: Performance regression in Math.cbrt Reviewed-by: sviswanathan, sparasa, epeter ------------- PR: https://git.openjdk.org/jdk/pull/25962 From sviswanathan at openjdk.org Tue Jul 1 15:37:46 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 1 Jul 2025 15:37:46 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt In-Reply-To: References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: On Mon, 30 Jun 2025 05:51:58 GMT, Emanuel Peter wrote: >>> I'll hold off with approval until someone else who is more knowledgeable has reviewed first. But feel free to ping me for a second review. >> >> @eme64 Second review with the latest changes? > > @missa-prime The patch still looks good, though I ran testing again because of the new changes. Should complete in about 24h. Thanks a lot @eme64. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25962#issuecomment-3024541704 From shade at openjdk.org Tue Jul 1 15:39:43 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 1 Jul 2025 15:39:43 GMT Subject: RFR: 8360936: Test compiler/onSpinWait/TestOnSpinWaitAArch64.java fails after JDK-8359435 In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 15:29:10 GMT, Evgeny Astigeevich wrote: > Test compiler/onSpinWait/TestOnSpinWaitAArch64.java needs debug info to identify a position of spin wait instructions in generated code. Release builds might not generate needed debug info. > > This PR adds a requirement for the test to be run on debug builds only. > > Tested: > - Fastdebug: test passed > - Slowdebug: test passed. > - Release: test skipped. Looks okay, but I am confused why the test did not fail before JDK-8359435? test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java line 32: > 30: * @requires vm.flagless > 31: * @requires os.arch=="aarch64" > 32: * @requires vm.debug==true Can be just `@requires vm.debug`. ------------- PR Review: https://git.openjdk.org/jdk/pull/26072#pullrequestreview-2975983374 PR Review Comment: https://git.openjdk.org/jdk/pull/26072#discussion_r2177921439 From phh at openjdk.org Tue Jul 1 15:41:41 2025 From: phh at openjdk.org (Paul Hohensee) Date: Tue, 1 Jul 2025 15:41:41 GMT Subject: RFR: 8358183: [JVMCI] crash accessing nmethod::jvmci_name in CodeCache::aggregate In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 06:39:18 GMT, Boris Ulasevich wrote: > This change addresses an intermittent crash in CompileBroker::print_heapinfo() when accessing JVMCI metadata after a CodeBlob::purge(). > > The issue is a regression after: > - JDK-8343789: JVMCI metadata was moved from nmethod into a separate blob. > - JDK-8352112: CodeBlob::purge() was updated to set _mutable_data to blob_end(). 
> > The change zeroes out _mutable_data_size, _relocation_size, and _metadata_size in purge() so that after purge jvmci_data_size() returns 0 and CompileBroker::print_heapinfo() won't touch an invalid _metadata. Marked as reviewed by phh (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25608#pullrequestreview-2975990062 From mablakatov at openjdk.org Tue Jul 1 15:48:00 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Tue, 1 Jul 2025 15:48:00 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v5] In-Reply-To: References: Message-ID: > Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used. > > Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is used directly still. > > The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reduction micro-benchmarks.
>
> Benchmarks results:
>
> Neoverse-V1 (SVE 256-bit)
>
> Benchmark                 (size)  Mode     master         PR   Units
> ByteMaxVector.MULLanes      1024  thrpt  5447.643  11455.535  ops/ms
> ShortMaxVector.MULLanes     1024  thrpt  3388.183   7144.301  ops/ms
> IntMaxVector.MULLanes       1024  thrpt  3010.974   4911.485  ops/ms
> LongMaxVector.MULLanes      1024  thrpt  1539.137   2562.835  ops/ms
> FloatMaxVector.MULLanes     1024  thrpt  1355.551   4158.128  ops/ms
> DoubleMaxVector.MULLanes    1024  thrpt  1715.854   3284.189  ops/ms
>
>
> Fujitsu A64FX (SVE 512-bit):
>
> Benchmark                 (size)  Mode     master         PR   Units
> ByteMaxVector.MULLanes      1024  thrpt  1091.692   2887.798  ops/ms
> ShortMaxVector.MULLanes     1024  thrpt   597.008   1863.338  ops/ms
> IntMaxVector.MULLanes       1024  thrpt   510.642   1348.651  ops/ms
> LongMaxVector.MULLanes      1024  thrpt   468.878    878.620  ops/ms
> FloatMaxVector.MULLanes     1024  thrpt   376.284   2237.564  ops/ms
> DoubleMaxVector.MULLanes    1024  thrpt   431.343   1646.792  ops/ms

Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision:

  fixup: remove undefined insts from aarch64-asmtest.py

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23181/files
  - new: https://git.openjdk.org/jdk/pull/23181/files/025d5166..df09ab65

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23181&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23181&range=03-04

  Stats: 30 lines in 2 files changed: 0 ins; 9 del; 21 mod
  Patch: https://git.openjdk.org/jdk/pull/23181.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23181/head:pull/23181

PR: https://git.openjdk.org/jdk/pull/23181

From kvn at openjdk.org Tue Jul 1 15:48:42 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 1 Jul 2025 15:48:42 GMT
Subject: RFR: 8361101: AOTCodeAddressTable::_stubs_addr not initialized/freed properly
In-Reply-To:
References:
Message-ID:

On Mon, 30 Jun 2025 19:45:49 GMT, Ashutosh Mehra wrote:

> Please review this patch to fix initialization and freeing of `AOTCodeAddressTable::_stubs_addr`.
Changes are trivial I missed that this is for mainline. Approved. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26053#pullrequestreview-2976010588 From kvn at openjdk.org Tue Jul 1 15:52:37 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 1 Jul 2025 15:52:37 GMT Subject: RFR: 8361101: AOTCodeAddressTable::_stubs_addr not initialized/freed properly In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 19:45:49 GMT, Ashutosh Mehra wrote: > Please review this patch to fix initialization and freeing of `AOTCodeAddressTable::_stubs_addr`. Changes are trivial Yes, it is trivial. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26053#issuecomment-3024597730 From kvn at openjdk.org Tue Jul 1 15:59:38 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 1 Jul 2025 15:59:38 GMT Subject: RFR: 8361180: Disable CompiledDirectCall verification with -VerifyInlineCaches In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 10:47:40 GMT, Aleksey Shipilev wrote: > Missed the spot when doing [JDK-8360867](https://bugs.openjdk.org/browse/JDK-8360867). There is a path from GC that calls into IC verification when cleaning the caches. See `nmethod::cleanup_inline_caches_impl`. It does verification per callsite, and does the whole thing during parallel GC cleanup, which is STW at least in G1. This gets expensive for CTW scenarios. We should wrap that under the same flag introduced by [JDK-8360867](https://bugs.openjdk.org/browse/JDK-8360867). > > Motivational improvements: > > > $ time CONF=linux-x86_64-server-fastdebug make test TEST=applications/ctw/modules/ > > # Current mainline > real 3m59.274s > user 68m9.663s > sys 5m19.026s > > # This PR > real 3m49.118s > user 65m37.962s > sys 5m15.441s Trivial. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26063#pullrequestreview-2976063372 From shade at openjdk.org Tue Jul 1 15:59:39 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 1 Jul 2025 15:59:39 GMT Subject: RFR: 8361101: AOTCodeAddressTable::_stubs_addr not initialized/freed properly In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 19:45:49 GMT, Ashutosh Mehra wrote: > Please review this patch to fix initialization and freeing of `AOTCodeAddressTable::_stubs_addr`. Changes are trivial Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26053#pullrequestreview-2976066699 From eastigeevich at openjdk.org Tue Jul 1 16:05:07 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 1 Jul 2025 16:05:07 GMT Subject: RFR: 8360936: Test compiler/onSpinWait/TestOnSpinWaitAArch64.java fails after JDK-8359435 [v2] In-Reply-To: References: Message-ID: > Test compiler/onSpinWait/TestOnSpinWaitAArch64.java needs debug info to identify a position of spin wait instructions in generated code. Release builds might not generate needed debug info. > > This PR adds a requirement for the test to be run on debug builds only. > > Tested: > - Fastdebug: test passed > - Slowdebug: test passed. > - Release: test skipped.
Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Simplify requirement for debug build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26072/files - new: https://git.openjdk.org/jdk/pull/26072/files/b2ba0a92..e91036bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26072&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26072&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26072.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26072/head:pull/26072 PR: https://git.openjdk.org/jdk/pull/26072 From kvn at openjdk.org Tue Jul 1 16:06:39 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 1 Jul 2025 16:06:39 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 12:26:44 GMT, Aleksey Shipilev wrote: > We use CTW testing for making sure compilers behave well. But we compile the code that is not executed at all, and since our inlining heuristics often look back at profiles, we end up not actually inlining all too much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations. > > There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data. > > After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much, they are impractical to run in standard configurations, see data in RFE.
We will enable some of that testing in special testing pipelines. > > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allow stress infra to inline some more on top of that. > > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) This has to be tested by us to make sure we clean up all issues this change find. ------------- PR Review: https://git.openjdk.org/jdk/pull/26068#pullrequestreview-2976094320 From mablakatov at openjdk.org Tue Jul 1 16:10:49 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Tue, 1 Jul 2025 16:10:49 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v4] In-Reply-To: <3sWLk_sAMLtcvRUjXk9hYe-K2MBQl9fH2Qg0MF7lwDk=.b8867d51-e822-43c0-93ab-58228c6eb1d5@github.com> References: <3sWLk_sAMLtcvRUjXk9hYe-K2MBQl9fH2Qg0MF7lwDk=.b8867d51-e822-43c0-93ab-58228c6eb1d5@github.com> Message-ID: <19rf4A0bxc4BstRmLivGkoCOm7Qa7YD6z1VJHJivCtg=.4a643c7b-4e79-4f37-b230-7231df3c68a8@github.com> On Tue, 1 Jul 2025 06:57:10 GMT, Xiaohong Gong wrote: >> Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: >> >> - cleanup: address nits, rename several symbols >> - cleanup: remove unreferenced definitions >> - Address review comments. >> >> - fixup: disable FP mul reduction auto-vectorization for all targets >> - fixup: add a tmp vReg to reduce_mul_integral_gt128b and >> reduce_non_strict_order_mul_fp_gt128bto keep vsrc unmodified >> - cleanup: replace a complex lambda in the above methods with a loop >> - cleanup: rename symbols to follow the existing naming convention >> - cleanup: add asserts to SVE only instructions >> - split mul FP reduction instructions into strictly-ordered (default) >> and explicitly non strictly-ordered >> - remove redundant conditions in TestVectorFPReduction.java >> >> Benchmarks results: >> >> Neoverse-V1 (SVE 256-bit) >> >> | Benchmark | Before | After | Units | Diff | >> |---------------------------|----------|----------|--------|-------| >> | ByteMaxVector.MULLanes | 619.156 | 9884.578 | ops/ms | 1496% | >> | DoubleMaxVector.MULLanes | 184.693 | 2712.051 | ops/ms | 1368% | >> | FloatMaxVector.MULLanes | 277.818 | 3388.038 | ops/ms | 1119% | >> | IntMaxVector.MULLanes | 371.225 | 4765.434 | ops/ms | 1183% | >> | LongMaxVector.MULLanes | 205.149 | 2672.975 | ops/ms | 1203% | >> | ShortMaxVector.MULLanes | 472.804 | 5122.917 | ops/ms | 984% | >> - Merge branch 'master' into 8343689-rebase >> - fixup: don't modify the value in vsrc >> >> Fix reduce_mul_integral_gt128b() so it doesn't modify vsrc. With this >> change, the result of recursive folding is held in vtmp1. To be able to >> pass this intermediate result to reduce_mul_integral_le128b(), we would >> have to use another temporary FloatRegister, as vtmp1 would essentially >> act as vsrc. It's possible to get around this however: >> reduce_mul_integral_le128b() is modified so it's possible to pass >> matching vsrc and vtmp2 arguments. 
By doing this, we save ourselves a >> temporary register in rules that match to reduce_mul_integral_gt128b(). >> - cleanup: revert an unnecessary change to reduce_mul_fp_le128b() formatting >> - Use EXT instead of COMPACT to split a vector into two halves >> >> Benchmarks results: >> >> Neoverse-V1 (SVE 256-bit) >> >> Benchmark (size) Mode master ... > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2097: > >> 2095: sve_movprfx(vtmp1, vsrc); // copy >> 2096: sve_ext(vtmp1, vtmp1, vector_length_in_bytes / 2); // swap halves >> 2097: sve_mul(vtmp1, elemType_to_regVariant(bt), pgtmp, vsrc); // multiply halves > >> sve_mul(vtmp1, elemType_to_regVariant(bt), pgtmp, vsrc); > > Can we use `ptrue` instead of `pgtmp` here? The higher bits can be computed, but they have no influence on the final result, right? Thanks! For some reason I thought that we don't have a dedicated predicate register for that. > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2106: > >> 2104: sve_mul(vtmp1, elemType_to_regVariant(bt), pgtmp, vtmp2); // multiply halves >> 2105: vector_length_in_bytes = vector_length_in_bytes / 2; >> 2106: vector_length = vector_length / 2; > > I guess you want to update the `pgtmp` with new `vector_length`? But it seems the code is missing. Anyway, maybe it's not necessary to generate a predicate as I commented above. It isn't exactly necessary similarly to how we can always use `ptrue` here. But yeah, I'll just remove it following the suggestion above.
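
For readers following the thread, the halving scheme discussed above (copy the source, line up the two halves, multiply them, then halve the length) can be modelled in plain Java. This is only an illustrative sketch and not the HotSpot code: the class and method names are hypothetical, a scalar `long[]` stands in for an SVE register, and the array length is assumed to be a non-zero power of two.

```java
// Illustrative model only; names are hypothetical and not taken from the patch.
final class HalvingMulReduction {
    // Folds the upper half of the active lanes onto the lower half
    // until a single lane holds the product of all lanes.
    // Assumes lanes.length is a non-zero power of two.
    static long reduceMul(long[] lanes) {
        long[] v = lanes.clone();                 // keep the source unmodified
        for (int n = v.length; n > 1; n /= 2) {
            int half = n / 2;
            for (int i = 0; i < half; i++) {
                v[i] *= v[i + half];              // multiply the two halves lane-wise
            }
        }
        return v[0];
    }

    public static void main(String[] args) {
        long[] lanes = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(reduceMul(lanes));     // product of all eight lanes
    }
}
```

Because integer multiplication is associative and commutative, also under overflow wrap-around, the folded result equals a sequential lane-by-lane product. The same does not hold for floating point in general, which is presumably why the change list above separates strictly-ordered from non strictly-ordered FP reductions.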
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2178009839 PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2178007165 From mchevalier at openjdk.org Tue Jul 1 16:14:00 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 1 Jul 2025 16:14:00 GMT Subject: RFR: 8359344: C2: Malformed control flow after intrinsic bailout [v2] In-Reply-To: References: Message-ID: <1cFRkcs5JmgnbWEIaEoT8I9RiUtNxgKieAdkSB2Fgmc=.1d97b5c4-b6ef-43c6-b721-1e52eee19d3a@github.com> > When intrinsic bailout, we assume that the control in the `LibraryCallKit` did not change: > > https://github.com/openjdk/jdk/blob/c4fb00a7be51c7a05a29d3d57d787feb5c698ddf/src/hotspot/share/opto/library_call.cpp#L137 > > This is enforced by restoring the old state, like in > > https://github.com/openjdk/jdk/blob/c4fb00a7be51c7a05a29d3d57d787feb5c698ddf/src/hotspot/share/opto/library_call.cpp#L1722-L1732 > > That is good, but not sufficient. First, the most obvious, one could have already built some structure without moving the control. For instance, we can obtain something such as: > > ![1 after-intrinsic-bailout-during-late-inlining](https://github.com/user-attachments/assets/2fd255cc-0bfc-4841-8dd1-f64d502e0ee1) > > > Here, during late inlining, the call `323` is candidate to be inline, but that bails out. Yet, a call to `make_unsafe_address` was made, which built nodes `354 If` and everything under. This is needed as tests are made on the resulting nodes (especially `366 AddP`) to know whether we should bail out or not. At the end, we get 2 control successor to `346 IfFalse`: the call that is not removed and the leftover of the intrinsic that will be cleanup much later, but not by RemoveUseless. > > Another situation is somewhat worse, when happening during parsing. 
It can lead to such cases: > > ![2 after-intrinsic-bailout-during-parsing](https://github.com/user-attachments/assets/4524c615-6521-4f0d-8f61-c426f9179035) > > The nodes `31 OpaqueNotNull`, `31 If`, `36 IfTrue`, `33 IfFalse`, `35 Halt`, `44 If`, `45 IfTrue`, `46 IfFalse` are leftover from a bailing out intrinsic. The replacement call `49 CallStaticJava` should come just under `5 Parm`, but the control was updated and the call is actually built under `36 If`. Then, why does the previous assert doesn't complain? > > This is because there is more than one control, or one map. In intrinsics that need to restore their state, the initial `SafePoint` map is cloned, the clone is kept aside, and if needed (bailing out), we set the current map to this saved clone. But there is another map from which the one of the `LibraryCallKit` comes, and that survives longer, it's the one that is contained in the `JVMState`: > > https://github.com/openjdk/jdk/blob/c4fb00a7be51c7a05a29d3d57d787feb5c698ddf/src/hotspot/share/opto/library_call.cpp#L101-L102 > > And here there is the challenge: > - the `JVMState jvms` contains a `SafePoint` map, this map must have `jvms` as `jvms` (pointer comparison) > - we can't really change the pointer, just the content > -... 
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Remove useless loop ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25936/files - new: https://git.openjdk.org/jdk/pull/25936/files/54b07e94..d51853ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25936&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25936&range=00-01 Stats: 24 lines in 1 file changed: 0 ins; 2 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/25936.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25936/head:pull/25936 PR: https://git.openjdk.org/jdk/pull/25936 From mablakatov at openjdk.org Tue Jul 1 16:14:47 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Tue, 1 Jul 2025 16:14:47 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v4] In-Reply-To: <3sWLk_sAMLtcvRUjXk9hYe-K2MBQl9fH2Qg0MF7lwDk=.b8867d51-e822-43c0-93ab-58228c6eb1d5@github.com> References: <3sWLk_sAMLtcvRUjXk9hYe-K2MBQl9fH2Qg0MF7lwDk=.b8867d51-e822-43c0-93ab-58228c6eb1d5@github.com> Message-ID: On Tue, 1 Jul 2025 07:00:08 GMT, Xiaohong Gong wrote: >> Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: >> >> - cleanup: address nits, rename several symbols >> - cleanup: remove unreferenced definitions >> - Address review comments. 
>> >> - fixup: disable FP mul reduction auto-vectorization for all targets >> - fixup: add a tmp vReg to reduce_mul_integral_gt128b and >> reduce_non_strict_order_mul_fp_gt128bto keep vsrc unmodified >> - cleanup: replace a complex lambda in the above methods with a loop >> - cleanup: rename symbols to follow the existing naming convention >> - cleanup: add asserts to SVE only instructions >> - split mul FP reduction instructions into strictly-ordered (default) >> and explicitly non strictly-ordered >> - remove redundant conditions in TestVectorFPReduction.java >> >> Benchmarks results: >> >> Neoverse-V1 (SVE 256-bit) >> >> | Benchmark | Before | After | Units | Diff | >> |---------------------------|----------|----------|--------|-------| >> | ByteMaxVector.MULLanes | 619.156 | 9884.578 | ops/ms | 1496% | >> | DoubleMaxVector.MULLanes | 184.693 | 2712.051 | ops/ms | 1368% | >> | FloatMaxVector.MULLanes | 277.818 | 3388.038 | ops/ms | 1119% | >> | IntMaxVector.MULLanes | 371.225 | 4765.434 | ops/ms | 1183% | >> | LongMaxVector.MULLanes | 205.149 | 2672.975 | ops/ms | 1203% | >> | ShortMaxVector.MULLanes | 472.804 | 5122.917 | ops/ms | 984% | >> - Merge branch 'master' into 8343689-rebase >> - fixup: don't modify the value in vsrc >> >> Fix reduce_mul_integral_gt128b() so it doesn't modify vsrc. With this >> change, the result of recursive folding is held in vtmp1. To be able to >> pass this intermediate result to reduce_mul_integral_le128b(), we would >> have to use another temporary FloatRegister, as vtmp1 would essentially >> act as vsrc. It's possible to get around this however: >> reduce_mul_integral_le128b() is modified so it's possible to pass >> matching vsrc and vtmp2 arguments. By doing this, we save ourselves a >> temporary register in rules that match to reduce_mul_integral_gt128b(). 
>> - cleanup: revert an unnecessary change to reduce_mul_fp_le128b() formating >> - Use EXT instead of COMPACT to split a vector into two halves >> >> Benchmarks results: >> >> Neoverse-V1 (SVE 256-bit) >> >> Benchmark (size) Mode master ... > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 3536: > >> 3534: >> 3535: instruct reduce_mulF_gt128b(vRegF dst, vRegF fsrc, vReg vsrc, vReg tmp) %{ >> 3536: predicate(Matcher::vector_length_in_bytes(n->in(2)) > 16 && n->as_Reduction()->requires_strict_order()); > > Are there the cases that can match with this rule? Well, we don't match it right now for auto-vectorization as it doesn't worth it performance-wise. This might change for future implementations of SVE(2). I'd still prefer to keep it so the set of instructions is complete. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2178014966 From mablakatov at openjdk.org Tue Jul 1 16:14:49 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Tue, 1 Jul 2025 16:14:49 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v5] In-Reply-To: References: Message-ID: <4XhaHrk4r0mgFmgfVUFvy0mktRz25oXfbln2Nhjcxg4=.a7e60853-979f-48de-9fa0-b8530a3b2ba5@github.com> On Tue, 1 Jul 2025 02:51:56 GMT, Hao Sun wrote: >> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: >> >> fixup: remove undefined insts from aarch64-asmtest.py > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3729: > >> 3727: #undef INSN >> 3728: >> 3729: // SVE aliases > > In the inital commit, asm test for `sve_(mov|movs|not|nots)` is added into `test/hotspot/gtest/aarch64/aarch64-asmtest.py`. Since the definition is removed in this commit, the corresponding asm test should be removed as well. Otherwise, JDK build failed on AArch64. > See the error log in GHA test. 
https://github.com/mikabl-arm/jdk/actions/runs/15974069085/job/45051902618 Thanks, fixed by https://github.com/openjdk/jdk/pull/23181/commits/df09ab65f75c7b6f99e0088b3871d7df7a8c4d1b ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2178016339 From mablakatov at openjdk.org Tue Jul 1 16:25:49 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Tue, 1 Jul 2025 16:25:49 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v3] In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 06:21:43 GMT, Xiaohong Gong wrote: >> Why is it better that way? Currently the assertions check that we end up here if there computations that can be done only using SVE (length > neon && length <= sve). What would happen if a user operates 256b VectorAPI vectors on a 512b SVE platform? > > That would be the operations with partial vector size valid. For such cases, we will generate a mask in IR level, and a `VectorBlend` will be generated for this reduction case. Otherwise the result will be incorrect. So the vector size should be equal to MaxVectorSize theoretically. Thank you for elaborating on this :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2178035000 From shade at openjdk.org Tue Jul 1 16:27:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 1 Jul 2025 16:27:42 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code In-Reply-To: References: Message-ID: <3T_kZY0tk0WcS4kkuGcoifEHjo1TlLbLBcjLxb4sD-I=.42bd833a-7fa2-4173-a165-f05e05e6e124@github.com> On Tue, 1 Jul 2025 16:04:12 GMT, Vladimir Kozlov wrote: > This has to be tested by us to make sure we clean up all issues this change find. Sure thing. There is a chicken-and-egg kind of problem that some bugs reproduce only with this PR, and maybe with extra inline tuning :) I am following up on failures that we are seeing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26068#issuecomment-3024727152 From snatarajan at openjdk.org Tue Jul 1 16:28:27 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 1 Jul 2025 16:28:27 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v8] In-Reply-To: References: Message-ID: <4W6QHi3F3RN-JYfYAKUATR_xCUnOiUR0vT73ndqNZtk=.0e193c07-cad0-4cbd-86f2-1758a8c8bac9@github.com> > This changeset restructures the macro expansion phase to not include macro elimination and also adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes. > > Changes: > - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes ` method. > - Introduced compiler phases` PHASE_AFTER_MACRO_ELIMINATION` > - Added a new Ideal phase for individual macro elimination steps. > - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)). > > Below is a sample screenshot (IGV print level 4 ) mainly showing the new phase . > ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6) > > Questions to reviewers: > - Is the new macro elimination phase OK, or should we change anything? > - In `compile.cpp `, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed ? > > Testing: > GitHub Actions > tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. 
> Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: review comments fix part 1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25682/files - new: https://git.openjdk.org/jdk/pull/25682/files/939be78b..791b6a0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25682&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25682&range=06-07 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25682/head:pull/25682 PR: https://git.openjdk.org/jdk/pull/25682 From eastigeevich at openjdk.org Tue Jul 1 16:43:38 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 1 Jul 2025 16:43:38 GMT Subject: RFR: 8360936: Test compiler/onSpinWait/TestOnSpinWaitAArch64.java fails after JDK-8359435 [v2] In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 15:37:04 GMT, Aleksey Shipilev wrote: > Looks okay, but I am confused why the test did not fail before JDK-8359435? Just checked. It's not because of JDK-8359435. There were some changes which disabled printing debug info in release build. > test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java line 32: > >> 30: * @requires vm.flagless >> 31: * @requires os.arch=="aarch64" >> 32: * @requires vm.debug==true > > Can be just `@requires vm.debug`. 
Done ------------- PR Comment: https://git.openjdk.org/jdk/pull/26072#issuecomment-3024772962 PR Review Comment: https://git.openjdk.org/jdk/pull/26072#discussion_r2178059281 From eastigeevich at openjdk.org Tue Jul 1 16:43:39 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 1 Jul 2025 16:43:39 GMT Subject: RFR: 8360936: Test compiler/onSpinWait/TestOnSpinWaitAArch64.java fails after JDK-8359435 [v2] In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 16:05:07 GMT, Evgeny Astigeevich wrote: >> Test compiler/onSpinWait/TestOnSpinWaitAArch64.java needs debug info to identify a position of spin wait instructions in generated code. Release builds might not generate needed debug info. >> >> This PR adds a requirement for the test to be run on debug builds only. >> >> Tested: >> - Fastdebug: test passed >> - Slowdebug: test passed. >> - Release: test skipped. > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Simplify requirement for debug build The test started failing after I had updated my branch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26072#issuecomment-3024774351 From kvn at openjdk.org Tue Jul 1 16:54:42 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 1 Jul 2025 16:54:42 GMT Subject: RFR: 8360641: TestCompilerCounts fails after 8354727 [v4] In-Reply-To: References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com> Message-ID: On Tue, 1 Jul 2025 06:52:32 GMT, Manuel Hässig wrote: >> After integrating #25872 the calculation of the `CICompilerCount` ergonomic became dependent on the size of `NonNMethodCodeHeapSize`, which itself is an ergonomic based on the available memory. Thus, depending on the system, the test `compiler/arguments/TestCompilerCounts.java` failed, i.e. locally this failed, but not on CI servers. >> >> This PR changes the test to reflect the changes introduced in #25872.
>> >> Testing: >> - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15932906313) >> - [x] tier1,tier2 plus Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Fix whitespace > > Co-authored-by: Andrey Turbanov Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26024#pullrequestreview-2976217571 From kvn at openjdk.org Tue Jul 1 17:08:43 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 1 Jul 2025 17:08:43 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v7] In-Reply-To: References: Message-ID: <0KyjZgLy8vVqV3du6Y1LIKmGTnDYxEPlYgTrVVd_ey4=.b2d40e6c-ff0e-4e88-bc70-e06219a15608@github.com> On Tue, 1 Jul 2025 13:24:49 GMT, Saranya Natarajan wrote: >> src/hotspot/share/opto/compile.cpp line 2533: >> >>> 2531: { >>> 2532: TracePhase tp(_t_macroExpand); >>> 2533: print_method(PHASE_BEFORE_MACRO_EXPANSION, 3); >> >> Should we move it before `mex.expand_macro_nodes()` call? > > Moving this would break the assumption of needing a `BEFORE_MACRO_ELIMINATION` as explained in the above reply. One way to go about this would be to include a `BEFORE_MACRO_ELIMINATION` phase and remove the `PHASE_BEFORE_MACRO_EXPANSION` phase, as this is the only place where it is used. Would this be a reasonable fix? So `MACRO_ELIMINATION` is a subset of `MACRO_EXPANSION` >> src/hotspot/share/opto/phasetype.hpp line 94: >> >>> 92: flags(AFTER_LOOP_OPTS, "After Loop Optimizations") \ >>> 93: flags(AFTER_MERGE_STORES, "After Merge Stores") \ >>> 94: flags(AFTER_MACRO_ELIMINATION_STEP, "After Macro Elimination Step") \ >> >> What is the reason to not have `BEFORE_MACRO_ELIMINATION`?
> > The two main reasons for not having a `BEFORE_MACRO_ELIMINATION` are as follows: > - There is a dump in line 2426 (`print_method(PHASE_ITER_GVN_AFTER_EA, 2)`) before we call `mexp.eliminate_macro_nodes` which performs the functionality of having a `BEFORE_MACRO_ELIMINATION` for phase dump. > - There is dump in line 2533 (`print_method(PHASE_BEFORE_MACRO_EXPANSION, 3)`) before eliminating macro nodes which performs the similar function. ok ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2178120635 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2178120168 From kvn at openjdk.org Tue Jul 1 17:08:41 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 1 Jul 2025 17:08:41 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v8] In-Reply-To: <4W6QHi3F3RN-JYfYAKUATR_xCUnOiUR0vT73ndqNZtk=.0e193c07-cad0-4cbd-86f2-1758a8c8bac9@github.com> References: <4W6QHi3F3RN-JYfYAKUATR_xCUnOiUR0vT73ndqNZtk=.0e193c07-cad0-4cbd-86f2-1758a8c8bac9@github.com> Message-ID: On Tue, 1 Jul 2025 16:28:27 GMT, Saranya Natarajan wrote: >> This changeset restructures the macro expansion phase to not include macro elimination and also adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes. >> >> Changes: >> - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes ` method. >> - Introduced compiler phases` PHASE_AFTER_MACRO_ELIMINATION` >> - Added a new Ideal phase for individual macro elimination steps. >> - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)). >> >> Below is a sample screenshot (IGV print level 4 ) mainly showing the new phase . 
>> ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6) >> >> Questions to reviewers: >> - Is the new macro elimination phase OK, or should we change anything? >> - In `compile.cpp `, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed ? >> >> Testing: >> GitHub Actions >> tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > review comments fix part 1 Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25682#pullrequestreview-2976289575 From shade at openjdk.org Tue Jul 1 17:13:43 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 1 Jul 2025 17:13:43 GMT Subject: RFR: 8360936: Test compiler/onSpinWait/TestOnSpinWaitAArch64.java fails after JDK-8359435 [v2] In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 16:05:07 GMT, Evgeny Astigeevich wrote: >> Test compiler/onSpinWait/TestOnSpinWaitAArch64.java needs debug info to identify a position of spin wait instructions in generated code. Release builds might not generate needed debug info. >> >> This PR adds a requirement for the test to be run on debug builds only. >> >> Tested: >> - Fastdebug: test passed >> - Slowdebug: test passed. >> - Release: test skipped. > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Simplify requirement for debug build OK, are you able to bisect which change? This fix to only do debug VM needs to be correctly linked to the actual cause, IMO. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26072#issuecomment-3024882710 From psandoz at openjdk.org Tue Jul 1 18:06:45 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 1 Jul 2025 18:06:45 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API [v2] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 09:16:48 GMT, Xiaohong Gong wrote: >> JDK-8318650 introduced hotspot intrinsification of subword gather load APIs for X86 platforms [1]. However, the current implementation is not optimal for AArch64 SVE platform, which natively supports vector instructions for subword gather load operations using an int vector for indices (see [2][3]). >> >> Two key areas require improvement: >> 1. At the Java level, vector indices generated for range validation could be reused for the subsequent gather load operation on architectures with native vector instructions like AArch64 SVE. However, the current implementation prevents compiler reuse of these index vectors due to divergent control flow, potentially impacting performance. >> 2. At the compiler IR level, the additional `offset` input for `LoadVectorGather`/`LoadVectorGatherMasked` with subword types increases IR complexity and complicates backend implementation. Furthermore, generating `add` instructions before each memory access negatively impacts performance. >> >> This patch refactors the implementation at both the Java level and compiler mid-end to improve efficiency and maintainability across different architectures. >> >> Main changes: >> 1. Java-side API refactoring: >> - Explicitly passes generated index vectors to hotspot, eliminating duplicate index vectors for gather load instructions on >> architectures like AArch64. >> 2. C2 compiler IR refactoring: >> - Refactors `LoadVectorGather`/`LoadVectorGatherMasked` IR for subword types by removing the memory offset input and incorporating it into the memory base `addr` at the IR level. 
This simplifies backend implementation, reduces add operations, and unifies the IR across all types. >> 3. Backend changes: >> - Streamlines X86 implementation of subword gather operations following the removal of the offset input from the IR level. >> >> Performance: >> The performance of the relative JMH improves up to 27% on a X86 AVX512 system. Please see the data below: >> >> Benchmark Mode Cnt Unit SIZE Before After Gain >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 64 53682.012 52650.325 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 256 14484.252 14255.156 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 1024 3664.900 3595.615 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 4096 908.31... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Address review comments > - Merge 'jdk:master' into JDK-8355563 > - 8355563: VectorAPI: Refactor current implementation of subword gather load API Marked as reviewed by psandoz (Reviewer). This is a nice simplification, Java changes look good. I'll let the Intel folks sign-off related to regressions. IMO minor regressions like this are acceptable if the generated code quality is good, and if the benchmark reports higher variance and averaging results from multiple forks close the gap. (In this case i don't understand how the Java changes impacts alignment). 
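The IR change described in the quoted PR text (dropping the separate `offset` input and folding it into the base address) can be pictured with a scalar model. This is a minimal sketch, not the C2 IR or the Vector API: `gather_with_offset` and `gather` are hypothetical names, and indices are assumed to be in bounds.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical "before" shape: the offset is a separate input and is added
// once per lane.
template <typename T>
std::vector<T> gather_with_offset(const T* base, std::size_t offset,
                                  const std::vector<int>& indices) {
    std::vector<T> out;
    out.reserve(indices.size());
    for (int idx : indices) {
        out.push_back(base[offset + static_cast<std::size_t>(idx)]);
    }
    return out;
}

// Hypothetical "after" shape: the caller folds the offset into the address
// once, so the per-lane work is a plain indexed load.
template <typename T>
std::vector<T> gather(const T* addr, const std::vector<int>& indices) {
    std::vector<T> out;
    out.reserve(indices.size());
    for (int idx : indices) {
        out.push_back(addr[idx]);
    }
    return out;
}
```

Calling `gather(base + offset, indices)` returns the same lanes as `gather_with_offset(base, offset, indices)`, with a single address addition instead of one per lane.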
------------- PR Review: https://git.openjdk.org/jdk/pull/25138#pullrequestreview-2976493924 PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-3025029477 From dlunden at openjdk.org Tue Jul 1 18:08:40 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 1 Jul 2025 18:08:40 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v8] In-Reply-To: <4W6QHi3F3RN-JYfYAKUATR_xCUnOiUR0vT73ndqNZtk=.0e193c07-cad0-4cbd-86f2-1758a8c8bac9@github.com> References: <4W6QHi3F3RN-JYfYAKUATR_xCUnOiUR0vT73ndqNZtk=.0e193c07-cad0-4cbd-86f2-1758a8c8bac9@github.com> Message-ID: On Tue, 1 Jul 2025 16:28:27 GMT, Saranya Natarajan wrote: >> This changeset restructures the macro expansion phase to not include macro elimination and also adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes. >> >> Changes: >> - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes` method. >> - Introduced compiler phase `PHASE_AFTER_MACRO_ELIMINATION` >> - Added a new Ideal phase for individual macro elimination steps. >> - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)). >> >> Below is a sample screenshot (IGV print level 4) mainly showing the new phase. >> ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6) >> >> Questions to reviewers: >> - Is the new macro elimination phase OK, or should we change anything? >> - In `compile.cpp`, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed?
>> >> Testing: >> GitHub Actions >> tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > review comments fix part 1 Marked as reviewed by dlunden (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25682#pullrequestreview-2976500092 From sviswanathan at openjdk.org Tue Jul 1 21:33:44 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 1 Jul 2025 21:33:44 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API [v2] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 09:16:48 GMT, Xiaohong Gong wrote: >> JDK-8318650 introduced hotspot intrinsification of subword gather load APIs for X86 platforms [1]. However, the current implementation is not optimal for AArch64 SVE platform, which natively supports vector instructions for subword gather load operations using an int vector for indices (see [2][3]). >> >> Two key areas require improvement: >> 1. At the Java level, vector indices generated for range validation could be reused for the subsequent gather load operation on architectures with native vector instructions like AArch64 SVE. However, the current implementation prevents compiler reuse of these index vectors due to divergent control flow, potentially impacting performance. >> 2. At the compiler IR level, the additional `offset` input for `LoadVectorGather`/`LoadVectorGatherMasked` with subword types increases IR complexity and complicates backend implementation. Furthermore, generating `add` instructions before each memory access negatively impacts performance. 
>> >> This patch refactors the implementation at both the Java level and compiler mid-end to improve efficiency and maintainability across different architectures. >> >> Main changes: >> 1. Java-side API refactoring: >> - Explicitly passes generated index vectors to hotspot, eliminating duplicate index vectors for gather load instructions on >> architectures like AArch64. >> 2. C2 compiler IR refactoring: >> - Refactors `LoadVectorGather`/`LoadVectorGatherMasked` IR for subword types by removing the memory offset input and incorporating it into the memory base `addr` at the IR level. This simplifies backend implementation, reduces add operations, and unifies the IR across all types. >> 3. Backend changes: >> - Streamlines X86 implementation of subword gather operations following the removal of the offset input from the IR level. >> >> Performance: >> The performance of the relative JMH improves up to 27% on a X86 AVX512 system. Please see the data below: >> >> Benchmark Mode Cnt Unit SIZE Before After Gain >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 64 53682.012 52650.325 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 256 14484.252 14255.156 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 1024 3664.900 3595.615 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 4096 908.31... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Address review comments > - Merge 'jdk:master' into JDK-8355563 > - 8355563: VectorAPI: Refactor current implementation of subword gather load API Marked as reviewed by sviswanathan (Reviewer). Agree with Paul, these are minor regressions. Let us proceed with this patch. 
------------- PR Review: https://git.openjdk.org/jdk/pull/25138#pullrequestreview-2977019367 PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-3025596784 From sviswanathan at openjdk.org Wed Jul 2 00:04:39 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 2 Jul 2025 00:04:39 GMT Subject: RFR: 8360116: Add support for AVX10 floating point minmax instruction [v5] In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 08:38:27 GMT, Jatin Bhateja wrote: >> Intel® AVX10 ISA [1] extensions added new floating point MIN/MAX instructions which comply with definitions in IEEE-754-2019 standard section 9.6 and can directly emulate Math.min/max semantics without the need for any special handling for NaN, +0.0 or -0.0 detection. >> >> **The following pseudo-code describes the existing algorithm for min/max[FD]:** >> >> Move the non-negative value to the second operand; this will ensure that we correctly handle 0.0 and -0.0 values, if values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. Existing MINPS and MAXPS semantics only check for NaN as the second operand; hence, we need special handling to check for NaN at the first operand. >> >> btmp = (b < +0.0) ? a : b >> atmp = (b < +0.0) ? b : a >> Tmp = Max_Float(atmp , btmp) >> Res = (atmp == NaN) ? atmp : Tmp >> >> For min[FD] we need a small tweak in the above algorithm, i.e., move the non-negative value to the first operand, this will ensure that we correctly select -0.0 if both the operands being compared are 0.0 or -0.0. >> >> btmp = (b < +0.0) ? b : a >> atmp = (b < +0.0) ? a : b >> Tmp = Min_Float(atmp , btmp) >> Res = (atmp == NaN) ? atmp : Tmp >> >> Thus, we need additional special handling for NaNs and +/-0.0 to compute floating-point min/max values to comply with the semantics of Math.max/min APIs using existing MINPS / MAXPS instructions.
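The blend-then-compare scheme in the pseudo-code above can be modelled in scalar C++. This is a sketch under stated assumptions rather than the HotSpot implementation: `x86_max`, `x86_min`, `java_like_max` and `java_like_min` are hypothetical names, the test `(b < +0.0)` is read here as a sign-bit test (an assumption, since a literal comparison would not treat -0.0 as negative), and the min variant is assumed to use a minimum operation.

```cpp
#include <cmath>

// Model of the legacy x86 scalar max/min: if either input is NaN, or both
// inputs are zeros, the second operand is returned.
inline double x86_max(double first, double second) {
    return first > second ? first : second;
}
inline double x86_min(double first, double second) {
    return first < second ? first : second;
}

// Max: move the non-negative value into the second operand, then patch up
// the case where the first operand is NaN.
inline double java_like_max(double a, double b) {
    const bool b_negative = std::signbit(b);  // assumed meaning of (b < +0.0)
    const double btmp = b_negative ? a : b;
    const double atmp = b_negative ? b : a;
    const double tmp = x86_max(atmp, btmp);
    return std::isnan(atmp) ? atmp : tmp;
}

// Min: same idea with the operands swapped, so that -0.0 wins over +0.0.
inline double java_like_min(double a, double b) {
    const bool b_negative = std::signbit(b);
    const double btmp = b_negative ? b : a;
    const double atmp = b_negative ? a : b;
    const double tmp = x86_min(atmp, btmp);
    return std::isnan(atmp) ? atmp : tmp;
}
```

With these definitions a NaN in either position propagates to the result, `java_like_max` of a mixed-sign pair of zeros yields +0.0, and `java_like_min` of the same pair yields -0.0, which is the behaviour the extra blends and NaN check are there to guarantee.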
AVX10.2 added a new instruction, VMINMAX[SH,SS,SD]/[PH,PS,PD], which comprehensively handles special cases, thereby eliminating the need for special handling. >> >> Patch emits new instructions for reduction and non-reduction operations for single, double, and Float16 types. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10 > > Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/cpu/x86/x86_64.ad > > Co-authored-by: Manuel Hässig > - Update src/hotspot/cpu/x86/x86_64.ad > > Co-authored-by: Manuel Hässig src/hotspot/cpu/x86/assembler_x86.cpp line 8800: > 8798: attributes.set_is_evex_instruction(); > 8799: attributes.set_embedded_opmask_register_specifier(mask); > 8800: attributes.set_address_attributes(/* tuple_type */ EVEX_FVM, /* input_size_in_bits */ EVEX_NObit); It looks to me that the tuple_type should be EVEX_FV for all of evminmax ps, pd, ph. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2178735442 From kbarrett at openjdk.org Wed Jul 2 00:30:44 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 2 Jul 2025 00:30:44 GMT Subject: RFR: 8361086: JVMCIGlobals::check_jvmci_flags_are_consistent has incorrect format string In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 23:16:20 GMT, Vladimir Kozlov wrote: >> Please review this trivial fix of a format string. The value being printed is >> TieredStopAtLevel, which is of type intx, so "%zd" should be used instead of "%d". >> >> Testing: mach5 tier1 > > Thank you for checking other solutions. > > Current fix is good.
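The format-string point in the quoted fix can be illustrated in isolation. This is a hedged sketch, not the JVMCI code: `describe_level` is a hypothetical helper, and `intx` is declared here as an alias for `std::intptr_t` on the assumption that it matches HotSpot's pointer-sized signed integer type.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Assumption: stands in for HotSpot's pointer-sized signed integer typedef.
using intx = std::intptr_t;

// Formats a pointer-sized signed value. "%d" would expect an `int`, which is
// narrower than `intx` on 64-bit targets; "%zd" expects the signed
// counterpart of size_t, which has the same width as `intx` on the usual
// 32-bit and 64-bit targets.
std::string describe_level(intx level) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "TieredStopAtLevel = %zd", level);
    return std::string(buf);
}
```

Passing a 64-bit argument where `%d` is expected is undefined behaviour in the C and C++ `printf` family, which is why the width of the conversion matters even when the printed values are small.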
Thanks for the reviews, @vnkozlov, @mhaessig, and @mur47x111.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26051#issuecomment-3025913681

From kbarrett at openjdk.org Wed Jul 2 00:30:44 2025
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 2 Jul 2025 00:30:44 GMT
Subject: Integrated: 8361086: JVMCIGlobals::check_jvmci_flags_are_consistent has incorrect format string
In-Reply-To:
References:
Message-ID:

On Mon, 30 Jun 2025 16:14:08 GMT, Kim Barrett wrote:

> Please review this trivial fix of a format string. The value being printed is
> TieredStopAtLevel, which is of type intx, so "%zd" should be used instead of "%d".
>
> Testing: mach5 tier1

This pull request has now been integrated.

Changeset: c6448dc3
Author: Kim Barrett
URL: https://git.openjdk.org/jdk/commit/c6448dc3afb1da9d93bb94804aa1971a650b91b7
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod

8361086: JVMCIGlobals::check_jvmci_flags_are_consistent has incorrect format string

Reviewed-by: kvn, mhaessig, yzheng

-------------

PR: https://git.openjdk.org/jdk/pull/26051

From sviswanathan at openjdk.org Wed Jul 2 00:31:41 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Wed, 2 Jul 2025 00:31:41 GMT
Subject: RFR: 8360116: Add support for AVX10 floating point minmax instruction [v5]
In-Reply-To:
References:
Message-ID:

On Tue, 1 Jul 2025 23:49:30 GMT, Sandhya Viswanathan wrote:

>> Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Update src/hotspot/cpu/x86/x86_64.ad
>>
>> Co-authored-by: Manuel Hässig
>> - Update src/hotspot/cpu/x86/x86_64.ad
>>
>> Co-authored-by: Manuel Hässig
>
> src/hotspot/cpu/x86/assembler_x86.cpp line 8800:
>
>> 8798: attributes.set_is_evex_instruction();
>> 8799: attributes.set_embedded_opmask_register_specifier(mask);
>> 8800: attributes.set_address_attributes(/* tuple_type */ EVEX_FVM, /* input_size_in_bits */ EVEX_NObit);
>
> It looks to me that the tuple_type should be EVEX_FV for all
of evminmax ps, pd, ph.

Other than that the rest of the PR looks good to me.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2178762877

From xgong at openjdk.org Wed Jul 2 01:45:46 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 2 Jul 2025 01:45:46 GMT
Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v4]
In-Reply-To: <19rf4A0bxc4BstRmLivGkoCOm7Qa7YD6z1VJHJivCtg=.4a643c7b-4e79-4f37-b230-7231df3c68a8@github.com>
References: <3sWLk_sAMLtcvRUjXk9hYe-K2MBQl9fH2Qg0MF7lwDk=.b8867d51-e822-43c0-93ab-58228c6eb1d5@github.com> <19rf4A0bxc4BstRmLivGkoCOm7Qa7YD6z1VJHJivCtg=.4a643c7b-4e79-4f37-b230-7231df3c68a8@github.com>
Message-ID:

On Tue, 1 Jul 2025 16:07:59 GMT, Mikhail Ablakatov wrote:

>> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2097:
>>
>>> 2095: sve_movprfx(vtmp1, vsrc); // copy
>>> 2096: sve_ext(vtmp1, vtmp1, vector_length_in_bytes / 2); // swap halves
>>> 2097: sve_mul(vtmp1, elemType_to_regVariant(bt), pgtmp, vsrc); // multiply halves
>>
>>> sve_mul(vtmp1, elemType_to_regVariant(bt), pgtmp, vsrc);
>>
>> Can we use `ptrue` instead of `pgtmp` here? The higher bits can be computed, but they have no influence on the final result, right?
>
> Thanks! For some reason I thought that we don't have a dedicated predicate register for that.

We can directly use `ptrue` here which maps to `p7` and has been preserved and initialized as all true.
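A scalar sketch of the "copy, swap halves, multiply" step quoted above may help show why an all-true predicate is safe there. This is only a model in plain Java, not the HotSpot code: the class and method names are made up for illustration, lanes are modelled as a `long[]`, and the loop reduces all the way down to one lane, whereas the patch, as described, only halves down to 128 bits and then hands over to the existing ASIMD code.

```java
import java.util.Arrays;

final class MulReduceHalves {
    // Hypothetical model of one step: copy the vector, swap its halves,
    // then multiply lane-wise with EVERY lane active (the all-true predicate case).
    static long[] mulSwappedHalves(long[] v) {
        int n = v.length; // assumed to be a power of two, at least 2
        long[] tmp = new long[n];
        for (int i = 0; i < n; i++) {
            tmp[i] = v[(i + n / 2) % n]; // swap halves (rotate by half the length)
        }
        for (int i = 0; i < n; i++) {
            tmp[i] *= v[i]; // multiply in all lanes, not only the low half
        }
        return tmp;
    }

    // Repeats the halving step, keeping only the low half each time.
    static long reduceMul(long[] v) {
        long[] cur = v.clone();
        while (cur.length > 1) {
            long[] prod = mulSwappedHalves(cur);
            cur = Arrays.copyOf(prod, cur.length / 2);
        }
        return cur[0];
    }

    // Straightforward left-to-right product, used as the reference.
    static long reduceMulSequential(long[] v) {
        long acc = 1;
        for (long x : v) {
            acc *= x;
        }
        return acc;
    }
}
```

In this model the upper lanes simply recompute the same products as the lower lanes, and only the low half is carried into the next step, so the extra work cannot change the result. Integer multiplication wraps but stays associative and commutative, so the reordering is exact for integral lanes; floating-point lanes with strict ordering are a separate matter.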
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2178816427

From xgong at openjdk.org Wed Jul 2 01:48:50 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 2 Jul 2025 01:48:50 GMT
Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v4]
In-Reply-To:
References: <3sWLk_sAMLtcvRUjXk9hYe-K2MBQl9fH2Qg0MF7lwDk=.b8867d51-e822-43c0-93ab-58228c6eb1d5@github.com>
Message-ID:

On Tue, 1 Jul 2025 16:10:58 GMT, Mikhail Ablakatov wrote:

>> src/hotspot/cpu/aarch64/aarch64_vector.ad line 3536:
>>
>>> 3534:
>>> 3535: instruct reduce_mulF_gt128b(vRegF dst, vRegF fsrc, vReg vsrc, vReg tmp) %{
>>> 3536: predicate(Matcher::vector_length_in_bytes(n->in(2)) > 16 && n->as_Reduction()->requires_strict_order());
>>
>> Are there any cases that can match this rule?
>
> Well, we don't match it right now for auto-vectorization as it isn't worth it performance-wise. This might change for future implementations of SVE(2). I'd still prefer to keep it so the set of instructions is complete.

Removing it is fine with me, as we actually have no case to test its correctness. Or maybe you could just make some changes locally (e.g. removing the `requires_strict_order` predicate and the un-strict-order rule), and test it with VectorAPI cases?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2178819064

From xgong at openjdk.org Wed Jul 2 01:54:50 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 2 Jul 2025 01:54:50 GMT
Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 1 Jul 2025 18:03:33 GMT, Paul Sandoz wrote:

> This is a nice simplification, Java changes look good. I'll let the Intel folks sign-off related to regressions.
IMO minor regressions like this are acceptable if the generated code quality is good, and if the benchmark reports higher variance and averaging results from multiple forks close the gap. (In this case I don't understand how the Java changes impact alignment).

Thanks for your review and comments @PaulSandoz ! With the Java changes in this patch, the outer loop in the test is no longer peeled, since all the range checks and branches are hoisted outside of the loop. Before, one iteration of loop peeling was needed to eliminate the branches. I think this changes the layout of the whole generated code a lot.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-3026080127

From xgong at openjdk.org Wed Jul 2 01:54:51 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 2 Jul 2025 01:54:51 GMT
Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API [v2]
In-Reply-To:
References:
Message-ID: <7GrGfBF_v8F0v02sRHC78ofMZwpMdzQZaHeYlNvi_N0=.93defb9e-ca9b-41b4-8722-1746692e2316@github.com>

On Tue, 1 Jul 2025 21:30:20 GMT, Sandhya Viswanathan wrote:

> Agree with Paul, these are minor regressions. Let us proceed with this patch.

Thanks so much for your review @sviswa7 !

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-3026080679

From jbhateja at openjdk.org Wed Jul 2 01:57:46 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 2 Jul 2025 01:57:46 GMT
Subject: RFR: 8360116: Add support for AVX10 floating point minmax instruction [v6]
In-Reply-To:
References:
Message-ID:

> Intel® AVX10 ISA [1] extensions added new floating point MIN/MAX instructions which comply with definitions in IEEE-754-2019 standard section 9.6 and can directly emulate Math.min/max semantics without the need for any special handling for NaN, +0.0 or -0.0 detection.
>
> **The following pseudo-code describes the existing algorithm for min/max[FD]:**
>
> Move the non-negative value to the second operand; this will ensure that we correctly handle 0.0 and -0.0 values, if values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. Existing MINPS and MAXPS semantics only check for NaN as the second operand; hence, we need special handling to check for NaN at the first operand.
>
> btmp = (b < +0.0) ? a : b
> atmp = (b < +0.0) ? b : a
> Tmp = Max_Float(atmp , btmp)
> Res = (atmp == NaN) ? atmp : Tmp
>
> For min[FD] we need a small tweak in the above algorithm, i.e., move the non-negative value to the first operand, this will ensure that we correctly select -0.0 if both the operands being compared are 0.0 or -0.0.
>
> btmp = (b < +0.0) ? b : a
> atmp = (b < +0.0) ? a : b
> Tmp = Min_Float(atmp , btmp)
> Res = (atmp == NaN) ? atmp : Tmp
>
> Thus, we need additional special handling for NaNs and +/-0.0 to compute floating-point min/max values to comply with the semantics of Math.max/min APIs using existing MINPS / MAXPS instructions. AVX10.2 added a new instruction, VMINMAX[SH,SS,SD]/[PH,PS,PD], which comprehensively handles special cases, thereby eliminating the need for special handling.
>
> Patch emits new instructions for reduction and non-reduction operations for single, double, and Float16 type.
>
> Kindly review and share your feedback.
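For readers who want to try the operand-swapping scheme described above, here is a minimal scalar sketch in Java. It rests on assumptions: `rawMax` and `rawMin` model a single lane of the legacy MAXPS/MINPS behaviour (the second operand is returned when either input is NaN or both are zeros), the `(b < +0.0)` test is read as a sign-bit test so that -0.0 counts as negative, and the min variant uses a min primitive for its `Tmp` step. All names are hypothetical and none of this is the HotSpot code.

```java
final class MinMaxEmulation {
    // One lane of the legacy max: returns the second operand unless a > b.
    // NaN comparisons are false and +0.0 > -0.0 is false, so the second operand wins.
    static float rawMax(float a, float b) {
        return (a > b) ? a : b;
    }

    // One lane of the legacy min, with the same "second operand wins" rule.
    static float rawMin(float a, float b) {
        return (a < b) ? a : b;
    }

    // Assumed reading of "(b < +0.0)": the sign bit is set, so -0.0 qualifies.
    static boolean signBitSet(float x) {
        return Float.floatToRawIntBits(x) < 0;
    }

    static float emulatedMax(float a, float b) {
        float btmp = signBitSet(b) ? a : b;
        float atmp = signBitSet(b) ? b : a;
        float tmp = rawMax(atmp, btmp);
        return Float.isNaN(atmp) ? atmp : tmp; // patch up a NaN in the first operand
    }

    static float emulatedMin(float a, float b) {
        float btmp = signBitSet(b) ? b : a;
        float atmp = signBitSet(b) ? a : b;
        float tmp = rawMin(atmp, btmp);
        return Float.isNaN(atmp) ? atmp : tmp; // patch up a NaN in the first operand
    }
}
```

Comparing `emulatedMax` and `emulatedMin` against `Math.max` and `Math.min` over signed zeros, NaNs and infinities is a quick way to convince oneself that both the operand swap and the NaN fix-up are needed.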
> > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10 Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Sandhya's review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25914/files - new: https://git.openjdk.org/jdk/pull/25914/files/5597b615..3854a871 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25914&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25914&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25914.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25914/head:pull/25914 PR: https://git.openjdk.org/jdk/pull/25914 From jbhateja at openjdk.org Wed Jul 2 02:04:41 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Jul 2025 02:04:41 GMT Subject: RFR: 8360116: Add support for AVX10 floating point minmax instruction [v5] In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 00:29:02 GMT, Sandhya Viswanathan wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 8800: >> >>> 8798: attributes.set_is_evex_instruction(); >>> 8799: attributes.set_embedded_opmask_register_specifier(mask); >>> 8800: attributes.set_address_attributes(/* tuple_type */ EVEX_FVM, /* input_size_in_bits */ EVEX_NObit); >> >> It looks to me that the tuple_type should be EVEX_FV for all of evminmax ps, pd, ph. > > Other than that the rest of the PR looks good to me. > It looks to me that the tuple_type should be EVEX_FV for all of evminmax ps, pd, ph. Yes, all these new vector instructions do have embedded broadcast variants. We don't use them currently, in the absence of embedded broadcasting, the scalar factor (N) selection for compressed disp8 displacement is the same for both EVEX_FV and EVEX_FVM tuple types. 
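The disp8*N remark can be made concrete with a small sketch. It reflects my reading of the compressed-displacement tables in the Intel SDM: for the full-vector tuple, N is the vector length in bytes unless embedded broadcast is used, in which case N is the element size; for the full-vector-mem tuple, N is always the vector length in bytes. The enum, method names and parameters are hypothetical and are not HotSpot's assembler API.

```java
import java.util.OptionalInt;

final class Disp8Compression {
    // Hypothetical stand-ins for the two tuple types discussed in the thread.
    enum TupleType { FULL_VECTOR, FULL_VECTOR_MEM }

    // Scale factor N used for disp8*N compression.
    // The broadcast flag only matters for FULL_VECTOR; FULL_VECTOR_MEM has no
    // broadcast form, so the flag is ignored for it in this sketch.
    static int scaleFactor(TupleType tuple, int vectorBytes, int elementBytes, boolean broadcast) {
        if (tuple == TupleType.FULL_VECTOR && broadcast) {
            return elementBytes;
        }
        return vectorBytes;
    }

    // Returns the compressed 8-bit displacement if disp is a multiple of n
    // and the quotient fits in a signed byte; otherwise empty (disp32 is needed).
    static OptionalInt compress(int disp, int n) {
        if (disp % n != 0) {
            return OptionalInt.empty();
        }
        int q = disp / n;
        return (q >= -128 && q <= 127) ? OptionalInt.of(q) : OptionalInt.empty();
    }
}
```

Under these assumptions, a 512-bit operation without broadcast gets N = 64 for either tuple type, which is the equivalence the reply relies on; the two only diverge once embedded broadcast is actually emitted.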
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2178831749 From xgong at openjdk.org Wed Jul 2 02:39:33 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 2 Jul 2025 02:39:33 GMT Subject: RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors [v2] In-Reply-To: References: Message-ID: > ### Background > On AArch64, the minimum vector length supported is 64-bit for basic types, except for `byte` and `boolean` (32-bit and 16-bit respectively to match special Vector API features). This limitation prevents intrinsification of vector type conversions between `short` and wider types (e.g. `long/double`) in Vector API when the entire vector length is within 128 bits, resulting in degraded performance for such conversions. > > For example, type conversions between `ShortVector.SPECIES_128` and `LongVector.SPECIES_128` are not supported on AArch64 NEON and SVE architectures with 128-bit max vector size. This occurs because the compiler would need to generate a vector with 2 short elements, resulting in a 32-bit vector size. > > To intrinsify such type conversion APIs, we need to relax the min vector length constraint from 64-bit to 32-bit for short vectors. > > ### Impact Analysis > #### 1. Vector types > Vectors only with `short` element types will be affected, as we just supported 32-bit `short` vectors in this change. > > #### 2. Vector API > No impact on Vector API or the vector-specific nodes. The minimum vector shape at API level remains 64-bit. It's not possible to generate a final vector IR with 32-bit vector size. Type conversions may generate intermediate 32-bit vectors, but they will be resized or cast to vectors with at least 64-bit length. > > #### 3. Auto-vectorization > Enables vectorization of cases containing only 2 `short` lanes, with significant performance improvements. 
Since we have supported 32-bit vectors for `byte` type for a long time, extending this to `short` did not introduce additional risks. > > #### 4. Codegen of vector nodes > NEON doesn't support 32-bit SIMD instructions, so we use 64-bit instructions instead. For lanewise operations, this is safe because the higher half bits can be ignored. > > Details: > - Lanewise vector operations are unaffected as explained above. > - NEON supports vector load/store instructions with 32-bit vector size, which we already use in relevant IRs (shared by SVE). > - Cross-lane operations like reduction may be affected, potentially causing incorrect results for `min/max/mul/and` reductions. The min vector size for such operations should remain 64-bit. We've added assertions in match rules. Since it's currently not possible to generate such reductions (Vector API minimum is 64-bit, and SLP doesn't support subword type reductions), we maintain the status quo. However, adding an explicit vector size check in `match_rule_s... 
Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Refine comments based on review suggestion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26057/files - new: https://git.openjdk.org/jdk/pull/26057/files/5af5bd49..4e15e588 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26057&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26057&range=00-01 Stats: 9 lines in 3 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/26057.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26057/head:pull/26057 PR: https://git.openjdk.org/jdk/pull/26057 From xgong at openjdk.org Wed Jul 2 02:39:34 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 2 Jul 2025 02:39:34 GMT Subject: RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors In-Reply-To: References: Message-ID: <0PdYt-pCobM5mAb4q3nDcR9PKz89QVFCsZF-jnMAv4Q=.6a5d9f1f-8b68-448c-ab72-2f7f4a12322e@github.com> On Tue, 1 Jul 2025 05:59:15 GMT, Xiaohong Gong wrote: > ### Background > On AArch64, the minimum vector length supported is 64-bit for basic types, except for `byte` and `boolean` (32-bit and 16-bit respectively to match special Vector API features). This limitation prevents intrinsification of vector type conversions between `short` and wider types (e.g. `long/double`) in Vector API when the entire vector length is within 128 bits, resulting in degraded performance for such conversions. > > For example, type conversions between `ShortVector.SPECIES_128` and `LongVector.SPECIES_128` are not supported on AArch64 NEON and SVE architectures with 128-bit max vector size. This occurs because the compiler would need to generate a vector with 2 short elements, resulting in a 32-bit vector size. > > To intrinsify such type conversion APIs, we need to relax the min vector length constraint from 64-bit to 32-bit for short vectors. > > ### Impact Analysis > #### 1. 
Vector types
> Vectors only with `short` element types will be affected, as we just supported 32-bit `short` vectors in this change.
>
> #### 2. Vector API
> No impact on Vector API or the vector-specific nodes. The minimum vector shape at API level remains 64-bit. It's not possible to generate a final vector IR with 32-bit vector size. Type conversions may generate intermediate 32-bit vectors, but they will be resized or cast to vectors with at least 64-bit length.
>
> #### 3. Auto-vectorization
> Enables vectorization of cases containing only 2 `short` lanes, with significant performance improvements. Since we have supported 32-bit vectors for `byte` type for a long time, extending this to `short` did not introduce additional risks.
>
> #### 4. Codegen of vector nodes
> NEON doesn't support 32-bit SIMD instructions, so we use 64-bit instructions instead. For lanewise operations, this is safe because the higher half bits can be ignored.
>
> Details:
> - Lanewise vector operations are unaffected as explained above.
> - NEON supports vector load/store instructions with 32-bit vector size, which we already use in relevant IRs (shared by SVE).
> - Cross-lane operations like reduction may be affected, potentially causing incorrect results for `min/max/mul/and` reductions. The min vector size for such operations should remain 64-bit. We've added assertions in match rules. Since it's currently not possible to generate such reductions (Vector API minimum is 64-bit, and SLP doesn't support subword type reductions), we maintain the status quo. However, adding an explicit vector size check in `match_rule_s...

Hi @theRealAph , I've updated the patch by fixing the comment issues. Could you please take a look at it again? Thanks a lot!
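The 32-bit figure in the quoted description follows from simple lane arithmetic, sketched below. The helper names are invented for illustration; the only inputs taken from the description are the 128-bit destination shape and the `short` and `long` element sizes.

```java
final class VectorShapeMath {
    // Number of lanes in a vector of the given total size.
    static int laneCount(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    // Size of the narrower source vector that holds the same number of lanes
    // as the destination, e.g. for a short -> long lane-wise conversion.
    static int sourceVectorBits(int dstVectorBits, int dstElementBits, int srcElementBits) {
        return laneCount(dstVectorBits, dstElementBits) * srcElementBits;
    }
}
```

A 128-bit `long` vector has 2 lanes, so the matching `short` vector is 2 x 16 = 32 bits, below the previous 64-bit minimum, which is why the conversion could not be intrinsified before the change.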
------------- PR Comment: https://git.openjdk.org/jdk/pull/26057#issuecomment-3026147575 From thartmann at openjdk.org Wed Jul 2 05:22:38 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Jul 2025 05:22:38 GMT Subject: RFR: 8361180: Disable CompiledDirectCall verification with -VerifyInlineCaches In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 10:47:40 GMT, Aleksey Shipilev wrote: > Missed the spot when doing [JDK-8360867](https://bugs.openjdk.org/browse/JDK-8360867). There is a path from GC that calls into IC verification when cleaning the caches. See `nmethod::cleanup_inline_caches_impl`. It does verification per callsite, and does the whole thing during parallel GC cleanup, which is STW at least in G1. This gets expensive for CTW scenarios. We should wrap that under the same flag introduced by [JDK-8360867](https://bugs.openjdk.org/browse/JDK-8360867). > > Motivational improvements: > > > $ time CONF=linux-x86_64-server-fastdebug make test TEST=applications/ctw/modules/ > > # Current mainline > real 3m59.274s > user 68m9.663s > sys 5m19.026s > > # This PR > real 3m49.118s > user 65m37.962s > sys 5m15.441s Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26063#pullrequestreview-2977769711 From thartmann at openjdk.org Wed Jul 2 05:36:24 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Jul 2025 05:36:24 GMT Subject: [jdk25] RFR: 8358179: Performance regression in Math.cbrt Message-ID: Hi all, This pull request contains a backport of commit [38f59f84](https://github.com/openjdk/jdk/commit/38f59f84c98dfd974eec0c05541b2138b149def7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Mohamed Issa on 1 Jul 2025 and was reviewed by Sandhya Viswanathan, Srinivas Vamsi Parasa and Emanuel Peter. Thanks! 
------------- Commit messages: - Backport 38f59f84c98dfd974eec0c05541b2138b149def7 Changes: https://git.openjdk.org/jdk/pull/26085/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26085&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358179 Stats: 50 lines in 1 file changed: 11 ins; 36 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26085.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26085/head:pull/26085 PR: https://git.openjdk.org/jdk/pull/26085 From shade at openjdk.org Wed Jul 2 05:40:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Jul 2025 05:40:42 GMT Subject: RFR: 8361180: Disable CompiledDirectCall verification with -VerifyInlineCaches In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 10:47:40 GMT, Aleksey Shipilev wrote: > Missed the spot when doing [JDK-8360867](https://bugs.openjdk.org/browse/JDK-8360867). There is a path from GC that calls into IC verification when cleaning the caches. See `nmethod::cleanup_inline_caches_impl`. It does verification per callsite, and does the whole thing during parallel GC cleanup, which is STW at least in G1. This gets expensive for CTW scenarios. We should wrap that under the same flag introduced by [JDK-8360867](https://bugs.openjdk.org/browse/JDK-8360867). > > Motivational improvements: > > > $ time CONF=linux-x86_64-server-fastdebug make test TEST=applications/ctw/modules/ > > # Current mainline > real 3m59.274s > user 68m9.663s > sys 5m19.026s > > # This PR > real 3m49.118s > user 65m37.962s > sys 5m15.441s Thanks! Here goes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26063#issuecomment-3026519823 From shade at openjdk.org Wed Jul 2 05:40:43 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Jul 2025 05:40:43 GMT Subject: Integrated: 8361180: Disable CompiledDirectCall verification with -VerifyInlineCaches In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 10:47:40 GMT, Aleksey Shipilev wrote: > Missed the spot when doing [JDK-8360867](https://bugs.openjdk.org/browse/JDK-8360867). There is a path from GC that calls into IC verification when cleaning the caches. See `nmethod::cleanup_inline_caches_impl`. It does verification per callsite, and does the whole thing during parallel GC cleanup, which is STW at least in G1. This gets expensive for CTW scenarios. We should wrap that under the same flag introduced by [JDK-8360867](https://bugs.openjdk.org/browse/JDK-8360867). > > Motivational improvements: > > > $ time CONF=linux-x86_64-server-fastdebug make test TEST=applications/ctw/modules/ > > # Current mainline > real 3m59.274s > user 68m9.663s > sys 5m19.026s > > # This PR > real 3m49.118s > user 65m37.962s > sys 5m15.441s This pull request has now been integrated. Changeset: 1ac74898 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/1ac74898745ce9b109db5571d9dcbd907dd05831 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8361180: Disable CompiledDirectCall verification with -VerifyInlineCaches Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/26063 From yongheng_hgq at 126.com Wed Jul 2 05:49:19 2025 From: yongheng_hgq at 126.com (h) Date: Wed, 2 Jul 2025 13:49:19 +0800 (CST) Subject: =?GBK?Q?RFR:_8358568=A3=BAC2_compilation_hits_"must_have_a_mon?= =?GBK?Q?itor"_assert_with_-XX:-GenerateSynchronizationCode?= Message-ID: <5f3eb53a.5267.197c9aeb416.Coremail.yongheng_hgq@126.com> Hi all, Please review this fix for JDK-8358568. 
It addresses a crash caused by accessing monitor info when -XX:-GenerateSynchronizationCode is set. The fix adds a guard in Parse::do_monitor_exit() to avoid the crash. Thank you in advance.

Changes: https://github.com/openjdk/jdk8u-dev/pull/664/files
webrev: https://openjdk.github.io/cr/?repo=jdk8u-dev&pr=664&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8358568
Patch: https://git.openjdk.org/jdk8u-dev/pull/664.diff
PR: https://github.com/openjdk/jdk8u-dev/pull/664

BR

From haosun at openjdk.org Wed Jul 2 06:45:46 2025
From: haosun at openjdk.org (Hao Sun)
Date: Wed, 2 Jul 2025 06:45:46 GMT
Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v3]
In-Reply-To:
References:
Message-ID: <8L1C1JR9H-GIASZlUG7Gk5Jf9rjVEVuBn-Sf9r8STYA=.843085aa-efb3-436e-acb3-ab4d1f52a9d8@github.com>

On Wed, 18 Jun 2025 12:12:16 GMT, Mikhail Ablakatov wrote:

>> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2002:
>>
>>> 2000: assert(vector_length_in_bytes == 8 || vector_length_in_bytes == 16, "unsupported");
>>> 2001: assert_different_registers(vtmp1, vsrc);
>>> 2002: assert_different_registers(vtmp1, vtmp2);
>>
>> nit: would be neat to use
>> Suggestion:
>>
>> assert_different_registers(vsrc, vtmp1, vtmp2);
>
> `vsrc` and `vtmp2` are allowed to match.

I see your point. IIUC, we should not modify `vsrc` as it's the source operand. If we allow `vsrc` and `vtmp2` to match, then `vsrc` gets modified.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2179185158

From haosun at openjdk.org Wed Jul 2 06:45:48 2025
From: haosun at openjdk.org (Hao Sun)
Date: Wed, 2 Jul 2025 06:45:48 GMT
Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v5]
In-Reply-To:
References:
Message-ID: <2zMCHzKXQ1kBfjcU5Fc8s6fa2W6TTCKpSSjhB0dMdLw=.3c43071b-3982-4e0e-a300-e0547f4fbbec@github.com>

On Tue, 1 Jul 2025 15:48:00 GMT, Mikhail Ablakatov wrote:

>> Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used.
>>
>> Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is used directly still.
>>
>> The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reduction micro-benchmarks.
>>
>> Benchmarks results:
>>
>> Neoverse-V1 (SVE 256-bit)
>>
>> Benchmark                 (size)   Mode    master         PR   Units
>> ByteMaxVector.MULLanes      1024  thrpt  5447.643  11455.535  ops/ms
>> ShortMaxVector.MULLanes     1024  thrpt  3388.183   7144.301  ops/ms
>> IntMaxVector.MULLanes       1024  thrpt  3010.974   4911.485  ops/ms
>> LongMaxVector.MULLanes      1024  thrpt  1539.137   2562.835  ops/ms
>> FloatMaxVector.MULLanes     1024  thrpt  1355.551   4158.128  ops/ms
>> DoubleMaxVector.MULLanes    1024  thrpt  1715.854   3284.189  ops/ms
>>
>>
>> Fujitsu A64FX (SVE 512-bit):
>>
>> Benchmark                 (size)   Mode    master         PR   Units
>> ByteMaxVector.MULLanes      1024  thrpt  1091.692   2887.798  ops/ms
>> ShortMaxVector.MULLanes     1024  thrpt   597.008   1863.338  ops/ms
>> IntMaxVector.MULLanes       1024  thrpt   510.642   1348.651  ops/ms
>> LongMaxVector.MULLanes      1024  thrpt   468.878    878.620  ops/ms
>> FloatMaxVector.MULLanes     1024  thrpt   376.284   2237.564  ops/ms
>> DoubleMaxVector.MULLanes    1024  thrpt   431.343   1646.792  ops/ms
>
> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision:
>
> fixup: remove undefined insts from aarch64-asmtest.py

test/hotspot/jtreg/compiler/loopopts/superword/TestVectorFPReduction.java line 2:

> 1: /*
> 2: * Copyright (c) 2025, Arm Limited. All rights reserved.

`XX, YY,` means this file was created in year XX and last updated in year YY. If `XX=YY`, then use `XX,`.
Suggestion:

* Copyright (c) 2024, 2025, Arm Limited. All rights reserved.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2178924210

From dfenacci at openjdk.org Wed Jul 2 07:05:40 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Wed, 2 Jul 2025 07:05:40 GMT
Subject: RFR: 8361144: Strenghten the Ideal Verification in PhaseIterGVN::verify_Ideal_for by comparing the hash of a node before and after Ideal
In-Reply-To: <9YpmCSNKHrTmq54eLusmkTHoEFFUTvm6OiqjdiGNFv0=.f8123888-bd26-42cc-938a-ec756a0da90d@github.com>
References: <9YpmCSNKHrTmq54eLusmkTHoEFFUTvm6OiqjdiGNFv0=.f8123888-bd26-42cc-938a-ec756a0da90d@github.com>
Message-ID:

On Tue, 1 Jul 2025 11:35:06 GMT, Benoît Maillard wrote:

> This PR adds a node hash comparison after calling `Ideal` in `PhaseIterGVN::verify_Ideal_for` to introduce an additional layer of verification for missed optimizations. Previously, we relied on the return value of `Ideal`, which is expected to be `nullptr` if no transformation was done.
>
> By also checking the node's hash before and after `Ideal`, we could catch inconsistencies in the implementation or unintended modifications to the graph. Both of these may indicate missed or incomplete optimizations.
>
> ### Testing
> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8361144)
> - [x] tier1-3, plus some internal testing
>
> Thank you for reviewing!

Thanks @benoitmaillard! Definitely an additional check worth doing. I left a couple of inline comments.

src/hotspot/share/opto/phaseX.cpp line 1821:

> 1819: // The number of nodes shoud not increase.
> 1820: uint old_unique = C->unique();
> 1821: uint old_hash = n->hash();

Just to be consistent with `old_unique` we could add a small comment (here or below for both). What do you think?

src/hotspot/share/opto/phaseX.cpp line 1838:

> 1836: stringStream ss; // Print as a block without tty lock.
> 1837: ss.cr(); > 1838: ss.print_cr("Ideal optimization did not make progress but hash node changed."); Suggestion: ss.print_cr("Ideal optimization did not make progress but node hash changed."); ------------- PR Review: https://git.openjdk.org/jdk/pull/26064#pullrequestreview-2977964471 PR Review Comment: https://git.openjdk.org/jdk/pull/26064#discussion_r2179270798 PR Review Comment: https://git.openjdk.org/jdk/pull/26064#discussion_r2179279429 From bmaillard at openjdk.org Wed Jul 2 07:19:30 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 2 Jul 2025 07:19:30 GMT Subject: RFR: 8359602: Ideal optimizations depending on input type are missed because of missing notification mechanism from CCP [v5] In-Reply-To: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> Message-ID: > This PR prevents some missed ideal optimizations in IGVN by notifying users of type refinements made during CCP, addressing a missed optimization that caused a verification failure with `-XX:VerifyIterativeGVN=1110`. > > ### Context > During the compilation of the input program (obtained from the fuzzer, then simplified and added as a test) by C2, we end up with node `591 ModI` that takes `138 Phi` as its divisor input. An existing `Ideal` optimization is to get rid of the control input of a `ModINode` when we can prove that the divisor is never `0`. > > In this specific case, the type of the `PhiNode` gets refined during CCP, but the refinement fails to propagate to its users for the IGVN phase and the ideal optimization for the `ModINode` never happens. This results in a missed optimization and hits an assert in the verification phase of IGVN (when using `-XX:VerifyIterativeGVN=1110`). 
> > ![IGV screenshot](https://github.com/user-attachments/assets/5dee1ae6-9146-4115-922d-df33b7ccbd37) > > ### Detailed Analysis > > In `PhaseCCP::analyze`, we call `Value` for the `PhiNode`, which > results in a type refinement: the range gets restricted to `int:-13957..-1191`. > > ```c++ > // Pull from worklist; compute new value; push changes out. > // This loop is the meat of CCP. > while (worklist.size() != 0) { > Node* n = fetch_next_node(worklist); > DEBUG_ONLY(worklist_verify.push(n);) > if (n->is_SafePoint()) { > // Make sure safepoints are processed by PhaseCCP::transform even if they are > // not reachable from the bottom. Otherwise, infinite loops would be removed. > _root_and_safepoints.push(n); > } > const Type* new_type = n->Value(this); > if (new_type != type(n)) { > DEBUG_ONLY(verify_type(n, new_type, type(n));) > dump_type_and_node(n, new_type); > set_type(n, new_type); > push_child_nodes_to_worklist(worklist, n); > } > if (KillPathsReachableByDeadTypeNode && n->is_Type() && new_type == Type::TOP) { > // Keep track of Type nodes to kill CFG paths that use Type > // nodes that become dead. > _maybe_top_type_nodes.push(n); > } > } > DEBUG_ONLY(verify_analyze(worklist_verify);) > > > At the end of `PhaseCCP::analyze`, we obtain the following types in the side table: > - `int` for node `591` (`ModINode`) > - `int:-13957..-1191` for node `138` (`PhiNode`) > > If we call `find_node(138)->bottom_type()`, we get: > - `int` for both nodes > > There is no progress on the type of `ModINode` during CCP, because `ModINode::Value` > is not able to... Beno?t Maillard has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision:

 - Fix bad test class name
 - 8359602: rename test
 - 8359602: remove requires.debug=true and add -XX:+IgnoreUnrecognizedVMOptions flag
 - 8359602: add comment
 - 8359602: add test summary and comments
 - 8359602: tag requires vm.debug == true
 - 8359602: Add test from fuzzer
 - 8359602: Add users to IGVN worklist when type is refined in CCP

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/26017/files
 - new: https://git.openjdk.org/jdk/pull/26017/files/005b2825..a66d3fb4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26017&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26017&range=03-04

Stats: 18268 lines in 747 files changed: 7677 ins; 6510 del; 4081 mod
Patch: https://git.openjdk.org/jdk/pull/26017.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26017/head:pull/26017

PR: https://git.openjdk.org/jdk/pull/26017

From thartmann at openjdk.org Wed Jul 2 07:19:31 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 2 Jul 2025 07:19:31 GMT
Subject: RFR: 8359602: Ideal optimizations depending on input type are missed because of missing notification mechanism from CCP [v4]
In-Reply-To:
References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com>
Message-ID:

On Tue, 1 Jul 2025 12:58:29 GMT, Benoît Maillard wrote:

>> This PR prevents some missed ideal optimizations in IGVN by notifying users of type refinements made during CCP, addressing a missed optimization that caused a verification failure with `-XX:VerifyIterativeGVN=1110`.
>>
>> ### Context
>> During the compilation of the input program (obtained from the fuzzer, then simplified and added as a test) by C2, we end up with node `591 ModI` that takes `138 Phi` as its divisor input. An existing `Ideal` optimization is to get rid of the control input of a `ModINode` when we can prove that the divisor is never `0`.
>> >> In this specific case, the type of the `PhiNode` gets refined during CCP, but the refinement fails to propagate to its users for the IGVN phase and the ideal optimization for the `ModINode` never happens. This results in a missed optimization and hits an assert in the verification phase of IGVN (when using `-XX:VerifyIterativeGVN=1110`). >> >> ![IGV screenshot](https://github.com/user-attachments/assets/5dee1ae6-9146-4115-922d-df33b7ccbd37) >> >> ### Detailed Analysis >> >> In `PhaseCCP::analyze`, we call `Value` for the `PhiNode`, which >> results in a type refinement: the range gets restricted to `int:-13957..-1191`. >> >> ```c++ >> // Pull from worklist; compute new value; push changes out. >> // This loop is the meat of CCP. >> while (worklist.size() != 0) { >> Node* n = fetch_next_node(worklist); >> DEBUG_ONLY(worklist_verify.push(n);) >> if (n->is_SafePoint()) { >> // Make sure safepoints are processed by PhaseCCP::transform even if they are >> // not reachable from the bottom. Otherwise, infinite loops would be removed. >> _root_and_safepoints.push(n); >> } >> const Type* new_type = n->Value(this); >> if (new_type != type(n)) { >> DEBUG_ONLY(verify_type(n, new_type, type(n));) >> dump_type_and_node(n, new_type); >> set_type(n, new_type); >> push_child_nodes_to_worklist(worklist, n); >> } >> if (KillPathsReachableByDeadTypeNode && n->is_Type() && new_type == Type::TOP) { >> // Keep track of Type nodes to kill CFG paths that use Type >> // nodes that become dead. >> _maybe_top_type_nodes.push(n); >> } >> } >> DEBUG_ONLY(verify_analyze(worklist_verify);) >> >> >> At the end of `PhaseCCP::analyze`, we obtain the following types in the side table: >> - `int` for node `591` (`ModINode`) >> - `int:-13957..-1191` for node `138` (`PhiNode`) >> >> If we call `find_node(138)->bottom_type()`, we get: >> - `int` for both nodes >> >> The... 
>
> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:
>
>   8359602: update case for consistency
>
>   Co-authored-by: Emanuel Peter

Still good, thanks for making these changes.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26017#pullrequestreview-2978014920

From thartmann at openjdk.org Wed Jul 2 07:20:41 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 2 Jul 2025 07:20:41 GMT
Subject: RFR: 8360557: CTW: Inline cold methods to reach more code
In-Reply-To:
References:
Message-ID: <-7cfzVghCWnUCfB1F3dcyG2fvJGnqREUW98qiVJEvQQ=.db06fb1e-e96e-4e00-bac0-098b4e1de54c@github.com>

On Tue, 1 Jul 2025 12:26:44 GMT, Aleksey Shipilev wrote:

> We use CTW testing for making sure compilers behave well. But we compile the code that is not executed at all, and since our inlining heuristics often look back at profiles, we end up not actually inlining all too much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations.
>
> There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data.
>
> After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much, they are impractical to run in standard configurations, see data in RFE. We will enable some of that testing in special testing pipelines.
>
> Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?"
The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allow stress infra to inline some more on top of that. > > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) I submitted some testing to make sure that CTW is clean in our CI. src/hotspot/share/compiler/compiler_globals.hpp line 400: > 398: product(bool, InlineColdMethods, false, DIAGNOSTIC, \ > 399: "Inline methods cold methods that would otherwise rejected " \ > 400: "based on profile information. Only useful for compiler testing.")\ Suggestion: "Inline cold methods that would otherwise be rejected based" \ "on profile information. 
Only useful for compiler testing.") \ ------------- PR Comment: https://git.openjdk.org/jdk/pull/26068#issuecomment-3026732006 PR Review Comment: https://git.openjdk.org/jdk/pull/26068#discussion_r2179310625 From eastigeevich at openjdk.org Wed Jul 2 07:40:40 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 2 Jul 2025 07:40:40 GMT Subject: RFR: 8360936: Test compiler/onSpinWait/TestOnSpinWaitAArch64.java fails after JDK-8359435 [v2] In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 16:05:07 GMT, Evgeny Astigeevich wrote: >> Test compiler/onSpinWait/TestOnSpinWaitAArch64.java needs debug info to identify a position of spin wait instructions in generated code. Release builds might not generate needed debug info. >> >> This PR adds a requirement for the test to be run on debug builds only. >> >> Tested: >> - Fastdebug: test passed >> - Slowdebug: test passed. >> - Release: test skipped. > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Simplify requirement for debug build I finished bisecting. 
These are my changes to the test that made it fail:

    @@ -56,7 +58,6 @@ public static void main(String[] args) throws Exception {
             command.add("-showversion");
             command.add("-XX:-BackgroundCompilation");
             command.add("-XX:+UnlockDiagnosticVMOptions");
    -        command.add("-XX:+PrintAssembly");
             if (compiler.equals("c2")) {
                 command.add("-XX:-TieredCompilation");
             } else if (compiler.equals("c1")) {
    @@ -69,13 +70,17 @@ public static void main(String[] args) throws Exception {
             command.add("-XX:OnSpinWaitInst=" + spinWaitInst);
             command.add("-XX:OnSpinWaitInstCount=" + spinWaitInstCount);
             command.add("-XX:CompileCommand=compileonly," + Launcher.class.getName() + "::" + "test");
    +        command.add("-XX:CompileCommand=print," + Launcher.class.getName() + "::" + "test");
             command.add(Launcher.class.getName());

It looks like `XX:+PrintAssembly` prints out debug info in release builds but `XX:CompileCommand=print` does not. I am switching back to `XX:+PrintAssembly`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26072#issuecomment-3026790161

From thartmann at openjdk.org Wed Jul 2 07:43:47 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 2 Jul 2025 07:43:47 GMT
Subject: RFR: 8359602: Ideal optimizations depending on input type are missed because of missing notification mechanism from CCP [v4]
In-Reply-To:
References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com>
Message-ID: <6XXAMA5_Jq8NxpK0TOTAJWkYhDXIo4Wrnz_0X32SkqQ=.b9e29a9c-fe36-4c04-88bc-d276a66fd711@github.com>

On Wed, 2 Jul 2025 07:14:34 GMT, Tobias Hartmann wrote:

>> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   8359602: update case for consistency
>>
>>   Co-authored-by: Emanuel Peter
>
> Still good, thanks for making these changes.

> @TobiHartmann how much should he invest in this now? An alternative is just tackling all the other cases later. What do you think?

Yes, agreed.
Let's handle this later. (Sorry, somehow I thought I had replied to this already - must have missed pressing the Comment button..) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26017#issuecomment-3026800468 From thartmann at openjdk.org Wed Jul 2 07:50:50 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Jul 2025 07:50:50 GMT Subject: RFR: 8359120: Improve warning message when fail to load hsdis library [v2] In-Reply-To: <-i6UPk-bhy9RnnCus_JbJ1nQ63nMX9djubON9WBbHQ8=.a2305566-563e-4171-b526-bcd645de51a3@github.com> References: <-i6UPk-bhy9RnnCus_JbJ1nQ63nMX9djubON9WBbHQ8=.a2305566-563e-4171-b526-bcd645de51a3@github.com> Message-ID: On Mon, 23 Jun 2025 08:56:12 GMT, Taizo Kurashige wrote: >> This PR is improvement of warning message when fail to load hsdis library. >> >> [JDK-8287001](https://bugs.openjdk.org/browse/JDK-8287001) introduced a warning on hsdis library load failure. This is useful when the user executes -XX:+PrintAssembly, etc. >> >> However, I think that when hs_err occurs, users might be confused by this warning printed by Xlog. Because users are not likely to know that hsdis is loaded for the [MachCode] section of the hs_err report, they may wonder, for example, "Why do I get warnings about hsdis load errors when -XX:+PrintAssembly is not specified?." >> >> To clear up this confusion, I suggest printing a warning just before [MachCode]. >> >>
>> >> sample output >> >> If hs_err occurs and hsdis load fails without the option to specify where the hs_err report should be output, the following is output to the hs_err_pir log file: >> >> . >> . >> native method entry point (kind = native) [0x000001ae8753cec0, 0x000001ae8753dac0] 3072 bytes >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> 0x000001ae8753cec0: 488b 4b08 | 0fb7 492e | 584c 8d74 | ccf8 6800 | 0000 0068 | 0000 0000 | 5055 488b | ec41 5548 >> 0x000001ae8753cee0: 8b43 084c | 8d68 3848 | 8b40 0868 | 0000 0000 | 5348 8b50 | 18 >> . >> . >> >> >> If -XX:+PrintAssembly is specified and hsdis load fails, the following is output to the stdout >> >> $ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -version >> OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output >> >> ============================= C1-compiled nmethod ============================== >> ----------------------------------- Assembly ----------------------------------- >> >> Compiled method (c1) 57 2 3 java.lang.Object:: (1 bytes) >> total in heap [0x0000024a08a00008,0x0000024a08a00208] = 512 >> . >> . >> >> [Constant Pool (empty)] >> >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> [Instructions begin] >> 0x0000024a08a00100: 6666 660f | 1f84 0000 | 0000 0066 | 6666 9066 | 6690 448b | 5208 443b >> . >> . >> [Constant Pool (empty)] >> >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> [Verified Entry Point] >> # {method} {0x00000000251a1898} 'toUnsignedInt' '(B)I' in 'java/lang/Byte >> . >> . >> >> >>
>> >> Since... > > Taizo Kurashige has updated the pull request incrementally with one additional commit since the last revision: > > Fix message and revert lines for Xlog Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25726#pullrequestreview-2978107826 From thartmann at openjdk.org Wed Jul 2 07:50:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Jul 2025 07:50:52 GMT Subject: RFR: 8359120: Improve warning message when fail to load hsdis library [v2] In-Reply-To: References: <-i6UPk-bhy9RnnCus_JbJ1nQ63nMX9djubON9WBbHQ8=.a2305566-563e-4171-b526-bcd645de51a3@github.com> <7UkSbnceEz4PY3UDwyR9iOseuvS4sD8FBBGl96mG_lk=.e94b4126-9df5-406b-a3f3-b21439d848e6@github.com> Message-ID: On Mon, 30 Jun 2025 11:08:45 GMT, Taizo Kurashige wrote: > but since nullptr is passed at [src/hotspot/share/compiler/disassembler.hpp#L66](https://github.com/openjdk/jdk/blob/c2d76f9844aadf77a0b213a9169a7c5c8c8f1ffb/src/hotspot/share/compiler/disassembler.hpp#L66), that reporting doesn't actually work. Right, it will be set to `tty` when Verbose is true: https://github.com/openjdk/jdk/blob/c2d76f9844aadf77a0b213a9169a7c5c8c8f1ffb/src/hotspot/share/compiler/disassembler.cpp#L780 Thanks for the additional details of why you decided to not use that code. I'm fine with these changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25726#issuecomment-3026818978 From aph at openjdk.org Wed Jul 2 08:05:44 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 2 Jul 2025 08:05:44 GMT Subject: RFR: 8360936: Test compiler/onSpinWait/TestOnSpinWaitAArch64.java fails after JDK-8359435 [v2] In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 07:37:40 GMT, Evgeny Astigeevich wrote: > > It looks like `XX:+PrintAssembly` prints out debug info in release builds but `XX:CompileCommand=print` does not. I am switching back to `XX:+PrintAssembly`. That's not great. What info do you need, exactly? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26072#issuecomment-3026870108 From bkilambi at openjdk.org Wed Jul 2 08:10:24 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 2 Jul 2025 08:10:24 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v9] In-Reply-To: References: Message-ID: > This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. > > It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. > > For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. > > For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. > > This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. 
>
> Performance numbers with this patch on a 128-bit, SVE2 supporting machine are shown below -
>
>
> Benchmark                                   (size)   Mode  Cnt   Gain
> SelectFromBenchmark.selectFromByteVector      1024  thrpt    9   1.43
> SelectFromBenchmark.selectFromByteVector      2048  thrpt    9   1.48
> SelectFromBenchmark.selectFromDoubleVector    1024  thrpt    9  68.55
> SelectFromBenchmark.selectFromDoubleVector    2048  thrpt    9  72.07
> SelectFromBenchmark.selectFromFloatVector     1024  thrpt    9   1.69
> SelectFromBenchmark.selectFromFloatVector     2048  thrpt    9   1.52
> SelectFromBenchmark.selectFromIntVector       1024  thrpt    9   1.50
> SelectFromBenchmark.selectFromIntVector       2048  thrpt    9   1.52
> SelectFromBenchmark.selectFromLongVector      1024  thrpt    9  85.38
> SelectFromBenchmark.selectFromLongVector      2048  thrpt    9  80.93
> SelectFromBenchmark.selectFromShortVector     1024  thrpt    9   1.48
> SelectFromBenchmark.selectFromShortVector     2048  thrpt    9   1.49
>
>
> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander.

Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits:

 - Merge master
 - code style issues fixed
 - Addressed review comments
 - Addressed review comments
 - Revert a small change in c2_MacroAssembler.hpp
 - Addressed review comments
 - Addressed review comments and added a JTREG test
 - Merge master
 - 8348868: AArch64: Add backend support for SelectFromTwoVector

   This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI.

   It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2.

   For 64-bit vector length : Neon tbl instruction is generated for T_SHORT and T_BYTE types only.

   For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2.
   For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation.

   This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor.

   Performance numbers with this patch on a 128-bit, SVE2 supporting machine are shown below -

   Benchmark                                   (size)   Mode  Cnt   Gain
   SelectFromBenchmark.selectFromByteVector      1024  thrpt    9   1.43
   SelectFromBenchmark.selectFromByteVector      2048  thrpt    9   1.48
   SelectFromBenchmark.selectFromDoubleVector    1024  thrpt    9  68.55
   SelectFromBenchmark.selectFromDoubleVector    2048  thrpt    9  72.07
   SelectFromBenchmark.selectFromFloatVector     1024  thrpt    9   1.69
   SelectFromBenchmark.selectFromFloatVector     2048  thrpt    9   1.52
   SelectFromBenchmark.selectFromIntVector       1024  thrpt    9   1.50
   SelectFromBenchmark.selectFromIntVector       2048  thrpt    9   1.52
   SelectFromBenchmark.selectFromLongVector      1024  thrpt    9  85.38
   SelectFromBenchmark.selectFromLongVector      2048  thrpt    9  80.93
   SelectFromBenchmark.selectFromShortVector     1024  thrpt    9   1.48
   SelectFromBenchmark.selectFromShortVector     2048  thrpt    9   1.49

   Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander.
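
For readers less familiar with the operation, here is a minimal scalar sketch in Java of what the commit message describes: a direct two-table lookup next to the lowered fallback of two rearranges plus one blend. It assumes the selection rule is "wrap each index into [0, 2*VLENGTH), take the lane from the first vector if the wrapped index is below VLENGTH, otherwise from the second". The class and method names (`SelectFromTwoVectorModel`, `selectFromTwo`, `rearrange`, `lowered`) are hypothetical, plain `int[]` arrays stand in for vectors, and this is not the HotSpot or Vector API implementation.

```java
import java.util.Arrays;

// Scalar model only; names are illustrative and not taken from the JDK sources.
public class SelectFromTwoVectorModel {

    // Direct two-table lookup: the wrapped index i in [0, 2*n) picks v1[i] or v2[i - n].
    public static int[] selectFromTwo(int[] idx, int[] v1, int[] v2) {
        int n = v1.length;
        int[] out = new int[n];
        for (int lane = 0; lane < n; lane++) {
            int i = Math.floorMod(idx[lane], 2 * n);
            out[lane] = (i < n) ? v1[i] : v2[i - n];
        }
        return out;
    }

    // Single-table rearrange: out[lane] = src[idx[lane] wrapped into [0, n)].
    public static int[] rearrange(int[] src, int[] idx) {
        int n = src.length;
        int[] out = new int[n];
        for (int lane = 0; lane < n; lane++) {
            out[lane] = src[Math.floorMod(idx[lane], n)];
        }
        return out;
    }

    // Lowered form: rearrange both inputs, then blend per lane on "index >= n".
    public static int[] lowered(int[] idx, int[] v1, int[] v2) {
        int n = v1.length;
        int[] r1 = rearrange(v1, idx);
        int[] r2 = rearrange(v2, idx);
        int[] out = new int[n];
        for (int lane = 0; lane < n; lane++) {
            boolean fromSecond = Math.floorMod(idx[lane], 2 * n) >= n;
            out[lane] = fromSecond ? r2[lane] : r1[lane];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] idx = {0, 5, 2, 7};
        System.out.println(Arrays.toString(selectFromTwo(idx, v1, v2)));
        System.out.println(Arrays.toString(lowered(idx, v1, v2)));
    }
}
```

Both methods return the same lanes for any index vector in this model, which is why the lowered IR can act as a fallback; the gain reported above would then come from replacing three operations with a single `tbl`.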
------------- Changes: https://git.openjdk.org/jdk/pull/23570/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23570&range=08 Stats: 987 lines in 11 files changed: 952 ins; 0 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/23570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23570/head:pull/23570 PR: https://git.openjdk.org/jdk/pull/23570 From epeter at openjdk.org Wed Jul 2 08:11:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Jul 2025 08:11:44 GMT Subject: [jdk25] RFR: 8358179: Performance regression in Math.cbrt In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 05:30:39 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [38f59f84](https://github.com/openjdk/jdk/commit/38f59f84c98dfd974eec0c05541b2138b149def7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Mohamed Issa on 1 Jul 2025 and was reviewed by Sandhya Viswanathan, Srinivas Vamsi Parasa and Emanuel Peter. > > Thanks! LGTM Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26085#pullrequestreview-2978172863 PR Review: https://git.openjdk.org/jdk/pull/26085#pullrequestreview-2978173371 From aph at openjdk.org Wed Jul 2 08:18:44 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 2 Jul 2025 08:18:44 GMT Subject: RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors [v2] In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 02:39:33 GMT, Xiaohong Gong wrote: >> ### Background >> On AArch64, the minimum vector length supported is 64-bit for basic types, except for `byte` and `boolean` (32-bit and 16-bit respectively to match special Vector API features). This limitation prevents intrinsification of vector type conversions between `short` and wider types (e.g. 
`long/double`) in Vector API when the entire vector length is within 128 bits, resulting in degraded performance for such conversions. >> >> For example, type conversions between `ShortVector.SPECIES_128` and `LongVector.SPECIES_128` are not supported on AArch64 NEON and SVE architectures with 128-bit max vector size. This occurs because the compiler would need to generate a vector with 2 short elements, resulting in a 32-bit vector size. >> >> To intrinsify such type conversion APIs, we need to relax the min vector length constraint from 64-bit to 32-bit for short vectors. >> >> ### Impact Analysis >> #### 1. Vector types >> Vectors only with `short` element types will be affected, as we just supported 32-bit `short` vectors in this change. >> >> #### 2. Vector API >> No impact on Vector API or the vector-specific nodes. The minimum vector shape at API level remains 64-bit. It's not possible to generate a final vector IR with 32-bit vector size. Type conversions may generate intermediate 32-bit vectors, but they will be resized or cast to vectors with at least 64-bit length. >> >> #### 3. Auto-vectorization >> Enables vectorization of cases containing only 2 `short` lanes, with significant performance improvements. Since we have supported 32-bit vectors for `byte` type for a long time, extending this to `short` did not introduce additional risks. >> >> #### 4. Codegen of vector nodes >> NEON doesn't support 32-bit SIMD instructions, so we use 64-bit instructions instead. For lanewise operations, this is safe because the higher half bits can be ignored. >> >> Details: >> - Lanewise vector operations are unaffected as explained above. >> - NEON supports vector load/store instructions with 32-bit vector size, which we already use in relevant IRs (shared by SVE). >> - Cross-lane operations like reduction may be affected, potentially causing incorrect results for `min/max/mul/and` reductions. The min vector size for such operations should remain 64-bit. 
We've added assertions in match rules. Since it's currently not possible to generate such reductions (Vector API minimum is 64-bit, and SLP doesn't support subword type reductions), we maintain the status quo. However, addin... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Refine comments based on review suggestion src/hotspot/cpu/aarch64/aarch64.ad line 2367: > 2365: // Theoretically, the minimal vector length supported by AArch64 > 2366: // ISA and Vector API species is 64-bit. However, 32-bit or 16-bit > 2367: // vector length is also allowed for special Vector API usages. Suggestion: // Usually, the shortest vector length supported by AArch64 // ISA and Vector API species is 64 bits. However, we allow // 32-bit or 16-bit vectors in a few special cases. Reason for change: it wasn't clear what "supported" meant. Supported by the hardware, or by HotSpot. And why do we only support it in a few special cases? This comment raises more questions than it answers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26057#discussion_r2179423549 From thartmann at openjdk.org Wed Jul 2 08:25:45 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Jul 2025 08:25:45 GMT Subject: [jdk25] RFR: 8358179: Performance regression in Math.cbrt In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 05:30:39 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [38f59f84](https://github.com/openjdk/jdk/commit/38f59f84c98dfd974eec0c05541b2138b149def7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Mohamed Issa on 1 Jul 2025 and was reviewed by Sandhya Viswanathan, Srinivas Vamsi Parasa and Emanuel Peter. > > Thanks! Thanks for the review Emanuel! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26085#issuecomment-3026926520 From thartmann at openjdk.org Wed Jul 2 08:25:46 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Jul 2025 08:25:46 GMT Subject: [jdk25] Integrated: 8358179: Performance regression in Math.cbrt In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 05:30:39 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [38f59f84](https://github.com/openjdk/jdk/commit/38f59f84c98dfd974eec0c05541b2138b149def7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Mohamed Issa on 1 Jul 2025 and was reviewed by Sandhya Viswanathan, Srinivas Vamsi Parasa and Emanuel Peter. > > Thanks! This pull request has now been integrated. Changeset: 0a151c68 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/0a151c68d6529f3a1d3a44fbccc42b67a60b25d9 Stats: 50 lines in 1 file changed: 11 ins; 36 del; 3 mod 8358179: Performance regression in Math.cbrt Reviewed-by: epeter Backport-of: 38f59f84c98dfd974eec0c05541b2138b149def7 ------------- PR: https://git.openjdk.org/jdk/pull/26085 From bkilambi at openjdk.org Wed Jul 2 08:26:00 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 2 Jul 2025 08:26:00 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v10] In-Reply-To: References: Message-ID: > This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. > > It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. > > For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. > > For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. 
For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation.
>
> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor.
>
> Performance numbers with this patch on a 128-bit, SVE2 supporting machine are shown below -
>
>
> Benchmark                                   (size)   Mode  Cnt   Gain
> SelectFromBenchmark.selectFromByteVector      1024  thrpt    9   1.43
> SelectFromBenchmark.selectFromByteVector      2048  thrpt    9   1.48
> SelectFromBenchmark.selectFromDoubleVector    1024  thrpt    9  68.55
> SelectFromBenchmark.selectFromDoubleVector    2048  thrpt    9  72.07
> SelectFromBenchmark.selectFromFloatVector     1024  thrpt    9   1.69
> SelectFromBenchmark.selectFromFloatVector     2048  thrpt    9   1.52
> SelectFromBenchmark.selectFromIntVector       1024  thrpt    9   1.50
> SelectFromBenchmark.selectFromIntVector       2048  thrpt    9   1.52
> SelectFromBenchmark.selectFromLongVector      1024  thrpt    9  85.38
> SelectFromBenchmark.selectFromLongVector      2048  thrpt    9  80.93
> SelectFromBenchmark.selectFromShortVector     1024  thrpt    9   1.48
> SelectFromBenchmark.selectFromShortVector     2048  thrpt    9   1.49
>
>
> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander.
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Addressed review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23570/files - new: https://git.openjdk.org/jdk/pull/23570/files/80a1f67f..e86d55df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23570&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23570&range=08-09 Stats: 36 lines in 6 files changed: 13 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/23570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23570/head:pull/23570 PR: https://git.openjdk.org/jdk/pull/23570 From bkilambi at openjdk.org Wed Jul 2 08:26:01 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 2 Jul 2025 08:26:01 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v8] In-Reply-To: References: Message-ID: On Fri, 27 Jun 2025 15:21:28 GMT, Andrew Haley wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> code style issues fixed > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 4231: > >> 4229: >> 4230: // SVE/SVE2 Programmable table lookup in one or two vector table (zeroing) >> 4231: void sve_tbl(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn, unsigned reg_count, FloatRegister Zm) { > > [Edited] > > This would be better: > > private: > void _sve_tbl(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn, unsigned reg_count, FloatRegister Zm) { > > > ... then 2 patterns ... > > > public: > void sve_tbl(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn1, FloatRegister Zn2, FloatRegister Zm); > void sve_tbl(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn, FloatRegister Zm); > > > ... and make sure that `Zn1+ 1 == Zn2` Done. Please review the latest patch. Thanks! 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2179438846

From epeter at openjdk.org Wed Jul 2 08:26:46 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 2 Jul 2025 08:26:46 GMT
Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API [v2]
In-Reply-To: <7GrGfBF_v8F0v02sRHC78ofMZwpMdzQZaHeYlNvi_N0=.93defb9e-ca9b-41b4-8722-1746692e2316@github.com>
References: <7GrGfBF_v8F0v02sRHC78ofMZwpMdzQZaHeYlNvi_N0=.93defb9e-ca9b-41b4-8722-1746692e2316@github.com>
Message-ID:

On Wed, 2 Jul 2025 01:52:19 GMT, Xiaohong Gong wrote:

>> Agree with Paul, these are minor regressions. Let us proceed with this patch.
>
>> Agree with Paul, these are minor regressions. Let us proceed with this patch.
>
> Thanks so much for your review @sviswa7 !

@XiaohongGong I quickly scanned the patch, it looks good to me too. I'm submitting some internal testing now, to make sure our extended testing does not break on integration. Should take about 24h.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-3026931008

From shade at openjdk.org Wed Jul 2 08:27:24 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 2 Jul 2025 08:27:24 GMT
Subject: RFR: 8360557: CTW: Inline cold methods to reach more code [v2]
In-Reply-To:
References:
Message-ID: <5znMFGgSuss2iAJ3cUBnmIKrfniGHx5W6CpY3TpNO_8=.0148fb6b-206a-4b57-8886-db80d606b18f@github.com>

> We use CTW testing for making sure compilers behave well. But we compile the code that is not executed at all, and since our inlining heuristics often look back at profiles, we end up not actually inlining all too much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations.
>
> There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse.
With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data. > > After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much, they are impractical to run in standard configurations, see data in RFE. We will enable some of that testing in special testing pipelines. > > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allow stress infra to inline some more on top of that. 
> > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/compiler/compiler_globals.hpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26068/files - new: https://git.openjdk.org/jdk/pull/26068/files/b16cbabb..dedbcfed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068 PR: https://git.openjdk.org/jdk/pull/26068 From mhaessig at openjdk.org Wed Jul 2 08:35:51 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 2 Jul 2025 08:35:51 GMT Subject: RFR: 8360116: Add support for AVX10 floating point minmax instruction [v6] In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 01:57:46 GMT, Jatin Bhateja wrote: >> Intel@ AVX10 ISA [1] extensions added new floating point MIN/MAX instructions which comply with definitions in IEEE-754-2019 standard section 9.6 and can directly emulate Math.min/max semantics without the need for any special handling for NaN, +0.0 or -0.0 detection. >> >> **The following pseudo-code describes the existing algorithm for min/max[FD]:** >> >> Move the non-negative value to the second operand; this will ensure that we correctly handle 0.0 and -0.0 values, if values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. Existing MINPS and MAXPS semantics only check for NaN as the second operand; hence, we need special handling to check for NaN at the first operand. >> >> btmp = (b < +0.0) ? 
a : b >> atmp = (b < +0.0) ? b : a >> Tmp = Max_Float(atmp , btmp) >> Res = (atmp == NaN) ? atmp : Tmp >> >> For min[FD] we need a small tweak in the above algorithm, i.e., move the non-negative value to the first operand, this will ensure that we correctly select -0.0 if both the operands being compared are 0.0 or -0.0. >> >> btmp = (b < +0.0) ? b : a >> atmp = (b < +0.0) ? a : b >> Tmp = Min_Float(atmp , btmp) >> Res = (atmp == NaN) ? atmp : Tmp >> >> Thus, we need additional special handling for NaNs and +/-0.0 to compute floating-point min/max values to comply with the semantics of Math.max/min APIs using existing MINPS / MAXPS instructions. AVX10.2 added a new instruction, VMINMAX[SH,SS,SD]/[PH,PS,PD], which comprehensively handles special cases, thereby eliminating the need for special handling. >> >> Patch emits new instructions for reduction and non-reduction operations for single, double, and Float16 types. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10 > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Sandhya's review comments resolution Marked as reviewed by mhaessig (Committer).
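
For readers following the quoted pseudo-code, here is a minimal Java sketch of the pre-AVX10.2 emulation it describes. This is not the HotSpot implementation. The class and method names are hypothetical, `maxss`/`minss` only model the documented "return the second operand on NaN or equal zeros" behaviour of the x86 scalar instructions, the `b < +0.0` test is assumed to be a sign-bit check (so that -0.0 counts as negative), and the min variant is assumed to use a MINSS-style primitive.

```java
/** Sketch only: emulates Math.max/min on top of MAXSS/MINSS-like primitives. */
final class MinMaxEmulation {
    /** Models MAXSS dst, src: src is returned unless dst > src (NaN or equal zeros yield src). */
    static float maxss(float dst, float src) { return dst > src ? dst : src; }

    /** Models MINSS dst, src: src is returned unless dst < src (NaN or equal zeros yield src). */
    static float minss(float dst, float src) { return dst < src ? dst : src; }

    /** Assumption: the pseudo-code's "b < +0.0" is a sign-bit test, so -0.0 is "negative". */
    static boolean signBitSet(float x) { return Float.floatToRawIntBits(x) < 0; }

    static float emulatedMax(float a, float b) {
        // Move the non-negative value to the second operand.
        float atmp = signBitSet(b) ? b : a;
        float btmp = signBitSet(b) ? a : b;
        float tmp = maxss(atmp, btmp);
        // The primitive only propagates a NaN in its second operand; patch up the first.
        return Float.isNaN(atmp) ? atmp : tmp;
    }

    static float emulatedMin(float a, float b) {
        // Move the non-negative value to the first operand.
        float atmp = signBitSet(b) ? a : b;
        float btmp = signBitSet(b) ? b : a;
        float tmp = minss(atmp, btmp);
        return Float.isNaN(atmp) ? atmp : tmp;
    }

    /** Bitwise equality, treating all NaN payloads as equal. */
    static boolean sameResult(float x, float y) {
        return (Float.isNaN(x) && Float.isNaN(y))
            || Float.floatToRawIntBits(x) == Float.floatToRawIntBits(y);
    }

    /** Compares the emulation against Math.max/min over all pairs of a few special values. */
    static boolean matchesMathMinMax() {
        float[] samples = {
            0.0f, -0.0f, 1.0f, -1.0f, 2.5f, -2.5f,
            Float.NaN, Float.intBitsToFloat(0xFFC00000), // NaN with the sign bit set
            Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY,
            Float.MIN_VALUE, -Float.MIN_VALUE, Float.MAX_VALUE
        };
        for (float a : samples) {
            for (float b : samples) {
                if (!sameResult(emulatedMax(a, b), Math.max(a, b))) return false;
                if (!sameResult(emulatedMin(a, b), Math.min(a, b))) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("emulation matches Math.max/min: " + matchesMathMinMax());
    }
}
```

The operand swap and the trailing NaN fix-up are exactly the extra work that, per the PR description, the new AVX10.2 instruction makes unnecessary.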
------------- PR Review: https://git.openjdk.org/jdk/pull/25914#pullrequestreview-2978248768 From mhaessig at openjdk.org Wed Jul 2 08:38:47 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 2 Jul 2025 08:38:47 GMT Subject: RFR: 8360641: TestCompilerCounts fails after 8354727 [v4] In-Reply-To: References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com> Message-ID: <2Er7Cp5ry6llaeyDvSv7Tg0hIOvS9AOzrJM0zfIW1JM=.edce3d10-ad95-4c03-80e0-0e985ba692ab@github.com> On Tue, 1 Jul 2025 06:52:32 GMT, Manuel Hässig wrote: >> After integrating #25872 the calculation of the `CICompilerCount` ergonomic became dependent on the size of `NonNMethodCodeHeapSize`, which itself is an ergonomic based on the available memory. Thus, depending on the system, the test `compiler/arguments/TestCompilerCounts.java` failed, i.e. locally this failed, but not on CI servers. >> >> This PR changes the test to reflect the changes introduced in #25872. >> >> Testing: >> - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15932906313) >> - [x] tier1,tier2 plus Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Fix whitespace > > Co-authored-by: Andrey Turbanov Thank you for your reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/26024#issuecomment-3026962659 From mhaessig at openjdk.org Wed Jul 2 08:38:47 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 2 Jul 2025 08:38:47 GMT Subject: Integrated: 8360641: TestCompilerCounts fails after 8354727 In-Reply-To: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com> References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com> Message-ID: On Fri, 27 Jun 2025 18:09:23 GMT, Manuel Hässig wrote: > After integrating #25872 the calculation of the `CICompilerCount` ergonomic became dependent on the size of `NonNMethodCodeHeapSize`, which itself is an ergonomic based on the available memory. Thus, depending on the system, the test `compiler/arguments/TestCompilerCounts.java` failed, i.e. locally this failed, but not on CI servers. > > This PR changes the test to reflect the changes introduced in #25872. > > Testing: > - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15932906313) > - [x] tier1,tier2 plus Oracle internal testing This pull request has now been integrated.
Changeset: 2304044a Author: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/2304044ab2f228fe2fe4adb5975291e733b12d5c Stats: 49 lines in 1 file changed: 34 ins; 1 del; 14 mod 8360641: TestCompilerCounts fails after 8354727 Reviewed-by: kvn, dfenacci, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/26024 From snatarajan at openjdk.org Wed Jul 2 08:40:56 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 2 Jul 2025 08:40:56 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v8] In-Reply-To: <4W6QHi3F3RN-JYfYAKUATR_xCUnOiUR0vT73ndqNZtk=.0e193c07-cad0-4cbd-86f2-1758a8c8bac9@github.com> References: <4W6QHi3F3RN-JYfYAKUATR_xCUnOiUR0vT73ndqNZtk=.0e193c07-cad0-4cbd-86f2-1758a8c8bac9@github.com> Message-ID: On Tue, 1 Jul 2025 16:28:27 GMT, Saranya Natarajan wrote: >> This changeset restructures the macro expansion phase to not include macro elimination and also adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes. >> >> Changes: >> - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes` method. >> - Introduced compiler phase `PHASE_AFTER_MACRO_ELIMINATION` >> - Added a new Ideal phase for individual macro elimination steps. >> - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)). >> >> Below is a sample screenshot (IGV print level 4) mainly showing the new phase. >> ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6) >> >> Questions to reviewers: >> - Is the new macro elimination phase OK, or should we change anything?
>> - In `compile.cpp `, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed ? >> >> Testing: >> GitHub Actions >> tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > review comments fix part 1 Thanks for the reviews everyone. Please sponsor ------------- PR Comment: https://git.openjdk.org/jdk/pull/25682#issuecomment-3026967089 From snatarajan at openjdk.org Wed Jul 2 08:40:57 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 2 Jul 2025 08:40:57 GMT Subject: Integrated: 8325478: Restructure the macro expansion compiler phase to not include macro elimination In-Reply-To: References: Message-ID: On Fri, 6 Jun 2025 22:40:34 GMT, Saranya Natarajan wrote: > This changeset restructures the macro expansion phase to not include macro elimination and also adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes. > > Changes: > - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes ` method. > - Introduced compiler phases` PHASE_AFTER_MACRO_ELIMINATION` > - Added a new Ideal phase for individual macro elimination steps. > - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)). > > Below is a sample screenshot (IGV print level 4 ) mainly showing the new phase . 
> ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6) > > Questions to reviewers: > - Is the new macro elimination phase OK, or should we change anything? > - In `compile.cpp`, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed? > > Testing: > GitHub Actions > tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) This pull request has now been integrated. Changeset: eac8f5d2 Author: Saranya Natarajan Committer: Daniel Lundén URL: https://git.openjdk.org/jdk/commit/eac8f5d2c99e1bcc526da0f6a05af76e815c2db9 Stats: 77 lines in 11 files changed: 54 ins; 8 del; 15 mod 8325478: Restructure the macro expansion compiler phase to not include macro elimination Reviewed-by: kvn, dlunden ------------- PR: https://git.openjdk.org/jdk/pull/25682 From eastigeevich at openjdk.org Wed Jul 2 08:47:44 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 2 Jul 2025 08:47:44 GMT Subject: RFR: 8360936: Test compiler/onSpinWait/TestOnSpinWaitAArch64.java fails after JDK-8359435 [v2] In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 16:05:07 GMT, Evgeny Astigeevich wrote: >> Test compiler/onSpinWait/TestOnSpinWaitAArch64.java needs debug info to identify a position of spin wait instructions in generated code. Release builds might not generate needed debug info. >> >> This PR adds a requirement for the test to be run on debug builds only. >> >> Tested: >> - Fastdebug: test passed >> - Slowdebug: test passed. >> - Release: test skipped. > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Simplify requirement for debug build > OK, are you able to bisect which change?
This fix to only do debug VM needs to be correctly linked to the actual cause, IMO. > > It looks like `XX:+PrintAssembly` prints out debug info in release builds but `XX:CompileCommand=print` does not. I am switching back to `XX:+PrintAssembly`. > > That's not great. What info do you need, exactly? # {method} {0x0000ffff50400378} 'test' '()V' in 'compiler/onSpinWait/TestOnSpinWaitAArch64$Launcher' # [sp+0x20] (sp of caller) 0x0000ffff985731c0: ff83 00d1 | fd7b 01a9 | 2803 0018 | 8923 40b9 | 1f01 09eb 0x0000ffff985731d4: ;*synchronization entry ; - compiler.onSpinWait.TestOnSpinWaitAArch64$Launcher::test at -1 (line 224) 0x0000ffff985731d4: 2102 0054 | 1f20 03d5 | 1f20 03d5 | 1f20 03d5 | 1f20 03d5 | 1f20 03d5 | 1f20 03d5 0x0000ffff985731f0: ;*invokestatic onSpinWait {reexecute=0 rethrow=0 return_oop=0} ; - compiler.onSpinWait.TestOnSpinWaitAArch64$Launcher::test at 0 (line 224) 0x0000ffff985731f0: 1f20 03d5 | fd7b 41a9 | ff83 0091 0x0000ffff985731fc: ; {poll_return} 0x0000ffff985731fc: 8817 40f9 | ff63 28eb | 4800 0054 | c003 5fd6 0x0000ffff9857320c: ; {internal_word} 0x0000ffff9857320c: 88ff ff10 | 88a3 02f9 0x0000ffff98573214: ; {runtime_call SafepointBlob} 0x0000ffff98573214: 5bc3 fe17 0x0000ffff98573218: ; {runtime_call Stub::method_entry_barrier} 0x0000ffff98573218: 0850 96d2 | 480a b3f2 | e8ff dff2 | 0001 3fd6 | ecff ff17 The test searches for `- compiler.onSpinWait.TestOnSpinWaitAArch64$Launcher::test at 0` and `invokestatic onSpinWait`. They identify the place where to search instructions. Assembly from all builds always has `{poll_return}`. I can use it as a search point. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26072#issuecomment-3026996074 From mablakatov at openjdk.org Wed Jul 2 08:48:59 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Wed, 2 Jul 2025 08:48:59 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v6] In-Reply-To: References: Message-ID: > Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, the existing ASIMD implementation is used. > > Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is still used directly. > > The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reduction micro-benchmarks.
> > Benchmark results:
>
> Neoverse-V1 (SVE 256-bit):
>
> Benchmark                 (size)   Mode    master         PR   Units
> ByteMaxVector.MULLanes      1024  thrpt  5447.643  11455.535  ops/ms
> ShortMaxVector.MULLanes     1024  thrpt  3388.183   7144.301  ops/ms
> IntMaxVector.MULLanes       1024  thrpt  3010.974   4911.485  ops/ms
> LongMaxVector.MULLanes      1024  thrpt  1539.137   2562.835  ops/ms
> FloatMaxVector.MULLanes     1024  thrpt  1355.551   4158.128  ops/ms
> DoubleMaxVector.MULLanes    1024  thrpt  1715.854   3284.189  ops/ms
>
> Fujitsu A64FX (SVE 512-bit):
>
> Benchmark                 (size)   Mode    master        PR   Units
> ByteMaxVector.MULLanes      1024  thrpt  1091.692  2887.798  ops/ms
> ShortMaxVector.MULLanes     1024  thrpt   597.008  1863.338  ops/ms
> IntMaxVector.MULLanes       1024  thrpt   510.642  1348.651  ops/ms
> LongMaxVector.MULLanes      1024  thrpt   468.878   878.620  ops/ms
> FloatMaxVector.MULLanes     1024  thrpt   376.284  2237.564  ops/ms
> DoubleMaxVector.MULLanes    1024  thrpt   431.343  1646.792  ops/ms

Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: cleanup: update a copyright notice Co-authored-by: Hao Sun ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23181/files - new: https://git.openjdk.org/jdk/pull/23181/files/df09ab65..ebad6dd3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23181&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23181&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23181.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23181/head:pull/23181 PR: https://git.openjdk.org/jdk/pull/23181 From mablakatov at openjdk.org Wed Jul 2 08:48:59 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Wed, 2 Jul 2025 08:48:59 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v5] In-Reply-To: <2zMCHzKXQ1kBfjcU5Fc8s6fa2W6TTCKpSSjhB0dMdLw=.3c43071b-3982-4e0e-a300-e0547f4fbbec@github.com> References:
<2zMCHzKXQ1kBfjcU5Fc8s6fa2W6TTCKpSSjhB0dMdLw=.3c43071b-3982-4e0e-a300-e0547f4fbbec@github.com> Message-ID: On Wed, 2 Jul 2025 03:28:10 GMT, Hao Sun wrote: >> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: >> >> fixup: remove undefined insts from aarch64-asmtest.py > > test/hotspot/jtreg/compiler/loopopts/superword/TestVectorFPReduction.java line 2: > >> 1: /* >> 2: * Copyright (c) 2025, Arm Limited. All rights reserved. > > `XX, YY,` means this file was created at XX year and the latest update was made at YY year. If `XX=YY`, then use `XX,`. > > Suggestion: > > * Copyright (c) 2024, 2025, Arm Limited. All rights reserved. Thank you for catching this! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2179486265 From aph at openjdk.org Wed Jul 2 08:52:46 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 2 Jul 2025 08:52:46 GMT Subject: RFR: 8360936: Test compiler/onSpinWait/TestOnSpinWaitAArch64.java fails after JDK-8359435 [v2] In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 08:45:23 GMT, Evgeny Astigeevich wrote: > ``` > > ``` > > > > > > The test searches for `- compiler.onSpinWait.TestOnSpinWaitAArch64$Launcher::test at 0` and `invokestatic onSpinWait`. They identify the place where to search instructions. That's not great. C2 is free to move stuff around, so it's not certain this test will keep working. If you just want to make sure that the pattern is used, a block_comment() would be more reliable. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26072#issuecomment-3027010064 From xgong at openjdk.org Wed Jul 2 08:59:47 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 2 Jul 2025 08:59:47 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API [v2] In-Reply-To: <7GrGfBF_v8F0v02sRHC78ofMZwpMdzQZaHeYlNvi_N0=.93defb9e-ca9b-41b4-8722-1746692e2316@github.com> References: <7GrGfBF_v8F0v02sRHC78ofMZwpMdzQZaHeYlNvi_N0=.93defb9e-ca9b-41b4-8722-1746692e2316@github.com> Message-ID: On Wed, 2 Jul 2025 01:52:19 GMT, Xiaohong Gong wrote: >> Agree with Paul, these are minor regressions. Let us proceed with this patch. > >> Agree with Paul, these are minor regressions. Let us proceed with this patch. > > Thanks so much for your review @sviswa7 ! > @XiaohongGong I quickly scanned the patch, it looks good to me too. I'm submitting some internal testing now, to make sure our extended testing does not break on integration. Should take about 24h. Good to know that. Thanks so much for your testing @eme64 ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-3027032342 From roland at openjdk.org Wed Jul 2 09:00:30 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 2 Jul 2025 09:00:30 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions [v6] In-Reply-To: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: > This change adds a new loop opts pass to optimize redundant conditions > such as the second one in: > > > if (i < 10) { > if (i < 42) { > > > In the branch of the first if, the type of i can be narrowed down to > [min_jint, 9] which can then be used to constant fold the second > condition. > > The compiler already keeps track of type[n] for every node in the > current compilation unit. 
That's not sufficient to optimize the > snippet above though because the type of i can only be narrowed in > some sections of the control flow (that is a subset of all > controls). The solution is to build a new table that tracks the type > of n at every control c > > > type'[n, root] = type[n] // initialized from igvn's type table > type'[n, c] = type[n, idom(c)] > > > This pass iterates over the CFG looking for conditions such as: > > > if (i < 10) { > > > that allows narrowing the type of i and updates the type' table > accordingly. > > At a region r: > > > type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) > > > For a Phi phi at a region r: > > > type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...) > > > Once a type is narrowed, uses are enqueued and their types are > computed by calling the Value() methods. If a use's type is narrowed, > it's recorded at c in the type' table. Value() methods retrieve types > from the type table, not the type' table. To address that issue while > leaving Value() methods unchanged, before calling Value() at c, the > type table is updated so: > > > type[n] = type'[n, c] > > > An exception is for Phi::Value which needs to retrieve the type of > nodes are various controls: there, a new type(Node* n, Node* c) > method is used. > > For most n and c, type'[n, c] is likely the same as type[n], the type > recorded in the global igvn table (that is there shouldn't be many > nodes at only a few control for which we can narrow the type down). As > a consequence, the types'[n, c] table is implemented with: > > - At c, narrowed down types are stored in a GrowableArray. Each entry > records the previous type at idom(c) and the narrowed down type at > c. > > - The GrowableArray of type updates is recorded in a hash table > indexed by c. If there's no update at c, there's no entry in the > hash table. 
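
As an illustration of the sparse `type'[n, c]` table described in the quoted text, here is a minimal Java sketch. It is not the code from the patch (C2 is written in C++), and every name in it (`Range`, `Control`, `NarrowingTable`, `typeAt`, `narrowAt`) is hypothetical. Types are modelled as closed integer intervals, narrowing as interval intersection, and the meet at a region as the covering interval. A lookup walks the dominator chain and falls back to the global table when no control on the chain has an update.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a C2 integer type: the closed interval [lo, hi]. */
final class Range {
    final int lo;
    final int hi;

    Range(int lo, int hi) { this.lo = lo; this.hi = hi; }

    /** Intersection: applied when a dominating condition narrows a type. */
    Range narrow(Range other) {
        return new Range(Math.max(lo, other.lo), Math.min(hi, other.hi));
    }

    /** Smallest interval covering both inputs: the "meet" taken at a region. */
    Range meet(Range other) {
        return new Range(Math.min(lo, other.lo), Math.max(hi, other.hi));
    }
}

/** Hypothetical control node; only the immediate dominator matters here. */
final class Control {
    final Control idom;

    Control(Control idom) { this.idom = idom; }
}

/** Sparse type'[n, c] table: only controls with narrowed types get an entry. */
final class NarrowingTable {
    /** One narrowing: the type seen before the update and the narrowed type at c. */
    private static final class Update {
        final String node;
        final Range previous;
        final Range narrowed;

        Update(String node, Range previous, Range narrowed) {
            this.node = node;
            this.previous = previous;
            this.narrowed = narrowed;
        }
    }

    private final Map<String, Range> globalTypes = new HashMap<>();      // type[n]
    private final Map<Control, List<Update>> updates = new HashMap<>();  // indexed by c

    void setGlobalType(String node, Range type) { globalTypes.put(node, type); }

    /** type'[n, c]: the nearest update on the dominator chain, else type[n]. */
    Range typeAt(String node, Control c) {
        for (Control cur = c; cur != null; cur = cur.idom) {
            List<Update> atControl = updates.get(cur);
            if (atControl == null) {
                continue; // no narrowing recorded at this control
            }
            for (int i = atControl.size() - 1; i >= 0; i--) {
                if (atControl.get(i).node.equals(node)) {
                    return atControl.get(i).narrowed;
                }
            }
        }
        return globalTypes.get(node);
    }

    /** Records that the condition guarding c restricts node to bound. */
    void narrowAt(String node, Control c, Range bound) {
        Range previous = typeAt(node, c);
        updates.computeIfAbsent(c, k -> new ArrayList<>())
               .add(new Update(node, previous, previous.narrow(bound)));
    }

    public static void main(String[] args) {
        Control root = new Control(null);
        Control thenBranch = new Control(root);   // inside "if (i < 10)"
        Control inner = new Control(thenBranch);  // where "i < 42" is tested
        NarrowingTable table = new NarrowingTable();
        table.setGlobalType("i", new Range(Integer.MIN_VALUE, Integer.MAX_VALUE));
        table.narrowAt("i", thenBranch, new Range(Integer.MIN_VALUE, 9));
        // The second condition is redundant if the narrowed upper bound is below 42.
        System.out.println(table.typeAt("i", inner).hi < 42);
    }
}
```

Storing updates per control keeps the table small when, as the description expects, most nodes keep their global type at most controls.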
> > This pass operates in 2 steps: > > - it first iterates over the graph looking for conditions that narrow > the types of some nodes and propagates type updates to uses until a > fixed point. > > - it transforms the graph so newly found constant nodes are folded. > > > The new pass is run on every loop opts. There are a couple rea... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: - more - more - more - Merge branch 'master' into JDK-8275202 - more - more - more - Merge branch 'master' into JDK-8275202 - review - Update src/hotspot/share/opto/loopConditionalPropagation.cpp Co-authored-by: Roberto Castañeda Lozano - ... and 4 more: https://git.openjdk.org/jdk/compare/c220b135...9d093971 ------------- Changes: https://git.openjdk.org/jdk/pull/14586/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14586&range=05 Stats: 4588 lines in 34 files changed: 4483 ins; 40 del; 65 mod Patch: https://git.openjdk.org/jdk/pull/14586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14586/head:pull/14586 PR: https://git.openjdk.org/jdk/pull/14586 From roland at openjdk.org Wed Jul 2 09:02:44 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 2 Jul 2025 09:02:44 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions [v3] In-Reply-To: References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: On Mon, 9 Jun 2025 07:35:10 GMT, Roberto Castañeda Lozano wrote: > I tested this changeset applied on top of jdk-25+26 (Oracle CI tier1-5) and found the following issues (besides the trivial `NULL` occurrence reported above): I pushed new commits that should address those failures. I added a test case for that one (a tricky issue): > * `assert(c->_idx >= _unique || _type_table->find_type_between(c, c, _phase->C->root()) != Type::TOP) failed: for If we don't follow dead projections` in multiple tests, e.g.
`compiler/predicates/TestHoistedPredicateForNonRangeCheck.java` and `compiler/predicates/assertion/TestOpaqueInitializedAssertionPredicateNode.java`. New commits also include some tweaks and cleanup. @robcasloz would you mind running tests again? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14586#issuecomment-3027043159 From xgong at openjdk.org Wed Jul 2 09:02:46 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 2 Jul 2025 09:02:46 GMT Subject: RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors [v2] In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 08:15:34 GMT, Andrew Haley wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Refine comments based on review suggestion > > src/hotspot/cpu/aarch64/aarch64.ad line 2367: > >> 2365: // Theoretically, the minimal vector length supported by AArch64 >> 2366: // ISA and Vector API species is 64-bit. However, 32-bit or 16-bit >> 2367: // vector length is also allowed for special Vector API usages. > > Suggestion: > > // Usually, the shortest vector length supported by AArch64 > // ISA and Vector API species is 64 bits. However, we allow > // 32-bit or 16-bit vectors in a few special cases. > > > Reason for change: it wasn't clear what "supported" meant. Supported by the hardware, or by HotSpot. And why do we only support it in a few special cases? This comment raises more questions than it answers. Thanks so much for your suggestion! Looks better to me. I will update soon. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26057#discussion_r2179517582 From tkurashige at openjdk.org Wed Jul 2 09:08:43 2025 From: tkurashige at openjdk.org (Taizo Kurashige) Date: Wed, 2 Jul 2025 09:08:43 GMT Subject: RFR: 8359120: Improve warning message when fail to load hsdis library [v2] In-Reply-To: <-i6UPk-bhy9RnnCus_JbJ1nQ63nMX9djubON9WBbHQ8=.a2305566-563e-4171-b526-bcd645de51a3@github.com> References: <-i6UPk-bhy9RnnCus_JbJ1nQ63nMX9djubON9WBbHQ8=.a2305566-563e-4171-b526-bcd645de51a3@github.com> Message-ID: On Mon, 23 Jun 2025 08:56:12 GMT, Taizo Kurashige wrote: >> This PR is improvement of warning message when fail to load hsdis library. >> >> [JDK-8287001](https://bugs.openjdk.org/browse/JDK-8287001) introduced a warning on hsdis library load failure. This is useful when the user executes -XX:+PrintAssembly, etc. >> >> However, I think that when hs_err occurs, users might be confused by this warning printed by Xlog. Because users are not likely to know that hsdis is loaded for the [MachCode] section of the hs_err report, they may wonder, for example, "Why do I get warnings about hsdis load errors when -XX:+PrintAssembly is not specified?." >> >> To clear up this confusion, I suggest printing a warning just before [MachCode]. >> >>
>> >> sample output >> >> If hs_err occurs and hsdis load fails without the option to specify where the hs_err report should be output, the following is output to the hs_err_pid log file: >> >> . >> . >> native method entry point (kind = native) [0x000001ae8753cec0, 0x000001ae8753dac0] 3072 bytes >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> 0x000001ae8753cec0: 488b 4b08 | 0fb7 492e | 584c 8d74 | ccf8 6800 | 0000 0068 | 0000 0000 | 5055 488b | ec41 5548 >> 0x000001ae8753cee0: 8b43 084c | 8d68 3848 | 8b40 0868 | 0000 0000 | 5348 8b50 | 18 >> . >> . >> >> >> If -XX:+PrintAssembly is specified and hsdis load fails, the following is printed to stdout: >> >> $ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -version >> OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output >> >> ============================= C1-compiled nmethod ============================== >> ----------------------------------- Assembly ----------------------------------- >> >> Compiled method (c1) 57 2 3 java.lang.Object::<init> (1 bytes) >> total in heap [0x0000024a08a00008,0x0000024a08a00208] = 512 >> . >> . >> >> [Constant Pool (empty)] >> >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> [Instructions begin] >> 0x0000024a08a00100: 6666 660f | 1f84 0000 | 0000 0066 | 6666 9066 | 6690 448b | 5208 443b >> . >> . >> [Constant Pool (empty)] >> >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> [Verified Entry Point] >> # {method} {0x00000000251a1898} 'toUnsignedInt' '(B)I' in 'java/lang/Byte >> . >> . >> >> >>
>> >> Since... > > Taizo Kurashige has updated the pull request incrementally with one additional commit since the last revision: > > Fix message and revert lines for Xlog Thank you for your review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25726#issuecomment-3027058301 From duke at openjdk.org Wed Jul 2 09:08:43 2025 From: duke at openjdk.org (duke) Date: Wed, 2 Jul 2025 09:08:43 GMT Subject: RFR: 8359120: Improve warning message when fail to load hsdis library [v2] In-Reply-To: <-i6UPk-bhy9RnnCus_JbJ1nQ63nMX9djubON9WBbHQ8=.a2305566-563e-4171-b526-bcd645de51a3@github.com> References: <-i6UPk-bhy9RnnCus_JbJ1nQ63nMX9djubON9WBbHQ8=.a2305566-563e-4171-b526-bcd645de51a3@github.com> Message-ID: On Mon, 23 Jun 2025 08:56:12 GMT, Taizo Kurashige wrote: >> This PR is improvement of warning message when fail to load hsdis library. >> >> [JDK-8287001](https://bugs.openjdk.org/browse/JDK-8287001) introduced a warning on hsdis library load failure. This is useful when the user executes -XX:+PrintAssembly, etc. >> >> However, I think that when hs_err occurs, users might be confused by this warning printed by Xlog. Because users are not likely to know that hsdis is loaded for the [MachCode] section of the hs_err report, they may wonder, for example, "Why do I get warnings about hsdis load errors when -XX:+PrintAssembly is not specified?." >> >> To clear up this confusion, I suggest printing a warning just before [MachCode]. >> >>
>> >> sample output >> >> If hs_err occurs and hsdis load fails without the option to specify where the hs_err report should be output, the following is output to the hs_err_pid log file: >> >> . >> . >> native method entry point (kind = native) [0x000001ae8753cec0, 0x000001ae8753dac0] 3072 bytes >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> 0x000001ae8753cec0: 488b 4b08 | 0fb7 492e | 584c 8d74 | ccf8 6800 | 0000 0068 | 0000 0000 | 5055 488b | ec41 5548 >> 0x000001ae8753cee0: 8b43 084c | 8d68 3848 | 8b40 0868 | 0000 0000 | 5348 8b50 | 18 >> . >> . >> >> >> If -XX:+PrintAssembly is specified and hsdis load fails, the following is printed to stdout: >> >> $ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -version >> OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output >> >> ============================= C1-compiled nmethod ============================== >> ----------------------------------- Assembly ----------------------------------- >> >> Compiled method (c1) 57 2 3 java.lang.Object::<init> (1 bytes) >> total in heap [0x0000024a08a00008,0x0000024a08a00208] = 512 >> . >> . >> >> [Constant Pool (empty)] >> >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> [Instructions begin] >> 0x0000024a08a00100: 6666 660f | 1f84 0000 | 0000 0066 | 6666 9066 | 6690 448b | 5208 443b >> . >> . >> [Constant Pool (empty)] >> >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> [Verified Entry Point] >> # {method} {0x00000000251a1898} 'toUnsignedInt' '(B)I' in 'java/lang/Byte >> . >> . >> >> >>
>> >> Since... > > Taizo Kurashige has updated the pull request incrementally with one additional commit since the last revision: > > Fix message and revert lines for Xlog @kurashige23 Your change (at version 6ff4f9b5a3f6302ae4605ee985755fbccd3e24fb) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25726#issuecomment-3027062091 From tkurashige at openjdk.org Wed Jul 2 09:24:44 2025 From: tkurashige at openjdk.org (Taizo Kurashige) Date: Wed, 2 Jul 2025 09:24:44 GMT Subject: Integrated: 8359120: Improve warning message when fail to load hsdis library In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 13:38:03 GMT, Taizo Kurashige wrote: > This PR is improvement of warning message when fail to load hsdis library. > > [JDK-8287001](https://bugs.openjdk.org/browse/JDK-8287001) introduced a warning on hsdis library load failure. This is useful when the user executes -XX:+PrintAssembly, etc. > > However, I think that when hs_err occurs, users might be confused by this warning printed by Xlog. Because users are not likely to know that hsdis is loaded for the [MachCode] section of the hs_err report, they may wonder, for example, "Why do I get warnings about hsdis load errors when -XX:+PrintAssembly is not specified?." > > To clear up this confusion, I suggest printing a warning just before [MachCode]. > >
> > sample output > > If hs_err occurs and hsdis load fails without the option to specify where the hs_err report should be output, the following is output to the hs_err_pid log file: > > . > . > native method entry point (kind = native) [0x000001ae8753cec0, 0x000001ae8753dac0] 3072 bytes > > Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section > [MachCode] > 0x000001ae8753cec0: 488b 4b08 | 0fb7 492e | 584c 8d74 | ccf8 6800 | 0000 0068 | 0000 0000 | 5055 488b | ec41 5548 > 0x000001ae8753cee0: 8b43 084c | 8d68 3848 | 8b40 0868 | 0000 0000 | 5348 8b50 | 18 > . > . > > > If -XX:+PrintAssembly is specified and hsdis load fails, the following is output to the stdout > > $ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -version > OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output > > ============================= C1-compiled nmethod ============================== > ----------------------------------- Assembly ----------------------------------- > > Compiled method (c1) 57 2 3 java.lang.Object::<init> (1 bytes) > total in heap [0x0000024a08a00008,0x0000024a08a00208] = 512 > . > . > > [Constant Pool (empty)] > > > Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section > [MachCode] > [Instructions begin] > 0x0000024a08a00100: 6666 660f | 1f84 0000 | 0000 0066 | 6666 9066 | 6690 448b | 5208 443b > . > . > [Constant Pool (empty)] > > > Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section > [MachCode] > [Verified Entry Point] > # {method} {0x00000000251a1898} 'toUnsignedInt' '(B)I' in 'java/lang/Byte' > . > . > > >
> > Since the warning added in this fix cover the role of warning introduced in [JDK-8287001](https://bugs.openjdk.org/browse/JDK-828... This pull request has now been integrated. Changeset: ce998699 Author: Taizo Kurashige Committer: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/ce9986991d60e116ac6680a1b6a4b3ee5384d105 Stats: 9 lines in 2 files changed: 9 ins; 0 del; 0 mod 8359120: Improve warning message when fail to load hsdis library Reviewed-by: mhaessig, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/25726 From rrich at openjdk.org Wed Jul 2 09:36:15 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 2 Jul 2025 09:36:15 GMT Subject: RFR: 8360599: [TESTBUG] DumpThreadsWithEliminatedLock.java fails because of unstable inlining Message-ID: This PR adds CompileCommands to the test DumpThreadsWithEliminatedLock.java to force inlining of java/lang/String*.* methods. This will make inlining more stable to allow for the expected lock elimination based on c2 escape analysis. Forcing inlining of java/lang/StringBuffer.* wasn't sufficient on x86_64. With that the test still failed with TieredCompilation disabled. Testing: x86_64, ppc64 Failed inlining on x86_64 with TieredCompilation disabled: make test TEST=com/sun/management/HotSpotDiagnosticMXBean/DumpThreadsWithEliminatedLock.java TEST_VM_OPTS="-XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=PrintInlining,DumpThreadsWithEliminatedLock.*" JTREG=TIMEOUT_FACTOR=0.1 [...] 
STDOUT: CompileCommand: PrintInlining DumpThreadsWithEliminatedLock.* bool PrintInlining = true @ 1 java.util.concurrent.atomic.AtomicBoolean::get (13 bytes) inline (hot) @ 11 java.lang.StringBuffer::<init> (7 bytes) inline (hot) late inline succeeded (string method) @ 3 java.lang.AbstractStringBuilder::<init> (39 bytes) inline (hot) @ 1 java.lang.Object::<init> (1 bytes) inline (hot) @ 16 java.lang.System::currentTimeMillis (0 bytes) (intrinsic) s @ 19 java.lang.StringBuffer::append (13 bytes) failed to inline: already compiled into a big method s @ 24 java.lang.StringBuffer::toString (44 bytes) inline (hot) late inline succeeded (string method) s @ 1 java.lang.StringBuffer::length (5 bytes) accessor @ 24 java.lang.String::<init> (98 bytes) failed to inline: already compiled into a big method @ 30 java.util.concurrent.atomic.AtomicReference::set (6 bytes) accessor 2025-07-02T09:25:53.396634900Z Attempt 1, found: false 2025-07-02T09:25:53.415673072Z Attempt 2, found: false 2025-07-02T09:25:53.418876867Z Attempt 3, found: false [...] ------------- Commit messages: - Force inlining of String*.* methods - Force inlining of StringBuffer methods Changes: https://git.openjdk.org/jdk/pull/26033/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26033&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360599 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26033.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26033/head:pull/26033 PR: https://git.openjdk.org/jdk/pull/26033 From rrich at openjdk.org Wed Jul 2 09:36:15 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 2 Jul 2025 09:36:15 GMT Subject: RFR: 8360599: [TESTBUG] DumpThreadsWithEliminatedLock.java fails because of unstable inlining In-Reply-To: References: Message-ID: On Sun, 29 Jun 2025 15:26:14 GMT, Richard Reingruber wrote: > This PR adds CompileCommands to the test DumpThreadsWithEliminatedLock.java to force inlining of java/lang/String*.* methods. 
This will make inlining more stable to allow for the expected lock elimination based on c2 escape analysis. > > Forcing inlining of java/lang/StringBuffer.* wasn't sufficient on x86_64. With that the test still failed with TieredCompilation disabled. > > Testing: x86_64, ppc64 > > Failed inlining on x86_64 with TieredCompilation disabled: > > > make test TEST=com/sun/management/HotSpotDiagnosticMXBean/DumpThreadsWithEliminatedLock.java TEST_VM_OPTS="-XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=PrintInlining,DumpThreadsWithEliminatedLock.*" JTREG=TIMEOUT_FACTOR=0.1 > > [...] > > STDOUT: > CompileCommand: PrintInlining DumpThreadsWithEliminatedLock.* bool PrintInlining = true > @ 1 java.util.concurrent.atomic.AtomicBoolean::get (13 bytes) inline (hot) > @ 11 java.lang.StringBuffer::<init> (7 bytes) inline (hot) late inline succeeded (string method) > @ 3 java.lang.AbstractStringBuilder::<init> (39 bytes) inline (hot) > @ 1 java.lang.Object::<init> (1 bytes) inline (hot) > @ 16 java.lang.System::currentTimeMillis (0 bytes) (intrinsic) > s @ 19 java.lang.StringBuffer::append (13 bytes) failed to inline: already compiled into a big method > s @ 24 java.lang.StringBuffer::toString (44 bytes) inline (hot) late inline succeeded (string method) > s @ 1 java.lang.StringBuffer::length (5 bytes) accessor > @ 24 java.lang.String::<init> (98 bytes) failed to inline: already compiled into a big method > @ 30 java.util.concurrent.atomic.AtomicReference::set (6 bytes) accessor > 2025-07-02T09:25:53.396634900Z Attempt 1, found: false > 2025-07-02T09:25:53.415673072Z Attempt 2, found: false > 2025-07-02T09:25:53.418876867Z Attempt 3, found: false > > [...] JIT compiler folks might want to have a look at this PR. Maybe there's a better way to have the StringBuilder locks eliminated deterministically. 
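To make the dependency between inlining and lock elimination concrete, here is a minimal sketch of the kind of code the inlining log above refers to. It is not the test's source; the class and method names are hypothetical. The `StringBuffer` never escapes `buildOnce`, so C2's escape analysis can in principle drop the monitor operations of its synchronized methods, but only for calls that were actually inlined into the compiled caller. The thread describes forcing this with CompileCommands for `java/lang/String*.*`; the exact command line added to the test is not shown here, so something like `-XX:CompileCommand=inline,java/lang/String*.*` is an assumption.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the pattern exercised by the test, not its real code.
public class ElidedLockSketch {
    static final AtomicReference<String> RESULT = new AtomicReference<>();

    // The StringBuffer is local and never escapes. append() and toString()
    // are synchronized, so their locks are candidates for elimination once
    // both calls are inlined into this method by C2.
    public static String buildOnce() {
        StringBuffer sb = new StringBuffer();
        sb.append(System.currentTimeMillis());
        return sb.toString();
    }

    public static void main(String[] args) {
        // Run often enough that the JIT is likely to compile buildOnce().
        for (int i = 0; i < 20_000; i++) {
            RESULT.set(buildOnce());
        }
        System.out.println("last value length: " + RESULT.get().length());
    }
}
```

If `append` is rejected with "already compiled into a big method", as in the log, the lock around it stays in place and a thread dump taken at that point would still show the monitor.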
------------- PR Comment: https://git.openjdk.org/jdk/pull/26033#issuecomment-3027054192 From mdoerr at openjdk.org Wed Jul 2 09:54:41 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 2 Jul 2025 09:54:41 GMT Subject: RFR: 8360599: [TESTBUG] DumpThreadsWithEliminatedLock.java fails because of unstable inlining In-Reply-To: References: Message-ID: On Sun, 29 Jun 2025 15:26:14 GMT, Richard Reingruber wrote: > This PR adds CompileCommands to the test DumpThreadsWithEliminatedLock.java to force inlining of java/lang/String*.* methods. This will make inlining more stable to allow for the expected lock elimination based on c2 escape analysis. > > Forcing inlining of java/lang/StringBuffer.* wasn't sufficient on x86_64. With that the test still failed with TieredCompilation disabled. > > Testing: x86_64, ppc64 > > Failed inlining on x86_64 with TieredCompilation disabled: > > > make test TEST=com/sun/management/HotSpotDiagnosticMXBean/DumpThreadsWithEliminatedLock.java TEST_VM_OPTS="-XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=PrintInlining,DumpThreadsWithEliminatedLock.*" JTREG=TIMEOUT_FACTOR=0.1 > > [...] 
> > STDOUT: > CompileCommand: PrintInlining DumpThreadsWithEliminatedLock.* bool PrintInlining = true > @ 1 java.util.concurrent.atomic.AtomicBoolean::get (13 bytes) inline (hot) > @ 11 java.lang.StringBuffer::<init> (7 bytes) inline (hot) late inline succeeded (string method) > @ 3 java.lang.AbstractStringBuilder::<init> (39 bytes) inline (hot) > @ 1 java.lang.Object::<init> (1 bytes) inline (hot) > @ 16 java.lang.System::currentTimeMillis (0 bytes) (intrinsic) > s @ 19 java.lang.StringBuffer::append (13 bytes) failed to inline: already compiled into a big method > s @ 24 java.lang.StringBuffer::toString (44 bytes) inline (hot) late inline succeeded (string method) > s @ 1 java.lang.StringBuffer::length (5 bytes) accessor > @ 24 java.lang.String::<init> (98 bytes) failed to inline: already compiled into a big method > @ 30 java.util.concurrent.atomic.AtomicReference::set (6 bytes) accessor > 2025-07-02T09:25:53.396634900Z Attempt 1, found: false > 2025-07-02T09:25:53.415673072Z Attempt 2, found: false > 2025-07-02T09:25:53.418876867Z Attempt 3, found: false > > [...] LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26033#pullrequestreview-2978510981 From bmaillard at openjdk.org Wed Jul 2 10:03:23 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 2 Jul 2025 10:03:23 GMT Subject: RFR: 8361144: Strenghten the Ideal Verification in PhaseIterGVN::verify_Ideal_for by comparing the hash of a node before and after Ideal [v2] In-Reply-To: <9YpmCSNKHrTmq54eLusmkTHoEFFUTvm6OiqjdiGNFv0=.f8123888-bd26-42cc-938a-ec756a0da90d@github.com> References: <9YpmCSNKHrTmq54eLusmkTHoEFFUTvm6OiqjdiGNFv0=.f8123888-bd26-42cc-938a-ec756a0da90d@github.com> Message-ID: <3wDbLni8c6Up8_W56fFOv_meffgHHjzch0e3QESao1A=.03a7c7a7-787d-4fb0-b081-64865636bf14@github.com> > This PR adds a node hash comparison after calling `Ideal` in `PhaseIterGVN::verify_Ideal_for` to introduce an additional layer of verification for missed optimizations. Previously, we relied on the return value of `Ideal`, which is expected to be `nullptr` if no transformation was done. > > By also checking the node's hash before and after `Ideal`, we could catch inconsistencies in the implementation or unintended modifications to the graph. Both of these may indicate missed or incomplete optimizations. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8361144) > - [x] tier1-3, plus some internal testing > > Thank you for reviewing! 
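The idea in the description above can be illustrated with a toy model. This is a sketch in Java, not HotSpot code: `Node`, `hash()` and `missedOptimization` are hypothetical stand-ins for the C++ node, its hash, and the verification step. The point it shows is that a transformation which returns null ("nothing changed") but has silently modified the node is caught by comparing the hash taken before the call with the hash taken after it.

```java
import java.util.Objects;
import java.util.function.UnaryOperator;

// Toy model only; names and structure are assumptions, not the C2 API.
public class IdealHashCheckSketch {
    public static final class Node {
        public final String op;
        public int in1;
        public int in2;

        public Node(String op, int in1, int in2) {
            this.op = op;
            this.in1 = in1;
            this.in2 = in2;
        }

        // Hash over the opcode and inputs, in the spirit of a GVN hash.
        public int hash() {
            return Objects.hash(op, in1, in2);
        }
    }

    // Returns true if verification should flag the node: either the
    // transformation reported a result (non-null), or it reported nothing
    // but the node's hash changed behind its back.
    public static boolean missedOptimization(Node n, UnaryOperator<Node> ideal) {
        int oldHash = n.hash();           // recorded before calling the transformation
        Node result = ideal.apply(n);
        if (result != null) {
            return true;                  // the existing check: a change was reported
        }
        return n.hash() != oldHash;       // the additional check: silent modification
    }
}
```

A transformation that mutates an input and still returns null is only visible to the second check, which is the case the return value alone cannot detect.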
Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: 8361144: update comment Co-authored-by: Damon Fenacci ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26064/files - new: https://git.openjdk.org/jdk/pull/26064/files/e06b4d53..28851936 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26064&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26064&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26064.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26064/head:pull/26064 PR: https://git.openjdk.org/jdk/pull/26064 From bmaillard at openjdk.org Wed Jul 2 10:19:59 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 2 Jul 2025 10:19:59 GMT Subject: RFR: 8361144: Strenghten the Ideal Verification in PhaseIterGVN::verify_Ideal_for by comparing the hash of a node before and after Ideal [v3] In-Reply-To: <9YpmCSNKHrTmq54eLusmkTHoEFFUTvm6OiqjdiGNFv0=.f8123888-bd26-42cc-938a-ec756a0da90d@github.com> References: <9YpmCSNKHrTmq54eLusmkTHoEFFUTvm6OiqjdiGNFv0=.f8123888-bd26-42cc-938a-ec756a0da90d@github.com> Message-ID: > This PR adds a node hash comparison after calling `Ideal` in `PhaseIterGVN::verify_Ideal_for` to introduce an additional layer of verification for missed optimizations. Previously, we relied on the return value of `Ideal`, which is expected to be `nullptr` if no transformation was done. > > By also checking the node's hash before and after `Ideal`, we could catch inconsistencies in the implementation or unintended modifications to the graph. Both of these may indicate missed or incomplete optimizations. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8361144) > - [x] tier1-3, plus some internal testing > > Thank you for reviewing! 
Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: 8361144: add comment for consistency with node count ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26064/files - new: https://git.openjdk.org/jdk/pull/26064/files/28851936..75f81296 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26064&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26064&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26064.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26064/head:pull/26064 PR: https://git.openjdk.org/jdk/pull/26064 From bmaillard at openjdk.org Wed Jul 2 10:20:00 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 2 Jul 2025 10:20:00 GMT Subject: RFR: 8361144: Strenghten the Ideal Verification in PhaseIterGVN::verify_Ideal_for by comparing the hash of a node before and after Ideal [v3] In-Reply-To: References: <9YpmCSNKHrTmq54eLusmkTHoEFFUTvm6OiqjdiGNFv0=.f8123888-bd26-42cc-938a-ec756a0da90d@github.com> Message-ID: On Wed, 2 Jul 2025 06:55:34 GMT, Damon Fenacci wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> 8361144: add comment for consistency with node count > > src/hotspot/share/opto/phaseX.cpp line 1821: > >> 1819: // The number of nodes shoud not increase. >> 1820: uint old_unique = C->unique(); >> 1821: uint old_hash = n->hash(); > > Just to be consistent with `old_unique` we could add a small comment (here or below for both). What do you think? Sounds reasonable! 
Made the update ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26064#discussion_r2179682863 From shade at openjdk.org Wed Jul 2 10:20:48 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Jul 2025 10:20:48 GMT Subject: RFR: 8361255: CTW: Tolerate more NCDFE problems Message-ID: We routinely CTW 3rd party JARs to make sure our compilers work. By the nature of the JARs, they have dependencies on other JARs, and CTW runner frequently warns out with NCDFE. It does so very crudely, missing opportunities to compile the methods that _do not_ trigger NCDFEs. CTW should be made more tolerant to this. I think the normal "modules" CTW runs into the similar problem, but on a lesser scale, as we do not have a very hairy dependency graph within JDK. The CTW logs are also fairly noisy with full exception traces when NCDFE is semi-expected. This PR does _not_ print exception stack traces in these cases, only "NOTE"-s about it. This makes the log fairly clean and more understandable. 
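As a rough illustration of the behaviour described above (keep going past classes or methods that hit a `NoClassDefFoundError`, and log a one-line note instead of a stack trace), here is a minimal sketch. It is not the CTW runner's code; `compileAll` and the use of `Runnable` as a stand-in for "compile one method" are assumptions made for the example.

```java
import java.util.List;

// Hypothetical sketch of a per-method tolerant loop, not the real CTW driver.
public class NcdfeTolerantLoop {

    // Runs every task and returns how many completed. A task that fails
    // with NoClassDefFoundError is skipped with a short NOTE, so one
    // missing dependency does not stop the remaining methods.
    public static int compileAll(List<Runnable> methods) {
        int compiled = 0;
        for (Runnable method : methods) {
            try {
                method.run();
                compiled++;
            } catch (NoClassDefFoundError e) {
                // Semi-expected for 3rd party JARs: print a note, no stack trace.
                System.out.println("NOTE: skipped, missing dependency: " + e.getMessage());
            }
        }
        return compiled;
    }

    public static void main(String[] args) {
        List<Runnable> methods = List.of(
                () -> { },
                () -> { throw new NoClassDefFoundError("com/example/Missing"); },
                () -> { });
        System.out.println("compiled " + compileAll(methods) + " of " + methods.size());
    }
}
```

Catching the error per method rather than per class is what lets the methods that do not touch the missing dependency still be counted.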
Motivational scope improvement compiling a sample 3rd party JAR (cassandra-2.1.4.0.jar): Before: Done (2487 classes, 9866 methods, 24584 ms) After: Done (2487 classes, 10074 methods, 24150 ms) ; +2% more methods Additional testing: - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` ------------- Commit messages: - Move clinit compile back - Initial - Fix Changes: https://git.openjdk.org/jdk/pull/26090/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26090&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361255 Stats: 41 lines in 2 files changed: 35 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/26090.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26090/head:pull/26090 PR: https://git.openjdk.org/jdk/pull/26090 From bmaillard at openjdk.org Wed Jul 2 10:24:38 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 2 Jul 2025 10:24:38 GMT Subject: RFR: 8361144: Strenghten the Ideal Verification in PhaseIterGVN::verify_Ideal_for by comparing the hash of a node before and after Ideal [v3] In-Reply-To: References: <9YpmCSNKHrTmq54eLusmkTHoEFFUTvm6OiqjdiGNFv0=.f8123888-bd26-42cc-938a-ec756a0da90d@github.com> Message-ID: On Tue, 1 Jul 2025 13:45:04 GMT, Galder Zamarreño wrote: > Have you considered adding a test for this? Is that feasible? @galderz I have considered doing it, but there is no known case that triggers the condition. This change was suggested by @eme64 when discussing the related [JDK-8359602](https://bugs.openjdk.org/browse/JDK-8359602). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26064#issuecomment-3027305541 From galder at openjdk.org Wed Jul 2 10:56:40 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 2 Jul 2025 10:56:40 GMT Subject: RFR: 8361144: Strenghten the Ideal Verification in PhaseIterGVN::verify_Ideal_for by comparing the hash of a node before and after Ideal [v3] In-Reply-To: References: <9YpmCSNKHrTmq54eLusmkTHoEFFUTvm6OiqjdiGNFv0=.f8123888-bd26-42cc-938a-ec756a0da90d@github.com> Message-ID: On Wed, 2 Jul 2025 10:19:59 GMT, Benoît Maillard wrote: >> This PR adds a node hash comparison after calling `Ideal` in `PhaseIterGVN::verify_Ideal_for` to introduce an additional layer of verification for missed optimizations. Previously, we relied on the return value of `Ideal`, which is expected to be `nullptr` if no transformation was done. >> >> By also checking the node's hash before and after `Ideal`, we could catch inconsistencies in the implementation or unintended modifications to the graph. Both of these may indicate missed or incomplete optimizations. >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8361144) >> - [x] tier1-3, plus some internal testing >> >> Thank you for reviewing! > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > 8361144: add comment for consistency with node count Marked as reviewed by galder (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/26064#pullrequestreview-2978685354 From mdoerr at openjdk.org Wed Jul 2 11:01:49 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 2 Jul 2025 11:01:49 GMT Subject: [jdk25] RFR: 8361259: JDK25: Backout JDK-8258229 Message-ID: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> This is a backout of [JDK-8258229](https://bugs.openjdk.org/browse/JDK-8258229) for JDK25 only. 
The problematic code has already been removed by [JDK-8358821](https://bugs.openjdk.org/browse/JDK-8358821) in JDK26. The backout is clean for the C++ code, but the test backout includes the backout of the follow-up change [JDK-8356310](https://bugs.openjdk.org/browse/JDK-8356310). Rationale: Minimize risk for JDK25. We should use the better fix JDK-8358821 in the long term. However, that one should get some more stabilization time before backporting it. Also see JBS issue. Proposed long term solution: Backport JDK-8358821 to jdk25u and revert this change again after an appropriate time. Short term: The issue solved by JDK-8258229 is not critical. It should be ok to postpone the fix to jdk25u. ------------- Commit messages: - 8361259: JDK25: Backout JDK-8258229 Changes: https://git.openjdk.org/jdk/pull/26091/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26091&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361259 Stats: 93 lines in 2 files changed: 0 ins; 93 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26091.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26091/head:pull/26091 PR: https://git.openjdk.org/jdk/pull/26091 From yzheng at openjdk.org Wed Jul 2 11:28:47 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 2 Jul 2025 11:28:47 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt [v2] In-Reply-To: References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: On Fri, 27 Jun 2025 01:43:16 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470. >> >> 1. Check for +0, -0, +INF, -INF, and NaN before any other input values. >> 2. If these special values are found, return immediately with minimal modifications to the result register. >> 3. 
Performance testing shows the modified intrinsic improves throughput by 65.1% over the original intrinsic on average for the special values while throughput drops by 5.5% for the normal value range (-INF, -2^(-1022)], [2^(-1022), INF). >> >> The commands to run all relevant micro-benchmarks are posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"` >> >> The results of all tests posted below were captured with an [Intel® Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. >> >> Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks. When comparing against _baseline2_, the modified intrinsic still significantly outperforms for the inputs (**-INF < x < INF**) that require heavy compute. However, the special value inputs that trigger fast path returns still perform better with _baseline2_. >> >> | Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) | >> | :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: | >> | [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 | >> | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 | >> | [0] | 344990 | 627561 | +81.91 | >> | [-0] ... 
> > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Ensure ABS_MASK is a 128-bit memory sized location and only use equal enum for UCOMISD checks src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 350: > 348: > 349: __ bind(L_2TAG_PACKET_6_0_1); > 350: __ movsd(xmm0, ExternalAddress(NEG_INF), r11 /*rscratch*/); note that `NEG_INF` is now unused ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25962#discussion_r2179808403 From mhaessig at openjdk.org Wed Jul 2 11:39:43 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 2 Jul 2025 11:39:43 GMT Subject: [jdk25] RFR: 8361259: JDK25: Backout JDK-8258229 In-Reply-To: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> References: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> Message-ID: <3IMUQfwLLDneX5SFYKzLTLk_queN_r2Q7VPC7B31vow=.d6f7600f-acf3-482f-88da-5e260cb16aa1@github.com> On Wed, 2 Jul 2025 10:54:13 GMT, Martin Doerr wrote: > This is a backout of [JDK-8258229](https://bugs.openjdk.org/browse/JDK-8258229) for JDK25 only. The problematic code has already been removed by [JDK-8358821](https://bugs.openjdk.org/browse/JDK-8358821) in JDK26. > > The backout is clean for the C++ code, but the test backout includes the backout of the follow-up change [JDK-8356310](https://bugs.openjdk.org/browse/JDK-8356310). > > Rationale: Minimize risk for JDK25. We should use the better fix JDK-8358821 in the long term. However, that one should get some more stabilization time before backporting it. Also see JBS issue. > > Proposed long term solution: Backport JDK-8358821 to jdk25u and revert this change again after an appropriate time. > > Short term: The issue solved by JDK-8258229 is not critical. It should be ok to postpone the fix to jdk25u. The change and the proposed plans look good to me. 
Apologies for all the troubles I have caused. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/26091#pullrequestreview-2978804263 From shade at openjdk.org Wed Jul 2 12:02:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Jul 2025 12:02:07 GMT Subject: RFR: 8361255: CTW: Tolerate more NCDFE problems [v2] In-Reply-To: References: Message-ID: > We routinely CTW 3rd party JARs to make sure our compilers work. By the nature of the JARs, they have dependencies on other JARs, and CTW runner frequently warns out with NCDFE. It does so very crudely, missing opportunities to compile the methods that _do not_ trigger NCDFEs. CTW should be made more tolerant to this. I think the normal "modules" CTW runs into the similar problem, but on a lesser scale, as we do not have a very hairy dependency graph within JDK. > > The CTW logs are also fairly noisy with full exception traces when NCDFE is semi-expected. This PR does _not_ print exception stack traces in these cases, only "NOTE"-s about it. This makes the log fairly clean and more understandable. > > Motivational scope improvement compiling a sample 3rd party JAR (cassandra-2.1.4.0.jar): > > > Before: Done (2487 classes, 9866 methods, 24584 ms) > After: Done (2487 classes, 10074 methods, 24150 ms) ; +2% more methods > > > Additional testing: > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'master' into JDK-8361255-ctw-ncdfe - Move clinit compile back - Initial - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26090/files - new: https://git.openjdk.org/jdk/pull/26090/files/ba0cc87b..9d41f80a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26090&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26090&range=00-01 Stats: 1189 lines in 72 files changed: 623 ins; 239 del; 327 mod Patch: https://git.openjdk.org/jdk/pull/26090.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26090/head:pull/26090 PR: https://git.openjdk.org/jdk/pull/26090 From thartmann at openjdk.org Wed Jul 2 12:00:39 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Jul 2025 12:00:39 GMT Subject: [jdk25] RFR: 8361259: JDK25: Backout JDK-8258229 In-Reply-To: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> References: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> Message-ID: On Wed, 2 Jul 2025 10:54:13 GMT, Martin Doerr wrote: > This is a backout of [JDK-8258229](https://bugs.openjdk.org/browse/JDK-8258229) for JDK25 only. The problematic code has already been removed by [JDK-8358821](https://bugs.openjdk.org/browse/JDK-8358821) in JDK26. > > The backout is clean for the C++ code, but the test backout includes the backout of the follow-up change [JDK-8356310](https://bugs.openjdk.org/browse/JDK-8356310). > > Rationale: Minimize risk for JDK25. We should use the better fix JDK-8358821 in the long term. However, that one should get some more stabilization time before backporting it. Also see JBS issue. > > Proposed long term solution: Backport JDK-8358821 to jdk25u and revert this change again after an appropriate time. > > Short term: The issue solved by JDK-8258229 is not critical. It should be ok to postpone the fix to jdk25u. 
Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26091#pullrequestreview-2978863065 From shade at openjdk.org Wed Jul 2 12:10:40 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Jul 2025 12:10:40 GMT Subject: RFR: 8361255: CTW: Tolerate more NCDFE problems [v2] In-Reply-To: References: Message-ID: <_U8Ws402jgYrpmU1GxnfiHkfein2Rsl1Rh4RKJFwvRQ=.5b74a2c1-5d67-4221-bce8-d00adeb63207@github.com> On Wed, 2 Jul 2025 12:02:07 GMT, Aleksey Shipilev wrote: >> We routinely CTW 3rd party JARs to make sure our compilers work. By the nature of the JARs, they have dependencies on other JARs, and CTW runner frequently warns out with NCDFE. It does so very crudely, missing opportunities to compile the methods that _do not_ trigger NCDFEs. CTW should be made more tolerant to this. I think the normal "modules" CTW runs into the similar problem, but on a lesser scale, as we do not have a very hairy dependency graph within JDK. >> >> The CTW logs are also fairly noisy with full exception traces when NCDFE is semi-expected. This PR does _not_ print exception stack traces in these cases, only "NOTE"-s about it. This makes the log fairly clean and more understandable. >> >> Motivational scope improvement compiling a sample 3rd party JAR (cassandra-2.1.4.0.jar): >> >> >> Before: Done (2487 classes, 9866 methods, 24584 ms) >> After: Done (2487 classes, 10074 methods, 24150 ms) ; +2% more methods >> >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8361255-ctw-ncdfe > - Move clinit compile back > - Initial > - Fix Sanity-checking CTW times: $ time CONF=linux-x86_64-server-fastdebug make test TEST=applications/ctw/modules/ # Base real 3m49.952s user 67m50.313s sys 5m24.288s # This PR real 3m53.800s user 67m26.925s sys 5m22.429s ------------- PR Comment: https://git.openjdk.org/jdk/pull/26090#issuecomment-3027631058 From mdoerr at openjdk.org Wed Jul 2 13:03:38 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 2 Jul 2025 13:03:38 GMT Subject: [jdk25] RFR: 8361259: JDK25: Backout JDK-8258229 In-Reply-To: <3IMUQfwLLDneX5SFYKzLTLk_queN_r2Q7VPC7B31vow=.d6f7600f-acf3-482f-88da-5e260cb16aa1@github.com> References: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> <3IMUQfwLLDneX5SFYKzLTLk_queN_r2Q7VPC7B31vow=.d6f7600f-acf3-482f-88da-5e260cb16aa1@github.com> Message-ID: On Wed, 2 Jul 2025 11:36:38 GMT, Manuel Hässig wrote: > Apologies for all the troubles I have caused. Never mind. The related code is quite tricky. And your problem analysis was good. Thanks for the 2 reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26091#issuecomment-3027802039 From asmehra at openjdk.org Wed Jul 2 13:27:45 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 2 Jul 2025 13:27:45 GMT Subject: RFR: 8361101: AOTCodeAddressTable::_stubs_addr not initialized/freed properly In-Reply-To: References: Message-ID: <-pMpqoug81IwYPE7M1In40E0z5SHdeRM0Dianb9yzsM=.ad435e03-ba15-45ab-89c3-e5331b709735@github.com> On Tue, 1 Jul 2025 15:50:29 GMT, Vladimir Kozlov wrote: >> Please review this patch to fix initialization and freeing of `AOTCodeAddressTable::_stubs_addr`. Changes are trivial > > Yes, it is trivial. @vnkozlov @shipilev thanks for the review. Integrating it now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26053#issuecomment-3027875790 From asmehra at openjdk.org Wed Jul 2 13:27:46 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 2 Jul 2025 13:27:46 GMT Subject: Integrated: 8361101: AOTCodeAddressTable::_stubs_addr not initialized/freed properly In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 19:45:49 GMT, Ashutosh Mehra wrote: > Please review this patch to fix initialization and freeing of `AOTCodeAddressTable::_stubs_addr`. Changes are trivial. This pull request has now been integrated. Changeset: 3066a67e Author: Ashutosh Mehra URL: https://git.openjdk.org/jdk/commit/3066a67e6279f7e3896ab545bc6c291d279d2b03 Stats: 4 lines in 2 files changed: 4 ins; 0 del; 0 mod 8361101: AOTCodeAddressTable::_stubs_addr not initialized/freed properly Reviewed-by: kvn, shade ------------- PR: https://git.openjdk.org/jdk/pull/26053 From missa at openjdk.org Wed Jul 2 14:58:50 2025 From: missa at openjdk.org (Mohamed Issa) Date: Wed, 2 Jul 2025 14:58:50 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt [v2] In-Reply-To: References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: On Wed, 2 Jul 2025 11:25:33 GMT, Yudi Zheng wrote: >> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: >> >> Ensure ABS_MASK is a 128-bit memory sized location and only use equal enum for UCOMISD checks > > src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 350: > >> 348: >> 349: __ bind(L_2TAG_PACKET_6_0_1); >> 350: __ movsd(xmm0, ExternalAddress(NEG_INF), r11 /*rscratch*/); > > note that `NEG_INF` is now unused Got it - thanks.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25962#discussion_r2180280115 From asmehra at openjdk.org Wed Jul 2 15:06:17 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 2 Jul 2025 15:06:17 GMT Subject: [jdk25] RFR: 8361101: AOTCodeAddressTable::_stubs_addr not initialized/freed properly Message-ID: Backporting the fix to jdk25 ------------- Commit messages: - Backport 3066a67e6279f7e3896ab545bc6c291d279d2b03 Changes: https://git.openjdk.org/jdk/pull/26095/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26095&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361101 Stats: 4 lines in 2 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26095.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26095/head:pull/26095 PR: https://git.openjdk.org/jdk/pull/26095 From shade at openjdk.org Wed Jul 2 16:00:44 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Jul 2025 16:00:44 GMT Subject: [jdk25] RFR: 8361101: AOTCodeAddressTable::_stubs_addr not initialized/freed properly In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 14:56:23 GMT, Ashutosh Mehra wrote: > Backporting the fix to jdk25 Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26095#pullrequestreview-2979750627 From lmesnik at openjdk.org Wed Jul 2 16:29:39 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 2 Jul 2025 16:29:39 GMT Subject: RFR: 8357739: [jittester] disable the hashCode method In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 19:49:34 GMT, Evgeny Nikitin wrote: > JITTester often uses the `hashCode` method (in fact, in almost every generated test). Given that the method can be unstable between runs or in interpreted vs compiled runs, it can create false positives. > > This PR fixes the issue by adding support for method templates (similar to the ones used in CompilerCommands).
All of those exclude templates match (and exclude) `String.indexOf(String)`, for example: > > java/lang/::*(Ljava/lang/String;I) > *String::indexOf(*) > java/lang/*::indexOf > > > Additionally, the PR adds support for comments (starting from '#') and empty lines in the excludes file. Marked as reviewed by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25859#pullrequestreview-2979848692 From vpaprotski at openjdk.org Wed Jul 2 17:30:51 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 2 Jul 2025 17:30:51 GMT Subject: RFR: 8359965: Enable paired pushp and popp instruction usage for APX enabled CPUs In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 06:45:58 GMT, Jatin Bhateja wrote: > For a cleaner interface, I think we can also maintain a RAII style APXPushPopPairTracker ... Using the suggested code as a base, Vamsi and I tinkered with the idea some more! Here is what we came up with. This also tracks the correct order of registers being pushed/poped.. (haven't compiled it, so might have some syntax bugs). @dholmes-ora would you mind sharing your opinion? We seem to be making things more complicated, but hopefully in a good way? Also included a sample usage in a stub. 
#define __ _masm->

class PushPopTracker {
private:
  int _counter;
  MacroAssembler *_masm;
  const int REGS = 32; // Increase as needed
  int regs[REGS];
public:
  PushPopTracker(MacroAssembler *_masm) : _counter(0), _masm(_masm) {}
  ~PushPopTracker() {
    assert(_counter == 0, "Push/pop pair mismatch");
  }

  void push(Register reg) {
    assert(_counter < REGS, /*...*/);
    regs[_counter++] = reg.encoding();
    if (VM_Version::supports_apx_f()) {
      __ pushp(reg);
    } else {
      __ push(reg);
    }
  }
  void pop(Register reg) {
    assert(_counter>0, "Push/pop underflow");
    assert(regs[_counter] == reg.encoding(), "Push/pop pair mismatch: %d != %d", regs[_counter], reg.encoding());
    _counter--;
    if (VM_Version::supports_apx_f()) {
      __ popp(reg);
    } else {
      __ pop(reg);
    }
  }
}

address StubGenerator::generate_intpoly_montgomeryMult_P256() {
  __ align(CodeEntryAlignment);
  /*...*/
  address start = __ pc();
  __ enter();
  PushPopTracker s(_masm);
  s.push(r12); //1
  s.push(r13); //2
  s.push(r14); //3
#ifdef _WIN64
  s.push(rsi); //4
  s.push(rdi); //5
#endif
  s.push(rbp); //6
  __ movq(rbp, rsp);
  __ andq(rsp, -32);
  __ subptr(rsp, 32);
  // Register Map
  const Register aLimbs = c_rarg0; // c_rarg0: rdi | rcx
  const Register bLimbs = rsi; // c_rarg1: rsi | rdx
  const Register rLimbs = r8; // c_rarg2: rdx | r8
  const Register tmp1 = r9;
  const Register tmp2 = r10;
  /*...*/
  __ movq(rsp, rbp);
  s.pop(rbp); //5
#ifdef _WIN64
  s.pop(rdi); //4
  s.pop(rsi); //3
#endif
  s.pop(r14); //2
  s.pop(r13); //1
  s.pop(r12); //0
  __ leave();
  __ ret(0);
  return start;
}

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25889#discussion_r2180606586 From jbhateja at openjdk.org Wed Jul 2 17:47:41 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Jul 2025 17:47:41 GMT Subject: RFR: 8359965: Enable paired pushp and popp instruction usage for APX enabled CPUs In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 17:27:41 GMT, Volodymyr Paprotski wrote: >> For a cleaner interface, I think we can also maintain a RAII style APXPushPopPairTracker in the stub snippets using push/pop instruction sequence and wrap the actual assembler call underneath.
The idea here is to catch the balancing error upfront as PPX is purely a performance hint. Instructions with this hint have the same functional semantics as those without. PPX hints set by the compiler that violate the balancing rule may turn off the PPX >> optimization, but they will not affect program semantics.. >> >> >> class APXPushPopPairTracker { >> private: >> int _counter; >> >> public: >> APXPushPopPairTracker() _counter(0) { >> } >> >> ~APXPushPopPairTracker() { >> assert(_counter == 0, "Push/pop pair mismatch"); >> } >> >> void push(Register reg, bool has_matching_pop) { >> if (has_matching_pop && VM_Version::supports_apx_f()) { >> Assembler::pushp(reg); >> incrementCounter(); >> } else { >> Assembler::push(reg); >> } >> } >> void pop(Register reg, bool has_matching_push) { >> if (has_matching_push && VM_Version::supports_apx_f()) { >> Assembler::popp(reg); >> decrementCounter(); >> } else { >> Assembler::pop(reg); >> } >> } >> void incrementCounter() { >> _counter++; >> } >> void decrementCounter() { >> _counter--; >> } >> } > >> For a cleaner interface, I think we can also maintain a RAII style APXPushPopPairTracker ... > > Using the suggested code as a base, Vamsi and I tinkered with the idea some more! Here is what we came up with. This also tracks the correct order of registers being pushed/poped.. (haven't compiled it, so might have some syntax bugs). > > @dholmes-ora would you mind sharing your opinion? We seem to be making things more complicated, but hopefully in a good way? > > Also included a sample usage in a stub. 
>
>
> #define __ _masm->
>
> class PushPopTracker {
> private:
>   int _counter;
>   MacroAssembler *_masm;
>   const int REGS = 32; // Increase as needed
>   int regs[REGS];
> public:
>   PushPopTracker(MacroAssembler *_masm) : _counter(0), _masm(_masm) {}
>   ~PushPopTracker() {
>     assert(_counter == 0, "Push/pop pair mismatch");
>   }
>
>   void push(Register reg) {
>     assert(_counter < REGS, /*...*/);
>     regs[_counter++] = reg.encoding();
>     if (VM_Version::supports_apx_f()) {
>       __ pushp(reg);
>     } else {
>       __ push(reg);
>     }
>   }
>   void pop(Register reg) {
>     assert(_counter>0, "Push/pop underflow");
>     assert(regs[_counter] == reg.encoding(), "Push/pop pair mismatch: %d != %d", regs[_counter], reg.encoding());
>     _counter--;
>     if (VM_Version::supports_apx_f()) {
>       __ popp(reg);
>     } else {
>       __ pop(reg);
>     }
>   }
> }
>
> address StubGenerator::generate_intpoly_montgomeryMult_P256() {
>   __ align(CodeEntryAlignment);
>   /*...*/
>   address start = __ pc();
>   __ enter();
>   PushPopTracker s(_masm);
>   s.push(r12); //1
>   s.push(r13); //2
>   s.push(r14); //3
> #ifdef _WIN64
>   s.push(rsi); //4
>   s.push(rdi); //5
> #endif
>   s.push(rbp); //6
>   __ movq(rbp, rsp);
>   __ andq(rsp, -32);
>   __ subptr(rsp, 32);
>   // Register Map
>   const Register aLimbs = c_rarg0; // c_rarg0: rdi | rcx
>   const Register bLimbs = rsi; // c_rarg1: rsi | rdx
>   const Register rLimbs = r8; // c_rarg2: rdx | r8
>   const Register tmp1 = r9;
>   const Register tmp2 = r10;
>   /*...*/
>   __ movq(rsp, rbp);
>   s.pop(rbp); //5
> #ifdef _WIN64
>   s.pop(rdi); //4
>   s.pop(rsi); //3
> #endif
>   s.pop(r14); //2
>   s.pop(r13); //1
>   s.pop(r12); //0
>   __ leave();
>   __ ret(0);
>   return start;
> }

@vamsi-parasa, It's better to make this a subclass of MacroAssembler in src/hotspot/cpu/x86/macroAssembler_x86.hpp and pass Tracker as an argument to push / pop for a cleaner interface.
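
For readers following the tracker discussion above, here is a minimal standalone sketch of the RAII idea. It is not the code from the PR: `MockAssembler`, the plain `int` register encodings, and the `use_ppx` flag are hypothetical stand-ins for `MacroAssembler`, `Register`, and `VM_Version::supports_apx_f()`, and the overflow message is an assumption. Unlike the uncompiled draft, it decrements the counter before comparing, so the check looks at the most recently pushed register.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for MacroAssembler: records mnemonics instead of encoding them.
struct MockAssembler {
  std::vector<std::string> emitted;
  void push(int reg)  { emitted.push_back("push r"  + std::to_string(reg)); }
  void pop(int reg)   { emitted.push_back("pop r"   + std::to_string(reg)); }
  void pushp(int reg) { emitted.push_back("pushp r" + std::to_string(reg)); }
  void popp(int reg)  { emitted.push_back("popp r"  + std::to_string(reg)); }
};

class PushPopTracker {
  static constexpr int REGS = 32;   // capacity of the tracking stack
  MockAssembler* _masm;
  bool _use_ppx;                    // stands in for VM_Version::supports_apx_f()
  int _counter = 0;
  int _regs[REGS];

public:
  PushPopTracker(MockAssembler* masm, bool use_ppx)
    : _masm(masm), _use_ppx(use_ppx) {}

  // Runs on every exit from the enclosing scope; fires if pushes and pops are unbalanced.
  ~PushPopTracker() { assert(_counter == 0 && "Push/pop pair mismatch"); }

  void push(int reg) {
    assert(_counter < REGS && "Push/pop overflow");
    _regs[_counter++] = reg;
    if (_use_ppx) { _masm->pushp(reg); } else { _masm->push(reg); }
  }

  void pop(int reg) {
    assert(_counter > 0 && "Push/pop underflow");
    _counter--;  // step back to the most recent push before comparing
    assert(_regs[_counter] == reg && "Push/pop order mismatch");
    if (_use_ppx) { _masm->popp(reg); } else { _masm->pop(reg); }
  }

  int depth() const { return _counter; }
};
```

Declaring the tracker as a local variable in a stub generator and routing every push/pop through it would make a missing or misordered pop trip an assert in debug builds, while the emitted instruction stream stays the same as with direct calls.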
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25889#discussion_r2180636365 From sviswanathan at openjdk.org Wed Jul 2 17:49:49 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 2 Jul 2025 17:49:49 GMT Subject: RFR: 8360116: Add support for AVX10 floating point minmax instruction [v6] In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 01:57:46 GMT, Jatin Bhateja wrote: >> Intel@ AVX10 ISA [1] extensions added new floating point MIN/MAX instructions which comply with definitions in IEEE-754-2019 standard section 9.6 and can directly emulate Math.min/max semantics without the need for any special handling for NaN, +0.0 or -0.0 detection. >> >> **The following pseudo-code describes the existing algorithm for min/max[FD]:** >> >> Move the non-negative value to the second operand; this will ensure that we correctly handle 0.0 and -0.0 values, if values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. Existing MINPS and MAXPS semantics only check for NaN as the second operand; hence, we need special handling to check for NaN at the first operand. >> >> btmp = (b < +0.0) ? a : b >> atmp = (b < +0.0) ? b : a >> Tmp = Max_Float(atmp , btmp) >> Res = (atmp == NaN) ? atmp : Tmp >> >> For min[FD] we need a small tweak in the above algorithm, i.e., move the non-negative value to the first operand, this will ensure that we correctly select -0.0 if both the operands being compared are 0.0 or -0.0. >> >> btmp = (b < +0.0) ? b : a >> atmp = (b < +0.0) ? a : b >> Tmp = Min_Float(atmp , btmp) >> Res = (atmp == NaN) ? atmp : Tmp >> >> Thus, we need additional special handling for NaNs and +/-0.0 to compute floating-point min/max values to comply with the semantics of Math.max/min APIs using existing MINPS / MAXPS instructions.
AVX10.2 added a new instruction, VPMINMAX[SH,SS,SD]/[PH,PS,PD], which comprehensively handles special cases, thereby eliminating the need for special handling. >> >> Patch emits new instructions for reduction and non-reduction operations for single, double, and Float16 type. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10 > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Sandhya's review comments resolution Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25914#pullrequestreview-2980096085 From jbhateja at openjdk.org Wed Jul 2 17:49:49 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Jul 2025 17:49:49 GMT Subject: RFR: 8360116: Add support for AVX10 floating point minmax instruction [v6] In-Reply-To: References: Message-ID: <69bq-sgmNdZBGkcLyGo1dccJoCcC04FacUZW4CPHqkE=.ab942681-27ab-4ed6-b425-66b8487b9ab8@github.com> On Wed, 2 Jul 2025 17:45:29 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Sandhya's review comments resolution > > Looks good to me. Thanks @sviswa7 and @mhaessig for approvals. 
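
The pre-AVX10.2 blend sequence quoted in this thread can be modelled in scalar C++ as a rough sketch. The helpers `hw_max` and `hw_min` are hypothetical models of the legacy MAXSS/MINSS behaviour described above (the second operand is returned when the inputs compare equal or either is NaN), and the `(b < +0.0)` test is assumed to be a sign-bit test so that -0.0 counts as negative. The min variant uses a min operation in the third step.

```cpp
#include <cmath>

// Model of legacy x86 max: the first operand wins only if strictly greater.
inline float hw_max(float a, float b) { return (a > b) ? a : b; }

// Model of legacy x86 min: the first operand wins only if strictly smaller.
inline float hw_min(float a, float b) { return (a < b) ? a : b; }

// Math.max semantics: +0.0 beats -0.0, and NaN in either operand propagates.
inline float java_max(float a, float b) {
  bool b_neg = std::signbit(b);       // assumed sign-bit blend mask
  float btmp = b_neg ? a : b;         // non-negative value goes second
  float atmp = b_neg ? b : a;
  float tmp  = hw_max(atmp, btmp);    // already covers NaN in the second operand
  return std::isnan(atmp) ? atmp : tmp;  // fix-up for NaN in the first operand
}

// Math.min semantics: -0.0 beats +0.0, and NaN in either operand propagates.
inline float java_min(float a, float b) {
  bool b_neg = std::signbit(b);
  float btmp = b_neg ? b : a;         // negative value goes second
  float atmp = b_neg ? a : b;
  float tmp  = hw_min(atmp, btmp);
  return std::isnan(atmp) ? atmp : tmp;
}
```

The two blends and the final NaN select are the extra work that a single min/max instruction with IEEE-754-2019 semantics would make unnecessary.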
------------- PR Comment: https://git.openjdk.org/jdk/pull/25914#issuecomment-3028779791 From jbhateja at openjdk.org Wed Jul 2 17:49:50 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Jul 2025 17:49:50 GMT Subject: Integrated: 8360116: Add support for AVX10 floating point minmax instruction In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 11:08:54 GMT, Jatin Bhateja wrote: > Intel@ AVX10 ISA [1] extensions added new floating point MIN/MAX instructions which comply with definitions in IEEE-754-2019 standard section 9.6 and can directly emulate Math.min/max semantics without the need for any special handling for NaN, +0.0 or -0.0 detection. > > **The following pseudo-code describes the existing algorithm for min/max[FD]:** > > Move the non-negative value to the second operand; this will ensure that we correctly handle 0.0 and -0.0 values, if values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. Existing MINPS and MAXPS semantics only check for NaN as the second operand; hence, we need special handling to check for NaN at the first operand. > > btmp = (b < +0.0) ? a : b > atmp = (b < +0.0) ? b : a > Tmp = Max_Float(atmp , btmp) > Res = (atmp == NaN) ? atmp : Tmp > > For min[FD] we need a small tweak in the above algorithm, i.e., move the non-negative value to the first operand, this will ensure that we correctly select -0.0 if both the operands being compared are 0.0 or -0.0. > > btmp = (b < +0.0) ? b : a > atmp = (b < +0.0) ? a : b > Tmp = Min_Float(atmp , btmp) > Res = (atmp == NaN) ? atmp : Tmp > > Thus, we need additional special handling for NaNs and +/-0.0 to compute floating-point min/max values to comply with the semantics of Math.max/min APIs using existing MINPS / MAXPS instructions. AVX10.2 added a new instruction, VPMINMAX[SH,SS,SD]/[PH,PS,PD], which comprehensively handles special cases, thereby eliminating the need for special handling.
> > Patch emits new instructions for reduction and non-reduction operations for single, double, and Float16 type. > > Kindly review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10 This pull request has now been integrated. Changeset: 5e30bf68 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/5e30bf68353d989aadc2d8176181226b2debd283 Stats: 465 lines in 7 files changed: 423 ins; 4 del; 38 mod 8360116: Add support for AVX10 floating point minmax instruction Reviewed-by: mhaessig, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/25914 From asmehra at openjdk.org Wed Jul 2 17:52:48 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 2 Jul 2025 17:52:48 GMT Subject: [jdk25] Integrated: 8361101: AOTCodeAddressTable::_stubs_addr not initialized/freed properly In-Reply-To: References: Message-ID: <5B6SRjnVrt014BK6iJT8kEIv_qoyJ74xh0bE5VCVoOg=.8fabdd60-a675-4741-b741-08b6c5f44b99@github.com> On Wed, 2 Jul 2025 14:56:23 GMT, Ashutosh Mehra wrote: > Backporting the fix to jdk25 This pull request has now been integrated. 
Changeset: ab013962 Author: Ashutosh Mehra URL: https://git.openjdk.org/jdk/commit/ab013962093a427ae0f2acac82748d0c9f86ab3f Stats: 4 lines in 2 files changed: 4 ins; 0 del; 0 mod 8361101: AOTCodeAddressTable::_stubs_addr not initialized/freed properly Reviewed-by: shade Backport-of: 3066a67e6279f7e3896ab545bc6c291d279d2b03 ------------- PR: https://git.openjdk.org/jdk/pull/26095 From asmehra at openjdk.org Wed Jul 2 18:01:44 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 2 Jul 2025 18:01:44 GMT Subject: [jdk25] RFR: 8361101: AOTCodeAddressTable::_stubs_addr not initialized/freed properly In-Reply-To: References: Message-ID: <_2YlrJyouZjttbLFcWchpFVh-fRdt6p6crYJEND1kH8=.b14e5332-225c-49b0-b50a-22a9163cdd73@github.com> On Wed, 2 Jul 2025 15:57:43 GMT, Aleksey Shipilev wrote: >> Backporting the fix to jdk25 > > Marked as reviewed by shade (Reviewer). Thanks @shipilev for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26095#issuecomment-3028825791 From vpaprotski at openjdk.org Wed Jul 2 18:35:40 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 2 Jul 2025 18:35:40 GMT Subject: RFR: 8359965: Enable paired pushp and popp instruction usage for APX enabled CPUs In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 17:44:34 GMT, Jatin Bhateja wrote: > @vamsi-parasa, It's better to make this as a subclass of MacroAssembler in src/hotspot/cpu/x86/macroAssembler_x86.hpp and pass Tracker as an argument to push / pop for a cleaner interface. I don't think its possible? Unless I am missing something.. - Subclass has an instance of the base class (i.e. the memory allocation of `PushPopTracker` would have the `MacroAssembler` base class with extra fields appended); and `MacroAssembler` has already been allocated (i.e. you can't tack on more fields onto the end of the underlying memory of existing `MacroAssembler`..) 
- If its a subclass, there is no reason to pass it as a parameter, because it already would have the parent's instance? Also, the extra parameter to push/pop (flag) was what I had originally objected to? (i.e. would like for push/pop to still just take one register as a parameter..) - This class is sort of a stripped-down implementation of reference counting; we want the stack-allocated variable (i.e. explicit constructor call) and the implicit destructor calls (i.e. inserted by g++ on all function exits). That is, we must have a stack allocated variable for it to be deallocated (and destructor called for assert check) Here is an attempt to make it a subclass? And sample usage... class PushPopTracker : public MacroAssembler { private: int _counter; const int REGS = 32; // Increase as needed int regs[REGS]; public: // MacroAssembler(CodeBuffer* code) is the only constructor? PushPopTracker() : _counter(0), MacroAssembler(???) {} //FIXME??? ~PushPopTracker() { assert(_counter == 0, "Push/pop pair mismatch"); } void push(Register reg) { assert(_counter References: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> Message-ID: On Wed, 2 Jul 2025 10:54:13 GMT, Martin Doerr wrote: > This is a backout of [JDK-8258229](https://bugs.openjdk.org/browse/JDK-8258229) for JDK25 only. The problematic code has already been removed by [JDK-8358821](https://bugs.openjdk.org/browse/JDK-8358821) in JDK26. > > The backout is clean for the C++ code, but the test backout includes the backout of the follow-up change [JDK-8356310](https://bugs.openjdk.org/browse/JDK-8356310). > > Rationale: Minimize risk for JDK25. We should use the better fix JDK-8358821 in the long term. However, that one should get some more stabilization time before backporting it. Also see JBS issue. > > Proposed long term solution: Backport JDK-8358821 to jdk25u and revert this change again after an appropriate time. 
> > Short term: The issue solved by JDK-8258229 is not critical. It should be ok to postpone the fix to jdk25u. LGTM. I'm running Oracle testing now. I'm not sure how to handle JDK-8357017 now in JBS. Close it as a duplicate of the backout? According to https://openjdk.org/guide, it sounds like it might have been more correct to use JDK-8357017 for the backout, and make it a subtask of JDK-8258229. @TobiHartmann @JesperIRL , what do you think? ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26091#pullrequestreview-2980343622 From mdoerr at openjdk.org Wed Jul 2 19:29:39 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 2 Jul 2025 19:29:39 GMT Subject: [jdk25] RFR: 8361259: JDK25: Backout JDK-8258229 In-Reply-To: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> References: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> Message-ID: <0t_ct-w4lOpvbe4c8DJD9jgU-VRgbMRSVG_ibd8lpkU=.4ebc56ab-9d1c-4531-98fc-4bca442434b9@github.com> On Wed, 2 Jul 2025 10:54:13 GMT, Martin Doerr wrote: > This is a backout of [JDK-8258229](https://bugs.openjdk.org/browse/JDK-8258229) for JDK25 only. The problematic code has already been removed by [JDK-8358821](https://bugs.openjdk.org/browse/JDK-8358821) in JDK26. > > The backout is clean for the C++ code, but the test backout includes the backout of the follow-up change [JDK-8356310](https://bugs.openjdk.org/browse/JDK-8356310). > > Rationale: Minimize risk for JDK25. We should use the better fix JDK-8358821 in the long term. However, that one should get some more stabilization time before backporting it. Also see JBS issue. > > Proposed long term solution: Backport JDK-8358821 to jdk25u and revert this change again after an appropriate time. > > Short term: The issue solved by JDK-8258229 is not critical. It should be ok to postpone the fix to jdk25u. Thanks for the review! 
[JDK-8357017](https://bugs.openjdk.org/browse/JDK-8357017) will be fixed by [JDK-8361259](https://bugs.openjdk.org/browse/JDK-8361259) in JDK25 and it is fixed by [JDK-8358821](https://bugs.openjdk.org/browse/JDK-8358821) in JDK26. So, my plan is to close JDK-8357017 as fixed referring to the other 2 issues. Does that make sense? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26091#issuecomment-3029079372 From dlong at openjdk.org Wed Jul 2 20:13:41 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 2 Jul 2025 20:13:41 GMT Subject: [jdk25] RFR: 8361259: JDK25: Backout JDK-8258229 In-Reply-To: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> References: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> Message-ID: <9UlIKIQGn3vum3P71THXlyJwJ1efJmJNlImCpYErex8=.e794eff7-516d-4c5f-8e02-f15e5b34cba6@github.com> On Wed, 2 Jul 2025 10:54:13 GMT, Martin Doerr wrote: > This is a backout of [JDK-8258229](https://bugs.openjdk.org/browse/JDK-8258229) for JDK25 only. The problematic code has already been removed by [JDK-8358821](https://bugs.openjdk.org/browse/JDK-8358821) in JDK26. > > The backout is clean for the C++ code, but the test backout includes the backout of the follow-up change [JDK-8356310](https://bugs.openjdk.org/browse/JDK-8356310). > > Rationale: Minimize risk for JDK25. We should use the better fix JDK-8358821 in the long term. However, that one should get some more stabilization time before backporting it. Also see JBS issue. > > Proposed long term solution: Backport JDK-8358821 to jdk25u and revert this change again after an appropriate time. > > Short term: The issue solved by JDK-8258229 is not critical. It should be ok to postpone the fix to jdk25u. Makes sense, but according to the Developers' Guide, we can't do that because "A Bug or Enhancement with resolution Fixed is required to have a corresponding changeset in one of the OpenJDK repositories." 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26091#issuecomment-3029184612 From duke at openjdk.org Wed Jul 2 20:50:51 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 2 Jul 2025 20:50:51 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v32] In-Reply-To: <73AnlXOv0T8K25DgsNdH1PkBjcBXz0f3bBYZx44LpAw=.439f5383-ffd1-44e8-9e11-4b5af9b6a278@github.com> References: <73AnlXOv0T8K25DgsNdH1PkBjcBXz0f3bBYZx44LpAw=.439f5383-ffd1-44e8-9e11-4b5af9b6a278@github.com> Message-ID: <3f1UnDuYp2iYVcciKF-BqdChOOY2PJJG5R0QuyfblVM=.37a92dfd-5de4-4924-83c5-f9c2e5d7548c@github.com> On Tue, 1 Jul 2025 11:24:09 GMT, Evgeny Astigeevich wrote: >> Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: >> >> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final >> - Update how call sites are fixed >> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final >> - Fix pointer printing >> - Use set_destination_mt_safe >> - Print address as pointer >> - Use new _metadata_size instead of _jvmci_data_size >> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final >> - Only check branch distance for aarch64 and riscv >> - Move far branch fix to fix_relocation_after_move >> - ... and 80 more: https://git.openjdk.org/jdk/compare/f799cf18...70e4164e > > src/hotspot/share/code/nmethod.cpp line 1653: > >> 1651: } >> 1652: } >> 1653: } > > Do we need this code? Shouldn't missing trampolined be caught during fixing call sites? If fixing call sites fails (like in the event of a missing trampoline) an assert will fail and the JVM will crash. I suppose it could be updated to abandon the relocation if that happens but that would require `fix_relocation_after_move` to return if it succeeded and proper handling by the caller. 
> test/hotspot/jtreg/vmTestbase/nsk/jvmti/NMethodRelocation/nmethodrelocation.java line 37: > >> 35: import jdk.test.whitebox.code.BlobType; >> 36: >> 37: public class nmethodrelocation extends DebugeeClass { > > Why is the class name not following the Java code conventions? I was following the naming conventions of other JVMTI tests. https://github.com/openjdk/jdk/tree/master/test/hotspot/jtreg/vmTestbase/nsk/jvmti ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2180937766 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2180943465 From duke at openjdk.org Wed Jul 2 20:50:52 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 2 Jul 2025 20:50:52 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v32] In-Reply-To: <3f1UnDuYp2iYVcciKF-BqdChOOY2PJJG5R0QuyfblVM=.37a92dfd-5de4-4924-83c5-f9c2e5d7548c@github.com> References: <73AnlXOv0T8K25DgsNdH1PkBjcBXz0f3bBYZx44LpAw=.439f5383-ffd1-44e8-9e11-4b5af9b6a278@github.com> <3f1UnDuYp2iYVcciKF-BqdChOOY2PJJG5R0QuyfblVM=.37a92dfd-5de4-4924-83c5-f9c2e5d7548c@github.com> Message-ID: On Wed, 2 Jul 2025 20:43:35 GMT, Chad Rakoczy wrote: >> src/hotspot/share/code/nmethod.cpp line 1653: >> >>> 1651: } >>> 1652: } >>> 1653: } >> >> Do we need this code? Shouldn't missing trampolined be caught during fixing call sites? > > If fixing call sites fails (like in the event of a missing trampoline) an assert will fail and the JVM will crash. I suppose it could be updated to abandon the relocation if that happens but that would require `fix_relocation_after_move` to return if it succeeded and proper handling by the caller. This is only an issue because Hotspot reduces the branch range for debug builds on aarch64 and Graal doesn't. 
If we're going to handle this case, I think we should fail fast, but it does raise the question of what should actually be done in this situation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2180940888 From bulasevich at openjdk.org Wed Jul 2 21:18:46 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 2 Jul 2025 21:18:46 GMT Subject: Integrated: 8358183: [JVMCI] crash accessing nmethod::jvmci_name in CodeCache::aggregate In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 06:39:18 GMT, Boris Ulasevich wrote: > This change addresses an intermittent crash in CompileBroker::print_heapinfo() when accessing JVMCI metadata after a CodeBlob::purge(). > > The issue is a regression after: > - JDK-8343789: JVMCI metadata was moved from nmethod into a separate blob. > - JDK-8352112: CodeBlob::purge() was updated to set _mutable_data to blob_end(). > > The change zeroes out _mutable_data_size, _relocation_size, and _metadata_size in purge() so that after purge jvmci_data_size() returns 0 and CompileBroker::print_heapinfo() won't touch an invalid _metadata. This pull request has now been integrated.
Changeset: 74822ce1 Author: Boris Ulasevich URL: https://git.openjdk.org/jdk/commit/74822ce12acaf9816aa49b75ab5817ced3710776 Stats: 3 lines in 2 files changed: 3 ins; 0 del; 0 mod 8358183: [JVMCI] crash accessing nmethod::jvmci_name in CodeCache::aggregate Reviewed-by: eastigeevich, phh ------------- PR: https://git.openjdk.org/jdk/pull/25608 From mdoerr at openjdk.org Wed Jul 2 21:43:39 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 2 Jul 2025 21:43:39 GMT Subject: [jdk25] RFR: 8361259: JDK25: Backout JDK-8258229 In-Reply-To: <9UlIKIQGn3vum3P71THXlyJwJ1efJmJNlImCpYErex8=.e794eff7-516d-4c5f-8e02-f15e5b34cba6@github.com> References: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> <9UlIKIQGn3vum3P71THXlyJwJ1efJmJNlImCpYErex8=.e794eff7-516d-4c5f-8e02-f15e5b34cba6@github.com> Message-ID: On Wed, 2 Jul 2025 20:11:16 GMT, Dean Long wrote: > Makes sense, but according to the Developers' Guide, we can't do that because "A Bug or Enhancement with resolution Fixed is required to have a corresponding changeset in one of the OpenJDK repositories." https://github.com/openjdk/jdk/commit/cf75f1f9c6d2bc70c7133cb81c73a0ce0946dff9 is a corresponding changeset. We can link it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26091#issuecomment-3029400997 From duke at openjdk.org Wed Jul 2 22:11:41 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 2 Jul 2025 22:11:41 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v33] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed.
The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change only slightly modifies existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality > > Additional Testing: > - [ ] Linux x64 fastdebug all > - [ ] Linux aarch64 fastdebug all > - [ ] ... Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: - Enclose ImmutableDataReferencesCounterSize in parentheses - Let trampolines fix their owners ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/70e4164e..c3245fb7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=31-32 Stats: 62 lines in 13 files changed: 11 ins; 19 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Wed Jul 2 22:24:07 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 2 Jul 2025 22:24:07 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v34] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. 
> > This change only slightly modifies existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality > > Additional Testing: > - [ ] Linux x64 fastdebug all > - [ ] Linux aarch64 fastdebug all > - [ ] ... Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Update justification for skipping CallRelocation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/c3245fb7..0f4ff964 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=32-33 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Wed Jul 2 22:24:09 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 2 Jul 2025 22:24:09 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v32] In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 08:37:32 GMT, Andrew Haley wrote: >> Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: >> >> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final >> - Update how call sites are fixed >> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final >> - Fix pointer printing >> - Use set_destination_mt_safe >> - Print address as pointer >> - Use new _metadata_size instead of _jvmci_data_size >> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final >> - Only check branch distance for aarch64 and riscv >> - Move far branch fix to fix_relocation_after_move >> - ... 
and 80 more: https://git.openjdk.org/jdk/compare/f799cf18...70e4164e > > src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 84: > >> 82: if (NativeCall::is_call_at(addr())) { >> 83: NativeCall* call = nativeCall_at(addr()); >> 84: if (be_safe) { > > Why is this change necessary? The original motivation was to address far call sites. After relocation, some calls that previously didn't require a trampoline might now need one, hence the introduction of the `be_safe` parameter. However, upon further review, this change is unnecessary. The method `trampoline_stub_Relocation::fix_relocation_after_move` already updates the owner and contains the logic to determine whether a direct call can be performed. Therefore, we can skip invoking `CallRelocation::fix_relocation_after_move` for calls that use trampolines, as all required adjustments will be handled correctly by the trampoline relocations. ([Reference](https://github.com/chadrako/jdk/blob/0f4ff9646d1f7f43214c5ccd4bbe572fffd08d16/src/hotspot/share/code/nmethod.cpp#L1547-L1556)) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2181076900 From sviswanathan at openjdk.org Wed Jul 2 23:05:42 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 2 Jul 2025 23:05:42 GMT Subject: RFR: 8361037: [ubsan] compiler/c2/irTests/TestFloat16ScalarOperations division by 0 [v3] In-Reply-To: <6cWhCvx8g-Gx4VoBHW1wA7atsa_Eq5wBhkDolUbP_X0=.31f8e688-7401-4f81-9b50-46b1997e96b5@github.com> References: <6cWhCvx8g-Gx4VoBHW1wA7atsa_Eq5wBhkDolUbP_X0=.31f8e688-7401-4f81-9b50-46b1997e96b5@github.com> Message-ID: On Tue, 1 Jul 2025 13:36:20 GMT, Jatin Bhateja wrote: >> Floating point division by zero is undefined per the C and C++ standards, but is defined by Clang (and by ISO/IEC/IEEE 60559 / IEEE 754) as producing either an infinity or NaN value. 
>> >> While Java semantics defined in section 15.17.2 "Division Operator" of JLS-24 are well-defined for these constant-folding scenarios >> >> This bug fix patch fixes division by 0 error reported after integration of [JDK-8352635.](https://bugs.openjdk.org/browse/JDK-8352635) >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding comments Looks good to me. It will be good to get second review. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26062#pullrequestreview-2980870863 From sparasa at openjdk.org Wed Jul 2 23:32:41 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 2 Jul 2025 23:32:41 GMT Subject: RFR: 8359965: Enable paired pushp and popp instruction usage for APX enabled CPUs In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 17:44:34 GMT, Jatin Bhateja wrote: >>> For a cleaner interface, I think we can also maintain a RAII style APXPushPopPairTracker ... >> >> Using the suggested code as a base, Vamsi and I tinkered with the idea some more! Here is what we came up with. This also tracks the correct order of registers being pushed/poped.. (haven't compiled it, so might have some syntax bugs). >> >> @dholmes-ora would you mind sharing your opinion? We seem to be making things more complicated, but hopefully in a good way? >> >> Also included a sample usage in a stub. 
>>
>>
>> #define __ _masm->
>>
>> class PushPopTracker {
>> private:
>>   int _counter;
>>   MacroAssembler *_masm;
>>   const int REGS = 32; // Increase as needed
>>   int regs[REGS];
>> public:
>>   PushPopTracker(MacroAssembler *_masm) : _counter(0), _masm(_masm) {}
>>   ~PushPopTracker() {
>>     assert(_counter == 0, "Push/pop pair mismatch");
>>   }
>>
>>   void push(Register reg) {
>>     assert(_counter < REGS, "...");
>>     regs[_counter++] = reg.encoding();
>>     if (VM_Version::supports_apx_f()) {
>>       __ pushp(reg);
>>     } else {
>>       __ push(reg);
>>     }
>>   }
>>   void pop(Register reg) {
>>     assert(_counter>0, "Push/pop underflow");
>>     assert(regs[_counter] == reg.encoding(), "Push/pop pair mismatch: %d != %d", regs[_counter], reg.encoding());
>>     _counter--;
>>     if (VM_Version::supports_apx_f()) {
>>       __ popp(reg);
>>     } else {
>>       __ pop(reg);
>>     }
>>   }
>> }
>>
>> address StubGenerator::generate_intpoly_montgomeryMult_P256() {
>>   __ align(CodeEntryAlignment);
>>   /*...*/
>>   address start = __ pc();
>>   __ enter();
>>   PushPopTracker s(_masm);
>>   s.push(r12); //1
>>   s.push(r13); //2
>>   s.push(r14); //3
>> #ifdef _WIN64
>>   s.push(rsi); //4
>>   s.push(rdi); //5
>> #endif
>>   s.push(rbp); //6
>>   __ movq(rbp, rsp);
>>   __ andq(rsp, -32);
>>   __ subptr(rsp, 32);
>>   // Register Map
>>   const Register aLimbs = c_rarg0; // c_rarg0: rdi | rcx
>>   const Register bLimbs = rsi; // c_rarg1: rsi | rdx
>>   const Register rLimbs = r8; // c_rarg2: rdx | r8
>>   const Register tmp1 = r9;
>>   const Register tmp2 = r10;
>>   /*...*/
>>   __ movq(rsp, rbp);
>>   s.pop(rbp); //5
>> #ifdef _WIN64
>>   s.pop(rdi); //4
>>   s.pop(rsi); //3
>> #endif
>>   s.pop(r14); //2
>>   s.pop(r13); //1
>>   s.pop(r12); //0
>>   __ leave();
>>   __ ret(0);
>>   return start;
>> }
>
> @vamsi-parasa, It's better to make this as a subclass of MacroAssembler in src/hotspot/cpu/x86/macroAssembler_x86.hpp and pass Tracker as an argument to push / pop for a cleaner interface.

Hi Jatin (@jatin-bhateja) and Vlad (@vpaprotsk), There's one more issue to be considered.
The C++ PushPopTracker code will be run at stub generation time. There are code blocks which do a single push onto the stack but, due to multiple exit paths, have multiple pops, as illustrated below. Will this reference counting approach not fail in such a scenario, as the stub code is generated all at once during the stub generation phase?

#begin stack frame
push(r21)
#exit condition 1
pop(r21)
# exit condition 2
pop(r21)

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25889#discussion_r2181146890 From dlong at openjdk.org Wed Jul 2 23:53:39 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 2 Jul 2025 23:53:39 GMT Subject: [jdk25] RFR: 8361259: JDK25: Backout JDK-8258229 In-Reply-To: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> References: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> Message-ID: <1LxESKwrZ2cxtTlNTIKruyyebF-hportTvFYoYc4htY=.207724e0-418f-4289-8190-2545c74fc191@github.com> On Wed, 2 Jul 2025 10:54:13 GMT, Martin Doerr wrote: > This is a backout of [JDK-8258229](https://bugs.openjdk.org/browse/JDK-8258229) for JDK25 only. The problematic code has already been removed by [JDK-8358821](https://bugs.openjdk.org/browse/JDK-8358821) in JDK26. > > The backout is clean for the C++ code, but the test backout includes the backout of the follow-up change [JDK-8356310](https://bugs.openjdk.org/browse/JDK-8356310). > > Rationale: Minimize risk for JDK25. We should use the better fix JDK-8358821 in the long term. However, that one should get some more stabilization time before backporting it. Also see JBS issue. > > Proposed long term solution: Backport JDK-8358821 to jdk25u and revert this change again after an appropriate time. > > Short term: The issue solved by JDK-8258229 is not critical. It should be ok to postpone the fix to jdk25u. Testing results look good. There was one timeout in a jshell test, but it seems unrelated.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26091#issuecomment-3029712997 From duke at openjdk.org Thu Jul 3 01:52:52 2025 From: duke at openjdk.org (erifan) Date: Thu, 3 Jul 2025 01:52:52 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v2] In-Reply-To: References: Message-ID: > If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is relative smaller than that of `fromLong`. This patch does the conversion for these cases if `l` is a compile time constant. > > And this conversion also enables further optimizations that recognize maskAll patterns, see [1]. > > Some JTReg test cases are added to ensure the optimization is effective. > > I tried many different ways to write a JMH benchmark, but failed. Since the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific compile-time constant, the statement will be hoisted out of the loop. If we don't use a loop, the hotspot will become other instructions, and no obvious performance change was observed. However, combined with the optimization of [1], we can observe a performance improvement of about 7% on both aarch64 and x64. > > The patch was tested on both aarch64 and x64, all of tier1 tier2 and tier3 tests passed. > > [1] https://github.com/openjdk/jdk/pull/24674 erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Address some review comments Add support for the following patterns: toLong(maskAll(true)) => (-1ULL >> (64 -vlen)) toLong(maskAll(false)) => 0 And add more test cases. 
- Merge branch 'master' into JDK-8356760 - 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is relative smaller than that of `fromLong`. This patch does the conversion for these cases if `l` is a compile time constant. And this conversion also enables further optimizations that recognize maskAll patterns, see [1]. Some JTReg test cases are added to ensure the optimization is effective. I tried many different ways to write a JMH benchmark, but failed. Since the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific compile-time constant, the statement will be hoisted out of the loop. If we don't use a loop, the hotspot will become other instructions, and no obvious performance change was observed. However, combined with the optimization of [1], we can observe a performance improvement of about 7% on both aarch64 and x64. The patch was tested on both aarch64 and x64, all of tier1 tier2 and tier3 tests passed. 
[1] https://github.com/openjdk/jdk/pull/24674 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25793/files - new: https://git.openjdk.org/jdk/pull/25793/files/38664b06..791e0ab7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25793&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25793&range=00-01 Stats: 24487 lines in 940 files changed: 11237 ins; 8323 del; 4927 mod Patch: https://git.openjdk.org/jdk/pull/25793.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25793/head:pull/25793 PR: https://git.openjdk.org/jdk/pull/25793 From duke at openjdk.org Thu Jul 3 02:00:49 2025 From: duke at openjdk.org (erifan) Date: Thu, 3 Jul 2025 02:00:49 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v2] In-Reply-To: References: Message-ID: <9NNhM-s8jWMJnb_DcTeEzeVBxpIYODi611mDQ-so7DQ=.a238b776-fb3b-43fe-b4ac-782d41c8d9aa@github.com> On Thu, 3 Jul 2025 01:52:52 GMT, erifan wrote: >> If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is relative smaller than that of `fromLong`. This patch does the conversion for these cases if `l` is a compile time constant. >> >> And this conversion also enables further optimizations that recognize maskAll patterns, see [1]. >> >> Some JTReg test cases are added to ensure the optimization is effective. >> >> I tried many different ways to write a JMH benchmark, but failed. Since the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific compile-time constant, the statement will be hoisted out of the loop. If we don't use a loop, the hotspot will become other instructions, and no obvious performance change was observed. However, combined with the optimization of [1], we can observe a performance improvement of about 7% on both aarch64 and x64. 
>> >> The patch was tested on both aarch64 and x64, all of tier1 tier2 and tier3 tests passed. >> >> [1] https://github.com/openjdk/jdk/pull/24674 > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Address some review comments > > Add support for the following patterns: > toLong(maskAll(true)) => (-1ULL >> (64 -vlen)) > toLong(maskAll(false)) => 0 > > And add more test cases. > - Merge branch 'master' into JDK-8356760 > - 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases > > If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would > set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent > to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is > relative smaller than that of `fromLong`. This patch does the conversion > for these cases if `l` is a compile time constant. > > And this conversion also enables further optimizations that recognize > maskAll patterns, see [1]. > > Some JTReg test cases are added to ensure the optimization is effective. > > I tried many different ways to write a JMH benchmark, but failed. Since > the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific > compile-time constant, the statement will be hoisted out of the loop. > If we don't use a loop, the hotspot will become other instructions, and > no obvious performance change was observed. However, combined with the > optimization of [1], we can observe a performance improvement of about > 7% on both aarch64 and x64. > > The patch was tested on both aarch64 and x64, all of tier1 tier2 and > tier3 tests passed. > > [1] https://github.com/openjdk/jdk/pull/24674 Thanks for your review! Would you mind taking another look, thanks! 
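
For readers following the thread, the all-true/all-false check discussed above can be sketched outside HotSpot. This is only an illustration under stated assumptions, not the code in the patch: the names `all_true_bits`, `classify_mask_bits` and `MaskKind` are made up for this example, and `vlen` is assumed to be in the range 1..64. The sketch keeps only the low `vlen` bits of the input and then tests whether they are all clear or all set, using the same `>> (64 - vlen)` shift on an unsigned value that the review comments mention.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of a fromLong-style input; not a HotSpot type.
enum class MaskKind { AllFalse, AllTrue, Mixed };

// Value with the low `vlen` bits set. The operand is unsigned, so the
// right shift fills with zeros. Assumes 1 <= vlen <= 64.
static uint64_t all_true_bits(int vlen) {
    assert(vlen >= 1 && vlen <= 64);
    return ~0ULL >> (64 - vlen);
}

// Classify the low `vlen` bits of `bits`; higher bits are ignored.
static MaskKind classify_mask_bits(uint64_t bits, int vlen) {
    const uint64_t lanes = all_true_bits(vlen);
    const uint64_t low = bits & lanes;
    if (low == 0) {
        return MaskKind::AllFalse;   // candidate for maskAll(false)
    }
    if (low == lanes) {
        return MaskKind::AllTrue;    // candidate for maskAll(true)
    }
    return MaskKind::Mixed;          // no maskAll shortcut applies
}
```

With vlen = 4 in this sketch, an input of 0xF classifies as all-true, 0x0 and 0xF0 as all-false (only the low four bits count), and 0x5 as mixed.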
------------- PR Review: https://git.openjdk.org/jdk/pull/25793#pullrequestreview-2981231350 From duke at openjdk.org Thu Jul 3 02:00:50 2025 From: duke at openjdk.org (erifan) Date: Thu, 3 Jul 2025 02:00:50 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v2] In-Reply-To: <34p1DHverqucTroSmERaeSx94Knl2FMfVWxedlij0JA=.a4ab7090-8a1c-421a-bc4b-7e1c17f03246@github.com> References: <34p1DHverqucTroSmERaeSx94Knl2FMfVWxedlij0JA=.a4ab7090-8a1c-421a-bc4b-7e1c17f03246@github.com> Message-ID: On Fri, 27 Jun 2025 06:04:54 GMT, Xiaohong Gong wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 80: >> >>> 78: return false; >>> 79: } >>> 80: long mask = (0xFFFFFFFFFFFFFFFFULL >> (64 - vlen)); >> >> The higher bits of long input should be cleared. So we should generate an unsigned right shift instead of the signed one? > > I noticed that you used `ULL` suffix. So it should be fine. Please ignore above comment. Thanks! Yeah, thanks~ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25793#discussion_r2181359573 From duke at openjdk.org Thu Jul 3 02:00:51 2025 From: duke at openjdk.org (erifan) Date: Thu, 3 Jul 2025 02:00:51 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v2] In-Reply-To: <1mmwSiX2OCyFw8bKOj6U1yabINpsZiNblYbvAF8l6dM=.00a75235-c87a-4f04-b863-1f6dc046e4e4@github.com> References: <1mmwSiX2OCyFw8bKOj6U1yabINpsZiNblYbvAF8l6dM=.00a75235-c87a-4f04-b863-1f6dc046e4e4@github.com> Message-ID: On Thu, 26 Jun 2025 07:49:28 GMT, Xiaohong Gong wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: >> >> - Address some review comments >> >> Add support for the following patterns: >> toLong(maskAll(true)) => (-1ULL >> (64 -vlen)) >> toLong(maskAll(false)) => 0 >> >> And add more test cases. >> - Merge branch 'master' into JDK-8356760 >> - 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases >> >> If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would >> set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent >> to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is >> relative smaller than that of `fromLong`. This patch does the conversion >> for these cases if `l` is a compile time constant. >> >> And this conversion also enables further optimizations that recognize >> maskAll patterns, see [1]. >> >> Some JTReg test cases are added to ensure the optimization is effective. >> >> I tried many different ways to write a JMH benchmark, but failed. Since >> the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific >> compile-time constant, the statement will be hoisted out of the loop. >> If we don't use a loop, the hotspot will become other instructions, and >> no obvious performance change was observed. However, combined with the >> optimization of [1], we can observe a performance improvement of about >> 7% on both aarch64 and x64. >> >> The patch was tested on both aarch64 and x64, all of tier1 tier2 and >> tier3 tests passed. >> >> [1] https://github.com/openjdk/jdk/pull/24674 > > src/hotspot/share/opto/vectorIntrinsics.cpp line 706: > >> 704: opc = Op_Replicate; >> 705: elem_bt = converted_elem_bt; >> 706: bits = gvn().longcon(bits_type->get_con() == 0L ? 0L : -1L); > > Code style. 
Suggest: > > if (opc == Op_VectorLongToMask && > is_maskall_type(bits_type, num_elem) && > arch_supports_vector(Op_Replicate, num_elem, converted_elem_bt, checkFlags, true /*has_scalar_args*/)) { > opc = Op_Replicate; > elem_bt = converted_elem_bt; > bits = gvn().longcon(bits_type->get_con() == 0L ? 0L : -1L); > } else if ( Done > So if bits = 0xf0, and the vlen = 4, what is the expected mask? This is not possible because the input value has been processed in `VectorMask::fromLong`. See https://github.com/openjdk/jdk/blob/74822ce12acaf9816aa49b75ab5817ced3710776/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMask.java#L242 But for safety, I double-checked the lowest bit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25793#discussion_r2181360080 PR Review Comment: https://git.openjdk.org/jdk/pull/25793#discussion_r2181377150 From duke at openjdk.org Thu Jul 3 02:05:21 2025 From: duke at openjdk.org (hanguanqiang) Date: Thu, 3 Jul 2025 02:05:21 GMT Subject: RFR: 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode Message-ID: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode. Problem: When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash. Root Cause: Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access. Fix: Add a check in do_monitor_exit() to skip monitor unlocking if GenerateSynchronizationCode is false, avoiding invalid monitor access and preventing the crash.
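
The shape of the guard described above can be illustrated with a small stand-alone model. This is a hedged sketch, not the actual change to HotSpot: `ToyParser`, its fields and its methods are hypothetical stand-ins, and the assumption that nothing is recorded on monitor entry when the flag is off is taken from the "no monitor info is available" statement above rather than from the real sources.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for the parser state; this is NOT HotSpot's Parse class.
struct ToyParser {
    bool generate_sync_code;     // models -XX:+/-GenerateSynchronizationCode
    std::vector<int> monitors;   // models monitor info recorded on monitor entry
    int unlocks_emitted = 0;     // counts unlock operations that were "emitted"

    explicit ToyParser(bool gen) : generate_sync_code(gen) {}

    void do_monitor_enter(int obj) {
        if (!generate_sync_code) {
            return;              // assumption: nothing is recorded when disabled
        }
        monitors.push_back(obj);
    }

    void do_monitor_exit() {
        if (!generate_sync_code) {
            return;              // the guard: skip unlocking entirely
        }
        // Without the guard, this check would fire whenever the flag is off.
        assert(!monitors.empty() && "must have a monitor");
        monitors.pop_back();
        ++unlocks_emitted;
    }
};
```

In this model, a parser built with the flag off can see an enter/exit pair without touching the empty monitor list, while a parser built with the flag on still records the monitor and emits one unlock.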
------------- Commit messages: - 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode Changes: https://git.openjdk.org/jdk/pull/26108/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26108&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358568 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26108.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26108/head:pull/26108 PR: https://git.openjdk.org/jdk/pull/26108 From dlong at openjdk.org Thu Jul 3 02:17:38 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 3 Jul 2025 02:17:38 GMT Subject: RFR: 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode In-Reply-To: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> Message-ID: On Thu, 3 Jul 2025 01:59:55 GMT, hanguanqiang wrote: > This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode. > > Problem: > When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash. > > Root Cause: > Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access. > > Fix > Add a check in do_monitor_exit() to skip monitor unlocking if GenerateSynchronizationCode is false, avoiding invalid monitor access and preventing the crash. I don't see the point of trying to support this flag. Can we just get rid of it? I don't think it is ever tested, because testing would surely crash unless the JVM ran as single-threaded somehow, which it doesn't.
Maybe at some point this flag was useful for getting a new port limping along, but I think stubbing sync code would work just as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26108#issuecomment-3030292214 From haosun at openjdk.org Thu Jul 3 02:19:45 2025 From: haosun at openjdk.org (Hao Sun) Date: Thu, 3 Jul 2025 02:19:45 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v10] In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 08:26:00 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. >> >> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. >> >> For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. >> >> For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. >> >> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. 
>> >> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - >> >> >> Benchmark (size) Mode Cnt Gain >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 >> SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 >> SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 >> SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 >> SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 >> SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 >> SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 >> SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 >> SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 >> >> >> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments Overall, looks good to me except several nits. src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5159: > 5157: // consecutive. The match rules for SelectFromTwoVector reserve two consecutive vector registers > 5158: // for src1 and src2. > 5159: // Four combinations of vector registers each for vselect_from_two_vectors_HS_Neon and I suppose the function names are changed now. Should use `select_from_two_vectors_Neon` and `select_from_two_vectors_SVE` instead. src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5199: > 5197: __ select_from_two_vectors_SVE($dst$$FloatRegister, $src1$$FloatRegister, > 5198: $src2$$FloatRegister, $index$$FloatRegister, > 5199: $tmp$$FloatRegister, bt, length_in_bytes); nit: Inside `select_from_two_vectors_SVE()`, `bt` is only used to compute `elemType_to_regVariant(bt)`. 
I suggest using `get_reg_variant(this)` here directly. src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2886: > 2884: bool is_byte = (bt == T_BYTE); > 2885: > 2886: if (is_byte) { Suggestion: if (bt == T_BYTE) { src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2901: > 2899: } > 2900: } else { > 2901: int elemSize = (bt == T_SHORT) ? 2 : 4; nit: use `elem_size` src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2902: > 2900: } else { > 2901: int elemSize = (bt == T_SHORT) ? 2 : 4; > 2902: uint64_t tblOffset = (bt == T_SHORT) ? 0x0100u : 0x03020100u; nit: use `tbl_offset` src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.hpp line 197: > 195: > 196: // Select from a table of two vectors > 197: void select_from_two_vectors_Neon(FloatRegister dst, FloatRegister src1, FloatRegister src2, As for the function name, I suggest using `select_from_two_vectors_(neon|sve)`. E.g., `vector_signum_(neon|sve)` or `vector_round_(neon|sve)` as defined in this file. ------------- Marked as reviewed by haosun (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/23570#pullrequestreview-2978225584 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2179445324 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2181370525 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2181383791 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2181384078 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2181384185 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2179465592 From xgong at openjdk.org Thu Jul 3 02:24:47 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 3 Jul 2025 02:24:47 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v2] In-Reply-To: References: Message-ID: On Thu, 3 Jul 2025 01:52:52 GMT, erifan wrote: >> If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is relative smaller than that of `fromLong`. This patch does the conversion for these cases if `l` is a compile time constant. >> >> And this conversion also enables further optimizations that recognize maskAll patterns, see [1]. >> >> Some JTReg test cases are added to ensure the optimization is effective. >> >> I tried many different ways to write a JMH benchmark, but failed. Since the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific compile-time constant, the statement will be hoisted out of the loop. If we don't use a loop, the hotspot will become other instructions, and no obvious performance change was observed. However, combined with the optimization of [1], we can observe a performance improvement of about 7% on both aarch64 and x64. >> >> The patch was tested on both aarch64 and x64, all of tier1 tier2 and tier3 tests passed. 
>> >> [1] https://github.com/openjdk/jdk/pull/24674 > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Address some review comments > > Add support for the following patterns: > toLong(maskAll(true)) => (-1ULL >> (64 -vlen)) > toLong(maskAll(false)) => 0 > > And add more test cases. > - Merge branch 'master' into JDK-8356760 > - 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases > > If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would > set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent > to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is > relative smaller than that of `fromLong`. This patch does the conversion > for these cases if `l` is a compile time constant. > > And this conversion also enables further optimizations that recognize > maskAll patterns, see [1]. > > Some JTReg test cases are added to ensure the optimization is effective. > > I tried many different ways to write a JMH benchmark, but failed. Since > the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific > compile-time constant, the statement will be hoisted out of the loop. > If we don't use a loop, the hotspot will become other instructions, and > no obvious performance change was observed. However, combined with the > optimization of [1], we can observe a performance improvement of about > 7% on both aarch64 and x64. > > The patch was tested on both aarch64 and x64, all of tier1 tier2 and > tier3 tests passed. > > [1] https://github.com/openjdk/jdk/pull/24674 Looks much better to me. Thanks for your updating! ------------- Marked as reviewed by xgong (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/25793#pullrequestreview-2981322138 From dlong at openjdk.org Thu Jul 3 02:35:45 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 3 Jul 2025 02:35:45 GMT Subject: [jdk25] RFR: 8361259: JDK25: Backout JDK-8258229 In-Reply-To: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> References: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> Message-ID: On Wed, 2 Jul 2025 10:54:13 GMT, Martin Doerr wrote: > This is a backout of [JDK-8258229](https://bugs.openjdk.org/browse/JDK-8258229) for JDK25 only. The problematic code has already been removed by [JDK-8358821](https://bugs.openjdk.org/browse/JDK-8358821) in JDK26. > > The backout is clean for the C++ code, but the test backout includes the backout of the follow-up change [JDK-8356310](https://bugs.openjdk.org/browse/JDK-8356310). > > Rationale: Minimize risk for JDK25. We should use the better fix JDK-8358821 in the long term. However, that one should get some more stabilization time before backporting it. Also see JBS issue. > > Proposed long term solution: Backport JDK-8358821 to jdk25u and revert this change again after an appropriate time. > > Short term: The issue solved by JDK-8258229 is not critical. It should be ok to postpone the fix to jdk25u. > > Makes sense, but according to the Developers' Guide, we can't do that because "A Bug or Enhancement with resolution Fixed is required to have a corresponding changeset in one of the OpenJDK repositories." > > [cf75f1f](https://github.com/openjdk/jdk/commit/cf75f1f9c6d2bc70c7133cb81c73a0ce0946dff9) is a corresponding changeset. We can link it. So two bugs would reference the same changeset, but the changeset only names 8358821? It might be better to close 8357017 as a duplicate instead of as Fixed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26091#issuecomment-3030333773 From duke at openjdk.org Thu Jul 3 03:16:43 2025 From: duke at openjdk.org (hanguanqiang) Date: Thu, 3 Jul 2025 03:16:43 GMT Subject: RFR: 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode In-Reply-To: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> Message-ID: On Thu, 3 Jul 2025 01:59:55 GMT, hanguanqiang wrote: > This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode > > Problem: > When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash. > > Root Cause: > Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access. > > Fix > Add a check in do_monitor_exit() to skip monitor unlocking if GenerateSynchronizationCode is false, avoiding invalid monitor access and preventing the crash. I've investigated some of the earliest versions of the source code, including JDK 6, but was unable to identify the original author of this flag or its intended purpose. In any case, if someone with the authority agrees that this flag is no longer relevant, I'd be glad to take on the task of removing it.
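For readers less familiar with where `do_monitor_exit()` comes into play: a `synchronized` block in Java source is compiled by `javac` to `monitorenter`/`monitorexit` bytecodes, and those are what C2 has to parse. The sketch below is a hypothetical example of such code, made hot by a loop so that it becomes a compilation candidate. It is not the reproducer from the bug report, and whether it triggers the assert with `-XX:-GenerateSynchronizationCode` on a given build has not been verified here.

```java
final class MonitorExitSketch {
    private final Object lock = new Object();
    private int counter;

    // The synchronized block becomes a monitorenter/monitorexit pair in bytecode.
    int increment() {
        synchronized (lock) {
            return ++counter;
        }
    }

    public static void main(String[] args) {
        MonitorExitSketch s = new MonitorExitSketch();
        int last = 0;
        // Call the method often enough that the JIT is likely to compile it.
        for (int i = 0; i < 100_000; i++) {
            last = s.increment();
        }
        System.out.println(last); // 100000
    }
}
```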
------------- PR Comment: https://git.openjdk.org/jdk/pull/26108#issuecomment-3030446171 From jkarthikeyan at openjdk.org Thu Jul 3 03:27:32 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 3 Jul 2025 03:27:32 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v4] In-Reply-To: References: Message-ID: <_sSUlLFhpG8Ton-bIB3u6Nf7YSxb8LQNzngDDLqrwcA=.5c456420-a5bd-406b-8cea-e6d2ac8d74c9@github.com> > Hi all, > This is a small patch that improves the implementation of Value() for `AbsINode` and `AbsLNode` by returning the absolute value of the input range. Most of the logic is trivial except for the special case where `_lo == jint_min/jlong_min` which must return the entire type range when encountered, for which I've added a small proof in the comments. I've also added some unit tests and updated the file to limit IR check platforms with more granularity. > > Thoughts and reviews would be appreciated! Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: - Code review and constant folding test - Merge - Replace uabs usage with ABS - Merge branch 'master' into abs-value - Merge - Improve AbsNode::Value ------------- Changes: https://git.openjdk.org/jdk/pull/23685/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23685&range=03 Stats: 299 lines in 2 files changed: 284 ins; 4 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/23685.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23685/head:pull/23685 PR: https://git.openjdk.org/jdk/pull/23685 From jkarthikeyan at openjdk.org Thu Jul 3 03:34:44 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 3 Jul 2025 03:34:44 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v3] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 11:56:02 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - Replace uabs usage with ABS >> - Merge branch 'master' into abs-value >> - Merge >> - Improve AbsNode::Value > > test/hotspot/jtreg/compiler/c2/irTests/TestIRAbs.java line 333: > >> 331: // [-9, -2] => [2, 9] >> 332: return Math.abs(-((in & 7) + 2)) > 9; >> 333: } > > Could we have some randomized cases here too? Or do we already have them somewhere? I've added support for randomized ranges and if statement folding as suggested in the review comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2181563680 From jkarthikeyan at openjdk.org Thu Jul 3 03:40:43 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 3 Jul 2025 03:40:43 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v3] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 11:57:51 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: >> >> - Replace uabs usage with ABS >> - Merge branch 'master' into abs-value >> - Merge >> - Improve AbsNode::Value > > @jaskarth Nice work! I have a few comments below. > > One is about more randomized tests. I'm thinking about something like this: > > - compute `res = Math.abs(x)` > - Truncate `x` with randomly produced bounds from Generators, like this: `x = Math.max(lo, Math.min(hi, x))`. > - Below, add all sorts of comparisons with random constants, like this: `if (res < CON) { sum += 1; }`. If the output range is wrong, this could wrongly constant fold, and allow us to catch that. > - Then fuzz the generated method a few times with random inputs for `x`, and check that the sum and res value are the same for compiled and interpreted code. > > I hope that makes sense :) > This is currently my best method to check if ranges are correct, and I think it is quite important because often tests are only written with constants in mind, but less so with ranges, and then we mess up the ranges because it is just too tricky. > > This is an example, where I asked someone to try this out as well: > https://github.com/openjdk/jdk/pull/23089/files#diff-12bebea175a260a6ab62c22a3681ccae0c3d9027900d2fdbd8c5e856ae7d1123R404-R422 @eme64 Thanks for the review and comments! The method of checking for constant folding with if statements and range filtering you mentioned is pretty clever. I've adapted it to the test and added it to the PR. Let me know what you think! > src/hotspot/share/opto/subnode.cpp line 1947: > >> 1945: >> 1946: return IntegerType::make(ABS(t->get_con())); >> 1947: } > > We used `uabs` before, what prevents you from doing that now? I guess you would need a templated version, hmm. Could be worth looking into creating one. 
There was an earlier discussion in the review: https://github.com/openjdk/jdk/pull/23685#discussion_r1972735806 Essentially, the implementation of `uabs` relies on converting ints/longs from signed to unsigned which is implementation-defined until C++20. I believe the implementation works as expected on most platforms, but to be cautious I thought it would be better to just handle it manually to avoid any potential problems. We should revisit when we're at C++20 ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23685#issuecomment-3030523657 PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2181570566 From dholmes at openjdk.org Thu Jul 3 04:43:39 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 3 Jul 2025 04:43:39 GMT Subject: RFR: 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode In-Reply-To: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> Message-ID: On Thu, 3 Jul 2025 01:59:55 GMT, hanguanqiang wrote: > This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode > > Problem: > When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash. > > Root Cause: > Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access. > > Fix > Add a check in do_monitor_exit() to skip monitor unlocking if GenerateSynchronizationCode is false, avoiding invalid monitor access and preventing the crash. The patch seems reasonable from a backporting perspective.
Though it does beg the question as to why `do_monitor_enter` does not need the same fix. I suspect this is a very old flag and the code has bit-rotted somewhat. A question for the compiler folk: does `GenerateSynchronizationCode` still have any use or should it be scrapped? Thanks ------------- PR Review: https://git.openjdk.org/jdk/pull/26108#pullrequestreview-2981633438 From haosun at openjdk.org Thu Jul 3 04:47:41 2025 From: haosun at openjdk.org (Hao Sun) Date: Thu, 3 Jul 2025 04:47:41 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v6] In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 08:48:59 GMT, Mikhail Ablakatov wrote: >> Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used. >> >> Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is used directly still. >> >> The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reduction micro-benchmarks.
>>
>> Benchmarks results:
>>
>> Neoverse-V1 (SVE 256-bit)
>>
>> Benchmark                 (size)  Mode     master         PR  Units
>> ByteMaxVector.MULLanes      1024  thrpt  5447.643  11455.535  ops/ms
>> ShortMaxVector.MULLanes     1024  thrpt  3388.183   7144.301  ops/ms
>> IntMaxVector.MULLanes       1024  thrpt  3010.974   4911.485  ops/ms
>> LongMaxVector.MULLanes      1024  thrpt  1539.137   2562.835  ops/ms
>> FloatMaxVector.MULLanes     1024  thrpt  1355.551   4158.128  ops/ms
>> DoubleMaxVector.MULLanes    1024  thrpt  1715.854   3284.189  ops/ms
>>
>>
>> Fujitsu A64FX (SVE 512-bit):
>>
>> Benchmark                 (size)  Mode     master         PR  Units
>> ByteMaxVector.MULLanes      1024  thrpt  1091.692   2887.798  ops/ms
>> ShortMaxVector.MULLanes     1024  thrpt   597.008   1863.338  ops/ms
>> IntMaxVector.MULLanes       1024  thrpt   510.642   1348.651  ops/ms
>> LongMaxVector.MULLanes      1024  thrpt   468.878    878.620  ops/ms
>> FloatMaxVector.MULLanes     1024  thrpt   376.284   2237.564  ops/ms
>> DoubleMaxVector.MULLanes    1024  thrpt   431.343   1646.792  ops/ms
>
> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision:
>
> cleanup: update a copyright notice
>
> Co-authored-by: Hao Sun

Hi. This PR involves the change to {Int Mul Reduction, FP Mul Reduction} X {auto-vectorization, VectorAPI}. After the offline discussion with @XiaohongGong, we have one question about the impact of this PR on **FP Mul Reduction + auto-vectorization**. The following table lists whether **FP Mul Reduction + auto-vectorization** is on or off before and after this PR.

| | Check | before | after |
| :-------- | :-------: | --------: | --------: |
| case-1 | UseSVE=0 | off | off |
| case-2 | UseSVE>0 and length_in_bytes=8 or 16 | on | off |
| case-3 | UseSVE>0 and length_in_bytes>16 | off | off |

## case-1 and case-2

Background: case-1 was set off after @fg1417's patch [8275275: AArch64: Fix performance regression after auto-vectorization on NEON](https://github.com/openjdk/jdk/pull/10175). But case-2 was not touched. We are not sure about the reason.
There was no 128b SVE machine then? Or there was some limitation of SLP on **reduction**? **Limitation** of SLP as mentioned in @fg1417 's patch > Because superword doesn't vectorize reductions unconnected with other vector packs, Performance data in this PR on case-2: From your provided [test data](https://github.com/openjdk/jdk/pull/23181#issuecomment-3018988067) on `Neoverse V2 (SVE 128-bit). Auto-vectorization section`, there is no obvious performance change on FP Mul Reduction benchmarks `(float|double)Mul(Big|Simple)`. As we checked the generated code of `floatMul(Big|Simple)` on Nvidia Grace machine(128b SVE2), we found that before this PR: - `floatMulBig` is vectorized. - `floatMulSimple` is not vectorized because SLP determines that there is no profit. Discussion: should we enable case-1 and case-2? - if the SLP limitation on reductions is fixed? - If there is no such limitation, we may consider enable case-1 and case-2 because a) there is perf regression at least based on current performance results and b) it may provide more auto-vectorization opportunities for other packs inside the loop. It would be appreciated if @eme64 or @fg1417 could provide more inputs. ## case-3 Status: this PR adds rules `reduce_mulF_gt128b` and `reduce_mulD_gt128b` but these two rules are **not** selected. See the [comment from Xiaohong](https://github.com/openjdk/jdk/pull/23181/files#r2176590314). Our suggestion: we're not sure if it's profitable to enable case-3. Could you help do more test on `Neoverse V1 (SVE 256-bit)`? Note that local change should be made to enable case-3, e.g. removing [these lines](https://github.com/openjdk/jdk/pull/23181/files#diff-edf6d70f65d81dc12a483088e0610f4e059bd40697f242aedfed5c2da7475f1aR130-R136). Expected result: - If there is performance gain, we may consider enabling case-3 for auto-vectorization. - If there is no performance gain, we suggest removing these two match rules because they are dead code. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23181#issuecomment-3030705608 From dholmes at openjdk.org Thu Jul 3 04:55:39 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 3 Jul 2025 04:55:39 GMT Subject: RFR: 8359965: Enable paired pushp and popp instruction usage for APX enabled CPUs In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 06:39:52 GMT, David Holmes wrote: >> The goal of this PR is to enhance the existing x86 assembly stubs using PUSH and POP instructions with paired PUSHP/POPP instructions which are part of Intel APX technology. >> >> In Intel APX, the PUSHP and POPP instructions are modern, compact replacements for the legacy PUSH and POP, designed to work seamlessly with the expanded set of 32 general-purpose registers (R0-R31). Unlike their predecessors, they use the new APX (REX2-based) encoding, enabling more uniform and efficient instruction formats. These instructions improve code density, simplify register access, and are optimized for performance on APX-enabled CPUs. >> >> Pairing PUSHP and POPP in Intel APX provides CPU-level benefits such as more efficient instruction decoding, better stack pointer tracking, and improved register dependency management. Their uniform encoding allows for streamlined execution, reduced pipeline stalls, and potential micro-op fusion, all of which enhance performance and power efficiency. This pairing helps the processor optimize speculative execution and register lifetimes, making code faster and more scalable on modern architectures. > > Just a drive-by comment as this isn't code I normally have much to do with but to me it would look a lot cleaner to define `push_paired`/`pop_paired` (maybe abbreviating directly to `pushp`/`popp`?) rather than passing the boolean. > @dholmes-ora would you mind sharing your opinion? We seem to be making things more complicated, but hopefully in a good way? Seems very complicated to me. Really this is for compiler folk to discuss.
And as noted above this "tracker" class only helps where the push/pop are paired in the same scope. Personally I think a "pushp" that is defined to be a "push-paired" when available, else a regular "push", would suffice in terms of API design. But again this is for compiler folk to determine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25889#issuecomment-3030744652 From epeter at openjdk.org Thu Jul 3 05:02:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Jul 2025 05:02:40 GMT Subject: RFR: 8361144: Strenghten the Ideal Verification in PhaseIterGVN::verify_Ideal_for by comparing the hash of a node before and after Ideal [v3] In-Reply-To: References: <9YpmCSNKHrTmq54eLusmkTHoEFFUTvm6OiqjdiGNFv0=.f8123888-bd26-42cc-938a-ec756a0da90d@github.com> Message-ID: On Wed, 2 Jul 2025 10:19:59 GMT, Benoît Maillard wrote: >> This PR adds a node hash comparison after calling `Ideal` in `PhaseIterGVN::verify_Ideal_for` to introduce an additional layer of verification for missed optimizations. Previously, we relied on the return value of `Ideal`, which is expected to be `nullptr` if no transformation was done. >> >> By also checking the node's hash before and after `Ideal`, we could catch inconsistencies in the implementation or unintended modifications to the graph. Both of these may indicate missed or incomplete optimizations. >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8361144) >> - [x] tier1-3, plus some internal testing >> >> Thank you for reviewing! > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > 8361144: add comment for consistency with node count Thanks for adding this @benoitmaillard ! I don't think you need to add a regression test here. What you should do though: run tier1-3 + additional testing, once with the verification enabled and once without.
Just to see if there are any cases that currently fail with this verification. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26064#pullrequestreview-2981695058 From epeter at openjdk.org Thu Jul 3 05:26:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Jul 2025 05:26:46 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v3] In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 12:35:47 GMT, Mikhail Ablakatov wrote: >> Mikhail Ablakatov has updated the pull request incrementally with two additional commits since the last revision: >> >> - fixup: don't modify the value in vsrc >> >> Fix reduce_mul_integral_gt128b() so it doesn't modify vsrc. With this >> change, the result of recursive folding is held in vtmp1. To be able to >> pass this intermediate result to reduce_mul_integral_le128b(), we would >> have to use another temporary FloatRegister, as vtmp1 would essentially >> act as vsrc. It's possible to get around this however: >> reduce_mul_integral_le128b() is modified so it's possible to pass >> matching vsrc and vtmp2 arguments. By doing this, we save ourselves a >> temporary register in rules that match to reduce_mul_integral_gt128b(). >> - cleanup: revert an unnecessary change to reduce_mul_fp_le128b() formating > > This patch improves of mul reduction VectorAPIs on SVE targets with 256b or wider vectors. This comment also provides performance numbers for NEON / SVE 128b platforms that aren't expected to benefit from these implementations and for auto-vectorization benchmarks. > > ### Neoverse N1 (NEON) > >
>
> Auto-vectorization
>
> | Benchmark | Before | After | Units | Diff |
> |---------------------------|----------|----------|-------|------|
> | mulRedD | 739.699 | 740.884 | ns/op | ~ |
> | byteAddBig | 2670.248 | 2670.562 | ns/op | ~ |
> | byteAddSimple | 1639.796 | 1639.940 | ns/op | ~ |
> | byteMulBig | 2707.900 | 2708.063 | ns/op | ~ |
> | byteMulSimple | 2452.939 | 2452.906 | ns/op | ~ |
> | charAddBig | 2772.363 | 2772.269 | ns/op | ~ |
> | charAddSimple | 1639.867 | 1639.751 | ns/op | ~ |
> | charMulBig | 2796.533 | 2796.375 | ns/op | ~ |
> | charMulSimple | 2453.034 | 2453.004 | ns/op | ~ |
> | doubleAddBig | 2943.613 | 2936.897 | ns/op | ~ |
> | doubleAddSimple | 1635.031 | 1634.797 | ns/op | ~ |
> | doubleMulBig | 3001.937 | 3003.240 | ns/op | ~ |
> | doubleMulSimple | 2448.154 | 2448.117 | ns/op | ~ |
> | floatAddBig | 2963.086 | 2962.215 | ns/op | ~ |
> | floatAddSimple | 1634.987 | 1634.798 | ns/op | ~ |
> | floatMulBig | 3022.442 | 3021.356 | ns/op | ~ |
> | floatMulSimple | 2447.976 | 2448.091 | ns/op | ~ |
> | intAddBig | 832.346 | 832.382 | ns/op | ~ |
> | intAddSimple | 841.276 | 841.287 | ns/op | ~ |
> | intMulBig | 1245.155 | 1245.095 | ns/op | ~ |
> | intMulSimple | 1638.762 | 1638.826 | ns/op | ~ |
> | longAddBig | 4924.541 | 4924.328 | ns/op | ~ |
> | longAddSimple | 841.623 | 841.625 | ns/op | ~ |
> | longMulBig | 9848.954 | 9848.807 | ns/op | ~ |
> | longMulSimple | 3427.169 | 3427.279 | ns/op | ~ |
> | shortAddBig | 2670.027 | 2670.345 | ns/op | ~ |
> | shortAddSimple | 1639.869 | 1639.876 | ns/op | ~ |
> | shortMulBig | 2750.812 | 2750.562 | ns/op | ~ |
> | shortMulSimple | 2453.030 | 2452.937 | ns/op | ~ |
>
> >
> > VectorAPI > > | Benchmark ... @mikabl-arm @XiaohongGong I'm a little busy these weeks before going on vacation, so I won't have time to look into this more deeply. However, I do plan to remove the auto-vectorization restrictions for simple reductions. https://bugs.openjdk.org/browse/JDK-8307516 You can already now disable the (bad) reduction heuristic, using `AutoVectorizationOverrideProfitability`. https://bugs.openjdk.org/browse/JDK-8357530 I published benchmark results there: https://github.com/openjdk/jdk/pull/25387 You can see that enabling simple reductions is in most cases actually profitable now. But float/double add and mul have strict reduction order, and that usually prevents vectorization from being profitable. The strict-order vector reduction is quite expensive, and it only becomes beneficial if there is a lot of other code in the loop that can be vectorized. Soon, I plan to add a cost-model, so that we can predict if vectorization is profitable. It would also be nice to actually find a benchmark where float add/mul reductions lead to a speedup with vectorization. So far I have not seen any example in my benchmarks: https://github.com/openjdk/jdk/pull/25387 If you find any such example, please let me know ;) I don't have access to any SVE machines, so I cannot help you there, unfortunately. Is this helpful to you? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23181#issuecomment-3030798159 From epeter at openjdk.org Thu Jul 3 05:30:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Jul 2025 05:30:40 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v3] In-Reply-To: References: Message-ID: <-iWCtGzKfoilC1bFXj726ZTS8glyDlqRdY76ddUdgb0=.2b303302-ee5b-4982-a72d-a56be53a5101@github.com> On Thu, 3 Jul 2025 03:38:13 GMT, Jasmine Karthikeyan wrote: >> @jaskarth Nice work! I have a few comments below. >> >> One is about more randomized tests. 
I'm thinking about something like this: >> >> - compute `res = Math.abs(x)` >> - Truncate `x` with randomly produced bounds from Generators, like this: `x = Math.max(lo, Math.min(hi, x))`. >> - Below, add all sorts of comparisons with random constants, like this: `if (res < CON) { sum += 1; }`. If the output range is wrong, this could wrongly constant fold, and allow us to catch that. >> - Then fuzz the generated method a few times with random inputs for `x`, and check that the sum and res value are the same for compiled and interpreted code. >> >> I hope that makes sense :) >> This is currently my best method to check if ranges are correct, and I think it is quite important because often tests are only written with constants in mind, but less so with ranges, and then we mess up the ranges because it is just too tricky. >> >> This is an example, where I asked someone to try this out as well: >> https://github.com/openjdk/jdk/pull/23089/files#diff-12bebea175a260a6ab62c22a3681ccae0c3d9027900d2fdbd8c5e856ae7d1123R404-R422 > > @eme64 Thanks for the review and comments! The method of checking for constant folding with if statements and range filtering you mentioned is pretty clever. I've adapted it to the test and added it to the PR. Let me know what you think! @jaskarth Nice, thanks for adding the range tests! Unfortunately, I'm quite busy before going on vacation. I hope someone else can review this. Otherwise I can come back to it in August. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23685#issuecomment-3030813369 From xgong at openjdk.org Thu Jul 3 05:56:41 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 3 Jul 2025 05:56:41 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v3] In-Reply-To: References: Message-ID: On Thu, 3 Jul 2025 05:23:38 GMT, Emanuel Peter wrote: > You can see that enabling simple reductions is in most cases actually profitable now. 
But float/double add and mul have strict reduction order, and that usually prevents vectorization from being profitable. The strict-order vector reduction is quite expensive, and it only becomes beneficial if there is a lot of other code in the loop that can be vectorized. Soon, I plan to add a cost-model, so that we can predict if vectorization is profitable. > > It would also be nice to actually find a benchmark where float add/mul reductions lead to a speedup with vectorization. So far I have not seen any example in my benchmarks: https://github.com/openjdk/jdk/pull/25387 If you find any such example, please let me know ;) > > I don't have access to any SVE machines, so I cannot help you there, unfortunately. > > Is this helpful to you? Thanks for your input @eme64 ! It's really helpful to me. Using the cost model to decide whether vectorizing FP mul reduction is profitable would be the right direction. With this, I think the backend check of auto-vectorization for such operations can be removed safely. We can rely on the SLP's analysis. BTW, the current profitability heuristics can help by disabling auto-vectorization for the simple cases while enabling the complex ones. This is also helpful to us. I tested the performance of `VectorReduction2` with/without auto-vectorization for FP mul reductions on my SVE 128-bit machine. The performance difference is not very significant for either `floatMulSimple` or `floatMulBig`. But I guess the performance change would be different with auto-vectorization on hardware with a larger vector size. As we do not have SVE machines with a larger vector size either, we may need help from @mikabl-arm ! If the performance of `floatMulBig` is improved with auto-vectorization, I think we can remove the limitation of such reductions for auto-vectorization on AArch64.
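To make the "strict reduction order" point concrete, here is a small Java sketch showing that floating-point multiplication is not associative, so a reordered product can differ from the sequential one that a scalar loop computes. The class and method names are hypothetical, and `foldedProduct` is only an illustrative tree-shaped reduction (multiplying the lower half by the upper half); it is not a claim about how HotSpot or any SVE code path actually reduces.

```java
final class StrictOrderMulSketch {
    // Sequential product in source order: what the scalar Java loop computes.
    static float strictProduct(float[] a) {
        float p = 1.0f;
        for (float v : a) {
            p *= v;
        }
        return p;
    }

    // Illustrative reordered product: repeatedly multiply the lower half by the
    // upper half. Assumes a non-empty array whose length is a power of two.
    static float foldedProduct(float[] a) {
        float[] t = a.clone();
        for (int n = t.length / 2; n >= 1; n /= 2) {
            for (int i = 0; i < n; i++) {
                t[i] *= t[i + n];
            }
        }
        return t[0];
    }

    public static void main(String[] args) {
        float[] a = {Float.MAX_VALUE, 0.5f, 2.0f, 0.5f};
        // Sequential order never overflows: MAX -> MAX/2 -> MAX -> MAX/2.
        System.out.println(strictProduct(a) == Float.MAX_VALUE * 0.5f); // true
        // Folded order computes MAX * 2 first, which overflows to Infinity.
        System.out.println(Float.isInfinite(foldedProduct(a)));         // true
    }
}
```

Because the two orders can give different results, an auto-vectorized scalar loop has to preserve the sequential order, which is the expensive strict-order reduction mentioned in the quoted text.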
------------- PR Comment: https://git.openjdk.org/jdk/pull/23181#issuecomment-3030931690 From xgong at openjdk.org Thu Jul 3 06:10:28 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 3 Jul 2025 06:10:28 GMT Subject: RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors [v3] In-Reply-To: References: Message-ID: <7VTRz_XqYBSdQ54n7ADzTzYADZjDbgBw6XuW0jehSLI=.24d18b87-4553-47ab-8065-d92fbb5700ae@github.com> > ### Background > On AArch64, the minimum vector length supported is 64-bit for basic types, except for `byte` and `boolean` (32-bit and 16-bit respectively to match special Vector API features). This limitation prevents intrinsification of vector type conversions between `short` and wider types (e.g. `long/double`) in Vector API when the entire vector length is within 128 bits, resulting in degraded performance for such conversions. > > For example, type conversions between `ShortVector.SPECIES_128` and `LongVector.SPECIES_128` are not supported on AArch64 NEON and SVE architectures with 128-bit max vector size. This occurs because the compiler would need to generate a vector with 2 short elements, resulting in a 32-bit vector size. > > To intrinsify such type conversion APIs, we need to relax the min vector length constraint from 64-bit to 32-bit for short vectors. > > ### Impact Analysis > #### 1. Vector types > Vectors only with `short` element types will be affected, as we just supported 32-bit `short` vectors in this change. > > #### 2. Vector API > No impact on Vector API or the vector-specific nodes. The minimum vector shape at API level remains 64-bit. It's not possible to generate a final vector IR with 32-bit vector size. Type conversions may generate intermediate 32-bit vectors, but they will be resized or cast to vectors with at least 64-bit length. > > #### 3. Auto-vectorization > Enables vectorization of cases containing only 2 `short` lanes, with significant performance improvements. 
Since we have supported 32-bit vectors for `byte` type for a long time, extending this to `short` did not introduce additional risks. > > #### 4. Codegen of vector nodes > NEON doesn't support 32-bit SIMD instructions, so we use 64-bit instructions instead. For lanewise operations, this is safe because the higher half bits can be ignored. > > Details: > - Lanewise vector operations are unaffected as explained above. > - NEON supports vector load/store instructions with 32-bit vector size, which we already use in relevant IRs (shared by SVE). > - Cross-lane operations like reduction may be affected, potentially causing incorrect results for `min/max/mul/and` reductions. The min vector size for such operations should remain 64-bit. We've added assertions in match rules. Since it's currently not possible to generate such reductions (Vector API minimum is 64-bit, and SLP doesn't support subword type reductions), we maintain the status quo. However, adding an explicit vector size check in `match_rule_s... 
Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Refine the comment in ad file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26057/files - new: https://git.openjdk.org/jdk/pull/26057/files/4e15e588..dfda42a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26057&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26057&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26057.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26057/head:pull/26057 PR: https://git.openjdk.org/jdk/pull/26057 From mhaessig at openjdk.org Thu Jul 3 06:13:40 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 3 Jul 2025 06:13:40 GMT Subject: RFR: 8361037: [ubsan] compiler/c2/irTests/TestFloat16ScalarOperations division by 0 [v3] In-Reply-To: <6cWhCvx8g-Gx4VoBHW1wA7atsa_Eq5wBhkDolUbP_X0=.31f8e688-7401-4f81-9b50-46b1997e96b5@github.com> References: <6cWhCvx8g-Gx4VoBHW1wA7atsa_Eq5wBhkDolUbP_X0=.31f8e688-7401-4f81-9b50-46b1997e96b5@github.com> Message-ID: On Tue, 1 Jul 2025 13:36:20 GMT, Jatin Bhateja wrote: >> Floating point division by zero is undefined per the C and C++ standards, but is defined by Clang (and by ISO/IEC/IEEE 60559 / IEEE 754) as producing either an infinity or NaN value. >> >> While Java semantics defined in section 15.17.2 "Division Operator" of JLS-24 are well-defined for these constant-folding scenarios >> >> This bug fix patch fixes division by 0 error reported after integration of [JDK-8352635](https://bugs.openjdk.org/browse/JDK-8352635). >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding comments Thank you for addressing my comments @jatin-bhateja. Looks good to me. ------------- Marked as reviewed by mhaessig (Committer).
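As a reminder of the Java-level semantics referred to in the quoted description (JLS 15.17.2), floating-point division by zero is well defined in Java and yields an infinity or NaN. The sketch below uses `float` as a stand-in for Float16, which is an assumption made only to keep the example free of incubator modules; the class name is hypothetical.

```java
final class FloatDivByZeroSketch {
    // Plain Java float division; no exception is thrown for a zero divisor.
    static float div(float a, float b) {
        return a / b;
    }

    public static void main(String[] args) {
        System.out.println(div(1.0f, 0.0f));   // Infinity
        System.out.println(div(-1.0f, 0.0f));  // -Infinity
        System.out.println(div(1.0f, -0.0f));  // -Infinity
        System.out.println(div(0.0f, 0.0f));   // NaN
    }
}
```

A compiler that constant-folds such expressions in C++ code needs to produce these same results without relying on the C++ division, where the behavior is undefined.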
PR Review: https://git.openjdk.org/jdk/pull/26062#pullrequestreview-2981858755 From dfenacci at openjdk.org Thu Jul 3 06:19:38 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 3 Jul 2025 06:19:38 GMT Subject: RFR: 8361144: Strenghten the Ideal Verification in PhaseIterGVN::verify_Ideal_for by comparing the hash of a node before and after Ideal [v3] In-Reply-To: References: <9YpmCSNKHrTmq54eLusmkTHoEFFUTvm6OiqjdiGNFv0=.f8123888-bd26-42cc-938a-ec756a0da90d@github.com> Message-ID: On Wed, 2 Jul 2025 10:19:59 GMT, Benoît Maillard wrote: >> This PR adds a node hash comparison after calling `Ideal` in `PhaseIterGVN::verify_Ideal_for` to introduce an additional layer of verification for missed optimizations. Previously, we relied on the return value of `Ideal`, which is expected to be `nullptr` if no transformation was done. >> >> By also checking the node's hash before and after `Ideal`, we could catch inconsistencies in the implementation or unintended modifications to the graph. Both of these may indicate missed or incomplete optimizations. >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8361144) >> - [x] tier1-3, plus some internal testing >> >> Thank you for reviewing! > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > 8361144: add comment for consistency with node count Looks good to me. Thanks @benoitmaillard! ------------- Marked as reviewed by dfenacci (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/26064#pullrequestreview-2981870146 From duke at openjdk.org Thu Jul 3 07:03:49 2025 From: duke at openjdk.org (erifan) Date: Thu, 3 Jul 2025 07:03:49 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7] In-Reply-To: <15TW6hiffz65NhHevPefL_6swSC07UD-GwiJ4tPDtFs=.b83081df-8abd-4756-b4e0-1d969678a0d2@github.com> References: <15TW6hiffz65NhHevPefL_6swSC07UD-GwiJ4tPDtFs=.b83081df-8abd-4756-b4e0-1d969678a0d2@github.com> Message-ID: On Thu, 5 Jun 2025 11:05:48 GMT, Emanuel Peter wrote: >>> > FYI: `BoolTest::negate` already does what you want: `mask negate( ) const { return mask(_test^4); }` I think you should use that instead :) >>> >>> Indeed, I hadn't noticed that, thank you. >> >> Oh I think we still cannot use `BoolTest::negate`, because we cannot instantiate a `BoolTest` object with **unsigned** comparison. `BoolTest::negate` is a non-static function. > >> Oh I think we still cannot use `BoolTest::negate`, because we cannot instantiate a `BoolTest` object with **unsigned** comparison. `BoolTest::negate` is a non-static function. > > I see. Ok. Hmm. I still think that the logic should be in `BoolTest`, because that is where the exact implementation of the enum values is. In that context it is easier to see why `^4` does the negation. And imagine we were ever to change the enum values, then it would be harder to find your code and fix it. > > Maybe it could be called `BoolTest::negate_mask(mask btm)` and explain in a comment that both signed and unsigned are supported. 
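The `^4` negation discussed above can be illustrated with a small standalone sketch. It does not use the HotSpot `BoolTest` class: the enum `Mask`, its numeric values, the `unsigned_compare` flag bit and the free function `negate_mask` are all assumptions made for illustration, chosen so that every test and its negation differ only in the bit with value 4.

```cpp
// Toy stand-in for a comparison-mask enum. The numeric values are
// assumptions picked so that a test and its negation differ only in
// the bit with value 4; this is not the HotSpot definition.
enum Mask : unsigned {
  eq = 0, gt = 1, lt = 3,
  ne = 4, le = 5, ge = 7,
  unsigned_compare = 16,            // hypothetical flag bit for unsigned tests
  ult = unsigned_compare | lt,
  uge = unsigned_compare | ge
};

// Hypothetical static-style helper: it needs no object, so it can also be
// applied to masks that carry the unsigned flag. XOR with 4 flips only the
// "negated" bit and leaves the flag bit untouched.
constexpr Mask negate_mask(Mask m) {
  return static_cast<Mask>(m ^ 4u);
}

static_assert(negate_mask(eq) == ne, "eq and ne are negations of each other");
static_assert(negate_mask(gt) == le, "gt and le are negations of each other");
```

Keeping such a helper next to the enum definition, as suggested in the review, would mean that a change to the enum values and the `^4` trick are updated in the same place.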
Hi @eme64 @jatin-bhateja , would you mind taking another look of this PR, thanks~ ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3031109432 From duke at openjdk.org Thu Jul 3 07:10:22 2025 From: duke at openjdk.org (erifan) Date: Thu, 3 Jul 2025 07:10:22 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v3] In-Reply-To: References: Message-ID: > If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is relative smaller than that of `fromLong`. This patch does the conversion for these cases if `l` is a compile time constant. > > And this conversion also enables further optimizations that recognize maskAll patterns, see [1]. > > Some JTReg test cases are added to ensure the optimization is effective. > > I tried many different ways to write a JMH benchmark, but failed. Since the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific compile-time constant, the statement will be hoisted out of the loop. If we don't use a loop, the hotspot will become other instructions, and no obvious performance change was observed. However, combined with the optimization of [1], we can observe a performance improvement of about 7% on both aarch64 and x64. > > The patch was tested on both aarch64 and x64, all of tier1 tier2 and tier3 tests passed. 
> > [1] https://github.com/openjdk/jdk/pull/24674 erifan has updated the pull request incrementally with one additional commit since the last revision: Simplify the test code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25793/files - new: https://git.openjdk.org/jdk/pull/25793/files/791e0ab7..9f07d5c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25793&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25793&range=01-02 Stats: 233 lines in 3 files changed: 40 ins; 180 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/25793.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25793/head:pull/25793 PR: https://git.openjdk.org/jdk/pull/25793 From duke at openjdk.org Thu Jul 3 07:10:23 2025 From: duke at openjdk.org (erifan) Date: Thu, 3 Jul 2025 07:10:23 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v2] In-Reply-To: References: Message-ID: On Thu, 3 Jul 2025 01:52:52 GMT, erifan wrote: >> If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is relative smaller than that of `fromLong`. This patch does the conversion for these cases if `l` is a compile time constant. >> >> And this conversion also enables further optimizations that recognize maskAll patterns, see [1]. >> >> Some JTReg test cases are added to ensure the optimization is effective. >> >> I tried many different ways to write a JMH benchmark, but failed. Since the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific compile-time constant, the statement will be hoisted out of the loop. If we don't use a loop, the hotspot will become other instructions, and no obvious performance change was observed. However, combined with the optimization of [1], we can observe a performance improvement of about 7% on both aarch64 and x64. 
>> >> The patch was tested on both aarch64 and x64, all of tier1 tier2 and tier3 tests passed. >> >> [1] https://github.com/openjdk/jdk/pull/24674 > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Address some review comments > > Add support for the following patterns: > toLong(maskAll(true)) => (-1ULL >> (64 -vlen)) > toLong(maskAll(false)) => 0 > > And add more test cases. > - Merge branch 'master' into JDK-8356760 > - 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases > > If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would > set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent > to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is > relative smaller than that of `fromLong`. This patch does the conversion > for these cases if `l` is a compile time constant. > > And this conversion also enables further optimizations that recognize > maskAll patterns, see [1]. > > Some JTReg test cases are added to ensure the optimization is effective. > > I tried many different ways to write a JMH benchmark, but failed. Since > the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific > compile-time constant, the statement will be hoisted out of the loop. > If we don't use a loop, the hotspot will become other instructions, and > no obvious performance change was observed. However, combined with the > optimization of [1], we can observe a performance improvement of about > 7% on both aarch64 and x64. > > The patch was tested on both aarch64 and x64, all of tier1 tier2 and > tier3 tests passed. > > [1] https://github.com/openjdk/jdk/pull/24674 Hi @eme64 @jatin-bhateja , could you help review this PR? 
Thanks~ ------------- PR Comment: https://git.openjdk.org/jdk/pull/25793#issuecomment-3031127486 From duke at openjdk.org Thu Jul 3 07:10:42 2025 From: duke at openjdk.org (duke) Date: Thu, 3 Jul 2025 07:10:42 GMT Subject: RFR: 8359602: Ideal optimizations depending on input type are missed because of missing notification mechanism from CCP [v5] In-Reply-To: References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> Message-ID: On Wed, 2 Jul 2025 07:19:30 GMT, Beno?t Maillard wrote: >> This PR prevents some missed ideal optimizations in IGVN by notifying users of type refinements made during CCP, addressing a missed optimization that caused a verification failure with `-XX:VerifyIterativeGVN=1110`. >> >> ### Context >> During the compilation of the input program (obtained from the fuzzer, then simplified and added as a test) by C2, we end up with node `591 ModI` that takes `138 Phi` as its divisor input. An existing `Ideal` optimization is to get rid of the control input of a `ModINode` when we can prove that the divisor is never `0`. >> >> In this specific case, the type of the `PhiNode` gets refined during CCP, but the refinement fails to propagate to its users for the IGVN phase and the ideal optimization for the `ModINode` never happens. This results in a missed optimization and hits an assert in the verification phase of IGVN (when using `-XX:VerifyIterativeGVN=1110`). >> >> ![IGV screenshot](https://github.com/user-attachments/assets/5dee1ae6-9146-4115-922d-df33b7ccbd37) >> >> ### Detailed Analysis >> >> In `PhaseCCP::analyze`, we call `Value` for the `PhiNode`, which >> results in a type refinement: the range gets restricted to `int:-13957..-1191`. >> >> ```c++ >> // Pull from worklist; compute new value; push changes out. >> // This loop is the meat of CCP. 
>> while (worklist.size() != 0) { >> Node* n = fetch_next_node(worklist); >> DEBUG_ONLY(worklist_verify.push(n);) >> if (n->is_SafePoint()) { >> // Make sure safepoints are processed by PhaseCCP::transform even if they are >> // not reachable from the bottom. Otherwise, infinite loops would be removed. >> _root_and_safepoints.push(n); >> } >> const Type* new_type = n->Value(this); >> if (new_type != type(n)) { >> DEBUG_ONLY(verify_type(n, new_type, type(n));) >> dump_type_and_node(n, new_type); >> set_type(n, new_type); >> push_child_nodes_to_worklist(worklist, n); >> } >> if (KillPathsReachableByDeadTypeNode && n->is_Type() && new_type == Type::TOP) { >> // Keep track of Type nodes to kill CFG paths that use Type >> // nodes that become dead. >> _maybe_top_type_nodes.push(n); >> } >> } >> DEBUG_ONLY(verify_analyze(worklist_verify);) >> >> >> At the end of `PhaseCCP::analyze`, we obtain the following types in the side table: >> - `int` for node `591` (`ModINode`) >> - `int:-13957..-1191` for node `138` (`PhiNode`) >> >> If we call `find_node(138)->bottom_type()`, we get: >> - `int` for both nodes >> >> The... > > Beno?t Maillard has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Fix bad test class name > - 8359602: rename test > - 8359602: remove requires.debug=true and add -XX:+IgnoreUnrecognizedVMOptions flag > - 8359602: add comment > - 8359602: add test summary and comments > - 8359602: tag requires vm.debug == true > - 8359602: Add test from fuzzer > - 8359602: Add users to IGVN worklist when type is refined in CCP @benoitmaillard Your change (at version a66d3fb492541a17e28b3e0fe0f60080c14bdc2c) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26017#issuecomment-3031130840 From duke at openjdk.org Thu Jul 3 07:17:43 2025 From: duke at openjdk.org (duke) Date: Thu, 3 Jul 2025 07:17:43 GMT Subject: RFR: 8357739: [jittester] disable the hashCode method In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 19:49:34 GMT, Evgeny Nikitin wrote: > JITTester often uses the `hashCode` method (in fact, in almost every generated test). Given that the method can be unstable between runs or in interpreted vs compiled runs, it can create false-positives. > > This PR fixes the issue by adding support for method templates (similar to the ones used in CompilerCommands). All of those exclude templates match (and exclude) `String.indexOf(String)`, for example: > > java/lang/::*(Ljava/lang/String;I) > *String::indexOf(*) > java/lang/*::indexOf > > > Additionally, the PR adds support for comments (starting from '#') and empty lines in the excludes file. @lepestock Your change (at version 5c9a71b9c5b6f418a97e6b0557431aafc73addc6) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25859#issuecomment-3031148211 From thartmann at openjdk.org Thu Jul 3 07:22:39 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 3 Jul 2025 07:22:39 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code In-Reply-To: <-7cfzVghCWnUCfB1F3dcyG2fvJGnqREUW98qiVJEvQQ=.db06fb1e-e96e-4e00-bac0-098b4e1de54c@github.com> References: <-7cfzVghCWnUCfB1F3dcyG2fvJGnqREUW98qiVJEvQQ=.db06fb1e-e96e-4e00-bac0-098b4e1de54c@github.com> Message-ID: On Wed, 2 Jul 2025 07:16:44 GMT, Tobias Hartmann wrote: > I submitted some testing to make sure that CTW is clean in our CI. 
I see the following crashes that would need to be fixed before this is integrated: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/workspace/open/src/hotspot/share/opto/phaseX.cpp:2790), pid=3196445, tid=3196462 # assert(!failure) failed: PhaseCCP not at fixpoint: analysis result may be unsound. # # JRE version: Java(TM) SE Runtime Environment (26.0) (fastdebug build 26-internal-2025-07-02-0711056.tobias.hartmann.jdk4) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 26-internal-2025-07-02-0711056.tobias.hartmann.jdk4, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x180cff8] PhaseCCP::verify_analyze(Unique_Node_List&) [clone .part.0]+0x28 Current CompileTask: C2:13166 2238 b com.ibm.icu.impl.LocaleUtility::fallback (78 bytes) Stack: [0x00007f20eca0c000,0x00007f20ecb0c000], sp=0x00007f20ecb07050, free space=1004k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x180cff8] PhaseCCP::verify_analyze(Unique_Node_List&) [clone .part.0]+0x28 (phaseX.cpp:2790) V [libjvm.so+0x181e8aa] PhaseCCP::analyze()+0x7ca (phaseX.cpp:2790) V [libjvm.so+0xb44c94] Compile::Optimize()+0x964 (compile.cpp:2479) V [libjvm.so+0xb480d3] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1ec3 (compile.cpp:858) V [libjvm.so+0x96d157] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x467 (c2compiler.cpp:141) V [libjvm.so+0xb574f8] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xb58 (compileBroker.cpp:2323) V [libjvm.so+0xb586c8] CompileBroker::compiler_thread_loop()+0x578 (compileBroker.cpp:1967) V [libjvm.so+0x10abd0b] JavaThread::thread_main_inner()+0x13b (javaThread.cpp:773) V [libjvm.so+0x1b11f26] Thread::call_run()+0xb6 (thread.cpp:243) V [libjvm.so+0x178c718] thread_native_entry(Thread*)+0x128 (os_linux.cpp:868) # A fatal error has been detected by the Java Runtime Environment: # # 
Internal Error (/workspace/open/src/hotspot/share/opto/phaseX.cpp:784), pid=2175071, tid=2175089 # assert(no_dead_loop) failed: dead loop detected # # JRE version: Java(TM) SE Runtime Environment (26.0) (fastdebug build 26-internal-2025-07-02-0711056.tobias.hartmann.jdk4) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 26-internal-2025-07-02-0711056.tobias.hartmann.jdk4, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x180d285] PhaseGVN::dead_loop_check(Node*) [clone .part.0]+0x1d5 Current CompileTask: C2:4914 2051 !b 4 com.sun.beans.introspect.MethodInfo::get (273 bytes) Stack: [0x00007fe603f00000,0x00007fe604000000], sp=0x00007fe603ffaef0, free space=1003k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x180d285] PhaseGVN::dead_loop_check(Node*) [clone .part.0]+0x1d5 (phaseX.cpp:784) V [libjvm.so+0x181c309] PhaseIterGVN::transform_old(Node*)+0x529 (phaseX.cpp:767) V [libjvm.so+0x1820505] PhaseIterGVN::optimize()+0xc5 (phaseX.cpp:1054) V [libjvm.so+0xb414ba] Compile::inline_incrementally_cleanup(PhaseIterGVN&)+0x2ca (compile.cpp:2151) V [libjvm.so+0xb41ed6] Compile::inline_incrementally(PhaseIterGVN&)+0x416 (compile.cpp:2201) V [libjvm.so+0xb447ae] Compile::Optimize()+0x47e (compile.cpp:2329) V [libjvm.so+0xb480d3] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1ec3 (compile.cpp:858) V [libjvm.so+0x96d157] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x467 (c2compiler.cpp:141) V [libjvm.so+0xb574f8] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xb58 (compileBroker.cpp:2323) V [libjvm.so+0xb586c8] CompileBroker::compiler_thread_loop()+0x578 (compileBroker.cpp:1967) V [libjvm.so+0x10abd0b] JavaThread::thread_main_inner()+0x13b (javaThread.cpp:773) V [libjvm.so+0x1b11f26] Thread::call_run()+0xb6 (thread.cpp:243) V [libjvm.so+0x178c718] thread_native_entry(Thread*)+0x128 
(os_linux.cpp:868) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26068#issuecomment-3031160931 From bmaillard at openjdk.org Thu Jul 3 07:30:48 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Thu, 3 Jul 2025 07:30:48 GMT Subject: Integrated: 8359602: Ideal optimizations depending on input type are missed because of missing notification mechanism from CCP In-Reply-To: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> Message-ID: On Fri, 27 Jun 2025 10:59:57 GMT, Beno?t Maillard wrote: > This PR prevents some missed ideal optimizations in IGVN by notifying users of type refinements made during CCP, addressing a missed optimization that caused a verification failure with `-XX:VerifyIterativeGVN=1110`. > > ### Context > During the compilation of the input program (obtained from the fuzzer, then simplified and added as a test) by C2, we end up with node `591 ModI` that takes `138 Phi` as its divisor input. An existing `Ideal` optimization is to get rid of the control input of a `ModINode` when we can prove that the divisor is never `0`. > > In this specific case, the type of the `PhiNode` gets refined during CCP, but the refinement fails to propagate to its users for the IGVN phase and the ideal optimization for the `ModINode` never happens. This results in a missed optimization and hits an assert in the verification phase of IGVN (when using `-XX:VerifyIterativeGVN=1110`). > > ![IGV screenshot](https://github.com/user-attachments/assets/5dee1ae6-9146-4115-922d-df33b7ccbd37) > > ### Detailed Analysis > > In `PhaseCCP::analyze`, we call `Value` for the `PhiNode`, which > results in a type refinement: the range gets restricted to `int:-13957..-1191`. > > ```c++ > // Pull from worklist; compute new value; push changes out. > // This loop is the meat of CCP. 
> while (worklist.size() != 0) { > Node* n = fetch_next_node(worklist); > DEBUG_ONLY(worklist_verify.push(n);) > if (n->is_SafePoint()) { > // Make sure safepoints are processed by PhaseCCP::transform even if they are > // not reachable from the bottom. Otherwise, infinite loops would be removed. > _root_and_safepoints.push(n); > } > const Type* new_type = n->Value(this); > if (new_type != type(n)) { > DEBUG_ONLY(verify_type(n, new_type, type(n));) > dump_type_and_node(n, new_type); > set_type(n, new_type); > push_child_nodes_to_worklist(worklist, n); > } > if (KillPathsReachableByDeadTypeNode && n->is_Type() && new_type == Type::TOP) { > // Keep track of Type nodes to kill CFG paths that use Type > // nodes that become dead. > _maybe_top_type_nodes.push(n); > } > } > DEBUG_ONLY(verify_analyze(worklist_verify);) > > > At the end of `PhaseCCP::analyze`, we obtain the following types in the side table: > - `int` for node `591` (`ModINode`) > - `int:-13957..-1191` for node `138` (`PhiNode`) > > If we call `find_node(138)->bottom_type()`, we get: > - `int` for both nodes > > There is no progress on the type of `ModINode` during CCP, because `ModINode::Value` > is not able to... This pull request has now been integrated. 
Changeset: c75df634 Author: Benoît Maillard Committer: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/c75df634be9a0073fa246d42e5c362a09f1734f3 Stats: 61 lines in 2 files changed: 61 ins; 0 del; 0 mod 8359602: Ideal optimizations depending on input type are missed because of missing notification mechanism from CCP Reviewed-by: epeter, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/26017 From jbhateja at openjdk.org Thu Jul 3 08:06:43 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Jul 2025 08:06:43 GMT Subject: RFR: 8361037: [ubsan] compiler/c2/irTests/TestFloat16ScalarOperations division by 0 [v3] In-Reply-To: References: <6cWhCvx8g-Gx4VoBHW1wA7atsa_Eq5wBhkDolUbP_X0=.31f8e688-7401-4f81-9b50-46b1997e96b5@github.com> Message-ID: On Wed, 2 Jul 2025 23:02:37 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding comments > > Looks good to me. It will be good to get second review. Thanks @sviswa7 and @mhaessig ------------- PR Comment: https://git.openjdk.org/jdk/pull/26062#issuecomment-3031285321 From jbhateja at openjdk.org Thu Jul 3 08:06:44 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Jul 2025 08:06:44 GMT Subject: Integrated: 8361037: [ubsan] compiler/c2/irTests/TestFloat16ScalarOperations division by 0 In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 10:08:20 GMT, Jatin Bhateja wrote: > Floating point division by zero is undefined per the C and C++ standards, but is defined by Clang (and by ISO/IEC/IEEE 60559 / IEEE 754) as producing either an infinity or NaN value. > > While Java semantics defined in section 15.17.2 "Division Operator" of JLS-24 are well-defined for these constant-folding scenarios > > This bug fix patch fixes division by 0 error reported after integration of [JDK-8352635.](https://bugs.openjdk.org/browse/JDK-8352635) > Kindly review and share your feedback. 
> > Best Regards, > Jatin This pull request has now been integrated. Changeset: 2f683fdc Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/2f683fdc4a8f9c227e878b0d7fca645fc8abe1b6 Stats: 23 lines in 1 file changed: 23 ins; 0 del; 0 mod 8361037: [ubsan] compiler/c2/irTests/TestFloat16ScalarOperations division by 0 Reviewed-by: mhaessig, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/26062 From bmaillard at openjdk.org Thu Jul 3 08:09:41 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Thu, 3 Jul 2025 08:09:41 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v3] In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 12:39:23 GMT, Marc Chevalier wrote: >> A first part toward a better support of pure functions, but this time, with guidance from @iwanowww. >> >> ## Pure Functions >> >> Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. >> >> ## Scope >> >> We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are later expanded into regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. >> >> ## Implementation Overview >> >> We created here some new node kind for pure calls, inheriting leaf calls, that are expanded into regular leaf calls during final graph reshaping. 
The possibility to support pure call directly in AD file is left open. >> >> This PR also introduces `TupleNode` (largely based on an original idea/implementation of @iwanowww), that just ties multiple inputs together and plays well with `ProjNode`: the n-th projection of a `TupleNode` is the n-th input of the tuple. This is a convenient way to skip and remove nodes from the graph while delegating the difficulty of the surgery to the trusted IGVN's implementation. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > mostly comments src/hotspot/share/opto/parse2.cpp line 1100: > 1098: Node* Parse::floating_point_mod(Node* a, Node* b, BasicType type) { > 1099: assert(type == BasicType::T_FLOAT || type == BasicType::T_DOUBLE, "only float and double are floating points"); > 1100: CallLeafPureNode* mod = type == BasicType::T_DOUBLE ? static_cast<CallLeafPureNode*>(new ModDNode(C, a, b)) : new ModFNode(C, a, b); May I ask why we only need the `static_cast` for the `ModDNode` here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2180229177 From eastigeevich at openjdk.org Thu Jul 3 08:18:56 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 3 Jul 2025 08:18:56 GMT Subject: RFR: 8360936: Test compiler/onSpinWait/TestOnSpinWaitAArch64.java fails after JDK-8359435 [v2] In-Reply-To: References: Message-ID: <6CyXvRWJLHBSZxw6E0TJPva7X2RoqBZjE5b0q4oqVas=.b9a1e93d-5209-4cb0-b9b0-b1fac2e696e1@github.com> On Tue, 1 Jul 2025 16:05:07 GMT, Evgeny Astigeevich wrote: >> Test compiler/onSpinWait/TestOnSpinWaitAArch64.java needs debug info to identify a position of spin wait instructions in generated code. The test switched to use `XX:CompileCommand=print` instead of `XX:+PrintAssembly` to have assembly only for a tested Java method. In release builds `XX:+PrintAssembly` prints out debug info but `XX:CompileCommand=print` does not. 
>> >> This PR reimplements the test to parse instructions and to check them. The test does not rely on debug info anymore. >> >> Tested on Linux and MacOS with and without hsdis: >> - Fastdebug: test passed >> - Slowdebug: test passed. >> - Release: test passed. > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Simplify requirement for debug build I have rewritten the test not to use debug info at all. The test works with instructions instead. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26072#issuecomment-3031319871 From eastigeevich at openjdk.org Thu Jul 3 08:18:56 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 3 Jul 2025 08:18:56 GMT Subject: RFR: 8360936: Test compiler/onSpinWait/TestOnSpinWaitAArch64.java fails after JDK-8359435 [v3] In-Reply-To: References: Message-ID: > Test compiler/onSpinWait/TestOnSpinWaitAArch64.java needs debug info to identify a position of spin wait instructions in generated code. The test switched to use `XX:CompileCommand=print` instead of `XX:+PrintAssembly` to have assembly only for a tested Java method. In release builds `XX:+PrintAssembly` prints out debug info but `XX:CompileCommand=print` does not. > > This PR reimplements the test to parse instructions and to check them. The test does not rely on debug info anymore. > > Tested on Linux and MacOS with and without hsdis: > - Fastdebug: test passed > - Slowdebug: test passed. > - Release: test passed. 
Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Reimplement checking algo without using debug info ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26072/files - new: https://git.openjdk.org/jdk/pull/26072/files/e91036bc..0b3320e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26072&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26072&range=01-02 Stats: 139 lines in 1 file changed: 49 ins; 66 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/26072.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26072/head:pull/26072 PR: https://git.openjdk.org/jdk/pull/26072 From mchevalier at openjdk.org Thu Jul 3 08:21:41 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Jul 2025 08:21:41 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v3] In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 14:35:26 GMT, Benoît Maillard wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> mostly comments > > src/hotspot/share/opto/parse2.cpp line 1100: > >> 1098: Node* Parse::floating_point_mod(Node* a, Node* b, BasicType type) { >> 1099: assert(type == BasicType::T_FLOAT || type == BasicType::T_DOUBLE, "only float and double are floating points"); >> 1100: CallLeafPureNode* mod = type == BasicType::T_DOUBLE ? static_cast<CallLeafPureNode*>(new ModDNode(C, a, b)) : new ModFNode(C, a, b); > > May I ask why we only need the `static_cast` for the `ModDNode` here? It's C/C++ being annoying here: both branches of the ternary must have the same type, or something compatible. If I remove the cast: error: conditional expression between distinct pointer types 'ModDNode*' and 'ModFNode*' lacks a cast With the cast, C++ can convert the `ModFNode*` into a `CallLeafPureNode*` just fine. I didn't invent the cast, it was here before, but good to question it. 
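The ternary behaviour described in this reply can be reproduced outside HotSpot with a minimal sketch. The types `Base`, `DerivedD` and `DerivedF` and the function `make` are hypothetical stand-ins for `CallLeafPureNode`, `ModDNode`, `ModFNode` and the parser code; they are not taken from the patch.

```cpp
#include <memory>

// Hypothetical hierarchy: two sibling classes sharing one base.
struct Base {
  virtual ~Base() = default;
  virtual int kind() const = 0;
};
struct DerivedD : Base { int kind() const override { return 1; } };
struct DerivedF : Base { int kind() const override { return 2; } };

std::unique_ptr<Base> make(bool want_d) {
  // Without the cast the two arms would have the types DerivedD* and
  // DerivedF*. Neither converts to the other, so the conditional
  // expression has no common type and the code is rejected.
  // Casting one arm to Base* is enough: the other arm is then converted
  // to Base* through the usual derived-to-base pointer conversion.
  Base* p = want_d ? static_cast<Base*>(new DerivedD()) : new DerivedF();
  return std::unique_ptr<Base>(p);
}
```

Casting either arm would work equally well; only one cast is needed because the compiler then has a single target type to which the remaining arm converts implicitly.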
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2182166387 From alanb at openjdk.org Thu Jul 3 08:39:40 2025 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 3 Jul 2025 08:39:40 GMT Subject: RFR: 8360599: [TESTBUG] DumpThreadsWithEliminatedLock.java fails because of unstable inlining In-Reply-To: References: Message-ID: On Sun, 29 Jun 2025 15:26:14 GMT, Richard Reingruber wrote: > This PR adds CompileCommands to the test DumpThreadsWithEliminatedLock.java to force inlining of java/lang/String*.* methods. This will make inlining more stable to allow for the expected lock elimination based on c2 escape analysis. > > Forcing inlining of java/lang/StringBuffer.* wasn't sufficient on x86_64. With that the test still failed with TieredCompilation disabled. > > Testing: x86_64, ppc64 manually. Other major platforms as part of our CI testing. > > Failed inlining on x86_64 with TieredCompilation disabled: > > > make test TEST=com/sun/management/HotSpotDiagnosticMXBean/DumpThreadsWithEliminatedLock.java TEST_VM_OPTS="-XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=PrintInlining,DumpThreadsWithEliminatedLock.*" JTREG=TIMEOUT_FACTOR=0.1 > > [...] 
> > STDOUT: > CompileCommand: PrintInlining DumpThreadsWithEliminatedLock.* bool PrintInlining = true > @ 1 java.util.concurrent.atomic.AtomicBoolean::get (13 bytes) inline (hot) > @ 11 java.lang.StringBuffer::<init> (7 bytes) inline (hot) late inline succeeded (string method) > @ 3 java.lang.AbstractStringBuilder::<init> (39 bytes) inline (hot) > @ 1 java.lang.Object::<init> (1 bytes) inline (hot) > @ 16 java.lang.System::currentTimeMillis (0 bytes) (intrinsic) > s @ 19 java.lang.StringBuffer::append (13 bytes) failed to inline: already compiled into a big method > s @ 24 java.lang.StringBuffer::toString (44 bytes) inline (hot) late inline succeeded (string method) > s @ 1 java.lang.StringBuffer::length (5 bytes) accessor > @ 24 java.lang.String::<init> (98 bytes) failed to inline: already compiled into a big method > @ 30 java.util.concurrent.atomic.AtomicReference::set (6 bytes) accessor > 2025-07-02T09:25:53.396634900Z Attempt 1, found: false > 2025-07-02T09:25:53.415673072Z Attempt 2, found: false > 2025-07-02T09:25:53.418876867Z Attempt 3, found: false > > [...] Thanks for improving this, this test was intended unstable. It might be that it could be updated to work with debug or -Xcomp too, execution times would need to be checked out. ------------- Marked as reviewed by alanb (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26033#pullrequestreview-2982264097 From mdoerr at openjdk.org Thu Jul 3 08:55:47 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 3 Jul 2025 08:55:47 GMT Subject: [jdk25] RFR: 8361259: JDK25: Backout JDK-8258229 In-Reply-To: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> References: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> Message-ID: <9gtw9iF8JY7RV3rnUau07YX5UfBJD5phY9yq_q16imE=.08ef8dc9-3ee5-4ea0-a5ea-661b5f12f9ed@github.com> On Wed, 2 Jul 2025 10:54:13 GMT, Martin Doerr wrote: > This is a backout of [JDK-8258229](https://bugs.openjdk.org/browse/JDK-8258229) for JDK25 only. The problematic code has already been removed by [JDK-8358821](https://bugs.openjdk.org/browse/JDK-8358821) in JDK26. > > The backout is clean for the C++ code, but the test backout includes the backout of the follow-up change [JDK-8356310](https://bugs.openjdk.org/browse/JDK-8356310). > > Rationale: Minimize risk for JDK25. We should use the better fix JDK-8358821 in the long term. However, that one should get some more stabilization time before backporting it. Also see JBS issue. > > Proposed long term solution: Backport JDK-8358821 to jdk25u and revert this change again after an appropriate time. > > Short term: The issue solved by JDK-8258229 is not critical. It should be ok to postpone the fix to jdk25u. Tests are also green on our side. Let's ship it! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26091#issuecomment-3031435423 From mdoerr at openjdk.org Thu Jul 3 08:55:47 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 3 Jul 2025 08:55:47 GMT Subject: [jdk25] Integrated: 8361259: JDK25: Backout JDK-8258229 In-Reply-To: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> References: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> Message-ID: On Wed, 2 Jul 2025 10:54:13 GMT, Martin Doerr wrote: > This is a backout of [JDK-8258229](https://bugs.openjdk.org/browse/JDK-8258229) for JDK25 only. The problematic code has already been removed by [JDK-8358821](https://bugs.openjdk.org/browse/JDK-8358821) in JDK26. > > The backout is clean for the C++ code, but the test backout includes the backout of the follow-up change [JDK-8356310](https://bugs.openjdk.org/browse/JDK-8356310). > > Rationale: Minimize risk for JDK25. We should use the better fix JDK-8358821 in the long term. However, that one should get some more stabilization time before backporting it. Also see JBS issue. > > Proposed long term solution: Backport JDK-8358821 to jdk25u and revert this change again after an appropriate time. > > Short term: The issue solved by JDK-8258229 is not critical. It should be ok to postpone the fix to jdk25u. This pull request has now been integrated. 
Changeset: 993215f3 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/993215f3dd7aba221da8c901117a8ff3f0ccb675 Stats: 93 lines in 2 files changed: 0 ins; 93 del; 0 mod 8361259: JDK25: Backout JDK-8258229 Reviewed-by: mhaessig, thartmann, dlong ------------- PR: https://git.openjdk.org/jdk/pull/26091 From mdoerr at openjdk.org Thu Jul 3 09:58:47 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 3 Jul 2025 09:58:47 GMT Subject: [jdk25] RFR: 8361259: JDK25: Backout JDK-8258229 In-Reply-To: References: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> Message-ID: <7x65TpFJJJ2dTdjZq__12fVXAlY2Ta7HYOUc17Oe0zQ=.8ed717d7-c89e-4a1b-ad12-08cabceadf28@github.com> On Thu, 3 Jul 2025 02:33:23 GMT, Dean Long wrote: > > > Makes sense, but according to the Developers' Guide, we can't do that because "A Bug or Enhancement with resolution Fixed is required to have a corresponding changeset in one of the OpenJDK repositories." > > > > > > [cf75f1f](https://github.com/openjdk/jdk/commit/cf75f1f9c6d2bc70c7133cb81c73a0ce0946dff9) is a corresponding changset. We can link it. > > So two bugs would reference the same changeset, but the changeset only names 8358821? It might be better to close 8357017 as a duplicate instead of as Fixed. I've closed it as duplicate and added comments to the issues. Do we need anything else like a reminder that we want to consider [JDK-8358821](https://bugs.openjdk.org/browse/JDK-8358821) backport? Is there a label for that? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26091#issuecomment-3031647737 From mablakatov at openjdk.org Thu Jul 3 10:01:36 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Thu, 3 Jul 2025 10:01:36 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v7] In-Reply-To: References: Message-ID: > Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. 
It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used.
>
> Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is used directly still.
>
> The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reduction micro-benchmarks.
>
> Benchmark results:
>
> Neoverse-V1 (SVE 256-bit)
>
> Benchmark                 (size)  Mode     master         PR   Units
> ByteMaxVector.MULLanes      1024  thrpt  5447.643  11455.535  ops/ms
> ShortMaxVector.MULLanes     1024  thrpt  3388.183   7144.301  ops/ms
> IntMaxVector.MULLanes       1024  thrpt  3010.974   4911.485  ops/ms
> LongMaxVector.MULLanes      1024  thrpt  1539.137   2562.835  ops/ms
> FloatMaxVector.MULLanes     1024  thrpt  1355.551   4158.128  ops/ms
> DoubleMaxVector.MULLanes    1024  thrpt  1715.854   3284.189  ops/ms
>
>
> Fujitsu A64FX (SVE 512-bit):
>
> Benchmark                 (size)  Mode     master         PR   Units
> ByteMaxVector.MULLanes      1024  thrpt  1091.692   2887.798  ops/ms
> ShortMaxVector.MULLanes     1024  thrpt   597.008   1863.338  ops/ms
> IntMaxVector.MULLanes       1024  thrpt   510.642   1348.651  ops/ms
> LongMaxVector.MULLanes      1024  thrpt   468.878    878.620  ops/ms
> FloatMaxVector.MULLanes     1024  thrpt   376.284   2237.564  ops/ms
> DoubleMaxVector.MULLanes    1024  thrpt   431.343   1646.792  ops/ms

Mikhail Ablakatov has updated the pull request incrementally with two additional commits since the last revision:

 - Compare VL against MaxVectorSize instead of FloatRegister::sve_vl_max
 - Use a dedicated ptrue predicate register

   This shifts MulReduction performance on Neoverse V1 a bit. Here Before is before this specific commit (ebad6dd37e332da44222c50cd17c69f3ff3f0635) and After is this commit.
   | Benchmark                | Before (ops/ms) | After (ops/ms) | Diff (%) |
   | ------------------------ | --------------- | -------------- | -------- |
   | ByteMaxVector.MULLanes   | 9883.151        | 9093.557       | -7.99%   |
   | DoubleMaxVector.MULLanes | 2712.674        | 2607.367       | -3.89%   |
   | FloatMaxVector.MULLanes  | 3388.811        | 3291.429       | -2.88%   |
   | IntMaxVector.MULLanes    | 4765.554        | 5031.741       | +5.58%   |
   | LongMaxVector.MULLanes   | 2685.228        | 2896.445       | +7.88%   |
   | ShortMaxVector.MULLanes  | 5128.185        | 5197.656       | +1.35%   |

------------- Changes: - all: https://git.openjdk.org/jdk/pull/23181/files - new: https://git.openjdk.org/jdk/pull/23181/files/ebad6dd3..d35f1089 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23181&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23181&range=05-06 Stats: 69 lines in 4 files changed: 12 ins; 17 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/23181.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23181/head:pull/23181 PR: https://git.openjdk.org/jdk/pull/23181 From jbhateja at openjdk.org Thu Jul 3 10:06:44 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Jul 2025 10:06:44 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v10] In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 08:26:00 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. >> >> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. >> >> For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. >> >> For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. 
The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. >> >> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. >> >> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - >> >> >> Benchmark (size) Mode Cnt Gain >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 >> SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 >> SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 >> SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 >> SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 >> SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 >> SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 >> SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 >> SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 >> >> >> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments test/hotspot/jtreg/compiler/vectorapi/TestSelectFromTwoVectorOp.java line 234: > 232: > 233: @Test > 234: @IR(counts = {IRNode.SELECT_FROM_TWO_VECTOR_VS, IRNode.VECTOR_SIZE_8, ">0"}, Hi @Bhavana-Kilambi , Kindly also include x86-specific feature checks in IR rule for this test. You can directly integrate attached patch. 
[select_from_ir_feature.txt](https://github.com/user-attachments/files/21034639/select_from_ir_feature.txt) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2182389060 From mablakatov at openjdk.org Thu Jul 3 10:26:45 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Thu, 3 Jul 2025 10:26:45 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v4] In-Reply-To: References: <3sWLk_sAMLtcvRUjXk9hYe-K2MBQl9fH2Qg0MF7lwDk=.b8867d51-e822-43c0-93ab-58228c6eb1d5@github.com> <19rf4A0bxc4BstRmLivGkoCOm7Qa7YD6z1VJHJivCtg=.4a643c7b-4e79-4f37-b230-7231df3c68a8@github.com> Message-ID: On Wed, 2 Jul 2025 01:42:36 GMT, Xiaohong Gong wrote: >> Thanks! For some reason I thought that we don't have a dedicated predicate register for that. > > We can directly use `ptrue` here which maps to `p7` and has been preserved and initialized as all true.

Done, although this has shifted the performance a bit:

| Benchmark                | Before (ops/ms) | After (ops/ms) | Diff (%) |
| ------------------------ | --------------- | -------------- | -------- |
| ByteMaxVector.MULLanes   | 9883.151        | 9093.557       | -7.99%   |
| DoubleMaxVector.MULLanes | 2712.674        | 2607.367       | -3.89%   |
| FloatMaxVector.MULLanes  | 3388.811        | 3291.429       | -2.88%   |
| IntMaxVector.MULLanes    | 4765.554        | 5031.741       | +5.58%   |
| LongMaxVector.MULLanes   | 2685.228        | 2896.445       | +7.88%   |
| ShortMaxVector.MULLanes  | 5128.185        | 5197.656       | +1.35%   |

On average, the results didn't get worse. I suggest merging the updated version as is, since the shift seems to be related to micro-architectural effects not directly related to this PR, and overall the PR still improves the performance by an order of magnitude (please see https://github.com/openjdk/jdk/pull/23181#issuecomment-3018988067 for performance numbers before the PR). I intend to investigate the reasons behind this more closely later.
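
As a reading aid for the approach in the PR description quoted earlier in this thread (multiply halves of the vector until 128 bits remain, then hand over to the existing ASIMD path), the lane arithmetic can be modelled in plain Java. This is only a scalar sketch and not the HotSpot code: `MulReductionSketch`, `reduceMul` and `lanesIn128Bits` are hypothetical names, the final loop merely stands in for the 128-bit reduction, and only `int` lanes are modelled, where wrapping multiplication is associative so the halving order does not change the result.

```java
import java.util.Arrays;

public class MulReductionSketch {
    // Hypothetical scalar model: 'lanes' holds the int lanes of one vector,
    // 'lanesIn128Bits' is how many int lanes fit into 128 bits (4 for int).
    static int reduceMul(int[] lanes, int lanesIn128Bits) {
        int[] v = Arrays.copyOf(lanes, lanes.length);
        int len = v.length;
        // Halving steps: multiply the upper half into the lower half
        // until the live part is no wider than 128 bits.
        while (len > lanesIn128Bits) {
            int half = len / 2;
            for (int i = 0; i < half; i++) {
                v[i] *= v[i + half];
            }
            len = half;
        }
        // Stand-in for the existing 128-bit reduction path.
        int result = 1;
        for (int i = 0; i < len; i++) {
            result *= v[i];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] lanes = {1, 2, 3, 4, 5, 6, 7, 8}; // eight int lanes, i.e. 256 bits
        int expected = 1;
        for (int x : lanes) {
            expected *= x; // straightforward left-to-right product
        }
        System.out.println(reduceMul(lanes, 4) == expected);
    }
}
```

With eight `int` lanes the loop performs one halving step (256 to 128 bits); with sixteen lanes it would perform two, which corresponds to the 512-bit case.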
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2182426692 From galder at openjdk.org Thu Jul 3 11:17:41 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 3 Jul 2025 11:17:41 GMT Subject: RFR: 8361255: CTW: Tolerate more NCDFE problems [v2] In-Reply-To: References: Message-ID: On Wed, 2 Jul 2025 12:02:07 GMT, Aleksey Shipilev wrote: >> We routinely CTW 3rd party JARs to make sure our compilers work. By the nature of the JARs, they have dependencies on other JARs, and CTW runner frequently warns out with NCDFE. It does so very crudely, missing opportunities to compile the methods that _do not_ trigger NCDFEs. CTW should be made more tolerant to this. I think the normal "modules" CTW runs into the similar problem, but on a lesser scale, as we do not have a very hairy dependency graph within JDK. >> >> The CTW logs are also fairly noisy with full exception traces when NCDFE is semi-expected. This PR does _not_ print exception stack traces in these cases, only "NOTE"-s about it. This makes the log fairly clean and more understandable. >> >> Motivational scope improvement compiling a sample 3rd party JAR (cassandra-2.1.4.0.jar): >> >> >> Before: Done (2487 classes, 9866 methods, 24584 ms) >> After: Done (2487 classes, 10074 methods, 24150 ms) ; +2% more methods >> >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8361255-ctw-ncdfe > - Move clinit compile back > - Initial > - Fix Changes requested by galder (Author). 
test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Compiler.java line 104: > 102: constructors = aClass.getDeclaredConstructors(); > 103: } catch (NoClassDefFoundError e) { > 104: CompileTheWorld.OUT.println(String.format("[%d]\t%s\tNOTE unable to get constructors : %s", Nitpick really but why not call `CompileTheWorld.OUT.printf(...` instead of `CompileTheWorld.OUT.println(String.format(...`? ------------- PR Review: https://git.openjdk.org/jdk/pull/26090#pullrequestreview-2982769212 PR Review Comment: https://git.openjdk.org/jdk/pull/26090#discussion_r2182520478 From mablakatov at openjdk.org Thu Jul 3 11:47:44 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Thu, 3 Jul 2025 11:47:44 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v3] In-Reply-To: References: Message-ID: On Tue, 1 Jul 2025 16:22:42 GMT, Mikhail Ablakatov wrote: >> That would be the operations with partial vector size valid. For such cases, we will generate a mask in IR level, and a `VectorBlend` will be generated for this reduction case. Otherwise the result will be incorrect. So the vector size should be equal to MaxVectorSize theoretically. > > Thank you for elaborating on this :) Done, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2182576732 From dbriemann at openjdk.org Thu Jul 3 12:35:55 2025 From: dbriemann at openjdk.org (David Briemann) Date: Thu, 3 Jul 2025 12:35:55 GMT Subject: RFR: 8361353: [PPC64] C2: Add nodes UMulHiL, CmpUL3, UMinV, UMaxV, NegVI Message-ID: <37e56JLghJ5HzAPPnkYlyhlvFbgpBURRO5zpHMg8_B8=.8dfc8729-ea36-4643-bbbf-e6330fbf11c7@github.com> Implement more nodes for ppc that exist on other platforms. 
------------- Commit messages: - 8361353: [PPC64] C2: Add nodes UMulHiL, CmpUL3, UMinV, UMaxV, NegVI Changes: https://git.openjdk.org/jdk/pull/26115/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26115&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361353 Stats: 87 lines in 4 files changed: 86 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26115/head:pull/26115 PR: https://git.openjdk.org/jdk/pull/26115 From dnsimon at openjdk.org Thu Jul 3 13:04:19 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 3 Jul 2025 13:04:19 GMT Subject: RFR: 8361355: Negative cases of Annotated.getAnnotationData implementations are broken Message-ID: This PR fixes bugs in the implementation of `jdk.vm.ci.meta.Annotated.getAnnotationData`: 1. Calling `getAnnotatedData(annotationType)` fails with an ArrayIndexOutOfBoundsException instead of returning null when the receiver type is not annotated by `annotationType`. 2. Calling either of the `getAnnotatedData` methods with an `annotationType` value that does not represent an annotation interface silently succeeds when the receiver type does not (or can not) have any annotations (e.g. array and primitive types). 
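
The two negative cases above can be illustrated with a small stand-alone Java sketch. It does not use JVMCI at all: `AnnotationLookupSketch`, `lookup`, `Marker` and `Annotated` are hypothetical names built on plain `java.lang.reflect`, and throwing `IllegalArgumentException` for a non-annotation argument is an assumption about the intended contract rather than something stated in this thread. The point it shows is the ordering: validate the argument first, so that receivers which cannot carry annotations (arrays, primitives) still reject a bad `annotationType`, and return null rather than fail when the annotation is simply absent.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AnnotationLookupSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @interface Marker {}

    @Marker
    static class Annotated {}

    // Hypothetical stand-in for a single-annotation lookup; not the JVMCI API.
    static Annotation lookup(Class<?> receiver, Class<?> annotationType) {
        // Case 2: check the argument before looking at the receiver, so the
        // check is not skipped for receivers without any annotations.
        // (The exception type is an assumption made for this sketch.)
        if (!annotationType.isAnnotation()) {
            throw new IllegalArgumentException(
                annotationType.getName() + " is not an annotation interface");
        }
        // Case 1: an absent annotation yields null, not an exception.
        return receiver.getAnnotation(annotationType.asSubclass(Annotation.class));
    }

    public static void main(String[] args) {
        System.out.println(lookup(Annotated.class, Marker.class) != null);
        System.out.println(lookup(String.class, Marker.class) == null);
        System.out.println(lookup(int[].class, Marker.class) == null);
        try {
            lookup(int[].class, String.class);
            System.out.println("no exception");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```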
------------- Commit messages: - fixed negative cases in getAnnotationData Changes: https://git.openjdk.org/jdk/pull/26116/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26116&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361355 Stats: 100 lines in 7 files changed: 89 ins; 1 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/26116.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26116/head:pull/26116 PR: https://git.openjdk.org/jdk/pull/26116 From shade at openjdk.org Thu Jul 3 13:33:23 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 3 Jul 2025 13:33:23 GMT Subject: RFR: 8361255: CTW: Tolerate more NCDFE problems [v3] In-Reply-To: References: Message-ID: > We routinely CTW 3rd party JARs to make sure our compilers work. By the nature of the JARs, they have dependencies on other JARs, and CTW runner frequently warns out with NCDFE. It does so very crudely, missing opportunities to compile the methods that _do not_ trigger NCDFEs. CTW should be made more tolerant to this. I think the normal "modules" CTW runs into the similar problem, but on a lesser scale, as we do not have a very hairy dependency graph within JDK. > > The CTW logs are also fairly noisy with full exception traces when NCDFE is semi-expected. This PR does _not_ print exception stack traces in these cases, only "NOTE"-s about it. This makes the log fairly clean and more understandable. 
> > Motivational scope improvement compiling a sample 3rd party JAR (cassandra-2.1.4.0.jar): > > > Before: Done (2487 classes, 9866 methods, 24584 ms) > After: Done (2487 classes, 10074 methods, 24150 ms) ; +2% more methods > > > Additional testing: > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Just use printf directly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26090/files - new: https://git.openjdk.org/jdk/pull/26090/files/9d41f80a..04fd5e50 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26090&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26090&range=01-02 Stats: 14 lines in 2 files changed: 0 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/26090.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26090/head:pull/26090 PR: https://git.openjdk.org/jdk/pull/26090 From shade at openjdk.org Thu Jul 3 13:33:24 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 3 Jul 2025 13:33:24 GMT Subject: RFR: 8361255: CTW: Tolerate more NCDFE problems [v2] In-Reply-To: References: Message-ID: On Thu, 3 Jul 2025 11:14:41 GMT, Galder Zamarreño wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8361255-ctw-ncdfe >> - Move clinit compile back >> - Initial >> - Fix > > test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Compiler.java line 104: > >> 102: constructors = aClass.getDeclaredConstructors(); >> 103: } catch (NoClassDefFoundError e) { >> 104: CompileTheWorld.OUT.println(String.format("[%d]\t%s\tNOTE unable to get constructors : %s", > > Nitpick really but why not call `CompileTheWorld.OUT.printf(...` instead of `CompileTheWorld.OUT.println(String.format(...`? Mostly because it was the style of the surrounding code. But I don't see why not use `printf` directly indeed, done in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26090#discussion_r2182798874 From dnsimon at openjdk.org Thu Jul 3 14:13:23 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 3 Jul 2025 14:13:23 GMT Subject: RFR: 8361355: Negative cases of Annotated.getAnnotationData implementations are broken [v2] In-Reply-To: References: Message-ID: > This PR fixes bugs in the implementation of `jdk.vm.ci.meta.Annotated.getAnnotationData`: > 1. Calling `getAnnotatedData(annotationType)` fails with an ArrayIndexOutOfBoundsException instead of returning null when the receiver type is not annotated by `annotationType`. > 2. Calling either of the `getAnnotatedData` methods with an `annotationType` value that does not represent an annotation interface silently succeeds when the receiver type does not (or can not) have any annotations (e.g. array and primitive types). Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: fixed negative cases in getAnnotationData ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26116/files - new: https://git.openjdk.org/jdk/pull/26116/files/86b41636..b25684f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26116&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26116&range=00-01 Stats: 11 lines in 1 file changed: 5 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/26116.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26116/head:pull/26116 PR: https://git.openjdk.org/jdk/pull/26116 From mdoerr at openjdk.org Thu Jul 3 14:27:42 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 3 Jul 2025 14:27:42 GMT Subject: RFR: 8361353: [PPC64] C2: Add nodes UMulHiL, CmpUL3, UMinV, UMaxV, NegVI In-Reply-To: <37e56JLghJ5HzAPPnkYlyhlvFbgpBURRO5zpHMg8_B8=.8dfc8729-ea36-4643-bbbf-e6330fbf11c7@github.com> References: <37e56JLghJ5HzAPPnkYlyhlvFbgpBURRO5zpHMg8_B8=.8dfc8729-ea36-4643-bbbf-e6330fbf11c7@github.com> Message-ID: <0lfxjYCRp6xFM8c_RDhbLEtbwM5J3huFxjcOqcWVykU=.908af2c8-2a2d-4719-b598-45b716ab8658@github.com> On Thu, 3 Jul 2025 12:30:51 GMT, David Briemann wrote: > Implement more nodes for ppc that exist on other platforms. Thanks for implementing these nodes! The new instruction needs a Power9 check. Otherwise, LGTM. src/hotspot/cpu/ppc/assembler_ppc.hpp line 2376: > 2374: inline void vctzw( VectorRegister d, VectorRegister b); > 2375: inline void vctzd( VectorRegister d, VectorRegister b); > 2376: inline void vnegw( VectorRegister d, VectorRegister b); A Power9 comment would be helpful to prevent wrong usage. src/hotspot/cpu/ppc/ppc.ad line 2196: > 2194: case Op_AbsVF: > 2195: case Op_AbsVD: > 2196: case Op_NegVI: vnegw requires Power9 (`PowerArchitecturePPC64 >= 9`). src/hotspot/cpu/ppc/ppc.ad line 13583: > 13581: > 13582: instruct vnegI_reg(vecX dst, vecX src) %{ > 13583: match(Set dst (NegVI src)); Should use a predicate for Power9. 
------------- PR Review: https://git.openjdk.org/jdk/pull/26115#pullrequestreview-2983369466 PR Review Comment: https://git.openjdk.org/jdk/pull/26115#discussion_r2182917169 PR Review Comment: https://git.openjdk.org/jdk/pull/26115#discussion_r2182910035 PR Review Comment: https://git.openjdk.org/jdk/pull/26115#discussion_r2182926525 From rrich at openjdk.org Thu Jul 3 14:38:40 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 3 Jul 2025 14:38:40 GMT Subject: RFR: 8360599: [TESTBUG] DumpThreadsWithEliminatedLock.java fails because of unstable inlining In-Reply-To: References: Message-ID: <0p61J0DPfyHsen3r__V82eEZSPYaT9rZleHBtanKaRc=.c5f6992f-a7fe-4c95-bdcb-2887c3dbde21@github.com> On Thu, 3 Jul 2025 08:36:53 GMT, Alan Bateman wrote: > It might be that it could be updated to work with debug or -Xcomp too, execution times would need to be checked out. I found that the runtime of each test is ~300ms with a release build and ~11s with a fastdebug build on x86_64 and ppc64. If you like I can remove the requirement within this pr and do some more testing. -Xcomp doesn't seem to work. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26033#issuecomment-3032511575 From hgreule at openjdk.org Thu Jul 3 14:55:44 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 3 Jul 2025 14:55:44 GMT Subject: RFR: 8359678: C2: assert(static_cast(result) == thing) caused by ReverseBytesNode::Value() [v2] In-Reply-To: References: Message-ID: On Thu, 26 Jun 2025 07:55:23 GMT, Hannes Greule wrote: >> Fixes an assertion when passing an int larger than short/char to the corresponding reverseBytes method in a constant-folding scenario. By just using static_cast, we can ignore the upper bytes and just swap the lower bytes. >> >> Using jasm, I added a test case that covers such inputs. It felt easier to test this way than the other scenarios mentioned in the bug report. 
>> >> I also removed the redundant checked_cast calls from the int/long case; we already have the correct type there. >> >> Please review. Thanks. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > remove classfile version @iwanowww @eme64 as you reviewed the original change, could you have a look at this? Thank you very much. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25988#issuecomment-3032565530 From never at openjdk.org Thu Jul 3 14:59:39 2025 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 3 Jul 2025 14:59:39 GMT Subject: RFR: 8361355: Negative cases of Annotated.getAnnotationData implementations are broken [v2] In-Reply-To: References: Message-ID: On Thu, 3 Jul 2025 14:13:23 GMT, Doug Simon wrote: >> This PR fixes bugs in the implementation of `jdk.vm.ci.meta.Annotated.getAnnotationData`: >> 1. Calling `getAnnotatedData(annotationType)` fails with an ArrayIndexOutOfBoundsException instead of returning null when the receiver type is not annotated by `annotationType`. >> 2. Calling either of the `getAnnotatedData` methods with an `annotationType` value that does not represent an annotation interface silently succeeds when the receiver type does not (or can not) have any annotations (e.g. array and primitive types). > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > fixed negative cases in getAnnotationData Looks good. ------------- Marked as reviewed by never (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26116#pullrequestreview-2983525519 From vpaprotski at openjdk.org Thu Jul 3 15:14:42 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Thu, 3 Jul 2025 15:14:42 GMT Subject: RFR: 8359965: Enable paired pushp and popp instruction usage for APX enabled CPUs In-Reply-To: References: Message-ID: <3R2flcCvwCbIMgCJqOVnrUXgAZJsi9Ja2r4is2tCnLg=.cab9d74a-7998-466c-9d24-8672f3f8883b@github.com> On Wed, 2 Jul 2025 23:28:42 GMT, Srinivas Vamsi Parasa wrote: >> @vamsi-parasa, It's better to make this as a subclass of MacroAssembler in src/hotspot/cpu/x86/macroAssembler_x86.hpp and pass Tracker as an argument to push / pop for a cleaner interface. > > Hi Jatin (@jatin-bhateja) and Vlad (@vpaprotsk), > > There's one more issue to be considered. The C++ PushPopTracker code will be run during the stub generation time. There are code bocks which do a single push onto the stack but due to multiple exit paths, there will be multiple pops as illustrated below. Will this reference counting approach not fail in such a scenario as the stub code is generated all at once during the stub generation phase? > > > #begin stack frame > push(r21) > > #exit condition 1 > pop(r21) > > # exit condition 2 > pop(r21) Now that I had my fun writing an array-backed stack.. (and with David's comment too..) I can admit that the point of the entire C++ Tracker class is to 'just' add an assert; doesn't actually functionally add to the original code, but does add better JIT/stub compile-time checking. @vamsi-parasa you are right.. if there are ifs and multiple exit paths in the assembler itself.. the Tracker wont be able to catch it (multiple exits paths in the generator are just fine though); I was thinking about this problem too last night... a hack/'solution' would be to disable such checking with a default flag in the constructor... 'fairly trivial' but just adds to the complexity even more. 
And the assert was the point of the class to begin with... I do think such stubs are rare? There is some value in improved checking, but enough? Writing stubs is already a 'you should know assembler very well' thing so those checks only improve things marginally overall? As David says, it's for the compiler folks to decide :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25889#discussion_r2183043350 From shade at openjdk.org Thu Jul 3 16:23:39 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 3 Jul 2025 16:23:39 GMT Subject: RFR: 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode In-Reply-To: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> Message-ID: On Thu, 3 Jul 2025 01:59:55 GMT, hanguanqiang wrote: > This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode > > Problem: > When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash. > > Root Cause: > Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access. > > Fix: > Add a check in do_monitor_exit() to skip monitor unlocking if GenerateSynchronizationCode is false, avoiding invalid monitor access and preventing the crash. IMO, there is no point in fixing `-GenerateSynchronizationCode`, and instead we should just remove the flag. I propose we do this under the umbrella of this bug, just rename it to something like `Purge GenerateSynchronizationCode flag`. It is `develop`, so we don't even need a compatibility review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26108#issuecomment-3032852570 From lmesnik at openjdk.org Thu Jul 3 16:59:40 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 3 Jul 2025 16:59:40 GMT Subject: RFR: 8360599: [TESTBUG] DumpThreadsWithEliminatedLock.java fails because of unstable inlining In-Reply-To: References: Message-ID: On Sun, 29 Jun 2025 15:26:14 GMT, Richard Reingruber wrote: > This PR adds CompileCommands to the test DumpThreadsWithEliminatedLock.java to force inlining of java/lang/String*.* methods. This will make inlining more stable to allow for the expected lock elimination based on c2 escape analysis. > > Forcing inlining of java/lang/StringBuffer.* wasn't sufficient on x86_64. With that the test still failed with TieredCompilation disabled. > > Testing: x86_64, ppc64 manually. Other major platforms as part of our CI testing. > > Failed inlining on x86_64 with TieredCompilation disabled: > > > make test TEST=com/sun/management/HotSpotDiagnosticMXBean/DumpThreadsWithEliminatedLock.java TEST_VM_OPTS="-XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=PrintInlining,DumpThreadsWithEliminatedLock.*" JTREG=TIMEOUT_FACTOR=0.1 > > [...]
> > STDOUT: > CompileCommand: PrintInlining DumpThreadsWithEliminatedLock.* bool PrintInlining = true > @ 1 java.util.concurrent.atomic.AtomicBoolean::get (13 bytes) inline (hot) > @ 11 java.lang.StringBuffer::<init> (7 bytes) inline (hot) late inline succeeded (string method) > @ 3 java.lang.AbstractStringBuilder::<init> (39 bytes) inline (hot) > @ 1 java.lang.Object::<init> (1 bytes) inline (hot) > @ 16 java.lang.System::currentTimeMillis (0 bytes) (intrinsic) > s @ 19 java.lang.StringBuffer::append (13 bytes) failed to inline: already compiled into a big method > s @ 24 java.lang.StringBuffer::toString (44 bytes) inline (hot) late inline succeeded (string method) > s @ 1 java.lang.StringBuffer::length (5 bytes) accessor > @ 24 java.lang.String::<init> (98 bytes) failed to inline: already compiled into a big method > @ 30 java.util.concurrent.atomic.AtomicReference::set (6 bytes) accessor > 2025-07-02T09:25:53.396634900Z Attempt 1, found: false > 2025-07-02T09:25:53.415673072Z Attempt 2, found: false > 2025-07-02T09:25:53.418876867Z Attempt 3, found: false > > [...] Marked as reviewed by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26033#pullrequestreview-2983913131 From enikitin at openjdk.org Thu Jul 3 17:01:48 2025 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Thu, 3 Jul 2025 17:01:48 GMT Subject: Integrated: 8357739: [jittester] disable the hashCode method In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 19:49:34 GMT, Evgeny Nikitin wrote: > JITTester often uses the `hashCode` method (in fact, in almost every generated test). Given that the method can be unstable between runs or in interpreted vs compiled runs, it can create false-positives. > > This PR fixes the issue by adding support for method templates (similar to the ones used in CompilerCommands).
All of those exclude templates match (and exclude) `String.indexOf(String)`, for example: > > java/lang/::*(Ljava/lang/String;I) > *String::indexOf(*) > java/lang/*::indexOf > > > Additionally, the PR adds support for comments (starting from '#') and empty lines in the excludes file. This pull request has now been integrated. Changeset: a2315ddd Author: Evgeny Nikitin Committer: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/a2315ddd2a343ed594dd1b0b3d0dc5b3a71f509b Stats: 556 lines in 4 files changed: 402 ins; 121 del; 33 mod 8357739: [jittester] disable the hashCode method Reviewed-by: lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/25859 From dnsimon at openjdk.org Thu Jul 3 17:30:56 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 3 Jul 2025 17:30:56 GMT Subject: RFR: 8361355: Negative cases of Annotated.getAnnotationData implementations are broken [v3] In-Reply-To: References: Message-ID: <83aGkzmp5J7JllBsWK5ZzwZAa4GVsNk5VjmkH0O3FjE=.2507d7ce-65df-4121-acdf-35125d530d39@github.com> > This PR fixes bugs in the implementation of `jdk.vm.ci.meta.Annotated.getAnnotationData`: > 1. Calling `getAnnotatedData(annotationType)` fails with an ArrayIndexOutOfBoundsException instead of returning null when the receiver type is not annotated by `annotationType`. > 2. Calling either of the `getAnnotatedData` methods with an `annotationType` value that does not represent an annotation interface silently succeeds when the receiver type does not (or can not) have any annotations (e.g. array and primitive types). Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge tag 'jdk-26+4' into JDK-8361355 Added tag jdk-26+4 for changeset 1ca008fd - fixed negative cases in getAnnotationData ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26116/files - new: https://git.openjdk.org/jdk/pull/26116/files/b25684f7..ec161d59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26116&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26116&range=01-02 Stats: 6616 lines in 362 files changed: 3437 ins; 1484 del; 1695 mod Patch: https://git.openjdk.org/jdk/pull/26116.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26116/head:pull/26116 PR: https://git.openjdk.org/jdk/pull/26116 From alanb at openjdk.org Thu Jul 3 17:59:44 2025 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 3 Jul 2025 17:59:44 GMT Subject: RFR: 8360599: [TESTBUG] DumpThreadsWithEliminatedLock.java fails because of unstable inlining In-Reply-To: <0p61J0DPfyHsen3r__V82eEZSPYaT9rZleHBtanKaRc=.c5f6992f-a7fe-4c95-bdcb-2887c3dbde21@github.com> References: <0p61J0DPfyHsen3r__V82eEZSPYaT9rZleHBtanKaRc=.c5f6992f-a7fe-4c95-bdcb-2887c3dbde21@github.com> Message-ID: On Thu, 3 Jul 2025 14:36:15 GMT, Richard Reingruber wrote: > I found that the runtime of each test is ~300ms with a release build and ~11s with a fastdebug build on x86_64 and ppc64. If you like I can remove the requirement within this pr and do some more testing. -Xcomp doesn't seem to work. I think that would be useful, thank you. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26033#issuecomment-3033099720 From dlunden at openjdk.org Thu Jul 3 18:18:49 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 3 Jul 2025 18:18:49 GMT Subject: RFR: 8360701: Add bailout when the register allocator interference graph grows unreasonably large Message-ID: The changeset for JDK-8325467 (https://git.openjdk.org/jdk/pull/20404) enables compilation of methods with many parameters, which C2 previously bailed out on. As a side effect, the tests `BigArityTest.java`, `TestCatchExceptionWithVarargs.java`, and `VarargsArrayTest.java` compile more methods than before, and additionally these methods are designed, for stress testing purposes, to have a large number of parameters (at or close to the maximum of 255 parameters allowed by the JVM spec). Compiling such methods takes a very long time and >99% of the time is spent in the C2 phase Coalesce 2 (part of register allocation). The problem is that the interference graph becomes huge after the initial round of spilling (just before Coalesce 2), and that we do not check for this and bail out if necessary. We do already bail out if the number of IR nodes grows too large, but the interference graph can become huge even if we have a small number of nodes. In fact, the interference graph may (in the worst case) have a size that is quadratic in the number of nodes. In the problematic tests, we have interference graphs with approximately 100 000 nodes and over 55 000 000 (!) IFG edges. For comparison, the IFG edge count in worst-case realistic scenarios caps out at around 40 000 nodes and 800 000 edges. For example, see the scatter matrix below from running the DaCapo benchmark. It displays, for each time an IFG was built, the number of current IR nodes, the number of live ranges (the actual nodes in the IFG), and the number of IFG edges.
![dacapo](https://github.com/user-attachments/assets/7a070768-50da-42e4-b5ed-9958e1362673) ### Changeset - Add a new diagnostic flag `IFGEdgesLimit` and bail out whenever we reach the number of edges specified by the flag during IFG construction. The default is a very generous 10 000 000 edges, that still filters out the most degenerate compilations we have seen. - Add tracking of edges in `PhaseIFG` to permit the new flag. It is worth noting that it is perhaps preferable to use a lower default than 10 000 000 edges. For example, in standard benchmarks such as DaCapo (see the scatter matrix above), Renaissance, SPECjvm, and SPECjbb, we never go over 1 000 000 edges (I verified this). The reason I went with the generous 10 000 000 limit is that I saw a fair amount of bailouts in testing with the flag set at 1 000 000 edges. Such bailouts are likely motivated, but I do not want to take any chances. Even at 10 000 000 edges, a few tests still hit the limit with certain JVM flag combinations: - `applications/ctw/modules/java_base.java` - `compiler/codegen/TestAntiDependenciesHighMemUsage2.java` - `compiler/loopopts/superword/TestAlignVectorFuzzer.java` ### Testing - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/16047279249) - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. - C2 compilation speed benchmarking on DaCapo. Compilation speed is unaffected. 
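The bailout described above can be sketched in a few lines. This is a toy Java model, not HotSpot's C++ `PhaseIFG`: the class `InterferenceGraph`, its `addEdge` method and the `edgesLimit` field are hypothetical names chosen for illustration. Only the idea comes from the description: count edges while the graph is built and stop once the count reaches the limit.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy interference graph that counts edges and signals a bailout at a limit. */
final class InterferenceGraph {
    private final List<Set<Integer>> adjacency = new ArrayList<>();
    private final long edgesLimit;   // hypothetical stand-in for the IFGEdgesLimit flag
    private long edges;              // number of distinct undirected edges so far

    InterferenceGraph(int liveRanges, long edgesLimit) {
        this.edgesLimit = edgesLimit;
        for (int i = 0; i < liveRanges; i++) {
            adjacency.add(new HashSet<>());
        }
    }

    /**
     * Records that live ranges a and b interfere.
     * Returns false once the edge count has reached the limit, which the
     * caller should treat as "bail out of this compilation".
     */
    boolean addEdge(int a, int b) {
        if (a != b && adjacency.get(a).add(b)) {
            adjacency.get(b).add(a);
            edges++;
        }
        return edges < edgesLimit;
    }

    long edgeCount() {
        return edges;
    }

    public static void main(String[] args) {
        int n = 100;
        InterferenceGraph ifg = new InterferenceGraph(n, 1_000);
        boolean bailedOut = false;
        // Worst case: every live range interferes with every other one,
        // i.e. n * (n - 1) / 2 = 4950 edges for n = 100.
        outer:
        for (int a = 0; a < n; a++) {
            for (int b = a + 1; b < n; b++) {
                if (!ifg.addEdge(a, b)) {
                    bailedOut = true;
                    break outer;
                }
            }
        }
        System.out.println("bailed out: " + bailedOut + ", edges: " + ifg.edgeCount());
    }
}
```

The check sits in the edge-insertion path because, as noted above, the edge count can be quadratic in the number of live ranges, so a node-count limit alone does not bound the work.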
------------- Commit messages: - Bail out if too many IFG edges Changes: https://git.openjdk.org/jdk/pull/26118/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26118&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360701 Stats: 38 lines in 4 files changed: 37 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26118.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26118/head:pull/26118 PR: https://git.openjdk.org/jdk/pull/26118 From dlong at openjdk.org Thu Jul 3 20:43:44 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 3 Jul 2025 20:43:44 GMT Subject: [jdk25] RFR: 8361259: JDK25: Backout JDK-8258229 In-Reply-To: <7x65TpFJJJ2dTdjZq__12fVXAlY2Ta7HYOUc17Oe0zQ=.8ed717d7-c89e-4a1b-ad12-08cabceadf28@github.com> References: <-pGWrWHOZmgVzC7b3zYHIZDWjYin33X0ZEgj9GafA7E=.cea79e3f-9242-4e6a-95b0-7dad8b212b90@github.com> <7x65TpFJJJ2dTdjZq__12fVXAlY2Ta7HYOUc17Oe0zQ=.8ed717d7-c89e-4a1b-ad12-08cabceadf28@github.com> Message-ID: On Thu, 3 Jul 2025 09:55:51 GMT, Martin Doerr wrote: > I've closed it as duplicate and added comments to the issues. Thanks! > Do we need anything else like a reminder that we want to consider [JDK-8358821](https://bugs.openjdk.org/browse/JDK-8358821) backport? Is there a label for that? The Developers' Guide says you can add a (Rel)-bp label to suggest a backport, so that would be "25-bp" for jdk25. If we definitely want to backport to particular release then we could create the Backport issue now as a placeholder. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26091#issuecomment-3033572554 From kvn at openjdk.org Thu Jul 3 22:53:40 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 3 Jul 2025 22:53:40 GMT Subject: RFR: 8361255: CTW: Tolerate more NCDFE problems [v3] In-Reply-To: References: Message-ID: On Thu, 3 Jul 2025 13:33:23 GMT, Aleksey Shipilev wrote: >> We routinely CTW 3rd party JARs to make sure our compilers work.
By the nature of the JARs, they have dependencies on other JARs, and CTW runner frequently warns out with NCDFE. It does so very crudely, missing opportunities to compile the methods that _do not_ trigger NCDFEs. CTW should be made more tolerant to this. I think the normal "modules" CTW runs into the similar problem, but on a lesser scale, as we do not have a very hairy dependency graph within JDK. >> >> The CTW logs are also fairly noisy with full exception traces when NCDFE is semi-expected. This PR does _not_ print exception stack traces in these cases, only "NOTE"-s about it. This makes the log fairly clean and more understandable. >> >> Motivational scope improvement compiling a sample 3rd party JAR (cassandra-2.1.4.0.jar): >> >> >> Before: Done (2487 classes, 9866 methods, 24584 ms) >> After: Done (2487 classes, 10074 methods, 24150 ms) ; +2% more methods >> >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Just use printf directly test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Compiler.java line 89: > 87: UNSAFE.ensureClassInitialized(aClass); > 88: } catch (NoClassDefFoundError e) { > 89: CompileTheWorld.OUT.printf("[%d]\t%s\tNOTE unable to init class : %s%n", Do you mean `\n` here and in all other outputs? `%n` needs local variable to store size of output. 
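For readers more used to C's `printf`: in `java.util.Formatter`, `%n` consumes no argument and expands to the platform line separator, whereas `\n` in a format string is always a single line feed. A minimal sketch of the difference follows; the class name `LineSeparatorDemo` and its two helper methods are made up for illustration and are not part of the CTW code.

```java
final class LineSeparatorDemo {
    /** Formats one line using %n, which expands to System.lineSeparator(). */
    static String withPercentN(String text) {
        return String.format("%s%n", text);
    }

    /** Formats one line using \n, which is always a single LF character. */
    static String withBackslashN(String text) {
        return String.format("%s\n", text);
    }

    public static void main(String[] args) {
        String a = withPercentN("NOTE");
        String b = withBackslashN("NOTE");
        // Both comparisons hold on every platform; the two strings differ
        // from each other only where the line separator is not "\n".
        System.out.println(a.equals("NOTE" + System.lineSeparator()));
        System.out.println(b.equals("NOTE\n"));
    }
}
```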
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26090#discussion_r2183886728 From kvn at openjdk.org Thu Jul 3 23:16:39 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 3 Jul 2025 23:16:39 GMT Subject: RFR: 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode In-Reply-To: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> Message-ID: <9wyR3KHZTWl-cf7rOq7ryEiP4e2AsxCyrylrfcWnKfM=.adb77f9b-213d-4b07-8362-aa8e5601f527@github.com> On Thu, 3 Jul 2025 01:59:55 GMT, hanguanqiang wrote: > This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode > > Problem: > When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash. > > Root Cause: > Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access. > > Fix > Add a check in do_monitor_exit() to skip monitor unlocking if GenerateSynchronizationCode is false, avoiding invalid monitor access and preventing the crash. I agree with removal of this flag.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26108#issuecomment-3033917992 From duke at openjdk.org Fri Jul 4 00:27:39 2025 From: duke at openjdk.org (hanguanqiang) Date: Fri, 4 Jul 2025 00:27:39 GMT Subject: RFR: 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode In-Reply-To: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> Message-ID: On Thu, 3 Jul 2025 01:59:55 GMT, hanguanqiang wrote: > This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode > > Problem: > When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash. > > Root Cause: > Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access. > > Fix > Add a check in do_monitor_exit() to skip monitor unlocking if GenerateSynchronizationCode is false, avoiding invalid monitor access and preventing the crash. Thank you all for the helpful feedback! I also think the GenerateSynchronizationCode flag is not particularly useful and can be removed. I will update this patch accordingly to eliminate the flag and simplify the related code.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26108#issuecomment-3034005241 From duke at openjdk.org Fri Jul 4 01:15:13 2025 From: duke at openjdk.org (hanguanqiang) Date: Fri, 4 Jul 2025 01:15:13 GMT Subject: RFR: 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode [v2] In-Reply-To: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> Message-ID: > This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode > > Problem: > When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash. > > Root Cause: > Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access. > > Fix > Add a check in do_monitor_exit() to skip monitor unlocking if GenerateSynchronizationCode is false, avoiding invalid monitor access and preventing the crash. hanguanqiang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - remove the unused flag(GenerateSynchronizationCode) - 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode Problem: When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash. Root Cause:
Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access. Fix Add a check in do_monitor_exit() to skip monitor unlocking if GenerateSynchronizationCode is false, avoiding invalid monitor access and preventing the crash. ------------- Changes: https://git.openjdk.org/jdk/pull/26108/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26108&range=01 Stats: 34 lines in 7 files changed: 10 ins; 16 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/26108.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26108/head:pull/26108 PR: https://git.openjdk.org/jdk/pull/26108 From duke at openjdk.org Fri Jul 4 01:26:42 2025 From: duke at openjdk.org (hanguanqiang) Date: Fri, 4 Jul 2025 01:26:42 GMT Subject: RFR: 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode [v3] In-Reply-To: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> Message-ID: > This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode > > Problem: > When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash. > > Root Cause: > Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access. > > Fix > Add a check in do_monitor_exit() to skip monitor unlocking if GenerateSynchronizationCode is false, avoiding invalid monitor access and preventing the crash.
hanguanqiang has updated the pull request incrementally with one additional commit since the last revision: remove trailing whitespace remove trailing whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26108/files - new: https://git.openjdk.org/jdk/pull/26108/files/972f324b..d01533e1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26108&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26108&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26108.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26108/head:pull/26108 PR: https://git.openjdk.org/jdk/pull/26108 From duke at openjdk.org Fri Jul 4 01:30:22 2025 From: duke at openjdk.org (hanguanqiang) Date: Fri, 4 Jul 2025 01:30:22 GMT Subject: RFR: 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode [v4] In-Reply-To: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> Message-ID: > This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode > > Problem: > When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash. > > Root Cause: > Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access. > > Fix > Add a check in do_monitor_exit() to skip monitor unlocking if GenerateSynchronizationCode is false, avoiding invalid monitor access and preventing the crash.
hanguanqiang has updated the pull request incrementally with one additional commit since the last revision: Delete .gitpod.yml ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26108/files - new: https://git.openjdk.org/jdk/pull/26108/files/d01533e1..1d6e8f5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26108&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26108&range=02-03 Stats: 10 lines in 1 file changed: 0 ins; 10 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26108.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26108/head:pull/26108 PR: https://git.openjdk.org/jdk/pull/26108 From duke at openjdk.org Fri Jul 4 01:34:45 2025 From: duke at openjdk.org (hanguanqiang) Date: Fri, 4 Jul 2025 01:34:45 GMT Subject: RFR: 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode [v4] In-Reply-To: References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> Message-ID: On Thu, 3 Jul 2025 04:40:33 GMT, David Holmes wrote: >> hanguanqiang has updated the pull request incrementally with one additional commit since the last revision: >> >> Delete .gitpod.yml > > The patch seems reasonable from a backporting perspective. Though it does beg the question as to why `do_monitor_enter` does not need the same fix. I suspect this is a very old flag and the code has bit-rotted somewhat. A question for the compiler folk: does `GenerateSynchronizationCode` still have any use or should it be scrapped? > > Thanks @dholmes-ora @dean-long @shipilev @vnkozlov Thanks for the previous reviews! I've updated the patch according to the suggestions. When you have a moment, could you please take another look?
------------- PR Comment: https://git.openjdk.org/jdk/pull/26108#issuecomment-3034151673 From xgong at openjdk.org Fri Jul 4 01:37:42 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 4 Jul 2025 01:37:42 GMT Subject: RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors [v3] In-Reply-To: <7VTRz_XqYBSdQ54n7ADzTzYADZjDbgBw6XuW0jehSLI=.24d18b87-4553-47ab-8065-d92fbb5700ae@github.com> References: <7VTRz_XqYBSdQ54n7ADzTzYADZjDbgBw6XuW0jehSLI=.24d18b87-4553-47ab-8065-d92fbb5700ae@github.com> Message-ID: On Thu, 3 Jul 2025 06:10:28 GMT, Xiaohong Gong wrote: >> ### Background >> On AArch64, the minimum vector length supported is 64-bit for basic types, except for `byte` and `boolean` (32-bit and 16-bit respectively to match special Vector API features). This limitation prevents intrinsification of vector type conversions between `short` and wider types (e.g. `long/double`) in Vector API when the entire vector length is within 128 bits, resulting in degraded performance for such conversions. >> >> For example, type conversions between `ShortVector.SPECIES_128` and `LongVector.SPECIES_128` are not supported on AArch64 NEON and SVE architectures with 128-bit max vector size. This occurs because the compiler would need to generate a vector with 2 short elements, resulting in a 32-bit vector size. >> >> To intrinsify such type conversion APIs, we need to relax the min vector length constraint from 64-bit to 32-bit for short vectors. >> >> ### Impact Analysis >> #### 1. Vector types >> Vectors only with `short` element types will be affected, as we just supported 32-bit `short` vectors in this change. >> >> #### 2. Vector API >> No impact on Vector API or the vector-specific nodes. The minimum vector shape at API level remains 64-bit. It's not possible to generate a final vector IR with 32-bit vector size. Type conversions may generate intermediate 32-bit vectors, but they will be resized or cast to vectors with at least 64-bit length. >> >> #### 3. 
Auto-vectorization >> Enables vectorization of cases containing only 2 `short` lanes, with significant performance improvements. Since we have supported 32-bit vectors for `byte` type for a long time, extending this to `short` did not introduce additional risks. >> >> #### 4. Codegen of vector nodes >> NEON doesn't support 32-bit SIMD instructions, so we use 64-bit instructions instead. For lanewise operations, this is safe because the higher half bits can be ignored. >> >> Details: >> - Lanewise vector operations are unaffected as explained above. >> - NEON supports vector load/store instructions with 32-bit vector size, which we already use in relevant IRs (shared by SVE). >> - Cross-lane operations like reduction may be affected, potentially causing incorrect results for `min/max/mul/and` reductions. The min vector size for such operations should remain 64-bit. We've added assertions in match rules. Since it's currently not possible to generate such reductions (Vector API minimum is 64-bit, and SLP doesn't support subword type reductions), we maintain the status quo. However, addin... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Refine the comment in ad file Hi @theRealAph , the review comments have been addressed. Would you mind taking another look please? Thank you so much! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26057#issuecomment-3034155586 From xgong at openjdk.org Fri Jul 4 02:03:43 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 4 Jul 2025 02:03:43 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v4] In-Reply-To: References: <3sWLk_sAMLtcvRUjXk9hYe-K2MBQl9fH2Qg0MF7lwDk=.b8867d51-e822-43c0-93ab-58228c6eb1d5@github.com> <19rf4A0bxc4BstRmLivGkoCOm7Qa7YD6z1VJHJivCtg=.4a643c7b-4e79-4f37-b230-7231df3c68a8@github.com> Message-ID: On Thu, 3 Jul 2025 10:24:02 GMT, Mikhail Ablakatov wrote: >> We can directly use `ptrue` here which maps to `p7` and has been preserved and initialized as all true. > > Done, although this has shifted the performance a bit: > > > | Benchmark | Before (ops/ms) | After (ops/ms) | Diff (%) | > | ------------------------ | --------------- | -------------- | -------- | > | ByteMaxVector.MULLanes | 9883.151 | 9093.557 | -7.99% | > | DoubleMaxVector.MULLanes | 2712.674 | 2607.367 | -3.89% | > | FloatMaxVector.MULLanes | 3388.811 | 3291.429 | -2.88% | > | IntMaxVector.MULLanes | 4765.554 | 5031.741 | +5.58% | > | LongMaxVector.MULLanes | 2685.228 | 2896.445 | +7.88% | > | ShortMaxVector.MULLanes | 5128.185 | 5197.656 | +1.35% | > > > On average, the results didn't get worse. I suggest merging the updated version as is, since the shift seems to be related to micro-architectural effects not directly related to this PR, and overall the PR still improves the performance by an order of magnitude (please refer to https://github.com/openjdk/jdk/pull/23181#issuecomment-3018988067 for performance numbers before the PR). I intend to investigate the reasons behind this more closely later. I'm fine with the latest version because it saves the mask generation and a predicate temp register. The minor regressions are fine to me. BTW, I'm not sure whether the masked operation with partial lanes is more efficient compared with all lane computations.
This may be related to the HW micro-architecture implementation. I haven't investigated this before. Additionally, currently all the lanewise operations (e.g. `MulV/AddV/...`) with partial vector size are implemented with `ptrue`. I agree with keeping it as it is, and investigating this later. Thanks for the update! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2184132213 From duke at openjdk.org Fri Jul 4 02:55:27 2025 From: duke at openjdk.org (hanguanqiang) Date: Fri, 4 Jul 2025 02:55:27 GMT Subject: RFR: 8361140: Missing OptimizePtrCompare check in ConnectionGraph::reduce_phi_on_cmp Message-ID: When running with `-XX:-OptimizePtrCompare` (which disables pointer comparison optimization), the compiler may hit an assertion failure in debug builds because `optimize_ptr_compare` is still being called. This violates the intended usage of the flag and leads to unexpected crashes. This patch adds an early return to `reduce_phi_on_cmp` when `OptimizePtrCompare` is false. Since the optimization relies on `optimize_ptr_compare` for static reasoning about comparisons, there's no benefit in proceeding with `reduce_phi_on_cmp` when this flag is disabled.
------------- Commit messages: - 8361140: Missing OptimizePtrCompare check in ConnectionGraph::reduce_phi_on_cmp Changes: https://git.openjdk.org/jdk/pull/26125/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26125&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361140 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26125.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26125/head:pull/26125 PR: https://git.openjdk.org/jdk/pull/26125 From fyang at openjdk.org Fri Jul 4 05:27:39 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 4 Jul 2025 05:27:39 GMT Subject: RFR: 8360520: RISC-V: C1: Fix primitive array clone intrinsic regression after JDK-8333154 [v2] In-Reply-To: References: Message-ID: On Thu, 26 Jun 2025 14:27:21 GMT, Feilong Jiang wrote: >> Hi, please consider. >> [JDK-8333154](https://bugs.openjdk.org/browse/JDK-8333154) Implemented C1 clone intrinsic that reuses arraycopy code for primitive arrays for RISC-V. >> The new instruction flag `OmitChecksFlag` (introduced by [JDK-8302850](https://bugs.openjdk.org/browse/JDK-8302850)) is used to avoid instantiation of array copy stubs for primitive array clones. >> If `OmitChecksFlag` is set, all flags (including the `unaligned` flag) will be cleared before generating the `LIR_OpArrayCopy` node. >> This may lead to incorrect selection of the arraycopy function when `-XX:+UseCompactObjectHeaders` is enabled, causing the `unaligned` flag to be set for arraycopy. >> We observed performance regression on P550 SBC through the corresponding JMH tests when COH is enabled. >> >> This pr keeps the `unaligned` flag on RISC-V to ensure the arraycopy function is selected correctly. >> The other platforms are not affected as the flag is always `0` when `OmitChecksFlag` is true.
>> >> Test on linux-riscv64: >> - [x] Tier1-3 >> >> JMH data on P550 SBC for reference (w/o and w/ the patch): >> >> Before: >> >> Without COH: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 50.854 ± 0.379 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 74.294 ± 0.449 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 81.847 ± 0.082 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 480.106 ± 0.369 ns/op >> ArrayClone.byteClone 0 avgt 15 90.146 ± 0.299 ns/op >> ArrayClone.byteClone 10 avgt 15 130.525 ± 0.384 ns/op >> ArrayClone.byteClone 100 avgt 15 251.942 ± 0.122 ns/op >> ArrayClone.byteClone 1000 avgt 15 407.580 ± 0.318 ns/op >> ArrayClone.intArraycopy 0 avgt 15 49.984 ± 0.436 ns/op >> ArrayClone.intArraycopy 10 avgt 15 76.302 ± 1.388 ns/op >> ArrayClone.intArraycopy 100 avgt 15 267.487 ± 0.329 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 1157.444 ± 1.588 ns/op >> ArrayClone.intClone 0 avgt 15 90.130 ± 0.257 ns/op >> ArrayClone.intClone 10 avgt 15 183.619 ± 0.588 ns/op >> ArrayClone.intClone 100 avgt 15 296.491 ± 0.246 ns/op >> ArrayClone.intClone 1000 avgt 15 828.695 ± 1.501 ns/op >> >> ------------------------------------------------------------------------- >> With COH: >> >> Benchmark (size) Mode Cnt Score Error Un... > > Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision:
>
>  - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fix-c1-primitive-clone
>  - check unaligned flag at LIR_OpArrayCopy to avoid using AvoidUnalignedAccesses
>  - riscv: fix c1 primitive array clone intrinsic regression

src/hotspot/share/c1/c1_Compiler.cpp line 240:

> 238:   #endif
> 239:   case vmIntrinsics::_getObjectSize:
> 240:   #if defined(X86) || defined(AARCH64) || defined(S390) || defined(RISCV64) || defined(PPC64)

PS: The change to the `RISCV` macro seems unrelated to this PR. It seems better to handle it in a separate PR.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25976#discussion_r2184446825

From shade at openjdk.org Fri Jul 4 06:04:38 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 4 Jul 2025 06:04:38 GMT
Subject: RFR: 8361255: CTW: Tolerate more NCDFE problems [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 3 Jul 2025 22:51:02 GMT, Vladimir Kozlov wrote:

>> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Just use printf directly
>
> test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Compiler.java line 89:
>
>> 87:             UNSAFE.ensureClassInitialized(aClass);
>> 88:         } catch (NoClassDefFoundError e) {
>> 89:             CompileTheWorld.OUT.printf("[%d]\t%s\tNOTE unable to init class : %s%n",
>
> Do you mean `\n` here and in all other outputs? `%n` needs local variable to store size of output.

I meant `%n` :) You are probably thinking about C printf? In Java [formatters](https://docs.oracle.com/javase/8/docs/api/java/util/Formatter.html), `%n` is the "platform-specific line separator". It is more compatible than just `\n`, which runs into platform-specific `CR` vs `LF` vs `CRLF` line separator mess.

See:

jshell> System.out.printf("Hello\nthere,\nVladimir!\n")
Hello
there,
Vladimir!
$6 ==> java.io.PrintStream@34c45dca

jshell> System.out.printf("Hello%nthere,%nVladimir!%n")
Hello
there,
Vladimir!
$7 ==> java.io.PrintStream@34c45dca

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26090#discussion_r2184484564

From jbhateja at openjdk.org Fri Jul 4 06:05:41 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 4 Jul 2025 06:05:41 GMT
Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v3]
In-Reply-To:
References:
Message-ID: <_RERljqu_FG7ZyneAk7Thd-9TwED18pQpEBz_i105fY=.b8948a23-273a-49f6-b9cb-6b611a5eedc6@github.com>

On Thu, 3 Jul 2025 07:10:22 GMT, erifan wrote:

>> If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is relatively smaller than that of `fromLong`. This patch does the conversion for these cases if `l` is a compile time constant.
>>
>> And this conversion also enables further optimizations that recognize maskAll patterns, see [1].
>>
>> Some JTReg test cases are added to ensure the optimization is effective.
>>
>> I tried many different ways to write a JMH benchmark, but failed. Since the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific compile-time constant, the statement will be hoisted out of the loop. If we don't use a loop, the hotspot will become other instructions, and no obvious performance change was observed. However, combined with the optimization of [1], we can observe a performance improvement of about 7% on both aarch64 and x64.
>>
>> The patch was tested on both aarch64 and x64, all of tier1 tier2 and tier3 tests passed.
>> >> [1] https://github.com/openjdk/jdk/pull/24674 > > erifan has updated the pull request incrementally with one additional commit since the last revision: > > Simplify the test code src/hotspot/share/opto/vectorIntrinsics.cpp line 707: > 705: elem_bt = converted_elem_bt; > 706: bits = gvn().longcon((bits_type->get_con() & 1L) == 0L ? 0L : -1L); > 707: } else if (!arch_supports_vector(opc, num_elem, elem_bt, checkFlags, true /*has_scalar_args*/)) { I think it's appropriate to make this change as part of VectorLongToMaskNode::Ideal routine to give the opportunity for this transformation during the Iterative GVN pass. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25793#discussion_r2184478552 From epeter at openjdk.org Fri Jul 4 06:13:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 4 Jul 2025 06:13:45 GMT Subject: RFR: 8359678: C2: assert(static_cast(result) == thing) caused by ReverseBytesNode::Value() [v2] In-Reply-To: References: Message-ID: On Thu, 26 Jun 2025 07:55:23 GMT, Hannes Greule wrote: >> Fixes an assertion when passing an int larger than short/char to the corresponding reverseBytes method in a constant-folding scenario. By just using static_cast, we can ignore the upper bytes and just swap the lower bytes. >> >> Using jasm, I added a test case that covers such inputs. It felt easier to test this way than the other scenarios mentioned in the bug report. >> >> I also removed the redundant checked_cast calls from the int/long case; we already have the correct type there. >> >> Please review. Thanks. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > remove classfile version Just a drive-by comment. Won't have time for a full review for a few weeks. test/hotspot/jtreg/compiler/c2/gvn/ReverseBytesConstantsTests.java line 23: > 21: * questions. > 22: */ > 23: package compiler.c2.gvn; Why did you remove the package? 
You can add the `jasm` file to the package too, I think that should work, no?

-------------

PR Review: https://git.openjdk.org/jdk/pull/25988#pullrequestreview-2985704166
PR Review Comment: https://git.openjdk.org/jdk/pull/25988#discussion_r2184491532

From thartmann at openjdk.org Fri Jul 4 06:15:40 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 4 Jul 2025 06:15:40 GMT
Subject: RFR: 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode [v4]
In-Reply-To:
References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com>
Message-ID:

On Fri, 4 Jul 2025 01:30:22 GMT, hanguanqiang wrote:

>> This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode
>>
>> Problem:
>> When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash.
>>
>> Root Cause:
>> Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access.
>>
>> Fix
>> Add a check in do_monitor_exit() to skip monitor unlocking if GenerateSynchronizationCode is false, avoiding invalid monitor access and preventing the crash.
>
> hanguanqiang has updated the pull request incrementally with one additional commit since the last revision:
>
>   Delete .gitpod.yml

Right, my intention when filing this bug was to remove the flag:
https://bugs.openjdk.org/browse/JDK-8358568?focusedId=14786499&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14786499

I should have made that more explicit. Removal of this flag looks good to me.

Changes requested by thartmann (Reviewer).
src/hotspot/share/opto/callnode.cpp line 1456: > 1454: Node* top = Compile::current()->top(); > 1455: ins_req(nextmon, top); > 1456: ins_req(nextmon, top); Wait, this is wrong. The monitor inputs should not be set to top. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26108#pullrequestreview-2985715983 PR Review: https://git.openjdk.org/jdk/pull/26108#pullrequestreview-2985718643 PR Review Comment: https://git.openjdk.org/jdk/pull/26108#discussion_r2184500795 From jbhateja at openjdk.org Fri Jul 4 06:21:40 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 4 Jul 2025 06:21:40 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v3] In-Reply-To: References: Message-ID: On Thu, 3 Jul 2025 07:10:22 GMT, erifan wrote: >> If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is relative smaller than that of `fromLong`. This patch does the conversion for these cases if `l` is a compile time constant. >> >> And this conversion also enables further optimizations that recognize maskAll patterns, see [1]. >> >> Some JTReg test cases are added to ensure the optimization is effective. >> >> I tried many different ways to write a JMH benchmark, but failed. Since the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific compile-time constant, the statement will be hoisted out of the loop. If we don't use a loop, the hotspot will become other instructions, and no obvious performance change was observed. However, combined with the optimization of [1], we can observe a performance improvement of about 7% on both aarch64 and x64. >> >> The patch was tested on both aarch64 and x64, all of tier1 tier2 and tier3 tests passed. 
>> >> [1] https://github.com/openjdk/jdk/pull/24674 > > erifan has updated the pull request incrementally with one additional commit since the last revision: > > Simplify the test code Can you kindly include a micro with this patch? ``` public static final VectorSpecies FSP = FloatVector.SPECIES_512; public static long micro1(long a) { long mask = Math.min(-1, Math.max(-1, a)); return VectorMask.fromLong(FSP, mask).toLong(); } public static long micro2() { return FSP.maskAll(true).toLong(); } Your patch now removes L2M and M2L IR nodes. Baseline:- SPR2>java --add-modules=jdk.incubator.vector -Xbatch -XX:CompileCommand=PrintIdealPhase,test_mask_all::micro1,BEFORE_MATCHING -XX:-TieredCompilation -cp . test_mask_all 0 AFTER: BEFORE_MATCHING 65 ConL === 0 [[ 377 ]] #long:65535 369 Return === 5 6 7 8 9 returns 399 [[ 0 ]] 377 VectorLongToMask === _ 65 [[ 398 ]] #vectormask !jvms: VectorMask::fromLong @ bci:39 (line 243) test_mask_all::micro1 @ bci:18 (line 9) 398 VectorMaskCast === _ 377 [[ 399 ]] #vectormask !jvms: Float512Vector$Float512Mask::toLong @ bci:35 (line 765) test_mask_all::micro1 @ bci:21 (line 9) 399 VectorMaskToLong === _ 398 [[ 369 ]] #long !jvms: Float512Vector$Float512Mask::toLong @ bci:35 (line 765) test_mask_all::micro1 @ bci:21 (line 9) [time] 5 ms [res] 1310700000000 With patch:- XX:CompileCommand=PrintIdealPhase,test_mask_all::micro1,BEFORE_MATCHING -XX:-TieredCompilation -cp . 
test_mask_all 0 CompileCommand: PrintIdealPhase test_mask_all.micro1 const char* PrintIdealPhase = 'BEFORE_MATCHING' WARNING: Using incubator modules: jdk.incubator.vector AFTER: BEFORE_MATCHING 65 ConL === 0 [[ 369 ]] #long:65535 369 Return === 5 6 7 8 9 returns 65 [[ 0 ]] [time] 3 ms [res] 1310700000000 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25793#issuecomment-3034669174 From yongheng_hgq at 126.com Fri Jul 4 06:27:11 2025 From: yongheng_hgq at 126.com (h) Date: Fri, 4 Jul 2025 14:27:11 +0800 (CST) Subject: RFR: 8344548: Incorrect StartAggressiveSweepingAt doc for segmented code cache Message-ID: <2dfbc1de.4cd5.197d41e17ac.Coremail.yongheng_hgq@126.com> Hi all, The flag StartAggressiveSweepingAt triggers aggressive code cache sweeping based on the percentage of free space in the entire code cache. The previous description referenced segmented vs non-segmented code cache, which is confusing and does not reflect the current implementation. This patch updates the flag description to clearly state that the threshold is based on the total code cache free percentage, regardless of segmentation. Commit messages: - 8344548: Incorrect StartAggressiveSweepingAt doc for segmented code cache Changes: https://github.com/openjdk/jdk/pull/26114/files Webrev: https://openjdk.github.io/cr/?repo=jdk&pr=26114&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344548 Patch: https://git.openjdk.org/jdk/pull/26114.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26114/head:pull/26114 PR: https://github.com/openjdk/jdk/pull/26114 BR -------------- next part -------------- An HTML attachment was scrubbed... 
From hgreule at openjdk.org Fri Jul 4 06:36:44 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Fri, 4 Jul 2025 06:36:44 GMT
Subject: RFR: 8359678: C2: assert(static_cast(result) == thing) caused by ReverseBytesNode::Value() [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 4 Jul 2025 06:04:42 GMT, Emanuel Peter wrote:

>> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   remove classfile version
>
> test/hotspot/jtreg/compiler/c2/gvn/ReverseBytesConstantsTests.java line 23:
>
>> 21:  * questions.
>> 22:  */
>> 23: package compiler.c2.gvn;
>
> Why did you remove the package? You can add the `jasm` file to the package too, I think that should work, no?

It seems like most files in the gvn folder don't have a package declaration, that's why I thought adjusting this way is fine. But I can also add it back and put the jasm file in the package too.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25988#discussion_r2184531892

From duke at openjdk.org Fri Jul 4 06:43:02 2025
From: duke at openjdk.org (hanguanqiang)
Date: Fri, 4 Jul 2025 06:43:02 GMT
Subject: RFR: 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode [v5]
In-Reply-To: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com>
References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com>
Message-ID: <8h5rro_3Sv9GLk3WSTstiTPXEj1dHPPHgCKowe-CIjk=.e15f5800-732f-4f25-81f8-ae86724bec2e@github.com>

> This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode
>
> Problem:
> When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash.
>
> Root Cause:
> Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access.
>
> Fix
> Add a check in do_monitor_exit() to skip monitor unlocking if GenerateSynchronizationCode is false, avoiding invalid monitor access and preventing the crash.

hanguanqiang has updated the pull request incrementally with one additional commit since the last revision:

  correct an error

  correct an error

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26108/files
  - new: https://git.openjdk.org/jdk/pull/26108/files/1d6e8f5c..6ebc2ecb

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26108&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26108&range=03-04

  Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/26108.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26108/head:pull/26108

PR: https://git.openjdk.org/jdk/pull/26108

From duke at openjdk.org Fri Jul 4 06:47:39 2025
From: duke at openjdk.org (hanguanqiang)
Date: Fri, 4 Jul 2025 06:47:39 GMT
Subject: RFR: 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode [v4]
In-Reply-To:
References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com>
Message-ID: <2sjVycRIgOfB6aRtJMfVYVOB3iDnmD97Y-DbbjzupU8=.23cd6fc6-36c9-4d8b-8d15-f74a05c3cfe8@github.com>

On Fri, 4 Jul 2025 06:12:44 GMT, Tobias Hartmann wrote:

>> hanguanqiang has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Delete .gitpod.yml
>
> src/hotspot/share/opto/callnode.cpp line 1456:
>
>> 1454:     Node* top = Compile::current()->top();
>> 1455:     ins_req(nextmon, top);
>> 1456:     ins_req(nextmon, top);
>
> Wait, this is wrong. The monitor inputs should not be set to top.

@TobiHartmann Thank you for pointing out the issue. I've made the correction as suggested.
Could you please take another look when you have time? Thanks again for your review and feedback ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26108#discussion_r2184546780 From dnsimon at openjdk.org Fri Jul 4 07:39:45 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 4 Jul 2025 07:39:45 GMT Subject: RFR: 8361355: Negative cases of Annotated.getAnnotationData implementations are broken [v3] In-Reply-To: <83aGkzmp5J7JllBsWK5ZzwZAa4GVsNk5VjmkH0O3FjE=.2507d7ce-65df-4121-acdf-35125d530d39@github.com> References: <83aGkzmp5J7JllBsWK5ZzwZAa4GVsNk5VjmkH0O3FjE=.2507d7ce-65df-4121-acdf-35125d530d39@github.com> Message-ID: On Thu, 3 Jul 2025 17:30:56 GMT, Doug Simon wrote: >> This PR fixes bugs in the implementation of `jdk.vm.ci.meta.Annotated.getAnnotationData`: >> 1. Calling `getAnnotatedData(annotationType)` fails with an ArrayIndexOutOfBoundsException instead of returning null when the receiver type is not annotated by `annotationType`. >> 2. Calling either of the `getAnnotatedData` methods with an `annotationType` value that does not represent an annotation interface silently succeeds when the receiver type does not (or can not) have any annotations (e.g. array and primitive types). > > Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge tag 'jdk-26+4' into JDK-8361355 > > Added tag jdk-26+4 for changeset 1ca008fd > - fixed negative cases in getAnnotationData Thanks for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26116#issuecomment-3034837278 From dnsimon at openjdk.org Fri Jul 4 07:39:46 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 4 Jul 2025 07:39:46 GMT Subject: Integrated: 8361355: Negative cases of Annotated.getAnnotationData implementations are broken In-Reply-To: References: Message-ID: On Thu, 3 Jul 2025 12:52:10 GMT, Doug Simon wrote: > This PR fixes bugs in the implementation of `jdk.vm.ci.meta.Annotated.getAnnotationData`: > 1. Calling `getAnnotatedData(annotationType)` fails with an ArrayIndexOutOfBoundsException instead of returning null when the receiver type is not annotated by `annotationType`. > 2. Calling either of the `getAnnotatedData` methods with an `annotationType` value that does not represent an annotation interface silently succeeds when the receiver type does not (or can not) have any annotations (e.g. array and primitive types). This pull request has now been integrated. Changeset: 5cf349c3 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/5cf349c3b08324e994a4143dcc34a59fd81323f9 Stats: 111 lines in 7 files changed: 94 ins; 1 del; 16 mod 8361355: Negative cases of Annotated.getAnnotationData implementations are broken Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/26116 From shade at openjdk.org Fri Jul 4 08:09:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 4 Jul 2025 08:09:53 GMT Subject: RFR: 8361255: CTW: Tolerate more NCDFE problems [v3] In-Reply-To: References: Message-ID: On Thu, 3 Jul 2025 13:33:23 GMT, Aleksey Shipilev wrote: >> We routinely CTW 3rd party JARs to make sure our compilers work. By the nature of the JARs, they have dependencies on other JARs, and CTW runner frequently warns out with NCDFE. It does so very crudely, missing opportunities to compile the methods that _do not_ trigger NCDFEs. CTW should be made more tolerant to this. 
I think the normal "modules" CTW runs into the similar problem, but on a lesser scale, as we do not have a very hairy dependency graph within JDK. >> >> The CTW logs are also fairly noisy with full exception traces when NCDFE is semi-expected. This PR does _not_ print exception stack traces in these cases, only "NOTE"-s about it. This makes the log fairly clean and more understandable. >> >> Motivational scope improvement compiling a sample 3rd party JAR (cassandra-2.1.4.0.jar): >> >> >> Before: Done (2487 classes, 9866 methods, 24584 ms) >> After: Done (2487 classes, 10074 methods, 24150 ms) ; +2% more methods >> >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Just use printf directly @TobiHartmann -- do you want to run this through CTW testing as well, to see if there are any new failures? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26090#issuecomment-3034913103 From rrich at openjdk.org Fri Jul 4 08:14:19 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 4 Jul 2025 08:14:19 GMT Subject: RFR: 8360599: [TESTBUG] DumpThreadsWithEliminatedLock.java fails because of unstable inlining [v2] In-Reply-To: References: Message-ID: <5IVOmLIL__iwmhFaKEP80YNP8kIN94owhC3SIU8ZF4U=.ab4061f3-e62f-4586-9118-7f84f246078d@github.com> > This PR adds CompileCommands to the test DumpThreadsWithEliminatedLock.java to force inlining of java/lang/String*.* methods. This will make inlining more stable to allow for the expected lock elimination based on c2 escape analysis. > > Forcing inlining of java/lang/StringBuffer.* wasn't sufficient on x86_64. With that the test still failed with TieredCompilation disabled. > > Testing: x86_64, ppc64 manually. Other major platforms as part of our CI testing. 
> > Failed inlining on x86_64 with TieredCompilation disabled: > > > make test TEST=com/sun/management/HotSpotDiagnosticMXBean/DumpThreadsWithEliminatedLock.java TEST_VM_OPTS="-XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=PrintInlining,DumpThreadsWithEliminatedLock.*" JTREG=TIMEOUT_FACTOR=0.1 > > [...] > > STDOUT: > CompileCommand: PrintInlining DumpThreadsWithEliminatedLock.* bool PrintInlining = true > @ 1 java.util.concurrent.atomic.AtomicBoolean::get (13 bytes) inline (hot) > @ 11 java.lang.StringBuffer:: (7 bytes) inline (hot) late inline succeeded (string method) > @ 3 java.lang.AbstractStringBuilder:: (39 bytes) inline (hot) > @ 1 java.lang.Object:: (1 bytes) inline (hot) > @ 16 java.lang.System::currentTimeMillis (0 bytes) (intrinsic) > s @ 19 java.lang.StringBuffer::append (13 bytes) failed to inline: already compiled into a big method > s @ 24 java.lang.StringBuffer::toString (44 bytes) inline (hot) late inline succeeded (string method) > s @ 1 java.lang.StringBuffer::length (5 bytes) accessor > @ 24 java.lang.String:: (98 bytes) failed to inline: already compiled into a big method > @ 30 java.util.concurrent.atomic.AtomicReference::set (6 bytes) accessor > 2025-07-02T09:25:53.396634900Z Attempt 1, found: false > 2025-07-02T09:25:53.415673072Z Attempt 2, found: false > 2025-07-02T09:25:53.418876867Z Attempt 3, found: false > > [...] 
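
For readers less familiar with the optimization involved: `StringBuffer` methods are synchronized, and C2 can only drop those locks when the buffer provably does not escape, which in turn depends on `append` and `toString` being inlined. The following is a minimal sketch of that code shape, guessed from the PrintInlining output above. It is not the actual source of DumpThreadsWithEliminatedLock.java, and the names `EliminatedLockSketch`, `SINK` and `stamp` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

final class EliminatedLockSketch {
    // Hypothetical sink so the produced string stays observable.
    static final AtomicReference<String> SINK = new AtomicReference<>();

    // The StringBuffer is local and never escapes this method, so the
    // synchronized append/toString calls are candidates for lock elimination,
    // provided the compiler manages to inline them.
    static String stamp(long millis) {
        StringBuffer sb = new StringBuffer();
        sb.append(millis);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Run the method often enough that a JIT compiler would consider it hot.
        for (int i = 0; i < 100_000; i++) {
            SINK.set(stamp(System.currentTimeMillis()));
        }
        System.out.println(SINK.get().length() > 0);
    }
}
```

If `append` fails to inline, as in the "already compiled into a big method" lines of the log, the buffer is passed to an out-of-line call and the lock would presumably have to stay.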
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Allow vm.debug ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26033/files - new: https://git.openjdk.org/jdk/pull/26033/files/8561d522..a43e54db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26033&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26033&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26033.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26033/head:pull/26033 PR: https://git.openjdk.org/jdk/pull/26033 From rrich at openjdk.org Fri Jul 4 08:14:20 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 4 Jul 2025 08:14:20 GMT Subject: RFR: 8360599: [TESTBUG] DumpThreadsWithEliminatedLock.java fails because of unstable inlining [v2] In-Reply-To: References: <0p61J0DPfyHsen3r__V82eEZSPYaT9rZleHBtanKaRc=.c5f6992f-a7fe-4c95-bdcb-2887c3dbde21@github.com> Message-ID: On Thu, 3 Jul 2025 17:57:27 GMT, Alan Bateman wrote: > > I found that the runtime of each test is ~300ms with a release build and ~11s with a fastdebug build on x86_64 and ppc64. If you like I can remove the requirement within this pr and do some more testing. -Xcomp doesn't seem to work. > > I think that would be useful, thank you. I've removed the `!vm.debug` requirement. I'll await our local testing of the pr on a wider range of platforms. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26033#issuecomment-3034923279 From dbriemann at openjdk.org Fri Jul 4 08:16:59 2025 From: dbriemann at openjdk.org (David Briemann) Date: Fri, 4 Jul 2025 08:16:59 GMT Subject: RFR: 8361353: [PPC64] C2: Add nodes UMulHiL, CmpUL3, UMinV, UMaxV, NegVI [v2] In-Reply-To: <37e56JLghJ5HzAPPnkYlyhlvFbgpBURRO5zpHMg8_B8=.8dfc8729-ea36-4643-bbbf-e6330fbf11c7@github.com> References: <37e56JLghJ5HzAPPnkYlyhlvFbgpBURRO5zpHMg8_B8=.8dfc8729-ea36-4643-bbbf-e6330fbf11c7@github.com> Message-ID: > Implement more nodes for ppc that exist on other platforms. David Briemann has updated the pull request incrementally with one additional commit since the last revision: add >= power9 check for NegVI ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26115/files - new: https://git.openjdk.org/jdk/pull/26115/files/d19e627d..00f37d7a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26115&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26115&range=00-01 Stats: 5 lines in 2 files changed: 4 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26115/head:pull/26115 PR: https://git.openjdk.org/jdk/pull/26115 From dbriemann at openjdk.org Fri Jul 4 08:16:59 2025 From: dbriemann at openjdk.org (David Briemann) Date: Fri, 4 Jul 2025 08:16:59 GMT Subject: RFR: 8361353: [PPC64] C2: Add nodes UMulHiL, CmpUL3, UMinV, UMaxV, NegVI [v2] In-Reply-To: <0lfxjYCRp6xFM8c_RDhbLEtbwM5J3huFxjcOqcWVykU=.908af2c8-2a2d-4719-b598-45b716ab8658@github.com> References: <37e56JLghJ5HzAPPnkYlyhlvFbgpBURRO5zpHMg8_B8=.8dfc8729-ea36-4643-bbbf-e6330fbf11c7@github.com> <0lfxjYCRp6xFM8c_RDhbLEtbwM5J3huFxjcOqcWVykU=.908af2c8-2a2d-4719-b598-45b716ab8658@github.com> Message-ID: On Thu, 3 Jul 2025 14:14:57 GMT, Martin Doerr wrote: >> David Briemann has updated the pull request incrementally with one additional commit since the last revision: >> >> add >= power9 check for 
NegVI > > src/hotspot/cpu/ppc/ppc.ad line 2196: > >> 2194: case Op_AbsVF: >> 2195: case Op_AbsVD: >> 2196: case Op_NegVI: > > vnegw requires Power9 (`PowerArchitecturePPC64 >= 9`). Thanks for catching that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26115#discussion_r2184700391 From shade at openjdk.org Fri Jul 4 09:08:19 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 4 Jul 2025 09:08:19 GMT Subject: RFR: 8357473: Compilation spike leaves many CompileTasks in free list [v5] In-Reply-To: References: Message-ID: > See bug for more discussion. > > This PR implements the "all the way" solution by removing the free list completely. It complements https://github.com/openjdk/jdk/pull/25364, and can go either first, or second. We will remerge the other one once either integrates. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Merge branch 'master' into JDK-8357473-compile-task-free-list - Merge branch 'master' into JDK-8357473-compile-task-free-list - Merge branch 'master' into JDK-8357473-compile-task-free-list - Merge branch 'master' into JDK-8357473-compile-task-free-list - Also free the lock! 
- Comments and indenting - Basic deletion ------------- Changes: https://git.openjdk.org/jdk/pull/25409/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25409&range=04 Stats: 134 lines in 6 files changed: 27 ins; 68 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/25409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25409/head:pull/25409 PR: https://git.openjdk.org/jdk/pull/25409 From aph at openjdk.org Fri Jul 4 09:14:39 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 4 Jul 2025 09:14:39 GMT Subject: RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors [v3] In-Reply-To: <7VTRz_XqYBSdQ54n7ADzTzYADZjDbgBw6XuW0jehSLI=.24d18b87-4553-47ab-8065-d92fbb5700ae@github.com> References: <7VTRz_XqYBSdQ54n7ADzTzYADZjDbgBw6XuW0jehSLI=.24d18b87-4553-47ab-8065-d92fbb5700ae@github.com> Message-ID: On Thu, 3 Jul 2025 06:10:28 GMT, Xiaohong Gong wrote: >> ### Background >> On AArch64, the minimum vector length supported is 64-bit for basic types, except for `byte` and `boolean` (32-bit and 16-bit respectively to match special Vector API features). This limitation prevents intrinsification of vector type conversions between `short` and wider types (e.g. `long/double`) in Vector API when the entire vector length is within 128 bits, resulting in degraded performance for such conversions. >> >> For example, type conversions between `ShortVector.SPECIES_128` and `LongVector.SPECIES_128` are not supported on AArch64 NEON and SVE architectures with 128-bit max vector size. This occurs because the compiler would need to generate a vector with 2 short elements, resulting in a 32-bit vector size. >> >> To intrinsify such type conversion APIs, we need to relax the min vector length constraint from 64-bit to 32-bit for short vectors. >> >> ### Impact Analysis >> #### 1. Vector types >> Vectors only with `short` element types will be affected, as we just supported 32-bit `short` vectors in this change. >> >> #### 2. 
Vector API >> No impact on Vector API or the vector-specific nodes. The minimum vector shape at API level remains 64-bit. It's not possible to generate a final vector IR with 32-bit vector size. Type conversions may generate intermediate 32-bit vectors, but they will be resized or cast to vectors with at least 64-bit length. >> >> #### 3. Auto-vectorization >> Enables vectorization of cases containing only 2 `short` lanes, with significant performance improvements. Since we have supported 32-bit vectors for `byte` type for a long time, extending this to `short` did not introduce additional risks. >> >> #### 4. Codegen of vector nodes >> NEON doesn't support 32-bit SIMD instructions, so we use 64-bit instructions instead. For lanewise operations, this is safe because the higher half bits can be ignored. >> >> Details: >> - Lanewise vector operations are unaffected as explained above. >> - NEON supports vector load/store instructions with 32-bit vector size, which we already use in relevant IRs (shared by SVE). >> - Cross-lane operations like reduction may be affected, potentially causing incorrect results for `min/max/mul/and` reductions. The min vector size for such operations should remain 64-bit. We've added assertions in match rules. Since it's currently not possible to generate such reductions (Vector API minimum is 64-bit, and SLP doesn't support subword type reductions), we maintain the status quo. However, addin... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Refine the comment in ad file This looks good. Thanks. ------------- Marked as reviewed by aph (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26057#pullrequestreview-2986219682 From xgong at openjdk.org Fri Jul 4 09:17:40 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 4 Jul 2025 09:17:40 GMT Subject: RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors [v3] In-Reply-To: References: <7VTRz_XqYBSdQ54n7ADzTzYADZjDbgBw6XuW0jehSLI=.24d18b87-4553-47ab-8065-d92fbb5700ae@github.com> Message-ID: On Fri, 4 Jul 2025 09:11:40 GMT, Andrew Haley wrote: > This looks good. Thanks. Thanks so much for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26057#issuecomment-3035115512 From shade at openjdk.org Fri Jul 4 09:29:13 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 4 Jul 2025 09:29:13 GMT Subject: RFR: 8361397: Rework CompileLog list synchronization Message-ID: <12Yp6QmpXqG-1UXTS8VveJ4yDNnDEGFV2q3_vRc3lF0=.4ccf05e2-9249-4b55-b48f-4f7fc17bef65@github.com> I want to remove `CompileTaskAlloc_lock` completely with [JDK-8357473](https://bugs.openjdk.org/browse/JDK-8357473), and for that we need to fix a stray use of that lock in CompileLog list linkage. We can rewrite that part to atomics. 
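
As an illustration of what "rewrite that part to atomics" can mean in general, here is a minimal sketch of linking nodes into a singly linked list with a compare-and-set loop instead of a lock. It is written in Java for brevity and is not the HotSpot change itself (CompileLog is C++). The names `LogList`, `Node`, `push`, `first` and `size` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Illustration only: lock-free prepend to a singly linked list. */
final class LogList {
    static final class Node {
        final String name;
        Node next; // written before the node is published, read only afterwards
        Node(String name) { this.name = name; }
    }

    private final AtomicReference<Node> head = new AtomicReference<>();

    /** Prepends a node, retrying if another thread changed the head in between. */
    void push(Node n) {
        Node old;
        do {
            old = head.get();
            n.next = old;
        } while (!head.compareAndSet(old, n));
    }

    /** Name of the most recently pushed node, or null if the list is empty. */
    String first() {
        Node n = head.get();
        return n == null ? null : n.name;
    }

    /** Walks the list and counts the nodes. */
    int size() {
        int count = 0;
        for (Node n = head.get(); n != null; n = n.next) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        LogList list = new LogList();
        list.push(new Node("log-1"));
        list.push(new Node("log-2"));
        System.out.println(list.size() + " entries, newest: " + list.first());
    }
}
```

A prepend-only list like this needs no lock because the only shared write is the head pointer. Removing nodes concurrently would need more care.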
Additional testing: - [ ] Linux x86_64 server fastdebug, `compiler` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/26127/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26127&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361397 Stats: 11 lines in 2 files changed: 4 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/26127.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26127/head:pull/26127 PR: https://git.openjdk.org/jdk/pull/26127 From shade at openjdk.org Fri Jul 4 09:29:41 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 4 Jul 2025 09:29:41 GMT Subject: RFR: 8358568: C2 compilation hits "must have a monitor" assert with -XX:-GenerateSynchronizationCode [v5] In-Reply-To: <8h5rro_3Sv9GLk3WSTstiTPXEj1dHPPHgCKowe-CIjk=.e15f5800-732f-4f25-81f8-ae86724bec2e@github.com> References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> <8h5rro_3Sv9GLk3WSTstiTPXEj1dHPPHgCKowe-CIjk=.e15f5800-732f-4f25-81f8-ae86724bec2e@github.com> Message-ID: On Fri, 4 Jul 2025 06:43:02 GMT, hanguanqiang wrote: >> This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode >> >> Problem: >> When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash. >> >> Root Cause: >> Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access. >> >> Fix >> Add a check in do_monitor_exit() to skip monitor unlocking if GenerateSynchronizationCode is false, avoiding invalid monitor access and preventing the crash.
> > hanguanqiang has updated the pull request incrementally with one additional commit since the last revision: > > correct an error > > correct an error I renamed the JBS bug, match the PR title, please. Also, go to https://github.com/hgqxjj/jdk/actions -- and enable the workflows. We need to have a clean GHA run before we can integrate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26108#issuecomment-3035170988 From mhaessig at openjdk.org Fri Jul 4 09:43:40 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 4 Jul 2025 09:43:40 GMT Subject: RFR: 8359678: C2: assert(static_cast(result) == thing) caused by ReverseBytesNode::Value() [v2] In-Reply-To: References: Message-ID: On Thu, 26 Jun 2025 07:55:23 GMT, Hannes Greule wrote: >> Fixes an assertion when passing an int larger than short/char to the corresponding reverseBytes method in a constant-folding scenario. By just using static_cast, we can ignore the upper bytes and just swap the lower bytes. >> >> Using jasm, I added a test case that covers such inputs. It felt easier to test this way than the other scenarios mentioned in the bug report. >> >> I also removed the redundant checked_cast calls from the int/long case; we already have the correct type there. >> >> Please review. Thanks. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > remove classfile version @SirYwell, thank you for fixing this. It looks good overall, but it would be good to add the package. I think we do this for all new tests. I kicked off some testing and will let you know about the results.
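As a rough Java-level analogy for the "ignore the upper bytes and just swap the lower bytes" behaviour described in the quoted PR text, the sketch below truncates an `int` with a narrowing cast before calling the standard `Short.reverseBytes` / `Character.reverseBytes` methods. The class and helper names are hypothetical, and this is not the C2 code itself; it only shows the result the constant folding is expected to agree with.

```java
// Hypothetical illustration: truncate first (like a static_cast), then swap.
public class ReverseBytesTruncation {
    // Keep only the low 16 bits of value, then swap the two bytes (signed result).
    public static short reverseLowShort(int value) {
        return Short.reverseBytes((short) value);
    }

    // Same idea for char, which yields an unsigned 16-bit result.
    public static char reverseLowChar(int value) {
        return Character.reverseBytes((char) value);
    }

    public static void main(String[] args) {
        int wide = 0x12345678; // does not fit in short or char
        // The upper bytes 0x1234 are dropped; 0x5678 becomes 0x7856.
        System.out.printf("0x%04x%n", reverseLowShort(wide) & 0xFFFF);
        System.out.printf("0x%04x%n", (int) reverseLowChar(wide));
    }
}
```

At the bytecode level a `short` or `char` argument is carried as an `int`, which is why a hand-written jasm test can pass such out-of-range values in the first place.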
src/hotspot/share/opto/subnode.cpp line 2031: > 2029: case Op_ReverseBytesUS: return TypeInt::make(byteswap(static_cast(con->is_int()->get_con()))); > 2030: case Op_ReverseBytesI: return TypeInt::make(byteswap(con->is_int()->get_con())); > 2031: case Op_ReverseBytesL: return TypeLong::make(byteswap(con->is_long()->get_con())); Why are you dropping the `checked_cast` here? Were they just an abundance of caution before? ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/25988#pullrequestreview-2986310035 PR Review Comment: https://git.openjdk.org/jdk/pull/25988#discussion_r2184863934 From mdoerr at openjdk.org Fri Jul 4 10:11:39 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 4 Jul 2025 10:11:39 GMT Subject: RFR: 8361353: [PPC64] C2: Add nodes UMulHiL, CmpUL3, UMinV, UMaxV, NegVI [v2] In-Reply-To: References: <37e56JLghJ5HzAPPnkYlyhlvFbgpBURRO5zpHMg8_B8=.8dfc8729-ea36-4643-bbbf-e6330fbf11c7@github.com> Message-ID: On Fri, 4 Jul 2025 08:16:59 GMT, David Briemann wrote: >> Implement more nodes for ppc that exist on other platforms. > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > add >= power9 check for NegVI I suggest removing the NegVI again. test/hotspot/jtreg/compiler/intrinsics/TestCompareUnsigned.java line 34: > 32: * @bug 8283726 8287925 > 33: * @requires os.arch=="amd64" | os.arch=="x86_64" | os.arch=="aarch64" | os.arch=="riscv64" | os.arch=="ppc64" | os.arch=="ppc64le" > 34: The test expects "CmpU3" for integers to be available. Can you implement that, too, please? ------------- Changes requested by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26115#pullrequestreview-2986428202 PR Review Comment: https://git.openjdk.org/jdk/pull/26115#discussion_r2184925416 From mdoerr at openjdk.org Fri Jul 4 10:11:40 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 4 Jul 2025 10:11:40 GMT Subject: RFR: 8361353: [PPC64] C2: Add nodes UMulHiL, CmpUL3, UMinV, UMaxV, NegVI [v2] In-Reply-To: References: <37e56JLghJ5HzAPPnkYlyhlvFbgpBURRO5zpHMg8_B8=.8dfc8729-ea36-4643-bbbf-e6330fbf11c7@github.com> <0lfxjYCRp6xFM8c_RDhbLEtbwM5J3huFxjcOqcWVykU=.908af2c8-2a2d-4719-b598-45b716ab8658@github.com> Message-ID: On Fri, 4 Jul 2025 08:14:11 GMT, David Briemann wrote: >> src/hotspot/cpu/ppc/ppc.ad line 2196: >> >>> 2194: case Op_AbsVF: >>> 2195: case Op_AbsVD: >>> 2196: case Op_NegVI: >> >> vnegw requires Power9 (`PowerArchitecturePPC64 >= 9`). > > Thanks for catching that. I think we'd need to check that here, too. Otherwise we'd get "bad AD file" errors. However, there's another problem: vnegw computes the one's-complement for each element, but we'd need two's-complement. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26115#discussion_r2184936783 From shade at openjdk.org Fri Jul 4 10:18:41 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 4 Jul 2025 10:18:41 GMT Subject: RFR: 8358568: Purge obsolete/broken GenerateSynchronizationCode flag [v5] In-Reply-To: <8h5rro_3Sv9GLk3WSTstiTPXEj1dHPPHgCKowe-CIjk=.e15f5800-732f-4f25-81f8-ae86724bec2e@github.com> References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> <8h5rro_3Sv9GLk3WSTstiTPXEj1dHPPHgCKowe-CIjk=.e15f5800-732f-4f25-81f8-ae86724bec2e@github.com> Message-ID: On Fri, 4 Jul 2025 06:43:02 GMT, hanguanqiang wrote: >> This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode >> >> Problem:
>> When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash. >> >> Root Cause: >> Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access. >> >> Fix >> Purge obsolete/broken GenerateSynchronizationCode flag > > hanguanqiang has updated the pull request incrementally with one additional commit since the last revision: > > correct an error > > correct an error Looks good to me. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26108#pullrequestreview-2986479330 From alanb at openjdk.org Fri Jul 4 10:27:39 2025 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 4 Jul 2025 10:27:39 GMT Subject: RFR: 8360599: [TESTBUG] DumpThreadsWithEliminatedLock.java fails because of unstable inlining [v2] In-Reply-To: <5IVOmLIL__iwmhFaKEP80YNP8kIN94owhC3SIU8ZF4U=.ab4061f3-e62f-4586-9118-7f84f246078d@github.com> References: <5IVOmLIL__iwmhFaKEP80YNP8kIN94owhC3SIU8ZF4U=.ab4061f3-e62f-4586-9118-7f84f246078d@github.com> Message-ID: On Fri, 4 Jul 2025 08:14:19 GMT, Richard Reingruber wrote: >> This PR adds CompileCommands to the test DumpThreadsWithEliminatedLock.java to force inlining of java/lang/String*.* methods. This will make inlining more stable to allow for the expected lock elimination based on C2 escape analysis. >> >> Forcing inlining of java/lang/StringBuffer.* wasn't sufficient on x86_64. With that the test still failed with TieredCompilation disabled. >> >> Testing: x86_64, ppc64 manually. Other major platforms as part of our CI testing.
>> >> Failed inlining on x86_64 with TieredCompilation disabled: >> >> >> make test TEST=com/sun/management/HotSpotDiagnosticMXBean/DumpThreadsWithEliminatedLock.java TEST_VM_OPTS="-XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=PrintInlining,DumpThreadsWithEliminatedLock.*" JTREG=TIMEOUT_FACTOR=0.1 >> >> [...] >> >> STDOUT: >> CompileCommand: PrintInlining DumpThreadsWithEliminatedLock.* bool PrintInlining = true >> @ 1 java.util.concurrent.atomic.AtomicBoolean::get (13 bytes) inline (hot) >> @ 11 java.lang.StringBuffer:: (7 bytes) inline (hot) late inline succeeded (string method) >> @ 3 java.lang.AbstractStringBuilder:: (39 bytes) inline (hot) >> @ 1 java.lang.Object:: (1 bytes) inline (hot) >> @ 16 java.lang.System::currentTimeMillis (0 bytes) (intrinsic) >> s @ 19 java.lang.StringBuffer::append (13 bytes) failed to inline: already compiled into a big method >> s @ 24 java.lang.StringBuffer::toString (44 bytes) inline (hot) late inline succeeded (string method) >> s @ 1 java.lang.StringBuffer::length (5 bytes) accessor >> @ 24 java.lang.String:: (98 bytes) failed to inline: already compiled into a big method >> @ 30 java.util.concurrent.atomic.AtomicReference::set (6 bytes) accessor >> 2025-07-02T09:25:53.396634900Z Attempt 1, found: false >> 2025-07-02T09:25:53.415673072Z Attempt 2, found: false >> 2025-07-02T09:25:53.418876867Z Attempt 3, found: false >> >> [...] > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Allow vm.debug Marked as reviewed by alanb (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/26033#pullrequestreview-2986518539 From duke at openjdk.org Fri Jul 4 10:56:42 2025 From: duke at openjdk.org (erifan) Date: Fri, 4 Jul 2025 10:56:42 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v3] In-Reply-To: <_RERljqu_FG7ZyneAk7Thd-9TwED18pQpEBz_i105fY=.b8948a23-273a-49f6-b9cb-6b611a5eedc6@github.com> References: <_RERljqu_FG7ZyneAk7Thd-9TwED18pQpEBz_i105fY=.b8948a23-273a-49f6-b9cb-6b611a5eedc6@github.com> Message-ID: <6SXA9ZrXBDhZLyXP3lXbkpl4dl3iocvDpzPrUpIQOl8=.9b025be2-848b-4b78-a5e4-929cb7e9f798@github.com> On Fri, 4 Jul 2025 05:53:41 GMT, Jatin Bhateja wrote: >> erifan has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplify the test code > > src/hotspot/share/opto/vectorIntrinsics.cpp line 707: > >> 705: elem_bt = converted_elem_bt; >> 706: bits = gvn().longcon((bits_type->get_con() & 1L) == 0L ? 0L : -1L); >> 707: } else if (!arch_supports_vector(opc, num_elem, elem_bt, checkFlags, true /*has_scalar_args*/)) { > > I think it's appropriate to make this change as part of VectorLongToMaskNode::Ideal routine to give the opportunity for this transformation during the Iterative GVN pass. Originally I also tried to implement it in IGVN, but later changed it to Intrinsic. For two reasons: 1. Implementing in intrinsic is relatively simpler and has better performance because it saves the process of generating `VectorLongToMaskNode`. 2. Implementing in intrinsic can support more cases. Because some architectures (such as aarch64 `NEON`) currently do not support the generation of `VectorLongToMaskNode,` but support `MaskAll` or `Replicate` nodes, if implemented in IGVN, then this optimization doesn't work for NEON. But implementing in Intrinsic can cover such cases. 
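To make the all-true/all-false condition under discussion concrete, here is a small plain-Java sketch that classifies a constant `long` for a given lane count. It assumes that bits above the lane count are ignored by `VectorMask.fromLong`; the class, enum and method names are hypothetical, and this is not the code in `vectorIntrinsics.cpp`.

```java
// Hypothetical illustration of when fromLong(species, bits) could be
// replaced by maskAll(true) or maskAll(false) for a constant input.
public class MaskConstantKind {
    public enum Kind { ALL_FALSE, ALL_TRUE, MIXED }

    // laneCount is the number of lanes of the species, in the range 1..64.
    public static Kind classify(long bits, int laneCount) {
        // Mask selecting only the bits that correspond to real lanes.
        long laneMask = (laneCount == 64) ? -1L : (1L << laneCount) - 1L;
        long low = bits & laneMask;
        if (low == 0L) {
            return Kind.ALL_FALSE; // candidate for maskAll(false)
        }
        if (low == laneMask) {
            return Kind.ALL_TRUE;  // candidate for maskAll(true)
        }
        return Kind.MIXED;         // keep the general long-to-mask path
    }

    public static void main(String[] args) {
        System.out.println(classify(-1L, 16));         // every lane bit set
        System.out.println(classify(0xFFFF0000L, 16)); // only bits above the lanes set
        System.out.println(classify(0b1010L, 4));      // some lanes set, some not
    }
}
```

Once a constant is known to fall into one of the first two cases, its lowest bit alone tells which of the two broadcasts to use.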
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25793#discussion_r2185045860 From dbriemann at openjdk.org Fri Jul 4 10:58:56 2025 From: dbriemann at openjdk.org (David Briemann) Date: Fri, 4 Jul 2025 10:58:56 GMT Subject: RFR: 8361353: [PPC64] C2: Add nodes UMulHiL, CmpUL3, UMinV, UMaxV, NegVI [v3] In-Reply-To: <37e56JLghJ5HzAPPnkYlyhlvFbgpBURRO5zpHMg8_B8=.8dfc8729-ea36-4643-bbbf-e6330fbf11c7@github.com> References: <37e56JLghJ5HzAPPnkYlyhlvFbgpBURRO5zpHMg8_B8=.8dfc8729-ea36-4643-bbbf-e6330fbf11c7@github.com> Message-ID: > Implement more nodes for ppc that exist on other platforms. David Briemann has updated the pull request incrementally with one additional commit since the last revision: add CmpU3, ppc9 check in match_rule_supported ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26115/files - new: https://git.openjdk.org/jdk/pull/26115/files/00f37d7a..6d05a728 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26115&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26115&range=01-02 Stats: 17 lines in 1 file changed: 17 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26115/head:pull/26115 PR: https://git.openjdk.org/jdk/pull/26115 From thartmann at openjdk.org Fri Jul 4 11:05:43 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 4 Jul 2025 11:05:43 GMT Subject: RFR: 8358568: Purge obsolete/broken GenerateSynchronizationCode flag [v5] In-Reply-To: <8h5rro_3Sv9GLk3WSTstiTPXEj1dHPPHgCKowe-CIjk=.e15f5800-732f-4f25-81f8-ae86724bec2e@github.com> References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> <8h5rro_3Sv9GLk3WSTstiTPXEj1dHPPHgCKowe-CIjk=.e15f5800-732f-4f25-81f8-ae86724bec2e@github.com> Message-ID: On Fri, 4 Jul 2025 06:43:02 GMT, hanguanqiang wrote: >> This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode >> >> Problem:
>> When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash. >> >> Root Cause: >> Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access. >> >> Fix >> Purge obsolete/broken GenerateSynchronizationCode flag > > hanguanqiang has updated the pull request incrementally with one additional commit since the last revision: > > correct an error > > correct an error Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26108#pullrequestreview-2986682348 From duke at openjdk.org Fri Jul 4 11:10:40 2025 From: duke at openjdk.org (erifan) Date: Fri, 4 Jul 2025 11:10:40 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v3] In-Reply-To: References: Message-ID: On Fri, 4 Jul 2025 06:18:02 GMT, Jatin Bhateja wrote: > public static final VectorSpecies FSP = FloatVector.SPECIES_512; public static long micro1(long a) { long mask = Math.min(-1, Math.max(-1, a)); return VectorMask.fromLong(FSP, mask).toLong(); } public static long micro2() { return FSP.maskAll(true).toLong(); } With this JMH method we cannot see an obvious performance improvement, because the hot spots are other instructions. Adding a loop is better. @Benchmark public long micro_3() { long result = 0; for (int i = 0; i < ITERATION; i++) { long mask = Math.min(-1, Math.max(-1, result)); result += VectorMask.fromLong(FSP, mask).toLong(); } return result; } But if it is not a floating point type, there will be no obvious performance improvement.
Because the pattern `VectorMaskToLong(VectorLongToMask (l))` for integer types has been implemented, and `VectorMaskToLong(VectorMaskCast (VectorLongToMask (l)))` for floating-point types is not implemented. So if we add JMH benchmarks for this optimization, we can only see good performance gain from floating point types. So do you think it is necessary? @jatin-bhateja Thanks for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25793#issuecomment-3035646085 From dbriemann at openjdk.org Fri Jul 4 11:22:59 2025 From: dbriemann at openjdk.org (David Briemann) Date: Fri, 4 Jul 2025 11:22:59 GMT Subject: RFR: 8361353: [PPC64] C2: Add nodes UMulHiL, CmpUL3, UMinV, UMaxV, NegVI [v4] In-Reply-To: <37e56JLghJ5HzAPPnkYlyhlvFbgpBURRO5zpHMg8_B8=.8dfc8729-ea36-4643-bbbf-e6330fbf11c7@github.com> References: <37e56JLghJ5HzAPPnkYlyhlvFbgpBURRO5zpHMg8_B8=.8dfc8729-ea36-4643-bbbf-e6330fbf11c7@github.com> Message-ID: > Implement more nodes for ppc that exist on other platforms. David Briemann has updated the pull request incrementally with one additional commit since the last revision: fix typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26115/files - new: https://git.openjdk.org/jdk/pull/26115/files/6d05a728..a7c9f6be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26115&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26115&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26115/head:pull/26115 PR: https://git.openjdk.org/jdk/pull/26115 From dbriemann at openjdk.org Fri Jul 4 11:31:24 2025 From: dbriemann at openjdk.org (David Briemann) Date: Fri, 4 Jul 2025 11:31:24 GMT Subject: RFR: 8361353: [PPC64] C2: Add nodes UMulHiL, CmpUL3, UMinV, UMaxV, NegVI [v5] In-Reply-To: <37e56JLghJ5HzAPPnkYlyhlvFbgpBURRO5zpHMg8_B8=.8dfc8729-ea36-4643-bbbf-e6330fbf11c7@github.com> References: 
<37e56JLghJ5HzAPPnkYlyhlvFbgpBURRO5zpHMg8_B8=.8dfc8729-ea36-4643-bbbf-e6330fbf11c7@github.com> Message-ID: > Implement more nodes for ppc that exist on other platforms. David Briemann has updated the pull request incrementally with one additional commit since the last revision: adjust parameter types ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26115/files - new: https://git.openjdk.org/jdk/pull/26115/files/a7c9f6be..ebb27c9c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26115&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26115&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26115/head:pull/26115 PR: https://git.openjdk.org/jdk/pull/26115 From dlunden at openjdk.org Fri Jul 4 11:52:52 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 4 Jul 2025 11:52:52 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v23] In-Reply-To: References: Message-ID: <-ZInU3fxIuRKYX9cUOJBCIq8gUHruo0qINmgjKWT_Dg=.aa7af726-2d75-4cbe-a113-d7ed396e19ed@github.com> On Mon, 23 Jun 2025 14:31:24 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this.
To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... 
> > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Add clarifying comments at definitions of register mask sizes For reference, here is now the changeset adding an IFG bailout: https://github.com/openjdk/jdk/pull/26118 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-3035850032 From mdoerr at openjdk.org Fri Jul 4 12:00:46 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 4 Jul 2025 12:00:46 GMT Subject: RFR: 8361353: [PPC64] C2: Add nodes UMulHiL, CmpUL3, UMinV, UMaxV, NegVI [v5] In-Reply-To: References: <37e56JLghJ5HzAPPnkYlyhlvFbgpBURRO5zpHMg8_B8=.8dfc8729-ea36-4643-bbbf-e6330fbf11c7@github.com> Message-ID: On Fri, 4 Jul 2025 11:31:24 GMT, David Briemann wrote: >> Implement more nodes for ppc that exist on other platforms. > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > adjust parameter types This looks correct, now. I only have a minor suggestion. src/hotspot/cpu/ppc/ppc.ad line 13599: > 13597: %} > 13598: > 13599: instruct vnegI_reg(vecX dst, vecX src) %{ Maybe call it vneg4I? That would be more consistent with the other nodes. src/hotspot/cpu/ppc/ppc.ad line 13601: > 13599: instruct vnegI_reg(vecX dst, vecX src) %{ > 13600: match(Set dst (NegVI src)); > 13601: predicate(PowerArchitecturePPC64 >= 9); We could also check for n->as_Vector()->length() == 4 or type int. ------------- Marked as reviewed by mdoerr (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26115#pullrequestreview-2986896415 PR Review Comment: https://git.openjdk.org/jdk/pull/26115#discussion_r2185175993 PR Review Comment: https://git.openjdk.org/jdk/pull/26115#discussion_r2185177259 From jbhateja at openjdk.org Fri Jul 4 12:03:39 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 4 Jul 2025 12:03:39 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v3] In-Reply-To: <6SXA9ZrXBDhZLyXP3lXbkpl4dl3iocvDpzPrUpIQOl8=.9b025be2-848b-4b78-a5e4-929cb7e9f798@github.com> References: <_RERljqu_FG7ZyneAk7Thd-9TwED18pQpEBz_i105fY=.b8948a23-273a-49f6-b9cb-6b611a5eedc6@github.com> <6SXA9ZrXBDhZLyXP3lXbkpl4dl3iocvDpzPrUpIQOl8=.9b025be2-848b-4b78-a5e4-929cb7e9f798@github.com> Message-ID: On Fri, 4 Jul 2025 10:53:55 GMT, erifan wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 707: >> >>> 705: elem_bt = converted_elem_bt; >>> 706: bits = gvn().longcon((bits_type->get_con() & 1L) == 0L ? 0L : -1L); >>> 707: } else if (!arch_supports_vector(opc, num_elem, elem_bt, checkFlags, true /*has_scalar_args*/)) { >> >> I think it's appropriate to make this change as part of VectorLongToMaskNode::Ideal routine to give the opportunity for this transformation during the Iterative GVN pass. > > Originally I also tried to implement it in IGVN, but later changed it to Intrinsic. For two reasons: > > 1. Implementing in intrinsic is relatively simpler and has better performance because it saves the process of generating `VectorLongToMaskNode`. > 2. Implementing in intrinsic can support more cases. Because some architectures (such as aarch64 `NEON`) currently do not support the generation of `VectorLongToMaskNode,` but support `MaskAll` or `Replicate` nodes, if implemented in IGVN, then this optimization doesn't work for NEON. But implementing in Intrinsic can cover such cases. 
Hi @erifan, a few follow-up queries: >> Implementing in intrinsic is relatively simpler and has better performance because it saves the process of generating VectorLongToMaskNode. What if during iterative GVN a constant -1 seeps through the IR graph and gets connected to the input of VectorLongToMaskNode, you won't be able to create maskAll true in that case? >> Implementing intrinsic can support more cases. Because some architectures (such as aarch64 NEON) currently do not support the generation of VectorLongToMaskNode, but support MaskAll or Replicate nodes, if implemented in IGVN, then this optimization doesn't work for NEON. But implementing in Intrinsic can cover such cases. Do you see any advantage of doing this at the intrinsic layer over entirely handling it in the Java implementation by simply modifying the opcode of fromBitsCoerced to MODE_BROADCAST from the existing MODE_BITS_COERCED_LONG_TO_MASK for 0 or -1 input? https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMask.java#L243 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25793#discussion_r2185179706 From jbhateja at openjdk.org Fri Jul 4 12:06:38 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 4 Jul 2025 12:06:38 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v3] In-Reply-To: References: Message-ID: On Thu, 3 Jul 2025 07:10:22 GMT, erifan wrote: >> If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is relatively smaller than that of `fromLong`. This patch does the conversion for these cases if `l` is a compile time constant. >> >> And this conversion also enables further optimizations that recognize maskAll patterns, see [1]. >> >> Some JTReg test cases are added to ensure the optimization is effective.
>> >> I tried many different ways to write a JMH benchmark, but failed. Since the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific compile-time constant, the statement will be hoisted out of the loop. If we don't use a loop, the hotspot will become other instructions, and no obvious performance change was observed. However, combined with the optimization of [1], we can observe a performance improvement of about 7% on both aarch64 and x64. >> >> The patch was tested on both aarch64 and x64, all of tier1 tier2 and tier3 tests passed. >> >> [1] https://github.com/openjdk/jdk/pull/24674 > > erifan has updated the pull request incrementally with one additional commit since the last revision: > > Simplify the test code > > public static final VectorSpecies FSP = FloatVector.SPECIES_512; > > public static long micro1(long a) { > > long mask = Math.min(-1, Math.max(-1, a)); > > return VectorMask.fromLong(FSP, mask).toLong(); > > } > > public static long micro2() { > > return FSP.maskAll(true).toLong(); > > } > > With this JMH method we can not see obvious performance improvement, because the hot spots are other instructions. Adding a loop is better. There is no hard and fast rule for the inclusion of a loop in a JMH micro in that case? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25793#issuecomment-3035920476 From kevinw at openjdk.org Fri Jul 4 12:18:42 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Fri, 4 Jul 2025 12:18:42 GMT Subject: RFR: 8360599: [TESTBUG] DumpThreadsWithEliminatedLock.java fails because of unstable inlining [v2] In-Reply-To: <5IVOmLIL__iwmhFaKEP80YNP8kIN94owhC3SIU8ZF4U=.ab4061f3-e62f-4586-9118-7f84f246078d@github.com> References: <5IVOmLIL__iwmhFaKEP80YNP8kIN94owhC3SIU8ZF4U=.ab4061f3-e62f-4586-9118-7f84f246078d@github.com> Message-ID: On Fri, 4 Jul 2025 08:14:19 GMT, Richard Reingruber wrote: >> This PR adds CompileCommands to the test DumpThreadsWithEliminatedLock.java to force inlining of java/lang/String*.* methods. This will make inlining more stable to allow for the expected lock elimination based on c2 escape analysis. >> >> Forcing inlining of java/lang/StringBuffer.* wasn't sufficient on x86_64. With that the test still failed with TieredCompilation disabled. >> >> Testing: x86_64, ppc64 manually. Other major platforms as part of our CI testing. >> >> Failed inlining on x86_64 with TieredCompilation disabled: >> >> >> make test TEST=com/sun/management/HotSpotDiagnosticMXBean/DumpThreadsWithEliminatedLock.java TEST_VM_OPTS="-XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=PrintInlining,DumpThreadsWithEliminatedLock.*" JTREG=TIMEOUT_FACTOR=0.1 >> >> [...] 
>> >> STDOUT: >> CompileCommand: PrintInlining DumpThreadsWithEliminatedLock.* bool PrintInlining = true >> @ 1 java.util.concurrent.atomic.AtomicBoolean::get (13 bytes) inline (hot) >> @ 11 java.lang.StringBuffer:: (7 bytes) inline (hot) late inline succeeded (string method) >> @ 3 java.lang.AbstractStringBuilder:: (39 bytes) inline (hot) >> @ 1 java.lang.Object:: (1 bytes) inline (hot) >> @ 16 java.lang.System::currentTimeMillis (0 bytes) (intrinsic) >> s @ 19 java.lang.StringBuffer::append (13 bytes) failed to inline: already compiled into a big method >> s @ 24 java.lang.StringBuffer::toString (44 bytes) inline (hot) late inline succeeded (string method) >> s @ 1 java.lang.StringBuffer::length (5 bytes) accessor >> @ 24 java.lang.String:: (98 bytes) failed to inline: already compiled into a big method >> @ 30 java.util.concurrent.atomic.AtomicReference::set (6 bytes) accessor >> 2025-07-02T09:25:53.396634900Z Attempt 1, found: false >> 2025-07-02T09:25:53.415673072Z Attempt 2, found: false >> 2025-07-02T09:25:53.418876867Z Attempt 3, found: false >> >> [...] > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Allow vm.debug About the test and debug mode, we had that kind of conversation in https://github.com/openjdk/jdk/pull/25958 Windows and Macosx were likely to timeout in debug builds, Linux was OK for me. Not sure if the inlining requests here affect that much, will be interesting to see. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26033#issuecomment-3035981114 From mhaessig at openjdk.org Fri Jul 4 13:14:39 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 4 Jul 2025 13:14:39 GMT Subject: RFR: 8359678: C2: assert(static_cast(result) == thing) caused by ReverseBytesNode::Value() [v2] In-Reply-To: References: Message-ID: On Thu, 26 Jun 2025 07:55:23 GMT, Hannes Greule wrote: >> Fixes an assertion when passing an int larger than short/char to the corresponding reverseBytes method in a constant-folding scenario. By just using static_cast, we can ignore the upper bytes and just swap the lower bytes. >> >> Using jasm, I added a test case that covers such inputs. It felt easier to test this way than the other scenarios mentioned in the bug report. >> >> I also removed the redundant checked_cast calls from the int/long case; we already have the correct type there. >> >> Please review. Thanks. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > remove classfile version You forgot to add the new tests to the array of tests in `@Run`: stderr: Exception in thread "main" compiler.lib.ir_framework.shared.TestRunException: Test Failures (1) ----------------- Custom Run Test: @Run: runMethod - @Tests: {testI1,testI2,testI3,testL1,testL2,testL3,testS1,testS2,testS3,testUS1,testUS2,testUS3}: compiler.lib.ir_framework.shared.TestRunException: There was an error while invoking @Run method public void ReverseBytesConstantsTests.runMethod() at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:162) at compiler.lib.ir_framework.test.AbstractTest.run(AbstractTest.java:100) at compiler.lib.ir_framework.test.CustomRunTest.run(CustomRunTest.java:89) at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:865) at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:255) at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:168)
Caused by: java.lang.reflect.InvocationTargetException at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:119) at java.base/java.lang.reflect.Method.invoke(Method.java:565) at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:159) ... 5 more Caused by: java.lang.ArrayIndexOutOfBoundsException: Index -24674 out of bounds for length 128 at java.base/java.lang.Character.valueOf(Character.java:9284) at ReverseBytesConstantsTests.assertResultUS(ReverseBytesConstantsTests.java:102) at ReverseBytesConstantsTests.runMethod(ReverseBytesConstantsTests.java:66) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) ... 7 more at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:901) at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:255) at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:168) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25988#issuecomment-3036245144 From duke at openjdk.org Fri Jul 4 13:23:41 2025 From: duke at openjdk.org (hanguanqiang) Date: Fri, 4 Jul 2025 13:23:41 GMT Subject: RFR: 8358568: Purge obsolete/broken GenerateSynchronizationCode flag [v5] In-Reply-To: References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> <8h5rro_3Sv9GLk3WSTstiTPXEj1dHPPHgCKowe-CIjk=.e15f5800-732f-4f25-81f8-ae86724bec2e@github.com> Message-ID: On Fri, 4 Jul 2025 09:27:31 GMT, Aleksey Shipilev wrote: >> hanguanqiang has updated the pull request incrementally with one additional commit since the last revision: >> >> correct an error >> >> correct an error > > I renamed the JBS bug, match the PR title, please. Also, go to https://github.com/hgqxjj/jdk/actions -- and enable the workflows. We need to have a clean GHA run before we can integrate. 
@shipilev @TobiHartmann Many thanks to both of you for reviewing ------------- PR Comment: https://git.openjdk.org/jdk/pull/26108#issuecomment-3036267585 From shade at openjdk.org Fri Jul 4 13:35:41 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 4 Jul 2025 13:35:41 GMT Subject: RFR: 8357473: Compilation spike leaves many CompileTasks in free list [v5] In-Reply-To: References: Message-ID: <8tf_dPZ9hexTA0unaFgAzyRqMW42z1lSRasRxySLlMU=.5cf326d4-894a-4f67-a9eb-f0c76e1bc3a9@github.com> On Fri, 4 Jul 2025 09:08:19 GMT, Aleksey Shipilev wrote: >> See bug for more discussion. >> >> This PR implements the "all the way" solution by removing the free list completely. It complements https://github.com/openjdk/jdk/pull/25364, and can go either first, or second. We will remerge the other one once either integrates. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge branch 'master' into JDK-8357473-compile-task-free-list > - Merge branch 'master' into JDK-8357473-compile-task-free-list > - Merge branch 'master' into JDK-8357473-compile-task-free-list > - Merge branch 'master' into JDK-8357473-compile-task-free-list > - Also free the lock! > - Comments and indenting > - Basic deletion I would like to ditch the `CompileTaskAlloc_lock` completely, but that needs https://github.com/openjdk/jdk/pull/26127 to be done first. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25409#issuecomment-3036297988 From eastigeevich at openjdk.org Fri Jul 4 13:59:39 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 4 Jul 2025 13:59:39 GMT Subject: RFR: 8344548: Incorrect StartAggressiveSweepingAt doc for segmented code cache In-Reply-To: <4Kb1CzIxoBR4DXR9htBr3NINCgUup9coKCNFurAi93c=.253f5490-2263-4b3d-b921-2737ead6bb0a@github.com> References: <4Kb1CzIxoBR4DXR9htBr3NINCgUup9coKCNFurAi93c=.253f5490-2263-4b3d-b921-2737ead6bb0a@github.com> Message-ID: On Thu, 3 Jul 2025 11:29:02 GMT, hanguanqiang wrote: > The flag StartAggressiveSweepingAt triggers aggressive code cache sweeping based on the percentage of free space in the entire code cache. The previous description referenced segmented vs non-segmented code cache, which is confusing and does not reflect the current implementation. > > This patch updates the flag description to clearly state that the threshold is based on the total code cache free percentage, regardless of segmentation. lgtm ------------- Marked as reviewed by eastigeevich (Committer). PR Review: https://git.openjdk.org/jdk/pull/26114#pullrequestreview-2987356786 From duke at openjdk.org Fri Jul 4 14:13:27 2025 From: duke at openjdk.org (hanguanqiang) Date: Fri, 4 Jul 2025 14:13:27 GMT Subject: RFR: 8344548: Incorrect StartAggressiveSweepingAt doc for segmented code cache [v2] In-Reply-To: <4Kb1CzIxoBR4DXR9htBr3NINCgUup9coKCNFurAi93c=.253f5490-2263-4b3d-b921-2737ead6bb0a@github.com> References: <4Kb1CzIxoBR4DXR9htBr3NINCgUup9coKCNFurAi93c=.253f5490-2263-4b3d-b921-2737ead6bb0a@github.com> Message-ID: > The flag StartAggressiveSweepingAt triggers aggressive code cache sweeping based on the percentage of free space in the entire code cache. The previous description referenced segmented vs non-segmented code cache, which is confusing and does not reflect the current implementation. 
> > This patch updates the flag description to clearly state that the threshold is based on the total code cache free percentage, regardless of segmentation. hanguanqiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - correct a compile error - Merge remote-tracking branch 'upstream/master' into 8344548 - 8344548: Incorrect StartAggressiveSweepingAt doc for segmented code cache The flag StartAggressiveSweepingAt triggers aggressive code cache sweeping based on the percentage of free space in the entire code cache. The previous description referenced segmented vs non-segmented code cache, which is confusing and does not reflect the current implementation. This patch updates the flag description to clearly state that the threshold is based on the total code cache free percentage, regardless of segmentation. 
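
A minimal sketch of the threshold check described in the flag text above. The class name, method name, and the "at or below" comparison are assumptions for illustration and are not taken from the HotSpot sources; the only point shown is that the ratio is computed over the whole code cache rather than per segment.

```java
public class AggressiveSweepThreshold {
    // Hypothetical helper (not HotSpot code): returns true when the free share
    // of the entire code cache is at or below the given percentage threshold.
    static boolean shouldSweepAggressively(long freeBytes, long totalBytes, int thresholdPercent) {
        if (totalBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be positive");
        }
        double freeRatio = (double) freeBytes / (double) totalBytes;
        return freeRatio <= thresholdPercent / 100.0;
    }

    public static void main(String[] args) {
        long total = 240L * 1024 * 1024; // whole code cache, all segments summed
        // 20 MB free of 240 MB is below a 10% threshold.
        System.out.println(shouldSweepAggressively(20L * 1024 * 1024, total, 10));
        // 48 MB free of 240 MB is 20%, above a 10% threshold.
        System.out.println(shouldSweepAggressively(48L * 1024 * 1024, total, 10));
    }
}
```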
------------- Changes: - all: https://git.openjdk.org/jdk/pull/26114/files - new: https://git.openjdk.org/jdk/pull/26114/files/698a3f28..cb1b2c60 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26114&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26114&range=00-01 Stats: 3295 lines in 114 files changed: 2197 ins; 812 del; 286 mod Patch: https://git.openjdk.org/jdk/pull/26114.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26114/head:pull/26114 PR: https://git.openjdk.org/jdk/pull/26114 From duke at openjdk.org Fri Jul 4 14:21:38 2025 From: duke at openjdk.org (hanguanqiang) Date: Fri, 4 Jul 2025 14:21:38 GMT Subject: RFR: 8344548: Incorrect StartAggressiveSweepingAt doc for segmented code cache [v2] In-Reply-To: References: <4Kb1CzIxoBR4DXR9htBr3NINCgUup9coKCNFurAi93c=.253f5490-2263-4b3d-b921-2737ead6bb0a@github.com> Message-ID: <-bUAuPwNRRbf6d7qs2AJErsIlLJQbu9Hl0_ReKdUZ7A=.8473414f-b98b-4ca7-bf91-aca4ab0ccca5@github.com> On Fri, 4 Jul 2025 13:57:01 GMT, Evgeny Astigeevich wrote: >> hanguanqiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - correct a compile error >> - Merge remote-tracking branch 'upstream/master' into 8344548 >> - 8344548: Incorrect StartAggressiveSweepingAt doc for segmented code cache >> >> The flag StartAggressiveSweepingAt triggers aggressive code cache sweeping based on the percentage of free space in the entire code cache. The previous description referenced segmented vs non-segmented code cache, which is >> confusing and does not reflect the current implementation. >> >> This patch updates the flag description to clearly state that the threshold is based on the total code cache free percentage, regardless of segmentation. 
> > lgtm Many thanks to you @eastig for reviewing ------------- PR Comment: https://git.openjdk.org/jdk/pull/26114#issuecomment-3036461841 From duke at openjdk.org Fri Jul 4 16:28:38 2025 From: duke at openjdk.org (Samuel Chee) Date: Fri, 4 Jul 2025 16:28:38 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Fri, 27 Jun 2025 12:54:07 GMT, Andrew Haley wrote: >> I can double check with the herd7 simulator, but since the `casal` will always produce an acquire, to me it seems impossible that a load can be moved before the `casal` due the acquire within the `casal`. >> >> Clause 9 of before-barrier-ordering in the Arm Architecture reference manual also supports this. > >> Clause 9 of before-barrier-ordering in the Arm Architecture reference manual also supports this. > > Which clause is that?

Hi @theRealAph,

The clause can be found here, in the last bullet point on this page - https://mozilla.github.io/pdf.js/web/viewer.html?file=https://documentation-service.arm.com/static/6839d7585475b403d943b4dc#page=255&pagemode=none

Also, we have come up with two herd7 tests which should hopefully prove it to be alright.

{ x=0; y=0; 0:X1=x; 0:X3=y; 1:X1=x; 1:X3=y; }
P0                 | P1           ;
MOV W0,#1          | MOV W0, #1   ;
MOV W2,#2          | MOV W2, #2   ;
CASAL W0, W2, [X1] |              ;
LDR W4,[X3]        | STR W0, [X3] ;
                   | DMB ISH      ;
                   | STR W2, [X1] ;
exists (0:X0=2 /\ 0:X4=0)

Here, the stores by P1 happen in order: y = 1; x = 2; and the reads in P0 happen first by CASAL, from x, and then by LDR, from y. The condition checks that CASAL can't read 2 from x if LDR read 0 from y; this constraint holds unless the reads are reordered.
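
As a Java-level analogue of the first herd7 test above, here is a minimal sketch under stated assumptions: the class and method names are made up, `compareAndExchange` stands in for `CASAL`, and a volatile `set` stands in for the `DMB ISH` plus the second store. It can only fail to observe the forbidden outcome; it does not prove anything about the code C1 emits, which is what the herd7 model is for.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasThenLoadLitmus {
    // Shared state for one run of the litmus test.
    static final class State {
        final AtomicInteger x = new AtomicInteger(0);
        int y = 0; // plain field, like the LDR/STR on y in the herd7 test
    }

    // Runs both threads once. Returns true if the forbidden outcome occurred:
    // the CAS observed x == 2 but the following load still saw y == 0.
    static boolean runOnce() throws InterruptedException {
        State s = new State();
        int[] observed = new int[2];
        Thread p0 = new Thread(() -> {
            observed[0] = s.x.compareAndExchange(1, 2); // like CASAL (never succeeds: x is 0 or 2)
            observed[1] = s.y;                           // like LDR W4, [X3]
        });
        Thread p1 = new Thread(() -> {
            s.y = 1;    // like STR W0, [X3]
            s.x.set(2); // volatile store, standing in for DMB ISH + STR W2, [X1]
        });
        p0.start();
        p1.start();
        p0.join(); // join makes the writes to observed[] visible here
        p1.join();
        return observed[0] == 2 && observed[1] == 0;
    }

    // Repeats the test and counts forbidden outcomes; a compliant JVM should report 0.
    static int countForbidden(int runs) {
        int forbidden = 0;
        try {
            for (int i = 0; i < runs; i++) {
                if (runOnce()) {
                    forbidden++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running litmus test", e);
        }
        return forbidden;
    }

    public static void main(String[] args) {
        System.out.println("forbidden outcomes: " + countForbidden(1000));
    }
}
```

Under the Java memory model, a `compareAndExchange` that returns 2 has read the volatile store from the other thread, so the earlier plain store to `y` must be visible to the later load.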
And

{ x = 1; y =1; 0: X1=x; 0:X3=y; 1: X3=x; 1:X1=y; }
P0                 | P1                 ;
MOV W0, #1         | MOV W0, #1         ;
MOV W2, #2         | MOV W2, #2         ;
CASAL W0, W2, [X1] | CASAL W0, W2, [X1] ;
LDR W4, [X3]       | LDR W4, [X3]       ;
exists (0:X4=1 /\ 1: X4=1)

Here, both X4s being equal to 1 is disallowed, as that would indicate that one of the LDRs was reordered before its CASAL. The CASALs will always succeed by default, meaning at least one of the LDRs will load a non-1 value into W4. Hence (0:X4=1 /\ 1: X4=1) can only ever occur if an LDR gets ordered before the CASAL.

Hope this helps :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26000#issuecomment-3036824254 From duke at openjdk.org Sat Jul 5 00:14:39 2025 From: duke at openjdk.org (hanguanqiang) Date: Sat, 5 Jul 2025 00:14:39 GMT Subject: RFR: 8358568: Purge obsolete/broken GenerateSynchronizationCode flag [v5] In-Reply-To: References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> <8h5rro_3Sv9GLk3WSTstiTPXEj1dHPPHgCKowe-CIjk=.e15f5800-732f-4f25-81f8-ae86724bec2e@github.com> Message-ID: On Fri, 4 Jul 2025 09:27:31 GMT, Aleksey Shipilev wrote: >> hanguanqiang has updated the pull request incrementally with one additional commit since the last revision: >> >> correct an error >> >> correct an error > > I renamed the JBS bug, match the PR title, please. Also, go to https://github.com/hgqxjj/jdk/actions -- and enable the workflows. We need to have a clean GHA run before we can integrate. @shipilev @TobiHartmann The PR is ready to be integrated, but I don't have the necessary permissions yet. Could you help with the integration? Thanks again!
------------- PR Comment: https://git.openjdk.org/jdk/pull/26108#issuecomment-3037460059 From shade at openjdk.org Sat Jul 5 05:47:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Sat, 5 Jul 2025 05:47:51 GMT Subject: RFR: 8358568: Purge obsolete/broken GenerateSynchronizationCode flag [v5] In-Reply-To: References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> <8h5rro_3Sv9GLk3WSTstiTPXEj1dHPPHgCKowe-CIjk=.e15f5800-732f-4f25-81f8-ae86724bec2e@github.com> Message-ID: On Fri, 4 Jul 2025 09:27:31 GMT, Aleksey Shipilev wrote: >> hanguanqiang has updated the pull request incrementally with one additional commit since the last revision: >> >> correct an error >> >> correct an error > > I renamed the JBS bug, match the PR title, please. Also, go to https://github.com/hgqxjj/jdk/actions -- and enable the workflows. We need to have a clean GHA run before we can integrate. > @shipilev @TobiHartmann The PR is ready to be integrated, but I don't have the necessary permissions yet. Could you help with the integration? Thanks again ! See what bots say here: https://github.com/openjdk/jdk/pull/26108#issuecomment-3030230202 -- you need to issue `/integrate` command, and someone would sponsor.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26108#issuecomment-3038183835 From duke at openjdk.org Sat Jul 5 06:34:51 2025 From: duke at openjdk.org (duke) Date: Sat, 5 Jul 2025 06:34:51 GMT Subject: RFR: 8358568: Purge obsolete/broken GenerateSynchronizationCode flag [v5] In-Reply-To: <8h5rro_3Sv9GLk3WSTstiTPXEj1dHPPHgCKowe-CIjk=.e15f5800-732f-4f25-81f8-ae86724bec2e@github.com> References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> <8h5rro_3Sv9GLk3WSTstiTPXEj1dHPPHgCKowe-CIjk=.e15f5800-732f-4f25-81f8-ae86724bec2e@github.com> Message-ID: <7KfCIDJiq0UA0EcFyAiEyqPtShKrl6N-295Bu0DEI7E=.ffa395b6-779e-4a98-b933-854861b2a6b5@github.com> On Fri, 4 Jul 2025 06:43:02 GMT, hanguanqiang wrote: >> This PR fixes JDK-8358568, a JVM crash triggered when running with -XX:-GenerateSynchronizationCode >> >> Problem: >> When synchronization code generation is disabled by -XX:-GenerateSynchronizationCode, the compiler's do_monitor_exit() method still tries to access monitor objects without checking if any monitors exist. This causes an assertion failure and JVM crash. >> >> Root Cause: >> Parse::do_monitor_exit() calls shared_unlock() using monitor info unconditionally, but with GenerateSynchronizationCode disabled, no monitor info is available, leading to invalid access. >> >> Fix >> Purge obsolete/broken GenerateSynchronizationCode flag > > hanguanqiang has updated the pull request incrementally with one additional commit since the last revision: > > correct an error > > correct an error @hgqxjj Your change (at version 6ebc2ecb7b41da558a26400461b2e8084e915c3d) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26108#issuecomment-3038269276 From duke at openjdk.org Sat Jul 5 06:40:39 2025 From: duke at openjdk.org (hanguanqiang) Date: Sat, 5 Jul 2025 06:40:39 GMT Subject: RFR: 8358568: Purge obsolete/broken GenerateSynchronizationCode flag [v5] In-Reply-To: References: <3V2zC8zIDHEUvZMM8ibpoeRjd8FOEjuQDOSzWrQKsZc=.d652b38e-f890-4d78-b4b7-7e4d3a9f3bde@github.com> <8h5rro_3Sv9GLk3WSTstiTPXEj1dHPPHgCKowe-CIjk=.e15f5800-732f-4f25-81f8-ae86724bec2e@github.com> Message-ID: <_BqZWLEjPgBAb82HtarIENSKj5AuzFbILzidygCRm38=.7e18705d-aa5f-4e07-bcc1-496f75848441@github.com> On Sat, 5 Jul 2025 05:45:24 GMT, Aleksey Shipilev wrote: >> I renamed the JBS bug, match the PR title, please. Also, go to https://github.com/hgqxjj/jdk/actions -- and enable the workflows. We need to have a clean GHA run before we can integrate. > >> @shipilev @TobiHartmann The PR is ready to be integrated, but I don't have the necessary permissions yet. Could you help with the integration? Thanks again ! > > See what bots say here: https://github.com/openjdk/jdk/pull/26108#issuecomment-3030230202 -- you need to issue `/integrate` command, and someone would sponsor. @shipilev Thanks for the reminder, I have already issued `/integrate`. Please help sponsor this change, I really appreciate it! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26108#issuecomment-3038287916 From dnsimon at openjdk.org Sat Jul 5 10:31:37 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 5 Jul 2025 10:31:37 GMT Subject: RFR: 8361417: JVMCI getModifiers incorrect for inner classes Message-ID: The result of `ResolvedJavaType.getModifiers()` should always have been the same as `Class.getModifiers()`. This is currently not the case for inner classes. Instead, the value is derived from `Klass::_access_flags` whereas it should be derived from the `InnerClasses` attribute (as it is for `Class`). This PR aligns `ResolvedJavaType.getModifiers()` with `Class.getModifiers()`.
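
To illustrate the `Class.getModifiers()` side of the comparison above, here is a minimal sketch. The names `InnerModifiersDemo` and `Hidden` are made up for illustration, and JVMCI itself is not exercised. For a nested class, reflection reports `private` and `static`, which are not valid bits in a class file's own `access_flags` and come from the `InnerClasses` attribute instead.

```java
import java.lang.reflect.Modifier;

public class InnerModifiersDemo {
    // Hypothetical nested class, used only to have something to inspect.
    private static final class Hidden {}

    // Returns the modifiers reflection reports for the nested class.
    static int hiddenModifiers() {
        return Hidden.class.getModifiers();
    }

    public static void main(String[] args) {
        int mods = hiddenModifiers();
        System.out.println("modifiers: " + Modifier.toString(mods));
        System.out.println("private:   " + Modifier.isPrivate(mods));
        System.out.println("static:    " + Modifier.isStatic(mods));
    }
}
```

A value derived only from the class's own access flags could not carry the `private` or `static` bits, which is the kind of difference the PR description refers to.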
------------- Commit messages: - fix getModifiers() for inner classes Changes: https://git.openjdk.org/jdk/pull/26135/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26135&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361417 Stats: 71 lines in 7 files changed: 36 ins; 20 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/26135.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26135/head:pull/26135 PR: https://git.openjdk.org/jdk/pull/26135 From fgao at openjdk.org Sat Jul 5 14:04:39 2025 From: fgao at openjdk.org (Fei Gao) Date: Sat, 5 Jul 2025 14:04:39 GMT Subject: RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors [v3] In-Reply-To: References: <7VTRz_XqYBSdQ54n7ADzTzYADZjDbgBw6XuW0jehSLI=.24d18b87-4553-47ab-8065-d92fbb5700ae@github.com> Message-ID: On Fri, 4 Jul 2025 09:15:14 GMT, Xiaohong Gong wrote: >> This looks good. Thanks. > >> This looks good. Thanks. > > Thanks so much for your review! Hi @XiaohongGong, thanks for your work! Shall we also relax the IR check condition in the following cases for `aarch64` and `x86`? 
https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java#L254-L258 https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java#L376-L380 https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java#L388-L392 ------------- PR Comment: https://git.openjdk.org/jdk/pull/26057#issuecomment-3038978749 From fgao at openjdk.org Sat Jul 5 15:11:40 2025 From: fgao at openjdk.org (Fei Gao) Date: Sat, 5 Jul 2025 15:11:40 GMT Subject: RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors [v3] In-Reply-To: <7VTRz_XqYBSdQ54n7ADzTzYADZjDbgBw6XuW0jehSLI=.24d18b87-4553-47ab-8065-d92fbb5700ae@github.com> References: <7VTRz_XqYBSdQ54n7ADzTzYADZjDbgBw6XuW0jehSLI=.24d18b87-4553-47ab-8065-d92fbb5700ae@github.com> Message-ID: <5H0NP8vFqCDf1JgHIDee3WrYRbJ6koj5wQsxEGTW8nI=.87d74c6a-54b3-45cc-a972-c4350d5e2acf@github.com> On Thu, 3 Jul 2025 06:10:28 GMT, Xiaohong Gong wrote: >> ### Background >> On AArch64, the minimum vector length supported is 64-bit for basic types, except for `byte` and `boolean` (32-bit and 16-bit respectively to match special Vector API features). This limitation prevents intrinsification of vector type conversions between `short` and wider types (e.g. `long/double`) in Vector API when the entire vector length is within 128 bits, resulting in degraded performance for such conversions. >> >> For example, type conversions between `ShortVector.SPECIES_128` and `LongVector.SPECIES_128` are not supported on AArch64 NEON and SVE architectures with 128-bit max vector size. This occurs because the compiler would need to generate a vector with 2 short elements, resulting in a 32-bit vector size. 
>> >> To intrinsify such type conversion APIs, we need to relax the min vector length constraint from 64-bit to 32-bit for short vectors. >> >> ### Impact Analysis >> #### 1. Vector types >> Vectors only with `short` element types will be affected, as we just supported 32-bit `short` vectors in this change. >> >> #### 2. Vector API >> No impact on Vector API or the vector-specific nodes. The minimum vector shape at API level remains 64-bit. It's not possible to generate a final vector IR with 32-bit vector size. Type conversions may generate intermediate 32-bit vectors, but they will be resized or cast to vectors with at least 64-bit length. >> >> #### 3. Auto-vectorization >> Enables vectorization of cases containing only 2 `short` lanes, with significant performance improvements. Since we have supported 32-bit vectors for `byte` type for a long time, extending this to `short` did not introduce additional risks. >> >> #### 4. Codegen of vector nodes >> NEON doesn't support 32-bit SIMD instructions, so we use 64-bit instructions instead. For lanewise operations, this is safe because the higher half bits can be ignored. >> >> Details: >> - Lanewise vector operations are unaffected as explained above. >> - NEON supports vector load/store instructions with 32-bit vector size, which we already use in relevant IRs (shared by SVE). >> - Cross-lane operations like reduction may be affected, potentially causing incorrect results for `min/max/mul/and` reductions. The min vector size for such operations should remain 64-bit. We've added assertions in match rules. Since it's currently not possible to generate such reductions (Vector API minimum is 64-bit, and SLP doesn't support subword type reductions), we maintain the status quo. However, addin... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Refine the comment in ad file Have you measured the performance of this micro-benchmark on NEON machine? 
https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java#L251-L256 We added a limitation only for `int` before: https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/cpu/aarch64/aarch64_vector.ad#L131-L134 Perhaps we also need to impose a similar limitation on `short` if the same regression occurs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26057#issuecomment-3039090274 From fjiang at openjdk.org Sun Jul 6 13:22:47 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Sun, 6 Jul 2025 13:22:47 GMT Subject: RFR: 8360520: RISC-V: C1: Fix primitive array clone intrinsic regression after JDK-8333154 [v3] In-Reply-To: References: Message-ID:
> Hi, please consider.
> [JDK-8333154](https://bugs.openjdk.org/browse/JDK-8333154) implemented a C1 clone intrinsic that reuses arraycopy code for primitive arrays for RISC-V.
> The new instruction flag `OmitChecksFlag` (introduced by [JDK-8302850](https://bugs.openjdk.org/browse/JDK-8302850)) is used to avoid instantiation of array copy stubs for primitive array clones.
> If `OmitChecksFlag` is set, all flags (including the `unaligned` flag) will be cleared before generating the `LIR_OpArrayCopy` node.
> This may lead to incorrect selection of the arraycopy function when `-XX:+UseCompactObjectHeaders` is enabled, causing the `unaligned` flag to be set for arraycopy.
> We observed performance regression on P550 SBC through the corresponding JMH tests when COH is enabled.
>
> This PR keeps the `unaligned` flag on RISC-V to ensure the arraycopy function is selected correctly.
> The other platforms are not affected as the flag is always `0` when `OmitChecksFlag` is true.
>
> Test on linux-riscv64:
> - [x] Tier1-3
>
> JMH data on P550 SBC for reference (w/o and w/ the patch):
>
> Before:
>
> Without COH:
>
> Benchmark                 (size)  Mode  Cnt     Score   Error  Units
> ArrayClone.byteArraycopy       0  avgt   15    50.854 ± 0.379  ns/op
> ArrayClone.byteArraycopy      10  avgt   15    74.294 ± 0.449  ns/op
> ArrayClone.byteArraycopy     100  avgt   15    81.847 ± 0.082  ns/op
> ArrayClone.byteArraycopy    1000  avgt   15   480.106 ± 0.369  ns/op
> ArrayClone.byteClone           0  avgt   15    90.146 ± 0.299  ns/op
> ArrayClone.byteClone          10  avgt   15   130.525 ± 0.384  ns/op
> ArrayClone.byteClone         100  avgt   15   251.942 ± 0.122  ns/op
> ArrayClone.byteClone        1000  avgt   15   407.580 ± 0.318  ns/op
> ArrayClone.intArraycopy        0  avgt   15    49.984 ± 0.436  ns/op
> ArrayClone.intArraycopy       10  avgt   15    76.302 ± 1.388  ns/op
> ArrayClone.intArraycopy      100  avgt   15   267.487 ± 0.329  ns/op
> ArrayClone.intArraycopy     1000  avgt   15  1157.444 ± 1.588  ns/op
> ArrayClone.intClone            0  avgt   15    90.130 ± 0.257  ns/op
> ArrayClone.intClone           10  avgt   15   183.619 ± 0.588  ns/op
> ArrayClone.intClone          100  avgt   15   296.491 ± 0.246  ns/op
> ArrayClone.intClone         1000  avgt   15   828.695 ± 1.501  ns/op
>
> -------------------------------------------------------------------------
> With COH:
>
> Benchmark                 (size)  Mode  Cnt     Score   Error  Units
> ArrayClone.byteArraycopy       0  avgt   15    50.667 ± 0.622  ns/op
> Arra...

Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fix-c1-primitive-clone - Revert RISCV Macro modification - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fix-c1-primitive-clone - check unaligned flag at LIR_OpArrayCopy to avoid using AvoidUnalignedAccesses - riscv: fix c1 primitive array clone intrinsic regression ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25976/files - new: https://git.openjdk.org/jdk/pull/25976/files/be980424..3a502f84 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25976&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25976&range=01-02 Stats: 10623 lines in 438 files changed: 7013 ins; 1860 del; 1750 mod Patch: https://git.openjdk.org/jdk/pull/25976.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25976/head:pull/25976 PR: https://git.openjdk.org/jdk/pull/25976 From fjiang at openjdk.org Sun Jul 6 13:22:49 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Sun, 6 Jul 2025 13:22:49 GMT Subject: RFR: 8360520: RISC-V: C1: Fix primitive array clone intrinsic regression after JDK-8333154 [v2] In-Reply-To: References: Message-ID: <7KGEqkzMGveZ_lLtIcC0YwwHqmUri7L3_v7J6aVLmQM=.089fc97c-f09a-4220-87cc-a30d6dd10536@github.com> On Fri, 4 Jul 2025 05:25:08 GMT, Fei Yang wrote: >> Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fix-c1-primitive-clone >> - check unaligned flag at LIR_OpArrayCopy to avoid using AvoidUnalignedAccesses >> - riscv: fix c1 primitive array clone intrinsic regression > > src/hotspot/share/c1/c1_Compiler.cpp line 240: > >> 238: #endif >> 239: case vmIntrinsics::_getObjectSize: >> 240: #if defined(X86) || defined(AARCH64) || defined(S390) || defined(RISCV64) || defined(PPC64) > > PS: The change of macro `RISCV` seems unrelated to this PR? Seem better to go with another PR. Reverted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25976#discussion_r2188269110